tsphere_internals.html - sphere - GPU-based 3D discrete element method algorithm with optional fluid coupling
 (HTM) git clone git://src.adamsgaard.dk/sphere
 (DIR) Log
 (DIR) Files
 (DIR) Refs
 (DIR) LICENSE
       ---
       tsphere_internals.html (20735B)
       ---
            1 
            2 <!DOCTYPE html>
            3 
            4 <html xmlns="http://www.w3.org/1999/xhtml">
            5   <head>
            6     <meta charset="utf-8" />
            7     <title>sphere internals &#8212; sphere 2.15-beta documentation</title>
            8     <link rel="stylesheet" href="_static/classic.css" type="text/css" />
            9     <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
           10     
           11     <script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
           12     <script type="text/javascript" src="_static/jquery.js"></script>
           13     <script type="text/javascript" src="_static/underscore.js"></script>
           14     <script type="text/javascript" src="_static/doctools.js"></script>
           15     <script type="text/javascript" src="_static/language_data.js"></script>
           16     
           17     <link rel="index" title="Index" href="genindex.html" />
           18     <link rel="search" title="Search" href="search.html" />
           19     <link rel="prev" title="Python API" href="python_api.html" /> 
           20   </head><body>
           21     <div class="related" role="navigation" aria-label="related navigation">
           22       <h3>Navigation</h3>
           23       <ul>
           24         <li class="right" style="margin-right: 10px">
           25           <a href="genindex.html" title="General Index"
           26              accesskey="I">index</a></li>
           27         <li class="right" >
           28           <a href="py-modindex.html" title="Python Module Index"
           29              >modules</a> |</li>
           30         <li class="right" >
           31           <a href="python_api.html" title="Python API"
           32              accesskey="P">previous</a> |</li>
           33         <li class="nav-item nav-item-0"><a href="index.html">sphere 2.15-beta documentation</a> &#187;</li> 
           34       </ul>
           35     </div>  
           36 
           37     <div class="document">
           38       <div class="documentwrapper">
           39         <div class="bodywrapper">
           40           <div class="body" role="main">
           41             
           42   <div class="section" id="sphere-internals">
           43 <h1>sphere internals<a class="headerlink" href="#sphere-internals" title="Permalink to this headline">¶</a></h1>
           44 <p>The <em>sphere</em> executable has the following options:</p>
           45 <div class="highlight-text notranslate"><div class="highlight"><pre><span></span>$ ../../sphere --help
           46 ../../sphere: particle dynamics simulator
           47 Usage: ../../sphere [OPTION[S]]... [FILE1 ...]
           48 Options:
           49 -h, --help                print help
           50 -V, --version                print version information and exit
           51 -q, --quiet                suppress status messages to stdout
           52 -d &lt;device&gt;                execute on device with specified id
           53 -n, --dry                show key experiment parameters and quit
           54 -f, --fluid                simulate fluid between particles
           55 -r, --render                render input files to images instead of
           56                             simulating the temporal evolution
           57 -dc, --dont-check        don&#39;t check values before running
           58 
           59 Raytracer (-r) specific options:
           60 -m &lt;method&gt; &lt;maxval&gt; [-l &lt;lower cutoff val&gt;], or
           61 --method &lt;method&gt; &lt;maxval&gt; [-l &lt;lower cutoff val&gt;]
           62         color visualization method, possible values:
           63         normal, pres, vel, angvel, xdisp, angpos
           64         &#39;normal&#39; is the default mode
           65         if -l is appended, don&#39;t render particles with value below
           66 -c, --contacts                Print a list of particle-particle contacts
           67 </pre></div>
           68 </div>
           69 <p>The most common way to invoke <em>sphere</em> is however via the Python API (e.g. <a class="reference internal" href="python_api.html#sphere.run" title="sphere.run"><code class="xref py py-func docutils literal notranslate"><span class="pre">sphere.run()</span></code></a>, <a class="reference internal" href="python_api.html#sphere.render" title="sphere.render"><code class="xref py py-func docutils literal notranslate"><span class="pre">sphere.render()</span></code></a>, etc.).</p>
           70 <p>subsection{The <em>sphere</em> algorithm}
           71 label{subsec:spherealgo}
           72 The <em>sphere</em>-binary is launched from the system terminal by passing the simulation ID as an input parameter; texttt{./sphere_&lt;architecture&gt; &lt;simulation_ID&gt;}. The sequence of events in the program is the following:
           73 #. System check, including search for NVIDIA CUDA compatible devices (texttt{main.cpp}).</p>
           74 <ol class="arabic simple">
           75 <li><p>Initial data import from binary input file (texttt{main.cpp}).</p></li>
           76 <li><p>Allocation of memory for all host variables (particles, grid, walls, etc.) (texttt{main.cpp}).</p></li>
           77 <li><p>Continued import from binary input file (texttt{main.cpp}).</p></li>
           78 <li><p>Control handed to GPU-specific function texttt{gpuMain(ldots)} (texttt{device.cu}).</p></li>
           79 <li><p>Memory allocation of device memory (texttt{device.cu}).</p></li>
           80 <li><p>Transfer of data from host to device variables (texttt{device.cu}).</p></li>
           81 <li><p>Initialization of Thrustfootnote{url{<a class="reference external" href="https://code.google.com/p/thrust/">https://code.google.com/p/thrust/</a>}} radix sort configuration (texttt{device.cu}).</p></li>
           82 <li><p>Calculation of GPU workload configuration (thread and block layout) (texttt{device.cu}).</p></li>
           83 <li><p>Status and data written to verb”&lt;simulation_ID&gt;.status.dat” and verb”&lt;simulation_ID&gt;.output0.bin”, both located in texttt{output/} folder (texttt{device.cu}).</p></li>
           84 <li><p>Main loop (while texttt{time.current &lt;= time.total}) (functions called in texttt{device.cu}, function definitions in seperate files). Each kernel call is wrapped in profiling- and error exception handling functions:</p></li>
           85 </ol>
           86 <blockquote>
           87 <div><ol class="arabic">
           88 <li><p>label{loopstart}CUDA thread synchronization point.</p></li>
           89 <li><p>texttt{calcParticleCellID&lt;&lt;&lt;,&gt;&gt;&gt;(ldots)}: Particle-grid hash value calculation (texttt{sorting.cuh}).</p></li>
           90 <li><p>CUDA thread synchronization point.</p></li>
           91 <li><p>texttt{thrust::sort_by_key(ldots)}: Thrust radix sort of particle-grid hash array (texttt{device.cu}).</p></li>
           92 <li><p>texttt{cudaMemset(ldots)}: Writing zero value (texttt{0xffffffff}) to empty grid cells (texttt{device.cu}).</p></li>
           93 <li><p>texttt{reorderArrays&lt;&lt;&lt;,&gt;&gt;&gt;(ldots)}: Reordering of particle arrays, based on sorted particle-grid-hash values (texttt{sorting.cuh}).</p></li>
           94 <li><p>CUDA thread synchronization point.</p></li>
           95 <li><p>Optional: texttt{topology&lt;&lt;&lt;,&gt;&gt;&gt;(ldots)}: If particle contact history is required by the contact model, particle contacts are identified, and stored per particle. Previous, now non-existant contacts are discarded (texttt{contactsearch.cuh}).</p></li>
           96 <li><p>CUDA thread synchronization point.</p></li>
           97 <li><p>texttt{interact&lt;&lt;&lt;,&gt;&gt;&gt;(ldots)}: For each particle: Search of contacts in neighbor cells, processing of optional collisions and updating of resulting forces and torques. Values are written to read/write device memory arrays (texttt{contactsearch.cuh}).</p></li>
           98 <li><p>CUDA thread synchronization point.</p></li>
           99 <li><p>texttt{integrate&lt;&lt;&lt;,&gt;&gt;&gt;(ldots)}: Updating of spatial degrees of freedom by a second-order Taylor series expansion integration (texttt{integration.cuh}).</p></li>
          100 <li><p>CUDA thread synchronization point.</p></li>
          101 <li><p>texttt{summation&lt;&lt;&lt;,&gt;&gt;&gt;(ldots)}: Particle contributions to the net force on the walls are summated (texttt{integration.cuh}).</p></li>
          102 <li><p>CUDA thread synchronization point.</p></li>
          103 <li><p>texttt{integrateWalls&lt;&lt;&lt;,&gt;&gt;&gt;(ldots)}: Updating of spatial degrees of freedom of walls (texttt{integration.cuh}).</p></li>
          104 <li><p>Update of timers and loop-related counters (e.g. texttt{time.current}), (texttt{device.cu}).</p></li>
          105 <li><p>If file output interval is reached:</p>
          106 <blockquote>
          107 <div><blockquote>
          108 <div><p>item Optional write of data to output binary (verb”&lt;simulation_ID&gt;.output#..bin”), (texttt{file_io.cpp}).
          109 item Update of verb”&lt;simulation_ID&gt;.status#..bin” (texttt{device.cu}).</p>
          110 </div></blockquote>
          111 <p>item Return to point ref{loopstart}, unless texttt{time.current &gt;= time.total}, in which case the program continues to point ref{loopend}.</p>
          112 </div></blockquote>
          113 </li>
          114 </ol>
          115 </div></blockquote>
          116 <ol class="arabic simple">
          117 <li><p>label{loopend}Liberation of device memory (texttt{device.cu}).</p></li>
          118 <li><p>Control returned to texttt{main(ldots)}, liberation of host memory (texttt{main.cpp}).</p></li>
          119 <li><p>End of program, return status equal to zero (0) if no problems where encountered.</p></li>
          120 </ol>
          121 <div class="section" id="numerical-algorithm">
          122 <h2>Numerical algorithm<a class="headerlink" href="#numerical-algorithm" title="Permalink to this headline">¶</a></h2>
          123 <p>The <em>sphere</em>-binary is launched from the system terminal by passing the simulation ID as an input parameter; texttt{./sphere_&lt;architecture&gt; &lt;simulation_ID&gt;}. The sequence of events in the program is the following:</p>
          124 <ol class="arabic simple">
          125 <li><p>System check, including search for NVIDIA CUDA compatible devices (texttt{main.cpp}).</p></li>
          126 <li><p>Initial data import from binary input file (texttt{main.cpp}).</p></li>
          127 <li><p>Allocation of memory for all host variables (particles, grid, walls, etc.) (texttt{main.cpp}).</p></li>
          128 <li><p>Continued import from binary input file (texttt{main.cpp}).</p></li>
          129 <li><p>Control handed to GPU-specific function texttt{gpuMain(ldots)} (texttt{device.cu}).</p></li>
          130 <li><p>Memory allocation of device memory (texttt{device.cu}).</p></li>
          131 <li><p>Transfer of data from host to device variables (texttt{device.cu}).</p></li>
          132 <li><p>Initialization of Thrustfootnote{url{<a class="reference external" href="https://code.google.com/p/thrust/">https://code.google.com/p/thrust/</a>}} radix sort configuration (texttt{device.cu}).</p></li>
          133 <li><p>Calculation of GPU workload configuration (thread and block layout) (texttt{device.cu}).</p></li>
          134 <li><p>Status and data written to verb”&lt;simulation_ID&gt;.status.dat” and verb”&lt;simulation_ID&gt;.output0.bin”, both located in texttt{output/} folder (texttt{device.cu}).</p></li>
          135 <li><p>Main loop (while texttt{time.current &lt;= time.total}) (functions called in texttt{device.cu}, function definitions in seperate files). Each kernel call is wrapped in profiling- and error exception handling functions:</p></li>
          136 </ol>
          137 <blockquote>
          138 <div><ol class="arabic">
          139 <li><p>label{loopstart}CUDA thread synchronization point.</p></li>
          140 <li><p>texttt{calcParticleCellID&lt;&lt;&lt;,&gt;&gt;&gt;(ldots)}: Particle-grid hash value calculation (texttt{sorting.cuh}).</p></li>
          141 <li><p>CUDA thread synchronization point.</p></li>
          142 <li><p>texttt{thrust::sort_by_key(ldots)}: Thrust radix sort of particle-grid hash array (texttt{device.cu}).</p></li>
          143 <li><p>texttt{cudaMemset(ldots)}: Writing zero value (texttt{0xffffffff}) to empty grid cells (texttt{device.cu}).</p></li>
          144 <li><p>texttt{reorderArrays&lt;&lt;&lt;,&gt;&gt;&gt;(ldots)}: Reordering of particle arrays, based on sorted particle-grid-hash values (texttt{sorting.cuh}).</p></li>
          145 <li><p>CUDA thread synchronization point.</p></li>
          146 <li><p>Optional: texttt{topology&lt;&lt;&lt;,&gt;&gt;&gt;(ldots)}: If particle contact history is required by the contact model, particle contacts are identified, and stored per particle. Previous, now non-existant contacts are discarded (texttt{contactsearch.cuh}).</p></li>
          147 <li><p>CUDA thread synchronization point.</p></li>
          148 <li><p>texttt{interact&lt;&lt;&lt;,&gt;&gt;&gt;(ldots)}: For each particle: Search of contacts in neighbor cells, processing of optional collisions and updating of resulting forces and torques. Values are written to read/write device memory arrays (texttt{contactsearch.cuh}).</p></li>
          149 <li><p>CUDA thread synchronization point.</p></li>
          150 <li><p>texttt{integrate&lt;&lt;&lt;,&gt;&gt;&gt;(ldots)}: Updating of spatial degrees of freedom by a second-order Taylor series expansion integration (texttt{integration.cuh}).</p></li>
          151 <li><p>CUDA thread synchronization point.</p></li>
          152 <li><p>texttt{summation&lt;&lt;&lt;,&gt;&gt;&gt;(ldots)}: Particle contributions to the net force on the walls are summated (texttt{integration.cuh}).</p></li>
          153 <li><p>CUDA thread synchronization point.</p></li>
          154 <li><p>texttt{integrateWalls&lt;&lt;&lt;,&gt;&gt;&gt;(ldots)}: Updating of spatial degrees of freedom of walls (texttt{integration.cuh}).</p></li>
          155 <li><p>Update of timers and loop-related counters (e.g. texttt{time.current}), (texttt{device.cu}).</p></li>
          156 <li><p>If file output interval is reached:</p>
          157 <blockquote>
          158 <div><ul class="simple">
          159 <li><p>Optional write of data to output binary (verb”&lt;simulation_ID&gt;.output#..bin”), (texttt{file_io.cpp}).</p></li>
          160 <li><p>Update of verb”&lt;simulation_ID&gt;.status#..bin” (texttt{device.cu}).</p></li>
          161 </ul>
          162 </div></blockquote>
          163 </li>
          164 <li><p>Return to point ref{loopstart}, unless texttt{time.current &gt;= time.total}, in which case the program continues to point ref{loopend}.</p></li>
          165 </ol>
          166 </div></blockquote>
          167 <ol class="arabic simple">
          168 <li><p>label{loopend}Liberation of device memory (texttt{device.cu}).</p></li>
          169 <li><p>Control returned to texttt{main(ldots)}, liberation of host memory (texttt{main.cpp}).</p></li>
          170 <li><p>End of program, return status equal to zero (0) if no problems where encountered.</p></li>
          171 </ol>
          172 <p>The length of the computational time steps (texttt{time.dt}) is calculated via equation ref{eq:dt}, where length of the time intervals is defined by:</p>
          173 <div class="math">
          174 <p><img src="_images/math/e721d97e515c11c9ef7209be37e1c98029ffdfa2.png" alt="\Delta t = 0.075 \min \left( m/\max(k_n,k_t) \right)"/></p>
          175 </div><p>where <img class="math" src="_images/math/e9bc7da808d33a16a8347f27a519bd067186aa66.png" alt="m"/> is the particle mass, and <img class="math" src="_images/math/9630132210b904754c9ab272b61cb527d12263ca.png" alt="k"/> are the elastic stiffnesses.
          176 The time step is set by this relationship in <code class="xref py py-func docutils literal notranslate"><span class="pre">initTemporal()</span></code>.
          177 This equation ensures that the elastic wave (traveling at the speed of sound) is resolved a number of times while traveling through the smallest particle.</p>
          178 <p>subsubsection{Host and device memory types}
          179 label{subsubsec:memorytypes}
          180 A full, listed description of the <em>sphere</em> source code variables can be found in appendix ref{apx:SourceCodeVariables}, page pageref{apx:SourceCodeVariables}. There are three types of memory types employed in the <em>sphere</em> source code, with different characteristics and physical placement in the system (figure ref{fig:memory}).</p>
          181 <p>The floating point precision operating internally in <em>sphere</em> is defined in texttt{datatypes.h}, and can be either single (texttt{float}), or double (texttt{double}). Depending on the GPU, the calculations are performed about double as fast in single precision, in relation to double precision. In dense granular configuraions, the double precision however results in greatly improved numerical stability, and is thus set as the default floating point precision. The floating point precision is stored as the type definitions texttt{Float}, texttt{Float3} and texttt{Float4}. The floating point values in the in- and output datafiles are emph{always} written in double precision, and, if necessary, automatically converted by <em>sphere</em>.</p>
          182 <p>Three-dimensional variables (e.g. spatial vectors in <cite>E^3</cite>) are in global memory stored as texttt{Float4} arrays, since these read and writes can be coalesced, while e.g. texttt{float3}’s cannot. This alone yields a <cite>sim`20</cite>times` performance boost, even though it involves 25% more (unused) data.</p>
          183 <p>paragraph{Host memory} is the main random-access computer memory (RAM), i.e. read and write memory accessible by CPU processes, but inaccessible by CUDA kernels executed on the device.</p>
          184 <p>paragraph{Device memory} is the main, global device memory. It resides off-chip on the GPU, often in the form of 1–6 GB DRAM. The read/write access from the CUDA kernels is relatively slow. The arrays residing in (global) device memory are prefixed by <code class="docutils literal notranslate"><span class="pre">dev_</span></code> in the source code.</p>
          185 <p>marginpar{Todo: Expand section on device memory types}</p>
          186 <p>paragraph{Constant memory} values cannot be changed after they are set, and are used for scalars or small vectors. Values are set in the <code class="docutils literal notranslate"><span class="pre">transferToConstantMemory(...)}</span></code> function, called in the beginning of texttt{gpuMain(ldots)} in texttt{device.cu}. Constant memory variables have a global scope, and are prefixed by <code class="docutils literal notranslate"><span class="pre">devC_</span></code> in the source code.</p>
          187 <p>%subsection{The main loop}
          188 %label{subsec:mainloop}
          189 %The <em>sphere</em> software calculates particle movement and rotation based on the forces applied to it, by application of Newton’s law of motion (Newton’s second law with constant particle mass: <cite>F_{mathrm{net}} = m cdot a_{mathrm{cm}}</cite>). This is done in a series of algorithmic steps, see list on page pageref{loopstart}. The steps are explained in the following sections with reference to the <em>sphere</em>-source file; texttt{sphere.cu}. The intent with this document is emph{not} to give a full theoretical background of the methods, but rather how the software performs the calculations.</p>
          190 <p>subsection{Performance}
          191 marginpar{Todo: insert graph of performance vs. np and performance vs. <cite>Delta t</cite>}.
          192 subsubsection{Particles and computational time}</p>
          193 <p>subsection{Compilation}
          194 label{subsec:compilation}
          195 An important note is that the texttt{C} examples of the NVIDIA CUDA SDK should be compiled before <em>sphere</em>. Consult the <cite>Getting started guide</cite>, supplied by Nvidia for details on this step.</p>
          196 <p><em>sphere</em> is supplied with several Makefiles, which automate the compilation process. To compile all components, open a shell, go to the texttt{src/} subfolder and type texttt{make}. The GNU Make will return the parameters passed to the individual CUDA and GNU compilers (texttt{nvcc} and texttt{gcc}). The resulting binary file (texttt{sphere}) is placed in the <em>sphere</em> root folder. <code class="docutils literal notranslate"><span class="pre">src/Makefile</span></code> will also compile the raytracer.</p>
          197 </div>
          198 <div class="section" id="c-reference">
          199 <h2>C++ reference<a class="headerlink" href="#c-reference" title="Permalink to this headline">¶</a></h2>
          200 <dl class="class">
          201 <dt id="_CPPv43DEM">
          202 <span id="_CPPv33DEM"></span><span id="_CPPv23DEM"></span><span id="DEM"></span><span class="target" id="classDEM"></span><em class="property">class </em><code class="sig-name descname">DEM</code><a class="headerlink" href="#_CPPv43DEM" title="Permalink to this definition">¶</a><br /></dt>
          203 <dd></dd></dl>
          204 
          205 </div>
          206 </div>
          207 
          208 
          209           </div>
          210         </div>
          211       </div>
          212       <div class="sphinxsidebar" role="navigation" aria-label="main navigation">
          213         <div class="sphinxsidebarwrapper">
          214   <h3><a href="index.html">Table of Contents</a></h3>
          215   <ul>
          216 <li><a class="reference internal" href="#">sphere internals</a><ul>
          217 <li><a class="reference internal" href="#numerical-algorithm">Numerical algorithm</a></li>
          218 <li><a class="reference internal" href="#c-reference">C++ reference</a></li>
          219 </ul>
          220 </li>
          221 </ul>
          222 
          223   <h4>Previous topic</h4>
          224   <p class="topless"><a href="python_api.html"
          225                         title="previous chapter">Python API</a></p>
          226   <div role="note" aria-label="source link">
          227     <h3>This Page</h3>
          228     <ul class="this-page-menu">
          229       <li><a href="_sources/sphere_internals.rst.txt"
          230             rel="nofollow">Show Source</a></li>
          231     </ul>
          232    </div>
          233 <div id="searchbox" style="display: none" role="search">
          234   <h3 id="searchlabel">Quick search</h3>
          235     <div class="searchformwrapper">
          236     <form class="search" action="search.html" method="get">
          237       <input type="text" name="q" aria-labelledby="searchlabel" />
          238       <input type="submit" value="Go" />
          239     </form>
          240     </div>
          241 </div>
          242 <script type="text/javascript">$('#searchbox').show(0);</script>
          243         </div>
          244       </div>
          245       <div class="clearer"></div>
          246     </div>
          247     <div class="related" role="navigation" aria-label="related navigation">
          248       <h3>Navigation</h3>
          249       <ul>
          250         <li class="right" style="margin-right: 10px">
          251           <a href="genindex.html" title="General Index"
          252              >index</a></li>
          253         <li class="right" >
          254           <a href="py-modindex.html" title="Python Module Index"
          255              >modules</a> |</li>
          256         <li class="right" >
          257           <a href="python_api.html" title="Python API"
          258              >previous</a> |</li>
          259         <li class="nav-item nav-item-0"><a href="index.html">sphere 2.15-beta documentation</a> &#187;</li> 
          260       </ul>
          261     </div>
          262     <div class="footer" role="contentinfo">
          263         &#169; Copyright 2014, Anders Damsgaard.
          264       Created using <a href="http://sphinx-doc.org/">Sphinx</a> 2.2.0.
          265     </div>
          266   </body>
          267