<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://bakhi.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://bakhi.github.io/" rel="alternate" type="text/html" /><updated>2026-05-06T00:30:45+00:00</updated><id>https://bakhi.github.io/feed.xml</id><title type="html">Heejin Park</title><subtitle>Personal Website</subtitle><author><name>Heejin Park</name></author><entry><title type="html">DSTAT - Resource Monitoring</title><link href="https://bakhi.github.io/productivity/dstat/" rel="alternate" type="text/html" title="DSTAT - Resource Monitoring" /><published>2022-04-14T00:00:00+00:00</published><updated>2022-04-14T00:00:00+00:00</updated><id>https://bakhi.github.io/productivity/dstat</id><content type="html" xml:base="https://bakhi.github.io/productivity/dstat/"><![CDATA[<p>When you want to see the information or statistics of system such as IO devices, CPU and network, <strong>dstat</strong> can be the one that is easy to utilize. dstat is a monitoring tool that shows resource utilizations in real time. It facilitates system monitoring while performing like vmstat, netstat, iostat, etc.</p>

<h2 id="how-to-install">How to install?</h2>

<p><strong>Ubuntu and Debian</strong></p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">sudo </span>apt <span class="nb">install </span>dstat
</code></pre></div></div>

<p><strong>CentOS and Fedora</strong></p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">sudo </span>yum <span class="nb">install </span>dstat
</code></pre></div></div>

<p><strong>macOS</strong></p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>brew <span class="nb">install </span>tmux
</code></pre></div></div>

<h2 id="how-to-use">How to use?</h2>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>dstat
</code></pre></div></div>

<p><img src="/assets/images/posts/dstat.png" alt="dstat" /></p>

<p>The default options of dstat are <code class="language-plaintext highlighter-rouge">-cdngy</code>, which show the following statistics.</p>

<ul>
  <li><strong>CPU usage</strong> (-c, --cpu) : CPU usage by user processes, system processes and CPU idle time. By using <code class="language-plaintext highlighter-rouge">-C 0,1,2,3,...</code> you can check the state of each core separately.</li>
  <li>
    <p><strong>Disk stats</strong> (-d, --disk) : Disk usage (read and write). The <code class="language-plaintext highlighter-rouge">-D sda,sdb,...</code> option shows the stats of each storage device separately.</p>
  </li>
  <li><strong>Network stats</strong> (-n, --net) : Network throughput (send and receive). The <code class="language-plaintext highlighter-rouge">-N eth0,eth1,...</code> option shows the stats of each network interface separately.</li>
  <li>
    <p><strong>Page stats</strong> (-g, --page) : Paging in/out stats</p>
  </li>
  <li><strong>System stats</strong> (-y, --sys) : System status including interrupts and context switches</li>
</ul>

<h4 id="useful-options">Useful options</h4>

<ul>
  <li><code class="language-plaintext highlighter-rouge">--top-cpu</code> : shows the most expensive CPU process</li>
  <li><code class="language-plaintext highlighter-rouge">--top-cputime</code> : shows the process using the most CPU time (in ms)</li>
  <li><code class="language-plaintext highlighter-rouge">--top-mem</code> : shows the process using the most memory</li>
  <li><code class="language-plaintext highlighter-rouge">--top-io</code> : shows the most expensive I/O process</li>
</ul>

<p class="notice--info"><strong>Tip</strong>: the order of the typed options determines the order of the printed statistics. For instance, if you type <code class="language-plaintext highlighter-rouge">dstat -dnc</code>, dstat shows the disk, network, and CPU stats in that order.</p>
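<p>To make the ordering rule concrete, here is a minimal sketch of a shell helper that turns stat names into a short-option string in the order given. The function name <code class="language-plaintext highlighter-rouge">dstat_flags</code> is hypothetical (it is not part of dstat); only the five default stats listed above are mapped.</p>

```shell
# dstat_flags NAME...: build a dstat short-option string in the given order.
# Hypothetical helper for illustration; it only knows the five default stats.
dstat_flags() {
    flags="-"
    for name in "$@"; do
        case "$name" in
            cpu)  flags="${flags}c" ;;
            disk) flags="${flags}d" ;;
            net)  flags="${flags}n" ;;
            page) flags="${flags}g" ;;
            sys)  flags="${flags}y" ;;
            *)    echo "unknown stat: $name" >&2; return 1 ;;
        esac
    done
    printf '%s\n' "$flags"
}

# Usage (needs dstat installed): disk, network, CPU columns,
# sampled once per second, five times.
# dstat $(dstat_flags disk net cpu) 1 5
```

<p>The trailing <code class="language-plaintext highlighter-rouge">1 5</code> in the usage comment are dstat's optional delay and count arguments.</p>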

<h2 id="references">References</h2>

<p><a href="https://linux.die.net/man/1/dstat">Manual</a></p>

<p><a href="https://github.com/dstat-real/dstat">Source repository</a></p>]]></content><author><name>Heejin Park</name></author><category term="Productivity" /><category term="Linux utilities" /><summary type="html"><![CDATA[When you want to see the information or statistics of system such as IO devices, CPU and network, dstat can be the one that is easy to utilize. dstat is a monitoring tool that shows resource utilizations in real time. It facilitates system monitoring while performing like vmstat, netstat, iostat, etc.]]></summary></entry><entry><title type="html">TMUX - A Terminal Multiplexer</title><link href="https://bakhi.github.io/productivity/tmux/" rel="alternate" type="text/html" title="TMUX - A Terminal Multiplexer" /><published>2022-04-07T00:00:00+00:00</published><updated>2022-04-07T00:00:00+00:00</updated><id>https://bakhi.github.io/productivity/tmux</id><content type="html" xml:base="https://bakhi.github.io/productivity/tmux/"><![CDATA[<p>You may want to see multiple terminals in a single screen. For instance, you want to edit the code while monitoring the system utilization or kernel log from serial communication. Tmux is a terminal multiplexer which allows to split your screen to multiple terminals. You can also make program keep running while not visible in the background.</p>

<p><img src="/assets/images/posts/tmux.png" alt="tmux" /></p>

<h2 id="how-to-install">How to install?</h2>

<p><strong>Ubuntu and Debian</strong></p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">sudo </span>apt <span class="nb">install </span>tmux
</code></pre></div></div>

<p><strong>CentOS and Fedora</strong></p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">sudo </span>yum <span class="nb">install </span>tmux
</code></pre></div></div>

<p><strong>macOS</strong></p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>brew <span class="nb">install </span>tmux
</code></pre></div></div>

<h2 id="components">Components</h2>

<ul>
  <li>Session: the basic unit of tmux, consisting of one or more windows</li>
  <li>Window: a single terminal screen</li>
  <li>Pane: a subdivision of a single window</li>
</ul>

<p class="notice--info">When you launch tmux, it creates a new session, in which you can create multiple windows. Each window can be divided into multiple panes.</p>

<h2 id="commands">Commands</h2>

<p><code class="language-plaintext highlighter-rouge">tmux</code>: start a new session</p>

<p><code class="language-plaintext highlighter-rouge">tmux ls</code> : list all available sessions</p>

<p><code class="language-plaintext highlighter-rouge">tmux attach</code>: attach to a session (the most recently used one by default)</p>

<p><code class="language-plaintext highlighter-rouge">tmux detach</code>: detach from the current session</p>
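<p>The commands above can also be scripted. The following is a rough sketch, assuming tmux is installed, that creates a detached session with two side-by-side panes and starts dstat in the new pane. The function name <code class="language-plaintext highlighter-rouge">start_monitor_session</code> and the <code class="language-plaintext highlighter-rouge">TMUX_BIN</code> variable are made up for this example; the latter only exists so the commands can be previewed without running them.</p>

```shell
# start_monitor_session [NAME]: create a detached session with two panes.
# TMUX_BIN is a hypothetical override; set it to "echo" to print the
# tmux commands instead of running them.
start_monitor_session() {
    name="${1:-monitor}"
    tmux_bin="${TMUX_BIN:-tmux}"
    # Create the session without attaching to it
    "$tmux_bin" new-session -d -s "$name"
    # Split the window into left and right panes
    "$tmux_bin" split-window -h -t "$name"
    # Type "dstat" followed by Enter into the active (new) pane
    "$tmux_bin" send-keys -t "$name" 'dstat' C-m
}

# Usage:
# start_monitor_session work && tmux attach -t work
```

<p>Naming the session with <code class="language-plaintext highlighter-rouge">-s</code> lets you pick it later with <code class="language-plaintext highlighter-rouge">tmux attach -t work</code> when several sessions are listed by <code class="language-plaintext highlighter-rouge">tmux ls</code>.</p>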

<h2 id="shortcuts">Shortcuts</h2>

<p><code class="language-plaintext highlighter-rouge">CTRL + b</code> : prefix key by default.</p>

<p><code class="language-plaintext highlighter-rouge">[prefix] + :</code> : command mode where you can type tmux commands</p>

<p><code class="language-plaintext highlighter-rouge">[prefix] + c</code> : creates a new window</p>

<p><code class="language-plaintext highlighter-rouge">[prefix] + 0-9</code> : switch to the window with that index</p>

<p><code class="language-plaintext highlighter-rouge">[prefix] + "</code> : split the active pane into top and bottom panes</p>

<p><code class="language-plaintext highlighter-rouge">[prefix] + %</code> : split the active pane into left and right panes</p>

<p><code class="language-plaintext highlighter-rouge">[prefix] + arrow key</code> : switch to another pane</p>

<p><code class="language-plaintext highlighter-rouge">[prefix] + CTRL + arrow key</code> : resize the active pane (keep CTRL pressed after the prefix)</p>

<p><code class="language-plaintext highlighter-rouge">[prefix] + z</code> : toggle zoom of the active pane</p>

<p><code class="language-plaintext highlighter-rouge">[prefix] + x</code> : kill the active pane</p>

<p><code class="language-plaintext highlighter-rouge">[prefix] + d</code> : detach from the session</p>

<h2 id="configuration">Configuration</h2>

<p>Like <code class="language-plaintext highlighter-rouge">~/.bashrc</code>, you can put your local configuration in <code class="language-plaintext highlighter-rouge">~/.tmux.conf</code>. The global configuration file is located at <code class="language-plaintext highlighter-rouge">/etc/tmux.conf</code>.</p>

<p>You can make your own configuration.</p>
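<p>As a starting point, here is a small sketch that writes a sample configuration to a separate file so that it does not overwrite an existing one. The function name and the default file name <code class="language-plaintext highlighter-rouge">~/.tmux.conf.sample</code> are assumptions for this example, and the settings are only a common choice, not a recommendation from tmux itself.</p>

```shell
# write_sample_tmux_conf [PATH]: write a small example configuration.
# The default path is a made-up sample file, not tmux's own config path.
write_sample_tmux_conf() {
    conf="${1:-$HOME/.tmux.conf.sample}"
    cat > "$conf" <<'EOF'
# Use Ctrl+a as the prefix instead of the default Ctrl+b
unbind C-b
set -g prefix C-a
bind C-a send-prefix

# Enable mouse support (tmux 2.1 or newer)
set -g mouse on

# Number windows starting from 1
set -g base-index 1
EOF
}

# Usage:
# write_sample_tmux_conf
# tmux source-file ~/.tmux.conf.sample   # load it into a running server
```

<p>Once you are happy with the settings, copy them into your own configuration file.</p>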

<p>FYI - <a href="https://github.com/gpakosz/.tmux">A good example</a></p>]]></content><author><name>Heejin Park</name></author><category term="Productivity" /><category term="Linux utilities" /><summary type="html"><![CDATA[You may want to see multiple terminals in a single screen. For instance, you want to edit the code while monitoring the system utilization or kernel log from serial communication. Tmux is a terminal multiplexer which allows to split your screen to multiple terminals. You can also make program keep running while not visible in the background.]]></summary></entry><entry><title type="html">DLXOS Setup</title><link href="https://bakhi.github.io/operating%20system/DLXOS-setup/" rel="alternate" type="text/html" title="DLXOS Setup" /><published>2021-01-25T00:00:00+00:00</published><updated>2021-01-25T00:00:00+00:00</updated><id>https://bakhi.github.io/operating%20system/DLXOS-setup</id><content type="html" xml:base="https://bakhi.github.io/operating%20system/DLXOS-setup/"><![CDATA[<h1 id="setup-dlxos-with-ubuntu-2004">Setup DLXOS with Ubuntu 20.04</h1>

<p>This post is for EE469 students who want to set up the development environment on a local machine running Ubuntu 20.04.</p>

<ul>
  <li>No VM is needed</li>
  <li>No need to work on the <code class="language-plaintext highlighter-rouge">ecegrid</code> or <code class="language-plaintext highlighter-rouge">shay</code> servers</li>
</ul>

<p>Step 1) Install dependencies</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">sudo </span>add-apt-repository https://packages.ubuntu.com/focal/libstdc++5
<span class="nv">$ </span><span class="nb">sudo </span>apt update
<span class="nv">$ </span><span class="nb">sudo </span>apt <span class="nb">install</span> <span class="nt">-y</span> libstdc++5:i386 libstdc++5:amd64
</code></pre></div></div>

<p>Step 2) Make your own directory</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cd</span> ~/
<span class="nv">$ </span><span class="nb">mkdir </span>EE469
<span class="nv">$ </span><span class="nb">cd </span>EE469
</code></pre></div></div>

<p>Step 3) Copy DLXOS from the ecegrid server</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>scp <span class="o">[</span>your-account]@ecegrid.ecn.purdue.edu:~ee469/labs/common/dlxos_new.tar.gz <span class="nb">.</span>
<span class="nv">$ </span><span class="nb">tar</span> <span class="nt">-xvzf</span> dlxos_new.tar.gz
</code></pre></div></div>

<p>Step 4) Setup path</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vi ~/.bashrc
<span class="c"># at the end of the file, append the following line (the directory name must match the one created in Step 2)</span>
<span class="nb">export </span><span class="nv">PATH</span><span class="o">=</span><span class="nv">$HOME</span>/EE469/dlxos_new/bin:<span class="nv">$PATH</span>
</code></pre></div></div>]]></content><author><name>Heejin Park</name></author><category term="Operating system" /><category term="Operating system" /><summary type="html"><![CDATA[Setup DLXOS with Ubuntu 20.04]]></summary></entry><entry><title type="html">Directly Access Your Physical Memory (dev/mem)</title><link href="https://bakhi.github.io/devmem/" rel="alternate" type="text/html" title="Directly Access Your Physical Memory (dev/mem)" /><published>2020-12-10T00:00:00+00:00</published><updated>2020-12-10T00:00:00+00:00</updated><id>https://bakhi.github.io/devmem</id><content type="html" xml:base="https://bakhi.github.io/devmem/"><![CDATA[<h1 id="1-what-is-devmem">1 What is /dev/mem?</h1>

<p>“/dev/mem” is a character device file that is an image of the system's main memory. It allows direct access to any phys address.</p>

<h1 id="2-how-to-use">2 How to use</h1>

<h2 id="21-requisites">2.1 Requisites</h2>

<p>To use /dev/mem, your kernel must be configured with “CONFIG_STRICT_DEVMEM=n”; otherwise it prevents access even from privileged users.</p>

<ul>
  <li>“CONFIG_STRICT_DEVMEM=y”, generally the default kernel configuration, disallows access to the RAM area via /dev/mem or only allows access to the first 1MB of RAM</li>
</ul>

<p>FYI: “CONFIG_IO_STRICT_DEVMEM=y” disallows access to registers via /dev/mem</p>

<h2 id="22-lets-code">2.2 Let’s code</h2>

<p>To start with, open the /dev/mem device file and map the phys address we are interested in. Note that the phys address passed to mmap should be page-aligned. Depending on the request, read or write a value at the mapped address. To keep the compiler from optimizing away or reordering the accesses, use <code class="language-plaintext highlighter-rouge">volatile</code> when you read or write.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">mem_fd</span> <span class="o">=</span> <span class="n">open</span><span class="p">(</span><span class="s">"/dev/mem"</span><span class="p">,</span> <span class="n">O_RDWR</span> <span class="o">|</span> <span class="n">O_SYNC</span><span class="p">);</span>
<span class="kt">void</span> <span class="o">*</span><span class="n">map_base</span> <span class="o">=</span> <span class="n">mmap</span><span class="p">(</span><span class="nb">NULL</span><span class="p">,</span>
			<span class="n">PAGE_SIZE</span><span class="p">,</span>
			<span class="n">PROT_READ</span> <span class="o">|</span> <span class="n">PROT_WRITE</span><span class="p">,</span>
			<span class="n">MAP_SHARED</span><span class="p">,</span>
			<span class="n">mem_fd</span><span class="p">,</span>
			<span class="n">phys_addr</span><span class="p">);</span>	<span class="c1">// phys_addr should be page-aligned.	</span>

<span class="kt">void</span> <span class="o">*</span><span class="n">virt_addr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="p">)</span><span class="n">map_base</span> <span class="o">+</span> <span class="n">offset_in_page</span><span class="p">;</span>

<span class="k">if</span> <span class="p">(</span><span class="n">is_read</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">read_result</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="k">volatile</span> <span class="kt">uint64_t</span><span class="o">*</span><span class="p">)</span><span class="n">virt_addr</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>	<span class="c1">// write</span>
    <span class="o">*</span><span class="p">(</span><span class="k">volatile</span> <span class="kt">uint64_t</span><span class="o">*</span><span class="p">)</span><span class="n">virt_addr</span> <span class="o">=</span> <span class="n">write_value</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<center><b>Simple example for using /dev/mem</b></center>

<p>Useful tool: <a href="https://busybox.net/">busybox</a>.</p>

<p><center><img src="/assets/images/posts/busybox.png" style="width: 70%; height: 70%" /></center>
<center><b>Example of the devmem utility from busybox: it directly accesses the MMIO region of a Mali GPU and reads its GPU ID.</b></center></p>
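<p>Since the address passed to mmap has to be page-aligned, an arbitrary phys address must first be split into a page base and an offset within the page. Below is a minimal sketch of that arithmetic; the function name <code class="language-plaintext highlighter-rouge">page_split</code> and the address in the usage comment are made up for illustration.</p>

```shell
# page_split ADDR [PAGE_SIZE]: print the page-aligned base and the
# offset within the page for a physical address (hex or decimal).
# Hypothetical helper; PAGE_SIZE defaults to the system page size.
page_split() {
    addr=$(( $1 ))
    page=${2:-$(getconf PAGESIZE)}
    base=$(( addr / page * page ))     # round down to a page boundary
    offset=$(( addr % page ))          # remainder inside that page
    printf '0x%x 0x%x\n' "$base" "$offset"
}

# Usage:
# page_split 0x10001234 4096
# The busybox utility takes the unaligned address directly, e.g. a
# 32-bit read (the address here is only an example):
# sudo busybox devmem 0x10001234 32
```

<p>The base goes to mmap as the file offset, and the offset is added to the returned mapping, which corresponds to <code class="language-plaintext highlighter-rouge">map_base + offset_in_page</code> in the C snippet.</p>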

<h1 id="3-problem-impact-of-cache">3 Problem: impact of cache</h1>

<p>If you directly access RAM this way, the memory might also be cached by the CPU, which can cause a cache coherence problem.</p>

<ul>
  <li>When you write, the value may still sit in the CPU cache and not yet have reached memory.</li>
  <li>We don’t know when it will be flushed.</li>
</ul>

<p><strong>Scenario</strong>: I implemented a kernel module that allocates pages with alloc_pages (the low-level page request mechanism). When a user-space application requests it, the module allocates a page and passes its phys address to user space. The application then starts to write into the allocated page through /dev/mem.</p>

<p><strong>Observation</strong>: After writing some values via /dev/mem, the phys memory appears to get corrupted once the write is done, as shown below.</p>

<p>
<center><img src="/assets/images/posts/devmem_0.png" /></center>
<center>Memory is being corrupted as time goes on</center>
</p>

<p><strong>Guess</strong>: It seems that the phys memory we try to write was already cached. We then write to it directly through /dev/mem, which bypasses the CPU cache, and later the stale cache line is flushed, which corrupts the data we wrote. On a multi-core device, other cores may contribute to this as well.</p>

<p><strong>Troubleshooting</strong>:</p>

<p>i) turn off all the cores except the main core (cpu0), which seems to work (no data corruption; no flush from other cores)</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">echo </span>0 | <span class="nb">sudo tee</span> /sys/devices/system/cpu/cpu1/online
</code></pre></div></div>
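<p>The command above takes only cpu1 offline. To cover every secondary core, the same write can be repeated in a loop. This is a sketch under the assumption that the standard sysfs layout is used; the function name and the <code class="language-plaintext highlighter-rouge">SUDO</code> override are hypothetical.</p>

```shell
# offline_secondary_cpus [ROOT]: write 0 to the "online" file of every
# core except cpu0. ROOT defaults to the standard sysfs CPU directory.
# SUDO is a hypothetical override; set it to an empty string to write
# without sudo (for example when already running as root).
offline_secondary_cpus() {
    root="${1:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[1-9]*/online; do
        [ -e "$f" ] || continue
        echo 0 | ${SUDO-sudo} tee "$f" >/dev/null
    done
}

# Usage:
# offline_secondary_cpus
```

<p>Writing 1 to the same files brings the cores back online.</p>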

<p>ii) map the allocated phys memory as DMA memory, which prevents cached access.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">page</span> <span class="o">*</span><span class="n">p</span> <span class="o">=</span> <span class="n">alloc_pages</span><span class="p">(</span><span class="n">gfp</span><span class="p">,</span> <span class="n">pool</span><span class="o">-&gt;</span><span class="n">order</span><span class="p">);</span>
<span class="n">dma_addr_t</span> <span class="n">dma_addr</span> <span class="o">=</span> <span class="n">dma_map_page</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">p</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="p">(</span><span class="n">PAGE_SIZE</span> <span class="o">&lt;&lt;</span> <span class="n">pool</span><span class="o">-&gt;</span><span class="n">order</span><span class="p">),</span> <span class="n">DMA_BIDIRECTIONAL</span><span class="p">);</span>
</code></pre></div></div>

<p class="notice--info"><strong>Note</strong>: Depending on the memory region that the phys address you try to access belongs to, the mapping is done with either “pgprot_noncached” or “pgprot_writecombine”. For instance, an MMIO region is mapped as <code class="language-plaintext highlighter-rouge">noncached</code> while a normal memory region is mapped as <code class="language-plaintext highlighter-rouge">writecombine</code>. It also depends on your kernel arch, so look into how it is implemented in your kernel source code, “<em>drivers/char/mem.c</em>”.</p>]]></content><author><name>Heejin Park</name></author><category term="dev/mem" /><category term="phys mem access" /><summary type="html"><![CDATA[1 What is /dev/mem?]]></summary></entry><entry><title type="html">Mali Bifrost - Cache Clean</title><link href="https://bakhi.github.io/mali-cache-clean/" rel="alternate" type="text/html" title="Mali Bifrost - Cache Clean" /><published>2020-11-03T00:00:00+00:00</published><updated>2020-11-03T00:00:00+00:00</updated><id>https://bakhi.github.io/mali-cache-clean</id><content type="html" xml:base="https://bakhi.github.io/mali-cache-clean/"><![CDATA[<h1 id="what-invokes-cache-clean">What Invokes Cache Clean?</h1>

<ul>
  <li>When the power state is changed (see kbase_pm_l2_update_state() / kbase_pm_shaders_update_state())</li>
  <li>When the job is done (see jd_done_worker()), which is unlikely to happen</li>
  <li>When the GPU context is switched (see kbase_js_pull() / kbase_js_unpull()), which is unlikely to happen</li>
</ul>

<p>mali_kbase_jm_rb.c</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">kbase_backend_complete_wq</span><span class="p">(</span><span class="k">struct</span> <span class="n">kbase_device</span> <span class="o">*</span><span class="n">kbdev</span><span class="p">,</span>
                        <span class="k">struct</span> <span class="n">kbase_jd_atom</span> <span class="o">*</span><span class="n">katom</span><span class="p">)</span>
<span class="p">{</span>
    <span class="cm">/*
     * If cache flush required due to HW workaround then perform the flush
     * now
     */</span>
    <span class="n">kbase_backend_cache_clean</span><span class="p">(</span><span class="n">kbdev</span><span class="p">,</span> <span class="n">katom</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>mali_kbase_device_hw.c</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="n">kbase_gpu_start_cache_clean_nolock</span><span class="p">(</span><span class="k">struct</span> <span class="n">kbase_device</span> <span class="o">*</span><span class="n">kbdev</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">u32</span> <span class="n">irq_mask</span><span class="p">;</span>

    <span class="n">lockdep_assert_held</span><span class="p">(</span><span class="o">&amp;</span><span class="n">kbdev</span><span class="o">-&gt;</span><span class="n">hwaccess_lock</span><span class="p">);</span>

    <span class="k">if</span> <span class="p">(</span><span class="n">kbdev</span><span class="o">-&gt;</span><span class="n">cache_clean_in_progress</span><span class="p">)</span> <span class="p">{</span>
        <span class="cm">/* If this is called while another clean is in progress, we
         * can't rely on the current one to flush any new changes in
         * the cache. Instead, trigger another cache clean immediately
         * after this one finishes.
         */</span>
        <span class="n">kbdev</span><span class="o">-&gt;</span><span class="n">cache_clean_queued</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
        <span class="k">return</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="cm">/* Enable interrupt */</span>
    <span class="cm">/** EE("GPU_IRQ_MASK - CLEAN_CACHES_COMPLETED"); */</span>
    <span class="n">irq_mask</span> <span class="o">=</span> <span class="n">kbase_reg_read</span><span class="p">(</span><span class="n">kbdev</span><span class="p">,</span> <span class="n">GPU_CONTROL_REG</span><span class="p">(</span><span class="n">GPU_IRQ_MASK</span><span class="p">));</span>
    <span class="n">kbase_reg_write</span><span class="p">(</span><span class="n">kbdev</span><span class="p">,</span> <span class="n">GPU_CONTROL_REG</span><span class="p">(</span><span class="n">GPU_IRQ_MASK</span><span class="p">),</span>
                <span class="n">irq_mask</span> <span class="o">|</span> <span class="n">CLEAN_CACHES_COMPLETED</span><span class="p">);</span>

    <span class="n">KBASE_TRACE_ADD</span><span class="p">(</span><span class="n">kbdev</span><span class="p">,</span> <span class="n">CORE_GPU_CLEAN_INV_CACHES</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">0u</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
    <span class="n">kbase_reg_write</span><span class="p">(</span><span class="n">kbdev</span><span class="p">,</span> <span class="n">GPU_CONTROL_REG</span><span class="p">(</span><span class="n">GPU_COMMAND</span><span class="p">),</span>
                    <span class="n">GPU_COMMAND_CLEAN_INV_CACHES</span><span class="p">);</span>

    <span class="n">kbdev</span><span class="o">-&gt;</span><span class="n">cache_clean_in_progress</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Besides, the device driver configures the job slot according to whether a cache clean and/or invalidate is required before and after the job is executed. The configuration is done right before putting the job chain into the slot. Although the device driver performs it, the configuration is in fact dictated by the user-space app/runtime through the core_req field of the atom structure.</p>

<h1 id="pm-policy">PM Policy</h1>

<p>mali_kbase_pm_policy.c</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="k">const</span> <span class="k">struct</span> <span class="n">kbase_pm_policy</span> <span class="o">*</span><span class="k">const</span> <span class="n">all_policy_list</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
<span class="cp">#ifdef CONFIG_MALI_NO_MALI</span>
    <span class="o">&amp;</span><span class="n">kbase_pm_always_on_policy_ops</span><span class="p">,</span>
    <span class="o">&amp;</span><span class="n">kbase_pm_coarse_demand_policy_ops</span><span class="p">,</span>
<span class="cp">#if !MALI_CUSTOMER_RELEASE</span>
    <span class="o">&amp;</span><span class="n">kbase_pm_always_on_demand_policy_ops</span><span class="p">,</span>
<span class="cp">#endif</span>
<span class="cp">#else               </span><span class="cm">/* CONFIG_MALI_NO_MALI */</span>
    <span class="o">&amp;</span><span class="n">kbase_pm_coarse_demand_policy_ops</span><span class="p">,</span>
<span class="cp">#if !MALI_CUSTOMER_RELEASE</span>
    <span class="o">&amp;</span><span class="n">kbase_pm_always_on_demand_policy_ops</span><span class="p">,</span>
<span class="cp">#endif</span>
    <span class="o">&amp;</span><span class="n">kbase_pm_always_on_policy_ops</span>
<span class="cp">#endif </span><span class="cm">/* CONFIG_MALI_NO_MALI */</span>
<span class="p">};</span>
</code></pre></div></div>

<p>The device driver manages the GPU power state by continuously reading the state from the GPU and updating it. For instance, if there are no in-flight jobs, the device driver tries to turn off the shader cores and then the L2/tiler for power saving. The “pm_always_on” policy guarantees no power-related register I/O during run time.</p>
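<p>To check which policy is active, the kbase driver is assumed here to expose a <code class="language-plaintext highlighter-rouge">power_policy</code> sysfs attribute that lists the available policies with the active one in square brackets. Both that format and the path in the usage comment are assumptions that vary by platform and driver version, so verify them on your device. The helper name <code class="language-plaintext highlighter-rouge">active_pm_policy</code> is hypothetical.</p>

```shell
# active_pm_policy TEXT: print the policy name enclosed in brackets.
# Assumes TEXT looks like "coarse_demand [always_on]" (an assumed
# format for the driver's power_policy attribute).
active_pm_policy() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep '^\[' | tr -d '[]'
}

# Usage (the sysfs path is an assumption and differs per platform):
# active_pm_policy "$(cat /sys/class/misc/mali0/device/power_policy)"
# echo always_on | sudo tee /sys/class/misc/mali0/device/power_policy
```

<p>Switching to the always-on policy this way is a convenient test setup when you want to rule out power state transitions as the trigger of a cache clean.</p>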

<h1 id="gpu-protected-mode">GPU Protected Mode</h1>
<ul>
  <li>L2 shall be powered down and GPU shall come out of fully coherent mode before entering protected mode.</li>
  <li>When entering into protected mode, we must ensure that the GPU is not operating in coherent mode as well. This is to ensure that no protected memory can be leaked.</li>
</ul>

<p>From the comments in the source code, I guess that protected mode prevents data leakage that could otherwise occur through cache coherence/flush, but I could not find a caller that enters it.</p>]]></content><author><name>Heejin Park</name></author><category term="GPU" /><category term="Mali" /><category term="Bifrost" /><summary type="html"><![CDATA[What Invokes Cache Clean?]]></summary></entry><entry><title type="html">Tegra SoC Host1X</title><link href="https://bakhi.github.io/mobile%20gpu/host1x/" rel="alternate" type="text/html" title="Tegra SoC Host1X" /><published>2020-10-25T00:00:00+00:00</published><updated>2020-10-25T00:00:00+00:00</updated><id>https://bakhi.github.io/mobile%20gpu/host1x</id><content type="html" xml:base="https://bakhi.github.io/mobile%20gpu/host1x/"><![CDATA[<p>https://lists.freedesktop.org/archives/dri-devel/2012-December/031410.html</p>

<h1 id="tegra-reverse-engineering">Tegra Reverse Engineering</h1>
<ul>
  <li>https://github.com/kusma/tegra-re</li>
  <li>https://github.com/grate-driver/grate</li>
</ul>

<h1 id="ocelot">Ocelot</h1>
<p>Opensource JIT compilation for GPU compute applications</p>]]></content><author><name>Heejin Park</name></author><category term="Mobile GPU" /><category term="GPU" /><category term="Nvidia" /><category term="Jetson Nano" /><summary type="html"><![CDATA[https://lists.freedesktop.org/archives/dri-devel/2012-December/031410.html]]></summary></entry><entry><title type="html">Explore Jetson Nano GPU Driver</title><link href="https://bakhi.github.io/jetson-nano/" rel="alternate" type="text/html" title="Explore Jetson Nano GPU Driver" /><published>2020-10-19T00:00:00+00:00</published><updated>2020-10-19T00:00:00+00:00</updated><id>https://bakhi.github.io/jetson-nano</id><content type="html" xml:base="https://bakhi.github.io/jetson-nano/"><![CDATA[<p>Analyze Nvidia Jetson Nano device driver code to understand how job is submitted and interacted with IRQ.</p>

<h1 id="1-job-submission">1. Job Submission</h1>

<p style="text-align: right;">nvgpu/include/nvgpu/channel.h</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">114</span> <span class="k">struct</span> <span class="n">priv_cmd_queue</span> <span class="p">{</span>
<span class="mi">115</span>     <span class="k">struct</span> <span class="n">nvgpu_mem</span> <span class="n">mem</span><span class="p">;</span>
<span class="mi">116</span>     <span class="n">u32</span> <span class="n">size</span><span class="p">;</span>   <span class="cm">/* num of entries in words */</span>
<span class="mi">117</span>     <span class="n">u32</span> <span class="n">put</span><span class="p">;</span>    <span class="cm">/* put for priv cmd queue */</span>
<span class="mi">118</span>     <span class="n">u32</span> <span class="n">get</span><span class="p">;</span>    <span class="cm">/* get for priv cmd queue */</span>
<span class="mi">119</span> <span class="p">};</span>
<span class="mi">120</span> 
<span class="mi">121</span> <span class="k">struct</span> <span class="n">priv_cmd_entry</span> <span class="p">{</span>
<span class="mi">122</span>     <span class="n">bool</span> <span class="n">valid</span><span class="p">;</span>
<span class="mi">123</span>     <span class="k">struct</span> <span class="n">nvgpu_mem</span> <span class="o">*</span><span class="n">mem</span><span class="p">;</span>
<span class="mi">124</span>     <span class="n">u32</span> <span class="n">off</span><span class="p">;</span>    <span class="cm">/* offset in mem, in u32 entries */</span>
<span class="mi">125</span>     <span class="n">u64</span> <span class="n">gva</span><span class="p">;</span>
<span class="mi">126</span>     <span class="n">u32</span> <span class="n">get</span><span class="p">;</span>    <span class="cm">/* start of entry in queue */</span>
<span class="mi">127</span>     <span class="n">u32</span> <span class="n">size</span><span class="p">;</span>   <span class="cm">/* in words */</span>
<span class="mi">128</span> <span class="p">};</span>
<span class="mi">129</span> 
<span class="mi">130</span> <span class="k">struct</span> <span class="n">channel_gk20a_job</span> <span class="p">{</span>
<span class="mi">131</span>     <span class="k">struct</span> <span class="n">nvgpu_mapped_buf</span> <span class="o">**</span><span class="n">mapped_buffers</span><span class="p">;</span>
<span class="mi">132</span>     <span class="kt">int</span> <span class="n">num_mapped_buffers</span><span class="p">;</span>
<span class="mi">133</span>     <span class="k">struct</span> <span class="n">gk20a_fence</span> <span class="o">*</span><span class="n">post_fence</span><span class="p">;</span>
<span class="mi">134</span>     <span class="k">struct</span> <span class="n">priv_cmd_entry</span> <span class="o">*</span><span class="n">wait_cmd</span><span class="p">;</span>
<span class="mi">135</span>     <span class="k">struct</span> <span class="n">priv_cmd_entry</span> <span class="o">*</span><span class="n">incr_cmd</span><span class="p">;</span>
<span class="mi">136</span>     <span class="k">struct</span> <span class="n">nvgpu_list_node</span> <span class="n">list</span><span class="p">;</span>
<span class="mi">137</span> <span class="p">};</span>
</code></pre></div></div>

<ul>
  <li>Job and private command structures used in kernel space.</li>
</ul>

<p style="text-align: right;">nvgpu/common/submit.c</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">317</span> <span class="k">static</span> <span class="kt">int</span> <span class="nf">nvgpu_submit_channel_gpfifo</span><span class="p">(</span><span class="k">struct</span> <span class="n">channel_gk20a</span> <span class="o">*</span><span class="n">c</span><span class="p">,</span>
<span class="mi">318</span>                 <span class="k">struct</span> <span class="n">nvgpu_gpfifo_entry</span> <span class="o">*</span><span class="n">gpfifo</span><span class="p">,</span>
<span class="mi">319</span>                 <span class="k">struct</span> <span class="n">nvgpu_gpfifo_userdata</span> <span class="n">userdata</span><span class="p">,</span>
<span class="mi">320</span>                 <span class="n">u32</span> <span class="n">num_entries</span><span class="p">,</span>
<span class="mi">321</span>                 <span class="n">u32</span> <span class="n">flags</span><span class="p">,</span>
<span class="mi">322</span>                 <span class="k">struct</span> <span class="n">nvgpu_channel_fence</span> <span class="o">*</span><span class="n">fence</span><span class="p">,</span>
<span class="mi">323</span>                 <span class="k">struct</span> <span class="n">gk20a_fence</span> <span class="o">**</span><span class="n">fence_out</span><span class="p">,</span>
<span class="mi">324</span>                 <span class="k">struct</span> <span class="n">fifo_profile_gk20a</span> <span class="o">*</span><span class="n">profile</span><span class="p">)</span>
<span class="mi">325</span> <span class="p">{</span>
	<span class="p">...</span>
<span class="mi">537</span>     <span class="k">if</span> <span class="p">(</span><span class="n">wait_cmd</span><span class="p">)</span> <span class="p">{</span>
<span class="mi">538</span>         <span class="n">nvgpu_submit_append_priv_cmdbuf</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">wait_cmd</span><span class="p">);</span>
<span class="mi">539</span>     <span class="p">}</span>
<span class="mi">540</span> 
<span class="mi">541</span>     <span class="n">err</span> <span class="o">=</span> <span class="n">nvgpu_submit_append_gpfifo</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">gpfifo</span><span class="p">,</span> <span class="n">userdata</span><span class="p">,</span>
<span class="mi">542</span>             <span class="n">num_entries</span><span class="p">);</span>
<span class="mi">543</span>     <span class="nf">if</span> <span class="p">(</span><span class="n">err</span><span class="p">)</span> <span class="p">{</span>
<span class="mi">544</span>         <span class="k">goto</span> <span class="n">clean_up_job</span><span class="p">;</span>
<span class="mi">545</span>     <span class="p">}</span>
<span class="mi">546</span> 
<span class="mi">547</span>     <span class="cm">/*
548      * And here's where we add the incr_cmd we generated earlier. It should
549      * always run!
550      */</span>
<span class="mi">551</span>     <span class="nf">if</span> <span class="p">(</span><span class="n">incr_cmd</span><span class="p">)</span> <span class="p">{</span>
<span class="mi">552</span>         <span class="n">nvgpu_submit_append_priv_cmdbuf</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">incr_cmd</span><span class="p">);</span>
<span class="mi">553</span>     <span class="p">}</span>
<span class="mi">554</span> 
<span class="mi">555</span>     <span class="nf">if</span> <span class="p">(</span><span class="n">fence_out</span><span class="p">)</span> <span class="p">{</span>
<span class="mi">556</span>         <span class="o">*</span><span class="n">fence_out</span> <span class="o">=</span> <span class="n">gk20a_fence_get</span><span class="p">(</span><span class="n">post_fence</span><span class="p">);</span>
<span class="mi">557</span>     <span class="p">}</span>
<span class="mi">558</span> 
<span class="mi">559</span>     <span class="nf">if</span> <span class="p">(</span><span class="n">need_job_tracking</span><span class="p">)</span> <span class="p">{</span>
<span class="mi">560</span>         <span class="cm">/* TODO! Check for errors... */</span>
<span class="mi">561</span>         <span class="n">gk20a_channel_add_job</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">job</span><span class="p">,</span> <span class="n">skip_buffer_refcounting</span><span class="p">);</span>
<span class="mi">562</span>     <span class="p">}</span>
<span class="mi">563</span>     <span class="nf">gk20a_fifo_profile_snapshot</span><span class="p">(</span><span class="n">profile</span><span class="p">,</span> <span class="n">PROFILE_APPEND</span><span class="p">);</span>
<span class="mi">565</span>     <span class="n">g</span><span class="o">-&gt;</span><span class="n">ops</span><span class="p">.</span><span class="n">fifo</span><span class="p">.</span><span class="n">userd_gp_put</span><span class="p">(</span><span class="n">g</span><span class="p">,</span> <span class="n">c</span><span class="p">);</span>
	<span class="p">...</span>
<span class="mi">599</span> <span class="p">}</span>
</code></pre></div></div>

<p><strong>Add commands to the ring buffer</strong></p>

<p>First, the driver appends gpfifo entries to the shared memory (ring buffer): the entries are copied from user space into the ring buffer. Note that wait_cmd and/or incr_cmd are appended before and after the actual commands, respectively. Whenever entries are appended, the driver advances the put pointer by the number of entries added, including wait_cmd and incr_cmd when they are present.</p>
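
<p>The ordering described above can be sketched in plain C. This is a minimal user-space illustration, not the driver's code: struct ring, ring_push, ring_submit and RING_ENTRIES are hypothetical names, a value of 0 stands for "no command", and the ring size is assumed to be a power of two so that the put index can wrap with a mask. The real driver checks for free space before submitting; here an assert stands in for that check.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;assert.h&gt;
#include &lt;stdint.h&gt;

#define RING_ENTRIES 8u /* assumed power of two, so masking works */

struct ring {
    uint64_t slot[RING_ENTRIES]; /* stand-in for gpfifo entries */
    uint32_t put;                /* index of the next free slot */
};

/* Append one entry and advance put, wrapping with a mask. */
static void ring_push(struct ring *r, uint64_t entry)
{
    r-&gt;slot[r-&gt;put] = entry;
    r-&gt;put = (r-&gt;put + 1u) &amp; (RING_ENTRIES - 1u);
}

/*
 * Append an optional wait entry, the user entries, and an optional
 * increment entry, in that order. Returns the number of slots consumed.
 */
static uint32_t ring_submit(struct ring *r, uint64_t wait_cmd,
                            const uint64_t *user, uint32_t num_entries,
                            uint64_t incr_cmd)
{
    uint32_t used = 0;
    uint32_t i;

    /* Simplified space check: everything must fit in the ring. */
    assert(num_entries &lt;= RING_ENTRIES - 2u);

    if (wait_cmd != 0) {
        ring_push(r, wait_cmd);
        used++;
    }
    for (i = 0; i &lt; num_entries; i++) {
        ring_push(r, user[i]);
        used++;
    }
    if (incr_cmd != 0) {
        ring_push(r, incr_cmd);
        used++;
    }
    return used;
}
</code></pre></div></div>

<p>With put at 6 in an 8-entry ring, submitting a wait command, two user entries, and an increment command consumes four slots and leaves put at 2.</p>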

<ul>
  <li>Copy gpfifo entries from user space into gpfifo.mem in kernel space (through its cpu_va mapping).</li>
</ul>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">203</span> <span class="k">static</span> <span class="kt">int</span> <span class="n">nvgpu_submit_append_gpfifo_user_direct</span><span class="p">(</span><span class="k">struct</span> <span class="n">channel_gk20a</span> <span class="o">*</span><span class="n">c</span><span class="p">,</span>
<span class="mi">204</span>         <span class="k">struct</span> <span class="n">nvgpu_gpfifo_userdata</span> <span class="n">userdata</span><span class="p">,</span>
<span class="mi">205</span>         <span class="n">u32</span> <span class="n">num_entries</span><span class="p">)</span>
<span class="mi">206</span> <span class="p">{</span>
<span class="mi">207</span>     <span class="k">struct</span> <span class="n">gk20a</span> <span class="o">*</span><span class="n">g</span> <span class="o">=</span> <span class="n">c</span><span class="o">-&gt;</span><span class="n">g</span><span class="p">;</span>
<span class="mi">208</span>     <span class="k">struct</span> <span class="n">nvgpu_gpfifo_entry</span> <span class="o">*</span><span class="n">gpfifo_cpu</span> <span class="o">=</span> <span class="n">c</span><span class="o">-&gt;</span><span class="n">gpfifo</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">cpu_va</span><span class="p">;</span>
<span class="mi">209</span>     <span class="n">u32</span> <span class="n">gpfifo_size</span> <span class="o">=</span> <span class="n">c</span><span class="o">-&gt;</span><span class="n">gpfifo</span><span class="p">.</span><span class="n">entry_num</span><span class="p">;</span>
<span class="mi">210</span>     <span class="n">u32</span> <span class="n">len</span> <span class="o">=</span> <span class="n">num_entries</span><span class="p">;</span>
<span class="mi">211</span>     <span class="n">u32</span> <span class="n">start</span> <span class="o">=</span> <span class="n">c</span><span class="o">-&gt;</span><span class="n">gpfifo</span><span class="p">.</span><span class="n">put</span><span class="p">;</span>
<span class="mi">212</span>     <span class="n">u32</span> <span class="n">end</span> <span class="o">=</span> <span class="n">start</span> <span class="o">+</span> <span class="n">len</span><span class="p">;</span> <span class="cm">/* exclusive */</span>
<span class="mi">213</span>     <span class="kt">int</span> <span class="n">err</span><span class="p">;</span>
<span class="mi">214</span> 
<span class="mi">215</span>     <span class="k">if</span> <span class="p">(</span><span class="n">end</span> <span class="o">&gt;</span> <span class="n">gpfifo_size</span><span class="p">)</span> <span class="p">{</span>
<span class="mi">216</span>         <span class="cm">/* wrap-around */</span>
<span class="mi">217</span>         <span class="kt">int</span> <span class="n">length0</span> <span class="o">=</span> <span class="n">gpfifo_size</span> <span class="o">-</span> <span class="n">start</span><span class="p">;</span>
<span class="mi">218</span>         <span class="kt">int</span> <span class="n">length1</span> <span class="o">=</span> <span class="n">len</span> <span class="o">-</span> <span class="n">length0</span><span class="p">;</span>
<span class="mi">219</span> 
<span class="mi">220</span>         <span class="n">err</span> <span class="o">=</span> <span class="n">g</span><span class="o">-&gt;</span><span class="n">os_channel</span><span class="p">.</span><span class="n">copy_user_gpfifo</span><span class="p">(</span>
<span class="mi">221</span>                 <span class="n">gpfifo_cpu</span> <span class="o">+</span> <span class="n">start</span><span class="p">,</span> <span class="n">userdata</span><span class="p">,</span>
<span class="mi">222</span>                 <span class="mi">0</span><span class="p">,</span> <span class="n">length0</span><span class="p">);</span>
<span class="mi">223</span>         <span class="k">if</span> <span class="p">(</span><span class="n">err</span><span class="p">)</span> <span class="p">{</span>
<span class="mi">224</span>             <span class="k">return</span> <span class="n">err</span><span class="p">;</span>
<span class="mi">225</span>         <span class="p">}</span>
<span class="mi">226</span> 
<span class="mi">227</span>         <span class="n">err</span> <span class="o">=</span> <span class="n">g</span><span class="o">-&gt;</span><span class="n">os_channel</span><span class="p">.</span><span class="n">copy_user_gpfifo</span><span class="p">(</span>
<span class="mi">228</span>                 <span class="n">gpfifo_cpu</span><span class="p">,</span> <span class="n">userdata</span><span class="p">,</span>
<span class="mi">229</span>                 <span class="n">length0</span><span class="p">,</span> <span class="n">length1</span><span class="p">);</span>
<span class="mi">230</span>         <span class="k">if</span> <span class="p">(</span><span class="n">err</span><span class="p">)</span> <span class="p">{</span>
<span class="mi">231</span>             <span class="k">return</span> <span class="n">err</span><span class="p">;</span>
<span class="mi">232</span>         <span class="p">}</span>
<span class="mi">233</span>     <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="mi">234</span>         <span class="n">err</span> <span class="o">=</span> <span class="n">g</span><span class="o">-&gt;</span><span class="n">os_channel</span><span class="p">.</span><span class="n">copy_user_gpfifo</span><span class="p">(</span>
<span class="mi">235</span>                 <span class="n">gpfifo_cpu</span> <span class="o">+</span> <span class="n">start</span><span class="p">,</span> <span class="n">userdata</span><span class="p">,</span>
<span class="mi">236</span>                 <span class="mi">0</span><span class="p">,</span> <span class="n">len</span><span class="p">);</span>
<span class="mi">237</span>         <span class="k">if</span> <span class="p">(</span><span class="n">err</span><span class="p">)</span> <span class="p">{</span>
<span class="mi">238</span>             <span class="k">return</span> <span class="n">err</span><span class="p">;</span>
<span class="mi">239</span>         <span class="p">}</span>
<span class="mi">240</span>     <span class="p">}</span>
<span class="mi">241</span> 
<span class="mi">242</span>     <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="mi">243</span> <span class="p">}</span>
</code></pre></div></div>

<ul>
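  <li>gpfifo_size is the channel’s total gpfifo size, in entries. If start + len exceeds it, the copy wraps around: the first length0 entries fill the end of the ring and the remaining length1 entries go to its start.</li>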
</ul>
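
<p>The wrap-around split can be reproduced outside the kernel with memcpy. The sketch below is only a stand-in under stated assumptions: ring_copy_in and its parameters are hypothetical, entries are modelled as uint64_t, and memcpy replaces the driver's copy_user_gpfifo callback.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;assert.h&gt;
#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

/*
 * Copy len entries from src into a ring of ring_size entries, starting at
 * index start. If the copy would run past the end of the ring, it is split
 * in two: the tail of the ring first, then the remainder from index 0.
 */
static void ring_copy_in(uint64_t *ring, uint32_t ring_size, uint32_t start,
                         const uint64_t *src, uint32_t len)
{
    uint32_t end = start + len; /* exclusive */

    assert(start &lt; ring_size &amp;&amp; len &lt;= ring_size);

    if (end &gt; ring_size) {
        uint32_t length0 = ring_size - start; /* fits before the end */
        uint32_t length1 = len - length0;     /* wraps to the front */

        memcpy(ring + start, src, length0 * sizeof(*ring));
        memcpy(ring, src + length0, length1 * sizeof(*ring));
    } else {
        memcpy(ring + start, src, len * sizeof(*ring));
    }
}
</code></pre></div></div>

<p>For example, copying five entries into an 8-entry ring starting at index 6 places two entries at indices 6 and 7 and the remaining three at indices 0 to 2.</p>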

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">270</span> <span class="cm">/*
271  * Copy source gpfifo entries into the gpfifo ring buffer, potentially
272  * splitting into two memcpys to handle wrap-around.
273  */</span>
<span class="mi">274</span> <span class="k">static</span> <span class="kt">int</span> <span class="n">nvgpu_submit_append_gpfifo</span><span class="p">(</span><span class="k">struct</span> <span class="n">channel_gk20a</span> <span class="o">*</span><span class="n">c</span><span class="p">,</span>
<span class="mi">275</span>         <span class="k">struct</span> <span class="n">nvgpu_gpfifo_entry</span> <span class="o">*</span><span class="n">kern_gpfifo</span><span class="p">,</span>
<span class="mi">276</span>         <span class="k">struct</span> <span class="n">nvgpu_gpfifo_userdata</span> <span class="n">userdata</span><span class="p">,</span>
<span class="mi">277</span>         <span class="n">u32</span> <span class="n">num_entries</span><span class="p">)</span>
<span class="mi">278</span> <span class="p">{</span>
<span class="mi">279</span>     <span class="k">struct</span> <span class="n">gk20a</span> <span class="o">*</span><span class="n">g</span> <span class="o">=</span> <span class="n">c</span><span class="o">-&gt;</span><span class="n">g</span><span class="p">;</span>
<span class="mi">280</span>     <span class="kt">int</span> <span class="n">err</span><span class="p">;</span>
<span class="mi">281</span> 
<span class="mi">282</span>     <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">kern_gpfifo</span> <span class="o">&amp;&amp;</span> <span class="o">!</span><span class="n">c</span><span class="o">-&gt;</span><span class="n">gpfifo</span><span class="p">.</span><span class="n">pipe</span><span class="p">)</span> <span class="p">{</span>
<span class="mi">283</span>         <span class="cm">/*
284          * This path (from userspace to sysmem) is special in order to
285          * avoid two copies unnecessarily (from user to pipe, then from
286          * pipe to gpu sysmem buffer).
287          */</span>
<span class="mi">288</span>         <span class="n">err</span> <span class="o">=</span> <span class="n">nvgpu_submit_append_gpfifo_user_direct</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">userdata</span><span class="p">,</span>
<span class="mi">289</span>                 <span class="n">num_entries</span><span class="p">);</span>
<span class="mi">290</span>         <span class="k">if</span> <span class="p">(</span><span class="n">err</span><span class="p">)</span> <span class="p">{</span>
<span class="mi">291</span>             <span class="k">return</span> <span class="n">err</span><span class="p">;</span>
<span class="mi">292</span>         <span class="p">}</span>
<span class="mi">293</span>     <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">kern_gpfifo</span><span class="p">)</span> <span class="p">{</span>
<span class="mi">294</span>         <span class="cm">/* from userspace to vidmem, use the common path */</span>
<span class="mi">295</span>         <span class="n">err</span> <span class="o">=</span> <span class="n">g</span><span class="o">-&gt;</span><span class="n">os_channel</span><span class="p">.</span><span class="n">copy_user_gpfifo</span><span class="p">(</span><span class="n">c</span><span class="o">-&gt;</span><span class="n">gpfifo</span><span class="p">.</span><span class="n">pipe</span><span class="p">,</span> <span class="n">userdata</span><span class="p">,</span>
<span class="mi">296</span>                 <span class="mi">0</span><span class="p">,</span> <span class="n">num_entries</span><span class="p">);</span>
<span class="mi">297</span>         <span class="k">if</span> <span class="p">(</span><span class="n">err</span><span class="p">)</span> <span class="p">{</span>
<span class="mi">298</span>             <span class="k">return</span> <span class="n">err</span><span class="p">;</span>
<span class="mi">299</span>         <span class="p">}</span>
<span class="mi">300</span> 
<span class="mi">301</span>         <span class="n">nvgpu_submit_append_gpfifo_common</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">c</span><span class="o">-&gt;</span><span class="n">gpfifo</span><span class="p">.</span><span class="n">pipe</span><span class="p">,</span>
<span class="mi">302</span>                 <span class="n">num_entries</span><span class="p">);</span>
<span class="mi">303</span>     <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="mi">304</span>         <span class="cm">/* from kernel to either sysmem or vidmem, don't need
305          * copy_user_gpfifo so use the common path */</span>
<span class="mi">306</span>         <span class="n">nvgpu_submit_append_gpfifo_common</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">kern_gpfifo</span><span class="p">,</span> <span class="n">num_entries</span><span class="p">);</span>
<span class="mi">307</span>     <span class="p">}</span>
<span class="mi">308</span> 
<span class="mi">309</span>     <span class="n">trace_write_pushbuffers</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">num_entries</span><span class="p">);</span>
<span class="mi">310</span> 
<span class="mi">311</span>     <span class="n">c</span><span class="o">-&gt;</span><span class="n">gpfifo</span><span class="p">.</span><span class="n">put</span> <span class="o">=</span> <span class="p">(</span><span class="n">c</span><span class="o">-&gt;</span><span class="n">gpfifo</span><span class="p">.</span><span class="n">put</span> <span class="o">+</span> <span class="n">num_entries</span><span class="p">)</span> <span class="o">&amp;</span>
<span class="mi">312</span>         <span class="p">(</span><span class="n">c</span><span class="o">-&gt;</span><span class="n">gpfifo</span><span class="p">.</span><span class="n">entry_num</span> <span class="o">-</span> <span class="mi">1U</span><span class="p">);</span>
<span class="mi">313</span> 
<span class="mi">314</span>     <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="mi">315</span> <span class="p">}</span>
</code></pre></div></div>

<p style="text-align: right;">nvgpu/os/linux-channel.c</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">376</span> <span class="k">static</span> <span class="kt">int</span> <span class="n">nvgpu_channel_copy_user_gpfifo</span><span class="p">(</span><span class="k">struct</span> <span class="n">nvgpu_gpfifo_entry</span> <span class="o">*</span><span class="n">dest</span><span class="p">,</span>
<span class="mi">377</span>         <span class="k">struct</span> <span class="n">nvgpu_gpfifo_userdata</span> <span class="n">userdata</span><span class="p">,</span> <span class="n">u32</span> <span class="n">start</span><span class="p">,</span> <span class="n">u32</span> <span class="n">length</span><span class="p">)</span>
<span class="mi">378</span> <span class="p">{</span>
<span class="mi">379</span>     <span class="k">struct</span> <span class="n">nvgpu_gpfifo_entry</span> <span class="n">__user</span> <span class="o">*</span><span class="n">user_gpfifo</span> <span class="o">=</span> <span class="n">userdata</span><span class="p">.</span><span class="n">entries</span><span class="p">;</span>
<span class="mi">380</span>     <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">n</span><span class="p">;</span>
<span class="mi">381</span> 
<span class="mi">382</span>     <span class="n">n</span> <span class="o">=</span> <span class="n">copy_from_user</span><span class="p">(</span><span class="n">dest</span><span class="p">,</span> <span class="n">user_gpfifo</span> <span class="o">+</span> <span class="n">start</span><span class="p">,</span>
<span class="mi">383</span>             <span class="n">length</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">nvgpu_gpfifo_entry</span><span class="p">));</span>
<span class="mi">384</span> 
<span class="mi">385</span>     <span class="k">return</span> <span class="n">n</span> <span class="o">==</span> <span class="mi">0</span> <span class="o">?</span> <span class="mi">0</span> <span class="o">:</span> <span class="o">-</span><span class="n">EFAULT</span><span class="p">;</span>
<span class="mi">386</span> <span class="p">}</span>

</code></pre></div></div>

<p style="text-align: right;">nvgpu/gpu/nvgpu/gk20a/fifo_gk20a.c</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">4416</span> <span class="kt">void</span> <span class="nf">gk20a_fifo_userd_gp_put</span><span class="p">(</span><span class="k">struct</span> <span class="n">gk20a</span> <span class="o">*</span><span class="n">g</span><span class="p">,</span> <span class="k">struct</span> <span class="n">channel_gk20a</span> <span class="o">*</span><span class="n">c</span><span class="p">)</span>
<span class="mi">4417</span> <span class="p">{</span>
<span class="mi">4418</span>     <span class="n">gk20a_bar1_writel</span><span class="p">(</span><span class="n">g</span><span class="p">,</span>
<span class="mi">4419</span>         <span class="n">c</span><span class="o">-&gt;</span><span class="n">userd_gpu_va</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">u32</span><span class="p">)</span> <span class="o">*</span> <span class="n">ram_userd_gp_put_w</span><span class="p">(),</span>
<span class="mi">4420</span>         <span class="n">c</span><span class="o">-&gt;</span><span class="n">gpfifo</span><span class="p">.</span><span class="n">put</span><span class="p">);</span>
<span class="mi">4421</span> <span class="p">}</span>
</code></pre></div></div>

<ul>
  <li>Finally, g-&gt;ops.fifo.userd_gp_put(g, c) writes the updated put pointer into the channel’s USERD area through BAR1, so that the GPU sees the newly appended entries.</li>
</ul>
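
<p>The address arithmetic in gk20a_fifo_userd_gp_put can be illustrated with an ordinary array standing in for the USERD memory. This is a rough sketch under assumptions: USERD_WORDS and USERD_GP_PUT_W are placeholder values rather than values taken from the hardware headers, userd_field_offset and userd_write_gp_put are hypothetical helpers, and a plain store replaces the BAR1 register write.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;assert.h&gt;
#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

#define USERD_WORDS    128u /* placeholder: size of the block in 32-bit words */
#define USERD_GP_PUT_W  35u /* placeholder: word index of the put field */

/* Byte offset of a field, given its 32-bit word index. */
static size_t userd_field_offset(uint32_t word_index)
{
    assert(word_index &lt; USERD_WORDS);
    return sizeof(uint32_t) * word_index;
}

/*
 * Store the driver's put index into the control block. A plain array stands
 * in for the memory the GPU reads.
 */
static void userd_write_gp_put(volatile uint32_t *userd, uint32_t put)
{
    userd[userd_field_offset(USERD_GP_PUT_W) / sizeof(uint32_t)] = put;
}
</code></pre></div></div>

<p>The driver computes the same kind of byte offset (base address plus sizeof(u32) times the word index) before handing it to gk20a_bar1_writel.</p>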

<h1 id="2-interrupt">2. Interrupt</h1>

<p style="text-align: right;">nvidia/drivers/video/tegra/host/host1x_instr.c</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">373</span> <span class="k">static</span> <span class="kt">int</span> <span class="nf">t20_intr_init</span><span class="p">(</span><span class="k">struct</span> <span class="n">nvhost_intr</span> <span class="o">*</span><span class="n">intr</span><span class="p">)</span>
<span class="mi">374</span> <span class="p">{</span>
<span class="mi">375</span>     <span class="k">struct</span> <span class="n">nvhost_master</span> <span class="o">*</span><span class="n">dev</span> <span class="o">=</span> <span class="n">intr_to_dev</span><span class="p">(</span><span class="n">intr</span><span class="p">);</span>
<span class="mi">376</span>     <span class="kt">int</span> <span class="n">err</span><span class="p">;</span>
<span class="mi">377</span> 
<span class="mi">378</span>     <span class="nf">intr_op</span><span class="p">().</span><span class="n">disable_all_syncpt_intrs</span><span class="p">(</span><span class="n">intr</span><span class="p">);</span>
<span class="mi">379</span> 
<span class="mi">380</span>     <span class="n">err</span> <span class="o">=</span> <span class="n">request_threaded_irq</span><span class="p">(</span><span class="n">intr</span><span class="o">-&gt;</span><span class="n">syncpt_irq</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span>
<span class="mi">381</span>                 <span class="n">syncpt_thresh_cascade_isr</span><span class="p">,</span>
<span class="mi">382</span>                 <span class="n">IRQF_ONESHOT</span><span class="p">,</span> <span class="s">"host_syncpt"</span><span class="p">,</span> <span class="n">dev</span><span class="p">);</span>
<span class="mi">383</span>     <span class="k">if</span> <span class="p">(</span><span class="n">err</span><span class="p">)</span>
<span class="mi">384</span>         <span class="k">return</span> <span class="n">err</span><span class="p">;</span>
<span class="mi">385</span> 
<span class="mi">386</span>     <span class="cm">/* master disable for general (not syncpt) host interrupts */</span>
<span class="mi">387</span>     <span class="nf">host1x_sync_writel</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">host1x_sync_intmask_r</span><span class="p">(),</span> <span class="mi">0</span><span class="p">);</span>
<span class="mi">388</span> 
<span class="mi">389</span>     <span class="cm">/* clear status &amp; extstatus */</span>
<span class="mi">390</span>     <span class="nf">host1x_sync_writel</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">host1x_sync_hintstatus_ext_r</span><span class="p">(),</span>
<span class="mi">391</span>             <span class="mh">0xfffffffful</span><span class="p">);</span>
<span class="mi">392</span>     <span class="nf">host1x_sync_writel</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">host1x_sync_hintstatus_r</span><span class="p">(),</span>
<span class="mi">393</span>             <span class="mh">0xfffffffful</span><span class="p">);</span>
<span class="mi">394</span> 
<span class="mi">395</span>     <span class="n">err</span> <span class="o">=</span> <span class="n">request_threaded_irq</span><span class="p">(</span><span class="n">intr</span><span class="o">-&gt;</span><span class="n">general_irq</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span>
<span class="mi">396</span>                 <span class="n">t20_intr_host1x_isr</span><span class="p">,</span>
<span class="mi">397</span>                 <span class="n">IRQF_ONESHOT</span><span class="p">,</span> <span class="s">"host_status"</span><span class="p">,</span> <span class="n">intr</span><span class="p">);</span>
<span class="mi">398</span>     <span class="nf">if</span> <span class="p">(</span><span class="n">err</span><span class="p">)</span> <span class="p">{</span>
<span class="mi">399</span>         <span class="n">free_irq</span><span class="p">(</span><span class="n">intr</span><span class="o">-&gt;</span><span class="n">syncpt_irq</span><span class="p">,</span> <span class="n">dev</span><span class="p">);</span>
<span class="mi">400</span>         <span class="k">return</span> <span class="n">err</span><span class="p">;</span>
<span class="mi">401</span>     <span class="p">}</span>
<span class="mi">402</span> 
<span class="mi">403</span>     <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="mi">404</span> <span class="p">}</span>
</code></pre></div></div>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="n">irqreturn_t</span> <span class="n">syncpt_thresh_cascade_isr</span><span class="p">(</span><span class="kt">int</span> <span class="n">irq</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">dev_id</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">nvhost_master</span> <span class="o">*</span><span class="n">dev</span> <span class="o">=</span> <span class="n">dev_id</span><span class="p">;</span>
    <span class="k">struct</span> <span class="n">nvhost_intr</span> <span class="o">*</span><span class="n">intr</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">dev</span><span class="o">-&gt;</span><span class="n">intr</span><span class="p">;</span>
    <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">reg</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">i</span><span class="p">,</span> <span class="n">id</span><span class="p">;</span>
    <span class="k">struct</span> <span class="n">nvhost_timespec</span> <span class="n">isr_recv</span><span class="p">;</span>

    <span class="n">nvhost_ktime_get_ts</span><span class="p">(</span><span class="o">&amp;</span><span class="n">isr_recv</span><span class="p">);</span>

    <span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">DIV_ROUND_UP</span><span class="p">(</span><span class="n">nvhost_syncpt_nb_hw_pts</span><span class="p">(</span><span class="o">&amp;</span><span class="n">dev</span><span class="o">-&gt;</span><span class="n">syncpt</span><span class="p">),</span> <span class="mi">32</span><span class="p">);</span>
            <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">reg</span> <span class="o">=</span> <span class="n">host1x_sync_readl</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span>
                <span class="n">host1x_sync_syncpt_thresh_cpu0_int_status_r</span><span class="p">()</span> <span class="o">+</span>
                <span class="n">i</span> <span class="o">*</span> <span class="n">REGISTER_STRIDE</span><span class="p">);</span>

        <span class="n">for_each_set_bit</span><span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">reg</span><span class="p">,</span> <span class="mi">32</span><span class="p">)</span> <span class="p">{</span>
            <span class="k">struct</span> <span class="n">nvhost_intr_syncpt</span> <span class="o">*</span><span class="n">sp</span><span class="p">;</span>
            <span class="kt">int</span> <span class="n">sp_id</span> <span class="o">=</span> <span class="n">i</span> <span class="o">*</span> <span class="mi">32</span> <span class="o">+</span> <span class="n">id</span><span class="p">;</span>
            <span class="kt">int</span> <span class="n">graphics_host_sp</span> <span class="o">=</span>
                <span class="n">nvhost_syncpt_graphics_host_sp</span><span class="p">(</span><span class="o">&amp;</span><span class="n">dev</span><span class="o">-&gt;</span><span class="n">syncpt</span><span class="p">);</span>

            <span class="k">if</span> <span class="p">(</span><span class="n">unlikely</span><span class="p">(</span><span class="o">!</span><span class="n">nvhost_syncpt_is_valid_hw_pt</span><span class="p">(</span><span class="o">&amp;</span><span class="n">dev</span><span class="o">-&gt;</span><span class="n">syncpt</span><span class="p">,</span>
                    <span class="n">sp_id</span><span class="p">)))</span> <span class="p">{</span>
                <span class="n">dev_err</span><span class="p">(</span><span class="o">&amp;</span><span class="n">dev</span><span class="o">-&gt;</span><span class="n">dev</span><span class="o">-&gt;</span><span class="n">dev</span><span class="p">,</span> <span class="s">"%s(): syncpoint id %d is beyond the number of syncpoints (%d)</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
                    <span class="n">__func__</span><span class="p">,</span> <span class="n">sp_id</span><span class="p">,</span>
                    <span class="n">nvhost_syncpt_nb_hw_pts</span><span class="p">(</span><span class="o">&amp;</span><span class="n">dev</span><span class="o">-&gt;</span><span class="n">syncpt</span><span class="p">));</span>
                <span class="k">goto</span> <span class="n">out</span><span class="p">;</span>
            <span class="p">}</span>

            <span class="n">sp</span> <span class="o">=</span> <span class="n">intr</span><span class="o">-&gt;</span><span class="n">syncpt</span> <span class="o">+</span> <span class="n">sp_id</span><span class="p">;</span>
            <span class="n">sp</span><span class="o">-&gt;</span><span class="n">isr_recv</span> <span class="o">=</span> <span class="n">isr_recv</span><span class="p">;</span>

            <span class="cm">/* handle graphics host syncpoint increments
             * immediately
             */</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">sp_id</span> <span class="o">==</span> <span class="n">graphics_host_sp</span><span class="p">)</span> <span class="p">{</span>
                <span class="n">dev_warn</span><span class="p">(</span><span class="o">&amp;</span><span class="n">dev</span><span class="o">-&gt;</span><span class="n">dev</span><span class="o">-&gt;</span><span class="n">dev</span><span class="p">,</span> <span class="s">"%s(): syncpoint id %d incremented</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
                     <span class="n">__func__</span><span class="p">,</span> <span class="n">graphics_host_sp</span><span class="p">);</span>
                <span class="n">nvhost_syncpt_patch_check</span><span class="p">(</span><span class="o">&amp;</span><span class="n">dev</span><span class="o">-&gt;</span><span class="n">syncpt</span><span class="p">);</span>
                <span class="n">t20_intr_syncpt_intr_ack</span><span class="p">(</span><span class="n">sp</span><span class="p">,</span> <span class="nb">false</span><span class="p">);</span>
            <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
                <span class="n">t20_intr_syncpt_intr_ack</span><span class="p">(</span><span class="n">sp</span><span class="p">,</span> <span class="nb">true</span><span class="p">);</span>
                <span class="n">nvhost_syncpt_thresh_fn</span><span class="p">(</span><span class="n">sp</span><span class="p">);</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>

<span class="n">out</span><span class="o">:</span>
    <span class="k">return</span> <span class="n">IRQ_HANDLED</span><span class="p">;</span>
<span class="p">}</span>

</code></pre></div></div>

<p><strong>Interrupt registration.</strong></p>

<ul>
  <li>The interrupt is registered here. For every syncpoint other than the graphics host syncpoint, the ISR acknowledges the interrupt and invokes <strong>nvhost_syncpt_thresh_fn()</strong>, which handles the syncpt.</li>
  <li>The code above is the ISR; the handling is cascaded.</li>
</ul>
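
<p>To make the scan in the ISR above more concrete, here is a minimal user-space sketch of the same idea: one 32-bit status word per group of 32 syncpoints, with every set bit mapped back to a syncpoint id as <code>i * 32 + bit</code>. The function name <code>collect_pending_syncpts</code>, the plain array standing in for the status registers, and the simple "id below nb_pts" validity check are assumptions made for illustration; they are not part of nvhost.</p>

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Collect the ids of all pending syncpoints from `nwords` 32-bit status
 * words. `status` is a plain array standing in for the threshold status
 * registers; the real ISR reads them with host1x_sync_readl().
 * Assumption: an id is valid when it is below `nb_pts`.
 * Returns the number of ids written to `out` (at most `out_cap`).
 */
static size_t collect_pending_syncpts(const uint32_t *status, size_t nwords,
                                      unsigned int nb_pts,
                                      unsigned int *out, size_t out_cap)
{
    size_t n = 0;

    for (size_t i = 0; i < nwords; i++) {
        uint32_t reg = status[i];

        for (unsigned int bit = 0; bit < 32; bit++) {
            unsigned int sp_id;

            if ((reg & (UINT32_C(1) << bit)) == 0)
                continue;               /* this syncpoint is not pending */

            sp_id = (unsigned int)i * 32 + bit;
            if (sp_id >= nb_pts || n == out_cap)
                return n;               /* stop scanning, like `goto out` */

            out[n++] = sp_id;
        }
    }
    return n;
}
```

<p>With the status words <code>{0x5, 0x2}</code> and 64 syncpoints, the sketch reports ids 0, 2 and 33. The real ISR additionally acknowledges each interrupt and calls the threshold handler for each id instead of collecting them.</p>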

<p style="text-align: right;">nvidia/drivers/video/tegra/host/nvhost_intr.c</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/*** host syncpt interrupt service functions ***/</span>
<span class="kt">void</span> <span class="n">nvhost_syncpt_thresh_fn</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">dev_id</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">nvhost_intr_syncpt</span> <span class="o">*</span><span class="n">syncpt</span> <span class="o">=</span> <span class="n">dev_id</span><span class="p">;</span>
    <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">id</span> <span class="o">=</span> <span class="n">syncpt</span><span class="o">-&gt;</span><span class="n">id</span><span class="p">;</span>
    <span class="k">struct</span> <span class="n">nvhost_intr</span> <span class="o">*</span><span class="n">intr</span> <span class="o">=</span> <span class="n">intr_syncpt_to_intr</span><span class="p">(</span><span class="n">syncpt</span><span class="p">);</span>
    <span class="k">struct</span> <span class="n">nvhost_master</span> <span class="o">*</span><span class="n">dev</span> <span class="o">=</span> <span class="n">intr_to_dev</span><span class="p">(</span><span class="n">intr</span><span class="p">);</span>
    <span class="kt">int</span> <span class="n">err</span><span class="p">;</span>

    <span class="cm">/* make sure host1x is powered */</span>
    <span class="n">err</span> <span class="o">=</span> <span class="n">nvhost_module_busy</span><span class="p">(</span><span class="n">dev</span><span class="o">-&gt;</span><span class="n">dev</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">err</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">WARN</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s">"failed to powerON host1x."</span><span class="p">);</span>
        <span class="k">return</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">if</span> <span class="p">(</span><span class="n">nvhost_dev_is_virtual</span><span class="p">(</span><span class="n">dev</span><span class="o">-&gt;</span><span class="n">dev</span><span class="p">))</span>
        <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">process_wait_list</span><span class="p">(</span><span class="n">intr</span><span class="p">,</span> <span class="n">syncpt</span><span class="p">,</span>
                <span class="n">nvhost_syncpt_read_min</span><span class="p">(</span><span class="o">&amp;</span><span class="n">dev</span><span class="o">-&gt;</span><span class="n">syncpt</span><span class="p">,</span> <span class="n">id</span><span class="p">));</span>
    <span class="k">else</span>
        <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">process_wait_list</span><span class="p">(</span><span class="n">intr</span><span class="p">,</span> <span class="n">syncpt</span><span class="p">,</span>
                <span class="n">nvhost_syncpt_update_min</span><span class="p">(</span><span class="o">&amp;</span><span class="n">dev</span><span class="o">-&gt;</span><span class="n">syncpt</span><span class="p">,</span> <span class="n">id</span><span class="p">));</span>

    <span class="n">nvhost_module_idle</span><span class="p">(</span><span class="n">dev</span><span class="o">-&gt;</span><span class="n">dev</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<ul>
  <li><strong>process_wait_list()</strong> eventually invokes the registered callback (e.g. the channel update, channel_sync_syncpt_update()), which is registered when the gpfifo is submitted, as shown below.</li>
</ul>

<p style="text-align: right;">nvgpu/common/sync/channel_sync.c</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">if</span> <span class="p">(</span><span class="n">register_irq</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">struct</span> <span class="n">channel_gk20a</span> <span class="o">*</span><span class="n">referenced</span> <span class="o">=</span> <span class="n">gk20a_channel_get</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>

        <span class="n">WARN_ON</span><span class="p">(</span><span class="o">!</span><span class="n">referenced</span><span class="p">);</span>

        <span class="k">if</span> <span class="p">(</span><span class="n">referenced</span><span class="p">)</span> <span class="p">{</span>
            <span class="cm">/* note: channel_put() is in
             * channel_sync_syncpt_update() */</span>

            <span class="n">err</span> <span class="o">=</span> <span class="n">nvgpu_nvhost_intr_register_notifier</span><span class="p">(</span>
                <span class="n">sp</span><span class="o">-&gt;</span><span class="n">nvhost_dev</span><span class="p">,</span>
                <span class="n">sp</span><span class="o">-&gt;</span><span class="n">id</span><span class="p">,</span> <span class="n">thresh</span><span class="p">,</span>
                <span class="n">channel_sync_syncpt_update</span><span class="p">,</span> <span class="n">c</span><span class="p">);</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">err</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
                <span class="n">gk20a_channel_put</span><span class="p">(</span><span class="n">referenced</span><span class="p">);</span>
            <span class="p">}</span>

            <span class="cm">/* Adding interrupt action should
             * never fail. A proper error handling
             * here would require us to decrement
             * the syncpt max back to its original
             * value. */</span>
            <span class="n">WARN</span><span class="p">(</span><span class="n">err</span><span class="p">,</span>
                 <span class="s">"failed to set submit complete interrupt"</span><span class="p">);</span>
        <span class="p">}</span>
    <span class="p">}</span>
</code></pre></div></div>
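
<p>The reference counting in the listing above follows a get / register / put-on-failure pattern, with the matching put done later inside the callback. The sketch below imitates only that pattern in plain C. Everything prefixed with <code>fake_</code> is hypothetical and stands in for the nvgpu and nvhost types; the single-slot notifier is a simplification, not how nvhost stores its waiters.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a channel: a refcount and an update counter. */
struct fake_channel {
    int refcount;
    int updates;
};

typedef void (*notifier_fn)(void *priv, int nr_completed);

/* Hypothetical single-slot notifier: one threshold, one callback. */
struct fake_notifier {
    uint32_t thresh;
    notifier_fn fn;
    void *priv;
    int armed;
};

/* Arm the slot; fails if the arguments are invalid or the slot is in use. */
static int fake_register_notifier(struct fake_notifier *slot, uint32_t thresh,
                                  notifier_fn fn, void *priv)
{
    if (slot == NULL || fn == NULL || slot->armed)
        return -1;
    slot->thresh = thresh;
    slot->fn = fn;
    slot->priv = priv;
    slot->armed = 1;
    return 0;
}

/* Callback: update the channel, then drop the reference taken at submit. */
static void fake_channel_update(void *priv, int nr_completed)
{
    struct fake_channel *c = priv;

    (void)nr_completed;
    c->updates++;
    c->refcount--;
}

/* Submit path: take a reference, register, and undo the get on failure. */
static int fake_submit(struct fake_channel *c, struct fake_notifier *slot,
                       uint32_t thresh)
{
    int err;

    c->refcount++;
    err = fake_register_notifier(slot, thresh, fake_channel_update, c);
    if (err != 0)
        c->refcount--;
    return err;
}

/* Interrupt path: run the callback once and disarm the slot. */
static void fake_fire(struct fake_notifier *slot)
{
    if (!slot->armed)
        return;
    slot->armed = 0;
    slot->fn(slot->priv, 1);
}
```

<p>A successful <code>fake_submit()</code> leaves the channel with one extra reference until <code>fake_fire()</code> runs the callback; a failed registration leaves the count unchanged, which is the invariant the driver code is protecting.</p>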

<h1 id="3-syncpt-sync-point">3. SYNCPT (Sync Point)</h1>

<p style="text-align: right;">nvidia/drivers/video/tegra/host/nvhost_syncpt.c</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/**
 * Updates the last value read from hardware.
 */</span>
<span class="n">u32</span> <span class="n">nvhost_syncpt_update_min</span><span class="p">(</span><span class="k">struct</span> <span class="n">nvhost_syncpt</span> <span class="o">*</span><span class="n">sp</span><span class="p">,</span> <span class="n">u32</span> <span class="n">id</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">u32</span> <span class="n">val</span><span class="p">;</span>

    <span class="n">val</span> <span class="o">=</span> <span class="n">syncpt_op</span><span class="p">().</span><span class="n">update_min</span><span class="p">(</span><span class="n">sp</span><span class="p">,</span> <span class="n">id</span><span class="p">);</span>
    <span class="n">trace_nvhost_syncpt_update_min</span><span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="n">val</span><span class="p">);</span>

    <span class="k">return</span> <span class="n">val</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p style="text-align: right;">nvidia/drivers/video/tegra/host/host1x/host1x_syncpt.c</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/**
 * Updates the last value read from hardware.
 * (was nvhost_syncpt_update_min)
 */</span>
<span class="k">static</span> <span class="n">u32</span> <span class="nf">t20_syncpt_update_min</span><span class="p">(</span><span class="k">struct</span> <span class="n">nvhost_syncpt</span> <span class="o">*</span><span class="n">sp</span><span class="p">,</span> <span class="n">u32</span> <span class="n">id</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">nvhost_master</span> <span class="o">*</span><span class="n">dev</span> <span class="o">=</span> <span class="n">syncpt_to_dev</span><span class="p">(</span><span class="n">sp</span><span class="p">);</span>
    <span class="n">u32</span> <span class="n">old</span><span class="p">,</span> <span class="n">live</span><span class="p">;</span>

    <span class="k">do</span> <span class="p">{</span>
        <span class="n">old</span> <span class="o">=</span> <span class="n">nvhost_syncpt_read_min</span><span class="p">(</span><span class="n">sp</span><span class="p">,</span> <span class="n">id</span><span class="p">);</span>
        <span class="n">live</span> <span class="o">=</span> <span class="n">host1x_sync_readl</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span>
                <span class="p">(</span><span class="n">host1x_sync_syncpt_0_r</span><span class="p">()</span> <span class="o">+</span> <span class="n">id</span> <span class="o">*</span> <span class="mi">4</span><span class="p">));</span>
    <span class="p">}</span> <span class="k">while</span> <span class="p">((</span><span class="n">u32</span><span class="p">)</span><span class="n">atomic_cmpxchg</span><span class="p">(</span><span class="o">&amp;</span><span class="n">sp</span><span class="o">-&gt;</span><span class="n">min_val</span><span class="p">[</span><span class="n">id</span><span class="p">],</span> <span class="n">old</span><span class="p">,</span> <span class="n">live</span><span class="p">)</span> <span class="o">!=</span> <span class="n">old</span><span class="p">);</span>

    <span class="k">return</span> <span class="n">live</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>nvhost_syncpt_thresh_fn()</strong> refreshes the cached syncpt min value by reading the live counter from hardware (I guess this is the last get pointer on the GPU side) and passes the result to process_wait_list() as the threshold.</p>
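
<p>The retry loop in t20_syncpt_update_min() is a compare-and-swap update of a cached value. The user-space sketch below shows the same loop with C11 atomics instead of the kernel's atomic_cmpxchg(). The <code>fake_syncpt</code> struct and its <code>hw_val</code> field, which stands in for the MMIO register read, are assumptions for illustration only.</p>

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for one syncpoint. */
struct fake_syncpt {
    _Atomic uint32_t min_val;   /* cached copy, like sp->min_val[id] */
    volatile uint32_t hw_val;   /* stand-in for the hardware register */
};

/*
 * Publish the live hardware value into the cached min value.
 * The loop retries if another thread changed min_val between the
 * load of `old` and the compare-and-swap.
 */
static uint32_t fake_syncpt_update_min(struct fake_syncpt *sp)
{
    uint32_t old, live;

    do {
        old = atomic_load(&sp->min_val);
        live = sp->hw_val;      /* the driver performs an MMIO read here */
    } while (!atomic_compare_exchange_strong(&sp->min_val, &old, live));

    return live;
}
```

<p>The returned <code>live</code> value is what the driver then hands to process_wait_list() as the threshold.</p>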

<p style="text-align: right;">nvidia/drivers/video/tegra/host/nvhost_intr.c</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/**
 * Remove &amp; handle all waiters that have completed for the given syncpt
 */</span>
<span class="k">static</span> <span class="kt">int</span> <span class="n">process_wait_list</span><span class="p">(</span><span class="k">struct</span> <span class="n">nvhost_intr</span> <span class="o">*</span><span class="n">intr</span><span class="p">,</span>
                 <span class="k">struct</span> <span class="n">nvhost_intr_syncpt</span> <span class="o">*</span><span class="n">syncpt</span><span class="p">,</span>
                 <span class="n">u32</span> <span class="n">threshold</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">list_head</span> <span class="o">*</span><span class="n">completed</span><span class="p">[</span><span class="n">NVHOST_INTR_ACTION_COUNT</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="nb">NULL</span><span class="p">};</span>
    <span class="k">struct</span> <span class="n">list_head</span> <span class="n">high_prio_handlers</span><span class="p">[</span><span class="n">NVHOST_INTR_HIGH_PRIO_COUNT</span><span class="p">];</span>
    <span class="n">bool</span> <span class="n">run_low_prio_work</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
    <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">empty</span><span class="p">;</span>

    <span class="cm">/* take lock on waiter list */</span>
    <span class="n">spin_lock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">syncpt</span><span class="o">-&gt;</span><span class="n">lock</span><span class="p">);</span>

    <span class="cm">/* keep high priority workers in local list */</span>
    <span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">NVHOST_INTR_HIGH_PRIO_COUNT</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">INIT_LIST_HEAD</span><span class="p">(</span><span class="n">high_prio_handlers</span> <span class="o">+</span> <span class="n">i</span><span class="p">);</span>
        <span class="n">completed</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">high_prio_handlers</span> <span class="o">+</span> <span class="n">i</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="cm">/* .. and low priority workers in global list */</span>
    <span class="k">for</span> <span class="p">(</span><span class="n">j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">NVHOST_INTR_ACTION_COUNT</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">,</span> <span class="o">++</span><span class="n">j</span><span class="p">)</span>
        <span class="n">completed</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">syncpt</span><span class="o">-&gt;</span><span class="n">low_prio_handlers</span> <span class="o">+</span> <span class="n">j</span><span class="p">;</span>

    <span class="cm">/* this functions fills completed data */</span>
    <span class="n">remove_completed_waiters</span><span class="p">(</span><span class="o">&amp;</span><span class="n">syncpt</span><span class="o">-&gt;</span><span class="n">wait_head</span><span class="p">,</span> <span class="n">threshold</span><span class="p">,</span>
        <span class="n">syncpt</span><span class="o">-&gt;</span><span class="n">isr_recv</span><span class="p">,</span> <span class="n">completed</span><span class="p">);</span>

    <span class="cm">/* check if there are still waiters left */</span>
    <span class="n">empty</span> <span class="o">=</span> <span class="n">list_empty</span><span class="p">(</span><span class="o">&amp;</span><span class="n">syncpt</span><span class="o">-&gt;</span><span class="n">wait_head</span><span class="p">);</span>

    <span class="cm">/* if not, disable interrupt. If yes, update the inetrrupt */</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">empty</span><span class="p">)</span>
        <span class="n">intr_op</span><span class="p">().</span><span class="n">disable_syncpt_intr</span><span class="p">(</span><span class="n">intr</span><span class="p">,</span> <span class="n">syncpt</span><span class="o">-&gt;</span><span class="n">id</span><span class="p">);</span>
    <span class="k">else</span>
        <span class="n">reset_threshold_interrupt</span><span class="p">(</span><span class="n">intr</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">syncpt</span><span class="o">-&gt;</span><span class="n">wait_head</span><span class="p">,</span>
                      <span class="n">syncpt</span><span class="o">-&gt;</span><span class="n">id</span><span class="p">);</span>

    <span class="cm">/* remove low priority handlers from this list */</span>
    <span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="n">NVHOST_INTR_HIGH_PRIO_COUNT</span><span class="p">;</span>
         <span class="n">i</span> <span class="o">&lt;</span> <span class="n">NVHOST_INTR_ACTION_COUNT</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">list_empty</span><span class="p">(</span><span class="n">completed</span><span class="p">[</span><span class="n">i</span><span class="p">]))</span>
            <span class="n">run_low_prio_work</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
        <span class="n">completed</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="cm">/* release waiter lock */</span>
    <span class="n">spin_unlock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">syncpt</span><span class="o">-&gt;</span><span class="n">lock</span><span class="p">);</span>

    <span class="n">run_handlers</span><span class="p">(</span><span class="n">completed</span><span class="p">);</span>

    <span class="cm">/* schedule a separate task to handle low priority handlers */</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">run_low_prio_work</span><span class="p">)</span>
        <span class="n">queue_work</span><span class="p">(</span><span class="n">intr</span><span class="o">-&gt;</span><span class="n">low_prio_wq</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">syncpt</span><span class="o">-&gt;</span><span class="n">low_prio_work</span><span class="p">);</span>

    <span class="k">return</span> <span class="n">empty</span><span class="p">;</span>
<span class="p">}</span>

</code></pre></div></div>
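
<p>The listing above delegates the actual completion test to remove_completed_waiters(), which is not shown in this post. As a rough illustration only, the sketch below assumes a wraparound-safe signed comparison, a common idiom for 32-bit counters that may overflow. The <code>fake_waiter</code> struct and both function names are hypothetical.</p>

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical waiter: a target value and a completion flag. */
struct fake_waiter {
    uint32_t thresh;
    bool done;
};

/*
 * True once `value` has reached `thresh`, treating both as wrapping
 * 32-bit counters. The unsigned difference is reinterpreted as signed;
 * this relies on two's complement conversion, as gcc and clang provide.
 */
static bool thresh_reached(uint32_t value, uint32_t thresh)
{
    return (int32_t)(value - thresh) >= 0;
}

/*
 * Mark every waiter whose threshold has been reached and return how
 * many waiters are still pending afterwards.
 */
static size_t complete_waiters(struct fake_waiter *w, size_t n, uint32_t value)
{
    size_t pending = 0;

    for (size_t i = 0; i < n; i++) {
        if (w[i].done)
            continue;
        if (thresh_reached(value, w[i].thresh))
            w[i].done = true;
        else
            pending++;
    }
    return pending;
}
```

<p>A return value of zero corresponds to the <code>empty</code> case in the listing, where the driver disables the syncpt interrupt; otherwise it re-arms the interrupt for the remaining waiters.</p>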

<p><strong>process_wait_list()</strong> picks the waiters whose wait value has been reached by the threshold (read from hardware, as discussed above), runs the high-priority handlers right away, and queues the low-priority ones on a workqueue.</p>]]></content><author><name>Heejin Park</name></author><category term="GPU" /><category term="Jetson Nano" /><summary type="html"><![CDATA[Analyze the Nvidia Jetson Nano device driver code to understand how a job is submitted and how it interacts with IRQs.]]></summary></entry></feed>