Why Even Bother With FPGAs?
###########################

FPGAs, being alternative processors, attract a fair bit of skepticism,
especially from people higher up the pyramid of computing abstractions
(software engineers and the like). This post is my attempt to persuade the
skeptics by way of an instance where FPGAs blow every other kind of processor
out of the water.

**TL;DR**: FPGAs allow full DNN inference at nanosecond latency, limited
chiefly by how fast signals propagate through the circuit. A CPU or GPU, in
comparison, can execute only a handful of instructions in a nanosecond, while
an entire inference requires millions or billions of them.

FPGAs for the Unenlightened
---------------------------

FPGAs are circuit emulators. A digital circuit consists of logic gates and
connections between them; an FPGA emulates both the gates and the
connections.

Logic gates can be represented by their `Truth Table
<https://en.wikipedia.org/wiki/Truth_table>`_. A truth table is a form of
hash table whose key is a tuple of binary values, one per input, and whose
value is the single bit the gate outputs. One kind of FPGA (SRAM-based)
emulates logic gates by storing truth tables in memory; the stored tables are
called lookup tables (LUTs).
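
To make this concrete, here is a minimal sketch of a 4-input LUT (not any
vendor's actual primitive): the inputs form the key, and the stored 16-bit
``INIT`` vector is the truth table.

.. code:: verilog

    // A 4-input LUT: 16 stored bits, indexed by the 4 input bits.
    // INIT = 16'h8000 makes it a 4-input AND (only key 1111 reads a 1);
    // INIT = 16'hFFFE makes it a 4-input OR (only key 0000 reads a 0).
    module lut4 #(parameter [15:0] INIT = 16'h8000) (
        input  [3:0] key,  // the tuple of input bits
        output       out   // the bit stored at that key
    );
        wire [15:0] truth_table = INIT;
        assign out = truth_table[key];
    endmodule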

Connections are emulated via programmable interconnect. Think of a network
switch; programmable interconnect is much the same thing, except at a very
low level. `This document
<https://cse.usf.edu/~haozheng/teach/cda4253/doc/fpga-arch-overview.pdf>`_
explains in detail the different VLSI architectures present in modern FPGAs.

A programmer usually does not describe circuits in the form of logic gates;
they use abstractions in the form of HDLs (hardware description languages) to
describe, behaviorally, the operations a circuit must perform. A compiler
then maps the HDL program onto FPGA primitives.
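
For instance, the behavioral description below states *what* to compute, not
which gates to use; the synthesizer decides how to realize it in LUTs and
carry chains. (The module is illustrative, not taken from any real design.)

.. code:: verilog

    // Behavioral Verilog: '+' and '>' name operations, not gates.
    module add_cmp (
        input  [7:0] a,
        input  [7:0] b,
        output [8:0] sum,     // 9 bits: an 8-bit add can carry out
        output       a_gt_b
    );
        assign sum    = a + b;    // mapped onto LUTs and a carry chain
        assign a_gt_b = (a > b);  // mapped onto a LUT-based comparator
    endmodule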

As should be obvious by now, FPGAs are unlike processors: they do not have an
"Instruction Set Architecture". If one is needed, the programmer must design
and implement an ISA themselves [#fpga_arch]_. FPGAs require thinking of
problems as circuits with inputs and outputs.

The Central Argument for FPGAs
------------------------------

Now, let's build the argument.

Deep Neural Network (DNN) inference demands a lot of compute and is a pretty
challenging problem. Solutions to this problem manifest in the form of ASIC
accelerators and GPUs. More performance can always be had by scaling said
processors, but of course there is a limit to how far one can scale. For
example, on the `NVIDIA Jetson Nano
<https://developer.nvidia.com/embedded/jetson-nano>`_, inferring a single
image with the CNN model ResNet-50 takes ~72 ms. What if we needed something
much faster, say the same inference within a few nanoseconds? A GPU or ASIC
would only manage a couple of instructions in that timeframe, let alone
complete the inference. They certainly won't suffice.

This requirement is not made up: nanosecond-scale DNN inference is a real
problem faced by a team at CERN working on the Large Hadron Collider.

Here's a short description of the problem from their `paper
<https://arxiv.org/pdf/2006.10159>`_:

    *The hardware triggering system in a particle detector at the CERN LHC is
    one of the most extreme environments one can imagine deploying DNNs.
    Latency is restricted to O(1)µs, governed by the frequency of particle
    collisions and the amount of on-detector buffers. The system consists of a
    limited amount of FPGA resources, all of which are located in underground
    caverns 50-100 meters below the ground surface, working on thousands of
    different tasks in parallel. Due to the high number of tasks being
    performed, limited cooling capabilities, limited space in the cavern, and
    the limited number of processors, algorithms must be kept as
    resource-economic as possible. In order to minimize the latency and
    maximize the precision of tasks that can be performed in the hardware
    trigger, ML solutions are being explored as fast approximations of the
    algorithms currently in use.*

Solutions
---------

There are, broadly speaking, two ways of solving this problem:

1. The ASIC Way
===============

This includes CPUs, GPUs, TPUs, or any other ASIC. The idea is to have a
large grid of multipliers and adders carrying out as many multiply-accumulate
operations in parallel as possible. To gain more performance, research goes
into raising the frequency of the chip. Compilers and specialized frameworks
help abstract the computation. And if we need still more performance,
specialized engineers (who have mastered assembly language) are called upon
to write performant kernels, using clever tricks to get the fastest possible
dot product.
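
The workhorse of this approach is the multiply-accumulate (MAC) unit, tiled
by the thousands. Here is a minimal sketch of one such unit (illustrative,
not any real accelerator's design):

.. code:: verilog

    // One clocked multiply-accumulate unit. An accelerator tiles a grid
    // of these; performance scales with tile count and clock frequency.
    module mac #(parameter W = 8) (
        input                       clk,
        input                       clear,
        input  signed [W-1:0]       x,    // activation
        input  signed [W-1:0]       w,    // weight
        output reg signed [4*W-1:0] acc   // running dot product
    );
        always @(posedge clk) begin
            if (clear) acc <= 0;
            else       acc <= acc + x * w;  // one MAC per clock cycle
        end
    endmodule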

2. The FPGA Way
===============

Here, the idea is to exploit the FPGA's programming model. Instead of writing
a program for our problem, we design a circuit for it. Each layer of the
neural network is represented by a circuit; inside a layer, every dot product
is itself a circuit. If the neural network is not prohibitively large, we can
even fit the entire thing as one combinational circuit.

As you might have learnt in a digital circuits course, combinational circuits
do not contain any clocks, i.e. there is no notion of frequency: inputs come
in, outputs go out. The speed of computation is bottlenecked only by the
propagation delay of signals through the chip. How cool is that?!
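
As a sketch of what "a layer as a circuit" means, here is a single tiny
neuron with weights hard-wired at synthesis time (the weights ``{2, -1, 3}``
are made up for illustration):

.. code:: verilog

    // A neuron as pure combinational logic: no clock, no instructions.
    // The output settles as soon as the inputs finish propagating.
    module neuron (
        input  signed [3:0] x0, x1, x2,  // 4-bit activations
        output signed [7:0] y
    );
        // dot product with constant weights; becomes adders and LUTs
        wire signed [7:0] dot = 2*x0 - x1 + 3*x2;
        // ReLU, also combinational
        assign y = (dot > 0) ? dot : 8'sd0;
    endmodule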

Flaws with the FPGA Way
-----------------------

One of the biggest flaws with fitting entire problems on the FPGA is the
`combinatorial explosion
<https://en.wikipedia.org/wiki/Combinatorial_explosion>`_ in complexity. For
example, there are `well-known algorithms
<https://en.wikipedia.org/wiki/Booth's_multiplication_algorithm>`_ that yield
very efficient multiplier circuits. One can avoid going that route by
encoding the multiplier directly as truth tables: instead of calculating the
output of a multiplication, we remember it and look it up. Here's Verilog for
a 2-bit multiplier:

.. code:: verilog

    // Unsigned 2-bit multiplier written directly as truth tables:
    // each output bit is a sum-of-products over the input bits.
    module mul (
        input  [1:0] a,
        input  [1:0] b,
        output [3:0] out
    );
        assign out[3] = (a[0] & a[1] & b[0] & b[1]);
        assign out[2] = (~a[0] & a[1] & b[1]) | (a[1] & ~b[0] & b[1]);
        assign out[1] = (~a[0] & a[1] & b[0]) | (a[0] & ~b[0] & b[1])
                      | (a[0] & ~a[1] & b[1]) | (a[1] & ~b[1] & b[0]);
        assign out[0] = (a[0] & b[0]);
    endmodule

Each output bit is just a combination of the input bits.

Here's the problem: this method of designing multipliers does not scale! The
2-bit multiplier takes 4 LUTs (pretty reasonable), but the same approach for
an 8-bit multiplier takes ~18,000 LUTs and 3+ hours to synthesize (awful).
The cost grows exponentially with input width: a truth table over two n-bit
inputs has 2^(2n) rows. Many large neural networks would have a hard time
fitting on an FPGA this way.

This doesn't signal the end for FPGAs, however. There's still a strong case
to be made for them, just as the team at CERN has demonstrated; in fact, they
are actively exploiting it. They discovered that neural network layers can be
*heterogeneously quantized*, meaning each layer can have a different
precision depending on its significance in the computation pipeline, as
outlined in their work `here <https://fastmachinelearning.org/hls4ml/>`_.

If an entire network cannot fit on an FPGA, fast reconfiguration can provide
a solution. This involves configuring the hardware for one layer, processing
its outputs, then reconfiguring the hardware for the next layer, and so on.
The approach can be further refined to enable reconfiguration at a
per-channel level, allowing smaller FPGAs with limited resources to
participate. A 'compiler' would orchestrate the computation offline,
determining the sequence and timing of reconfigurations before the actual
computation begins.

Recent interest in hyper-quantization, i.e. `1-bit
<https://github.com/kyegomez/BitNet>`_, 2-bit, 3-bit ... networks, is a big
win for the FPGA way. The lower the precision, the more efficient and
practical the truth-table approach becomes, making FPGAs a great fit for it.
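
To see why extreme quantization suits LUTs so well, consider the 1-bit case.
With activations and weights in {-1, +1}, each encoded as a single bit, a dot
product needs no multipliers at all. A minimal sketch, assuming a
hypothetical 8-element layer, reduces it to an XNOR and a popcount:

.. code:: verilog

    // 1-bit dot product: XNOR marks positions where activation and
    // weight agree (a +1 contribution); the popcount tallies them.
    module bnn_dot #(parameter N = 8) (
        input  [N-1:0]      x,    // N 1-bit activations
        input  [N-1:0]      w,    // N 1-bit weights
        output signed [7:0] dot   // result in [-N, +N]
    );
        wire [N-1:0] agree = ~(x ^ w);  // XNOR
        integer i;
        reg signed [7:0] ones;
        always @* begin
            ones = 0;
            for (i = 0; i < N; i = i + 1)
                ones = ones + agree[i];
        end
        // (#agreements) - (#disagreements) = 2*ones - N
        assign dot = 2 * ones - N;
    endmodule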

Conclusion
----------

With the FPGA way, many problems spanning different domains can be solved in
interesting and (sometimes) superior ways. At my workplace, we've started
researching the FPGA way, trying to bring it out of the depths of complexity
and apply it to practical problems.

The intention of this post is not to compare ASICs and FPGAs (comparisons are
futile), but to highlight how FPGAs ought to be seen and used. Over the next
few months, I'll write more on this research as I uncover it myself. I'll
leave you with some links advocating for the FPGA way [#fpga_way]_:

- `Learning and Memorization - Satrajit Chatterjee
  <https://proceedings.mlr.press/v80/chatterjee18a/chatterjee18a.pdf>`_
- `LUTNet <https://arxiv.org/abs/1904.00938>`_
- `George Constantinides and his team
  <https://scholar.google.com/citations?user=NTn1NJAAAAAJ&hl=en>`_
- `The hls4ml team <https://fastmachinelearning.org/hls4ml/>`_

.. rubric:: Footnotes

.. [#fpga_arch] The term "architecture" is a bit overloaded. The first
   meaning is the VLSI sense, i.e. how LUTs and interconnect are organized to
   make up the FPGA. Another usage describes the higher-level components
   designed **on top** of the FPGA: think matmul engines, caches, etc.
   "Architecture" has meaning at different levels of circuit design.

.. [#fpga_way] This is a term I've coined myself; I've not seen anyone else
   use it in their work.