
Commit 1aa3cf8

committed
add why even bother with fpgas and css change
Signed-off-by: Shreeyash Pandey <[email protected]>
1 parent 6f2ef5b commit 1aa3cf8

34 files changed: +662 -36 lines

docs/_images/loom.png

366 KB

docs/_sources/blog/index.rst.txt

+1
@@ -4,6 +4,7 @@ Blog
 .. toctree::
    :titlesonly:
 
+   Why Even Bother With FPGAs? <why_even_bother_with_fpgas>
    No-ISA is the Best ISA <no_isa_is_the_best_isa>
    When Reverse Engineering, Your Pattern Seeking Brain Is Your Friend <pattern_seeking_brain>
    Ghidra Decompiler - Standalone CLI Guide <ghidra_decompiler_cli_guide>
@@ -0,0 +1,193 @@
Why Even Bother With FPGAs?
###########################

.. TODO
   - why write this post (fpgas have skeptics)
   - central theme (nanosecond inference)
   - conclusion

As alternative processors, FPGAs attract a fair bit of skepticism, especially
from people higher up the pyramid of computer abstractions (software engineers
and the like). This post is my attempt to persuade the skeptics by way of an
instance where FPGAs blow every other kind of processor out of the water.

**TL;DR**: FPGAs allow full DNN inference at nanosecond latency, limited only
by the time it takes for electrons to move across a circuit. In comparison, a
CPU or GPU can only run a couple of instructions in a nanosecond timeframe,
and an entire inference requires many millions or billions of those
instructions.

FPGAs for the Unenlightened
---------------------------

FPGAs are circuit emulators. Digital circuits consist of logic gates and the
connections between them; FPGAs emulate both.

A logic gate can be represented by its `Truth Table
<https://en.wikipedia.org/wiki/Truth_table>`_. A truth table is a form of
hash table where the key is a tuple of binary values, one per input, and the
value is a single bit representing the output of the gate. One kind of FPGA
(SRAM-based) emulates logic gates by storing truth tables in memory.
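
To make the hash-table analogy concrete, here is a minimal Python sketch
(purely illustrative; names like ``lut_eval`` are made up, not any vendor
API). An SRAM-based LUT is just a tiny memory addressed by the concatenated
input bits:

```python
# A 2-input logic gate as a truth table: key = tuple of input bits,
# value = the single output bit.
AND_LUT = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}

# An SRAM LUT stores the same table flattened into a tiny memory that is
# addressed by the concatenated input bits.
AND_SRAM = [AND_LUT[(a, b)] for a in (0, 1) for b in (0, 1)]

def lut_eval(sram, a, b):
    """Evaluate a gate by looking up its stored truth table."""
    return sram[(a << 1) | b]

print(lut_eval(AND_SRAM, 1, 1))  # -> 1
```

Reprogramming the FPGA amounts to rewriting these little memories, which is
why the same fabric can emulate any gate.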

Connections are emulated via programmable interconnects. Think of a network
switch: programmable interconnects work much the same way, except at a very
low level. `This document
<https://cse.usf.edu/~haozheng/teach/cda4253/doc/fpga-arch-overview.pdf>`_
explains in detail the different VLSI architectures present in modern FPGAs.

A programmer usually does not describe circuits in the form of logic gates;
they use abstractions, in the form of HDLs, to behaviorally describe the
operations a circuit must perform. A compiler then maps HDL programs onto
FPGA primitives.

As should be obvious by now, FPGAs are unlike processors. They have no
"Instruction Set Architecture"; if one is needed, the programmer must design
and implement it themselves [#fpga_arch]_. FPGAs require thinking of problems
as circuits with inputs and outputs.

The Central Argument for FPGAs
------------------------------

Now, let's build the argument.

Deep Neural Network (DNN) inference demands a lot of compute and is a pretty
challenging problem. Solutions to it manifest in the form of ASIC
accelerators and GPUs. More performance can always be gained by scaling said
processors, but of course there is a limit to how far one can scale. For
example, on the `NVIDIA Jetson Nano
<https://developer.nvidia.com/embedded/jetson-nano>`_ the time taken to infer
a single image with the CNN model ResNet50 is ~72ms. What if we needed
something much faster, say the same inference in a few nanoseconds? A GPU or
ASIC would only be able to execute a couple of instructions in that
timeframe, let alone complete the inference. They certainly won't suffice.
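
Back-of-the-envelope arithmetic makes the gap vivid (the 3 GHz clock and the
~4 billion multiply-accumulates per ResNet50 image are rough, commonly quoted
figures, not measurements):

```python
# A 3 GHz processor completes ~3 cycles every nanosecond; even at several
# instructions per cycle, that is only a handful of instructions per ns.
cycles_per_ns = 3.0e9 / 1e9

# ResNet50 needs on the order of 4 billion multiply-accumulates per image.
resnet50_macs = 4e9

# Nanoseconds needed even if we could retire 4 MACs every cycle:
ns_needed = resnet50_macs / (cycles_per_ns * 4)
print(f"{cycles_per_ns:.0f} cycles/ns, ~{ns_needed:.1e} ns per inference")
```

Hundreds of millions of nanoseconds per inference, against a budget of a few:
no amount of instruction-level cleverness closes that gap.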

This requirement is not made up. Nanosecond DNN inference is a real problem
faced by a team at CERN working on the Large Hadron Collider.

Here's a little description of the problem from their `paper
<https://arxiv.org/pdf/2006.10159>`_:

  *The hardware triggering system in a particle detector at the CERN LHC is
  one of the most extreme environments one can imagine deploying DNNs.
  Latency is restricted to O(1)µs, governed by the frequency of particle
  collisions and the amount of on-detector buffers. The system consists of a
  limited amount of FPGA resources, all of which are located in underground
  caverns 50-100 meters below the ground surface, working on thousands of
  different tasks in parallel. Due to the high number of tasks being
  performed, limited cooling capabilities, limited space in the cavern, and
  the limited number of processors, algorithms must be kept as
  resource-economic as possible. In order to minimize the latency and
  maximize the precision of tasks that can be performed in the hardware
  trigger, ML solutions are being explored as fast approximations of the
  algorithms currently in use.*

Solutions
---------

There are, broadly speaking, two ways of solving this problem:

1. The ASIC Way
===============

This includes CPUs/GPUs/TPUs or any other ASIC. The idea is to have a large
grid of multipliers and adders that carry out as many multiply-accumulate
operations in parallel as possible. For more performance, research goes into
increasing the frequency of the chip (Moore's law). Compilers and specialized
frameworks help abstract the computation. And if we need still more
performance, specialized engineers (who have mastered assembly language) are
called upon to write performant kernels, using clever tricks to get the
fastest possible dot product.

2. The FPGA Way
===============

Here, the idea is to exploit the FPGA's programming model. Instead of writing
a program for our problem, we design a circuit for it. Each layer of a neural
network is represented by a circuit, and inside each layer, the dot products
themselves are represented by circuits. If the neural network is not
prohibitively large, we can even fit the entire NN as a single combinational
circuit.

As you might have learnt in a digital circuits course, combinational circuits
do not contain any clocks, i.e. there is no notion of frequency: inputs come
in, outputs go out. The speed of computation is bottlenecked only by the time
it takes electrons to pass through the chip. How cool is that?!

Flaws with the FPGA Way
-----------------------

One of the biggest flaws with fitting entire problems onto the FPGA is the
`combinatorial explosion
<https://en.wikipedia.org/wiki/Combinatorial_explosion>`_ in complexity. For
example, to design a circuit for a multiplier, there are `well known
algorithms <https://en.wikipedia.org/wiki/Booth's_multiplication_algorithm>`_
that result in very efficient multipliers. One can avoid going this route by
directly encoding the multiplier into truth tables: instead of calculating
the output of a multiplication, we remember it and look it up. Here's Verilog
for a 2-bit multiplication:

.. code::

    module mul ( input signed [1:0] a, input signed [1:0] b, output signed [3:0] out);
        assign out[3] = (~a[1] & a[0] & b[1]) | (a[1] & ~b[1] & b[0]);
        assign out[2] = (~a[1] & a[0] & b[1]) | (a[1] & ~b[1] & b[0]) | (a[1] & ~a[0] & b[1] & ~b[0]);
        assign out[1] = (a[1] & b[0]) ^ (a[0] & b[1]);
        assign out[0] = (a[0] & b[0]);
    endmodule

Each output is just a combination of its inputs.
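
The truth table itself can be generated by brute force; here is a Python
sketch of the "remember, don't compute" idea for the signed 2-bit case
(toolchain-independent and purely illustrative):

```python
def twos(v, bits):
    """Encode a signed integer as `bits`-bit two's complement."""
    return v & ((1 << bits) - 1)

def signed(v, bits):
    """Decode `bits`-bit two's complement back to a signed integer."""
    return v - (1 << bits) if v & (1 << (bits - 1)) else v

# Enumerate every input pair once and store the product: this dict *is*
# the multiplier's truth table, ready to be encoded into LUTs.
MUL2 = {(a, b): twos(signed(a, 2) * signed(b, 2), 4)
        for a in range(4) for b in range(4)}

assert MUL2[(0b11, 0b11)] == 0b0001   # (-1) * (-1) = 1
assert MUL2[(0b10, 0b10)] == 0b0100   # (-2) * (-2) = 4
```

Minimizing each output bit of such a table into product terms is exactly what
logic synthesis does for you.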

Here's the problem: this method of designing multipliers does not scale! The
2-bit multiplier takes 4 LUTs (pretty reasonable), but the same for an 8-bit
multiplier takes ~18,000 LUTs and 3+ hours to synthesize (awful). The cost
grows at the rate of 2^n. Many large neural networks will have a hard time
fitting on an FPGA this way.
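
The blow-up is easy to see from the truth table itself: an n-bit by n-bit
multiplier has 2n input bits, hence 2^(2n) rows, one per input combination (a
back-of-the-envelope sketch; actual LUT counts depend on the synthesis tool
and the device's LUT size):

```python
def truth_table_rows(n_bits):
    """Rows in the truth table of an n-bit x n-bit multiplier: one row
    per combination of its 2n input bits."""
    return 2 ** (2 * n_bits)

for n in (2, 4, 8):
    print(f"{n}-bit multiplier: {truth_table_rows(n)} rows")
# 2 bits -> 16 rows; 8 bits -> 65536 rows, and each output bit is a
# separate boolean function over all of them.
```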

This doesn't signal the end for FPGAs, however. There's still a strong case
to be made for their use, just as the team at CERN has demonstrated. In fact,
they are actively leveraging this potential. They discovered that neural
network layers can be *heterogeneously quantized*, meaning each layer can
have a different precision level depending on its significance in the
computation pipeline, as outlined in their `hls4ml
<https://fastmachinelearning.org/hls4ml/>`_ work.
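
The idea of heterogeneous quantization fits in a few lines of Python (a
hypothetical helper, not the hls4ml API; the per-layer bit widths here are
invented for illustration):

```python
def quantize(weights, bits):
    """Uniformly quantize weights in [-1, 1] to signed `bits`-bit levels."""
    levels = 2 ** (bits - 1) - 1      # e.g. 3 bits -> integer levels in [-3, 3]
    return [round(w * levels) / levels for w in weights]

# Each layer gets its own precision, chosen by how much the layer matters:
layer_bits = {"conv1": 8, "conv2": 4, "dense": 3}
weights = {name: [0.91, -0.42, 0.07] for name in layer_bits}
quantized = {name: quantize(w, layer_bits[name]) for name, w in weights.items()}
```

Fewer bits per weight means smaller truth tables per dot product, which is
exactly what the LUT-cost argument above rewards.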

If an entire network cannot fit on an FPGA, fast reconfiguration can provide
a solution: configure the hardware for one layer, process its outputs, then
reconfigure the hardware for the next layer, and so on. The approach can be
refined further to enable reconfiguration at a per-channel level, allowing
smaller FPGAs with limited resources to participate. A 'compiler' would
orchestrate the computation offline, determining the sequence and timing of
reconfigurations before the actual computation begins.
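
In software terms, the orchestration looks roughly like this (a toy
simulation; ``configure`` and the bitstream names are hypothetical stand-ins
for real partial-reconfiguration APIs):

```python
# Offline, the 'compiler' fixes the order of bitstreams; online, the FPGA
# is reconfigured once per layer and data streams through each circuit.
schedule = ["layer0.bit", "layer1.bit", "layer2.bit"]   # decided offline

def configure(bitstream):
    """Stand-in for loading a partial bitstream; returns the 'circuit'."""
    return lambda xs: [x * 2 for x in xs]   # toy circuit: doubles its inputs

def run(schedule, inputs):
    data = inputs
    for bitstream in schedule:
        circuit = configure(bitstream)   # reconfigure the hardware
        data = circuit(data)             # stream activations through it
    return data

print(run(schedule, [1, 2]))  # three doubling 'layers' -> [8, 16]
```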

Recent interest in hyper-quantization, i.e. `1-bit
<https://github.com/kyegomez/BitNet>`_, 2-bit, 3-bit ... networks, is a big
win for the FPGA way. The lower the precision, the more efficient and
practical the circuit becomes, making FPGAs a great fit for this approach.
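
One reason hyper-quantized networks map so well onto FPGAs: a dot product of
two {-1, +1} vectors reduces to XNOR plus popcount, i.e. pure gates, with no
multipliers at all (a standard binarized-network trick, sketched here in
Python):

```python
def bin_dot(a_bits, b_bits, n):
    """Dot product of two n-element {-1,+1} vectors, packed one bit per
    element (1 -> +1, 0 -> -1). Agreements minus disagreements:
    n - 2 * popcount(a XOR b)."""
    return n - 2 * bin(a_bits ^ b_bits).count("1")

# (+1,+1,-1,-1) . (+1,-1,-1,+1) = 1 - 1 + 1 - 1 = 0
print(bin_dot(0b1100, 0b1001, 4))  # -> 0
```

In hardware the XOR-then-count becomes an XNOR array feeding a popcount tree,
which is tiny compared to an array of full multipliers.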

Conclusion
----------

With the FPGA way, many problems spanning different domains can be solved in
interesting and (sometimes) superior ways. At my workplace, we've started
research in the FPGA way, trying to bring it out of the depths of complexity
and solve practical problems.

The intention of this post is not to compare ASICs and FPGAs (comparisons are
futile), but to highlight how FPGAs ought to be seen and used. In the
following few months, I'll write more on this research as I uncover it
myself. I'll leave you with some links advocating for the FPGA way
[#fpga_way]_:

- `Learning and Memorization - Satrajit Chatterjee
  <https://proceedings.mlr.press/v80/chatterjee18a/chatterjee18a.pdf>`_
- `LUTNet <https://arxiv.org/abs/1904.00938>`_
- `George Constantinides and his team
  <https://scholar.google.com/citations?user=NTn1NJAAAAAJ&hl=en>`_
- `hls4ml team <https://fastmachinelearning.org/hls4ml/>`_

.. rubric:: Footnotes

.. [#fpga_arch] The term "architecture" is a bit overloaded. The first
   meaning is the VLSI sense, i.e. how LUTs and interconnects are organized
   to make the FPGA. Another usage describes the higher-level components
   designed **on top** of the FPGA: think matmul engines, caches, etc.
   "Architecture" has meaning at different levels of circuit design.

.. [#fpga_way] This is a term I've coined myself. I've not seen anyone else
   use it in their works.

docs/_sources/index.rst.txt

+2-2
@@ -4,12 +4,12 @@ Shreeyash's Webpage
 "*The life so short, the craft so long to learn*"
 — Hippocrates
 
-.. image:: https://upload.wikimedia.org/wikipedia/commons/0/00/Modern_Loose_Reed_Power_Loom-marsden.png
+.. image:: _static/loom.png
    :alt: Reed Power Loom
    :width: 400
    :align: center
 
-programmer, technical minimalist, appreciate all things creative. interested in fast computers.
+Programmer, Technical Minimalist, appreciate all things creative. Interested in fast computers.
 
 This is my place on the interwebz. Check out:
docs/_sources/links.rst.txt

+12
@@ -25,6 +25,8 @@ Essays I've Liked
   <https://terrytao.wordpress.com/career-advice/work-hard/>`__
 - `Turd Sandwiches and Purpose In Life - AvE [video]
   <https://youtu.be/E7RgtMGL7CA?si=n-JG-tI3TODkEODk>`__
+- `The TANDBERG Way - Olve Maudal
+  <https://youtu.be/34FLhwkrwoQ?si=QU1Q_wMIDMyzutwg>`__
 
 Papers I've Liked
 -----------------
@@ -54,6 +56,16 @@ Programming/Hacking
   <https://www.reddit.com/r/unix/comments/6gxduc/how_is_gnu_yes_so_fast/>`__
 - `Greppability is an underrated code metric <https://morizbuesing.com/blog/greppability-code-metric/>`__
 
+Blogs
+-----
+
+- `Josh Haberman <https://blog.reverberate.org/>`__
+- `John Regehr <https://blog.regehr.org/>`__
+- `Jannis Harder <https://jix.one/>`__
+- `Max Bernstein <https://bernsteinbear.com/blog/>`__
+- `Evan Martin <https://neugierig.org/software/blog/archive.html>`__
+- `Nirav Patel (of Framework) <https://eclecti.cc/>`__
+
 Books
 -----

docs/_static/2024-11-17_17-53.png

1.16 MB

docs/_static/css/custom.css

+10-2
@@ -1,8 +1,16 @@
-@import url('https://fonts.googleapis.com/css2?family=EB+Garamond:ital,wght@0,400..800;1,400..800&family=JetBrains+Mono:ital,wght@0,100..800;1,100..800&display=swap');
+@import url('https://fonts.googleapis.com/css2?family=Merriweather:ital,wght@0,300;0,400;0,700;0,900;1,300;1,400;1,700;1,900&display=swap');
 
 body {
-    font-family: "JetBrains Mono", monospace;
+    font-family: "Merriweather", Georgia, serif;
     font-optical-sizing: auto;
     font-weight: 300;
     font-style: normal;
+    background-color: #fffff8;
+    color: #282828;
 }
+
+div.body {
+    background-color: #fffff8;
+    color: #282828;
+}
+

docs/_static/hetnn.png

717 KB

docs/_static/loom.png

366 KB

docs/blog/boost_graphs_remove_vertex.html

+3-3
@@ -8,15 +8,15 @@
 <title>How to remove a vertex from a boost graph? &#8212; Thoughts, et cetera documentation</title>
 <link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=d1102ebc" />
 <link rel="stylesheet" type="text/css" href="../_static/alabaster.css?v=12dfc556" />
-<link rel="stylesheet" type="text/css" href="../_static/css/custom.css?v=54f8742a" />
+<link rel="stylesheet" type="text/css" href="../_static/css/custom.css?v=a966dca0" />
 <script src="../_static/documentation_options.js?v=5929fcd5"></script>
 <script src="../_static/doctools.js?v=9a2dae69"></script>
 <script src="../_static/sphinx_highlight.js?v=dc90522c"></script>
 <link rel="icon" href="../_static/flying-katakana-man.ico"/>
 <link rel="index" title="Index" href="../genindex.html" />
 <link rel="search" title="Search" href="../search.html" />
 <link rel="next" title="Paper Reading Sundays" href="../sunday.html" />
-<link rel="prev" title="Blog" href="index.html" />
+<link rel="prev" title="Ghidra Decompiler - CLI guide" href="ghidra_decompiler_cli_guide.html" />
 
 <link rel="stylesheet" href="../_static/custom.css" type="text/css" />
@@ -168,7 +168,7 @@ <h3>Related Topics</h3>
 <ul>
 <li><a href="../index.html">Documentation overview</a><ul>
 <li><a href="index.html">Blog</a><ul>
-<li>Previous: <a href="index.html" title="previous chapter">Blog</a></li>
+<li>Previous: <a href="ghidra_decompiler_cli_guide.html" title="previous chapter">Ghidra Decompiler - CLI guide</a></li>
 <li>Next: <a href="../sunday.html" title="next chapter">Paper Reading Sundays</a></li>
 </ul></li>
 </ul></li>

docs/blog/ghidra_decompiler_cli_guide.html

+1-1
@@ -8,7 +8,7 @@
 <title>Ghidra Decompiler - CLI guide &#8212; Thoughts, et cetera documentation</title>
 <link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=d1102ebc" />
 <link rel="stylesheet" type="text/css" href="../_static/alabaster.css?v=12dfc556" />
-<link rel="stylesheet" type="text/css" href="../_static/css/custom.css?v=54f8742a" />
+<link rel="stylesheet" type="text/css" href="../_static/css/custom.css?v=a966dca0" />
 <script src="../_static/documentation_options.js?v=5929fcd5"></script>
 <script src="../_static/doctools.js?v=9a2dae69"></script>
 <script src="../_static/sphinx_highlight.js?v=dc90522c"></script>

docs/blog/index.html

+4-3
@@ -8,14 +8,14 @@
 <title>Blog &#8212; Thoughts, et cetera documentation</title>
 <link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=d1102ebc" />
 <link rel="stylesheet" type="text/css" href="../_static/alabaster.css?v=12dfc556" />
-<link rel="stylesheet" type="text/css" href="../_static/css/custom.css?v=54f8742a" />
+<link rel="stylesheet" type="text/css" href="../_static/css/custom.css?v=a966dca0" />
 <script src="../_static/documentation_options.js?v=5929fcd5"></script>
 <script src="../_static/doctools.js?v=9a2dae69"></script>
 <script src="../_static/sphinx_highlight.js?v=dc90522c"></script>
 <link rel="icon" href="../_static/flying-katakana-man.ico"/>
 <link rel="index" title="Index" href="../genindex.html" />
 <link rel="search" title="Search" href="../search.html" />
-<link rel="next" title="No-ISA is the Best ISA" href="no_isa_is_the_best_isa.html" />
+<link rel="next" title="Why Even Bother With FPGAs?" href="why_even_bother_with_fpgas.html" />
 <link rel="prev" title="Shreeyash’s Webpage" href="../index.html" />
 
 <link rel="stylesheet" href="../_static/custom.css" type="text/css" />
@@ -38,6 +38,7 @@
 <h1>Blog<a class="headerlink" href="#blog" title="Link to this heading"></a></h1>
 <div class="toctree-wrapper compound">
 <ul>
+<li class="toctree-l1"><a class="reference internal" href="why_even_bother_with_fpgas.html">Why Even Bother With FPGAs?</a></li>
 <li class="toctree-l1"><a class="reference internal" href="no_isa_is_the_best_isa.html">No-ISA is the Best ISA</a></li>
 <li class="toctree-l1"><a class="reference internal" href="pattern_seeking_brain.html">When Reverse Engineering, Your Pattern Seeking Brain Is Your Friend</a></li>
 <li class="toctree-l1"><a class="reference internal" href="ghidra_decompiler_cli_guide.html">Ghidra Decompiler - Standalone CLI Guide</a></li>
@@ -74,7 +75,7 @@ <h3>Related Topics</h3>
 <ul>
 <li><a href="../index.html">Documentation overview</a><ul>
 <li>Previous: <a href="../index.html" title="previous chapter">Shreeyash’s Webpage</a></li>
-<li>Next: <a href="no_isa_is_the_best_isa.html" title="next chapter">No-ISA is the Best ISA</a></li>
+<li>Next: <a href="why_even_bother_with_fpgas.html" title="next chapter">Why Even Bother With FPGAs?</a></li>
 </ul></li>
 </ul>
 </div>

docs/blog/indian_coffee.html

+1-1
@@ -8,7 +8,7 @@
 <title>The Hitchhiker’s Guide to Coffee Available In The Indian Market &#8212; Thoughts, et cetera documentation</title>
 <link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=d1102ebc" />
 <link rel="stylesheet" type="text/css" href="../_static/alabaster.css?v=12dfc556" />
-<link rel="stylesheet" type="text/css" href="../_static/css/custom.css?v=54f8742a" />
+<link rel="stylesheet" type="text/css" href="../_static/css/custom.css?v=a966dca0" />
 <script src="../_static/documentation_options.js?v=5929fcd5"></script>
 <script src="../_static/doctools.js?v=9a2dae69"></script>
 <script src="../_static/sphinx_highlight.js?v=dc90522c"></script>

docs/blog/kv260.html

+1-1
@@ -8,7 +8,7 @@
 <title>Impressions on the Xilinx Kria KV260 &#8212; Thoughts, et cetera documentation</title>
 <link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=d1102ebc" />
 <link rel="stylesheet" type="text/css" href="../_static/alabaster.css?v=12dfc556" />
-<link rel="stylesheet" type="text/css" href="../_static/css/custom.css?v=54f8742a" />
+<link rel="stylesheet" type="text/css" href="../_static/css/custom.css?v=a966dca0" />
 <script src="../_static/documentation_options.js?v=5929fcd5"></script>
 <script src="../_static/doctools.js?v=9a2dae69"></script>
 <script src="../_static/sphinx_highlight.js?v=dc90522c"></script>

docs/blog/lament.html

+1-1
@@ -8,7 +8,7 @@
 <title>The life so short, the craft so long to learn - A Lament &#8212; Thoughts, et cetera documentation</title>
 <link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=d1102ebc" />
 <link rel="stylesheet" type="text/css" href="../_static/alabaster.css?v=12dfc556" />
-<link rel="stylesheet" type="text/css" href="../_static/css/custom.css?v=54f8742a" />
+<link rel="stylesheet" type="text/css" href="../_static/css/custom.css?v=a966dca0" />
 <script src="../_static/documentation_options.js?v=5929fcd5"></script>
 <script src="../_static/doctools.js?v=9a2dae69"></script>
 <script src="../_static/sphinx_highlight.js?v=dc90522c"></script>
