8 x 8 Multiplier Design using Dadda Algorithm - Low Power, High Speed, Area Efficient: RTL -> GDSII
This project implements an 8-bit Dadda multiplier, a high-speed hardware multiplication architecture used in modern digital and VLSI systems. The Dadda algorithm reduces partial products in a structured and optimized manner, requiring fewer adders than Wallace tree multipliers while maintaining similar performance. This makes it a highly efficient choice for ASIC and FPGA arithmetic designs.
The design is written in Verilog and follows a semi-custom VLSI flow, including simulation, synthesis, and analysis of timing, power, and area using Cadence tools. The goal of this project is to achieve a compact, fast, and power-efficient multiplier architecture suitable for integration into processors, DSP units, and high-performance embedded systems.
Design and synthesize a hardware-efficient 8×8 Dadda multiplier targeting VLSI applications, focusing on:
--> High speed (low delay)
--> Reduced area compared to Wallace tree
--> Optimized power for ASIC design
--> Structured logic for stable physical implementation
The Dadda multiplier is a fast hardware multiplication structure that minimizes the number of adders needed for partial product reduction. It uses a staged compression approach, reducing height gradually for optimal speed-area trade-off.
| Feature | Explanation |
|---|---|
| Input size | 8-bit × 8-bit unsigned operands |
| Partial products | 64 bits (8 rows × 8 bits) |
| Technique | Minimum adders, staged compression |
| Adders used | Full adders & half adders |
| Final stage | Carry-propagate adder (CPA) sums the last 2 rows |
| Benefit | Faster than array multiplier, less hardware than Wallace |
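
The staged compression follows the standard Dadda height sequence d₁ = 2, dⱼ₊₁ = ⌊1.5·dⱼ⌋, and each stage reduces the matrix to the largest sequence value below the current height. The following is a minimal Python sketch of that rule, intended as a software reference and not as the repository's Verilog; the function name `dadda_heights` is hypothetical.

```python
def dadda_heights(n_rows):
    """Target matrix heights, tallest first, for reducing n_rows rows down to 2."""
    heights = []
    d = 2
    while d < n_rows:          # keep only sequence values below the starting height
        heights.append(d)
        d = (3 * d) // 2       # d_{j+1} = floor(1.5 * d_j)
    return heights[::-1]


print(dadda_heights(8))  # [6, 4, 3, 2]
```

For the 8-row matrix of an 8×8 multiplier this gives four compression stages before the final two-row addition.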
- Uses fewer adders than Wallace multipliers
- Maintains high speed with controlled wiring complexity
- Ideal for VLSI / ASIC / FPGA implementations
- Demonstrates deep structural digital logic handling
| Feature | Dadda Multiplier | Wallace Tree | Array Multiplier | Booth Multiplier |
|---|---|---|---|---|
| Key Idea | Optimized partial-product reduction | Aggressive partial-product reduction | Direct summation array | Encoded multiplication to reduce operations |
| Speed | Very High | Very High | Low | Medium-High |
| Hardware Usage | Moderate (optimized) | High | Low | Medium |
| Area Requirement | Low-Medium | High | Lowest | Medium |
| Routing & Layout | Better structured, easier placement | Complex routing | Very regular | Moderate |
| Best Use Case | Speed + Area balance (ASIC/VLSI) | Maximum speed priority | Low-cost, low-power designs | Signed multiplication & DSP |
| Stage | Operation | Goal Height | What Happens in Code |
|---|---|---|---|
| Stage-0 | Partial-product generation | 8 (input matrix) | pp[i] = A & {8{B[i]}}; |
| Stage-1 | First compression | Reduce to ≤ 6 | First layer of HA/FA to shrink the tallest columns |
| Stage-2 | Second compression | Reduce to ≤ 4 | Deeper FA layer to bring the matrix height further down |
| Stage-3 | Final partial-product reduction | Reduce to ≤ 3, then ≤ 2 | Remaining FA/HA layers leave only 2 rows |
| Final Stage | Final addition | 2 → 1 | Ripple/CPA add: assign P = row1 + row2; |
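
Stage-0 in the table can be mirrored in software to see the shape of the input matrix. The sketch below is a Python model under stated assumptions (unsigned operands, hypothetical helper names `partial_products` and `column_heights`); it is not taken from `dadda8x8.v`.

```python
def partial_products(a, b, width=8):
    """Row i is A gated by bit i of B, like pp[i] = A & {8{B[i]}} in the table."""
    mask = (1 << width) - 1
    return [(a & mask) if (b >> i) & 1 else 0 for i in range(width)]


def column_heights(width=8):
    """Number of partial-product bits in each weight column before compression."""
    return [min(k + 1, 2 * width - 1 - k) for k in range(2 * width - 1)]


rows = partial_products(0b10110011, 0b01101001)
# Shifting row i left by i and summing all rows reproduces the product.
product = sum(row << i for i, row in enumerate(rows))
print(product == 0b10110011 * 0b01101001)  # True
print(column_heights())  # [1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1]
```

The column heights sum to 64 and peak at 8 in the middle column, which is why the first compression stage only needs adders around the centre of the matrix.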
- Generating 64 partial products
- Reducing matrix height in controlled stages (8 → 6 → 4 → 3 → 2), then adding the final two rows
- Using only full and half adders for compression
- Producing the final 16-bit output using structured carry propagation
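
The steps above can be sketched as a bit-level Python reference model, which is handy as a golden model when checking RTL simulation results. It assumes the textbook Dadda placement rule (a half adder when a column is one bit over the stage limit, otherwise a full adder, working from the least significant column upward). Whether `dadda8x8.v` places its adders in exactly this order is not confirmed, and the names `dadda_heights` and `dadda_multiply` are hypothetical.

```python
def dadda_heights(n_rows):
    """Stage limits, tallest first: values of d=2, floor(1.5*d), ... below n_rows."""
    heights, d = [], 2
    while d < n_rows:
        heights.append(d)
        d = (3 * d) // 2
    return heights[::-1]


def dadda_multiply(a, b, width=8):
    """Return (product, full_adders, half_adders) for unsigned width-bit a and b."""
    n_cols = 2 * width
    cols = [[] for _ in range(n_cols)]
    for i in range(width):                      # generate the partial-product bits
        for j in range(width):
            cols[i + j].append(((a >> j) & 1) & ((b >> i) & 1))

    full_adders = half_adders = 0
    for limit in dadda_heights(width):          # 6, 4, 3, 2 for width 8
        carry_in = []
        for k in range(n_cols):
            pending = cols[k]                   # bits left over from the previous stage
            done = list(carry_in)               # carries from column k-1 pass through
            carry_out = []
            while len(pending) + len(done) > limit:
                if len(pending) + len(done) == limit + 1:
                    x, y = pending.pop(), pending.pop()          # half adder
                    done.append(x ^ y)
                    carry_out.append(x & y)
                    half_adders += 1
                else:
                    x, y, z = pending.pop(), pending.pop(), pending.pop()  # full adder
                    done.append(x ^ y ^ z)
                    carry_out.append((x & y) | (y & z) | (x & z))
                    full_adders += 1
            cols[k] = pending + done
            carry_in = carry_out

    # Two rows remain; the final carry-propagate add mirrors P = row1 + row2.
    row1 = sum(col[0] << k for k, col in enumerate(cols) if len(col) > 0)
    row2 = sum(col[1] << k for k, col in enumerate(cols) if len(col) > 1)
    return row1 + row2, full_adders, half_adders


print(dadda_multiply(255, 255))  # (65025, 35, 7)
```

Under this placement rule the model uses 35 full adders and 7 half adders for any 8-bit inputs, matching the closed-form Dadda counts n² − 4n + 3 and n − 1 for n = 8.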
📌 Cadence flow: RTL Coding -> Testbench -> Simulation -> Synthesis -> Area/Timing/Power Reports -> Layout -> GDSII file
--> Verilog ( .v file ) : (https://github.com/dinesh-jonnalagadda/8-x-8-multiplier-using-dadda-algorithm/blob/main/dadda8x8.v)
--> Testbench ( .v file ) : (https://github.com/dinesh-jonnalagadda/8-x-8-multiplier-using-dadda-algorithm/blob/main/tb_dadda8x8.v)
--> TCL file ( .tcl file ) : (https://github.com/dinesh-jonnalagadda/8-x-8-multiplier-using-dadda-algorithm/blob/f0cbdc6c9b7ca20deccd1e07405060136181d454/run.tcl)
--> Input_Constraints ( .sdc file ) : (https://github.com/dinesh-jonnalagadda/8-x-8-multiplier-using-dadda-algorithm/blob/main/constraints_input.sdc)
This section summarizes the synthesis results (Area, Timing, and Power) for the Dadda 8×8 Multiplier synthesized using Cadence Genus.
Area report ( .rpt file ) : (https://github.com/dinesh-jonnalagadda/8-x-8-multiplier-using-dadda-algorithm/blob/main/reports/area_report.rpt)
| Metric | Value |
|---|---|
| Design | Dadda 8×8 Multiplier |
| Total Cells | 120 |
| Total Area | 1332.144 μm² |
| Library Mode | Timing-Driven |
| Condition | Slow Corner |
Timing report ( .rpt file ) : (https://github.com/dinesh-jonnalagadda/8-x-8-multiplier-using-dadda-algorithm/blob/main/reports/timing_report.rpt)
| Metric | Value |
|---|---|
| Timing Mode | Setup Analysis |
| Critical Path | ✅ Meets Constraint |
| Violations | None |
| Operating Corner | Slow / Worst-Case |
Power report ( .rpt file ) : (https://github.com/dinesh-jonnalagadda/8-x-8-multiplier-using-dadda-algorithm/blob/main/reports/power_report.rpt)
| Power Type | Value | Share |
|---|---|---|
| Leakage Power | 6.49 × 10⁻⁶ W | 9.79% |
| Internal Power | 2.81 × 10⁻⁵ W | 42.3% |
| Switching Power | 3.17 × 10⁻⁵ W | 47.9% |
| Total Power | 6.63 × 10⁻⁵ W (≈66 µW) | 100% |
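
The share column can be cross-checked from the three component values. The short Python sketch below only re-does that arithmetic with the rounded figures from the table, so the last digit may differ slightly from the shares printed in the report; the variable names are illustrative.

```python
# Component powers in watts, copied from the table above (rounded values).
power_w = {"leakage": 6.49e-6, "internal": 2.81e-5, "switching": 3.17e-5}

total_w = sum(power_w.values())
shares = {name: 100.0 * value / total_w for name, value in power_w.items()}

print(f"total = {total_w * 1e6:.1f} uW")
for name, pct in shares.items():
    print(f"{name:<10} {pct:5.2f} %")
```

Dynamic power (internal plus switching) accounts for roughly 90% of the total, so leakage is a minor contributor for this design.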
| Metric | Value | Remarks |
|---|---|---|
| Total Area | 1332.144 μm² | Compact layout |
| Critical Path Delay | Within constraint | Timing closure achieved |
| Total Power | 6.63×10⁻⁵ W (≈66 µW) | Low power; switching and internal power dominate |
| Design Type | Semi-Custom (Cadence Genus) | Synthesized successfully |
| Category | Tools / Technologies |
|---|---|
| Hardware Description Language | Verilog HDL (2001 Standard) |
| Simulation | Cadence NCSim / NCLaunch |
| Logic Synthesis | Cadence Genus Synthesis Solution |
| Place & Route | Cadence Innovus Implementation System |
| Technology Node | 90 nm CMOS Standard Cell Library |
| Verification | Functional Simulation, STA (Setup/Hold), DRC, LVS |
| Reports & Debugging | Waveforms, Timing Reports, Area/Power Analysis |
| GDS Export | Innovus Stream Out (GDSII Generation) |
The 8×8 Dadda multiplier was successfully designed and synthesized, achieving low area, low power, and competitive performance. By using controlled partial-product reduction stages, the design minimizes hardware overhead while maintaining high speed. This makes the Dadda architecture a strong choice for ASIC and high-performance digital arithmetic systems.
