Merged
4 changes: 2 additions & 2 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,9 @@

.PHONY: all
all:
# @pixi run test
@pixi run main
@pixi run bench
@pixi run test
# @pixi run bench

.PHONY: %
%:
Expand Down
104 changes: 81 additions & 23 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,49 +2,107 @@

**A Quantum Circuit Composer & Simulator in Mojo** 🔥⚛️

QLabs is a quantum circuit simulation library implemented in Mojo, designed for educational purposes and high-performance quantum circuit simulation.

## Education
## 🎓 Educational Purposes

This project reimplements and extends the ideas from the following tutorial paper:
### 🎯 Project Objectives

- **Mojo Implementation**: Re-implement the approach from the referenced paper [1] in Mojo for a Pythonic syntax and enhanced readability.
- **Learning by Doing**: Gain hands-on experience with quantum circuit simulation to understand the capabilities and limitations of classical simulation.
- **Performance & Safety**: Leverage Mojo's strong static typing and compilation for blazing-fast and safe operations.
- **Hardware Acceleration**: Utilize Mojo’s universal GPU programming support to accelerate simulations.

### 🛠️ Implementations

- ✅ **State Vector and Gate Circuit Implementations**
- Low-level: See `examples/low_level.mojo` using `qlabs.base` tools.
- High-level: See `examples/circuit_level.mojo` using `qlabs.abstractions` tools.
- ✅ **Partial GPU Support (Cross-Platform: NVIDIA/AMD)**
- Low-level: See `examples/gpu_low_level.mojo` using `qlabs.base` and `qlabs.base.gpu` tools.
- High-level: See `examples/circuit_level.mojo` using `qlabs.abstractions` tools.
- ✅ **Quantum States Statistics Calculation**
- ☐ **Qubit Measurements** (Access to the full State Vector is already available, providing even more flexibility)
- ☐ **Continuous Statistics Tracking During Circuit Execution**
- ☐ **Gradient Computations**
- ☐ **Tensor Network Implementation**

[1] This project reimplements and extends the ideas from the following tutorial paper:

> **How to Write a Simulator for Quantum Circuits from Scratch: A Tutorial**
> *Michael J. McGuffin, Jean-Marc Robert, and Kazuki Ikeda*
> Published: 2025-06-09 on [arXiv:2506.08142v1](https://arxiv.org/abs/2506.08142v1) (last accessed: 2025-06-12)

### 🎯 Project Objectives

* **Mojo Implementation:** Re-implement the approach from the paper in Mojo for more Pythonic syntax and better readability.
* **Learning by Doing:** Gain hands-on experience with quantum circuit simulation to better understand the capabilities and limitations of classical simulation.
* **Performance & Safety:** Leverage Mojo's strong static typing and compilation for blazing-fast and safe operations.
* **Hardware Acceleration:** Utilize Mojo’s universal GPU programming support to accelerate simulations.

### 🔥 Current Implementation
### 🔥 Library Performance

The current implementation uses a State Vector approach, which is an efficient method for simulating small-scale quantum circuits (20–30 qubits) with high precision. This approach also enables relatively straightforward exact gradient computations.
QLabs achieves high-speed execution through Mojo's compilation, with up to a 100x speedup from GPU acceleration at larger qubit counts.

A future alternative could be the Tensor Network approach. This method is better suited to larger circuits but offers lower precision and would involve more computationally expensive gradient calculations.
![Benchmark Results](img/benchmark_H100.png)

## Usage
## 🚀 Usage

### ⚙️ Environment Setup
To get started, you need to install `pixi` to manage project dependencies.

Follow these steps to set up your environment, build the library and run some examples:
### 📦 Install Pixi

If you don't have Pixi installed yet:
```bash
curl -sSf https://pixi.sh/install.sh | bash
```
Install all project dependencies:
```
pixi install
```

Build and run examples of the simulator:
### ⚙️ Main Commands

Run the following commands using `pixi run` or `make`:

```bash
pixi run main
pixi run format # Format the repository's Mojo code with the Mojo formatter
pixi run package # Compile the qlabs package into a .mojopkg file in build/
pixi run test # Run all tests in tests/
pixi run main # Run all example files in examples/
pixi run bench # Run all benchmarks as defined in benchmarks/main.mojo
pixi run plot # Run benchmarks and plot their results in data/
```

### 🧑‍💻 Example: Quantum Circuit with `qlabs.abstractions`

```python
from qlabs.base import StateVector, Hadamard, SWAP, NOT, PauliY, PauliZ
from qlabs.abstractions import GateCircuit, StateVectorSimulator, ShowAfterEachGate

num_qubits = 3
qc = GateCircuit(num_qubits)

qc.apply_gates(
    Hadamard(0),
    SWAP(0, 2),
    NOT(1, anti_controls=[2]),
    NOT(0, controls=[1]),
    PauliY(0),
    SWAP(1, 2, controls=[0]),
    PauliZ(1),
)

print("Quantum circuit created:\n", qc)  # Visualization not fully implemented
# Expected output:
# --|H|---x--------|X|--|Y|---*-------
#         |         |         |
# --------|---|X|---*---------x--|Z|--
#         |    |              |
# --------x----o--------------x-------

qsimu = StateVectorSimulator(
    qc,
    initial_state=StateVector.from_bitstring("0" * num_qubits),
    use_gpu_if_available=True,  # GPU support not fully implemented
    verbose=True,
    verbose_step_size=ShowAfterEachGate,  # Options: ShowAfterEachGate, ShowOnlyEnd
)

final_state = qsimu.run()

print("Final quantum state:\n", final_state)
print("Normalised purity of qubit 0 (the top one):", final_state.normalised_purity([0]))
```
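
For readers new to state-vector simulation, the core operation behind a call such as `Hadamard(0)` can be sketched in a few lines of plain Python. This is only an illustrative sketch and not the QLabs implementation: the helper `apply_single_qubit_gate` is hypothetical, and it assumes that qubit `k` maps to bit `k` of the basis-state index, which may differ from the convention QLabs uses.

```python
import math


def apply_single_qubit_gate(state, gate, target):
    """Return a new state with a 2x2 `gate` applied to qubit `target`.

    Assumption (not taken from QLabs): qubit k corresponds to bit k of
    the basis-state index.
    """
    new_state = list(state)
    stride = 1 << target
    for i in range(len(state)):
        if (i & stride) == 0:  # i has the target bit cleared
            j = i | stride     # j is the partner index with the bit set
            a0, a1 = state[i], state[j]
            new_state[i] = gate[0][0] * a0 + gate[0][1] * a1
            new_state[j] = gate[1][0] * a0 + gate[1][1] * a1
    return new_state


# Hadamard gate as a 2x2 matrix
s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]

num_qubits = 3
state = [0j] * (1 << num_qubits)
state[0] = 1 + 0j  # start in |000>

state = apply_single_qubit_gate(state, H, target=0)
probabilities = [abs(a) ** 2 for a in state]
print(probabilities)
```

Each gate touches all `2**num_qubits` amplitudes in pairs, which is why the cost and memory of this approach grow exponentially with the number of qubits.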

## 📄 License

This project is open-source and licensed under Apache License 2.0.
This project is open-source and licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0). Contributions are welcome!
27 changes: 9 additions & 18 deletions TODOs.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,39 +4,26 @@

### Implementations

- 5 / 5 : Start adding support for GPU in the base classes if needed (not possible to use SIMD(ComplexFloat32) anymore, or keep them but separate them when moving data to GPU)
- struct StateVector
- struct ComplexMatrix
- struct Gate

- 5 / ? : GPU implementation of:
- qubit_wise_multiply()
- 5 / 4 : GPU implementation of:
- qubit_wise_multiply() (with different type of control gates and for multiple qubits)
- apply_swap()
- partial_trace()
- StateVector.to_density_matrix()

- 4 / 3 : Export benchmark results as plots.

- 2 / 4 : Efficient support for tracking a state statistic like entropy during the execution of the circuit by the simulator.

- 3 / 3 : Implement naive implementation of the functions to compare performances
- matrix multiplication (but starting from right or smart)
- partial trace

### Tests

- 5 / 2 : Test qubit_wise_multiply_extended() that can take multiple qubits gates (2 and more, iSWAP for example)

- 5 / 2 : Test everything that will be implemented on GPU
- qubit_wise_multiply()
- qubit_wise_multiply() (with different type of control gates and for multiple qubits)
- apply_swap()
- struct StateVector's methods
- struct ComplexMatrix's methods
- struct Gate's Gate
- partial_trace()

### Benchmarks

- 3 / 2 : Reproduce table from page 10
- 3 / 2 : partial_trace() Reproduce table from page 10

## Dropped for now

Expand All @@ -60,3 +47,7 @@
- 2 / 4 : qubit_wise_multiply_extended() but for gates applied to non-adjacent qubits

- 2 / 3 : Implement concurrence (2-qubit entanglement metric) computePairwiseQubitConcurrences()

- 1 / 3 : Implement naive implementation of the functions to compare performances
- matrix multiplication (but starting from right or smart)
- partial trace
26 changes: 13 additions & 13 deletions benchmarks/all_benchmarks.mojo → benchmarks/bench_main.mojo
Original file line number Diff line number Diff line change
Expand Up @@ -15,29 +15,29 @@ def main():
print("Running all benchmarks...")
# bench_qubit_wise_multiply()
bench_qubit_wise_multiply_inplace[
min_number_qubits=5,
max_number_qubits=25,
number_qubits_step_size=2,
min_number_qubits=1,
max_number_qubits=20,
number_qubits_step_size=1,
min_number_layers=5,
max_number_layers=4000,
max_number_layers=3500,
number_layers_step_size=400,
fixed_number_qubits=11,
fixed_number_layers=20,
fixed_number_qubits=13,
fixed_number_layers=5,
]()

@parameter
if not has_accelerator():
print("No compatible GPU found")
else:
bench_qubit_wise_multiply_inplace[
min_number_qubits=5,
max_number_qubits=25,
number_qubits_step_size=2,
bench_qubit_wise_multiply_inplace_gpu[
min_number_qubits=1,
max_number_qubits=26, # 29 is OOM for my 3070 Ti Laptop GPU
number_qubits_step_size=1,
min_number_layers=5,
max_number_layers=4000,
max_number_layers=7000,
number_layers_step_size=400,
fixed_number_qubits=11,
fixed_number_layers=20,
fixed_number_qubits=13,
fixed_number_layers=5,
]()

# bench_qubit_wise_multiply_extended()
Expand Down
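
The out-of-memory remark on `max_number_qubits` can be sanity-checked with a rough estimate. The sketch below assumes four Float32 buffers of `2**n` values each (real and imaginary parts, for both an input and an output state, as the tensor names in the GPU benchmark suggest) and ignores the much smaller gate and control tensors. The function name and its defaults are illustrative assumptions, not measured values.

```python
def state_buffers_bytes(num_qubits, num_buffers=4, bytes_per_value=4):
    """Bytes needed for `num_buffers` arrays of 2**num_qubits values.

    Assumptions: 4 buffers (re/im for input and output) of 4-byte floats.
    """
    return num_buffers * bytes_per_value * (1 << num_qubits)


GIB = 1 << 30
for n in (26, 28, 29):
    print(n, "qubits:", state_buffers_bytes(n) / GIB, "GiB")
```

Under these assumptions the footprint doubles with every extra qubit and reaches 8 GiB at 29 qubits, which would not fit on a GPU with 8 GiB of memory once other allocations are counted.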
16 changes: 12 additions & 4 deletions benchmarks/bench_qubit_wise_multiply_gpu.mojo
Original file line number Diff line number Diff line change
Expand Up @@ -244,7 +244,12 @@ fn benchmark_qubit_wise_multiply_inplace_gpu[
for qubit in range(num_qubits):
if current_state == 0:
ctx.enqueue_function[
qubit_wise_multiply_inplace_gpu[number_control_bits=0]
qubit_wise_multiply_inplace_gpu[
state_vector_size=state_vector_size,
gate_set_size=gate_set_size,
circuit_number_control_gates=circuit_number_control_gates,
number_control_bits=0,
]
](
gate_set_re_tensor,
gate_set_im_tensor,
Expand All @@ -259,7 +264,6 @@ fn benchmark_qubit_wise_multiply_inplace_gpu[
quantum_state_re_tensor,
quantum_state_im_tensor,
num_qubits, # number_qubits
state_vector_size, # quantum_state_size
quantum_state_out_re_tensor,
quantum_state_out_im_tensor,
control_bits_circuit_tensor,
Expand All @@ -270,7 +274,12 @@ fn benchmark_qubit_wise_multiply_inplace_gpu[
current_state = 1
else:
ctx.enqueue_function[
qubit_wise_multiply_inplace_gpu[number_control_bits=0]
qubit_wise_multiply_inplace_gpu[
state_vector_size=state_vector_size,
gate_set_size=gate_set_size,
circuit_number_control_gates=circuit_number_control_gates,
number_control_bits=0,
]
](
gate_set_re_tensor,
gate_set_im_tensor,
Expand All @@ -285,7 +294,6 @@ fn benchmark_qubit_wise_multiply_inplace_gpu[
quantum_state_out_re_tensor,
quantum_state_out_im_tensor,
num_qubits, # number_qubits
state_vector_size, # quantum_state_size
quantum_state_re_tensor,
quantum_state_im_tensor,
control_bits_circuit_tensor,
Expand Down
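
The two branches on `current_state` above differ only in which pair of tensors is read and which is written, a pattern often called ping-pong or double buffering. A minimal Python sketch of the idea follows; `run_layers` and `rotate_left` are hypothetical stand-ins and not part of the benchmark code.

```python
def run_layers(initial, step, num_steps):
    """Apply `step(src, dst)` num_steps times using two preallocated buffers."""
    buffers = [list(initial), [0] * len(initial)]
    current = 0  # index of the buffer holding the latest result
    for _ in range(num_steps):
        src, dst = buffers[current], buffers[1 - current]
        step(src, dst)         # reads src, writes dst
        current = 1 - current  # swap roles instead of copying data
    return buffers[current]


def rotate_left(src, dst):
    """Hypothetical stand-in kernel: dst[i] = src[(i + 1) % n]."""
    n = len(src)
    for i in range(n):
        dst[i] = src[(i + 1) % n]


result = run_layers([0, 1, 2, 3], rotate_left, 3)
print(result)
```

Swapping roles avoids allocating or copying a state-sized buffer per gate, at the price of keeping two full copies of the state resident in memory.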
1 change: 0 additions & 1 deletion benchmarks/bench_simulate_random_circuit.mojo
Original file line number Diff line number Diff line change
Expand Up @@ -100,7 +100,6 @@ fn simulate_random_circuit[num_qubits: Int, number_layers: Int]() -> None:
qsimu = StateVectorSimulator(
qc,
initial_state=initial_state,
optimisation_level=0, # No optimisations for now
verbose=False,
# verbose_step_size=ShowAfterEachLayer, # ShowAfterEachGate, ShowOnlyEnd
verbose_step_size=ShowAfterEachGate, # ShowAfterEachGate, ShowOnlyEnd
Expand Down
24 changes: 18 additions & 6 deletions benchmarks/plot_results.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,6 @@ def process_benchmark_data(filepath):
layers_cpu_df = process_benchmark_data("data/qubit_wise_multiply_inplace_layers.csv")
qubits_cpu_df = process_benchmark_data("data/qubit_wise_multiply_inplace_qubits.csv")


# --- 3. Plotting ---

# Create a figure with two subplots side-by-side
Expand All @@ -48,21 +47,26 @@ def process_benchmark_data(filepath):

# Plot 1: Performance vs. Number of Layers
ax1.plot(
layers_cpu_df["layers"],
layers_cpu_df["layers"] * layers_cpu_df["qubits"][0],
layers_cpu_df["time_ms"],
marker="o",
linestyle="-",
label="CPU",
)
ax1.plot(
layers_gpu_df["layers"],
layers_gpu_df["layers"]
* layers_cpu_df["qubits"][0], # Scale x-axis by number of qubits
layers_gpu_df["time_ms"],
marker="s",
linestyle="--",
label="GPU",
)
ax1.set_title("Performance vs. Number of Layers (13 Qubits)")
ax1.set_xlabel("Number of Layers")
ax1.set_title(
f"Execution Time vs. Number of Layers\n({layers_cpu_df['qubits'][0]} Qubits)"
)
ax1.set_xlabel(
f"Number of Gates\n(Number of Layers x {layers_cpu_df['qubits'][0]} Qubits)"
)
ax1.set_ylabel("Mean Execution Time (ms)")
ax1.legend()
ax1.grid(True, linestyle="--", alpha=0.6)
Expand All @@ -82,13 +86,21 @@ def process_benchmark_data(filepath):
linestyle="--",
label="GPU",
)
ax2.set_title("Performance vs. Number of Qubits (20 Layers)")
ax2.set_title(
f"Execution Time vs. Number of Qubits\n({qubits_cpu_df['layers'][0]} Layers)"
)
ax2.set_xlabel("Number of Qubits")
# We can make the y-axis a log scale if the values vary widely
ax2.set_ylabel("Mean Execution Time (ms) - Log Scale")
ax2.set_yscale("log") # Use a logarithmic scale to better see the differences
ax2.legend()
ax2.grid(True, which="both", linestyle="--", alpha=0.6)
# set the x-ticks to be the x-values of the qubits
unique_cpu_qubits = qubits_cpu_df["qubits"].unique()
unique_gpu_qubits = qubits_gpu_df["qubits"].unique()
unique_qubits = sorted(set(unique_cpu_qubits) | set(unique_gpu_qubits))
ax2.set_xticks(unique_qubits)
ax2.set_xticklabels(unique_qubits, rotation=45)


# Adjust layout to prevent labels from overlapping
Expand Down