# Running Benchmarks
Tessera ships with a benchmark suite that compares its transpiler output against Qiskit's across a set of standard circuits. It measures gate count, circuit depth, transpile time, and simulation correctness. Results are tracked in `benchmark_store.json`, and historical runs are documented in `benchmarks/benchmarks.md`.
## Prerequisites
Benchmarks require the dev dependencies, including `qiskit-aer` for simulation. If you haven't already:

```shell
pip install -e .[dev]
```

## Running the Benchmarks
Dry run. View results without writing anything:
```shell
python benchmarks/benchmarks.py
```

Write results to `benchmark_store.json`:
```shell
python benchmarks/benchmarks.py --write
```

This refuses to write if any metric on any circuit regressed against the stored ceiling, or if any simulation distribution mismatched Qiskit's.
Write results even if some metrics regressed:
```shell
python benchmarks/benchmarks.py --allow-loosen
```

This accepts intentional regressions, for example trading gate count for circuit depth. The `best_max_*` watermarks are still preserved as monotonic minimums, so the prior best is never lost.
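The interplay between the ceiling and the watermark can be sketched roughly as follows. This is a hypothetical helper for illustration, not the actual code in `benchmarks.py`, and it shows only the gate-count metric:

```python
def update_store_entry(entry, current_gates, allow_loosen=False):
    """Sketch of how --write / --allow-loosen might update one metric.

    entry: dict with 'max_gates' (ceiling) and 'best_max_gates' (watermark).
    Returns False if the write is refused because the metric regressed.
    """
    regressed = current_gates > entry["max_gates"]
    if regressed and not allow_loosen:
        return False  # plain --write refuses regressions
    # The ceiling tracks the latest accepted run (it may loosen upward).
    entry["max_gates"] = current_gates
    # The watermark only ever ratchets downward, regardless of flags.
    entry["best_max_gates"] = min(entry["best_max_gates"], current_gates)
    return True
```

Note that even when `--allow-loosen` raises the ceiling, the `min()` on the last line keeps the best-ever value intact.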
## Understanding the Output
Per-circuit results table:
```
Metric              Tessera   Qiskit
---------------------------------------------
Gate Count          31        27
Circuit Depth       15        11
Transpile Time (s)  0.0011    0.0060
Simulation Match    True
```

Summary table: After all circuits run, a summary table shows all four circuits side by side with gate count, depth, and simulation match at a glance.
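The Simulation Match metric compares the measurement-count distributions of Tessera's and Qiskit's transpiled circuits. A common way to do such a check is the total variation distance between the two count dictionaries; this sketch (the tolerance and exact method in `benchmarks.py` may differ) assumes raw `counts` dicts as returned by an Aer run:

```python
def distributions_match(counts_a, counts_b, shots, tol=0.05):
    """Compare two measurement-count dicts by total variation distance.

    Returns True when the distributions agree within tolerance `tol`.
    """
    keys = set(counts_a) | set(counts_b)
    # TVD = half the sum of absolute probability differences per outcome.
    tvd = 0.5 * sum(
        abs(counts_a.get(k, 0) - counts_b.get(k, 0)) / shots for k in keys
    )
    return tvd <= tol
```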
Diff chart: Compares this run against the stored ceilings and best-seen watermarks in `benchmark_store.json`:

```
Circuit      Metric   Current   Stored   Best   Delta   Status
----------------------------------------------------------------------
QFT-like     Gates    31        31       31     0       same
Stress Test  Gates    35        35       35     0       same
```

Status values:
| Status | Meaning |
|---|---|
| same | Matches the stored ceiling exactly |
| tightened | Better than the ceiling but not a new best |
| NEW BEST! | Better than the stored best watermark |
| REGRESSED | Worse than the stored ceiling |
| no baseline | No entry in `benchmark_store.json` yet |
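The five statuses above can be derived from three numbers per metric. A minimal sketch of that classification, assuming lower is better for every metric (this helper is hypothetical, not the actual implementation):

```python
def classify(current, stored_max, stored_best):
    """Map a metric's current value against its ceiling and watermark."""
    if stored_max is None:
        return "no baseline"   # nothing stored for this circuit yet
    if current > stored_max:
        return "REGRESSED"     # worse than the ceiling
    if current < stored_best:
        return "NEW BEST!"     # beats the all-time watermark
    if current < stored_max:
        return "tightened"     # better than the ceiling, not a new best
    return "same"              # matches the ceiling exactly
```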
## benchmark_store.json
`benchmarks/benchmark_store.json` tracks two values per metric per circuit:

- `max_gates` / `max_depth`: the current ceiling. `--write` updates these to the latest run's numbers; `--allow-loosen` updates them even if they went up.
- `best_max_gates` / `best_max_depth`: the best ever seen. These only ever ratchet downward and are never overwritten with a worse value, regardless of which flag you use.
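Based on the keys described above, a single entry in the store could plausibly look like this (the exact layout and these numbers are illustrative assumptions, not the file's guaranteed schema):

```json
{
  "Bell State": {
    "max_gates": 4,
    "max_depth": 3,
    "best_max_gates": 4,
    "best_max_depth": 3
  }
}
```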
If you need to fully reset the store, for example after fixing a bug that changes the baseline, clear the file to `{}` and rerun with `--write`. The store will be repopulated fresh from the current run, with no prior history to compare against.
## Benchmark Circuits
The suite currently runs four circuits against the `FakeNairobiV2` IBM backend (7 qubits):
| Circuit | Qubits | Gates In | Description |
|---|---|---|---|
| Bell State | 2 | 4 | H + CX + measure. Baseline sanity check. |
| GHZ State | 3 | 6 | H + chain of CX gates. Tests linear routing. |
| QFT-like | 5 | 22 | Rotation-heavy with intentional duplicate Rz gates to exercise MergeRotationsPass. |
| Stress Test | 5 | 18 | Mixed gate set with frequent non-adjacent interactions. Exercises the full pipeline under load. |
## Adding a New Benchmark Circuit
Open `benchmarks/benchmarks.py` and add a circuit builder function following the same pattern as the existing ones:
```python
def make_my_circuit():
    qc = QuantumCircuit(3, 3)
    qc.h(0)
    qc.cx(0, 1)
    qc.cx(1, 2)
    qc.measure(list(range(3)), list(range(3)))
    return qc
```

Then add it to the `circuits` list in the main block:
```python
circuits = [
    ("Bell State", make_bell_state()),
    ("GHZ State", make_ghz_state()),
    ("QFT-like", make_qft_like()),
    ("Stress Test", make_stress_test()),
    ("My Circuit", make_my_circuit()),  # add here
]
```

Run once without flags to see the results, then use `--write` to add it to the store. Document it in `benchmarks/benchmarks.md` under a new run section.
## Documenting a Run
After any run that changes the stored ceilings, add a new entry to `benchmarks/benchmarks.md` under a new `### Run N` section. Include the date, what changed since the last run, the full results table, and any relevant notes. See the existing run entries in that file for the format to follow.
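The authoritative format lives in `benchmarks/benchmarks.md` itself; as a rough template only (headings, columns, and placeholders here are illustrative, not the file's required layout), an entry might look like:

```markdown
### Run N — YYYY-MM-DD

Changes since Run N-1: (what changed, e.g. a new pass or circuit).

| Circuit | Gates | Depth | Sim Match |
|---|---|---|---|
| Bell State | ... | ... | ... |

Notes: anything surprising about this run.
```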