The Simulators API provides four distinct quantum state backends, each optimized for different circuit classes and simulation goals. All simulators follow a common interface (Result run(circuit, ...)) and dispatch gate operations via the Instruction enum defined in the Circuit API.
All simulators implement instruction execution through a common switch/case pattern:
void StatevectorSimulator::apply_instruction(Statevector& sv, const Instruction& inst) {
switch (inst.type) {
case GT::X: gates::apply_x(sv, inst.qubits[0]); break;
case GT::CX: gates::apply_cx(sv, inst.qubits[0], inst.qubits[1]); break;
case GT::RX: gates::apply_rx(sv, inst.qubits[0], inst.params[0]); break;
// ... 40+ gate types ...
}
}Benefits:
- Centralized gate application logic (reuses
gates::namespace functions) - Consistent handling of parameterized gates
- Stateless (no mutable simulator state during simulation)
- Easy to extend with new gate types
- Circuit validation: Verify qubit indices, parameter counts
- State initialization: Allocate and initialize state representation (SV, DM, tableau, or MPS)
- Instruction iteration: Loop over
circuit.instructionsin order - Gate dispatch: Call appropriate
gates::apply_*orapply_gatefunction - Measurement handling: If any
GT::MEASUREinstruction is present andshots > 0, the circuit is re-executed from|0...0⟩once per shot so that each collapse is independent. Outcomes are written into the classical register (n_clbits-wide). If no MEASURE instructions are present, the state is simulated once andsample_countssamples the final distribution. - Result collection: Extract final state and sampled bitstrings (if shots > 0)
Exact simulation of pure quantum states using the aligned Statevector representation.
See Statevector API for full details. State stored as:
- Dual aligned arrays:
real_partsandimag_parts(64-byte alignment for AVX-512) - Structure-of-Arrays (SoA) layout enabling SIMD vectorization
- Complexity:
$O(2^n)$ space for$n$ qubits
struct Options {
int max_parallel_threads = 0; // 0 = auto (all cores)
uint64_t max_memory_mb = 0; // 0 = auto
int precision = 64; // 32 or 64 bit (not yet used)
bool zero_threshold = true;
double threshold = 1e-10; // Unused; for API compatibility
};- max_parallel_threads: Cap on OpenMP thread count (0 = system default)
- max_memory_mb: Memory budget (0 = no limit); used for preemptive error checking
- precision: Reserved for future 32-bit float variants
- zero_threshold, threshold: Legacy fields; may be removed in future versions
struct Result {
Statevector final_state; // Full amplitude vector
std::unordered_map<std::string, int> counts; // Sampled bitstrings (if shots > 0)
std::vector<double> expectation_values; // Empty; use Estimator for expectations
double simulation_time_seconds = 0.0;
bool success = true;
std::string error_message;
};Fields:
- final_state: Complete quantum state after circuit execution
- counts: Measurement outcome histogram; keys are bitstrings (e.g., "01"), values are occurrence counts
- expectation_values: Reserved for future use (currently not populated by simulator)
For exact state inspection (no measurement):
StatevectorSimulator sim;
auto result = sim.run(circuit, 0); // shots=0: no sampling
Statevector state = result.final_state;For measured sampled outcomes:
auto result = sim.run(circuit, 1024, 42); // shots=1024, seed=42
std::unordered_map<std::string, int> counts = result.counts;
// E.g., {"00": 256, "11": 768}Workflow summary:
- User calls
run(circuit, shots, seed) - Create Statevector initialized to
$|0\cdots0\rangle$ - Iterate over circuit instructions, calling
apply_instruction - If shots > 0, sample
sv.sample_counts(shots, seed)via cumulative distribution - Return Result with final state and counts
For variational inner loops, StatevectorSimulator::eval_expectation(circuit, observable) simulates the circuit and computes the expectation value in-place, bypassing the Result struct and avoiding an Estimator ideal path.
-
Time:
$O(2^n k)$ where$k$ = number of instructions;$O(2^{2k})$ for$k$ -qubit gates -
Space:
$O(2^n)$ for statevector -
Parallelization: Each gate parallelizes if
$2^n \geq 2^{20}$ (via OpenMP) -
Measurement:
$O(2^n + \text{shots} \cdot \log 2^n)$ to sample (cumulative distribution + binary search)
- Exact simulation of small-medium circuits (5–20 qubits, depth < 1000)
- Reference implementation for validation
- Analysis of quantum state properties (amplitudes, entanglement)
- Gradient computation via parameter-shift rule (used by Estimator)
- Exponential memory: 2 GB per 28 qubits (double precision), 1 GB per 27 qubits
- No noise support (use DensityMatrixSimulator for noisy circuits)
- Clifford-only circuits require stabilizer tableau (CliffordSimulator is more efficient)
Exact simulation of mixed quantum states with integrated Kraus operator noise application.
class DensityMatrix {
int n_qubits;
size_t dim; // 2^n_qubits
std::vector<Complex128> data; // row-major, dim × dim matrix
};Properties:
- Stores full density matrix
$\rho$ in row-major order:data[i*dim + j]=$\rho_{ij}$ - Valid states satisfy:
$\text{Tr}(\rho) = 1$ and$\rho = \rho^\dagger$ (Hermitian) - Purity:
$\gamma = \text{Tr}(\rho^2) \in [0, 1]$ ;$\gamma = 1$ iff pure,$\gamma = 1/2^n$ iff maximally mixed - Complexity:
$O(4^n)$ space for$n$ qubits
Pure state conversion:
Initial state
Gate application
Algorithm:
-
Sort targets and compute offset masks:
sub_offsets[s]maps sub-index$s$ to target qubit positions -
Enumerate background indices (non-target qubits):
$2^{n-k}$ distinct background configurations -
For each background configuration and each column of the density matrix:
- Read 2D sub-vector (dimension
$2^k \times 2^k$ ) - Apply
$U$ matrix multiplication (left multiply): row update
- Read 2D sub-vector (dimension
-
For each row and each background configuration:
- Read 2D sub-vector
- Apply
$U^\dagger$ matrix multiplication (right multiply): column update
Code Structure:
void DensityMatrix::apply_gate(const std::vector<Complex128>& U,
const std::vector<int>& qubits) {
int k = qubits.size();
size_t sub_dim = 1ULL << k;
std::vector<size_t> sub_offsets(sub_dim); // Precompute target offsets
std::vector<size_t> bg_indices(dim >> k); // Precompute background indices
std::vector<Complex128> scratch(sub_dim); // Scratch buffer: O(2^k)
// Left multiply: rho = U * rho (row update)
for (size_t bg : bg_indices)
for (size_t col = 0; col < dim; ++col)
// Read, apply, write using scratch buffer
// Right multiply: rho = rho * U† (column update)
for (size_t row = 0; row < dim; ++row)
for (size_t bg : bg_indices)
// Read, apply, write using scratch buffer
}Complexity:
Benefits:
- Avoids expensive allocations per gate
- Sequential memory access patterns (amenable to SIMD, though not vectorized in current code)
- No full-matrix copy overhead
For noise channels with
Implementation:
void DensityMatrix::apply_kraus(const std::vector<std::vector<Complex128>>& kraus_ops,
const std::vector<int>& qubits) {
std::vector<Complex128> result_data(dim * dim, Complex128(0.0, 0.0));
for (const auto& K : kraus_ops) {
DensityMatrix temp = *this;
temp.apply_gate(K, qubits); // temp = K * rho * K†
for (size_t i = 0; i < dim * dim; ++i)
result_data[i] += temp.data[i];
}
data = std::move(result_data);
}Complexity:
DensityMatrixSimulator sim;
NoiseModel noise_model;
noise_model.add_quantum_error(
lindblad::NoiseChannels::depolarizing(0.001), "cx", {0, 1});
auto result = sim.run(circuit, noise_model, 1024); // shots=1024Execution Steps:
- Initialize
$\rho = |0\rangle\langle 0|$ (dim$\times$ dim) - For each instruction in circuit:
- If noise is attached and any errors have
after_gate = false: apply those Kraus channels before the unitary viaapply_kraus - Apply gate via
apply_gate(Hamiltonian evolution) - If noise is attached and any errors have
after_gate = true: apply those Kraus channels after the unitary viaapply_kraus
- If noise is attached and any errors have
- Sample measurement outcomes via spectral decomposition of marginal density matrices
- Return counts and final state
Hermitian operator expectation (e.g., Pauli observable):
double DensityMatrix::expectation_value(const std::vector<Complex128>& hermitian_op) const {
double exp = 0.0;
for (size_t i = 0; i < dim; ++i)
for (size_t j = 0; j < dim; ++j)
exp += (data[i*dim + j] * hermitian_op[j*dim + i]).real;
return exp;
}SparsePauliOp expectation (specialized for sparse Pauli strings):
Extracts diagonal terms of Pauli strings using bit masks; computes traces without full matrix multiplication:
Complexity:
-
Time per gate:
$O(4^n \cdot 2^k)$ for$k$ -qubit gate -
Space:
$O(4^n)$ for density matrix +$O(2^k)$ scratch per gate -
Noise overhead: Factor of
$m$ for$m$ Kraus operators
- Noisy simulation (with Kraus operators from
NoiseModel) - Analysis of mixed state properties (purity, entanglement)
- Validation of noise models
- Circuits up to 10–12 qubits (exponential storage in pure states impractical)
- 16 GB per 10 qubits (complex128); impractical beyond ~11 qubits
- Slower than statevector for pure-state circuits (overhead of full density matrix)
- No advantage over statevector without noise
Efficient exact simulation of Clifford circuits (gates from
Tableau:
Each row represents a Pauli operator as:
- Columns
$0$ to$N-1$ : X-part (1 = X or Y component present) - Columns
$N$ to$2N-1$ : Z-part (1 = Z or Y component present) - Column
$2N$ : phase bit (0 =$+1$ , 1 =$-1$ )
Invariant: All stabilizers commute pairwise; destabilizers anti-commute with corresponding stabilizers.
Complexity:
Each gate updates the tableau via row operations:
Hadamard (
- For all rows:
(row.x, row.z) = (row.z, row.x)
S-gate ($S = \text{diag}(1, i)$): Updates Z component
- For each row: If
row.x[i] = 1, XOR withrow.z
CNOT (control=i, target=j): Coupled row operations
- If row has
x[j] = 1, XOR withx[i] - If row has
x[i] = 1, XOR withz[j] - Conditionally update phase
Pauli gates (X, Y, Z): Update phase based on row anticommutation
In Clifford simulation, measurement of qubit
- Find a destabilizer row with
x[j] = 1(must exist for non-trivial state) - Invert via Gaussian elimination to convert to stabilizer
- Measure outcome: randomly 0 or 1 (if deterministic, derived from phase)
- Project state onto measurement outcome via controlled row operations
Complexity:
Check if the Pauli string commutes with all stabilizers:
- If yes, expectation is deterministic (
$\pm 1$ ) - If no, measurement outcome is random (expectation
$= 0$ )
Complexity:
CliffordSimulator sim;
auto result = sim.run(clifford_circuit, 1024); // Must be Clifford-onlyUse Cases:
- Verification of Clifford circuits (error correction, stabilizer codes)
- Analysis of stabilizer codes (exact marginal probabilities)
- Research into Clifford-only gates and measurement outcomes
Strengths:
- Polynomial ($O(N^2)$) space and time for Clifford circuits
- Exact for arbitrarily large systems (limited by RAM, not exponential scaling)
- Ideal for stabilizer codes and error correction benchmarks
Limitations:
- Only supports Clifford gate set (H, S, CNOT, X, Y, Z, T/T† forbidden)
- No support for parameterized gates (RX, RY, etc.)
- Non-Clifford gates cause error or fallback
Is-Clifford Check:
bool CliffordSimulator::is_clifford(const QuantumCircuit& circuit) {
for (const auto& inst : circuit.instructions) {
if (!is_clifford_gate(inst.type)) return false;
}
return true;
}Approximate simulation of arbitrary circuits using Matrix Product State (MPS) representation, enabling simulation of larger systems at the cost of controlled truncation error.
An MPS decomposes an
where each
Data Structure:
struct MPSTensor {
int bond_left, bond_right;
// shape: (bond_left, physical_dim=2, bond_right)
std::vector<Complex128> data;
Complex128& operator()(int left, int phys, int right) {
return data[left * 2 * bond_right + phys * bond_right + right];
}
};
class MPSState {
std::vector<MPSTensor> tensors;
int max_bond_dim; // chi parameter
double cutoff; // SVD truncation threshold
};Complexity:
-
Space:
$O(n \cdot \chi^2)$ where$\chi$ = max bond dimension (typically 16–256) -
Time per gate:
$O(\chi^4)$ for single-qubit,$O(\chi^6)$ for two-qubit -
Accuracy: Controlled by
$\chi$ andcutoff; larger$\chi$ = more accurate
Single-qubit gate on site
- Apply gate matrix to physical index of
$M_i$ - Reshape to matrix form
(bond_left, bond_right × 2) - Absorb into a neighboring bond (left or right)
Two-qubit gate on sites
- Contract tensors:
M_i @ bond @ M_{i+1}→ rank-4 tensor - Apply
$U$ gate to physical indices - Reshape to matrix and perform SVD:
$U = L \cdot S \cdot R^\dagger$ - Truncate singular values: keep only
$\chi$ largest with sum$\geq (1 - \text{cutoff})$ - Absorb
$L \cdot S$ into$M_i$ ; absorb$R^\dagger$ into$M_{i+1}$ - Update bond dimension and track truncation error
Non-adjacent gates: Apply SWAPs to move gates adjacent, then apply gate, then SWAP back
Arbitrary UNITARY gates (R.1.10.7 — direct tensor dispatch for 1q and 2q):
- 1-qubit UNITARY: contracts the 2x2 matrix directly into one site tensor
via
MPSState::apply_single_qubit_gate. No SVD, no full statevector conversion. Memory cost is bounded by the bond dimension and independent ofn_qubits. - 2-qubit UNITARY: contracts the 4x4 matrix into the two-site tensor via
apply_two_qubit_gate, followed by a truncated SVD bounded bymax_bond_dimandcutoff. Non-adjacent qubit pairs are handled by the existing swap network. The dispatch swaps bits 0 and 1 of the matrix row/column indices to bridgeapply_unitary's LSB-at-first-arg convention withapply_two_qubit_gate's MSB-at-first-arg convention. - 3+ qubit UNITARY: falls back to the statevector path —
to_statevector()→gates::apply_unitary→mps_from_sv. The fallback is bounded byMPS_SV_MAX_QUBITS(= 25); beyond that the simulator throws with a clear error naming the offending UNITARY and qubit count, rather than the generic "Too many qubits for full statevector conversion" surfaced from insideto_statevector(). Decompose >2q unitaries into 1q/2q factors for wider registers.
Sequential measurement (physically realistic):
- Measure qubit 0 by contracting boundary from left
- Condition state on outcome; collapse corresponding MPS branch
- Project boundary to next qubit and repeat for qubit 1, 2, ...
-
$O(N \cdot \chi^3)$ time for full bitstring
Complexity:
MPSSimulator::run uses two named thresholds:
MPS_SV_CROSSOVER = 18: circuits withn_qubits ≤ 18sample via full statevector conversion (faster)MPS_SV_MAX_QUBITS = 25:to_statevector()throws forn_qubits > 25(memory guard, ~512 MB at that size)
After measure_sequential projects a qubit, the remaining MPS tensors are renormalized by dividing by sqrt(prob) where prob is the boundary-contraction probability of the observed outcome — not the local Frobenius norm (which differs for non-canonical MPS and would not give correct conditional probabilities).
For small systems, convert back to statevector via boundary contraction:
Statevector MPSState::to_statevector() const {
// Contract all tensors: M_0 @ M_1 @ ... @ M_{n-1}
// O(n * chi^3) time, O(2^n) space
// Throws if n_qubits > MPS_SV_MAX_QUBITS (25)
}MPSSimulator sim(32, 1024); // n_qubits=32, max_bond_dim=1024
// MPS is allocated internally
auto result = sim.run(large_circuit, 100); // 100 shots- Approximate simulation of large-scale circuits (20–40+ qubits)
- Research into entanglement dynamics and scaling
- Benchmarking approximate methods against classical baselines
- Analysis of weakly-entangled states (shallow circuits, product states)
- Approximate: truncation error grows with depth and entanglement
- Non-trivial configuration (choosing
$\chi$ ,cutofffor target accuracy) - Slower than statevector for small systems (overhead of SVD)
- Two-qubit gates require adjacency (need SWAPs for non-local gates)
-
max_bond_dim (
$\chi$ ): Controls truncation; typical range 16–512- Larger
$\chi$ = more accurate but slower and more memory -
$\chi = 2^n$ recovers exact simulation
- Larger
-
cutoff: SVD truncation threshold; typical 1e-12 to 1e-8
- Discards singular values
$< \text{cutoff}$ - Minimal impact on accuracy for cutoff
$\leq 1e-10$
- Discards singular values
| Circuit Property | Simulator | Notes |
|---|---|---|
| Small, pure, arbitrary gates | Statevector | Exact, fast, limited to ~25 qubits |
| Clifford-only, any size | Clifford | Polynomial scaling, exact |
| Noisy execution | DensityMatrix | Integrates Kraus operators, ~10 qubits max |
| Large-scale, weakly-entangled | MPS | Approximate, ~30+ qubits, configure |
| Default/Hybrid | Auto-dispatch | Detect circuit properties, pick best simulator |
- Estimator uses
StatevectorSimulatorfor ideal zero-shot circuits; automatically routes toDensityMatrixSimulatorwhenoptions.noise_modelis non-ideal oroptions.shots > 0. See Estimator API for the routing rules. - Sampler routes to appropriate simulator based on
NoiseModel(density matrix if noise present) - MAQAOA supports both statevector (direct evolution) and density matrix (noisy path)
- Gates API — Gate implementation details and optimization
- Statevector API — Aligned memory layout and measurement
- Operators API — Pauli string and sparse operator representations
- Noise API — Kraus channels and NoiseModel construction
- Estimator API — Expectation value computation using simulators
- Sampler API — Bitstring sampling via simulator backends