How to Make Experiments Reproducible
By default, enc.generate() draws from NumPy’s global random state, which is
seeded differently in each Python session, so unseeded runs produce different
hypervectors. Setting a global seed or passing a seeded HDCGenerator produces
identical hypervectors on every run.
Setting a global seed
import pyhdc
import random
import numpy as np
random.seed(42)     # seeds Python's built-in random module
np.random.seed(42)  # seeds NumPy's global random state
if pyhdc.TORCH_AVAILABLE:
    import torch  # imported only when PyTorch is available
    torch.manual_seed(42)           # sets the global seed for PyTorch
    torch.cuda.manual_seed_all(42)  # sets the seed for all CUDA devices
enc = pyhdc.MAP_C(dimension=10_000)
hv = enc.generate() # always the same for seed=42
print(hv.data[:5])
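One caveat with global seeding is that every consumer of the global state shares one stream, so an unrelated draw elsewhere in the program shifts all later values. The sketch below illustrates this using only Python's built-in random module. It does not call pyhdc, and the helper name draw_after is hypothetical.

```python
import random


def draw_after(extra_draws: int, seed: int = 42) -> float:
    """Seed the global generator, consume `extra_draws` values, return the next one."""
    random.seed(seed)
    for _ in range(extra_draws):
        random.random()  # stands in for unrelated code using the global state
    return random.random()


same_a = draw_after(0)
same_b = draw_after(0)
shifted = draw_after(1)

print(same_a == same_b)   # same seed and same draw order: values match
print(same_a == shifted)  # one extra draw in between: the next value differs
```

Passing a dedicated seeded generator, as in the next section, avoids sharing the stream with other code.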
Basic reproducibility with seeded generators
Pass a seeded generator to the encoding constructor:
import pyhdc
from pyhdc.generation import CommonLCGGenerators
gen = CommonLCGGenerators.numerical_recipes(seed=42)
enc = pyhdc.MAP_C(dimension=10_000, generator=gen)
hv = enc.generate() # always the same for seed=42
print(hv.data[:5])
Re-run the same generation by calling reset() before each run:
gen.reset()
hv_run1 = enc.generate()
gen.reset()
hv_run2 = enc.generate()
import numpy as np
print(np.allclose(hv_run1.data, hv_run2.data)) # True
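To make the role of reset() concrete, here is a minimal pure-Python sketch of a linear congruential generator using the widely published Numerical Recipes constants (a = 1664525, c = 1013904223, m = 2^32). The class TinyLCG and its method names are hypothetical stand-ins for illustration; pyhdc's own generators may be implemented differently.

```python
class TinyLCG:
    """Illustrative LCG; not pyhdc's implementation."""

    A = 1664525
    C = 1013904223
    M = 2**32

    def __init__(self, seed: int) -> None:
        self._seed = seed % self.M
        self._state = self._seed

    def next_u32(self) -> int:
        # Advance the state: x_{n+1} = (A * x_n + C) mod M
        self._state = (self.A * self._state + self.C) % self.M
        return self._state

    def reset(self) -> None:
        # Rewind to the state the generator had right after construction
        self._state = self._seed


lcg = TinyLCG(seed=42)
run1 = [lcg.next_u32() for _ in range(5)]
lcg.reset()
run2 = [lcg.next_u32() for _ in range(5)]
print(run1 == run2)  # the same sequence is replayed after reset()
```

Because the whole stream is determined by the seed and the number of values consumed, rewinding the state is enough to replay a run exactly.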
Building a reproducible codebook
from pyhdc.generation import CommonPCGGenerators
gen = CommonPCGGenerators.pcg32(seed=0)
enc = pyhdc.MAP_C(dimension=10_000, generator=gen)
items = ['apple', 'banana', 'cherry']
gen.reset()
codebook = {name: enc.generate() for name in items}
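Because the vectors are drawn one after another from a single stream, which item receives which vector depends on iteration order. The sketch below illustrates this with random.Random from the standard library standing in for the seeded generator. build_codebook is a hypothetical helper that produces toy float vectors, and sorting the names is only one possible way to make the result independent of input order.

```python
import random


def build_codebook(names, seed=0, dim=8):
    """Assign each name a toy vector drawn sequentially from one seeded stream."""
    rng = random.Random(seed)  # stand-in for a seeded generator after reset()
    return {name: [rng.random() for _ in range(dim)] for name in names}


items = ['apple', 'banana', 'cherry']
shuffled = ['cherry', 'apple', 'banana']

# Same seed and same order: identical codebooks
print(build_codebook(items) == build_codebook(items))

# Sorting first makes the assignment independent of the input order
print(build_codebook(sorted(items)) == build_codebook(sorted(shuffled)))
```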
Snapshotting and restoring state
If you need to resume generation mid-experiment from a known point, snapshot
the state with get_state(); the exact return type is
generator-specific:
gen.reset()
_ = enc.generate() # consume one vector
state = gen.get_state() # snapshot
hv_a = enc.generate()
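# Restore by re-seeding, then advance back to the snapshot point manually
gen.set_seed(gen._seed) # uses the private _seed attribute; or recreate the generator with the same seed
_ = enc.generate() # re-consume the vector generated before the snapshot
hv_b = enc.generate() # expected to match hv_a, assuming set_seed() rewinds the stream
# Note: the get_state / restore API is generator-dependent; reset() is the
# most portable option for full reproducibility.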
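For comparison, the snapshot-and-restore pattern looks like this with Python's standard-library random.Random, which provides both getstate() and setstate(). This is only an analogy for the workflow; the sketch does not assume that any particular pyhdc generator offers an equivalent restore method.

```python
import random

rng = random.Random(42)
_ = rng.random()        # consume one value
state = rng.getstate()  # snapshot the stream position
a = [rng.random() for _ in range(3)]

rng.setstate(state)     # rewind to the snapshot
b = [rng.random() for _ in range(3)]

print(a == b)  # the values after the snapshot are replayed exactly
```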
Bypassing the generator for a single call
Pass use_generator=False to generate one vector from NumPy’s default
random state without advancing the custom generator. The resulting vector is
reproducible only if NumPy’s global seed has also been set:
hv_np = enc.generate(use_generator=False) # uses NumPy, not the LCG
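The idea that a bypass draw leaves the custom stream untouched can be sketched with two independent random.Random instances from the standard library. The names custom and fallback are hypothetical stand-ins for the seeded generator and the default random state; the code does not call pyhdc.

```python
import random

custom = random.Random(42)  # stands in for the seeded custom generator
fallback = random.Random()  # stands in for an unseeded default state

first = custom.random()
_ = fallback.random()       # a "bypass" draw; does not touch `custom`
second = custom.random()

# Replay the custom stream without any bypass draw in between
replay = random.Random(42)
expected = [replay.random(), replay.random()]
print([first, second] == expected)  # the custom stream was not advanced
```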
Choosing a generator for reproducibility
All built-in generator families accept a seed parameter. Recommended
choices:
- PCG (CommonPCGGenerators.pcg32): best statistical quality, fully reproducible
- LCG (CommonLCGGenerators.numerical_recipes): simplest, most portable
- Xorshift (CommonXorshiftGenerators.xorshift64): very fast for large batches
See Random Number Generators for a full comparison.