How to Make Experiments Reproducible

By default, enc.generate() draws from NumPy’s global random state, which is seeded differently in each Python session, so the same script produces different hypervectors on every run. Setting a global seed, or passing a seeded HDCGenerator to the encoding, makes the output identical from run to run.

Setting a global seed

import random

import numpy as np
import pyhdc

random.seed(42)        # seed Python's built-in random module
np.random.seed(42)     # seed NumPy's global random state
if pyhdc.TORCH_AVAILABLE:
    import torch                    # only import PyTorch when it is installed
    torch.manual_seed(42)           # seed PyTorch's random number generators
    torch.cuda.manual_seed_all(42)  # seed the generators on all CUDA devices

enc = pyhdc.MAP_C(dimension=10_000)
hv  = enc.generate()   # always the same for seed=42
print(hv.data[:5])
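To see why seeding the global state makes generation repeatable, the sketch below uses plain NumPy in place of pyhdc. `draw_map_c` is a hypothetical stand-in for `enc.generate()`, and it assumes, for illustration only, that a MAP-C hypervector holds values drawn uniformly from [-1, 1].

```python
import numpy as np


def draw_map_c(dimension):
    """Hypothetical stand-in for enc.generate(): draws from NumPy's global state.

    Assumes MAP-C values are uniform in [-1, 1]; this is an illustration,
    not pyhdc's implementation.
    """
    return np.random.uniform(-1.0, 1.0, size=dimension)


np.random.seed(42)           # seed the global state
first = draw_map_c(10_000)

np.random.seed(42)           # re-seed: the global stream restarts
second = draw_map_c(10_000)

third = draw_map_c(10_000)   # no re-seed: the stream continues, so this differs

print(np.array_equal(first, second))  # True
print(np.array_equal(first, third))   # False
```

Seeding once at the top of a script fixes the whole sequence of draws rather than each individual call, so re-running the script reproduces every vector in the same order.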

Basic reproducibility with seeded generators

Pass a seeded generator to the encoding constructor:

import pyhdc
from pyhdc.generation import CommonLCGGenerators

gen = CommonLCGGenerators.numerical_recipes(seed=42)
enc = pyhdc.MAP_C(dimension=10_000, generator=gen)

hv = enc.generate()          # always the same for seed=42
print(hv.data[:5])
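Conceptually, passing a generator gives the encoding its own private random stream, independent of NumPy's global state. The sketch below models that idea with plain NumPy; `SketchEncoder` is a hypothetical class, `numpy.random.default_rng` stands in for a pyhdc generator such as `numerical_recipes`, and MAP-C values are assumed uniform in [-1, 1] for illustration only.

```python
import numpy as np


class SketchEncoder:
    """Hypothetical encoder that draws from an injected generator.

    Falls back to NumPy's global state when no generator is given.
    Assumes MAP-C-style values uniform in [-1, 1] (illustration only).
    """

    def __init__(self, dimension, generator=None):
        self.dimension = dimension
        self.generator = generator  # a numpy.random.Generator, or None

    def generate(self):
        # Use the private stream if one was injected, else the global state.
        source = self.generator if self.generator is not None else np.random
        return source.uniform(-1.0, 1.0, size=self.dimension)


enc_a = SketchEncoder(10_000, generator=np.random.default_rng(42))
enc_b = SketchEncoder(10_000, generator=np.random.default_rng(42))

np.random.seed(0)        # the global state differs between the two calls...
hv_a = enc_a.generate()
np.random.seed(123)
hv_b = enc_b.generate()

print(np.array_equal(hv_a, hv_b))  # True: each encoder uses its own seeded stream
```

Injecting the generator, rather than reseeding global state, keeps draws made elsewhere in a program from shifting the encoding's vectors.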

Re-run the same generation by calling reset() before each run:

gen.reset()
hv_run1 = enc.generate()

gen.reset()
hv_run2 = enc.generate()

import numpy as np
print(np.allclose(hv_run1.data, hv_run2.data))   # True
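The reason reset() works is that a generator like this is just an integer state updated by a fixed rule, so rewinding the state to the seed replays the same numbers. Below is a minimal sketch of a 32-bit LCG with the well-known Numerical Recipes constants (a = 1664525, c = 1013904223, m = 2^32). `SketchLCG` is hypothetical and may differ from `CommonLCGGenerators.numerical_recipes` in its seeding and in how outputs are mapped to floats.

```python
class SketchLCG:
    """Hypothetical 32-bit LCG with a reset() method.

    Uses the Numerical Recipes constants a = 1664525, c = 1013904223, m = 2**32.
    """

    A = 1664525
    C = 1013904223
    M = 2**32

    def __init__(self, seed):
        self.seed = seed % self.M
        self.state = self.seed

    def reset(self):
        # Rewind to the initial seed so the same sequence is replayed.
        self.state = self.seed

    def next_uint32(self):
        # state <- (a * state + c) mod m
        self.state = (self.A * self.state + self.C) % self.M
        return self.state

    def uniform(self, low=-1.0, high=1.0):
        # Map the 32-bit output onto [low, high).
        return low + (high - low) * self.next_uint32() / self.M


gen = SketchLCG(seed=42)

gen.reset()
run1 = [gen.uniform() for _ in range(5)]

gen.reset()
run2 = [gen.uniform() for _ in range(5)]

print(run1 == run2)  # True
```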

Building a reproducible codebook

import pyhdc
from pyhdc.generation import CommonPCGGenerators

gen = CommonPCGGenerators.pcg32(seed=0)
enc = pyhdc.MAP_C(dimension=10_000, generator=gen)

items = ['apple', 'banana', 'cherry']

gen.reset()
codebook = {name: enc.generate() for name in items}
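Because the comprehension draws one vector per item in sequence, the resulting codebook depends on both the seed and the order of `items`. The NumPy-only sketch below illustrates this; `build_codebook` is a hypothetical helper and `numpy.random.default_rng` stands in for the pyhdc generator.

```python
import numpy as np


def build_codebook(items, seed, dimension=10_000):
    """Hypothetical helper: one vector per item, drawn in order from a fresh seeded stream."""
    rng = np.random.default_rng(seed)
    return {name: rng.uniform(-1.0, 1.0, size=dimension) for name in items}


items = ['apple', 'banana', 'cherry']

book1 = build_codebook(items, seed=0)
book2 = build_codebook(items, seed=0)
book3 = build_codebook(list(reversed(items)), seed=0)

# Same seed, same order: every item gets the same vector.
print(all(np.array_equal(book1[k], book2[k]) for k in items))  # True
# Same seed, different order: 'apple' now receives a different draw.
print(np.array_equal(book1['apple'], book3['apple']))          # False
```

To keep the same symbol-to-vector mapping across runs, keep the item list in a fixed order (for example, sorted) as well as fixing the seed.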

Snapshotting and restoring state

If you need to resume generation mid-experiment from a known point, snapshot the state with get_state(); the exact return type is generator-specific:

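gen.reset()
_ = enc.generate()         # consume one vector
state = gen.get_state()    # snapshot; the return type is generator-specific

hv_a = enc.generate()      # the vector generated after the snapshot

# Restoring directly from `state` is generator-dependent. The portable route
# is to rewind to the seed (reset(), or set_seed() with the original seed)
# and replay the draws made before the snapshot:
gen.reset()
_ = enc.generate()         # replay the consumed vector
hv_b = enc.generate()

print(np.allclose(hv_a.data, hv_b.data))   # True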
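For comparison, NumPy's `Generator` supports both halves of snapshot-and-restore through its `bit_generator.state` property, which can be read and assigned. The sketch below is a NumPy analogue only; it does not show pyhdc's API, and whether a given pyhdc generator can be restored from `get_state()` in the same way is generator-dependent.

```python
import numpy as np

rng = np.random.default_rng(7)

_ = rng.uniform(-1.0, 1.0, size=10_000)          # consume one vector
snapshot = rng.bit_generator.state               # a dict describing the current state

hv_a = rng.uniform(-1.0, 1.0, size=10_000)       # the vector generated after the snapshot
hv_later = rng.uniform(-1.0, 1.0, size=10_000)   # generation continues past the snapshot

rng.bit_generator.state = snapshot               # restore the saved state
hv_b = rng.uniform(-1.0, 1.0, size=10_000)

print(np.array_equal(hv_a, hv_b))  # True
```

Unlike replaying from reset(), restoring a snapshot jumps straight back to the saved point without regenerating the earlier vectors.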

Bypassing the generator for a single call

Pass use_generator=False to generate one vector from NumPy’s global random state without advancing the custom generator. That vector is reproducible only if NumPy’s global seed has been set:

hv_np = enc.generate(use_generator=False)   # uses NumPy, not the LCG
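The key property is that a draw from the global state leaves the seeded stream where it was, so later seeded vectors are unaffected. The sketch below checks this with two identically seeded NumPy streams; `numpy.random.default_rng` stands in for the pyhdc generator, and the global-state draw plays the role of `use_generator=False`.

```python
import numpy as np

# Two identically seeded private streams.
stream_a = np.random.default_rng(42)
stream_b = np.random.default_rng(42)

# Stream A: two consecutive draws.
a1 = stream_a.uniform(-1.0, 1.0, size=5)
a2 = stream_a.uniform(-1.0, 1.0, size=5)

# Stream B: the same two draws, with a global-state draw in between
# (the analogue of enc.generate(use_generator=False)).
b1 = stream_b.uniform(-1.0, 1.0, size=5)
bypass = np.random.uniform(-1.0, 1.0, size=5)
b2 = stream_b.uniform(-1.0, 1.0, size=5)

print(np.array_equal(a2, b2))  # True: the bypass did not advance stream B
```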

Choosing a generator for reproducibility

All built-in generator families accept a seed parameter. Recommended choices:

  • PCG (CommonPCGGenerators.pcg32) : best statistical quality, fully reproducible

  • LCG (CommonLCGGenerators.numerical_recipes) : simplest, most portable

  • Xorshift (CommonXorshiftGenerators.xorshift64) : very fast for large batches
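Each of these families keeps a small integer state that is updated deterministically from the seed, which is why a fixed seed yields a fixed stream. As an illustration, here is a textbook xorshift64 (Marsaglia's 13/7/17 shift variant) in plain Python. `SketchXorshift64` is hypothetical and may differ from `CommonXorshiftGenerators.xorshift64` in shift constants, seeding, and output mapping.

```python
MASK64 = (1 << 64) - 1


class SketchXorshift64:
    """Hypothetical xorshift64 generator (Marsaglia's 13/7/17 shift triple)."""

    def __init__(self, seed):
        self.seed = seed & MASK64
        if self.seed == 0:
            # An all-zero state would stay zero forever.
            raise ValueError("xorshift64 needs a non-zero seed")
        self.state = self.seed

    def reset(self):
        # Rewind to the seed to replay the same stream.
        self.state = self.seed

    def next_uint64(self):
        x = self.state
        x ^= (x << 13) & MASK64
        x ^= x >> 7
        x ^= (x << 17) & MASK64
        self.state = x
        return x


a = SketchXorshift64(seed=42)
b = SketchXorshift64(seed=42)
seq_a = [a.next_uint64() for _ in range(3)]
seq_b = [b.next_uint64() for _ in range(3)]
print(seq_a == seq_b)  # True

a.reset()
print([a.next_uint64() for _ in range(3)] == seq_a)  # True
```

Each step is only a few shifts and XORs, which is why xorshift generators are fast for large batches.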

See Random Number Generators for a full comparison.