How to Make Experiments Reproducible
=====================================

``enc.generate()`` draws from NumPy's global random state by default. That state is seeded differently in each Python session, so the generated hypervectors change from run to run. Setting a global seed or passing a seeded ``HDCGenerator`` produces identical hypervectors on every run.

Setting a global seed
----------------------

.. code-block:: python

   import random

   import numpy as np
   import pyhdc

   random.seed(42)     # sets the global seed for Python's built-in random
   np.random.seed(42)  # sets the global NumPy seed

   if pyhdc.TORCH_AVAILABLE:
       # Import torch only when pyhdc reports it is available, so the
       # snippet also runs on installations without PyTorch.
       import torch

       torch.manual_seed(42)           # sets the global seed for PyTorch
       torch.cuda.manual_seed_all(42)  # sets the global seed for all CUDA devices

   enc = pyhdc.MAP_C(dimension=10_000)
   hv = enc.generate()  # always the same for seed=42
   print(hv.data[:5])

Basic reproducibility with seeded generators
----------------------------------------------

Pass a seeded generator to the encoding constructor:

.. code-block:: python

   import pyhdc
   from pyhdc.generation import CommonLCGGenerators

   gen = CommonLCGGenerators.numerical_recipes(seed=42)
   enc = pyhdc.MAP_C(dimension=10_000, generator=gen)
   hv = enc.generate()  # always the same for seed=42
   print(hv.data[:5])

Re-run the same generation by calling ``reset()`` before each run:

.. code-block:: python

   import numpy as np

   gen.reset()
   hv_run1 = enc.generate()

   gen.reset()
   hv_run2 = enc.generate()

   print(np.allclose(hv_run1.data, hv_run2.data))  # True

Building a reproducible codebook
----------------------------------

.. code-block:: python

   import pyhdc
   from pyhdc.generation import CommonPCGGenerators

   gen = CommonPCGGenerators.pcg32(seed=0)
   enc = pyhdc.MAP_C(dimension=10_000, generator=gen)

   items = ['apple', 'banana', 'cherry']

   gen.reset()
   # Vectors are drawn in list order, so keep the order of ``items``
   # fixed if each name must map to the same vector across runs.
   codebook = {name: enc.generate() for name in items}

Snapshotting and restoring state
----------------------------------

If you need to resume generation mid-experiment from a known point, snapshot the state with ``get_state()``; the exact return type is generator-specific:

.. code-block:: python

   gen.reset()
   _ = enc.generate()       # consume one vector
   state = gen.get_state()  # snapshot
   hv_a = enc.generate()

   # Restore and re-generate from snapshot
   gen.set_seed(gen._seed)  # or: recreate with same seed and advance manually

   # Note: get_state / restore API is generator-dependent; reset() is the
   # most portable option for full reproducibility

Bypassing the generator for a single call
------------------------------------------

Pass ``use_generator=False`` to generate one vector from NumPy's default random state without advancing the custom generator:

.. code-block:: python

   hv_np = enc.generate(use_generator=False)  # uses NumPy, not the custom generator

Choosing a generator for reproducibility
------------------------------------------

All built-in generator families accept a ``seed`` parameter. Recommended choices:

* **PCG** (``CommonPCGGenerators.pcg32``): best statistical quality, fully reproducible
* **LCG** (``CommonLCGGenerators.numerical_recipes``): simplest, most portable
* **Xorshift** (``CommonXorshiftGenerators.xorshift64``): very fast for large batches

See :doc:`../user_manual/generators` for a full comparison.
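
The difference between seeding global state and passing a dedicated seeded generator can be illustrated without ``pyhdc`` at all. The following is a minimal sketch that uses only NumPy's own ``Generator`` and ``PCG64`` classes; the helper ``bipolar_vector`` is hypothetical and stands in for a hypervector draw, so it does not reflect how ``pyhdc`` generates vectors internally:

.. code-block:: python

   import numpy as np


   def bipolar_vector(rng, dimension=10_000):
       """Hypothetical stand-in for a hypervector draw: random +1/-1 entries."""
       return rng.choice(np.array([-1, 1], dtype=np.int8), size=dimension)


   # Two dedicated generators built from the same seed yield the same stream.
   hv_a = bipolar_vector(np.random.Generator(np.random.PCG64(42)))
   hv_b = bipolar_vector(np.random.Generator(np.random.PCG64(42)))
   same_seed_match = np.array_equal(hv_a, hv_b)

   # Draws from NumPy's global state do not disturb a dedicated generator.
   rng_c = np.random.Generator(np.random.PCG64(42))
   np.random.seed(0)
   _ = np.random.rand(1_000)  # unrelated code consuming global randomness
   hv_c = bipolar_vector(rng_c)
   isolated_match = np.array_equal(hv_a, hv_c)

   # A different seed produces a different vector.
   hv_d = bipolar_vector(np.random.Generator(np.random.PCG64(43)))
   different_seed_match = np.array_equal(hv_a, hv_d)

   print(same_seed_match, isolated_match, different_seed_match)

The isolation shown in the second step is the main practical argument for a dedicated generator: a global seed only reproduces results while every other consumer of the global state makes exactly the same calls in the same order.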