Dual Backend Architecture

PyHDC supports two backends for its array operations: NumPy (the default) and PyTorch. Every encoding, operation, and hypervector works with either backend, and the API surface is the same apart from the few observable differences listed under "What changes between backends".

Design rationale

NumPy is the universal default because it has no install dependencies beyond Python and works on every platform. PyTorch is supported for three reasons:

  1. GPU acceleration: CUDA tensors enable large-scale HDC (50,000+ dimensions, millions of codebook entries) to run in seconds instead of minutes.

  2. Integration with deep learning pipelines: if you are using HDC as a layer in a neural network or alongside PyTorch models, keeping everything in the same tensor ecosystem avoids data copies.

  3. Vectorised batch operations: PyTorch’s broadcasting and GPU-native matmul make batched bundling and similarity faster for large batches; GPU generation of 10,000 × 10,000 codebooks takes under a second versus several minutes on CPU.

The TORCH_AVAILABLE flag

At import time, PyHDC tries to import torch. If the import succeeds, pyhdc.TORCH_AVAILABLE is True; otherwise it is False, and any request for backend="torch" raises ImportError.

import pyhdc

print(pyhdc.TORCH_AVAILABLE)   # True or False
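
The flag follows the usual optional-dependency pattern. The following is a minimal sketch of that pattern, not PyHDC's actual source: the helper name require_backend and the error messages are hypothetical.

```python
# Sketch of an optional-dependency flag; not PyHDC's actual source.
try:
    import torch  # noqa: F401  (only probing for availability)
    TORCH_AVAILABLE = True
except ImportError:
    TORCH_AVAILABLE = False


def require_backend(backend: str) -> str:
    """Validate a backend name (hypothetical helper for illustration)."""
    if backend not in ("numpy", "torch"):
        raise ValueError(f"unknown backend: {backend!r}")
    if backend == "torch" and not TORCH_AVAILABLE:
        # Mirrors the documented behaviour: torch requests fail without torch.
        raise ImportError("backend='torch' requested but torch is not installed")
    return backend


print(require_backend("numpy"))  # numpy
```

Checking the flag once, before constructing any encoding, lets a script fall back to NumPy instead of failing partway through.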

The BackendManager

The internal BackendManager class is a static utility that dispatches array operations to the correct backend. Advanced users who are extending PyHDC (writing custom encodings or components) may use it directly:

  • BackendManager.to_numpy(array) : convert to NumPy

  • BackendManager.to_torch(array, device) : convert to PyTorch tensor

  • BackendManager.get_device(hypervector) : return the device string

Most users never need to call BackendManager directly: the encoding and Hypervector methods handle dispatch automatically.
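
To make the dispatch idea concrete, here is a rough sketch of how such a static utility could be written. It is an illustration only: the class name SketchBackendManager is hypothetical, its get_device takes a raw array rather than a hypervector, and PyHDC's real BackendManager may behave differently.

```python
import numpy as np

try:
    import torch
except ImportError:  # PyTorch is optional
    torch = None


class SketchBackendManager:
    """Illustrative stand-in; PyHDC's real BackendManager may differ."""

    @staticmethod
    def _is_tensor(array) -> bool:
        return torch is not None and isinstance(array, torch.Tensor)

    @staticmethod
    def to_numpy(array) -> np.ndarray:
        if SketchBackendManager._is_tensor(array):
            # A tensor must be detached and on the CPU before conversion.
            return array.detach().cpu().numpy()
        return np.asarray(array)

    @staticmethod
    def to_torch(array, device=None):
        if torch is None:
            raise ImportError("PyTorch is not installed")
        return torch.as_tensor(array, device=device)

    @staticmethod
    def get_device(array):
        # Assumption: operates on a raw array, not a Hypervector object.
        if SketchBackendManager._is_tensor(array):
            return str(array.device)
        return None


print(SketchBackendManager.get_device(np.zeros(3)))  # None
```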

Backend selection

Set the backend at encoding construction time:

enc_np  = pyhdc.MAP_C(dimension=10_000)                           # numpy
enc_cpu = pyhdc.MAP_C(dimension=10_000, backend="torch")          # torch CPU
enc_gpu = pyhdc.MAP_C(dimension=10_000, backend="torch",
                       device="cuda")                              # torch GPU

All operations on hypervectors generated by enc_gpu run on the GPU.
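
A script that should run on any machine can pick the constructor arguments at runtime. The sketch below assumes only the backend and device keywords shown above; the helper choose_backend_kwargs is hypothetical and not part of PyHDC.

```python
def choose_backend_kwargs() -> dict:
    """Pick constructor keywords for the best available backend.

    Hypothetical helper: returns {} for the NumPy default, and the
    backend/device keywords used in the examples above otherwise.
    """
    try:
        import torch
    except ImportError:
        return {}  # fall back to the NumPy default
    if torch.cuda.is_available():
        return {"backend": "torch", "device": "cuda"}
    return {"backend": "torch"}


kwargs = choose_backend_kwargs()
print(kwargs)
# Intended use, following the constructor calls above:
#   enc = pyhdc.MAP_C(dimension=10_000, **kwargs)
```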

What changes between backends

From the user’s perspective, almost nothing changes. The only observable differences are:

  • hv.data returns numpy.ndarray (NumPy) or torch.Tensor (PyTorch)

  • hv.device returns None (NumPy) or a device string (PyTorch)

  • hv.backend returns "numpy" or "torch"

  • With the PyTorch backend, similarity on GPU tensors returns a torch.Tensor rather than a Python float. Wrap a scalar result in float() when you need a plain number.
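
Code that must work with either backend can normalise similarity results before using them. The helper below is a hypothetical sketch, not a PyHDC function; it relies only on the tolist() method that both numpy.ndarray and torch.Tensor provide.

```python
import numpy as np


def scores_to_python(scores):
    """Convert similarity output to plain Python numbers.

    A scalar becomes a float and a batch becomes a list of floats.
    For CUDA tensors, tolist() copies the values to the host first.
    """
    if hasattr(scores, "tolist"):
        return scores.tolist()
    return float(scores)


print(scores_to_python(np.float32(0.25)))      # 0.25
print(scores_to_python(np.array([0.5, 1.0])))  # [0.5, 1.0]
```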

PyTorch batching

PyTorch enables larger and faster batched operations:

Batched generation

enc   = pyhdc.MAP_C(dimension=10_000, backend="torch", device="cuda")
batch = enc.generate(size=1000)   # tensor of shape (1000, 10000)

Batched similarity (2-D)

a = enc.generate(size=500)   # shape (500, 10000)
b = enc.generate(size=500)   # shape (500, 10000)
sims = enc.similarity(a, b)  # shape (500,): one score per row pair
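
For readers who want to see what "one score per row pair" means numerically, here is a NumPy sketch. It assumes the similarity measure is cosine similarity, which this section does not state, and it uses smaller arrays than the example above to keep it light.

```python
import numpy as np


def rowwise_cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between matching rows of a and b."""
    dot = np.sum(a * b, axis=-1)
    norms = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return dot / norms


rng = np.random.default_rng(0)
a = rng.uniform(-1.0, 1.0, size=(50, 2_000))
b = rng.uniform(-1.0, 1.0, size=(50, 2_000))

sims = rowwise_cosine(a, b)
print(sims.shape)  # (50,)
```

Row i of the result compares a[i] with b[i] only; no cross-row pairs are computed.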

Batched bundling

batch = enc.generate(size=(10, 3))   # 10 groups of 3 hypervectors each;
                                      # shape is (10, 3, 10000)
result = enc.bundle(batch, batch_dim=1)
print(result.shape)   # (10, 10000): 10 bundled results
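
The shape change can be reproduced in plain NumPy. The sketch below assumes bundling is an element-wise sum clipped to [-1, 1], which is how MAP-C bundling is commonly defined but is not confirmed here; bundle_sketch is a hypothetical stand-in for enc.bundle.

```python
import numpy as np


def bundle_sketch(batch: np.ndarray, batch_dim: int = 1) -> np.ndarray:
    """Sum along batch_dim and clip to [-1, 1] (assumed bundling rule)."""
    return np.clip(batch.sum(axis=batch_dim), -1.0, 1.0)


rng = np.random.default_rng(0)
batch = rng.uniform(-1.0, 1.0, size=(10, 3, 1_000))  # 10 groups of 3

result = bundle_sketch(batch, batch_dim=1)
print(result.shape)  # (10, 1000)
```

The axis named by batch_dim is the one that disappears: each group of 3 collapses into a single hypervector.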

Memory layout and data movement

PyHDC generators always produce CPU data first (they are Python-level sequences that get converted to arrays). When you create a GPU encoding, the generated floats are first assembled into a NumPy array and then transferred to the GPU. This means generation is not purely on-GPU.
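
The two-step path described above can be sketched as follows. This is an illustration of the CPU-first pattern, not PyHDC's generator code: the function name, the uniform distribution, and the float32 dtype are all assumptions.

```python
import numpy as np


def generate_cpu_first(n: int, dimension: int, device=None, seed: int = 0):
    """Build the data on the host, then optionally move it to a device."""
    rng = np.random.default_rng(seed)
    # Step 1: assemble the values as a NumPy array in host memory.
    host = rng.uniform(-1.0, 1.0, size=(n, dimension)).astype(np.float32)
    if device is None:
        return host
    # Step 2: wrap as a tensor and transfer (a host-to-device copy for CUDA).
    import torch
    return torch.from_numpy(host).to(device)


host = generate_cpu_first(4, 16)
print(host.shape)  # (4, 16)
```

Because of the transfer in step 2, generating a large codebook once and reusing it is cheaper than regenerating it repeatedly on a GPU encoding.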

The data-movement methods on Hypervector make transfers explicit:

  • .to_torch(device=None) : NumPy → PyTorch CPU (or specified device)

  • .to_numpy() : PyTorch → NumPy

  • .cuda(device=None) : move to CUDA (shortcut for .to("cuda"))

  • .cpu() : move to CPU

  • .to(device) : move to any device string

Each of these methods copies the data; the original hypervector is left unchanged.

Limitations

  • No automatic gradient tracking : HDC operations do not participate in PyTorch autograd. Gradients do not flow through bind, bundle, or similarity.

  • Mixed backends raise ValueError : you cannot mix NumPy and PyTorch hypervectors in the same operation; convert explicitly first.

  • Generator output is always CPU-first : even for GPU encodings, generation goes through NumPy before being sent to the GPU device.
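
The mixed-backend rule amounts to a guard like the one below. This is a hedged sketch: check_same_backend is a hypothetical helper, and the SimpleNamespace objects merely stand in for hypervectors that expose the backend attribute described earlier.

```python
from types import SimpleNamespace


def check_same_backend(*hvs) -> str:
    """Return the shared backend name, or raise ValueError if mixed."""
    if not hvs:
        raise ValueError("at least one hypervector is required")
    backends = {hv.backend for hv in hvs}
    if len(backends) > 1:
        raise ValueError(f"mixed backends: {sorted(backends)}; convert explicitly first")
    return backends.pop()


# Stand-ins for hypervectors; only the .backend attribute matters here.
hv_np = SimpleNamespace(backend="numpy")
hv_pt = SimpleNamespace(backend="torch")

print(check_same_backend(hv_np, hv_np))  # numpy
```

With real hypervectors, the fix for a mixed pair is to call .to_torch() or .to_numpy() on one operand before the operation.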