How to Wrap Existing Arrays as Hypervectors
enc.from_array() wraps a pre-existing NumPy array or PyTorch tensor as a
Hypervector. The typical use cases are loading saved codebooks
from disk and converting feature vectors from other libraries.
Basic usage
import pyhdc
import numpy as np
enc = pyhdc.MAP_C(dimension=10_000)
# Wrap a NumPy array
arr = np.random.uniform(-1, 1, size=10_000).astype(np.float32)
hv = enc.from_array(arr)
print(hv.shape) # (10000,)
print(hv.backend) # numpy
print(hv.encoding) # MAP_C instance
The array’s first axis (axis 0) must equal the encoding’s dimension; any
trailing axes are treated as batch axes:
bad_arr = np.zeros(5_000)
enc.from_array(bad_arr) # raises DimensionsNotMatchingError (5000 != 10000)
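As a rough mental model of that rule, the sketch below reproduces the shape check in plain NumPy. The helper name check_dimension and its use of ValueError are hypothetical stand-ins. PyHDC itself raises DimensionsNotMatchingError, and its internal check may differ.

```python
import numpy as np


def check_dimension(arr: np.ndarray, dimension: int) -> None:
    """Hypothetical stand-in for the shape check done by from_array().

    Axis 0 is the hypervector dimension D; any trailing axes are batch axes.
    """
    if arr.ndim == 0:
        raise ValueError("expected at least one axis (the dimension)")
    if arr.shape[0] != dimension:
        raise ValueError(
            f"axis 0 has length {arr.shape[0]}, expected {dimension}"
        )


D = 10_000
check_dimension(np.zeros(D, dtype=np.float32), D)         # single vector: OK
check_dimension(np.zeros((D, 100), dtype=np.float32), D)  # (D, N) batch: OK

try:
    check_dimension(np.zeros(5_000), D)                    # wrong length
except ValueError as err:
    print(err)  # axis 0 has length 5000, expected 10000
```

Note that a sample-first (N, D) array fails this check whenever N differs from D, which is usually how a transposed input gets caught.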
Load a saved codebook from disk
# Load a codebook that was saved as a NumPy .npy file
# Shape: (dimension, num_items) -- each column is one hypervector
data = np.load('codebook.npy') # shape (10000, 100)
enc = pyhdc.MAP_C(dimension=10_000)
codebook = enc.from_array(data) # one (10000, 100) batch hypervector
query = enc.generate()
# similarity of query against each of the 100 columns -> (100,) array
scores = enc.similarity(query, codebook)
best_idx = int(scores.argmax())
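In raw NumPy terms, comparing one query against a (D, N) codebook is a reduction over axis 0 that leaves one score per column. The sketch below assumes cosine similarity, a common choice for real-valued MAP hypervectors. MAP_C’s actual metric may differ, and cosine_scores is a hypothetical helper.

```python
import numpy as np


def cosine_scores(query: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Cosine similarity of a (D,) query against every column of a (D, N) batch."""
    dots = query @ codebook                                # (N,) dot products over axis 0
    norms = np.linalg.norm(codebook, axis=0) * np.linalg.norm(query)
    return dots / norms


rng = np.random.default_rng(0)
D, N = 10_000, 100
codebook = rng.uniform(-1, 1, size=(D, N)).astype(np.float32)

query = codebook[:, 42].copy()          # query identical to column 42
scores = cosine_scores(query, codebook)
best_idx = int(scores.argmax())
print(scores.shape, best_idx)           # (100,) 42
```

Random high-dimensional vectors are nearly orthogonal, so every column other than the matching one scores close to zero.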
Use select() to pick individual columns out of the batch by index, and
stack() to concatenate hypervectors column-wise into one (D, N) batch:
subset = codebook.select([0, 2, 4]) # (10000, 3) batch
extended = pyhdc.stack([query, codebook]) # (10000, 101), query as column 0
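The following NumPy sketch shows the shapes involved. It assumes select() and stack() correspond to fancy indexing along axis 1 and column-wise concatenation on the raw arrays. This illustrates shapes only and is not PyHDC’s implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
D, N = 10_000, 100
codebook = rng.uniform(-1, 1, size=(D, N)).astype(np.float32)
query = rng.uniform(-1, 1, size=D).astype(np.float32)

# Assumed equivalent of codebook.select([0, 2, 4]): columns along the batch axis.
subset = codebook[:, [0, 2, 4]]                                   # (10000, 3)

# Assumed equivalent of pyhdc.stack([query, codebook]): query becomes column 0.
extended = np.concatenate([query[:, None], codebook], axis=1)     # (10000, 101)

print(subset.shape, extended.shape)
```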
Wrap a higher-rank (D, N, M) tensor
The same flow extends to tensors with more than one batch axis. from_array
is a thin wrapper: it auto-detects the backend and returns a Hypervector
without transposing or reshaping the data, and apart from the axis-0 length
check it does not validate the axis order. The dimension-first contract
still holds: axis 0 must equal the encoding’s dimension (it is the
hypervector dimension D), and the trailing axes are the batch.
So a (D, N, M) array holds N * M hypervectors, one per trailing-axis
column:
enc = pyhdc.MAP_C(dimension=10_000)
# axis 0 is D, axes 1 and 2 are the batch -> 8 * 4 = 32 hypervectors
data = np.random.uniform(-1, 1, size=(10_000, 8, 4)).astype(np.float32)
tensor = enc.from_array(data) # one (10000, 8, 4) batch hypervector
print(tensor.shape) # (10000, 8, 4)
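To check the N * M count, flatten the batch axes in NumPy. The sketch below assumes C (row-major) order, which is NumPy’s default. Under that order, column k of the flattened (D, N * M) view is the hypervector at trailing index (k // M, k % M).

```python
import numpy as np

D, N, M = 10_000, 8, 4
rng = np.random.default_rng(2)
data = rng.uniform(-1, 1, size=(D, N, M)).astype(np.float32)

# Collapse the two batch axes into one; axis 0 (the dimension) is untouched.
flat = data.reshape(D, N * M)          # (10000, 32)

k = 13
n, m = divmod(k, M)                    # k = n * M + m, so (n, m) = (3, 1)
assert np.array_equal(flat[:, k], data[:, n, m])
print(flat.shape)                      # (10000, 32)
```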
Operate on the wrapped tensor the same way you would a (D, N) batch. Index a
single column with two trailing indices, reduce along a batch axis with
axis=, or compare a query against every column with similarity:
one = tensor[:, 0, 0] # column (0, 0) -> a single (10000,) vector
# bundle along axis 2 (the last batch axis) -> (10000, 8)
per_row = enc.bundle(tensor, axis=2)
# bundle along both batch axes (1, 2) -> a single (10000,) vector
total = enc.bundle(tensor, axis=(1, 2))
query = enc.generate()
# query against every column -> (8, 4) score array, one score per column
scores = enc.similarity(query, tensor)
The trailing axes carry through every operation. Bundling with axis= reduces
the axes you name and leaves axis 0 (the dimension) intact. similarity
reduces over axis 0 and returns one score per surviving trailing column.
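Here is the same shape bookkeeping in plain NumPy, showing how the trailing axes survive. It assumes MAP-style bundling is an elementwise sum and that similarity is cosine similarity over axis 0. PyHDC’s MAP_C may normalize or clip differently, so treat this as a shape-level sketch only.

```python
import numpy as np

D, N, M = 10_000, 8, 4
rng = np.random.default_rng(3)
tensor = rng.uniform(-1, 1, size=(D, N, M)).astype(np.float32)
query = rng.uniform(-1, 1, size=D).astype(np.float32)

one = tensor[:, 0, 0]               # (10000,) single column
per_row = tensor.sum(axis=2)        # assumed bundle along axis 2 -> (10000, 8)
total = tensor.sum(axis=(1, 2))     # assumed bundle along axes 1, 2 -> (10000,)

# Cosine similarity: reduce over axis 0 (d) and keep the trailing (n, m) axes.
dots = np.einsum("d,dnm->nm", query, tensor)                      # (8, 4)
scores = dots / (np.linalg.norm(query) * np.linalg.norm(tensor, axis=0))

print(one.shape, per_row.shape, total.shape, scores.shape)
```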
Wrap a PyTorch tensor
from_array auto-detects whether the input is a NumPy array or PyTorch
tensor:
import torch
t = torch.randn(10_000, dtype=torch.float32)
enc_torch = pyhdc.MAP_C(dimension=10_000, backend="torch")
hv = enc_torch.from_array(t)
print(hv.backend) # torch
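One way to auto-detect the backend without making PyTorch a hard dependency is to inspect the input’s type. The sketch below shows the general technique and is an assumption, not PyHDC’s code. detect_backend is a hypothetical helper. It checks only the module name, so it runs even when torch is not installed.

```python
import numpy as np


def detect_backend(x) -> str:
    """Hypothetical backend sniffing: returns 'numpy' or 'torch', else TypeError."""
    if isinstance(x, np.ndarray):
        return "numpy"
    # torch.Tensor is defined in the 'torch' package; checking the module name
    # avoids importing torch when the caller only uses NumPy.
    if type(x).__module__.split(".")[0] == "torch":
        return "torch"
    raise TypeError(f"unsupported array type: {type(x).__name__}")


print(detect_backend(np.zeros(3)))     # numpy
```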
Extract the underlying array
Access .data to get the raw NumPy array or PyTorch tensor back:
arr_back = hv.data # numpy.ndarray or torch.Tensor
You can use this to pass hypervectors to libraries that do not know about PyHDC, such as scikit-learn or matplotlib.
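One practical detail when passing .data to other libraries is orientation. scikit-learn estimators expect a 2-D (n_samples, n_features) array, which is the transpose of the dimension-first (D, N) layout. Here is a minimal sketch of the conversion. to_samples_first is a hypothetical helper that also flattens any extra batch axes.

```python
import numpy as np


def to_samples_first(data: np.ndarray) -> np.ndarray:
    """Turn a dimension-first (D, ...) array into (num_vectors, D) rows."""
    D = data.shape[0]
    # Move the dimension axis last, then flatten every batch axis into rows.
    return np.moveaxis(data, 0, -1).reshape(-1, D)


codebook = np.random.default_rng(4).uniform(-1, 1, size=(10_000, 100))
X = to_samples_first(codebook)
print(X.shape)                         # (100, 10000): one row per hypervector
```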
Dtype notes
The dtype of the wrapped array should match what the encoding expects.
Mismatches generate a warning but do not raise an error. For example,
MAP_C expects float32; wrapping float64 will still work but may
incur an implicit conversion.
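The standard warnings module can express the warn-then-convert behavior described above. Everything here is an illustrative assumption. coerce_dtype is a hypothetical helper, and PyHDC may convert at a different point or emit a different warning category.

```python
import warnings

import numpy as np


def coerce_dtype(arr: np.ndarray, expected=np.float32) -> np.ndarray:
    """Warn on a dtype mismatch, then convert instead of raising."""
    if arr.dtype != expected:
        warnings.warn(
            f"expected {np.dtype(expected)}, got {arr.dtype}; converting",
            UserWarning,
            stacklevel=2,
        )
        return arr.astype(expected)   # the conversion makes a copy
    return arr                        # already the right dtype: no copy


x64 = np.random.default_rng(5).uniform(-1, 1, size=10_000)   # float64
x32 = coerce_dtype(x64)
print(x32.dtype)                                              # float32
```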