How to Encode Data into Hypervectors ==================================== An *encoding* fixes the algebra (bundle, bind, similarity). A *data encoder* turns raw values into hypervectors. PyHDC provides two families: * **Codebook encoders** (``Level``, ``Circular``, ``Thermometer``, ``Empty``, ``Identity``, ``Random``) hold a precomputed ``(D, L)`` basis. A value picks the nearest level and the encoder returns that column. * **Functional encoders** (``Projection``, ``Sinusoid``, ``Density``, ``FractionalPower``) transform a feature vector. Every encoder wraps one :class:`~pyhdc.Encoding` instance and is dimension-first: a scalar encodes to ``(D,)``, a batch of ``B`` values to ``(D, B)``. ``encoder.encode(value)`` and ``encoder(value)`` are the same call. For the concepts behind the two families, see :doc:`../user_manual/data_encoders`. Encode a scalar with Level -------------------------- :class:`~pyhdc.Level` spaces ``levels`` hypervectors so that nearby values map to correlated codes and distant values to near-orthogonal ones: .. code-block:: python import pyhdc enc = pyhdc.MAP_I(dimension=10_000) level = pyhdc.Level(enc, levels=100, low=0.0, high=1.0) hv = level.encode(0.5) # one (10000,) hypervector print(hv.shape) # (10000,) Similarity to a fixed hypervector falls monotonically as the value moves away: .. code-block:: python zero = level.encode(0.0) print(enc.similarity(zero, level.encode(0.1))) # ~= 0.90 print(enc.similarity(zero, level.encode(0.5))) # ~= 0.51 print(enc.similarity(zero, level.encode(1.0))) # ~= 0.02 Values outside ``[low, high]`` clamp to the nearest endpoint. Encode many values at once -------------------------- Pass a list (or any 1-D array) to encode a whole batch in one call. Each value becomes one column: .. code-block:: python batch = level.encode([0.0, 0.25, 0.5, 0.75, 1.0]) print(batch.shape) # (10000, 5) Periodic values with Circular ----------------------------- :class:`~pyhdc.Circular` wraps the level index modulo ``levels``, so the top of the range rejoins the bottom. Use it for angles, hours, days, or any cyclic quantity: .. code-block:: python hours = pyhdc.Circular(enc, levels=24, low=0.0, high=24.0) print(enc.similarity(hours.encode(23.0), hours.encode(0.0))) # ~= 0.9 (adjacent across the wrap) print(enc.similarity(hours.encode(0.0), hours.encode(12.0))) # ~= 0.0 (opposite on the ring) Discrete encoders: Thermometer and Density ------------------------------------------- :class:`~pyhdc.Thermometer` (a cumulative code) and :class:`~pyhdc.Density` (a population code) are defined for the discrete families (MAP_I, MAP_B, BSC, and the BSDC family). Build them on a discrete encoding: .. code-block:: python binary = pyhdc.MAP_B(dimension=10_000) therm = pyhdc.Thermometer(binary, levels=20, low=0.0, high=1.0) dens = pyhdc.Density(binary, low=0.0, high=1.0) print(therm.encode(0.5).shape) # (10000,) print(dens.encode(0.5).shape) # (10000,) Constructing either on a continuous or phase family (MAP_C, the HRR family, VTB, MBAT, FHRR) raises ``NotImplementedError``, because those domains have no two endpoint elements to interpolate between. Encode a feature vector with Projection --------------------------------------- :class:`~pyhdc.Projection` applies a fixed random linear map to a length-``F`` feature vector, then the encoding's normalize step. It accepts a single vector or a batch: .. code-block:: python import numpy as np proj = pyhdc.Projection(enc, features=8) print(proj.encode(np.random.rand(8)).shape) # (10000,) print(proj.encode(np.random.rand(8, 5)).shape) # (10000, 5) :class:`~pyhdc.Sinusoid` is the related random-Fourier-feature map for the cosine and HRR families. ``Projection`` needs a family with a normalize step (MAP, HRR, VTB, MBAT, FHRR) and raises ``NotImplementedError`` on BSC and BSDC. ``Empty``, ``Random``, and ``Identity`` round out the set as structural codebooks: all-zero, independent-random, and the binding-identity element respectively. They take the same ``(encoding, levels, low, high)`` constructor as ``Level``. Putting it together: a value-feature classifier ------------------------------------------------ Encoders compose with the core operations. The pattern below binds each feature value to a per-feature key, bundles the bound pairs into one record hypervector, bundles records into a class prototype, and classifies a whole test batch in a single :doc:`cross-similarity ` call: .. code-block:: python import numpy as np import pyhdc np.random.seed(0) enc = pyhdc.MAP_I(dimension=10_000) F = 4 # features per record keys = [enc.generate() for _ in range(F)] # one key hypervector per feature value = pyhdc.Level(enc, levels=64, low=0.0, high=1.0) def encode_record(row): pairs = [pyhdc.bind(keys[i], value.encode(float(row[i]))) for i in range(F)] return pyhdc.bundle(*pairs) rng = np.random.default_rng(0) class0 = rng.normal(0.3, 0.05, size=(3, F)) # a low-valued class class1 = rng.normal(0.7, 0.05, size=(3, F)) # a high-valued class protos = pyhdc.stack([ pyhdc.bundle(*[encode_record(r) for r in class0]), pyhdc.bundle(*[encode_record(r) for r in class1]), ]) # (10000, 2) test = np.vstack([rng.normal(0.3, 0.05, size=(3, F)), rng.normal(0.7, 0.05, size=(3, F))]) testhv = pyhdc.stack([encode_record(r) for r in test]) # (10000, 6) scores = protos.similarity(testhv, mode="cross") # (2, 6) print(np.asarray(scores).argmax(axis=0)) # [0 0 0 1 1 1] The first three test records score highest against prototype 0 and the last three against prototype 1, which is the correct split. See also -------- * :doc:`choose_encoding` : which encoding family to build the encoder on, and the per-encoder family constraints. * :doc:`compute_similarity` : pairwise and cross similarity, including the ``mode="cross"`` matrix used above. * :doc:`../user_manual/data_encoders` : the encoder object model and the family-aware basis builders.