How to Wrap Existing Arrays as Hypervectors
=============================================

``enc.from_array()`` wraps a pre-existing NumPy array or PyTorch tensor as a
:class:`~pyhdc.Hypervector`. The typical use cases are loading saved codebooks
from disk and converting feature vectors from other libraries.

Basic usage
-----------

.. code-block:: python

   import pyhdc
   import numpy as np

   enc = pyhdc.MAP_C(dimension=10_000)

   # Wrap a NumPy array
   arr = np.random.uniform(-1, 1, size=10_000).astype(np.float32)
   hv  = enc.from_array(arr)

   print(hv.shape)    # (10000,)
   print(hv.backend)  # numpy
   print(hv.encoding) # MAP_C instance

The length of axis 0 must equal the encoding's ``dimension``, so a 1-D array
needs exactly ``dimension`` elements:

.. code-block:: python

   bad_arr = np.zeros(5_000)   # axis 0 has length 5000, not 10000
   enc.from_array(bad_arr)     # raises DimensionsNotMatchingError
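
The check can be pictured as a comparison against the length of axis 0, in
line with the dimension-first layout used throughout this guide. The helper
below is a NumPy-only sketch: ``check_dimension`` is a hypothetical name, and
it raises a plain ``ValueError`` rather than PyHDC's own exception. It is not
the library's implementation.

.. code-block:: python

   import numpy as np

   def check_dimension(arr, dimension):
       """Hypothetical helper: require axis 0 of ``arr`` to have length ``dimension``."""
       if arr.ndim == 0 or arr.shape[0] != dimension:
           raise ValueError(
               f"expected axis 0 of length {dimension}, got shape {arr.shape}"
           )
       return arr

   check_dimension(np.zeros(10_000), 10_000)          # 1-D vector passes
   check_dimension(np.zeros((10_000, 100)), 10_000)   # (D, N) batch passes

   try:
       check_dimension(np.zeros(5_000), 10_000)
   except ValueError as exc:
       message = str(exc)   # names the expected length and the actual shape

Running a check like this before ``from_array`` can give an earlier and more
specific error message when arrays come from an external source.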

Load a saved codebook from disk
---------------------------------

.. code-block:: python

   import numpy as np
   import pyhdc

   # Load a codebook that was saved as a NumPy .npy file
   # Shape: (dimension, num_items) -- each column is one hypervector
   data = np.load('codebook.npy')   # shape (10000, 100)

   enc      = pyhdc.MAP_C(dimension=10_000)
   codebook = enc.from_array(data)   # one (10000, 100) batch hypervector

   query    = enc.generate()
   # similarity of query against each of the 100 columns -> (100,) array
   scores   = enc.similarity(query, codebook)
   best_idx = int(scores.argmax())
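
If you do not yet have a ``codebook.npy`` to experiment with, one can be
written with plain NumPy. The sketch below assumes the dimension-first
``(D, N)`` layout described above. The random values are stand-ins for a real
codebook, and the file goes to a temporary directory. It also shows the
transpose needed when a file stores one hypervector per row instead.

.. code-block:: python

   import os
   import tempfile

   import numpy as np

   D, N = 10_000, 100
   rng = np.random.default_rng(0)

   # Stand-in codebook: one random hypervector per column, shape (D, N)
   codebook_arr = rng.uniform(-1, 1, size=(D, N)).astype(np.float32)

   path = os.path.join(tempfile.mkdtemp(), "codebook.npy")
   np.save(path, codebook_arr)

   loaded = np.load(path)   # same shape and dtype as what was saved

   # A file stored one hypervector per row, shape (N, D), needs a transpose
   # to reach the dimension-first layout before wrapping.
   row_major = codebook_arr.T                      # pretend this came from disk
   dim_first = np.ascontiguousarray(row_major.T)   # back to (D, N)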

Use :meth:`~pyhdc.Hypervector.select` to pick columns by index along the batch
axis, and :func:`~pyhdc.stack` to join hypervectors along that axis into one
``(D, N)`` batch:

.. code-block:: python

   subset = codebook.select([0, 2, 4])   # (10000, 3) batch
   extended = pyhdc.stack([query, codebook])   # (10000, 101), query as column 0

Wrap a higher-rank ``(D, N, M)`` tensor
----------------------------------------

The same flow extends to tensors with more than one batch axis. ``from_array``
is a thin wrapper: it detects the backend and returns a
:class:`~pyhdc.Hypervector` without transposing or reshaping the data, and it
does not validate the axis order. The dimension-first contract still holds:
**axis 0 must have length** ``dimension`` (it is the hypervector dimension
``D``), and the trailing axes are the batch. An array stored batch-first, such
as ``(N, M, D)``, therefore has to be moved to dimension-first order before it
is wrapped.

So a ``(D, N, M)`` array holds ``N * M`` hypervectors, one per trailing-axis
column:

.. code-block:: python

   enc = pyhdc.MAP_C(dimension=10_000)

   # axis 0 is D, axes 1 and 2 are the batch -> 8 * 4 = 32 hypervectors
   data   = np.random.uniform(-1, 1, size=(10_000, 8, 4)).astype(np.float32)
   tensor = enc.from_array(data)   # one (10000, 8, 4) batch hypervector

   print(tensor.shape)   # (10000, 8, 4)

Operate on the wrapped tensor the same way you would a ``(D, N)`` batch. Index a
single column with two trailing indices, reduce along a batch axis with
``axis=``, or compare a query against every column with ``similarity``:

.. code-block:: python

   one = tensor[:, 0, 0]          # column (0, 0) -> a single (10000,) vector

   # bundle along axis 2 (the last batch axis) -> (10000, 8)
   per_row = enc.bundle(tensor, axis=2)

   # bundle along both batch axes (1, 2) -> a single (10000,) vector
   total = enc.bundle(tensor, axis=(1, 2))

   query  = enc.generate()
   # query against every column -> (8, 4) score array, one score per column
   scores = enc.similarity(query, tensor)

The trailing axes carry through every operation. Bundling with ``axis=`` reduces
the axes you name and leaves axis 0 (the dimension) intact. ``similarity``
reduces over axis 0 and returns one score per surviving trailing column.
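
These shape rules can be checked with plain NumPy. The sketch below assumes
that bundling behaves like an element-wise sum and that similarity is cosine
similarity. PyHDC's ``MAP_C`` may normalise or clip differently, so only the
shapes are meant to carry over, not the values.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(0)
   D = 10_000
   tensor = rng.uniform(-1, 1, size=(D, 8, 4)).astype(np.float32)
   query = rng.uniform(-1, 1, size=D).astype(np.float32)

   # Reducing a batch axis leaves axis 0 (the dimension) untouched
   per_row = tensor.sum(axis=2)       # shape (D, 8)
   total = tensor.sum(axis=(1, 2))    # shape (D,)

   # Cosine similarity contracts axis 0 and keeps the trailing batch axes
   dots = np.tensordot(query, tensor, axes=([0], [0]))              # shape (8, 4)
   norms = np.linalg.norm(query) * np.linalg.norm(tensor, axis=0)   # shape (8, 4)
   scores = dots / norms

   # (n, m) position of the best-matching column in the batch
   best = np.unravel_index(scores.argmax(), scores.shape)

Because the score array keeps the batch shape, ``np.unravel_index`` is a
convenient way to turn a flat ``argmax`` back into a pair of batch indices.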

Wrap a PyTorch tensor
----------------------

``from_array`` auto-detects whether the input is a NumPy array or a PyTorch
tensor:

.. code-block:: python

   import torch

   t  = torch.randn(10_000, dtype=torch.float32)
   enc_torch = pyhdc.MAP_C(dimension=10_000, backend="torch")
   hv = enc_torch.from_array(t)

   print(hv.backend)   # torch

Extract the underlying array
-----------------------------

Access ``.data`` to get the raw NumPy array or PyTorch tensor back:

.. code-block:: python

   arr_back = hv.data   # numpy.ndarray or torch.Tensor

You can use this to pass hypervectors to libraries that do not know about
PyHDC, such as scikit-learn or matplotlib.
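
Many of these libraries expect one sample per row. scikit-learn estimators, for
example, take arrays of shape ``(n_samples, n_features)``, so a dimension-first
batch usually needs a transpose first. The sketch below uses a plain NumPy
array as a stand-in for ``codebook.data``; the variable names are illustrative
only.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(0)

   # Stand-in for ``codebook.data``: a (D, N) array, one hypervector per column
   raw = rng.uniform(-1, 1, size=(10_000, 100)).astype(np.float32)

   # Row-per-sample layout, matching (n_samples, n_features)
   X = raw.T                        # shape (100, 10000); a view, no copy

   # Some consumers want C-contiguous memory; this makes an explicit copy
   X_c = np.ascontiguousarray(X)

The transpose is a view onto the same memory, so writing to ``X`` also changes
``raw``. Use the contiguous copy when the two need to be independent.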

Dtype notes
-----------

The dtype of the wrapped array should match what the encoding expects. A
mismatch emits a warning rather than raising an error. For example, ``MAP_C``
expects ``float32``; wrapping a ``float64`` array still works but may incur an
implicit conversion.
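
Converting explicitly before wrapping keeps the conversion visible and should
avoid the mismatch. This is a NumPy-only sketch, and ``as_float32`` is a
hypothetical helper, not part of PyHDC.

.. code-block:: python

   import numpy as np

   def as_float32(arr):
       """Hypothetical helper: return ``arr`` as float32, copying only if needed."""
       return np.asarray(arr).astype(np.float32, copy=False)

   arr64 = np.zeros(10_000)      # NumPy defaults to float64
   arr32 = as_float32(arr64)     # converted copy in float32

   same = as_float32(arr32)      # already float32: the input is returned as-is

With ``copy=False``, ``astype`` returns the input array itself when the dtype
already matches, so calling the helper on data that is already ``float32``
costs nothing.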