Dual Backend Architecture
=========================

PyHDC supports two backends for its array operations: **NumPy** (the default)
and **PyTorch**. Every encoding, operation, and hypervector works with either
backend, and the API surface is identical.

Design rationale
-----------------

NumPy is the universal default because it has no install dependencies beyond
Python and works on every platform. PyTorch is supported for three reasons:

1. **GPU acceleration**: CUDA tensors enable large-scale HDC (50,000+
   dimensions, millions of codebook entries) to run in seconds instead of
   minutes.
2. **Integration with deep learning pipelines**: if you are using HDC as a
   layer in a neural network or alongside PyTorch models, keeping everything
   in the same tensor ecosystem avoids data copies.
3. **Vectorised batch operations**: PyTorch's broadcasting and GPU-native
   matmul make batched bundling and similarity faster for large batches;
   GPU generation of 10,000 x 10,000 codebooks takes under a second versus
   several minutes on CPU.

The ``TORCH_AVAILABLE`` flag
-----------------------------

At import time, PyHDC tries to ``import torch``. If the import succeeds,
``pyhdc.TORCH_AVAILABLE`` is ``True``. If it fails, the flag is ``False`` and
every ``backend="torch"`` request raises ``ImportError``.

.. code-block:: python

   import pyhdc

   print(pyhdc.TORCH_AVAILABLE)   # True or False
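
The flag follows the usual optional-dependency pattern. The sketch below is
not PyHDC's source: ``resolve_backend`` is a hypothetical helper, and it probes
for PyTorch with ``importlib.util.find_spec`` instead of a full import, purely
to keep the example cheap to run. It shows how one module-level flag can turn a
``backend="torch"`` request into an ``ImportError`` when PyTorch is missing.

.. code-block:: python

   import importlib.util

   # True when a top-level "torch" package can be located, False otherwise.
   TORCH_AVAILABLE = importlib.util.find_spec("torch") is not None


   def resolve_backend(requested="numpy"):
       """Validate a backend name (hypothetical helper, not PyHDC API)."""
       if requested not in ("numpy", "torch"):
           raise ValueError(f"unknown backend: {requested!r}")
       if requested == "torch" and not TORCH_AVAILABLE:
           raise ImportError("backend='torch' requested but PyTorch is not installed")
       return requested


   print(resolve_backend("numpy"))   # numpy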

The ``BackendManager``
-----------------------

The internal :class:`~pyhdc.BackendManager` class is a static utility that
dispatches array operations to the correct backend. Advanced users who are
extending PyHDC (writing custom encodings or components) may use it directly:

* ``BackendManager.to_numpy(array)`` : convert to NumPy
* ``BackendManager.to_torch(array, device)`` : convert to PyTorch tensor
* ``BackendManager.get_device(hypervector)`` : return the device string

Most users never need to call ``BackendManager`` directly: the encoding and
``Hypervector`` methods handle dispatch automatically.

Backend selection
------------------

Set the backend at encoding construction time:

.. code-block:: python

   enc_np  = pyhdc.MAP_C(dimension=10_000)                           # numpy
   enc_cpu = pyhdc.MAP_C(dimension=10_000, backend="torch")          # torch CPU
   enc_gpu = pyhdc.MAP_C(dimension=10_000, backend="torch",
                          device="cuda")                              # torch GPU

All operations on hypervectors generated by ``enc_gpu`` run on the GPU.

Global backend and device defaults
------------------------------------

Rather than pass ``backend=`` and ``device=`` to every encoding, you can set
process-wide defaults. Any encoding constructed without an explicit
``backend``/``device`` argument inherits these:

.. code-block:: python

   import pyhdc

   pyhdc.prefer_torch(device=None)   # default backend -> torch (CPU unless device given)
   pyhdc.prefer_cuda(index=None)     # default backend -> torch on CUDA (optional device index)
   pyhdc.prefer_numpy()              # default backend -> numpy
   pyhdc.prefer_cpu()                # default device -> CPU

   pyhdc.get_default_backend()       # current default backend ("numpy" or "torch")
   pyhdc.get_default_device()        # current default device (None or a device string)

   enc = pyhdc.MAP_C(dimension=10_000)   # inherits the current defaults

``prefer_torch`` and ``prefer_cuda`` raise an exception if PyTorch (or CUDA) is
unavailable.
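
A common use of these setters is to pick the fastest backend that is actually
available. The sketch below takes the imported ``pyhdc`` module as an argument
and calls only the functions listed above. ``pick_default_backend`` is a
hypothetical helper, and because the exception types are not specified here it
catches ``Exception`` broadly, which is an assumption you may want to narrow.

.. code-block:: python

   def pick_default_backend(lib):
       """Try CUDA, then torch on CPU, then NumPy; return the resulting defaults.

       ``lib`` is expected to be the imported ``pyhdc`` module.
       """
       attempts = (
           lambda: lib.prefer_cuda(index=None),    # torch on CUDA
           lambda: lib.prefer_torch(device=None),  # torch on CPU
       )
       for attempt in attempts:
           try:
               attempt()
               break                    # first setter that succeeds wins
           except Exception:            # assumed: exact exception type unknown
               continue
       else:
           lib.prefer_numpy()           # neither torch option was usable
       return lib.get_default_backend(), lib.get_default_device()

Calling ``pick_default_backend(pyhdc)`` once at program start, before any
encoding is constructed, lets later constructors inherit the chosen defaults.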

What changes between backends
-------------------------------

From the user's perspective, almost nothing changes. The only observable
differences are:

* ``hv.data`` returns ``numpy.ndarray`` (NumPy) or ``torch.Tensor`` (PyTorch)
* ``hv.device`` returns ``None`` (NumPy) or a device string (PyTorch)
* ``hv.backend`` returns ``"numpy"`` or ``"torch"``
* PyTorch similarity on a GPU tensor returns a ``torch.Tensor``, not a Python
  float. Wrap a single-element result with ``float()`` when a plain number is
  needed.
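
When downstream code expects plain Python numbers regardless of backend, a
small normaliser can help. The sketch below relies on the ``tolist()`` method
that both ``numpy.ndarray`` and ``torch.Tensor`` provide; ``to_python`` is a
hypothetical helper name, not part of PyHDC.

.. code-block:: python

   def to_python(score):
       """Convert a similarity result to a float or a (nested) list of floats."""
       if hasattr(score, "tolist"):
           # NumPy arrays/scalars and torch tensors (CPU or GPU) expose tolist().
           return score.tolist()
       return float(score)

Unlike ``float()``, this also handles batched results that hold more than one
score.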

PyTorch batching
-----------------

PyTorch enables larger and faster batched operations:

**Batched generation**

.. code-block:: python

   enc   = pyhdc.MAP_C(dimension=10_000, backend="torch", device="cuda")
   batch = enc.generate(size=(10_000, 1000))   # shape (10000, 1000) tensor

**Batched similarity**

.. code-block:: python

   A = enc.generate(size=(10_000, 500))   # shape (10000, 500)
   B = enc.generate(size=(10_000, 500))   # shape (10000, 500)
   sims = enc.similarity(A, B)            # shape (500,): one score per column pair

**Batched bundling**

.. code-block:: python

   batch = enc.generate(size=(10_000, 3))   # 3 hypervectors as columns,
                                            # shape is (10000, 3)
   result = enc.bundle(batch)
   print(result.shape)   # (10000,): one bundled prototype
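
To make the column convention concrete, here is a plain NumPy sketch of what
column-wise batched similarity and bundling compute. It assumes cosine
similarity and bundling by element-wise summation, which is typical of
MAP-style encodings; PyHDC's own implementation may normalise or clip
differently, and the helper names are illustrative only.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(0)
   D, N = 10_000, 5                            # dimension, batch size
   A = rng.uniform(-1.0, 1.0, size=(D, N))     # N hypervectors as columns
   B = rng.uniform(-1.0, 1.0, size=(D, N))


   def columnwise_cosine(a, b):
       """Cosine similarity between matching columns of a and b."""
       dots = np.einsum("ij,ij->j", a, b)      # one dot product per column
       norms = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0)
       return dots / norms


   def bundle_columns(batch):
       """Bundle by summing the columns element-wise (no normalisation)."""
       return batch.sum(axis=1)


   sims = columnwise_cosine(A, B)
   proto = bundle_columns(A)
   print(sims.shape)    # (5,)
   print(proto.shape)   # (10000,)

Independent random hypervectors are nearly orthogonal at this dimension, so the
entries of ``sims`` sit close to zero, while comparing a batch with itself
gives scores of one.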

Memory layout and data movement
---------------------------------

PyHDC generators always produce CPU data first (they are Python-level
sequences that get converted to arrays). When you create a GPU encoding, the
generated floats are first assembled into a NumPy array and then transferred
to the GPU. This means generation is not purely on-GPU.

The data-movement methods on ``Hypervector`` make transfers explicit:

* ``.to_torch(device=None)`` : NumPy -> PyTorch CPU (or specified device)
* ``.to_numpy()`` : PyTorch -> NumPy
* ``.cuda(device=None)`` : move to CUDA (shortcut for ``.to("cuda")``)
* ``.cpu()`` : move to CPU
* ``.to(device)`` : move to any device string

These all copy the data; the original hypervector is unchanged.

Limitations
-----------

* **No automatic gradient tracking** : HDC operations do not participate in
  PyTorch autograd. Gradients do not flow through bind, bundle, or similarity.
* **Mixed backends raise ValueError** : you cannot mix NumPy and PyTorch
  hypervectors in the same operation; convert explicitly first.
* **Generator output is always CPU-first** : even for GPU encodings,
  generation goes through NumPy before being sent to the GPU device.
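
Converting explicitly before a mixed operation can be wrapped in a small
helper. The sketch below uses only the ``Hypervector`` attributes and
data-movement methods described on this page, and assumes each method returns a
new hypervector; ``match_backend`` is a hypothetical name, not part of PyHDC.

.. code-block:: python

   def match_backend(reference, other):
       """Return ``other`` converted to the backend and device of ``reference``."""
       if other.backend == reference.backend and other.device == reference.device:
           return other                                    # already compatible
       if reference.backend == "numpy":
           return other.to_numpy()                         # torch -> NumPy
       if other.backend == "numpy":
           return other.to_torch(device=reference.device)  # NumPy -> torch
       return other.to(reference.device)                   # torch -> other device

The device comparison is a plain string check, so equivalent spellings such as
``"cuda"`` and ``"cuda:0"`` may trigger a redundant but harmless copy.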