Five-Minute Quickstart
======================

The five steps below cover the core PyHDC workflow: picking an encoding,
generating hypervectors, applying the three primitive operations, building a
tiny item memory, and switching backends. The script at the end of this page
fits in under 30 lines of code.


Step 1: Pick an encoding
------------------------

An *encoding* defines how hypervectors are generated and how bundle, bind, and
similarity are implemented. PyHDC ships 15 encoding classes. ``MAP_B`` is the
recommended starting point: it uses bipolar values in {-1, 1} and supports all
operations, including exact unbinding.

.. code-block:: python

   import pyhdc

   enc = pyhdc.MAP_B(dimension=10_000)

The only required parameter is ``dimension``, the length of every hypervector
produced by this encoding. 10,000 is a common default. Lower values (such as
1,000) are faster but noisier; higher values (such as 50,000) are more accurate
but use more memory.
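
To make "noisier" concrete, the sketch below measures how much the similarity
between unrelated random bipolar vectors scatters around 0 at two dimensions.
It is plain Python rather than PyHDC code, and it assumes that similarity is a
normalized dot product; the helper names ``random_bipolar``, ``similarity``, and
``similarity_spread`` are illustrative only. Under that assumption the scatter
should shrink roughly as ``1 / sqrt(dimension)``.

.. code-block:: python

   import math
   import random
   import statistics

   rng = random.Random(0)

   def random_bipolar(d):
       """Random vector with entries drawn uniformly from {-1, +1}."""
       return [rng.choice((-1, 1)) for _ in range(d)]

   def similarity(u, v):
       """Normalized dot product (assumed metric, not taken from PyHDC)."""
       return sum(x * y for x, y in zip(u, v)) / len(u)

   def similarity_spread(d, pairs=50):
       """Standard deviation of the similarity between unrelated vectors."""
       sims = [similarity(random_bipolar(d), random_bipolar(d))
               for _ in range(pairs)]
       return statistics.pstdev(sims)

   spread = {d: similarity_spread(d) for d in (1_000, 10_000)}
   for d, measured in spread.items():
       print(d, "measured:", round(measured, 4),
             "theory:", round(1 / math.sqrt(d), 4))

A smaller spread means that "unrelated" and "related" similarities are easier
to tell apart, which is why larger dimensions give more reliable queries.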


Step 2: Generate hypervectors
------------------------------

.. code-block:: python

   v = enc.generate()

   print(v)            # Hypervector(shape=(10000,), backend=numpy, encoding=MAP_B)
   print(v.shape)      # (10000,)
   print(v.dtype)      # int8
   print(v.backend)    # numpy

Each call to ``.generate()`` draws a fresh random hypervector. Two
independently generated hypervectors are nearly orthogonal with high
probability: at ``dimension=10_000`` their similarity typically lies within a
few hundredths of 0.

You can generate a *batch* of hypervectors in one call. Batches are
dimension-first: a batch of ``N`` vectors has shape ``(D, N)``, one hypervector
per column.

.. code-block:: python

   batch = enc.generate(size=(10_000, 100))
   print(batch.shape)   # (10000, 100)


Step 3: The three operations
-----------------------------

Similarity
^^^^^^^^^^

Returns a scalar in [-1, 1]. Use it to measure how related two hypervectors
are (0 ~= unrelated, 1 = identical).

.. code-block:: python

   a = enc.generate()
   b = enc.generate()
   c = a   # same object

   print(a.similarity(b))   # ~= 0.0  # unrelated
   print(a.similarity(c))   # 1.0     # identical
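
For intuition, here is a minimal plain-Python sketch of a similarity measure
with the same range. It assumes cosine similarity, which is consistent with the
[-1, 1] range above, but the metric that ``MAP_B`` actually uses is not stated
on this page. ``random_bipolar`` and ``cosine_similarity`` are illustrative
helpers, not PyHDC functions.

.. code-block:: python

   import math
   import random

   rng = random.Random(1)
   D = 10_000

   def random_bipolar(d=D):
       """Random vector with entries drawn uniformly from {-1, +1}."""
       return [rng.choice((-1, 1)) for _ in range(d)]

   def cosine_similarity(u, v):
       """Dot product divided by the product of the vector lengths."""
       dot = sum(x * y for x, y in zip(u, v))
       norm_u = math.sqrt(sum(x * x for x in u))
       norm_v = math.sqrt(sum(y * y for y in v))
       return dot / (norm_u * norm_v)

   a = random_bipolar()
   b = random_bipolar()
   opposite = [-x for x in a]

   print(cosine_similarity(a, a))         # 1.0: identical
   print(cosine_similarity(a, opposite))  # -1.0: every component flipped
   print(cosine_similarity(a, b))         # close to 0: unrelated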

Bundling
^^^^^^^^

Produces a hypervector that is *similar to all inputs*. Think of it as a fuzzy
set union.

.. code-block:: python

   x = enc.generate()
   y = enc.generate()
   z = enc.generate()

   bundle = x.bundle(y, z)

   print(bundle.similarity(x))   # ~= 0.6
   print(bundle.similarity(y))   # ~= 0.6
   print(bundle.similarity(z))   # ~= 0.6
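
The value of roughly 0.6 is what one would expect if bundling is an
element-wise sum and similarity is cosine: for three unrelated bipolar inputs
the expected similarity to each input is ``1 / sqrt(3)``, about 0.577. The
sketch below checks this in plain Python. Both choices are assumptions made for
illustration; PyHDC's ``MAP_B`` may implement bundling differently (a majority
vote, for example, would give about 0.5 instead).

.. code-block:: python

   import math
   import random

   rng = random.Random(2)
   D = 10_000

   def random_bipolar(d=D):
       """Random vector with entries drawn uniformly from {-1, +1}."""
       return [rng.choice((-1, 1)) for _ in range(d)]

   def cosine_similarity(u, v):
       dot = sum(p * q for p, q in zip(u, v))
       norm_u = math.sqrt(sum(p * p for p in u))
       norm_v = math.sqrt(sum(q * q for q in v))
       return dot / (norm_u * norm_v)

   def bundle(*vectors):
       """Element-wise sum of the inputs (assumed; no thresholding)."""
       return [sum(components) for components in zip(*vectors)]

   x, y, z, outsider = (random_bipolar() for _ in range(4))
   bundled = bundle(x, y, z)

   # Members should land near 1/sqrt(3); the outsider should land near 0.
   for name, hv in [("x", x), ("y", y), ("z", z), ("outsider", outsider)]:
       print(name, round(cosine_similarity(bundled, hv), 3))

   print("expected for members:", round(1 / math.sqrt(3), 3))  # 0.577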

Binding and unbinding
^^^^^^^^^^^^^^^^^^^^^

Binding produces a hypervector that is *dissimilar to both inputs*, but from
which either input can be recovered if you have the other (unbinding).

.. code-block:: python

   key   = enc.generate()
   value = enc.generate()

   record = key.bind(value)

   print(record.similarity(key))    # ~= 0.0: dissimilar to both
   print(record.similarity(value))  # ~= 0.0

   recovered = record.unbind(key)
   print(recovered.similarity(value))  # ~= 1.0: value recovered
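
One way to see why unbinding can be exact for bipolar vectors: if binding is
an element-wise product, then multiplying by the same key a second time cancels
it, because every component of ``key * key`` equals 1. The plain-Python sketch
below assumes this product-based binding, a common choice for bipolar
encodings; it is not taken from the PyHDC source, and ``bind`` here is an
illustrative helper.

.. code-block:: python

   import random

   rng = random.Random(3)
   D = 10_000

   def random_bipolar(d=D):
       """Random vector with entries drawn uniformly from {-1, +1}."""
       return [rng.choice((-1, 1)) for _ in range(d)]

   def similarity(u, v):
       """Normalized dot product; equals cosine for bipolar vectors."""
       return sum(p * q for p, q in zip(u, v)) / len(u)

   def bind(u, v):
       """Element-wise product (assumed binding operation)."""
       return [p * q for p, q in zip(u, v)]

   key = random_bipolar()
   value = random_bipolar()
   record = bind(key, value)

   # Unbinding is binding again with the same key: key * key is all ones.
   recovered = bind(record, key)

   print(similarity(record, key))    # close to 0
   print(similarity(record, value))  # close to 0
   print(recovered == value)         # True: the key cancels exactly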


Step 4: Build a tiny item memory
---------------------------------

An *item memory* (or codebook) is a dictionary mapping labels to hypervectors.
Here we encode five colours, bundle three of them into a "palette", and then
query which colours are in it.

.. code-block:: python

   colour_names = ['red', 'green', 'blue', 'yellow', 'purple']
   codebook = {name: enc.generate() for name in colour_names}

   # Bundle three colours into a palette
   palette = pyhdc.bundle(codebook['red'], codebook['green'], codebook['blue'])

   # Query: which colours are in the palette?
   for name, hv in codebook.items():
       sim = palette.similarity(hv)
       print(f"{name:8s}: {sim:.3f}")

   # Output:
   # red     :  0.573
   # green   :  0.568
   # blue    :  0.561
   # yellow  :  0.012   <- not in palette
   # purple  : -0.003   <- not in palette

Colours in the bundle score around 0.57, while colours that were not bundled
score close to 0. This gap is the fundamental query mechanism of HDC.
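
To turn the similarity gap into a yes/no membership test, one option is a
simple cut-off. The sketch below is plain Python, assumes sum-based bundling
and cosine similarity, and assumes the number of bundled items is known; the
threshold rule (halfway between 0 and ``1 / sqrt(k)`` for ``k`` items) is a
hypothetical heuristic, not a PyHDC feature.

.. code-block:: python

   import math
   import random

   rng = random.Random(4)
   D = 10_000

   def random_bipolar(d=D):
       """Random vector with entries drawn uniformly from {-1, +1}."""
       return [rng.choice((-1, 1)) for _ in range(d)]

   def cosine_similarity(u, v):
       dot = sum(p * q for p, q in zip(u, v))
       norm_u = math.sqrt(sum(p * p for p in u))
       norm_v = math.sqrt(sum(q * q for q in v))
       return dot / (norm_u * norm_v)

   names = ["red", "green", "blue", "yellow", "purple"]
   codebook = {name: random_bipolar() for name in names}

   members = ["red", "green", "blue"]
   # Assumed bundling: element-wise sum of the member vectors.
   palette = [sum(cs) for cs in zip(*(codebook[m] for m in members))]

   # Members are expected near 1/sqrt(k), non-members near 0,
   # so cut halfway between the two.
   threshold = 0.5 / math.sqrt(len(members))

   found = [name for name, hv in codebook.items()
            if cosine_similarity(palette, hv) > threshold]
   print(found)  # ['red', 'green', 'blue']

As more items are bundled, the member similarity drops towards the noise
level, so a fixed cut-off like this becomes less reliable.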


Step 5: Switch to PyTorch
--------------------------

The API is identical regardless of backend. Just pass ``backend="torch"``
when creating the encoding:

.. code-block:: python

   if pyhdc.TORCH_AVAILABLE:
       enc_torch = pyhdc.MAP_B(dimension=10_000, backend="torch")
       v = enc_torch.generate()
       print(v.backend)   # torch

       # GPU: requires CUDA
       enc_gpu = pyhdc.MAP_B(dimension=10_000, backend="torch", device="cuda")

Or set a process-wide default so every new encoding uses it:

.. code-block:: python

   pyhdc.prefer_torch()                          # or pyhdc.prefer_cuda()
   enc_default = pyhdc.MAP_B(dimension=10_000)   # inherits the torch backend
   pyhdc.prefer_numpy()                          # reset to numpy

You can also move an existing hypervector between backends:

.. code-block:: python

   v_numpy = enc.generate()
   v_torch = v_numpy.to_torch()
   v_back  = v_torch.to_numpy()


Putting it all together
------------------------

Here is a condensed version of the quickstart as a single script:

.. code-block:: python

   import pyhdc

   enc = pyhdc.MAP_B(dimension=10_000)

   # Three primitives
   a, b = enc.generate(), enc.generate()
   print(a.similarity(b))              # ~= 0.0
   print(a.bundle(b).similarity(a))    # ~= 0.7
   record = a.bind(b)
   recovered = record.unbind(b)
   print(recovered.similarity(a))      # ~= 1.0

   # Item memory
   colours  = {c: enc.generate() for c in ['red','green','blue','yellow','purple']}
   palette  = pyhdc.bundle(colours['red'], colours['green'], colours['blue'])
   rankings = sorted(colours, key=lambda c: palette.similarity(colours[c]), reverse=True)
   print(rankings[:3])   # ['red', 'green', 'blue'] (order may vary)


Continue here
-------------

* :doc:`../tutorials/index` : five end-to-end tutorials, starting with
  :doc:`../tutorials/tutorial_1_text_classification`
* :doc:`../how_to/choose_encoding` : how to pick the right encoding for your
  use case
* :doc:`../user_manual/encodings_overview` : a full comparison of all 15
  encoding families