How to Remap Similarity Output to [0, 1]
Since v1.1.0, all PyHDC similarity functions return values in [-1, 1].
If your code expects [0, 1], for example when feeding results to scikit-learn
metrics or comparing against v1.0.x thresholds, use remap_to_unit.
The remap function
remap_to_unit applies the linear transformation:
f(x) = (x + 1) / 2
This maps −1 → 0, 0 → 0.5, and +1 → 1. It works on scalars, NumPy arrays, and PyTorch tensors.
from pyhdc.components.similarity import remap_to_unit
print(remap_to_unit(-1.0)) # 0.0
print(remap_to_unit(0.0)) # 0.5
print(remap_to_unit(1.0)) # 1.0
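The same linear map can be checked independently of PyHDC in a few lines of NumPy. This is a minimal sketch of the formula only: linear_remap is a hypothetical stand-in written for illustration, not the library's implementation of remap_to_unit.

```python
import numpy as np

def linear_remap(x):
    """Hypothetical stand-in for the (x + 1) / 2 map; not PyHDC code."""
    # np.asarray accepts plain Python floats as well as arrays
    return (np.asarray(x, dtype=float) + 1.0) / 2.0

sims = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
remapped = linear_remap(sims)
print(remapped.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Because the map is element-wise, the output always has the same shape as the input.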
Apply it automatically via the encoding
Pass similarity_remap=remap_to_unit to the encoding constructor. Every
similarity call on that encoding is automatically remapped:
import pyhdc
from pyhdc.components.similarity import remap_to_unit
enc = pyhdc.BSC(dimension=10_000, similarity_remap=remap_to_unit)
a = enc.generate()
b = enc.generate()
print(a.similarity(a)) # 1.0
print(a.similarity(b)) # ~= 0.5 (unrelated; 0.5 in [0,1] means orthogonal)
Apply it manually
If you only need to remap some calls, apply it after the fact:
enc = pyhdc.BSC(dimension=10_000) # default: no remap
a = enc.generate()
b = enc.generate()
raw = a.similarity(b) # in [-1, 1]
remapped = remap_to_unit(raw) # in [0, 1]
Custom remap function
The similarity_remap parameter accepts any callable that maps a scalar or
array to a scalar or array of the same shape. Examples:
import numpy as np
# Sigmoid remap: squashes [-1, 1] into (0, 1) along a smooth S-curve.
# With scale factor 5 the endpoints land near 0.007 and 0.993, not exactly 0 and 1.
def sigmoid_remap(x):
    return 1 / (1 + np.exp(-5 * x)) # scale factor 5 sharpens the transition
enc = pyhdc.MAP_C(dimension=10_000, similarity_remap=sigmoid_remap)
# Hard threshold: returns 1 if similar, 0 if not
def threshold_remap(x):
    # np.asarray lets plain Python floats through as well as arrays
    return (np.asarray(x) > 0.3).astype(float)
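Since a custom remap has to preserve shape and accept both scalars and arrays, it can be worth probing the callable before handing it to an encoding. The sketch below is one possible way to do that; check_remap is a hypothetical helper written for this example and is not part of PyHDC.

```python
import numpy as np

def sigmoid_remap(x):
    # Same sigmoid as above, with np.asarray so Python floats also work
    return 1 / (1 + np.exp(-5 * np.asarray(x, dtype=float)))

def check_remap(fn, n=201):
    """Hypothetical helper: probe fn on [-1, 1] and report basic properties."""
    xs = np.linspace(-1.0, 1.0, n)
    ys = np.asarray(fn(xs), dtype=float)
    return {
        "same_shape": ys.shape == xs.shape,                       # element-wise contract
        "in_unit_interval": bool(np.all((ys >= 0.0) & (ys <= 1.0))),
        "monotonic": bool(np.all(np.diff(ys) >= 0.0)),            # order of similarities kept
        "scalar_ok": np.ndim(fn(0.0)) == 0,                       # scalar in, scalar out
    }

report = check_remap(sigmoid_remap)
print(report)
```

A remap that fails the monotonic check can reorder similarity rankings, which is usually not what a nearest-neighbour lookup expects.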
Batched similarity with remap
Remapping is applied element-wise, so it works correctly with all batched calling conventions:
enc = pyhdc.MAP_C(dimension=10_000, similarity_remap=remap_to_unit)
batch = enc.generate(size=10)
query = enc.generate()
# Convention 2: per-row pairs
sims = enc.similarity(batch, enc.generate(size=10)) # shape (10,), all in [0,1]
Migration from v1.0.x
In v1.0.x, HammingDistance and Overlap (used by BSC and BSDC) returned values in
[0, 1]. Since v1.1.0 they return values in [-1, 1]. To restore the old behaviour,
pass similarity_remap=remap_to_unit when constructing each BSC or BSDC
encoding. The setting applies per encoding, so every constructor call needs it.
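An alternative is to keep the v1.1.0 range and convert stored thresholds instead. Assuming the v1.0.x value equals the linear remap of the v1.1.0 value, a threshold t in [0, 1] corresponds to 2t − 1 in [-1, 1]. The helper below is a minimal sketch of that inverse map; unit_to_signed is a hypothetical name, not a PyHDC function.

```python
def unit_to_signed(t):
    """Hypothetical helper: inverse of (x + 1) / 2, mapping [0, 1] to [-1, 1]."""
    if not 0.0 <= t <= 1.0:
        raise ValueError(f"threshold must lie in [0, 1], got {t}")
    return 2.0 * t - 1.0

old_threshold = 0.75                      # tuned against v1.0.x output
new_threshold = unit_to_signed(old_threshold)
print(new_threshold)                      # 0.5
```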