Language Models over Canonical Byte-Pair Encodings
Published in ICML, 2025
Fixing the probability mass that LMs leak onto unintuitive (noncanonical) tokenizations by adjusting the LM's distribution.
Recommended citation: Vieira et al. (2025). "Language Models over Canonical Byte-Pair Encodings." ICML. 1(1).