PushMe

Significance-Gain Pair Encoding for LLMs: A Statistical Alternative to Frequency-Based Subword Merging

arXiv:2603.19261v1. Announce Type: new.

Abstract: Subword tokenization is a key design choice for modern language models, including large language models (LLMs), with byte- and ch...

Early report. Major update. Updated Mar 23, 2026, 4:00 AM UTC.