Decoupling prediction from commitment. An autoregressive language model that generates tokens through continuous vector dynamics, eliminating repetition without sampling heuristics.
Standard autoregressive LLMs commit to a discrete token at every step—prediction and commitment are fused into a single operation. This forces uncertainty to collapse immediately, leading to degenerate repetition under greedy decoding and dependence on sampling heuristics.
Token Maturation decouples prediction from commitment. The generation state is a sequence of continuous vectors. New tokens emerge through iterative refinement in embedding space, and discrete commitment happens only when vectors geometrically stabilize—not when probability concentrates.
The entire sequence—including already-generated tokens—lives in continuous embedding space. No discrete indices until final projection.
A "liquid tail" of K vectors is iteratively refined. Tokens are committed only when they reach the front of the buffer.
Commitment occurs via nearest-neighbor projection when vectors stabilize—even if the induced token distribution remains high-entropy.
By contrast, a standard LLM immediately discretizes each token via sampling or argmax, so uncertainty must collapse to a single choice at every step.
Committed tokens (white) followed by a liquid tail (cyan) that matures through continuous refinement before projection.
Tokens are represented as vectors zₜ ∈ ℝᵈ in embedding space. The model predicts continuous vectors, not logits over the vocabulary.
The liquid tail is updated via contraction: z̃ᵢ ← z̃ᵢ + η(α)(ẑᵢ − z̃ᵢ), where the step size η depends on the vector's position in the tail.
When a vector reaches the front of the tail, it is committed via nearest-neighbor projection: xₜ = argmaxᵢ ⟨zₜ, eᵢ⟩.
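The refinement and commitment rules above can be sketched in a few lines of plain Python. Function names and the explicit step-size list are illustrative, not from the paper; `predictions` stands in for the model's next-vector outputs ẑᵢ, which in the actual method are recomputed at every refinement step.

```python
def nearest_token(z, embeddings):
    # Commit by nearest-neighbor projection: x = argmax_i <z, e_i>.
    return max(range(len(embeddings)),
               key=lambda i: sum(a * b for a, b in zip(z, embeddings[i])))

def refine_tail(tail, predictions, etas):
    # One maturation step: contract each tail vector z~_i toward the
    # model's prediction z^_i with a position-dependent step size eta_i.
    return [[z + eta * (p - z) for z, p in zip(z_vec, p_vec)]
            for z_vec, p_vec, eta in zip(tail, predictions, etas)]
```

Iterating `refine_tail` drives each vector geometrically toward its (moving) target; once the front vector has stabilized, `nearest_token` discretizes it.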
MSE ensures geometric convergence; InfoNCE prevents collapse toward the mean and anchors predictions to discrete token identities.
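A minimal sketch of the two training terms, assuming the InfoNCE positive is the target token's embedding and the negatives are the rest of the vocabulary; the temperature `tau` is an illustrative hyperparameter, not a value from the paper:

```python
import math

def mse(z_pred, z_target):
    # Geometric term: mean squared error between predicted and target vectors.
    return sum((a - b) ** 2 for a, b in zip(z_pred, z_target)) / len(z_pred)

def info_nce(z_pred, target_idx, embeddings, tau=0.1):
    # Contrastive term: the prediction should score higher (by dot product)
    # against its target token embedding than against all other embeddings.
    logits = [sum(a * b for a, b in zip(z_pred, e)) / tau for e in embeddings]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target_idx]
```

The MSE term alone would let all predictions drift toward the embedding mean; the contrastive term penalizes exactly that, anchoring each prediction to a discrete token identity.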
Classifier-Free Guidance pulls tail vectors toward the manifold of coherent text, making the liquid tail a window into the model's implicit forward planning.
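Applied to a tail vector, the guidance step in its standard extrapolation form looks as follows (this form is an assumption; the paper's exact formulation may differ):

```python
def cfg_guide(z_cond, z_uncond, w=1.5):
    # Classifier-free guidance: extrapolate from the unconditional
    # prediction toward the conditional one with guidance weight w.
    # w > 1 pushes the vector further onto the conditional manifold.
    return [u + w * (c - u) for c, u in zip(z_cond, z_uncond)]
```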
Token Maturation often converges to a stable structural template while varying surface-level entities:
The structural pattern [Quote → Attribution → Role] remains stable across runs, while specific names, quotes, and institutions vary freely.
Token Maturation eliminates degenerate repetition under pure greedy decoding—no penalties, no temperature, no sampling.
| Metric | Token Maturation (Ours) | Standard Greedy | Greedy + Penalty |
|---|---|---|---|
| Dist-1 ↑ | 0.974 | 0.228 | 0.936 |
| Dist-2 ↑ | 1.000 | 0.301 | 0.999 |
| Rep-2 ↓ | 0.000 | 0.699 | 0.001 |
| Rep-3 ↓ | 0.000 | 0.665 | 0.000 |
| Loop % ↓ | 0% | 90% | 0% |
All models use the same GPT-2 Medium backbone. Loop% = fraction of samples containing any repeated trigram.
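For reference, the diversity metrics in the table can be computed as follows. These are common definitions of Dist-n and Rep-n; the paper's exact tokenization and aggregation are assumptions:

```python
def dist_n(tokens, n):
    # Dist-n: unique n-grams divided by total n-grams (higher = more diverse).
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(grams)) / len(grams) if grams else 0.0

def rep_n(tokens, n):
    # Rep-n: fraction of n-grams that are duplicates (Rep-n = 1 - Dist-n).
    return 1.0 - dist_n(tokens, n)

def has_loop(tokens):
    # Loop check per the footnote: does the sample repeat any trigram?
    return rep_n(tokens, 3) > 0.0
```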
A key finding: discrete commitment need not coincide with probability concentration. The entropy of the induced token distribution often remains high (~3.9 nats) throughout maturation, yet generation stays coherent.
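A quick way to see why these can come apart: the entropy of the distribution induced by a vector can stay high even as the vector itself stops moving. A sketch, assuming the induced distribution is a softmax over inner products with temperature `tau` (the paper commits by argmax regardless of this entropy):

```python
import math

def induced_entropy(z, embeddings, tau=1.0):
    # Softmax over inner products gives the token distribution induced
    # by a continuous vector; return its Shannon entropy in nats.
    logits = [sum(a * b for a, b in zip(z, e)) / tau for e in embeddings]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)
```

A vector equidistant from several embeddings is geometrically stable (a fixed point of the contraction) while its induced distribution remains near-uniform.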
@article{tokenmaturation2025,
title={Token Maturation: Autoregressive Language Generation
via Continuous Token Dynamics},
author={Anonymous},
journal={arXiv preprint arXiv:2601.04854},
year={2025}
}