Cassandra T1 Model Family — Diffusion Language Model Architecture

Something from nothing.

Cassandra T1 is a 1.3-billion-parameter masked diffusion language model created by SOPHIA XT. Unlike autoregressive models such as Gemma-style architectures, which generate one token at a time, left to right, Cassandra T1 denoises all token positions in parallel, producing a complete sequence in 8-16 refinement steps.
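The parallel-denoising idea can be sketched as a small loop: start from an all-masked sequence, predict every masked position at once, commit the most confident predictions, and re-mask the rest. This is a minimal illustration only; `predict_fn`, the confidence scores, and the keep-fraction schedule are assumptions for the sketch, not Cassandra T1's actual decoder.

```python
MASK = "<mask>"

def denoise_step(tokens, predict_fn, keep_frac):
    """One parallel denoising step: predict every masked position
    simultaneously, then keep only the most confident fraction,
    leaving the rest masked for later steps.
    `predict_fn` is a stand-in for the model's forward pass and must
    return [(position, token, confidence), ...] for masked positions."""
    preds = predict_fn(tokens)
    preds.sort(key=lambda p: -p[2])          # most confident first
    n_keep = max(1, int(len(preds) * keep_frac))
    out = list(tokens)
    for pos, tok, _conf in preds[:n_keep]:
        out[pos] = tok
    return out

def generate(length, predict_fn, num_steps=8):
    """Generate `length` tokens in at most `num_steps` forward passes,
    unmasking a growing fraction of positions at each step."""
    tokens = [MASK] * length                 # start fully masked
    for step in range(num_steps):
        if MASK not in tokens:
            break
        # Linear schedule: the last step commits everything that remains.
        tokens = denoise_step(tokens, predict_fn, 1.0 / (num_steps - step))
    return tokens
```

With a toy `predict_fn`, a 512-token sequence is fully realized in the configured number of steps rather than 512 sequential passes, which is the source of the speedup claimed below.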

Demo and Benchmark Board

The Cassandra model-family page includes animated architecture diagrams, a masked-denoising demo mockup, code integration examples, and internal benchmark targets against a Gemma 4 autoregressive reference. The headline target is roughly 98% of the reference model's quality while using far fewer forward passes, with the advantage growing for longer generations. Final public benchmark numbers should be replaced with measured release evaluations when weights are published.

Architecture

1.33B parameters. 28 transformer layers. Grouped Query Attention (16 heads, 4 KV heads). SwiGLU FFN. RMSNorm. RoPE (theta=500K, 128K context). Sliding window attention (4096) + global tokens (256). BPE vocabulary of 32,768 tokens with spatial coordinate token support.
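The specification above can be collected into a configuration object. Values are taken directly from the spec; the class and field names are illustrative, not Cassandra T1's actual config schema, and dimensions not stated above (such as the hidden size) are omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CassandraT1Config:
    """Architecture constants from the Cassandra T1 spec sheet.
    Field names are illustrative; only stated values are included."""
    n_layers: int = 28             # transformer layers
    n_heads: int = 16              # query heads (Grouped Query Attention)
    n_kv_heads: int = 4            # shared key/value heads
    rope_theta: float = 500_000.0  # RoPE base frequency
    context_length: int = 131_072  # 128K context window
    sliding_window: int = 4_096    # local attention span
    n_global_tokens: int = 256     # always-attended global tokens
    vocab_size: int = 32_768       # BPE vocabulary

    @property
    def gqa_group_size(self) -> int:
        # Number of query heads that share each KV head.
        return self.n_heads // self.n_kv_heads
```

With 16 query heads over 4 KV heads, each KV head serves a group of 4 query heads, shrinking the KV cache fourfold relative to full multi-head attention.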

Novel Contributions

Generation Speed

For a 512-token generation, autoregressive models (GPT, LLaMA) need 512 forward passes, one per token. Cassandra T1 needs 8-16 forward passes, generating all token positions in parallel at each step. This enables sub-second inference on consumer GPUs and roughly 2-3 seconds on phone hardware.
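The comparison above reduces to simple arithmetic: autoregressive cost scales with output length, while the diffusion cost is a fixed number of denoising steps. A small sketch (function name illustrative) makes the fixed-cost property explicit.

```python
def forward_passes(n_tokens: int, mode: str, denoise_steps: int = 16) -> int:
    """Forward passes needed to emit `n_tokens` of output.
    Autoregressive decoding pays one pass per token; masked diffusion
    pays a fixed number of denoising steps regardless of length."""
    if mode == "autoregressive":
        return n_tokens
    if mode == "diffusion":
        return denoise_steps
    raise ValueError(f"unknown mode: {mode}")
```

At 512 tokens and 16 denoising steps this is a 32x reduction in forward passes, and the ratio grows linearly with generation length.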

Training

Trained on 1M+ curated instruction samples including conversations, mathematical reasoning, code, spatial annotations, science, medical knowledge, and knowledge distillation from a larger pre-trained model. Multi-objective loss with spatial token upweighting. Beta(2,2) mask ratio sampling for PDE-matched importance distribution.

Applications

Part of the SOPHIA XT Model Family. Research by SOPHIA XT.
