BitNet‑Inspired Ternary Models
1.58‑bit arithmetic (add/sub only) for massive efficiency on commodity CPUs.
- ~95% memory reduction vs fp32
- Up to 10× faster matrix ops on CPU
- Strong quality with distillation & RLHF
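The add/sub-only claim follows directly from ternary weights: with every weight in {−1, 0, +1}, each multiply-accumulate in a matrix-vector product becomes an add, a subtract, or a skip. A minimal Rust sketch (hypothetical code, not the superbird API):

```rust
// Illustration only: a matrix-vector product with ternary weights
// needs no multiplications at all.
fn ternary_matvec(weights: &[Vec<i8>], x: &[f32]) -> Vec<f32> {
    weights
        .iter()
        .map(|row| {
            let mut acc = 0.0f32;
            for (&w, &xi) in row.iter().zip(x) {
                match w {
                    1 => acc += xi,  // +1: add
                    -1 => acc -= xi, // -1: subtract
                    _ => {}          // 0: skip entirely
                }
            }
            acc
        })
        .collect()
}

fn main() {
    let w = vec![vec![1i8, 0, -1], vec![-1, 1, 1]];
    let x = [2.0f32, 3.0, 5.0];
    // Row 1: 2 - 5 = -3; row 2: -2 + 3 + 5 = 6.
    println!("{:?}", ternary_matvec(&w, &x)); // → [-3.0, 6.0]
}
```

The zero case is also why ternary (rather than binary) weights help: pruned connections cost nothing at inference time.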
Efficiency as first principle: ternary weights, adaptive contexts, and fusion layers — in service of elegant, on‑device intelligence.
Stage‑wise pruning, layer‑wise sensitivity, and adaptive int8/int16 mixes.
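As a rough illustration of the int8 side of such a mix, here is a hedged sketch of symmetric per-tensor quantization (all names hypothetical); layers flagged as sensitive by the layer-wise analysis would fall back to a wider int16 grid instead:

```rust
// Symmetric per-tensor int8 quantization sketch: map the largest
// absolute weight to ±127 and snap everything else to that grid.
fn quantize_i8(xs: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = xs.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = xs
        .iter()
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

fn dequantize_i8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let w = [0.5f32, -1.27, 0.0, 1.27];
    let (q, s) = quantize_i8(&w);
    println!("quantized: {:?}, scale: {}", q, s);
    println!("recovered: {:?}", dequantize_i8(&q, s));
}
```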
Attention sinks + rolling buffers + compressed history for long‑form tasks.
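The eviction policy behind sinks plus a rolling buffer can be sketched as follows (hypothetical helper, not the actual cache code): keep the first few sink positions and a window of the most recent positions, and drop (or hand off to compressed history) everything in between.

```rust
// Which KV-cache positions survive at a given sequence length,
// given `sinks` pinned prefix tokens and a sliding `window`.
fn retained_positions(seq_len: usize, sinks: usize, window: usize) -> Vec<usize> {
    if seq_len <= sinks + window {
        // Nothing to evict yet: keep the whole sequence.
        return (0..seq_len).collect();
    }
    let mut keep: Vec<usize> = (0..sinks).collect(); // attention sinks
    keep.extend(seq_len - window..seq_len);          // rolling buffer
    keep
}

fn main() {
    // 4 sink tokens plus a 6-token window over a 20-token sequence.
    println!("{:?}", retained_positions(20, 4, 6));
    // → [0, 1, 2, 3, 14, 15, 16, 17, 18, 19]
}
```

The pinned prefix matters because early tokens tend to absorb a disproportionate share of attention mass; evicting them degrades quality far more than their position suggests.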
Mobile‑scale vision + compact language with dual attention and RMSNorm.
Teacher attention transfer and human‑in‑the‑loop aesthetic reward shaping.
| Family | Params / Size | Signature | Primary Use |
|---|---|---|---|
| Robin (Personal) | 3–8M / <10MB | Ultra‑compact, instant CPU responses | Chat, code, creative writing |
| Cardinal (Pro) | 15–42M / <50MB | Memory‑efficient attention, privacy | Enterprise analysis |
| Phoenix (Multimodal) | 25–75MB | Vision + language fusion | Screenshots, documents, VQA |
Rust tensor integration, .nest format, zero‑dep runtime, quantization toolkit.
3–8M personal models, instant CPU latency, tokenizer compression.
15–42M professional variants, privacy/compliance, performance hardening.
Multimodal fusion, mobile‑optimized builds, WASM deployment.
| Model | Size | CPU Speed | Memory | Use Case |
|---|---|---|---|---|
| Robin (Personal) | <10MB | Instant | ≤100MB | Chat, code, writing |
| Cardinal (Pro) | <50MB | ≥8 tok/s | ≤256MB | Enterprise analysis |
| Phoenix (Multimodal) | 25–75MB | 0.2–0.5 img/s | 200–250MB | Vision + voice |
Foundation → Specialization → Distillation, with careful domain shifts.
Match attention patterns and hidden states to compress knowledge.
Human‑in‑the‑loop scoring for helpfulness, elegance, and delight.
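The hidden-state matching used in distillation is commonly a mean-squared-error term between teacher and student activations; a minimal sketch (hypothetical function, assuming the states are already projected to the same width):

```rust
// MSE between aligned teacher and student hidden-state vectors;
// in practice this is summed over layers and added to the task loss.
fn hidden_state_loss(student: &[f32], teacher: &[f32]) -> f32 {
    assert_eq!(student.len(), teacher.len());
    let n = student.len() as f32;
    student
        .iter()
        .zip(teacher)
        .map(|(s, t)| (s - t) * (s - t))
        .sum::<f32>()
        / n
}

fn main() {
    let s = [0.9f32, -0.1, 0.4];
    let t = [1.0f32, 0.0, 0.5];
    println!("{}", hidden_state_loss(&s, &t)); // ≈ 0.01 (each term off by 0.1)
}
```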
A minimal binary format for tiny models: versioned header, metadata (license, provenance), layer blocks, and optional quantization tables. Designed for zero‑copy loading, small footprints, and deterministic behavior.
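Since the actual .nest layout is not specified here, the following is a speculative Rust sketch of what a versioned, length-prefixed header could look like; the magic bytes, field widths, and little-endian layout are all assumptions, not the real format:

```rust
use std::io::{self, Read, Write};

const MAGIC: &[u8; 4] = b"NEST"; // assumed magic, for illustration

// Write: magic, u16 version, then a u32 length-prefixed metadata blob.
fn write_header<W: Write>(w: &mut W, version: u16, metadata: &[u8]) -> io::Result<()> {
    w.write_all(MAGIC)?;
    w.write_all(&version.to_le_bytes())?;
    w.write_all(&(metadata.len() as u32).to_le_bytes())?;
    w.write_all(metadata)
}

// Read it back, rejecting files without the expected magic bytes.
fn read_header<R: Read>(r: &mut R) -> io::Result<(u16, Vec<u8>)> {
    let mut magic = [0u8; 4];
    r.read_exact(&mut magic)?;
    if &magic != MAGIC {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad magic"));
    }
    let mut ver = [0u8; 2];
    r.read_exact(&mut ver)?;
    let mut len = [0u8; 4];
    r.read_exact(&mut len)?;
    let mut meta = vec![0u8; u32::from_le_bytes(len) as usize];
    r.read_exact(&mut meta)?;
    Ok((u16::from_le_bytes(ver), meta))
}

fn main() -> io::Result<()> {
    let mut buf = Vec::new();
    write_header(&mut buf, 1, br#"{"license":"MIT"}"#)?;
    let (version, meta) = read_header(&mut io::Cursor::new(buf))?;
    println!("v{} {}", version, String::from_utf8_lossy(&meta));
    Ok(())
}
```

Fixed-width, length-prefixed fields like these are what make zero-copy loading and deterministic parsing straightforward: every offset is computable before any data is touched.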
```rust
use superbird::prelude::*;

fn main() -> Result<()> {
    // Load a 3M-parameter Robin model from its .nest file.
    let mut model = Robin::from_file("./robin-3m.nest")?;
    // Builder-style generation: sampling settings, then run.
    let out = model
        .generate("Write a haiku about small models")
        .temperature(0.7)
        .max_tokens(64)
        .run()?;
    println!("{}", out);
    Ok(())
}
```