BitNet‑Inspired Ternary Models
1.58‑bit arithmetic (add/sub only) for massive efficiency on commodity CPUs.
- ~95% memory reduction vs fp32
- Up to 10× faster matrix ops on CPU
- Strong quality with distillation & RLHF
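The add/sub-only claim follows directly from ternary weights: with every weight in {−1, 0, +1}, each multiply-accumulate in a matrix-vector product becomes an add, a subtract, or a skip. A minimal Rust sketch (hypothetical code, not the superbird API):

```rust
// Illustration only: a matrix-vector product with ternary weights
// needs no multiplications at all.
fn ternary_matvec(weights: &[Vec<i8>], x: &[f32]) -> Vec<f32> {
    weights
        .iter()
        .map(|row| {
            let mut acc = 0.0f32;
            for (&w, &xi) in row.iter().zip(x) {
                match w {
                    1 => acc += xi,  // +1: add
                    -1 => acc -= xi, // -1: subtract
                    _ => {}          // 0: skip entirely
                }
            }
            acc
        })
        .collect()
}

fn main() {
    let w = vec![vec![1i8, 0, -1], vec![-1, 1, 1]];
    let x = [2.0f32, 3.0, 5.0];
    // Row 1: 2 - 5 = -3; row 2: -2 + 3 + 5 = 6.
    println!("{:?}", ternary_matvec(&w, &x)); // → [-3.0, 6.0]
}
```

The zero case is also why ternary (rather than binary) weights help: pruned connections cost nothing at inference time.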
Efficiency as first principle: ternary weights, adaptive contexts, and fusion layers — in service of elegant, on‑device intelligence.
Stage‑wise pruning, layer‑wise sensitivity, and adaptive int8/int16 mixes.
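As a rough illustration of the int8 side of such a mix, here is a hedged sketch of symmetric per-tensor quantization (all names hypothetical); layers flagged as sensitive by the layer-wise analysis would fall back to a wider int16 grid instead:

```rust
// Symmetric per-tensor int8 quantization sketch: map the largest
// absolute weight to ±127 and snap everything else to that grid.
fn quantize_i8(xs: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = xs.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = xs
        .iter()
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

fn dequantize_i8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let w = [0.5f32, -1.27, 0.0, 1.27];
    let (q, s) = quantize_i8(&w);
    println!("quantized: {:?}, scale: {}", q, s);
    println!("recovered: {:?}", dequantize_i8(&q, s));
}
```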
Attention sinks + rolling buffers + compressed history for long‑form tasks.
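The eviction policy behind sinks plus a rolling buffer can be sketched as follows (hypothetical helper, not the actual cache code): keep the first few sink positions and a window of the most recent positions, and drop (or hand off to compressed history) everything in between.

```rust
// Which KV-cache positions survive at a given sequence length,
// given `sinks` pinned prefix tokens and a sliding `window`.
fn retained_positions(seq_len: usize, sinks: usize, window: usize) -> Vec<usize> {
    if seq_len <= sinks + window {
        // Nothing to evict yet: keep the whole sequence.
        return (0..seq_len).collect();
    }
    let mut keep: Vec<usize> = (0..sinks).collect(); // attention sinks
    keep.extend(seq_len - window..seq_len);          // rolling buffer
    keep
}

fn main() {
    // 4 sink tokens plus a 6-token window over a 20-token sequence.
    println!("{:?}", retained_positions(20, 4, 6));
    // → [0, 1, 2, 3, 14, 15, 16, 17, 18, 19]
}
```

The pinned prefix matters because early tokens tend to absorb a disproportionate share of attention mass; evicting them degrades quality far more than their position suggests.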
Mobile‑scale vision + compact language with dual attention and RMSNorm.
Teacher attention transfer and human‑in‑the‑loop aesthetic reward shaping.
| Family | Params / Size | Signature | Primary Use |
|---|---|---|---|
| Robin (Personal) | 3–8M / <10MB | Ultra‑compact, instant CPU responses | Chat, code, creative writing |
| Cardinal (Pro) | 15–42M / <50MB | Memory‑efficient attention, privacy | Enterprise analysis |
| Phoenix (Multimodal) | 25–75MB | Vision + language fusion | Screenshots, documents, VQA |
Rust tensor integration, .nest format, zero‑dep runtime, quantization toolkit.
3–8M personal models, instant CPU latency, tokenizer compression.
15–42M professional variants, privacy/compliance, performance hardening.
Multimodal fusion, mobile‑optimized builds, WASM deployment.
| Model | Size | CPU Speed | Memory | Use Case |
|---|---|---|---|---|
| Robin (Personal) | <10MB | Instant | ≤100MB | Chat, code, writing |
| Cardinal (Pro) | <50MB | ≥8 tok/s | ≤256MB | Enterprise analysis |
| Phoenix (Multimodal) | 25–75MB | 0.2–0.5 img/s | 200–250MB | Vision + voice |
Foundation → Specialization → Distillation, with careful domain shifts.
Match attention patterns and hidden states to compress knowledge.
Human‑in‑the‑loop scoring for helpfulness, elegance, and delight.
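The hidden-state matching used in distillation is commonly a mean-squared-error term between teacher and student activations; a minimal sketch (hypothetical function, assuming the states are already projected to the same width):

```rust
// MSE between aligned teacher and student hidden-state vectors;
// in practice this is summed over layers and added to the task loss.
fn hidden_state_loss(student: &[f32], teacher: &[f32]) -> f32 {
    assert_eq!(student.len(), teacher.len());
    let n = student.len() as f32;
    student
        .iter()
        .zip(teacher)
        .map(|(s, t)| (s - t) * (s - t))
        .sum::<f32>()
        / n
}

fn main() {
    let s = [0.9f32, -0.1, 0.4];
    let t = [1.0f32, 0.0, 0.5];
    println!("{}", hidden_state_loss(&s, &t)); // ≈ 0.01 (each term off by 0.1)
}
```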
A minimal binary format for tiny models: versioned header, metadata (license, provenance), layer blocks, and optional quantization tables. Designed for zero‑copy loading, small footprints, and deterministic behavior.
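Since the actual .nest layout is not specified here, the following is a speculative Rust sketch of what a versioned, length-prefixed header could look like; the magic bytes, field widths, and little-endian layout are all assumptions, not the real format:

```rust
use std::io::{self, Read, Write};

const MAGIC: &[u8; 4] = b"NEST"; // assumed magic, for illustration

// Write: magic, u16 version, then a u32 length-prefixed metadata blob.
fn write_header<W: Write>(w: &mut W, version: u16, metadata: &[u8]) -> io::Result<()> {
    w.write_all(MAGIC)?;
    w.write_all(&version.to_le_bytes())?;
    w.write_all(&(metadata.len() as u32).to_le_bytes())?;
    w.write_all(metadata)
}

// Read it back, rejecting files without the expected magic bytes.
fn read_header<R: Read>(r: &mut R) -> io::Result<(u16, Vec<u8>)> {
    let mut magic = [0u8; 4];
    r.read_exact(&mut magic)?;
    if &magic != MAGIC {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad magic"));
    }
    let mut ver = [0u8; 2];
    r.read_exact(&mut ver)?;
    let mut len = [0u8; 4];
    r.read_exact(&mut len)?;
    let mut meta = vec![0u8; u32::from_le_bytes(len) as usize];
    r.read_exact(&mut meta)?;
    Ok((u16::from_le_bytes(ver), meta))
}

fn main() -> io::Result<()> {
    let mut buf = Vec::new();
    write_header(&mut buf, 1, br#"{"license":"MIT"}"#)?;
    let (version, meta) = read_header(&mut io::Cursor::new(buf))?;
    println!("v{} {}", version, String::from_utf8_lossy(&meta));
    Ok(())
}
```

Fixed-width, length-prefixed fields like these are what make zero-copy loading and deterministic parsing straightforward: every offset is computable before any data is touched.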
```rust
use superbird::prelude::*;

fn main() -> Result<()> {
    // Load a 3M-parameter Robin model from its .nest file.
    let mut model = Robin::from_file("./robin-3m.nest")?;
    // Builder-style generation: sampling settings, then run.
    let out = model
        .generate("Write a haiku about small models")
        .temperature(0.7)
        .max_tokens(64)
        .run()?;
    println!("{}", out);
    Ok(())
}
```