SuperBird Labs

Research

Efficiency as first principle: ternary weights, adaptive contexts, and fusion layers — in service of elegant, on‑device intelligence.

Core Techniques

BitNet‑Inspired Ternary Models

Ternary weights in {-1, 0, +1} (about 1.58 bits each) reduce matrix multiplication to additions and subtractions only, for massive efficiency on commodity CPUs; a sketch of the core kernel follows the list below.

  • ~95% memory reduction vs fp32
  • Up to 10× faster matrix ops on CPU
  • Strong quality with distillation & RLHP
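
A minimal sketch of the kernel this enables, assuming weights are stored one per i8 in {-1, 0, +1} (real kernels would bit-pack them): the dot product needs no multiplications at all.

// Sketch of a ternary matrix-vector product: weights are restricted to
// {-1, 0, +1}, so each output element is built from adds and subs only.
// The layout and types are illustrative, not the actual SuperBird kernels.
fn ternary_matvec(weights: &[i8], input: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; rows];
    for r in 0..rows {
        let row = &weights[r * cols..(r + 1) * cols];
        let mut acc = 0.0f32;
        for (w, x) in row.iter().zip(input.iter()) {
            match *w {
                1 => acc += *x,  // +1: add the activation
                -1 => acc -= *x, // -1: subtract the activation
                _ => {}          //  0: skip entirely
            }
        }
        out[r] = acc;
    }
    out
}

Packing those ternary values at roughly 1.58 bits each (log2 of 3 states) is what drives the memory reduction noted above.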

Compression & Quantization Pipeline

Stage‑wise pruning, layer‑wise sensitivity analysis, and adaptive int8/int16 mixes; a calibration sketch follows the list below.

  • Validation gates after each stage
  • Percentile calibration with symmetric or asymmetric ranges
  • Channel fusion & architectural optimizations
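
As a rough illustration of the calibration step, here is a sketch of symmetric int8 quantization that clips at a high percentile of the absolute values rather than the raw maximum; the percentile, function names, and per-tensor granularity are assumptions, not the pipeline's actual settings.

// Symmetric int8 quantization with percentile calibration (illustrative).
// Clipping at a high percentile instead of the absolute max keeps the
// scale robust to outliers.
fn percentile_abs(values: &[f32], pct: f32) -> f32 {
    // pct in [0, 1], e.g. 0.999 for the 99.9th percentile (assumed value)
    if values.is_empty() {
        return 0.0;
    }
    let mut mags: Vec<f32> = values.iter().map(|v| v.abs()).collect();
    mags.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let idx = ((mags.len() - 1) as f32 * pct).round() as usize;
    mags[idx]
}

fn quantize_int8_symmetric(values: &[f32], pct: f32) -> (Vec<i8>, f32) {
    let clip = percentile_abs(values, pct).max(f32::EPSILON);
    let scale = clip / 127.0;
    let q = values
        .iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale) // dequantize as: q as f32 * scale
}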

Adaptive Context

Attention sinks + rolling buffers + compressed history for long‑form tasks.
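
One way to picture the eviction policy: keep a few "sink" positions from the start of the sequence plus a rolling window of recent positions, and drop or compress everything in between. The struct below is an illustrative sketch; the compressed-history part is not shown.

// Illustrative eviction policy for a KV cache: always retain the first
// `sinks` positions (attention sinks) plus the most recent `window`
// positions; everything in between can be dropped or compressed.
struct RollingCache<T> {
    sinks: usize,
    window: usize,
    entries: Vec<T>, // one entry per cached position, oldest first
}

impl<T> RollingCache<T> {
    fn push(&mut self, entry: T) {
        self.entries.push(entry);
        let max_len = self.sinks + self.window;
        if self.entries.len() > max_len {
            // Evict the oldest non-sink position.
            self.entries.remove(self.sinks);
        }
    }
}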

Cross‑Modal Fusion

Mobile‑scale vision + compact language with dual attention and RMSNorm.
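
The fusion architecture itself is SuperBird's own; as one concrete building block, here is a minimal RMSNorm, assuming a learned per-channel gain and a small epsilon for numerical stability.

// RMSNorm: scale each vector by the reciprocal of its root-mean-square,
// then apply a learned per-channel gain. No mean subtraction, no bias.
fn rms_norm(x: &[f32], gain: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter()
        .zip(gain.iter())
        .map(|(v, g)| v * inv_rms * g)
        .collect()
}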

Distillation & RLHP

Teacher attention transfer and human‑in‑the‑loop aesthetic reward shaping.

Model Families

Family               | Params / Size  | Signature                            | Primary Use
Robin (Personal)     | 3–8M / <10MB   | Ultra‑compact, instant CPU responses | Chat, code, creative writing
Cardinal (Pro)       | 15–42M / <50MB | Memory‑efficient attention, privacy  | Enterprise analysis
Phoenix (Multimodal) | 25–75MB        | Vision + language fusion             | Screenshots, documents, VQA

Research Timeline

  1. Phase 1 — Foundation

    Rust tensor integration, .nest format, zero‑dep runtime, quantization toolkit.

  2. Phase 2 — Robin

    3–8M personal models, instant CPU latency, tokenizer compression.

  3. Phase 3 — Cardinal

    15–42M professional variants, privacy/compliance, performance hardening.

  4. Phase 4 — Phoenix

    Multimodal fusion, mobile‑optimized builds, WASM deployment.

Target Benchmarks

Model                | Size    | CPU Speed     | Memory    | Use Case
Robin (Personal)     | <10MB   | Instant       | ≤100MB    | Chat, code, writing
Cardinal (Pro)       | <50MB   | ≥8 tok/s      | ≤256MB    | Enterprise analysis
Phoenix (Multimodal) | 25–75MB | 0.2–0.5 img/s | 200–250MB | Vision + voice

Deployment

Edge‑First

  • Runtime optimizer adapts to power/thermal modes (see the sketch after this list)
  • Quantized/mixed precision paths
  • Memory manager with strict budgets
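
A rough sketch of the kind of decision the runtime optimizer makes, mapping power/thermal state to an execution plan. The enum, thresholds, and budgets are illustrative placeholders, not the shipped policy.

// Illustrative policy: map device power/thermal state to an execution
// plan. Real budgets and thresholds belong to the runtime optimizer.
enum PowerMode {
    Performance,
    Balanced,
    BatterySaver,
}

struct ExecutionPlan {
    use_int8_path: bool,
    max_batch: usize,
    memory_budget_mb: usize,
}

fn plan_for(mode: PowerMode, thermal_headroom: f32) -> ExecutionPlan {
    match mode {
        PowerMode::Performance if thermal_headroom > 0.2 => ExecutionPlan {
            use_int8_path: false, // enough headroom for mixed precision
            max_batch: 4,
            memory_budget_mb: 256,
        },
        PowerMode::Balanced => ExecutionPlan {
            use_int8_path: true,
            max_batch: 2,
            memory_budget_mb: 128,
        },
        _ => ExecutionPlan {
            // Battery saver, or performance mode without thermal headroom:
            // fall back to the smallest footprint.
            use_int8_path: true,
            max_batch: 1,
            memory_budget_mb: 100,
        },
    }
}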

WASM

  • Browser inference with tiny footprint
  • Exports for generate(), memory usage, and diagnostics (sketched below)
  • Works offline; no telemetry
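
One plausible shape for the browser exports, sketched with wasm_bindgen; the WasmRobin type, its methods, and the placeholder body are assumptions about the binding layer, not the published API.

// Illustrative wasm-bindgen surface: load a model from bytes, generate
// text, and report memory usage. All names here are hypothetical.
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub struct WasmRobin {
    // In a real build this would wrap the core runtime's model handle.
    loaded_bytes: usize,
}

#[wasm_bindgen]
impl WasmRobin {
    #[wasm_bindgen(constructor)]
    pub fn new(model_bytes: &[u8]) -> WasmRobin {
        WasmRobin { loaded_bytes: model_bytes.len() }
    }

    pub fn generate(&self, prompt: &str, max_tokens: u32) -> String {
        // Placeholder: a real export would run the inference loop here.
        format!("[{} tokens requested for: {}]", max_tokens, prompt)
    }

    pub fn memory_usage_bytes(&self) -> usize {
        self.loaded_bytes
    }
}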

Training Pipeline

Curriculum

Foundation → Specialization → Distillation, with careful domain shifts.

Attention Transfer

Match attention patterns and hidden states to compress knowledge.
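
Concretely, the attention part of the transfer can be pictured as an auxiliary mean-squared-error term between teacher and student attention maps, applied after any layer mapping or projection; the flattened-slice interface below is an assumption for illustration.

// Illustrative attention-transfer loss: mean squared error between a
// teacher attention map and the student's, both flattened to f32 slices
// of equal length (layer mapping / projection assumed done already).
fn attention_transfer_loss(teacher: &[f32], student: &[f32]) -> f32 {
    assert_eq!(teacher.len(), student.len());
    let n = teacher.len() as f32;
    teacher
        .iter()
        .zip(student.iter())
        .map(|(t, s)| (t - s) * (t - s))
        .sum::<f32>()
        / n
}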

Aesthetic RLHP

Human‑in‑the‑loop scoring for helpfulness, elegance, and delight.

Open Problems

  • Robust ternary training without accuracy cliffs
  • Tokenizer compression vs domain coverage
  • Multimodal fusion that scales gracefully on CPU
  • Quantization policies that optimize for battery life

Ecosystem & Metrics

Open Source Strategy

  • MIT/Apache dual licensing for maximum adoption
  • Build in public: core, models, training, examples
  • SEPs (SuperBird Enhancement Proposals)

Community Programs

  • Discord: The SuperBird Nest
  • Model bounties & architecture competitions
  • Newsletter + blog cadence (3 posts/week)

The .nest Model Format

A minimal binary format for tiny models: versioned header, metadata (license, provenance), layer blocks, and optional quantization tables. Designed for zero‑copy loading, small footprints, and deterministic behavior.

.nest file layout:

  • Header: version, schema, checksum
  • Metadata: model_id, family, license, provenance, tokenizer info, quantization scheme
  • Layer Blocks: embeddings, attention, feedforward, output
  • Quant Tables: int8 / int4 / ternary
  • Footer: checksum/signature for integrity

Design Principles

  • Deterministic, versioned, and portable
  • Built‑in quantization metadata (int8/int4/ternary)
  • Streamable and memory‑mapped friendly

Metadata Fields

  • Model id, family, license
  • Training data provenance
  • Checksum & schema version
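
To make the layout concrete, here is a sketch of reading the header; the magic bytes, field order, and sizes are assumptions, not the published spec.

// Illustrative .nest header reader. Field names, sizes, and the magic
// value are placeholders; the real on-disk spec is defined by the format.
use std::io::{self, Read};

struct NestHeader {
    magic: [u8; 4],      // e.g. b"NEST" (assumed)
    schema_version: u16, // versioned, per the design principles
    checksum: u32,       // integrity check over the payload
}

fn read_header<R: Read>(mut r: R) -> io::Result<NestHeader> {
    let mut magic = [0u8; 4];
    r.read_exact(&mut magic)?;

    let mut buf2 = [0u8; 2];
    r.read_exact(&mut buf2)?;
    let schema_version = u16::from_le_bytes(buf2);

    let mut buf4 = [0u8; 4];
    r.read_exact(&mut buf4)?;
    let checksum = u32::from_le_bytes(buf4);

    Ok(NestHeader { magic, schema_version, checksum })
}

Layer blocks and quantization tables would then be read, or memory-mapped, from known offsets, which is what makes zero-copy loading practical.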

SDK Examples

use superbird::prelude::*;

fn main() -> Result<()> {
    // Load a Robin personal model from a .nest file.
    let mut model = Robin::from_file("./robin-3m.nest")?;

    // Build a generation request with sampling controls, then run it.
    let out = model.generate("Write a haiku about small models")
        .temperature(0.7)
        .max_tokens(64)
        .run()?;

    println!("{}", out);
    Ok(())
}