The Distributed Open Llama Network

Your hardware. Your models.
Everyone's network.

Dollama connects your machines into a personal inference network. Your desktop handles the heavy lifting while you prompt from your laptop or phone. When you need more, burst into community capacity from others doing the same thing. No per-token billing. No data harvested. No one deciding what you're allowed to ask.


Three ways to use it

RUN
Remote inference

Run from anywhere, compute from home

Your laptop doesn't need a GPU. Your desktop has one sitting idle. Dollama routes your inference to the best machine you own — prompt from the couch, the bus, or your phone while your workstation does the work. Your battery stays cool. Your responses stay fast.

BURST
Beyond your hardware

Burst beyond your own hardware

Sometimes one machine isn't enough. Dollama lets you tap into a community of people in the same position — borrow their idle cycles when you need a burst, lend yours when you don't. Everyone's ceiling goes up without anyone buying new hardware.

GIVE
Mutual infrastructure

Contribute idle cycles, strengthen the network

When you're not using your GPU, someone else can. Not for a corporation skimming margin — for a network of people building an alternative to the API tollbooth. The more people participate, the more reliable and available the network becomes for everyone.

Three steps to a personal inference network

01

Install alongside Ollama

One command. Dollama runs quietly in your system tray. It connects to your local Ollama instance and registers your machine — as a compute source, a client, or both.

02

Your machines form a mesh

If you have more than one machine running Dollama, they find each other. Your requests route to whichever of your machines is best suited — based on real hardware benchmarks, not guesswork. You always have full priority over your own hardware.

03

Opt into the community network

Beyond your own machines, Dollama connects you to a volunteer network. When you need more than you have, community nodes pick up the slack. When you're idle, you return the favour. An intelligent routing layer adapts prompts and payloads to fit whatever hardware is available.

Your machine (Claude Code / IDE → dollama proxy)
  Has: your files, context, repo
    ↓ prompt (HTTPS)
Relay (routes requests, manages the queue)
  Sees: prompt plaintext (v1)
    ↓ prompt (HTTPS)
Contributor node (Ollama runtime running Qwen 3.5 9B)
  Sees: raw prompt only

v1: all traffic flows through the relay. Direct peer connections with end-to-end encryption are planned for Phase 4.

100K tokens in. 6K tokens out.

Claude Code sends massive context windows — 58 tool definitions, file contents, full conversation history. Dollama's optimization pipeline crunches it all down before it hits the network, so a 9B model on consumer hardware can actually handle it.

What Claude Code sends: ~100K tokens (tool schemas 82 KB, conversation ~50 KB, system prompt 16 KB)

The optimization pipeline:

1. On-demand tool loading
2. Shadow store recovery
3. TOON format compression
4. Stale context pruning
5. Format-aware compression
6. Assistant reasoning trimming

What hits the network: ~6K tokens, a 94% reduction

On-demand tool loading

Claude Code defines 58 tools — 82 KB of schemas sent with every request. Dollama keeps only the 6 essentials loaded and injects a tool catalog. The model requests others as needed. Tool schemas drop from 82 KB to ~3 KB.
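The idea can be sketched in a few lines. This is an illustration, not Dollama's actual code: the essential-tool list and catalog wording here are made up.

```python
import json

# Illustrative essential set; the real list Dollama keeps is its own.
ESSENTIAL_TOOLS = {"read_file", "write_file", "edit", "bash", "grep", "ls"}

def slim_tools(tools: list[dict]) -> tuple[list[dict], str]:
    """Keep only essential tool schemas; summarize the rest in a catalog.

    When the model names a deferred tool, its full schema can be
    injected into a later request.
    """
    kept = [t for t in tools if t["name"] in ESSENTIAL_TOOLS]
    deferred = [t for t in tools if t["name"] not in ESSENTIAL_TOOLS]
    catalog = "Additional tools (request by name to load their schema):\n" + "\n".join(
        f"- {t['name']}: {t.get('description', '')[:60]}" for t in deferred
    )
    return kept, catalog

# 58 dummy schemas standing in for Claude Code's tool definitions.
tools = [{"name": f"tool_{i}", "description": "x" * 200, "input_schema": {}} for i in range(52)]
tools += [{"name": n, "description": "essential", "input_schema": {}} for n in ESSENTIAL_TOOLS]

kept, catalog = slim_tools(tools)
print(len(kept), "schemas kept;", len(json.dumps(kept)), "bytes instead of", len(json.dumps(tools)))
```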

Shadow store recovery

Nothing is thrown away. When content is compressed or pruned, the original is retained in a local shadow store. If the model needs it back, it requests the full content by reference — zero information loss.
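A rough sketch of the shadow-store round trip. The reference token format shown is invented for illustration:

```python
import hashlib

class ShadowStore:
    """Keeps full originals locally; the network only ever sees references."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def stash(self, content: str) -> str:
        """Replace content with an opaque reference token."""
        ref = hashlib.sha256(content.encode()).hexdigest()[:12]
        self._store[ref] = content
        return f"[shadow:{ref}]"  # what goes over the wire in place of the content

    def recover(self, token: str) -> str:
        """Resolve a reference back to the full original, with zero loss."""
        ref = token.removeprefix("[shadow:").removesuffix("]")
        return self._store[ref]

store = ShadowStore()
original = "def main():\n    ...  # thousands of lines of pruned file content"
token = store.stash(original)
print(token)
```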

TOON format compression

Tabular data like JSON arrays of objects is converted to TOON — a column-oriented format that eliminates repeated keys. Tables with uniform structure compress 40-60%, and the model reads it natively.

Compression happens locally on your machine before anything leaves. Full originals stay in the shadow store for the life of the session.
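The column-oriented idea looks roughly like this. The output syntax below is a stand-in, not the actual TOON grammar:

```python
import json

def to_columns(rows: list[dict]) -> str:
    """Convert a uniform JSON array of objects into a header row plus
    value rows, so each key is written once instead of once per row."""
    keys = list(rows[0])
    assert all(list(r) == keys for r in rows), "rows must share one schema"
    lines = ["|".join(keys)]
    lines += ["|".join(str(r[k]) for k in keys) for r in rows]
    return "\n".join(lines)

rows = [
    {"file": "main.go", "lines": 412, "lang": "go"},
    {"file": "relay.go", "lines": 1289, "lang": "go"},
    {"file": "install.sh", "lines": 96, "lang": "shell"},
]
compact = to_columns(rows)
print(compact)
print(f"{len(compact)} chars, versus {len(json.dumps(rows))} as JSON")
```

The savings grow with row count, since the keys are amortized across the whole table.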

See it in action with Jane Doe 🦌

Meet your deer friend! Ask anything — no login, no API key. Powered by community compute.

Jane Doe 🦌 • Your deer friend on the Dollama network
Responses may be inaccurate · No prompts are logged · Powered by community volunteers

First inference in under two minutes

🦙
Ollama
Required runtime
💾
8GB+ RAM
Recommended
💻
macOS / Linux / Win
Cross-platform
🧠
qwen3.5:9b
Network model
Terminal
curl -fsSL https://dollama.net/install.sh | sh

Requires Ollama. If you plan to contribute compute, pull qwen3.5:9b first.

PowerShell
irm https://dollama.net/install.ps1 | iex

Requires Ollama. If you plan to contribute compute, pull qwen3.5:9b first.

1

Start Dollama

dollama both
2

Launch Claude Code

dollama launch claude
3

Start coding

That's it. You're running on community compute. Dollama exposes an Anthropic Messages API endpoint at localhost:11435. Any tool that supports a custom base URL can use it — Claude Code, Continue, Aider, or your own scripts.
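From your own scripts, that looks like a standard Anthropic-style Messages request pointed at the local proxy. The /v1/messages path here is an assumption based on the Anthropic API shape; check Dollama's docs for the exact route:

```python
import json
import urllib.request

# Anthropic-style Messages payload, aimed at the local Dollama proxy.
payload = {
    "model": "qwen3.5:9b",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Explain goroutines in one paragraph."}],
}

req = urllib.request.Request(
    "http://localhost:11435/v1/messages",
    data=json.dumps(payload).encode(),
    headers={"content-type": "application/json"},
)

try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        print(json.load(resp)["content"][0]["text"])
except OSError as exc:  # Dollama proxy not running locally
    print(f"request failed: {exc}")
```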

What the installer does: downloads the latest dollama binary for your platform, verifies the checksum, and moves it to /usr/local/bin. That's it — no services, no daemons, no config changes. Read the script source.

Direct binary download

Verify checksums

curl -fsSL https://dollama.net/dl/latest/checksums.txt -o checksums.txt
sha256sum -c --ignore-missing checksums.txt

Build from source

git clone https://github.com/notangrywaffle/dollama.net.git
cd dollama.net/cli && make build

The herd at a glance

  • Concurrent capacity: messages handled simultaneously
  • Tokens per second: network throughput (24h avg)
  • Requests processed: in the last 24 hours
  • Tokens processed: in the last 24 hours
  • Contributor nodes: unique node owners connected
  • Chat users: active users in the last 24 hours

Works with

Claude Code Continue Aider Any Anthropic-compatible tool

Open source

Full source code on GitHub. Relay, CLI, installer, and this website — all public.

Built on

Ollama Qwen 3.5 9B Go relay

Honest, not polished

Here's how your data flows today.

What stays on your machine

  • Your files, repos, and working directory
  • Context is assembled locally
  • Only the inference prompt leaves your machine

What's visible in transit (v1)

  • The relay and contributor nodes can see prompts in plaintext
  • We don't log prompt content, but it's not yet encrypted end-to-end

Use Dollama for open-source projects, learning, experimentation, and non-sensitive coding. Don't send credentials or proprietary code.

What's next

  • Direct peer connections with end-to-end encryption are on the roadmap
  • Personal mesh across your own machines is private today
  • Community pooling with privacy-preserving inference is what we're building toward

Making scale matter less

Every model you run through a centralised API is a dependency. On their pricing, their policies, their surveillance, their terms of service deciding what you're allowed to ask.

Dollama is a bet that local-first, community-scaled inference can make that dependency optional. Not by matching their scale — by making scale matter less. Smart routing across heterogeneous hardware. Payload optimisation that fits models into whatever's available. A network where contribution is the membership fee and no single entity holds the keys.

We didn't set out to build an alternative to Big AI. We just wanted our own machines to work together. It turns out that when enough people do that, you get one anyway.

Your hardware. Your models. Your rules.

Common questions

Can contributor nodes see my prompts?

In v1, yes — contributor nodes see the raw inference prompt to generate a response. They don't see your files, identity, or conversation history. Nodes are stateless. End-to-end encryption is planned.

Is it safe for sensitive or proprietary code?

Not yet. Treat the network like any cloud API for now. Use it for open-source, learning, and non-sensitive tasks.

Which model does the network run?

Qwen 3.5 9B. It's fast enough for consumer hardware and capable enough for agentic coding. A single model keeps routing simple. More models may come later.

Do I need a GPU to contribute?

No — CPU inference works too, just slower. Any machine that can run Ollama can contribute.

Do I have to contribute to use the network?

No. But when the network is busy, people who contribute get served first. When it's idle, everyone gets instant access.

Why a deer and not a llama?

We love llamas — Ollama is the backbone of this project. But a doe felt right for what we're building: gentle, graceful, and part of a herd. Plus, dollama. It was right there the whole time.

Report a bug or request a feature