Network online

Your GPU is idle.
Someone's agent needs it.

Share spare compute with the network. Use Qwen3 8B for free in Claude Code, Continue, and other agentic tools. No tokens, no fees — just a community.

qwen3:8b · 8B parameters · Ollama-native · Anthropic API
Qwen3 8B
-- nodes online · -- requests today

Folding@home, but for LLM inference

01

Install & choose your role

Grab the dollama CLI. Run as a user to consume inference, as a contributor to share compute, or both to do it all.

02

The network connects you

The coordinator matches user requests with available contributor nodes. Inference traffic routes through our relay, but your files and context never leave your machine — only prompts and generated tokens travel.
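The matching step can be sketched roughly like this. This is purely illustrative — the `Node` shape and the least-loaded policy are assumptions, not dollama's actual scheduler:

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    active_requests: int
    online: bool

def pick_node(nodes):
    """Illustrative policy: route to the least-loaded online contributor."""
    candidates = [n for n in nodes if n.online]
    if not candidates:
        return None  # no contributors available; caller would queue or retry
    return min(candidates, key=lambda n: n.active_requests)

nodes = [
    Node("gpu-farm-42", active_requests=3, online=True),
    Node("idle-gpu-donor", active_requests=0, online=True),
    Node("weekend-contributor", active_requests=1, online=False),
]
print(pick_node(nodes).name)  # → idle-gpu-donor
```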

03

Free inference, forever

No tokens, no billing, no blockchain. Contributors earn priority in a leaderboard. The more you give, the faster your own requests get served.

💻 You → Coordinator → ⚙️ Contributor → Response

Use it, share it, or do both

USE
I need inference

Use the network

Point your coding tools at dollama's local proxy. It forwards your LLM requests to the network and streams responses back. Your code and context stay local — only the inference is offloaded.

$ dollama connect
GIVE
I have spare compute

Serve the network

If you have Ollama running with Qwen3 8B pulled, donate your idle cycles. The CLI registers your node with the coordinator and handles incoming inference requests automatically.

$ dollama serve
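Conceptually, registration is just your node telling the coordinator what it can serve. The field names below are hypothetical — a sketch, not dollama's actual wire format:

```python
import json
import platform

def registration_payload(node_name, model="qwen3:8b"):
    # Hypothetical fields; dollama's real registration format is not shown here.
    return {
        "node": node_name,
        "model": model,
        "os": platform.system(),
        "capabilities": {"streaming": True},
    }

print(json.dumps(registration_payload("my-node")))
```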

Install in one command

🦙
Ollama
Required runtime
💾
8GB+ RAM
Recommended
💻
macOS / Linux / Win
Cross-platform
🧠
qwen3:8b
Network model
Terminal
curl -fsSL https://dollama.net/install.sh | sh

Requires Ollama installed with qwen3:8b pulled if you plan to contribute.

PowerShell
irm https://dollama.net/install.ps1 | iex


Use with Claude Code

1

Start the proxy

Run dollama connect to start the local proxy on port 11411. This exposes an Anthropic-compatible Messages API endpoint.
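Since the endpoint speaks the Anthropic Messages API shape, a request body looks like this — a sketch whose field names follow Anthropic's public Messages API, which the proxy mirrors:

```python
import json

# Minimal Anthropic-style Messages API request body, as the local
# proxy on port 11411 would expect it.
payload = {
    "model": "network:qwen3:8b",
    "max_tokens": 256,
    "messages": [
        {"role": "user", "content": "Explain what a context window is."}
    ],
}
print(json.dumps(payload, indent=2))
```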

2

Add to your Claude Code settings

Copy the JSON config into ~/.claude/settings.json. This sets the base URL and auth token so Claude Code routes requests through the dollama network.

3

Use it

That's it. Claude Code will use the network model for inference. Your main Claude model stays unchanged — only the configured model routes through dollama.

tip

Want to contribute too?

Run dollama both instead — it starts the proxy and registers your machine as a contributor node in one command.

~/.claude/settings.json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:11411",
    "ANTHROPIC_AUTH_TOKEN": "olm_your_token_here"
  },
  "model": "network:qwen3:8b"
}
Or set env vars directly
export ANTHROPIC_BASE_URL=http://localhost:11411
export ANTHROPIC_AUTH_TOKEN=olm_your_token_here
Pull the model (contributors)
ollama pull qwen3:8b

The herd at a glance

-- Nodes online (active contributors right now)
-- Tokens processed (total inference served)
-- Requests completed (successful inferences)

Contributor Leaderboard preview

Rank  Contributor           Compute
1     gpu-farm-42           482,100 tok
2     inference-node-alpha  310,850 tok
3     llm-volunteer-99      198,420 tok
4     weekend-contributor   87,300 tok
5     idle-gpu-donor        45,670 tok

Common questions

Do contributors see my code?

No. Contributor nodes only receive the LLM inference prompt and return generated tokens. Your files, repository context, and working directory never leave your machine. The local proxy handles all context assembly — contributors see only the raw inference request.
What's the privacy model?

We monitor network activity for load balancing and routing improvements; beyond that, all data is protected. Contributor nodes only see inference prompts, never your files, repo context, or working directory. No data is sold or shared with third parties. End-to-end encryption is planned for a future phase.
Why Qwen3 8B?

Qwen3 8B hits the sweet spot: fast enough to run on consumer hardware (including laptops with decent GPUs), yet capable enough for agentic coding tasks like tool calls and code edits. A single model across the network keeps things simple and lets us optimize routing. More models may come in future phases.
What do I need to contribute?

Ollama installed and running, with qwen3:8b pulled. Then just run dollama serve. The CLI handles registration, heartbeats, and inference routing automatically. Any machine that can run Ollama can contribute — dedicated GPUs are great, but even a modern CPU works.
What do contributors get in return?

The coordinator tracks contribution — how many tokens your node has served, uptime, and reliability. When you submit a request as a user, your contribution history gives you priority in the queue. Think of it as karma, not currency. No financialization, no speculation.
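A karma-style priority could be computed something like this. The weights and formula are illustrative only — the coordinator's real scoring is not published here:

```python
def priority_score(tokens_served, uptime_hours, reliability):
    """Illustrative 'karma' score combining the three tracked signals.
    reliability is a 0..1 success ratio. Weights are made up for the sketch."""
    return tokens_served * 0.001 + uptime_hours * 2.0 + reliability * 100.0

# A long-time contributor outranks a fresh node in the request queue.
veteran = priority_score(tokens_served=482_100, uptime_hours=300, reliability=0.99)
newcomer = priority_score(tokens_served=0, uptime_hours=1, reliability=1.0)
print(veteran > newcomer)  # → True
```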
Can I use it with tools other than Claude Code?

Yes. The local proxy exposes an Anthropic Messages API endpoint at localhost:11411. Any tool that supports a custom Anthropic-compatible base URL can use it — Continue, Aider, or your own scripts. We're focused on agentic coding tools, but the API is standard.
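For your own scripts, any stdlib HTTP client is enough. Here's a sketch that builds (but doesn't send) a request against the local proxy — the `/v1/messages` path mirrors Anthropic's public API and is an assumption here, so confirm it against your proxy:

```python
import json
import urllib.request

# Build a request to the local dollama proxy (constructed only, not sent).
body = json.dumps({
    "model": "network:qwen3:8b",
    "max_tokens": 128,
    "messages": [{"role": "user", "content": "ping"}],
}).encode()

req = urllib.request.Request(
    "http://localhost:11411/v1/messages",  # path assumed from Anthropic's API
    data=body,
    headers={
        "content-type": "application/json",
        "x-api-key": "olm_your_token_here",
        "anthropic-version": "2023-06-01",
    },
    method="POST",
)
print(req.get_method(), req.full_url)
```

Sending it is then one `urllib.request.urlopen(req)` call once the proxy is running.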
Why a doe?

We like llamas — Ollama is the backbone of this project. But a doe felt right for what we're building: gentle, graceful, and part of a herd. Plus, doe + llama → dollama. It was right there the whole time.