The Distributed Open Llama Network

Your GPU is idle.
Someone's agent needs it.

Share spare compute with the network. Use Qwen3 8B for free in Claude Code, Continue, and other agentic tools. No tokens, no fees — just a community.

qwen3:8b · 8B parameters · Ollama-native · Anthropic API

Folding@home, but for LLM inference

01

Install & choose your role

Grab the dollama CLI. Run as a user to consume inference, as a contributor to share compute, or both to do it all.

02

Smart routing, not random

Contributor nodes report hardware benchmarks (tokens/sec, RAM, VRAM). The coordinator routes each request to the best available node — not a random one. If all nodes are busy, you queue with priority based on your contribution balance. Heartbeat monitoring drops stale nodes automatically.

03

Free inference, fair priority

No billing, no blockchain. A simple token ledger tracks what you've served minus what you've consumed. Contribute more and your requests jump the queue. When the network is idle, everyone gets instant service.

Request flow: 💻 You → Coordinator → ⚙️ Contributor → Response

Install once, pick how you participate

Install dollama and it runs in your system tray. Choose your mode from the menu — no terminal required.

USE
I need inference

Use the network

Runs a local proxy in your system tray. Your coding tools connect to it and get LLM inference from the network. Code and context stay on your machine — only the inference prompt is sent.

GIVE
I have spare compute

Contribute cycles

Donates idle GPU/CPU when your machine isn't busy. Runs quietly in the background and pauses automatically when you need your resources. Your contribution builds your priority balance.

BOTH
Recommended

Use & contribute

The default for most people. Use the network for inference and contribute your idle cycles back. You build priority while you help others — the network works best when everyone does both.

Power users: run from the terminal

$ dollama connect
$ dollama serve
$ dollama both

Add --auto-start to launch on boot. See dollama --help for all options.

The more you give, the faster you go

No tokens, no blockchain, no marketplace. Just a running tally that rewards generosity.

Balance model

balance = tokens served − tokens consumed

Every token your node serves for others increases your balance. Every token you consume decreases it. Positive balance means you're a net contributor. Negative means you've used more than you've given.

  • Single queue — all requests enter one queue, sorted by balance. Higher balance = served first.
  • Idle network — when nodes are free, everyone gets instant service regardless of balance.
  • New users — start at zero. You can use the network immediately, but contributors get priority when it's busy.
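The queue rules above amount to a single sort by balance. A minimal sketch, with an illustrative ledger (the names and numbers are made up, not dollama internals):

```python
# Ledger maps each user to (tokens served - tokens consumed).
ledger = {"alice": 5_000, "bob": -1_200, "carol": 0}


def balance(user: str) -> int:
    return ledger.get(user, 0)  # new users start at zero


pending = ["bob", "carol", "alice"]
queue = sorted(pending, key=balance, reverse=True)  # highest balance served first
```

Here alice (net contributor) is served first, carol (new user, zero balance) next, and bob (net consumer) last — but only when the network is busy; an idle network serves everyone immediately.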

Concurrency & groups

Burst usage

1–3 concurrent requests cost 1x tokens each. 4+ concurrent requests cost 2x — allowing short bursts but discouraging sustained hogging of network resources.
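One plausible reading of the burst rule as code — assuming the multiplier applies per request based on how many are in flight at the time (an assumption; the exact accounting isn't specified here):

```python
def token_cost(tokens: int, concurrent_requests: int) -> int:
    """Burst multiplier: 1x for up to 3 concurrent requests, 2x at 4 or more."""
    multiplier = 1 if concurrent_requests <= 3 else 2
    return tokens * multiplier


# A short burst of 3 parallel requests is charged normally...
assert token_cost(1_000, 3) == 1_000
# ...but a 4th concurrent request doubles the per-token charge.
assert token_cost(1_000, 4) == 2_000
```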

Team pooling

Groups can pool contributions under a shared balance. Run contributor nodes on office machines and everyone on the team benefits from the shared priority.

No speculation

Balances can't be traded, sold, or transferred outside a group. This is a coordination mechanism, not a financial instrument.

What we see, what we don't

Transparency over marketing. Here's exactly how your data flows through the network today.

What stays on your machine

Your files, repository context, and working directory never leave your machine. The local proxy assembles context locally — only the final inference prompt is sent to the network.

What the coordinator sees

In v1, all traffic routes through the coordinator relay. This means the coordinator can see your inference prompts in plaintext. This is a known tradeoff for simplicity in the initial release. We don't log prompt content, but you should be aware of this when deciding what to send through the network.

What contributor nodes see

Only the raw inference prompt and generated tokens. No file access, no user identity, no conversation history beyond the current request. Nodes are stateless — they process a prompt and move on.

The roadmap: end-to-end encryption

Phase 4 introduces direct peer connections (WebRTC/QUIC) with end-to-end encryption. The coordinator would handle routing only — prompts would be encrypted between your machine and the contributor node. Until then, treat the network like any cloud API: don't send secrets you wouldn't send to a hosted LLM provider.

Install in one command

🦙
Ollama
Required runtime
💾
8GB+ RAM
Recommended
💻
macOS / Linux / Win
Cross-platform
🧠
qwen3:8b
Network model
Terminal
curl -fsSL https://dollama.net/install.sh | sh

Requires Ollama installed with qwen3:8b pulled if you plan to contribute.

PowerShell
irm https://dollama.net/install.ps1 | iex


Use with Claude Code

1

Start dollama

Run dollama connect to start the local proxy on port 11435. This exposes an Anthropic-compatible Messages API endpoint.

2

Launch Claude Code

Run the one-liner below. It points Claude Code at the dollama proxy and selects the network model. That's it — you're running on community compute.

tip

Want to contribute too?

Run dollama both instead — it starts the proxy and registers your machine as a contributor node in one command.

One-liner
ANTHROPIC_AUTH_TOKEN=ollama ANTHROPIC_BASE_URL=http://localhost:11435 ANTHROPIC_API_KEY="" claude --model qwen3:8b
Or add to ~/.claude/settings.json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:11435",
    "ANTHROPIC_AUTH_TOKEN": "ollama",
    "ANTHROPIC_API_KEY": ""
  },
  "model": "qwen3:8b"
}
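Claude Code isn't required: any Anthropic-compatible client can talk to the proxy directly. A hedged sketch using only the standard library, assuming `dollama connect` is running on localhost:11435 and that the proxy mirrors the standard Anthropic `/v1/messages` route and headers:

```python
import json
from urllib import request

# Request body in the Anthropic Messages API shape.
body = json.dumps({
    "model": "qwen3:8b",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Explain this diff in one line."}],
})

req = request.Request(
    "http://localhost:11435/v1/messages",
    data=body.encode(),
    headers={
        "content-type": "application/json",
        "x-api-key": "ollama",  # the proxy doesn't check real keys
        "anthropic-version": "2023-06-01",
    },
)

# Uncomment once the proxy is running:
# print(json.load(request.urlopen(req))["content"][0]["text"])
```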
Pull the model (contributors)
ollama pull qwen3:8b

The herd at a glance

-- Nodes online (active contributors right now)
-- Tokens processed (total inference served)
-- Requests completed (successful inferences)

Contributor Leaderboard preview

Rank  Contributor           Compute
1     gpu-farm-42           482,100 tok
2     inference-node-alpha  310,850 tok
3     llm-volunteer-99      198,420 tok
4     weekend-contributor   87,300 tok
5     idle-gpu-donor        45,670 tok

Common questions

Does my code leave my machine?

Your files, repository context, and working directory never leave your machine — the local proxy assembles context locally. However, the inference prompt itself does pass through the coordinator relay in plaintext in v1. Contributor nodes only see the raw prompt and return generated tokens — no file access, no user identity. See the Privacy section for full details.

How is my data handled?

In v1, all traffic routes through the coordinator relay, which means inference prompts are visible to the coordinator in plaintext. We don't log prompt content, but we're being upfront about this tradeoff. Your files and repo context stay local — only the inference prompt leaves your machine. No data is sold or shared with third parties. End-to-end encryption via direct peer connections is planned for Phase 4. See the Privacy section for the full breakdown.

Why Qwen3 8B?

Qwen3 8B hits the sweet spot: fast enough to run on consumer hardware (including laptops with decent GPUs), capable enough for agentic coding tasks like tool calls and code edits. A single model across the network keeps things simple and lets us optimize routing. More models may come in future phases.

What do I need to run a contributor node?

Ollama installed and running, with qwen3:8b pulled. Then just run dollama serve. The CLI handles registration, heartbeats, and inference routing automatically. Any machine that can run Ollama can contribute — dedicated GPUs are great, but even a modern CPU works.

How does priority work when the network is busy?

The coordinator maintains a token ledger for each user: balance = tokens served − tokens consumed. When the network is busy, requests queue in order of balance — higher balance gets served first. When it's idle, everyone gets instant service. Running 4+ concurrent requests costs 2x tokens to discourage sustained hogging. Teams can pool contributions under a shared balance. See the Priority section for details.

Can I use dollama with tools other than Claude Code?

Yes. The local proxy exposes an Anthropic Messages API endpoint at localhost:11435. Any tool that supports a custom Anthropic-compatible base URL can use it — Continue, Aider, or your own scripts. We're focused on agentic coding tools but the API is standard.

Why a doe for the mascot?

We like llamas — Ollama is the backbone of this project. But a doe felt right for what we're building: gentle, graceful, and part of a herd. Plus, dollama → doe. It was right there the whole time.

Report a bug or request a feature