API Reference
REST + SSE API for the dollama network. Base URL: https://api.dollama.net
All authenticated endpoints require a Bearer token in the Authorization header. Tokens use the olm_ prefix.
Authorization: Bearer olm_your_api_key_here
Get a key automatically by running dollama connect (auto-provisions on first run) or dollama login for GitHub-linked auth. You can also call POST /v1/auth/register or use the POST /v1/auth/device GitHub Device Auth flow directly.
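As a minimal sketch of the header described above (the key string is a placeholder, and the `auth_headers` helper is illustrative, not part of any SDK):

```python
API_KEY = "olm_your_api_key_here"  # placeholder; provisioned via `dollama connect`

def auth_headers(key: str) -> dict:
    # API keys use the olm_ prefix; short-lived session tokens use sess_.
    if not (key.startswith("olm_") or key.startswith("sess_")):
        raise ValueError("expected an olm_ API key or sess_ session token")
    return {"Authorization": f"Bearer {key}"}

print(auth_headers(API_KEY))
```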
Session tokens (sess_ prefix) are single-use and expire after 5 minutes.

Send a prompt to the network and receive a streaming SSE response. The request is routed to an available node based on priority tier.
| Field | Type | Description |
|---|---|---|
| model | string | Model identifier, e.g. network:qwen3.5:9b |
| payload | object | Full Anthropic Messages API request body (opaque — passed directly to the node) |
| Header | Type | Description |
|---|---|---|
| Content-Type | text/event-stream | SSE stream |
| X-Routing-Tier | string | Routing tier used: own_node, group, priority, or best_effort |
curl --no-buffer -X POST https://api.dollama.net/v1/messages \
-H "Authorization: Bearer olm_your_api_key_here" \
-H "Content-Type: application/json" \
-d '{
"model": "network:qwen3.5:9b",
"payload": {
"model": "network:qwen3.5:9b",
"max_tokens": 1024,
"messages": [
{ "role": "user", "content": "Explain monads in one sentence." }
],
"stream": true
}
}'
import httpx
url = "https://api.dollama.net/v1/messages"
headers = {
"Authorization": "Bearer olm_your_api_key_here",
"Content-Type": "application/json",
}
body = {
"model": "network:qwen3.5:9b",
"payload": {
"model": "network:qwen3.5:9b",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Explain monads in one sentence."}
],
"stream": True,
},
}
with httpx.stream("POST", url, headers=headers, json=body) as r:
for line in r.iter_lines():
if line.startswith("data: "):
print(line[6:])
const res = await fetch("https://api.dollama.net/v1/messages", {
method: "POST",
headers: {
"Authorization": "Bearer olm_your_api_key_here",
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "network:qwen3.5:9b",
payload: {
model: "network:qwen3.5:9b",
max_tokens: 1024,
messages: [
{ role: "user", content: "Explain monads in one sentence." }
],
stream: true,
},
}),
});
const reader = res.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value, { stream: true });
for (const line of chunk.split("\n")) {
if (line.startsWith("data: ")) {
console.log(line.slice(6));
}
}
}
package main
import (
"bufio"
"bytes"
"encoding/json"
"fmt"
"net/http"
"strings"
)
func main() {
body, _ := json.Marshal(map[string]any{
"model": "network:qwen3.5:9b",
"payload": map[string]any{
"model": "network:qwen3.5:9b",
"max_tokens": 1024,
"messages": []map[string]string{
{"role": "user", "content": "Explain monads in one sentence."},
},
"stream": true,
},
})
req, _ := http.NewRequest("POST", "https://api.dollama.net/v1/messages", bytes.NewReader(body))
req.Header.Set("Authorization", "Bearer olm_your_api_key_here")
req.Header.Set("Content-Type", "application/json")
resp, err := http.DefaultClient.Do(req)
if err != nil {
panic(err)
}
defer resp.Body.Close()
scanner := bufio.NewScanner(resp.Body)
for scanner.Scan() {
line := scanner.Text()
if strings.HasPrefix(line, "data: ") {
fmt.Println(line[6:])
}
}
}
Returns live network statistics including online nodes, capacity, request totals, and reliability metrics.
{
"status": "ok",
"version": "v0.19.14",
"network_model": "qwen3.5:9b",
"network": {
"online_nodes": 15,
"total_nodes": 15,
"total_capacity": 300,
"used_capacity": 45,
"totalRequests": 5000,
"totalTokens": 2500000,
"contributor_nodes": 15,
"chat_users": 250,
"reliability": {
"total_requests": 5000,
"success_rate": 0.98,
"avg_ttft_ms": 1200.5,
"avg_tps": 25.3
}
}
}
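A couple of useful load metrics can be derived from the `network` object; the values below are taken from the sample response above, not live data:

```python
# Sample subset of the status payload shown above
status = {
    "network": {
        "online_nodes": 15,
        "total_capacity": 300,
        "used_capacity": 45,
    }
}

net = status["network"]
# Fraction of network capacity currently in use: 45 / 300 = 0.15
utilization = net["used_capacity"] / net["total_capacity"]
# Remaining request slots across all online nodes
free_slots = net["total_capacity"] - net["used_capacity"]

print(f"utilization={utilization:.0%}, free_slots={free_slots}")
```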
Returns your current token balance and active request count. Balance determines your priority tier.
{
"tokens_served": 100000,
"tokens_self_served": 50000,
"tokens_consumed": 75000,
"balance": 75000,
"active_requests": 2
}
| Field | Type | Description |
|---|---|---|
| tokens_served | int | Tokens earned by running nodes |
| tokens_self_served | int | Tokens earned from own-node usage (unmetered) |
| tokens_consumed | int | Tokens spent making requests |
| balance | int | Net balance (served + self_served - consumed) |
| active_requests | int | Currently in-flight requests |
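The balance formula from the field table can be checked against the sample response above (`net_balance` is an illustrative helper, not an API call):

```python
def net_balance(tokens_served: int, tokens_self_served: int, tokens_consumed: int) -> int:
    # balance = served + self_served - consumed, per the field table above
    return tokens_served + tokens_self_served - tokens_consumed

# Sample values from the /v1/balance response: 100000 + 50000 - 75000 = 75000
print(net_balance(100000, 50000, 75000))
```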
Cancel an in-flight inference request. The node is notified and the request transitions to a terminal state.
{
"request_id": "req_..."
}
{
"status": "ok"
}
| Status | Type | Description |
|---|---|---|
| 200 | OK | Cancelled successfully or already in terminal state |
| 400 | Bad Request | Missing request_id |
| 403 | Forbidden | Request belongs to a different user |
The /v1/messages endpoint streams Anthropic-compatible SSE events. Events are pre-formatted by the node and passed through the relay.
event: message_start
data: {"type":"message_start","message":{"id":"msg_...","type":"message","role":"assistant","content":[],"model":"qwen3.5:9b","stop_reason":null,"usage":{"input_tokens":25,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"A monad"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" is a design pattern"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":42}}

event: message_stop
data: {"type":"message_stop"}
| Event | When | Key fields |
|---|---|---|
| message_start | First event in the stream | message.id, message.model, usage.input_tokens |
| content_block_start | New content block begins | index, content_block.type (text or tool_use) |
| content_block_delta | Each token/chunk | delta.text (text) or delta.partial_json (tool use) |
| content_block_stop | Content block complete | index |
| message_delta | Message metadata update | delta.stop_reason, usage.output_tokens |
| message_stop | Stream complete | (none) |
| error | Inference error | error.type, error.message |
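The event table above suggests a simple consumption pattern: collect `text_delta` payloads from `content_block_delta` events to reassemble the full response text. A minimal sketch (the `accumulate_text` helper is illustrative):

```python
import json

def accumulate_text(sse_lines):
    """Collect text deltas from an Anthropic-style SSE stream into one string."""
    text = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip event: lines and blank keep-alives
        event = json.loads(line[6:])
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                text.append(delta["text"])
    return "".join(text)

# Delta lines from the sample stream above
sample = [
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"A monad"}}',
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" is a design pattern"}}',
]
print(accumulate_text(sample))  # A monad is a design pattern
```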
All errors return a JSON body with error and message fields. Rate limit errors include a Retry-After header.
| Status | Reason | When |
|---|---|---|
| 400 | Bad Request | Invalid JSON, missing model or payload, malformed Authorization header |
| 401 | Unauthorized | Missing or invalid API key / session token |
| 403 | Forbidden | Account banned or request belongs to another user |
| 408 | Request Timeout | No node accepted the request within the assignment deadline (5s) |
| 413 | Payload Too Large | Request body exceeds 10 MB |
| 429 | Too Many Requests | Rate limited (30 req/min) or concurrent limit reached (10) |
| 503 | Service Unavailable | Queue full (200 max) or no nodes online |
When a limit is hit, the response includes a Retry-After header with the number of seconds to wait. Requests beyond 3 concurrent incur a 2x token cost to the ledger.
| Timeout | Value | Description |
|---|---|---|
| Stream timeout | 90s | Maximum total time for a streaming response |
| Idle timeout | 15s | Maximum time without receiving a token before error |
| Assignment timeout | 5s | Maximum time to assign the request to a node |