
API Reference

REST + SSE API for the dollama network. Base URL: https://api.dollama.net

Bearer token auth

All authenticated endpoints require a Bearer token in the Authorization header. Tokens use the olm_ prefix.

Authorization header
Authorization: Bearer olm_your_api_key_here

To get a key, run dollama connect, which auto-provisions one on first run, or dollama login for GitHub-linked auth. You can also call POST /v1/auth/register directly, or use the GitHub Device Auth flow through POST /v1/auth/device.

How it works
  • The relay stores a SHA-256 hash of your key — the plaintext is never persisted
  • Auth lookups are cached in Redis for 5 minutes
  • Session tokens (sess_ prefix) are single-use and expire after 5 minutes
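
The hashing step in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the relay's actual code: the names hash_key and verify_key are hypothetical, and the hex encoding of the digest is assumed.

```python
import hashlib
import hmac


def hash_key(api_key: str) -> str:
    """Return the hex SHA-256 digest of an API key (hex encoding is an assumption)."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def verify_key(presented_key: str, stored_hash: str) -> bool:
    """Hash the presented key and compare it with the stored hash in constant time."""
    return hmac.compare_digest(hash_key(presented_key), stored_hash)


# Only the digest would be kept; the plaintext key is discarded after hashing.
stored = hash_key("olm_your_api_key_here")
print(verify_key("olm_your_api_key_here", stored))  # True
print(verify_key("olm_wrong_key", stored))          # False
```

Because only the digest is stored, a lost key cannot be recovered from the relay and has to be re-issued.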

Send an inference request

POST /v1/messages Authenticated

Send a prompt to the network and receive a streaming SSE response. The request is routed to an available node based on priority tier.

Request body
| Field | Type | Description |
|---|---|---|
| model | string | Model identifier, e.g. network:qwen3.5:9b |
| payload | object | Full Anthropic Messages API request body (opaque; passed directly to the node) |
Response headers
| Header | Value / type | Description |
|---|---|---|
| Content-Type | text/event-stream | SSE stream |
| X-Routing-Tier | string | Routing tier used: own_node, group, priority, or best_effort |
Examples
curl
curl --no-buffer -X POST https://api.dollama.net/v1/messages \
  -H "Authorization: Bearer olm_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "network:qwen3.5:9b",
    "payload": {
      "model": "network:qwen3.5:9b",
      "max_tokens": 1024,
      "messages": [
        { "role": "user", "content": "Explain monads in one sentence." }
      ],
      "stream": true
    }
  }'
Python (httpx)
import httpx

url = "https://api.dollama.net/v1/messages"
headers = {
    "Authorization": "Bearer olm_your_api_key_here",
    "Content-Type": "application/json",
}
body = {
    "model": "network:qwen3.5:9b",
    "payload": {
        "model": "network:qwen3.5:9b",
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": "Explain monads in one sentence."}
        ],
        "stream": True,
    },
}

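# httpx defaults to a 5 s read timeout, which is shorter than the relay's
# 15 s idle timeout, so allow longer gaps between tokens.
timeout = httpx.Timeout(10.0, read=20.0)

with httpx.stream("POST", url, headers=headers, json=body, timeout=timeout) as r:
    r.raise_for_status()  # surface 4xx/5xx instead of parsing an error body as SSE
    for line in r.iter_lines():
        if line.startswith("data: "):
            print(line[6:])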
Node.js (fetch)
const res = await fetch("https://api.dollama.net/v1/messages", {
  method: "POST",
  headers: {
    "Authorization": "Bearer olm_your_api_key_here",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "network:qwen3.5:9b",
    payload: {
      model: "network:qwen3.5:9b",
      max_tokens: 1024,
      messages: [
        { role: "user", content: "Explain monads in one sentence." }
      ],
      stream: true,
    },
  }),
});

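if (!res.ok) throw new Error(`HTTP ${res.status}`);

const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // A chunk can end mid-line, so keep the trailing partial line for the next read.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop();
  for (const line of lines) {
    if (line.startsWith("data: ")) {
      console.log(line.slice(6));
    }
  }
}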
Go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	body, _ := json.Marshal(map[string]any{
		"model": "network:qwen3.5:9b",
		"payload": map[string]any{
			"model":      "network:qwen3.5:9b",
			"max_tokens": 1024,
			"messages": []map[string]string{
				{"role": "user", "content": "Explain monads in one sentence."},
			},
			"stream": true,
		},
	})

	req, _ := http.NewRequest("POST", "https://api.dollama.net/v1/messages", bytes.NewReader(body))
	req.Header.Set("Authorization", "Bearer olm_your_api_key_here")
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

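	scanner := bufio.NewScanner(resp.Body)
	// Raise the default 64 KiB token limit so long data lines are not rejected.
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "data: ") {
			fmt.Println(line[6:])
		}
	}
	// Scan returns false on both EOF and failure; Err distinguishes them.
	if err := scanner.Err(); err != nil {
		panic(err)
	}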
}

Network status

GET /v1/status Public

Returns live network statistics including online nodes, capacity, request totals, and reliability metrics.

Response
200 OK
{
  "status": "ok",
  "version": "v0.19.14",
  "network_model": "qwen3.5:9b",
  "network": {
    "online_nodes": 15,
    "total_nodes": 15,
    "total_capacity": 300,
    "used_capacity": 45,
    "totalRequests": 5000,
    "totalTokens": 2500000,
    "contributor_nodes": 15,
    "chat_users": 250,
    "reliability": {
      "total_requests": 5000,
      "success_rate": 0.98,
      "avg_ttft_ms": 1200.5,
      "avg_tps": 25.3
    }
  }
}
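
As a rough illustration of reading this response, the sketch below derives a capacity utilization figure from it. The utilization helper is hypothetical, and it assumes used_capacity and total_capacity are expressed in the same unit, which the response does not state.

```python
import json

# Sample body copied from the response above, trimmed to the fields used here.
sample = json.loads("""
{
  "status": "ok",
  "network": {
    "online_nodes": 15,
    "total_nodes": 15,
    "total_capacity": 300,
    "used_capacity": 45
  }
}
""")


def utilization(status: dict) -> float:
    """Fraction of network capacity in use; 0.0 when no capacity is reported."""
    net = status["network"]
    total = net["total_capacity"]
    return net["used_capacity"] / total if total else 0.0


print(f"{utilization(sample):.0%} of capacity in use")  # 15% of capacity in use
```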

Token balance

GET /v1/ledger/balance Authenticated

Returns your current token balance and active request count. Balance determines your priority tier.

Response
200 OK
{
  "tokens_served": 100000,
  "tokens_self_served": 50000,
  "tokens_consumed": 75000,
  "balance": 75000,
  "active_requests": 2
}
| Field | Type | Description |
|---|---|---|
| tokens_served | int | Tokens earned by running nodes |
| tokens_self_served | int | Tokens earned from own-node usage (unmetered) |
| tokens_consumed | int | Tokens spent making requests |
| balance | int | Net balance (served + self_served - consumed) |
| active_requests | int | Currently in-flight requests |
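
The balance formula can be checked against the sample response with a few lines of Python. The helper name net_balance is hypothetical; the arithmetic is the one given in the field description.

```python
def net_balance(ledger: dict) -> int:
    """Recompute the balance as served + self_served - consumed."""
    return (
        ledger["tokens_served"]
        + ledger["tokens_self_served"]
        - ledger["tokens_consumed"]
    )


# Values copied from the sample response above.
ledger = {
    "tokens_served": 100000,
    "tokens_self_served": 50000,
    "tokens_consumed": 75000,
    "balance": 75000,
    "active_requests": 2,
}

print(net_balance(ledger))                       # 75000
print(net_balance(ledger) == ledger["balance"])  # True
```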

Cancel a request

POST /v1/cancel Authenticated

Cancel an in-flight inference request. The node is notified and the request transitions to a terminal state.

Request body
JSON
{
  "request_id": "req_..."
}
Response
200 OK
{
  "status": "ok"
}
| Status | Reason | Description |
|---|---|---|
| 200 | OK | Cancelled successfully or already in terminal state |
| 400 | Bad Request | Missing request_id |
| 403 | Forbidden | Request belongs to a different user |
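
A cancel call might be assembled as in the sketch below, which uses only the Python standard library. The build_cancel_request helper is hypothetical, and the req_... placeholder is kept exactly as in the request body above; the request is built but not sent.

```python
import json
import urllib.request

API_BASE = "https://api.dollama.net"


def build_cancel_request(api_key: str, request_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/cancel request."""
    body = json.dumps({"request_id": request_id}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/cancel",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_cancel_request("olm_your_api_key_here", "req_...")
print(req.get_method(), req.full_url)  # POST https://api.dollama.net/v1/cancel
```

Passing req to urllib.request.urlopen would send it; a 400 or 403 response is then raised as urllib.error.HTTPError. Since a 200 also covers requests already in a terminal state, repeating a cancel appears to be harmless.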

SSE event format

The /v1/messages endpoint streams Anthropic-compatible SSE events. Events are pre-formatted by the node and passed through the relay.

Full stream example
event: message_start
data: {"type":"message_start","message":{"id":"msg_...","type":"message","role":"assistant","content":[],"model":"qwen3.5:9b","stop_reason":null,"usage":{"input_tokens":25,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"A monad"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" is a design pattern"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":42}}

event: message_stop
data: {"type":"message_stop"}
Event reference
| Event | When | Key fields |
|---|---|---|
| message_start | First event in the stream | message.id, message.model, message.usage.input_tokens |
| content_block_start | New content block begins | index, content_block.type (text or tool_use) |
| content_block_delta | Each token/chunk | delta.text (text) or delta.partial_json (tool use) |
| content_block_stop | Content block complete | index |
| message_delta | Message metadata update | delta.stop_reason, usage.output_tokens |
| message_stop | Stream complete | (none) |
| error | Inference error | error.type, error.message |
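
The event sequence above can be consumed with a small parser like the one sketched here. It groups event: and data: lines into events, joins the text deltas, and raises on an error event. The function names are hypothetical, and the shape of the error payload (an error object with a message field) is an assumption based on the key fields listed above.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def _value(line: str) -> str:
    """Return the text after the first colon, minus one optional leading space."""
    value = line.split(":", 1)[1]
    return value[1:] if value.startswith(" ") else value


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, parsed_data) pairs; a blank line terminates an event."""
    event, data = "message", []  # type: str, List[str]
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = _value(line)
        elif line.startswith("data:"):
            data.append(_value(line))
        elif line == "":
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
    if data:  # stream ended without a trailing blank line
        yield event, json.loads("\n".join(data))


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate text deltas; raise RuntimeError if an error event arrives."""
    parts = []
    for event, payload in iter_sse_events(lines):
        if event == "content_block_delta" and payload["delta"].get("type") == "text_delta":
            parts.append(payload["delta"]["text"])
        elif event == "error":
            # Assumed payload shape: {"error": {"type": ..., "message": ...}}
            raise RuntimeError(payload.get("error", {}).get("message", "inference error"))
    return "".join(parts)


sample = [
    'event: content_block_delta',
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"A monad"}}',
    '',
    'event: content_block_delta',
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" is a design pattern"}}',
    '',
    'event: message_stop',
    'data: {"type":"message_stop"}',
]
print(collect_text(sample))  # A monad is a design pattern
```

Any iterable of lines works as input, so the same function can be fed from a streaming HTTP client's line iterator.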

Error responses

All errors return a JSON body with error and message fields. Rate limit errors include a Retry-After header.

| Status | Reason | When |
|---|---|---|
| 400 | Bad Request | Invalid JSON, missing model or payload, malformed Authorization header |
| 401 | Unauthorized | Missing or invalid API key / session token |
| 403 | Forbidden | Account banned or request belongs to another user |
| 408 | Request Timeout | No node accepted the request within the assignment deadline (5s) |
| 413 | Payload Too Large | Request body exceeds 10 MB |
| 429 | Too Many Requests | Rate limited (30 req/min) or concurrent limit reached (10) |
| 503 | Service Unavailable | Queue full (200 max) or no nodes online |
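
One way a client might act on these statuses is sketched below. Treating 408, 429, and 503 as retryable is an assumption made for this example, not something the API specifies, and retry_delay is a hypothetical helper. It honors a numeric Retry-After header when one is present.

```python
from typing import Mapping, Optional

# Assumption: these statuses are transient; all others are treated as final.
RETRYABLE = {408, 429, 503}


def retry_delay(status: int, headers: Mapping[str, str], default: float = 1.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the error is not retryable."""
    if status not in RETRYABLE:
        return None
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    value = lowered.get("retry-after")
    if value is not None:
        try:
            return max(0.0, float(value))
        except ValueError:
            pass  # not a number of seconds; fall back to the default
    return default


print(retry_delay(429, {"Retry-After": "12"}))  # 12.0
print(retry_delay(503, {}))                     # 1.0
print(retry_delay(401, {}))                     # None
```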

Rate limits

  • 30 requests / minute: per user, fixed window
  • 10 concurrent requests: per user, active at once
  • 10 MB max payload: supports base64 images

When a limit is hit, the response includes a Retry-After header with the number of seconds to wait. Concurrent requests beyond the third are charged to the ledger at 2x token cost.
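
To avoid hitting the per-minute limit in the first place, a client could keep its own fixed-window counter, as in the sketch below. The FixedWindowLimiter class is hypothetical, and the client's window will not line up with the server's, so this only reduces 429 responses rather than ruling them out.

```python
import time
from typing import Callable


class FixedWindowLimiter:
    """Client-side counter allowing at most `limit` calls per `window` seconds."""

    def __init__(self, limit: int = 30, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def try_acquire(self) -> float:
        """Return 0.0 if a request may be sent now, else seconds until the window resets."""
        now = self.clock()
        if now - self.window_start >= self.window:
            # The previous window has ended: start a new one with a fresh count.
            self.window_start, self.count = now, 0
        if self.count < self.limit:
            self.count += 1
            return 0.0
        return self.window - (now - self.window_start)


limiter = FixedWindowLimiter()
wait = limiter.try_acquire()
print(wait)  # 0.0
```

A caller would sleep for the returned number of seconds when it is non-zero and then call try_acquire again before sending.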

Timeouts
| Timeout | Value | Description |
|---|---|---|
| Stream timeout | 90s | Maximum total time for a streaming response |
| Idle timeout | 15s | Maximum time without receiving a token before error |
| Assignment timeout | 5s | Maximum time to assign the request to a node |
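
To make the interaction between the stream and idle timeouts concrete, the sketch below replays a list of token arrival times and reports which limit would be exceeded first. The first_timeout helper is hypothetical, and it assumes both timers start when the stream starts, which the table does not state.

```python
from typing import Iterable, Optional

STREAM_TIMEOUT_S = 90.0  # maximum total time for a streaming response
IDLE_TIMEOUT_S = 15.0    # maximum gap between tokens


def first_timeout(start: float, token_times: Iterable[float]) -> Optional[str]:
    """Return "idle" or "stream" for the first limit exceeded, or None if neither is."""
    stream_deadline = start + STREAM_TIMEOUT_S
    last = start
    for t in token_times:
        idle_deadline = last + IDLE_TIMEOUT_S
        if t > min(idle_deadline, stream_deadline):
            # Whichever deadline comes earlier is the one that would fire.
            return "idle" if idle_deadline <= stream_deadline else "stream"
        last = t
    return None


print(first_timeout(0.0, [1.0, 2.0, 3.0]))                    # None
print(first_timeout(0.0, [1.0, 20.0]))                        # idle
print(first_timeout(0.0, [10.0 * i for i in range(1, 11)]))   # stream
```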