API Reference
REST + SSE API for the dollama network. Base URL: https://api.dollama.net
All authenticated endpoints require a Bearer token in the Authorization header. Tokens use the olm_ prefix.
Authorization: Bearer olm_your_api_key_here
Get a key automatically by running dollama connect (auto-provisions on first run) or dollama login for GitHub-linked auth. You can also call POST /v1/auth/register or use the POST /v1/auth/device GitHub Device Auth flow directly.
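As a minimal sketch of the header described above (the key string is a placeholder, and the `auth_headers` helper is illustrative, not part of any SDK):

```python
API_KEY = "olm_your_api_key_here"  # placeholder; provisioned via `dollama connect`

def auth_headers(key: str) -> dict:
    # API keys use the olm_ prefix; short-lived session tokens use sess_.
    if not (key.startswith("olm_") or key.startswith("sess_")):
        raise ValueError("expected an olm_ API key or sess_ session token")
    return {"Authorization": f"Bearer {key}"}

print(auth_headers(API_KEY))
```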
Session tokens (sess_ prefix) are single-use and expire after 5 minutes.

Send a prompt to the network and receive a streaming SSE response. The request is routed to an available node based on priority tier.
| Field | Type | Description |
|---|---|---|
| model | string | Model identifier, e.g. network:qwen3.5:9b |
| payload | object | Full Anthropic Messages API request body (opaque — passed directly to the node) |
| Header | Type | Description |
|---|---|---|
| Content-Type | text/event-stream | SSE stream |
| X-Routing-Tier | string | Routing tier used: own_node, group, priority, or best_effort |
curl --no-buffer -X POST https://api.dollama.net/v1/messages \
-H "Authorization: Bearer olm_your_api_key_here" \
-H "Content-Type: application/json" \
-d '{
"model": "network:qwen3.5:9b",
"payload": {
"model": "network:qwen3.5:9b",
"max_tokens": 1024,
"messages": [
{ "role": "user", "content": "Explain monads in one sentence." }
],
"stream": true
}
}'
import httpx
url = "https://api.dollama.net/v1/messages"
headers = {
"Authorization": "Bearer olm_your_api_key_here",
"Content-Type": "application/json",
}
body = {
"model": "network:qwen3.5:9b",
"payload": {
"model": "network:qwen3.5:9b",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Explain monads in one sentence."}
],
"stream": True,
},
}
with httpx.stream("POST", url, headers=headers, json=body) as r:
for line in r.iter_lines():
if line.startswith("data: "):
print(line[6:])
const res = await fetch("https://api.dollama.net/v1/messages", {
method: "POST",
headers: {
"Authorization": "Bearer olm_your_api_key_here",
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "network:qwen3.5:9b",
payload: {
model: "network:qwen3.5:9b",
max_tokens: 1024,
messages: [
{ role: "user", content: "Explain monads in one sentence." }
],
stream: true,
},
}),
});
const reader = res.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value, { stream: true });
for (const line of chunk.split("\n")) {
if (line.startsWith("data: ")) {
console.log(line.slice(6));
}
}
}
package main
import (
"bufio"
"bytes"
"encoding/json"
"fmt"
"net/http"
"strings"
)
func main() {
body, _ := json.Marshal(map[string]any{
"model": "network:qwen3.5:9b",
"payload": map[string]any{
"model": "network:qwen3.5:9b",
"max_tokens": 1024,
"messages": []map[string]string{
{"role": "user", "content": "Explain monads in one sentence."},
},
"stream": true,
},
})
req, _ := http.NewRequest("POST", "https://api.dollama.net/v1/messages", bytes.NewReader(body))
req.Header.Set("Authorization", "Bearer olm_your_api_key_here")
req.Header.Set("Content-Type", "application/json")
resp, err := http.DefaultClient.Do(req)
if err != nil {
panic(err)
}
defer resp.Body.Close()
scanner := bufio.NewScanner(resp.Body)
for scanner.Scan() {
line := scanner.Text()
if strings.HasPrefix(line, "data: ") {
fmt.Println(line[6:])
}
}
}
Returns live network statistics including online nodes, capacity, request totals, and reliability metrics.
{
"status": "ok",
"version": "v0.19.14",
"network_model": "qwen3.5:9b",
"network": {
"online_nodes": 15,
"total_nodes": 15,
"total_capacity": 300,
"used_capacity": 45,
"totalRequests": 5000,
"totalTokens": 2500000,
"contributor_nodes": 15,
"chat_users": 250,
"reliability": {
"total_requests": 5000,
"success_rate": 0.98,
"avg_ttft_ms": 1200.5,
"avg_tps": 25.3
}
}
}
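A couple of useful load metrics can be derived from the `network` object; the values below are taken from the sample response above, not live data:

```python
# Sample subset of the status payload shown above
status = {
    "network": {
        "online_nodes": 15,
        "total_capacity": 300,
        "used_capacity": 45,
    }
}

net = status["network"]
# Fraction of network capacity currently in use: 45 / 300 = 0.15
utilization = net["used_capacity"] / net["total_capacity"]
# Remaining request slots across all online nodes
free_slots = net["total_capacity"] - net["used_capacity"]

print(f"utilization={utilization:.0%}, free_slots={free_slots}")
```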
Returns your current token balance and active request count. Balance determines your priority tier.
{
"tokens_served": 100000,
"tokens_self_served": 50000,
"tokens_consumed": 75000,
"balance": 75000,
"active_requests": 2
}
| Field | Type | Description |
|---|---|---|
| tokens_served | int | Tokens earned by running nodes |
| tokens_self_served | int | Tokens earned from own-node usage (unmetered) |
| tokens_consumed | int | Tokens spent making requests |
| balance | int | Net balance (served + self_served - consumed) |
| active_requests | int | Currently in-flight requests |
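The balance formula from the field table can be checked against the sample response above (`net_balance` is an illustrative helper, not an API call):

```python
def net_balance(tokens_served: int, tokens_self_served: int, tokens_consumed: int) -> int:
    # balance = served + self_served - consumed, per the field table above
    return tokens_served + tokens_self_served - tokens_consumed

# Sample values from the /v1/balance response: 100000 + 50000 - 75000 = 75000
print(net_balance(100000, 50000, 75000))
```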
Cancel an in-flight inference request. The node is notified and the request transitions to a terminal state.
{
"request_id": "req_..."
}
{
"status": "ok"
}
| Status | Type | Description |
|---|---|---|
| 200 | OK | Cancelled successfully or already in terminal state |
| 400 | Bad Request | Missing request_id |
| 403 | Forbidden | Request belongs to a different user |
The /v1/messages endpoint streams Anthropic-compatible SSE events. Events are pre-formatted by the node and passed through the relay.
event: message_start
data: {"type":"message_start","message":{"id":"msg_...","type":"message","role":"assistant","content":[],"model":"qwen3.5:9b","stop_reason":null,"usage":{"input_tokens":25,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"A monad"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" is a design pattern"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":42}}

event: message_stop
data: {"type":"message_stop"}
| Event | When | Key fields |
|---|---|---|
| message_start | First event in the stream | message.id, message.model, usage.input_tokens |
| content_block_start | New content block begins | index, content_block.type (text or tool_use) |
| content_block_delta | Each token/chunk | delta.text (text) or delta.partial_json (tool use) |
| content_block_stop | Content block complete | index |
| message_delta | Message metadata update | delta.stop_reason, usage.output_tokens |
| message_stop | Stream complete | (none) |
| error | Inference error | error.type, error.message |
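The event table above suggests a simple consumption pattern: collect `text_delta` payloads from `content_block_delta` events to reassemble the full response text. A minimal sketch (the `accumulate_text` helper is illustrative):

```python
import json

def accumulate_text(sse_lines):
    """Collect text deltas from an Anthropic-style SSE stream into one string."""
    text = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip event: lines and blank keep-alives
        event = json.loads(line[6:])
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                text.append(delta["text"])
    return "".join(text)

# Delta lines from the sample stream above
sample = [
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"A monad"}}',
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" is a design pattern"}}',
]
print(accumulate_text(sample))  # A monad is a design pattern
```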
All errors return a JSON body with error and message fields. Rate limit errors include a Retry-After header.
| Status | Reason | When |
|---|---|---|
| 400 | Bad Request | Invalid JSON, missing model or payload, malformed Authorization header |
| 401 | Unauthorized | Missing or invalid API key / session token |
| 403 | Forbidden | Account banned or request belongs to another user |
| 408 | Request Timeout | No node accepted the request within the assignment deadline (5s) |
| 413 | Payload Too Large | Request body exceeds 10 MB |
| 429 | Too Many Requests | Rate limited (30 req/min) or concurrent limit reached (10) |
| 503 | Service Unavailable | Queue full (200 max) or no nodes online |
When a limit is hit, the response includes a Retry-After header with the number of seconds to wait. Requests beyond 3 concurrent incur a 2x token cost to the ledger.
| Timeout | Value | Description |
|---|---|---|
| Stream timeout | 90s | Maximum total time for a streaming response |
| Idle timeout | 15s | Maximum time without receiving a token before error |
| Assignment timeout | 5s | Maximum time to assign the request to a node |