// Chapter 12 · Concepts

The Demand Side

Where the network's work comes from.

8 min5 sectionsConcepts

Where the work comes from.

// 12.0 · ParalleliX AI · the demand side · open-source models on network compute

// Demand path

One live demand surface is ParalleliX AI, a consumer chat app (ChatGPT / Claude style) that runs open-source models on network compute. The Compute API (below) is the second. Deposits live in an on-chain vault (withdraw any time) and the operator share settles on-chain per request; spending is metered off-chain, so it is not trustless per message. Using it costs $PRLX, metered as credits. Each prompt is one inference request the coordinator dispatches whole to a single capable registered node; a single inference is autoregressive and is not split across nodes. The user sends a message and gets a response.

Open ParalleliX AI Telegram botai.parallelix.io · t.me/ParalleliXAIbot

// Console · ParalleliX AI · chat

ParalleliX. AIcredits · off-chain

Summarize this paper's method in three bullets.

The method has three parts: first, it

Deposit $PRLX once on-chain, metered off-chain (no per-message gas). The request runs whole on a registered operator node and returns a Proof-of-Execution. Open-source models, custodial v1.

The chat surface. Deposit $PRLX into the on-chain vault; every prompt is metered off-chain against the credit balance with no per-message gas; withdraw any time.

// Console · Open-source model · running on the pool

parallelix ai // open-source model · running8 live

requests routed to capable nodesoss-models

node-11

99%rtx-4090

node-13

91%rtx-4090

node-15

88%a6000

node-04

87%rtx-4090

node-08

78%a6000

node-07

72%a6000

node-02

61%rtx-4090

node-09

··rtx-3090

answers streamingcredits · metered off-chain

1980 TFLOPS · fp168 streaming

Under the chat: each whole inference request streams on its own node across a heterogeneous GPU pool. Throughput scales with the number of online nodes, not by splitting one prompt.

The five steps

// 12.1 · deposit once on-chain · meter off-chain · no per-message gas

// ParalleliX AI flow · 5 stepson-chain vault · off-chain metering

// step 01
Sign up
Create a ParalleliX AI account. No wallet signature is needed to use the app once credits are funded.
// step 02
Fund credits
Deposit $PRLX into the on-chain ParalleliXAICredits vault. The coordinator mirrors it to a spendable balance; there is no per-message gas. Withdraw unspent credits any time. Sign in once with your wallet to authorize spending.
// step 03
Send a prompt
The request is metered off-chain (message_cost = model_rate × tokens_processed) and debited from the credit balance.
// step 04
Dispatch
The coordinator routes the inference to a registered operator node in the pool, or to project-operated fallback hardware (project GPUs / server CPU) when no network node is online.
// step 05
Receive
The model response streams back. The $PRLX spent flows to the credit ledger, and a share settles to the operators whose nodes served the work.

How a prompt is metered

// 12.2 · message_cost = model_rate × tokens_processed · off-chain

Caveat·Illustrative

The figures below are illustrative. Exact model rates and the token-accounting surface are not finalised before v1. The shape is canonical: deposit once on-chain, meter off-chain per request.

// metering · illustrative

// ParalleliX AI · off-chain metering (illustrative)
deposit            on-chain vault → spendable credits (withdraw any time)
sign-in            wallet signature → session (authorizes spend)
prompt             "summarise this paper"
model_rate         0.0000040 PRLX / token
tokens_processed   1,420
message_cost       model_rate × tokens_processed = 0.00568 PRLX
debit              off-chain meter (no per-message gas)
dispatch           → online node  (fallback: project GPU / server CPU)
settle             operator share on-chain, per served request

What the credit buys

// 12.3 · metering, inference, settlement

// Credit covers · 3 stages

// 01Metering
Off-chain debit of message_cost = model_rate × tokens_processed against the funded credit balance. No per-message gas.
// 02Inference
Open-source models run on network compute. Each request is dispatched whole to one capable registered node (or to fallback hardware), since a single inference is not split across nodes.
// 03Settlement
A share of the spent $PRLX settles to the operators whose nodes served the request. This is the steady-state operator funding source.

The pipeline is general enough to accept direct compute submissions paid in $PRLX, but a permissionless open compute marketplace for third-party submitters is planned, not live. Today the demand is ParalleliX AI, and its inference runs on the registered operator nodes.

The Compute API

// 12.4 · the developer door · OpenAI-compatible · parallel batch · MCP

// Second demand surface

ParalleliX AI is the consumer door. The Compute API is the developer door: the same network, reached programmatically.

A script, an OpenAI-compatible tool, or an AI agent can run inference on the network without a human in the loop. It is served at api.parallelix.io, authenticated by API keys, and billed against the same $PRLX credit balance as the chat app. Spend is metered off-chain by the coordinator; the 85% operator share settles on-chain per request.

// Three doors · one billing rail

// 01OpenAI-compatible
POST /v1/chat/completions speaks the OpenAI chat-completions format. Any tool that already talks to OpenAI points at ParalleliX by changing one base URL. The response carries an extra parallelix object with the serving node id and Proof-of-Execution hash.
// 02Parallel batch
POST /v1/batch takes an array of prompts and fans them out across the online nodes at once. One submission, N sub-tasks executing in parallel, each returning its own result and PoE. This is the parallel-native shape a single OpenAI call cannot express.
// 03MCP connector
parallelix-mcp (on npm) wraps the API as Model Context Protocol tools for Claude Code and Claude Desktop. The pattern: a frontier agent orchestrates; the open-source fleet runs the bulk parallel sub-tasks through a single parallel_map call.

// api surface/v1

# ParalleliX Compute API · api.parallelix.io
POST /v1/chat/completions   OpenAI-compatible single completion
POST /v1/batch              submit prompts[], fan out across the network
GET  /v1/batch/{id}         per-item status + result + PoE (owner-scoped)
GET  /v1/models             models the live network serves right now
GET  /v1/usage              this key's requests, credits spent, balance

# auth: Authorization: Bearer pk_live_…  (created in the app, Developers)
# billing: spends the key wallet's $PRLX credits; 85% settles to operators

The network runs open-source models (currently 7B-class). The Compute API is a cheap parallel executor for bulk independent sub-tasks, not a frontier model. Capacity is the set of nodes online at request time, and GET /v1/models and GET /v1/usage report what is actually available.

Create an API keyai.parallelix.io/developers · npx parallelix-mcp

// Where to go next · reading path

Sign up

Fund credits

Send a prompt

Dispatch

Receive