This document covers the internal architecture of Q AI, including the agentic loop, model selection strategy, tool routing, and deployment topology.
System Architecture
┌─────────────────────────┐
│      Load Balancer      │
│      (nginx / ALB)      │
└───────────┬─────────────┘
            │
┌───────────▼─────────────┐
│        Q AI API         │
│         (:9100)         │
├─────────────────────────┤
│  ┌───────────────────┐  │
│  │      Router       │  │
│  │  /chat            │  │
│  │  /chat/stream     │  │
│  │  /chat/confirm    │  │
│  │  /tools           │  │
│  └─────────┬─────────┘  │
│            │            │
│  ┌─────────▼─────────┐  │
│  │   Agentic Loop    │  │
│  │  ┌─────────────┐  │  │
│  │  │  Provider   │  │  │
│  │  │  Selector   │  │  │
│  │  └──────┬──────┘  │  │
│  │         │         │  │
│  │  ┌──────▼──────┐  │  │
│  │  │ Tool Router │  │  │
│  │  └──────┬──────┘  │  │
│  │         │         │  │
│  │  ┌──────▼──────┐  │  │
│  │  │ Confirm Gate│  │  │
│  │  └─────────────┘  │  │
│  └───────────────────┘  │
└──┬───────┬──────┬───────┘
   │       │      └──────────────┐
   │       └──────────┐          │
┌──▼──────────┐ ┌────▼───┐   ┌───▼───────┐
│ MCP Server  │ │Gateway │   │ LLM APIs  │
│ (16 tools)  │ │(:9099) │   │ (Claude,  │
└──────┬──────┘ └───┬────┘   │  GPT...)  │
       │            │        └───────────┘
┌──────▼──────┐ ┌───▼─────────┐
│ MEV Engine  │ │ Chain RPCs  │
│ (:8080)     │ │ (ETH, BSC..)│
└─────────────┘ └─────────────┘
Agentic Loop
The agentic loop is the core processing pipeline for every Q AI request:
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│  Input   │────►│ Context  │────►│   LLM    │────►│   Tool   │
│  Parser  │     │ Builder  │     │ Provider │     │ Executor │
└──────────┘     └──────────┘     └──────┬───┘     └────┬─────┘
                                         │              │
                                         │    ┌─────────▼──────┐
                                         │    │ Result Buffer  │
                                         │    └─────────┬──────┘
                                         │              │
                                         ◄──────────────┘
                                         │  (loop until done)
                                         │
                                  ┌──────▼───────┐
                                  │   Response   │
                                  │ Synthesizer  │
                                  └──────────────┘
Step-by-Step
1. Input Parser -- validates the incoming JSON-RPC request, extracts the user message, conversation ID, and role.
2. Context Builder -- assembles the LLM prompt:
   - System prompt with MEV domain knowledge
   - Conversation history (up to the context window limit)
   - Available tools filtered by user role (RBAC)
   - Current engine state summary (optional, for complex queries)
3. LLM Provider -- sends the assembled prompt to the selected model. The provider returns either a text response or one or more tool calls.
4. Tool Executor -- if the LLM requests tool calls:
   - Validates tool names and parameters
   - Checks RBAC permissions
   - For mutating tools: pauses and returns a confirmation request to the user
   - For read-only tools: executes immediately
   - Returns results to the LLM for the next iteration
5. Result Buffer -- accumulates tool results. The loop repeats steps 3-4 until the LLM produces a final text response with no further tool calls, typically within 1-3 iterations.
6. Response Synthesizer -- formats the final response for the client (JSON for API, markdown for chat, structured data for SDKs).
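The loop above can be sketched in a few lines of Python. This is an illustrative skeleton, not the production code: `call_llm` and `run_tool` are hypothetical stand-ins for the LLM Provider and Tool Executor, and `MAX_ITERATIONS` is an assumed safety bound.

```python
from dataclasses import dataclass

MAX_ITERATIONS = 5  # assumed safety bound; typical runs finish in 1-3 passes

@dataclass
class ToolCall:
    name: str
    params: dict
    mutating: bool = False

def agentic_loop(prompt, call_llm, run_tool):
    """Drive the model until it answers in plain text (no more tool calls).

    call_llm(prompt, tool_results) returns either a final string or a
    list of ToolCall; run_tool(call) returns a result dict.
    """
    results = []                            # the Result Buffer
    for _ in range(MAX_ITERATIONS):
        reply = call_llm(prompt, results)
        if isinstance(reply, str):          # final text -> synthesize response
            return reply
        for call in reply:
            if call.mutating:               # Confirm Gate: pause for the user
                return {"type": "confirmation_required",
                        "action": call.name, "params": call.params}
            results.append(run_tool(call))  # read-only: execute immediately
    raise RuntimeError("agentic loop exceeded iteration budget")
```

Read-only results are appended to the buffer and fed back to the model on the next pass; a mutating call short-circuits the loop so the confirmation gate can take over.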
Confirmation Gate
Mutating tools require explicit user confirmation:
// Q AI response when a mutating tool is requested
{
  "type": "confirmation_required",
  "action": "bundle_submit",
  "params": {
    "txs": ["0xabc...", "0xdef..."],
    "chain": "ethereum",
    "blockNumber": 19482300
  },
  "message": "I'll submit this 2-tx bundle targeting block 19,482,300 on Ethereum. Proceed?",
  "confirm_id": "conf_8a7b6c"
}
// User confirms via POST /api/v1/chat/confirm
{
  "confirm_id": "conf_8a7b6c",
  "confirmed": true
}
Model Selection Strategy
| Query Complexity | Criteria | Model Selected |
|---|---|---|
| Simple | Status checks, health, single-tool queries | Llama 3 8B / Mistral Medium |
| Standard | Multi-step queries, analytics, bundle analysis | Claude Sonnet / GPT-4o |
| Complex | Forensics, multi-chain analysis, strategy evaluation | Claude Opus / GPT-4-turbo |
| Code | Transaction decoding, contract analysis | DeepSeek Coder / Claude Sonnet |
Selection heuristics:
- Token count -- short queries (fewer than 50 tokens) default to fast models.
- Tool count -- queries likely requiring 3+ tools escalate to larger models.
- Keyword detection -- "forensic", "analyze", "compare" trigger complex models.
- Explicit override -- users can specify `model` in the request.
- Fallback -- if the primary provider times out (30s), the next provider in the chain is used.
Tool Routing
User Query: "Show me relay stats and submit this bundle"
            │
┌───────────▼────────────┐
│      Tool Router       │
│                        │
│  1. Parse tool calls   │
│  2. Check permissions  │
│  3. Classify mutation  │
└───┬──────────────┬─────┘
    │              │
┌───▼─────────┐ ┌──▼──────────┐
│ Read-Only   │ │  Mutating   │
│             │ │             │
│ relay_      │ │ bundle_     │
│ stats       │ │ submit      │
│             │ │             │
│ Execute     │ │ HOLD for    │
│ immediately │ │ confirmation│
└─────────────┘ └─────────────┘
Read-only tools execute in parallel when the LLM requests multiple. Mutating tools are serialized and gated.
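This split can be sketched as follows, assuming tool calls arrive as dicts and `execute` is the caller's executor; the mutating set here is an illustrative subset, not the full classification of the 16 tools:

```python
from concurrent.futures import ThreadPoolExecutor

MUTATING_TOOLS = {"bundle_submit"}   # illustrative; the real set is larger

def route_tools(tool_calls, execute):
    """Run read-only calls in parallel; hold mutating calls for confirmation.

    Returns (results, held): results maps tool name -> output, and held
    lists the mutating calls awaiting the confirmation gate.
    """
    read_only = [c for c in tool_calls if c["name"] not in MUTATING_TOOLS]
    held = [c for c in tool_calls if c["name"] in MUTATING_TOOLS]
    with ThreadPoolExecutor() as pool:   # parallel read-only execution
        outputs = list(pool.map(execute, read_only))
    results = {c["name"]: out for c, out in zip(read_only, outputs)}
    return results, held
```

Mutating calls never reach `execute` here; they are returned to the caller, which serializes them through the confirmation gate.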
Deployment Components
| Component | Container | Port | Health Check | Purpose |
|---|---|---|---|---|
| Q AI API | q-ai-api | 9100 | GET /healthz | Main API server |
| MCP Server | q-ai-mcp | 9101 | GET /health | MCP tool server |
| MEV Engine | engine | 8080 | GET /health | MEV extraction engine |
| Gateway | gateway | 9099 | GET /health | WebSocket proxy |
| Redis | redis | 6379 | redis-cli ping | Conversation cache |
| Prometheus | prometheus | 9090 | GET /-/healthy | Metrics collection |
| Grafana | grafana | 3000 | GET /api/health | Dashboards |
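The health checks in the table can be polled with a short script. The URLs assume each container's port is published on localhost as in the compose file; Redis is omitted because its check is `redis-cli ping`, not HTTP:

```python
import urllib.request

# Component -> health URL, taken from the deployment table above.
HEALTH_ENDPOINTS = {
    "q-ai-api":   "http://localhost:9100/healthz",
    "q-ai-mcp":   "http://localhost:9101/health",
    "engine":     "http://localhost:8080/health",
    "gateway":    "http://localhost:9099/health",
    "prometheus": "http://localhost:9090/-/healthy",
    "grafana":    "http://localhost:3000/api/health",
}

def check_health(url: str, timeout: float = 2.0) -> bool:
    """Return True when the endpoint answers with HTTP 2xx."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:          # refused, timed out, DNS failure, ...
        return False

if __name__ == "__main__":
    for name, url in HEALTH_ENDPOINTS.items():
        print(f"{name}: {'up' if check_health(url) else 'DOWN'}")
```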
Docker Compose
version: "3.8"
services:
  q-ai-api:
    image: yoorquezt/q-ai:latest
    ports:
      - "9100:9100"
    environment:
      - Q_AI_PORT=9100
      - Q_AI_ENGINE_URL=http://engine:8080
      - Q_AI_GATEWAY_URL=ws://gateway:9099
      - Q_AI_MCP_URL=http://q-ai-mcp:9101
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - Q_AI_REDIS_URL=redis://redis:6379
      - Q_AI_LOG_LEVEL=info
    depends_on:
      engine:
        condition: service_healthy
      gateway:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:9100/healthz"]
      interval: 10s
      timeout: 5s
      retries: 3
  q-ai-mcp:
    image: yoorquezt/q-ai-mcp:latest
    ports:
      - "9101:9101"
    environment:
      - MCP_PORT=9101
      - MCP_ENGINE_URL=http://engine:8080
      - MCP_LOG_LEVEL=info
    depends_on:
      engine:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:9101/health"]
      interval: 10s
      timeout: 5s
      retries: 3
  engine:
    image: yoorquezt/mev-engine:latest
    ports:
      - "8080:8080"
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:8080/health"]
      interval: 10s
      timeout: 5s
      retries: 3
  gateway:
    image: yoorquezt/yqmev-gateway:latest
    ports:
      - "9099:9099"
    environment:
      - YQMEV_UPSTREAM_URL=http://engine:8080
    depends_on:
      engine:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:9099/health"]
      interval: 10s
      timeout: 5s
      retries: 3
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3
# Start the full Q AI stack
docker compose -f docker-compose.q-ai.yaml up -d
# Check health
curl http://localhost:9100/healthz