This document covers the internal architecture of Q AI, including the agentic loop, model selection strategy, tool routing, and deployment topology.
System Architecture
┌─────────────────────────┐
│      Load Balancer      │
│      (nginx / ALB)      │
└───────────┬─────────────┘
            │
┌───────────▼─────────────┐
│        Q AI API         │
│         (:9100)         │
├─────────────────────────┤
│  ┌───────────────────┐  │
│  │      Router       │  │
│  │  /chat            │  │
│  │  /chat/stream     │  │
│  │  /chat/confirm    │  │
│  │  /tools           │  │
│  └─────────┬─────────┘  │
│            │            │
│  ┌─────────▼─────────┐  │
│  │   Agentic Loop    │  │
│  │  ┌─────────────┐  │  │
│  │  │  Provider   │  │  │
│  │  │  Selector   │  │  │
│  │  └──────┬──────┘  │  │
│  │         │         │  │
│  │  ┌──────▼──────┐  │  │
│  │  │ Tool Router │  │  │
│  │  └──────┬──────┘  │  │
│  │         │         │  │
│  │  ┌──────▼──────┐  │  │
│  │  │ Confirm Gate│  │  │
│  │  └─────────────┘  │  │
│  └───────────────────┘  │
└──┬───────┬──────┬───────┘
   │       │      └──────────────┐
   │       └──────────┐          │
┌──▼──────────┐ ┌────▼───┐   ┌───▼───────┐
│ MCP Server  │ │Gateway │   │ LLM APIs  │
│ (16 tools)  │ │(:9099) │   │ (Claude,  │
└──────┬──────┘ └───┬────┘   │  GPT...)  │
       │            │        └───────────┘
┌──────▼──────┐ ┌───▼─────────┐
│ MEV Engine  │ │ Chain RPCs  │
│ (:8080)     │ │ (ETH, BSC..)│
└─────────────┘ └─────────────┘
Agentic Loop
The agentic loop is the core processing pipeline for every Q AI request:
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│  Input   │────►│ Context  │────►│   LLM    │────►│   Tool   │
│  Parser  │     │ Builder  │     │ Provider │     │ Executor │
└──────────┘     └──────────┘     └──────┬───┘     └────┬─────┘
                                         │              │
                                         │    ┌─────────▼──────┐
                                         │    │ Result Buffer  │
                                         │    └─────────┬──────┘
                                         │              │
                                         ◄──────────────┘
                                         │  (loop until done)
                                         │
                                  ┌──────▼───────┐
                                  │   Response   │
                                  │ Synthesizer  │
                                  └──────────────┘
Step-by-Step
1. Input Parser -- validates the incoming JSON-RPC request, extracts the user message, conversation ID, and role.
2. Context Builder -- assembles the LLM prompt:
   - System prompt with MEV domain knowledge
   - Conversation history (up to the context window limit)
   - Available tools filtered by user role (RBAC)
   - Current engine state summary (optional, for complex queries)
3. LLM Provider -- sends the assembled prompt to the selected model. The provider returns either a text response or one or more tool calls.
4. Tool Executor -- if the LLM requests tool calls:
   - Validates tool names and parameters
   - Checks RBAC permissions
   - For mutating tools: pauses and returns a confirmation request to the user
   - For read-only tools: executes immediately
   - Returns results to the LLM for the next iteration
5. Result Buffer -- accumulates tool results. The loop repeats steps 3-4 until the LLM produces a final text response with no further tool calls, typically within 1-3 iterations.
6. Response Synthesizer -- formats the final response for the client (JSON for API, markdown for chat, structured data for SDKs).
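The loop above can be sketched in a few lines of Python. This is an illustrative skeleton, not the production code: `call_llm` and `run_tool` are hypothetical stand-ins for the LLM Provider and Tool Executor, and `MAX_ITERATIONS` is an assumed safety bound.

```python
from dataclasses import dataclass

MAX_ITERATIONS = 5  # assumed safety bound; typical runs finish in 1-3 passes

@dataclass
class ToolCall:
    name: str
    params: dict
    mutating: bool = False

def agentic_loop(prompt, call_llm, run_tool):
    """Drive the model until it answers in plain text (no more tool calls).

    call_llm(prompt, tool_results) returns either a final string or a
    list of ToolCall; run_tool(call) returns a result dict.
    """
    results = []                            # the Result Buffer
    for _ in range(MAX_ITERATIONS):
        reply = call_llm(prompt, results)
        if isinstance(reply, str):          # final text -> synthesize response
            return reply
        for call in reply:
            if call.mutating:               # Confirm Gate: pause for the user
                return {"type": "confirmation_required",
                        "action": call.name, "params": call.params}
            results.append(run_tool(call))  # read-only: execute immediately
    raise RuntimeError("agentic loop exceeded iteration budget")
```

Read-only results are appended to the buffer and fed back to the model on the next pass; a mutating call short-circuits the loop so the confirmation gate can take over.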
Confirmation Gate
Mutating tools require explicit user confirmation:
// Q AI response when a mutating tool is requested
{
  "type": "confirmation_required",
  "action": "bundle_submit",
  "params": {
    "txs": ["0xabc...", "0xdef..."],
    "chain": "ethereum",
    "blockNumber": 19482300
  },
  "message": "I'll submit this 2-tx bundle targeting block 19,482,300 on Ethereum. Proceed?",
  "confirm_id": "conf_8a7b6c"
}
// User confirms via POST /api/v1/chat/confirm
{
  "confirm_id": "conf_8a7b6c",
  "confirmed": true
}
Model Selection Strategy
| Query Complexity | Criteria | Model Selected |
|---|---|---|
| Simple | Status checks, health, single-tool queries | Llama 3 8B / Mistral Medium |
| Standard | Multi-step queries, analytics, bundle analysis | Claude Sonnet / GPT-4o |
| Complex | Forensics, multi-chain analysis, strategy evaluation | Claude Opus / GPT-4-turbo |
| Code | Transaction decoding, contract analysis | DeepSeek Coder / Claude Sonnet |
Selection heuristics:
- Token count -- short queries (fewer than 50 tokens) default to fast models.
- Tool count -- queries likely requiring 3+ tools escalate to larger models.
- Keyword detection -- "forensic", "analyze", "compare" trigger complex models.
- Explicit override -- users can specify `model` in the request.
- Fallback -- if the primary provider times out (30s), the next provider in the chain is used.
Tool Routing
User Query: "Show me relay stats and submit this bundle"
            │
┌───────────▼────────────┐
│      Tool Router       │
│                        │
│  1. Parse tool calls   │
│  2. Check permissions  │
│  3. Classify mutation  │
└───┬──────────────┬─────┘
    │              │
┌───▼─────────┐ ┌──▼──────────┐
│ Read-Only   │ │  Mutating   │
│             │ │             │
│ relay_      │ │ bundle_     │
│ stats       │ │ submit      │
│             │ │             │
│ Execute     │ │ HOLD for    │
│ immediately │ │ confirmation│
└─────────────┘ └─────────────┘
Read-only tools execute in parallel when the LLM requests multiple. Mutating tools are serialized and gated.
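This split can be sketched as follows, assuming tool calls arrive as dicts and `execute` is the caller's executor; the mutating set here is an illustrative subset, not the full classification of the 16 tools:

```python
from concurrent.futures import ThreadPoolExecutor

MUTATING_TOOLS = {"bundle_submit"}   # illustrative; the real set is larger

def route_tools(tool_calls, execute):
    """Run read-only calls in parallel; hold mutating calls for confirmation.

    Returns (results, held): results maps tool name -> output, and held
    lists the mutating calls awaiting the confirmation gate.
    """
    read_only = [c for c in tool_calls if c["name"] not in MUTATING_TOOLS]
    held = [c for c in tool_calls if c["name"] in MUTATING_TOOLS]
    with ThreadPoolExecutor() as pool:   # parallel read-only execution
        outputs = list(pool.map(execute, read_only))
    results = {c["name"]: out for c, out in zip(read_only, outputs)}
    return results, held
```

Mutating calls never reach `execute` here; they are returned to the caller, which serializes them through the confirmation gate.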
Deployment Components
| Component | Container | Port | Health Check | Purpose |
|---|---|---|---|---|
| Q AI API | q-ai-api | 9100 | GET /healthz | Main API server |
| MCP Server | q-ai-mcp | 9101 | GET /health | MCP tool server |
| MEV Engine | engine | 8080 | GET /health | MEV extraction engine |
| Gateway | gateway | 9099 | GET /health | WebSocket proxy |
| Redis | redis | 6379 | redis-cli ping | Conversation cache |
| Prometheus | prometheus | 9090 | GET /-/healthy | Metrics collection |
| Grafana | grafana | 3000 | GET /api/health | Dashboards |
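The health checks in the table can be polled with a short script. The URLs assume each container's port is published on localhost as in the compose file; Redis is omitted because its check is `redis-cli ping`, not HTTP:

```python
import urllib.request

# Component -> health URL, taken from the deployment table above.
HEALTH_ENDPOINTS = {
    "q-ai-api":   "http://localhost:9100/healthz",
    "q-ai-mcp":   "http://localhost:9101/health",
    "engine":     "http://localhost:8080/health",
    "gateway":    "http://localhost:9099/health",
    "prometheus": "http://localhost:9090/-/healthy",
    "grafana":    "http://localhost:3000/api/health",
}

def check_health(url: str, timeout: float = 2.0) -> bool:
    """Return True when the endpoint answers with HTTP 2xx."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:          # refused, timed out, DNS failure, ...
        return False

if __name__ == "__main__":
    for name, url in HEALTH_ENDPOINTS.items():
        print(f"{name}: {'up' if check_health(url) else 'DOWN'}")
```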
Docker Compose
version: "3.8"
services:
  q-ai-api:
    image: yoorquezt/q-ai:latest
    ports:
      - "9100:9100"
    environment:
      - Q_AI_PORT=9100
      - Q_AI_ENGINE_URL=http://engine:8080
      - Q_AI_GATEWAY_URL=ws://gateway:9099
      - Q_AI_MCP_URL=http://q-ai-mcp:9101
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - Q_AI_REDIS_URL=redis://redis:6379
      - Q_AI_LOG_LEVEL=info
    depends_on:
      engine:
        condition: service_healthy
      gateway:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:9100/healthz"]
      interval: 10s
      timeout: 5s
      retries: 3
  q-ai-mcp:
    image: yoorquezt/q-ai-mcp:latest
    ports:
      - "9101:9101"
    environment:
      - MCP_PORT=9101
      - MCP_ENGINE_URL=http://engine:8080
      - MCP_LOG_LEVEL=info
    depends_on:
      engine:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:9101/health"]
      interval: 10s
      timeout: 5s
      retries: 3
  engine:
    image: yoorquezt/mev-engine:latest
    ports:
      - "8080:8080"
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:8080/health"]
      interval: 10s
      timeout: 5s
      retries: 3
  gateway:
    image: yoorquezt/yqmev-gateway:latest
    ports:
      - "9099:9099"
    environment:
      - YQMEV_UPSTREAM_URL=http://engine:8080
    depends_on:
      engine:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:9099/health"]
      interval: 10s
      timeout: 5s
      retries: 3
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3
# Start the full Q AI stack
docker compose -f docker-compose.q-ai.yaml up -d
# Check health
curl http://localhost:9100/healthz