Data Plane and Traffic Flow

The data plane handles the actual request traffic, with the External Processor (ExtProc) playing a central role in managing AI-specific processing.

Components

The data plane consists of several key components:

1. Envoy Proxy

The core proxy that handles all incoming traffic and integrates with:

- External Processor for AI-specific processing
- Rate Limit Service for token-based rate limiting
- Various AI providers as backends

2. AI Gateway External Processor

A specialized extension service of Envoy Proxy that handles AI-specific processing needs. It performs three main functions:

Request Processing
- Routes requests to appropriate AI providers
- Handles model selection and validation
- Manages provider-specific authentication
- Supports different API formats (OpenAI, AWS Bedrock)
Token Management
- Tracks token usage from AI providers
- Handles both streaming and non-streaming responses
- Provides usage data for rate limiting decisions
Provider Integration
- Transforms requests between different AI provider formats
- Normalizes responses to a consistent format
- Manages provider-specific requirements

3. Rate Limit Service

Handles token-based rate limiting by:

- Tracking token usage across requests
- Enforcing rate limits based on token consumption
- Managing rate limit budgets
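As a rough illustration of what a token budget means, the sketch below models a fixed-window budget for a single key. It is not the Rate Limit Service's actual algorithm or API: the `TokenBudget` class, its fields, and the fixed-window policy are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Hypothetical fixed-window token budget for one key (e.g. one user)."""
    limit: int               # tokens allowed per window
    window_seconds: float    # window length
    used: int = 0            # tokens consumed in the current window
    window_start: float = 0.0

    def _roll(self, now: float) -> None:
        # Start a fresh window once the current one has expired.
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.used = 0

    def allow(self, now: float) -> bool:
        """Request-path check: admit the request only if budget remains."""
        self._roll(now)
        return self.used < self.limit

    def consume(self, tokens: int, now: float) -> None:
        """Response-path update: deduct the tokens the provider reported."""
        self._roll(now)
        self.used += tokens
```

Because the token count is only known once the provider has answered, a request admitted near the end of a budget can overshoot it in this sketch; later requests in the same window are then rejected.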

Request Processing Flow

sequenceDiagram
    participant Client as Client (OpenAI SDK)
    participant Envoy as Envoy Proxy
    participant RLS as Rate Limit Service
    participant Processor as AI Gateway External Processor
    participant Provider as AI Provider / Upstream

    Client->>Envoy: Request
    Envoy->>Processor: Router-level ExtProc Request
    Note over Processor: Extract Model Name
    Processor-->>Envoy: ClearRouteCache;
    Envoy->>RLS: Check Rate Limit
    RLS-->>Envoy: ;
    loop Retry/Fallback loop
        Note over Envoy: Select Upstream/Endpoint
        Envoy->>Processor: Upstream level ExtProc Request
        Note over Processor: Request-Transform & Upstream Authnz
        Processor-->>Envoy: ;
        Envoy->>Provider: Forward Request
        Provider-->>Envoy: Response
    end
    Envoy->>Processor: Process Response
    Note over Processor: Response Transform & Extract Token Usage
    Processor-->>Envoy: Add Usage Metadata
    Envoy->>RLS: Reduce Rate Limit budget
    RLS-->>Envoy: ;
    Envoy->>Client: Response

The data plane processes requests through several key steps:

1. Request Path

Routing: Calculates the destination AI provider based on:
- Request path
- Headers
- Model name extracted from the request body
Request Transformation: Prepares the request for the provider:
- Request body transformation
- Request path modification
- Format adaptation
Upstream Authorization: Handles provider authentication:
- API key management
- Header modifications
- Authentication token handling
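Token Rate Limiting Check: Checks the request against the Rate Limit Service after routing and before the request is forwarded upstream:
- Checks the token usage already recorded against the configured budget
- Rejects requests whose budget is exhausted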
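The routing step above can be sketched as follows: the model name is read from the request body, exposed as a header, and the route is then chosen from the path and that header. This is a minimal sketch, not the gateway's implementation. The header name `x-example-model`, the `ROUTES` table, the backend names, and the model names are placeholders invented for illustration; only the top-level `model` field of an OpenAI-style request body is taken as given.

```python
import json

# Hypothetical header that carries the extracted model name to the route matcher.
MODEL_HEADER = "x-example-model"

# Hypothetical route table: (path prefix, model name, backend name).
ROUTES = [
    ("/v1/chat/completions", "gpt-4o-mini", "openai"),
    ("/v1/chat/completions", "llama-3", "aws-bedrock"),
]


def extract_model(body: bytes) -> str:
    """Read the top-level "model" field from an OpenAI-style JSON body."""
    payload = json.loads(body)
    model = payload.get("model") if isinstance(payload, dict) else None
    if not isinstance(model, str) or not model:
        raise ValueError("request body has no usable 'model' field")
    return model


def router_phase(headers: dict, body: bytes) -> dict:
    """Return a copy of the headers with the model name added."""
    updated = dict(headers)
    updated[MODEL_HEADER] = extract_model(body)
    return updated


def select_backend(path: str, headers: dict) -> str:
    """Pick a backend from the path and the model header."""
    model = headers.get(MODEL_HEADER)
    for prefix, route_model, backend in ROUTES:
        if path.startswith(prefix) and model == route_model:
            return backend
    raise LookupError(f"no route for path={path!r} model={model!r}")
```

Route selection has to run after the header is set, which is why the diagram shows the processor asking Envoy to clear its route cache once the model name has been extracted.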

2. Response Path

Response Transformation:
- Transforms provider response for client compatibility
- Normalizes response format
- Handles streaming responses
Token Usage Management:
- Extracts token usage from responses
- Calculates usage based on configuration
- Stores usage in per-request dynamic metadata
- Enables rate limiting based on token consumption
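To make the token usage steps concrete, here is a small sketch that pulls usage out of an OpenAI-style response in both non-streaming and streaming (server-sent events) form, then picks the number to charge. It assumes the OpenAI `usage` object (`prompt_tokens`, `completion_tokens`, `total_tokens`) and that the stream carries a usage-bearing chunk before `data: [DONE]`. The function names and the `kind` setting are hypothetical stand-ins for whatever the gateway's configuration actually exposes.

```python
import json
from typing import Iterable, Optional


def usage_from_body(body: bytes) -> Optional[dict]:
    """Non-streaming: usage is a top-level object in the JSON response."""
    payload = json.loads(body)
    usage = payload.get("usage") if isinstance(payload, dict) else None
    return usage if isinstance(usage, dict) else None


def usage_from_sse(lines: Iterable[str]) -> Optional[dict]:
    """Streaming: scan "data:" events and keep the last usage object seen."""
    usage = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        if isinstance(chunk, dict) and isinstance(chunk.get("usage"), dict):
            usage = chunk["usage"]
    return usage


def tokens_to_charge(usage: dict, kind: str = "total") -> int:
    """Pick which counter to charge; `kind` is a hypothetical config value."""
    fields = {
        "input": "prompt_tokens",
        "output": "completion_tokens",
        "total": "total_tokens",
    }
    return int(usage.get(fields[kind], 0))
```

The resulting number is what would be written to per-request dynamic metadata so that the rate limit budget can be reduced after the response.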

Notable Rationale

Why the External Processor is separated into two phases (Router-level and Upstream-level):
- In Envoy, retry and fallback happen after the router filter, at the upstream level. For example, when the upstream server returns a 5xx response, Envoy does not invoke the router-level filters again; it invokes only the upstream-level filters. In our case, a retry or fallback can send the request to an entirely different AI provider: the first attempt might go to OpenAI and the second to AWS Bedrock. Each attempt then needs its own request transformation and upstream authorization, so this logic must live in the upstream-level filter.
Why the External Processor?
- The External Processor is the most powerful, battle-tested, and production-ready extension point in Envoy. It lets us implement complex logic without modifying Envoy's core codebase.
- Dynamic Modules could be a future alternative, as they offer better performance and less complexity in the overall architecture. The work is tracked in envoyproxy/ai-gateway#90.
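The two-phase split can be illustrated with a toy model of the retry/fallback loop: router-level work runs once per client request, while transformation and authorization run once per attempt. This is a simplified sketch, not Envoy's filter API. The `Upstream` type, the `handle` function, and the status-code handling are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Upstream:
    """Hypothetical description of one AI provider backend."""
    name: str
    transform: Callable[[dict], dict]            # provider-specific body/path
    auth_headers: Dict[str, str]                 # provider-specific credentials
    send: Callable[[dict, Dict[str, str]], int]  # returns an HTTP status code


def handle(request: dict, upstreams: List[Upstream], log: List[str]) -> int:
    # Router-level phase: runs exactly once, before any upstream is chosen.
    log.append(f"router: model={request['model']}")

    status = 503  # returned if no upstream is configured
    for upstream in upstreams:  # retry/fallback loop
        # Upstream-level phase: runs again for every attempt, because each
        # provider needs its own request shape and its own credentials.
        log.append(f"upstream: {upstream.name}")
        status = upstream.send(upstream.transform(request), upstream.auth_headers)
        if status < 500:
            break  # success or client error: do not fall back
    return status
```

If transformation and authorization were done in the router-level phase, the second attempt in this loop would reuse the first provider's request shape and credentials.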

Next Steps

To learn more:

- Explore the System Architecture
- Check out our Getting Started guide for hands-on experience
