Rate Limiting
When making many requests to AI Agents, you may need to control the request rate to avoid overwhelming the StackSpot AI API or hitting server-side rate limits.
Terminology: Rate Limiting vs Throttling
The SDK uses "rate limiting" terminology, but the actual behavior is throttling: requests that exceed the configured rate are delayed (queued) rather than immediately rejected. Delaying requests up front maximizes the number that succeed. See HTTP Client > Terminology for details.
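To make the distinction concrete, here is a minimal sketch of a token bucket that throttles (waits) instead of rejecting. It is not the SDK's implementation: the class name `TokenBucketThrottle` and its `clock`/`sleep` parameters are hypothetical, and the sketch is single-threaded for brevity.

```python
import time
from typing import Callable


class TokenBucketThrottle:
    """Illustrative token bucket that delays callers instead of rejecting them.

    Hypothetical helper, not part of the stkai SDK. Not thread-safe.
    """

    def __init__(
        self,
        max_requests: int,
        time_window: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.capacity = float(max_requests)
        self.refill_rate = max_requests / time_window  # tokens per second
        self.tokens = self.capacity  # start with a full bucket
        self._clock = clock
        self._sleep = sleep
        self._last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        now = self._clock()
        elapsed = now - self._last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self._last = now

    def acquire(self) -> float:
        """Wait until one token is available, consume it, return seconds waited."""
        waited = 0.0
        self._refill()
        while self.tokens < 1.0:
            # Throttling: sleep until the missing fraction of a token has refilled.
            delay = (1.0 - self.tokens) / self.refill_rate
            self._sleep(delay)
            waited += delay
            self._refill()
        self.tokens -= 1.0
        return waited
```

A rejecting rate limiter would raise or return an error where this sketch sleeps. Calling `acquire()` before each request keeps the caller under `max_requests` per `time_window` without dropping any request.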
Global Configuration (Recommended)
The easiest way to enable rate limiting is via STKAI.configure():
from stkai import STKAI, Agent, ChatRequest

# Enable rate limiting globally
STKAI.configure(
    rate_limit={
        "enabled": True,
        "strategy": "token_bucket",
        "max_requests": 60,
        "time_window": 60.0,
    }
)

# Rate limiting is automatically applied
agent = Agent(agent_id="my-assistant")
response = agent.chat(ChatRequest(user_prompt="Hello"))
Or via environment variables:
export STKAI_RATE_LIMIT_ENABLED=true
export STKAI_RATE_LIMIT_STRATEGY=token_bucket
export STKAI_RATE_LIMIT_MAX_REQUESTS=60
export STKAI_RATE_LIMIT_TIME_WINDOW=60.0
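The four variables correspond one-to-one to the keys of the `rate_limit` dictionary shown earlier. The sketch below only illustrates that mapping and the expected value types. The SDK reads these variables itself, so `rate_limit_from_env` is a hypothetical helper, and the fallback values and the boolean parsing rule are assumptions made for the example, not documented SDK defaults.

```python
import os
from typing import Any, Mapping


def rate_limit_from_env(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Build a rate_limit-style dict from STKAI_RATE_LIMIT_* variables.

    Hypothetical helper for illustration only; fallbacks mirror the example
    values on this page and are not the SDK's documented defaults.
    """
    return {
        # Assumption: only the literal string "true" (case-insensitive) enables it.
        "enabled": env.get("STKAI_RATE_LIMIT_ENABLED", "false").strip().lower() == "true",
        "strategy": env.get("STKAI_RATE_LIMIT_STRATEGY", "token_bucket"),
        "max_requests": int(env.get("STKAI_RATE_LIMIT_MAX_REQUESTS", "60")),
        "time_window": float(env.get("STKAI_RATE_LIMIT_TIME_WINDOW", "60.0")),
    }
```

With the exports above in place, `rate_limit_from_env()` yields the same dictionary that the `STKAI.configure()` example passes explicitly.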
Full Configuration Reference
See HTTP Client > Rate Limiting for all configuration options, strategies comparison, algorithms explanation, and environment variables.
Manual Configuration
For more control, you can manually create rate-limited HTTP clients:
from stkai import Agent, ChatRequest
from stkai import TokenBucketRateLimitedHttpClient, StkCLIHttpClient

# Limit to 60 requests per minute
http_client = TokenBucketRateLimitedHttpClient(
    delegate=StkCLIHttpClient(),
    max_requests=60,
    time_window=60.0,
)

agent = Agent(
    agent_id="my-assistant",
    http_client=http_client,
)
response = agent.chat(ChatRequest(user_prompt="Hello"))
For adaptive rate limiting (handles HTTP 429 automatically):
from stkai import Agent
from stkai import AdaptiveRateLimitedHttpClient, StkCLIHttpClient

http_client = AdaptiveRateLimitedHttpClient(
    delegate=StkCLIHttpClient(),
    max_requests=100,
    time_window=60.0,
    min_rate_floor=0.1,    # Never below 10%
    max_retries_on_429=3,  # Retry on 429
)

agent = Agent(
    agent_id="my-assistant",
    http_client=http_client,
)
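The idea behind a rate floor can be sketched as follows. This page does not describe the algorithm the SDK actually uses, so everything here is an assumption for illustration: the `AdaptiveRate` class, the halving penalty on HTTP 429, and the additive recovery step are all hypothetical. Only `max_requests` and `min_rate_floor` come from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveRate:
    """Illustrative adaptive rate with a floor. Hypothetical, not the SDK's logic."""

    max_requests: int
    min_rate_floor: float = 0.1   # fraction of max_requests, as in the example above
    penalty_factor: float = 0.5   # hypothetical: halve the rate on each HTTP 429
    recovery_step: float = 1.0    # hypothetical: regain one request per success
    effective: float = field(init=False)

    def __post_init__(self) -> None:
        # Start at the full configured rate.
        self.effective = float(self.max_requests)

    @property
    def floor(self) -> float:
        # Lowest rate the limiter may fall to, in requests per time window.
        return self.max_requests * self.min_rate_floor

    def on_429(self) -> None:
        # Back off multiplicatively, but never below the floor.
        self.effective = max(self.floor, self.effective * self.penalty_factor)

    def on_success(self) -> None:
        # Recover gradually, but never above the configured maximum.
        self.effective = min(float(self.max_requests), self.effective + self.recovery_step)
```

Under these assumptions, with `max_requests=100` and `min_rate_floor=0.1`, repeated 429 responses lower the effective rate from 100 to 50, 25, 12.5, and then hold it at 10 requests per window.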
Next Steps
- HTTP Client > Rate Limiting - Detailed guide with algorithms, strategies, and configuration
- Configuration - Global SDK configuration
- API Reference - Complete API documentation