Choked Class
The main class for creating configurable rate-limiting decorators with dual limiting support.

Class Signature
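The signature block itself did not survive in this copy; the sketch below is reconstructed from the parameters documented in this reference (type hints and defaults are assumptions, not the library's exact declaration):

```python
class Choked:
    def __init__(self, redis_url: str | None = None, api_token: str | None = None):
        # Exactly one of redis_url or api_token must be provided.
        ...

    def __call__(
        self,
        key: str,
        request_limit: str | None = None,
        token_limit: str | None = None,
        token_estimator: str | None = None,
    ):
        # Returns a decorator that works on sync and async functions.
        ...
```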
Constructor Parameters
redis_url
Redis connection URL for distributed rate limiting. Mutually exclusive with api_token.
Examples:
- "redis://localhost:6379/0"
- "redis://user:pass@host:6379/0"
- "redis://localhost:6379"
api_token
API token for the managed rate limiting service. Mutually exclusive with redis_url. Contact us for access to the managed service.
Decorator Parameters
When using a Choked instance as a decorator, it accepts these parameters:

key
Unique identifier for the rate limit bucket. Functions with the same key share the same rate limits.
Examples:
- "openai_chat"
- "user_123_api"
- "embedding_service"
request_limit
Request rate limit in the format "number/period". Optional if token_limit is provided.
Format: "number/period", where period is "s" (seconds) or "m" (minutes)
Examples:
- "10/s" - 10 requests per second
- "100/m" - 100 requests per minute
- "1000/s" - 1000 requests per second
token_limit
Token rate limit in the format "number/period". Optional if request_limit is provided.
Format: "number/period", where period is "s" (seconds) or "m" (minutes)
Examples:
- "1000/s" - 1000 tokens per second
- "100000/m" - 100,000 tokens per minute
- "50000/m" - 50,000 tokens per minute
token_estimator
Token estimation method. Required when using token_limit.
Options:
- "openai" - Use OpenAI/tiktoken for text estimation
- "voyageai" - Use the VoyageAI tokenizer for text estimation
- "default" - Use tiktoken with the GPT-4 tokenizer (same as "openai")
Returns
The __call__ method returns a decorator function that can be applied to both synchronous and asynchronous functions.
Usage Examples
Basic Request Limiting
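The code blocks for these examples were lost in this copy; the sketches below are reconstructed from the parameters documented above. The import path `from choked import Choked` and the function names are assumptions.

```python
# Assumed import path; adjust to match the actual package layout.
from choked import Choked

choked = Choked(redis_url="redis://localhost:6379/0")

@choked(key="user_api", request_limit="10/s")
def call_external_api(payload: dict) -> dict:
    # At most 10 calls per second for the "user_api" bucket,
    # shared across every process using this Redis instance.
    ...
```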
Token-Only Limiting for AI APIs
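A token-only limit sized for an AI API, assuming the same choked instance as above:

```python
@choked(key="openai_chat", token_limit="100000/m", token_estimator="openai")
def chat_completion(messages: list[dict]) -> str:
    # Tokens are estimated from the OpenAI-style messages argument
    # and deducted from a 100,000 tokens/minute budget.
    ...
```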
Dual Limiting
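Combining both limits on one function (a sketch under the same assumptions):

```python
@choked(
    key="embedding_service",
    request_limit="100/m",
    token_limit="50000/m",
    token_estimator="voyageai",
)
def embed(texts: list[str]) -> list[list[float]]:
    # Each call must acquire 1 request token AND the estimated
    # text tokens before it proceeds.
    ...
```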
Async Function Support
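The same decorator applies unchanged to coroutines:

```python
@choked(key="openai_chat", token_limit="1000/s", token_estimator="default")
async def chat_completion_async(prompt: str) -> str:
    # Async functions are detected automatically; no extra setup needed.
    ...
```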
Multiple Services
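Distinct keys keep each service's budget independent (key names here are illustrative):

```python
@choked(key="openai_chat", request_limit="10/s")
def call_openai(prompt: str) -> str: ...

@choked(key="voyageai_embeddings", request_limit="100/m")
def call_voyageai(texts: list[str]) -> list[list[float]]: ...
```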
Shared Rate Limits
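Because functions with the same key share the same bucket, reusing a key pools the budget:

```python
@choked(key="user_123_api", request_limit="100/m")
def fetch_profile(user_id: str) -> dict: ...

@choked(key="user_123_api", request_limit="100/m")
def fetch_orders(user_id: str) -> list: ...

# Both functions draw from one shared 100 requests/minute budget.
```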
Behavior Details
Automatic Function Detection
- The decorator automatically detects if the wrapped function is async or sync
- No special configuration needed for async functions
- Both sync and async functions can use the same decorator parameters
Token Estimation
- Token estimators automatically extract text from function arguments
- Supports string arguments, keyword arguments, and lists of strings
- Special handling for the OpenAI message format: [{"role": "user", "content": "text"}]
- Graceful fallback if estimation fails (see the sketch below)
Rate Limiting Logic
- Request-only: Each call consumes 1 request token
- Token-only: Each call consumes estimated tokens based on input
- Dual limiting: Each call must acquire both request tokens AND estimated tokens
- Both limits must be satisfied for function to proceed
Exponential Backoff
- When rate limited, sleep time doubles on each retry
- Random jitter (0.8x to 1.2x) applied to prevent thundering herd
- Automatic retry until tokens become available
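A minimal sketch of this retry loop (illustrative only; the helper name, base sleep, and try_acquire callable are assumptions, not the library's internals):

```python
import asyncio
import random

async def wait_for_tokens(try_acquire) -> None:
    # Hypothetical helper: try_acquire() returns True once the
    # bucket has capacity.
    sleep = 0.1
    while not await try_acquire():
        # Jittered exponential backoff: the sleep doubles each retry,
        # scaled by a random 0.8x-1.2x factor to avoid thundering herd.
        await asyncio.sleep(sleep * random.uniform(0.8, 1.2))
        sleep *= 2
```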
Error Handling
- Network failures result in automatic retry with backoff
- Token estimation failures fall back to simpler estimators
- Invalid rate formats raise ValueError immediately
Validation
Constructor Validation
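The original validation examples were not preserved; this sketch reflects the mutual-exclusion rule documented above (the exact exception raised by the constructor is an assumption):

```python
Choked(redis_url="redis://localhost:6379/0")  # OK: Redis backend
Choked(api_token="<your-token>")              # OK: managed service

# Expected to fail: the two backends are mutually exclusive.
Choked(redis_url="redis://localhost:6379/0", api_token="<your-token>")
```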
Decorator Validation
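Similarly for the decorator parameters. The ValueError on an invalid rate format is documented above; the behavior for a missing estimator is inferred from token_estimator being required with token_limit:

```python
@choked(key="svc", request_limit="10/s")  # OK
def ok(): ...

# Raises ValueError immediately: "h" is not a valid period ("s" or "m").
@choked(key="svc", request_limit="10/h")
def bad_format(): ...

# Expected to fail: token_limit without the required token_estimator.
@choked(key="svc", token_limit="1000/s")
def missing_estimator(): ...
```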
Performance Considerations
Backend Performance
- Redis backend: Atomic Lua scripts, minimal network overhead
- Managed service: HTTP-based, additional network latency
- Both backends are optimized for high-throughput scenarios
Token Estimation Performance
- Tokenizers are cached after first use
- Estimation is fast for typical text sizes
- Fallback mechanisms prevent blocking on estimation failures
Memory Usage
- Minimal memory footprint per rate limit bucket
- Token estimators cache models efficiently
- No memory leaks in long-running applications
Thread and Process Safety
- All operations are thread-safe and process-safe
- Redis backend uses atomic operations
- Safe for use in multi-threaded web applications
- Supports distributed rate limiting across multiple processes/servers