Choked Class

The main class for creating configurable rate limiting decorators with dual limiting support.

Class Signature

class Choked:
    def __init__(self, redis_url: Optional[str] = None, api_token: Optional[str] = None)
    def __call__(self, key: str, request_limit: Optional[str] = None, 
                 token_limit: Optional[str] = None, token_estimator: Optional[str] = None) -> Callable

Constructor Parameters

redis_url
str
Redis connection URL for distributed rate limiting. Mutually exclusive with api_token.
Examples:
  • "redis://localhost:6379/0"
  • "redis://user:pass@host:6379/0"
  • "redis://localhost:6379"
api_token
str
API token for the managed rate limiting service. Mutually exclusive with redis_url. Contact us for access to the managed service.

Decorator Parameters

When using a Choked instance as a decorator, it accepts these parameters:
key
str
required
Unique identifier for the rate limit bucket. Functions with the same key share the same rate limits.
Examples:
  • "openai_chat"
  • "user_123_api"
  • "embedding_service"
request_limit
str
Request rate limit. Optional if token_limit is provided.
Format: "number/period" where period is "s" (seconds) or "m" (minutes)
Examples:
  • "10/s" - 10 requests per second
  • "100/m" - 100 requests per minute
  • "1000/s" - 1000 requests per second
token_limit
str
Token rate limit. Optional if request_limit is provided.
Format: "number/period" where period is "s" (seconds) or "m" (minutes)
Examples:
  • "1000/s" - 1000 tokens per second
  • "100000/m" - 100,000 tokens per minute
  • "50000/m" - 50,000 tokens per minute
token_estimator
str
Token estimation method. Required when using token_limit.
Options:
  • "openai" - Use OpenAI/tiktoken for text estimation
  • "voyageai" - Use VoyageAI tokenizer for text estimation
  • "default" - Use tiktoken with GPT-4 tokenizer (same as “openai”)

Returns

The __call__ method returns a decorator function that can be applied to both synchronous and asynchronous functions.

Usage Examples

Basic Request Limiting

from choked import Choked

choke = Choked(redis_url="redis://localhost:6379/0")

@choke(key="api_calls", request_limit="10/s")
def make_api_call():
    # This function is rate limited to 10 requests per second
    return "API response"

Token-Only Limiting for AI APIs

import openai
from choked import Choked

choke = Choked(api_token="your-api-token")

@choke(key="openai_embed", token_limit="1000000/m", token_estimator="openai")
def get_embeddings(texts):
    # Rate limited by estimated tokens only
    return openai.embeddings.create(input=texts, model="text-embedding-3-small")

Dual Limiting

import openai
from choked import Choked

choke = Choked(redis_url="redis://localhost:6379/0")

@choke(key="gpt4_chat", request_limit="50/s", token_limit="100000/m", token_estimator="openai")
def chat_completion(messages):
    # Limited by both requests (50/s) AND tokens (100K/m)
    return openai.chat.completions.create(
        model="gpt-4",
        messages=messages
    )

Async Function Support

import asyncio
from choked import Choked

choke = Choked(redis_url="redis://localhost:6379/0")

@choke(key="async_api", request_limit="5/s")
async def async_api_call(data):
    # Async functions are automatically detected
    await asyncio.sleep(0.1)
    return f"Processed {data}"

async def main():
    result = await async_api_call("test data")
    print(result)

asyncio.run(main())

Multiple Services

import openai
from choked import Choked

# Different backends for different services
openai_choke = Choked(api_token="openai-service-token")
redis_choke = Choked(redis_url="redis://localhost:6379/0")

@openai_choke(key="gpt4", request_limit="50/s", token_limit="100000/m", token_estimator="openai")
def openai_call(messages):
    return openai.chat.completions.create(model="gpt-4", messages=messages)

@redis_choke(key="internal_api", request_limit="1000/s")
def internal_call():
    return "Internal API response"

Shared Rate Limits

from choked import Choked

choke = Choked(redis_url="redis://localhost:6379/0")

@choke(key="shared_resource", request_limit="10/m")
def function_a():
    return "A"

@choke(key="shared_resource", request_limit="10/m")
def function_b():
    return "B"

# Both functions compete for the same 10 requests/minute

Behavior Details

Automatic Function Detection

  • The decorator automatically detects whether the wrapped function is async or sync (see the sketch below)
  • No special configuration needed for async functions
  • Both sync and async functions can use the same decorator parameters
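
A minimal sketch of how this detection typically works (illustrative only; the wrapper structure and the placement of token acquisition are assumptions, not Choked's actual internals):

import functools
import inspect

def sketch_decorator(func):
    # Hypothetical: dispatch on whether the wrapped function is a coroutine function.
    if inspect.iscoroutinefunction(func):
        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            # a real implementation would await token acquisition here
            return await func(*args, **kwargs)
        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args, **kwargs):
        # a real implementation would block on token acquisition here
        return func(*args, **kwargs)
    return sync_wrapper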

Token Estimation

  • Token estimators automatically extract text from function arguments
  • Supports string arguments, keyword arguments, and lists of strings (see the extraction sketch below)
  • Special handling for OpenAI message format: [{"role": "user", "content": "text"}]
  • Graceful fallback if estimation fails
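
A simplified sketch of what that extraction step can look like (the helper name and traversal rules are assumptions for illustration):

def extract_text(args, kwargs):
    # Hypothetical extractor: collect every string reachable from the
    # arguments, including "content" fields of OpenAI-style message dicts.
    pieces = []

    def visit(value):
        if isinstance(value, str):
            pieces.append(value)
        elif isinstance(value, dict) and "content" in value:
            # OpenAI message format: {"role": "user", "content": "text"}
            visit(value["content"])
        elif isinstance(value, (list, tuple)):
            for item in value:
                visit(item)

    for value in list(args) + list(kwargs.values()):
        visit(value)
    return " ".join(pieces)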

Rate Limiting Logic

  • Request-only: Each call consumes 1 request token
  • Token-only: Each call consumes estimated tokens based on input
  • Dual limiting: Each call must acquire both request tokens AND estimated tokens
  • Both limits must be satisfied for the function to proceed (modeled in the sketch below)
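
The AND semantics can be modeled with a small in-memory bucket (a sketch only; Choked's real buckets live in Redis or the managed service):

import time

class DualBucketSketch:
    # Illustrative dual token bucket: one budget for requests, one for tokens.
    def __init__(self, requests_per_sec, tokens_per_sec):
        self.request_rate = requests_per_sec
        self.token_rate = tokens_per_sec
        self.requests = float(requests_per_sec)
        self.tokens = float(tokens_per_sec)
        self.last = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        elapsed, self.last = now - self.last, now
        self.requests = min(self.request_rate, self.requests + elapsed * self.request_rate)
        self.tokens = min(self.token_rate, self.tokens + elapsed * self.token_rate)

    def try_acquire(self, estimated_tokens):
        self._refill()
        # Dual limiting: proceed only if BOTH budgets have capacity.
        if self.requests >= 1 and self.tokens >= estimated_tokens:
            self.requests -= 1
            self.tokens -= estimated_tokens
            return True
        return False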

Exponential Backoff

  • When rate limited, sleep time doubles on each retry
  • Random jitter (0.8x to 1.2x) applied to prevent thundering herd
  • Automatic retry until tokens become available (see the loop sketched below)
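
A sketch of that retry loop (the base sleep value is an assumption):

import random
import time

def wait_for_capacity(try_acquire, base_sleep=0.1):
    # Illustrative: sleep doubles on each failed attempt, with 0.8x-1.2x jitter.
    sleep = base_sleep
    while not try_acquire():
        time.sleep(sleep * random.uniform(0.8, 1.2))
        sleep *= 2

# e.g. wait_for_capacity(lambda: bucket.try_acquire(estimated_tokens))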

Error Handling

  • Network failures result in automatic retry with backoff
  • Token estimation failures fall back to simpler estimators (see the fallback sketch below)
  • Invalid rate formats raise ValueError immediately
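
One way such a fallback chain can look (a sketch; the characters-per-token heuristic is an assumption, not Choked's documented behavior):

def estimate_tokens_with_fallback(text: str) -> int:
    try:
        import tiktoken
        encoding = tiktoken.encoding_for_model("gpt-4")
        return len(encoding.encode(text))
    except Exception:
        # Crude fallback: assume roughly 4 characters per token.
        return max(1, len(text) // 4)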

Validation

Constructor Validation

# Must specify exactly one backend
Choked()  # ValueError: Must specify either redis_url or api_token
Choked(redis_url="...", api_token="...")  # ValueError: Cannot specify both

# Valid constructors
Choked(redis_url="redis://localhost:6379/0")  # ✓
Choked(api_token="your-token")  # ✓

Decorator Validation

choke = Choked(redis_url="redis://localhost:6379/0")

# Must provide at least one limit
@choke(key="api")  # ValueError: At least one limit must be provided

# Invalid rate formats
@choke(key="api", request_limit="invalid")  # ValueError: Invalid rate format
@choke(key="api", request_limit="10/h")     # ValueError: Invalid period 'h'
@choke(key="api", request_limit="10")       # ValueError: Missing period

# Valid decorators
@choke(key="api", request_limit="10/s")  # ✓
@choke(key="api", token_limit="1000/m", token_estimator="openai")  # ✓
@choke(key="api", request_limit="10/s", token_limit="1000/m", token_estimator="openai")  # ✓

Performance Considerations

Backend Performance

  • Redis backend: Atomic Lua scripts, minimal network overhead (see the illustrative script below)
  • Managed service: HTTP-based, additional network latency
  • Both backends are optimized for high-throughput scenarios
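
For intuition, this is what an atomic token-bucket script can look like with redis-py (an illustrative sketch, not Choked's actual script):

import time
import redis

# Refill the bucket based on elapsed time, then atomically try to take
# `want` tokens. Returns 1 if the take succeeded, 0 otherwise.
TOKEN_BUCKET_LUA = """
local key      = KEYS[1]
local rate     = tonumber(ARGV[1])  -- tokens refilled per second
local capacity = tonumber(ARGV[2])
local now      = tonumber(ARGV[3])
local want     = tonumber(ARGV[4])

local state  = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(state[1]) or capacity
local ts     = tonumber(state[2]) or now

tokens = math.min(capacity, tokens + (now - ts) * rate)
local ok = tokens >= want
if ok then tokens = tokens - want end

redis.call('HSET', key, 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', key, 60)
return ok and 1 or 0
"""

client = redis.Redis.from_url("redis://localhost:6379/0")
take = client.register_script(TOKEN_BUCKET_LUA)
allowed = take(keys=["bucket:api"], args=[10, 10, time.time(), 1])

Because the whole read-modify-write runs inside one script, concurrent callers never observe a partially updated bucket.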

Token Estimation Performance

  • Tokenizers are cached after first use (sketched below)
  • Estimation is fast for typical text sizes
  • Fallback mechanisms prevent blocking on estimation failures
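
Caching a tokenizer is typically a one-liner (a sketch, assuming tiktoken):

from functools import lru_cache

@lru_cache(maxsize=None)
def get_encoding(model: str):
    import tiktoken  # building an encoding is slow; doing it once amortizes the cost
    return tiktoken.encoding_for_model(model)

# First call builds the tokenizer; later calls return the cached instance.
n_tokens = len(get_encoding("gpt-4").encode("hello world"))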

Memory Usage

  • Minimal memory footprint per rate limit bucket
  • Token estimators cache models efficiently
  • No memory leaks in long-running applications

Thread and Process Safety

  • All operations are thread-safe and process-safe
  • Redis backend uses atomic operations
  • Safe for use in multi-threaded web applications
  • Supports distributed rate limiting across multiple processes/servers

Integration Examples

FastAPI Integration

import openai
from fastapi import FastAPI
from choked import Choked

app = FastAPI()
choke = Choked(redis_url="redis://localhost:6379/0")

@app.post("/chat")
@choke(key="openai_chat", request_limit="50/s", token_limit="100000/m", token_estimator="openai")
async def chat_endpoint(messages: list[dict]):
    return openai.chat.completions.create(model="gpt-4", messages=messages)

Flask Integration

from flask import Flask, request
from choked import Choked

app = Flask(__name__)
choke = Choked(redis_url="redis://localhost:6379/0")

@app.route("/api", methods=["POST"])
@choke(key="api_endpoint", request_limit="100/m")
def api_endpoint():
    return {"result": "success"}

Celery Task Integration

from celery import Celery
from choked import Choked

celery_app = Celery("tasks")
choke = Choked(redis_url="redis://localhost:6379/0")

@celery_app.task
@choke(key="background_task", request_limit="10/s")
def background_task(data):
    # Process data with rate limiting (process_data is a placeholder for your own function)
    return process_data(data)