
Welcome to Choked

Choked is a simple, powerful Python rate-limiting library that uses the token bucket algorithm to control the rate of function calls, with support for both request-based and token-based limiting.

Features

  • Easy to use: Simple class-based API with decorator pattern
  • Dual limiting: Support both request limits and token limits (for AI/ML APIs)
  • Flexible backends: Supports both Redis and managed proxy service backends
  • Async/Sync support: Works with both synchronous and asynchronous functions
  • Smart token estimation: Built-in estimators for OpenAI, VoyageAI, and general text
  • Exponential backoff: Smart retry logic with jitter to prevent thundering herd
  • Distributed: Share rate limits across multiple processes or servers
  • Multi-worker scaling: Perfect for managing multiple workers using the same API key

Quick Start

Install choked:
pip install choked
Create a Choked instance and use it as a decorator:
from choked import Choked

# Using Redis backend
choke = Choked(redis_url="redis://localhost:6379/0")

@choke(key="api_calls", request_limit="10/m")
def make_api_call():
    # This function is rate limited to 10 requests per minute
    return "API response"

# The decorator handles rate limiting automatically
make_api_call()  # Runs immediately while the bucket still has tokens available
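
The features above also mention async support; here is a minimal sketch of decorating a coroutine, assuming the same decorator arguments apply (the function body is just a stand-in):

import asyncio
from choked import Choked

choke = Choked(redis_url="redis://localhost:6379/0")

@choke(key="api_calls_async", request_limit="10/m")
async def make_async_api_call():
    # Same limiting behaviour as the sync example, but awaited instead of blocking
    return "API response"

asyncio.run(make_async_api_call())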

How it Works

Choked uses a token bucket algorithm with dual limiting support:
  1. Request limiting: Each function call consumes 1 request token
  2. Token limiting: Each function call consumes estimated tokens based on input text
  3. Buckets refill at steady rates (e.g., "100/s" = 100 tokens per second)
  4. When limits are reached, functions wait with exponential backoff
This allows bursts of traffic while maintaining the average rate, making it perfect for AI/ML APIs.
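
To make the refill-and-consume cycle concrete, here is a small, self-contained token bucket sketch. It illustrates the algorithm only; it is not Choked's internal implementation, and the class name and parameters are ours:

import time

class TokenBucket:
    """Illustrative token bucket: refills at a steady rate up to a fixed capacity."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate              # tokens added per second (e.g. 100 for "100/s")
        self.capacity = capacity      # maximum burst size
        self.tokens = capacity        # start full so an initial burst is allowed
        self.updated = time.monotonic()

    def _refill(self) -> None:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def try_consume(self, amount: float = 1.0) -> bool:
        """Take `amount` tokens if available; report failure otherwise."""
        self._refill()
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False

bucket = TokenBucket(rate=100, capacity=100)   # "100/s"
if not bucket.try_consume(1):                  # a request costs 1 request token
    time.sleep(0.01)                           # Choked instead waits with exponential backoff and jitter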

Perfect for AI/ML APIs

Choked excels with token-based APIs like OpenAI, VoyageAI, and others:
# Using managed proxy service
choke = Choked(api_token="your-api-token")

@choke(key="openai_chat", request_limit="50/s", token_limit="100000/m", token_estimator="openai")
def chat_completion(messages):
    # Rate limited by both requests (50/s) and tokens (100K/m)
    return openai.chat.completions.create(
        model="gpt-4",
        messages=messages
    )

# Automatic token estimation from messages
# Dual rate limiting prevents both request and token overages
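
Calling the decorated function then looks like any ordinary call; the messages payload below is purely illustrative and assumes your OpenAI credentials are already configured:

messages = [
    {"role": "user", "content": "Summarize the token bucket algorithm in one sentence."}
]

# Waits (with backoff) until both the request bucket and the token bucket have
# capacity, then forwards the call to the OpenAI API.
response = chat_completion(messages)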

Benefits:
  • Dual limiting: Respect both request and token limits simultaneously
  • Smart estimation: Automatic token counting for popular AI services
  • Auto-scaling: Add/remove workers without changing rate limits
  • No overages: Never exceed your API provider’s limits
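
As a sketch of the multi-worker scenario: bucket state lives in the backend (Redis or the managed proxy), so every process that points at the same backend and uses the same key draws from one shared budget. The worker loop below is illustrative, not part of the library:

# worker.py -- run several copies of this process; together they share one 10/m budget
from choked import Choked

choke = Choked(redis_url="redis://localhost:6379/0")

@choke(key="api_calls", request_limit="10/m")
def make_api_call():
    return "API response"

if __name__ == "__main__":
    while True:
        make_api_call()   # every worker consumes from the same "api_calls" bucket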