API Reference
curator.LLM
The LLM class is the primary interface for prompting Large Language Models in Curator. It provides a flexible and extensible way to generate synthetic data using various LLM providers. Calling an LLM returns a CuratorResponse, which holds the resulting dataset along with statistics such as performance, token usage, and cost.
Class Definition
class LLM:
    def __init__(
        self,
        model_name: str,
        response_format: Type[BaseModel] | None = None,
        batch: bool = False,
        backend: Optional[str] = None,
        generation_params: dict | None = None,
        backend_params: BackendParamsType | None = None,
    )
Constructor Parameters
model_name (str, required): Name of the LLM to use.
response_format (Type[BaseModel] | None, default None): Pydantic model specifying the expected response format.
batch (bool, default False): Enable batch processing mode.
backend (Optional[str], default None): LLM backend to use ("openai", "litellm", or "vllm"). Auto-determined if None.
generation_params (dict | None, default None): Additional parameters for the generation API.
backend_params (BackendParamsType | None, default None): Configuration parameters for the request processor.
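For example, these parameters can be passed directly when constructing an LLM. The snippet below is a minimal sketch: the model name and the parameter values are illustrative assumptions, not defaults.
from bespokelabs import curator

# Minimal construction sketch; the model name and parameter values
# are illustrative assumptions, not defaults.
llm = curator.LLM(
    model_name="gpt-4o-mini",
    batch=False,
    generation_params={"temperature": 0.7},
    backend_params={"max_retries": 3, "request_timeout": 600},
)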
Backend Parameters Configuration
The backend_params dictionary supports various configuration options based on the execution mode. Here's a comprehensive breakdown:
Common Parameters
These parameters are available across all backends:
max_retries (int, default 3): Maximum number of retry attempts for failed requests.
require_all_responses (bool, default False): Whether to require successful responses for all prompts.
base_url (Optional[str], default None): Optional base URL for the API endpoint.
request_timeout (int, default 600): Timeout in seconds for each request.
api_key (Optional[str], default None): API key for the selected model.
in_mtok_cost (Optional[int], default None): Optional cost per million input tokens.
out_mtok_cost (Optional[int], default None): Optional cost per million output tokens.
invalid_finish_reasons (Optional[list], default ['content_filter', 'length']): List of API finish reasons that are considered failures.
# Example: Common parameters configuration
backend_params = {
    "max_retries": 3,
    "require_all_responses": True,
    "base_url": "https://custom-endpoint.com/v1",
    "request_timeout": 300
}
Online Mode Parameters
Parameters for the online processing mode:
max_requests_per_minute (int): Maximum number of API requests per minute.
max_tokens_per_minute (int): Maximum number of tokens per minute.
seconds_to_pause_on_rate_limit (float): Duration to pause when rate limited.
max_concurrent_requests (int): Maximum number of concurrent requests.
max_input_tokens_per_minute (int): Maximum number of input tokens allowed per minute. Note: only valid for providers with a separate input/output token strategy, e.g. Anthropic.
max_output_tokens_per_minute (int): Maximum number of output tokens allowed per minute. Note: only valid for providers with a separate input/output token strategy, e.g. Anthropic.
# Example: Online mode configuration
backend_params = {
    "max_requests_per_minute": 2000,
    "max_tokens_per_minute": 4_000_000,
    "seconds_to_pause_on_rate_limit": 15.0
}
Batch Processing Parameters
Parameters available when batch=True:
batch_size (int): Number of prompts to process in each batch.
batch_check_interval (float): Time interval between batch completion checks.
delete_successful_batch_files (bool): Whether to delete successful batch files.
delete_failed_batch_files (bool): Whether to delete failed batch files.
completion_window (str): Time window to wait for batch completion. Note: only valid for some providers.
# Example: Batch processing configuration
backend_params = {
    "batch_size": 100,
    "batch_check_interval": 1.0,
    "delete_successful_batch_files": True,
    "delete_failed_batch_files": False,
}
Offline Mode Parameters (VLLM)
Parameters for local model deployment with VLLM:
tensor_parallel_size (int): Number of GPUs for tensor parallelism.
enforce_eager (bool): Whether to enforce eager execution.
max_model_length (int): Maximum sequence length for the model.
max_tokens (int): Maximum tokens for generation.
min_tokens (int): Minimum tokens for generation.
gpu_memory_utilization (float): Target GPU memory utilization (0.0 to 1.0).
batch_size (int): Batch size for VLLM processing.
# Example: VLLM configuration
backend_params = {
    "tensor_parallel_size": 2,
    "max_model_length": 4096,
    "max_tokens": 2048,
    "min_tokens": 1,
    "gpu_memory_utilization": 0.85,
    "batch_size": 32
}
Methods
prompt()
def prompt(self, input: _DictOrBaseModel) -> _DictOrBaseModel
Generates a prompt for the LLM based on the input data.
Parameters
input: Input row used to construct the prompt
Returns
A prompt that can be either:
A string for a single user prompt
A list of dictionaries for multiple messages
Example
def prompt(self, input: dict) -> str:
    return f"Generate a {input['type']} about {input['topic']}"
parse()
def parse(self, input: _DictOrBaseModel, response: _DictOrBaseModel) -> _DictOrBaseModel
Processes the LLM's response and can optionally combine it with the input data.
Parameters
input: Original input row used for the prompt
response: Raw response from the LLM
Returns
A parsed output combining the input and response data
Example
from datetime import datetime

def parse(self, input: dict, response: str) -> dict:
    return {
        "prompt_topic": input["topic"],
        "generated_text": response,
        "timestamp": datetime.now().isoformat()
    }
Returns
A CuratorResponse object, which contains the dataset and a viewer link along with statistics about token usage, performance, and cost.
CuratorResponse
Attributes
Core Data
dataset (Dataset): The curated dataset
cache_dir (Optional[str]): Directory for caching results
failed_requests_path (Optional[Path]): Path to file containing failed requests
viewer_url (Optional[str]): URL for Curator Viewer
batch_mode (bool): Whether the processing was done in batch mode
Model Information
model_name (str): Name of the LLM model used
max_requests_per_minute (int | None): Rate limit for requests per minute
max_tokens_per_minute (int | None): Rate limit for tokens per minute
Statistics
token_usage (TokenUsage): Statistics about token usage
cost_info (CostInfo): Information about processing costs
request_stats (RequestStats): Statistics about request processing
performance_stats (PerformanceStats): Performance metrics
metadata (Dict[str, Any]): Additional metadata
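For instance, these attributes can be read directly from the returned object. The snippet below is a minimal sketch; it assumes response is the CuratorResponse returned by calling an LLM instance and uses only the attributes listed above.
# Inspecting a CuratorResponse; `response` is assumed to come from an LLM call.
print(response.dataset)            # the curated dataset
print(response.token_usage)        # token usage statistics
print(response.cost_info)          # processing cost information
print(response.performance_stats)  # performance metrics
if response.viewer_url:
    print(f"View the results at {response.viewer_url}")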
Response Format (Optional)
The response_format class attribute can be set to a Pydantic model to enforce structured output:
from typing import List

from pydantic import BaseModel

class RecipeResponse(BaseModel):
    title: str
    ingredients: List[str]
    instructions: List[str]

class RecipeGenerator(LLM):
    response_format = RecipeResponse
Usage Examples
Basic Usage
from typing import List

from pydantic import BaseModel, Field

from bespokelabs import curator

class Cuisines(BaseModel):
    """A list of cuisines."""
    cuisines_list: List[str] = Field(description="A list of cuisines.")

class CuisineGenerator(curator.LLM):
    """A cuisine generator that generates diverse cuisines."""
    response_format = Cuisines

    def prompt(self, input: dict) -> str:
        """Generate a prompt for the cuisine generator."""
        return "Generate 10 diverse cuisines."

    def parse(self, input: dict, response: Cuisines) -> dict:
        """Parse the model response along with the input into the desired output format."""
        return [{"cuisine": t} for t in response.cuisines_list]