# LLM API Documentation

## `curator.LLM`

The `LLM` class serves as the primary interface for prompting Large Language Models in Curator. It provides a flexible and extensible way to generate synthetic data using various LLM providers.
### Class Definition
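The exact class definition lives in the Curator source. As an orientation aid, the sketch below reconstructs the constructor signature from the parameters documented in the next section; treat it as illustrative rather than the actual implementation (the docs refer to the `backend_params` type as `BackendParamsType`, approximated here as a plain `dict`).

```python
from typing import Optional, Type

from pydantic import BaseModel


class LLM:
    """Primary interface for prompting LLMs in Curator (signature sketch)."""

    def __init__(
        self,
        model_name: str,
        response_format: Optional[Type[BaseModel]] = None,
        batch: bool = False,
        backend: Optional[str] = None,
        generation_params: Optional[dict] = None,
        backend_params: Optional[dict] = None,  # documented as BackendParamsType | None
    ):
        ...
```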
### Constructor Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model_name` | `str` | Required | Name of the LLM to use |
| `response_format` | `Type[BaseModel] \| None` | `None` | Pydantic model specifying the expected response format |
| `batch` | `bool` | `False` | Enable batch processing mode |
| `backend` | `Optional[str]` | `None` | LLM backend to use (`"openai"`, `"litellm"`, or `"vllm"`). Auto-determined if `None` |
| `generation_params` | `dict \| None` | `None` | Additional parameters for the generation API |
| `backend_params` | `BackendParamsType \| None` | `None` | Configuration parameters for the request processor |
### Backend Parameters Configuration

The `backend_params` dictionary supports various configuration options based on the execution mode. Here's a comprehensive breakdown:
#### Common Parameters

These parameters are available across all backends:
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `max_retries` | `int` | `3` | Maximum number of retry attempts for failed requests |
| `require_all_responses` | `bool` | `False` | Whether to require successful responses for all prompts |
| `base_url` | `Optional[str]` | `None` | Optional base URL for API endpoint |
| `request_timeout` | `int` | `600` | Timeout in seconds for each request |
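As an illustration, the common options are passed as ordinary keys of the `backend_params` dictionary; the values below are hypothetical:

```python
backend_params = {
    "max_retries": 5,                          # retry failed requests up to 5 times
    "require_all_responses": True,             # insist on a successful response for every prompt
    "base_url": "https://api.example.com/v1",  # hypothetical API endpoint
    "request_timeout": 300,                    # per-request timeout in seconds
}
```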
#### Online Mode Parameters

Parameters for online processor mode:
| Parameter | Type | Description |
| --- | --- | --- |
| `max_requests_per_minute` | `int` | Maximum number of API requests per minute |
| `max_tokens_per_minute` | `int` | Maximum number of tokens per minute |
| `seconds_to_pause_on_rate_limit` | `float` | Duration in seconds to pause when rate limited |
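A minimal sketch of configuring rate limits for online mode, assuming the package is imported as `from bespokelabs import curator`; the model name and limit values are illustrative:

```python
from bespokelabs import curator  # assumed import path

llm = curator.LLM(
    model_name="gpt-4o-mini",  # illustrative model name
    backend_params={
        "max_requests_per_minute": 500,
        "max_tokens_per_minute": 100_000,
        "seconds_to_pause_on_rate_limit": 15.0,
    },
)
```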
#### Batch Processing Parameters

Parameters available when `batch=True`:
| Parameter | Type | Description |
| --- | --- | --- |
| `batch_size` | `int` | Number of prompts to process in each batch |
| `batch_check_interval` | `float` | Time interval between batch completion checks |
| `delete_successful_batch_files` | `bool` | Whether to delete successful batch files |
| `delete_failed_batch_files` | `bool` | Whether to delete failed batch files |
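A sketch of enabling batch mode together with the batch-specific options above (all values are illustrative):

```python
from bespokelabs import curator  # assumed import path

llm = curator.LLM(
    model_name="gpt-4o-mini",  # illustrative model name
    batch=True,                # switch to batch processing mode
    backend_params={
        "batch_size": 1_000,           # prompts per submitted batch
        "batch_check_interval": 60.0,  # wait between completion checks
        "delete_successful_batch_files": True,
        "delete_failed_batch_files": False,
    },
)
```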
#### Offline Mode Parameters (VLLM)

Parameters for local model deployment with VLLM:
| Parameter | Type | Description |
| --- | --- | --- |
| `tensor_parallel_size` | `int` | Number of GPUs for tensor parallelism |
| `enforce_eager` | `bool` | Whether to enforce eager execution |
| `max_model_length` | `int` | Maximum sequence length for the model |
| `max_tokens` | `int` | Maximum tokens for generation |
| `min_tokens` | `int` | Minimum tokens for generation |
| `gpu_memory_utilization` | `float` | Target GPU memory utilization (0.0 to 1.0) |
| `batch_size` | `int` | Batch size for VLLM processing |
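A sketch of running a local model through the VLLM backend; the model name and resource settings are illustrative and should be adapted to the available hardware:

```python
from bespokelabs import curator  # assumed import path

llm = curator.LLM(
    model_name="meta-llama/Meta-Llama-3.1-8B-Instruct",  # illustrative local model
    backend="vllm",
    backend_params={
        "tensor_parallel_size": 2,       # shard the model across 2 GPUs
        "gpu_memory_utilization": 0.90,  # target 90% of GPU memory
        "max_model_length": 8192,        # maximum sequence length
        "max_tokens": 1024,              # cap generated tokens
        "batch_size": 32,                # prompts per VLLM batch
    },
)
```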
### Methods

#### `prompt()`

Generates a prompt for the LLM based on the input data.

**Parameters**

- `input`: Input row used to construct the prompt

**Returns**

A prompt that can be either:

- A string for a single user prompt
- A list of dictionaries for multiple messages

**Example**
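A sketch assuming the typical Curator pattern of subclassing `curator.LLM` (the class name, field names, and import path are illustrative), showing both return forms:

```python
from bespokelabs import curator  # assumed import path


class RecipeGenerator(curator.LLM):
    def prompt(self, input: dict):
        # Return a single user prompt as a string ...
        return f"Write a recipe that uses {input['ingredient']}."

        # ... or, alternatively, a list of message dictionaries:
        # return [
        #     {"role": "system", "content": "You are a helpful chef."},
        #     {"role": "user", "content": f"Write a recipe that uses {input['ingredient']}."},
        # ]
```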
#### `parse()`

Processes the LLM's response and optionally combines it with the input data.

**Parameters**

- `input`: Original input row used for the prompt
- `response`: Raw response from the LLM

**Returns**

A parsed output combining the input and response data

**Example**
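Continuing the hypothetical subclass from the previous sketch, `parse()` merges the original row with the raw response:

```python
from bespokelabs import curator  # assumed import path


class RecipeGenerator(curator.LLM):
    def prompt(self, input: dict) -> str:
        return f"Write a recipe that uses {input['ingredient']}."

    def parse(self, input: dict, response: str) -> dict:
        # Combine the original input row with the raw LLM response
        return {"ingredient": input["ingredient"], "recipe": response}
```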
### Response Format (Optional)

The `response_format` class attribute can be set to a Pydantic model to enforce structured output:
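For example (the model, class, and field names here are hypothetical):

```python
from typing import List

from pydantic import BaseModel
from bespokelabs import curator  # assumed import path


class Recipe(BaseModel):
    title: str
    ingredients: List[str]
    steps: List[str]


class RecipeGenerator(curator.LLM):
    # Pydantic model used to enforce structured output
    response_format = Recipe

    def prompt(self, input: dict) -> str:
        return f"Write a recipe that uses {input['ingredient']}."
```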
### Usage Examples

#### Basic Usage
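A minimal end-to-end sketch, assuming the package is imported as `from bespokelabs import curator` and that calling an instance on a collection of input rows runs `prompt()` and `parse()` for each row; names and data are illustrative:

```python
from bespokelabs import curator  # assumed import path


class Poet(curator.LLM):
    def prompt(self, input: dict) -> str:
        return f"Write a haiku about {input['topic']}."

    def parse(self, input: dict, response: str) -> dict:
        return {"topic": input["topic"], "haiku": response}


poet = Poet(model_name="gpt-4o-mini")  # illustrative model name

# Assumed calling convention: pass input rows, get the combined results back
results = poet([{"topic": "autumn"}, {"topic": "the sea"}])
```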