For the complete documentation index, see llms.txt. This page is also available as Markdown.

API Reference

curator.LLM

The LLM class serves as the primary interface for prompting Large Language Models in Curator. It provides a flexible and extensible way to generate synthetic data using various LLM providers. Returns CuratorResponse which holds dataset, statistics (performance, token usage, cost etc) attributes.

Class Definition

class LLM:
    def __init__(
        self,
        model_name: str,
        response_format: Type[BaseModel] | None = None,
        batch: bool = False,
        backend: Optional[str] = None,
        generation_params: dict | None = None,
        backend_params: BackendParamsType | None = None,
    )

Constructor Parameters

Parameter
Type
Default
Description

model_name

str

Required

Name of the LLM to use

response_format

Type[BaseModel] | None

None

Pydantic model specifying the expected response format

batch

bool

False

Enable batch processing mode

backend

Optional[str]

None

LLM backend to use ("openai", "litellm", or "vllm"). Auto-determined if None

generation_params

dict | None

None

Additional parameters for the generation API

backend_params

BackendParamsType | None

None

Configuration parameters for request processor

Backend Parameters Configuration

The backend_params dictionary supports various configuration options based on the execution mode. Here's a comprehensive breakdown:

Common Parameters

These parameters are available across all backends:

Parameter
Type
Default
Description

max_retries

int

3

Maximum number of retry attempts for failed requests

require_all_responses

bool

False

Whether to require successful responses for all prompts

base_url

Optional[str]

None

Optional base URL for API endpoint

request_timeout

int

600

Timeout in seconds for each request

api_key

Optional[str]

None

Api key for the selected model.

in_mtok_cost

Optional[int]

None

Optional cost per million input tokens.

out_mtok_cost

Optional[int]

None

Optional cost per million output tokens.

invalid_finish_reasons

Optional[list]

['content_filter', 'length']

List of api finish reasons which are considered failed.

Online Mode Parameters

Parameters for online processor mode:

Parameter
Type
Description

max_requests_per_minute

int

Maximum number of API requests per minute

max_tokens_per_minute

int

Maximum number of tokens per minute

seconds_to_pause_on_rate_limit

float

Duration to pause when rate limited

max_concurrent_requests

int

Maximum number of concurrent requests.

max_input_tokens_per_minute

int

Maximum number of input tokens allowed per minute. Note: Only valid with seperate token strategy i.e Anthropic.

max_output_tokens_per_minute

int

Maximum number of output tokens allowed per minute. Note: Only valid with seperate token strategy i.e Anthropic.

Batch Processing Parameters

Parameters available when batch=True:

Parameter
Type
Description

batch_size

int

Number of prompts to process in each batch

batch_check_interval

float

Time interval between batch completion checks

delete_successful_batch_files

bool

Whether to delete successful batch files

delete_failed_batch_files

bool

Whether to delete failed batch files

completion_window

str

Time window to wait for batch completion.

Note: only valid for some providers.

Offline Mode Parameters (VLLM)

Parameters for local model deployment with VLLM:

Parameter
Type
Description

tensor_parallel_size

int

Number of GPUs for tensor parallelism

enforce_eager

bool

Whether to enforce eager execution

max_model_length

int

Maximum sequence length for the model

max_tokens

int

Maximum tokens for generation

min_tokens

int

Minimum tokens for generation

gpu_memory_utilization

float

Target GPU memory utilization (0.0 to 1.0)

batch_size

int

Batch size for VLLM processing

Methods

prompt()

Generates a prompt for the LLM based on the input data.

Parameters

  • input: Input row used to construct the prompt

Returns

A prompt that can be either:

  1. A string for a single user prompt

  2. A list of dictionaries for multiple messages

Example

parse()

Processes the LLM's response and optionally can be used to combine it with the input data.

Parameters

  • input: Original input row used for the prompt

  • response: Raw response from the LLM

Returns

A parsed output combining the input and response data

Example

Returns

A CuratorResponse object which consists statistics about token usage, performance and cost along with the dataset and viewer link.

CuratorResponse

Attributes

Core Data

  • dataset (Dataset): The curated dataset

  • cache_dir (Optional[str]): Directory for caching results

  • failed_requests_path (Optional[Path]): Path to file containing failed requests

  • viewer_url (Optional[str]): URL for Curator Viewer

  • batch_mode (bool): Whether the processing was done in batch mode

Model Information

  • model_name (str): Name of the LLM model used

  • max_requests_per_minute (int | None): Rate limit for requests per minute

  • max_tokens_per_minute (int | None): Rate limit for tokens per minute

Statistics

  • token_usage (TokenUsage): Statistics about token usage

  • cost_info (CostInfo): Information about processing costs

  • request_stats (RequestStats): Statistics about request processing

  • performance_stats (PerformanceStats): Performance metrics

  • metadata (Dict[str, Any]): Additional metadata

Response Format (Optional)

The response_format class attribute can be set to a Pydantic model to enforce structured output:

Usage Examples

Basic Usage

Last updated