LLM API documentation

curator.LLM

The LLM class serves as the primary interface for prompting Large Language Models in Curator. It provides a flexible and extensible way to generate synthetic data using various LLM providers.

Class Definition

class LLM:
    def __init__(
        self,
        model_name: str,
        response_format: Type[BaseModel] | None = None,
        batch: bool = False,
        backend: Optional[str] = None,
        generation_params: dict | None = None,
        backend_params: BackendParamsType | None = None,
    )

Constructor Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model_name | str | Required | Name of the LLM to use |
| response_format | Type[BaseModel] \| None | None | Pydantic model specifying the expected response format |
| batch | bool | False | Enable batch processing mode |
| backend | Optional[str] | None | LLM backend to use ("openai", "litellm", or "vllm"). Auto-determined if None |
| generation_params | dict \| None | None | Additional parameters for the generation API |
| backend_params | BackendParamsType \| None | None | Configuration parameters for the request processor |
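
Putting these parameters together, the sketch below instantiates curator.LLM directly. The model name, temperature, and backend_params values are illustrative assumptions, not defaults.

# Example: constructing an LLM instance (values are illustrative)
from bespokelabs import curator

llm = curator.LLM(
    model_name="gpt-4o-mini",
    backend="openai",
    generation_params={"temperature": 0.7},
    backend_params={"max_retries": 3, "request_timeout": 300},
)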

Backend Parameters Configuration

The backend_params dictionary supports various configuration options based on the execution mode. Here's a comprehensive breakdown:

Common Parameters

These parameters are available across all backends:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| max_retries | int | 3 | Maximum number of retry attempts for failed requests |
| require_all_responses | bool | False | Whether to require successful responses for all prompts |
| base_url | Optional[str] | None | Optional base URL for the API endpoint |
| request_timeout | int | 600 | Timeout in seconds for each request |

# Example: Common parameters configuration
backend_params = {
    "max_retries": 3,
    "require_all_responses": True,
    "base_url": "https://custom-endpoint.com/v1",
    "request_timeout": 300
}

Online Mode Parameters

Parameters for the online (real-time) processing mode:

| Parameter | Type | Description |
| --- | --- | --- |
| max_requests_per_minute | int | Maximum number of API requests per minute |
| max_tokens_per_minute | int | Maximum number of tokens per minute |
| seconds_to_pause_on_rate_limit | float | Duration in seconds to pause when rate limited |

# Example: Online mode configuration
backend_params = {
    "max_requests_per_minute": 2000,
    "max_tokens_per_minute": 4_000_000,
    "seconds_to_pause_on_rate_limit": 15.0
}

Batch Processing Parameters

Parameters available when batch=True:

| Parameter | Type | Description |
| --- | --- | --- |
| batch_size | int | Number of prompts to process in each batch |
| batch_check_interval | float | Time interval between batch completion checks |
| delete_successful_batch_files | bool | Whether to delete successful batch files |
| delete_failed_batch_files | bool | Whether to delete failed batch files |

# Example: Batch processing configuration
backend_params = {
    "batch_size": 100,
    "batch_check_interval": 1.0,
    "delete_successful_batch_files": True,
    "delete_failed_batch_files": False,
}

Offline Mode Parameters (vLLM)

Parameters for local model deployment with vLLM:

| Parameter | Type | Description |
| --- | --- | --- |
| tensor_parallel_size | int | Number of GPUs for tensor parallelism |
| enforce_eager | bool | Whether to enforce eager execution |
| max_model_length | int | Maximum sequence length for the model |
| max_tokens | int | Maximum tokens for generation |
| min_tokens | int | Minimum tokens for generation |
| gpu_memory_utilization | float | Target GPU memory utilization (0.0 to 1.0) |
| batch_size | int | Batch size for vLLM processing |

# Example: vLLM configuration
backend_params = {
    "tensor_parallel_size": 2,
    "max_model_length": 4096,
    "max_tokens": 2048,
    "min_tokens": 1,
    "gpu_memory_utilization": 0.85,
    "batch_size": 32
}

Methods

prompt()

def prompt(self, input: _DictOrBaseModel) -> _DictOrBaseModel

Generates a prompt for the LLM based on the input data.

Parameters

  • input: Input row used to construct the prompt

Returns

A prompt that can be either:

  1. A string for a single user prompt

  2. A list of dictionaries for multiple messages

Example

def prompt(self, input: dict) -> str:
    return f"Generate a {input['type']} about {input['topic']}"
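
prompt() can also return a list of chat-style message dictionaries when more than one message is needed. A minimal sketch, assuming OpenAI-style "role"/"content" keys; the system prompt text is illustrative:

def prompt(self, input: dict) -> list:
    # Include a system message alongside the user message
    return [
        {"role": "system", "content": "You are a concise writing assistant."},
        {"role": "user", "content": f"Generate a {input['type']} about {input['topic']}"},
    ]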

parse()

def parse(self, input: _DictOrBaseModel, response: _DictOrBaseModel) -> _DictOrBaseModel

Processes the LLM's response and can optionally combine it with the input data.

Parameters

  • input: Original input row used for the prompt

  • response: Raw response from the LLM

Returns

A parsed output combining the input and response data. Returning a list of dictionaries produces multiple output rows.

Example

from datetime import datetime

def parse(self, input: dict, response: str) -> dict:
    return {
        "prompt_topic": input["topic"],
        "generated_text": response,
        "timestamp": datetime.now().isoformat()
    }

Response Format (Optional)

The response_format class attribute can be set to a Pydantic model to enforce structured output:

from typing import List

from bespokelabs import curator
from pydantic import BaseModel


class RecipeResponse(BaseModel):
    title: str
    ingredients: List[str]
    instructions: List[str]


class RecipeGenerator(curator.LLM):
    response_format = RecipeResponse
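
When response_format is set, parse() receives a validated instance of that model instead of a raw string, so its fields can be read directly. A minimal sketch using the RecipeResponse model above; the output field names are illustrative:

def parse(self, input: dict, response: RecipeResponse) -> dict:
    # response has already been validated against RecipeResponse
    return {
        "title": response.title,
        "num_ingredients": len(response.ingredients),
        "instructions": "\n".join(response.instructions),
    }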

Usage Examples

Basic Usage


from typing import List

from bespokelabs import curator
from pydantic import BaseModel, Field


class Cuisines(BaseModel):
    """A list of cuisines."""

    cuisines_list: List[str] = Field(description="A list of cuisines.")


class CuisineGenerator(curator.LLM):
    """A cuisine generator that generates diverse cuisines."""

    response_format = Cuisines

    def prompt(self, input: dict) -> str:
        """Generate a prompt for the cuisine generator."""
        return "Generate 10 diverse cuisines."

    def parse(self, input: dict, response: Cuisines) -> dict:
        """Parse the model response along with the input into the desired output format."""
        return [{"cuisine": t} for t in response.cuisines_list]
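
To run the generator, instantiate it with a model name and call it. A hedged usage sketch: the model name is an assumption, and no input dataset is passed because prompt() above ignores its input.

cuisine_generator = CuisineGenerator(model_name="gpt-4o-mini")
cuisines = cuisine_generator()  # yields one "cuisine" row per generated cuisine
print(cuisines)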
