LLM API documentation

curator.LLM

The LLM class serves as the primary interface for prompting Large Language Models in Curator. It provides a flexible and extensible way to generate synthetic data using various LLM providers.

Class Definition

class LLM:
    def __init__(
        self,
        model_name: str,
        response_format: Type[BaseModel] | None = None,
        batch: bool = False,
        backend: Optional[str] = None,
        generation_params: dict | None = None,
        backend_params: BackendParamsType | None = None,
    )

Constructor Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model_name | str | Required | Name of the LLM to use |
| response_format | Type[BaseModel] \| None | None | Pydantic model specifying the expected response format |
| batch | bool | False | Enable batch processing mode |
| backend | Optional[str] | None | LLM backend to use ("openai", "litellm", or "vllm"). Auto-determined if None |
| generation_params | dict \| None | None | Additional parameters for the generation API |
| backend_params | BackendParamsType \| None | None | Configuration parameters for the request processor |
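
Putting these parameters together, the sketch below instantiates curator.LLM directly. The model name, temperature, and backend_params values are illustrative assumptions, not defaults.

# Example: constructing an LLM instance (values are illustrative)
from bespokelabs import curator

llm = curator.LLM(
    model_name="gpt-4o-mini",
    backend="openai",
    generation_params={"temperature": 0.7},
    backend_params={"max_retries": 3, "request_timeout": 300},
)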

Backend Parameters Configuration

The backend_params dictionary supports various configuration options based on the execution mode. Here's a comprehensive breakdown:

Common Parameters

These parameters are available across all backends:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| max_retries | int | 3 | Maximum number of retry attempts for failed requests |
| require_all_responses | bool | False | Whether to require successful responses for all prompts |
| base_url | Optional[str] | None | Optional base URL for the API endpoint |
| request_timeout | int | 600 | Timeout in seconds for each request |

# Example: Common parameters configuration
backend_params = {
    "max_retries": 3,
    "require_all_responses": True,
    "base_url": "https://custom-endpoint.com/v1",
    "request_timeout": 300
}

Online Mode Parameters

Parameters for the online (real-time) processing mode:

| Parameter | Type | Description |
| --- | --- | --- |
| max_requests_per_minute | int | Maximum number of API requests per minute |
| max_tokens_per_minute | int | Maximum number of tokens per minute |
| seconds_to_pause_on_rate_limit | float | Duration in seconds to pause when rate limited |

# Example: Online mode configuration
backend_params = {
    "max_requests_per_minute": 2000,
    "max_tokens_per_minute": 4_000_000,
    "seconds_to_pause_on_rate_limit": 15.0
}

Batch Processing Parameters

Parameters available when batch=True:

| Parameter | Type | Description |
| --- | --- | --- |
| batch_size | int | Number of prompts to process in each batch |
| batch_check_interval | float | Time interval between batch completion checks |
| delete_successful_batch_files | bool | Whether to delete successful batch files |
| delete_failed_batch_files | bool | Whether to delete failed batch files |

# Example: Batch processing configuration
backend_params = {
    "batch_size": 100,
    "batch_check_interval": 1.0,
    "delete_successful_batch_files": True,
    "delete_failed_batch_files": False,
}

Offline Mode Parameters (vLLM)

Parameters for local model deployment with vLLM:

| Parameter | Type | Description |
| --- | --- | --- |
| tensor_parallel_size | int | Number of GPUs for tensor parallelism |
| enforce_eager | bool | Whether to enforce eager execution |
| max_model_length | int | Maximum sequence length for the model |
| max_tokens | int | Maximum tokens for generation |
| min_tokens | int | Minimum tokens for generation |
| gpu_memory_utilization | float | Target GPU memory utilization (0.0 to 1.0) |
| batch_size | int | Batch size for vLLM processing |

# Example: vLLM configuration
backend_params = {
    "tensor_parallel_size": 2,
    "max_model_length": 4096,
    "max_tokens": 2048,
    "min_tokens": 1,
    "gpu_memory_utilization": 0.85,
    "batch_size": 32
}

Methods

prompt()

def prompt(self, input: _DictOrBaseModel) -> _DictOrBaseModel

Generates a prompt for the LLM based on the input data.

Parameters

  • input: Input row used to construct the prompt

Returns

A prompt that can be either:

  1. A string for a single user prompt

  2. A list of dictionaries for multiple messages

Example

def prompt(self, input: dict) -> str:
    return f"Generate a {input['type']} about {input['topic']}"
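
prompt() can also return a list of chat-style message dictionaries when more than one message is needed. A minimal sketch, assuming OpenAI-style "role"/"content" keys; the system prompt text is illustrative:

def prompt(self, input: dict) -> list:
    # Include a system message alongside the user message
    return [
        {"role": "system", "content": "You are a concise writing assistant."},
        {"role": "user", "content": f"Generate a {input['type']} about {input['topic']}"},
    ]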

parse()

def parse(self, input: _DictOrBaseModel, response: _DictOrBaseModel) -> _DictOrBaseModel

Processes the LLM's response and can optionally combine it with the input data.

Parameters

  • input: Original input row used for the prompt

  • response: Raw response from the LLM

Returns

A parsed output combining the input and response data. Returning a list of dictionaries produces multiple output rows.

Example

from datetime import datetime

def parse(self, input: dict, response: str) -> dict:
    return {
        "prompt_topic": input["topic"],
        "generated_text": response,
        "timestamp": datetime.now().isoformat()
    }

Response Format (Optional)

The response_format class attribute can be set to a Pydantic model to enforce structured output:

from typing import List

from bespokelabs import curator
from pydantic import BaseModel


class RecipeResponse(BaseModel):
    title: str
    ingredients: List[str]
    instructions: List[str]


class RecipeGenerator(curator.LLM):
    response_format = RecipeResponse
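
When response_format is set, parse() receives a validated instance of that model instead of a raw string, so its fields can be read directly. A minimal sketch using the RecipeResponse model above; the output field names are illustrative:

def parse(self, input: dict, response: RecipeResponse) -> dict:
    # response has already been validated against RecipeResponse
    return {
        "title": response.title,
        "num_ingredients": len(response.ingredients),
        "instructions": "\n".join(response.instructions),
    }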

Usage Examples

Basic Usage


from typing import List

from bespokelabs import curator
from pydantic import BaseModel, Field


class Cuisines(BaseModel):
    """A list of cuisines."""

    cuisines_list: List[str] = Field(description="A list of cuisines.")


class CuisineGenerator(curator.LLM):
    """A cuisine generator that generates diverse cuisines."""

    response_format = Cuisines

    def prompt(self, input: dict) -> str:
        """Generate a prompt for the cuisine generator."""
        return "Generate 10 diverse cuisines."

    def parse(self, input: dict, response: Cuisines) -> dict:
        """Parse the model response along with the input into the desired output format."""
        return [{"cuisine": t} for t in response.cuisines_list]
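
To run the generator, instantiate it with a model name and call it. A hedged usage sketch: the model name is an assumption, and no input dataset is passed because prompt() above ignores its input.

cuisine_generator = CuisineGenerator(model_name="gpt-4o-mini")
cuisines = cuisine_generator()  # yields one "cuisine" row per generated cuisine
print(cuisines)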
