Key Concepts

Key Components of curator.LLM

Conceptually, curator.LLM has two important methods: prompt and parse. The example below defines both on a Poet class.

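from typing import Dict, List

from bespokelabs import curator
from pydantic import BaseModel, Field


class Poem(BaseModel):
    poem: str = Field(description="A poem.")


class Poems(BaseModel):
    poems_list: List[Poem] = Field(description="A list of poems.")


class Poet(curator.LLM):
    # Ask the LLM for structured output matching the Poems schema.
    response_format = Poems

    def prompt(self, input: Dict) -> str:
        # Build one prompt from one dataset row.
        return f"Write two poems about {input['topic']}."

    def parse(self, input: Dict, response: Poems) -> List[Dict]:
        # Emit one output row per poem in the structured response.
        return [{"topic": input["topic"], "poem": p.poem} for p in response.poems_list]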

prompt

Builds the prompt for one row of the input dataset. The LLM is then called on each row of the dataset in parallel.

- Takes a dataset row as input.
- Returns the prompt for the LLM.
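
As a minimal sketch of the row-to-prompt step, the snippet below applies a standalone function with the same body as Poet.prompt to two rows. The name build_prompt and the sample topics are assumptions made for illustration; no LLM is called and nothing here is curator API.

```python
from typing import Dict, List


def build_prompt(row: Dict) -> str:
    """Hypothetical stand-in for Poet.prompt: one dataset row in, one prompt string out."""
    return f"Write two poems about {row['topic']}."


# Sample rows invented for this sketch.
rows: List[Dict] = [{"topic": "autumn"}, {"topic": "the sea"}]

# One prompt is produced per input row.
prompts = [build_prompt(row) for row in rows]

for text in prompts:
    print(text)
```

Because the prompt depends only on its own row, rows can be processed independently of one another, which is what makes parallel calls possible.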

parse

Converts the LLM output into structured data and adds it back to the dataset.

Takes two arguments:
- Input row (the row that was given to the LLM).
- LLM's response (in response_format: a string or a Pydantic model).

Returns new rows as a list of dictionaries.
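
The fan-out from one response to several rows can be sketched without calling an LLM. In the snippet below, the Poem and Poems dataclasses are simplified stand-ins for the Pydantic models (an assumption, to keep the sketch dependency-free), and the response content is made up.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Poem:
    # Stand-in for the Pydantic Poem model.
    poem: str


@dataclass
class Poems:
    # Stand-in for the Pydantic Poems model.
    poems_list: List[Poem]


def parse(row: Dict, response: Poems) -> List[Dict]:
    """Same logic as Poet.parse: one output row per poem, each carrying the input topic."""
    return [{"topic": row["topic"], "poem": p.poem} for p in response.poems_list]


# A hand-written response, standing in for what the LLM would return.
row = {"topic": "autumn"}
response = Poems(poems_list=[Poem("Leaves fall."), Poem("Cold wind.")])

new_rows = parse(row, response)
print(new_rows)
```

One input row and one response yield two output rows here, since the response holds two poems.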

Returns

A `CuratorResponse` instance: a Python object that holds information about the run. Its attributes include the dataset, statistics (performance, token usage, cost), and a viewer link.

Data Flow Example

Input Dataset:

Row A 
Row B

Processing by curator.LLM:

Row A → prompt(A) → Response R1 → parse(A, R1) → [C, D] 
Row B → prompt(B) → Response R2 → parse(B, R2) → [E, F]

Output Dataset:

Row C 
Row D 
Row E 
Row F

In this example:

- The two input rows (A and B) are processed in parallel to prompt the LLM.
- Each generates a response (R1 and R2).
- The parse function converts each response into (multiple) new rows (C, D, E, F).
- The final dataset contains all generated rows.
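
The flow above can be simulated in plain Python. This is only a sketch of the idea, not how curator is implemented: fake_llm is a hypothetical stand-in that returns canned text, and a thread pool stands in for the parallel LLM calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def prompt(row: Dict) -> str:
    return f"Write two poems about {row['topic']}."


def fake_llm(prompt_text: str) -> List[str]:
    # Hypothetical stand-in for a model call: always returns two canned replies.
    return [f"first reply to: {prompt_text}", f"second reply to: {prompt_text}"]


def parse(row: Dict, response: List[str]) -> List[Dict]:
    # Each response becomes multiple new rows.
    return [{"topic": row["topic"], "poem": text} for text in response]


def process_row(row: Dict) -> List[Dict]:
    # Row -> prompt(row) -> response -> parse(row, response)
    return parse(row, fake_llm(prompt(row)))


input_rows = [{"topic": "A"}, {"topic": "B"}]

# Rows are handled concurrently; pool.map returns results in input order.
with ThreadPoolExecutor() as pool:
    per_row_results = list(pool.map(process_row, input_rows))

# Flatten the per-row lists into the final dataset (rows C, D, E, F).
output_rows = [new_row for rows in per_row_results for new_row in rows]
print(len(output_rows))
```

The ordering of output rows in this sketch comes from pool.map; it says nothing about how curator orders its output.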

You can chain LLM objects together to iteratively build up a dataset.
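
Chaining works because the output rows of one stage have the same shape as the input rows of the next. The sketch below illustrates this with plain functions; run_stage, fake_llm, and the theme/topic fields are all hypothetical names for illustration and are not part of curator.

```python
from typing import Callable, Dict, List


def run_stage(
    rows: List[Dict],
    prompt: Callable[[Dict], str],
    parse: Callable[[Dict, str], List[Dict]],
    llm: Callable[[str], str],
) -> List[Dict]:
    """Apply prompt -> llm -> parse to every row and collect all new rows."""
    out: List[Dict] = []
    for row in rows:
        out.extend(parse(row, llm(prompt(row))))
    return out


def fake_llm(prompt_text: str) -> str:
    # Hypothetical stand-in for a model call.
    return f"<reply to: {prompt_text}>"


# Stage 1: turn each theme into a topic row.
def topic_prompt(row: Dict) -> str:
    return f"Name a subtopic of {row['theme']}."


def topic_parse(row: Dict, response: str) -> List[Dict]:
    return [{"theme": row["theme"], "topic": response}]


# Stage 2: write a poem for each topic row, keeping the earlier columns.
def poem_prompt(row: Dict) -> str:
    return f"Write a poem about {row['topic']}."


def poem_parse(row: Dict, response: str) -> List[Dict]:
    return [{**row, "poem": response}]


themes = [{"theme": "nature"}]
topics = run_stage(themes, topic_prompt, topic_parse, fake_llm)
poems = run_stage(topics, poem_prompt, poem_parse, fake_llm)
print(poems)
```

Each stage adds a column, so the dataset is built up iteratively as it passes through the chain.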


Last updated 1 month ago