
Key Concepts

Key Components of curator.LLM

Conceptually, curator.LLM has two important methods: prompt and parse.

from typing import Dict, List

from pydantic import BaseModel, Field

from bespokelabs import curator


class Poem(BaseModel):
    poem: str = Field(description="A poem.")


class Poems(BaseModel):
    poems_list: List[Poem] = Field(description="A list of poems.")


class Poet(curator.LLM):
    response_format = Poems

    def prompt(self, input: Dict) -> str:
        return f"Write two poems about {input['topic']}."

    def parse(self, input: Dict, response: Poems) -> List[Dict]:
        return [{"topic": input["topic"], "poem": p.poem} for p in response.poems_list]

prompt

curator.LLM calls prompt on each row of the input dataset in parallel. prompt:

  1. Takes a dataset row as input.

  2. Returns the prompt string to send to the LLM.

parse

Converts the LLM output into structured data and adds it back to the dataset.

  1. Takes two arguments:

    • The input row (the one that produced the prompt).

    • The LLM's response (typed by response_format: a string or a Pydantic model).

  2. Returns new rows as a list of dictionaries.
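The parse contract can be sketched without any LLM at all. This is a minimal illustration, not curator's implementation: when no response_format is set, the response arrives as a plain string, and the one-line-per-poem convention below is a hypothetical choice for this sketch.

```python
from typing import Dict, List


def parse(input: Dict, response: str) -> List[Dict]:
    # One poem per response line (a hypothetical convention for this sketch).
    return [{"topic": input["topic"], "poem": line}
            for line in response.splitlines()]


rows = parse({"topic": "the sea"}, "Wave upon wave\nSalt in the air")
# Two new rows, each carrying the original topic alongside one poem.
```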

Data Flow Example

Input Dataset:

Row A 
Row B 

Processing by curator.LLM:

Row A → prompt(A) → Response R1 → parse(A, R1) → [C, D] 
Row B → prompt(B) → Response R2 → parse(B, R2) → [E, F]

Output Dataset:

Row C 
Row D 
Row E 
Row F

In this example:

  • The two input rows (A and B) are processed in parallel to prompt the LLM

  • Each generates a response (R1 and R2)

  • The parse function converts each response into (multiple) new rows (C, D, E, F)

  • The final dataset contains all generated rows
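The flow above can be simulated in plain Python. Here fake_response is a stand-in for the model call, and the sequential loop stands in for the parallel execution curator performs:

```python
from typing import Dict, List


def fake_response(prompt: str) -> str:
    # Stand-in for the LLM call.
    return f"first line about {prompt}\nsecond line about {prompt}"


def prompt(row: Dict) -> str:
    return row["topic"]


def parse(row: Dict, response: str) -> List[Dict]:
    # Each response line becomes a new output row.
    return [{"topic": row["topic"], "poem": line}
            for line in response.splitlines()]


input_dataset = [{"topic": "A"}, {"topic": "B"}]  # rows A and B
output_dataset = []
for row in input_dataset:  # curator runs these in parallel
    output_dataset.extend(parse(row, fake_response(prompt(row))))
# 2 input rows -> 4 output rows (C, D, E, F in the diagram)
```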

You can chain LLM objects together to iteratively build up a dataset.
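Chaining can be pictured as function composition over lists of rows. The two stage functions below are hypothetical stand-ins for curator.LLM subclasses, with the LLM calls faked:

```python
from typing import Dict, List


def poet_stage(rows: List[Dict]) -> List[Dict]:
    # Stage 1: each topic row fans out into two poem rows.
    return [{"topic": r["topic"], "poem": f"{r['topic']} poem {i}"}
            for r in rows for i in (1, 2)]


def critic_stage(rows: List[Dict]) -> List[Dict]:
    # Stage 2: each poem row gains a review column.
    return [{**r, "review": f"A review of: {r['poem']}"} for r in rows]


dataset = critic_stage(poet_stage([{"topic": "the sea"}, {"topic": "the moon"}]))
# 2 topics -> 4 poems -> 4 reviewed poems
```

Each stage consumes the previous stage's output dataset, so new columns accumulate as the pipeline grows.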
