Key Components of curator.LLM
Conceptually, curator.LLM has two important methods, prompt and parse.
prompt
Called on each row of the input dataset to build that row's prompt; curator.LLM then sends the prompts to the LLM in parallel.
Takes a dataset row as input
Returns the prompt for the LLM.
parse
Converts LLM output into structured data by adding it back to the dataset.
Takes two arguments:
Input row (this was given to the LLM).
LLM's response (in response_format: either a string or a Pydantic model).
Returns new rows (as a list of dictionaries).
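The two methods described above can be sketched in plain Python. This is a minimal illustration of their shapes only, not Curator's implementation: TopicExpander is a hypothetical stand-in class (in real use the methods would presumably live on a curator.LLM subclass), the topic and subtopic column names are invented, and the LLM reply is hand-written so the snippet runs without the library or an API key.

```python
class TopicExpander:
    """Hypothetical stand-in with the same two methods; not a real curator.LLM."""

    def prompt(self, row: dict) -> str:
        # Takes one dataset row and returns the prompt text for the LLM.
        return f"List two subtopics of {row['topic']}, one per line."

    def parse(self, row: dict, response: str) -> list[dict]:
        # Takes the input row plus the LLM's response (a plain string here)
        # and returns new rows as a list of dictionaries.
        return [
            {"topic": row["topic"], "subtopic": line.strip()}
            for line in response.splitlines()
            if line.strip()
        ]


expander = TopicExpander()
row = {"topic": "databases"}             # hypothetical input row
fake_response = "indexing\nreplication"  # hand-written stand-in for an LLM reply

print(expander.prompt(row))
print(expander.parse(row, fake_response))
```

Note that parse receives the original row as well as the response, which is what lets it carry input columns (here, topic) into the new rows.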
Data Flow Example
[Diagram: Input Dataset, Processing by curator.LLM, Output Dataset]
In this example:
The two input rows (A and B) are processed in parallel to prompt the LLM
Each generates a response (R1 and R2)
The parse function converts each response into (multiple) new rows (C, D, E, F)
The final dataset contains all generated rows
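The flow in this example can be mimicked with a toy pipeline. The sketch below is an assumption-laden stand-in rather than how Curator works internally: fake_llm is a hypothetical function returning canned text, the name, source, and variant fields are invented, and it assumes each response yields exactly two new rows. It uses a standard-library thread pool to process the two input rows in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def prompt(row: dict) -> str:
    # Build the prompt for one input row.
    return f"Give two variants of {row['name']}."


def fake_llm(prompt_text: str) -> str:
    # Hypothetical stand-in for a model call; returns a canned two-line reply.
    name = prompt_text.split()[-1].rstrip(".")
    return f"{name}-1\n{name}-2"


def parse(row: dict, response: str) -> list[dict]:
    # Turn one response into several new rows.
    return [{"source": row["name"], "variant": v} for v in response.splitlines()]


def process(row: dict) -> list[dict]:
    # prompt -> model -> parse for a single row.
    return parse(row, fake_llm(prompt(row)))


input_rows = [{"name": "A"}, {"name": "B"}]

with ThreadPoolExecutor() as pool:
    per_row = list(pool.map(process, input_rows))  # map preserves input order

# Flatten the per-row lists into the final dataset.
output_rows = [new_row for rows in per_row for new_row in rows]
print(output_rows)
```

Two input rows produce four output rows, mirroring how A and B become C, D, E, and F above.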
You can chain LLM objects together to iteratively build up a dataset.
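Chaining amounts to feeding one stage's output rows in as the next stage's input rows. The sketch below illustrates that idea with a hypothetical run_stage helper and canned model functions. None of these names come from Curator, and the loop is sequential for brevity, so it leaves out the parallelism described earlier.

```python
def run_stage(rows, prompt, parse, model):
    # Hypothetical helper: prompt each row, call the model, flatten parsed rows.
    out = []
    for row in rows:
        out.extend(parse(row, model(prompt(row))))
    return out


# Stage 1: each topic row becomes several question rows.
def question_prompt(row):
    return f"Write two questions about {row['topic']}."

def question_parse(row, response):
    return [{"topic": row["topic"], "question": q} for q in response.splitlines()]

def fake_question_model(prompt_text):
    return "What is it?\nWhy does it matter?"  # canned reply


# Stage 2: each question row gains an answer column.
def answer_prompt(row):
    return row["question"]

def answer_parse(row, response):
    return [{**row, "answer": response}]

def fake_answer_model(prompt_text):
    return f"(answer to: {prompt_text})"  # canned reply


topics = [{"topic": "caching"}]
questions = run_stage(topics, question_prompt, question_parse, fake_question_model)
qa_rows = run_stage(questions, answer_prompt, answer_parse, fake_answer_model)
print(qa_rows)
```

Because every stage consumes and produces a list of dictionaries, stages compose without any glue code between them.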