Structured Output for Data Generation with LLMs
This example demonstrates how to use structured output with a custom LLM class to generate poems on different topics while maintaining a clean data structure:
from typing import Dict, List

from datasets import Dataset
from pydantic import BaseModel, Field

from bespokelabs import curator


# Define our structured output models
class Poem(BaseModel):
    poem: str = Field(description="A poem.")


class Poems(BaseModel):
    poems: List[Poem] = Field(description="A list of poems.")


# Create a custom LLM class with specialized prompting and parsing
class Poet(curator.LLM):
    response_format = Poems

    def prompt(self, input: Dict) -> str:
        return f"Write two poems about {input['topic']}."

    def parse(self, input: Dict, response: Poems) -> List[Dict]:
        # Return one row per poem, each tagged with its topic
        return [{"topic": input["topic"], "poem": p.poem} for p in response.poems]


# Initialize our custom LLM
poet = Poet(model_name="gpt-4o-mini")

# Create a dataset of topics
topics = Dataset.from_dict({
    "topic": [
        "Urban loneliness in a bustling city",
        "Beauty of Bespoke Labs's Curator library"
    ]
})

# Generate poems
poem = poet(topics)
print(poem.dataset.to_pandas())

# Output:
#    topic                                     poem
# 0  Urban loneliness in a bustling city       In the city's heart, where the lights never di...
# 1  Urban loneliness in a bustling city       Steps echo loudly, pavement slick with rain,\n...
# 2  Beauty of Bespoke Labs's Curator library  In the heart of Curation's realm,\nWhere art...
# 3  Beauty of Bespoke Labs's Curator library  Step within the library's embrace,\nA sanctu...
How This Works:
1. Structured Models: We define Pydantic models (`Poem` and `Poems`) that specify the expected structure of our LLM output.
2. Custom Poet Class: By inheriting from `curator.LLM`, we create a specialized class that:
   - Sets `response_format = Poems` to specify the output structure
   - Implements a `prompt()` method that formats our input into a proper prompt
   - Implements a `parse()` method that transforms the structured response into a list of dictionaries where each poem is a separate row with its associated topic
3. Processing Pipeline: When we call `poet(topics)`, our custom class:
   - Takes each topic from the dataset
   - Creates a prompt for each topic
   - Sends the prompt to the LLM
   - Parses the structured response
   - Returns a dataset where each row contains a topic and a single poem
This approach gives us clean, structured data that's ready for analysis or further processing while maintaining the relationship between inputs (topics) and outputs (poems).
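The row-level flow described above can be sketched without calling a real model. The following is a minimal offline illustration, not Curator's implementation: `fake_llm` and `run_pipeline` are hypothetical names, `fake_llm` returns canned text, and plain dataclasses stand in for the Pydantic models. It only shows how one input row fans out into one output row per poem while keeping its topic.

```python
from dataclasses import dataclass
from typing import Dict, List


# Dataclasses stand in for the Pydantic models (assumption for this sketch).
@dataclass
class Poem:
    poem: str


@dataclass
class Poems:
    poems: List[Poem]


def prompt(row: Dict) -> str:
    # Same prompt template as Poet.prompt in the example above.
    return f"Write two poems about {row['topic']}."


def fake_llm(prompt_text: str) -> Poems:
    # Hypothetical stand-in for the model call: returns two canned poems
    # so the sketch runs without network access.
    return Poems(poems=[
        Poem(poem=f"First reply to: {prompt_text}"),
        Poem(poem=f"Second reply to: {prompt_text}"),
    ])


def parse(row: Dict, response: Poems) -> List[Dict]:
    # One output row per poem, each carrying the originating topic.
    return [{"topic": row["topic"], "poem": p.poem} for p in response.poems]


def run_pipeline(rows: List[Dict]) -> List[Dict]:
    # Prompt, call the (fake) model, parse, and flatten the results.
    output: List[Dict] = []
    for row in rows:
        response = fake_llm(prompt(row))
        output.extend(parse(row, response))
    return output


rows = run_pipeline([{"topic": "rain"}, {"topic": "rivers"}])
for r in rows:
    print(r)
```

Because `parse` returns a list, two input topics produce four output rows, and every row still records which topic it came from.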
Chaining LLM calls with structured output
Using structured output along with custom prompting and parsing logic allows you to chain together multiple calls to the `LLM` class to create powerful data generation pipelines.
Let's return to our example of generating poems. Suppose we also want to use LLMs to generate the topics of the poems. This can be accomplished by using another `LLM` object to generate the topics, as shown in the example below.
from typing import Dict, List

from pydantic import BaseModel, Field

from bespokelabs import curator


class Topic(BaseModel):
    topic: str = Field(description="A topic.")


class Topics(BaseModel):
    topics: List[Topic] = Field(description="A list of topics.")


class Muse(curator.LLM):
    response_format = Topics

    def prompt(self, input: Dict) -> str:
        # The prompt does not depend on any input row
        return "Generate ten evocative poetry topics."

    def parse(self, input: Dict, response: Topics) -> List[Dict]:
        # Return one row per generated topic
        return [{"topic": topic.topic} for topic in response.topics]


class Poem(BaseModel):
    poem: str = Field(description="A poem.")


class Poems(BaseModel):
    poems: List[Poem] = Field(description="A list of poems.")


class Poet(curator.LLM):
    response_format = Poems

    def prompt(self, input: Dict) -> str:
        return f"Write two poems about {input['topic']}."

    def parse(self, input: Dict, response: Poems) -> List[Dict]:
        # Return one row per poem, each tagged with its topic
        return [{"topic": input["topic"], "poem": p.poem} for p in response.poems]


# Generate the topics first
muse = Muse(model_name="gpt-4o-mini")
topics = muse()
print(topics.dataset.to_pandas())

# Feed the generated topics to the poet
poet = Poet(model_name="gpt-4o-mini")
poem = poet(topics)
print(poem.dataset.to_pandas())

# Output:
#    topic                                 poem
# 0  The fleeting beauty of autumn leaves  In a whisper of wind, they dance and they sway...
# 1  The fleeting beauty of autumn leaves  Once vibrant with life, now a radiant fade,\nC...
# 2  The whispers of an abandoned house    In shadows deep where light won’t tread, \nAn...
# 3  The whispers of an abandoned house    Abandoned now, my heart does fade, \nOnce a h...
# 4  The warmth of a forgotten summer day  In the stillness of a memory's embrace, \nA w...
# 5  The warmth of a forgotten summer day  A gentle breeze delivers the trace \nOf a day...
# ...
Chaining multiple `LLM` calls this way allows us to build powerful synthetic data pipelines that can create millions of examples.
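To make the hand-off between the two stages concrete, here is a small offline sketch. It does not reflect how Curator executes requests: `run_stage`, `muse_stub`, and `poet_stub` are hypothetical names, and the stubs return fixed data instead of calling a model. The point is only that the rows produced by one stage's parse step become the input rows of the next stage, so row counts multiply.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]


def run_stage(stage: Callable[[Row], List[Row]],
              rows: Optional[List[Row]] = None) -> List[Row]:
    # Assumption for this sketch: a stage called without input rows runs
    # once on an empty row, mirroring the muse() call with no dataset.
    inputs = rows if rows is not None else [{}]
    output: List[Row] = []
    for row in inputs:
        output.extend(stage(row))
    return output


def muse_stub(row: Row) -> List[Row]:
    # Stand-in for Muse: ignores its input and returns fixed topics.
    return [{"topic": t}
            for t in ("autumn leaves", "an abandoned house", "summer")]


def poet_stub(row: Row) -> List[Row]:
    # Stand-in for Poet: two placeholder poems per topic.
    return [{"topic": row["topic"], "poem": f"Poem {i} about {row['topic']}"}
            for i in (1, 2)]


# Stage 1 output feeds stage 2 directly.
topics = run_stage(muse_stub)
poems = run_stage(poet_stub, topics)
print(len(topics), len(poems))
```

In this sketch, 3 topics with 2 poems each give 6 rows. The same multiplication applies to the real pipeline, which is why adding stages or raising the number of items per call grows the dataset quickly.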