The `LLM` class provides a flexible interface to generate data with LLMs. Below is a minimal example of using LLM: we simply create an `LLM` object with a `model_name`, in this case `gpt-4o-mini`, and pass in a prompt.
```python
from bespokelabs import curator

llm = curator.LLM(model_name="gpt-4o-mini")
poem = llm("Write a poem about the importance of data in AI.")
print(poem.to_pandas())
# Output:
#                                             response
# 0  In the realm where silence once held sway, \n...
```

Or you can pass a list of prompts to generate multiple responses.

```python
poems = llm([
    "Write a poem about the importance of data in AI.",
    "Write a haiku about the importance of data in AI.",
])
print(poems.to_pandas())
# Output:
#                                             response
# 0  In the realm where silence once held sway, \n...
# 1  Silent streams of truth, \nData shapes the le...
```
Using Different Models
You can also use models from other providers by simply changing `model_name` (supported via LiteLLM):
```python
from bespokelabs import curator

llm = curator.LLM(model_name="claude-3-5-haiku-20241022")
poem = llm("Write a poem about the importance of data in AI.")
print(poem.to_pandas())
```
Using Structured Output
Adding structured output to your generation
Let's look at some more interesting examples of data generation using structured output.
Suppose you want to generate multiple poems from a single LLM call. Structured output is your friend! Using structured output allows you to easily validate and parse LLM responses:
```python
from typing import List

from pydantic import BaseModel, Field

from bespokelabs import curator


class Poem(BaseModel):
    poem: str = Field(description="A poem.")


class Poems(BaseModel):
    poems_list: List[Poem] = Field(description="A list of poems.")


llm = curator.LLM(model_name="gpt-4o-mini", response_format=Poems)
poems = llm([
    "Write two poems about the importance of data in AI.",
    "Write three haikus about the importance of data in AI.",
])
print(poems.to_pandas())
# Output:
#                                           poems_list
# 0  [{'poem': 'In shadows deep where silence lies,...
# 1  [{'poem': 'Data whispers truth, # Patterns wea...
```
Note how each row in the dataset is now a `Poems` object that is easy to parse and manipulate using Python code.
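For instance, the nested `poems_list` column can be flattened with plain pandas. The sketch below is a hypothetical example using hand-written rows that mirror the output shown above, rather than a live LLM call:

```python
import pandas as pd

# Hand-written rows mimicking the `poems_list` column produced by the
# Poems response_format above (each cell is a list of {"poem": ...} dicts).
df = pd.DataFrame({
    "poems_list": [
        [{"poem": "In shadows deep where silence lies..."},
         {"poem": "Data whispers truth..."}],
        [{"poem": "Patterns weave the dawn..."}],
    ]
})

# Explode so each poem occupies its own row, then pull out the text.
flat = df.explode("poems_list", ignore_index=True)
flat["poem"] = flat["poems_list"].map(lambda d: d["poem"])
print(flat[["poem"]])
```

If you find yourself doing this after every call, the custom parsing logic described next lets the library do it for you.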
Defining your custom prompting and parsing logic
Sometimes, it might not be enough to simply get back the responses. For example, you might want to preserve the mapping between each topic and its corresponding poems, and you might want each poem to occupy only a single row. In this case, you can define a Poet object that inherits from LLM, and define your custom prompting and parsing logic:
```python
from typing import Dict, List

from datasets import Dataset
from pydantic import BaseModel, Field

from bespokelabs import curator


class Poem(BaseModel):
    poem: str = Field(description="A poem.")


class Poems(BaseModel):
    poems: List[Poem] = Field(description="A list of poems.")


class Poet(curator.LLM):
    response_format = Poems

    def prompt(self, input: Dict) -> str:
        return f"Write two poems about {input['topic']}."

    def parse(self, input: Dict, response: Poems) -> Dict:
        return [{"topic": input["topic"], "poem": p.poem} for p in response.poems]


poet = Poet(model_name="gpt-4o-mini")
topics = Dataset.from_dict({
    "topic": [
        "Urban loneliness in a bustling city",
        "Beauty of Bespoke Labs's Curator library",
    ]
})
poem = poet(topics)
print(poem.to_pandas())
# Output:
#                                       topic                                               poem
# 0       Urban loneliness in a bustling city  In the city’s heart, where the lights never di...
# 1       Urban loneliness in a bustling city  Steps echo loudly, pavement slick with rain,\n...
# 2  Beauty of Bespoke Labs's Curator library  In the heart of Curation’s realm, \nWhere art...
# 3  Beauty of Bespoke Labs's Curator library  Step within the library’s embrace, \nA sanctu...
```
In the `Poet` class:

- `response_format` is the structured output class we defined above.
- `prompt` takes the input (`input`) and returns the prompt for the LLM.
- `parse` takes the input (`input`) and the structured output (`response`) and converts it to a list of dictionaries. This is so that we can easily convert the output to a HuggingFace Dataset object.
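Because `parse` returns a list, one input row can fan out into several output rows. The toy sketch below illustrates that fan-out with plain functions and a hand-written fake response; it is not curator's internals, just the shape of the contract:

```python
from typing import Dict, List

# Stand-ins for the Poet methods above, with the LLM call faked out.
def prompt(row: Dict) -> str:
    return f"Write two poems about {row['topic']}."

def parse(row: Dict, poems: List[str]) -> List[Dict]:
    # Returning a list of dicts means each poem becomes its own dataset row,
    # with the originating topic preserved alongside it.
    return [{"topic": row["topic"], "poem": p} for p in poems]

fake_response = ["First poem text...", "Second poem text..."]
rows = parse({"topic": "autumn"}, fake_response)
print(rows)
```

One input topic becomes two output rows, which is why the example dataset above has four rows from two topics.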
Chaining LLM calls with structured output
Using structured output along with custom prompting and parsing logic allows you to chain together multiple calls to the LLM class to create powerful data generation pipelines.
Let's return to our example of generating poems. Suppose we want to also use LLMs to generate the topics of the poems. This can be accomplished by using another LLM object to generate the topics, as shown in the example below.
```python
from typing import Dict, List

from pydantic import BaseModel, Field

from bespokelabs import curator


class Topic(BaseModel):
    topic: str = Field(description="A topic.")


class Topics(BaseModel):
    topics: List[Topic] = Field(description="A list of topics.")


class Muse(curator.LLM):
    response_format = Topics

    def prompt(self, input: Dict) -> str:
        return "Generate ten evocative poetry topics."

    def parse(self, input: Dict, response: Topics) -> Dict:
        return [{"topic": topic.topic} for topic in response.topics]


class Poem(BaseModel):
    poem: str = Field(description="A poem.")


class Poems(BaseModel):
    poems: List[Poem] = Field(description="A list of poems.")


class Poet(curator.LLM):
    response_format = Poems

    def prompt(self, input: Dict) -> str:
        return f"Write two poems about {input['topic']}."

    def parse(self, input: Dict, response: Poems) -> Dict:
        return [{"topic": input["topic"], "poem": p.poem} for p in response.poems]


muse = Muse(model_name="gpt-4o-mini")
topics = muse()
print(topics.to_pandas())

poet = Poet(model_name="gpt-4o-mini")
poem = poet(topics)
print(poem.to_pandas())
# Output:
#                                     topic                                               poem
# 0  The fleeting beauty of autumn leaves  In a whisper of wind, they dance and they sway...
# 1  The fleeting beauty of autumn leaves  Once vibrant with life, now a radiant fade,\nC...
# 2    The whispers of an abandoned house  In shadows deep where light won’t tread, \nAn...
# 3    The whispers of an abandoned house  Abandoned now, my heart does fade, \nOnce a h...
# 4  The warmth of a forgotten summer day  In the stillness of a memory's embrace, \nA w...
# 5  The warmth of a forgotten summer day  A gentle breeze delivers the trace \nOf a day...
# ...
```
Chaining multiple `LLM` calls this way allows us to build powerful synthetic data pipelines that can create millions of examples.
What's next?
For an in-depth tutorial of the core features in our library, please continue to Tutorials.
For how-to guides on specific topics and workflows, please continue to How-to Guides.