# Using Ollama with Curator
You can use Ollama as a backend for Curator to generate structured synthetic data. In this example, we will generate a list of countries and their capitals, but the approach can be adapted for any data generation task.
## Prerequisites

- Python 3.10+
- Curator: install via `pip install bespokelabs-curator`
- Ollama: download from https://ollama.com/download
## Steps

### 1. Create a `curator.LLM` subclass

Create a class that inherits from `curator.LLM` and implement two key methods:

- `prompt()`: generates the prompt sent to the LLM.
- `parse()`: processes the LLM's response into your desired format.

Here's the implementation:
```python
from bespokelabs import curator
from pydantic import BaseModel, Field


class Location(BaseModel):
    country: str = Field(description="The name of the country")
    capital: str = Field(description="The name of the capital city")


class LocationList(BaseModel):
    locations: list[Location] = Field(description="A list of locations")


class SimpleOllamaGenerator(curator.LLM):
    response_format = LocationList

    def prompt(self, input: dict) -> str:
        return "Return five countries and their capitals."

    def parse(self, input: dict, response: LocationList) -> list[dict]:
        return [{"country": output.country, "capital": output.capital} for output in response.locations]
```
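The `parse()` step simply flattens the validated response object into row dictionaries, one per location. A minimal stand-alone sketch of that transformation, using dataclasses in place of the pydantic models so it can run without an LLM:

```python
from dataclasses import dataclass


# Stand-ins for the pydantic models, so the flattening
# logic can be demonstrated without calling a model.
@dataclass
class Location:
    country: str
    capital: str


@dataclass
class LocationList:
    locations: list


def parse(response: LocationList) -> list[dict]:
    # One output row per location, mirroring the parse() method above.
    return [{"country": o.country, "capital": o.capital} for o in response.locations]


rows = parse(LocationList(locations=[Location("France", "Paris"), Location("Japan", "Tokyo")]))
print(rows)  # [{'country': 'France', 'capital': 'Paris'}, {'country': 'Japan', 'capital': 'Tokyo'}]
```

Each row dictionary becomes one record in the resulting dataset.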
### 2. Configure the Ollama Backend

Pull the `llama3.1:8b` model and start the Ollama server:

```shell
ollama pull llama3.1:8b
ollama serve
```
Initialize your generator with the Ollama configuration:

```python
llm = SimpleOllamaGenerator(
    model_name="ollama/llama3.1:8b",  # Ollama model identifier
    backend_params={"base_url": "http://localhost:11434"},  # Ollama instance
)
```
### 3. Generate Data

Generate the structured data and output the results as a pandas DataFrame:

```python
locations = llm()
print(locations.dataset.to_pandas())
```
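After generation you will typically want to persist the rows. A minimal sketch that writes row dictionaries (the shape produced by `parse()`) to a JSON Lines file using only the standard library; the `locations.jsonl` filename and the hard-coded rows are illustrative:

```python
import json
from pathlib import Path

# Example rows in the shape produced by parse(); in practice these
# would come from the generated dataset.
rows = [
    {"country": "France", "capital": "Paris"},
    {"country": "Japan", "capital": "Tokyo"},
]

# Write one JSON object per line (JSON Lines), a common format
# for synthetic-data pipelines.
out = Path("locations.jsonl")
with out.open("w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")

print(out.read_text(encoding="utf-8"))
```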
## Example Output

Using the above example, the output might look like this:

| country | capital |
| --- | --- |
| France | Paris |
| Japan | Tokyo |
| Germany | Berlin |
| India | New Delhi |
| Brazil | Brasília |
## Ollama Configuration

Use `base_url` in `backend_params` to specify the connection URL:

```python
backend_params={"base_url": "http://localhost:11434"}
```
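If Ollama runs on a different machine, `base_url` can point at that host instead of `localhost`, provided the server there listens on an external interface. The IP address below is illustrative:

```python
llm = SimpleOllamaGenerator(
    model_name="ollama/llama3.1:8b",
    # Replace with the address of your remote Ollama instance.
    backend_params={"base_url": "http://192.168.1.50:11434"},
)
```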