Using Ollama with Curator

You can use Ollama as a backend for Curator to generate structured synthetic data. In this example, we will generate a list of countries and their capitals, but the approach can be adapted for any data generation task.

Prerequisites

  • Ollama installed and running locally.

  • The Curator Python package (bespokelabs-curator) installed.

Steps

1. Create a curator.LLM subclass

Create a class that inherits from curator.LLM. Implement two key methods:

  • prompt(): Generates the prompt for the LLM.

  • parse(): Processes the LLM's response into your desired format.

Here’s the implementation:

from bespokelabs import curator
from pydantic import BaseModel, Field

class Location(BaseModel):
    country: str = Field(description="The name of the country")
    capital: str = Field(description="The name of the capital city")

class LocationList(BaseModel):
    locations: list[Location] = Field(description="A list of locations")

class SimpleOllamaGenerator(curator.LLM):
    response_format = LocationList

    def prompt(self, input: dict) -> str:
        return "Return five countries and their capitals."

    def parse(self, input: dict, response: LocationList) -> list[dict]:
        return [{"country": location.country, "capital": location.capital} for location in response.locations]
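Because response_format is a Pydantic model, responses are parsed and validated against the schema. You can sanity-check the models on their own before wiring them into the generator; here is a minimal sketch using a hypothetical sample payload:

```python
from pydantic import BaseModel, Field

class Location(BaseModel):
    country: str = Field(description="The name of the country")
    capital: str = Field(description="The name of the capital city")

class LocationList(BaseModel):
    locations: list[Location] = Field(description="A list of locations")

# Validate a hypothetical JSON payload against the schema (Pydantic v2 API)
sample = '{"locations": [{"country": "France", "capital": "Paris"}]}'
parsed = LocationList.model_validate_json(sample)
print(parsed.locations[0].capital)  # Paris
```

If the model's response does not match the schema, validation raises a `ValidationError` instead of silently producing malformed rows.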

2. Configure the Ollama Backend

  1. Start the Ollama server with the llama3.1:8b model:

ollama pull llama3.1:8b
ollama serve
  2. Initialize your generator with the Ollama configuration:

llm = SimpleOllamaGenerator(
    model_name="ollama/llama3.1:8b",  # Ollama model identifier
    backend_params={"base_url": "http://localhost:11434"},  # Ollama instance
)

3. Generate Data

Generate the structured data and output the results as a pandas DataFrame:

locations = llm()
print(locations.to_pandas())
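Once you have the DataFrame, you can persist it with standard pandas methods. A minimal sketch, using hypothetical stand-in rows in place of the generator's output:

```python
import pandas as pd

# Hypothetical rows standing in for llm().to_pandas(); the real rows
# come from the SimpleOllamaGenerator call above
rows = [
    {"country": "France", "capital": "Paris"},
    {"country": "Japan", "capital": "Tokyo"},
]
df = pd.DataFrame(rows)

# Write one JSON object per line (JSONL), a common format for synthetic datasets
df.to_json("locations.jsonl", orient="records", lines=True)
```

Each line of locations.jsonl is then a self-contained JSON record, which makes the file easy to stream or append to.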

Example Output

Using the above example, the output might look like this:

| Country | Capital   |
| ------- | --------- |
| France  | Paris     |
| Japan   | Tokyo     |
| Germany | Berlin    |
| India   | New Delhi |
| Brazil  | Brasília  |

Ollama Configuration

Use the base_url key in backend_params to specify the URL of your Ollama instance.

Example:

backend_params={"base_url": "http://localhost:11434"}
