Executing LLM-generated code
We have built a code executor that can run LLM-generated code. This is useful in many situations:
You want to include only error-free code in your training data. This method is used in .
An LLM generates code, e.g. to produce a visualization.
Agents and tool use.
Here is a simple example of code execution in action:
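As an illustration, here is a minimal, self-contained sketch of the pattern. The `CodeExecutor` base class below is a hypothetical stand-in (a simple local subprocess runner), not the real curator class; the `code`, `code_input`, and `code_output` hooks follow the description that comes next, but exact signatures may differ between releases.

```python
import subprocess
import sys


class CodeExecutor:
    """Stand-in base class: runs code() in a subprocess, feeds it
    code_input() on stdin, and passes stdout to code_output().
    The real executor comes from the curator library."""

    def code_input(self, row):
        return ""  # optional hook; default: no stdin

    def __call__(self, row):
        proc = subprocess.run(
            [sys.executable, "-c", self.code(row)],
            input=self.code_input(row),
            capture_output=True,
            text=True,
        )
        return self.code_output(row, proc.stdout)


class HelloExecutor(CodeExecutor):
    def code(self, row):
        # The snippet to execute -- usually LLM-generated, per row.
        return "name = input()\nprint(f'Hello, {name}!')"

    def code_input(self, row):
        # Value consumed by the input() call in the code above.
        return row["name"]

    def code_output(self, row, output):
        # Parse the execution output back into the row.
        row["greeting"] = output.strip()
        return row


print(HelloExecutor()({"name": "World"}))  # {'name': 'World', 'greeting': 'Hello, World!'}
```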
The inherited class implements three methods:

- `code`: Returns the piece of code to be run. This is usually part of the row (you can use `curator.LLM` to generate this code).
- `code_input`: Optional; returns a JSON object whose values are passed to `input()` calls in the code.
- `code_output`: This is where you parse the output of the execution.
Multiple code execution backends: We offer four backends: multiprocessing, docker, ray, and E2B. These backends specify different locations where your code can be executed, and you can switch between them with a single parameter change. For example, the hello world example can be run on the ray backend simply by initializing it with `HelloExecutor(backend="ray")`.
Progress monitoring using Rich Console.
We offer four backends for running your code:
Multiprocessing: This is the default backend. It runs code locally and is therefore the least safe option, but it is useful for quick execution since it requires no extra dependencies.
Docker: A safer option than multiprocessing.
Ray: If you have a ray cluster, you can use it by setting `CodeExecutor(backend="ray")`. This is useful when your code can take a long time to run.
This doesn't require any additional setup. You can configure backend params while initializing as follows:
You can also configure execution parameters:
With docker, code can be executed in a secure containerized environment. You need docker installed and python's docker client installed on your machine:
`pip install docker`
In your terminal, run `docker pull python:3.11-slim`
Run the HelloExecutor example with `HelloExecutor(backend="docker")`
With docker, you can specify a custom docker image to execute your code snippets:
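As a sketch, this might look like the following — note that the `docker_image` key is hypothetical (check the backend params accepted by your curator version):

```python
executor = HelloExecutor(
    backend="docker",
    backend_params={"docker_image": "python:3.11-slim"},  # hypothetical key name
)
```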
As the size of the dataset grows, it becomes harder to scale code execution requirements on a single machine. In such scenarios, one can use the ray backend.
Simply run `pip install ray` to install the dependencies required for the ray backend.
We also offer light support for e2b's hosted code execution backend. While not free, it provides secure environments similar to docker, with more features.
Run `pip install e2b-code-interpreter` to install the required dependencies.
Create an account on e2b's website, get the API key and add it to your environment variables.
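For example, assuming the SDK reads the key from the `E2B_API_KEY` environment variable (the placeholder below is not a real key):

```shell
# Add to your shell profile (~/.bashrc, ~/.zshrc) or a .env file.
# Replace the placeholder with the key from your e2b dashboard.
export E2B_API_KEY="<your-e2b-api-key>"
```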
Full caching and automatic recovery: Similar to , the code executor also has built-in caching and automatic recovery. Any interrupted run can be fully recovered, and no computation is lost.
E2B: Code can also be run using . Use `CodeExecutor(backend="e2b")`.
Install (recommended), or optionally just install the docker engine.
You need to separately spin up a ray cluster and enter its base_url (). If no base_url is entered, a local ray cluster is spun up.
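For instance, connecting to an existing cluster might look like the sketch below — the `base_url` key is taken from the description above, but verify the exact backend params schema for your curator version, and substitute your own cluster address:

```python
executor = HelloExecutor(
    backend="ray",
    backend_params={"base_url": "ray://<head-node-ip>:10001"},  # your cluster address
)
```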
Check out the to get started with the code executor. If you have any questions, feel free to join our or .