Execute LLM-generated code
We have built a code executor that can be used to execute LLM-generated code. This is useful in many situations:
You want to include only error-free code in your training data. This approach is used in Open Thoughts.
An LLM generates code, for example to produce a visualization, and you want to run it.
Agents and tool use.
Here is a simple example of code execution in action:
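A minimal hello-world sketch follows. The `curator.CodeExecutor` import path and base-class name, the HuggingFace `datasets.Dataset` input, and the `execution_output.stdout` attribute are assumptions, so check the library reference for the exact signatures.

```python
from bespokelabs import curator  # assumed import path for the code executor base class
from datasets import Dataset


class HelloExecutor(curator.CodeExecutor):  # assumed base-class name
    def code(self, row):
        # The piece of code to run; here it reads a name from stdin and greets it.
        return 'name = input()\nprint(f"Hello {name}")'

    def code_input(self, row):
        # Optional: value(s) fed to input() calls inside the code.
        return row["name"]

    def code_output(self, row, execution_output):
        # Parse the execution result back into the row.
        row["greeting"] = execution_output.stdout  # .stdout is an assumption
        return row


executor = HelloExecutor()  # defaults to the multiprocessing backend
dataset = Dataset.from_dict({"name": ["Alice", "Bob"]})
results = executor(dataset)
print(results)
```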
Your executor subclass implements three methods:
`code`: This is the method that returns the piece of code to be run. The code is usually part of the row (you can use `curator.LLM` to generate it).
`code_input`: This is optional; it can return a JSON object whose values are passed to `input()` calls in the code.
`code_output`: This is where you parse the output of the execution.
Features:
Full caching and automatic recovery: Similar to `curator.LLM`'s caching feature, the code executor also has built-in caching and automatic recovery. Interrupted runs can be fully recovered, so no computation is lost.
Multiple code execution backends: We offer four backends: multiprocessing, Docker, Ray, and E2B. These backends determine where your code is executed, and you can switch between them with a single parameter change. For example, the hello world example can be run on the Ray backend by initializing it with `HelloExecutor(backend="ray")`.
Progress monitoring: Execution progress is displayed using a Rich console.
Backends
We offer four backends for running your code:
Multiprocessing: This is the default backend. It runs code locally and is therefore the least safe option, but it is useful for quick execution since it does not require any additional dependencies.
Docker: A safer option than multiprocessing, since code runs inside an isolated container.
Ray: If you have a Ray cluster, you can use it by setting `CodeExecutor(backend="ray")`. This is useful when your code can take a long time to run.
E2B: Code can also be run using e2b.dev. Use `CodeExecutor(backend="e2b")`.
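For illustration, the same executor class (HelloExecutor from the sketch above) can be pointed at any of the four backends by name:

```python
# Choose where the generated code runs; "multiprocessing" is the default.
local_executor = HelloExecutor(backend="multiprocessing")
docker_executor = HelloExecutor(backend="docker")
ray_executor = HelloExecutor(backend="ray")
e2b_executor = HelloExecutor(backend="e2b")
```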
Backend Setup and configuration options
Multiprocessing Backend:
This backend doesn't require any additional setup. You can configure backend params while initializing the executor, and you can also configure execution parameters, as sketched below.
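A minimal sketch, reusing the HelloExecutor class and dataset from above; the `backend_params` keyword and the keys shown are illustrative assumptions rather than a definitive list of options, so check the library reference for what your version supports.

```python
# Default backend: generated code runs locally via multiprocessing.
# NOTE: the backend_params keys below are hypothetical examples for illustration only.
executor = HelloExecutor(
    backend="multiprocessing",
    backend_params={"timeout": 60},  # hypothetical: per-snippet execution timeout in seconds
)
results = executor(dataset)
```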
Docker
With the Docker backend, code is executed in a secure, containerized environment. You need Docker installed on your machine, along with Python's docker client:
Run `pip install docker` to install the Python client.
Install Docker Desktop.
In your terminal, run `docker pull python:3.11-slim`.
Run the HelloExecutor example with `HelloExecutor(backend="docker")`.
With the Docker backend, you can also specify a custom Docker image in which to execute your code snippets:
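A sketch of passing a custom image, reusing HelloExecutor from above; the `backend_params` key name `docker_image` and the image tag are assumptions for illustration.

```python
# Execute snippets inside a custom Docker image instead of python:3.11-slim.
# NOTE: "docker_image" is an assumed key name, and the image tag is a placeholder.
executor = HelloExecutor(
    backend="docker",
    backend_params={"docker_image": "my-registry/python-with-deps:latest"},
)
results = executor(dataset)
```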
Ray
As the size of the dataset grows, it becomes harder to satisfy code execution requirements on a single machine. In such scenarios, you can use the Ray backend.
Run `pip install ray` to install the dependencies required for the Ray backend.
You need to spin up a Ray cluster separately and provide its `base_url` (see Ray's installation instructions). If no `base_url` is provided, a local Ray cluster is spun up.
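A sketch of pointing the executor at an existing cluster, reusing HelloExecutor from above; passing `base_url` via `backend_params` follows the wording above but is still an assumption, and the address is a placeholder.

```python
# Connect to an existing Ray cluster; if base_url is omitted, a local cluster is started.
# NOTE: passing base_url via backend_params is an assumption based on the text above.
executor = HelloExecutor(
    backend="ray",
    backend_params={"base_url": "ray://<head-node-address>:10001"},  # placeholder address
)
results = executor(dataset)
```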
E2B
We also provide light support for E2B's hosted code execution sandboxes (e2b.dev). While not free, they are secure environments similar to Docker containers and offer more features.
Run `pip install e2b-code-interpreter` to install the required dependencies.
Create an account on E2B's website, get an API key, and add it to your environment variables.
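A sketch, reusing HelloExecutor from above and assuming the API key is read from the conventional `E2B_API_KEY` environment variable; the exact variable name is an assumption, so check E2B's documentation.

```python
import os

# The E2B backend needs your API key; it is typically read from the environment.
# NOTE: "E2B_API_KEY" is assumed to be the variable name the backend looks for.
os.environ.setdefault("E2B_API_KEY", "<your-e2b-api-key>")

executor = HelloExecutor(backend="e2b")
results = executor(dataset)
```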
Conclusion
Check out the examples to get started with the code executor. If you have any questions, feel free to join our Discord or send us an email.