Self-Hosting
```shellscript
sudo docker run \
--runtime=nvidia \
--gpus=all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HUGGING_FACE_HUB_TOKEN=hf_xyz" \
--ipc=host \
-p 8000:8000 \
vllm/vllm-openai:latest \
--model bespokelabs/Bespoke-MiniCheck-7B --trust_remote_code --api-key your_api_key --disable-log-requests \
--dtype bfloat16 \
--max-model-len 32768 \
--tensor-parallel-size 1 &
```Last updated