Self-Hosting

You can access the model here: https://huggingface.co/bespokelabs/Bespoke-MiniCheck-7B

Feel free to use this Colab notebook; it uses the MiniCheck library, which supports automated chunking of long documents.
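For reference, scoring a claim against a document with the MiniCheck library typically looks like the sketch below. The argument names (`model_name`, `cache_dir`, `docs`, `claims`) follow the library's published examples and may differ across versions, and the document/claim strings are purely illustrative.

```python
# A minimal sketch of claim verification with the MiniCheck library.
# Assumes `minicheck` is installed; argument names follow the library's
# published examples and may vary by version.
from minicheck.minicheck import MiniCheck

doc = ("A group of students gather in the school library "
       "to study for their upcoming final exams.")
claim = "The students are preparing for an examination."

# 'Bespoke-MiniCheck-7B' selects the model hosted on Hugging Face;
# cache_dir controls where the checkpoint is downloaded.
scorer = MiniCheck(model_name="Bespoke-MiniCheck-7B", cache_dir="./ckpts")

# score() takes parallel lists of documents and claims.
# pred_label is 1 if the claim is supported by the document, 0 otherwise;
# raw_prob is the model's support probability.
pred_label, raw_prob, _, _ = scorer.score(docs=[doc], claims=[claim])
print(pred_label, raw_prob)
```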

Alternatively, you can host the model directly with vLLM using Docker:

```shellscript
# Serve Bespoke-MiniCheck-7B via the vLLM OpenAI-compatible server
sudo docker run \
  --runtime=nvidia \
  --gpus=all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HUGGING_FACE_HUB_TOKEN=hf_xyz" \
  --ipc=host \
  -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model bespokelabs/Bespoke-MiniCheck-7B \
  --trust-remote-code \
  --api-key your_api_key \
  --disable-log-requests \
  --dtype bfloat16 \
  --max-model-len 32768 \
  --tensor-parallel-size 1 &
```
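Once the server is up, you can query it through vLLM's OpenAI-compatible endpoint. The sketch below assumes the defaults from the command above (port 8000, API key `your_api_key`); the `Document:`/`Claim:` prompt format and the Yes/No answer follow the model card's described usage.

```python
# A minimal sketch of querying the vLLM server started above through
# its OpenAI-compatible API. Adjust host, port, and API key to match
# your deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="your_api_key")

document = ("A group of students gather in the school library "
            "to study for their upcoming final exams.")
claim = "The students are preparing for an examination."

response = client.chat.completions.create(
    model="bespokelabs/Bespoke-MiniCheck-7B",
    messages=[
        # The model expects the document and claim in a single user turn.
        {"role": "user", "content": f"Document: {document}\nClaim: {claim}"}
    ],
    temperature=0.0,  # deterministic Yes/No judgment
    max_tokens=1,
)

# The model answers "Yes" if the claim is supported by the document,
# "No" otherwise.
print(response.choices[0].message.content)
```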

Please contact us for commercial licensing.
