CI/CD Integration

Integrate LLMOps Eval into your CI/CD pipeline to run evaluations automatically on every pull request or deployment.

Authentication

LLMOps Eval uses JWT authentication. Obtain a token by calling the login endpoint:

TOKEN=$(curl -s -X POST https://your-llmops-instance/api/auth/login \
  -H "Content-Type: application/json" \
  -d "{\"email\":\"ci@example.com\",\"password\":\"$LLMOPS_CI_PASSWORD\"}" \
  | jq -r '.accessToken')

Tip: Create a dedicated CI service account with limited permissions (Member role) to scope access.
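
Because jq -r prints the literal string "null" when a field is absent, a failed login can leave TOKEN set to a value that looks non-empty. The following is a minimal sketch of a guard for that case; the function name require_token is hypothetical and not part of LLMOps Eval.

```shell
# Hypothetical helper: succeed only when the argument looks like a real token.
# jq -r prints "null" when .accessToken is missing, so checking for an empty
# string alone would not catch a failed login.
require_token() {
  local token="$1"
  if [ -z "$token" ] || [ "$token" = "null" ]; then
    echo "login failed: no accessToken in response" >&2
    return 1
  fi
  return 0
}

# Usage after the login call:
#   require_token "$TOKEN" || exit 1
```

Failing at this point keeps later steps from sending "Bearer null" and reporting a confusing authorization error.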


GitHub Actions

name: LLM Evaluation

on:
  pull_request:
    branches: [main]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Authenticate with LLMOps Eval
        id: auth
        run: |
          TOKEN=$(curl -s -X POST ${{ secrets.LLMOPS_API_URL }}/api/auth/login \
            -H "Content-Type: application/json" \
            -d '{"email":"${{ secrets.LLMOPS_CI_EMAIL }}","password":"${{ secrets.LLMOPS_CI_PASSWORD }}"}' \
            | jq -r '.accessToken')
          # Mask the token so it is redacted from the job logs.
          echo "::add-mask::$TOKEN"
          echo "token=$TOKEN" >> "$GITHUB_OUTPUT"

      - name: Trigger LLM Evaluation
        id: trigger
        run: |
          RESPONSE=$(curl -s -X POST \
            ${{ secrets.LLMOPS_API_URL }}/api/projects/${{ secrets.PROJECT_ID }}/evaluations \
            -H "Authorization: Bearer ${{ steps.auth.outputs.token }}" \
            -H "Content-Type: application/json" \
            -d '{
              "datasetId": "${{ secrets.DATASET_ID }}",
              "endpointId": "${{ secrets.ENDPOINT_ID }}",
              "metrics": ["faithfulness", "answer_relevancy", "bleu"]
            }')
          echo "evaluation_id=$(echo "$RESPONSE" | jq -r '.id')" >> "$GITHUB_OUTPUT"

      - name: Wait for Results
        run: |
          # Poll every 10 seconds, for at most 30 attempts (about 5 minutes).
          for i in $(seq 1 30); do
            STATUS=$(curl -s \
              ${{ secrets.LLMOPS_API_URL }}/api/evaluations/${{ steps.trigger.outputs.evaluation_id }} \
              -H "Authorization: Bearer ${{ steps.auth.outputs.token }}" \
              | jq -r '.status')
            echo "Status: $STATUS"
            if [ "$STATUS" = "COMPLETED" ]; then exit 0; fi
            if [ "$STATUS" = "FAILED" ]; then exit 1; fi
            sleep 10
          done
          # Fail the job if the evaluation never reached a terminal status.
          echo "Timed out waiting for the evaluation to complete"
          exit 1
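
The polling logic in the last step can also live in a reusable shell function, which makes the timeout case explicit and lets the same loop serve other CI systems. This is a sketch under stated assumptions: wait_for_status and fetch_status are hypothetical names, and only the COMPLETED and FAILED status values are taken from the workflow above.

```shell
# Poll a status command until the run finishes or the attempts run out.
#   $1   maximum number of attempts
#   $2   seconds to sleep between attempts
#   $3+  command that prints the current status on stdout
# Returns 0 on COMPLETED, 1 on FAILED, 2 when the attempts are exhausted.
wait_for_status() {
  local max_attempts="$1"
  local interval="$2"
  shift 2
  local attempt=1
  local current
  while [ "$attempt" -le "$max_attempts" ]; do
    current="$("$@")"
    echo "Attempt $attempt/$max_attempts: status=$current" >&2
    case "$current" in
      COMPLETED) return 0 ;;
      FAILED) return 1 ;;
    esac
    # Skip the sleep after the final attempt.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  echo "Timed out after $max_attempts attempts" >&2
  return 2
}

# Usage (fetch_status is a hypothetical wrapper around the status request):
#   fetch_status() {
#     curl -s "$LLMOPS_API_URL/api/evaluations/$EVALUATION_ID" \
#       -H "Authorization: Bearer $TOKEN" | jq -r '.status'
#   }
#   wait_for_status 30 10 fetch_status
```

Returning a distinct code for the timeout lets a pipeline treat "evaluation failed" and "evaluation still running" differently if it needs to.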

GitLab CI

llm-evaluation:
  stage: test
  script:
    - |
      TOKEN=$(curl -s -X POST "$LLMOPS_API_URL/api/auth/login" \
        -H "Content-Type: application/json" \
        -d "{\"email\":\"$LLMOPS_CI_EMAIL\",\"password\":\"$LLMOPS_CI_PASSWORD\"}" \
        | jq -r '.accessToken')

      curl -s -X POST "$LLMOPS_API_URL/api/projects/$PROJECT_ID/evaluations" \
        -H "Authorization: Bearer $TOKEN" \
        -H "Content-Type: application/json" \
        -d "{
          \"datasetId\": \"$DATASET_ID\",
          \"endpointId\": \"$ENDPOINT_ID\",
          \"metrics\": [\"faithfulness\", \"answer_relevancy\", \"bleu\"]
        }"
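
Hand-escaped quotes in the request body are easy to get wrong. One possible alternative is to let jq assemble the JSON, as in the sketch below. The function name build_eval_payload is hypothetical, and the sketch assumes jq 1.6 or newer for the --args option; the field names are the ones used in the job above.

```shell
# Build the evaluation request body with jq instead of hand-escaped quotes.
#   $1   dataset UUID
#   $2   endpoint UUID
#   $3+  metric names
# Assumes jq 1.6 or newer ($ARGS.positional and --args).
build_eval_payload() {
  local dataset_id="$1"
  local endpoint_id="$2"
  shift 2
  jq -cn \
    --arg datasetId "$dataset_id" \
    --arg endpointId "$endpoint_id" \
    '{datasetId: $datasetId, endpointId: $endpointId, metrics: $ARGS.positional}' \
    --args "$@"
}

# Usage:
#   BODY=$(build_eval_payload "$DATASET_ID" "$ENDPOINT_ID" faithfulness answer_relevancy bleu)
#   curl -s -X POST "$LLMOPS_API_URL/api/projects/$PROJECT_ID/evaluations" \
#     -H "Authorization: Bearer $TOKEN" \
#     -H "Content-Type: application/json" \
#     -d "$BODY"
```

jq escapes each value as a JSON string, so identifiers or metric names containing quotes or backslashes cannot break the payload.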

Required Secrets

Secret               Description
LLMOPS_API_URL       URL of your deployed LLMOps Eval instance (e.g. https://llmops.example.com)
LLMOPS_CI_EMAIL      Email of a dedicated CI service account
LLMOPS_CI_PASSWORD   Password for the CI service account
PROJECT_ID           Your project UUID (visible in the project settings URL)
DATASET_ID           Dataset UUID to evaluate against
ENDPOINT_ID          LLM endpoint UUID to evaluate
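
A missing secret usually shows up only as a malformed URL or a failed login several steps later. As a sketch, a job could check the variables up front; require_env is a hypothetical helper name, and it assumes the secrets are exposed to the script as environment variables, as in the GitLab CI job.

```shell
# Fail fast when any required variable is unset or empty.
# Prints every missing name to stderr and returns 1 if at least one is missing.
require_env() {
  local missing=0
  local name
  local value
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage at the top of a CI script:
#   require_env LLMOPS_API_URL LLMOPS_CI_EMAIL LLMOPS_CI_PASSWORD \
#     PROJECT_ID DATASET_ID ENDPOINT_ID || exit 1
```

Only pass fixed, trusted variable names to this helper, since it relies on eval for the lookup.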

Available Endpoints

Method   Path                                          Description
POST     /api/projects/{projectId}/evaluations         Trigger a new evaluation run
GET      /api/projects/{projectId}/evaluations         List all runs for a project
GET      /api/evaluations/{evaluationId}               Get evaluation status and summary
GET      /api/evaluations/{evaluationId}/results       Get detailed metric results
POST     /api/evaluations/{evaluationId}/cancel        Cancel a running evaluation
POST     /api/evaluations/{evaluationId}/retrigger     Re-run a previous evaluation
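
Every endpoint in the table takes the same bearer token, so a thin wrapper around curl can cut down on repetition in CI scripts. The sketch below is illustrative only: llmops_api is a hypothetical name, it assumes LLMOPS_API_URL and TOKEN are already set, and it makes no assumption about the shape of any response body.

```shell
# Hypothetical wrapper around curl for the endpoints listed above.
# Expects LLMOPS_API_URL and TOKEN in the environment.
#   $1  HTTP method
#   $2  path starting with /api/
#   $3  optional JSON request body
llmops_api() {
  local method="$1"
  local path="$2"
  local body="${3:-}"
  # Strip one trailing slash from the base URL to avoid "//api/...".
  local url="${LLMOPS_API_URL%/}$path"
  if [ -n "$body" ]; then
    curl -sS --fail -X "$method" "$url" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d "$body"
  else
    curl -sS --fail -X "$method" "$url" \
      -H "Authorization: Bearer $TOKEN"
  fi
}

# Usage:
#   llmops_api GET  "/api/evaluations/$EVALUATION_ID/results"
#   llmops_api POST "/api/evaluations/$EVALUATION_ID/cancel"
```

The --fail flag makes curl exit non-zero on HTTP error responses, so a failed request stops the job instead of passing an error body on to jq.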