Evaluation Engine

The FastAPI Evaluation Engine is the Python service responsible for running all metric computations and LLM interactions.

Overview

Supported LLM Providers

| Provider | Notes |
| --- | --- |
| OpenAI | GPT-4, GPT-3.5, and other OpenAI models |
| Anthropic | Claude 3 Opus, Sonnet, and Haiku |
| Azure OpenAI | Azure-deployed OpenAI models |
| AWS Bedrock | Llama, Titan, and other Bedrock models |
| Google Vertex AI | Gemini and PaLM models |
| Custom API | Any REST API that accepts a prompt/response pair |

Key Libraries

| Library | Purpose |
| --- | --- |
| `ragas` | RAG evaluation metrics (Faithfulness, Context Recall, etc.) |
| `langchain` | LLM framework and chains |
| `sentence-transformers` | Semantic similarity embeddings |
| `nltk` | BLEU score calculation |
| `rouge-score` | ROUGE metric calculation |
| `bert-score` | BERTScore calculation |
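To illustrate how embedding-based similarity works: `sentence-transformers` encodes each text into a dense vector, and two texts are then compared by the cosine of the angle between their vectors. The sketch below implements cosine similarity in plain Python on toy vectors (the real engine would use vectors from `model.encode(text)`; this is an illustration, not the engine's code):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for sentence embeddings.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```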

Configuration

Copy .env.example to .env and configure:

```
DATABASE_URL=postgresql+asyncpg://user:password@localhost:5432/llmops_eval
SPRING_BOOT_URL=http://localhost:8080
MAX_PARALLEL_EVALUATIONS=5
BATCH_SIZE=10
RETRY_ATTEMPTS=3
EMBEDDING_MODEL=all-MiniLM-L6-v2
```
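A setting like `MAX_PARALLEL_EVALUATIONS` is typically enforced with a concurrency limiter. The sketch below shows one common pattern, an `asyncio.Semaphore` bounding how many evaluations run at once; the function names are hypothetical, not the engine's actual API:

```python
import asyncio

MAX_PARALLEL_EVALUATIONS = 5  # mirrors the .env setting

async def evaluate_one(item: str, sem: asyncio.Semaphore) -> str:
    # At most MAX_PARALLEL_EVALUATIONS coroutines are inside this block at once.
    async with sem:
        await asyncio.sleep(0)  # stand-in for a real metric/LLM call
        return f"evaluated:{item}"

async def evaluate_batch(items: list[str]) -> list[str]:
    sem = asyncio.Semaphore(MAX_PARALLEL_EVALUATIONS)
    return await asyncio.gather(*(evaluate_one(i, sem) for i in items))

results = asyncio.run(evaluate_batch(["a", "b", "c"]))
print(results)
```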

Running the Engine

```bash
cd evaluation-engine
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python run.py
```

API Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | `/health` | Health check |
| POST | `/api/v1/evaluate` | Trigger an evaluation |
| GET | `/api/v1/metrics` | List available metrics |
| GET | `/docs` | Swagger UI |
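As a sketch of calling the evaluate endpoint, the snippet below builds a `POST /api/v1/evaluate` request with the standard library. The payload fields and the port (8000, FastAPI's default) are assumptions for illustration; check the Swagger UI at `/docs` for the actual request schema:

```python
import json
import urllib.request

# Hypothetical payload -- field names are illustrative, not the real schema.
payload = {
    "dataset_id": 42,
    "metrics": ["faithfulness", "rouge"],
}

req = urllib.request.Request(
    "http://localhost:8000/api/v1/evaluate",  # port assumed
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.get_method(), req.full_url)
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) requires the engine to be running.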