Evaluation Engine

The FastAPI Evaluation Engine is the Python service responsible for running all metric computations and LLM interactions.

Overview

Supported LLM Providers

| Provider | Notes |
| --- | --- |
| OpenAI | GPT-4, GPT-3.5, and other OpenAI models |
| Anthropic | Claude 3 Opus, Sonnet, and Haiku |
| Azure OpenAI | Azure-deployed OpenAI models |
| AWS Bedrock | Llama, Titan, and other Bedrock models |
| Google Vertex AI | Gemini and PaLM models |
| Custom API | Any REST API that accepts a prompt/response pair |

Key Libraries

| Library | Purpose |
| --- | --- |
| `ragas` | RAG evaluation metrics (Faithfulness, Context Recall, etc.) |
| `langchain` | LLM framework and chains |
| `sentence-transformers` | Semantic similarity embeddings |
| `nltk` | BLEU score calculation |
| `rouge-score` | ROUGE metric calculation |
| `bert-score` | BERTScore calculation |
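To illustrate how embedding-based similarity works: `sentence-transformers` encodes each text into a dense vector, and two texts are then compared by the cosine of the angle between their vectors. The sketch below implements cosine similarity in plain Python on toy vectors (the real engine would use vectors from `model.encode(text)`; this is an illustration, not the engine's code):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for sentence embeddings.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```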

Configuration

Copy .env.example to .env and configure:

```
DATABASE_URL=postgresql+asyncpg://user:password@localhost:5432/llmops_eval
SPRING_BOOT_URL=http://localhost:8080
MAX_PARALLEL_EVALUATIONS=5
BATCH_SIZE=10
RETRY_ATTEMPTS=3
EMBEDDING_MODEL=all-MiniLM-L6-v2
```
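A setting like `MAX_PARALLEL_EVALUATIONS` is typically enforced with a concurrency limiter. The sketch below shows one common pattern, an `asyncio.Semaphore` bounding how many evaluations run at once; the function names are hypothetical, not the engine's actual API:

```python
import asyncio

MAX_PARALLEL_EVALUATIONS = 5  # mirrors the .env setting

async def evaluate_one(item: str, sem: asyncio.Semaphore) -> str:
    # At most MAX_PARALLEL_EVALUATIONS coroutines are inside this block at once.
    async with sem:
        await asyncio.sleep(0)  # stand-in for a real metric/LLM call
        return f"evaluated:{item}"

async def evaluate_batch(items: list[str]) -> list[str]:
    sem = asyncio.Semaphore(MAX_PARALLEL_EVALUATIONS)
    return await asyncio.gather(*(evaluate_one(i, sem) for i in items))

results = asyncio.run(evaluate_batch(["a", "b", "c"]))
print(results)
```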

Running the Engine

```bash
cd evaluation-engine
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python run.py
```

API Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | `/health` | Health check |
| POST | `/api/v1/evaluate` | Trigger an evaluation |
| GET | `/api/v1/metrics` | List available metrics |
| GET | `/docs` | Swagger UI |
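As a sketch of calling the evaluate endpoint, the snippet below builds a `POST /api/v1/evaluate` request with the standard library. The payload fields and the port (8000, FastAPI's default) are assumptions for illustration; check the Swagger UI at `/docs` for the actual request schema:

```python
import json
import urllib.request

# Hypothetical payload -- field names are illustrative, not the real schema.
payload = {
    "dataset_id": 42,
    "metrics": ["faithfulness", "rouge"],
}

req = urllib.request.Request(
    "http://localhost:8000/api/v1/evaluate",  # port assumed
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.get_method(), req.full_url)
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) requires the engine to be running.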