PromptOps

PromptOps is a prompt management and evaluation platform for teams shipping AI products. It versions prompts, generates test scenarios with AI, replays real traces against new versions, and uses an LLM-as-judge to score quality and flag regressions before changes reach production.

Next.js, FastAPI, PostgreSQL, Redis, OpenAI

Status: Live

Year: 2026

Role: Solo Engineer

[Screenshot: PromptOps dashboard showing the prompt versioning and evaluation workflow]

Problem

Changing prompts in production is risky. Small tweaks that fix one edge case break others. Teams need a way to version, test, and compare prompt changes before shipping them, without building custom eval infrastructure.

Approach

Built a five-step workflow: version, generate scenarios, replay, compare, promote. Users version their prompts, generate test scenarios with AI, and replay real traces against new versions. An LLM-as-judge then compares the results, scoring quality and flagging regressions before a version is promoted.
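
The replay and compare steps can be sketched roughly as follows. This is a minimal illustration, not the PromptOps implementation: the `Trace` and `Comparison` types, the `model` and `score` callables, and the rule "candidate scored lower than baseline means regression" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trace:
    """A recorded production request (hypothetical shape)."""
    trace_id: str
    inputs: dict[str, str]


@dataclass
class Comparison:
    trace_id: str
    baseline_score: float
    candidate_score: float

    @property
    def regressed(self) -> bool:
        # Assumed rule: any drop in score counts as a regression.
        return self.candidate_score < self.baseline_score


def replay(prompt_template: str, trace: Trace, model: Callable[[str], str]) -> str:
    """Render a prompt version with the recorded inputs and call the model."""
    return model(prompt_template.format(**trace.inputs))


def compare_versions(
    baseline: str,
    candidate: str,
    traces: list[Trace],
    model: Callable[[str], str],
    score: Callable[[Trace, str], float],
) -> list[Comparison]:
    """Run the same traces through both prompt versions and score each output."""
    results = []
    for trace in traces:
        baseline_output = replay(baseline, trace, model)
        candidate_output = replay(candidate, trace, model)
        results.append(
            Comparison(
                trace.trace_id,
                score(trace, baseline_output),
                score(trace, candidate_output),
            )
        )
    return results
```

In a real setup `model` would call an LLM provider and `score` would wrap the judge. Because both are plain callables here, the loop can be exercised with stubs before any API call is made.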

Key Features

AI-generated test scenarios from natural language descriptions
Time-travel replay: same inputs through new prompts
LLM-as-judge with randomized A/B position to prevent bias
Interactive judge chat for improvement suggestions
Magic link auth, multi-user workspaces
API keys for programmatic integration
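
The randomized A/B position from the list above could look something like the sketch below. It assumes a hypothetical `judge` callable that sees two anonymous outputs and returns "A", "B", or "tie"; the actual judge prompt and response format used by PromptOps are not described here.

```python
import random
from typing import Callable, Optional


def judge_pair(
    baseline_output: str,
    candidate_output: str,
    judge: Callable[[str, str], str],
    rng: Optional[random.Random] = None,
) -> str:
    """Return "baseline", "candidate", or "tie" after a position-randomized comparison."""
    rng = rng or random.Random()

    # Flip a coin to decide which version is shown in slot A.
    candidate_is_a = rng.random() < 0.5
    if candidate_is_a:
        output_a, output_b = candidate_output, baseline_output
    else:
        output_a, output_b = baseline_output, candidate_output

    verdict = judge(output_a, output_b)  # hypothetical judge: "A", "B", or "tie"
    if verdict == "tie":
        return "tie"
    if verdict not in ("A", "B"):
        raise ValueError(f"unexpected judge verdict: {verdict!r}")

    # Map the slot the judge picked back to the version that occupied it.
    picked_a = verdict == "A"
    return "candidate" if picked_a == candidate_is_a else "baseline"
```

With this shape, a judge that always prefers slot A splits its wins roughly evenly between the two versions over many comparisons, so position preference shows up as noise rather than as a consistent advantage for one version.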

Outcomes

End-to-end workflow ships prompt changes with eval gates instead of vibes
Randomized A/B judge position keeps the position bias common in LLM-as-judge setups from systematically favoring either version
5-service Docker Compose lets contributors run the full stack locally in one command

Architecture

The frontend (Next.js on Vercel) connects to a FastAPI backend (on Railway), backed by PostgreSQL for persistence and by Redis with Celery for async scenario generation and replay. A 5-service Docker Compose setup runs the full stack for local development.

Visit PromptOps→