Generate judgments via LLM

About this walkthrough

Estimated time: 5 minutes (mostly waiting on the worker)
Tags: judgments, llm, ground-truth

Trigger the LLM-as-judge worker against a query set — every (query, top-K doc) pair is rated 0-3 with a real OpenAI call. The deterministic alternative is the import path (guide 05).
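
To make the unit of work concrete, here is a minimal Python sketch of how (query, top-K doc) pairs could be enumerated and how a rating could be checked against the 0-3 scale. The names `build_pairs` and `validate_rating`, and the shape of the `results` mapping, are hypothetical and chosen for illustration; this is not the worker's actual code.

```python
from typing import Dict, Iterator, List, Tuple


def build_pairs(results: Dict[str, List[str]], top_k: int) -> Iterator[Tuple[str, str]]:
    """Yield one (query, doc_id) pair per top-K hit of each query.

    `results` is an assumed shape: query text -> ranked list of doc IDs.
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    for query, ranked_docs in results.items():
        # Only the first top_k documents of each ranking are judged.
        for doc_id in ranked_docs[:top_k]:
            yield query, doc_id


def validate_rating(rating: int) -> int:
    """Return the rating if it is an integer on the 0-3 scale, else raise."""
    is_int = isinstance(rating, int) and not isinstance(rating, bool)
    if not is_int or not 0 <= rating <= 3:
        raise ValueError(f"rating must be an integer from 0 to 3, got {rating!r}")
    return rating


if __name__ == "__main__":
    # Made-up sample data, not taken from any real query set.
    sample = {
        "red shoes": ["d1", "d2", "d3"],
        "blue hat": ["d4"],
    }
    pairs = list(build_pairs(sample, top_k=2))
    # Each pair corresponds to one LLM call, so this is also the call count.
    print(len(pairs), pairs)
```

Because each pair is rated with its own call, the pair count gives a quick sense of how long a run will take and how much it may cost before you submit it.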


Step 1 — Open a query set's detail page. The 'Associated…

Step 2 — Click 'Generate judgments' to open the dialog. The…

Step 3 — Fill the text fields: a unique name for…

Step 5 — Submit. The worker enqueues immediately (202 ACCEPTED) and…

Submit. The worker enqueues immediately (202 ACCEPTED) and begins calling OpenAI once per (query, doc) pair. The /judgments/{id} page polls for status. On success (complete), the per-judgment rows populate with a rating and brief reasoning. On failure (failed), check the worker logs; common causes are the cluster being unreachable from the API container, the daily budget being exceeded, or LLM_PROVIDER_INCAPABLE (the configured model lacks structured output). Expect a cost of roughly $0.01-0.05 with gpt-4o-mini for a small set.
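
If you want to wait for a run from a script instead of watching the page, the same poll-until-terminal pattern can be sketched in Python. This is a minimal sketch under stated assumptions: `wait_for_judgments` and `fetch_status` are hypothetical names, and how the status is fetched is left to the caller, since this walkthrough describes only the /judgments/{id} page and not a status endpoint. Only the `complete` and `failed` status values come from the text above; the other status names in the demo are made up.

```python
import time
from typing import Callable

# Terminal status values named in the walkthrough.
TERMINAL_STATUSES = {"complete", "failed"}


def wait_for_judgments(
    fetch_status: Callable[[], str],
    poll_interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reports 'complete' or 'failed', then return that status.

    `fetch_status` is a caller-supplied (hypothetical) function that returns
    the current status string, however the caller chooses to obtain it.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(poll_interval)


if __name__ == "__main__":
    # Simulated status sequence standing in for real responses.
    # The non-terminal names 'pending' and 'running' are assumptions.
    responses = iter(["pending", "running", "complete"])
    final = wait_for_judgments(lambda: next(responses), poll_interval=0.0)
    print(final)
```

A `failed` result is returned and not raised, so the caller can decide whether to inspect the worker logs for the causes listed above or to retry.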
