The Optimization Loop

Summary

RelyLoop runs a closed feedback loop — propose, evaluate, select, repeat — over your search engine's query-time parameters. It's a Karpathy-style loop in shape, and an Optuna/TPE Bayesian search in mechanism. The loop is the product; the chat agent and the engine adapters are how you reach it.

The shape

The loop is the same one that shows up everywhere in machine learning:

flowchart LR
    P[Propose<br/>parameter set] --> E[Evaluate<br/>against judgments]
    E --> S[Select<br/>what worked]
    S --> P

What makes RelyLoop's version useful for search relevance is what sits in each box and where the loop's output goes.
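The propose, evaluate, select cycle can be sketched in a few lines. This is a minimal illustration of the loop's shape only, not RelyLoop's code: the parameter names (`title_boost`, `tie_breaker`), the synthetic scoring function, and the random proposal step are all assumptions made for the example.

```python
import random


def propose(rng):
    # Hypothetical two-parameter space; real parameter names will differ.
    return {
        "title_boost": rng.uniform(0.5, 5.0),
        "tie_breaker": rng.uniform(0.0, 1.0),
    }


def evaluate(params):
    # Stand-in for "run the query set and score against judgments".
    # Synthetic score that peaks at title_boost=3.0, tie_breaker=0.3.
    return (
        1.0
        - 0.05 * (params["title_boost"] - 3.0) ** 2
        - (params["tie_breaker"] - 0.3) ** 2
    )


def run_loop(n_trials, seed=0):
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        candidate = propose(rng)      # Propose a parameter set
        score = evaluate(candidate)   # Evaluate it
        if score > best_score:        # Select what worked
            best_params, best_score = candidate, score
    return best_params, best_score


best_params, best_score = run_loop(200)
print(best_params, round(best_score, 4))
```

Here the proposal step is blind random sampling, which ignores everything earlier trials learned. Replacing that step with a model-guided sampler is what turns this shape into a Bayesian search.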

The mechanism: Bayesian, not grid

The "propose" and "select" steps are an Optuna study using the Tree-structured Parzen Estimator (TPE) sampler. TPE models which regions of the search space have produced good scores so far and concentrates new trials there. A budget of a few thousand trials can therefore find strong configurations in a space where an exhaustive grid would need far more evaluations, which is why RelyLoop can tune the full query-time space at once instead of one slice.

Why this matters versus a grid

OpenSearch's Hybrid Search Optimizer is a 66-cell grid restricted to hybrid weights. RelyLoop varies field boosts, function scores, fuzziness, minimum-should-match (mm), tie-breakers, and hybrid weights together. That joint space is far too large to grid-search, which is exactly the case Bayesian optimization is for.
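A quick back-of-the-envelope calculation shows why the joint space defeats a grid. The resolutions below are assumptions chosen only for illustration, and each entry is treated as a single knob, which understates the real space (field boosts alone are one value per field).

```python
from math import prod

# Hypothetical number of grid values tried per parameter.
values_per_parameter = {
    "field_boost": 10,
    "function_score": 10,
    "fuzziness": 3,
    "mm": 5,
    "tie_breaker": 10,
    "hybrid_weight": 11,
}

# A full grid evaluates every combination, so the sizes multiply.
grid_cells = prod(values_per_parameter.values())
print(grid_cells)  # 165000
```

Each of those cells costs one full run of the query set, and adding a single extra ten-value parameter multiplies the total by ten.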

The evaluation: judgments + `ir_measures`

Each candidate is scored by running your query set against the engine and comparing the ranked results to your judgments with ir_measures, an IR-evaluation library that puts a common interface over its underlying metric providers. You get cutoff-aware metrics (nDCG@k, ERR, precision@k, …) computed the same way regardless of engine.
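To make "comparing the ranked results to your judgments" concrete, here is a hand-rolled sketch of nDCG@k. It is not how ir_measures computes the metric internally; it assumes the linear-gain, log2-discount formulation, treats unjudged documents as gain 0, and uses made-up document IDs and grades.

```python
import math


def dcg_at_k(gains, k):
    # Discounted cumulative gain: each gain is divided by log2(rank + 1).
    return sum(
        gain / math.log2(rank + 1)
        for rank, gain in enumerate(gains[:k], start=1)
    )


def ndcg_at_k(ranking, judgments, k):
    # Unjudged documents are assumed to have gain 0.
    gains = [judgments.get(doc_id, 0) for doc_id in ranking]
    ideal_gains = sorted(judgments.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean_ndcg(run, judgments_by_query, k):
    # Average over every judged query; a query with no results scores 0.
    scores = [
        ndcg_at_k(run.get(query_id, []), judged, k)
        for query_id, judged in judgments_by_query.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical judgments (doc ID -> graded relevance) and engine output.
judgments_by_query = {"q1": {"d1": 3, "d2": 2, "d3": 0, "d4": 1}}
run = {"q1": ["d2", "d1", "d5", "d4"]}

print(mean_ndcg(run, judgments_by_query, k=3))
```

Because the function only sees document IDs and grades, the same scoring applies to any engine that can return a ranked list.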

Where the loop's output goes

The loop does not push changes to your cluster. Its output is a winning configuration, captured as a proposal and opened as a Pull Request against your config repo. The loop ends at the PR; humans and CI take it from there. This is the deliberate, change-managed posture — RelyLoop is for offline experimentation, never the live serving path.
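One way to picture the hand-off is as a deterministic file that a Pull Request can diff cleanly. The sketch below is an assumption throughout: the JSON layout, the `render_proposal` helper, and the field names are hypothetical and do not describe RelyLoop's actual proposal format.

```python
import json


def render_proposal(best_params, metric_name, metric_value):
    # Hypothetical proposal layout: the winning parameters plus the score
    # that justified them.
    proposal = {
        "metric": {"name": metric_name, "value": round(metric_value, 6)},
        "params": best_params,
    }
    # Sorted keys and a trailing newline keep repeated runs byte-identical,
    # so the diff in review shows only real parameter changes.
    return json.dumps(proposal, indent=2, sort_keys=True) + "\n"


text = render_proposal(
    {"title_boost": 3.1, "tie_breaker": 0.28},  # made-up values
    "nDCG@10",
    0.7421,
)
print(text)
```

Nothing in this step talks to the search cluster; the file is only text for reviewers and CI to act on.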

What the loop never touches

Schema, mappings, or analyzer settings — tuning is query-time only.
Production traffic — there is no online A/B test and no bandit.
Learned reranker models — LTR training is out of scope for v1.

See the related concepts: Query Sets & Judgments, Search Space, Optimization Trials, and Git-as-Source-of-Truth.