Cross-Stack Reasoning

The split that other tools force you to live with

Today, an above-store leader running a hard question lives in two tools. A BI dashboard tells them what happened — sales were down 7% at three stores last weekend. A document search tool, or a slow Slack thread with a GM, tells them what should happen — the closing playbook, the labor adjustment procedure, the brand standard for incident reporting.

The dashboard doesn't know about the playbook. The playbook doesn't know about the numbers. The leader stitches them together by hand, every time, on every question.

Cross-stack reasoning means: in one prompt, Mise reaches into your numerical stack and your document stack, composes the answer, and cites both. No tab-switching. No "let me also pull up the procedure for that."

What it looks like in practice

This is a real shape of answer Mise produces. The numbers and the procedure are reasoned about together. The citations point at the underlying source — a row in the warehouse or a page of your manual — and clicking one opens the source.

How the reasoning works

Under the hood, every question goes through four stages. None of them is exotic on its own; the work is in making them cooperate.

1 — Route

A small router classifies each question into one of three buckets: pure metric ("net sales last week at Aurora"), pure knowledge ("what's the closing-time temp log requirement"), or mixed. Mixed is the realistic case for above-store work; almost every question that matters needs both.

2 — Plan in parallel

The SQL plane does not let the model write SQL freely. It picks a named template from a versioned registry and fills typed slots. A store_id slot must resolve to a store in your org; a date_range slot must be bounded; a metric slot must be one of the registered metrics. Anything off-list is rejected before SQL is touched.

While the SQL plane works, the RAG plane retrieves: vector search over pgvector, BM25 over your document chunks, reciprocal-rank fusion to combine, then a reranker pass to drop chunks that look superficially relevant but don't actually answer the question.

3 — Compose

The synthesizer is the moment the two halves meet. It receives the SQL results as structured tool output and the retrieved chunks as supporting context, and produces a single answer. The model is instructed to ground every load-bearing claim in either a tool result or a retrieved chunk — and to refuse rather than guess if the data isn't there.

4 — Cite

Citations are filtered post-hoc. We parse the model's answer for [N] markers and only surface sources actually referenced. A distance floor (currently 0.55) keeps low-relevance chunks out, even if the retriever's top-K included them. The result: the citation list reads like a bibliography of the answer, not a dragnet of nearby chunks.

Why the SQL plane is locked down

Free-form NL→SQL is the kind of feature that demos beautifully and breaks in production. Wrong joins. Tenant filters that go missing. Long-running queries that lock a table during peak. The named-template pattern trades a little flexibility for properties you actually need — predictable performance, auditable behavior, and a code path that cannot ever return another tenant's data.

When a new question pattern shows up enough to matter, we add a template. Adding one is cheap. Operating without guard rails is not.

Why citations are filtered

An unfiltered citation list is worse than no citations at all. If the bottom of every answer carries five plausible-looking links and only two of them were actually used, your team learns to ignore the citation chip. Once they ignore it, the trust property is gone — and trust is the whole point of grounding.

Mise filters citations to the chunks the model named. If the model didn't reference a chunk, it doesn't get cited. The list is short, on purpose.

The shapes of question this unlocks

Performance + procedure

Why is X over plan, and what's our playbook for it?

The numerical "why" plus the documented "do this." The most common shape of question on Mise.

Audit + history

Which stores failed this week's check, and what does our remediation policy say?

Audit data joined to the brand-standard remediation steps, with severity scored against your tolerance.

Pattern + escalation

Has this same alert fired three times in 30 days? When does it escalate?

Operational history reasoned about against your written escalation procedure.

Comparative + standard

How does Aurora's labor mix compare to brand standard at this volume?

Live numbers from the warehouse, brand-standard ratios from the manual, single answer.

Where it doesn't apply

Not every question is a cross-stack question. "What's our net sales last week" is metric-only; "what does the closing playbook say about the alarm code" is knowledge-only. Mise routes those cleanly to one plane and skips the other. The cross-stack capability is the moment the two reach for each other on the same prompt — which, for above-store work, is most of the time.

See it on your data

The shortest path to evaluating cross-stack reasoning is to ask a real above-store question against your own stack. Book a 30-minute call — we'll spin up a sandbox connected to your sandbox POS and a folder of your manuals, and run the question shapes above through it together.

Request a Demo → See how it gets deployed Read the technical brief →