Documentation Index
Fetch the complete documentation index at: https://docs.meibel.ai/llms.txt
Use this file to discover all available pages before exploring further.
Overview
Most AI platforms give you a single confidence score on the model’s output. That number tells you how confident the model is in its response — but it says nothing about the quality of the data that response was built on. Meibel is different. Because the platform owns the full stack — document parsing, data element extraction, and model inference — confidence scores are produced at every stage and compose end-to-end. A poorly parsed document reduces confidence all the way through to the final agent response, and you can see exactly where the degradation happened. This means you know not just “how confident is the model” but “how confident should you be in this entire answer, given everything that went into producing it.”
Document Parse Confidence
The first confidence score is produced at the parsing stage: how confident is the system that it correctly read and interpreted the source document? Several factors influence parse confidence:
- OCR quality — for scanned documents, was the text readable? Blurry scans, handwritten text, and low-resolution images all reduce OCR confidence. A clean, high-resolution scan of typed text scores near 1.0; a faded photocopy of handwritten notes might score 0.4.
- Structure detection — did the system correctly identify tables, sections, headers, and other structural elements? A well-formatted PDF with clear headings is easier to parse than a flat wall of text with no formatting.
- Format recognition — did the system handle the file format properly? Standard PDFs and common image formats are well-supported. Unusual or corrupted files may parse with lower confidence.
Parse confidence is per-document. If you upload ten documents to a datasource, each gets its own parse confidence score. One bad scan doesn’t penalize the other nine.
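To make the per-document behavior concrete, here is a minimal sketch of flagging low-confidence parses for re-scanning. The `documents` structure and the `parse_confidence` field are illustrative assumptions, not the literal shape returned by the platform.

```python
# Illustrative only: field names such as "parse_confidence" are assumptions,
# not the exact shape returned by the Meibel platform.
documents = [
    {"name": "contract_scan.pdf", "parse_confidence": 0.97},
    {"name": "faded_photocopy.pdf", "parse_confidence": 0.42},
    {"name": "quarterly_report.pdf", "parse_confidence": 0.91},
]

RESCAN_THRESHOLD = 0.6

# Each document carries its own score; one bad scan only flags itself.
for doc in documents:
    if doc["parse_confidence"] < RESCAN_THRESHOLD:
        print(f"Re-scan or replace: {doc['name']} ({doc['parse_confidence']:.2f})")
```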
Data Element Extraction Confidence
After parsing, the system breaks the document into data elements — atomic chunks of content. Each extraction produces its own confidence score. Extraction confidence reflects:
- Boundary accuracy — did the system correctly identify where this data element starts and ends? Clean section breaks are easy; ambiguous paragraph boundaries in flowing text are harder.
- Content type classification — is this a table, a paragraph, a list, a heading? Misclassifying a table as a paragraph means the structure is lost.
- Metadata accuracy — for metadata extraction models, how confident is the system in the values it extracted? A clearly formatted date like “January 15, 2025” extracts with high confidence; an ambiguous reference like “next quarter” extracts with low confidence.
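As an illustration, a single data element and its extraction scores might be represented along these lines. The structure and field names below are assumptions made for this example, not Meibel's actual schema.

```python
# Hypothetical shape of an extracted data element; field names are
# assumptions for illustration, not Meibel's actual schema.
data_element = {
    "content_type": "table",          # table, paragraph, list, heading, ...
    "text": "Q1 revenue by region ...",
    "extraction_confidence": 0.88,    # boundary and classification confidence
    "metadata": {
        "report_date": {"value": "2025-01-15", "confidence": 0.97},
        "period": {"value": "next quarter", "confidence": 0.35},
    },
}

# A downstream consumer can ignore metadata values it cannot trust.
trusted = {k: v["value"] for k, v in data_element["metadata"].items()
           if v["confidence"] >= 0.8}
print(trusted)  # {'report_date': '2025-01-15'}
```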
Model Output Confidence
The final stage is the language model’s confidence in its response. This is informed by the quality of context it received. When an agent searches its datasources for relevant context, it retrieves data elements. If those data elements were extracted with high confidence from well-parsed documents, the model has a solid foundation. If the data elements came from noisy OCR with uncertain boundaries, the model is working with degraded input — and the confidence score reflects this. Model output confidence also accounts for:
- Context relevance — did the retrieved data elements actually address the user’s question, or were they tangential?
- Consistency — do the retrieved data elements agree with each other, or do they contain contradictory information?
- Coverage — did the agent find enough relevant context, or is it extrapolating from sparse evidence?
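The sketch below is a purely illustrative heuristic (not Meibel's actual formula), but it captures the intuition: weak relevance, inconsistency, or sparse coverage all pull the model's raw confidence down.

```python
# Purely illustrative heuristic, NOT Meibel's actual computation: the
# weakest of the three context factors attenuates the model's raw confidence.
def illustrative_output_confidence(raw_model_confidence: float,
                                   relevance: float,
                                   consistency: float,
                                   coverage: float) -> float:
    penalty = min(relevance, consistency, coverage)
    return raw_model_confidence * penalty

print(round(illustrative_output_confidence(0.95, relevance=0.9,
                                           consistency=1.0, coverage=0.7), 3))
# 0.665: the model is confident, but sparse evidence drags the score down
```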
Score Stacking
Confidence scores compose multiplicatively across the pipeline. This is the key insight: each stage’s confidence attenuates the scores downstream. Consider a concrete example:

| Stage | Confidence | Effective Confidence |
|---|---|---|
| Document parse | 0.70 | 0.70 |
| Data element extraction | 0.90 | 0.63 (0.70 x 0.90) |
| Model output | 0.95 | 0.60 (0.63 x 0.95) |
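The table is just a running product: even though the model itself reports 0.95, the effective confidence lands near 0.60 because the weak parse propagates forward. The snippet below reproduces the arithmetic, assuming effective confidence is the product of the per-stage scores.

```python
# Effective confidence is the running product of per-stage scores.
stage_scores = {
    "document_parse": 0.70,
    "data_element_extraction": 0.90,
    "model_output": 0.95,
}

effective = 1.0
for stage, score in stage_scores.items():
    effective *= score
    print(f"{stage}: stage={score:.2f}, effective={effective:.2f}")

# document_parse: stage=0.70, effective=0.70
# data_element_extraction: stage=0.90, effective=0.63
# model_output: stage=0.95, effective=0.60
```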
Using Confidence in Practice
Accessing Confidence Scores
The confidence scoring API lets you track scoring jobs and get aggregate summaries. A minimal client sketch is shown below.
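The request below is a hypothetical sketch: the base URL, endpoint paths, field names, and authentication header are assumptions, not the documented API surface. Consult the API reference for the actual routes and payloads.

```python
# Hypothetical sketch of polling a scoring job and reading its aggregate
# summary. Endpoint paths and field names are assumptions; consult the
# Meibel API reference for the real routes and response shapes.
import requests

BASE_URL = "https://api.meibel.ai"                  # assumed base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # assumed auth scheme
job_id = "job_123"                                  # placeholder ID

# Check the status of a scoring job.
job = requests.get(f"{BASE_URL}/scoring-jobs/{job_id}", headers=HEADERS).json()
print(job.get("status"))

# Fetch an aggregate summary once the job completes.
summary = requests.get(f"{BASE_URL}/scoring-jobs/{job_id}/summary",
                       headers=HEADERS).json()
print(summary.get("average_effective_confidence"))
```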
Chat Response Quality Signals
Agent chat responses include data you can use to assess the quality of each answer; an illustrative example follows.
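For illustration, the quality signals attached to a chat response might look something like the payload below. The field names are assumptions made for this example, not the documented response schema.

```python
# Hypothetical quality signals on an agent chat response; field names are
# assumptions for illustration, not the documented response schema.
chat_response = {
    "answer": "The contract renews on January 15, 2025.",
    "model_confidence": 0.95,
    "effective_confidence": 0.60,   # parse x extraction x model output
    "retrieved_elements": [
        {"source": "contract_scan.pdf", "parse_confidence": 0.70,
         "extraction_confidence": 0.90},
        {"source": "renewal_addendum.pdf", "parse_confidence": 0.95,
         "extraction_confidence": 0.92},
    ],
}

# Surface the weakest link alongside the answer.
weakest = min(chat_response["retrieved_elements"],
              key=lambda e: e["parse_confidence"] * e["extraction_confidence"])
print(weakest["source"],
      round(weakest["parse_confidence"] * weakest["extraction_confidence"], 2))
```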
Setting Thresholds
How you use confidence scores depends on your use case; a routing sketch follows the list:
- Effective confidence above 0.8 — safe for automated decisions in most contexts. The source data was well-parsed and the model is confident.
- Effective confidence 0.5 to 0.8 — present the answer to the user but flag it for potential review. Something in the pipeline was uncertain.
- Effective confidence below 0.5 — route to human review. Either the source document was poorly parsed, the extraction was uncertain, or the model lacked sufficient context.
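As a starting point, the thresholds above translate into a routing function like the one below. The cutoffs are the ones suggested here; tune them for your own risk tolerance.

```python
# Route an answer based on its effective confidence, using the thresholds
# suggested above. Adjust the cutoffs for your own risk tolerance.
def route_answer(effective_confidence: float) -> str:
    if effective_confidence > 0.8:
        return "automate"      # safe for automated decisions in most contexts
    if effective_confidence >= 0.5:
        return "flag"          # show the answer, but mark it for review
    return "human_review"      # poor parse, uncertain extraction, or weak context

print(route_answer(0.60))  # "flag"
```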