Documentation Index
Fetch the complete documentation index at: https://docs.meibel.ai/llms.txt
Use this file to discover all available pages before exploring further.
Overview
Context engineering is the process of turning unstructured documents into structured, queryable knowledge that AI agents can reason over. A raw PDF, scanned invoice, or legal contract is opaque to a language model — it needs to be parsed, decomposed, and indexed before an agent can use it to answer questions accurately.

Meibel provides a complete pipeline from raw document to agent-ready context. Documents go through parsing (OCR, layout analysis, table extraction), structure detection (sections, headers, relationships), and data element extraction (atomic chunks with metadata). The output is a searchable knowledge base that agents query at inference time.

This pipeline is the foundation of everything else in the platform. Without high-quality context, agents produce low-quality outputs — they hallucinate, miss relevant information, or cite the wrong source. Context engineering is where accuracy starts.

The Document Intelligence Pipeline
The pipeline has four stages, each preserving and adding information:

Adaptive Ingest
Not all documents are the same, and the platform adapts its parsing strategy based on what it encounters. A scanned invoice needs OCR to read the text, table extraction to identify line items, and layout analysis to distinguish headers from values. The system detects that the document is an image-based scan and routes it through the appropriate pipeline. A text-heavy legal contract needs section boundary detection to identify clauses, party references, and defined terms. OCR is unnecessary because the text layer is already present, but structural analysis is critical. A scientific paper may require formula handling, citation extraction, and figure caption parsing. The system identifies the academic format and applies specialized extractors.

You don’t need to specify the document type manually. The platform inspects the file format and content to select the right processing strategy automatically.
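To make this concrete, here is a minimal sketch of what adaptive ingest looks like from the caller's side, assuming a Python SDK. The client class and import path are hypothetical placeholders; only process_document is named elsewhere on this page. The point is that the same call handles very different inputs without a document-type parameter.

```python
# Hypothetical Python SDK usage -- the client class and import path are
# assumptions for illustration; only process_document is named on this page.
from meibel import MeibelClient  # hypothetical import path

client = MeibelClient(api_key="YOUR_API_KEY")

# The same call handles very different inputs; the platform inspects each
# file and selects a parsing strategy (OCR, table extraction, section
# detection) on its own -- no document-type argument is required.
invoice = client.process_document("scans/invoice_2024_031.pdf")   # image scan: OCR + tables
contract = client.process_document("legal/msa_acme.pdf")          # text layer present: structure only
paper = client.process_document("research/attention.pdf")         # academic format: formulas, citations
```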
Datasources as Context Containers
Datasources are organizational containers that group related documents into a single queryable knowledge base. Think of a datasource as a folder with intelligence: it doesn’t just store files, it processes them into searchable knowledge. The workflow, sketched in code after this list, is:

- Create a datasource with a name and description
- Upload files to the datasource
- Trigger ingestion to process uploaded files through the document intelligence pipeline
- Query the datasource from agents or directly via the data elements API
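A hedged sketch of that four-step workflow, assuming a Python SDK; the client class and method names are illustrative placeholders, not the documented API.

```python
# Hypothetical SDK sketch of the datasource workflow above; names are
# assumptions for illustration, not the documented API surface.
from meibel import MeibelClient  # hypothetical import path

client = MeibelClient(api_key="YOUR_API_KEY")

# 1. Create a datasource with a name and description
ds = client.create_datasource(
    name="vendor-invoices",
    description="Accounts-payable invoices, 2024 onward",
)

# 2. Upload files to the datasource
client.upload_file(datasource_id=ds.id, path="scans/invoice_2024_031.pdf")

# 3. Trigger ingestion through the document intelligence pipeline
client.ingest(datasource_id=ds.id)

# 4. Query the processed knowledge base via the data elements API
results = client.search_data_elements(ds.id, query="total amount due")
```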
Data Elements
Data elements are the atomic units of extracted knowledge. Each one represents a discrete piece of content pulled from a source document during ingestion. A data element carries the following (an illustrative sketch appears after the list):

- Content — the actual text or structured data
- Source provenance — which document it came from, where in the document
- Content type — paragraph, table, list, header, etc.
- Extraction confidence — how confident the system is in the accuracy of the extraction
- Metadata — additional structured fields extracted by metadata models
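As a mental model, a single data element might look like the following; the field names are assumptions based on the list above, not the exact API schema.

```python
# Illustrative shape of a data element -- field names are assumptions
# derived from the description above, not the exact API schema.
data_element = {
    "content": "Total amount due: $4,250.00",
    "source": {                   # provenance: which document, and where in it
        "document": "invoice_2024_031.pdf",
        "page": 1,
    },
    "content_type": "table",      # paragraph, table, list, header, etc.
    "confidence": 0.97,           # extraction confidence score
    "metadata": {                 # fields populated by metadata models
        "amount": 4250.00,
        "currency": "USD",
        "due_date": "2024-04-15",
    },
}
```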
Metadata Extraction
Beyond the content itself, you often need structured fields extracted from documents: dates, monetary amounts, party names, categories, status values. Metadata models handle this. A metadata model defines the structured fields you want to extract. When applied to a datasource, the platform runs the model against data elements and populates the specified fields. This adds queryable dimensions beyond full-text search — you can filter data elements by extracted date ranges, amounts, or categories. The metadata model catalog provides pre-built models for common extraction patterns, and you can define custom models for domain-specific fields.
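A sketch of defining a custom metadata model, applying it to a datasource, and then filtering a query by the extracted fields. Method names and the field-definition format are assumptions for illustration.

```python
# Hedged sketch: define a custom metadata model and filter by its fields.
# Method names and the field-definition format are assumptions.
from meibel import MeibelClient  # hypothetical import path

client = MeibelClient(api_key="YOUR_API_KEY")

# Define the structured fields to extract from each data element
model = client.create_metadata_model(
    name="invoice-fields",
    fields={
        "invoice_date": "date",
        "total_amount": "number",
        "vendor_name": "string",
    },
)

# Apply the model to a datasource; the platform populates the fields
client.apply_metadata_model(datasource_id="ds_123", model_id=model.id)

# Filter beyond full-text search using the extracted dimensions
results = client.search_data_elements(
    "ds_123",
    query="late payment penalty",
    filters={"invoice_date": {"gte": "2024-01-01"}, "total_amount": {"gt": 1000}},
)
```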
Putting It Together

Here is the full pipeline in code: parse a document, create a datasource, upload, trigger ingestion, and search the extracted data elements. The standalone process_document call runs parsing independently — useful for previewing extraction results before committing to a datasource. The datasource workflow (upload, ingest, search) is the standard path for building a persistent, queryable knowledge base.
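The listing below is a sketch of that pipeline under the assumption of a Python SDK. Only process_document is named on this page; the client class and remaining method names are illustrative placeholders.

```python
# End-to-end sketch, assuming a Python SDK. Only process_document is named
# on this page; the other client methods are illustrative placeholders.
from meibel import MeibelClient  # hypothetical import path

client = MeibelClient(api_key="YOUR_API_KEY")

# Standalone parse: preview extraction before committing to a datasource
preview = client.process_document("scans/invoice_2024_031.pdf")
for element in preview.data_elements:
    print(element.content_type, element.confidence)

# Standard path: build a persistent, queryable knowledge base
ds = client.create_datasource(name="invoices", description="AP invoices")
client.upload_file(datasource_id=ds.id, path="scans/invoice_2024_031.pdf")
client.ingest(datasource_id=ds.id)

# Search the extracted data elements
results = client.search_data_elements(ds.id, query="total amount due")
for element in results:
    print(element.content, element.source)
```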