# Document Processing

> Parse documents asynchronously or synchronously, poll for results, and stream trace events

The document processing API extracts structured content from uploaded files. You can process documents asynchronously (submit a job, poll for results) or synchronously (block until done). This guide covers both workflows, plus streaming trace events for real-time progress.

## Parse a document (async)

Submit a document for asynchronous parsing. The API returns a job ID immediately so your application stays responsive while the server processes the file.

<CodeGroup>
  ```python Python theme={null}
  import os
  from meibel import MeibelClient

  client = MeibelClient(api_key=os.environ["MEIBEL_API_KEY"])

  with open("contract.pdf", "rb") as f:
      job = client.documents.parse_document(file=f, file_name="contract.pdf")

  print(f"Job submitted: {job.job_id}")
  ```

  ```typescript TypeScript theme={null}
  import { MeibelClient } from 'meibel';
  import { readFile } from "node:fs/promises";

  const client = new MeibelClient({ apiKey: process.env.MEIBEL_API_KEY });

  const file = new Blob([await readFile("contract.pdf")]);
  const job = await client.documents.parseDocument(file, "contract.pdf");

  console.log('Job submitted:', job.jobId);
  ```

  ```go Go theme={null}
  import (
      "context"
      "fmt"
      "log"
      "os"

      meibel "github.com/meibel-ai/meibel-go"
  )

  client := meibel.NewClient(meibel.WithAPIKey(os.Getenv("MEIBEL_API_KEY")))
  ctx := context.Background()

  f, err := os.Open("contract.pdf")
  if err != nil {
      log.Fatal(err)
  }
  defer f.Close()

  job, err := client.Documents.ParseDocument(ctx, f, "contract.pdf")
  if err != nil {
      log.Fatal(err)
  }

  fmt.Println("Job submitted:", job.JobID)
  ```

  ```bash CLI theme={null}
  meibel documents parse --file contract.pdf
  ```
</CodeGroup>

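Store the job ID from the response (`job.job_id` in Python, `job.jobId` in TypeScript, `job.JobID` in Go). You need it to check status, retrieve results, and stream trace events. The CLI examples below assume it is available in the `JOB_ID` shell variable.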

## Poll for status

Check the processing status of a submitted document job. Poll until the status reaches `"completed"` or `"failed"`.

<CodeGroup>
  ```python Python theme={null}
  import time

  while True:
      status = client.documents.get_document_status(job_id=job.job_id)
      print(f"Status: {status.status}")

      if status.status == "completed":
          print("Processing finished")
          break
      elif status.status == "failed":
          print(f"Processing failed: {status.error}")
          break

      time.sleep(2)
  ```

  ```typescript TypeScript theme={null}
  let status;
  while (true) {
    status = await client.documents.getDocumentStatus(job.jobId);
    console.log('Status:', status.status);

    if (status.status === 'completed') {
      console.log('Processing finished');
      break;
    }
    if (status.status === 'failed') {
      console.error('Processing failed:', status.error);
      break;
    }

    // Wait 2 seconds before polling again
    await new Promise((r) => setTimeout(r, 2000));
  }
  ```

  ```go Go theme={null}
  // Requires "time" in the import block.
  for {
      status, err := client.Documents.GetDocumentStatus(ctx, job.JobID)
      if err != nil {
          log.Fatal(err)
      }

      fmt.Println("Status:", status.Status)

      if status.Status == "completed" {
          fmt.Println("Processing finished")
          break
      }
      if status.Status == "failed" {
          fmt.Println("Processing failed:", status.Error)
          break
      }

      time.Sleep(2 * time.Second)
  }
  ```

  ```bash CLI theme={null}
  meibel documents get-status "$JOB_ID"
  ```
</CodeGroup>

<Note>
  A 2-second polling interval is recommended. For long-running jobs, consider using the streaming trace endpoint instead.
</Note>
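The loops above poll indefinitely. For production use you may want an upper bound on how long to wait. The following is a minimal sketch of one way to do that, not part of the SDK: `wait_for_job` is a hypothetical helper that takes any zero-argument callable returning the status string, starts at the recommended 2-second interval, and doubles the delay up to a cap. The timeout and backoff values are illustrative assumptions; only the `"completed"` and `"failed"` statuses come from this guide.

```python
import time
from typing import Callable

# Terminal statuses named in this guide.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_status: Callable[[], str],
    timeout: float = 300.0,       # assumption: give up after 5 minutes
    initial_delay: float = 2.0,   # the recommended polling interval
    max_delay: float = 30.0,      # assumption: cap for the backoff
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until it returns a terminal status.

    Raises TimeoutError if the next wait would pass the deadline.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job still '{status}' after {timeout} seconds")
        sleep(delay)
        # Double the delay after each attempt, up to max_delay.
        delay = min(delay * 2, max_delay)
```

With the client from the earlier examples, a call could look like `wait_for_job(lambda: client.documents.get_document_status(job_id=job.job_id).status)`. Passing `sleep` as a parameter keeps the helper easy to test without real waiting.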

## Get results

Once processing is complete, retrieve the extracted content in markdown or structured JSON format.

<CodeGroup>
  ```python Python theme={null}
  # Get results as markdown
  markdown_result = client.documents.get_document_result(
      job_id=job.job_id,
      format="markdown",
  )
  print(markdown_result.content)

  # Get results as structured JSON
  json_result = client.documents.get_document_result(
      job_id=job.job_id,
      format="json",
  )
  print(json_result.content)
  ```

  ```typescript TypeScript theme={null}
  // Get results as markdown
  const markdownResult = await client.documents.getDocumentResult(job.jobId, {
    format: 'markdown',
  });
  console.log(markdownResult.content);

  // Get results as structured JSON
  const jsonResult = await client.documents.getDocumentResult(job.jobId, {
    format: 'json',
  });
  console.log(jsonResult.content);
  ```

  ```go Go theme={null}
  // Get results as markdown
  markdownFmt := "markdown"
  markdownResult, err := client.Documents.GetDocumentResult(ctx, job.JobID, &meibel.GetDocumentResultOptions{
      Format: &markdownFmt,
  })
  if err != nil {
      log.Fatal(err)
  }
  fmt.Println(markdownResult.Content)

  // Get results as structured JSON
  jsonFmt := "json"
  jsonResult, err := client.Documents.GetDocumentResult(ctx, job.JobID, &meibel.GetDocumentResultOptions{
      Format: &jsonFmt,
  })
  if err != nil {
      log.Fatal(err)
  }
  fmt.Println(jsonResult.Content)
  ```

  ```bash CLI theme={null}
  # Markdown format
  meibel documents get-result "$JOB_ID" --format markdown

  # JSON format
  meibel documents get-result "$JOB_ID" --format json
  ```
</CodeGroup>

The `markdown` format returns a clean, readable representation of the document. The `json` format returns structured data including headings, tables, and extracted metadata.

## Process synchronously

For smaller documents where you want the result in a single call, use the synchronous endpoint. It blocks until processing completes and returns the result directly.

<CodeGroup>
  ```python Python theme={null}
  with open("invoice.pdf", "rb") as f:
      result = client.documents.process_document(file=f, file_name="invoice.pdf")

  print(result.content)
  ```

  ```typescript TypeScript theme={null}
  const invoiceBlob = new Blob([await readFile("invoice.pdf")]);
  const result = await client.documents.processDocument(invoiceBlob, "invoice.pdf");

  console.log(result.content);
  ```

  ```go Go theme={null}
  f, err := os.Open("invoice.pdf")
  if err != nil {
      log.Fatal(err)
  }
  defer f.Close()

  result, err := client.Documents.ProcessDocument(ctx, f, "invoice.pdf", nil)
  if err != nil {
      log.Fatal(err)
  }

  fmt.Println(result.Content)
  ```

  ```bash CLI theme={null}
  meibel documents parse --file invoice.pdf --wait
  ```
</CodeGroup>

<Note>
  The synchronous endpoint is best for small files (under 10 MB). For larger documents, use the async workflow with polling or trace streaming.
</Note>
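If your application handles files of varying size, the 10 MB guideline from the note above can be turned into a small routing check. This is a sketch under stated assumptions: `choose_workflow` is a hypothetical helper, not an SDK function, and it interprets 10 MB as 10 × 1024 × 1024 bytes.

```python
import os

# Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes.
SYNC_LIMIT_BYTES = 10 * 1024 * 1024


def choose_workflow(path: str, limit: int = SYNC_LIMIT_BYTES) -> str:
    """Return "sync" for files smaller than the limit, otherwise "async"."""
    return "sync" if os.path.getsize(path) < limit else "async"
```

A caller could then use `process_document` when the result is `"sync"` and `parse_document` followed by polling or trace streaming when it is `"async"`.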

## List child documents

Some documents (e.g., archives, multi-part files) produce child documents during processing. List them by job ID.

<CodeGroup>
  ```python Python theme={null}
  children = client.documents.list_document_children(job_id=job.job_id)

  for child in children:
      print(f"{child.file_name}: {child.status}")
  ```

  ```typescript TypeScript theme={null}
  const children = await client.documents.listDocumentChildren(job.jobId);

  for (const child of children) {
    console.log(`${child.fileName}: ${child.status}`);
  }
  ```

  ```go Go theme={null}
  children, err := client.Documents.ListDocumentChildren(ctx, job.JobID)
  if err != nil {
      log.Fatal(err)
  }

  for _, child := range children {
      fmt.Printf("%s: %s\n", child.FileName, child.Status)
  }
  ```

  ```bash CLI theme={null}
  meibel documents list-children "$JOB_ID"
  ```
</CodeGroup>

## Stream trace events

Stream real-time processing events for a document job. Trace events provide fine-grained progress updates such as page extraction, OCR steps, and content classification.

<CodeGroup>
  ```python Python theme={null}
  for event in client.documents.stream_document_trace(job_id=job.job_id):
      print(f"[{event.type}] {event.message}")
  ```

  ```typescript TypeScript theme={null}
  for await (const event of client.documents.streamDocumentTrace(job.jobId)) {
    console.log(`[${event.type}] ${event.message}`);
  }
  ```

  ```go Go theme={null}
  stream, err := client.Documents.StreamDocumentTrace(ctx, job.JobID)
  if err != nil {
      log.Fatal(err)
  }

  for event := range stream.Events() {
      fmt.Printf("[%s] %s\n", event.Type, event.Message)
  }
  if err := stream.Err(); err != nil {
      log.Fatal(err)
  }
  ```

  ```bash CLI theme={null}
  meibel documents stream-trace "$JOB_ID"
  ```
</CodeGroup>

Trace events are delivered as Server-Sent Events (SSE). Each event includes a `type` (e.g., `"progress"`, `"page_extracted"`, `"complete"`) and a human-readable `message`.
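The SDKs handle SSE parsing for you, but if you consume the stream with a plain HTTP client it helps to see how raw SSE lines map onto `type` and `message`. The sketch below follows the standard SSE framing (`event:` and `data:` fields, with a blank line ending each event). It assumes, without confirmation from the API reference, that the event name carries the `type` and that the data payload is a JSON object with a `message` key; `parse_sse` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield {"type", "message"} dicts from raw SSE lines.

    Assumption: each event's data is a JSON object with a "message" key.
    """
    event_type, data_lines = "message", []  # "message" is the SSE default event name
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                payload = json.loads("\n".join(data_lines))
                yield {"type": event_type, "message": payload.get("message", "")}
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops a single leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
```

Feeding it the lines of a streaming HTTP response yields one dictionary per event, so a consumer can stop reading once it sees an event whose `type` is `"complete"`.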
