The Azure OCR app wraps two Azure AI services behind a single Prisme.ai app: Azure Computer Vision for lightweight image OCR (Read, Caption, Tags, Objects, People) and Azure Document Intelligence for structured extraction of documents (layouts, invoices, receipts, ID documents, …).

Computer Vision

Fast OCR and visual analysis for images via a single synchronous call

Document Intelligence

Async, structured extraction for PDFs and complex layouts with prebuilt models

Prerequisites

  • An Azure subscription with access to Azure AI services
  • A Computer Vision (or Multi-Service Cognitive) resource and its endpoint + API key
  • A Document Intelligence resource and its endpoint + API key
You can configure only one of the two services — each automation requires only the configuration of the service it targets.

Installation

  1. Go to Apps in your workspace
  2. Search for Azure OCR and install it
  3. Open the app instance configuration and fill in the required fields

Configuration

| Field | Description |
| --- | --- |
| Computer Vision — Endpoint | Azure endpoint hostname, e.g. az-aismsa-xxx.cognitiveservices.azure.com |
| Computer Vision — API Key | API key, stored as the workspace secret computervisionApiKey |
| Document Intelligence — Endpoint | Azure endpoint hostname, e.g. az-di-xxx.cognitiveservices.azure.com |
| Document Intelligence — API Key | API key, stored as the workspace secret documentIntelligenceApiKey |
Endpoints are stored as hostnames only — the app automatically strips https:// and trailing slashes, so both https://xxx.cognitiveservices.azure.com/ and xxx.cognitiveservices.azure.com work.

Available Instructions

analyzeImage (Computer Vision)

Send an image URL to Azure Computer Vision and return the analysis result (OCR and optional visual features). This call is synchronous.
| Argument | Description |
| --- | --- |
| source* | Public URL of the image to analyze |
| features | Comma-separated features (default Read). Accepted values: Read, Caption, Tags, Objects, People |
| language | Language hint (default en, e.g. fr) |
Uses the Image Analysis API version 2024-02-01.
- Azure OCR.analyzeImage:
    source: https://example.com/receipt.jpg
    features: Read,Caption,Tags
    language: fr
    output: result
Returns the raw Azure response body (readResult, captionResult, tagsResult, …).
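Downstream instructions can consume this result directly. A minimal sketch, assuming the Image Analysis 4.0 response shape (captionResult.text, readResult.blocks); the image.analyzed event name is illustrative:

```yaml
- Azure OCR.analyzeImage:
    source: https://example.com/photo.jpg
    features: Read,Caption
    output: result
# Forward the caption and OCR blocks to any listening automation
- emit:
    event: image.analyzed
    payload:
      caption: '{{result.captionResult.text}}'
      blocks: '{{result.readResult.blocks}}'
```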

analyzeDocument (Document Intelligence)

Send a document (URL or base64) to Azure Document Intelligence, poll for the result, and return either a simplified or raw extraction. This automation handles the async submit-and-poll cycle internally (2s interval, 60s timeout).
| Argument | Description |
| --- | --- |
| source* | Document URL or base64 content |
| sourceType | url (default) or base64 |
| model | Document Intelligence model (default prebuilt-layout). See list below |
| pages | Pages to analyze, e.g. 1-3,5,7-9 |
| locale | Locale hint, e.g. fr-FR |
| outputContentFormat | text (default) or markdown |
| outputFormat | simple (default, post-processed) or raw (full Azure response) |
Available prebuilt models:
| Model | Use case |
| --- | --- |
| prebuilt-read | Pure text extraction (fastest, lightweight) |
| prebuilt-layout | Text + tables + structure (default) |
| prebuilt-invoice | Structured invoice fields |
| prebuilt-receipt | Structured receipt fields |
| prebuilt-idDocument | Passports, IDs, driver’s licences |
| prebuilt-businessCard | Business card fields |
Uses the Document Intelligence API version 2024-11-30.

URL source

- Azure OCR.analyzeDocument:
    source: https://example.com/invoice.pdf
    model: prebuilt-invoice
    locale: fr-FR
    outputContentFormat: markdown
    output: invoice

Base64 source

- Azure OCR.analyzeDocument:
    source: '{{pdf_base64}}'
    sourceType: base64
    model: prebuilt-layout
    pages: 1-3
    output: layout

Raw output

Set outputFormat: raw to bypass the internal simplification and receive the full analyzeResult payload from Azure (pages, tables, key-value pairs, styles, …).
- Azure OCR.analyzeDocument:
    source: '{{document_url}}'
    model: prebuilt-layout
    outputFormat: raw
    output: raw_result

Error Handling

Both instructions emit an error event and return a structured error payload instead of raising. The payload shape is:
{
  "error": "ConfigurationError | ValidationError | NetworkError | AzureError | Unauthorized | Forbidden | NotFound | RateLimited | ServerError | ProcessingFailed | Timeout",
  "message": "Human-readable explanation",
  "details": {
    "service": "computervision | documentIntelligence",
    "automation": "analyzeImage | analyzeDocument",
    "status": 401,
    "body": {}
  }
}
| Error | Typical cause |
| --- | --- |
| ConfigurationError | Missing endpoint or API key in the app config |
| ValidationError | The source argument is empty |
| NetworkError | Endpoint unreachable — check hostname format (no scheme, no path) |
| Unauthorized | Invalid or revoked API key |
| Forbidden | The resource exists but the key lacks access (Computer Vision only) |
| NotFound | prebuilt-<model> does not exist, or the Computer Vision resource is missing |
| RateLimited | Azure throttling — retry with back-off |
| ServerError | Azure 5xx |
| ProcessingFailed | Document Intelligence could not process the document (corrupted, password-protected, unsupported format) |
| Timeout | Document Intelligence polling exceeded 60s — try prebuilt-read or fewer pages |
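Because errors are returned rather than raised, throttling can be handled inline. A minimal sketch using only constructs shown above; the ocr.retry event name is illustrative, and a separate automation listening on it can re-submit after a back-off:

```yaml
- Azure OCR.analyzeDocument:
    source: '{{document_url}}'
    model: prebuilt-read
    output: doc
# On throttling, hand the document off to a retry automation
- conditions:
    '{{doc.error}} == "RateLimited"':
      - emit:
          event: ocr.retry
          payload:
            source: '{{document_url}}'
```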

Listening for Errors

Because both instructions emit an error event on failure, you can globally monitor Azure OCR issues from another automation:
slug: onAzureOcrError
when:
  events:
    - error
do:
  - conditions:
      '{{payload.details.service}} == "documentIntelligence" || {{payload.details.service}} == "computervision"':
        - comment: Forward to monitoring
        - fetch:
            url: '{{config.monitoringWebhook}}'
            method: POST
            body: '{{payload}}'

Example Use Cases

Extract structured invoice data from a PDF sent by email.
- Azure OCR.analyzeDocument:
    source: '{{attachment.url}}'
    model: prebuilt-invoice
    locale: fr-FR
    output: invoice
- conditions:
    '{{invoice.error}}':
      - emit:
          event: invoice.ocr.failed
          payload: '{{invoice}}'
    default:
      - emit:
          event: invoice.parsed
          payload: '{{invoice}}'

Best Practices

  • Hostname only — the app strips https:// and trailing slashes automatically, but remove any path segment from the endpoint to avoid NetworkError.
  • Pick the right model — prebuilt-read is ~3× faster than prebuilt-layout; use layout only when tables or structure matter.
  • Prefer url over base64 for large documents — base64 uploads go through the Prisme.ai runtime and inflate the request size by ~33%.
  • Handle the 60s timeout — for long PDFs, either reduce pages or split the document upstream.
  • Use simple output for LLM pipelines (flattened text/markdown) and raw when you need Azure’s full structured result (bounding boxes, tables, key-value pairs).
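The timeout advice above can also be expressed as an inline fallback. A minimal sketch, assuming the structured error payload documented earlier, that retries with the faster prebuilt-read model when prebuilt-layout exceeds the 60s polling window:

```yaml
- Azure OCR.analyzeDocument:
    source: '{{document_url}}'
    model: prebuilt-layout
    output: doc
# Fall back to the lightweight model if polling timed out
- conditions:
    '{{doc.error}} == "Timeout"':
      - Azure OCR.analyzeDocument:
          source: '{{document_url}}'
          model: prebuilt-read
          output: doc
```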

External Resources

Azure Computer Vision

Image Analysis 4.0 reference

Azure Document Intelligence

Document Intelligence overview and prebuilt models

Prebuilt Models

Full list of prebuilt and custom models

API Keys & Endpoints

Create a Multi-Service Cognitive resource