Computer Vision
Fast OCR and visual analysis for images via a single synchronous call
Document Intelligence
Async, structured extraction for PDFs and complex layouts with prebuilt models
Prerequisites
- An Azure subscription with access to Azure AI services
- A Computer Vision (or Multi-Service Cognitive) resource and its endpoint + API key
- A Document Intelligence resource and its endpoint + API key
You can configure only one of the two services — each automation only requires
the config of the service it targets.
Installation
- Go to Apps in your workspace
- Search for Azure OCR and install it
- Open the app instance configuration and fill in the required fields
Configuration
| Field | Description |
|---|---|
| Computer Vision — Endpoint | Azure endpoint hostname, e.g. az-aismsa-xxx.cognitiveservices.azure.com |
| Computer Vision — API Key | API key, stored as the workspace secret computervisionApiKey |
| Document Intelligence — Endpoint | Azure endpoint hostname, e.g. az-di-xxx.cognitiveservices.azure.com |
| Document Intelligence — API Key | API key, stored as the workspace secret documentIntelligenceApiKey |
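The Endpoint fields expect a bare hostname. As a rough illustration of what reducing a pasted URL to that form involves, here is a minimal sketch; `normalize_endpoint` is a hypothetical helper written for this example, not the app's actual code.

```python
def normalize_endpoint(value: str) -> str:
    """Reduce an endpoint URL to a bare hostname (hypothetical helper)."""
    host = value.strip()
    # Drop the https:// scheme if present (case-insensitive match).
    if host.lower().startswith("https://"):
        host = host[len("https://"):]
    # Drop any trailing slashes.
    return host.rstrip("/")


# Both spellings reduce to the same hostname.
print(normalize_endpoint("https://xxx.cognitiveservices.azure.com/"))
print(normalize_endpoint("xxx.cognitiveservices.azure.com"))
```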
Endpoints are stored as hostnames only: the app automatically strips https:// and trailing slashes, so both https://xxx.cognitiveservices.azure.com/ and xxx.cognitiveservices.azure.com work.
Available Instructions
analyzeImage (Computer Vision)
Send an image URL to Azure Computer Vision and return the analysis result (OCR and optional visual features). This call is synchronous.
| Argument | Description |
|---|---|
| source* | Public URL of the image to analyze |
| features | Comma-separated features (default Read). Accepted values: Read, Caption, Tags, Objects, People |
| language | Language hint (default en, e.g. fr) |
API version: 2024-02-01.
The result contains the Azure response fields (readResult, captionResult, tagsResult, …).
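To make the arguments above concrete, the following sketch builds the kind of HTTP request that an Image Analysis 4.0 call uses. It is an assumption about what happens under the hood, based on Azure's public REST API, not the app's documented implementation: `build_analyze_image_request` is a hypothetical helper, and lowercasing the feature names to match Azure's REST examples is also an assumption. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Accepted values listed in the arguments table above.
ACCEPTED_FEATURES = {"Read", "Caption", "Tags", "Objects", "People"}


def build_analyze_image_request(endpoint, api_key, source,
                                features="Read", language="en"):
    """Return (url, headers, body) for an Image Analysis call (hypothetical helper)."""
    if not source:
        raise ValueError("source must not be empty")
    requested = [f.strip() for f in features.split(",") if f.strip()]
    unknown = [f for f in requested if f not in ACCEPTED_FEATURES]
    if unknown:
        raise ValueError(f"unsupported features: {unknown}")
    query = urlencode(
        {
            "api-version": "2024-02-01",
            # Assumption: Azure's REST examples spell features in lowercase.
            "features": ",".join(f.lower() for f in requested),
            "language": language,
        },
        safe=",",  # keep the comma-separated list readable in the URL
    )
    url = f"https://{endpoint}/computervision/imageanalysis:analyze?{query}"
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/json",
    }
    body = {"url": source}
    return url, headers, body


url, headers, body = build_analyze_image_request(
    "xxx.cognitiveservices.azure.com", "<api-key>",
    "https://example.com/scan.png", features="Read,Caption", language="fr",
)
print(url)
print(body)
```

Validating features before building the URL mirrors the accepted-values list, so an unsupported feature fails locally rather than as an Azure error.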
analyzeDocument (Document Intelligence)
Send a document (URL or base64) to Azure Document Intelligence, poll for the result, and return either a simplified or raw extraction. This automation handles the async submit-and-poll cycle internally (2s interval, 60s timeout).
| Argument | Description |
|---|---|
| source* | Document URL or base64 content |
| sourceType | url (default) or base64 |
| model | Document Intelligence model (default prebuilt-layout). See list below |
| pages | Pages to analyze, e.g. 1-3,5,7-9 |
| locale | Locale hint, e.g. fr-FR |
| outputContentFormat | text (default) or markdown |
| outputFormat | simple (default, post-processed) or raw (full Azure response) |
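The pages argument combines single pages and ranges in one comma-separated string. The sketch below shows how such a string expands, assuming 1-based page numbers and inclusive ranges; `expand_pages` is a hypothetical helper for illustration only, since the app passes the string to Azure as-is.

```python
def expand_pages(spec: str) -> list[int]:
    """Expand a pages string such as "1-3,5,7-9" into page numbers."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like "7-9" covers both ends.
            start, end = (int(p) for p in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part}")
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    return pages


print(expand_pages("1-3,5,7-9"))  # [1, 2, 3, 5, 7, 8, 9]
```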
| Model | Use case |
|---|---|
| prebuilt-read | Pure text extraction (fastest, lightweight) |
| prebuilt-layout | Text + tables + structure (default) |
| prebuilt-invoice | Structured invoice fields |
| prebuilt-receipt | Structured receipt fields |
| prebuilt-idDocument | Passports, IDs, driver’s licences |
| prebuilt-businessCard | Business card fields |
API version: 2024-11-30.
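The submit-and-poll cycle that analyzeDocument handles internally can be pictured with the sketch below. It is not the app's code: `poll_until_done` and `fetch_status` are hypothetical names, and the status values (notStarted, running, succeeded, failed) are those of Azure's long-running analyze operation. The 2s interval and 60s timeout match the values stated above.

```python
import time


def poll_until_done(fetch_status, interval=2.0, timeout=60.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports a terminal state or time runs out.

    fetch_status is a hypothetical callable returning a dict with a
    "status" key, as Azure's operation-result endpoint does.
    """
    deadline = clock() + timeout
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "succeeded":
            return result
        if status == "failed":
            raise RuntimeError("ProcessingFailed")
        # Stop if the next wait would run past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"polling exceeded {timeout}s")
        sleep(interval)


# Simulated operation that succeeds on the third poll; sleep is stubbed out.
responses = iter([
    {"status": "notStarted"},
    {"status": "running"},
    {"status": "succeeded", "analyzeResult": {"content": "hello"}},
])
result = poll_until_done(lambda: next(responses), sleep=lambda seconds: None)
print(result["analyzeResult"]["content"])  # hello
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting, which is a design choice of this sketch rather than something the source describes.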
URL source
Base64 source
Raw output
Set outputFormat: raw to bypass the internal simplification and receive the full analyzeResult payload from Azure (pages, tables, key-value pairs, styles, …).
Error Handling
Both instructions emit an error event and return a structured error payload instead of raising. The possible errors are:
| Error | Typical cause |
|---|---|
| ConfigurationError | Missing endpoint or API key in the app config |
| ValidationError | source argument is empty |
| NetworkError | Endpoint unreachable; check hostname format (no scheme, no path) |
| Unauthorized | Invalid or revoked API key |
| Forbidden | The resource exists but the key lacks access (Computer Vision only) |
| NotFound | prebuilt-<model> does not exist, or the Computer Vision resource is missing |
| RateLimited | Azure throttling; retry with back-off |
| ServerError | Azure 5xx |
| ProcessingFailed | Document Intelligence could not process the document (corrupted, password-protected, unsupported format) |
| Timeout | Document Intelligence polling exceeded 60s; try prebuilt-read or fewer pages |
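Because errors are returned rather than raised, a retry for RateLimited has to inspect the result. The sketch below shows one way to do that with exponential back-off. Everything in it is illustrative: `call_with_backoff` is a hypothetical helper, and the `{"error": "RateLimited"}` dictionary is an assumed stand-in, since the real payload shape is not reproduced here.

```python
import time


def call_with_backoff(call, is_rate_limited, max_attempts=4,
                      base_delay=1.0, sleep=time.sleep):
    """Re-run call() with exponential back-off while the result is rate limited.

    Returns the last result, even if it is still rate limited.
    """
    for attempt in range(max_attempts):
        result = call()
        if not is_rate_limited(result) or attempt == max_attempts - 1:
            return result
        # Wait 1s, 2s, 4s, ... before the next attempt.
        sleep(base_delay * 2 ** attempt)


# Simulated instruction that is throttled twice, then succeeds.
# The "error" key is an assumed payload field, not the documented shape.
attempts = iter([
    {"error": "RateLimited"},
    {"error": "RateLimited"},
    {"content": "ok"},
])
delays = []
result = call_with_backoff(
    lambda: next(attempts),
    is_rate_limited=lambda r: r.get("error") == "RateLimited",
    sleep=delays.append,  # record the delays instead of waiting
)
print(result, delays)  # {'content': 'ok'} [1.0, 2.0]
```

Passing the check as a predicate keeps the helper independent of the payload layout, so only the one lambda needs to change to match the real error fields.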
Listening for Errors
Because both instructions emit an error event on failure, you can globally monitor Azure OCR issues from another automation.
Example Use Cases
- Invoice OCR: extract structured invoice data from a PDF sent by email.
- ID Verification
- Image OCR
- Markdown Layout
Best Practices
- Hostname only: always strip https:// and the trailing slash from endpoints to avoid NetworkError.
- Pick the right model: prebuilt-read is ~3× faster than prebuilt-layout; use layout only when tables or structure matter.
- Prefer url over base64 for large documents: base64 uploads go through the Prisme.ai runtime and inflate the request size by ~33%.
- Handle the 60s timeout: for long PDFs, either reduce pages or split the document upstream.
- Use simple output for LLM pipelines (flattened text/markdown) and raw when you need Azure’s full structured result (bounding boxes, tables, key-value pairs).
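The ~33% figure for base64 follows from the encoding itself: every 3 bytes of input become 4 ASCII characters. A short check, using zero bytes as a stand-in for a real document (`base64_overhead` is a helper written for this example):

```python
import base64


def base64_overhead(raw: bytes) -> float:
    """Return the relative size increase caused by base64-encoding raw."""
    encoded = base64.b64encode(raw)
    return len(encoded) / len(raw) - 1


# 3,000,000 zero bytes stand in for a 3 MB PDF.
payload = bytes(3_000_000)
print(f"{base64_overhead(payload):.1%}")  # 33.3%
```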
External Resources
Azure Computer Vision
Image Analysis 4.0 reference
Azure Document Intelligence
Document Intelligence overview and prebuilt models
Prebuilt Models
Full list of prebuilt and custom models
API Keys & Endpoints
Create a Multi-Service Cognitive resource