Multimodal Capabilities

Chat’s multimodal capabilities let you work with images and audio alongside text. This guide explains how to use these capabilities effectively.

Understanding Multimodal AI

What is Multimodal AI?

Multimodal AI can process and understand multiple types of information (modalities), including:

Text (natural language)
Images and video
Audio and speech
Charts and diagrams
Structured data

This allows for more comprehensive understanding and analysis across different forms of content.

Working with Images

Uploading Images

Access image upload

Click the paperclip icon at the bottom-left of the message composer and pick an image, or drag and drop an image directly into the chat.

[Screenshot: Chat file picker open over the conversation view]

Supported image formats include:

PNG
JPEG/JPG
GIF (static)
WebP
BMP
SVG (as an image; code parsing may be limited)
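If you prepare many files for upload, a quick check against the format list above can save a failed attempt. The following is a minimal sketch, not part of the product: the helper name `is_supported_image` is hypothetical, and it only compares file extensions against the formats listed above, so it cannot inspect file contents or tell a static GIF from an animated one.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above.
SUPPORTED_IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".svg"}


def is_supported_image(filename: str) -> bool:
    """Return True if the file extension matches a listed image format.

    Hypothetical helper: it checks the extension only, not the file contents.
    """
    return Path(filename).suffix.lower() in SUPPORTED_IMAGE_EXTENSIONS


for name in ["diagram.PNG", "scan.tiff", "logo.svg"]:
    status = "ok" if is_supported_image(name) else "convert before uploading"
    print(f"{name}: {status}")
```

The comparison is case-insensitive, so `diagram.PNG` passes, while a format that is not in the list, such as TIFF, is flagged for conversion.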

Add context (optional)

After uploading an image, you can provide additional context or specific questions about the image. Providing context can help guide the AI’s analysis and generate more relevant responses.

Submit for analysis

Send your message with the image to have the AI process and analyze it. The AI will acknowledge the image and provide an initial response based on its content.

Types of Image Analysis

General Image Description

Get a comprehensive description of what’s in an image

Text Extraction (OCR)

Extract and process text visible in images

Chart and Graph Analysis

Interpret data visualizations and extract insights

Technical Diagram Interpretation

Understand flowcharts, network diagrams, and technical illustrations

Document Analysis

Process documents that contain both text and visual elements

UI/UX Analysis

Evaluate screenshots of user interfaces

Content Categorization

Identify the type and category of visual content

Object and Entity Recognition

Identify specific objects and entities within images

Example Prompts for Image Analysis

Try these prompts after uploading an image:

Describe what you see in this image in detail.

Extract all the text visible in this image.

What data does this chart show? Summarize the key trends.

Explain this technical diagram and how the components interact.

Identify any problems with this user interface design.

Is there any personal or sensitive information in this image?

Create a table of all the products and prices shown in this image.

What are the key elements of this logo design?

Advanced Image Interactions

Reference Specific Parts of Images

Direct the AI’s attention to particular areas or elements.

Example prompts:

What is shown in the upper left corner of the image?

Can you describe the object in the center of the photo?

What does the graph line indicate between points A and B?

What text appears in the red box in this screenshot?

For best results, describe the location clearly when referring to specific parts of an image.

Compare Multiple Images

Upload several images to analyze similarities, differences, or relationships.

Example prompts:

What are the main differences between these two diagrams?

How has the design evolved between the first and second version?

Compare the data shown in these two charts.

Which of these four logo designs best communicates professionalism?

You can reference images by their order (“first image,” “second image”) or by describing their distinctive features.

Sequential Image Analysis

Build on previous image analysis in a conversation.

Example conversation flow:

[Upload image of a chart]
User: What trends does this sales chart show?
AI: [Provides analysis of sales trends]

[Upload image of another chart]
User: How do these results compare to the previous chart?
AI: [Compares both charts, referencing its earlier analysis]

User: What might explain the difference in Q3 results?
AI: [Provides potential explanations based on both images]

The AI maintains context from previous images throughout the conversation.

Specific Use Cases

Text Extraction (OCR)

Extract and work with text from images:

Upload an image containing text

This can include:

Scanned documents
Photos of printed materials
Screenshots with text
Whiteboards and handwritten notes (with limitations)

Request text extraction

Ask the AI to extract the text with prompts like:

Extract all the text from this image.

Transcribe the content of this document.

What text appears on this slide?

Create a digital version of this handwritten note.

Work with the extracted text

Once the text is extracted, you can ask the AI to:

Summarize the content
Answer questions about the text
Format or structure the information
Translate the extracted text
Find specific information within it

OCR performance varies based on:

Text clarity and image quality
Font type and size
Background contrast
Image resolution

For best results, use clear, high-resolution images with good lighting and contrast.

Chart and Graph Analysis

Get insights from data visualizations:

Upload a chart or graph

Supported chart types include:

Bar charts and histograms
Line graphs
Pie and donut charts
Scatter plots
Area charts
Combined visualizations

Ask for analysis

Request insights with prompts like:

What trends does this chart show?

Summarize the key findings from this graph.

What's the highest value in this chart and when did it occur?

Compare the performance of different categories in this chart.

Extract the approximate data values from this visualization.

Explore specific aspects

Dive deeper with follow-up questions:

Why might there be a spike in July?

Is there a correlation between these variables?

What's the growth rate between 2020 and 2022?

Which segment is performing below average?
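Values read from a chart image are approximate, so it is worth recomputing figures such as a growth rate yourself before relying on them. The sketch below shows one way to do that; the function names and the two sample values are hypothetical and stand in for numbers you would ask the AI to extract.

```python
def percent_change(start: float, end: float) -> float:
    """Total percentage change from start to end."""
    if start == 0:
        raise ValueError("start value must be non-zero")
    return (end - start) / start * 100


def annualized_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate (CAGR) in percent over `years` years."""
    if start <= 0 or end < 0 or years <= 0:
        raise ValueError("need start > 0, end >= 0, and years > 0")
    return ((end / start) ** (1 / years) - 1) * 100


# Hypothetical values read off a chart for 2020 and 2022.
value_2020, value_2022 = 100.0, 121.0
print(f"Total change: {percent_change(value_2020, value_2022):.1f}%")
print(f"Annualized:   {annualized_growth(value_2020, value_2022, 2):.1f}%")
```

The two functions answer different questions: the first gives the overall change across the period, the second the equivalent steady yearly rate. If the AI's answer does not say which one it reports, ask it to clarify.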

Technical Diagram Interpretation

Understand complex visual information:

Upload a technical diagram

Works with various diagram types:

Flowcharts and process diagrams
Network and system architectures
UML diagrams
Circuit diagrams
Engineering schematics
Entity-relationship diagrams

Request explanation

Get comprehensive interpretations with prompts like:

Explain how this system works based on the diagram.

Describe the workflow shown in this flowchart.

What components are in this architecture and how do they interact?

Identify potential bottlenecks in this process diagram.

Translate this technical diagram into a written explanation.

Ask for specific details

Focus on particular elements:

What happens in the exception handling path?

How does data flow between the database and the API layer?

What security measures are visible in this network diagram?

What does this specific symbol/notation mean?

UI/UX Analysis

Evaluate and improve user interfaces:

Upload UI screenshots

Analyze various UI elements:

Website pages
Mobile app screens
Software interfaces
Design mockups
Forms and interactive elements

Request design analysis

Get UX insights with prompts like:

Evaluate this interface design for usability issues.

What improvements could be made to this form?

Is this design accessible? What could be improved?

Analyze the visual hierarchy of this page.

How could this UI be simplified while maintaining functionality?

Focus on specific aspects

Target particular design elements:

Is the call-to-action button prominent enough?

How could the navigation be improved?

Analyze the color scheme and contrast ratios.

Is the information architecture intuitive?

What mobile optimization issues do you see?

Working with Audio

Chat can also process audio content when you use a compatible multimodal model.

Uploading Audio

Access audio upload

Click the upload button (📎) in the message input area and select an audio file, or drag and drop directly into the chat.

Supported audio formats typically include:

MP3
WAV
M4A
OGG
FLAC

Add context (optional)

Provide additional information about the audio to guide the AI’s analysis:

This is a recording of our team meeting from yesterday.

This is a customer support call that needs summarizing.

This is a voice memo about project ideas I recorded.

This is an interview for transcription and analysis.

Submit for processing

Send your message with the audio file to have the AI process it. The AI will acknowledge the audio and provide a response based on its content.

Audio Analysis Capabilities

Transcription

Convert spoken content to written text

Meeting Summarization

Extract key points and action items from recordings

Translation

Transcribe and translate audio to different languages

Speaker Identification

Distinguish between different speakers (with limitations)

Content Analysis

Identify topics, themes, and sentiments in spoken content

Q&A on Audio Content

Answer questions based on information in the audio

Example Prompts for Audio Analysis

Try these prompts after uploading an audio file:

Transcribe this audio recording.

Summarize the key points from this meeting.

What action items were mentioned in this recording?

Translate this speech to French.

Identify the main topics discussed in this conversation.

Create a timeline of events mentioned in this recording.

What was the sentiment of the speakers in this discussion?

Extract all the numbers and statistics mentioned.

Audio Transcription and Processing

Basic Transcription

Convert speech to text with various options:

Verbatim transcription (including filler words, pauses)
Clean transcription (removing stutters, false starts)
Timestamped transcription
Speaker-attributed transcription (where possible)

Example prompts:

Provide a verbatim transcription of this audio.

Transcribe this recording with timestamps every 30 seconds.

Create a clean transcription removing filler words and stutters.

Transcribe and identify different speakers if possible.

Meeting Summarization

Extract structured information from meetings:

Key discussion points
Decisions made
Action items and owners
Follow-up questions
Deadlines mentioned

Example prompts:

Summarize this meeting recording in bullet points.

Extract all action items and their owners from this meeting.

What decisions were made in this discussion?

Create a structured summary with sections for context, discussion, decisions, and next steps.

Content Analysis

Analyze the substance and characteristics of audio:

Topic identification
Sentiment analysis
Key information extraction
Tone and style assessment
Pattern recognition

Example prompts:

What are the main topics covered in this recording?

Analyze the speaker's tone and sentiment throughout.

Extract all numerical data and statistics mentioned.

Identify any technical terms used and provide explanations.

Audio Generation

Some multimodal models may offer limited audio generation capabilities. Audio generation features:

Are typically more limited than image generation
May only be available with specific models
Often have restrictions on duration and complexity
May be in experimental phases depending on your organization’s Prisme.ai version

Check with your administrator about the availability of audio generation features in your environment.

Best Practices for Multimodal Work

Use High-Quality Media

Provide clear, well-lit images and clean audio recordings for best results.

Be Specific in Prompts

Clearly describe what aspects of the media you want the AI to focus on.

Combine Modalities Strategically

Use multiple media types together when they complement each other.

Verify Critical Information

Double-check important details extracted from images or audio.

Consider Privacy and Sensitivity

Be mindful of sensitive content in uploaded media, especially with faces or personal information.

Use Canvas for Complex Work

Leverage Canvas for more sophisticated editing and organization of multimodal content.

Save Intermediate Results

Export or save important outputs, especially for large media files that may be processed again.

Provide Context

Add explanatory text when uploading media to guide the AI’s understanding.

Troubleshooting Multimodal Issues

Image not being processed

If the AI doesn’t properly analyze your image:

Check that you’re using a multimodal-capable model
Verify the image format is supported
Ensure the image isn’t too large (try compressing)
Check that the image uploaded completely
Try describing what’s in the image as context
For complex images, try focusing on specific parts

Poor image analysis quality

If image analysis results are inaccurate or vague:

Improve image quality (resolution, lighting, focus)
Try a different multimodal model if available
Be more specific in your prompts
For text extraction, ensure text is clear and readable
For charts, make sure data points and labels are visible
Try cropping the image to focus on the relevant part
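To judge whether resolution is the likely cause, you can read an image's pixel dimensions before uploading it. The sketch below does this for PNG files only, using the fixed layout of the PNG header. The helper names and the 512-pixel threshold are assumptions for illustration; this guide does not state a minimum resolution.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> Tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of a PNG file.

    Only the first 24 bytes of the file are needed.
    """
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a valid PNG header")
    # Width and height are two big-endian 32-bit integers at bytes 16-23.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def is_large_enough(data: bytes, min_side: int = 512) -> bool:
    """Check the shorter side against min_side (an assumed threshold)."""
    return min(png_dimensions(data)) >= min_side
```

For a file on disk, pass in the first 24 bytes, for example `is_large_enough(open("chart.png", "rb").read(24))`, where `chart.png` is a placeholder filename. Other formats store their dimensions differently and would need an image library.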

Audio processing issues

If audio isn’t being transcribed correctly:

Check audio quality and reduce background noise if possible
Verify the audio format is supported
Try shorter audio segments for complex recordings
Provide context about speakers, topic, or terminology
For non-English audio, specify the language
Try a model specifically optimized for audio if available
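One way to act on the "shorter audio segments" suggestion is to split a long recording before uploading it. The following is a minimal sketch under stated assumptions: it handles uncompressed WAV files only, using Python's standard `wave` module, and the function name, output naming scheme, and 60-second default are hypothetical choices. Formats such as MP3 or M4A would need a separate audio tool.

```python
import wave
from pathlib import Path
from typing import List


def split_wav(source: str, out_dir: str, segment_seconds: int = 60) -> List[Path]:
    """Split a WAV file into consecutive segments of at most segment_seconds."""
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    segments: List[Path] = []
    with wave.open(source, "rb") as reader:
        params = reader.getparams()
        frames_per_segment = reader.getframerate() * segment_seconds
        index = 0
        while True:
            frames = reader.readframes(frames_per_segment)
            if not frames:
                break
            target = out / f"{Path(source).stem}_part{index:03d}.wav"
            with wave.open(str(target), "wb") as writer:
                # Copy channel count, sample width, and frame rate; the frame
                # count in the header is corrected when the file is closed.
                writer.setparams(params)
                writer.writeframes(frames)
            segments.append(target)
            index += 1
    return segments
```

Calling `split_wav("meeting.wav", "parts", segment_seconds=300)` would write five-minute files named `meeting_part000.wav`, `meeting_part001.wav`, and so on (the filenames here are placeholders). Splitting on fixed intervals can cut a sentence in half, so mention in your prompt that each file continues the previous one.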

Image generation not working

If you can’t generate images or results are poor:

Verify your model supports image generation
Check if generation features are enabled in your instance
Be more specific and detailed in your description
Break complex images into simpler requests
Try different styles or approaches
Be aware of content policy restrictions

Privacy and Security Considerations

When working with multimodal content, be mindful of:

Personal Information: Avoid uploading images or audio with personally identifiable information (PII) unless necessary and permitted by your organization’s policies.
Confidential Content: Consider the sensitivity of visual information in screenshots, diagrams, or documents.
Consent: Ensure you have appropriate permissions when uploading media that includes other people, especially for audio recordings of conversations or meetings.
Data Retention: Understand your organization’s policies regarding how long uploaded media is retained in the system.

Prisme.ai implements several security measures for multimodal content:

Next Steps

Now that you understand the multimodal capabilities in Chat, explore these related features:

Document Handling

Work with complex documents containing text and images

Canvas

Create rich content incorporating visual elements

Conversation Management

Organize conversations with multimodal content