AI SecureChat’s multimodal capabilities let you work with images, other visual content, and audio alongside text. This guide explains how to use these capabilities effectively.

Understanding Multimodal AI

Multimodal AI can process and understand multiple types of information (modalities), including:

  • Text (natural language)
  • Images and video
  • Audio and speech
  • Charts and diagrams
  • Structured data

This allows for more comprehensive understanding and analysis across different forms of content.

Working with Images

Uploading Images

1. Access image upload

Click the upload button (📎) in the message input area and select an image, or drag and drop an image directly into the chat.

Supported image formats include:

  • PNG
  • JPEG/JPG
  • GIF (static)
  • WebP
  • BMP
  • SVG (as an image; code parsing may be limited)

2. Add context (optional)

After uploading an image, you can provide additional context or specific questions about the image.

Providing context can help guide the AI’s analysis and generate more relevant responses.

3. Submit for analysis

Send your message with the image to have the AI process and analyze it.

The AI will acknowledge the image and provide an initial response based on its content.
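If you prepare several files before uploading, a quick local check against the format list above can catch unsupported files early. The following is a minimal Python sketch and not part of AI SecureChat: the function names are hypothetical, and the check looks only at the file extension, so it cannot tell whether a GIF is animated or whether the file contents really match the extension.

```python
from pathlib import Path

# Extensions taken from the supported image format list in this guide.
SUPPORTED_IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".svg"}


def is_supported_image(filename: str) -> bool:
    """Return True if the file extension matches a listed image format."""
    return Path(filename).suffix.lower() in SUPPORTED_IMAGE_EXTENSIONS


def split_by_support(filenames):
    """Partition filenames into (supported, unsupported), preserving order."""
    supported, unsupported = [], []
    for name in filenames:
        if is_supported_image(name):
            supported.append(name)
        else:
            unsupported.append(name)
    return supported, unsupported


# Hypothetical file names, for illustration only.
ok, rejected = split_by_support(["chart.PNG", "scan.tiff", "mockup.webp"])
print(ok)        # ['chart.PNG', 'mockup.webp']
print(rejected)  # ['scan.tiff']
```

Files in the rejected list would need converting to one of the listed formats before upload.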

Types of Image Analysis

General Image Description

Get a comprehensive description of what’s in an image

Text Extraction (OCR)

Extract and process text visible in images

Chart and Graph Analysis

Interpret data visualizations and extract insights

Technical Diagram Interpretation

Understand flowcharts, network diagrams, and technical illustrations

Document Analysis

Process documents that contain both text and visual elements

UI/UX Analysis

Evaluate screenshots of user interfaces

Content Categorization

Identify the type and category of visual content

Object and Entity Recognition

Identify specific objects and entities within images

Example Prompts for Image Analysis

Try these prompts after uploading an image:

Describe what you see in this image in detail.

Extract all the text visible in this image.

What data does this chart show? Summarize the key trends.

Explain this technical diagram and how the components interact.

Identify any problems with this user interface design.

Is there any personal or sensitive information in this image?

Create a table of all the products and prices shown in this image.

What are the key elements of this logo design?

Advanced Image Interactions

Specific Use Cases

Text Extraction (OCR)

Extract and work with text from images:

1. Upload an image containing text

This can include:

  • Scanned documents
  • Photos of printed materials
  • Screenshots with text
  • Whiteboards and handwritten notes (with limitations)

2. Request text extraction

Ask the AI to extract the text with prompts like:

Extract all the text from this image.

Transcribe the content of this document.

What text appears on this slide?

Create a digital version of this handwritten note.

3. Work with the extracted text

Once the text is extracted, you can ask the AI to:

  • Summarize the content
  • Answer questions about the text
  • Format or structure the information
  • Translate the extracted text
  • Find specific information within it

OCR performance varies based on:

  • Text clarity and image quality
  • Font type and size
  • Background contrast
  • Image resolution

For best results, use clear, high-resolution images with good lighting and contrast.

Chart and Graph Analysis

Get insights from data visualizations:

1. Upload a chart or graph

Various chart types are supported:

  • Bar charts and histograms
  • Line graphs
  • Pie and donut charts
  • Scatter plots
  • Area charts
  • Combined visualizations

2. Ask for analysis

Request insights with prompts like:

What trends does this chart show?

Summarize the key findings from this graph.

What's the highest value in this chart and when did it occur?

Compare the performance of different categories in this chart.

Extract the approximate data values from this visualization.

3. Explore specific aspects

Dive deeper with follow-up questions:

Why might there be a spike in July?

Is there a correlation between these variables?

What's the growth rate between 2020 and 2022?

Which segment is performing below average?
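Values the AI reads off a chart are approximate, so it can help to recompute figures such as a growth rate yourself. The sketch below shows the two usual definitions in Python: total growth over the period and the compound annual growth rate. The function names and the sample values (100 in 2020, 144 in 2022) are hypothetical and are not taken from any real chart.

```python
def total_growth_rate(start_value: float, end_value: float) -> float:
    """Overall growth from start to end, as a fraction (0.25 means 25%)."""
    if start_value == 0:
        raise ValueError("start_value must be non-zero")
    return (end_value - start_value) / start_value


def compound_annual_growth_rate(start_value: float, end_value: float, years: float) -> float:
    """Constant yearly rate that turns start_value into end_value over `years`."""
    if start_value <= 0 or end_value < 0 or years <= 0:
        raise ValueError("start_value and years must be positive, end_value non-negative")
    return (end_value / start_value) ** (1 / years) - 1


# Hypothetical values read from a chart: 100 in 2020 and 144 in 2022.
print(f"{total_growth_rate(100, 144):.1%}")               # 44.0%
print(f"{compound_annual_growth_rate(100, 144, 2):.1%}")  # 20.0%
```

If the AI reports a growth rate, asking which of these two definitions it used avoids comparing unlike numbers.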

Technical Diagram Interpretation

Understand complex visual information:

1. Upload a technical diagram

Works with various diagram types:

  • Flowcharts and process diagrams
  • Network and system architectures
  • UML diagrams
  • Circuit diagrams
  • Engineering schematics
  • Entity-relationship diagrams

2. Request explanation

Get comprehensive interpretations with prompts like:

Explain how this system works based on the diagram.

Describe the workflow shown in this flowchart.

What components are in this architecture and how do they interact?

Identify potential bottlenecks in this process diagram.

Translate this technical diagram into a written explanation.

3. Ask for specific details

Focus on particular elements:

What happens in the exception handling path?

How does data flow between the database and the API layer?

What security measures are visible in this network diagram?

What does this specific symbol/notation mean?

UI/UX Analysis

Evaluate and improve user interfaces:

1. Upload UI screenshots

Analyze various UI elements:

  • Website pages
  • Mobile app screens
  • Software interfaces
  • Design mockups
  • Forms and interactive elements

2. Request design analysis

Get UX insights with prompts like:

Evaluate this interface design for usability issues.

What improvements could be made to this form?

Is this design accessible? What could be improved?

Analyze the visual hierarchy of this page.

How could this UI be simplified while maintaining functionality?

3. Focus on specific aspects

Target particular design elements:

Is the call-to-action button prominent enough?

How could the navigation be improved?

Analyze the color scheme and contrast ratios.

Is the information architecture intuitive?

What mobile optimization issues do you see?
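A contrast ratio estimated from a screenshot can be checked exactly when you know the foreground and background colors. The sketch below implements the contrast ratio formula defined in WCAG 2.x (relative luminance of each sRGB color, then the ratio of the lighter to the darker). The function names and the sample hex colors are illustrative assumptions and do not come from AI SecureChat.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2.x definition)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#RRGGBB' color, from 0 (black) to 1 (white)."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError("expected a color in #RRGGBB form")
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


# Illustrative colors only.
print(round(contrast_ratio("#000000", "#FFFFFF"), 2))  # 21.0
print(round(contrast_ratio("#767676", "#FFFFFF"), 2))  # 4.54
```

WCAG level AA asks for at least 4.5:1 for normal-size text and 3:1 for large text, so the second pair passes narrowly for body text.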

Working with Audio

AI SecureChat can also process audio content with compatible multimodal models.

Uploading Audio

1. Access audio upload

Click the upload button (📎) in the message input area and select an audio file, or drag and drop directly into the chat.

Supported audio formats typically include:

  • MP3
  • WAV
  • M4A
  • OGG
  • FLAC

2. Add context (optional)

Provide additional information about the audio to guide the AI’s analysis:

This is a recording of our team meeting from yesterday.

This is a customer support call that needs summarizing.

This is a voice memo about project ideas I recorded.

This is an interview for transcription and analysis.

3. Submit for processing

Send your message with the audio file to have the AI process it.

The AI will acknowledge the audio and provide a response based on its content.

Audio Analysis Capabilities

Transcription

Convert spoken content to written text

Meeting Summarization

Extract key points and action items from recordings

Translation

Transcribe and translate audio to different languages

Speaker Identification

Distinguish between different speakers (with limitations)

Content Analysis

Identify topics, themes, and sentiments in spoken content

Q&A on Audio Content

Answer questions based on information in the audio

Example Prompts for Audio Analysis

Try these prompts after uploading an audio file:

Transcribe this audio recording.

Summarize the key points from this meeting.

What action items were mentioned in this recording?

Translate this speech to French.

Identify the main topics discussed in this conversation.

Create a timeline of events mentioned in this recording.

What was the sentiment of the speakers in this discussion?

Extract all the numbers and statistics mentioned.

Audio Transcription and Processing

Convert speech to text with various options:

  • Verbatim transcription (including filler words, pauses)
  • Clean transcription (removing stutters, false starts)
  • Timestamped transcription
  • Speaker-attributed transcription (where possible)

Example prompts:

Provide a verbatim transcription of this audio.

Transcribe this recording with timestamps every 30 seconds.

Create a clean transcription removing filler words and stutters.

Transcribe and identify different speakers if possible.
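When you request timestamps at a fixed interval, you can list the markers a complete transcript should contain and compare them with what the AI returns. This is a small Python sketch under assumptions: the function names are hypothetical, and the `HH:MM:SS` format is only one plausible layout, since the format the AI actually uses may differ.

```python
def format_timestamp(total_seconds: int) -> str:
    """Format a number of seconds as HH:MM:SS."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def expected_markers(duration_seconds: int, interval_seconds: int = 30) -> list[str]:
    """List the timestamps expected for a recording of the given length."""
    if duration_seconds < 0 or interval_seconds <= 0:
        raise ValueError("duration must be non-negative and interval positive")
    return [format_timestamp(t) for t in range(0, duration_seconds + 1, interval_seconds)]


# A hypothetical 95-second recording with a marker every 30 seconds.
print(expected_markers(95))
# ['00:00:00', '00:00:30', '00:01:00', '00:01:30']
```

A transcript that ends well before the last expected marker may have been cut off and is worth checking against the recording.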

Audio Generation

Some multimodal models may offer limited audio generation capabilities. These features:

  • Are typically more limited than image generation
  • May only be available with specific models
  • Often have restrictions on duration and complexity
  • May be experimental, depending on your organization’s Prisme.ai version

Check with your administrator about the availability of audio generation features in your environment.

Best Practices for Multimodal Work

Use High-Quality Media

Provide clear, well-lit images and clean audio recordings for best results.

Be Specific in Prompts

Clearly describe what aspects of the media you want the AI to focus on.

Combine Modalities Strategically

Use multiple media types together when they complement each other.

Verify Critical Information

Double-check important details extracted from images or audio.

Consider Privacy and Sensitivity

Be mindful of sensitive content in uploaded media, especially with faces or personal information.

Use Canvas for Complex Work

Leverage Canvas for more sophisticated editing and organization of multimodal content.

Save Intermediate Results

Export or save important outputs, especially for large media files that would otherwise need to be processed again.

Provide Context

Add explanatory text when uploading media to guide the AI’s understanding.

Troubleshooting Multimodal Issues

Privacy and Security Considerations

When working with multimodal content, be mindful of:

  • Personal Information: Avoid uploading images or audio with personally identifiable information (PII) unless necessary and permitted by your organization’s policies.

  • Confidential Content: Consider the sensitivity of visual information in screenshots, diagrams, or documents.

  • Consent: Ensure you have appropriate permissions when uploading media that includes other people, especially for audio recordings of conversations or meetings.

  • Data Retention: Understand your organization’s policies regarding how long uploaded media is retained in the system.

Prisme.ai implements several security measures for multimodal content:

Next Steps

Now that you understand the multimodal capabilities in AI SecureChat, explore these related features: