Summary  
This chapter explains how to use AI-driven, prompt-based tools to generate and iteratively refine multimedia assets—including images, audio, 3D models, and videos—by defining clear goals, context, tone, and stylistic references.

General domain of usage  
Marketing and advertising

## Overview

Copilot can generate several kinds of multimedia content from text prompts or, for 3D models, from an uploaded image:

- images;
- audio;
- 3D models;
- videos.

The quality of the output depends heavily on how clearly you define your prompt and intent.

## Image Generation

You can create images by describing what you want in natural language.

A strong image prompt should include:

- **Goal** — what the image should show;
- **Context** — where or how it will be used (e.g. packaging, poster, branding);
- **Tone** — style and mood (minimalist, playful, professional, etc.);
- **Source** (optional) — artistic references or styles (e.g. Van Gogh, traditional ink painting).
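
The four elements above can be assembled into a single prompt before pasting it into Copilot. The following is a minimal sketch, not a Copilot API: the `ImagePrompt` class, its field names, the sentence templates, and the tea-tin example are all hypothetical and only illustrate one way to keep a prompt structured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImagePrompt:
    """Hypothetical container for the four prompt elements (not a Copilot type)."""
    goal: str                      # what the image should show
    context: str                   # where or how it will be used
    tone: str                      # style and mood
    source: Optional[str] = None   # optional artistic reference

    def to_text(self) -> str:
        # Join the elements into one natural-language prompt.
        parts = [
            f"Create an image of {self.goal}.",
            f"It will be used for {self.context}.",
            f"The style should be {self.tone}.",
        ]
        if self.source:
            parts.append(f"Draw inspiration from {self.source}.")
        return " ".join(parts)


# Illustrative values only; replace them with your own brief.
prompt = ImagePrompt(
    goal="a tea tin decorated with mountain scenery",
    context="product packaging",
    tone="minimalist and calm",
    source="traditional ink painting",
)
print(prompt.to_text())
```

Writing the elements as separate fields makes it easy to see which one is missing or vague before you generate anything.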


## Example Use Case

An image prompt can be refined iteratively:

- the first prompt generates a concept;
- follow-up prompts adjust style, composition, or detail;
- the final output can be adapted for real-world use (e.g. packaging mockups).
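
The refinement steps above can be tracked as a running list of prompts, which makes it easy to see how a concept evolved. This is a rough sketch under stated assumptions: the `refine` helper and the example prompts are hypothetical, and the code only records the text you would type into Copilot. It does not call Copilot.

```python
from typing import List


def refine(history: List[str], adjustment: str) -> List[str]:
    """Return a new history with one follow-up prompt appended (hypothetical helper)."""
    return history + [adjustment]


# Step 1: the first prompt produces a concept.
history = ["Create a concept image of a tea tin for product packaging."]

# Steps 2 and 3: follow-up prompts adjust style and composition.
history = refine(history, "Make the style more minimalist.")
history = refine(history, "Move the logo to the center of the lid.")

# Step 4: adapt the result for real-world use.
history = refine(history, "Show the design as a packaging mockup on a shelf.")

for step, text in enumerate(history, start=1):
    print(f"Step {step}: {text}")
```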


## Audio Generation

Copilot can generate spoken audio in three modes.

**1. Emotive**

- one voice;
- creative interpretation of the text;
- emotion-focused delivery;
- best for social content and expressive messaging.

**2. Story**

- multiple voices;
- dialogue-style narration;
- best for storytelling and dramatized content.

**3. Scripted**

- reads text exactly as written;
- allows controlled tone and emotion;
- best for marketing, training, and professional narration.

You can also choose different voice actors, adjust emotional tone (e.g. calm, dramatic, whisper-like), and refine output through iteration.
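
The choice between the three modes can be reduced to two questions: must the text be read exactly as written, and does it need more than one voice? The sketch below encodes that reasoning. The `AudioMode` enum and `choose_mode` function are hypothetical names for illustration, and giving Scripted priority when both answers are yes is an assumption, not documented Copilot behavior.

```python
from enum import Enum


class AudioMode(Enum):
    """The three modes described above (names mirror the section, not an API)."""
    EMOTIVE = "Emotive"
    STORY = "Story"
    SCRIPTED = "Scripted"


def choose_mode(verbatim: bool, multiple_voices: bool) -> AudioMode:
    """Suggest a mode from two yes/no questions about the content."""
    if verbatim:
        # Text must be read exactly as written (assumed to take priority).
        return AudioMode.SCRIPTED
    if multiple_voices:
        # Dialogue-style narration with several voices.
        return AudioMode.STORY
    # One voice with a creative, emotion-focused delivery.
    return AudioMode.EMOTIVE


print(choose_mode(verbatim=True, multiple_voices=False).value)   # training narration
print(choose_mode(verbatim=False, multiple_voices=True).value)   # dramatized story
print(choose_mode(verbatim=False, multiple_voices=False).value)  # social post
```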

## 3D Model Generation

Copilot can turn images into 3D models.

Typical workflow:

- use a clear image with a simple background;
- upload it into Copilot 3D;
- generate and preview the model;
- rotate, inspect, and export in different formats.

**Important:** Images with clean, simple backgrounds produce better results.

## Video Generation

Copilot (in supported versions) can generate videos from prompts.

You can:

- describe the scene and style;
- include scripts for narration;
- define tone (e.g. mystical, documentary, cinematic);
- adjust settings like voice, music, and text overlays.

Video generation often requires iteration, because the first version may not match your intent. You can change the tone, remove text, or adjust the style, then regenerate until the result matches your vision.
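
One way to keep video iterations manageable is to change only the settings that missed the mark and leave the rest alone. The sketch below models that with an immutable record and `dataclasses.replace`. The `VideoRequest` class, its fields, the sentence templates, and the example values are hypothetical. The code produces prompt text only and does not call Copilot.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class VideoRequest:
    """Hypothetical bundle of the prompt elements and settings listed above."""
    scene: str
    style: str
    tone: str
    script: Optional[str] = None   # narration text, if any
    text_overlays: bool = True     # whether on-screen text is wanted

    def to_text(self) -> str:
        parts = [
            f"Generate a {self.tone} video in a {self.style} style "
            f"showing {self.scene}."
        ]
        if self.script:
            parts.append(f'Narration script: "{self.script}"')
        if not self.text_overlays:
            parts.append("Do not include any on-screen text.")
        return " ".join(parts)


first = VideoRequest(
    scene="a tea plantation at sunrise",
    style="cinematic",
    tone="mystical",
    script="Every leaf begins in the morning mist.",
)

# The first result missed the intent: change the tone and drop the text,
# keeping the scene, style, and script unchanged.
second = replace(first, tone="documentary", text_overlays=False)

print(first.to_text())
print(second.to_text())
```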

## Key Insight

Copilot multimedia tools are iterative and prompt-driven. Better results come from:

- clear creative direction;
- strong prompt structure;
- iterative refinement;
- specific style and tone instructions.

## Review Questions

1. What is the most important factor when generating images with Copilot?
2. What is the main difference between “Scripted” and “Emotive” audio modes?
3. Why is it recommended to use images with plain backgrounds for 3D model generation?
