Microsoft Copilot Mastery

Creative Generation Tools


Overview

Copilot can generate a wide range of multimedia content directly from text prompts, including:

  • images;
  • audio;
  • 3D models;
  • videos.

The quality of the output depends heavily on how clearly you define your prompt and intent.

Image Generation

You can create images by describing what you want in natural language.

A strong image prompt should include:

  • Goal — what the image should show;
  • Context — where or how it will be used (e.g. packaging, poster, branding);
  • Tone — style and mood (minimalist, playful, professional, etc.);
  • Source (optional) — artistic references or styles (e.g. Van Gogh, traditional ink painting).
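The four elements above can be kept together as a small template so that none is forgotten when writing a prompt. The following is only a sketch: the `ImagePrompt` class, its field names, and the sentence wording are hypothetical and are not part of Copilot. The resulting text is simply pasted into the chat.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImagePrompt:
    """Hypothetical helper that assembles a prompt from the four elements."""
    goal: str                      # what the image should show
    context: str                   # where or how it will be used
    tone: str                      # style and mood
    source: Optional[str] = None   # optional artistic reference

    def to_text(self) -> str:
        parts = [
            f"Create an image of {self.goal}.",
            f"It will be used for {self.context}.",
            f"The style should be {self.tone}.",
        ]
        # The source is optional, so only mention it when one was given.
        if self.source:
            parts.append(f"Draw inspiration from {self.source}.")
        return " ".join(parts)


prompt = ImagePrompt(
    goal="a ceramic tea cup with steam rising from it",
    context="packaging for a small tea brand",
    tone="minimalist and calm",
    source="traditional ink painting",
)
print(prompt.to_text())
```

Leaving out `source` produces a three-sentence prompt, which matches the lesson's point that an artistic reference is optional.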

Example Use Case

An image prompt can be refined iteratively:

  • the first version establishes a concept;
  • follow-up prompts adjust style, composition, or detail;
  • final outputs can be adapted for real-world use (e.g. packaging mockups).

Audio Generation

Copilot can generate spoken audio in three modes.

1. Emotive

  • one voice;
  • creative interpretation of the text;
  • emotion-focused delivery;
  • best for social content and expressive messaging.

2. Story

  • multiple voices;
  • dialogue-style narration;
  • best for storytelling and dramatized content.

3. Scripted

  • reads text exactly as written;
  • allows controlled tone and emotion;
  • best for marketing, training, and professional narration.

You can also choose different voice actors, adjust emotional tone (e.g. calm, dramatic, whisper-like), and refine output through iteration.
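As a rule of thumb, the choice between the three modes comes down to two questions: must the text be read exactly as written, and does the content need more than one voice? The sketch below encodes that reasoning. The function `pick_audio_mode` and the `AudioMode` enum are hypothetical names used for illustration, not a Copilot API, and giving exact wording priority over multiple voices is an assumption.

```python
from enum import Enum


class AudioMode(Enum):
    EMOTIVE = "Emotive"
    STORY = "Story"
    SCRIPTED = "Scripted"


def pick_audio_mode(exact_wording: bool, multiple_voices: bool) -> AudioMode:
    """Suggest a mode from the lesson's descriptions (illustrative only)."""
    # Scripted is the only mode described as reading text exactly as written.
    if exact_wording:
        return AudioMode.SCRIPTED
    # Story is the mode described as using multiple voices and dialogue.
    if multiple_voices:
        return AudioMode.STORY
    # Otherwise a single expressive voice fits: Emotive.
    return AudioMode.EMOTIVE


print(pick_audio_mode(exact_wording=True, multiple_voices=False).value)
print(pick_audio_mode(exact_wording=False, multiple_voices=True).value)
print(pick_audio_mode(exact_wording=False, multiple_voices=False).value)
```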

3D Model Generation

Copilot can turn images into 3D models.

Typical workflow:

  • use a clear image with a simple background;
  • upload it into Copilot 3D;
  • generate and preview the model;
  • rotate, inspect, and export in different formats.

Important: Images with a clean, plain background produce better results.

Video Generation

In supported versions, Copilot can generate videos from text prompts.

You can:

  • describe the scene and style;
  • include scripts for narration;
  • define tone (e.g. mystical, documentary, cinematic);
  • adjust settings like voice, music, and text overlays.

Video generation often requires iteration: the first version may not match your intent. You can change the tone, remove text, or adjust the style, then regenerate until the result matches your vision.
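One way to keep iterations organized is to treat each attempt as a revised copy of the previous brief, changing only the settings that missed the mark. This is a minimal sketch under stated assumptions: `VideoBrief`, its fields, and the prompt layout are hypothetical and do not correspond to any Copilot setting names. The text it prints is what you would type or paste into the chat.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VideoBrief:
    """Hypothetical record of what one video request asked for."""
    scene: str
    style: str
    tone: str
    narration: str = ""
    text_overlays: bool = True

    def to_prompt(self) -> str:
        lines = [
            f"Scene: {self.scene}",
            f"Style: {self.style}",
            f"Tone: {self.tone}",
        ]
        if self.narration:
            lines.append(f"Narration script: {self.narration}")
        lines.append("Text overlays: " + ("yes" if self.text_overlays else "none"))
        return "\n".join(lines)


first = VideoBrief(
    scene="a lighthouse at dawn",
    style="cinematic",
    tone="mystical",
    narration="The light has guided sailors home for a century.",
)

# Second attempt: keep the scene and script, change the tone, drop the text.
second = replace(first, tone="documentary", text_overlays=False)
print(second.to_prompt())
```

Because each revision is a separate object, earlier versions remain available for comparison if a later change makes the result worse.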

Key Insight

Copilot multimedia tools are iterative and prompt-driven. Better results come from:

  • clear creative direction;
  • strong prompt structure;
  • iterative refinement;
  • specific style and tone instructions.

1. What is the most important factor when generating images with Copilot?

2. What is the main difference between “Scripted” and “Emotive” audio modes?

3. Why is it recommended to use images with plain backgrounds for 3D model generation?



Section 1. Chapter 7
