Seedance2-Skill: Write Video Prompts That Actually Work

Smars
Agent Skills, Open Source
06 Jun, 2026

You open an AI video tool. You type “cinematic shot of a woman walking through a neon-lit street.” You hit generate.

The result: a woman walking, yes. But the lighting is flat, the camera is locked in place, and the neon looks like a cheap filter from 2018.

AI video generation has a syntax problem. The models are capable of incredible things — dynamic camera moves, precise scene composition, audio-visual sync — but they’re waiting for prompts written in a language most people don’t speak.

Seedance2-Skill solves that. It teaches your AI agent the full prompt language of ByteDance’s Seedance 2.0 — the @ reference system, camera terminology, scene patterns, and templates for real-world production scenarios.

What You Get

This skill is a prompt-writing guide packaged for your AI agent. It doesn’t generate video directly. It teaches the agent how to write prompts that produce the result you want.

@ reference syntax: the core mechanism that tells Seedance 2.0 what each uploaded image, video, or audio clip is supposed to do — first frame, character reference, camera motion template, background music
Camera language: push, pull, pan, tilt, follow, orbit, Hitchcock zoom, POV — the full vocabulary of cinematography mapped to prompt terms the model understands
Scene patterns: 12+ ready-to-use templates for ads, short dramas, music videos, educational content, product showcases, dance videos, and more
Input constraints: what image/video/audio formats and counts are allowed, so the agent doesn’t suggest impossible combinations

What Is Seedance 2.0?

Seedance 2.0 is ByteDance’s multimodal AI video generation model (branded as 即梦 / Jimeng). It takes four input types (images, videos, audio, and text) and produces 4- to 15-second videos at up to 720p. It handles character consistency, camera motion imitation, creative effects, and audio-visual synchronization.

Unlike text-only video generators, Seedance 2.0 expects you to upload reference material and describe how each piece should be used. That’s the prompt language problem.

The @ Reference System

This is the key insight the skill teaches. Every uploaded file gets a numbered reference: @image1, @video2, @audio1. You then assign each one a specific role:

@image1 as first frame: this is where the video starts
@image2 for character appearance: the model anchors on this face/outfit
@video1 for camera motion and action choreography: replicate the movement
@audio1 for background music (BGM): score the scene

Without this syntax, the model treats your uploads as loose inspiration. With it, you get granular control over what happens to each element.

A typical composed prompt:

@image1 character as the protagonist, follow @video1’s camera motion and action choreography, BGM from @audio1, setting from @image2, one continuous take

The skill also catches common mistakes: vague references (“refer to @video1”: refer to what, exactly?), conflicting camera instructions, and overloading a 5-second clip with 12 scene changes.
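One of these checks is easy to picture in code. The sketch below is not part of the skill (which is a prompt-writing guide, not a program); it is a hypothetical lint, assuming references always look like @image1, @video2, or @audio1 as described above. The function names and messages are made up for illustration.

```python
import re

# Assumed reference shape, taken from the examples above: @image1, @video2, @audio1.
REF_PATTERN = re.compile(r"@(image|video|audio)(\d+)")


def find_references(prompt: str) -> list[str]:
    """Return each @ reference in the prompt once, in order of first appearance."""
    seen: list[str] = []
    for match in REF_PATTERN.finditer(prompt):
        ref = match.group(0)
        if ref not in seen:
            seen.append(ref)
    return seen


def check_references(prompt: str, uploaded: set[str]) -> list[str]:
    """Hypothetical lint: compare references in the prompt with the uploaded files."""
    problems = []
    used = find_references(prompt)
    # A reference with no matching upload cannot be resolved.
    for ref in used:
        if ref not in uploaded:
            problems.append(f"{ref} is referenced but was not uploaded")
    # An upload that never appears in the prompt has no assigned role.
    for ref in sorted(uploaded - set(used)):
        problems.append(f"{ref} was uploaded but has no role in the prompt")
    return problems


prompt = (
    "@image1 character as the protagonist, follow @video1's camera motion, "
    "BGM from @audio1, setting from @image2"
)
print(check_references(prompt, {"@image1", "@video1", "@audio2"}))
```

A check like this only catches missing or unused references. Judging whether a role description is too vague still takes the agent’s reading of the prompt.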

Prompt Structure

A Seedance 2.0 prompt follows a template:

[Subject/character] + [Setting/environment] + [Action/motion] +
[Camera language] + [Timed segments] + [Transitions/effects] +
[Audio/sound design] + [Style/atmosphere]
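To make the slot order concrete, here is a minimal sketch that treats the template as a data structure. The class and field names are illustrative assumptions, not something the skill defines, and the comma separator simply mirrors the composed prompt shown earlier.

```python
from dataclasses import dataclass, fields


@dataclass
class PromptParts:
    """One field per slot in the template above; the names are illustrative."""
    subject: str = ""
    setting: str = ""
    action: str = ""
    camera: str = ""
    timed_segments: str = ""
    transitions: str = ""
    audio: str = ""
    style: str = ""


def compose_prompt(parts: PromptParts) -> str:
    """Join the non-empty slots in template order, separated by commas."""
    values = (getattr(parts, f.name).strip() for f in fields(parts))
    return ", ".join(v for v in values if v)


parts = PromptParts(
    subject="@image1 character as the protagonist",
    camera="follow @video1's camera motion",
    audio="BGM from @audio1",
)
print(compose_prompt(parts))
```

Empty slots are skipped, so a short prompt can fill only the parts it needs while keeping the order fixed.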

For videos over 8 seconds, time-segmented prompts work best:

0–3s: [opening shot, camera motion, action]
3–6s: [mid-section development]
6–10s: [climax or key action]
10–15s: [closing, freeze frame, brand text]

This structure is what separates “a video of a product” from a polished product commercial. The skill teaches your agent to think in camera language, not just scene descriptions.
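The time-segmented layout can be sketched the same way. The helper below is hypothetical: it assumes segments are given as (start, end, description) tuples in whole seconds and should cover the clip with no gaps or overlaps, which is a reading of the 0–3s / 3–6s example rather than a documented Seedance 2.0 rule.

```python
def format_segments(segments: list[tuple[int, int, str]], total_seconds: int) -> str:
    """Render (start, end, description) tuples as 'start–end s: description' lines.

    Raises ValueError if the segments leave a gap, overlap, or do not end
    exactly at the clip length.
    """
    expected_start = 0
    lines = []
    for start, end, description in segments:
        if start != expected_start:
            raise ValueError(f"segment starts at {start}s, expected {expected_start}s")
        if end <= start:
            raise ValueError(f"segment {start}-{end}s has no duration")
        # En dash between the times, matching the layout shown above.
        lines.append(f"{start}–{end}s: {description}")
        expected_start = end
    if expected_start != total_seconds:
        raise ValueError(f"segments end at {expected_start}s, clip is {total_seconds}s")
    return "\n".join(lines)


print(format_segments(
    [
        (0, 3, "opening shot, slow push-in"),
        (3, 6, "mid-section development"),
        (6, 10, "climax or key action"),
    ],
    total_seconds=10,
))
```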

Scenario Patterns

The skill covers production-ready prompt patterns, not just theory:

Character consistency: anchor on a reference image to keep the same person across shots
Camera motion cloning: upload a reference video, extract its movement, apply it to your subject
Creative effects replication: clone transitions, visual effects, and ad styles from a reference
Video extension: extend an existing video forward or backward, continuing its style
Video editing: modify specific elements in an existing video — replace a character, change a hairstyle, add an object
Music beat sync: time visual cuts to an uploaded audio track’s rhythm
E-commerce product showcase: 360-degree rotation, component separation and reassembly, 3D rendering effects
Science education visualization: medical anatomy, molecular processes, instructional CGI
Short drama with dialogue: scripted scenes with voice lines, character blocking, and audio design

Each pattern is a template the agent fills in with your specifics.

Setup

Install in one command:

npx skills add dexhunter/seedance2-skill

Or manually copy the skill file:

mkdir -p ~/.claude/skills
cp SKILL.md ~/.claude/skills/seedance-prompt-en.md

Then describe what you want to make (a 15-second product ad, a short drama scene, a music video) and the agent will write the Seedance 2.0 prompt for you.

What It Doesn’t Do

This skill does not call Seedance 2.0’s API. It writes prompts. You still paste them into the tool (jimeng.jianying.com) and generate manually.

The skill also doesn’t cover everything Seedance 2.0 can do — it focuses on the most common and reliable patterns. Advanced edge cases may need prompt iteration.

Seedance 2.0 itself has constraints the skill respects: no realistic human faces in reference materials, maximum 12 files, 15-second video cap.
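Two of those limits are numeric, so they are easy to check before a prompt goes anywhere. The sketch below is only an illustration: the constants come from the figures quoted in this article (12 files, 4 to 15 seconds), and the function name and messages are invented here. The face restriction is left out because it cannot be checked by counting.

```python
MAX_FILES = 12     # "maximum 12 files", as stated above
MIN_SECONDS = 4    # "4 to 15-second videos", as stated earlier in the article
MAX_SECONDS = 15


def check_request(file_count: int, duration_seconds: int) -> list[str]:
    """Hypothetical pre-flight check against the limits quoted in this article."""
    problems = []
    if file_count > MAX_FILES:
        problems.append(f"{file_count} files uploaded, limit is {MAX_FILES}")
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        problems.append(
            f"{duration_seconds}s requested, allowed range is "
            f"{MIN_SECONDS}-{MAX_SECONDS}s"
        )
    return problems


print(check_request(file_count=14, duration_seconds=20))
```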

Skills Are More Than Code

The edulab skill from the last article showed how skills can extend agents into education. Seedance2-Skill shows a different pattern: skills as domain expertise.

This skill doesn’t run Python. It doesn’t render HTML. It teaches your agent a notation system — the @ reference syntax, the camera vocabulary, the scene templates — that it can then apply to produce professional-grade video prompts. The same agent that can write code can now also write a cinematography brief.

It’s the difference between an agent that knows “video generation exists” and one that can compose “@image1 as first frame, Hitchcock zoom on the character, 0–3s establishing shot, 3–8s action sequence, BGM from @audio1, 2.35:1 widescreen, filmic grade” and have it actually work.