Prompt Composer

Language:

Rate: 1.0x

Pitch: 0 st

Volume: 0 dB

Audio Source

hybrid · ctc/large/500k · WhisperX

Upload Audio

Prosody

Speed

Faster ↑ Slower ↓

Script

Style Tags

Click a slot to insert a non-spoken inline tag before that token

Tokenize text to place inline style tags

Timing

Gap durations between words (ms)

Import audio to see timing gaps

Annotations

Click a token above to add annotations

Tokenize text to see annotation slots

All lanes

Exporter

Model:

Voice:

Audio Profile:

Scene:

Director's Notes:

Sample Context:

Use the Style Tags lane for inline audio tags such as [whispers], [slow], [laughs], or [long pause]. Leave the override below blank to synthesize from the lane automatically. Gemini 3.1 Flash TTS handles these tags best, and tags should be separated with spoken text or punctuation instead of stacked back-to-back.
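As a rough illustration of the "not stacked back-to-back" rule above, the Python sketch below flags places where two bracketed tags are separated only by whitespace. The helper name `find_stacked_tags` and the regular expression are assumptions made for illustration; they are not part of the composer or of any TTS API.

```python
import re

# Hypothetical pattern: two bracketed tags with only whitespace between them.
STACKED_TAGS = re.compile(r"\[[^\[\]]+\]\s*\[[^\[\]]+\]")


def find_stacked_tags(transcript: str) -> list[str]:
    """Return each pair of adjacent inline tags with no spoken text or punctuation between them."""
    return [match.group(0) for match in STACKED_TAGS.finditer(transcript)]


separated = "[whispers] I have a secret. [long pause] Are you ready?"
stacked = "[whispers] [slow] I have a secret."

print(find_stacked_tags(separated))
print(find_stacked_tags(stacked))
```

A check like this could run before export so that a transcript with adjacent tags is corrected by hand rather than sent as-is.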

Tagged Transcript Override:

Veo Request

Model:

Seed:

Render Strategy:

Target Total Duration (s):

Silent Environment:

Final Result:

Kanade Model:

Negative Prompt:

Reference Images

Character Reference Images:

Upload Character Images

Character refs: none.

Advanced Generated Board Settings

Include control-lane board

Include audio-qualities board

Include mixed storyboard board

Script Overlay Mode:

Preview/export sends up to three reference images from the current browser session. Character references are prioritized before optional generated boards.
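The prioritization described above can be sketched in a few lines of Python. The function name `select_reference_images` and the use of plain strings as image identifiers are assumptions for illustration; the composer's actual data structures are not shown in this document.

```python
def select_reference_images(character_refs, generated_boards, limit=3):
    """Pick at most `limit` images, taking character references before generated boards."""
    ordered = list(character_refs) + list(generated_boards)
    return ordered[:limit]


# Two character images leave room for exactly one generated board.
print(select_reference_images(["char_a", "char_b"], ["control_lane", "storyboard"]))
```

With four or more character references, no generated board would be sent under this reading of the rule.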

Inspect Generated Boards

Veo reference image thumbnails will appear here.

Voice Style Direction

Voice Style Text:

Audio Sonic Qualities:

Imported Voice Profile:

Voice Continuity Lock:

Prompt Section Overrides

Subject:

Action:

Style:

Camera:

Composition:

Ambiance:

Region:

Voice:

Input Mode:

Audio Encoding:

Direction Layer For Auto SSML

Chirp 3 HD does not expose a separate hidden prompt field. These controls are merged into the auto-generated SSML so you can add more direction without losing the composer lanes.

Lead-In SSML / Spoken Context:

Tail SSML / Spoken Context:

Pitch Bias (%):

Rate Bias (%):

Volume Bias (dB):

Custom Pronunciations JSON:

Stage 1: Vertex Neural2

GCP Project ID:

Region:

Neural2 Voice:

Stage 2: ElevenLabs Voice Changer

Target Voice ID:

STS Model:

Output Format:

Input File Format:

Remove background noise

Enable logging

Optimize Streaming Latency:

Seed:

Voice Settings JSON:

Stage 1: Vertex Neural2

GCP Project ID:

Region:

Neural2 Voice:

Stage 2: Hosted Kanade VC

Reference Asset ID:

Reference Audio URL:

Local Reference Audio:

No local reference selected. Live mode J tests will use the reference URL.

Kanade Model:

Qwen3 Service Mode

Requested Mode:

Speaker:

Instruction / Voice Design Prompt:

Voice Clone Reference

Reference Audio URL:

Local Reference Audio:

Upload Reference

No local reference selected.

Reference Transcript:

Use x-vector only

Sampling

Top-k, top-p, repetition penalty, and max token count apply to CustomVoice and VoiceDesign requests.

Temperature:

Top-k:

Top-p:

Repetition Penalty:

Max New Tokens:

Reference Voice

Speaker Audio URL:

Local Speaker Audio:

Upload Speaker

IndexTTS requires a speaker reference.

Emotion

Emotion Mode:

Emotion Audio URL:

Upload Emotion

No separate emotion reference selected.

Emotion Weight:

Random emotion sampling

Emotion Prompt:

Emotion vector order: happy, angry, sad, afraid, disgusted, melancholic, surprised, calm.

Happy:

Angry:

Sad:

Afraid:

Disgusted:

Melancholic:

Surprised:

Calm:
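Because the emotion vector is positional, it can help to build it from named weights rather than by hand. The sketch below assumes unset emotions default to 0.0 and that the result is a plain list of floats; the function name `emotion_vector` is hypothetical, and the value range the service expects is not stated here.

```python
# Order taken from the note above: happy, angry, sad, afraid,
# disgusted, melancholic, surprised, calm.
EMOTION_ORDER = [
    "happy", "angry", "sad", "afraid",
    "disgusted", "melancholic", "surprised", "calm",
]


def emotion_vector(weights: dict[str, float]) -> list[float]:
    """Turn named emotion weights into an 8-element list in the documented order."""
    unknown = set(weights) - set(EMOTION_ORDER)
    if unknown:
        raise ValueError(f"unknown emotion names: {sorted(unknown)}")
    # Assumption: emotions that are not named get a weight of 0.0.
    return [float(weights.get(name, 0.0)) for name in EMOTION_ORDER]


print(emotion_vector({"sad": 0.6, "calm": 0.2}))
```

Rejecting unknown names catches typos such as "scared" for "afraid" before they silently produce an all-zero slot.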

Pronunciation

Inline Pinyin remains script-driven. Use glossary JSON when you need per-request term overrides.

Glossary JSON:

Sampling

Temperature:

Top-p:

Top-k:

Num Beams:

Repetition Penalty:

Length Penalty:

Max Mel Tokens:

Max Text Tokens / Segment:

Segment Silence (ms):

EchoTTS Endpoint

EchoTTS can target an OpenAI-compatible /v1/audio/speech service or the Voice DAW River endpoint at /warp_echo/warp_render. Leave blank to use ECHO_TTS_URL on the server.
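One way to read the fallback described above is sketched below in Python. It assumes the Custom Service URL is a base URL to which one of the two documented paths is appended, and that the server reads `ECHO_TTS_URL` from its environment; the function name `resolve_echo_endpoint` and the `use_river` flag are hypothetical.

```python
import os

OPENAI_SPEECH_PATH = "/v1/audio/speech"
RIVER_WARP_PATH = "/warp_echo/warp_render"


def resolve_echo_endpoint(custom_url, use_river, env=None):
    """Return the full endpoint URL, falling back to ECHO_TTS_URL when custom_url is blank."""
    env = os.environ if env is None else env
    base = (custom_url or "").strip() or env.get("ECHO_TTS_URL", "")
    if not base:
        raise ValueError("no EchoTTS URL configured")
    path = RIVER_WARP_PATH if use_river else OPENAI_SPEECH_PATH
    # Assumption: the configured value is a base URL without the endpoint path.
    return base.rstrip("/") + path


print(resolve_echo_endpoint("http://localhost:8000/", use_river=False, env={}))
```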

Custom Service URL:

Speech Request

Voice / Prompt Name:

Model:

Response Format:

Seed:

Reference Audio URL:

River warp EchoTTS uses this WAV as the source/reference audio and sends it as audio_b64. OpenAI-compatible EchoTTS ignores this field.
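For the River warp case, the WAV bytes have to be base64-encoded before they can travel in a JSON body. The sketch below shows only that encoding step; `audio_b64` is the field named above, while the `text` field and the function name `river_warp_body` are assumptions for illustration, since the full request schema is not given here.

```python
import base64
import json


def river_warp_body(wav_bytes: bytes, text: str) -> str:
    """Serialize a request body that carries the reference WAV as base64 text."""
    payload = {
        "audio_b64": base64.b64encode(wav_bytes).decode("ascii"),
        "text": text,  # hypothetical field name
    }
    return json.dumps(payload)


body = river_warp_body(b"RIFF....WAVE", "Hello there.")
decoded = base64.b64decode(json.loads(body)["audio_b64"])
print(decoded == b"RIFF....WAVE")
```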

Prompt Prefix:

Full Input Override:

extra_body JSON:

DramaBox Prompt

DramaBox supports English prompt-driven expressive TTS, optional voice reference cloning, laughs/sighs/sounds inside quoted dialogue, and stage directions outside quotes. It does not support IPA, SSML, voice IDs, or emotion vectors.
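The split between quoted dialogue and unquoted stage directions can be illustrated with a small Python sketch. The exact prompt layout DramaBox expects is not documented here, so the ordering below (speaker description, then direction, then quoted dialogue) and the function name `compose_dramabox_prompt` are assumptions.

```python
def compose_dramabox_prompt(speaker_description: str, dialogue: str, direction: str = "") -> str:
    """Join a speaker description, an optional stage direction, and quoted dialogue."""
    parts = [speaker_description.strip()]
    if direction.strip():
        # Stage directions stay outside the quotes.
        parts.append(direction.strip())
    # Laughs, sighs, and other sounds belong inside the quoted dialogue.
    parts.append(f'"{dialogue.strip()}"')
    return " ".join(parts)


print(compose_dramabox_prompt(
    "A tired woman in her forties.",
    "Oh, (sighs) not again.",
    direction="She speaks quietly, almost to herself.",
))
```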

Custom Service URL:

Speaker Description:

Extra Voice Direction:

Full Prompt Override:

Optional Voice Reference

Use a clean 10+ second clip when cloning timbre. Match the prompt's gender/age description to the reference.
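The length guideline above can be checked locally before upload. This sketch uses Python's standard `wave` module, so it only handles uncompressed WAV data; the function names and the 10-second threshold constant are illustrative, and the service may apply its own validation.

```python
import io
import wave

MIN_REFERENCE_SECONDS = 10.0  # taken from the "10+ second" guideline


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of an uncompressed WAV clip in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        return reader.getnframes() / reader.getframerate()


def is_long_enough(wav_bytes: bytes) -> bool:
    """True when the clip meets the minimum reference length."""
    return wav_duration_seconds(wav_bytes) >= MIN_REFERENCE_SECONDS
```

Duration is only one part of the guideline; whether a clip is "clean" still needs a listen.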

Reference Audio URL:

Local Reference Audio:

Upload Reference

No voice reference selected.

Inference

CFG Scale:

STG Scale:

Duration Multiplier:

Exact Duration (s):

Seed:

Reference Seconds:

Rescale:

Apply Perth watermark

Model & Toolchain Settings

Generated Veo Board

Kanade Reference Audio

Edit Token Annotation