Name this session and choose where it should live.
Assignment
Sign in to Directorkit first
The session will be saved in Prompt Composer's app database and mirrored into Directorkit.
Load Directorkit Session
Switch workspaces above to see sessions saved elsewhere.
Saved sessions will appear here after sign in.
Recent assigned sessions
Latest sessions in the selected workspace.
1.0x
0 st
0 dB
Audio Source
hybrid · ctc/large/500k · WhisperX
Prosody
Speed
Faster ↑ Slower ↓
Script
0 tokens
Auto-tokenizing...
Style Tags
Click a slot to insert a non-spoken inline tag before that token
Tokenize text to place inline style tags
Timing
Gap durations between words (ms)
Import audio to see timing gaps
Annotations
Click a token above to add annotations
Tokenize text to see annotation slots
All lanes
Exporter
Exporter details
⚠️ Selected voice does not support pitch adjustment
Use the Style Tags lane for inline audio tags such as [whispers], [slow], [laughs], or [long pause]. Leave the override below blank to synthesize from the lane automatically. Gemini 3.1 Flash TTS handles these tags best, and tags should be separated with spoken text or punctuation instead of stacked back-to-back.
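The separation rule above can be checked mechanically before synthesis. The following is a minimal sketch, assuming tags are written as square-bracketed text as in the examples; the function name `find_stacked_tags` is hypothetical and is not part of Prompt Composer or any TTS API.

```python
import re

# Assumption: an inline tag is any non-empty run of text inside square
# brackets, e.g. [whispers] or [long pause].
# The pattern matches a tag that is followed by another tag with nothing
# but optional whitespace in between.
STACKED_TAGS = re.compile(r"(\[[^\[\]]+\])\s*(?=\[[^\[\]]+\])")


def find_stacked_tags(script: str) -> list[str]:
    """Return every tag that is directly followed by another tag."""
    return [match.group(1) for match in STACKED_TAGS.finditer(script)]
```

An empty result means every tag is separated from the next by spoken text or punctuation. For example, `find_stacked_tags("[whispers] [slow] Hello.")` reports `[whispers]`, while `"[whispers] Hello, [slow] world."` passes.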
Veo Request
Experimental exporter for turning the current script, control lanes, and optional imported audio into Veo-oriented prompt text plus optional generated reference boards. Auto-chain splits longer scripts into sequential Veo clips; later clips reuse the previous ending frame for visual continuity. Audio-only modes still render Veo clips first, then extract and post-process the audio through Kanade.
Reference Images
Character images are local to this browser session and are sent before optional generated boards. They guide broad visual casting only, not identity matching or literal in-frame photos. Generated boards are off by default so character references stay primary; enable boards one at a time when you want abstract control-lane or audio-quality guidance.
Character refs: none.
Generated boards are off by default, especially when character references are uploaded. Use the no-board baseline first, then enable control, audio, or mixed boards one run at a time to compare whether they improve or degrade Veo quality.
Advanced Generated Board Settings
Preview/export sends up to three reference images from the current browser session. Character references are prioritized before optional generated boards.
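The prioritization described above amounts to an ordered merge with a cap. A minimal sketch, assuming both inputs are plain lists; `select_reference_images` and `MAX_REFERENCE_IMAGES` are hypothetical names for illustration, not the exporter's actual code.

```python
# "up to three reference images" per the note above
MAX_REFERENCE_IMAGES = 3


def select_reference_images(character_refs, generated_boards,
                            limit=MAX_REFERENCE_IMAGES):
    """Put character references first, then generated boards, and cap the total."""
    ordered = list(character_refs) + list(generated_boards)
    return ordered[:limit]
```

With two character references and two boards, only the first board is sent; with three or more character references, no board is sent at all.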
Optional diagnostic view. Generated board PNGs appear after preview/export refresh only when board generation is enabled. Use Inspect to open the full-size image or Download PNG to save an individual board.
Inspect Generated Boards
Veo reference image thumbnails will appear here.
Voice Style Direction
Use these fields to spell out target voice identity, delivery, timbre, mic feel, and continuity. Imported voice profiles can seed the description from audio analysis; edit them to lock a preferred description or clear them to fall back to generated analysis.
Prompt Section Overrides
These fields auto-seed from the generated Veo prompt after preview refresh. Edit a field to override that section, or clear it to fall back to the exporter's latest generated direction.
Direction Layer For Auto SSML
Chirp 3 HD does not expose a separate hidden prompt field. These controls are merged into the auto-generated SSML so you can add more direction without losing the composer lanes.
Stage 1: Vertex Neural2
Stage 2: ElevenLabs Voice Changer
Stage 1: Vertex Neural2
Stage 2: Hosted Kanade VC
Kanade options loading...
No local reference selected. Live mode J tests will use the reference URL.
Qwen3 Service Mode
Qwen3 service status loading...
Voice Clone Reference
No local reference selected.
Sampling
Top-k, top-p, repetition penalty, and max token count apply to CustomVoice and VoiceDesign requests.
Inline Pinyin remains script-driven. Use glossary JSON when you need per-request term overrides.
Sampling
DramaBox Prompt
DramaBox service status loading...
DramaBox supports English prompt-driven expressive TTS, optional voice reference cloning, laughs/sighs/sounds inside quoted dialogue, and stage directions outside quotes. It does not support IPA, SSML, voice IDs, or emotion vectors.
Optional Voice Reference
Use a clean 10+ second clip when cloning timbre. Match the prompt's gender/age description to the reference.
No voice reference selected.
Inference
0.50
0.75
Starts a live Veo generation with the current prompt manifest and generated reference boards. The boards are created in this browser session from the active project and optional imported audio.
Veo Run Inspector
Run a Veo generation to inspect submitted segments, server status payloads, and stage artifacts.
Raw status JSON
Uses the configured self-hosted service for the active exporter. Any selected local reference upload overrides the matching reference URL for this browser session.
Model & Toolchain Settings
ZIPA Workbench
Tune the import strategy, checkpoints, alignment, and merge thresholds without crowding the composer lanes.
Profile
Toolchain Status
Live readiness for phonemizer, ZIPA, Kanade, configured self-hosted TTS services, and ASR used by local imports and synthesis previews.
Runtime details will appear after the first status check.
-- Phonemizer: Checking...
-- ZIPA: Checking...
-- Kanade VC: Checking...
-- Qwen3 TTS: Checking...
-- IndexTTS2: Checking...
-- DramaBox: Checking...
-- Whisper ASR: Checking...
Import Diagnostics
Review the latest import settings, diagnostics, and benchmark guidance.
Research Diagnostics
No import yet
Import diagnostics will appear here.
Generated Veo Board
Kanade Reference Audio
Pick one or more clips to use as the Kanade target voice reference. The files are kept only in this browser session, normalized to mono 24 kHz WAV, then concatenated in the order shown below for live mode J tests.
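The normalize-then-concatenate step can be sketched in a few lines. This is an illustration under stated assumptions, not the app's implementation: clips are taken as already-decoded float frames, the resampler is a naive linear interpolation with no anti-alias filter, WAV encoding is left out, and every function name here is hypothetical.

```python
TARGET_RATE = 24_000  # 24 kHz, per the description above


def to_mono(frames):
    """Average the channels of each frame: [(left, right), ...] -> [mono, ...]."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Resample by linear interpolation (naive; no low-pass filtering)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate        # position in the source signal
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)   # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def build_reference(clips):
    """Concatenate clips, given as (frames, sample_rate) pairs, in list order."""
    reference = []
    for frames, rate in clips:
        reference.extend(resample_linear(to_mono(frames), rate))
    return reference
```

Because the clips are joined in list order, reordering the selection changes the resulting reference audio.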
No files selected.
No local reference clips selected yet.
Edit Token Annotation
Token:
------
Add or edit IPA above to replace individual phones.