AI Analysis

StoryFab uses multiple AI models to analyze your video: Whisper for transcription, SmartSegmenter for scene understanding, and, in Commentary Mode, an LLM for semantic understanding.

Transcription (Whisper)

StoryFab runs OpenAI Whisper entirely locally. No audio is ever sent to the cloud.

Supported Models

Model	Size	Speed	Accuracy
`tiny`	~75 MB	10x realtime	Baseline
`base`	~140 MB	7x realtime	Good
`small`	~470 MB	4x realtime	Very Good
`medium`	~1.5 GB	2x realtime	Excellent

The default model is `base`. You can change it in Settings → AI → Whisper Model.
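The "Nx realtime" figures give a rough time budget: divide the audio length by the model's speed factor. The following is a minimal TypeScript sketch. It assumes the table's factors hold on your hardware, though actual throughput depends on your CPU/GPU. `estimateTranscriptionSeconds` is a hypothetical helper, not a StoryFab API.

```typescript
// Speed factors taken from the Supported Models table
// ("10x realtime" = 10 seconds of audio transcribed per second of processing).
type WhisperModel = "tiny" | "base" | "small" | "medium";

const SPEED_FACTOR: Record<WhisperModel, number> = {
  tiny: 10,
  base: 7,
  small: 4,
  medium: 2,
};

/** Rough wall-clock estimate (seconds) for transcribing `audioSeconds` of audio. */
function estimateTranscriptionSeconds(model: WhisperModel, audioSeconds: number): number {
  return audioSeconds / SPEED_FACTOR[model];
}

const videoSeconds = 600; // a 10-minute video
const models: WhisperModel[] = ["tiny", "base", "small", "medium"];
for (const m of models) {
  console.log(`${m}: ~${Math.round(estimateTranscriptionSeconds(m, videoSeconds))} s`);
}
```

For a 10-minute video this works out to about 60 s with `tiny`, 86 s with `base`, 150 s with `small`, and 300 s with `medium`.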

How Transcription Works

Audio is extracted from the video file via FFmpeg
Audio is split into 30-second chunks
Each chunk is fed to Whisper for transcription with timestamps
Results are merged into a continuous subtitle track (SRT format)
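Steps 2 and 4 (chunking, then merging per-chunk timestamps into one SRT track) can be sketched as follows. This sketch assumes fixed, non-overlapping 30-second windows and that Whisper's timestamps are relative to the start of each chunk. FFmpeg extraction and the Whisper call itself are omitted. The `Cue` type and all helper names are hypothetical.

```typescript
// Hypothetical cue shape; StoryFab's internal types are not documented.
interface Cue {
  startMs: number;
  endMs: number;
  text: string;
}

const CHUNK_MS = 30_000; // step 2: 30-second chunks

/** Step 2: split [0, durationMs) into consecutive, non-overlapping 30 s windows. */
function chunkWindows(durationMs: number): Array<{ startMs: number; endMs: number }> {
  const windows: Array<{ startMs: number; endMs: number }> = [];
  for (let start = 0; start < durationMs; start += CHUNK_MS) {
    windows.push({ startMs: start, endMs: Math.min(start + CHUNK_MS, durationMs) });
  }
  return windows;
}

/** Step 4: shift each chunk's chunk-relative cues by that chunk's offset, then concatenate. */
function mergeChunks(chunkResults: Cue[][]): Cue[] {
  const merged: Cue[] = [];
  chunkResults.forEach((cues, i) => {
    const offset = i * CHUNK_MS;
    for (const c of cues) {
      merged.push({ startMs: c.startMs + offset, endMs: c.endMs + offset, text: c.text });
    }
  });
  return merged;
}

/** Left-pad a non-negative integer with zeros. */
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

/** Format milliseconds as an SRT timestamp: HH:MM:SS,mmm */
function srtTime(ms: number): string {
  const t = Math.round(ms);
  const h = Math.floor(t / 3_600_000);
  const m = Math.floor((t % 3_600_000) / 60_000);
  const s = Math.floor((t % 60_000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(t % 1000, 3)}`;
}

/** Render cues as SRT: 1-based index, time range, text, blank line between cues. */
function toSrt(cues: Cue[]): string {
  return cues
    .map((c, i) => `${i + 1}\n${srtTime(c.startMs)} --> ${srtTime(c.endMs)}\n${c.text}\n`)
    .join("\n");
}

// Two chunks, each with one cue timed relative to its own chunk.
const merged = mergeChunks([
  [{ startMs: 0, endMs: 2500, text: "Hello" }],
  [{ startMs: 1000, endMs: 4000, text: "World" }],
]);
console.log(toSrt(merged)); // the second cue is shifted to 00:00:31,000 --> 00:00:34,000
```

Fixed windows can cut a word in half at a chunk boundary. A real pipeline may overlap chunks or snap boundaries to pauses, but the document does not say which, if either, StoryFab does.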

Smart Segmentation

After transcription, StoryFab analyzes the audio energy, visual scene changes, and speech activity to segment the video into meaningful chunks.

Scoring Factors

Audio Energy — Loud, dynamic audio segments score higher (e.g., exclamations, applause, music peaks)
Scene Change — Sharp visual transitions often indicate topic changes or key moments
Speech Activity — Segments with clear speech (not silence) are preferred
Pause Detection — Natural breakpoints in speech are used as clip boundaries
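One way to combine these factors is sketched below. A weighted score covers energy, scene change, and speech, and pause detection marks where speech resumes after a quiet run. The document does not give StoryFab's formula. The per-frame features (RMS energy, a 0..1 scene-difference value, a voice-activity flag), the weights, and the silence thresholds are all assumptions, and the function names are hypothetical.

```typescript
// Hypothetical per-frame features; how StoryFab actually measures them is not documented.
interface Frame {
  rms: number;         // audio energy as RMS amplitude, normalized to 0..1
  sceneChange: number; // visual difference from the previous frame, 0..1
  isSpeech: boolean;   // result of a voice-activity detector
}

// ASSUMED weights, chosen only for illustration (they sum to 1).
const WEIGHTS = { energy: 0.4, scene: 0.3, speech: 0.3 };

/** Combine audio energy, scene change, and speech activity into one 0..1 score. */
function scoreSpan(frames: Frame[]): number {
  if (frames.length === 0) return 0;
  const n = frames.length;
  const meanEnergy = frames.reduce((sum, f) => sum + f.rms, 0) / n;
  const peakScene = frames.reduce((max, f) => Math.max(max, f.sceneChange), 0);
  const speechShare = frames.filter((f) => f.isSpeech).length / n;
  return WEIGHTS.energy * meanEnergy + WEIGHTS.scene * peakScene + WEIGHTS.speech * speechShare;
}

/**
 * Pause detection: return the frame indices where activity resumes after at least
 * `minRun` consecutive quiet, non-speech frames. Thresholds are ASSUMED defaults.
 */
function findPauseBoundaries(frames: Frame[], silenceRms = 0.05, minRun = 10): number[] {
  const boundaries: number[] = [];
  let quietRun = 0;
  frames.forEach((f, i) => {
    if (!f.isSpeech && f.rms < silenceRms) {
      quietRun++;
    } else {
      if (quietRun >= minRun) boundaries.push(i); // candidate clip boundary
      quietRun = 0;
    }
  });
  return boundaries;
}
```

In this sketch, peak scene change is used rather than the average. A single sharp cut then counts fully instead of being diluted by surrounding static frames.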

Speed Derivation

Each segment gets a suggested playback speed based on its energy profile:

Energy Ratio (vs. average)	Suggested Speed	Description
> 1.1x	1x	High energy — keep original pacing
0.85–1.1x	2x	Normal energy — mild acceleration
0.5–0.85x	4x	Low energy — skip dead time
< 0.5x	6x	Near silence — maximum compression
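The table reduces to a small lookup on the ratio of a segment's energy to the video's average energy. This is a minimal sketch. `suggestSpeed` is a hypothetical name, and the handling of exact boundary values (1.1, 0.85, 0.5) is an assumption because the table does not say which band they fall into.

```typescript
/**
 * Map a segment's energy ratio (segment energy / video-average energy) to a
 * suggested playback speed, following the Speed Derivation table.
 * ASSUMPTION: exactly 1.1 falls in the 2x band, exactly 0.85 in 2x, exactly 0.5 in 4x.
 */
function suggestSpeed(segmentEnergy: number, averageEnergy: number): 1 | 2 | 4 | 6 {
  // A fully silent video (average 0) is treated as near silence.
  const ratio = averageEnergy > 0 ? segmentEnergy / averageEnergy : 0;
  if (ratio > 1.1) return 1;   // high energy: keep original pacing
  if (ratio >= 0.85) return 2; // normal energy: mild acceleration
  if (ratio >= 0.5) return 4;  // low energy: skip dead time
  return 6;                    // near silence: maximum compression
}

const energies = [2, 0.25, 1, 0.75];
const average = energies.reduce((a, b) => a + b, 0) / energies.length; // 1
console.log(energies.map((e) => suggestSpeed(e, average))); // → [1, 6, 2, 4]
```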

Tuning Detection Parameters

In Settings → AI → Highlight Detection, you can tune:

Parameter	Range	Default	Effect
Min clip duration	5–60s	15s	Longer = fewer, more substantial clips
Max clips	3–20	10	Upper limit on clips per video
Sensitivity	Low / Medium / High	Medium	Higher = more clips detected
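The sketch below shows how these three settings might be applied to a list of scored candidate clips. Values are clamped to the documented ranges, and a candidate is kept only if it meets the minimum duration and a sensitivity-dependent score threshold. Results are ranked best first and capped at the maximum clip count. The settings shape, the helper names, and the score thresholds per sensitivity level are hypothetical. Only the ranges and defaults come from the table.

```typescript
// Hypothetical config shape mirroring Settings → AI → Highlight Detection.
type Sensitivity = "low" | "medium" | "high";

interface DetectionSettings {
  minClipSeconds: number;   // documented range 5–60, default 15
  maxClips: number;         // documented range 3–20, default 10
  sensitivity: Sensitivity; // default "medium"
}

const DEFAULTS: DetectionSettings = { minClipSeconds: 15, maxClips: 10, sensitivity: "medium" };

const clamp = (v: number, lo: number, hi: number): number => Math.min(hi, Math.max(lo, v));

/** Fill in defaults and clamp values into the documented ranges. */
function normalizeSettings(input: Partial<DetectionSettings>): DetectionSettings {
  const s = { ...DEFAULTS, ...input };
  return {
    minClipSeconds: clamp(s.minClipSeconds, 5, 60),
    maxClips: clamp(Math.round(s.maxClips), 3, 20),
    sensitivity: s.sensitivity,
  };
}

// ASSUMED mapping: higher sensitivity lowers the score a candidate needs, so more clips pass.
const SCORE_THRESHOLD: Record<Sensitivity, number> = { low: 0.7, medium: 0.5, high: 0.3 };

interface Candidate {
  startMs: number;
  endMs: number;
  score: number; // 0..1
}

/** Keep long-enough, high-enough candidates, best first, capped at maxClips. */
function selectClips(candidates: Candidate[], settings: DetectionSettings): Candidate[] {
  const minMs = settings.minClipSeconds * 1000;
  const threshold = SCORE_THRESHOLD[settings.sensitivity];
  return candidates
    .filter((c) => c.endMs - c.startMs >= minMs && c.score >= threshold)
    .sort((a, b) => b.score - a.score) // filter() returned a copy, so the input is not mutated
    .slice(0, settings.maxClips);
}
```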

Semantic Segmentation 🆕

Available in Commentary Mode only

In Commentary Mode, after Smart Segmentation, StoryFab uses an LLM to add semantic understanding to each segment.

What It Does

Plot Understanding — Reads each segment's audio/transcript and summarizes what happens
Character Tracking — Identifies which characters appear in each segment
Emotional Tone — Classifies the scene's emotional tone (happy/sad/tense/comedic/surprising)
Commentary Potential — Scores each segment on how compelling it would be as narration content

Semantic Segment Output

```typescript
interface SemanticSegment {
  start_ms: number;
  end_ms: number;
  segment_type: string;        // "dialogue" | "action" | "transition" | "silence" | "content"
  plot_summary: string;        // one- or two-sentence summary of the plot
  characters: string[];        // characters who appear in the segment
  emotional_tone: string;      // "happy" | "sad" | "tense" | "calm" | "comedic"
  commentary_tone: string;     // suggested narration tone
  highlight_potential: number; // 0.0–1.0, how strong the segment is as commentary material
}
```
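Because these fields come from an LLM, it is worth validating them before use. This is a minimal sketch. It assumes the model is asked to return JSON objects whose keys match `SemanticSegment`, although the document does not specify the wire format. `parseSemanticSegment` is a hypothetical helper, and the interface is repeated so the block is self-contained.

```typescript
interface SemanticSegment {
  start_ms: number;
  end_ms: number;
  segment_type: string;
  plot_summary: string;
  characters: string[];
  emotional_tone: string;
  commentary_tone: string;
  highlight_potential: number;
}

const isStr = (v: unknown): v is string => typeof v === "string";
const isNum = (v: unknown): v is number => typeof v === "number" && isFinite(v);

/** Validate one parsed JSON value; returns null if a field is missing or has the wrong type. */
function parseSemanticSegment(raw: unknown): SemanticSegment | null {
  if (typeof raw !== "object" || raw === null) return null;
  const {
    start_ms, end_ms, segment_type, plot_summary,
    characters, emotional_tone, commentary_tone, highlight_potential,
  } = raw as Record<string, unknown>;

  if (!isNum(start_ms) || !isNum(end_ms) || end_ms <= start_ms) return null;
  if (!isStr(segment_type) || !isStr(plot_summary)) return null;
  if (!isStr(emotional_tone) || !isStr(commentary_tone)) return null;
  if (!Array.isArray(characters) || !characters.every(isStr)) return null;
  if (!isNum(highlight_potential)) return null;

  return {
    start_ms,
    end_ms,
    segment_type,
    plot_summary,
    characters: characters.slice(),
    emotional_tone,
    commentary_tone,
    // Clamp into the documented 0.0–1.0 range in case the model drifts outside it.
    highlight_potential: Math.min(1, Math.max(0, highlight_potential)),
  };
}
```

Usage: `(JSON.parse(llmText) as unknown[]).map(parseSemanticSegment).filter((s): s is SemanticSegment => s !== null)` drops malformed entries instead of failing the whole batch.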

Example

Segment	Plot Summary	Characters	Emotional Tone	Commentary Tone
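0:15–0:32	The female lead bumps into the male lead the moment he walks in	王霸天, 林小雨	tense	Shocked
0:32–1:05	The two apologize to each other and realize it was a misunderstanding	王霸天, 林小雨	comedic	Humorous
1:05–1:20	The female lead notices the male lead's outfit	王霸天, 林小雨	calm	Down-to-earth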

Subtitle Generation

Transcription automatically produces SRT subtitle files. Subtitles are:

Word-level accurate
Time-synced to the audio
Stored locally in the project folder

You can also import existing subtitle files (.srt, .ass, .vtt).
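Of the importable formats, `.srt` is the simplest: numbered blocks separated by blank lines, each holding a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line followed by the text. The following is a minimal parser sketch for `.srt` only. `.ass` and `.vtt` use different layouts and are not handled. The type and function names are hypothetical, and the sketch does not reflect how StoryFab's importer is implemented.

```typescript
// Hypothetical cue shape for an imported subtitle.
interface SubtitleCue {
  index: number;
  startMs: number;
  endMs: number;
  text: string;
}

/** Convert "HH:MM:SS,mmm" (a "." before the milliseconds is also accepted) to milliseconds. */
function parseSrtTime(t: string): number {
  const m = /^(\d{2,}):(\d{2}):(\d{2})[,.](\d{3})$/.exec(t.trim());
  if (!m) throw new Error(`Bad SRT timestamp: ${t}`);
  const h = Number(m[1]), min = Number(m[2]), s = Number(m[3]), ms = Number(m[4]);
  return ((h * 60 + min) * 60 + s) * 1000 + ms;
}

/** Parse .srt contents into cues; malformed blocks are skipped. */
function parseSrt(content: string): SubtitleCue[] {
  const cues: SubtitleCue[] = [];
  const blocks = content
    .replace(/^\uFEFF/, "")   // strip a UTF-8 byte-order mark if present
    .replace(/\r\n/g, "\n")   // normalize Windows line endings
    .trim()
    .split(/\n\s*\n/);        // blocks are separated by blank lines
  for (const block of blocks) {
    const lines = block.split("\n");
    if (lines.length < 2) continue;
    const parts = lines[1].split("-->");
    if (parts.length !== 2) continue; // second line is not a timing line
    cues.push({
      index: Number(lines[0]),
      startMs: parseSrtTime(parts[0]),
      // Some files append position hints after the end time; keep only the timestamp.
      endMs: parseSrtTime(parts[1].trim().split(/\s+/)[0]),
      text: lines.slice(2).join("\n"),
    });
  }
  return cues;
}
```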
