KAIMEDIA · AI Video Intelligence
KAIMEDIA's AI engine understands scenes, detects people, reads subtitles, and intelligently reframes every shot — no manual editing required.
The same broadcast clip — landscape for TV, vertical for mobile — automatically reframed by AI.
A fully automated six-stage AI pipeline reframes every frame to vertical with editorial precision.
Scene change detection locates shot boundaries so only keyframes need full analysis.
AI Object Detection runs first, providing pixel-accurate bounding boxes before VLM processing.
The VLM receives detection results as grounding hints, classifies on-screen roles (Anchor, Reporter), and extracts the relationships between them.
An ontology-based scorer selects the optimal split-screen or fullscreen layout.
Bottom captions are detected frame-by-frame and re-rendered with crisp Korean/English text.
Video encoder produces the final 9:16 output at broadcast quality with original audio.
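The six stages above can be sketched as a single orchestration loop. This is an illustrative sketch only: the function names, `Shot` type, and stage signatures are assumptions for demonstration, not the KAIMEDIA API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Shot:
    start: int  # first frame index of the shot
    end: int    # one past the last frame index

def run_pipeline(frames: List[str], detect_shots: Callable, detect_objects: Callable,
                 classify_scene: Callable, pick_layout: Callable,
                 rerender_captions: Callable, encode: Callable):
    """Hypothetical six-stage flow; each stage is injected as a callable."""
    processed = []
    for shot in detect_shots(frames):              # 1. scene change detection
        keyframe = frames[shot.start]              # only the keyframe is analysed
        boxes = detect_objects(keyframe)           # 2. pixel-accurate bounding boxes
        scene = classify_scene(keyframe, boxes)    # 3. VLM roles + relationships
        layout = pick_layout(scene)                # 4. ontology-based layout choice
        for i in range(shot.start, shot.end):
            processed.append(rerender_captions(frames[i], layout))  # 5. captions
    return encode(processed)                       # 6. 9:16 encoding with audio

# Toy stand-ins to show the control flow end to end.
result = run_pipeline(
    ["f0", "f1", "f2"],
    detect_shots=lambda fs: [Shot(0, 2), Shot(2, 3)],
    detect_objects=lambda f: ["person"],
    classify_scene=lambda f, b: "SoloAnchor",
    pick_layout=lambda s: "fullscreen",
    rerender_captions=lambda f, layout: f"{f}:{layout}",
    encode=lambda fs: fs,
)
# result == ["f0:fullscreen", "f1:fullscreen", "f2:fullscreen"]
```

Because the layout is chosen once per shot and then applied to every frame in that shot, the reframe stays stable within a scene while remaining fully automatic across cuts.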
Every scene gets a custom layout — no fixed crop, no black bars forced onto important content.
Person detection ensures anchors, reporters, and guests are never clipped. Safety margins prevent face or body cutoff.
A semantic ontology classifies every shot — SoloAnchor, ConversationScene, ThreePersonScene, MaterialScene — and picks the right layout family.
Multi-person or anchor+background scenes get a two-panel layout: tight portrait crop on top, widescreen context below.
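The geometry of such a two-panel layout is straightforward to sketch. The function below is a minimal illustration, assuming a 1920×1080 source and a 1080×1920 canvas; the panel split and clamping logic are assumptions, not the shipped algorithm.

```python
def split_screen_crops(src_w: int, src_h: int, face_cx: int,
                       out_w: int = 1080, out_h: int = 1920) -> dict:
    """Two-panel 9:16 layout: tight portrait crop on top, full
    widescreen frame scaled into a strip below. Geometry only."""
    # Bottom panel: the whole source frame scaled to the output width.
    bottom_h = round(out_w * src_h / src_w)
    top_h = out_h - bottom_h                   # remaining height for the portrait panel
    # Top panel: crop a region matching the panel's aspect, centred on the face.
    crop_h = src_h                             # use the full source height
    crop_w = round(crop_h * out_w / top_h)
    x0 = min(max(face_cx - crop_w // 2, 0), src_w - crop_w)  # keep crop inside frame
    return {"top_crop": (x0, 0, crop_w, crop_h), "top_h": top_h, "bottom_h": bottom_h}

layout = split_screen_crops(1920, 1080, face_cx=960)
# For a centred face: portrait crop (516, 0, 889, 1080),
# top panel 1312 px tall, bottom context strip 608 px tall.
```

Clamping `x0` to the frame edges is what keeps a speaker near the side of the shot from being cut off, matching the safety-margin behaviour described above.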
Korean and English lower-thirds are OCR'd and re-rendered in the native 9:16 space — no more clipped or shrunk captions.
Scene-level layout caching and temporal smoothing eliminate flickering or jumpy reframes within a single shot.
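One common way to achieve this kind of temporal smoothing (an assumption here, not necessarily KAIMEDIA's exact method) is an exponential moving average over the per-frame crop offsets, with the running state reset at every shot boundary so smoothing never bleeds across a cut:

```python
def smooth_crops(xs, alpha: float = 0.2, shot_starts=frozenset()):
    """EMA over per-frame crop x-offsets. `alpha` trades responsiveness
    for stability; the state resets at each shot boundary."""
    smoothed, state = [], None
    for i, x in enumerate(xs):
        if state is None or i in shot_starts:
            state = float(x)                       # hard reset at a new shot
        else:
            state = alpha * x + (1 - alpha) * state
        smoothed.append(round(state))
    return smoothed

# Jittery detections within one shot are damped; the cut at frame 3 snaps cleanly.
out = smooth_crops([100, 110, 100, 300], shot_starts={3})
# out == [100, 102, 102, 300]
```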
High-quality video encoder with full audio preservation — indistinguishable from native vertical production.
Video and frame comparisons from a real broadcast clip reframed by AI. Left: original 16:9. Right: AI-reframed 9:16.
Every component is best-in-class, running fully on-premises with no cloud dependency.
All AI inference runs locally. No video data ever leaves your infrastructure.
VLM inference runs on GPU for maximum speed; graceful CPU fallback for edge deployments.
Subtitle OCR handles mixed Korean/English lower-thirds with character-level accuracy.
Every parameter — scene threshold, layout ratios, subtitle safe area — is tunable via YAML config.
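A configuration file along these lines shows the kind of knobs exposed; every key name and value below is an illustrative assumption, not the shipped schema.

```yaml
# Illustrative config sketch — key names are assumptions, not the actual schema.
scene_detection:
  threshold: 0.30          # shot-boundary sensitivity (0–1)
layout:
  split_screen:
    top_panel_ratio: 0.68  # share of the 9:16 height given to the portrait panel
  safety_margin_px: 48     # keep faces and bodies clear of panel edges
subtitles:
  safe_area_bottom: 0.12   # fraction of frame height reserved for lower-thirds
  languages: [ko, en]
output:
  resolution: 1080x1920
  preserve_audio: true
```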