KAIMEDIA · AI Video Intelligence

16:9 → 9:16 · Fully Automated

Broadcast Video,
Reframed for Mobile

KAIMEDIA's AI engine understands scenes, detects people, reads subtitles, and intelligently reframes every shot — no manual editing required.

Live Demo

See the Transformation

The same broadcast clip — landscape for TV, vertical for mobile — automatically reframed by AI.

Original Broadcast
1280 × 720  ·  30 fps
16:9
AI Reframe
AI-Reframed Output
720 × 1280  ·  30 fps
9:16
Pipeline

How It Works

A fully automated six-stage AI pipeline converts every frame with editorial precision.

🎬

Scene Detection

Shot-boundary analysis detects scene changes, so only keyframes need full AI processing.
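One common way to detect shot boundaries is to compare luminance histograms between consecutive frames. The sketch below is illustrative only: the function names and the 0.5 threshold are assumptions, not KAIMEDIA's actual engine.

```python
# Illustrative histogram-based shot-boundary detection.
# Threshold and helper names are assumptions, not the shipped engine.

def grey_histogram(frame, bins=16):
    """Build a normalized luminance histogram from a 2-D list of 0-255 values."""
    hist = [0] * bins
    total = 0
    for row in frame:
        for px in row:
            hist[px * bins // 256] += 1
            total += 1
    return [h / total for h in hist]

def is_scene_cut(prev_frame, frame, threshold=0.5):
    """Flag a cut when the histogram L1 distance exceeds the threshold."""
    h1 = grey_histogram(prev_frame)
    h2 = grey_histogram(frame)
    return sum(abs(a - b) for a, b in zip(h1, h2)) > threshold

dark  = [[10] * 8 for _ in range(8)]   # mostly black frame
light = [[240] * 8 for _ in range(8)]  # mostly white frame
print(is_scene_cut(dark, dark))   # same shot -> False
print(is_scene_cut(dark, light))  # hard cut -> True
```

Production systems typically combine several cues (color, edges, motion), but the principle is the same: process the full pipeline only where the content actually changes.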

👤

Person Detection

AI Object Detection runs first, providing pixel-accurate bounding boxes before VLM processing.
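Once person boxes exist, a vertical crop window can be derived directly from them. This is a minimal sketch under assumed conventions (boxes as `(x, y, w, h)` tuples, a 1280×720 source); it is not KAIMEDIA's detector itself.

```python
# Sketch: derive a full-height 9:16 crop window from person bounding boxes.
# Box format (x, y, w, h) and frame size are illustrative assumptions.

def vertical_crop(boxes, frame_w=1280, frame_h=720):
    """Center a 9:16 window on the subjects' combined horizontal extent."""
    crop_w = frame_h * 9 // 16                    # 405 px wide for a 720p source
    left   = min(b[0] for b in boxes)
    right  = max(b[0] + b[2] for b in boxes)
    center = (left + right) // 2
    # Clamp so the window never leaves the frame (prevents cutoff at edges).
    x0 = max(0, min(center - crop_w // 2, frame_w - crop_w))
    return x0, 0, crop_w, frame_h

# A single anchor standing right of center:
print(vertical_crop([(560, 120, 160, 480)]))  # → (438, 0, 405, 720)
```

Grounding the VLM with boxes like these means the language model reasons about *who* is in the shot, while pixel-accurate geometry comes from the detector.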

🧠

Scene Graph

The VLM receives the detection results as grounding hints, classifies each person's role (Anchor, Reporter), and extracts the relationships between them.

📐

Layout Solver

An ontology-based scorer selects the optimal split-screen or fullscreen layout.
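An ontology-based scorer can be pictured as a rule table mapping scene classes to scored layout candidates. The class names below mirror the ones on this page; the scores and rules themselves are illustrative assumptions.

```python
# Sketch of an ontology-style layout scorer. Scene-class names come from
# this page; the candidate scores are illustrative assumptions.

LAYOUT_RULES = {
    "SoloAnchor":        [("fullscreen", 1.0), ("split", 0.4)],
    "ConversationScene": [("split", 1.0), ("fullscreen", 0.5)],
    "MaterialScene":     [("fullscreen", 0.9), ("split", 0.6)],
}

def solve_layout(scene_class):
    """Pick the highest-scoring layout family for a classified scene."""
    candidates = LAYOUT_RULES.get(scene_class, [("fullscreen", 0.5)])
    return max(candidates, key=lambda c: c[1])[0]

print(solve_layout("ConversationScene"))  # → split
print(solve_layout("SoloAnchor"))         # → fullscreen
```

Encoding the decision as scored candidates, rather than hard rules, lets temporal smoothing later prefer the previous layout when two options score nearly the same.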

📝

Subtitle OCR

Bottom captions are detected frame-by-frame and re-rendered with crisp Korean/English text.

🎞️

Composition

The video encoder produces the final 9:16 output at broadcast quality, with the original audio preserved.

Features

Intelligent Frame-by-Frame Decisions

Every scene gets a custom layout — no fixed crop, no black bars forced onto important content.

🎯

Subject-Aware Cropping

Person detection ensures anchors, reporters, and guests are never clipped. Safety margins prevent face or body cutoff.

🗺️

Ontology Scene Reasoning

A semantic ontology classifies every shot — SoloAnchor, ConversationScene, ThreePersonScene, MaterialScene — and picks the right layout family.

📺

Split-Screen Intelligence

Multi-person or anchor+background scenes get a two-panel layout: tight portrait crop on top, widescreen context below.
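The two-panel geometry is simple arithmetic on the 720×1280 canvas: reserve a 16:9 strip for context and give the remainder to the portrait crop. The exact split is a sketch assumption; the page notes such ratios are tunable.

```python
# Sketch of the two-panel split-screen geometry on a 720x1280 canvas.
# The bottom panel keeps a full-width 16:9 strip for context; the top
# panel takes the remaining height for the tight portrait crop.

def split_panels(out_w=720, out_h=1280):
    context_h  = out_w * 9 // 16        # 16:9 strip: 405 px for 720 px width
    portrait_h = out_h - context_h      # remainder for the subject crop
    return {"top":    (0, 0, out_w, portrait_h),
            "bottom": (0, portrait_h, out_w, context_h)}

print(split_panels())
# → {'top': (0, 0, 720, 875), 'bottom': (0, 875, 720, 405)}
```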

🔤

Subtitle Preservation

Korean and English lower-thirds are OCR'd and re-rendered in the native 9:16 space — no more clipped or shrunk captions.

⏱️

Temporal Consistency

Scene-level layout caching and temporal smoothing eliminate flickering or jumpy reframes within a single shot.
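A typical way to get this stability is to low-pass filter the crop position within a shot and reset it only at cuts. The sketch below uses simple exponential smoothing; the alpha value and function names are illustrative assumptions.

```python
# Sketch of exponential smoothing on the crop-window x position: the kind
# of temporal filter that removes per-frame jitter within a single shot.
# Alpha and the reset mechanism are illustrative assumptions.

def smooth_positions(raw_x, alpha=0.2, reset_on=None):
    """Blend each new position into a running value; jump instantly at cuts."""
    reset_on = reset_on or set()
    smoothed, current = [], None
    for i, x in enumerate(raw_x):
        if current is None or i in reset_on:
            current = float(x)              # new shot: snap to the new position
        else:
            current += alpha * (x - current)
        smoothed.append(round(current, 1))
    return smoothed

print(smooth_positions([400, 410, 390, 405]))  # → [400.0, 402.0, 399.6, 400.7]
```

Noisy per-frame detections spanning 20 px collapse to a drift of a few pixels, which is what keeps the reframed shot from visibly "hunting" for its subject.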

🎬

Broadcast-Quality Output

High-quality video encoder with full audio preservation — indistinguishable from native vertical production.

9:16
Output aspect ratio
(TikTok, Reels, Shorts)
6
AI pipeline stages
HQ
Video encoder quality
(near-lossless)
VLM
On-device vision AI
(no cloud required)
100%
Automated — zero
manual edits required
Technology

Built on State-of-the-Art AI

Every component is best-in-class, running fully on-premises with no cloud dependency.

Vision Language Model (VLM) · AI Object Detection · Scene Change Detection · RDF Ontology Reasoning · Video Encoder · OpenCV · Deep Learning Framework · Supersampled Text Rendering · High-Quality Audio · Python · GPU / CPU Inference · Multilingual Font Support
🔒

On-Premises Only

All AI inference runs locally. No video data ever leaves your infrastructure.

GPU or CPU

VLM inference runs on GPU for maximum speed, with graceful CPU fallback for edge deployments.

🌐

Korean & English

Subtitle OCR handles mixed Korean/English lower-thirds with character-level accuracy.

🔧

Configurable Pipeline

Every parameter — scene threshold, layout ratios, subtitle safe area — is tunable via YAML config.
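A config along these lines might look as follows. This is a sketch only: the key names and values are assumptions for illustration, not the shipped schema.

```yaml
# Illustrative config sketch -- key names are assumptions, not the real schema.
scene_detection:
  threshold: 0.5          # shot-boundary sensitivity
layout:
  split_ratio: 0.68       # portrait panel's share of the 9:16 canvas
subtitle:
  safe_area_px: 48        # bottom margin reserved for re-rendered captions
  languages: [ko, en]
```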