◆ Case Study · Faceless Video Factory

200+ Videos Per Month, Fully Automated — Script to Published in Minutes

A custom Remotion microservice that turns LLM-generated scripts, AI voiceovers, and AI images into broadcast-quality videos across 5 production templates — serving YouTube long-form, TikTok Shorts, and Instagram Reels in 3 languages. The entire pipeline runs on a single ARM64 container within a predictable monthly budget.
200+
Videos / Month
5
Templates
3
Languages
3
Platforms
On-Demand
Render API
Production deployment
Live for a trilingual content client · details under NDA
The Solution — Leon Microservice

I built a standalone Remotion-based video rendering microservice with a clean API surface: n8n (running on a separate Oracle Cloud VM) generates scripts with DeepSeek, audio with Google Cloud TTS, and images with GPT Image Mini — then POSTs a JSON payload to the microservice, which renders the final video and returns it via a download endpoint. Five production templates cover 16:9 long-form, 9:16 shorts, and static thumbnails. The entire service runs on a single ARM64 Docker container on Oracle Cloud's free tier, with sub-30-second cold starts and no serverless cost spikes.

How It Works
01
📝

n8n Generates Assets

DeepSeek writes the script, Google Cloud TTS synthesizes voiceover, GPT Image Mini generates on-brand images. All assets uploaded to CDN.

02
📥

POST /render

n8n sends a single JSON payload with composition name, props (title, text, images, audio URL), and branding config. Returns a job ID.

03
🎬

Remotion Orchestration

The microservice loads the matching template (ArticleYT, DreamShort, HoroscopeWeekly, etc.), syncs scene durations to the audio, and orchestrates Ken Burns zooms, subtitles, watermarks, and music fade layers.

04
💻

Render Pipeline

Remotion renders the composition frame-by-frame to an MP4 with h264 + AAC encoding. Progress is tracked in an in-memory job store for polling.

05

GET /download/:jobId

Once complete, the MP4 is exposed via a signed download endpoint. n8n pulls the file and uploads it to YouTube / TikTok / Instagram via their respective APIs.

06
🎯

Multi-Language Reuse

Images are generated once per video, then reused across the 3 language versions — only audio and text change. This cuts image generation costs by about 67%.
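A render request from n8n might look like the sketch below. The field names and the RenderRequest shape are illustrative assumptions; the real contract is whatever each template's Zod schema declares.

```typescript
// Hypothetical shape of a POST /render payload for the DreamShort
// template. Field names are assumptions; the authoritative schema is
// the one the service exposes per composition.
interface RenderRequest {
  composition: string;          // template ID registered in Root.tsx
  props: {
    title: string;
    text: string;
    images: string[];           // CDN URLs of the generated images
    audioUrl: string;           // TTS voiceover for this language
  };
  branding?: {
    watermark?: string;
    primaryColor?: string;
  };
}

const payload: RenderRequest = {
  composition: "DreamShort",
  props: {
    title: "Dreaming of Flying",
    text: "Flying dreams often signal a desire for freedom...",
    images: ["https://cdn.example.com/dream-1.png"],
    audioUrl: "https://cdn.example.com/dream-es.mp3",
  },
  branding: { watermark: "@dreamchannel" },
};
```

The same images array is reused for the ES, EN, and PT requests; only props.text and props.audioUrl change per language.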

The 5 Production Templates
Template | Format | Resolution | Duration | Use Case
ArticleYT | 16:9 | 1920×1080 | 8–15 min | YouTube long-form from blog articles
DreamShort | 9:16 | 1080×1920 | 2–3 min | TikTok / Reels / Shorts — dream interpretation
HoroscopeWeekly | 9:16 | 1080×1920 | ~5 min | TikTok / Reels / Shorts — weekly horoscope per sign
DreamThumbnail | 16:9 | 1280×720 | 1 frame | Static thumbnail for DreamShort videos
HoroscopeThumbnail | 16:9 | 1280×720 | 1 frame | Static thumbnail for HoroscopeWeekly videos
Every template shares the same underlying architecture: a set of reusable components (KenBurnsImage, AudioLayer, GlowText, AnimatedSubtitle, Watermark, FloatingParticles) composed into scene sequences. Adding a new template takes hours, not days — I register it in Root.tsx, define its Zod schema in types/common.ts, and compose scenes from the existing library.
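In the real service each template is a Remotion composition registered in Root.tsx with its Zod schema; as a framework-free sketch of the registry idea, a plain lookup table with a stand-in validate() shows why adding a template requires no client changes:

```typescript
// Framework-free sketch of the template registry. In production each
// entry is a Remotion <Composition> in Root.tsx with a Zod schema;
// here a plain object and a hand-rolled validate() stand in for both.
type TemplateSpec = {
  width: number;
  height: number;
  fps: number;
  validate: (props: unknown) => boolean;
};

const templates: Record<string, TemplateSpec> = {
  ArticleYT: {
    width: 1920, height: 1080, fps: 30,
    validate: (p) => typeof p === "object" && p !== null && "title" in p,
  },
  DreamShort: {
    width: 1080, height: 1920, fps: 30,
    validate: (p) => typeof p === "object" && p !== null && "audioUrl" in p,
  },
};

// Adding HoroscopeWeekly is one more entry here; the /render endpoint
// resolves whatever name the payload carries.
function lookup(name: string): TemplateSpec {
  const spec = templates[name];
  if (!spec) throw new Error(`Unknown composition: ${name}`);
  return spec;
}
```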
Tech Stack

Video Engine

Framework: Remotion v4
Language: TypeScript + React
API Server: Node.js + Express
Validation: Zod schemas per template
Encoding: h264 + AAC via ffmpeg

Assets Pipeline

Scripts: DeepSeek via n8n
Voiceover: Google Cloud TTS WaveNet
Images: GPT Image Mini (1 / scene)
Music: Internet Archive CC
Storage: Docker volume CDN

Infrastructure

Container: Docker ARM64
Host: Oracle Cloud Free Tier
Orchestration: docker-compose
Reverse Proxy: nginx
Cold Start: < 30 seconds
Microservice Architecture
Client · runs on separate VM
GAEL — n8n Orchestrator
DeepSeek (script)
Google TTS (voice)
GPT Image (art)
CDN upload
POST /render  →  JSON payload
Leon · this project
Remotion Video Microservice
Node.js + Express + Remotion v4 · Docker ARM64 · Oracle Cloud
📝
Validate
Zod schema per template
📋
Queue
In-memory job store
🎬
Render
Remotion + ffmpeg
💾
Serve
/download/:jobId
GET /status/:jobId  ·  GET /download/:jobId
Downstream · back on n8n
MP4 → Multi-Platform Publish
YouTube
TikTok
Instagram
API Endpoints
Endpoint | Method | Purpose
/health | GET | Health check for orchestration and uptime monitoring
/compositions | GET | Lists all available templates with their input schemas
/render | POST | Async render — returns jobId, validates props against Zod schema
/render-sync | POST | Synchronous render — blocks until video is ready (for short pipelines)
/status/:jobId | GET | Job status and progress percentage for polling clients
/download/:jobId | GET | Streams the rendered MP4 file back to the caller
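The async flow n8n follows against these endpoints can be sketched as below. The transport is injected so the logic runs without a live server; the response field names (jobId, state) are assumptions, and the poll interval would be seconds in production.

```typescript
// Sketch of the client side of the async render flow: POST /render,
// poll /status/:jobId until the job finishes, then hand back the
// download path. Endpoint paths match the API table; response field
// names are assumptions.
type Json = Record<string, unknown>;
type Transport = (path: string, body?: Json) => Promise<Json>;

async function renderVideo(call: Transport): Promise<string> {
  const { jobId } = (await call("/render", {
    composition: "DreamShort",
  })) as { jobId: string };

  // Poll the job store until the render completes or fails.
  for (;;) {
    const status = (await call(`/status/${jobId}`)) as { state: string };
    if (status.state === "done") break;
    if (status.state === "failed") throw new Error("render failed");
    await new Promise((r) => setTimeout(r, 10)); // seconds in production
  }
  return `/download/${jobId}`; // caller streams the MP4 from here
}
```

Injecting the transport keeps the polling logic testable with a fake server, which is handy when the real render takes minutes.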
Inside a Render Job
Step | Component | What it does
1 | Zod schema validation | Rejects malformed payloads before burning render time
2 | calculateMetadata() | Measures audio length and sets composition duration dynamically — scene durations distribute proportionally
3 | Asset resolution | Downloads images, audio, and music from remote URLs or CDN volume
4 | Remotion bundling | Compiles the React composition tree into a video-ready bundle
5 | Frame-by-frame render | Headless Chrome renders every frame, passes to ffmpeg for h264 + AAC encoding
6 | Job store update | Marks the job complete and exposes the MP4 at /download/:jobId
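The arithmetic behind step 2 can be sketched as a pure function: given the measured voiceover length and per-scene weights (word counts, say), it returns frame counts that sum exactly to the audio length. The function name and the rounding policy are assumptions, not the production code.

```typescript
// Distribute a measured audio duration across scenes proportionally
// to their weights, in frames. Any rounding remainder goes to the
// last scene so the video ends exactly with the voiceover.
function distributeScenes(
  audioSeconds: number,
  weights: number[],
  fps = 30,
): number[] {
  const totalFrames = Math.round(audioSeconds * fps);
  const totalWeight = weights.reduce((a, b) => a + b, 0);
  const frames = weights.map((w) =>
    Math.floor((w / totalWeight) * totalFrames),
  );
  const used = frames.reduce((a, b) => a + b, 0);
  frames[frames.length - 1] += totalFrames - used; // absorb remainder
  return frames;
}
```

Because the input is the measured MP3 length, the same scene list stays in sync whether the Spanish voiceover runs 9.4 s or the Portuguese one runs 11.1 s.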
Reusable Component Library

Visual Components

KenBurnsImage: cinematic zoom
GlowText: animated headlines
AnimatedSubtitle: word-by-word sync
FloatingParticles: ambient motion
ProgressBar: section indicators
Watermark: configurable brand mark

Audio & Sync

AudioLayer: voiceover + music mix
audioSync.ts: duration distribution
Music fade: configurable volume
SSML pauses: respected by sync layer

Utilities

resolveAssetUrl: remote + CDN resolution
Zod schemas: one per template
Branding config: per-request overrides
Job store: in-memory queue
Adding a new template to the microservice takes hours, not days. Compose scenes from the existing components, define a Zod schema in types/common.ts, register the composition in Root.tsx, and it is available via the API. No client code changes needed.
Engineering Challenges Solved

🎬 Video Duration Synchronization

Videos must match voiceover audio length exactly. Different languages produce different audio durations for the same script.

Solution: Remotion's calculateMetadata measures MP3 duration and distributes scene timing proportionally. Each language version gets perfect sync.

🌐 Multilingual TTS Optimization

TTS engines don't handle pauses naturally. Rapid-fire speech without breathing room sounds robotic.

Solution: SSML <break> tags with strategic timing (400–900 ms), tuned per language by an AI "Script Expert" step. Scripts are written natively in each language, not translated.
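A minimal sketch of the pause strategy: join sentences with SSML break tags whose length the per-language tuning step chooses. The helper name and default are assumptions, not the production code.

```typescript
// Join sentences into an SSML document with explicit <break> tags,
// giving the TTS engine breathing room between sentences. breakMs is
// what the per-language "Script Expert" step would tune (400–900 ms).
function toSsml(sentences: string[], breakMs = 600): string {
  const body = sentences.join(`<break time="${breakMs}ms"/>`);
  return `<speak>${body}</speak>`;
}
```

Google Cloud TTS accepts SSML input directly, so the synthesized MP3 carries the pauses and the subtitle sync layer can respect them.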

🖼️ Image Asset Reuse Across Languages

Generating unique images per language triples cost. 5 images × 3 languages = 15 generations per topic.

Solution: Generate once, store in tagged PostgreSQL image library, reuse across all 3 language versions. 5 images serve 3 videos = 67% cost reduction.

🛠️ Docker ARM64 on Oracle Free Tier

Remotion needs headless Chrome + ffmpeg. Default images are x86 and huge. Oracle's free ARM VM demanded a lean ARM64 build.

Solution: Multi-stage Docker build with ARM64 Chromium and static ffmpeg. Final image runs inside 1 GB RAM alongside n8n and Postgres on the same VM.

📊 Template Schema Versioning

Every template accepts different props. A malformed payload mid-render wastes minutes and leaves broken MP4s in the job store.

Solution: Zod schema per composition, validated at POST time. Invalid payloads fail instantly with a precise error path. No render cycles burned on bad input.
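The production service does this with one Zod schema per composition; as a dependency-free sketch of the same fail-fast idea, a hand-rolled check that reports the first bad field's path looks like this:

```typescript
// Dependency-free stand-in for the per-template Zod check: validate
// the payload before any render work starts and report the path of
// the first invalid field, mimicking Zod's precise error paths.
function firstError(props: Record<string, unknown>): string | null {
  if (typeof props.title !== "string") return "props.title";
  if (!Array.isArray(props.images) || props.images.length === 0)
    return "props.images";
  if (typeof props.audioUrl !== "string") return "props.audioUrl";
  return null; // safe to queue the render
}
```

Running this at POST time means a malformed payload costs microseconds instead of minutes of headless-Chrome render time.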

🔃 Render Queue Without Redis

Job queues usually need Redis or a worker framework. Overkill for a single-VM microservice handling dozens of jobs/day.

Solution: In-memory job store with jobId tracking and status polling. Crashes are rare and the orchestrator retries. Simpler, cheaper, enough for the scale.
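A minimal sketch of that Redis-free store, assuming a Map keyed by jobId that /render writes and /status and /download read (the function names are illustrative):

```typescript
// In-memory job store: a Map keyed by jobId holding status, progress,
// and the output file path once rendering completes. Survives for the
// process lifetime; the orchestrator retries if the process dies.
type Job = {
  state: "queued" | "rendering" | "done" | "failed";
  progress: number;      // 0–100, read by GET /status/:jobId
  file?: string;         // MP4 path, read by GET /download/:jobId
};

const jobs = new Map<string, Job>();
let nextId = 0;

function createJob(): string {
  const id = `job-${++nextId}`;
  jobs.set(id, { state: "queued", progress: 0 });
  return id;
}

function updateJob(id: string, patch: Partial<Job>): void {
  const job = jobs.get(id);
  if (!job) throw new Error(`Unknown job: ${id}`);
  jobs.set(id, { ...job, ...patch });
}
```

At dozens of jobs per day on one VM, losing in-flight state on a rare crash is cheaper than operating a second datastore.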
💡 This microservice pattern — HTTP API + Remotion + containerized render worker — drops into any content pipeline. n8n, Zapier, Make, or a custom backend can all drive it.
Results & Metrics
200+
Videos rendered
per month
5
Reusable
templates
3
Languages
(ES / EN / PT)
Total infra cost
(Oracle free tier)
Throughput & Reuse

Production Output

Short-form vertical (9:16): daily cadence
Long-form horizontal (16:9): weekly cadence
Platforms: YouTube, TikTok, Instagram
Languages: ES / EN / PT natively

Cost Profile

Pricing model: per-render or subscription
Compute: ARM-friendly, horizontally scalable
Variable cost: LLM + TTS + image gen API usage
Fixed cost: modest single-VM baseline
Deliverables
Technology Tags
Remotion v4 Node.js Express TypeScript Zod Docker ARM64 Oracle Cloud Headless Chrome ffmpeg DeepSeek Google Cloud TTS GPT Image n8n HTTP API Microservice Video Automation

Need an AI Content System That Runs Itself?

I build production-grade content automation systems — from AI video pipelines to multilingual email marketing. End-to-end delivery, zero middleware.

💬 Hire Me on Contra
What I Build

Voice AI Agents

Inbound receptionists: Any language
Outbound sales agents: Cold + warm
Appointment booking: Real-time CRM
Emergency routing: Live transfer

Content Automation

AI content generation: Multilingual
Blog & social publishing: Multi-platform
Email drip campaigns: Full funnel
Trend intelligence: Automated

Video Production

AI script writing: Native multilingual
AI image generation: GPT Image
TTS voiceover: WaveNet voices
Remotion render: Custom templates

Infrastructure

n8n workflows: Self-hosted
PostgreSQL + Redis: Production-grade
Monitoring & alerts: 24/7
Documentation: Runbooks + guides
Based in Honduras (UTC-6). Available for async collaboration across all timezones. Fluent in English, Spanish, and French.