I built a standalone Remotion-based video rendering microservice with a clean API surface. n8n (running on a separate Oracle Cloud VM) generates scripts with DeepSeek, audio with Google Cloud TTS, and images with GPT Image Mini, then POSTs a JSON payload to the microservice, which renders the final video and returns it via a download endpoint. Five production templates cover 16:9 long-form, 9:16 shorts, and static thumbnails. The entire service runs in a single ARM64 Docker container on Oracle Cloud's free tier, with sub-30-second cold starts and no serverless cost spikes.
DeepSeek writes the script, Google Cloud TTS synthesizes the voiceover, and GPT Image Mini generates on-brand images. All assets are uploaded to a CDN.
n8n sends a single JSON payload with the composition name, props (title, text, images, audio URL), and branding config; the service responds with a job ID.
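A sketch of what that payload might look like. The field names here are illustrative assumptions, not the service's exact schema:

```typescript
// Hypothetical render payload -- field names are illustrative, not the
// service's documented schema.
const payload = {
  composition: "DreamShort", // template name, as listed by /compositions
  props: {
    title: "What Flying Dreams Mean",
    text: "Dreams about flying often signal a desire for freedom...",
    images: ["https://cdn.example.com/img/flying-1.png"],
    audioUrl: "https://cdn.example.com/tts/voiceover-en.mp3",
  },
  branding: { watermark: "@dreamchannel", accentColor: "#7c3aed" },
};

// n8n would POST this and keep the returned jobId for polling, e.g.:
// const res = await fetch("http://render-host:3000/render", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(payload),
// });
// const { jobId } = await res.json();
```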
The microservice loads the matching template (ArticleYT, DreamShort, HoroscopeWeekly, etc.), syncs scene durations to the audio, and orchestrates Ken Burns zooms, subtitles, watermarks, and music fade layers.
Remotion renders the composition frame-by-frame to an MP4 with h264 + AAC encoding. Progress is tracked in an in-memory job store for polling.
Once complete, the MP4 is exposed via a signed download endpoint. n8n pulls the file and uploads it to YouTube / TikTok / Instagram via their respective APIs.
Images are generated once per video, then reused across the 3 language versions — only audio and text change. This cuts image generation costs by 66%.
| Template | Format | Resolution | Duration | Use Case |
|---|---|---|---|---|
| ArticleYT | 16:9 | 1920×1080 | 8–15 min | YouTube long-form from blog articles |
| DreamShort | 9:16 | 1080×1920 | 2–3 min | TikTok / Reels / Shorts — dream interpretation |
| HoroscopeWeekly | 9:16 | 1080×1920 | ~5 min | TikTok / Reels / Shorts — weekly horoscope per sign |
| DreamThumbnail | 16:9 | 1280×720 | 1 frame | Static thumbnail for DreamShort videos |
| HoroscopeThumbnail | 16:9 | 1280×720 | 1 frame | Static thumbnail for HoroscopeWeekly videos |
Every template is assembled from a shared library of reusable components (KenBurnsImage, AudioLayer, GlowText, AnimatedSubtitle, Watermark, FloatingParticles) composed into scene sequences. Adding a new template takes hours, not days: I register it in Root.tsx, define its Zod schema in types/common.ts, and compose scenes from the existing library.
| Endpoint | Method | Purpose |
|---|---|---|
| /health | GET | Health check for orchestration and uptime monitoring |
| /compositions | GET | Lists all available templates with their input schemas |
| /render | POST | Async render: validates props against the Zod schema and returns a jobId |
| /render-sync | POST | Synchronous render: blocks until the video is ready (for short pipelines) |
| /status/:jobId | GET | Job status and progress percentage for polling clients |
| /download/:jobId | GET | Streams the rendered MP4 file back to the caller |
| Step | Component | What it does |
|---|---|---|
| 1 | Zod schema validation | Rejects malformed payloads before burning render time |
| 2 | calculateMetadata() | Measures audio length and sets composition duration dynamically; scene durations distribute proportionally |
| 3 | Asset resolution | Downloads images, audio, and music from remote URLs or CDN volume |
| 4 | Remotion bundling | Compiles the React composition tree into a video-ready bundle |
| 5 | Frame-by-frame render | Headless Chrome renders every frame, passes to ffmpeg for h264+AAC encoding |
| 6 | Job store update | Marks the job complete and exposes the MP4 at /download/:jobId |
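Step 2 above is the interesting one: Remotion's `calculateMetadata()` lets the composition's `durationInFrames` be computed at render time. A minimal sketch of the proportional-distribution logic, assuming the audio length has already been measured (the real service could probe the file, e.g. with `getAudioDurationInSeconds` from `@remotion/media-utils`):

```typescript
// Sketch: derive the composition duration from the measured voiceover length
// and split it across scenes by relative weight. Function and parameter names
// are illustrative, not the service's actual code.
function distributeScenes(
  audioSeconds: number,
  sceneWeights: number[], // relative share of the video each scene occupies
  fps = 30,
): { durationInFrames: number; sceneFrames: number[] } {
  const durationInFrames = Math.round(audioSeconds * fps);
  const totalWeight = sceneWeights.reduce((a, b) => a + b, 0);
  const sceneFrames = sceneWeights.map((w) =>
    Math.round((w / totalWeight) * durationInFrames),
  );
  // Hand any rounding remainder to the last scene so frames sum exactly.
  const drift = durationInFrames - sceneFrames.reduce((a, b) => a + b, 0);
  sceneFrames[sceneFrames.length - 1] += drift;
  return { durationInFrames, sceneFrames };
}
```

Because the duration is derived from the audio, the same template works unchanged for every language version, however long its voiceover runs.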
To add a new template, I define its Zod schema in types/common.ts, register the composition in Root.tsx, and it becomes available via the API. No client code changes needed.
Videos must match voiceover audio length exactly. Different languages produce different audio durations for the same script.
TTS engines don't handle pauses naturally. Rapid-fire speech without breathing room sounds robotic.
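One common fix is to wrap the script in SSML and insert explicit `<break>` tags at sentence boundaries before sending it to Google Cloud TTS, which does support SSML breaks. A sketch of that idea; the 450 ms pause length and the function name are my assumptions, not values from the original pipeline:

```typescript
// Sketch: add SSML <break> tags between sentences so the TTS voice breathes.
// The pause length is an arbitrary illustrative choice.
function toSsmlWithBreaks(script: string, pauseMs = 450): string {
  const sentences = script
    .split(/(?<=[.!?])\s+/) // split after sentence-ending punctuation
    .map((s) => s.trim())
    .filter(Boolean);
  const body = sentences.join(`<break time="${pauseMs}ms"/> `);
  return `<speak>${body}</speak>`;
}
```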
Generating unique images per language triples cost. 5 images × 3 languages = 15 generations per topic.
Remotion needs headless Chrome + ffmpeg. Default images are x86 and huge. Oracle's free ARM VM demanded a lean ARM64 build.
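A lean ARM64 image can lean on Debian's native arm64 packages for Chromium and ffmpeg instead of downloaded x86 binaries. The sketch below is hypothetical: the base image, package names, and env variable are my assumptions, not the project's actual Dockerfile (Remotion's renderer accepts a `browserExecutable` option that the app would read this path into).

```dockerfile
# Hypothetical sketch -- base image and package names are assumptions,
# not the project's actual Dockerfile.
FROM node:20-bookworm-slim

# System Chromium + ffmpeg from Debian's arm64 repos, instead of the
# x86 binaries a default setup would download.
RUN apt-get update && apt-get install -y --no-install-recommends \
      chromium ffmpeg && \
    rm -rf /var/lib/apt/lists/*

# Assumed app-level variable: the server passes this path to Remotion's
# renderer as browserExecutable.
ENV BROWSER_EXECUTABLE=/usr/bin/chromium

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .

EXPOSE 3000
CMD ["node", "server.js"]
```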
Every template accepts different props. A malformed payload mid-render wastes minutes and leaves broken MP4s in the job store.
Job queues usually need Redis or a worker framework. Overkill for a single-VM microservice handling dozens of jobs/day.
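At that scale, a `Map` behind the HTTP handlers does the whole job. A minimal sketch of such an in-memory store; class and field names are illustrative, not the service's actual code:

```typescript
// Minimal in-memory job store sketch: a Map instead of Redis, which is
// enough for a single-VM service handling dozens of jobs per day.
type Job = { id: string; state: "queued" | "rendering" | "done" | "error"; progress: number };

class JobStore {
  private jobs = new Map<string, Job>();
  private counter = 0;

  create(): Job {
    const job: Job = { id: `job-${++this.counter}`, state: "queued", progress: 0 };
    this.jobs.set(job.id, job);
    return job;
  }

  update(id: string, patch: Partial<Omit<Job, "id">>): void {
    const job = this.jobs.get(id);
    if (!job) throw new Error(`unknown job ${id}`);
    Object.assign(job, patch); // render loop reports progress here
  }

  get(id: string): Job | undefined {
    return this.jobs.get(id); // backs GET /status/:jobId
  }
}
```

The trade-off is explicit: jobs don't survive a process restart, which is acceptable when the orchestrator (n8n) can simply resubmit.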
I build production-grade content automation systems — from AI video pipelines to multilingual email marketing. End-to-end delivery, zero middleware.
💬 Hire Me on Contra