Projects

AetherClips — Automated Short‑Video Generation

Tags: Python, FastAPI, AI, Video

A Python service that orchestrates multi‑modal AI models to produce 10‑second vertical clips automatically (ideas → visuals → audio → edit → MP4), with asynchronous job control and optional TikTok and Discord integrations.

[Image: isometric AI pipeline from film strip to clapperboard]

Objectives

  • End‑to‑end automation from a natural‑language brief (e.g., “make 3 cosmic videos”) to final vertical clips.
  • Combine LLM planning, image generation, optional TTS, and video assembly.
  • Scale to batches and integrate with platforms (TikTok API, Discord bot).

Architecture

  • API: a FastAPI server exposes job endpoints; the pipeline combines asyncio with worker threads to run stages in parallel.
  • Director/Agents pattern: ProductionDirector plans clips; specialized agents handle storyboard, generation (image/audio), and editing.
  • Persistence: SQLAlchemy (SQLite/Postgres) with Alembic migrations for projects/scenes/media.
  • Integrations: TikTok API for publish/trends; Discord bot for chat control; YouTube/Google APIs optional.
  • Tooling: ffmpeg for encoding; OpenCV/numpy for assembly and post‑processing.
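
The director/agents split and the asyncio‑plus‑threads pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: only the name ProductionDirector comes from the description, while ClipPlan, render_visuals, assemble_video, produce_clip, and run_batch are hypothetical, and the blocking stages are simulated with short sleeps instead of real image generation or ffmpeg calls.

```python
import asyncio
import time
from dataclasses import dataclass, field


@dataclass
class ClipPlan:
    """One planned clip; fields are illustrative, not the project's schema."""
    theme: str
    index: int
    artifacts: dict = field(default_factory=dict)


class ProductionDirector:
    """Hypothetical stand-in: splits a brief into per-clip plans."""

    def plan(self, theme: str, count: int) -> list[ClipPlan]:
        return [ClipPlan(theme=theme, index=i) for i in range(count)]


def render_visuals(plan: ClipPlan) -> str:
    """Blocking placeholder for image generation (runs in a worker thread)."""
    time.sleep(0.05)  # simulate slow model or network work
    return f"{plan.theme}_{plan.index}_frames"


def assemble_video(plan: ClipPlan, frames: str) -> str:
    """Blocking placeholder for the ffmpeg/OpenCV assembly step."""
    time.sleep(0.05)
    return f"{frames}.mp4"


async def produce_clip(plan: ClipPlan) -> ClipPlan:
    # asyncio.to_thread keeps the event loop free while blocking stages run.
    frames = await asyncio.to_thread(render_visuals, plan)
    plan.artifacts["frames"] = frames
    plan.artifacts["video"] = await asyncio.to_thread(assemble_video, plan, frames)
    return plan


async def run_batch(theme: str, count: int) -> list[ClipPlan]:
    plans = ProductionDirector().plan(theme, count)
    # Clips are independent of each other, so they can run concurrently.
    return await asyncio.gather(*(produce_clip(p) for p in plans))


results = asyncio.run(run_batch("cosmic", 3))
for clip in results:
    print(clip.index, clip.artifacts["video"])
```

Within one clip the stages stay sequential because assembly needs the frames; the concurrency comes from running several clips at once. In a FastAPI service the same coroutine could be scheduled from a job endpoint rather than through asyncio.run.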

AI Stack

  • LLMs via LangChain (OpenAI GPT‑3.5/4 or local models with Ollama) for briefs, scripts, and structured JSON output (validated with Pydantic, produced via structured prompting or Instructor).
  • Vision: Stable Diffusion (Diffusers) to generate frames/visuals; composable ComfyUI JSON workflows.
  • Audio: Whisper for speech recognition (ASR); optional TTS; librosa/torchaudio for processing.
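
To illustrate why structured JSON output matters, here is a small sketch of validating an LLM reply before it reaches the generation stages. The stack above names Pydantic for this; the sketch swaps in standard‑library dataclasses and manual checks so it runs without dependencies. The Scene schema, its field names, and parse_storyboard are assumptions made for illustration, and the reply is hard‑coded rather than coming from a model.

```python
import json
from dataclasses import dataclass


@dataclass
class Scene:
    """Hypothetical scene schema; the real project likely defines its own."""
    prompt: str
    duration_s: float


def parse_storyboard(raw: str) -> list[Scene]:
    """Parse an LLM reply that is expected to be a JSON list of scenes."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of scenes")
    scenes = []
    for item in data:
        prompt = item["prompt"]
        duration = float(item["duration_s"])
        if not isinstance(prompt, str) or not prompt.strip():
            raise ValueError("scene prompt must be a non-empty string")
        if duration <= 0:
            raise ValueError("scene duration must be positive")
        scenes.append(Scene(prompt=prompt, duration_s=duration))
    return scenes


# Example reply, hard-coded here in place of a real model call.
reply = (
    '[{"prompt": "nebula drifting past a ringed planet", "duration_s": 5},'
    ' {"prompt": "slow zoom into a star cluster", "duration_s": 5}]'
)
scenes = parse_storyboard(reply)
total = sum(s.duration_s for s in scenes)
print(len(scenes), total)
```

Rejecting a malformed reply at this boundary means a retry or a fallback model can be triggered before any expensive image or audio generation starts.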

Reliability & Scale

  • JobManager tracks each job's state (pending/running/done/error) and records errors per pipeline stage.
  • Failover between LLM providers and servers; parallel image generation across local inference servers or worker threads.
  • Tuned ffmpeg parameters (bitrate/CRF) to meet size and quality constraints (e.g., under 20 MB for Discord uploads).
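
The three points above can be sketched together: state tracking with per‑stage errors, ordered failover, and the bitrate arithmetic behind a size limit. Only the JobManager name and the four states come from the description; the Job class, run_stage, finish, first_available, max_total_kbps, and the two demo backends are hypothetical. The size calculation assumes 1 MB = 1,000,000 bytes and a 10% safety margin for container overhead, both of which are assumptions rather than project settings.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional, Sequence


class JobState(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    ERROR = "error"


@dataclass
class Job:
    job_id: str
    state: JobState = JobState.PENDING
    stage_errors: dict = field(default_factory=dict)  # stage name -> message


class JobManager:
    """Minimal in-memory tracker; a real one would persist to the database."""

    def __init__(self) -> None:
        self.jobs: dict[str, Job] = {}

    def create(self, job_id: str) -> Job:
        self.jobs[job_id] = Job(job_id)
        return self.jobs[job_id]

    def run_stage(self, job: Job, stage: str, fn: Callable[[], object]) -> Optional[object]:
        """Run one stage, recording any exception against that stage."""
        job.state = JobState.RUNNING
        try:
            return fn()
        except Exception as exc:
            job.stage_errors[stage] = str(exc)
            job.state = JobState.ERROR
            return None

    def finish(self, job: Job) -> None:
        if job.state is not JobState.ERROR:
            job.state = JobState.DONE


def first_available(backends: Sequence[Callable[[str], str]], prompt: str) -> str:
    """Try each backend in order and return the first successful result."""
    last_error = None
    for backend in backends:
        try:
            return backend(prompt)
        except Exception as exc:
            last_error = exc
    raise RuntimeError("all backends failed") from last_error


def max_total_kbps(size_limit_mb: float, duration_s: float, safety: float = 0.9) -> int:
    """Total (video + audio) bitrate that keeps a clip under a size limit."""
    bits = size_limit_mb * 1_000_000 * 8 * safety
    return int(bits / duration_s / 1000)


def flaky(prompt: str) -> str:  # demo backend that always fails
    raise ConnectionError("primary server unreachable")


def local(prompt: str) -> str:  # demo backend that always succeeds
    return f"local:{prompt}"


manager = JobManager()
job = manager.create("job-1")
text = manager.run_stage(job, "plan", lambda: first_available([flaky, local], "cosmic"))
manager.finish(job)

bad = manager.create("job-2")
manager.run_stage(bad, "encode", lambda: 1 / 0)
manager.finish(bad)

print(job.state.value, text, bad.state.value, sorted(bad.stage_errors))
print(max_total_kbps(20, 10))
```

For a 10‑second clip the 20 MB ceiling allows far more bitrate than a vertical short needs, so in practice a quality target such as CRF is likely the binding setting and the computed bitrate serves as an upper bound to check against.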

Results

  • Functional prototype pipeline producing themed short clips from a single prompt.
  • Demonstrated orchestration across text, vision, audio, and video, with per‑stage error tracking and failover.

Demo

Download the demo video (MP4): preview_workflow_aetherclips_movie.mp4

Skills Demonstrated

Async backend design, agent‑style orchestration, data modeling + migrations, multi‑modal AI integration, and media processing performance tuning.