Objectives
- End‑to‑end automation from a natural‑language brief (e.g., “make 3 cosmic videos”) to final vertical clips.
- Combine LLM planning, image generation, optional TTS, and video assembly.
- Scale to batches and integrate with platforms (TikTok API, Discord bot).
Architecture
- API: FastAPI server exposes job endpoints; async pipeline (threads + asyncio) to parallelize work.
- Director/Agents pattern: a ProductionDirector plans clips; specialized agents handle storyboarding, generation (image/audio), and editing.
- Persistence: SQLAlchemy (SQLite/Postgres) with Alembic migrations for projects/scenes/media.
- Integrations: TikTok API for publish/trends; Discord bot for chat control; YouTube/Google APIs optional.
- Tooling: ffmpeg for encoding; OpenCV/numpy for assembly and post‑processing.
AI Stack
- LLMs via LangChain (OpenAI GPT‑3.5/4 or local models with Ollama) for briefs, scripts, and JSON outputs (Pydantic + structured prompting/Instructor).
- Vision: Stable Diffusion (Diffusers) to generate frames/visuals; composable ComfyUI JSON workflows.
- Audio: Whisper for ASR; optional TTS; librosa/torchaudio for processing.
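The structured-output step mentioned above amounts to validating LLM JSON against a Pydantic schema; a minimal sketch, where the `ScenePlan` schema and the sample JSON are assumptions (using Pydantic v2's `model_validate_json`):

```python
from pydantic import BaseModel


class ScenePlan(BaseModel):
    # Hypothetical schema the LLM is prompted to fill in as JSON.
    title: str
    scenes: list[str]
    duration_seconds: int


# Stand-in for a raw LLM response; invalid JSON or missing fields
# would raise pydantic.ValidationError here instead of failing later.
raw = '{"title": "Cosmic Drift", "scenes": ["nebula", "black hole"], "duration_seconds": 30}'

plan = ScenePlan.model_validate_json(raw)
print(plan.scenes)
```

Validating at the boundary keeps malformed model output from propagating into the generation and editing stages.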
Reliability & Scale
- JobManager tracks job states (pending/running/done/error) and records errors per stage.
- Failover strategies for LLM providers/servers; image generation parallelized across local LLM servers or worker threads.
- Tuned ffmpeg parameters (bitrate/CRF) to meet size/quality constraints (e.g., <20MB for Discord).
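The size constraint above implies a bitrate budget before any ffmpeg flags are chosen; a small helper (hypothetical, not from the project) sketches the arithmetic:

```python
def target_bitrate_kbps(max_size_mb: float, duration_s: float, audio_kbps: int = 128) -> int:
    """Video bitrate (kbps) that keeps total output under max_size_mb.

    Assumes 1 MB = 10^6 bytes and a fixed audio bitrate; container
    overhead is ignored, so a small safety margin is advisable.
    """
    total_kbits = max_size_mb * 8 * 1000  # MB -> megabits -> kilobits
    video_kbps = total_kbits / duration_s - audio_kbps
    return int(video_kbps)


# A 60 s vertical clip that must stay under Discord's 20 MB limit:
print(target_bitrate_kbps(20, 60))  # -> 2538
```

The resulting value feeds ffmpeg's `-b:v` (e.g. `-b:v 2538k`) when a hard size cap matters more than constant quality; CRF mode is the better choice when it does not.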
Results
- Functional prototype pipeline producing themed short clips from a single prompt.
- Demonstrated orchestration across text, vision, audio, and video with robust error handling.
Demo
Download the demo video (MP4): preview_workflow_aetherclips_movie.mp4
Skills Demonstrated
Async backend design, agent‑style orchestration, data modeling + migrations, multi‑modal AI integration, and media processing performance tuning.