Objectives
- End‑to‑end automation from a natural‑language brief (e.g., “make 3 cosmic videos”) to final vertical clips.
- Combine LLM planning, image generation, optional TTS, and video assembly.
- Scale to batches and integrate with platforms (TikTok API, Discord bot).
Architecture
- API: FastAPI server exposes job endpoints; async pipeline (threads + asyncio) to parallelize work.
- Director/Agents pattern: a ProductionDirector plans clips; specialized agents handle storyboarding, generation (image/audio), and editing.
- Persistence: SQLAlchemy (SQLite/Postgres) with Alembic migrations for projects/scenes/media.
- Integrations: TikTok API for publish/trends; Discord bot for chat control; YouTube/Google APIs optional.
- Tooling: ffmpeg for encoding; OpenCV/numpy for assembly and post‑processing.
AI Stack
- LLMs via LangChain (OpenAI GPT‑3.5/4 or local models with Ollama) for briefs, scripts, and JSON outputs (Pydantic + structured prompting/Instructor).
- Vision: Stable Diffusion (Diffusers) to generate frames/visuals; composable ComfyUI JSON workflows.
- Audio: Whisper for ASR; optional TTS; librosa/torchaudio for processing.
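The structured-output step mentioned above amounts to validating LLM JSON against a Pydantic schema; a minimal sketch, where the `ScenePlan` schema and the sample JSON are assumptions (using Pydantic v2's `model_validate_json`):

```python
from pydantic import BaseModel


class ScenePlan(BaseModel):
    # Hypothetical schema the LLM is prompted to fill in as JSON.
    title: str
    scenes: list[str]
    duration_seconds: int


# Stand-in for a raw LLM response; invalid JSON or missing fields
# would raise pydantic.ValidationError here instead of failing later.
raw = '{"title": "Cosmic Drift", "scenes": ["nebula", "black hole"], "duration_seconds": 30}'

plan = ScenePlan.model_validate_json(raw)
print(plan.scenes)
```

Validating at the boundary keeps malformed model output from propagating into the generation and editing stages.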
Reliability & Scale
- JobManager tracks job states (pending/running/done/error) and records errors per stage.
- Failover strategies for LLM providers/servers; image generation parallelized across local LLM servers or worker threads.
- Tuned ffmpeg parameters (bitrate/CRF) to meet size/quality constraints (e.g., <20MB for Discord).
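The size constraint above implies a bitrate budget before any ffmpeg flags are chosen; a small helper (hypothetical, not from the project) sketches the arithmetic:

```python
def target_bitrate_kbps(max_size_mb: float, duration_s: float, audio_kbps: int = 128) -> int:
    """Video bitrate (kbps) that keeps total output under max_size_mb.

    Assumes 1 MB = 10^6 bytes and a fixed audio bitrate; container
    overhead is ignored, so a small safety margin is advisable.
    """
    total_kbits = max_size_mb * 8 * 1000  # MB -> megabits -> kilobits
    video_kbps = total_kbits / duration_s - audio_kbps
    return int(video_kbps)


# A 60 s vertical clip that must stay under Discord's 20 MB limit:
print(target_bitrate_kbps(20, 60))  # -> 2538
```

The resulting value feeds ffmpeg's `-b:v` (e.g. `-b:v 2538k`) when a hard size cap matters more than constant quality; CRF mode is the better choice when it does not.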
Results
- Functional prototype pipeline producing themed short clips from a single prompt.
- Demonstrated orchestration across text, vision, audio, and video with robust error handling.
Demo
Download the demo video (MP4): preview_workflow_aetherclips_movie.mp4
Skills Demonstrated
Async backend design, agent‑style orchestration, data modeling + migrations, multi‑modal AI integration, and media processing performance tuning.