Deploy & Roadmap Plan
Current system status + what is coming next (reflects real work done)
Done
- Console: landing page, dashboard, presenter management, gallery with tabs/search/player, settings, i18n EN/中文/ไทย, dark mode + adjustable font/size/density
- Per-presenter: background removal (rembg), image/video backdrop, camera zoom, 9-point frame position + presenter size, with live preview before save
- Backend: Postgres + PostgREST (Supabase-lite) + nginx gateway on a single host
- Pipeline split into clean layers: api / services / clients / gpu. ALL GPU work sits behind one boundary; swap the GPU server with env only (GPU_BACKEND + GPU_BASE_URL)
- TTS: MMS-TTS Thai/English code-switch (offline, China-safe) | Azure Neural | edge-tts | GPT-SoVITS (pitch/emotion) wired into the chain
- Voice clone: OpenVoice v2, matches the presenter voice from the source clip
- Lip-sync: Wav2Lip on RTX 4090, mouth + head motion from a reference video (HD 1080p)
- Render: async + progress bar, keeps every version, adjustable speed/length (up to 30 min)
- 24/7 live: Qwen auto-generates a continuous script + 30s look-ahead pre-render → gapless RTMP (auto filler)
- Live Console: real-time FB comments + AI auto-reply + the presenter can SPEAK replies on stream
- Stream: ffmpeg → Facebook Live / TikTok / Shopee (RTMP)
- Status panel: monitors every service + GPU/RAM/disk bars
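
The 30s look-ahead with auto filler from the 24/7 live item can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the names `Segment`, `LookaheadBuffer`, and the 5-second filler length are assumptions made for the example. The idea is that the renderer keeps working until at least 30 seconds of video are queued, and the streamer falls back to a filler clip whenever the queue runs dry.

```python
from collections import deque
from dataclasses import dataclass

LOOKAHEAD_SECONDS = 30.0  # the 30s look-ahead named in the list above


@dataclass
class Segment:
    """One rendered clip waiting to be streamed (hypothetical type)."""
    name: str
    seconds: float
    filler: bool = False


class LookaheadBuffer:
    """Queue of pre-rendered segments sitting ahead of the live playhead."""

    def __init__(self, lookahead: float = LOOKAHEAD_SECONDS) -> None:
        self.lookahead = lookahead
        self._queue = deque()

    def buffered_seconds(self) -> float:
        # Total duration of everything rendered but not yet streamed.
        return sum(segment.seconds for segment in self._queue)

    def needs_render(self) -> bool:
        # The renderer should keep working until the look-ahead is covered.
        return self.buffered_seconds() < self.lookahead

    def push(self, segment: Segment) -> None:
        self._queue.append(segment)

    def pop_next(self, filler_seconds: float = 5.0) -> Segment:
        # If rendering fell behind, return a filler clip instead of a gap.
        if self._queue:
            return self._queue.popleft()
        return Segment("filler", filler_seconds, filler=True)
```

In use, a render loop would call `needs_render()` before generating the next script chunk, while the stream loop calls `pop_next()` each time the current clip ends.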
In progress / next
- Deploy GPT-SoVITS api.py on the GPU box, then enable in Settings (system side is ready; see services/sovits/README.md)
- Point a real domain + run deploy/scripts/enable-https.sh + add Google OAuth credentials to enable Gmail login (script/login page ready)
- Set fb_page_token + fb_live_video_id in Settings for real live comments
- Apply migrations 0005-0006 to the production DB + redeploy on UCloud
- Later: full per-user accounts replacing demo mode, durable queue (Redis) replacing in-memory state
Architecture
- web (Next.js 16) :3000 - console + API gateway (single forwarder lib/pipeline-client)
- pipeline (FastAPI) :8000 - api → services → {clients, gpu, media} | /render /generate /live /gpu
- gpu boundary - lipsync, MMS-TTS, OpenVoice, GFPGAN, rembg, SoVITS; local/remote per capability
- lipsync :8001 - Wav2Lip/MuseTalk GPU | sovits :9880 - GPT-SoVITS (opt-in)
- db (Postgres) :5432 + postgrest + gateway :8088 | qwen (ollama) :11434
- nginx - reverse proxy + nip.io (status/console/api) + opt-in TLS via script
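
As a rough illustration of the port layout above, the sketch below probes each listed port with a plain TCP connect. It is only an assumption about how a reachability check could look; the actual status panel may work differently. The port numbers come from the list, while `SERVICE_PORTS`, `is_port_open`, `check_services`, and the default host `127.0.0.1` are hypothetical.

```python
import socket
from typing import Dict

# Ports as listed in the architecture above (names are illustrative labels).
SERVICE_PORTS: Dict[str, int] = {
    "web": 3000,
    "pipeline": 8000,
    "lipsync": 8001,
    "sovits": 9880,
    "db": 5432,
    "gateway": 8088,
    "qwen": 11434,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host: str = "127.0.0.1") -> Dict[str, bool]:
    """Probe every listed service; assumes all of them run on one host."""
    return {name: is_port_open(host, port) for name, port in SERVICE_PORTS.items()}
```

A TCP connect only shows that something is listening; it says nothing about whether the service behind the port is healthy.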
Everything runs on one UCloud GPU host, and the GPU part alone can move to another box by changing env only (GPU_BACKEND=remote)
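
A minimal sketch of how the env-only switch could be read on the pipeline side follows. Only the variable names GPU_BACKEND and GPU_BASE_URL and the value `remote` come from this plan; the `local` default, the `GpuConfig` type, and the validation rules are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GpuConfig:
    backend: str             # "local" (assumed default) or "remote"
    base_url: Optional[str]  # set only when backend == "remote"


def load_gpu_config(env: Mapping[str, str] = os.environ) -> GpuConfig:
    """Resolve the GPU backend from environment variables alone."""
    # Assumption: an unset GPU_BACKEND means the GPU runs on the same host.
    backend = env.get("GPU_BACKEND", "local").strip().lower()
    if backend not in ("local", "remote"):
        raise ValueError(f"unsupported GPU_BACKEND: {backend!r}")

    # Normalise the URL so callers can append paths without double slashes.
    base_url = env.get("GPU_BASE_URL", "").strip().rstrip("/") or None
    if backend == "remote" and base_url is None:
        raise ValueError("GPU_BACKEND=remote requires GPU_BASE_URL")

    return GpuConfig(backend, base_url if backend == "remote" else None)
```

Passing the environment in as a mapping keeps the function easy to test: calling `load_gpu_config({"GPU_BACKEND": "remote", "GPU_BASE_URL": "http://gpu-box:8001/"})` exercises the remote path without touching the real process environment (the host name here is made up).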