Mongolian Transcription App

AI-powered video subtitle generator

An end-to-end transcription platform that converts Mongolian video and audio files into subtitle (.srt) files using Google Speech-to-Text v2. Upload a media file through the web interface, track progress in real time, and download time-aligned subtitles ready for YouTube or DaVinci Resolve.
Under the hood, it runs a scalable microservice stack built entirely from scratch:
- FastAPI backend handles file presigning, job orchestration, and database state management.
- Worker service segments media into manageable chunks, extracts audio with FFmpeg, calls Google Speech-to-Text v2, and assembles SRT output with proper timestamps.
- MinIO (S3-compatible) storage holds uploaded media and results.
- PostgreSQL and Redis provide persistence and job queueing.
- Next.js frontend provides upload progress, drag-and-drop support, error handling, and a live video preview with a subtitle track.
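
To make the worker's last step concrete, here is a minimal sketch of how per-chunk recognition results could be merged into one SRT file. It is not the project's actual code: the `Segment` type, the function names, and the 55-second default chunk length are assumptions chosen for illustration. The key idea is that each segment is timed relative to its chunk, so the chunk's offset is added before formatting.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One recognized phrase, timed relative to the start of its chunk."""
    start: float  # seconds from chunk start
    end: float    # seconds from chunk start
    text: str


def plan_chunks(duration: float, chunk_seconds: float = 55.0) -> list:
    """Split [0, duration) into consecutive (start, end) windows.

    The 55-second default is a hypothetical value, not taken from the project.
    """
    chunks = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_seconds, duration)
        chunks.append((start, end))
        start = end
    return chunks


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def assemble_srt(chunks: list) -> str:
    """Merge per-chunk segments into one SRT document.

    Each item in `chunks` is (chunk_offset_seconds, [Segment, ...]).
    """
    blocks = []
    index = 1  # SRT cue numbers start at 1 and run across all chunks
    for offset, segments in chunks:
        for seg in segments:
            start = srt_timestamp(offset + seg.start)
            end = srt_timestamp(offset + seg.end)
            blocks.append(f"{index}\n{start} --> {end}\n{seg.text}")
            index += 1
    return "\n\n".join(blocks) + "\n"
```

For example, a segment at 1.0 s inside a chunk that starts at 55.0 s ends up at `00:00:56,000` in the final file.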

The system is tuned for Mongolian speech recognition and designed with SaaS scalability in mind, so it can be extended to other languages or speech models.
Project Background
While creating my own YouTube videos, I quickly realized how limited and expensive Mongolian transcription tools are.
Existing apps were either inaccurate or locked behind costly subscriptions, and there was no simple solution for generating subtitles in Mongolian. Out of curiosity, I asked ChatGPT if it would be possible to build my own transcription application — and to my surprise, it said yes.
That conversation sparked this entire project: I decided to build a complete, end-to-end system myself.
Key Features:
- Upload any video/audio file and get an .srt subtitle file automatically.
- Uses the Google Speech-to-Text v2 latest_long model for Mongolian transcription.
- Real-time progress tracking via job polling and toast notifications.
- Modular architecture with Docker Compose for local development.
- CORS-safe presigned uploads to S3 (MinIO) — secure and scalable.
- Easily deployable to Google Cloud Run for production.
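
The job-polling feature can be sketched as a small loop. In the app itself the polling happens in the Next.js frontend; the version below is a Python illustration of the same idea, for example for a script client. The status names (`done`, `failed`), the `progress` field, and the default interval and timeout are assumptions, not the project's real API.

```python
import time
from typing import Callable

TERMINAL_STATES = {"done", "failed"}  # hypothetical status names


def poll_job(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 600.0,
    on_progress: Callable[[int], None] = lambda pct: None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status until the job reaches a terminal state or times out.

    fetch_status is any callable returning the job as a dict, such as a
    wrapper around an HTTP GET to the backend's job endpoint.
    """
    deadline = clock() + timeout
    last_progress = None
    while True:
        job = fetch_status()
        progress = job.get("progress")
        # Report progress only when it changes, to avoid duplicate updates.
        if progress is not None and progress != last_progress:
            on_progress(progress)
            last_progress = progress
        if job.get("status") in TERMINAL_STATES:
            return job
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to test without real waiting; a caller would normally leave them at their defaults.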
Tech Stack:
- Frontend: Next.js, TypeScript, TailwindCSS, React Hooks
- Backend: FastAPI (Python), PostgreSQL, Redis, MinIO, FFmpeg
- AI/Cloud: Google Cloud Speech-to-Text v2
- Infrastructure: Docker, Docker Compose, Cloud Run (planned)
- Tools: GitHub Actions (planned CI/CD), Stripe, Firebase, Firestore

Future Development Roadmap

Phase 1 — Cloud Migration

- Deploy backend and worker to Google Cloud Run
- Replace MinIO with Google Cloud Storage
- Migrate metadata from Postgres to Firestore
Phase 2 — SaaS Features

- Add Google Sign-In via Firebase Auth
- Integrate Stripe Checkout for subscription tiers (Starter / Pro)
- Add automatic usage tracking and limits
Phase 3 — Advanced Speech Features

- Enable long-running (batch) recognition for audio that exceeds the synchronous request limit
- Support custom vocabulary hints to boost accuracy
- Generate multi-segment SRT with precise timestamps
- Optional “Polish mode” for enhanced text formatting and punctuation
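
As a rough illustration of the multi-segment SRT item, word-level timings from the recognizer could be grouped into subtitle cues along these lines. This is a sketch under assumptions, not a planned implementation: the `Word` type, the function names, and the thresholds (42 characters, 5 seconds, 0.8-second pause) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def group_words(words, max_chars=42, max_seconds=5.0, max_gap=0.8):
    """Group timed words into subtitle cues, returned as (start, end, text).

    A new cue starts when adding the next word would make the cue too long
    in characters, too long in duration, or when the speaker paused.
    All three thresholds are hypothetical defaults.
    """
    cues = []
    current = []
    for word in words:
        if current:
            text_len = len(" ".join(w.text for w in current)) + 1 + len(word.text)
            too_long = text_len > max_chars
            too_slow = word.end - current[0].start > max_seconds
            paused = word.start - current[-1].end > max_gap
            if too_long or too_slow or paused:
                cues.append(_close(current))
                current = []
        current.append(word)
    if current:
        cues.append(_close(current))
    return cues


def _close(words):
    """Collapse a run of words into one (start, end, text) cue."""
    return (words[0].start, words[-1].end, " ".join(w.text for w in words))
```

Each resulting cue carries its own start and end time, so it maps directly onto one numbered block in the .srt file.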
Phase 4 — Monitoring & Polish

- Add Cloud Logging & Error Reporting dashboards
- Improve UI design and marketing landing page
- Add team/workspace support for collaborative projects
