Ankush
Rathour
I architect production-grade software systems — from blazing-fast APIs and distributed cloud infra to multimodal AI agents. Turning ambitious ideas into deployed realities.

01 — About
Engineering at the intersection of scale & intelligence
I'm Ankush Rathour, a Software Engineer specializing in architecting scalable full-stack solutions and seamless third-party integrations. I bridge the gap between complex backend logic and intuitive frontend experiences. My expertise lies in building AI-driven workflows, real-time communication systems, and enterprise CRM connectors.
I specialize in Python ecosystems — crafting everything from blazing-fast FastAPI microservices to Django monoliths handling millions of requests. On the cloud side, I've architected solutions across AWS, GCP, and Azure. My open-source work includes the AudioMaker PyPI package, enabling programmatic audio generation at scale.
Currently obsessed with the convergence of voice AI and messaging platforms — building systems where LLMs don't just generate text, but orchestrate real-time telephony, voice synthesis, and multimodal reasoning pipelines.
Backend Engineering
Python · Django · FastAPI · REST APIs · GraphQL · Celery · Redis
Cloud & DevOps
AWS · GCP · Azure · Docker · Kubernetes · CI/CD · Nginx
AI & Machine Learning
OpenAI · Gemini · LangChain · ElevenLabs · RAG · LLM Pipelines
Data & Databases
PostgreSQL · MongoDB · Redis · Elasticsearch · Pandas · NumPy
Experience
2024–Present
Senior Software Engineer
AI-focused product company
Building multimodal AI agents & production ML systems
2021–2024
Software Engineer
SaaS Platform
Scaled Django/FastAPI backend to 10M+ requests/day on AWS
2020–2021
Python Intern
Startup Ecosystem
Designed microservices architecture, cloud infra & data pipelines
02 — Tech Stack
Tools of the trade
A curated selection of technologies I use to build reliable, scalable systems.
Languages
Frameworks & Libraries
Cloud & DevOps
AI / ML
Databases & Storage
Tools & Practices
Core Proficiencies
03 — Projects
Things I've built
Open-source tools, AI experiments, and production systems.
AudioMaker
Text-to-Audio Python Library on PyPI
A production-ready PyPI package that simplifies programmatic audio creation — supporting multiple TTS engines, batch processing, and audio manipulation pipelines. Built for developers who need reliable voice synthesis without the boilerplate.
GoogleMapsScraper
Async Data Extraction Engine
High-performance Google Maps data extraction tool built with Python. Supports async scraping, proxy rotation, rate limiting, and exports structured business data (name, address, phone, ratings, reviews) to CSV/JSON/Excel.
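As a rough illustration of the async scraping, proxy rotation, and throttling ideas above, here is a minimal standard-library sketch. It is not the tool's actual code: `fetch_place`, `scrape`, `to_csv`, the proxy names, and the record fields are all hypothetical, the network call is replaced by a short sleep, and the throttle shown is a simple concurrency cap via `asyncio.Semaphore` rather than a true requests-per-second limiter.

```python
import asyncio
import csv
import io
import itertools


async def fetch_place(place_id: str, proxy: str, limiter: asyncio.Semaphore) -> dict:
    """Pretend to fetch one listing through the given proxy (hypothetical helper)."""
    async with limiter:  # at most N fetches run at the same time
        await asyncio.sleep(0.01)  # stand-in for the real HTTP request
        return {"place_id": place_id, "proxy": proxy}


async def scrape(place_ids: list[str], proxies: list[str], max_concurrent: int = 2) -> list[dict]:
    """Fetch all listings concurrently, rotating proxies round-robin."""
    limiter = asyncio.Semaphore(max_concurrent)
    proxy_cycle = itertools.cycle(proxies)
    tasks = [fetch_place(pid, next(proxy_cycle), limiter) for pid in place_ids]
    # gather() returns results in the same order as the input tasks
    return await asyncio.gather(*tasks)


def to_csv(rows: list[dict]) -> str:
    """Serialize the scraped rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["place_id", "proxy"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = asyncio.run(scrape(["a", "b", "c"], ["proxy-1", "proxy-2"]))
print(to_csv(rows))
```

A semaphore keeps the sketch short; a token bucket or per-host delay would be the more usual choice when the goal is a fixed request rate.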
ChatPDF
RAG-powered Document Chat Interface
AI-powered PDF conversation tool using Retrieval-Augmented Generation. Users upload any PDF, and the system chunks, embeds, and stores documents in a vector database, enabling context-aware Q&A over entire documents using OpenAI / Gemini LLMs.
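The chunk, embed, store, and retrieve steps can be sketched in a few lines. This is a toy under stated assumptions, not ChatPDF's implementation: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of words."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, store: list[tuple[str, Counter]], top_k: int = 1) -> list[str]:
    """Return the top_k stored chunks most similar to the question."""
    query = embed(question)
    ranked = sorted(store, key=lambda item: cosine(query, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


chunks = [
    "Invoices are due within thirty days.",
    "The warranty covers battery defects for two years.",
]
store = [(chunk, embed(chunk)) for chunk in chunks]  # stand-in for a vector database
context = retrieve("How long is the warranty?", store)
print(context)
```

In a full pipeline the retrieved chunks would be placed into the LLM prompt as context before the user's question.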
04 — Technical Showcase
Unified Multimodal AI Agent
Engineering the bridge between conversational text AI and real-time voice telephony. A complete system that transitions a WhatsApp chat into a live AI voice call — using Twilio SIP Domains, ElevenLabs, and LLM orchestration.
System Architecture Flow
WhatsApp Message Arrives
The user sends a message via the WhatsApp Business API. The webhook fires to our FastAPI ingestion service, which parses intent, language, and context in real time.
LLM Reasoning Layer
The message payload hits the LLM orchestration layer (OpenAI GPT-4o / Google Gemini 1.5). The model decides: generate a text reply, or trigger the voice pipeline? Intent classification happens here.
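One way to make the text-versus-voice decision machine-readable is to have the model return a small JSON object and parse it defensively. The sketch below assumes that approach; the `Intent` fields mirror the `requires_voice` and `text_response` attributes used in the orchestration flow, while `parse_intent` and the JSON shape are hypothetical and not taken from the production system.

```python
import json
from dataclasses import dataclass


@dataclass
class Intent:
    requires_voice: bool
    text_response: str


def parse_intent(raw_llm_output: str) -> Intent:
    """Parse the model's routing decision, falling back to a plain text reply."""
    try:
        data = json.loads(raw_llm_output)
        return Intent(
            # Only an explicit JSON `true` triggers the voice pipeline
            requires_voice=data["requires_voice"] is True,
            text_response=str(data.get("text_response", "")),
        )
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        # Malformed output: treat the raw text as the reply and stay on the text path
        return Intent(requires_voice=False, text_response=raw_llm_output)


decision = parse_intent('{"requires_voice": true, "text_response": "Calling you now"}')
print(decision)
```

Defaulting to the text path on malformed output means a parsing failure never places an unexpected phone call.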
Voice Synthesis via ElevenLabs
When the voice path is chosen, the LLM-generated text response is passed to the ElevenLabs streaming API. A cloned or multilingual voice renders the response as a high-fidelity WAV/MP3 audio stream.
Twilio SIP Domain Routing
A Twilio SIP Domain is configured as the PSTN bridge. The audio stream is routed via TwiML — the call is initiated to the user's number with the ElevenLabs-generated voice. Real-time bidirectional audio over WebRTC/SIP.
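For reference, TwiML is an XML document, and its `<Play>` verb plays audio from a URL during a call. The sketch below builds such a document with the standard library. It is a simplification rather than the routing described above: it assumes the synthesized audio has already been saved to a publicly reachable URL, and `build_play_twiml` and the example URL are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_play_twiml(audio_url: str) -> str:
    """Return a TwiML document that plays one audio file to the caller."""
    response = ET.Element("Response")
    play = ET.SubElement(response, "Play")
    play.text = audio_url  # ElementTree escapes characters such as '&' for us
    return ET.tostring(response, encoding="unicode")


print(build_play_twiml("https://example.com/reply.mp3"))
# <Response><Play>https://example.com/reply.mp3</Play></Response>
```

Building the XML with a library instead of string formatting avoids malformed TwiML when the URL contains query parameters.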
User Receives Voice Call
The agent delivers the AI-generated voice response as a real phone call. The user can interact back (speech-to-text via Whisper), creating a full conversational loop — from text message to live phone AI agent.
# Simplified orchestration flow
async def handle_whatsapp_message(payload: WebhookPayload):
    # 1. Parse intent with LLM
    intent = await llm_router.classify(payload.body)

    if intent.requires_voice:
        # 2. Generate response text
        response = await openai_client.chat.completions.create(
            model="gpt-4o",
            messages=build_conversation(payload)
        )

        # 3. Synthesize voice via ElevenLabs
        audio_stream = await elevenlabs.generate(
            text=response.choices[0].message.content,
            voice="ankush-custom-voice",
            stream=True
        )

        # 4. Initiate Twilio SIP call
        call = twilio_client.calls.create(
            twiml=build_twiml(audio_stream),
            to=payload.from_number,
            from_=settings.TWILIO_SIP_DOMAIN
        )

        return {"call_sid": call.sid, "status": "voice_delivered"}

    return await send_whatsapp_reply(payload, intent.text_response)

Real-time streaming
ElevenLabs audio streamed directly into Twilio TwiML — sub-2s latency voice delivery
Bidirectional loop
Whisper STT captures user's spoken reply, routes back to LLM for context-aware follow-up
Multilingual ready
ElevenLabs multilingual v2 + Gemini enable native voice responses in 29+ languages
05 — Contact
Let's build something remarkable
Whether you're looking to hire a backend engineer, collaborate on AI projects, discuss open-source, or just want to geek out about LLMs and distributed systems — my inbox is always open.