Available for opportunities

Ankush
Rathour

|

I architect production-grade software systems — from blazing-fast APIs and distributed cloud infra to multimodal AI agents. Turning ambitious ideas into deployed realities.

Ankush Rathour — Software Engineer
Shipping daily
5+
Years Experience
📦
3+
PyPI Packages
☁️
AWS·GCP·Azure
Cloud Platforms
🛠️
10+
Open Source Projects

01 — About

Engineering at the intersection of scale & intelligence

I'm Ankush Rathour, a Software Engineer specializing in architecting scalable full-stack solutions and seamless third-party integrations. I bridge the gap between complex backend logic and intuitive frontend experiences. My expertise lies in building AI-driven workflows, real-time communication systems, and enterprise CRM connectors

I specialize in Python ecosystems — crafting everything from blazing-fast FastAPI microservices to Django monoliths handling millions of requests. On the cloud side, I've architected solutions across AWS, GCP, and Azure. My open-source work includes the AudioMaker PyPI package, enabling programmatic audio generation at scale.

Currently obsessed with the convergence of voice AI and messaging platforms — building systems where LLMs don't just generate text, but orchestrate real-time telephony, voice synthesis, and multimodal reasoning pipelines.

Backend Engineering

Python · Django · FastAPI · REST APIs · GraphQL · Celery · Redis

Cloud & DevOps

AWS · GCP · Azure · Docker · Kubernetes · CI/CD · Ngnix

AI & Machine Learning

OpenAI · Gemini · LangChain · ElevenLabs · RAG · LLM Pipelines

Data & Databases

PostgreSQL · MongoDB · Redis · Elasticsearch · Pandas · NumPy

Experience

2024–Present

Senior Software Engineer

AI-focused product company

Building multimodal AI agents & production ML systems

2021–2024

Software Engineer

SaaS Platform

Scaled Django/FastAPI backend to 10M+ requests/day on AWS

2020–2021

Python Intern

Startup Ecosystem

Designed microservices architecture, cloud infra & data pipelines

02 — Tech Stack

Tools of the trade

A curated selection of technologies I use to build reliable, scalable systems.

⌨️

Languages

PythonTypeScriptJavaScriptSQLBashGo (basics)HTML/CSS
🧩

Frameworks & Libraries

DjangoFastAPIFlaskNext.jsCelerySQLAlchemyPydanticPandasNumPy
☁️

Cloud & DevOps

AWS (Lambda, EC2, S3, RDS, EKS)GCP (Cloud Run, BigQuery)AzureDockerKubernetesTerraformGitHub ActionsCI/CD
🤖

AI / ML

OpenAI APIGoogle GeminiLangChainElevenLabsTwilioHugging FaceRAG PipelinesLLM Fine-tuningLlamaIndex
🗄️

Databases & Storage

PostgreSQLMongoDBRedisElasticsearchMySQLChromaDBPinecone
🛠️

Tools & Practices

GitREST API DesignGraphQLMicroservicesSystem DesignTDDAgile/ScrumWebSockets

Core Proficiencies

Python / Django / FastAPI96%
Cloud Architecture (AWS/GCP/Azure)88%
AI / LLM Engineering85%
System Design & Microservices90%

03 — Projects

Things I've built

Open-source tools, AI experiments, and production systems.

Featured
🎙️

AudioMaker

Text-to-Audio Python Library on PyPI

A production-ready PyPI package that simplifies programmatic audio creation — supporting multiple TTS engines, batch processing, and audio manipulation pipelines. Built for developers who need reliable voice synthesis without the boilerplate.

3+ packages published
PythonPyPIElevenLabsgTTSAudio ProcessingOpen Source
🗺️

GoogleMapsScraper

Async Data Extraction Engine

High-performance Google Maps data extraction tool built with Python. Supports async scraping, proxy rotation, rate limiting, and exports structured business data (name, address, phone, ratings, reviews) to CSV/JSON/Excel.

Handles 10K+ entries/run
PythonAsyncPlaywrightData EngineeringSeleniumProxy Rotation
📄

ChatPDF

RAG-powered Document Chat Interface

AI-powered PDF conversation tool using Retrieval-Augmented Generation. Users upload any PDF, and the system chunked, embeds, and stores documents in a vector database — enabling context-aware Q&A over entire documents using OpenAI / Gemini LLMs.

Semantic search over any PDF
PythonFastAPIOpenAILangChainChromaDBRAGEmbeddings

04 — Technical Showcase

Unified Multimodal AI Agent

Engineering the bridge between conversational text AI and real-time voice telephony. A complete system that transitions a WhatsApp chat into a live AI voice call — using Twilio SIP Domains, ElevenLabs, and LLM orchestration.

System Architecture Flow

💬
01

WhatsApp Message Arrives

User sends a message via WhatsApp Business API. The webhook fires to our FastAPI ingestion service, parsing intent, language, and context in real-time.

WhatsApp Business APIFastAPIWebhook Handler
🧠
02

LLM Reasoning Layer

The message payload hits the LLM orchestration layer (OpenAI GPT-4o / Google Gemini 1.5). The model decides: generate a text reply, or trigger the voice pipeline? Intent classification happens here.

OpenAI GPT-4oGoogle Gemini 1.5LangChainIntent Routing
🎙️
03

Voice Synthesis via ElevenLabs

When the voice path is chosen, the LLM-generated text response is passed to ElevenLabs streaming API. A cloned or multilingual voice renders the response as a high-fidelity WAV/MP3 audio stream.

ElevenLabs Streaming APIVoice CloningText-to-SpeechSSML
📞
04

Twilio SIP Domain Routing

A Twilio SIP Domain is configured as the PSTN bridge. The audio stream is routed via TwiML — the call is initiated to the user's number with the ElevenLabs-generated voice. Real-time bidirectional audio over WebRTC/SIP.

Twilio SIP DomainsTwiMLWebRTCPSTN Bridge
📲
05

User Receives Voice Call

The agent delivers the AI-generated voice response as a real phone call. The user can interact back (speech-to-text via Whisper), creating a full conversational loop — from text message to live phone AI agent.

OpenAI Whisper STTReal-time ASRConversation MemoryContext Window
orchestrator.pyPython
1"color:var(--ink-faint);opacity:0.7"># Simplified orchestration flow
2async def handle_whatsapp_message(payload: WebhookPayload):
3 "color:var(--ink-faint);opacity:0.7"># 1. Parse intent with LLM
4 intent = await llm_router.classify(payload.body)
5
6 if intent.requires_voice:
7 "color:var(--ink-faint);opacity:0.7"># 2. Generate response text
8 response = await openai_client.chat.completions.create(
9 model="gpt-4o",
10 messages=build_conversation(payload)
11 )
12
13 "color:var(--ink-faint);opacity:0.7"># 3. Synthesize voice via ElevenLabs
14 audio_stream = await elevenlabs.generate(
15 text=response.choices[0].message.content,
16 voice="ankush-custom-voice",
17 stream=True
18 )
19
20 "color:var(--ink-faint);opacity:0.7"># 4. Initiate Twilio SIP call
21 call = twilio_client.calls.create(
22 twiml=build_twiml(audio_stream),
23 to=payload.from_number,
24 from_=settings.TWILIO_SIP_DOMAIN
25 )
26
27 return {"call_sid": call.sid, "status": "voice_delivered"}
28
29 return await send_whatsapp_reply(payload, intent.text_response)

Real-time streaming

ElevenLabs audio streamed directly into Twilio TwiML — sub-2s latency voice delivery

🔁

Bidirectional loop

Whisper STT captures user's spoken reply, routes back to LLM for context-aware follow-up

🌐

Multilingual ready

ElevenLabs multilingual v2 + Gemini enable native voice responses in 29+ languages

05 — Contact

Let's build something remarkable

Whether you're looking to hire a backend engineer, collaborate on AI projects, discuss open-source, or just want to geek out about LLMs and distributed systems — my inbox is always open.

Based in India · Open to remote, hybrid & relocation