Available for opportunities

Ankush
Rathour

I architect production-grade software systems — from blazing-fast APIs and distributed cloud infra to multimodal AI agents. Turning ambitious ideas into deployed realities.

View Projects Get in Touch GitHub ↗

● Shipping daily

⚡

Years Experience

📦

PyPI Packages

☁️

AWS·GCP·Azure

Cloud Platforms

🛠️

10+

Open Source Projects

scroll

01 — About

Engineering at the intersection of scale & intelligence

I'm Ankush Rathour, a Software Engineer specializing in architecting scalable full-stack solutions and seamless third-party integrations. I bridge the gap between complex backend logic and intuitive frontend experiences. My expertise lies in building AI-driven workflows, real-time communication systems, and enterprise CRM connectors

I specialize in Python ecosystems — crafting everything from blazing-fast FastAPI microservices to Django monoliths handling millions of requests. On the cloud side, I've architected solutions across AWS, GCP, and Azure. My open-source work includes the AudioMaker PyPI package, enabling programmatic audio generation at scale.

Currently obsessed with the convergence of voice AI and messaging platforms — building systems where LLMs don't just generate text, but orchestrate real-time telephony, voice synthesis, and multimodal reasoning pipelines.

Backend Engineering

Python · Django · FastAPI · REST APIs · GraphQL · Celery · Redis

Cloud & DevOps

AWS · GCP · Azure · Docker · Kubernetes · CI/CD · Ngnix

AI & Machine Learning

OpenAI · Gemini · LangChain · ElevenLabs · RAG · LLM Pipelines

Data & Databases

PostgreSQL · MongoDB · Redis · Elasticsearch · Pandas · NumPy

Experience

2024–Present

Senior Software Engineer

AI-focused product company

Building multimodal AI agents & production ML systems

2021–2024

Software Engineer

SaaS Platform

Scaled Django/FastAPI backend to 10M+ requests/day on AWS

2020–2021

Python Intern

Startup Ecosystem

Designed microservices architecture, cloud infra & data pipelines

02 — Tech Stack

Tools of the trade

A curated selection of technologies I use to build reliable, scalable systems.

⌨️

Languages

PythonTypeScriptJavaScriptSQLBashGo (basics)HTML/CSS

🧩

Frameworks & Libraries

DjangoFastAPIFlaskNext.jsCelerySQLAlchemyPydanticPandasNumPy

☁️

Cloud & DevOps

AWS (Lambda, EC2, S3, RDS, EKS)GCP (Cloud Run, BigQuery)AzureDockerKubernetesTerraformGitHub ActionsCI/CD

🤖

AI / ML

OpenAI APIGoogle GeminiLangChainElevenLabsTwilioHugging FaceRAG PipelinesLLM Fine-tuningLlamaIndex

🗄️

Databases & Storage

PostgreSQLMongoDBRedisElasticsearchMySQLChromaDBPinecone

🛠️

Tools & Practices

GitREST API DesignGraphQLMicroservicesSystem DesignTDDAgile/ScrumWebSockets

Core Proficiencies

Python / Django / FastAPI96%

Cloud Architecture (AWS/GCP/Azure)88%

AI / LLM Engineering85%

System Design & Microservices90%

03 — Projects

Things I've built

Open-source tools, AI experiments, and production systems.

Featured

🎙️

AudioMaker

Text-to-Audio Python Library on PyPI

A production-ready PyPI package that simplifies programmatic audio creation — supporting multiple TTS engines, batch processing, and audio manipulation pipelines. Built for developers who need reliable voice synthesis without the boilerplate.

✓ 3+ packages published

PythonPyPIElevenLabsgTTSAudio ProcessingOpen Source

GitHub PyPI

🗺️

GoogleMapsScraper

Async Data Extraction Engine

High-performance Google Maps data extraction tool built with Python. Supports async scraping, proxy rotation, rate limiting, and exports structured business data (name, address, phone, ratings, reviews) to CSV/JSON/Excel.

✓ Handles 10K+ entries/run

PythonAsyncPlaywrightData EngineeringSeleniumProxy Rotation

GitHub

📄

ChatPDF

RAG-powered Document Chat Interface

AI-powered PDF conversation tool using Retrieval-Augmented Generation. Users upload any PDF, and the system chunked, embeds, and stores documents in a vector database — enabling context-aware Q&A over entire documents using OpenAI / Gemini LLMs.

✓ Semantic search over any PDF

PythonFastAPIOpenAILangChainChromaDBRAGEmbeddings

GitHub

View all on GitHub →

04 — Technical Showcase

Unified Multimodal AI Agent

Engineering the bridge between conversational text AI and real-time voice telephony. A complete system that transitions a WhatsApp chat into a live AI voice call — using Twilio SIP Domains, ElevenLabs, and LLM orchestration.

System Architecture Flow

💬

WhatsApp Message Arrives

User sends a message via WhatsApp Business API. The webhook fires to our FastAPI ingestion service, parsing intent, language, and context in real-time.

WhatsApp Business APIFastAPIWebhook Handler

↓

🧠

LLM Reasoning Layer

The message payload hits the LLM orchestration layer (OpenAI GPT-4o / Google Gemini 1.5). The model decides: generate a text reply, or trigger the voice pipeline? Intent classification happens here.

OpenAI GPT-4oGoogle Gemini 1.5LangChainIntent Routing

↓

🎙️

Voice Synthesis via ElevenLabs

When the voice path is chosen, the LLM-generated text response is passed to ElevenLabs streaming API. A cloned or multilingual voice renders the response as a high-fidelity WAV/MP3 audio stream.

ElevenLabs Streaming APIVoice CloningText-to-SpeechSSML

↓

📞

Twilio SIP Domain Routing

A Twilio SIP Domain is configured as the PSTN bridge. The audio stream is routed via TwiML — the call is initiated to the user's number with the ElevenLabs-generated voice. Real-time bidirectional audio over WebRTC/SIP.

Twilio SIP DomainsTwiMLWebRTCPSTN Bridge

↓

📲

User Receives Voice Call

The agent delivers the AI-generated voice response as a real phone call. The user can interact back (speech-to-text via Whisper), creating a full conversational loop — from text message to live phone AI agent.

OpenAI Whisper STTReal-time ASRConversation MemoryContext Window

↓

orchestrator.pyPython

1"color:var(--ink-faint);opacity:0.7"># Simplified orchestration flow
2async def handle_whatsapp_message(payload: WebhookPayload):
3    "color:var(--ink-faint);opacity:0.7"># 1. Parse intent with LLM
4    intent = await llm_router.classify(payload.body)
5    
6    if intent.requires_voice:
7        "color:var(--ink-faint);opacity:0.7"># 2. Generate response text
8        response = await openai_client.chat.completions.create(
9            model="gpt-4o",
10            messages=build_conversation(payload)
11        )
12        
13        "color:var(--ink-faint);opacity:0.7"># 3. Synthesize voice via ElevenLabs
14        audio_stream = await elevenlabs.generate(
15            text=response.choices[0].message.content,
16            voice="ankush-custom-voice",
17            stream=True
18        )
19        
20        "color:var(--ink-faint);opacity:0.7"># 4. Initiate Twilio SIP call
21        call = twilio_client.calls.create(
22            twiml=build_twiml(audio_stream),
23            to=payload.from_number,
24            from_=settings.TWILIO_SIP_DOMAIN
25        )
26        
27        return {"call_sid": call.sid, "status": "voice_delivered"}
28    
29    return await send_whatsapp_reply(payload, intent.text_response)

⚡

Real-time streaming

ElevenLabs audio streamed directly into Twilio TwiML — sub-2s latency voice delivery

🔁

Bidirectional loop

Whisper STT captures user's spoken reply, routes back to LLM for context-aware follow-up

🌐

Multilingual ready

ElevenLabs multilingual v2 + Gemini enable native voice responses in 29+ languages

05 — Contact

Let's build something remarkable

Whether you're looking to hire a backend engineer, collaborate on AI projects, discuss open-source, or just want to geek out about LLMs and distributed systems — my inbox is always open.

Send me an email

Based in India · Open to remote, hybrid & relocation

[email protected]

linkedin.com/in/ankush-rathour

GitHub

github.com/AnkushRathour

Linktree

linktr.ee/ankushrathour