Open to Senior SWE / AI Engineer roles

Building scalable systems where software meets AI.

I'm Harsh Aathavale, a Senior Software Engineer with 2+ years shipping distributed systems, microservices, and production ML platforms across AWS, Azure, and GCP. Full-stack by day, ML infra by night.

Mumbai, India · aathavaleharsh@gmail.com · +91 93723 11512

Download Resume · GitHub · LinkedIn

50K+

Daily inferences served

4x

GPU throughput gain

8.69

BTech CGPA

Intro Video - 45s

Where I have worked

Experience

Senior AI Engineer

Fulcrum Digital

Jun 2025 - Present

Architected production ML serving infrastructure on Azure Kubernetes Service (AKS), processing 50K+ inference requests daily with sub-200ms latency.
Built full-stack GenAI apps with React, FastAPI, LangGraph agentic workflows, and RAG pipelines, reducing manual document processing by 70%.
Designed event-driven microservices with Kafka and RabbitMQ for real-time data ingestion across distributed systems.
Implemented observability with Grafana and Prometheus; optimized pipelines with Numba/CUDA for a 4x throughput improvement.

Software Engineer

Pertsol

Nov 2024 - Jun 2025

Engineered a real-time target proximity prediction system exposed via REST APIs, reducing location lookup latency by 60%.
Built a speaker diarization pipeline and optimized ingestion scripts for audio processing at scale.

Software Engineer Intern

Dyma.ai

Jul - Nov 2024

Built a production agentic RAG system for document retrieval, improving retrieval accuracy by 35% through systematic benchmarking.

Research Intern

Ernst & Young (EY)

Jan - Jul 2024

Developed ML-based document retrieval and fraud detection systems, reducing the false-positive rate by 40%.

Technical Stack

Languages

Python · Java · JavaScript · TypeScript · SQL

Backend

FastAPI · Spring Boot · Node.js · gRPC · REST APIs · Nginx

Frontend

React · Next.js · HTML/CSS

Cloud & DevOps

AWS · Azure · GCP · AKS · Docker · Kubernetes · Terraform · GitHub Actions · CI/CD

Data & Messaging

PostgreSQL · MongoDB · Redis · Kafka · RabbitMQ

AI / ML

PyTorch · Hugging Face · LangGraph · RAG · CUDA · Numba

Observability

Grafana · Prometheus · ELK Stack

Tools

Git · Linux · Docker Compose · Jira · Agile/Scrum

Selected work

Featured Work

Systems and platforms I have architected, from ML infrastructure and RAG pipelines to event-driven microservices.

Distributed ML Training & Inference Platform

End-to-end ML training pipeline with LoRA fine-tuning, 8-bit quantization, and Kubernetes orchestration for GPU scaling. Designed for cost-efficient model iteration.

Python · PyTorch · Kubernetes · Docker · LoRA · Quantization

Real-Time Issue Processing System

Event-driven full-stack system with real-time WebSocket streaming and agentic workflows, deployed with Docker Compose behind Nginx and built for high availability.

React · TypeScript · FastAPI · WebSockets · Docker Compose · Nginx

Production ML Serving Infrastructure

Kubernetes serving stack on AKS processing 50K+ inference requests daily with sub-200ms latency. Instrumented with Grafana and Prometheus for observability.

Kubernetes · AKS · FastAPI · Grafana · Prometheus

Agentic RAG Document Retrieval

Production RAG system for enterprise document retrieval, improving retrieval accuracy by 35% through systematic benchmarking of chunking and embedding strategies.

LangGraph · RAG · Python · Vector DB

Event-Driven Microservices

Real-time data ingestion pipeline across distributed systems using Kafka and RabbitMQ. Handles high-throughput streams with backpressure and dead-letter queues (DLQs).

Kafka · RabbitMQ · FastAPI · Node.js

GPU-Accelerated Pipelines

Optimized data pipelines using Numba and CUDA, achieving a 4x throughput improvement over the CPU baseline for numerical and inference workloads.

CUDA · Numba · Python · PyTorch

Writing

From the Journal

Notes on shipping ML platforms, distributed systems, and the craft of engineering at scale.

May 14, 2025 · 8 min read

Serving 50K Inference Requests a Day on Kubernetes

Lessons from architecting production ML serving on AKS and hitting sub-200ms latency without burning through GPU budget.

MLOps · Kubernetes


Apr 2, 2025 · 7 min read

Building Agentic RAG Systems That Actually Work

Six months of production RAG taught me that retrieval quality, not the LLM, dominates results. Here is how we improved accuracy by 35%.

RAG · LangGraph


Feb 18, 2025 · 6 min read

Kafka vs. RabbitMQ in Real Event-Driven Systems

Both are great. Neither is universal. Here is how I decide when designing event-driven microservices.

Distributed Systems · Kafka


PDF - Updated 2025

Grab the full CV.

Complete work history, technical skills matrix, education, and projects, designed to be scanned quickly.

Download Full CV (PDF) · Get in touch

Backend APIs

Cloud Platforms

ML Observability