Open to Opportunities

Ritwika Kancharla
Software Engineer

Building high-scale distributed systems, real-time communication infrastructure, and AI-powered platforms. Specializing in backend engineering, GPU infrastructure, and knowledge systems.

Anthropic / SRE

AI Reliability & Infrastructure

Token Path Observability Stack

The Problem: Standard monitoring tools (Datadog, Prometheus) are built for request/response web apps. They don't understand that in LLM serving, a "slow" request might be fine if it's generating a long essay, but terrible if it's a short answer.

The Solution: I built a custom Prometheus exporter and Grafana dashboard specifically for vLLM and TGI servers. It tracks Time-to-First-Token (TTFT) to measure responsiveness and Inter-Token-Latency (ITL) to measure streaming smoothness.
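The heart of the exporter is the metric derivation itself. A minimal sketch (function and field names are illustrative, not the project's actual API): given the request start time and the arrival timestamps of streamed tokens, compute TTFT and mean ITL.

```python
from statistics import mean


def token_latency_metrics(request_start: float, token_times: list[float]) -> dict:
    """Derive the two streaming metrics the exporter reports.

    TTFT: responsiveness (request start -> first token).
    ITL:  streaming smoothness (mean gap between consecutive tokens).
    """
    if not token_times:
        raise ValueError("no tokens were generated")
    ttft = token_times[0] - request_start
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    itl = mean(gaps) if gaps else 0.0
    return {"ttft_seconds": ttft, "itl_seconds": itl}


# Example: request at t=0, first token at 0.25s, then one token every 40ms.
m = token_latency_metrics(0.0, [0.25, 0.29, 0.33, 0.37])
```

A long essay and a short answer can now be judged fairly: both should have a low TTFT, and a healthy ITL regardless of how many tokens follow.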

Why It Matters: This directly targets the "Token Path" requirement in Anthropic's JD. It proves I understand that LLM user experience is defined by token delivery rates, not just HTTP 200 OKs.

Prometheus Grafana Python Docker
Live Demo Source

GPU Chaos Framework

The Problem: Most SREs test web server failures. Very few test what happens when a GPU hangs, runs out of VRAM, or desyncs in a distributed training run.

The Solution: Built on Kubernetes Chaos Mesh, this framework introduces specific GPU faults: OOM kills, thermal throttling simulation, and network partitions between inference nodes.
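Chaos Mesh injects faults at the Kubernetes layer; the same principle can be shown in-process with a small Python sketch (names and the retry policy are illustrative): wrap an inference call so it fails with a simulated VRAM-exhaustion error at a configurable rate, then verify the client-side retry logic survives it.

```python
import random


class GpuOomError(RuntimeError):
    """Stand-in for a CUDA out-of-memory failure."""


def chaos(fn, oom_probability: float, rng: random.Random):
    """Wrap a call so it raises GpuOomError at the given injection rate."""
    def wrapped(*args, **kwargs):
        if rng.random() < oom_probability:
            raise GpuOomError("injected VRAM exhaustion")
        return fn(*args, **kwargs)
    return wrapped


def run_with_retry(fn, attempts: int = 3):
    """The behavior under test: clients should survive transient GPU faults."""
    for _ in range(attempts):
        try:
            return fn()
        except GpuOomError:
            continue
    raise GpuOomError(f"gave up after {attempts} attempts")


# Deterministic seed so the experiment is reproducible.
faulty = chaos(lambda: "ok", oom_probability=0.5, rng=random.Random(42))
```

The point of the framework is exactly this loop at cluster scale: inject the fault deliberately, observe whether the system's recovery path actually engages.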

Why It Matters: Demonstrates the "Chaos Engineering" requirement for Anthropic. It shows proactive reliability engineering—finding breaking points before users do.

Kubernetes Chaos Mesh Go
Live Demo Source

KV-Cache Aware Load Balancer

The Problem: Round-robin load balancing is inefficient for LLMs. A server processing a long 10k-token prompt is "available" but has no KV-Cache memory left for a new request, leading to queuing delays.

The Solution: A custom middleware that queries the inference server's internal metrics (specifically KV-Cache usage) and routes new requests to the node with the most available cache memory.
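The routing decision reduces to a few lines. A minimal sketch, assuming each node exposes its KV-cache utilization as a scraped gauge (vLLM publishes a cache-usage metric of this kind on its metrics endpoint; node names and numbers below are illustrative):

```python
def pick_node(cache_usage: dict[str, float]) -> str:
    """Route to the node with the most free KV-cache.

    cache_usage maps node name -> fraction of KV-cache in use (0.0-1.0),
    as scraped from each inference server's metrics endpoint.
    """
    if not cache_usage:
        raise RuntimeError("no healthy inference nodes")
    return min(cache_usage, key=cache_usage.get)


nodes = {"vllm-0": 0.92, "vllm-1": 0.35, "vllm-2": 0.88}
```

Round-robin would send one in three requests to the nearly-full `vllm-0`; cache-aware routing sends new work where the memory actually is.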

Why It Matters: Shows deep understanding of Transformer architecture constraints. This is "holistic system design"—optimizing for the physics of the model, not just the network.

Python FastAPI Redis vLLM
Live Demo Source

OpenAI / Real-Time

Networking & Media Systems

Browser-to-LLM Voice Pipeline (WebRTC)

The Problem: Voice interfaces to AI usually rely on slow HTTP round trips (Record -> Upload -> Process -> Download -> Play), so end-to-end latency is high (2-5 seconds).

The Solution: A full-stack WebRTC implementation. The browser captures audio -> streams via WebRTC to a signaling server -> server streams audio to LLM -> server streams response audio back to browser.
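The reason this beats the HTTP flow is full-duplex streaming: audio flows upstream and the response flows back concurrently, chunk by chunk. A minimal asyncio sketch of that shape (the real pipeline speaks WebRTC, not in-process queues, and `tts(...)` stands in for the model's audio response):

```python
import asyncio


async def pipeline(mic_chunks: list[str]) -> list[str]:
    """Full-duplex sketch: capture and response run concurrently,
    instead of record-everything -> upload -> wait -> download."""
    upstream: asyncio.Queue = asyncio.Queue()
    played: list[str] = []

    async def capture() -> None:            # browser -> server
        for chunk in mic_chunks:
            await upstream.put(chunk)
        await upstream.put(None)            # end-of-stream marker

    async def respond() -> None:            # server -> model -> browser
        while (chunk := await upstream.get()) is not None:
            played.append(f"tts({chunk})")  # respond per chunk, no full-file wait

    await asyncio.gather(capture(), respond())
    return played


out = asyncio.run(pipeline(["a", "b", "c"]))
```

In the real system the queues are replaced by WebRTC media tracks, and the signaling server's only job is to negotiate the peer connection (SDP offer/answer, ICE candidates) before media flows directly.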

Why It Matters: Directly maps to OpenAI's "Real-Time" team work on Advanced Voice Mode. Proves I can handle signaling, ICE candidates, and real-time duplex streaming.

WebRTC Node.js React Sockets
Live Demo

Custom Selective Forwarding Unit (Go)

The Problem: Video conferencing relies on SFUs. Most engineers just use Twilio/Agora. Understanding how an SFU actually works (forwarding media without decoding) is rare.

The Solution: Built a minimal SFU in Go using the Pion library. It supports Simulcast (receiving multiple quality layers of each stream and forwarding the appropriate one per viewer) and dynamic bitrate switching based on each user's network conditions.
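The core SFU decision can be sketched in a few lines (shown in Python rather than the project's Go for consistency with the other sketches; thresholds and peer names are illustrative): map each subscriber's bandwidth estimate to a simulcast layer, then forward each RTP packet, untouched, only to peers subscribed to that layer.

```python
def select_layer(bandwidth_kbps: float) -> int:
    """Map a bandwidth estimate (from RTCP feedback) to a simulcast layer:
    0 = low, 1 = mid, 2 = high quality."""
    if bandwidth_kbps >= 1500:
        return 2
    if bandwidth_kbps >= 500:
        return 1
    return 0


def forward(packet_layer: int, estimates: dict[str, float]) -> list[str]:
    """Forward a packet of the given layer to every peer whose selected
    layer matches. The media is never decoded, only routed."""
    return [peer for peer, bw in estimates.items()
            if select_layer(bw) == packet_layer]


peers = {"alice": 2000.0, "bob": 600.0, "carol": 120.0}
```

That "route without decoding" property is what makes an SFU cheap compared to an MCU, and why bandwidth estimation, not transcoding, is the hard part.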

Why It Matters: Proves "Senior Level" RTC engineering skills. Writing an SFU from scratch requires understanding RTP, RTCP, and bandwidth estimation algorithms.

Go Pion Docker
Live Demo

Jitter Buffer Simulator

The Problem: In real-time audio, packets arrive out of order or at irregular intervals. If you play them immediately, audio glitches. If you wait too long, latency spikes.

The Solution: A simulation tool that artificially introduces network jitter. It visualizes how a Jitter Buffer (a queue) smooths out playback by balancing "target delay" vs. "packet loss".

Why It Matters: Demonstrates understanding of "Lip Sync" and encoding/decoding constraints mentioned in the JD. It’s a debugging tool for media quality.

Python FastAPI React
Live Demo

OpenAI / Knowledge

Applied AI & Data

Graph-Enhanced RAG Engine

The Problem: Standard RAG (Vector Search) retrieves text chunks but misses relationships. It can't answer "Which team owns the service that depends on Database X?"

The Solution: A backend that uses LLMs to extract entities and relationships from documents, storing them in a Postgres Knowledge Graph. Uses Recursive CTEs for multi-hop traversal queries.
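The multi-hop query at the heart of the engine looks like this. A minimal sketch using SQLite, whose recursive CTE syntax matches Postgres for this query (the graph triples are illustrative):

```python
import sqlite3

# Toy knowledge graph: (src) -[rel]-> (dst) triples extracted by the LLM.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE edges (src TEXT, rel TEXT, dst TEXT);
    INSERT INTO edges VALUES
        ('checkout-service', 'depends_on', 'database-x'),
        ('api-gateway',      'depends_on', 'checkout-service'),
        ('team-payments',    'owns',       'checkout-service');
""")

# Multi-hop question: which team owns a service that (transitively)
# depends on database-x?
rows = db.execute("""
    WITH RECURSIVE dependents(node) AS (
        SELECT src FROM edges WHERE rel = 'depends_on' AND dst = 'database-x'
        UNION
        SELECT e.src FROM edges e
        JOIN dependents d ON e.rel = 'depends_on' AND e.dst = d.node
    )
    SELECT src FROM edges
    WHERE rel = 'owns' AND dst IN (SELECT node FROM dependents)
""").fetchall()
```

Vector search alone cannot answer this, because no single chunk of text contains both the ownership fact and the dependency fact; the graph traversal joins them.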

Why It Matters: Hits the "Structured Knowledge Representation" requirement for Knowledge Innovation. It moves AI from "reading text" to "understanding structure".

Python Postgres LLM FastAPI
Live Demo

Agentic Operations Platform

The Problem: Internal operations (support, integrity) are manual and slow. Standard automation scripts are brittle.

The Solution: An agentic framework where specialized agents (Classifier, ToolRunner, Escalator) handle tickets. It uses a state machine in Python to ensure reliability and logs every "thought" to Postgres.

Why It Matters: Demonstrates "Automated Agent Systems" capability. It shows I can build the primitives needed to let AI safely operate on production systems.

Python Celery LangChain
Live Demo

OpenAI / Storage

Systems Engineering

Custom LSM-Tree Storage Engine

The Problem: Engineers often treat databases as black-box magic, which leads to poor schema design and indexing choices.

The Solution: A C++ implementation of a Log-Structured Merge-Tree (the engine behind RocksDB/Cassandra). Implements MemTables, Write-Ahead Logs (WAL), and SSTable compaction.

Why It Matters: Demonstrates "Systems Programming" and "Storage" expertise. It proves I can build the database, not just query it.

C++ Systems IO
Source Code

Distributed Write-Ahead Log

The Problem: Critical data (like financial transactions) cannot be lost, even if a data center goes down.

The Solution: A distributed log service in Go. It implements leader election and replicates logs to follower nodes before acknowledging the client. It handles crash recovery by replaying logs.
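The durability guarantee hinges on one rule: acknowledge a write only after a majority of the cluster holds it. A sketch of that rule (shown in Python rather than the project's Go; leader election and crash-recovery replay are omitted here):

```python
class Follower:
    def __init__(self):
        self.log: list[str] = []
        self.alive = True

    def replicate(self, entry: str) -> bool:
        if not self.alive:
            return False
        self.log.append(entry)
        return True


class Leader:
    """Append locally, replicate to followers, and ack only once a majority
    of the cluster (leader included) holds the entry, so losing any single
    node cannot lose acknowledged data."""

    def __init__(self, followers: list[Follower]):
        self.log: list[str] = []
        self.followers = followers

    def append(self, entry: str) -> bool:
        self.log.append(entry)
        acks = 1 + sum(f.replicate(entry) for f in self.followers)
        majority = (1 + len(self.followers)) // 2 + 1
        return acks >= majority
```

With three nodes the system tolerates one failure: two acks still form a majority. Lose two nodes and writes correctly stop being acknowledged rather than silently risking loss.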

Why It Matters: Directly targets "Reliability" and "Distributed Systems" requirements. It shows understanding of Consensus and Durability.

Go gRPC Raft
Source Code