Building high-scale distributed systems, real-time communication infrastructure, and AI-powered platforms. Specializing in backend engineering, GPU infrastructure, and knowledge systems.
Anthropic / SRE
The Problem: Standard monitoring tools (Datadog, Prometheus) are built for request/response web apps. They don't understand that in LLM serving, a "slow" request might be fine if it's generating a long essay, but terrible if it's a short answer.
The Solution: I built a custom Prometheus exporter and Grafana dashboard specifically for vLLM and TGI servers. It tracks Time-to-First-Token (TTFT) to measure responsiveness and Inter-Token-Latency (ITL) to measure streaming smoothness.
Why It Matters: This directly targets the "Token Path" requirement in Anthropic's JD. It proves I understand that LLM user experience is defined by token delivery rates, not just HTTP 200 OKs.
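The core of the exporter is simple: timestamp every streamed token on the proxy side, then derive TTFT and ITL from the gaps. A minimal sketch of that computation (function and field names are illustrative, not vLLM's own metric names):

```python
def token_latency_metrics(request_start: float, token_times: list[float]) -> dict:
    """Compute Time-to-First-Token and mean Inter-Token-Latency (seconds)
    from the request start time and the arrival time of each streamed token."""
    if not token_times:
        return {"ttft": None, "mean_itl": None}
    ttft = token_times[0] - request_start
    # ITL is the gap between consecutive tokens; it measures streaming smoothness.
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    mean_itl = sum(gaps) / len(gaps) if gaps else 0.0
    return {"ttft": ttft, "mean_itl": mean_itl}
```

In the real exporter these values feed Prometheus histograms, so Grafana can show p50/p99 TTFT separately from throughput.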
The Problem: Most SREs test web server failures. Very few test what happens when a GPU hangs, runs out of VRAM, or desyncs in a distributed training run.
The Solution: Built on Kubernetes Chaos Mesh, this framework introduces specific GPU faults: OOM kills, thermal throttling simulation, and network partitions between inference nodes.
Why It Matters: Demonstrates the "Chaos Engineering" requirement for Anthropic. It shows proactive reliability engineering—finding breaking points before users do.
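The production framework runs as Chaos Mesh experiments, but the underlying idea can be sketched in a few lines: wrap the inference call in a harness that injects faults at a configured rate (fault names like "oom" are illustrative):

```python
import random

class FaultInjector:
    """Toy fault-injection harness: wraps a call and raises a configured
    fault with a given probability, so recovery paths can be exercised."""
    def __init__(self, faults: dict[str, float], seed: int = 0):
        self.faults = faults            # fault name -> injection probability
        self.rng = random.Random(seed)  # seeded for reproducible experiments

    def call(self, fn, *args):
        for name, prob in self.faults.items():
            if self.rng.random() < prob:
                raise RuntimeError(f"injected fault: {name}")
        return fn(*args)
```

The Chaos Mesh version does the same thing at the infrastructure layer, killing pods and partitioning the network instead of raising exceptions.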
The Problem: Round-robin load balancing is inefficient for LLMs. A server processing a long 10k-token prompt is "available" but has no KV-Cache memory left for a new request, leading to queuing delays.
The Solution: A custom middleware that queries the inference server's internal metrics (specifically KV-Cache usage) and routes new requests to the node with the most available cache memory.
Why It Matters: Shows deep understanding of Transformer architecture constraints. This is "holistic system design"—optimizing for the physics of the model, not just the network.
OpenAI / Real-Time
The Problem: Voice interfaces to AI usually rely on sequential HTTP round-trips (Record -> Upload -> Process -> Download -> Play), adding 2-5 seconds of latency.
The Solution: A full-stack WebRTC implementation. The browser captures audio -> streams via WebRTC to a signaling server -> server streams audio to LLM -> server streams response audio back to browser.
Why It Matters: Directly maps to OpenAI's "Real-Time" team work on Advanced Voice Mode. Proves I can handle signaling, ICE candidates, and real-time duplex streaming.
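Before any media flows, the two ends must exchange SDP offers/answers and ICE candidates through a signaling channel. A minimal in-memory sketch of that relay (in production this runs over WebSockets; the message fields are illustrative):

```python
from collections import defaultdict

class SignalingRoom:
    """Toy signaling relay: each peer has an inbox, and peers exchange
    SDP and ICE messages by name. No transport, auth, or rooms-of-rooms."""
    def __init__(self):
        self.inboxes = defaultdict(list)

    def send(self, sender: str, recipient: str, kind: str, payload: str):
        # kind is e.g. "offer", "answer", or "ice-candidate"
        self.inboxes[recipient].append(
            {"from": sender, "kind": kind, "payload": payload})

    def poll(self, peer: str) -> list[dict]:
        msgs, self.inboxes[peer] = self.inboxes[peer], []
        return msgs
```

Once the answer and candidates are delivered, the actual audio flows peer-to-peer (or peer-to-server) over SRTP, never through this relay.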
The Problem: Video conferencing relies on SFUs. Most engineers just use Twilio/Agora. Understanding how an SFU actually works (forwarding media without decoding) is rare.
The Solution: Built a minimal SFU in Go using the Pion library. It supports Simulcast (ingesting multiple quality layers per publisher and forwarding the appropriate one to each viewer) and dynamic bitrate switching based on each user's network conditions.
Why It Matters: Proves "Senior Level" RTC engineering skills. Writing an SFU from scratch requires understanding RTP, RTCP, and bandwidth estimation algorithms.
The Problem: In real-time audio, packets arrive out of order or at irregular intervals. If you play them immediately, audio glitches. If you wait too long, latency spikes.
The Solution: A simulation tool that artificially introduces network jitter. It visualizes how a Jitter Buffer (a queue) smooths out playback by balancing "target delay" vs. "packet loss".
Why It Matters: Demonstrates understanding of "Lip Sync" and encoding/decoding constraints mentioned in the JD. It’s a debugging tool for media quality.
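The central trade-off the tool visualizes can be captured in a few lines: a larger target delay absorbs more jitter but adds latency; a smaller one plays sooner but counts late packets as lost. A simplified model (fixed playout schedule, no adaptive resizing):

```python
def simulate_jitter_buffer(arrivals: list[float], target_delay: float,
                           frame_interval: float = 0.02) -> tuple[float, float]:
    """Packet i is scheduled to play at i * frame_interval + target_delay.
    A packet arriving after its playout deadline counts as lost (too late).
    Returns (loss_rate, added_delay); all times in seconds."""
    late = 0
    for i, arrival in enumerate(arrivals):
        playout = i * frame_interval + target_delay
        if arrival > playout:
            late += 1
    return late / len(arrivals), target_delay
```

Sweeping target_delay over a jittery arrival trace produces the classic loss-vs-latency curve the tool plots.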
OpenAI / Knowledge
The Problem: Standard RAG (Vector Search) retrieves text chunks but misses relationships. It can't answer "Which team owns the service that depends on Database X?"
The Solution: A backend that uses LLMs to extract entities and relationships from documents, storing them in a Postgres Knowledge Graph. Uses Recursive CTEs for multi-hop traversal queries.
Why It Matters: Hits the "Structured Knowledge Representation" requirement for Knowledge Innovation. It moves AI from "reading text" to "understanding structure".
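The multi-hop query is the interesting part. A sketch of the traversal using SQLite as a stand-in for Postgres (both support recursive CTEs); the schema and data are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, rel TEXT, dst TEXT)")
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    ("billing-api", "depends_on", "payments-svc"),
    ("payments-svc", "depends_on", "db-x"),
    ("team-atlas", "owns", "billing-api"),
])

def transitive_deps(service: str) -> set[str]:
    """Walk depends_on edges to any depth with a recursive CTE."""
    rows = conn.execute("""
        WITH RECURSIVE deps(node) AS (
            SELECT dst FROM edges WHERE src = ? AND rel = 'depends_on'
            UNION
            SELECT e.dst FROM edges e JOIN deps d ON e.src = d.node
             WHERE e.rel = 'depends_on'
        )
        SELECT node FROM deps
    """, (service,))
    return {r[0] for r in rows}
```

Answering "which team owns a service that depends on Database X?" is then just this traversal joined against the owns edges, which vector search alone cannot do.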
The Problem: Internal operations (support, integrity) are manual and slow. Standard automation scripts are brittle.
The Solution: An agentic framework where specialized agents (Classifier, ToolRunner, Escalator) handle tickets. It uses a state machine in Python to ensure reliability and logs every "thought" to Postgres.
Why It Matters: Demonstrates "Automated Agent Systems" capability. It shows I can build the primitives needed to let AI safely operate on production systems.
OpenAI / Storage
The Problem: Engineers often treat databases as black boxes, which leads to poor schema design and indexing choices.
The Solution: A C++ implementation of a Log-Structured Merge-Tree (the engine behind RocksDB/Cassandra). Implements MemTables, Write-Ahead Logs (WAL), and SSTable compaction.
Why It Matters: Demonstrates "Systems Programming" and "Storage" expertise. Proving I can build the database, not just query it.
The Problem: Critical data (like financial transactions) cannot be lost, even if a data center goes down.
The Solution: A distributed log service in Go. It implements leader election and replicates logs to follower nodes before acknowledging the client. It handles crash recovery by replaying logs.
Why It Matters: Directly targets "Reliability" and "Distributed Systems" requirements. It shows understanding of Consensus and Durability.
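The durability rule at the heart of the service: the leader appends locally, replicates to followers, and acknowledges the client only once a majority of the cluster holds the entry. A sketch of that rule alone (the real service in Go adds leader election, terms, and recovery):

```python
class ReplicatedLog:
    """Leader-side majority-acknowledgement sketch. Network partitions are
    simulated by passing the set of reachable follower indices."""
    def __init__(self, follower_count: int):
        self.log = []
        self.followers = [[] for _ in range(follower_count)]

    def append(self, entry, reachable: set[int]) -> bool:
        self.log.append(entry)   # leader's local copy
        acks = 1                 # the leader counts toward the majority
        for i, follower in enumerate(self.followers):
            if i in reachable:
                follower.append(entry)
                acks += 1
        majority = (1 + len(self.followers)) // 2 + 1
        return acks >= majority  # True = safe to acknowledge the client
```

An entry that fails to reach a majority stays in the leader's log but is never acknowledged, which is exactly the window crash recovery has to reconcile by replaying logs.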