Building high-scale distributed systems, real-time communication infrastructure, and AI-powered platforms. Specializing in backend engineering, GPU infrastructure, and knowledge systems.
Anthropic / SRE
The Problem: Standard monitoring tools (Datadog, Prometheus) are built for request/response web apps. They don't understand that in LLM serving, a "slow" request might be fine if it's generating a long essay, but terrible if it's a short answer.
The Solution: I built a custom Prometheus exporter and Grafana dashboard specifically for vLLM and TGI servers. It tracks Time-to-First-Token (TTFT) to measure responsiveness and Inter-Token-Latency (ITL) to measure streaming smoothness.
Why It Matters: This directly targets the "Token Path" requirement in Anthropic's JD. It proves I understand that LLM user experience is defined by token delivery rates, not just HTTP 200 OKs.
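The core of the exporter is simple: timestamp every streamed token on the proxy side, then derive TTFT and ITL from the gaps. A minimal sketch of that computation (function and field names are illustrative, not vLLM's own metric names):

```python
def token_latency_metrics(request_start: float, token_times: list[float]) -> dict:
    """Compute Time-to-First-Token and mean Inter-Token-Latency (seconds)
    from the request start time and the arrival time of each streamed token."""
    if not token_times:
        return {"ttft": None, "mean_itl": None}
    ttft = token_times[0] - request_start
    # ITL is the gap between consecutive tokens; it measures streaming smoothness.
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    mean_itl = sum(gaps) / len(gaps) if gaps else 0.0
    return {"ttft": ttft, "mean_itl": mean_itl}
```

In the real exporter these values feed Prometheus histograms, so Grafana can show p50/p99 TTFT separately from throughput.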
The Problem: Most SREs test web server failures. Very few test what happens when a GPU hangs, runs out of VRAM, or desyncs in a distributed training run.
The Solution: Built on Kubernetes Chaos Mesh, this framework introduces specific GPU faults: OOM kills, thermal throttling simulation, and network partitions between inference nodes.
Why It Matters: Demonstrates the "Chaos Engineering" requirement for Anthropic. It shows proactive reliability engineering—finding breaking points before users do.
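The production framework runs as Chaos Mesh experiments, but the underlying idea can be sketched in a few lines: wrap the inference call in a harness that injects faults at a configured rate (fault names like "oom" are illustrative):

```python
import random

class FaultInjector:
    """Toy fault-injection harness: wraps a call and raises a configured
    fault with a given probability, so recovery paths can be exercised."""
    def __init__(self, faults: dict[str, float], seed: int = 0):
        self.faults = faults            # fault name -> injection probability
        self.rng = random.Random(seed)  # seeded for reproducible experiments

    def call(self, fn, *args):
        for name, prob in self.faults.items():
            if self.rng.random() < prob:
                raise RuntimeError(f"injected fault: {name}")
        return fn(*args)
```

The Chaos Mesh version does the same thing at the infrastructure layer, killing pods and partitioning the network instead of raising exceptions.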
The Problem: Round-robin load balancing is inefficient for LLMs. A server processing a long 10k-token prompt is "available" but has no KV-Cache memory left for a new request, leading to queuing delays.
The Solution: A custom middleware that queries the inference server's internal metrics (specifically KV-Cache usage) and routes new requests to the node with the most available cache memory.
Why It Matters: Shows deep understanding of Transformer architecture constraints. This is "holistic system design"—optimizing for the physics of the model, not just the network.
OpenAI / Real-Time
The Problem: Voice interfaces to AI usually rely on sequential HTTP round-trips (Record -> Upload -> Process -> Download -> Play), adding 2-5 seconds of latency.
The Solution: A full-stack WebRTC implementation. The browser captures audio -> streams via WebRTC to a signaling server -> server streams audio to LLM -> server streams response audio back to browser.
Why It Matters: Directly maps to OpenAI's "Real-Time" team work on Advanced Voice Mode. Proves I can handle signaling, ICE candidates, and real-time duplex streaming.
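Before any media flows, the two ends must exchange SDP offers/answers and ICE candidates through a signaling channel. A minimal in-memory sketch of that relay (in production this runs over WebSockets; the message fields are illustrative):

```python
from collections import defaultdict

class SignalingRoom:
    """Toy signaling relay: each peer has an inbox, and peers exchange
    SDP and ICE messages by name. No transport, auth, or rooms-of-rooms."""
    def __init__(self):
        self.inboxes = defaultdict(list)

    def send(self, sender: str, recipient: str, kind: str, payload: str):
        # kind is e.g. "offer", "answer", or "ice-candidate"
        self.inboxes[recipient].append(
            {"from": sender, "kind": kind, "payload": payload})

    def poll(self, peer: str) -> list[dict]:
        msgs, self.inboxes[peer] = self.inboxes[peer], []
        return msgs
```

Once the answer and candidates are delivered, the actual audio flows peer-to-peer (or peer-to-server) over SRTP, never through this relay.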
The Problem: Video conferencing relies on SFUs. Most engineers just use Twilio/Agora. Understanding how an SFU actually works (forwarding media without decoding) is rare.
The Solution: Built a minimal SFU in Go using the Pion library. It supports Simulcast (ingesting multiple quality layers per publisher and forwarding the appropriate one to each viewer) and dynamic bitrate switching based on each user's network conditions.
Why It Matters: Proves "Senior Level" RTC engineering skills. Writing an SFU from scratch requires understanding RTP, RTCP, and bandwidth estimation algorithms.
The Problem: In real-time audio, packets arrive out of order or at irregular intervals. If you play them immediately, audio glitches. If you wait too long, latency spikes.
The Solution: A simulation tool that artificially introduces network jitter. It visualizes how a Jitter Buffer (a queue) smooths out playback by balancing "target delay" vs. "packet loss".
Why It Matters: Demonstrates understanding of "Lip Sync" and encoding/decoding constraints mentioned in the JD. It’s a debugging tool for media quality.
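The central trade-off the tool visualizes can be captured in a few lines: a larger target delay absorbs more jitter but adds latency; a smaller one plays sooner but counts late packets as lost. A simplified model (fixed playout schedule, no adaptive resizing):

```python
def simulate_jitter_buffer(arrivals: list[float], target_delay: float,
                           frame_interval: float = 0.02) -> tuple[float, float]:
    """Packet i is scheduled to play at i * frame_interval + target_delay.
    A packet arriving after its playout deadline counts as lost (too late).
    Returns (loss_rate, added_delay); all times in seconds."""
    late = 0
    for i, arrival in enumerate(arrivals):
        playout = i * frame_interval + target_delay
        if arrival > playout:
            late += 1
    return late / len(arrivals), target_delay
```

Sweeping target_delay over a jittery arrival trace produces the classic loss-vs-latency curve the tool plots.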
OpenAI / Knowledge
The Problem: Standard RAG (Vector Search) retrieves text chunks but misses relationships. It can't answer "Which team owns the service that depends on Database X?"
The Solution: A backend that uses LLMs to extract entities and relationships from documents, storing them in a Postgres Knowledge Graph. Uses Recursive CTEs for multi-hop traversal queries.
Why It Matters: Hits the "Structured Knowledge Representation" requirement for Knowledge Innovation. It moves AI from "reading text" to "understanding structure".
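The multi-hop query is the interesting part. A sketch of the traversal using SQLite as a stand-in for Postgres (both support recursive CTEs); the schema and data are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, rel TEXT, dst TEXT)")
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    ("billing-api", "depends_on", "payments-svc"),
    ("payments-svc", "depends_on", "db-x"),
    ("team-atlas", "owns", "billing-api"),
])

def transitive_deps(service: str) -> set[str]:
    """Walk depends_on edges to any depth with a recursive CTE."""
    rows = conn.execute("""
        WITH RECURSIVE deps(node) AS (
            SELECT dst FROM edges WHERE src = ? AND rel = 'depends_on'
            UNION
            SELECT e.dst FROM edges e JOIN deps d ON e.src = d.node
             WHERE e.rel = 'depends_on'
        )
        SELECT node FROM deps
    """, (service,))
    return {r[0] for r in rows}
```

Answering "which team owns a service that depends on Database X?" is then just this traversal joined against the owns edges, which vector search alone cannot do.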
The Problem: Internal operations (support, integrity) are manual and slow. Standard automation scripts are brittle.
The Solution: An agentic framework where specialized agents (Classifier, ToolRunner, Escalator) handle tickets. It uses a state machine in Python to ensure reliability and logs every "thought" to Postgres.
Why It Matters: Demonstrates "Automated Agent Systems" capability. It shows I can build the primitives needed to let AI safely operate on production systems.
OpenAI / Storage
The Problem: Engineers often treat databases as black boxes, which leads to poor schema design and indexing choices.
The Solution: A C++ implementation of a Log-Structured Merge-Tree (the engine behind RocksDB/Cassandra). Implements MemTables, Write-Ahead Logs (WAL), and SSTable compaction.
Why It Matters: Demonstrates "Systems Programming" and "Storage" expertise. Proving I can build the database, not just query it.
The Problem: Critical data (like financial transactions) cannot be lost, even if a data center goes down.
The Solution: A distributed log service in Go. It implements leader election and replicates logs to follower nodes before acknowledging the client. It handles crash recovery by replaying logs.
Why It Matters: Directly targets "Reliability" and "Distributed Systems" requirements. It shows understanding of Consensus and Durability.
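The durability rule at the heart of the service: the leader appends locally, replicates to followers, and acknowledges the client only once a majority of the cluster holds the entry. A sketch of that rule alone (the real service in Go adds leader election, terms, and recovery):

```python
class ReplicatedLog:
    """Leader-side majority-acknowledgement sketch. Network partitions are
    simulated by passing the set of reachable follower indices."""
    def __init__(self, follower_count: int):
        self.log = []
        self.followers = [[] for _ in range(follower_count)]

    def append(self, entry, reachable: set[int]) -> bool:
        self.log.append(entry)   # leader's local copy
        acks = 1                 # the leader counts toward the majority
        for i, follower in enumerate(self.followers):
            if i in reachable:
                follower.append(entry)
                acks += 1
        majority = (1 + len(self.followers)) // 2 + 1
        return acks >= majority  # True = safe to acknowledge the client
```

An entry that fails to reach a majority stays in the leader's log but is never acknowledged, which is exactly the window crash recovery has to reconcile by replaying logs.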