
Projects

A collection of things I've built, researched, and tinkered with over the years.

Some project links are unavailable because the work was done as part of graded coursework or unpublished research, and the academic integrity policies of certain courses prohibit publicly sharing code or reports. Please contact me privately for details on any of these projects.

GPU Kernel Programming with Triton

2025 ongoing
  • Implemented Triton kernels covering vector addition, matrix multiplication, convolution, attention, normalization, and fused optimizer paths as a comprehensive suite of ML primitives.
  • Applied kernel optimizations such as autotuning, pipelining, and alignment hinting to outperform the baseline reference implementations.
Triton GPU Programming Machine Learning

Accelerating CNN Inference via AVX Intrinsics

2025 completed
  • Implemented and accelerated core CNN layers (Conv2D, Fully Connected, ReLU, MaxPool) in C++ using AVX intrinsics.
  • Coded multiple advanced convolution strategies, including Input/Weight/Output-Stationary dataflows and 2D Tiling, to optimize performance on an AlexNet model.
AVX C++ Machine Learning

High-Performance Algorithms on Shared & Distributed Memory

2023 completed
  • Engineered high-throughput parallel solutions for data compression (Huffman Coding) and vector processing (Blelloch and Hillis–Steele Prefix Sum), utilizing OpenMP and Pthreads for shared-memory threading and MPI for distributed message passing.
  • Achieved lock-free concurrent writes via an offset-precomputation strategy and header-based partitioning, while benchmarking scalability and communication overheads to evaluate trade-offs between work-efficiency and step-complexity.
MPI OpenMP Pthreads Parallel Computing
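
A minimal single-threaded sketch of the two ideas named above, a Blelloch (work-efficient) exclusive scan and its use for offset precomputation. It is an illustration only, not the project code: the function names `blelloch_exclusive_scan` and `write_chunks` are hypothetical, and the inner loops that real OpenMP/Pthreads code would run in parallel are executed sequentially here.

```python
def blelloch_exclusive_scan(values):
    """Exclusive prefix sum via up-sweep and down-sweep over an implicit tree."""
    n = len(values)
    if n == 0:
        return []
    # Pad to the next power of two so the implicit binary tree is complete.
    size = 1
    while size < n:
        size *= 2
    tree = list(values) + [0] * (size - n)

    # Up-sweep (reduce): each level's iterations are independent of each other.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            tree[i] += tree[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push partial sums back down the tree.
    tree[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = tree[i - stride]
            tree[i - stride] = tree[i]
            tree[i] += left
        stride //= 2
    return tree[:n]


def write_chunks(chunks):
    """Place variable-length chunks into one buffer using precomputed offsets."""
    sizes = [len(chunk) for chunk in chunks]
    offsets = blelloch_exclusive_scan(sizes)
    out = bytearray(sum(sizes))
    # Each iteration touches a disjoint slice, so in a threaded version
    # every worker could perform its write without taking a lock.
    for chunk, offset in zip(chunks, offsets):
        out[offset:offset + len(chunk)] = chunk
    return bytes(out)
```

Because the scan yields each chunk's starting position before any data is written, the output regions never overlap, which is what makes the concurrent writes safe without synchronization.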

MPI Collectives over 2-D Mesh Topology

2023 completed
  • Implemented broadcast, reduce, and allreduce collectives on an m × m mesh using only MPI_Send/MPI_Recv, enforcing strict neighbor-only communication and minimizing per-link load via planar row/column decomposition.
MPI Parallel Computing
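
One plausible reading of the row/column decomposition for broadcast can be simulated without MPI. The sketch below is an assumption about the scheme, not the project code: the root first forwards along its own row, then every node in that row forwards along its column, so each message crosses only a neighbor link. The function name `mesh_broadcast` is hypothetical.

```python
def mesh_broadcast(m, root):
    """Simulate a row-then-column broadcast; return {(row, col): hops from root}."""
    r0, c0 = root
    arrival = {root: 0}
    # Phase 1: spread along the root's row, one neighbor hop at a time.
    for c in range(c0 + 1, m):
        arrival[(r0, c)] = arrival[(r0, c - 1)] + 1
    for c in range(c0 - 1, -1, -1):
        arrival[(r0, c)] = arrival[(r0, c + 1)] + 1
    # Phase 2: every node in that row forwards up and down its own column.
    for c in range(m):
        for r in range(r0 + 1, m):
            arrival[(r, c)] = arrival[(r - 1, c)] + 1
        for r in range(r0 - 1, -1, -1):
            arrival[(r, c)] = arrival[(r + 1, c)] + 1
    return arrival
```

In this scheme every non-root node receives the payload exactly once, and its hop count equals its Manhattan distance from the root.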

Lightweight Hypervisor with KVM API

2025 completed
  • Built a C-based hypervisor using the Linux KVM API to initialize and manage VCPU state and memory maps, successfully booting guest VMs into protected mode.
  • Orchestrated a multi-VM producer-consumer model by implementing hypercalls (via KVM_EXIT_IO) to act as a broker, trapping VM exits to copy a shared data buffer between guests.
UNIX C KVM Operating Systems

Concurrent Min/Max Heap as Loadable Kernel Module

2024 completed
  • Implemented a loadable kernel module (LKM) in C to provide a per-process, concurrency-safe min/max heap, managing kernel memory and state for multiple processes.
  • Exposed kernel-space heap operations (init, insert, extract-top) to userspace via the /proc filesystem using both standard read/write and ioctl system calls.
UNIX C Kernel Modules Operating Systems

Extended CLI Shell with New Fan-Out Operators

2022 completed
  • Built a C-based CLI shell from scratch using syscalls with robust I/O redirection (<, >, >>), built-ins (cd, exit, logout, type, history), and background job control (&) with process status reporting and child cleanup.
  • Extended POSIX pipelining with new fan-out operators (||, |||) to broadcast output to multiple downstream processes simultaneously.
UNIX C System Programming

Kernel-Level Thread Scheduler and Alarm Clock in PintOS

2023 completed
  • Reimplemented timer_sleep() in the PintOS kernel using interrupt-driven wake-ups instead of busy-waiting.
  • Extended the thread scheduler to maintain an ordered sleep queue and unblock threads on timer interrupts for deterministic wake-ups.
UNIX C Operating Systems
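
The sleep-queue idea can be modeled in a few lines. This is a userspace simulation under stated assumptions, not PintOS code: a binary heap stands in for the kernel's ordered list, threads are plain strings, and the names `SleepQueue`, `sleep`, and `on_tick` are hypothetical.

```python
import heapq


class SleepQueue:
    """Threads ordered by wake-up tick; the timer handler pops whatever is due."""

    def __init__(self):
        self._heap = []  # entries are (wake_tick, insertion_order, thread_name)
        self._order = 0

    def sleep(self, thread_name, now, ticks):
        # Block the thread until tick now + ticks instead of spinning.
        heapq.heappush(self._heap, (now + ticks, self._order, thread_name))
        self._order += 1

    def on_tick(self, now):
        # Called once per timer interrupt: wake every thread whose time has come.
        woken = []
        while self._heap and self._heap[0][0] <= now:
            woken.append(heapq.heappop(self._heap)[2])
        return woken
```

Keeping the queue ordered means the interrupt handler only inspects the front entry on most ticks, and threads with the same wake-up tick are released in the order they went to sleep.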

Autoscaling Cloud Management System with libvirt API

2024 completed
  • Built an autoscaling program using the libvirt API to manage the lifecycle of a multi-VM, CPU-intensive client-server application.
  • Implemented a monitoring loop to read VM CPU utilization via virDomainPtr handles, triggering horizontal scaling (N to N+1 replicas) when load exceeded a threshold.
  • Programmatically spawned new server VMs from XML templates and notified the multi-threaded client to distribute load to the new replica, mitigating the overload.
libvirt Virtualization

Container Runtime with Linux Namespaces & Cgroups

2024 completed
  • Built a container runtime in C, invoking the clone and setns system calls to create isolated PID, NET, MNT, and UTS namespaces.
  • Mounted a new rootfs and /proc for process isolation, configured veth (virtual ethernet) pairs for host networking, and enforced memory limits using the cgroups API.
C Linux Namespaces Cgroups Containerization

Memory-Efficient Compiler for a Scoped, Statically-Typed Language

2023 completed
  • Implemented a full compiler pipeline (recursive-descent parser, AST, hash-based symbol table with scoped entries, semantic checks, and NASM code generation) for a statically-typed language designed by faculty, called ERPLAG.
  • Designed lightweight activation records and register allocation, lowering memory usage by 35% compared to naive stack allocation, while supporting recursion and dynamic arrays.
C Compiler Construction NASM

32-bit MIPS Processor Design & Custom Pipeline Architecture

2022 completed
  • Implemented a 32-bit Single-Cycle MIPS processor in Verilog, modularly designing the Control Unit, Instruction and Data Memory, ALU, and Register File, and integrating them into a complete R-format execution pipeline.
  • Built a 3-stage instruction pipeline (Fetch/Encode, Execute, Generate Parity) supporting 8 custom ALU operations.
Verilog Computer Architecture Digital Logic and Design

x86 Real-Mode Split-Screen Editor & System Utilities

2022 completed
  • Developed a dual-viewport text editor in x86 Assembly using BIOS video interrupts (INT 10h), and implemented a low-level file management module using DOS handles (INT 21h) for sequential and random-access (LSEEK) file operations.
x86 Assembly Microprocessor Programming

Low-Level Profiling and Benchmarking of the Lua Garbage Collector

2024 completed
  • Profiled the C implementation of Lua's garbage collector using Valgrind (Callgrind) to identify performance hotspots under various configurations (stop-the-world, generational, incremental).
  • Analyzed source code and call graphs to benchmark the performance trade-offs of different memory management strategies under varying workloads.
C Performance Analysis Lua

High-Performance Key-Value Database with Kernel Bypass

2024 completed
  • Built a high-performance key-value database engine in C, with RB-Tree memtables, WAL durability, Bloom-filter SSTables, and leveled compaction. Implemented storage paths, on-disk formats, and crash-recovery logic.
  • Integrated the engine with a networking stack using Linux network namespaces and DPDK for kernel bypass, and benchmarked latency gains against the standard kernel stack.
C DPDK Storage Systems Networking

File System on an In-Memory Disk Emulator

2024 completed
  • Built an in-memory disk emulator in C to provide a persistent block-level device interface with a fixed 4KB block size.
  • Designed and implemented a file system on the emulated disk, managing all core metadata including the superblock, inodes, data bitmaps, and indirect blocks.
C Storage Systems File Systems

Multi-Client TCP Server with Custom ARQ Protocol

2023 completed
  • Built a multi-client TCP server in C with a custom packet format (seq num, type flags) and stop-and-wait ARQ to handle 10% simulated packet loss with a 2s retransmission timeout, guaranteeing reliable, in-order delivery.
C TCP Network Programming
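
The stop-and-wait behavior described above can be simulated in-process. This sketch makes several assumptions that are not from the project: loss is drawn from a seeded random generator, the 2s timeout is modeled as an immediate retry, a 1-bit alternating sequence number is used, and the function name `stop_and_wait` is hypothetical.

```python
import random


def stop_and_wait(messages, loss_rate=0.1, seed=0):
    """Deliver messages over a lossy link; return (delivered, transmissions)."""
    rng = random.Random(seed)
    delivered = []
    transmissions = 0
    expected = 0  # receiver side: the sequence bit it will accept next
    for index, payload in enumerate(messages):
        seq = index % 2  # alternating-bit sequence number
        while True:
            transmissions += 1
            if rng.random() < loss_rate:
                continue  # data packet lost: sender times out and resends
            if seq == expected:
                delivered.append(payload)  # new packet: hand it to the application
                expected ^= 1
            # Otherwise it is a duplicate; the receiver only re-sends the ACK.
            if rng.random() < loss_rate:
                continue  # ACK lost: sender times out and resends
            break  # ACK received: move on to the next message
    return delivered, transmissions
```

The duplicate check is what keeps delivery exactly-once and in order when an ACK, rather than the data packet, is the one that gets lost.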

Network Message Bus: Distributed Message Queue with UDP

2022 completed
  • Built a distributed message queue in C providing System V-style APIs over UDP multicast for inter-host communication.
  • Implemented per-host servers and error-capture processes to demultiplex multicasts and propagate ICMP failures (e.g., HOST UNREACHABLE) across nodes for end-to-end reliability.
C UDP Network Programming

Customizable Load Balancer with Dynamic Replica Management

2024 completed
  • Designed a load balancer to route asynchronous client requests, using consistent hashing to efficiently distribute load and minimize remapping on server failure.
  • Implemented dynamic replica management by interfacing with the Docker daemon to programmatically spawn and terminate server containers in response to requests.
Cloud Computing Docker
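
A minimal consistent-hashing ring illustrates why a server failure remaps only that server's keys. This is a sketch under assumptions, not the project's implementation: the class name `HashRing`, the use of SHA-256, and the choice of 64 virtual nodes per server are all illustrative.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, servers=(), vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (point_on_ring, server)
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(key):
        # Take 8 bytes of SHA-256 as a position on the ring.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, server):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(point, s) for point, s in self._ring if s != server]

    def lookup(self, key):
        if not self._ring:
            raise LookupError("no servers on the ring")
        # First ring point at or after the key's hash, wrapping around at the end.
        i = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[i % len(self._ring)][1]
```

For example, after `ring.remove("s2")` every key that previously mapped to another server still maps to that same server; only the keys owned by `s2` move.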

Network Packet Sniffer

2022 completed

Developed a packet-capture utility using raw sockets to inspect Ethernet/IP/TCP headers for educational analysis. Reconstructed simple TCP flows and exported pcap-compatible output for downstream tooling.

C Socket Programming

Anytime Clustering for Streaming Data

2022 completed

Implemented an anytime hierarchical k-medoids pipeline for streaming data experiments. Employed micro-cluster sketches and asynchronous insertion logic to maintain low-latency updates under arrival bursts.

C Machine Learning

Affine Short-Rate Models for Swap Valuation

2023 completed

Studied Hull–White and CIR affine term-structure models for pricing interest-rate derivatives. Implemented semi-analytical calibration routines and finite-difference checks for PDE-based pricing validation.

Python Scientific Computing

Monte Carlo Methods for Option Pricing

2023 completed

Built path-dependent Monte Carlo estimators for derivative pricing under stochastic volatility models. Used antithetic variates and control variates to reduce estimator variance and implemented parallel sampling.

Python Scientific Computing
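
Antithetic variates are easiest to see on a simpler problem than the one in the project. The sketch below deliberately swaps in a European call under geometric Brownian motion (not a path-dependent payoff, and not a stochastic volatility model) so the estimate can be checked against the Black-Scholes closed form. All function names and parameter values are illustrative assumptions.

```python
import math
import random


def european_call_mc(s0, k, r, sigma, t, n_pairs, seed=0):
    """Monte Carlo call price using antithetic pairs (z, -z)."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    total = 0.0
    for _ in range(n_pairs):
        z = rng.gauss(0.0, 1.0)
        payoff_plus = max(s0 * math.exp(drift + vol * z) - k, 0.0)
        payoff_minus = max(s0 * math.exp(drift - vol * z) - k, 0.0)
        # Averaging the mirrored draws cancels part of the sampling noise.
        total += 0.5 * (payoff_plus + payoff_minus)
    return math.exp(-r * t) * total / n_pairs


def black_scholes_call(s0, k, r, sigma, t):
    """Closed-form reference price used to sanity-check the estimator."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return s0 * cdf(d1) - k * math.exp(-r * t) * cdf(d2)
```

The pairing helps here because the call payoff is monotone in the underlying normal draw, so the two payoffs in each pair are negatively correlated.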

Clustering Stability in Noisy Streams

2022 completed

Investigated medoid-based clustering stability under nonstationary stream arrivals and noise. Performed controlled experiments measuring purity and silhouette with randomized concept-drift injections.

C++ Statistics

Randomized Algorithms for Data Summarization

2022 completed

Explored sketching methods for approximate frequency and quantile estimation on large streams. Implemented Count-Min and t-digest sketches and compared memory/accuracy trade-offs.

Python Algorithms
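
A minimal Count-Min sketch shows the memory/accuracy trade-off mentioned above: a fixed `depth × width` table of counters that can overestimate a frequency but never underestimate it. This is an illustrative sketch, not the project code; the class name, the default dimensions, and the use of salted BLAKE2b as the per-row hash are assumptions.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter using depth independent hash rows."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One salted hash per row selects that row's counter for the item.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row, col in self._columns(item):
            self.rows[row][col] += count

    def estimate(self, item):
        # Collisions only ever inflate counters, so the minimum is the best guess.
        return min(self.rows[row][col] for row, col in self._columns(item))
```

Increasing `width` reduces the expected overestimate, while increasing `depth` reduces the chance that every row collides at once; both cost memory linearly.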