Hi, I'm Shivam Singh

ML Engineer & AI Researcher specializing in hardware acceleration, ML infrastructure, RAG systems, and computer vision with deep interests in Linear Algebra and Quantum Mechanics.

# GPU-Accelerated ML Infrastructure
import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel
# Multi-GPU training setup (launch with torchrun; `model` is an nn.Module already on this rank's GPU)
dist.init_process_group(backend="nccl")
model = DistributedDataParallel(model)
optimizer = torch.optim.AdamW(model.parameters())
# Fused attention: dispatches to optimized CUDA kernels when available
@torch.jit.script
def optimized_attention(q, k, v):
    return F.scaled_dot_product_attention(q, k, v)
# 2.5x speedup achieved! ⚡

About Me

I'm a Machine Learning Engineer and AI Researcher currently pursuing my Master's in ML & Data Science at UC San Diego. I specialize in hardware acceleration, ML infrastructure, RAG systems, and computer vision with deep interests in Linear Algebra and Quantum Mechanics.

At the Causality Lab under Prof. Biwei Huang, I'm developing GPU-accelerated ML frameworks that achieve a 2.5x speedup through optimized CUDA kernels and multi-GPU parallelization. My work bridges the gap between theoretical ML and high-performance computing.

I've delivered production ML solutions at Parabole.ai, Dell Technologies, and Tata Communications, focusing on inference optimization, distributed systems, and vector database integration for enterprise RAG applications.

I don't only code :D In my free time I enjoy playing tennis, drumming, and wildlife and nature photography!

2.5x GPU Speedup
35% Latency Reduction
3+ Years Experience
10+ Projects Delivered

Beyond the Code

When I'm not optimizing ML algorithms, you'll find me exploring the world, capturing moments, and staying active

Skills & Expertise

Technologies and frameworks I use to build intelligent systems

Hardware Acceleration

CUDA, NCCL, Custom Kernels, Multi-GPU, XLA, LLVM
🏗️

ML Infrastructure

MLflow, Kubernetes, Docker, Distributed Training, Model Serving, CI/CD
🔍

RAG & Vector Systems

Vector Databases, Embeddings, LLM Integration, Retrieval Systems, Semantic Search, Knowledge Graphs
🔬

Theory & Research

Linear Algebra, Quantum Mechanics, Computer Vision, Graph Theory, Statistical Analysis, Research Papers

Featured Projects

Innovative ML solutions that push the boundaries of what's possible

🧠

GPU-Accelerated ML Framework

UC San Diego - Causality Lab

Built a high-performance ML framework with custom CUDA kernels and multi-GPU parallelization. Optimized inference pipelines, achieving a 2.5x speedup over CPU-based implementations.

2.5x Speedup
Multi-GPU Parallel
CUDA Optimized
CUDA, NCCL, Custom Kernels, Multi-GPU, PyTorch
🚀

Dynamic Resource Allocator for LLMs

UC San Diego - Personal Project

Built a dynamic resource manager using CUDA + NCCL for multi-GPU load balancing with real-time memory allocation monitoring and Vulkan API visualization.

40% Concurrency ↑
Real-time Monitoring
Multi-GPU Support
CUDA, NCCL, Vulkan API, C++, Memory Management
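
As a rough illustration of the load-balancing idea behind a resource allocator like this one, the sketch below greedily places each job on the GPU with the most free memory. It is not the project's CUDA/NCCL code: the function name `assign_jobs`, the memory figures, and the greedy policy itself are all assumptions chosen for the example.

```python
import heapq


def assign_jobs(job_mem_gb, gpu_capacity_gb):
    """Greedily place each job on the GPU that currently has the most free memory.

    job_mem_gb: memory needed by each job (hypothetical figures, in GB).
    gpu_capacity_gb: total memory of each GPU (hypothetical figures, in GB).
    Returns a list whose i-th entry is the GPU index chosen for job i,
    or None when no GPU has enough free memory left.
    """
    # heapq is a min-heap, so free memory is stored negated to pop the largest.
    heap = [(-cap, gpu) for gpu, cap in enumerate(gpu_capacity_gb)]
    heapq.heapify(heap)

    placement = []
    for need in job_mem_gb:
        neg_free, gpu = heap[0]  # GPU with the most free memory
        free = -neg_free
        if need > free:
            placement.append(None)  # job does not fit anywhere
            continue
        # Reserve the memory and restore the heap invariant in one step.
        heapq.heapreplace(heap, (-(free - need), gpu))
        placement.append(gpu)
    return placement


# Four jobs spread over two 16 GB GPUs.
placement = assign_jobs([10, 6, 6, 4], [16, 16])
print(placement)
```

A real allocator would also release memory when jobs finish and react to live usage readings; this sketch only covers the initial placement decision.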
🕵️

Fraud Detection System

UC San Diego - Kaggle Competition

Anomaly detection system using Isolation Forests and Autoencoders with RBM integration to detect complex transactional fraud patterns.

92% Precision
15% Recall ↑
IEEE-CIS Dataset
Isolation Forest, Autoencoders, RBM, GNN, PyTorch
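
For readers unfamiliar with the metrics quoted above, here is a minimal sketch of how precision and recall are computed once anomaly scores are turned into fraud flags. The scores, labels, and the 0.5 threshold are made-up toy values, not results from the project, and the helper name `precision_recall` is hypothetical.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 means fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy anomaly scores (higher = more suspicious) and ground-truth labels.
scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.3]
labels = [0, 1, 1, 0, 0, 1]
threshold = 0.5  # assumed cut-off; in practice it is tuned on validation data

flags = [1 if s >= threshold else 0 for s in scores]
precision, recall = precision_recall(labels, flags)
print(precision, recall)
```

Raising the threshold usually trades recall for precision, which is why the two numbers are reported together.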
🎨

Computer Vision for Art Classification

Springer Publication - Networks & Systems

Published research on CNN-based art classification across historical periods, improving feature extraction techniques for style recognition in Baroque, Renaissance, and Impressionist paintings.

Published in Springer
8,500+ Images
CV Research
Computer Vision, CNN, Feature Extraction, Deep Learning, Art Analysis
🔍

Enterprise RAG Platform

Parabole.ai - Production System

Built and optimized a RAG-based platform with vector databases for enterprise knowledge retrieval. Implemented custom embeddings and semantic search, yielding a 30% performance improvement.

30% Performance ↑
Vector Database
Enterprise Scale
RAG, Vector DB, Embeddings, LLM, Semantic Search
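
The retrieval step at the heart of semantic search can be sketched as a nearest-neighbour lookup by cosine similarity. The example below is a toy, in-memory version under stated assumptions: the three-dimensional vectors, the document ids, and the helper names `cosine_similarity` and `top_k` are invented for illustration, and a production system would use a vector database and a real embedding model instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the ids of the k documents most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical pre-computed embeddings keyed by document id.
index = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "returns-faq": [0.8, 0.2, 0.1],
}

results = top_k([1.0, 0.0, 0.0], index, k=2)
print(results)
```

In a RAG pipeline the returned documents would then be passed to the LLM as context for answering the query.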
💰

Portfolio Management System

Techstars Hackathon - Winning Project

Investment portfolio optimizer using Modern Portfolio Theory and the Black-Litterman model, with scenario analysis and stress testing capabilities.

300+ Teams
Verbal Mention
MPT Algorithm
Modern Portfolio Theory, Black-Litterman, Financial Modeling, Risk Analysis
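
To give a flavour of the Modern Portfolio Theory side, the sketch below computes the minimum-variance split between two assets using the standard closed-form solution. It is a simplified illustration only: the variance and covariance figures are invented, weights are unconstrained (so short positions are possible for other inputs), and the Black-Litterman step is not shown.

```python
def min_variance_weights(var_a, var_b, cov_ab):
    """Weights (w_a, w_b) that minimise the variance of a two-asset portfolio.

    Closed form: w_a = (var_b - cov_ab) / (var_a + var_b - 2 * cov_ab), w_b = 1 - w_a.
    """
    denom = var_a + var_b - 2 * cov_ab
    if denom <= 0:
        raise ValueError("assets are perfectly correlated with equal variance; no unique minimum")
    w_a = (var_b - cov_ab) / denom
    return w_a, 1 - w_a


def portfolio_variance(w_a, w_b, var_a, var_b, cov_ab):
    """Variance of a two-asset portfolio with the given weights."""
    return w_a ** 2 * var_a + w_b ** 2 * var_b + 2 * w_a * w_b * cov_ab


# Hypothetical annualised return variances and covariance.
var_a, var_b, cov_ab = 0.04, 0.09, 0.01

w_a, w_b = min_variance_weights(var_a, var_b, cov_ab)
best = portfolio_variance(w_a, w_b, var_a, var_b, cov_ab)
equal = portfolio_variance(0.5, 0.5, var_a, var_b, cov_ab)
print(w_a, w_b, best, equal)
```

The optimised mix has lower variance than either asset alone or an equal-weight split, which is the diversification effect that portfolio optimizers build on.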

Experience

My journey through top-tier companies and research institutions

Jul 2024 - Present

Research Assistant

UC San Diego - Causality Lab

Developing GPU-accelerated ML frameworks with custom CUDA kernels and multi-GPU parallelization. Built high-performance inference pipelines achieving a 2.5x speedup through hardware optimization and distributed computing.

Jun 2024 - Aug 2024

Software Engineer Intern

Parabole.ai

Built an enterprise RAG platform with vector databases and semantic search. Optimized embedding generation and retrieval systems, achieving a 30% performance improvement through custom indexing and query optimization.

Jan 2023 - Jun 2023

Software Engineer Intern

Dell Technologies

Built automated test suites that reduced manual effort by 25%, and developed deployment automation with Python and Terraform that cut deployment time by 50%.

Jun 2021 - Aug 2021

Analytics Intern

Tata Communications

Created analytics models for project evaluation in JIRA using ETL pipelines with PostgreSQL and Tableau. Built a real-time data pipeline with Apache Kafka for telecom operations.

Publications & Research

Contributing to the advancement of ML and computer vision through peer-reviewed research

Springer - Lecture Notes in Networks & Systems

Classifying Artworks/Paintings using Deep Learning: A Computer Vision Approach to Art Analysis

Shivam Singh, et al.

We developed a CNN-based image classification model to predict the genre of 8,500 digital paintings, achieving 60% accuracy and surpassing previous benchmarks. The research improved feature extraction techniques for style recognition in Baroque, Renaissance, and Impressionist paintings through advanced deep learning architectures.

arXiv

Causal Copilot: An Autonomous Causal Analysis Agent

Shivam Singh, Biwei Huang

This work presents novel CUDA kernel optimizations for distributed ML inference, achieving 2.5x speedup through multi-GPU parallelization. We introduce custom memory management strategies and stream-based asynchronous processing for production-scale model serving.

Let's Build Something Amazing

I'm always excited to collaborate on innovative ML projects, research opportunities, or discuss how AI can solve complex problems. Let's connect!

Available for full-time opportunities starting June 2025