Shashank Tripathi

building ML systems, optimizing GPU workloads, and experimenting across AI infrastructure, performance engineering, and scalable software systems.

currently focused on Triton/CUDA optimization, distributed training systems, efficient inference, and systems-aware deep learning.


About

I work across the intersection of:

  • ML systems
  • GPU programming
  • AI infrastructure
  • performance engineering
  • scalable backend systems
  • full-stack product development
  • AI consulting + technical strategy

Alongside engineering-focused work, I’ve collaborated with startups and product teams on building practical, cost-efficient AI and software solutions.

My approach combines:

  • deep technical understanding
  • systems-level optimization
  • product thinking
  • business-aware engineering decisions

I enjoy bridging the gap between technical and non-technical teams — translating complex systems into scalable, usable, and commercially practical solutions.

This includes helping teams:

- optimize infrastructure costs
- choose efficient AI/ML architectures
- scale products pragmatically
- improve engineering workflows
- ship faster without sacrificing quality
- balance performance with maintainability

I’m comfortable working across different environments:

  • early-stage startups
  • hackathon teams
  • fast-moving product groups
  • research-oriented engineering teams
  • enterprise-scale workflows

and across vastly different levels of technical complexity.

That can range from:

- building lightweight automations for small businesses
- designing AI agents for operational workflows
- creating internal productivity tools
- shipping full-stack MVPs quickly
- improving backend scalability
- optimizing cloud/resource usage
- designing efficient ML pipelines
- tuning GPU kernels for high-throughput inference
- optimizing Triton/CUDA workloads for LLM systems
- experimenting with systems-level performance engineering

I enjoy solving both ends of the spectrum: practical business problems that need clean execution, and deeply technical infrastructure problems that require low-level optimization and systems thinking.

My work ranges from low-level kernel optimization and distributed training experiments to building real-world applications, developer tools, and AI-powered products.

I enjoy understanding systems from the inside out — memory movement, scheduling, throughput, compiler behavior, kernel execution, and the engineering tradeoffs behind modern AI workloads.

This GitHub profile is essentially an active engineering workspace where I explore:

- GPU kernels + Triton/CUDA
- efficient deep learning systems
- training + inference infrastructure
- compiler-aware optimization
- AI-powered products
- distributed systems
- developer tooling
- experimental ML infrastructure

selected repositories

TinyTorch [Built in Harvard's CS249r repository]

A lightweight deep learning framework built to understand tensor systems, autograd internals, and the foundations behind modern deep learning libraries.

focus areas:

  • tensor abstractions
  • automatic differentiation
  • computational graphs
  • backend execution mechanics
  • educational systems design

Triton + CUDA Optimization [Current Focus]

Collection of kernel optimization experiments focused on maximizing GPU throughput and understanding low-level execution behavior.

includes work around:

  • GEMM optimization
  • memory coalescing
  • occupancy tuning
  • shared memory optimization
  • tiling strategies
  • warp-level execution
  • benchmarking + profiling

recent work includes iteratively optimizing matrix multiplication kernels, raising GFLOPS throughput through scheduling and memory-access improvements.


Turing — AI-Integrated Real-Time Assistant [Individual project on Agents and System Automation]

An experimental AI assistant platform exploring real-time interactions, AI tooling, and product-scale system design.

focus areas:

  • AI integration
  • real-time workflows
  • product engineering
  • scalable architecture
  • user-focused AI experiences

Hackathon + Product Projects

A collection of fast-built but ambitious projects exploring:

  • AI applications
  • full-stack systems
  • developer tooling
  • automation
  • real-time platforms
  • rapid product iteration

these projects helped shape my approach toward shipping quickly while maintaining strong engineering fundamentals.


ML Systems Experiments

Repositories exploring:

  • distributed training
  • TPU/GPU experimentation
  • inference optimization
  • scalable training workflows
  • systems-oriented deep learning
  • infrastructure-aware experimentation

built while experimenting with Kaggle TPUs, large-scale workloads, and efficient AI system design.


Experience

  • Kaggle Grandmaster
  • Harvard Edge Computing Lab
  • IIT Guwahati (Class of 2028)

Engineering interests

- compiler-aware ML optimization
- efficient transformer systems
- GPU kernel engineering
- distributed AI systems
- high-throughput inference
- scalable LLM infrastructure
- systems-level AI research
- performance benchmarking

Tech stack

languages     → python, c++, cuda, javascript, typescript, c, R
ml/ai         → pytorch, triton, tensorflow, jax
systems       → cuda, distributed systems, gpu programming
backend       → node.js, express.js, fastapi
frontend      → react, next.js, svelte, sveltekit
infra         → linux, docker, git, vercel, kubernetes
areas         → ML systems, AI infra, performance engineering, AI engineering, ML engineering, frontier research

links


building systems that make AI workloads faster, more scalable, and more usable in the real world.
