Hi, I'm a recent graduate from IIIT Delhi, where I earned my Bachelor's degree in Computer Science & Engineering. My research interests lie broadly in machine learning and computer vision. My undergraduate research centered on teaching machines to learn effectively from limited data in real-world settings, with an emphasis on out-of-distribution (OOD) generalization and active learning for computer vision tasks. I have also explored deep reinforcement learning for compositional reasoning and core knowledge acquisition. More broadly, I have worked on a range of problems in vision, ML, and reinforcement learning.
I've been fortunate to collaborate with and be mentored by Dr. Saket Anand, Dr. Gautam Shroff (TCS Research), Dr. Supratim Shit, and Dr. Pravesh Biyani, each of whom has shaped my thinking in meaningful ways.
I currently work full-time as a software engineer at tbo.com and am looking for research opportunities in ML and CV; more generally, I am interested in fundamental research.
Outside of research, I'm deeply passionate about classical music and playing the piano. I used to be a competitive speedcuber and gamer. I also run a (currently inactive) YouTube channel for my music shenanigans, which additionally features Rubik's Cube tutorials I produced in my childhood.
Feel free to email me at atharv21027@iiitd.ac.in. I'm always happy to chat!
publications
Reliable Active Learning from Unreliable Labels via Neural Collapse Geometry
Active learning algorithm that uses the dynamics of neural collapse geometry to guide data acquisition. Yields models that are more accurate, more robust to noisy labels, and better at generalizing to OOD data and OOD classes, while using considerably less labeled data.
Just Add Geometry: Gradient-Free Open-Vocabulary 3D Detection Without Human-in-the-Loop
Training-free, annotation-free pipeline for open-vocabulary 3D detection, built on 2D vision-language foundation models and classical geometric reasoning. Also introduces a new dataset for evaluating detection under adverse conditions and in the absence of depth information.
projects
Active Learning for Object Detection: From Foundation Models to Geometric Insights

Semi-supervised active learning framework for object detection that combines foundation models with human-in-the-loop annotation. We leverage neural collapse to develop a targeted acquisition function, enabling efficient training under tight annotation budgets and strong performance with minimal human supervision.
Toward Compositional Reasoning with Deep Reinforcement Learning

Explored training deep RL agents to acquire core knowledge priors via demonstration learning in procedurally generated 2D grid environments. The goal, inspired by ARC-style reasoning, was to enable compositional generalization across diverse tasks. Investigated meta-learning, test-time adaptation (TTA), inverse RL, pretraining, model compression, and open-ended learning to allow agents to construct and transform symbolic objects and, ultimately, to generate their own training data.
Unraveling the Abstraction & Reasoning Challenge

Collection of techniques for tackling the ARC-AGI benchmark for general intelligence. Investigated neurosymbolic AI, designing algorithms based on discrete program search, model-based meta-learning, and LLMs. Features a custom LLM inference method and self-supervised test-time adaptation.
Neural Subgraph Matching for Wildlife Reidentification

Designed algorithms for wildlife re-identification from incomplete data using graph registration, subgraph matching, 3D vision, and domain knowledge.
engineering
VXGI: 3D Graphics Rendering Engine

Wrote a rendering engine from scratch in raw OpenGL for simulating global illumination in real time. Implemented a custom dynamic voxelization algorithm with voxel cone tracing for indirect lighting, achieving real-time performance while maintaining visual fidelity comparable to offline rendering techniques.
Raft: Distributed Key-Value Store

Implemented a distributed key-value store on top of a from-scratch implementation of the Raft consensus algorithm. Includes a custom leader-lease implementation and maintains fault tolerance and log consistency across distributed nodes.
teaching
CSE544: Computer Vision