Zero-shot Learning

Internship at SRI International, Princeton, NJ (May 2017-August 2017)

Comparison of Supervised and Unsupervised Neural Networks

Course Project for CMSC828L (Deep Learning), UMD College Park (August 2016-December 2016)
Compared networks learnt using Generative Adversarial Networks [1] against supervised networks for classification. [report]

Real-time (30 fps) Face Detection Using YOLO (You Only Look Once)

Trained a real-time (30 fps) face detector using the You Only Look Once method [2] to find faces in about 15 million images.

Detecting Handles of Kitchen Appliances Using Faster R-CNN

Course Project for CMSC828K (Perception for Robotics), UMD College Park (January 2016-April 2016)
Collected data annotations using Amazon Mechanical Turk and trained Faster R-CNN [3] models to detect the handles of common kitchen appliances.

Training Baseline Deep Networks for Simple Problems (MNIST, CIFAR-10)

Course Projects for ENEE633 (Statistical Pattern Classification) and ENEE731 (Image Understanding), UMD College Park
Trained baseline deep models for MNIST digit classification and CIFAR-10 classification.

Estimating Number of People in Images of Very High Density Crowds

Master's thesis, IIT Kanpur (May 2014-June 2015)
The task was to estimate the number of people in images of crowds taken at concerts, rallies, demonstrations, sports events, religious processions and ceremonies, fairs, etc. Combined multiple sources of evidence: head counts, Fourier analysis, GLCM features, and interest-point-based counting. Used deformable part models (DPM) to detect heads and estimate their count and confidence. Implemented Fourier-analysis-based counting by finding local maxima in reconstructed gradient images of crowds. Trained a support vector regression model on interest points (SIFT features) to obtain an estimate of the count. Obtained the final count by fusing the counts from the different sources using support vector regression. Paper based on the work.
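As a rough illustration of the local-maxima step only, the sketch below counts strict local maxima in a small 2-D array. It is not the thesis implementation: the function name `count_local_maxima`, the 8-neighbour rule, and the `threshold` parameter are assumptions made for this example, and the Fourier reconstruction of the gradient image is not shown.

```python
def count_local_maxima(img, threshold=0.0):
    """Count pixels strictly greater than all of their 8 neighbours.

    `img` is a list of equal-length rows of numbers. Pixels at or below
    `threshold` (a hypothetical noise floor) are ignored.
    """
    rows, cols = len(img), len(img[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            v = img[r][c]
            if v <= threshold:
                continue
            # Gather the neighbours that exist (image borders have fewer).
            neighbours = [
                img[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(v > n for n in neighbours):
                count += 1
    return count


# Toy "reconstructed" image with two isolated peaks.
toy = [
    [0, 0, 0, 0, 0],
    [0, 5, 0, 0, 0],
    [0, 0, 0, 3, 0],
    [0, 0, 0, 0, 0],
]
peaks = count_local_maxima(toy, threshold=1)
```

In a real pipeline the peak count would be only one of several estimates that are later fused.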

Femtocell Networks

Term Paper for EE670 (Wireless Communications), IIT Kanpur (January 2014-April 2014)

Robust Methods for Estimation Problems

Term Paper for EE602 (Statistical Signal Processing), IIT Kanpur (August 2013-November 2013)

Real-time Continuous Speech Recognition System for English

Course Project for EE627 (Speech Recognition), IIT Kanpur (August 2013-November 2013)

Multiple Object Tracking Using Kalman Filter

Course Project for EE698J (Kalman Filter and Applications), IIT Kanpur (August 2013-November 2013)
Established the Kalman model using the centroid and area of objects and used these to solve the correspondence problem. Integrated an initial covariance determination algorithm with the tracking process to obtain better results.
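A minimal sketch of the idea, under simplifying assumptions that are not taken from the project: each track filters (centroid x, centroid y, area) independently with a random-walk motion model, and detections are assigned to tracks by greedy nearest-neighbour matching with a distance gate. The names `Track`, `associate`, and `step`, and all noise values, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    """One tracked object; state is [cx, cy, area], each filtered separately."""
    state: list
    # Hypothetical initial variances; the project estimated these with a
    # dedicated initial covariance determination algorithm.
    var: list = field(default_factory=lambda: [100.0, 100.0, 100.0])

    def predict(self, q=1.0):
        # Random-walk model: the estimate is unchanged, uncertainty grows by q.
        self.var = [p + q for p in self.var]

    def update(self, z, r=4.0):
        # Scalar Kalman update for each component of the measurement z.
        for i, (x, p, zi) in enumerate(zip(self.state, self.var, z)):
            k = p / (p + r)  # Kalman gain
            self.state[i] = x + k * (zi - x)
            self.var[i] = (1.0 - k) * p


def distance(track, z, area_weight=0.01):
    # Distance between a predicted state and a detection, mixing centroid
    # offset with a down-weighted area difference.
    dx = track.state[0] - z[0]
    dy = track.state[1] - z[1]
    da = track.state[2] - z[2]
    return (dx * dx + dy * dy + area_weight * da * da) ** 0.5


def associate(tracks, detections, gate=50.0):
    """Greedy nearest-neighbour matching; returns {track_index: detection_index}."""
    pairs = sorted(
        (distance(t, z), ti, di)
        for ti, t in enumerate(tracks)
        for di, z in enumerate(detections)
    )
    matches, used = {}, set()
    for d, ti, di in pairs:
        if d > gate:
            break  # all remaining pairs are farther than the gate
        if ti in matches or di in used:
            continue
        matches[ti] = di
        used.add(di)
    return matches


def step(tracks, detections):
    """One frame: predict every track, match detections, update matched tracks."""
    for t in tracks:
        t.predict()
    matches = associate(tracks, detections)
    for ti, di in matches.items():
        tracks[ti].update(detections[di])
    return matches


tracks = [Track([10.0, 10.0, 100.0]), Track([100.0, 100.0, 400.0])]
detections = [(102.0, 98.0, 410.0), (12.0, 11.0, 95.0)]
matches = step(tracks, detections)
```

A fuller tracker would add velocity to the state and handle unmatched detections and lost tracks; those parts are omitted here.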

Detection and Pose Estimation of Multiple texture-less 3-D Objects in Images and Videos

Summer Internship at Czech Technical University, Prague (May 2013-July 2013)
Detection and pose estimation of multiple texture-less 3-D objects in images and videos. Used a two-stage edge-based procedure for the task: hypothesis generation and verification. Assigned entropies calculated using histograms of oriented gradients (HOG) as weights to pixels for training. Generated hypotheses using the resulting edge-based descriptor and fast search with index tables. For verification, employed oriented chamfer matching, improved by selecting discriminative edges. Achieved an improvement of 3-10% over the existing state of the art.
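To make the verification step concrete, here is a brute-force sketch of a basic oriented chamfer cost: for each template edge point, take the distance to the nearest image edge point plus a penalty on their orientation difference. This is a generic textbook-style formulation, not the improved variant used in the project; the function names, the `lam` weight, and the toy data are assumptions, and the discriminative edge selection is not modelled.

```python
import math


def angle_diff(a, b):
    """Smallest difference between two edge orientations (defined modulo pi)."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)


def oriented_chamfer_cost(template, image_edges, offset=(0, 0), lam=1.0):
    """Mean of (nearest-edge distance + lam * orientation difference).

    `template` and `image_edges` are lists of (x, y, theta) edge points;
    `offset` translates the template before matching.
    """
    ox, oy = offset
    total = 0.0
    for tx, ty, t_theta in template:
        px, py = tx + ox, ty + oy
        # Brute-force nearest neighbour; a distance transform would be used
        # in practice for speed.
        nearest = min(image_edges, key=lambda e: math.hypot(e[0] - px, e[1] - py))
        total += math.hypot(nearest[0] - px, nearest[1] - py)
        total += lam * angle_diff(t_theta, nearest[2])
    return total / len(template)


def best_offset(template, image_edges, candidates, lam=1.0):
    """Return the candidate translation with the lowest cost."""
    return min(
        candidates,
        key=lambda off: oriented_chamfer_cost(template, image_edges, off, lam),
    )


# Toy data: a short horizontal edge segment and an image containing it at (5, 5).
template = [(0, 0, 0.0), (1, 0, 0.0), (2, 0, 0.0)]
image_edges = [(5, 5, 0.0), (6, 5, 0.0), (7, 5, 0.0), (0, 9, 1.5708)]
found = best_offset(template, image_edges, [(0, 0), (5, 5), (3, 4)])
```

A lower cost means a better fit, so a hypothesis would be accepted when its cost falls below some threshold chosen for the application.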

Cricbot - Image Processing based Ball Collecting Robot

Independent Project for Techfest 2012, IIT Bombay (December 2011-January 2012)
Built a robot that identifies a ball using blob tracking algorithms, collects it from the ground, and brings it to a pit at the corner of the field. Winner of the Cricbot event at Techfest 2012.

Autonomous Library Assistant

Independent Project under Robotics Club, IIT Kanpur (May 2011-July 2011)
Built an image-processing robot that searches for a given book on a shelf using a camera mounted on it and brings the book back to the user.


[1] Radford, Alec, Luke Metz, and Soumith Chintala. “Unsupervised representation learning with deep convolutional generative adversarial networks.” arXiv preprint arXiv:1511.06434 (2015).
[2] Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi. “You only look once: Unified, real-time object detection.” arXiv preprint arXiv:1506.02640 (2015).
[3] Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. “Faster R-CNN: Towards real-time object detection with region proposal networks.” In Advances in neural information processing systems, pp. 91-99. 2015.