Projects

Listed in reverse chronological order within each section.

Research

  1. Extracting Event Schema from Scientific Text: Developing models that extract commonsense knowledge representations about materials science from research paper text in an unsupervised manner. An example of knowledge that one might induce from text is that a "heating" event always involves a material to heat, a temperature to heat to, an apparatus to heat with, and optionally a time to heat for. Currently exploring matrix factorization and neural network models to learn these representations.
  2. Extracting Event Chains from Material Synthesis Procedure Text: Extracting structured representations of synthesis procedures from materials science research papers. I developed parts of the NLP pipeline for the task and evaluated the results; I currently maintain the project and work on improving the performance of the initial methods. This project is part of a larger collaboration between IESL and the Olivetti Group at MIT. (Paper)
  3. Research Manuscript PDF Parsing and Labeling: Trained a feedforward neural network to label tokens from a research manuscript as belonging to one of many metadata classes. This project is part of a larger effort at IESL to extract structured metadata from research papers. The work was done as a data science independent study under the supervision of Dr. Shankar Vembu (Chan Zuckerberg Initiative) and Andrew McCallum. (Code)
  4. Non-text Object Segmentation from Grayscale Document Images: Segmented non-text objects from grayscale document images using connected operators. (Project page)

Curricular

Projects from my graduate-level coursework at UMass Amherst:

  1. N-ary relation extraction with Compositional Universal Schema: Experimented with LSTMs and matrix factorization models to extract n-ary relations such as Born(Barack Obama, USA, 1961, Ann Dunham, Barack Obama Sr.) from Wikipedia and materials science research paper text in an unsupervised manner. The work forms part of an ongoing research project and was completed for the advanced Machine Learning class. (Code) (Report)
  2. Cross-sentence relation extraction with LSTMs: Implemented and analyzed a doc2vec-based baseline and LSTM models for relation extraction at the paragraph level. Also modified the Google Relation Extraction Corpus for a paragraph-level relation extraction task. The project formed part of the Neural Networks class. (Code) (Report)
  3. Transfer Learning for human activity recognition from sensor data: Explored conventional supervised approaches to human activity recognition and applied an Isomap-based domain adaptation method for transfer learning between individuals. Then modified the approach and showed that the modification performs better than the original domain adaptation method. The work formed part of my Machine Learning class. (Code) (Report)
  4. Suggesting Appropriate Wikipedia Infoboxes: Suggested appropriate infobox templates for Wikipedia articles. Given a Wikipedia article, we tried to answer two questions: Does this page require an infobox? If yes, what is the appropriate infobox to use for this page? We cast these questions as classification tasks and used linear SVMs and Random Forest classifiers in our solutions. This work formed my NLP class project. (Poster) (Report)

Projects completed as part of my undergraduate requirements:

  1. Color Document Image Binarization: Converted a given color document image to a binary image while retaining all desired image elements. (Project page)
  2. Dual Channel Electroencephalography Headset: Implemented a dual channel electroencephalography headset. (Project page)
  3. AM Transmitter: Designed a modulation circuit, then designed and manufactured the circuit board that implements it. Completed as the second-year mini project.

Extra Curricular

Projects that were not part of a formal curriculum.

  1. Extracting University of Pune Marks Data: Command line tool written in Python that extracts marks from the yearly college-wise engineering results, which the University of Pune distributes as PDF files, and writes them to a CSV file. (Project page)
  2. Line follower robot: Implemented a line follower robot. The robot was fitted with a DTMF decoder so that it could also be controlled from a mobile phone. The work involved programming an ATmega8 MCU in C and was done as part of a robotics workshop. The robot placed second in a competition against about 20 other teams.