Portfolio

About Me

Hello! My name is Ashwin. I am a prospective Master's student in Computer Science and will be joining Virginia Tech in Fall 2021.

My primary interests, stemming from my immense love for literature, lie in studying and applying intelligent algorithms to NLP problems in order to understand language better and draw meaningful linguistic insights from text. My other interests include applied computer vision and machine learning, and I constantly try to learn the intricacies of the underlying algorithms and their potential utility in everyday life.

I completed my Bachelor's degree in Computer Engineering at Pune Institute of Computer Technology with a GPA of 8.67/10.0.

I love to write in general. At the intersection of my passion for writing and for learning the depths of NLP, ML and CV, I write articles and blog posts on any new thing that I learn. As Richard Feynman said, 'What I cannot create, I do not understand.' I try to reinforce my knowledge by sharing it on my blogging platform. You can check out my blog here >> Blog.

Get in touch with me on LinkedIn, or drop me an email here.

Experience

Outreach Corporation

Machine Learning Intern

Responsible for developing a template engine project that lets Data Scientists and Machine Learning Engineers at Outreach use templates to deploy any NLP model online, so that they avoid writing redundant boilerplate code.
Delivered an Online Inference Solution with a gRPC-based microservice in Golang serving NLP models, viz. BERT, RoBERTa and DistilBERT, for topic detection, question detection, action analysis and sentiment analysis.
Wrote Python pipelines for ingesting data, preprocessing, tokenization, prediction and postprocessing of text data.
Wrote Bash scripts to instantiate NLP model binaries in the ONNX format on the NVIDIA Triton Inference Server and packaged the inference solution as a Docker image.
Wrote a Go-based microservice that communicates with the inference server via gRPC requests and responses, and Dockerized it.
Wrote tests for the application service as well as the inference service via CircleCI configuration files.
Deployed the application online via Kubernetes manifests on the Outreach staging environment.
Reduced the time Data Scientists needed to deploy a model from 3-4 days to 2 hours.

Skills: Backend Engineering · Microservices · Online Inference · Docker · Kubernetes · CircleCI · Continuous Integration and Continuous Delivery (CI/CD) · Software Development · MLOps · Go (Programming Language) · Python (Programming Language) · Machine Learning · Deep Learning · Natural Language Processing (NLP)

Mindbowser Infosolutions Pvt. Ltd.

Deep Learning Intern

Responsible for implementing a Facial Expression Recognition (FER) application.
Performed Exploratory Data Analysis on the underlying data (the FER 2013 dataset) and tested classical Machine Learning models, viz. Logistic Regression and Support Vector Machine, as baseline classifiers.
Implemented a proof-of-concept VGG-19 transfer learning model (a Convolutional Neural Network) as the final classifier in PyTorch, achieving an accuracy of 73% on the validation set.
Integrated the application with a MongoDB database to store meeting metadata (timestamps, faces detected, expressions classified, etc.) and the respective images in GridFS.
Wrote a GUI script to translate the POC into a desktop application using Python Tkinter.
As an alternative, wrote a Flask web application on top of the underlying model and Dockerized it.
Packaged the code as a Python-based executable that can be launched as a desktop application with a button click.
Owned the application from design and development through to production.
The project is in beta testing at Volkswagen and Bajaj India.

Skills: Flask · Graphical User Interface (GUI) · Tkinter · OpenCV · MongoDB · Docker · Python (Programming Language) · Deep Learning · PyTorch · Keras · Scikit-Learn

Research

Detecting Insincere Questions from Text: A Transfer Learning Approach

The internet today has become an unrivalled source of information where people converse on content-based websites such as Quora, Reddit, StackOverflow and Twitter, asking questions and sharing knowledge with the world. A major problem arising with such websites is the proliferation of toxic comments or instances of insincerity, wherein users, instead of maintaining a sincere motive, indulge in spreading toxic and divisive content. The....

Paper

Covid 19 Chest Xray Detection : A Transfer Learning Approach

The coronavirus outbreak has had a devastating effect on people all around the world and has infected millions. The exponential spread of the disease makes it urgent to have appropriate screening methods to detect the disease and take steps to mitigate it. In this study, models are developed to provide accurate diagnostics for multiclass classification (Covid vs No Findings vs Pneumonia).

Paper

Projects

Automatic Text Summarization with Pytorch and Transformers

Fine-tuned a T5ForConditionalGeneration (t5-base) model on a custom News Summary dataset to perform abstractive text summarization. Created a custom dataloader from data comprising a news core text and its human-generated summary. Kept a constant mini-batch size of 2 and trained the model for 3 epochs. The model parameters were updated using the AdamW optimizer. The training was done using the PyTorch Lightning library.

Roberta for Covid-19 Tweet Sentiment Classification

This project widens the perspective on the state of the global pandemic by harnessing Twitter data and the RoBERTa model by Facebook. Custom-cleaned tweets were fed to the RoBERTa tokenizer to tokenize them according to the model's configuration, and a RobertaModel was then extended with two linear layers to perform sentiment analysis on the tweet data.

Generating Text with BART

Teaching the denoising autoencoder BART to generate text with PyTorch Lightning.

LGBM Classifier + Optuna for Tabular data classification

Optuna is an automatic hyperparameter optimization framework, designed particularly for machine learning. It features an imperative, define-by-run style user API. In this project I used Optuna's new integration API to automatically tune LightGBM (LGBM) parameters for tabular data classification.

EfficientNetb4 for Leaf disease classification

At least 80% of household farms in Sub-Saharan Africa are affected by viral diseases, which are major sources of poor yields. As part of this Kaggle competition I fine-tuned an EfficientNetB4 model using fastai as well as Keras to classify images into four classes of diseased leaf variants and a normal class.

Ensemble Regression for Tabular data.

An advanced ensemble regression of an XGBoost regressor + LightGBM regressor + neural network. Initially the predictions were made using XGBoost and LightGBM separately, then these predictions were averaged and fed to a neural network to make the final predictions. This approach put me in the top 10% of the competition - Tabular Playground Series - Jan 2021.
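The averaging step described above can be sketched in plain Python. This is only an illustrative sketch, not the competition code: the function name `average_predictions` and the prediction values are hypothetical stand-ins for the outputs of the two base models.

```python
from statistics import fmean


def average_predictions(*model_predictions):
    """Average per-sample predictions coming from several base models."""
    lengths = {len(preds) for preds in model_predictions}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same number of samples")
    # zip groups the predictions that belong to the same sample.
    return [fmean(sample) for sample in zip(*model_predictions)]


# Hypothetical predictions standing in for the two base regressors.
xgb_preds = [7.9, 8.2, 6.5]
lgbm_preds = [8.1, 8.0, 6.9]

# In the approach described above, these averaged values would then be
# passed on to the final neural network.
blended = average_predictions(xgb_preds, lgbm_preds)
```

Averaging before the final stage keeps that stage's input small: one blended value per sample instead of one column per base model.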

Generating Digits with Variational AutoEncoder

Generating Digits using Variational AutoEncoders in Keras. Inspired by Francois Chollet's tutorial in his book - Deep Learning with Python.

Neural Style Transfer

In this kernel I implemented, in PyTorch, the style transfer method outlined in the paper Image Style Transfer Using Convolutional Neural Networks by Gatys et al. In this paper, style transfer uses the features found in the 19-layer VGG Network, which is comprised of a series of convolutional and pooling layers, and a few fully-connected layers. In the image below, the convolutional layers are named by stack and their order in the stack.
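In the Gatys paper, the style of an image is captured by Gram matrices of the convolutional feature maps, i.e. the inner products between every pair of channels. The sketch below shows that computation in plain Python on tiny hand-written feature maps. It is an illustration only, not the kernel's PyTorch code: the function names are hypothetical, and the loss uses a simple mean squared difference rather than the paper's exact normalization.

```python
def gram_matrix(feature_maps):
    """Gram matrix of a layer.

    feature_maps is a list of channels, each a flat list of activations.
    Entry (i, j) is the inner product of channel i with channel j.
    """
    return [
        [sum(a * b for a, b in zip(ch_i, ch_j)) for ch_j in feature_maps]
        for ch_i in feature_maps
    ]


def style_loss(generated_maps, style_maps):
    """Mean squared difference between the two Gram matrices.

    Assumption: simplified normalization, chosen for illustration.
    """
    g = gram_matrix(generated_maps)
    s = gram_matrix(style_maps)
    n = len(g)
    total = sum((g[i][j] - s[i][j]) ** 2 for i in range(n) for j in range(n))
    return total / (n * n)


# Two channels with two activations each, made up for the example.
example_gram = gram_matrix([[1, 2], [3, 4]])
```

Because the Gram matrix sums over spatial positions, it records which features occur together while discarding where they occur, which is why it describes style rather than content.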

Text Regression using transformers and Pytorch

In this recently launched competition, participants build algorithms to rate the complexity of reading passages for grade 3-12 classroom use. I created a simple RoBERTa-large baseline in native PyTorch, using root mean squared error (RMSE) as its loss function to get a continuous output.
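For reference, the root mean squared error used as the loss can be written out in a few lines of plain Python. This is a minimal sketch of the formula only, not the PyTorch loss from the baseline, and the name `rmse` is a hypothetical helper.

```python
import math


def rmse(predictions, targets):
    """Root mean squared error between two equally long sequences."""
    if not predictions or len(predictions) != len(targets):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    # Mean of the squared errors, then the square root.
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Taking the square root puts the loss back in the same units as the target score, which makes it easier to read than plain mean squared error.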

Text Classification with ULMFIT fastai.

ULMFiT is an effective transfer learning method that can be applied to any task in NLP, and it introduces techniques that are key for fine-tuning a language model. With only 100 labeled examples, it matches the performance of training from scratch on 100x more data.

Certifications

Deep Learning Specialization

Completed the five-course specialization, which is a part of the famous Stanford course CS230 - Deep Learning taught by Prof. Andrew Ng. Learnt how to train neural networks from scratch along with the calculus and linear algebra needed to do so. Following this, learnt how to tune hyperparameters and various methods to improve neural networks. This was followed by an in-depth study of the two most important sub-classes of modern neural nets, i.e. Convolutional Neural Networks in course 4 and Recurrent Neural Networks in course 5.

Natural language processing Specialization

Completed an intensive 4-course specialization offered by deeplearning.ai that covered basic to advanced Natural Language Processing. The four courses in this specialization are: 1. Classification and Vector Spaces in NLP 2. Probabilistic Models in NLP 3. Sequence Models in NLP 4. Attention Models in NLP

Python for everybody Specialization

This Specialization introduces programming concepts including data structures, networked application program interfaces, and databases, using the Python programming language. In the Capstone Project, I used the technologies learned throughout the Specialization to design and create my own application for data retrieval, processing, and visualization.

TensorFlow Developer Specialization

In the first course of this 4-course specialization by deeplearning.ai, I learnt how to build and train neural networks using TensorFlow. In the second course, I learnt how to improve my network’s performance using convolutions as I trained it to identify real-world images. Next, I learnt how to build algorithms that understand, analyze, and respond to human speech with natural language processing systems. Finally, I learnt to process text, represent sentences as vectors, and train a model to create original poetry.

Ashwin Rachha

CS GRAD | NLP | VISION | ML