Chirayu Tripathi

As a Machine Learning Researcher at UF Health and a master's graduate in computer science, I have spearheaded innovative projects leveraging NLP, ML, and deep learning. I engineered predictive methodologies, modeled graph neural networks, addressed bias and lack of explainability in medical models, and deployed deep neural networks and AI-driven business intelligence features. I have contributed to academia and open source by co-authoring papers, implementing libraries, and earning academic accolades. With three years of experience, I am passionate about solving complex problems, delivering value-driven solutions, and making a positive impact.

Skills

Programming Languages

Python, C/C++, HTML/CSS, R, Erlang, Java.

Databases

MySQL, MongoDB, PostgreSQL.

Frameworks & Applications

Hadoop, Tableau, NumPy, Pandas, Matplotlib, Scikit-learn, PyTorch, TensorFlow, Excel, Git, AWS, Flask, REST API, Apache Spark, SAS, ETL, LangChain, Docker, Kubernetes.

Technical Skills

Deep Learning, RPA, Statistics, Linux, DBMS, Algorithms, Data Visualization, MLOps, NLP.

nl2query

Engineered a framework that translates natural language text into queries for Pandas, MongoDB, Kusto, and Neo4j, enabling querying across disparate data sources in plain English. Published it as a PyPI package with over 6K downloads, removing the need to learn specialized database query languages. Available at nl2query.

Tech Used: Phi-2, CodeT5+, MongoDB, Pandas, Neo4j, Kusto, NLP, Text Generation, Python, PyTorch.

Paper Implementations

From-scratch implementations of machine learning and deep learning papers.

Tech Used: PyTorch, Python, NumPy.

MongoDB Querifier

Improves the MongoDB query generation ability of LLMs with advanced retrieval-augmented generation (RAG). The project generates MongoDB queries from natural language questions using large language models, combining natural language processing, a vector database, and advanced RAG into a query generation pipeline. It uses Weaviate, an open-source vector database, to retrieve similar questions together with their corresponding MongoDB queries.

Tech Used: RAG, Gemini Pro, Sentence Transformers, PyTorch, Weaviate.
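
The retrieval step described above can be illustrated with a small sketch. This is not the project's code: the example store, the token-overlap (Jaccard) similarity, and the prompt layout are all assumptions made for illustration, standing in for Sentence Transformers embeddings, Weaviate search, and the actual prompt sent to the LLM.

```python
import re

# Hypothetical example store of (question, MongoDB query) pairs.
EXAMPLES = [
    ("How many users are older than 30?",
     'db.users.countDocuments({"age": {"$gt": 30}})'),
    ("List the names of all products cheaper than 10 dollars.",
     'db.products.find({"price": {"$lt": 10}}, {"name": 1})'),
    ("Which orders were placed by customer 42?",
     'db.orders.find({"customer_id": 42})'),
]


def tokens(text):
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity; a toy stand-in for embedding similarity."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(question, examples, k=2):
    """Return the k stored pairs whose question is most similar."""
    ranked = sorted(examples, key=lambda ex: jaccard(question, ex[0]), reverse=True)
    return ranked[:k]


def build_prompt(question, examples, k=2):
    """Assemble a few-shot prompt from the retrieved pairs."""
    shots = "\n\n".join(
        f"Question: {q}\nQuery: {m}" for q, m in retrieve(question, examples, k)
    )
    return f"{shots}\n\nQuestion: {question}\nQuery:"


prompt = build_prompt("How many users are older than 25?", EXAMPLES, k=1)
print(prompt)
```

The retrieved pair gives the model a worked example that is structurally close to the new question, which is the idea behind retrieving similar questions with their queries.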

VerAIzon

A RAG (Retrieval-Augmented Generation) chatbot built on Mistral-7B and tailored for Verizon customer service. The workflow begins with dataset creation through iterative extraction from links and user guides, resulting in approximately 1000 pages of data. To work within the context limits of Large Language Models (LLMs), we apply LangChain's recursive character splitting to break the dataset into manageable text chunks. Embeddings are then created using the all-MiniLM-L12-v2 model and stored in a FAISS (Facebook AI Similarity Search) vector store, which handles large collections of vectors efficiently. During retrieval, user queries are matched with the most relevant chunks using FAISS search, so that responses are grounded in the relevant documentation.

Tech Used: RAG, Mistral-7B, Transformers, PyTorch, LangChain.
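
The chunk-then-retrieve flow described above can be sketched with the standard library alone. This is an illustrative sketch, not the chatbot's code: `split_recursive` is a simplified imitation of recursive character splitting (it does not merge small pieces back together), the bag-of-words `embed` is a toy stand-in for all-MiniLM-L12-v2 embeddings, the brute-force `top_k` stands in for FAISS search, and the sample text is made up.

```python
import math
import re
from collections import Counter

SEPARATORS = ["\n\n", "\n", " "]  # tried in order, coarsest first


def split_recursive(text, chunk_size, separators=SEPARATORS):
    """Split text into pieces of at most chunk_size characters.

    Tries the coarsest separator first and recurses with finer ones
    only for pieces that are still too long.
    """
    text = text.strip()
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to a hard cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    head, rest = separators[0], separators[1:]
    chunks = []
    for piece in text.split(head):
        chunks.extend(split_recursive(piece, chunk_size, rest))
    return chunks


def embed(text):
    """Toy bag-of-words vector (assumption: replaces a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=2):
    """Return the k chunks most similar to the query (brute-force search)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Made-up sample document with three paragraphs.
doc = (
    "To reset your voicemail password, open the account settings page.\n\n"
    "International roaming can be enabled from the plan options menu.\n\n"
    "Billing statements are issued on the first day of each month."
)
chunks = split_recursive(doc, chunk_size=80)
best = top_k("How do I enable roaming?", chunks, k=1)
print(best[0])
```

In the pipeline described above, the retrieved chunks would then be placed in the prompt given to the LLM; that generation step is omitted here.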

Twitter API Clone using Distributed Computing

Designed a Twitter server and client using Erlang and the Akka (actor) model, capable of serving 1-2 million requests at a given moment, demonstrating the efficiency of the actor model for high-traffic distributed systems.

Tech Used: Erlang, Akka Model, Distributed Computing, APIs.

Kubernetes Controller

Designed and developed a performance-optimized Kubernetes cluster on CloudLab using model-based feedback control. A local controller on each node scales pod counts based on utilization metrics to maximize resource usage. A global cluster controller manages overall utilization through threshold-triggered node scaling and job scheduling, with a middleware layer separating cluster management from control logic. Together, the local and global controllers achieve an operating point of 80% pod utilization.

Tech Used: Kubernetes, Flask, Docker, PID Controllers.
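
The two control loops described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's controller: the linear model (utilization is roughly pods times a per-pod load), the thresholds, and every function and parameter name are hypothetical, and utilization is expressed in whole percentage points to keep the arithmetic exact.

```python
def local_step(pods, utilization, target=80, est_load_per_pod=5, max_pods=50):
    """One local control step on a node.

    Uses an assumed model (each pod adds est_load_per_pod points of
    utilization) to convert the utilization error into a pod-count change.
    """
    error = target - utilization
    delta = int(error / est_load_per_pod)  # truncates toward zero
    return max(1, min(max_pods, pods + delta))


def global_step(nodes, cluster_utilization, high=80, low=40, min_nodes=1):
    """Threshold-triggered node scaling for the whole cluster."""
    if cluster_utilization > high:
        return nodes + 1
    if cluster_utilization < low and nodes > min_nodes:
        return nodes - 1
    return nodes


def simulate(true_load_per_pod=4, steps=10):
    """Run the local loop against a plant whose real per-pod load
    differs from the controller's estimate."""
    pods = 1
    for _ in range(steps):
        utilization = min(100, pods * true_load_per_pod)
        pods = local_step(pods, utilization)
    return pods, min(100, pods * true_load_per_pod)


final_pods, final_utilization = simulate()
print(final_pods, final_utilization)  # 19 76
```

Even with a mismatched model (estimated load 5, real load 4), the feedback loop settles near the 80% target; the remaining gap comes from the integer truncation, which acts as a small dead band.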

Latent Factor based Recommender System using Spark ALS

Implemented a popularity-based recommender using a latent factor model with ALS, decreasing root mean square error (RMSE) by 15%. Mitigated the cold-start problem for new users by using additional metadata such as tags.

Tech Used: Apache Spark, Recommender Systems.
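
For readers unfamiliar with ALS, the alternating updates can be sketched in plain Python. This is not the Spark implementation used in the project: it is a rank-1 toy version with made-up ratings, where fixing one side's factors reduces each update to a scalar regularized least-squares solution. The names `als_rank1`, `rmse`, and `RATINGS` are hypothetical, and the tag-based cold-start handling is not covered.

```python
import math

# Hypothetical toy ratings as (user, item, rating) triples.
RATINGS = [(0, 0, 2.0), (0, 1, 1.0), (1, 0, 4.0), (1, 2, 5.0)]


def rmse(ratings, users, items):
    """Root mean square error of the rank-1 predictions users[u] * items[i]."""
    squared = [(r - users[u] * items[i]) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(squared) / len(squared))


def als_rank1(ratings, n_users, n_items, reg=0.01, iterations=30):
    """Alternating least squares with a single latent factor."""
    users = [1.0] * n_users
    items = [1.0] * n_items
    for _ in range(iterations):
        # Fix item factors; each user factor has a closed-form solution.
        for u in range(n_users):
            rated = [(i, r) for uu, i, r in ratings if uu == u]
            num = sum(r * items[i] for i, r in rated)
            den = reg + sum(items[i] ** 2 for i, r in rated)
            users[u] = num / den
        # Fix user factors; solve each item factor the same way.
        for i in range(n_items):
            rated = [(u, r) for u, ii, r in ratings if ii == i]
            num = sum(r * users[u] for u, r in rated)
            den = reg + sum(users[u] ** 2 for u, r in rated)
            items[i] = num / den
    return users, items


before = rmse(RATINGS, [1.0, 1.0], [1.0, 1.0, 1.0])
users, items = als_rank1(RATINGS, n_users=2, n_items=3)
after = rmse(RATINGS, users, items)
print(before, after)
```

With more than one latent factor, each scalar division becomes a small linear system per user or item, which is the part a distributed ALS implementation parallelizes.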

Achievements and Open Source Contributions

Here are some of my achievements and open source contributions.

Global Rank 10 in "Love in the time of screens" Hackathon.

Ranked 10th globally out of 2500 participants in the “Love in the time of screens” Machine Learning hackathon organized by HackerEarth, which involved matching dating candidates based on their profiles.

Achievement Award Scholarship

Received a $4,500 scholarship from the University of Florida in recognition of my academic record.

SimpleT5 (Open Source)

Contributed to open source by implementing CodeT5 transformer support for the SimpleT5 library.

GitHub Repository

nl2query (Open Source)

Open-sourced the data I collected, along with the fine-tuned Phi-2 and CodeT5+ models behind my nl2query package, so that the open-source community can build on top of them. At present it has 54 stars and about 6K downloads.

GitHub Repository

Research Work

Co-authored the paper “Transparent AI: Developing an Explainable Interface for Predicting Postoperative Complications”.

arXiv

My Resume

My Blogs

Check out my latest blog post on Medium:

Iterators and Generators in Python

Get in touch