My Portfolio

nl2query

Engineered a framework that translates natural language text into queries for Pandas, MongoDB, Kusto, and Neo4j, enabling querying across disparate data sources in plain English. Published it as a PyPI package with over 6K downloads, removing the need to learn specialized database query languages. The package is available at nl2query.

Tech Used: Phi-2, CodeT5+, MongoDB, Pandas, Neo4j, Kusto, NLP, Text Generation, Python, PyTorch.

See Repository

Paper Implementations

My implementations of Machine Learning and Deep Learning papers from scratch.

Tech Used: PyTorch, Python, NumPy.

See Repository

MongoDB Querifier

Improves the MongoDB query generation ability of LLMs with advanced retrieval-augmented generation (RAG). The project generates MongoDB queries from natural language questions using a Large Language Model, combining sentence embeddings, a vector database, and retrieval-augmented generation into a single query generation pipeline. It uses Weaviate, an open-source vector database, to retrieve similar questions and their corresponding MongoDB queries.
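As a rough illustration of the retrieval step described above, the sketch below ranks stored question/query pairs by similarity to a new question and assembles a prompt from the closest matches. It is a minimal sketch under stated assumptions, not the project's code: the bag-of-words cosine similarity stands in for Sentence Transformers embeddings and a Weaviate lookup, using retrieved pairs as few-shot examples is an assumption about how the pipeline consumes them, and the names `EXAMPLES`, `embed`, `retrieve_similar`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical store of past questions and their MongoDB queries.
EXAMPLES = [
    ("How many users are older than 30?",
     'db.users.countDocuments({"age": {"$gt": 30}})'),
    ("List all orders with status shipped",
     'db.orders.find({"status": "shipped"})'),
    ("Find users living in Paris",
     'db.users.find({"city": "Paris"})'),
]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a sentence encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_similar(question, k=2):
    """Return the k stored (question, query) pairs most similar to the input."""
    q_vec = embed(question)
    ranked = sorted(EXAMPLES, key=lambda ex: cosine(q_vec, embed(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(question, k=2):
    """Assemble a few-shot prompt from the retrieved pairs (assumed usage)."""
    shots = "\n\n".join(
        f"Question: {q}\nQuery: {m}" for q, m in retrieve_similar(question, k)
    )
    return f"{shots}\n\nQuestion: {question}\nQuery:"
```

Calling `build_prompt("Which users are older than 40?")` would place the age-filter example first in the prompt, since it shares the most words with the new question; a real encoder would rank by meaning rather than word overlap.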

Tech Used: RAG, Gemini Pro, Sentence Transformers, PyTorch, Weaviate.

See Repository

VerAIzon

A RAG (Retrieval-Augmented Generation) chatbot built on Mistral-7B and tailored for Verizon customer service. The workflow begins with dataset creation through iterative extraction from links and user guides, yielding approximately 1000 pages of data. To work within the input-length limits of Large Language Models (LLMs), we apply LangChain's recursive character splitting to break the dataset into manageable text chunks. Embeddings are then created with the all-MiniLM-L12-v2 model and stored in a FAISS (Facebook AI Similarity Search) vector store, which handles large collections efficiently. At retrieval time, FAISS search matches each user query with the most relevant chunks, so that responses draw on the most relevant material.
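To make the chunking step above concrete, here is a simplified pure-Python sketch of the idea behind recursive character splitting: try the coarsest separator first and fall back to finer ones only for pieces that are still too long. This is an illustration, not LangChain's implementation. It omits chunk overlap, the function name `recursive_split` and the chunk size used below are assumptions, and the embedding and FAISS stages are not shown.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Separators are tried from coarsest to finest; a finer separator is used
    only for pieces that are still longer than chunk_size.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators or separators[0] == "":
        # Last resort: hard cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current.strip():
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: split it with the next separator.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks
```

For example, `recursive_split(page_text, 500)` would keep whole paragraphs together where they fit and only break inside a paragraph when it exceeds 500 characters (`page_text` being any extracted page).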

Tech Used: RAG, Mistral-7B, Transformers, PyTorch, LangChain.

See Repository

Twitter API Clone using Distributed Computing

Designed a Twitter server and client using Erlang and the Akka model, capable of serving 1-2 million requests at a single point in time, demonstrating the suitability of the Akka model for high-traffic distributed systems.

Tech Used: Erlang, Akka Model, Distributed Computing, APIs.

See Repository

Kubernetes Controller

Designed and developed a performance-optimized Kubernetes cluster on CloudLab using model-based feedback control. A local controller on each node scales pod counts based on utilization metrics to maximize resource usage. A global cluster controller manages overall utilization through threshold-triggered node scaling and job scheduling, with a middleware layer separating cluster management from control logic. Together, the local and global controllers achieve an operating point of 80% pod utilization.
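The local control loop described above can be sketched as a PID controller that nudges a node's pod count toward the 80% utilization setpoint. This is a minimal sketch under assumptions, not the project's controller: the gains, the pod limits, and the names `PIDController` and `next_pod_count` are hypothetical, and the model-based tuning and the Kubernetes API calls are left out.

```python
class PIDController:
    """Textbook discrete PID controller."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement, dt=1.0):
        # Positive error means utilization is below target, so pods can be added.
        error = self.setpoint - measurement
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def next_pod_count(controller, current_pods, utilization, min_pods=1, max_pods=50):
    """Turn the controller output into a bounded pod count for one node."""
    delta = round(controller.update(utilization))
    return max(min_pods, min(max_pods, current_pods + delta))


# Assumed gains for illustration only; real gains would come from tuning.
controller = PIDController(kp=10.0, ki=0.0, kd=0.0, setpoint=0.80)
pods = next_pod_count(controller, current_pods=4, utilization=0.50)
```

With these illustrative gains, a node running at 50% utilization is asked to take on more pods, while a node above 80% would shed some; clamping keeps the count within the assumed limits.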

Tech Used: Kubernetes, Flask, Docker, PID Controllers.

See Repository

Latent Factor based Recommender System using Spark ALS

Implemented a popularity-based recommender using a latent factor model with ALS, decreasing root mean square error by 15%. Addressed the cold-start problem for new users by using additional metadata such as tags.
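For readers unfamiliar with alternating least squares, the sketch below shows the core idea on a toy scale: fix the item factors and solve for the user factors in closed form, then swap. It is a deliberately simplified rank-1 version in plain Python, not Spark's ALS and not the project's code. The function names, the regularization value, and the toy ratings are assumptions, and the tag-based cold-start handling is not shown.

```python
import math


def als_rank1(ratings, n_users, n_items, reg=0.01, iters=10):
    """Fit rating(u, i) as approximately p[u] * q[i] by alternating least squares.

    ratings: list of (user, item, rating) tuples for observed entries only.
    """
    p = [1.0] * n_users
    q = [1.0] * n_items
    for _ in range(iters):
        # Fix item factors; each user factor has a closed-form ridge solution.
        for u in range(n_users):
            num = sum(r * q[i] for uu, i, r in ratings if uu == u)
            den = reg + sum(q[i] ** 2 for uu, i, r in ratings if uu == u)
            p[u] = num / den
        # Fix user factors; solve each item factor the same way.
        for i in range(n_items):
            num = sum(r * p[u] for u, ii, r in ratings if ii == i)
            den = reg + sum(p[u] ** 2 for u, ii, r in ratings if ii == i)
            q[i] = num / den
    return p, q


def rmse(ratings, p, q):
    """Root mean square error of the factor model on the given ratings."""
    sq = sum((r - p[u] * q[i]) ** 2 for u, i, r in ratings)
    return math.sqrt(sq / len(ratings))


# Toy ratings for 2 users and 3 items (made up for illustration).
toy = [(0, 0, 1.0), (0, 1, 2.0), (0, 2, 3.0),
       (1, 0, 2.0), (1, 1, 4.0), (1, 2, 6.0)]
p, q = als_rank1(toy, n_users=2, n_items=3)
```

A production model would use more than one latent dimension and evaluate RMSE on held-out ratings rather than on the training data.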

Tech Used: Apache Spark, Recommender Systems.

Chirayu Tripathi

Skills

Programming Languages

Databases

Frameworks & Applications

Technical Skills

Achievements and Open Source Contributions

Achievement Award Scholarship

nl2query (Open Source)

Research Work

SimpleT5 (Open Source)

Global Rank 10 in "Love in the time of screens" Hackathon.

My Resume

My Blogs

Get in touch

Address

Email

Phone

Social