Projects

My Dataset Collection Repository

For a comprehensive list of datasets I have explored or used in my projects, visit my Dataset Collection Repository.
This repository consolidates a variety of popular datasets for machine learning projects, learning, and experimentation.


Credit Risk Analysis and Loan Default Prediction using EDA and Machine Learning

This project focuses on analyzing credit loan data to uncover key insights into borrower behavior and loan default patterns using Exploratory Data Analysis (EDA). By leveraging libraries such as Pandas, NumPy, Matplotlib, and Seaborn, I performed in-depth data cleaning, preprocessing, and visualization to identify critical factors contributing to loan defaults. The analysis lays the groundwork for predictive modeling, enabling better credit risk assessment and data-driven decision-making.
Credit-Loan-Default-EDA Repo | Jupyter Notebook | Case Study

IMDB Movie Case Study - Insights and Trends Using SQL

In this project, I analyzed the RSVP Movies dataset using MySQL to derive key insights into movie trends, genres, ratings, and industry success. Through advanced SQL queries, I explored director and actor performance, production house rankings, and genre-based analysis. This project highlights my expertise in SQL query design, data exploration, and delivering actionable insights for real-world datasets.
RSVP-Movies-SQL-Case-Study Repo | SQL Queries | Case Study - Executive Summary

Bike Sharing Demand Prediction Using Linear Regression

In this project, I analyzed bike-sharing system data to predict user demand using Linear Regression. Key steps included data exploration, feature engineering and selection (manual selection as well as automated selection with Recursive Feature Elimination, RFE), model development to identify the relationship between environmental conditions and bike usage, and performance evaluation. The model provides actionable insights for optimizing bike availability and resource allocation. Libraries used: Pandas, NumPy, Matplotlib, Seaborn, and Scikit-learn.
Bike Sharing Linear Regression Repo | Jupyter Notebook | Subjective Assignment

Home Loan Default Prediction Using Logistic Regression

This project focuses on predicting home loan default risks using logistic regression. It involves solving a critical business problem by identifying potential high-risk, medium-risk, and low-risk loan applicants. The project explores data preprocessing, feature engineering, and implementing multi-class classification using One-vs-Rest and One-vs-One strategies. Additionally, it delves into the mathematical concepts of logistic regression, including the sigmoid function and log-loss optimization. Key learnings include understanding logistic regression coefficients for feature importance, applying libraries like Scikit-learn, Pandas, and Matplotlib, and evaluating models using metrics such as accuracy, precision, recall, F1-Score.
Home Loan Default Prediction Repo | Jupyter Notebook
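
As a rough illustration of the sigmoid function and log-loss mentioned above, here is a minimal sketch in plain Python. It is not taken from the project notebook; the function names and the sample labels and probabilities are hypothetical and chosen only for demonstration.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability between 0 and 1."""
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_loss(y_true, y_prob, eps=1e-15):
    """Average binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities away from 0 and 1 so the logarithm stays finite.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical labels (1 = default) and predicted default probabilities.
labels = [1, 0, 1, 0]
probabilities = [sigmoid(2.0), sigmoid(-1.5), sigmoid(0.3), sigmoid(-3.0)]
print(round(log_loss(labels, probabilities), 4))
```

Logistic regression fits its coefficients by minimizing this loss; in a One-vs-Rest setup, one such binary model would be fitted per risk class.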

Car Evaluation Prediction Using Decision Tree 

This project uses Decision Tree models to evaluate car acceptability based on various features like safety, maintenance cost, and capacity. The repository provides insights into feature importance, hyperparameter tuning, and decision-making processes using Scikit-learn, Pandas, category_encoders, and visualization libraries.
Car Evaluation Prediction Repo | Jupyter Notebook

Heart Disease Prediction Using Decision Tree and Hyperparameter Tuning

This repository focuses on predicting heart disease using Decision Tree models. It highlights insights into feature importance, hyperparameter tuning using GridSearchCV, and learnings derived from practical applications of data science tools and libraries like Scikit-learn, Pandas, and Matplotlib.
Heart Disease Prediction Decision Tree Repo | Heart Disease Decision Tree Notebook
Heart Disease Hyperparameter Tuning | Hyperparameter Tuning Notebook

Housing Price Prediction Using Ensemble - Stacking Regressor and Random Forest

This project addresses the business problem of predicting housing prices with high accuracy, a critical requirement for stakeholders in the real estate sector. It employs a combination of regression models—linear regression, KNN regressor, and decision tree regressor—and enhances their predictive performance using the Stacking Regressor and Random Forest from sklearn.ensemble. Key libraries used include pandas, numpy, matplotlib, seaborn, Scikit-learn, and statsmodels, showcasing expertise in data preprocessing, visualization, machine learning, and statistical modeling.

The dataset, accessible here, forms the basis for this study. The models are evaluated using the R-squared metric, with statistical analysis performed through the OLS module from statsmodels. This project demonstrates the advantages of ensemble techniques and statistical rigor in deriving actionable insights for housing price prediction.
Housing Price Using Decision Tree Repo | Decision Tree Notebook
Housing-price-prediction-stacking-regressor | Stacking Regressor Notebook
Housing-price-prediction-random-forest | Random Forest Notebook
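
The R-squared metric used to evaluate the models above can be sketched in a few lines of plain Python. This is an illustrative version only, not the project's code (which relies on Scikit-learn and statsmodels); the function name and the sample prices are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted house prices (arbitrary units).
actual = [200.0, 250.0, 310.0, 400.0]
predicted = [210.0, 240.0, 300.0, 390.0]
print(round(r_squared(actual, predicted), 4))
```

A value of 1 means the predictions match the targets exactly, while a model that always predicts the mean price scores 0.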

Global Clusters: A Case Study on Socio-Economic and Health Indicators

This project focuses on grouping countries based on socio-economic and health-related indicators using clustering techniques. It addresses global development patterns by analyzing metrics like child mortality, exports, health expenditure, income, and GDP. The project covers data exploration, feature scaling, and implementing clustering algorithms such as K-Means and Hierarchical Clustering. It also uses the Hopkins test to check whether the data has a tendency to cluster, and the elbow method and silhouette score to choose the number of clusters.

Key learnings include understanding feature normalization for clustering, interpreting cluster centroids, and applying tools like Scikit-learn, Pandas, and Matplotlib to build and analyze clustering models. The project highlights the significance of clustering for policy-making, identifying development disparities, and exploring socio-economic similarities among countries.
country-clustering-on-socio-economic-factor | Clustering Notebook | Country Dataset
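
To illustrate why feature scaling matters before distance-based clustering, here is a minimal standardization sketch using only the Python standard library. It is an assumption-laden stand-in for the scaler used in the notebook; the function name and the indicator values are hypothetical.

```python
import statistics


def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        # A constant column carries no information, so it is mapped to 0.0.
        [(value - m) / s if s else 0.0 for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical rows of (child mortality, income) for three countries.
raw = [[90.0, 1600.0], [17.0, 9900.0], [4.0, 41000.0]]
for row in standardize(raw):
    print([round(v, 3) for v in row])
```

Without this step, a large-valued feature such as income would dominate the Euclidean distances that K-Means and Hierarchical Clustering rely on.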

AIoT Project: Motion Sensing and Classification

This AIoT project utilizes an ESP32 microcontroller, an MPU6050 accelerometer/gyroscope sensor, an LCD display, and a potentiometer to build a real-time motion classification system. By leveraging a trained Random Forest machine learning model, the system identifies six different physical activities (bench press, overhead press, deadlift, squat, row, and rest) and displays the results on the LCD. The model was trained on the accelerometer and gyroscope data. The system is designed for fitness tracking, sports performance monitoring, and real-time activity recognition. It integrates hardware and software to demonstrate a complete AIoT pipeline, from data collection and model training to deployment and visualization.

Key learnings include classifying six physical activities in real time using a trained Random Forest model and building a complete AIoT pipeline, using NumPy and Pandas for data manipulation, scikit-learn for model training, and Matplotlib and Seaborn for visualization. The project showed how AI and IoT can be integrated using simulation tools for real-time insights. Tools used include PlatformIO, Wokwi, and KiCad. It was developed under the guidance of instructors at IIT Kharagpur.
Motion Sensing AI Repo | Motion Sensing AI Notebook | Motion Sensor Dataset

Reliance Stock Prediction: Forecasting Using LSTM

This project focuses on predicting the stock price of Reliance Industries using Long Short-Term Memory (LSTM) networks, a type of recurrent neural network (RNN) suited for sequence prediction tasks. It involves time-series analysis to model historical stock prices and forecast future trends. The project covers data preprocessing, feature scaling, and implementing LSTM networks for sequence learning and prediction.

Key aspects include constructing a robust LSTM model architecture, optimizing hyperparameters, and evaluating the model using metrics such as Root Mean Squared Error (RMSE). The project also visualizes the predicted versus actual stock prices to assess the model's performance.

Key learnings include understanding time-series forecasting, the application of LSTM for sequential data, and leveraging tools like TensorFlow, Keras, Pandas, and Matplotlib. This project highlights the importance of deep learning in financial forecasting and decision-making, offering a framework for predicting stock price movements in dynamic markets.

LSTM-long-short-term-memory-of-reliance-stock | Stock Prediction Notebook | Reliance Stock Dataset | Reliance Stock Dataset Cleaned
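
The sequence preparation and RMSE evaluation described above can be sketched as follows. This is a simplified, standard-library illustration rather than the notebook's TensorFlow/Keras code; the function names, the window length, and the price values are hypothetical.

```python
import math


def make_windows(series, window):
    """Split a series into (input window, next value) pairs for sequence learning."""
    inputs, targets = [], []
    for i in range(len(series) - window):
        inputs.append(series[i:i + window])   # the previous `window` observations
        targets.append(series[i + window])    # the value the model should predict
    return inputs, targets


def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical closing prices; a real run would use the scaled stock series.
prices = [101.0, 103.0, 102.0, 105.0, 107.0, 106.0]
X, y = make_windows(prices, window=3)
print(X[0], "->", y[0])

# A naive baseline that predicts the last value of each window.
baseline = [w[-1] for w in X]
print(round(rmse(y, baseline), 4))
```

Comparing the LSTM's RMSE with a naive baseline like this one is a quick way to check whether the network is learning more than "tomorrow looks like today".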

CIFAR-10 Dataset: Image Classification Benchmark for Object Recognition using CNN

The CIFAR-10 dataset is a widely used benchmark for image classification tasks in machine learning and deep learning. It consists of 60,000 color images (32x32 pixels) across 10 distinct classes, such as airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. With 50,000 training images and 10,000 test images, this dataset is ideal for developing and evaluating computer vision models.

This dataset is frequently used for tasks such as image classification and object recognition, benchmarking convolutional neural networks (CNNs), and experimenting with data augmentation and preprocessing.

Classes: Airplane, Automobile, Bird, Cat, Deer, Dog, Frog, Horse, Ship, Truck.

Project Website | Capstone-Project-CIFAR-10-using-CNN Repo | Capstone Project Notebook | Project Report

HackerRank Profile | Kaggle Profile | GitHub