

Repository containing portfolio of data analysis and machine learning projects

<p align=center> Data Science & Analytics Project Portfolio </p>

This repository contains a portfolio of ongoing and completed data science projects that I carried out for the Practicum Data Science bootcamp and for academic, self-learning, and hobby purposes.

Highlights of the projects follow:

<p align=center> Relevant Experience: </p>

Tweet Geolocation Prediction - | Junior NLP Engineer

Contributed research to the company's infrastructure, with the goal of training a BERT-based deep learning model to predict user geolocation from individual tweets. Yachay is an open-source machine learning community that has collected decades' worth of useful natural language data from various sources.



Tools: Python, seaborn, folium, scikit-learn, langid, geopandas, tensorflow, BERT

Song Valence Prediction - Cuetessa, Inc. | Junior Data Scientist

Developed a Python-based regression model to predict the valence of pop songs. An automatic method of estimating valence is useful for playlist curation and other applications.


Tools: pandas, numpy, matplotlib, seaborn, spotipy, transformers, sklearn, librosa

### Grant Automated Web Scraper - DataReady DFW | Data/Web Analytics Intern

Researching grant prospects can be time-consuming and overwhelming. Developed an automation system for a nonprofit organization (DataReady DFW) that finds available grant opportunities and fills out applications with little or no human intervention.


Tools: TensorFlow, Beautiful Soup, Selenium, pandas, NLTK, Google Analytics

<p align=center> Technical Projects: </p>

| Hands-on Experience | Project | Technical skills |
| --- | --- | --- |
| Unsupervised Learning | Client Churn Forecast | Machine learning algorithms, XGBoost, CatBoost, LightGBM |
| Computer Vision (CV) | Computer Vision Age Detection w/ deep learning | TensorFlow |
| Natural Language Processing (NLP) | IMDB Movie Sentiment Analysis using NLP | SGDClassifier, Naïve Bayes, LightGBM, spaCy, TF-IDF, BERT |
| Time Series Analysis | Time Series Forecast | Time Series Analysis, CatBoost, LightGBM, XGBoost |
| Machine Learning in Business | Gold Recovery Regression Model | Python, Scikit-learn, LinearRegression |
| Numerical Methods with ML | Vehicle Market Value Prediction w/ gradient boosting | Numerical Methods, CatBoost, LightGBM, XGBoost |
| Linear Algebra with Machine Learning | Insurance Benefits Predictive Model | Scikit-learn, Linear Algebra, k-Nearest Neighbors |
| Machine Learning - Classification | Telecom Plan Classification Model | Python, Scikit-learn, Pandas |
| Supervised Learning - Prediction | Bank Customer Churn Prediction | Scikit-learn, XGBoost, GridSearchCV, AdaBoost |
| Machine Learning in Business | Oil Well Regression Model | Python, Scikit-learn, Bootstrapping, LinearRegression |
| Webscraping and Data Storage | Data Collection, Webscraping and Storage | PostgreSQL, Python, BeautifulSoup, Seaborn, Matplotlib, ETL (extract, transform and load) |
| Data Visualization and Storytelling with Data | Video Game Market Analysis | Python, Pandas, Squarify, Seaborn, Matplotlib |
| Data Preprocessing | Credit Score Analysis | Python, NLTK, WordNetLemmatizer, SnowballStemmer, Seaborn, Matplotlib |
| Exploratory Data Analysis (EDA) | Vehicle Market Analysis | Pandas, Matplotlib |
| Statistical Data Analysis (SDA) | Telecom Customer Data Analysis | Python, pandas, NumPy, SciPy, Seaborn, Matplotlib |

Tools: sklearn, Pandas, Seaborn, Matplotlib, TensorFlow, PIL (Python Imaging Library)

Tools: Pandas, Seaborn and Matplotlib, SciPy

Tools: Beautiful Soup, Requests, Pandas, PostgreSQL, scipy.stats, NumPy

Tools: Python, NLTK, WordNetLemmatizer, SnowballStemmer

<p align=center> Undergraduate Research Projects (University at Buffalo) </p>

Tools: SPSS Statistics

Psychological Assessment Scales and Measures: