GitHub

Sentiment Analysis

The task is supervised polarity classification over noisy, real Amazon product reviews (not a small benchmark corpus), so the pipeline has to survive scraping, deduplication, and honest train/test separation. I pulled multi-SKU review text with Selenium, applied TF-IDF features, and compared Multinomial Naive Bayes to a tuned Linear SVC; on the notebook’s held-out split (377 test rows), Linear SVC reached about 86% accuracy versus about 78% for NB, with NB remaining the lighter baseline on the same sparse representation.
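A minimal sketch of that comparison, with a handful of toy reviews standing in for the scraped Amazon corpus (the data and labels here are illustrative, not the notebook's):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-ins for the scraped, deduplicated review text.
reviews = [
    "great product works perfectly", "love it highly recommend",
    "excellent value very happy", "terrible quality broke fast",
    "awful waste of money", "bad experience do not buy",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = positive, 0 = negative

# Both models consume the same sparse TF-IDF representation.
nb = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(reviews, labels)
svc = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(reviews, labels)

print(svc.predict(["really great value"]))
```

In the real pipeline the fit/score calls run on the held-out split rather than the training text, and the Linear SVC's `C` is tuned before comparison.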

Scikit-learn SVM NLP
GitHub

Purchase Intention Predictor

E-commerce teams need to know which sessions will convert before they bounce, so spend and onsite nudges aren't wasted. On the UCI Online Shoppers dataset I benchmarked k-NN, Random Forest, MLP, and AdaBoost (among other sklearn baselines in the notebook), then addressed the severe class imbalance with SMOTE-family resamplers. Random Forest held about 89% accuracy on the ~2.5k-row holdout before resampling; with SMOTETomek it reached about 90%, and the side-by-side runs show which resampling strategies actually improved minority-class recall without sacrificing majority-class precision.

Python Feature Engineering ML
GitHub

Time Series Forecasting

Monthly series such as atmospheric CO₂ mix a slow trend with a clear annual cycle; the hard part is proving which differencing and seasonal orders earn their parameters before you trust forward projections. I followed the repo's seasonal track on Mauna Loa-style monthly CO₂: stationarity via seasonal and non-seasonal differencing, candidate SARIMA fits in R's forecast package, and order choice grounded in ACF/PACF plus AIC/BIC. The retained seasonal ARIMA in the notebook reports an AIC near 369 on the fitted series. A parallel non-seasonal dataset in the same project repeats the diagnostics workflow; multi-step forecasts and residual checks show how much variance the chosen structure explains versus what is left in the residuals.

ARIMA SARIMA R · forecast