GitHub

Sentiment Analysis

The task is supervised polarity classification over noisy, real Amazon product reviews (not a small benchmark corpus), so the pipeline has to survive scraping, deduplication, and honest train/test separation. I pulled multi-SKU review text with Selenium, applied TF-IDF features, and compared Multinomial Naive Bayes to a tuned Linear SVC; on the notebook’s held-out split (377 test rows), Linear SVC reached about 86% accuracy versus about 78% for NB, with NB remaining the lighter baseline on the same sparse representation.
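A minimal sketch of that comparison, with a handful of toy reviews standing in for the scraped Amazon corpus (the data and labels here are illustrative, not the notebook's):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-ins for the scraped, deduplicated review text.
reviews = [
    "great product works perfectly", "love it highly recommend",
    "excellent value very happy", "terrible quality broke fast",
    "awful waste of money", "bad experience do not buy",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = positive, 0 = negative

# Both models consume the same sparse TF-IDF representation.
nb = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(reviews, labels)
svc = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(reviews, labels)

print(svc.predict(["really great value"]))
```

In the real pipeline the fit/score calls run on the held-out split rather than the training text, and the Linear SVC's `C` is tuned before comparison.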

Scikit-learn SVM NLP
GitHub

Purchase Intention Predictor

E-commerce teams need to know which sessions will convert before they bounce, so spend and onsite nudges aren't wasted. On the UCI Online Shoppers dataset I benchmarked k-NN, Random Forest, MLP, and AdaBoost (among other sklearn baselines in the notebook), then addressed the severe class imbalance with SMOTE-family resamplers. Random Forest held about 89% accuracy on the ~2.5k-row holdout before resampling; with SMOTETomek it reached about 90%, and the side-by-side runs show which resampling strategies actually improved minority-class recall without sacrificing majority-class precision.

Python Feature Engineering ML
GitHub

Time Series Forecasting

Monthly series such as atmospheric CO₂ mix a slow trend with a clear annual cycle; the hard part is proving which differencing and seasonal orders earn their parameters before you trust forward projections. I followed the repo's seasonal track on Mauna Loa-style monthly CO₂: stationarity via seasonal and non-seasonal differencing, candidate SARIMA fits in R's forecast package, and order choice grounded in ACF/PACF plus AIC/BIC. The retained seasonal ARIMA in the notebook reports an AIC near 369 on the fitted series. A parallel non-seasonal dataset in the same project repeats the diagnostics workflow; multi-step forecasts and residual checks show how much variance the chosen structure explains versus what is left in the residuals.

ARIMA SARIMA R · forecast