Sahil Singh

Email: sahil.singh@yale.edu

View the Project on GitHub sahil-681/portfolio

Data Scientist

LinkedIn, GitHub, Email

Technical Skills

Programming: Python, R, SQL, Git/GitHub

Machine Learning: Regression Analysis, Boosted Trees, PCA, Clustering

Libraries/Tools: Pandas, NumPy, scikit-learn, Spark, S3, Hive, tidyverse, Matplotlib

Data Visualization: Tableau, Matplotlib, Seaborn, ggplot2

MS Office: Excel, PowerPoint, Word

Education

M.S., Statistics and Data Science | Yale University (December 2023)

B.S., Mathematics | Hans Raj College, University of Delhi (May 2022)

Work Experience

Data Scientist @ Yale Sports Analytics Lab (May 2023 - Present)

Data Scientist @ TransOrg Analytics (August 2021 - September 2021)

Business Analyst @ General Electric (July 2021 - September 2021)

Research Analyst @ Intellify (July 2021 - September 2021)

Projects

Team USA Gymnast Selection Optimization for Paris 2024 Olympics

Poster, App, GitHub Repository

Designed an interactive R Shiny app for the USOPC to select the Team USA gymnasts most likely to maximize medals at the 2024 Paris Olympics. The underlying model simulates 10,000+ team combinations and compares their expected medal counts.

Gymnastics

Machine Learning for Breast Cancer Detection: Unveiling Diagnostic Potentials

GitHub Repository

Analyzed tabular data of cancer cell features and trained supervised ML models, including XGBoost, Logistic Regression, Naive Bayes, K-Nearest Neighbors, and Random Forests, to classify tumors as malignant or benign with 98% accuracy.

Breast Cancer

To Swing or Not to Swing: Baseball Swing Probability Modeling

GitHub Repository

Developed predictive models to estimate the probability that a batter swings at a given pitch during a baseball game, with the best-performing model reaching 86% accuracy. Built on a dataset of around 2,000,000 pitches, the models provide swing probability estimates that can aid strategic decision-making in gameplay and player analysis.

Baseball Swing, Baseball Plot

GitHub Repository

Derived geospatial insights into parking difficulty in New York City using spatial autocorrelation and clustering. Used R and the Google Maps API to create visualizations, and applied Universal Kriging to predict and map the time needed to find parking.

NYC Parking