Project Overview
A comprehensive database-driven machine learning project that predicts whether a quarterback (QB) will throw at least one touchdown (TD) in a given NFL game using player statistics, game context, and historical performance data.
Project Objective: Predict whether an NFL quarterback will throw a touchdown pass in a game using past performance and game details. This binary classification model uses historical QB game logs, player career stats, and basic bio data stored in a SQLite database, demonstrating feature engineering, model evaluation, and explainability techniques.
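As a concrete illustration of the prediction target, the binary label can be derived straight from the per-game passing stats in the database. A minimal sketch follows; the table name qb_stats, the column td_passes, and the database path are assumptions for illustration, not the project's exact schema:

import sqlite3
import pandas as pd

# Load per-game QB passing stats from the SQLite database (hypothetical path/table).
conn = sqlite3.connect("data/nfl_qb.db")
games = pd.read_sql("SELECT * FROM qb_stats", conn)

# Target: 1 if the QB threw at least one touchdown pass in the game, else 0.
games["threw_td"] = (games["td_passes"] >= 1).astype(int)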
Key Features
🗄️ Database-Driven
All data is stored in SQLite under a relational schema for easy management and validation
⚡ Real-time Predictions
Make predictions from current player data in the database, with confidence scores
✅ Data Validation
Comprehensive data quality checks and validation framework
🚀 Automated Workflow
One-command setup and deployment with modular architecture
📊 Historical Tracking
View prediction history and accuracy tracking in the database
🌐 Modern Web App
Beautiful Streamlit interface with multiple pages and real-time updates
System Architecture
Built with a modular, database-first approach using SQLite for data persistence and Streamlit for the user interface.
Project Structure
📁 data/
├── 📁 raw/ # Original CSV files
└── 📁 processed/ # Cleaned & engineered datasets
📁 src/
├── 🗄️ database.py # Database management
├── 📥 data_loader.py # Load CSV data into database
├── ✅ data_validator.py # Data quality validation
├── 🔄 preprocess.py # Database-driven preprocessing
├── 🎯 train_model.py # Model training
└── 📊 explain_shap.py # Model explainability
📁 app/
└── 🌐 app.py # Streamlit web application
Database Schema
The project uses a relational SQLite database with comprehensive data management:
Core Tables
- basic_stats: Player demographics and physical info
- game_logs: Game-by-game performance records
- qb_stats: QB-specific game statistics
- career_stats: Season-level career statistics
- qb_career_passing: QB career passing stats
- predictions: Model prediction history
Key Relationships
- Players linked by player_id
- Game logs linked to QB stats by game_log_id
- Career stats linked to QB passing stats by career_id
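A minimal sqlite3 sketch of how two of these tables and their player_id / game_log_id links could be declared; the column lists are illustrative rather than the project's exact schema:

import sqlite3

conn = sqlite3.connect("data/nfl_qb.db")  # hypothetical database path
conn.executescript("""
CREATE TABLE IF NOT EXISTS basic_stats (
    player_id  TEXT PRIMARY KEY,
    name       TEXT,
    position   TEXT,
    height_in  INTEGER,
    weight_lb  INTEGER
);
CREATE TABLE IF NOT EXISTS game_logs (
    game_log_id INTEGER PRIMARY KEY AUTOINCREMENT,
    player_id   TEXT REFERENCES basic_stats(player_id),
    season      INTEGER,
    week        INTEGER,
    opponent    TEXT
);
CREATE TABLE IF NOT EXISTS qb_stats (
    game_log_id   INTEGER REFERENCES game_logs(game_log_id),
    pass_attempts INTEGER,
    pass_yards    INTEGER,
    td_passes     INTEGER,
    interceptions INTEGER
);
""")
conn.commit()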
Model Performance
Data Sources
- Total Records: 100,000+ game logs
- Quarterbacks: 500+ players
- Years Covered: 2000-2024
- Files Used: Game_Logs_Quarterback.csv, Career_Stats_Passing.csv, Basic_Stats.csv
Web Application Features
The Streamlit app is organized into several pages; a rough sketch of the prediction flow follows the page descriptions below.
🎯 Make Prediction
- Select quarterback from database
- View recent performance stats
- Make real-time predictions
- Save predictions to database
🗄️ Player Database
- Browse all available data
- View database statistics
- Explore sample records
📊 Prediction History
- Track all predictions made
- View prediction accuracy
- Analyze confidence scores
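A rough sketch of what the Make Prediction flow could look like end to end: pull a quarterback's recent averages from the database, score them with the trained model, and record the result in the predictions table. The query, column names, model file, and player_id value are illustrative assumptions:

import sqlite3
import pandas as pd
import xgboost as xgb

conn = sqlite3.connect("data/nfl_qb.db")          # hypothetical path
model = xgb.XGBClassifier()
model.load_model("models/qb_td_model.json")       # hypothetical saved model

# Recent per-game averages for the selected QB (illustrative feature set).
features = pd.read_sql(
    """SELECT AVG(pass_attempts) AS avg_attempts,
              AVG(pass_yards)    AS avg_yards,
              AVG(td_passes)     AS avg_tds
       FROM qb_stats JOIN game_logs USING (game_log_id)
       WHERE player_id = ?""",
    conn, params=("qb_001",),
)

# Confidence score = predicted probability of throwing at least one TD.
proba = float(model.predict_proba(features)[0, 1])

# Save the prediction so the history page can track it later.
conn.execute(
    "INSERT INTO predictions (player_id, td_probability) VALUES (?, ?)",
    ("qb_001", proba),
)
conn.commit()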
Technical Implementation
Python · SQLite · XGBoost · Streamlit · Pandas · NumPy · Scikit-learn · SHAP
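A minimal, illustrative sketch of how this stack ties together for training, evaluation, and explainability; the processed dataset path, feature columns, and hyperparameters are assumptions:

import pandas as pd
import shap
import xgboost as xgb
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Engineered per-game features with the binary target (hypothetical file).
df = pd.read_csv("data/processed/qb_features.csv")
X, y = df.drop(columns=["threw_td"]), df["threw_td"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = xgb.XGBClassifier(n_estimators=300, max_depth=4, eval_metric="logloss")
model.fit(X_train, y_train)
print("ROC AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# SHAP shows which features push a game toward "throws a TD".
explainer = shap.TreeExplainer(model)
shap.summary_plot(explainer.shap_values(X_test), X_test)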
Data Validation Framework
- Data completeness: Check for missing values
- Data consistency: Verify relationships between tables
- Value ranges: Ensure statistics are reasonable
- Duplicate detection: Find and handle duplicates
- Date validation: Check for valid game dates
- QB-specific checks: Validate quarterback data quality
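A small sketch of what two of these checks might look like against the SQLite database; table and column names are illustrative:

import sqlite3

conn = sqlite3.connect("data/nfl_qb.db")  # hypothetical path

# Completeness: game-log rows missing a player reference.
missing = conn.execute(
    "SELECT COUNT(*) FROM game_logs WHERE player_id IS NULL"
).fetchone()[0]

# Consistency: qb_stats rows whose game_log_id has no matching game log.
orphans = conn.execute(
    """SELECT COUNT(*) FROM qb_stats q
       LEFT JOIN game_logs g ON q.game_log_id = g.game_log_id
       WHERE g.game_log_id IS NULL"""
).fetchone()[0]

print(f"game_logs missing player_id: {missing}, orphaned qb_stats rows: {orphans}")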
Advanced Features
Automated Workflow
One-command setup with modular execution:
python main.py            # Complete workflow
python main.py --setup    # Database setup
python main.py --validate # Data validation
python main.py --app      # Launch web app
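A rough sketch of how such an entry point could be wired with argparse; the placeholder functions stand in for the scripts under src/ and are not the project's actual function names:

import argparse
import subprocess

def setup_database():
    print("creating the SQLite schema and loading CSVs...")   # stands in for database.py / data_loader.py

def run_validation():
    print("running data quality checks...")                   # stands in for data_validator.py

def run_full_workflow():
    setup_database()
    run_validation()
    print("preprocessing, training, and explaining the model...")

def main():
    parser = argparse.ArgumentParser(description="QB touchdown prediction workflow")
    parser.add_argument("--setup", action="store_true", help="database setup only")
    parser.add_argument("--validate", action="store_true", help="data validation only")
    parser.add_argument("--app", action="store_true", help="launch the Streamlit web app")
    args = parser.parse_args()

    if args.app:
        subprocess.run(["streamlit", "run", "app/app.py"])
    elif args.setup:
        setup_database()
    elif args.validate:
        run_validation()
    else:
        run_full_workflow()

if __name__ == "__main__":
    main()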
Database Management
- Connection pooling and transaction management
- Automated data loading from CSV files
- Prediction history tracking
- Data quality monitoring
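A small sketch of the loading and transaction-handling ideas using sqlite3's built-in context manager; the file paths and the predictions insert are illustrative:

import sqlite3
import pandas as pd
from contextlib import closing

with closing(sqlite3.connect("data/nfl_qb.db")) as conn:   # hypothetical path
    # Automated loading: write a raw CSV straight into its table.
    pd.read_csv("data/raw/Basic_Stats.csv").to_sql(
        "basic_stats", conn, if_exists="replace", index=False
    )
    # Transaction management: the connection context manager commits on
    # success and rolls back if the block raises.
    with conn:
        conn.execute(
            "INSERT INTO predictions (player_id, td_probability) VALUES (?, ?)",
            ("qb_001", 0.73),
        )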
What I'd Improve
- Real-time Data Integration: Connect to live NFL API for current game data
- Model Ensemble: Combine XGBoost with neural networks for better performance
- Advanced Analytics: Add trend analysis and player comparison features
- Performance Optimization: Implement caching and query optimization
- Cloud Deployment: Deploy to AWS/GCP with containerization
- API Development: Create REST API for external integrations
Business Impact
This project demonstrates my ability to:
- Design and implement comprehensive database architectures
- Build production-ready machine learning pipelines
- Create user-friendly interfaces for complex data systems
- Implement robust data validation and quality assurance
- Develop modular, maintainable code with automated workflows
- Handle large-scale data processing and real-time predictions