# 🚢 Titanic ML Pipeline
Welcome to the documentation for MLOps Labs — a modular, production-ready machine learning pipeline for the Titanic survival prediction task.
Built as part of the ITI MLOps Course.
## Features

- Modular pipeline: Download → Preprocess → Validate → Train → Test → Inference
- Optuna hyperparameter optimization across 6 model families
- Weights & Biases integration: optional experiment tracking
- Data validation tests: the pipeline stops if data quality checks fail
- Colored logging: centralized logger with color-coded log levels
- Hydra config management: all settings live in `config.yaml`, with CLI overrides
- CLI inference: predict on new data with one command
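The step order above can be pictured as a chain of functions that pass shared state along. When validation fails, it raises, and nothing downstream runs. The sketch below is a minimal, stdlib-only illustration of that idea. `run_pipeline`, `DataValidationError`, the toy stage functions, and the logger name are all hypothetical and are not this project's actual modules. Only the first four steps are shown; Test and Inference would be appended the same way.

```python
from __future__ import annotations

import logging
from typing import Callable

logger = logging.getLogger("titanic_pipeline")  # hypothetical logger name

Stage = Callable[[dict], dict]


class DataValidationError(RuntimeError):
    """Raised by the validation stage; stops every stage after it."""


def run_pipeline(stages: list[tuple[str, Stage]], state: dict | None = None) -> dict:
    """Run stages in order, threading a shared state dict through them.

    An exception from any stage propagates, so later stages never run.
    """
    state = {} if state is None else state
    for name, stage in stages:
        logger.info("Running stage: %s", name)
        state = stage(state)
        state.setdefault("completed", []).append(name)
    return state


# Toy stand-ins for the real steps (all hypothetical).
def download(state: dict) -> dict:
    state["rows"] = [{"Age": 22.0}, {"Age": None}]
    return state


def preprocess(state: dict) -> dict:
    # Impute missing ages with the mean of the known ones.
    known = [r["Age"] for r in state["rows"] if r["Age"] is not None]
    mean_age = sum(known) / len(known)
    for r in state["rows"]:
        if r["Age"] is None:
            r["Age"] = mean_age
    return state


def validate(state: dict) -> dict:
    # A failed quality check raises instead of returning, halting the run.
    if any(r["Age"] is None for r in state["rows"]):
        raise DataValidationError("Age still contains missing values")
    return state


def train(state: dict) -> dict:
    state["model"] = "fitted-model"  # placeholder for a real estimator
    return state


STAGES = [
    ("download", download),
    ("preprocess", preprocess),
    ("validate", validate),
    ("train", train),
]
```

Dropping `preprocess` from the list leaves a missing `Age`, so `validate` raises `DataValidationError` and `train` never executes. This mirrors the stop-on-failure behaviour listed under Features.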
## Quick Links
| Section | Description |
|---|---|
| Getting Started | Installation, setup, and first run |
| Pipeline Overview | Step-by-step pipeline flow |
| Configuration | Hydra config, CLI overrides, environment variables |
| API Reference | Module and function documentation |
| Data Validation Tests | Test descriptions and usage |
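The data validation tests run under pytest. The block below is a hedged sketch of what such checks can look like. It validates a few well-known Titanic columns (`PassengerId`, `Survived`, `Pclass`, `Fare`) using only the standard library. The inline sample rows, the `load_rows` helper, and the specific checks are hypothetical stand-ins, not the project's real test suite. pytest collects any function named `test_*` and reports each failing `assert`.

```python
from __future__ import annotations

import csv
import io

# Hypothetical expectations; the project's real checks may differ.
REQUIRED_COLUMNS = {"PassengerId", "Survived", "Pclass", "Sex", "Age", "Fare"}

# Tiny inline sample in the Kaggle train.csv layout (values illustrative only).
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age,Fare
1,0,3,male,22,7.25
2,1,1,female,38,71.2833
3,1,3,female,26,7.925
"""


def load_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of row dicts (all values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


ROWS = load_rows(SAMPLE_CSV)


def test_required_columns_present():
    assert REQUIRED_COLUMNS <= set(ROWS[0].keys())


def test_target_is_binary():
    assert {r["Survived"] for r in ROWS} <= {"0", "1"}


def test_pclass_is_1_2_or_3():
    assert {r["Pclass"] for r in ROWS} <= {"1", "2", "3"}


def test_fare_is_non_negative():
    assert all(float(r["Fare"]) >= 0 for r in ROWS)


def test_passenger_ids_are_unique():
    ids = [r["PassengerId"] for r in ROWS]
    assert len(ids) == len(set(ids))
```

Running `pytest` on a file like this exits with a non-zero status when any check fails. A pipeline could call `pytest.main([...])` and stop when the returned exit code is non-zero. Whether this project wires it up that way is an assumption.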
## Tech Stack
| Tool | Purpose |
|---|---|
| scikit-learn | Preprocessing pipelines & ensemble models |
| XGBoost | Gradient boosting classifier |
| CatBoost | Categorical-aware boosting |
| Optuna | Hyperparameter optimization |
| Hydra | Configuration management with CLI overrides |
| Weights & Biases | Experiment tracking (optional) |
| pytest | Data validation testing |
| MkDocs | Project documentation |
| kagglehub | Kaggle data download |
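Hydra lets you override any value in the config from the command line with dotted `key=value` arguments. The sketch below imitates only that basic form on a plain nested dict, so the mechanics are visible. It is not Hydra's implementation. It ignores Hydra's fuller override grammar, such as `+key=value` for adding keys. The `DEFAULTS` keys are hypothetical rather than taken from this project's `config.yaml`.

```python
from __future__ import annotations

import copy
from typing import Any

# Hypothetical defaults; the real keys live in the project's config.yaml.
DEFAULTS: dict[str, Any] = {
    "data": {"test_size": 0.2},
    "optuna": {"n_trials": 50},
    "wandb": {"enabled": False},
}


def parse_value(raw: str) -> Any:
    """Convert an override string into bool, int, or float, else keep it as str."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def apply_overrides(config: dict[str, Any], overrides: list[str]) -> dict[str, Any]:
    """Return a copy of config with dotted key=value overrides applied.

    Overriding a key that does not already exist is treated as an error.
    """
    result = copy.deepcopy(config)  # leave the defaults untouched
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"Override must look like key=value: {item!r}")
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node[part]  # walk down the nested sections
        if leaf not in node:
            raise KeyError(f"Unknown config key: {key}")
        node[leaf] = parse_value(raw)
    return result
```

With real Hydra, the same strings are passed after the script name on the command line. Hydra likewise rejects an override for a key missing from the config unless you prefix it with `+`.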