🚢 Titanic ML Pipeline

Welcome to the documentation for MLOps Labs — a modular, production-ready machine learning pipeline for the Titanic survival prediction task.

Built as part of the ITI MLOps Course.


Features

  • Modular pipeline — Download → Preprocess → Validate → Train → Test → Inference
  • Optuna hyperparameter optimization across six model families
  • Weights & Biases integration — optional experiment tracking
  • Data validation tests — pipeline stops if data quality checks fail
  • Colored logging — centralized logger with color-coded log levels
  • Hydra config management — all settings in config.yaml with CLI overrides
  • CLI inference — predict on new data with one command
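
The stage ordering and the "stop on failed validation" behaviour listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: the stage functions, the `ValidationError` class, the `run_pipeline` helper, and the hard-coded rows are all hypothetical stand-ins, and only the first four stages are shown.

```python
from typing import Callable, Dict, List, Optional


class ValidationError(Exception):
    """Hypothetical error raised when a data quality check fails."""


def download(state: Dict) -> Dict:
    # Stand-in for the real download step: two hard-coded passenger rows.
    state["rows"] = [
        {"Survived": 1, "Age": 29.0, "Sex": "female"},
        {"Survived": 0, "Age": 40.0, "Sex": "male"},
    ]
    return state


def preprocess(state: Dict) -> Dict:
    # Encode the Sex column as 0/1 (illustrative preprocessing only).
    for row in state["rows"]:
        row["Sex"] = 1 if row["Sex"] == "female" else 0
    return state


def validate(state: Dict) -> Dict:
    # Two example data quality checks; the real checks are not shown here.
    for i, row in enumerate(state["rows"]):
        if row["Survived"] not in (0, 1):
            raise ValidationError(f"row {i}: Survived must be 0 or 1")
        if row["Age"] is None or not 0 <= row["Age"] <= 120:
            raise ValidationError(f"row {i}: Age out of range")
    return state


def train(state: Dict) -> Dict:
    # Placeholder for model training.
    state["model"] = "trained"
    return state


STAGES: List[Callable[[Dict], Dict]] = [download, preprocess, validate, train]


def run_pipeline(stages: List[Callable[[Dict], Dict]],
                 state: Optional[Dict] = None) -> Dict:
    """Run stages in order; stop at the first failed validation."""
    state = {} if state is None else state
    state["completed"] = []
    for stage in stages:
        try:
            state = stage(state)
        except ValidationError as exc:
            # Record the failure and skip every later stage.
            state["error"] = str(exc)
            break
        state["completed"].append(stage.__name__)
    return state
```

Calling `run_pipeline(STAGES)` runs every stage in sequence. If a stage raises `ValidationError`, the returned state carries an `error` message and no `model` key, which mirrors the idea that training never starts on data that failed its checks.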

Section                Description
Getting Started        Installation, setup, and first run
Pipeline Overview      Step-by-step pipeline flow
Configuration          Hydra config, CLI overrides, environment variables
API Reference          Module and function documentation
Data Validation Tests  Test descriptions and usage

Tech Stack

Tool              Purpose
scikit-learn      Preprocessing pipelines & ensemble models
XGBoost           Gradient boosting classifier
CatBoost          Categorical-aware boosting
Optuna            Hyperparameter optimization
Hydra             Configuration management with CLI overrides
Weights & Biases  Experiment tracking (optional)
pytest            Data validation testing
MkDocs            Project documentation
kagglehub         Kaggle data download