Hotel Reservation Cancellation Prediction

Hotel cancellation prediction with ten-model benchmarking, MLflow tracking, and Jenkins deployment to Google Cloud Run

I built an end-to-end machine learning pipeline that predicts hotel reservation cancellations from booking and guest features. I benchmarked ten classification algorithms, selected the most informative features with Random Forest importance ranking, balanced the training data with SMOTE, tracked training runs in MLflow, and deployed the serving workflow through Jenkins, Docker, and Google Cloud Run.

Context: Academic Project

Role: Machine Learning Engineer

Team: Solo

Date: Mar–Jun 2025


10-algorithm model comparison

Random Forest: 89.1% accuracy after tuning, the benchmark winner

LightGBM: 86.5% accuracy, the production model

Python, LightGBM, Random Forest, XGBoost, scikit-learn, SMOTE


Overview

The pipeline ingests approximately 36,275 hotel reservation records from a GCS bucket and splits them 80/20 into train and test sets. Preprocessing label-encodes six categorical columns, log-transforms numerical columns whose skewness exceeds a threshold of 5, balances the classes with SMOTE, and uses Random Forest feature importance to reduce 18 input features to the top 10. Before choosing a production model, I benchmarked ten classification algorithms on the processed data. The production pipeline trains and tunes a LightGBM classifier with RandomizedSearchCV, logs every run to MLflow, and serves predictions through a Flask UI. A Jenkins pipeline builds the Docker image, pushes it to GCR, and deploys it to Cloud Run.
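A minimal sketch of the 80/20 split, not the project's actual ingestion code. The synthetic DataFrame, the column names (including the `booking_status` target), the stratification, and the fixed seed are all assumptions for illustration; the real pipeline reads its data from a GCS bucket.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the reservation data; the real pipeline downloads
# it from a GCS bucket, and these column names are illustrative only.
rng = np.random.default_rng(42)
reservations = pd.DataFrame({
    "lead_time": rng.integers(0, 400, size=1000),
    "avg_price_per_room": rng.uniform(50, 300, size=1000),
    "booking_status": rng.choice(
        ["Canceled", "Not_Canceled"], size=1000, p=[0.33, 0.67]
    ),
})

# 80/20 split. Stratifying on the target and fixing the seed are assumptions,
# added so both splits keep a similar cancellation rate and runs are repeatable.
train_df, test_df = train_test_split(
    reservations,
    test_size=0.2,
    random_state=42,
    stratify=reservations["booking_status"],
)

print(len(train_df), len(test_df))
```

Splitting before any resampling or feature selection keeps the test set untouched by later preprocessing decisions.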

Problem

Hotels lose revenue to late cancellations. A model that flags a likely cancellation ahead of time can support overbooking strategy, targeted retention offers, and the detection of habitual or fraudulent cancellers.

Intended User

Hotel revenue-management teams, who can use the predictions to inform overbooking strategy, target retention offers, and detect habitual or fraudulent cancellers.

Architecture

Raw reservation data lands in a GCS bucket and flows through an ingestion step that splits it 80/20 into train and test sets. A preprocessing stage label-encodes six categorical columns, log-transforms numerical columns whose skewness exceeds a threshold of 5, balances the cancellation and non-cancellation classes with SMOTE, and selects the top 10 of 18 features by Random Forest feature importance. The training step tunes a LightGBM classifier with RandomizedSearchCV and logs its parameters, metrics, datasets, and model artifact to MLflow. A Jenkins pipeline then builds a Docker image, pushes it to Google Container Registry, and deploys it to Cloud Run, where a Flask app serves predictions from the trained model.
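The encoding and skewness-correction part of the preprocessing stage could look roughly like the sketch below. The function name `preprocess`, the toy column names, and the use of `np.log1p` as the log transform are assumptions; only the skewness threshold of 5 comes from the project.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

SKEW_THRESHOLD = 5  # threshold stated in the project description


def preprocess(df, categorical_cols, numeric_cols, skew_threshold=SKEW_THRESHOLD):
    """Label-encode categoricals and log-transform highly skewed numerics."""
    out = df.copy()

    # One encoder per column so each mapping can be inspected or reused later.
    encoders = {}
    for col in categorical_cols:
        encoder = LabelEncoder()
        out[col] = encoder.fit_transform(out[col])
        encoders[col] = encoder

    # Only columns whose skewness exceeds the threshold are transformed.
    # log1p handles zeros safely; it assumes the columns are non-negative.
    skewness = out[numeric_cols].skew()
    skewed_cols = skewness[skewness > skew_threshold].index.tolist()
    for col in skewed_cols:
        out[col] = np.log1p(out[col])

    return out, encoders, skewed_cols


# Hypothetical toy data: one categorical column, one roughly uniform numeric
# column, and one heavy-tailed numeric column (mostly zeros, a few large values).
toy = pd.DataFrame({
    "meal_plan": np.tile(["A", "B", "C"], 167)[:500],
    "lead_time": (np.arange(500) % 300).astype(float),
    "prev_cancellations": np.r_[np.full(10, 50.0), np.zeros(490)],
})

processed, encoders, skewed = preprocess(
    toy, ["meal_plan"], ["lead_time", "prev_cancellations"]
)
print(skewed)
```

In a train/test setting, the encoders and the list of skewed columns would normally be derived from the training split and then reused on the test split, so both are returned here.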

My Contribution

I built the full pipeline solo, including data ingestion, the preprocessing and feature selection steps, the ten-algorithm model comparison, LightGBM training and tuning, MLflow tracking, the Flask UI, the Dockerfile, and the Jenkins pipeline definition.

Implementation

Benchmarked ten classification algorithms (Random Forest, Logistic Regression, Gradient Boosting, SVC, Decision Tree, KNN, Naive Bayes, XGBoost, AdaBoost, and LightGBM) on the same processed data before choosing a production model.
Reduced 18 input features to the top 10 by training a Random Forest purely for its feature importance ranking, rather than choosing features by hand.
Applied SMOTE to balance the cancellation and non-cancellation classes, and log-transformed numerical columns whose skewness exceeded a threshold of 5.
Tuned LightGBM hyperparameters, including the number of estimators, max depth, learning rate, number of leaves, and boosting type, with RandomizedSearchCV rather than fixed defaults.
Logged every training run to MLflow with its parameters, its accuracy, precision, recall, and F1 score, and the resulting model artifact.
Defined the full CI/CD path in a Jenkinsfile that checks out the repository, builds and pushes a Docker image to GCR, and deploys it to Cloud Run.

Key Decisions

A ten-algorithm comparison before choosing a production model

Why: Comparing Random Forest, Logistic Regression, Gradient Boosting, SVC, Decision Tree, KNN, Naive Bayes, XGBoost, AdaBoost, and LightGBM on the same processed data showed how each candidate actually performed before any of them became the production model.
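
A comparison of this kind can be sketched as a loop that fits each candidate on the same split and records the same four metrics. This sketch covers only the eight scikit-learn candidates; XGBoost and LightGBM are left out to keep the dependencies light, though their scikit-learn-style classifiers would slot into the same dictionary. The synthetic data, the default hyperparameters, and the choice of `GaussianNB` as the Naive Bayes variant are assumptions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the processed reservation data.
X, y = make_classification(
    n_samples=800, n_features=10, n_informative=6, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Every candidate sees the same split and is scored with the same metrics.
candidates = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "SVC": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "KNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "AdaBoost": AdaBoostClassifier(random_state=42),
}

rows = []
for name, model in candidates.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    rows.append({
        "model": name,
        "accuracy": accuracy_score(y_test, predictions),
        "precision": precision_score(y_test, predictions),
        "recall": recall_score(y_test, predictions),
        "f1": f1_score(y_test, predictions),
    })

results = (
    pd.DataFrame(rows).sort_values("f1", ascending=False).reset_index(drop=True)
)
print(results)
```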

Random Forest feature importance for feature selection

Why: Ranking all 18 features by the importance scores of a trained Random Forest and keeping the top 10 reduced dimensionality using a measured signal instead of manual guesswork.
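
As a rough illustration of importance-based selection, the sketch below fits a Random Forest on 18 synthetic features and keeps the 10 with the highest `feature_importances_`. The feature names, the forest size, and the data are assumptions; only the 18-to-10 reduction mirrors the project.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

TOP_K = 10  # number of features kept, as in the project

# Synthetic stand-in with 18 input features and hypothetical names.
X_array, y = make_classification(
    n_samples=1000, n_features=18, n_informative=8, n_redundant=2, random_state=42
)
X = pd.DataFrame(X_array, columns=[f"feature_{i}" for i in range(18)])

# The forest is trained only to obtain an importance ranking.
ranker = RandomForestClassifier(n_estimators=100, random_state=42)
ranker.fit(X, y)

importances = pd.Series(
    ranker.feature_importances_, index=X.columns
).sort_values(ascending=False)

selected = importances.head(TOP_K).index.tolist()
X_selected = X[selected]

print(importances.head(TOP_K))
```

To avoid leaking information from the test set, a ranking like this would typically be computed on the training split and the same column list applied to the test split.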

SMOTE for class balancing

Why: The cancelled and non-cancelled classes were imbalanced, so SMOTE balanced them before training rather than relying on class weights alone.

Jenkins for CI/CD rather than GitHub Actions

Why: I wanted to build a complete build-to-deploy pipeline, from the GCR push through to the Cloud Run deployment, on Jenkins rather than on the GitHub-native option.

Testing & Validation

Each pipeline stage logs its own progress and errors through a dedicated logger and a custom exception class, and the training step logs accuracy, precision, recall, and F1 score to MLflow on every run, giving an inspectable record of what each run actually did.

Results

The ten-model comparison identified Random Forest as the strongest reported benchmark, reaching 89.1% accuracy and 89.2% F1 after RandomizedSearchCV tuning. LightGBM, which was carried into the production pipeline, reached 86.5% accuracy, 88.9% recall, and 86.9% F1. Beyond model training, I built the full path from GCS ingestion and feature engineering to MLflow tracking, Flask inference, Docker packaging, and Jenkins deployment to Google Cloud Run.

Reliability & Failure Handling

Every pipeline stage wraps its errors in a custom exception that carries the originating error and a log entry, so a failure in ingestion, preprocessing, or training surfaces with context instead of failing silently.
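
One way to structure this pattern is sketched below. The class name `PipelineError`, the helper `run_stage`, and the logger configuration are hypothetical and are not taken from the project's code.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("pipeline")


class PipelineError(Exception):
    """Hypothetical custom exception that keeps the stage and original error."""

    def __init__(self, message, stage, original=None):
        self.stage = stage
        self.original = original
        detail = f"[{stage}] {message}"
        if original is not None:
            detail += f" (caused by {type(original).__name__}: {original})"
        super().__init__(detail)


def run_stage(stage, func, *args, **kwargs):
    """Run one pipeline stage; log any failure and re-raise it with context."""
    logger.info("Starting stage %s", stage)
    try:
        result = func(*args, **kwargs)
    except Exception as exc:
        logger.error("Stage %s failed: %s", stage, exc)
        # "from exc" preserves the original traceback as __cause__.
        raise PipelineError("stage failed", stage, exc) from exc
    logger.info("Finished stage %s", stage)
    return result


def broken_ingestion():
    # Simulated failure standing in for a missing input file.
    raise FileNotFoundError("train.csv not found")


caught = None
try:
    run_stage("ingestion", broken_ingestion)
except PipelineError as err:
    caught = err
    print(caught)
```

Chaining with `raise ... from exc` keeps the original traceback attached, so the log entry and the raised exception both point back to the root cause.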

Deployment & Runtime

The Jenkins pipeline built the Docker image, pushed it to Google Container Registry, and deployed the Flask application to Cloud Run during the project period. A recorded demo documents the end-to-end prediction workflow. The public deployment is not maintained as a continuously live service today.

Lessons Learned

The ten-model comparison gave every candidate the same processed data and the same accuracy, precision, recall, and F1 metrics. Random Forest was the strongest reported benchmark at 88.97% accuracy, 89.16% precision, 88.93% recall, and 89.04% F1, and a further RandomizedSearchCV tuning pass raised it to 89.10% accuracy, 88.88% precision, 89.58% recall, and 89.23% F1. LightGBM, the algorithm carried into the production pipeline rather than Random Forest, reached 86.53% accuracy, 85.04% precision, 88.89% recall, and 86.92% F1 in the same comparison. The production pipeline tunes LightGBM directly with its own RandomizedSearchCV pass and logs the resulting accuracy, precision, recall, and F1 score to MLflow at training time.
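
As a quick consistency check on the figures above, F1 is the harmonic mean of precision and recall, so each reported F1 can be recomputed from the reported precision and recall. The sketch uses only the numbers quoted in this section; small differences are expected because the inputs are already rounded to two decimals.

```python
def f1_from(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as quoted in this section.
reported = {
    "Random Forest (before tuning)": (0.8916, 0.8893, 0.8904),
    "Random Forest (after tuning)": (0.8888, 0.8958, 0.8923),
    "LightGBM": (0.8504, 0.8889, 0.8692),
}

for name, (precision, recall, f1_reported) in reported.items():
    f1_computed = f1_from(precision, recall)
    print(f"{name}: computed F1 {f1_computed:.4f}, reported {f1_reported:.4f}")
```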

Evidence & Technical Proof

View preprocessing and feature selection
View training pipeline
View model training and MLflow logging
View Jenkins pipeline
View Docker setup
Watch the demo video

Technologies

Python, LightGBM, Random Forest, XGBoost, scikit-learn, SMOTE, MLflow, Flask, Docker, Jenkins, Google Cloud Storage, Google Container Registry, Google Cloud Run, Classification, Feature Engineering, Hyperparameter Tuning
