Hotel Booking Cancellations Prediction

This project aims to predict hotel booking cancellations. By knowing which guests are likely to cancel, hoteliers can take action to prevent cancellations, estimate demand more accurately, or adjust their overbooking tactics and cancellation policies to increase revenue.

This project includes a Flask dashboard called Bell-Man (Booking cancellation prediction machine), which contains visualizations of the data and the prediction machine itself.

The data used in this project contains booking information for a city hotel and a resort hotel, and includes information such as when the booking was made, length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other things. All personally identifying information has been removed from the data.

The data is originally from the article Hotel Booking Demand Datasets, written by Nuno Antonio, Ana Almeida, and Luis Nunes for Data in Brief, Volume 22, February 2019. The data was downloaded and cleaned by Thomas Mock and Antoine Bichat for #TidyTuesday during the week of February 11th, 2020.

Data Dictionary

VARIABLE	TYPE	DESCRIPTION
hotel	character	Hotel (H1 = Resort Hotel or H2 = City Hotel)
is_canceled	double	Value indicating if the booking was canceled (1) or not (0)
lead_time	double	Number of days that elapsed between the entering date of the booking into the PMS and the arrival date
arrival_date_year	double	Year of arrival date
arrival_date_month	character	Month of arrival date
arrival_date_week_number	double	Week number of year for arrival date
arrival_date_day_of_month	double	Day of arrival date
stays_in_weekend_nights	double	Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel
stays_in_week_nights	double	Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel
adults	double	Number of adults
children	double	Number of children
babies	double	Number of babies
meal	character	Type of meal booked. Categories are presented in standard hospitality meal packages: Undefined/SC – no meal package; BB – Bed & Breakfast; HB – Half board (breakfast and one other meal – usually dinner); FB – Full board (breakfast, lunch and dinner)
country	character	Country of origin. Categories are represented in the ISO 3166-3:2013 format
market_segment	character	Market segment designation. In categories, the term “TA” means “Travel Agents” and “TO” means “Tour Operators”
distribution_channel	character	Booking distribution channel. The term “TA” means “Travel Agents” and “TO” means “Tour Operators”
is_repeated_guest	double	Value indicating if the booking name was from a repeated guest (1) or not (0)
previous_cancellations	double	Number of previous bookings that were cancelled by the customer prior to the current booking
previous_bookings_not_canceled	double	Number of previous bookings not cancelled by the customer prior to the current booking
reserved_room_type	character	Code of room type reserved. Code is presented instead of designation for anonymity reasons
assigned_room_type	character	Code for the type of room assigned to the booking. Sometimes the assigned room type differs from the reserved room type due to hotel operation reasons (e.g. overbooking) or by customer request. Code is presented instead of designation for anonymity reasons
booking_changes	double	Number of changes/amendments made to the booking from the moment the booking was entered on the PMS until the moment of check-in or cancellation
deposit_type	character	Indicates whether the customer made a deposit to guarantee the booking. This variable can assume three categories: No Deposit – no deposit was made; Non Refund – a deposit was made in the value of the total stay cost; Refundable – a deposit was made with a value under the total cost of stay
agent	character	ID of the travel agency that made the booking
company	character	ID of the company/entity that made the booking or responsible for paying the booking. ID is presented instead of designation for anonymity reasons
days_in_waiting_list	double	Number of days the booking was in the waiting list before it was confirmed to the customer
customer_type	character	Type of booking, assuming one of four categories: Contract – when the booking has an allotment or other type of contract associated with it; Group – when the booking is associated with a group; Transient – when the booking is not part of a group or contract, and is not associated with another transient booking; Transient-party – when the booking is transient, but is associated with at least one other transient booking
adr	double	Average Daily Rate as defined by dividing the sum of all lodging transactions by the total number of staying nights
required_car_parking_spaces	double	Number of car parking spaces required by the customer
total_of_special_requests	double	Number of special requests made by the customer (e.g. twin bed or high floor)
reservation_status	character	Reservation last status, assuming one of three categories: Canceled – booking was canceled by the customer; Check-Out – customer has checked in but already departed; No-Show – customer did not check-in and did inform the hotel of the reason why
reservation_status_date	double	Date at which the last status was set. This variable can be used in conjunction with reservation_status to understand when the booking was canceled or when the customer checked out of the hotel

Data Preparation

1. Drop Rows with Missing Values

Rows with missing values in any of these columns are dropped: children, country, market_segment, distribution_channel.
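As a minimal sketch of this step, assuming the data is loaded into a pandas DataFrame: the sample rows below are hypothetical (the real data lives in hotel_bookings.csv), and the `agent` column is included only to show that nulls outside the four listed columns do not cause a row to be dropped.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical sample standing in for hotel_bookings.csv.
bookings = pd.DataFrame({
    "children": [0.0, np.nan, 1.0, 2.0],
    "country": ["PRT", "GBR", None, "FRA"],
    "market_segment": ["Direct", "Online TA", "Groups", "Direct"],
    "distribution_channel": ["Direct", "TA/TO", "TA/TO", "Direct"],
    "agent": [np.nan, 9.0, np.nan, 14.0],  # nulls here are not checked
})

columns_to_check = ["children", "country", "market_segment", "distribution_channel"]

# Drop only the rows where one of the four listed columns is missing.
cleaned = bookings.dropna(subset=columns_to_check).reset_index(drop=True)
print(len(bookings), "rows before,", len(cleaned), "rows after")
```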

2. Modifying Columns

A new column is created: total_guests = adults + children + babies
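The derived column could be computed along these lines. This is a sketch on hypothetical sample rows; the cast to integer is an assumption (children is stored as a double in the data dictionary) and is safe here only because rows with missing children are assumed to have been dropped already.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the data dictionary.
bookings = pd.DataFrame({
    "adults": [2, 1, 0],
    "children": [1.0, 0.0, 0.0],
    "babies": [0, 1, 0],
})

# Sum the three guest counts into a single column.
bookings["total_guests"] = (
    bookings["adults"] + bookings["children"] + bookings["babies"]
).astype(int)

print(bookings)
```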

3. Remove Wrong Values

The adr feature contains negative values. A negative price does not make sense.
The total_guests feature contains values of 0. A booking with no guests does not make sense.
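A sketch of how both filters might be applied in one pass, using hypothetical sample rows. Whether rows with an adr of exactly 0 are kept is an assumption here; the text above only rules out negative values.

```python
import pandas as pd

# Hypothetical sample rows with one negative adr and one empty booking.
bookings = pd.DataFrame({
    "adr": [75.0, -6.38, 120.5, 0.0],
    "total_guests": [2, 2, 0, 1],
})

# Keep rows with a non-negative rate (assumption: 0 is allowed)
# and at least one guest.
valid = (bookings["adr"] >= 0) & (bookings["total_guests"] > 0)
cleaned = bookings[valid].reset_index(drop=True)

print(cleaned)
```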

4. Drop Features

Features are dropped based on three criteria: percentage of null values, information contained, and high correlation with other features.
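Two of these criteria can be checked programmatically. The sketch below uses hypothetical sample data, and both cutoffs (50% nulls, absolute correlation above 0.8) are illustrative assumptions, not values taken from the project.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: company is mostly null, adults tracks total_guests.
bookings = pd.DataFrame({
    "company": [np.nan, np.nan, np.nan, 40.0, np.nan],
    "adults": [2, 1, 2, 3, 2],
    "total_guests": [2, 1, 3, 3, 2],
    "lead_time": [10, 50, 200, 40, 90],
})

# Percentage of null values per column (assumed cutoff: 50%).
null_pct = bookings.isna().mean() * 100
mostly_null = null_pct[null_pct > 50].index.tolist()

# Absolute pairwise correlation; keep only the upper triangle so
# each pair is reported once (assumed cutoff: 0.8).
corr = bookings.drop(columns=mostly_null).corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
highly_correlated = [
    (row, col)
    for row in upper.index
    for col in upper.columns
    if upper.loc[row, col] > 0.8
]

print("Mostly null:", mostly_null)
print("Highly correlated pairs:", highly_correlated)
```

The "information contained" criterion is a judgment call about each feature and is not something this check covers.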

Modeling

1. Preprocessing

Preprocessing is built as a pipeline: a constant imputer and a binary encoder for categorical data, a robust scaler for numerical data, and SMOTE to oversample the minority class.
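The overall shape of such a preprocessing step can be sketched with scikit-learn alone. Two substitutions are made to keep the example free of extra dependencies, and neither reflects the project's actual code: OneHotEncoder stands in for the binary encoder, and SMOTE is left out (it would normally come from the imbalanced-learn package and be placed after preprocessing, applied to training data only). The sample rows and the "missing" fill value are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# Hypothetical sample rows with one missing categorical value.
X = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel"],
    "meal": ["BB", np.nan, "HB", "BB"],
    "lead_time": [10.0, 200.0, 35.0, 7.0],
    "adr": [75.0, 120.5, 98.0, 60.0],
})
categorical = ["hotel", "meal"]
numerical = ["lead_time", "adr"]

# Categorical branch: fill nulls with a constant, then encode.
# OneHotEncoder is a stand-in for the binary encoder named above.
categorical_steps = Pipeline([
    ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
    ("encoder", OneHotEncoder(handle_unknown="ignore")),
])

# Numerical branch: scale with median and IQR, which is less
# sensitive to outliers than standard scaling.
preprocessor = ColumnTransformer([
    ("cat", categorical_steps, categorical),
    ("num", RobustScaler(), numerical),
])

X_prepared = preprocessor.fit_transform(X)
print(X_prepared.shape)
```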

2. Modeling with Default Parameters

Six models are compared: Logistic Regression, Decision Tree Classifier (DTC), Random Forest Classifier (RFC), Gradient Boosting Classifier, XGBoost Classifier (XGB), and K-Nearest Neighbors (KNN). The two best models are RFC and XGB.

3. Modeling with Default Parameters + RFE in Pipeline

The test score with RFE drops slightly compared to the non-RFE model, and the RFC model still has the best score.
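Placing recursive feature elimination (RFE) inside a pipeline might look like the sketch below. It runs on synthetic data from make_classification, and the number of features to keep, the reduced n_estimators, and the use of a random forest as the RFE ranking estimator are all assumptions made to keep the example small and fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the prepared booking features.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# RFE ranks features with one forest and keeps the top 5 (assumed);
# a second forest is then trained on the selected features only.
model = Pipeline([
    ("rfe", RFE(
        RandomForestClassifier(n_estimators=50, random_state=0),
        n_features_to_select=5,
    )),
    ("clf", RandomForestClassifier(n_estimators=50, random_state=0)),
])
model.fit(X_train, y_train)

selected = model.named_steps["rfe"].support_
test_score = model.score(X_test, y_test)
print("Features kept:", int(selected.sum()), "Test accuracy:", test_score)
```

Because the selection happens inside the pipeline, it is refit on each training split, so the test data never influences which features are kept.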

4. Hyperparameter Tuning RFC

Tuning failed to improve the score, so the RFC pipeline with default parameters is used.

5. Threshold Adjustment

A threshold of 0.590833 increases precision by almost 4% while reducing recall by 5.7%.
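The trade-off can be illustrated with a small pure-Python sketch. The labels and predicted probabilities below are hypothetical, chosen only to show the direction of the effect, and the helper `precision_recall` is an illustrative name; only the threshold value 0.590833 comes from the project, and 0.5 is assumed as the comparison point.

```python
def precision_recall(y_true, y_prob, threshold):
    """Return (precision, recall) when predicting 1 for prob >= threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels (1 = canceled) and predicted cancellation probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.95, 0.80, 0.62, 0.55, 0.57, 0.40, 0.20, 0.10]

default = precision_recall(y_true, y_prob, 0.5)
adjusted = precision_recall(y_true, y_prob, 0.590833)

print("threshold 0.5      -> precision, recall:", default)
print("threshold 0.590833 -> precision, recall:", adjusted)
```

With a higher threshold, fewer bookings are flagged as likely cancellations, so false alarms go down (precision rises) while some real cancellations are missed (recall falls).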

Dashboard

Homepage

Dataset Page

Visualization Page

Predict and Result
