Sagar Gediya, Author at Softqubes


The purpose of this research is to understand the potential of traditional and non-traditional statistical techniques to predict dynamic hotel room prices. Four forecasting models were employed: Linear Regression, Random Forest, Extreme Gradient Boosting (XGBoost, a scalable, distributed gradient-boosted decision tree (GBDT) algorithm), and Decision Tree. This research is based on an empirical study of data obtained from the Choice Property Management System (PMS) for the property named “Comfort Inn & Suites” for the years 2021 and 2022.

The economic predictors were obtained from other reliable sources such as the World Tourism Organization. This study agreed with the existing literature on the ability of machine learning to predict hotel room prices precisely. Given the complexity of the hotel industry, the effect of external economic predictors was tested in the model. The challenge lay in dealing with the mixed frequencies observed in the collected data. The study is designed to add an innovative approach to the existing literature on machine learning in the hotel industry, bridging academic disciplines such as computer science, economics, and marketing. Hotel operators should benefit from this research when setting strategies as well as when using the model to set their relative room prices.

Artificial Intelligence In The Hospitality Industry

The hospitality business has several variables to take into account when determining the optimal price for hotel rooms and other property services. Evaluating accommodation demand and pricing rates can give businesses a clear idea of when they will experience high demand, so they can charge higher rates accordingly. To deal with this, we have built a machine learning model that can predict hotel prices, along with tools and methodologies for analyzing historical hotel data. This allows revenue to be maximized during peak seasons and resources to be managed better.




We obtained the sample data from a PMS named “Choice PMS”. In our case it was easy to get the data, but it can sometimes be much harder to acquire reliable data from authentic sources.


Data cleaning is the process of identifying the incorrect, incomplete, inaccurate, irrelevant, or missing parts of the data and then modifying, replacing, or deleting them as necessary. Data cleaning is considered a foundational element of data science.
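As a rough sketch of what this step can look like with pandas — the column names and values below are illustrative only, not the actual Choice PMS schema:

```python
import pandas as pd

# Hypothetical booking records; the schema is an assumption for illustration.
raw = pd.DataFrame({
    "checkin_date": ["2021-01-05", "2021-01-05", None, "2021-02-10"],
    "room_type": ["King", "King", "Queen", "Queen"],
    "rate": [110.0, 110.0, -5.0, 95.0],  # a negative rate is clearly invalid
})

cleaned = (
    raw
    .drop_duplicates()                # remove exact duplicate rows
    .dropna(subset=["checkin_date"])  # drop rows missing the check-in date
    .query("rate > 0")                # delete clearly invalid rates
    .assign(checkin_date=lambda d: pd.to_datetime(d["checkin_date"]))
)
```

Of the four toy rows, only the two valid, unique bookings survive.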


Feature engineering is the act of converting raw observations into desired features using statistical or machine learning approaches with the goal of simplifying and speeding up data transformations while also enhancing model accuracy.
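For example, a raw check-in date can be converted into calendar features that a rate model can actually use. The column names here are hypothetical, chosen just to show the idea:

```python
import pandas as pd

# Illustrative only: derive calendar features from a raw date column.
df = pd.DataFrame({"checkin_date": pd.to_datetime(["2021-07-03", "2021-07-05"])})
df["month"] = df["checkin_date"].dt.month
df["day_of_week"] = df["checkin_date"].dt.dayofweek      # Monday=0 .. Sunday=6
df["is_weekend"] = (df["day_of_week"] >= 5).astype(int)  # Saturday/Sunday flag
```

Weekend flags like this matter for hotel rates, since leisure demand typically peaks on weekends.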


Exploratory Data Analysis (EDA) is an approach to analyse the data using visual techniques. It is used to discover trends, patterns, or to check assumptions with the help of statistical summary and graphical representations.
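A minimal EDA sketch with pandas, using made-up numbers rather than the real property data, to show the kind of statistical summary and relationship checks involved:

```python
import pandas as pd

# Toy data standing in for the real records: nightly rate vs. occupancy.
df = pd.DataFrame({
    "rate": [90.0, 110.0, 105.0, 150.0],
    "occupancy": [0.4, 0.6, 0.55, 0.9],
})

summary = df["rate"].describe()            # count, mean, std, quartiles, min/max
corr = df["rate"].corr(df["occupancy"])    # does rate rise with occupancy?
```

In this toy sample, rate and occupancy are strongly positively correlated, which is the kind of pattern EDA is meant to surface before modelling.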


Machine learning model predictions allow businesses to make highly accurate guesses about the likely outcomes of a question based on historical data, which can concern all kinds of things – customer churn likelihood, possible fraudulent activity, price prediction, and more. These predictions give businesses insights that result in tangible business value.







The machine learning field is continuously evolving. And along with evolution comes a rise in demand and importance.

Before and after outlier removal and normalisation

There is one crucial reason why data scientists need machine learning: ‘high-value predictions that can guide better decisions and smart actions in real time without human intervention’. We feed the important feature columns to the ML model as the X variable and the target column as the Y variable, then split the data into a training set and a testing set in a 70:30 ratio, which gives us training and testing results.
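The 70:30 split can be sketched with scikit-learn as follows; the feature matrix here is synthetic stand-in data, not the actual PMS features:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: X = feature columns, y = target rate column.
rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = X @ np.array([50.0, 30.0, 20.0]) + 60.0

# 70% of rows go to training, 30% to testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)
```

With 100 rows, this yields 70 training samples and 30 testing samples.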

Here we used Linear Regression, Random Forest, Extreme Gradient Boosting (XGBoost), and Decision Tree. Before giving data to a machine learning model, it should be outlier-free and normalised; the graph shows the data before and after outlier removal and normalisation.
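Fitting the four model families can be sketched as below. The data is synthetic, and scikit-learn's GradientBoostingRegressor stands in for XGBoost here, since the xgboost package may not be installed; the actual study used the XGBoost library itself:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

# Synthetic stand-in data with a strong linear signal plus a little noise.
rng = np.random.default_rng(1)
X = rng.random((200, 3))
y = 100 * X[:, 0] + 20 * X[:, 1] + rng.normal(0, 1, 200)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),  # XGBoost stand-in
}

# Fit each model and record its training R² score.
scores = {name: m.fit(X, y).score(X, y) for name, m in models.items()}
```

On data this clean, all four fit well on training data; the real comparison in the study is of course done on held-out test data.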

There is a visible difference: before processing, the target data was concentrated around its mean yet had a high standard deviation; after processing it is well distributed, free of outliers, and has a well-defined interquartile range, ready to feed to the ML model.
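One common way to do this is the 1.5×IQR rule for outliers followed by z-score normalisation; whether this exact procedure was used here is an assumption, but the sketch below illustrates the idea on a toy rate series:

```python
import numpy as np

# Toy nightly rates; 400 is an obvious outlier.
rates = np.array([80.0, 95.0, 100.0, 105.0, 110.0, 400.0])

# Keep only values within [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = np.percentile(rates, [25, 75])
iqr = q3 - q1
mask = (rates >= q1 - 1.5 * iqr) & (rates <= q3 + 1.5 * iqr)
filtered = rates[mask]

# z-score normalisation: zero mean, unit standard deviation.
normalised = (filtered - filtered.mean()) / filtered.std()
```

After filtering, the 400 value is gone and the normalised series is centred on zero with a well-behaved spread.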

Evaluation Metrics


The R² score measures how much of the variance in the actual values is explained by the model’s predictions; a score of 1 indicates a perfect fit.

MAE is the mean of the absolute error values (actuals – predictions)

MSE is a simple metric that calculates the difference between the actual value and the predicted value (error), squares it and then provides the mean of all the errors.

RMSE is the root of MSE and is beneficial because it helps to bring down the scale of the errors closer to the actual values, making it more interpretable.
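The four metrics above can be computed directly from a pair of actual and predicted series; the numbers here are made up purely to show the arithmetic:

```python
import numpy as np

# Toy actual vs. predicted rates, just to illustrate the formulas.
y_true = np.array([100.0, 110.0, 120.0, 130.0])
y_pred = np.array([102.0, 108.0, 121.0, 128.0])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))                       # mean absolute error
mse = np.mean(errors ** 2)                          # mean squared error
rmse = np.sqrt(mse)                                 # root of MSE, back on rate scale
ss_res = np.sum(errors ** 2)                        # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)      # total sum of squares
r2 = 1 - ss_res / ss_tot                            # explained variance
```

For these toy values, MAE is 1.75, MSE is 3.25, RMSE is about 1.80, and R² is 0.974 — note how RMSE sits back on the dollar scale of the rates while MSE does not.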

Here are the evaluation metrics of the ML models. A model’s accuracy is considered good when the difference between training and testing accuracy is small and the error scores are low.

Based on these metrics, we chose the XGB Regressor as the best algorithm for the model. There is around 14% error because there are many other factors to consider when predicting rates, and some are unpredictable, but we are working on this and will develop a more accurate ML model.

Here is the trained model: inputting the required parameters will predict the rate ($110).
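The final step — feeding parameters to the trained model to get a rate — looks like this in outline. The features (month, weekend flag, occupancy), their coefficients, and the use of GradientBoostingRegressor in place of XGBoost are all assumptions for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical training data: rate driven by weekend flag and occupancy.
rng = np.random.default_rng(2)
X = np.column_stack([
    rng.integers(1, 13, 300),   # month (1-12)
    rng.integers(0, 2, 300),    # is_weekend flag
    rng.random(300),            # occupancy fraction
])
y = 80 + 10 * X[:, 1] + 40 * X[:, 2] + rng.normal(0, 2, 300)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Predict the rate for one stay: July, weekend, 80% occupancy.
rate = model.predict(np.array([[7, 1, 0.8]]))[0]
```

The predicted rate lands near the underlying formula's value of about $122 for this toy setup — the real model, of course, returns rates learned from the actual PMS data.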