Causality of the Cancellation Policy on Bookings of Airbnb
Course Project Report
Team 1
4/10/2020
OPIM 5510 Web Analytics
Contents
Part 1: Company Background
Part 2: Data Retrieval and Preprocessing
Part 3: Objectives and Metrics
Part 4: Causality Analysis
    Validity Check on the Data
    Modeling and Hypothesis Testing
    Heterogeneous Treatment Effects
    Suggestions Drawn from the Causality Analysis
Part 5: Experiment Design
    Methodology of the Experiment Design
    Additional Suggestions Based on a New Experiment
Appendix
    Exhibit 1: Variable Data Type
    Exhibit 2: Boroughs in New York City
    Exhibit 3: Cancellation Policy Excerpt from Airbnb’s Official Website
    Exhibit 4: Pivot Table of Variables
    Exhibit 5: Cancellation Rate by Room Type
    Exhibit 6: Share of Strict Cancellation Policy by Room Type
    Exhibit 7: Number of Reviews by Borough
    Exhibit 8: Validity Check of the Data
    Exhibit 9: Results of PSM
    Exhibit 10: Validity Check of the Matched Data
    Exhibit 11: Histogram of Number of Reviews after PSM
    Exhibit 12: Negative Binomial Regression
    Exhibit 13: Linear Regression
    Exhibit 14: Poisson Regression
    Exhibit 15: Model Comparison
    Exhibit 16: Interaction between Borough and Cancellation Policy
    Exhibit 17: Interaction between Price and Cancellation Policy
    Exhibit 18: Interaction between Accommodates and Cancellation Policy
    Exhibit 19: Hypothesis Test of the Coefficient of Accommodates
    Exhibit 20: Interaction between Room Type and Cancellation Policy
Part 1: Company Background
Airbnb, Inc. is an online marketplace for people to list, discover, and book accommodations worldwide.
The company connects people looking to rent out their homes with people searching for accommodation.
Airbnb’s main competitors include both online platforms and traditional hotel chains. However,
unlike hotel brands, Airbnb does not own any of the listed real estate, and its revenue comes from
commissions on each booking. This peer-to-peer business model, which lets users search the database
of listings, is a competitive advantage on the one hand, but on the other it gives the company little
control over the operation of the off-line services. For example, compared with the lodging services
provided by traditional hotels, a more stringent cancellation policy is a double-edged sword: it
discourages customers from canceling existing orders, but it also hinders the conversion of users into
customers, especially because a significant proportion of the company’s customer base consists of
budget and non-business travelers, who have more flexible itineraries and are more sensitive to the
refund conditions attached to cancellation. This project aims to explore the causal relationship between
Airbnb’s current cancellation policy and the number of reviews, and thus to offer insights that help
both landlords and Airbnb increase the booking volume of their listings.
Part 2: Data Retrieval and Preprocessing
The data is obtained from “Inside Airbnb” (see footnote 1), an independent website that provides
historical booking data for Airbnb. The original dataset is organized by listing. Each listing is a
lodging unit with its own web page that displays images and provides information such as price,
address, available dates, amenities, and cancellation policy. We set the timeframe as the fourth quarter
of 2019, namely from 10/01/2019 to 12/31/2019. However, instead of downloading a single dataset for the
whole quarter, to ensure accuracy we retrieved the dataset for each of the three months and retained
only those listings that appeared in all three. In addition, to guard against changes in the
cancellation policy during the timeframe, we compared the policy month by month for each listing and
kept a listing for the final dataset only if its policy remained unchanged in all three months.
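The filtering rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the snapshot structure (a dictionary from listing ID to policy label), the function name `stable_listings`, and the toy IDs are hypothetical and are not taken from the project's actual code or data.

```python
from typing import Dict, List

# Hypothetical monthly snapshots: listing ID -> cancellation policy label.
Snapshot = Dict[int, str]


def stable_listings(snapshots: List[Snapshot]) -> Snapshot:
    """Keep listings present in every snapshot with an unchanged policy."""
    if not snapshots:
        return {}
    first, rest = snapshots[0], snapshots[1:]
    kept: Snapshot = {}
    for listing_id, policy in first.items():
        # A listing survives only if each later month contains the same ID
        # and reports the same cancellation policy.
        if all(month.get(listing_id) == policy for month in rest):
            kept[listing_id] = policy
    return kept


# Toy data for illustration only.
october = {101: "flexible", 102: "strict", 103: "moderate", 104: "strict"}
november = {101: "flexible", 102: "moderate", 103: "moderate"}
december = {101: "flexible", 102: "moderate", 103: "moderate", 105: "flexible"}

stable = stable_listings([october, november, december])
print(stable)  # {101: 'flexible', 103: 'moderate'}
```

Listing 102 is dropped because its policy changed, and listings 104 and 105 are dropped because they do not appear in all three months.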
After the processing work, the new data is composed of 25,235 rows (or observations) and 9 columns (see
Exhibit 1 for their data types). Among them, eight are independent variables and one is the dependent
variable. Each of them is defined as follows:
1) ID: unique identification for each listing
2) Location: the five boroughs of New York City, namely the Bronx, Brooklyn,
Manhattan, Queens, and Staten Island (see Exhibit 2)
3) Room type: type of space: entire home/apt, private room, shared room, or hotel room
4) Accommodates: the maximum number of guests allowed to reside
5) Price: booking price per night per lodging unit
6) Security Deposit: a payment to secure the booking, which is not mandatory. “1” denotes presence
of a security deposit and “0” denotes no security deposit
7) Cleaning Fee: a one-time housekeeping charge to customers for each booking
8) Cancellation Policy: based on strictness, the policies are divided into not-strict and strict, which
are respectively the control (denoted by “0”) and treatment (denoted by “1”) groups (see Exhibit 3
for the distinction between them, and footnote 2).
9) Number of Reviews: the volume of reviews made by guests who have checked out after staying in
the listed property, used as a proxy for transaction outcomes
1 Link: http://insideairbnb.com/get-the-data.html
2 There are originally 6 policies. For computational simplicity, we combine similar ones into 2.
Part 3: Objectives and Metrics
We use both original metrics and created metrics. The original metric is taken directly from the
data: the number of reviews. As discussed, Airbnb stipulates that only guests who have checked out of a
listed property may leave a review, so the number of reviews is a close proxy for the volume of bookings
of each listing. Since our goal is to help landlords and Airbnb generate higher revenue, we believe this
indicator is well connected to that objective, given that actual booking data is not available to the
public. We also created some metrics by combining existing variables. These include the share of
listings with a strict cancellation policy by borough and by room type, and the number of reviews per
listing. These numbers are expected to facilitate the exploration of heterogeneous treatment effects,
that is, how the effect of the cancellation policy on reviews varies across the other categorical
variables, and thereby to deepen our understanding of the effect of the cancellation policy on the
volume of reviews.
To get an intuitive sense of these metrics, we conducted an exploratory data analysis using a pivot
table (see Exhibit 4) and various charts. Overall, 56.6% of all listings have a strict cancellation
policy. Compared with hotel rooms, which are operated by dedicated hospitality firms, strict cancellation
policies are adopted mostly in room types operated by private landlords: 67.15% of shared rooms and
61.02% of entire homes are strict (see Exhibit 5). However, among all the listings with strict
cancellation requirements, entire homes account for a majority of 57.07%, more than the other room
types combined, followed by private rooms at 39.25% (see Exhibit 6). Based on these two metrics, we
conclude that entire homes are the main source of strict cancellation policies. Location-wise,
Manhattan and Brooklyn are the two major sources of rental offerings, as evidenced by their relatively
high numbers of reviews. In addition, there appears to be a positive linear correlation between price
and the share of strict cancellation policies across the five boroughs: the share of listings adopting
a strict policy increases as the rental price rises (see Exhibit 7).
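The two room-type metrics discussed above are easy to confuse, so a small sketch may help. The first is the strict rate within each room type; the second is each room type's share among strict listings only. The toy rows and function names below are hypothetical and chosen only for illustration; they do not reproduce the report's figures.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Toy listings as (room_type, is_strict) pairs; values are illustrative only.
listings: List[Tuple[str, int]] = [
    ("entire home/apt", 1), ("entire home/apt", 1), ("entire home/apt", 0),
    ("private room", 1), ("private room", 0), ("private room", 0),
    ("shared room", 1), ("hotel room", 0),
]


def strict_rate_by_type(rows: List[Tuple[str, int]]) -> Dict[str, float]:
    """Within each room type, the fraction of listings that are strict."""
    totals = Counter(room for room, _ in rows)
    strict = Counter(room for room, flag in rows if flag == 1)
    return {room: strict[room] / totals[room] for room in totals}


def share_of_strict_by_type(rows: List[Tuple[str, int]]) -> Dict[str, float]:
    """Among strict listings only, the fraction belonging to each room type."""
    strict = Counter(room for room, flag in rows if flag == 1)
    n_strict = sum(strict.values())
    return {room: count / n_strict for room, count in strict.items()}


rate = strict_rate_by_type(listings)
share = share_of_strict_by_type(listings)
```

In this toy sample the shared room has the highest strict rate, yet entire homes make up the largest share of strict listings, which mirrors the distinction drawn between Exhibit 5 and Exhibit 6.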
Part 4: Causality Analysis
Validity Check on the Data
To check whether the two groups that differ in cancellation requirements also differ significantly in
other respects, we first examine the relationship between the cancellation policy and each continuous
variable, one at a time, through linear regression, and the relationship between the cancellation policy
and each nominal variable through Chi-square tests (see Exhibit 8). Because these tests return
coefficients with p-values lower than 0.05, we find that listings in the strict group and the non-strict
group are not comparable and that the original data falls short of random assignment. Therefore, we
apply Propensity Score Matching (PSM) to select listings from the two groups that share the same (or a
similar) set of observable characteristics. As a result, the usable data drops substantially, to 4,729
matched pairs (see Exhibit 9). With the matched data, we rerun the Chi-square tests and linear
regressions used in the previous step. This time the p-values of all coefficients are greater than
0.05 (see Exhibit 10), indicating that the control and treatment groups no longer differ significantly
on the observed variables.
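The matching step can be illustrated with a minimal sketch. The report does not state which matching algorithm or caliper was used, so the choices below (greedy one-to-one nearest-neighbor matching without replacement, a caliper of 0.05, and propensity scores assumed to be already estimated, for example by logistic regression) are assumptions made for illustration, as are the function name and the toy scores.

```python
from typing import Dict, List, Tuple


def match_pairs(
    treated: Dict[int, float],
    control: Dict[int, float],
    caliper: float = 0.05,
) -> List[Tuple[int, int]]:
    """Greedy 1:1 nearest-neighbor matching on the propensity score.

    `treated` and `control` map listing IDs to estimated propensity scores.
    Each control listing is used at most once (matching without replacement).
    """
    available = dict(control)
    pairs: List[Tuple[int, int]] = []
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control listing by absolute score distance.
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


# Illustrative scores only; in practice they would come from a fitted model.
strict_scores = {1: 0.62, 2: 0.80, 3: 0.35}
flexible_scores = {10: 0.60, 11: 0.33, 12: 0.10}

pairs = match_pairs(strict_scores, flexible_scores)
print(pairs)  # [(3, 11), (1, 10)]
```

Treated listing 2 stays unmatched because no remaining control listing lies within the caliper, which is the same mechanism that shrinks a sample to a smaller set of matched pairs.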
Modeling and Hypothesis Testing
Before building the model, we considered three candidate models. The outcome variable, number of
reviews, is a non-negative integer, so either Poisson regression or negative binomial regression should
be suitable. To be careful, we first looked at how the outcome variable is distributed. As Exhibit 11
shows, the vast majority of properties have fewer than 10 reviews, and the whole sample has a mean of
3.63 reviews and a variance of 17.28. Because the variance far exceeds the mean (overdispersion), the
negative binomial model is more appropriate than the Poisson model, which assumes the two are equal.
Based on these statistics, we prioritize the negative binomial model, and we also run linear
regression and Poisson regression as backup models. As expected, the negative binomial
model yields the highest log-likelihood among the three models (see Exhibits 12-15 for each model’s
summary and their comparison). From that model we derive the following findings:
1) Listings with strict cancellation policies have a higher number of reviews than non-strict ones;
2) Listings in Manhattan have more reviews than those in any other borough;
3) Entire home/apt is more popular and has more reviews than any other room type;
4) Listings that require no security deposit have a higher number of reviews;
5) Decreasing the price or the cleaning fee would each increase the number of reviews, although
by different magnitudes;
6) A higher number of guests accommodated leads to a higher number of reviews.
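The overdispersion argument used to choose the negative binomial model can be checked with simple arithmetic on the reported mean (3.63) and variance (17.28). The sketch below assumes the common NB2 variance form, Var = mu + alpha * mu^2; the report does not say which parameterization its software used, so the alpha value is only a rough method-of-moments illustration.

```python
# Summary statistics reported for the matched sample.
mean_reviews = 3.63
var_reviews = 17.28

# A Poisson model implies variance == mean, so this ratio would be close to 1.
dispersion_ratio = var_reviews / mean_reviews

# Method-of-moments estimate of alpha under the assumed NB2 form
# Var = mu + alpha * mu**2.
alpha_hat = (var_reviews - mean_reviews) / mean_reviews ** 2

print(round(dispersion_ratio, 2))  # 4.76
print(round(alpha_hat, 2))         # 1.04
```

A variance almost five times the mean is far from the Poisson assumption, which is consistent with preferring the negative binomial model.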
Heterogeneous Treatment Effects
Having drawn insights for the listings as a whole, we now dig deeper to see whether the effect of the
cancellation policy varies with other variables. The negative binomial model
with interaction terms between borough and cancellation policy shows that strict cancellation policies have
a positive effect on the number of reviews for listings in Brooklyn and Manhattan and no effect in the
other three boroughs (see Exhibit 16). This might be partially explained by the fact that Brooklyn has the
largest supply of lodging while Manhattan is the most popular destination among bookers.
Results from the interaction between price and cancellation policy show that, at a given price,
the presence of a strict cancellation policy does not influence the outcome (see the hypothesis test in
Exhibit 17). This means that combining a strict cancellation policy with a given price would not hurt the
competitiveness of a listing, so hosts can decide at their own discretion whether
to restrict cancellation. The coefficient of the interaction between accommodates and cancellation
policy is not significant (see Exhibit 18). However, for the treatment group, being able to accommodate
one more guest could increase the number of reviews, as confirmed by the hypothesis test (see Exhibit 19).
Last but not least, a strict cancellation policy has different effects across room types,
with a positive effect on private rooms, shared rooms, and entire homes, and no effect on hotel rooms (see
Exhibit 20).
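To make the reading of an interaction term concrete, the sketch below shows how the effect of a strict policy would be computed in a log-link count model with one interaction. The model form in the docstring is an assumption about how such a model is typically specified, and the coefficient values are hypothetical numbers chosen only to show the arithmetic; they are not the estimates reported in Exhibits 16 to 20.

```python
import math


def policy_rate_ratio(beta_policy: float, beta_interaction: float, x: float) -> float:
    """Multiplicative effect of a strict policy on expected reviews at covariate value x.

    Assumes a log-link model of the form
    log E[reviews] = b0 + beta_policy*T + b_x*x + beta_interaction*T*x,
    where T = 1 for strict listings and 0 otherwise.
    """
    return math.exp(beta_policy + beta_interaction * x)


# Hypothetical coefficients, chosen only to illustrate the arithmetic.
in_reference_group = policy_rate_ratio(beta_policy=0.10, beta_interaction=0.05, x=0)
in_other_group = policy_rate_ratio(beta_policy=0.10, beta_interaction=0.05, x=1)
```

A ratio above 1 means strict listings are expected to have more reviews than comparable non-strict ones; a non-significant interaction coefficient means the ratio cannot be distinguished between the two groups.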
Suggestions Drawn from the Causality Analysis
Combining the results of the regression and the heterogeneous treatment analysis, we draw the
following suggestions:
1) For listings in Manhattan and Brooklyn, hosts are advised to tighten their cancellation
conditions to secure existing bookings and sales revenue, whereas listings in the Bronx, Queens and
Staten Island appear unaffected by a strict policy.
2) In terms of deterring potential customers, high prices work in the same way as a strict
cancellation policy but with a greater magnitude.
3) If a listing price has been set and cannot be adjusted, adding a strict cancellation policy will not
decrease the number of reviews.
4) Hosts who incur a considerable housekeeping cost are better off folding it into the nightly booking
price rather than listing it as a stand-alone charge, since customers are less sensitive to an
incremental increase in the booking price.
5) For room types other than hotel rooms, we recommend that hosts adopt the strict policy; for hotel
rooms, an appropriately flexible policy could be implemented to gain a higher number of reviews.
6) Hosts are advised to raise the limit on the number of guests accommodated if the
space allows.
Part 5: Experiment Design
As noted above, the dataset is historical and is therefore susceptible to the impact of external events.
To rule out possible confounding factors and draw causal inferences, we should carry out a controlled
experiment. Before designing the experiment, let us revisit our
objective: improving sales revenue for both the landlord and Airbnb by testing the effect of
different levels of strictness in the cancellation policy.
Methodology of the Experiment Design
In choosing the outcome variable, we want to increase revenue per property without increasing
the vacancy rate. Therefore, the most straightforward outcome measure is the number
of nights booked per month, or the monthly occupancy ratio. A lowered occupancy ratio can be more
directly attributed to the deterrent effect of a strict cancellation policy. In fact, using the
number of reviews in place of occupancy was a compromise forced by the lack of internal transaction
data, since not every eligible guest leaves a review, even though guests are encouraged to do so.
When choosing the properties, we would first run an A/A test, in which two groups
of properties carry exactly the same cancellation policy. A failed A/A test, meaning a significant
difference between the two groups, would suggest that the control and treatment groups are not
comparable. Once the A/A test passes and everything else is equal, a difference in occupancy ratios
between properties with different cancellation policies can be attributed to the effect
of the cancellation policy.
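One way to evaluate an A/A test is a permutation test on the difference in mean occupancy between the two groups. The report does not specify a test, so the choice of a permutation test, the function name, the number of permutations, and the toy occupancy ratios below are all assumptions made for illustration.

```python
import random
from statistics import mean
from typing import List


def permutation_p_value(a: List[float], b: List[float],
                        n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical monthly occupancy ratios for two groups under the same policy.
group_a = [0.50, 0.62, 0.71, 0.55, 0.66, 0.58]
group_b = [0.50, 0.62, 0.71, 0.55, 0.66, 0.58]
p_aa = permutation_p_value(group_a, group_b)
```

A large p-value in the A/A phase gives no evidence that the groups differ before treatment; a small one would be a warning that the assignment is unbalanced.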
Finally, we would run a before-after experiment instead of an after-only experiment. To do that, we
would impose a uniform cancellation policy (e.g., the flexible one) on both groups for a set period of
time and then switch the treatment group to a stricter policy while the control group keeps the old
policy. After the same length of time under the new arrangement, if the outcomes of the two groups
still differ, we can be more confident that the difference is genuine and caused by the different
treatments applied to the two groups in the latter period.
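A common way to summarize such a before-after design is a difference-in-differences estimate, which subtracts the control group's change from the treatment group's change. The report does not name this estimator, so the sketch below is an assumption about how the comparison could be computed, and the occupancy ratios are hypothetical.

```python
from statistics import mean
from typing import Sequence


def diff_in_diff(control_before: Sequence[float], control_after: Sequence[float],
                 treated_before: Sequence[float], treated_after: Sequence[float]) -> float:
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical occupancy ratios; both groups start under the flexible policy.
effect = diff_in_diff(
    control_before=[0.60, 0.64], control_after=[0.62, 0.66],
    treated_before=[0.61, 0.63], treated_after=[0.55, 0.59],
)
```

In this toy example the control group rises by 0.02 while the treated group falls by 0.05, so the estimated effect of switching to the stricter policy is about -0.07 in occupancy ratio.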
Additional Suggestions Based on a New Experiment
Through the experiment we may find that most landlords or hosts turn to strict policies because they
fear being left with a vacancy after a “last-minute” cancellation of an existing booking; compared
with hotels, they are less known to the market and find it harder to attract new guests within a short
period of time. On that basis, we would recommend that Airbnb offer ways for a listing to increase its
exposure and visibility to potential customers, so as to bring in a steadier stream of lodging shoppers.
Some of these measures may include:
1) providing sponsored ads for interested hosts to improve their rank on the platform, which would in
turn diversify Airbnb’s sources of revenue beyond commissions on
each transaction between customers and hosts.
2) updating its listing-ranking algorithm to use a holistic set of factors such as rating, page
quality, volume of pageviews, and past bookings.
3) improving the functionality of the in-site search engine. Currently Airbnb features a naïve search
engine where searchers find a target listing only when they input the ID number of that listing. We
suggest allowing users to enter a set of keywords to define their searches.
On the host side, we encourage hosts to think about how to improve the popularity and customer rating
of their listings. Possible actions include posting more enticing images of the property and
its neighborhood to generate more organic online traffic, and providing value-added services such as
local tour guides to enrich the guests’ off-line experience. Hosts may also consider differentiating
themselves by labelling their listings with a few tags that could also serve as search keywords.
Appendix
Exhibit 1: Variable Data Type
Variable Name         Data Sub-type    Data Type                  Remarks
ID                    Number series    Numerical                  Unique Key
Location              Character        Categorical                Dummy
Room Type             Character        Categorical                Dummy
Accommodates          Integer          Discrete                   Count
Price                 Currency         Continuous                 Numerical
Security Deposit      Binary           Categorical and ordinal    Dummy
Cleaning Fee          Currency         Continuous                 Numerical
Cancellation Policy   Binary           Categorical and ordinal    Dummy
Number of Reviews     Integer          Discrete                   Outcome
Exhibit 2: Boroughs in New York City
Exhibit 3: Cancellation Policy Excerpt from Airbnb’s Official Website
On Airbnb, hosts can choose which cancellation policies to offer to guests, and guests can review them before booking.
Based on their strictness, we divide them into two groups. The not-strict group has a flexible or moderate cancellation
policy. The strict group has a strict or super strict cancellation policy.
1. Flexible: Free cancellation until 14 days before check-in. If booked less than 14 days before check-in, free cancellation
for 48 hours after booking, up to 24 hours before check-in. After that, guests can cancel up to 24 hours before
check-in and get a refund of the nightly rate and the cleaning fee, but not the service fee.
2. Moderate: Free cancellation until 14 days before check-in. If booked less than 14 days before check-in, free
cancellation for 48 hours after booking, up to 5 days before check-in. After that, guests can cancel up to 5 days
before check-in and get a refund of the nightly rate and the cleaning fee, but not the service fee.
3. Strict/Strict with 14 days grace period: Hosts can choose which strict policy to offer. The strict policy allows free
cancellation for 48 hours after booking. After that, guests can cancel up to 7 days before check-in to get a 50%
refund of the nightly rate and the cleaning fee, but not the service fee. Strict with 14 days grace period allows free
cancellation for 48 hours after booking, as long as the guest cancels at least 14 days before check-in. After that,
guests can cancel up to 14 days before check-in to get a 50% refund of the nightly rate and the cleaning fee, but not
the service fee.
4. Super Strict Policy 30/60: Hosts can choose which super strict policy to offer. The super strict 30/60 policy allows
guests who cancel at least 30/60 days before check-in to get a 50% refund of the nightly rate and the cleaning fee,
but not the service fee.
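The grouping described in this exhibit can be expressed as a small lookup. The raw label strings below are an assumption about how the six policies might be named in the downloaded data; the report states only that six original policies were merged into two groups, so both the labels and the function name are hypothetical.

```python
# Assumed raw labels; the report states only that six policies were merged into two.
STRICT_LABELS = {
    "strict",
    "strict_14_with_grace_period",
    "super_strict_30",
    "super_strict_60",
}
NOT_STRICT_LABELS = {"flexible", "moderate"}


def to_treatment(policy: str) -> int:
    """Return 1 for the strict (treatment) group and 0 for the not-strict (control) group."""
    label = policy.strip().lower()
    if label in STRICT_LABELS:
        return 1
    if label in NOT_STRICT_LABELS:
        return 0
    # Fail loudly on labels the mapping does not cover.
    raise ValueError(f"unknown cancellation policy: {policy!r}")
```

Raising an error on an unexpected label, rather than silently assigning it to one group, avoids misclassifying listings if the source data contains a policy name not covered here.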
Exhibit 4: Pivot Table of Variables
Exhibit 5: Cancellation Rate by Room Type
Exhibit 6: Share of Strict Cancellation Policy by Room Type
Exhibit 7: Number of Reviews by Borough
Exhibit 8: Validity Check of the Data
Exhibit 9: Results of PSM
Exhibit 10: Validity Check of the Matched Data
Exhibit 11: Histogram of Number of Reviews after PSM
Exhibit 12: Negative Binomial Regression
Exhibit 13: Linear Regression
Exhibit 14: Poisson Regression
Exhibit 15: Model Comparison
Note: m1 has a lower log-likelihood and hence a poorer fit than nm1
Note: p1 has a lower log-likelihood and hence a poorer fit than m1
Note: p1 has a lower log-likelihood and hence a poorer fit than nm1
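The log-likelihood comparison behind these notes can be illustrated on toy data. The sketch below fits intercept-only Poisson and negative binomial (NB2) distributions by the method of moments and compares their log-likelihoods. It is not the regression comparison run in the report: the counts are made up, there are no covariates, and the moment-based dispersion estimate stands in for a proper maximum-likelihood fit.

```python
import math
from statistics import mean, variance
from typing import List


def poisson_loglik(counts: List[int], mu: float) -> float:
    """Log-likelihood of i.i.d. Poisson(mu) counts."""
    return sum(k * math.log(mu) - mu - math.lgamma(k + 1) for k in counts)


def negbin_loglik(counts: List[int], mu: float, alpha: float) -> float:
    """Log-likelihood of i.i.d. NB2 counts with mean mu and dispersion alpha."""
    r = 1.0 / alpha          # "size" parameter
    p = r / (r + mu)         # success probability
    return sum(
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log(1.0 - p)
        for k in counts
    )


# Toy overdispersed counts, not the report's data.
counts = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 4, 5, 7, 9, 12, 20]
mu_hat = mean(counts)
alpha_hat = (variance(counts) - mu_hat) / mu_hat ** 2  # method of moments

ll_poisson = poisson_loglik(counts, mu_hat)
ll_negbin = negbin_loglik(counts, mu_hat, alpha_hat)
```

On overdispersed counts like these, the negative binomial log-likelihood is higher (less negative) than the Poisson one, which is the pattern the model comparison relies on.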
Exhibit 16: Interaction between Borough and Cancellation Policy
Exhibit 17: Interaction between Price and Cancellation Policy
Exhibit 18: Interaction between Accommodates and Cancellation Policy
Exhibit 19: Hypothesis Test of the Coefficient of Accommodates
Exhibit 20: Interaction between Room Type and Cancellation Policy