
Causality of the Cancellation Policy on Bookings of Airbnb

Course Project Report

Team 1

4/10/2020

OPIM 5510 Web Analytics


Contents

Part 1: Company Background
Part 2: Data Retrieval and Preprocessing
Part 3: Objectives and Metrics
Part 4: Causality Analysis
    Validity Check on the Data
    Modeling and Hypothesis Testing
    Heterogeneous Treatment Effects
    Suggestions Drawn from the Causality Analysis
Part 5: Experiment Design
    Methodology of the Experiment Design
    Additional Suggestions Based on a New Experiment
Appendix
    Exhibit 1: Variable Data Type
    Exhibit 2: Boroughs in New York City
    Exhibit 3: Cancellation Policy Excerpt from Airbnb’s Official Website
    Exhibit 4: Pivot Table of Variables
    Exhibit 5: Cancellation Rate by Room Type
    Exhibit 6: Share of Strict Cancellation Policy by Room Type
    Exhibit 7: Number of Reviews by Borough
    Exhibit 8: Validity Check of the Data
    Exhibit 9: Results of PSM
    Exhibit 10: Validity Check of the Matched Data
    Exhibit 11: Histogram of Number of Reviews after PSM
    Exhibit 12: Negative Binomial Regression
    Exhibit 13: Linear Regression
    Exhibit 14: Poisson Regression
    Exhibit 15: Model Comparison
    Exhibit 16: Interaction between Borough and Cancellation Policy
    Exhibit 17: Interaction between Price and Cancellation Policy
    Exhibit 18: Interaction between Accommodates and Cancellation Policy
    Exhibit 19: Hypothesis Test of the Coefficient of Accommodates
    Exhibit 20: Interaction between Room Type and Cancellation Policy


Part 1: Company Background

Airbnb, Inc. is an online marketplace for people to list, discover, and book accommodations worldwide. The company connects people looking to rent out their homes with people searching for accommodation. Airbnb’s main competitors include both online platforms and traditional hotel chains. Unlike hotel brands, however, Airbnb does not own any of the listed real estate, and its revenue comes from commissions on each booking. This peer-to-peer business model, which lets users search a database of listings, is a competitive advantage on the one hand, but on the other it gives the company little control over the operation of the offline services. The cancellation policy is one example. Compared with the lodging services provided by traditional hotels, a more stringent cancellation policy is a double-edged sword: it discourages customers from canceling existing bookings, but it also hinders the conversion of a user into a customer. This matters especially because a significant proportion of the company’s customer base consists of budget and non-business travelers, who have more flexible itineraries and are more sensitive to the refund conditions attached to cancellation. This project aims to explore the causal relationship between Airbnb’s current cancellation policy and the number of reviews, and thus to offer insights that help both landlords and Airbnb increase the booking volume of their listings.

Part 2: Data Retrieval and Preprocessing

The data is obtained from “Inside Airbnb” (see footnote 1), an independent website that provides historical data on Airbnb listings. The original dataset is arranged by listing. Each listing is a lodging unit with its own web page that displays images and provides information such as price, address, available dates, amenities, and cancellation policy. We set the timeframe as the fourth quarter of 2019, from 10/01/2019 to 12/31/2019. Instead of downloading the dataset as a whole, to ensure accuracy we retrieved the dataset for each of the three months in the quarter and retained only those listings that existed in all three months. In addition, because a listing’s cancellation policy may change during the timeframe, we compared the policy month by month for each listing and kept the listing for final use only if its policy remained unchanged across all three months.
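The filtering rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used for the project: the function name `stable_listings`, the dictionary-based snapshot format, and the listing IDs and policy labels are all hypothetical.

```python
from typing import Dict, List

# Each monthly snapshot maps a listing ID to its cancellation policy label.
# The IDs and labels below are made-up illustration data, not Inside Airbnb records.
Snapshot = Dict[int, str]


def stable_listings(snapshots: List[Snapshot]) -> Snapshot:
    """Keep listings present in every snapshot whose policy never changed."""
    if not snapshots:
        return {}
    # Listings that exist in all months: intersect the ID sets.
    common_ids = set(snapshots[0])
    for snap in snapshots[1:]:
        common_ids &= set(snap)
    kept = {}
    for listing_id in sorted(common_ids):
        policies = {snap[listing_id] for snap in snapshots}
        # A single distinct label means the policy was unchanged.
        if len(policies) == 1:
            kept[listing_id] = policies.pop()
    return kept


october = {101: "flexible", 102: "strict", 103: "moderate", 104: "strict"}
november = {101: "flexible", 102: "moderate", 103: "moderate"}
december = {101: "flexible", 102: "strict", 103: "moderate", 105: "flexible"}

result = stable_listings([october, november, december])
print(result)  # {101: 'flexible', 103: 'moderate'}
```

Listing 102 is dropped because its policy changed in November, and listings 104 and 105 are dropped because they do not appear in all three months.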

After preprocessing, the resulting dataset consists of 25,235 rows (observations) and 9 columns (see Exhibit 1 for their data types). Among them, eight are independent variables and one is the dependent variable. They are defined as follows:

1) ID: unique identifier for each listing

2) Location: the five boroughs of New York City, namely the Bronx, Brooklyn, Manhattan, Queens, and Staten Island (see Exhibit 2)

3) Room type: the type of space, one of entire home/apt, private room, shared room, or hotel room

4) Accommodates: the maximum number of guests allowed to stay

5) Price: booking price per night per lodging unit

6) Security Deposit: a payment to secure the booking, which is not mandatory. “1” denotes the presence of a security deposit and “0” denotes no security deposit

7) Cleaning Fee: a one-time housekeeping charge on customers for each booking

8) Cancellation Policy: based on strictness, the policies are divided into not-strict and strict, which are respectively the control (denoted by “0”) and treatment (denoted by “1”) groups (see Exhibit 3 for the distinction between them, and footnote 2)

9) Number of Reviews: the number of reviews left by guests who have checked out after staying in the listed property, used as a proxy for transaction outcomes

1 Link: http://insideairbnb.com/get-the-data.html

2 There are originally 6 policies. For computational simplicity, we combine similar ones into 2 groups.


Part 3: Objectives and Metrics

We use both original metrics and created metrics. The original metric is taken directly from a variable in the data: the number of reviews. Because Airbnb stipulates that only guests who have checked out of a listed property can leave a review, the number of reviews is a close proxy for the volume of bookings of each listing. Since our goal is to help landlords and Airbnb generate higher revenue, and transaction data is not available to the public, we believe this indicator is well connected to the objective. In addition, we created several metrics by combining existing variables. These include the share of listings with a strict cancellation policy by borough and by room type, and the number of reviews per listing. These metrics are expected to support the exploration of heterogeneous treatment effects of the cancellation policy across the categorical variables, which should deepen our understanding of how the cancellation policy affects the volume of reviews.

To gain an intuitive sense of these metrics, we conducted an exploratory data analysis using a pivot table (see Exhibit 4) and various charts. Overall, 56.6% of all listings have a strict cancellation policy. Compared with hotel rooms, which are operated by dedicated hospitality firms, strict cancellation policies are adopted mostly by room types operated by private landlords: 67.15% of shared rooms and 61.02% of entire homes are strict (see Exhibit 5). Among all listings with a strict cancellation policy, however, entire homes account for a majority of 57.07%, more than all other room types combined, followed by private rooms at 39.25% (see Exhibit 6). Based on these two metrics, we can say that entire homes are the main source of strict cancellation policies. Location-wise, Manhattan and Brooklyn are the two major sources of rental offerings, as evidenced by their relatively high numbers of reviews. In addition, there appears to be a positive linear correlation between price and the share of strict cancellation policies across the five boroughs, meaning that the share of strict policies increases as the rental price rises (see Exhibit 7).
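The two room-type metrics above answer different questions and are easy to confuse, so a small sketch may help. It assumes a list of (room type, strict flag) pairs; the eight records below are invented for illustration and do not reproduce the figures in Exhibits 5 and 6.

```python
from collections import Counter

# Hypothetical (room_type, strict_flag) pairs; 1 = strict, 0 = not strict.
listings = [
    ("entire home/apt", 1), ("entire home/apt", 1), ("entire home/apt", 0),
    ("private room", 1), ("private room", 0), ("private room", 0),
    ("shared room", 1), ("hotel room", 0),
]

totals = Counter(room for room, _ in listings)
strict = Counter(room for room, flag in listings if flag == 1)
n_strict = sum(strict.values())

# Metric 1 (as in Exhibit 5): within each room type, the fraction that is strict.
strict_rate = {room: strict[room] / totals[room] for room in totals}

# Metric 2 (as in Exhibit 6): among all strict listings, the fraction per room type.
strict_share = {room: strict[room] / n_strict for room in totals}

for room in totals:
    print(f"{room}: rate={strict_rate[room]:.2f}, share={strict_share[room]:.2f}")
```

The first metric is computed within each room type and need not sum to one, while the second is computed across room types and always does. A small room type can therefore have a high strict rate and still contribute only a small share of all strict listings.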

Part 4: Causality Analysis

Validity Check on the Data

To check whether the two groups that differ in cancellation requirements also differ significantly in other respects, we first examine the relationships between the cancellation policy and the continuous variables one by one through linear regression, and the relationships between the cancellation policy and the nominal variables through Chi-square tests (see Exhibit 8). These tests return p-values below 0.05, so we conclude that subjects in the strict group and the non-strict group are not comparable and that assignment in the original data is not random. We therefore apply Propensity Score Matching (PSM) to select subjects from the two groups that share the same (or a similar) set of observable characteristics. As a result, the usable data drops substantially, to 4,729 matched pairs (see Exhibit 9). With the matched data, we rerun the Chi-square tests and linear regressions used in the previous step. This time, all p-values are greater than 0.05 (see Exhibit 10), indicating no statistically significant imbalance in the observed characteristics between the control group and the treatment group.
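The matching step can be sketched as follows. The report does not state which matching algorithm or caliper was used, so this is only one common variant: greedy 1:1 nearest-neighbor matching without replacement. It assumes the propensity scores have already been estimated (for example, from a logistic regression of the policy on the covariates); the function name `match_nearest`, the caliper of 0.05, and all IDs and scores are hypothetical.

```python
from typing import List, Tuple


def match_nearest(
    treated: List[Tuple[int, float]],
    control: List[Tuple[int, float]],
    caliper: float = 0.05,
) -> List[Tuple[int, int]]:
    """Greedy 1:1 nearest-neighbor matching on propensity score, without replacement.

    Each input item is (listing_id, propensity_score). A treated listing is
    left unmatched if no remaining control lies within the caliper.
    """
    available = dict(control)  # control_id -> score; shrinks as controls are used
    pairs = []
    for t_id, t_score in treated:
        if not available:
            break
        # Closest remaining control by absolute score distance.
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


# Made-up propensity scores for three strict and three not-strict listings.
treated = [(1, 0.62), (2, 0.80), (3, 0.30)]
control = [(10, 0.60), (11, 0.33), (12, 0.10)]

pairs = match_nearest(treated, control)
print(pairs)  # [(1, 10), (3, 11)]
```

Treated listing 2 has no control within the caliper and is discarded. Discarding unmatched listings in this way is why the sample shrinks after matching.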

Modeling and Hypothesis Testing

Before building the model, we considered three candidate models. The outcome variable, the number of reviews, is a non-negative integer, so either Poisson regression or negative binomial regression should suit. To be conservative, we first looked at how the outcome variable is distributed. As Exhibit 11 shows, the vast majority of properties have fewer than 10 reviews, and the whole sample has a mean of 3.63 reviews and a variance of 17.28. Because the variance is far larger than the mean, the data is overdispersed, and in such cases the negative binomial model is more appropriate than the Poisson model, which assumes the two are equal. Based on these statistics, we decided to prioritize the negative binomial model. We also ran linear regression and Poisson regression as backup models. As expected, the negative binomial model yields the highest log-likelihood among the three models (see Exhibits 12-15 for each model’s summary and their comparison). From that model we derive the following findings:

1) Listings with strict cancellation policies have a higher number of reviews than non-strict ones;

2) Listings in Manhattan have more reviews than those in any other borough;

3) Entire home/apt is more popular and has more reviews than any other room type;

4) Listings that require no security deposit have a higher number of reviews;

5) Lowering the price or the cleaning fee would each increase the number of reviews, although by different magnitudes;

6) Listings that can accommodate more guests have a higher number of reviews.
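The overdispersion reasoning behind the model choice can be illustrated with a small sketch. It compares intercept-only Poisson and negative binomial fits on a made-up sample of counts. It is not the regression reported in Exhibits 12-15, which includes covariates, and it estimates the negative binomial parameters by the method of moments rather than by maximum likelihood.

```python
import math
from statistics import mean, variance

# Hypothetical review counts: mostly small, with a long right tail.
counts = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 8, 12, 20]


def poisson_loglik(ys, lam):
    """Log-likelihood of counts under Poisson(lam)."""
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in ys)


def negbin_loglik(ys, r, p):
    """Log-likelihood under a negative binomial with size r and probability p.

    With this parameterization the mean is r * (1 - p) / p.
    """
    return sum(
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(p) + y * math.log(1 - p)
        for y in ys
    )


m = mean(counts)
v = variance(counts)      # sample variance
overdispersed = v > m     # Poisson assumes variance == mean

# Method-of-moments estimates (only valid when v > m): variance = m + m^2 / r.
r = m * m / (v - m)
p = r / (r + m)

pois_ll = poisson_loglik(counts, m)
nb_ll = negbin_loglik(counts, r, p)
print(f"mean={m:.3f} variance={v:.3f} overdispersed={overdispersed}")
print(f"Poisson log-likelihood={pois_ll:.2f}, negative binomial={nb_ll:.2f}")
```

On this sample the variance is several times the mean, and the negative binomial log-likelihood is higher (less negative) than the Poisson one. That is the same pattern the report describes for its matched data.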

Heterogeneous Treatment Effects

Now that we have insights for the listings as a whole, we dive deeper to see whether the effect of the cancellation policy differs across other variables. The negative binomial model with interaction terms between borough and cancellation policy shows that strict cancellation policies have a positive effect on the number of reviews for listings in Brooklyn and Manhattan and no effect in the other three boroughs (see Exhibit 16). This might be partially explained by the fact that Brooklyn has the largest supply of lodging facilities, while Manhattan is the most popular destination among guests.

Results from the interaction between price and cancellation policy show that, at a given price, the presence of a strict cancellation policy does not influence the outcome (see the hypothesis test in Exhibit 17). This means that combining a strict cancellation policy with a given price would not hurt the competitiveness of a listing, so hosts can decide at their own discretion whether to restrict cancellation. The coefficient of the interaction between accommodates and cancellation policy is not significant (see Exhibit 18). However, for the treatment group, being able to accommodate one more guest could increase the number of reviews, as confirmed by the hypothesis test (see Exhibit 19). Last but not least, a strict cancellation policy has different effects across room types, with a positive effect on private rooms, shared rooms, and entire homes, and no effect on hotel rooms (see Exhibit 20).
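The accommodates result can be read through a model of the following form. This is a sketch of one common specification and may not be the exact one behind Exhibits 18 and 19. The notation is assumed for illustration: $Y_i$ is the number of reviews, $T_i$ the strict-policy indicator, $A_i$ the accommodates variable, and $X_i$ the remaining covariates.

```latex
\log \mathbb{E}[Y_i] = \beta_0 + \beta_1 T_i + \beta_2 A_i + \beta_3 (T_i \times A_i) + \gamma^{\top} X_i

% Effect of one additional guest on the log expected number of reviews:
\frac{\partial \log \mathbb{E}[Y_i]}{\partial A_i} =
\begin{cases}
\beta_2 & \text{if } T_i = 0 \text{ (not strict)} \\
\beta_2 + \beta_3 & \text{if } T_i = 1 \text{ (strict)}
\end{cases}

% Test for the treatment group:
H_0 : \beta_2 + \beta_3 = 0 \quad \text{versus} \quad H_1 : \beta_2 + \beta_3 \neq 0
```

Under this reading, an insignificant $\beta_3$ says only that the slope on accommodates does not differ detectably between the two groups. Rejecting $H_0$ says that the slope within the strict group is itself nonzero. The two statements are compatible.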

Suggestions Drawn from the Causality Analysis

Combining the results of the regression and the heterogeneous treatment analysis, we draw the following suggestions:

1) For listings in Manhattan and Brooklyn, hosts are advised to adopt stricter cancellation conditions to secure existing bookings and sales revenue, whereas in the Bronx, Queens, and Staten Island a strict policy shows no detectable effect on the number of reviews.

2) In terms of deterring potential customers, high prices function in the same way as a strict cancellation policy but with a greater magnitude.

3) If a listing price has been set and cannot be adjusted, adding a strict cancellation policy will not decrease the number of reviews.

4) Hosts who incur considerable housekeeping costs are better off folding them into the nightly booking price rather than listing them as a stand-alone charge, since customers are less sensitive to incremental increases in the booking price.

5) For room types other than hotel rooms, we recommend that hosts adopt the strict policy. For hotel rooms, an appropriately flexible policy could be implemented to gain a higher number of reviews.

6) Hosts are advised to raise the limit on the number of guests they accommodate if the space allows.


Part 5: Experiment Design

As noted above, the dataset is historical and is therefore susceptible to the impact of external events. To rule out possible confounding factors and draw causal inferences, we should carry out a controlled experiment. Before designing the experiment, let us revisit our objective: improving sales revenue for both the landlord and Airbnb by testing the effect of varying the strictness of the cancellation policy.

Methodology of the Experiment Design

In choosing the outcome variable, we note that we want to increase revenue per property without increasing the vacancy rate. The most straightforward outcome measure is therefore the number of nights booked per month, or the monthly occupancy ratio. A lower occupancy ratio would directly reflect the deterrent effect of a stringent cancellation policy. In fact, using the number of reviews in place of occupancy in the preceding analysis was a compromise forced by the lack of internal transaction data, since not every eligible guest leaves a review, even though they are encouraged to do so.

When it comes to choosing the properties, we would run an A/A test first, in which two groups of properties feature exactly the same cancellation policy. A failed A/A test, in which the two groups differ in outcome despite receiving the same treatment, suggests that the control and treatment groups are not comparable. Once the A/A test passes, with everything else being equal, if properties with different cancellation policies have different occupancy ratios, we can more confidently attribute the difference to the effect of the cancellation policy.
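One way to evaluate an A/A test is a permutation test on the difference in mean occupancy between the two groups. The sketch below is an illustration under stated assumptions rather than a prescribed procedure: the function name `permutation_p_value`, the number of permutations, and the occupancy ratios are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the fraction of random relabelings whose absolute mean difference
    is at least as large as the observed one (with a +1 correction).
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


# Made-up monthly occupancy ratios for two groups under the SAME policy.
group_a = [0.61, 0.70, 0.55, 0.66, 0.72, 0.58]
group_b = [0.63, 0.68, 0.57, 0.64, 0.71, 0.60]

p_value = permutation_p_value(group_a, group_b)
print(f"A/A test p-value: {p_value:.3f}")
```

A large p-value is what a healthy A/A test should produce, since both groups receive the same policy. A small p-value would indicate that the groups already differ before any treatment is applied, so the assignment should be revisited.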

Finally, we would run a before-after experiment instead of an after-only experiment. To do that, we would impose a uniform cancellation policy (e.g. the flexible one) on both groups for a set period of time, and then switch the treatment group to a stricter policy while the control group keeps the old policy. If, after the same length of time, the two groups differ in outcome, we can be more confident that the difference is genuine and caused by the different treatments applied in the latter period.
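The before-after comparison with a control group corresponds to a difference-in-differences calculation, sketched below. The term and the helper `diff_in_diff` are introduced here for illustration and do not appear in the original design, and the occupancy ratios are invented.

```python
from statistics import mean


def diff_in_diff(control_before, control_after, treated_before, treated_after):
    """Change in the treated group minus change in the control group."""
    control_change = mean(control_after) - mean(control_before)
    treated_change = mean(treated_after) - mean(treated_before)
    return treated_change - control_change


# Hypothetical monthly occupancy ratios per property.
# Period 1: both groups use the flexible policy.
# Period 2: only the treatment group switches to a stricter policy.
control_before = [0.70, 0.60, 0.65, 0.75]
control_after = [0.72, 0.62, 0.67, 0.77]
treated_before = [0.68, 0.62, 0.66, 0.74]
treated_after = [0.64, 0.58, 0.62, 0.70]

effect = diff_in_diff(control_before, control_after, treated_before, treated_after)
print(f"Estimated effect of the stricter policy on occupancy: {effect:+.3f}")
```

Subtracting the control group's change removes shifts that affect both groups over time, such as seasonality. This is the advantage of the before-after design over an after-only comparison.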

Additional Suggestions Based on a New Experiment

Through the experiment we may find that most landlords or hosts turn to strict policies because they fear being left vacant by last-minute cancellations of existing bookings. Compared with hotels, they are less known to the market and find it harder to attract new guests within a short period of time. On that basis, we would recommend that Airbnb offer ways for a listing to increase its exposure and visibility to potential customers, so as to bring in a stable stream of guests. Some of these measures may include:

1) providing sponsored ads for interested hosts to improve their rank on the platform, which would in turn diversify Airbnb’s sources of revenue beyond commissions on each transaction between customers and hosts.

2) updating its listing-ranking algorithm based on a holistic set of factors such as rating, page quality, volume of pageviews, and past bookings.

3) advancing the functionality of the in-site search engine. Currently Airbnb features a naïve search engine where searchers find a target listing only when they input the ID number of that listing. We would welcome a change that allows users to define their searches with a set of keywords.

On the host side, we encourage hosts to think about how to improve the popularity and customer rating of their listings. Possible actions include posting more enticing images of the property and its neighborhood to generate more organic online traffic, and providing value-added services such as local tour guides to enrich guests’ offline experience. Hosts may also consider differentiating themselves by labeling their listings with a few tags that could be used as search keywords.


Appendix

Exhibit 1: Variable Data Type

Variable Name        Data Sub-type   Data Type                 Remarks
ID                   Number series   Numerical                 Unique Key
Location             Character       Categorical               Dummy
Room Type            Character       Categorical               Dummy
Accommodates         Integer         Discrete                  Count
Price                Currency        Continuous                Numerical
Security Deposit     Binary          Categorical and ordinal   Dummy
Cleaning Fee         Currency        Continuous                Numerical
Cancellation Policy  Binary          Categorical and ordinal   Dummy
Number of Reviews    Integer         Discrete                  Outcome

Exhibit 2: Boroughs in New York City

Exhibit 3: Cancellation Policy Excerpt from Airbnb’s Official Website

On Airbnb, hosts can choose which cancellation policies to offer to guests, and guests can review them before booking. Based on their strictness, we divide them into two groups. The not-strict group has a flexible or moderate cancellation policy. The strict group has a strict or super strict cancellation policy.

1. Flexible: Free cancellation until 14 days before check-in. If booked less than 14 days before check-in, free cancellation for 48 hours after booking, up to 24 hours before check-in. After that, guests can cancel up to 24 hours before check-in and get a refund of the nightly rate and the cleaning fee, but not the service fee.

2. Moderate: Free cancellation until 14 days before check-in. If booked less than 14 days before check-in, free cancellation for 48 hours after booking, up to 5 days before check-in. After that, guests can cancel up to 5 days before check-in and get a refund of the nightly rate and the cleaning fee, but not the service fee.

3. Strict/Strict with 14 days grace period: Hosts can choose which strict policy to offer. The strict policy allows free cancellation for 48 hours after booking. After that, guests can cancel up to 7 days before check-in to get a 50% refund of the nightly rate and the cleaning fee, but not the service fee. Strict with 14 days grace period allows free cancellation for 48 hours after booking, as long as the guests cancel at least 14 days before check-in. After that, guests can cancel up to 14 days before check-in to get a 50% refund of the nightly rate and the cleaning fee, but not the service fee.

4. Super Strict Policy 30/60: Hosts can choose which super strict policy to offer. Under the super strict 30 and super strict 60 policies, guests who cancel at least 30 or 60 days before check-in, respectively, get a 50% refund of the nightly rate and the cleaning fee, but not the service fee.


Exhibit 4: Pivot Table of Variables

Exhibit 5: Cancellation Rate by Room Type

Exhibit 6: Share of Strict Cancellation Policy by Room Type


Exhibit 7: Number of Reviews by Borough

Exhibit 8: Validity Check of the Data


Exhibit 9: Results of PSM


Exhibit 10: Validity Check of the Matched Data


Exhibit 11: Histogram of Number of Reviews after PSM

Exhibit 12: Negative Binomial Regression


Exhibit 13: Linear Regression

Exhibit 14: Poisson Regression


Exhibit 15: Model Comparison

Note: m1 has a lower log-likelihood and hence a poorer fit than nm1

Note: p1 has a lower log-likelihood and hence a poorer fit than m1

Note: p1 has a lower log-likelihood and hence a poorer fit than nm1


Exhibit 16: Interaction between Borough and Cancellation Policy

Exhibit 17: Interaction between Price and Cancellation Policy


Exhibit 18: Interaction between Accommodates and Cancellation Policy

Exhibit 19: Hypothesis Test of the Coefficient of Accommodates


Exhibit 20: Interaction between Room Type and Cancellation Policy
