Watershed Capstone Project - Price Optimization

This is the code that I use to run predictive model for forecasted occupancy rate as a function of nightly rent price. I also use optim in order to find the short-term rent price that could optimize revenue for each single property, just like we use Solver in Excel.

df <- read.csv(file = "database.csv", header=TRUE)
head(df)

Our purpose is to predict occupancy rate using current variables, so let’s run a simple prediction model with independent variable being sample_nightly_rent_price.

model<-lm(data=df, occupancy_rate ~ sample_nightly_rent_price)
summary(model)

## 
## Call:
## lm(formula = occupancy_rate ~ sample_nightly_rent_price, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.40692 -0.10288 -0.00011  0.09632  0.46844 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                5.086e-01  2.121e-02  23.976  < 2e-16 ***
## sample_nightly_rent_price -1.524e-04  5.373e-05  -2.837  0.00494 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1615 on 242 degrees of freedom
## Multiple R-squared:  0.03219,    Adjusted R-squared:  0.02819 
## F-statistic: 8.048 on 1 and 242 DF,  p-value: 0.004941

This is obviously a very low-accurate model with R-squared being only a mere 3%. Intuitively, each cities and each regions will have different range of rent prices. Renting a small bedromm at $200/night is obviously too expensive in a small town at Montana; however, it could be considered as acceptable in a metropolis such as New York.

Thus, to capture this effect, we should incorporate the different in living expenses at different regions into our model, which could be done by reflecting each rent price as a percentile vs. prices of other properties in the same region. If $200 is at 90 percentile in Montana and at 50 percentile in New York, it would be better for the regression model to predict.

Thankfully, the given dataset allows us to do this practice.

col<-c("percentile_90th_price","percentile_10th_price","sample_nightly_rent_price")
dt<-df[col]

vs10 <- function (x) {if (x[3]>x[2]) {1} else {0}}
vs90 <- function (x) {if (x[3]<x[1]) {1} else {0}}


dt$vs10 <- apply(dt,1,vs10) #check if sample rent price is higher than percentile 10
dt$vs90 <- apply(dt,1,vs90) #check if sample rent price is lower than percentile 90

c(sum(dt$vs90),sum(dt$vs10))

## [1] 243 243

Since the total dataset has 243 instances, this indicates that all the sample rent price is within (10,90) percentile.

We can then convert the sample nightly rent price into percentile using below formula:

sample_percentile_price of a given property (x) = 0.1+0.8*(range between x and 10th percentile )/(range between 90 and 10 percentile)

df$percentile_90th_vs_10th =df$percentile_90th_price-df$percentile_10th_price
df$sample_vs_10th <- df$sample_nightly_rent_price-df$percentile_10th_price
df$sample_price_percentile <- 0.1+0.8*df$sample_vs_10th/df$percentile_90th_vs_10th

head(df)

Now, let’s rerun the regression model

model<-lm(data=df, occupancy_rate ~ sample_price_percentile)
summary(model)

## 
## Call:
## lm(formula = occupancy_rate ~ sample_price_percentile, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.37493 -0.06930  0.00154  0.07194  0.30655 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              0.85167    0.02560   33.27   <2e-16 ***
## sample_price_percentile -0.79359    0.04922  -16.12   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.114 on 242 degrees of freedom
## Multiple R-squared:  0.5179, Adjusted R-squared:  0.5159 
## F-statistic:   260 on 1 and 242 DF,  p-value: < 2.2e-16

I also tested inputting number of bedrooms into our model. However, this variable does not yield statistically significance. Perhaps the reason is that the rent price already factors the number of bedrooms, thus incorporating such new feature does not help much to our prediction.

model_1<-lm(data=df, occupancy_rate ~ sample_price_percentile + num_bedrooms)
summary(model_1)

## 
## Call:
## lm(formula = occupancy_rate ~ sample_price_percentile + num_bedrooms, 
##     data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.37028 -0.07003  0.00165  0.07053  0.31110 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              0.865209   0.033469  25.851   <2e-16 ***
## sample_price_percentile -0.793088   0.049287 -16.091   <2e-16 ***
## num_bedrooms            -0.009193   0.014615  -0.629     0.53    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1141 on 241 degrees of freedom
## Multiple R-squared:  0.5187, Adjusted R-squared:  0.5147 
## F-statistic: 129.9 on 2 and 241 DF,  p-value: < 2.2e-16

I also want to plot QQ chart and residual plot to test if our univariate model follows the assumption of linear regression.

model.res<-resid(model);

plot(df$sample_price_percentile, model.res, 
        ylab="Residuals", xlab="Sample Price Percentile") 
abline(0,0);

qqnorm(model.res, pch = 1, frame = FALSE)
qqline(model.res, col = "steelblue", lwd = 2)

Though the R-squared is only moderate, at 55%, considering current available dataset and other more important tasks in this analysis, I decide to go with this univariate model.

Our next step is to find optimal rent price for each property that can maximize our revenue. Higher rent price could lead to higher income for us, but it would also lower occupancy rate.

This requirement can easily be done in Excel Solver; nonetheless, running optimization for each property (total 243 instances) is an exhausting practice. Consequently, we need to use optim function in R.

# Firsly, we need to create a function to calculate revenue, based on predicted occupancy rate and nightly_rent_price

revenue <- function(data,par){
                            par_vs_10th <- par-data$percentile_10th_price
                            normalized_price <-0.1+0.8*par_vs_10th/data$percentile_90th_vs_10th
                            fcst_occupancy <-coef(model)['(Intercept)']+coef(model)['sample_price_percentile']*normalized_price
                            fcst_st_revenue <-fcst_occupancy*365*par
                            fcst_st_revenue
}

# Then create an optimization function and run for each row in df.

for (i in 1:nrow(df))   {df[i,'optimized_price'] <-
                                                optim(122,revenue,data=df[i,],method="L-BFGS-B",      control=list(fnscale=-1),lower=df[i,'percentile_10th_price']) }
#122 is an initialized value
# We need to input lower parameters to ensure that the optimal rent price does not fall lower than 10th percentile


head(df)

We then need to convert the optimized price in dollar values into percentiles as well. Then use the newly calculated variable to predict the occupancy rate.

df$normalized_optimized_price<-0.1+0.8*(df$optimized_price-df$percentile_10th_price)/(df$percentile_90th_vs_10th)

new <-data.frame(sample_price_percentile=df$normalized_optimized_price)
df$forecast_occupancy <- predict.lm(model,newdata=new)

head(df,10)