Naive Estimator





The naive estimator of the treatment effect, \(\widehat{\delta}^{\text{*}}_{N}\), is

\[ \widehat{\delta}^{\text{*}}_{N} = \overline{Y}_{1} - \overline{Y}_{0}, \] where \(\overline{Y}_{1}\) and \(\overline{Y}_{0}\) are the sample average outcomes among individuals that received treatment \(A=1\) and those that received treatment \(A=0\), respectively. Specifically,

\[ \overline{Y}_{1} = \frac{\sum\limits_{i=1}^{n} A_i Y_i}{\sum\limits_{i=1}^{n} A_i}~~~~ \text{and}~~~ \overline{Y}_{0} = \frac{\sum\limits_{i=1}^{n} (1-A_i) Y_i}{\sum\limits_{i=1}^{n} (1 - A_i)}. \]





We estimate the standard error of the naive estimator of the treatment effect as

\[ \text{SE}_N = \frac{1}{\sqrt{n}} \left\{\frac{1}{n}\sum_{i = 1}^{n} \left( \widehat{\delta}^{\text{*}}_{N,i} \right) ^2 \right\}^{1/2}, \] where \(\widehat{\delta}^{\text{*}}_{N,i}\) is \[ \widehat{\delta}^{\text{*}}_{N,i} = \frac{ A_i \left( Y_i - \overline{Y}_{1} \right)}{\frac{1}{n}\sum\limits_{i=1}^{n} A_i} - \frac{ (1-A_i) \left(Y_i - \overline{Y}_{0}\right)}{\frac{1}{n}\sum\limits_{i=1}^{n} (1 - A_i)}. \]

This expression follows from the sandwich formula discussed in Section 2.5, Review of M-estimation, where the estimator for \(\delta^{\text{*}}\) solves jointly in \(\delta^{\text{*}}\) and \(\pi\) the stacked estimating equations

\[ \begin{aligned} \sum_{i = 1}^{n} \left\{ \frac{ A_i Y_i}{\pi} - \frac{ (1-A_i) Y_i}{1 - \pi} - \delta^{\text{*}} \right\} &= 0, \\ \sum_{i = 1}^{n} \left( A_{i} - \pi \right) &= 0. \end{aligned} \]
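
To see that this system characterizes the naive estimator, note that the second equation gives \(\widehat{\pi} = n^{-1}\sum_{i=1}^{n} A_{i}\), the sample proportion treated. Substituting into the first equation and solving for \(\delta^{\text{*}}\) gives

\[ \widehat{\delta}^{\text{*}}_{N} = \frac{n^{-1}\sum_{i=1}^{n} A_{i} Y_{i}}{\widehat{\pi}} - \frac{n^{-1}\sum_{i=1}^{n} (1-A_{i}) Y_{i}}{1 - \widehat{\pi}} = \overline{Y}_{1} - \overline{Y}_{0}; \]

applying the sandwich formula to this stacked system produces the expression for \(\widehat{\delta}^{\text{*}}_{N,i}\) above.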



We have defined two functions to estimate the treatment effect and the standard error.

  • delta_N(): naive estimator of the treatment effect.
  • delta_N_se(): calls delta_N() to estimate the treatment effect using the naive estimator and estimates its standard error.


#------------------------------------------------------------------------------#
# naive estimator of tx effect
#
# ASSUMPTIONS
#   tx is binary coded as 0,1
#
# INPUTS
#  y : a vector containing the outcome of interest
#  A : a vector containing the tx received
#
# RETURNS
#  a list containing
#        EY : the sample mean of the outcome under each tx option
#  deltaHat : the naive estimator for the tx effect
#------------------------------------------------------------------------------#
delta_N <- function(y, A) {

  #### Average Responses

  # aggregate data for mean in each tx
  EY <- stats::aggregate(x = y, by = list(A), FUN = mean)

  # convert to named vector
  EY <- array(data = EY[,2L], dimnames = list(EY[,1L]))

  #### Treatment Effect

  delta <- unname(obj = EY[2L] - EY[1L])

  return( list("EY" = EY, "deltaHat" = delta) )
}


#------------------------------------------------------------------------------#
# naive estimator of tx effect and its standard error
#
# REQUIRES
#   delta_N()
#
# ASSUMPTIONS
#   tx is binary coded as {0,1}
#
# INPUTS
#  y : a vector containing the outcome of interest
#  A : a vector containing the tx received
#
# RETURNS
#  a list containing
#  deltaHat : the naive estimator for the tx effect
#        EY : the sample mean of the outcome under each tx option
#  sigmaHat : the estimated standard error
#------------------------------------------------------------------------------#
delta_N_se <- function(y, A) {

  # tx effect
  delta <- delta_N(y = y, A = A)

  # mean outcome A = 1
  ybar_1 <- sum(y * A) / sum(A)
  
  # mean outcome A = 0
  ybar_0 <- sum(y * {1.0 - A}) / sum(1.0 - A)

  # estimated tx effect each individual
  delta_N_i <- A * {y - ybar_1} / mean(x = A) - 
               {1.0 - A} * {y - ybar_0} / mean(x = 1.0 - A)

  # variance
  sigmaSq <- mean(x = {delta_N_i}^2L)

  return( c(delta, "sigmaHat" = sqrt(x = sigmaSq / length(x = y))) ) 
}
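
Before applying these functions to the simulated blood pressure data, we can verify the implementation on a small synthetic example; the generating model and true effect of 5 below are illustrative only.

set.seed(1234)
n <- 1000

# randomized binary tx; outcome with true tx effect of 5
aToy <- rbinom(n = n, size = 1, prob = 0.5)
yToy <- 5.0 * aToy + rnorm(n = n)

delta_N_se(y = yToy, A = aToy)

The returned deltaHat should be near 5, and sigmaHat should agree, up to the divisor \(n\) versus \(n-1\) in the variance, with the familiar two-sample standard error \(\{s_{1}^{2}/n_{1} + s_{0}^{2}/n_{0}\}^{1/2}\).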


For our simulated data set, the naive estimate of the treatment effect and its standard error are

delta_N_se(y = y, A = dataSBP$A)
$EY
        0         1 
-4.164835 14.222467 

$deltaHat
[1] 18.3873

$sigmaHat
[1] 0.7543302

Under the naive estimator, the treatment effect is estimated to be \(\widehat{\delta}^{\text{*}}_N =\) 18.39 (0.75) mmHg, indicating that patients who received the fictitious medication experienced, on average, a larger decrease in systolic blood pressure after six months than patients who did not receive the drug. However, the randomization assumption underpinning this estimator is not appropriate for our simulated data set, and this result may be misleading.

Outcome Regression Estimator





The outcome regression estimator of the treatment effect, \(\widehat{\delta}^{*}_{OR}\), is defined as

\[ \widehat{\delta}^{*}_{OR} = n^{-1} \sum_{i=1}^{n} \left\{Q(X_{i},1; \widehat{\beta}) - Q(X_{i},0;\widehat{\beta})\right\}, \] where \(Q(x, a; \beta)\) is a parametric model for \(Q(x,a) = E\left(Y|X=x, A=a\right)\) and \(\widehat{\beta}\) is a suitable estimator.



For the outcome regression estimator of the treatment effect for which the outcome regression parameters are estimated using ordinary least squares (OLS), the \((1,1)\) component of the sandwich estimator for the variance is defined as

\[ \widehat{\Sigma}_{11} = A_n + (C^{\intercal}_n~D^{-1}_n)~B_n~(C^{\intercal}_n~D^{-1}_n)^{\intercal}, \]

where

\(A_{n} = n^{-1}\sum_{i=1}^{n} A_{ni}^2\) and \(B_{n} = n^{-1}\sum_{i=1}^{n} B_{ni} B_{ni}^{\intercal}\) with

\[ \begin{aligned} A_{ni} &= \left\{Q(X_{i},1;\widehat{\beta})- Q(X_{i},0;\widehat{\beta})\right\} - \widehat{\delta}^{*}_{OR}, \\ \\ B_{ni} &= \left\{Y_{i} - Q(X_{i},A_{i};\widehat{\beta})\right\} \frac{\partial Q(X_{i},A_{i};\widehat{\beta})}{\partial \beta}. \end{aligned} \]

The derivative matrices are

\[ C_n = n^{-1}\sum_{i=1}^{n}\frac{\partial A_{ni}}{\partial {\beta}} \quad \text{and} \quad D_n = n^{-1}\sum_{i=1}^{n}\frac{\partial B_{ni}}{\partial {\beta}^{\intercal}}. \]
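
In our implementation, the outcome regression model is linear, \(Q(x,a;\beta) = \widetilde{x}^{\intercal}\beta\), where \(\widetilde{x}\) denotes the vector of model terms constructed from \((x,a)\). The components then simplify: \(\partial Q(X_{i},A_{i};\beta)/\partial \beta = \widetilde{X}_{i}\), so that

\[ B_{ni} = \left\{Y_{i} - \widetilde{X}_{i}^{\intercal}\widehat{\beta}\right\}\widetilde{X}_{i}, \quad C_n = n^{-1}\sum_{i=1}^{n}\left(\widetilde{X}_{1i} - \widetilde{X}_{0i}\right), \quad D_n = -n^{-1}\sum_{i=1}^{n} \widetilde{X}_{i}\widetilde{X}_{i}^{\intercal}, \]

where \(\widetilde{X}_{1i}\) and \(\widetilde{X}_{0i}\) are the model terms evaluated at \(A_{i}=1\) and \(A_{i}=0\), respectively. These are the quantities computed by delta_OR_swv() and swv_OLS() below.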



Anticipating later methods, we implement the treatment effect estimator and the sandwich variance estimator using multiple functions, each calculating a specific component of the M-estimator and its derivative.

  • delta_OR_se(): estimates the treatment effect using the outcome regression estimator and estimates its standard error using the sandwich variance estimator.
  • delta_OR_swv(): \(\widehat{\delta}^{\text{*}}_{OR}\) component of the M-estimating equation and its derivatives, i.e., \(A_n\) and \(C^{\intercal}_n\) of the sandwich variance expression.
  • delta_OR(): outcome regression estimator of the treatment effect.
  • swv_OLS(): OLS contributions to the sandwich estimator and its derivatives, i.e., \(B_n\) and \(D^{-1}_n\) of the sandwich variance expression.

This is not the most efficient implementation of the variance estimator. For example, the regression step is completed twice (once for delta_OR_swv() and once for swv_OLS()). However, our objective is to provide general, reusable code. Function swv_OLS() will be reused later in this chapter as well as in Chapter 3.



#------------------------------------------------------------------------------#
# outcome regression estimator of tx effect
#
# ASSUMPTIONS
#   tx is denoted by A and is binary coded as {0,1}
#
# INPUTS
#  moOR : a modeling object specifying the outcome regression step
#  data : a data.frame containing covariates and tx
#     y : a vector containing the outcome of interest
#
# RETURNS
#  a list containing
#     fitOR : a modelObjFit object containing the results of the
#             outcome regression step
#        EY : the sample mean of the outcome under each tx option
#  deltaHat : the outcome regression estimator for the tx effect
#------------------------------------------------------------------------------#
delta_OR <- function(moOR, data, y) {

  #### Outcome Regression 

  fitOR <- modelObj::fit(object = moOR, data = data, response = y)

  # Q(X,0;betaHat)
  data$A <- 0L 
  Q0 <- drop(x = modelObj::predict(object = fitOR, newdata = data))

  # Q(X,1;betaHat)
  data$A <- 1L 
  Q1 <- drop(x = modelObj::predict(object = fitOR, newdata = data))

  #### Tx Effect

  EY <- array(data = 0.0, dim = 2L, dimnames = list(c("0","1")))
  EY[2L] <- mean(x = Q1)
  EY[1L] <- mean(x = Q0)
  delta <- unname(obj = EY[2L] - EY[1L])

  return( list(   "fitOR" = fitOR, 
                     "EY" = EY,
               "deltaHat" = delta) )
}


#------------------------------------------------------------------------------#
# outcome regression estimator of tx effect and its standard error
#
# REQUIRES
#   swv_OLS() and delta_OR_swv()
#
# ASSUMPTIONS
#   tx is denoted by A and is binary coded as {0,1}
#   outcome regression model is a linear model and parameters are estimated 
#     using OLS
#
# INPUTS
#  moOR : a modeling object specifying the outcome regression step
#         *** must be a linear model ***
#  data : a data.frame containing covariates and tx
#         *** tx must be named 'A' and coded 0/1 ***
#     y : a vector containing the outcome of interest
#
# RETURNS
#  a list containing
#  deltaHat : the outcome regression estimator for the tx effect
#        EY : the sample mean of the outcome under each tx option
#     fitOR : a modelObjFit object containing the results of the
#             outcome regression step
#  sigmaHat : the estimated standard error
#------------------------------------------------------------------------------#
delta_OR_se <- function(moOR, data, y) {

  #### OLS components
  OLS <- swv_OLS(mo = moOR, data = data, y = y) 

  #### estimator components
  OR <- delta_OR_swv(moOR = moOR, data = data, y = y)

  #### 1,1 Component of Sandwich Estimator

  # OLS contribution
  temp <- OR$dMdB %*% OLS$invdM
  sig11OLS <- temp %*% OLS$MM %*% t(x = temp)

  sig11 <- drop(x = OR$MM + sig11OLS)

  return( c(OR$delta, "sigmaHat" = sqrt(x = sig11 / nrow(x = data))) )

}


#------------------------------------------------------------------------------#
# tx effect estimator component of the sandwich estimator 
# outcome regression estimator
#
# REQUIRES
#   delta_OR()
#
# ASSUMPTIONS
#   outcome regression model is a linear model
#   tx is denoted by A and is binary coded as {0,1}
#
# INPUTS:
#  moOR : a modeling object specifying the outcome regression step
#         *** must be a linear model ***
#  data : a data.frame containing covariates and tx
#         *** tx must be named 'A' and coded 0/1 ***
#     y : a vector containing the outcome of interest
#
# OUTPUTS:
#  a list containing
#  delta : the list returned by delta_OR()
#     MM : M M^T matrix
#   dMdB : matrix of derivatives of M wrt beta
#------------------------------------------------------------------------------#
delta_OR_swv <- function(moOR, data, y) {

  # estimate tx effect
  delta <- delta_OR(moOR = moOR, data = data, y = y)

  # Q(X,0;betaHat)
  data$A <- 0L 
  Q0 <- drop(modelObj::predict(object = delta$fitOR, newdata = data))

  # derivative of Q(X,0;betaHat)
  dQ0 <- stats::model.matrix(object = modelObj::model(object = moOR), 
                             data = data)

  # Q(X,1;betaHat)
  data$A <- 1L 
  Q1 <- drop(modelObj::predict(object = delta$fitOR, newdata = data))

  # derivative of Q(X,1;betaHat)
  dQ1 <- stats::model.matrix(object = modelObj::model(object = moOR), 
                             data = data)

  # delta component of M MT
  mmDelta <- mean(x = {Q1 - Q0 - delta$deltaHat}^2)

  # derivative of delta component w.r.t beta
  dMdB <- colMeans(x = dQ1 - dQ0)

  return( list("delta" = delta, "MM" = mmDelta, "dMdB" = dMdB) )

}


#------------------------------------------------------------------------------#
# components of the sandwich estimator for an ordinary least squares estimation
#   of a linear regression model
#
# ASSUMPTIONS
#   mo is a linear model with parameters to be estimated using OLS
#
# INPUTS
#    mo : a modeling object specifying a regression step
#         *** must be a linear model ***
#  data : a data.frame containing covariates and tx
#     y : a vector containing the outcome of interest
#
#
# RETURNS
# a list containing
#     MM : M^T M component from OLS
#  invdM : inverse of the matrix of derivatives of M wrt beta
#------------------------------------------------------------------------------#
swv_OLS <- function(mo, data, y) {

  n <- nrow(x = data)

  fit <- modelObj::fit(object = mo, data = data, response = y)

  # Q(X,A;betaHat)
  Qa <- drop(x = modelObj::predict(object = fit, newdata = data))

  # derivative of Q(X,A;betaHat)
  dQa <- stats::model.matrix(object = modelObj::model(object = mo), 
                             data = data)

  # number of parameters in model
  p <- ncol(x = dQa)

  # OLS component of M
  mOLSi <- {y - Qa}*dQa

  # OLS component of M MT
  mmOLS <- sapply(X = 1L:n, 
                  FUN = function(i){ mOLSi[i,] %o% mOLSi[i,] }, 
                  simplify = "array")

  # derivative OLS component
  dmOLS <- - sapply(X = 1L:n, 
                    FUN = function(i){ dQa[i,] %o% dQa[i,] }, 
                    simplify = "array")

  # sum over individuals
  if (p == 1L) {
    mmOLS <- mean(x = mmOLS)
    dmOLS <- mean(x = dmOLS)
  } else {
    mmOLS <- rowMeans(x = mmOLS, dims = 2L)
    dmOLS <- rowMeans(x = dmOLS, dims = 2L)
  }

  # invert derivative OLS component
  invD <- solve(a = dmOLS) 

  return( list("MM" = mmOLS, "invdM" = invD) )

}
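
With all four functions defined, a minimal end-to-end check on synthetic data confirms that the pieces fit together; the covariate name Ch, the generating model, and the true effect of 20 below are illustrative only.

set.seed(2345)
n <- 500

# toy data; true tx effect is 10 + 0.05 E(Ch) = 20
toyOR <- data.frame(Ch = rnorm(n = n, mean = 200, sd = 30),
                    A  = rbinom(n = n, size = 1, prob = 0.5))
yToyOR <- 2.0 - 0.1 * toyOR$Ch + toyOR$A * {10.0 + 0.05 * toyOR$Ch} + 
          rnorm(n = n)

# correctly specified outcome regression model
qToy <- modelObj::buildModelObj(model = ~ Ch*A,
                                solver.method = 'lm',
                                predict.method = 'predict.lm')

delta_OR_se(moOR = qToy, data = toyOR, y = yToyOR)

The returned deltaHat should be near 20 up to sampling variation.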


Throughout Chapters 2-4, we consider three outcome regression models selected to represent a range of model (mis)specification. Note that we are not demonstrating a definitive analysis that one might do in practice, in which the analyst would use all sorts of variable selection techniques, etc., to arrive at a posited model. Our objective is to illustrate how the methods work under a range of model (mis)specification.


Click on each tab below to see the model and basic model diagnostic steps.



The first model is a completely misspecified model

\[ Q^{1}(x,a;\beta) = \beta_{0} + \beta_{1} \text{W} + \beta_{2} \text{Cr} + a~(\beta_{3} + \beta_{4} \text{Cr}). \]


Modeling Object


The parameters of this model will be estimated using ordinary least squares via R’s stats::lm() function. Predictions will be made using stats::predict.lm(), which by default returns predictions on the scale of the response variable.

The modeling object for this regression step is

q1 <- modelObj::buildModelObj(model = ~ W + A*Cr,
                              solver.method = 'lm',
                              predict.method = 'predict.lm')

Model Diagnostics


Though ultimately the regression steps will be performed within the implementations of the treatment effect and value estimators, we make use of modelObj::fit() to perform some preliminary model diagnostics.

For \(Q^{1}(x,a;\beta)\) the regression can be completed as follows

OR1 <- modelObj::fit(object = q1, data = dataSBP, response = y)
OR1 <- modelObj::fitObject(object = OR1)
OR1

Call:
lm(formula = YinternalY ~ W + A + Cr + A:Cr, data = data)

Coefficients:
(Intercept)            W            A           Cr         A:Cr  
   -6.66893      0.02784     16.46653      0.56324      2.41004  

where for convenience we have made use of modelObj’s fitObject() function to strip away the modeling object framework making OR1 an object of class “lm.”

Let’s examine the regression results for the model under consideration. First, the diagnostic plots defined for “lm” objects.

par(mfrow = c(2,2))
graphics::plot(x = OR1)

We see that the diagnostic plots for model \(Q^{1}(x,a;\beta)\) show some unusual behaviors. The two groupings of residuals in the Residuals vs Fitted and the Scale-Location plots reflect the fact that the model includes only the covariates W and Cr, neither of which is associated with outcome in the true regression relationship. Thus, for all practical purposes, the model is fitting the two treatment means.

Now, let’s examine the summary statistics for the regression step

summary(object = OR1)

Call:
lm(formula = YinternalY ~ W + A + Cr + A:Cr, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-35.911  -7.605  -0.380   7.963  35.437 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) -6.66893    4.67330  -1.427  0.15389   
W            0.02784    0.02717   1.025  0.30564   
A           16.46653    5.96413   2.761  0.00587 **
Cr           0.56324    5.07604   0.111  0.91167   
A:Cr         2.41004    7.22827   0.333  0.73889   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11.6 on 995 degrees of freedom
Multiple R-squared:  0.3853,    Adjusted R-squared:  0.3828 
F-statistic: 155.9 on 4 and 995 DF,  p-value: < 2.2e-16

We see that the residual standard error is large and that the adjusted R-squared value is small.

Readers familiar with R might have noticed that the response variable specified in the call to the regression method is YinternalY. This is an internal naming convention within package modelObj; it is understood to represent the outcome of interest \(y\).



The second model is an incomplete model having only one of the covariates of the true model,

\[ Q^{2}(x,a;\beta) = \beta_{0} + \beta_{1} \text{Ch} + a~(\beta_{2} + \beta_{3} \text{Ch}). \]


Modeling Object


As for \(Q^{1}(x,a;\beta)\), the parameters of this model will be estimated using ordinary least squares via R’s stats::lm() function. Predictions will be made using stats::predict.lm(), which by default returns predictions on the scale of the response variable.

The modeling object for this regression step is

q2 <- modelObj::buildModelObj(model = ~ Ch*A,
                              solver.method = 'lm',
                              predict.method = 'predict.lm')

Model Diagnostics


For \(Q^{2}(x,a;\beta)\) the regression can be completed as follows

OR2 <- modelObj::fit(object = q2, data = dataSBP, response = y)
OR2 <- modelObj::fitObject(object = OR2)
OR2

Call:
lm(formula = YinternalY ~ Ch + A + Ch:A, data = data)

Coefficients:
(Intercept)           Ch            A         Ch:A  
    36.5101      -0.2052     -89.5245       0.5074  

where for convenience we have made use of modelObj’s fitObject() function to strip away the modeling object framework making OR2 an object of class “lm.”

First, let’s examine the diagnostic plots defined for “lm” objects.

par(mfrow = c(2,4))
graphics::plot(x = OR2)
graphics::plot(x = OR1)

where we have included the diagnostic plots for model \(Q^{1}(x,a;\beta)\) for easy comparison. We see that the residuals from the fit of model \(Q^{2}(x,a;\beta)\) do not exhibit the two groupings, reflecting the fact that \(Q^{2}(x,a;\beta)\) is only partially misspecified in that it includes the important covariate Ch.

Now, let’s examine the summary statistics for the regression step

summary(object = OR2)

Call:
lm(formula = YinternalY ~ Ch + A + Ch:A, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-16.1012  -2.7476  -0.0032   2.6727  15.1825 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  36.510110   0.933019   39.13   <2e-16 ***
Ch           -0.205226   0.004606  -44.56   <2e-16 ***
A           -89.524507   1.471905  -60.82   <2e-16 ***
Ch:A          0.507374   0.006818   74.42   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.511 on 996 degrees of freedom
Multiple R-squared:  0.907, Adjusted R-squared:  0.9068 
F-statistic:  3239 on 3 and 996 DF,  p-value: < 2.2e-16

Comparing to the diagnostics for model \(Q^{1}(x,a;\beta)\), we see that the residual standard error is smaller (4.51 vs 11.6) and that the adjusted R-squared value is larger (0.91 vs 0.38). Both of these results indicate that model \(Q^{2}(x,a;\beta)\) is a more suitable model for \(E(Y|X=x,A=a)\) than model \(Q^{1}(x,a;\beta)\).



The third model we will consider is the correctly specified model used to generate the dataset,

\[ Q^{3}(x,a;\beta) = \beta_{0} + \beta_{1} \text{Ch} + \beta_{2} \text{K} + a~(\beta_{3} + \beta_{4} \text{Ch} + \beta_{5} \text{K}). \]


Modeling Object


As for \(Q^{1}(x,a;\beta)\) and \(Q^{2}(x,a;\beta)\), the parameters of this model will be estimated using ordinary least squares via R’s stats::lm() function. Predictions will be made using stats::predict.lm(), which by default returns predictions on the scale of the response variable.

The modeling object for this regression step is

q3 <- modelObj::buildModelObj(model = ~ (Ch + K)*A,
                              solver.method = 'lm',
                              predict.method = 'predict.lm')

Model Diagnostics


For \(Q^{3}(x,a;\beta)\) the regression can be completed as follows

OR3 <- modelObj::fit(object = q3, data = dataSBP, response = y)
OR3 <- modelObj::fitObject(object = OR3)
OR3

Call:
lm(formula = YinternalY ~ Ch + K + A + Ch:A + K:A, data = data)

Coefficients:
(Intercept)           Ch            K            A         Ch:A          K:A  
   -15.6048      -0.2035      12.2849     -61.0979       0.5048      -6.6099  

where for convenience we have made use of modelObj’s fitObject() function to strip away the modeling object framework making OR3 an object of class “lm.”

Again, let’s start by examining the diagnostic plots defined for “lm” objects.

par(mfrow = c(2,4))
graphics::plot(x = OR3)
graphics::plot(x = OR2)

where we have included the diagnostic plots for model \(Q^{2}(x,a;\beta)\) for easy comparison. We see that the results for models \(Q^{2}(x,a;\beta)\) and \(Q^{3}(x,a;\beta)\) are very similar, with model \(Q^{3}(x,a;\beta)\) yielding slightly better diagnostics (e.g., smaller residuals); a result in line with the knowledge that \(Q^{3}(x,a;\beta)\) is the model used to generate the data.

Now, let’s examine the summary statistics for the regression step

summary(object = OR3)

Call:
lm(formula = YinternalY ~ Ch + K + A + Ch:A + K:A, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.0371 -1.9376  0.0051  2.0127  9.6452 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -15.604845   1.636349  -9.536   <2e-16 ***
Ch           -0.203472   0.002987 -68.116   <2e-16 ***
K            12.284852   0.358393  34.278   <2e-16 ***
A           -61.097909   2.456637 -24.871   <2e-16 ***
Ch:A          0.504816   0.004422 114.168   <2e-16 ***
K:A          -6.609876   0.538386 -12.277   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.925 on 994 degrees of freedom
Multiple R-squared:  0.961, Adjusted R-squared:  0.9608 
F-statistic:  4897 on 5 and 994 DF,  p-value: < 2.2e-16

Comparing to the diagnostics for model \(Q^{2}(x,a;\beta)\), we see that the residual standard error is smaller (2.93 vs 4.51) and that the adjusted R-squared value is larger (0.96 vs 0.91). Again, these results indicate that model \(Q^{3}(x,a;\beta)\) is a more suitable model than both \(Q^{1}(x,a;\beta)\) and \(Q^{2}(x,a;\beta)\).



With the implementations defined, it is straightforward to estimate the treatment effect and the standard error for a variety of linear models. Throughout this chapter, we consider three outcome regression models selected to represent a range of model (mis)specification. Click on each tab below to see the respective analysis. The treatment effects and standard errors under all models are summarized under the heading Comparison.





The first model is a completely misspecified model

\[ Q^{1}(x,a;\beta) = \beta_{0} + \beta_{1} \text{W} + \beta_{2} \text{Cr} + a~(\beta_{3} + \beta_{4} \text{Cr}). \]

For \(Q^{1}(x,a;\beta)\) the estimated treatment effect and its standard error are obtained as follows:

q1 <- modelObj::buildModelObj(model = ~ W + A*Cr,
                              solver.method = 'lm',
                              predict.method = 'predict.lm')
delta_OR_se(moOR = q1, data = dataSBP, y = y)
$fitOR

Call:
lm(formula = YinternalY ~ W + A + Cr + A:Cr, data = data)

Coefficients:
(Intercept)            W            A           Cr         A:Cr  
   -6.66893      0.02784     16.46653      0.56324      2.41004  


$EY
        0         1 
-4.184763 14.255836 

$deltaHat
[1] 18.4406

$sigmaHat
[1] 0.7563737

We see that the estimated treatment effect is 18.44 (0.76) mmHg. Recall that the model diagnostic step indicated issues with this model; thus, these results may be misleading.



The second model is an incomplete model having only one of the covariates of the true model,

\[ Q^{2}(x,a;\beta) = \beta_{0} + \beta_{1} \text{Ch} + a~(\beta_{2} + \beta_{3} \text{Ch}). \]

The estimated treatment effect and its standard error are obtained as follows:

q2 <- modelObj::buildModelObj(model = ~ Ch*A,
                              solver.method = 'lm',
                              predict.method = 'predict.lm')
delta_OR_se(moOR = q2, data = dataSBP, y = y)
$fitOR

Call:
lm(formula = YinternalY ~ Ch + A + Ch:A, data = data)

Coefficients:
(Intercept)           Ch            A         Ch:A  
    36.5101      -0.2052     -89.5245       0.5074  


$EY
        0         1 
-6.432054 10.208093 

$deltaHat
[1] 16.64015

$sigmaHat
[1] 0.7561243

We see that the estimated treatment effect is 16.64 (0.76) mmHg, a value smaller than that obtained under the completely misspecified model, \(Q^{1}(x,a;\beta)\), and with a similar standard error.



The third model we consider is the correctly specified model used to generate the dataset,

\[ Q^{3}(x,a;\beta) = \beta_{0} + \beta_{1} \text{Ch} + \beta_{2} \text{K} + a~(\beta_{3} + \beta_{4} \text{Ch} + \beta_{5} \text{K}). \]

The estimated treatment effect and its standard error are obtained as follows:

q3 <- modelObj::buildModelObj(model = ~ (Ch + K)*A,
                              solver.method = 'lm',
                              predict.method = 'predict.lm')
delta_OR_se(moOR = q3, data = dataSBP, y = y)
$fitOR

Call:
lm(formula = YinternalY ~ Ch + K + A + Ch:A + K:A, data = data)

Coefficients:
(Intercept)           Ch            K            A         Ch:A          K:A  
   -15.6048      -0.2035      12.2849     -61.0979       0.5048      -6.6099  


$EY
        0         1 
-6.458372 10.244167 

$deltaHat
[1] 16.70254

$sigmaHat
[1] 0.7282849

We see that the estimated treatment effect is 16.70 (0.73) mmHg, a value similar to that obtained under the previously discussed incomplete model, \(Q^{2}(x,a;\beta)\), and with a similar standard error.



In the table below, we see that the estimated treatment effect under these three models ranges from 16.64 mmHg to 18.44 mmHg.

(mmHg) \(Q^{1}(x,a;\beta)\) \(Q^{2}(x,a;\beta)\) \(Q^{3}(x,a;\beta)\)
\(\widehat{\delta}^{\text{*}}_{OR}\) 18.44 (0.76) 16.64 (0.76) 16.70 (0.73)

Stratification Estimator





Following the suggestion of Rosenbaum and Rubin (1983, 1984), we can estimate \(\delta^{\text{*}}\) by stratifying individuals into \(K\) groups (\(K>1\)) based on the values of the estimated propensity score \(\pi(x;\widehat{\gamma})\).


Groups are defined using cut-off values such that \(0 = c_0 \lt c_1 \lt \cdots \lt c_K = 1\). Individual \(i\) belongs to group \(j\) if \(c_{j-1} \lt \pi(X_{i}; \widehat{\gamma}) \le c_j\); we denote group membership as \[ g_{ji} = \text{I}\left\{c_{j-1} < \pi(X_{i};\widehat{\gamma}) \le c_{j}\right\}; \] that is, \(g_{ji}\) is 1 if the propensity score for individual \(i\) satisfies \(c_{j-1} < \pi(X_{i};\widehat{\gamma}) \le c_{j}\) and is thereby in group \(j\), and \(g_{ji}\) is 0 otherwise.


Within each group, we can define sample average outcomes among individuals in the group receiving treatment 1 and among those receiving treatment 0 as \[ \overline{Y}_{1j} = \frac{\frac{1}{n_{j}}\sum_{i = 1}^{n} A_{i}~Y_{i}~g_{ji}}{\frac{1}{n_{j}}\sum_{k=1}^{n} A_{k}~g_{jk}} \quad \textrm{and} \quad \overline{Y}_{0j} = \frac{\frac{1}{n_{j}}\sum_{i = 1}^{n}(1-A_{i})~Y_{i}~g_{ji}}{\frac{1}{n_{j}}\sum_{k=1}^{n} (1-A_{k})~g_{jk}}. \]

The treatment effect for group \(j\) is written as \[ \widehat{\delta}^{\text{*}}_{Sj} = \overline{Y}_{1j} - \overline{Y}_{0j}. \]

The stratification estimator for the treatment effect is the weighted sum of \(\widehat{\delta}^{\text{*}}_{Sj}\)

\[ \widehat{\delta}^{\text{*}}_{S} = \sum_{j=1}^{K} \widehat{\delta}^{\text{*}}_{Sj}~(n_j/n) = \sum_{j=1}^{K}(\overline{Y}_{1j} - \overline{Y}_{0j})~(n_j/n), \]

where \(n_{j}\) is the number of individuals in the \(j^{th}\) group.

Cut-off values are chosen such that an approximately equal number of individuals are in each stratum.
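
For example, a minimal sketch of this construction, where pScore is a placeholder for the vector of estimated propensity scores \(\pi(X_{i};\widehat{\gamma})\):

# empirical quantiles as cut-offs; force the boundary values to 0 and 1
K <- 5L
probs <- seq(from = 0.0, to = 1.0, length.out = K + 1L)
coff <- stats::quantile(x = pScore, probs = probs)
coff[1L] <- 0.0
coff[K + 1L] <- 1.0

# group membership: j such that c_{j-1} < pScore_i <= c_j
grp <- cut(x = pScore, breaks = coff, labels = FALSE)

The same quantile-based cut-off construction is used within delta_S() below.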



We estimate the standard error using the pooled estimator: \[ \text{SE}_{pooled} = \frac{1}{\sqrt{n}}\left(\frac{1}{K}\sum_{j=1}^{K} \widehat{\sigma}_{j}^2\right)^{1/2}, \] where \(\widehat{\sigma}_{j}^2\) is the estimated variance for group \(j\). For the stratification estimator, the variance is defined as

\[ \widehat{\sigma}^{2}_{j} = \frac{1}{n_{j}} \sum_{i=1}^{n} \left(\widehat{\delta}^{\text{*}}_{Sj,i} - \widehat{\delta}^{\text{*}}_{Sj}\right)^2 g_{ji}, \] where \(\widehat{\delta}^{\text{*}}_{Sj,i} = A_{i} Y_{i}/\overline{A}_{j} - (1-A_{i}) Y_{i}/(1-\overline{A}_{j})\), \(\overline{A}_{j}\) is the proportion of individuals in group \(j\) that received treatment 1, and \(g_{ji}\) is, as before, the indicator of membership in group \(j\).



We have defined two functions to estimate the treatment effect and the standard error.

  • delta_S_se(): estimates the treatment effect using the stratification estimator and uses the pooled estimator to estimate its standard error.
  • delta_S(): stratification estimator of the treatment effect.


#------------------------------------------------------------------------------#
# stratification estimator of tx effect
#
# ASSUMPTIONS
#   tx is denoted by A and is binary coded as {0,1}
#
# INPUTS
#  moPS : a modeling object specifying the propensity score regression step
#  data : a data.frame containing covariates and tx
#         *** tx must be named 'A' and coded 0/1 ***
#     y : a vector containing the outcome of interest
#     K : the integer number of strata
#
# RETURNS
#  a list containing
#      fitPS : a modelObjFit object containing the results of the
#              propensity score regression step
#         cj : the propensity score boundaries defining the strata
#        grp : the number of individuals in each stratum
#         EY : the sample mean of the outcome under each tx option
#  deltaHatj : the stratification estimator for the tx effect for each strata
#   deltaHat : the stratification estimator for the tx effect
#------------------------------------------------------------------------------#
delta_S <- function(moPS, data, y, K) {

  #### Propensity Score Regression

  fitPS <- modelObj::fit(object = moPS, data = data, response = data$A)

  # estimated propensity score
  p1 <- modelObj::predict(object = fitPS, newdata = data) 
  if (is.matrix(x = p1)) p1 <- p1[,ncol(x = p1), drop = TRUE]

  #### Identify Strata

  # cutoff points for K groups 
  probs <- seq(from = 0.0, to = 1.0, length.out = K+1L)
  coff <- stats::quantile(x = p1, probs = probs) 
  coff[1L] <- 0.0
  coff[K+1L] <- 1.0

  # group ids for each individual
  grp <- colSums(x = sapply(X = p1, FUN = function(x){x <= coff}))

  #### Treatment Effect

  EY <- matrix(data = 0.0, nrow = K, ncol = 2L, 
               dimnames = list(NULL, c("0","1")))

  delta <- 0.0

  deltaj <- array(data = 0.0, dim = K, dimnames = list(1L:K))

  for (j in 1L:K) {

    gji <- grp == j
    
    EY[j,2L] <- mean(x = data$A[gji] * y[gji]) / mean(x = data$A[gji])
    EY[j,1L] <- mean({1.0-data$A[gji]} * y[gji]) / mean(x = 1.0 - data$A[gji])

    deltaj[j] <- EY[j,2L] - EY[j,1L]

    delta <- delta + sum(gji) / nrow(x = data) * deltaj[j]
  }

  delta <- unname(obj = delta)

  return( list(    "fitPS" = fitPS,
                      "cj" = coff,
                     "grp" = table(bin = grp),
                      "EY" = EY,
               "deltaHatj" = deltaj,
                "deltaHat" = delta) )

}


#------------------------------------------------------------------------------#
# stratification estimator of tx effect and its standard error
#
# REQUIRES
#   delta_S()
#
# ASSUMPTIONS
#   tx is denoted by A and is binary coded as {0,1}
#
# INPUTS
#  moPS : a modeling object specifying the propensity score regression step
#  data : a data.frame containing covariates and tx
#         *** tx must be named 'A' and coded 0/1 ***
#     y : a vector containing the outcome of interest
#     K : the integer number of strata
#
# RETURNS
#  a list containing
#      fitPS : a modelObjFit object containing the results of the
#              propensity score regression step
#         cj : the propensity score boundaries defining the strata
#        grp : the number of individuals in each stratum
#         EY : the sample mean of the outcome under each tx option
#  deltaHatj : the stratification estimator for the tx effect for each strata
#   deltaHat : the stratification estimator for the tx effect
#   sigmaHat : the estimated standard error
#------------------------------------------------------------------------------#
delta_S_se <- function(moPS, data, y, K) {

  #### Treatment Effect

  delta <- delta_S(moPS = moPS, data = data, y = y, K = K)

  #### Standard Error

  # estimated propensity score
  p1 <- modelObj::predict(object = delta$fitPS, newdata = data) 
  if (is.matrix(x = p1)) p1 <- p1[,ncol(x = p1), drop = TRUE]

  # group ids for each individual
  grp <- colSums(x = sapply(X = p1, FUN = function(x){x <= delta$cj}))

  sigmaSq <- 0.0

  for (j in 1L:K) {

    gji <- grp == j
    
    Y1ji <- data$A[gji] * y[gji] / mean(x = data$A[gji])
    Y0ji <- {1.0-data$A[gji]} * y[gji] / mean(x = 1.0 - data$A[gji])

    deltaji <- {Y1ji - Y0ji}

    sigmaSq <- sigmaSq + mean(x = {deltaji - delta$deltaHatj[j]}^2)
  }

  sigmaSq <- sigmaSq / K

  return( c(delta, "sigmaHat" = sqrt(x = sigmaSq / nrow(x = data))) )
                
}
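
As with the earlier estimators, a minimal synthetic check confirms the implementation runs end-to-end; the confounded generating model and true effect of 10 below are illustrative only.

set.seed(5678)
n <- 2000

# confounded toy data: tx assignment and outcome both depend on X1
toyPS <- data.frame(X1 = rnorm(n = n))
toyPS$A <- rbinom(n = n, size = 1, prob = plogis(q = 0.5 * toyPS$X1))
yToyPS <- 2.0 * toyPS$X1 + 10.0 * toyPS$A + rnorm(n = n)

# correctly specified propensity score model
pToy <- modelObj::buildModelObj(model = ~ X1,
                                solver.method = 'glm',
                                solver.args = list(family = 'binomial'),
                                predict.method = 'predict.glm',
                                predict.args = list(type = 'response'))

delta_S_se(moPS = pToy, data = toyPS, y = yToyPS, K = 5L)

The returned deltaHat should be near 10; with \(K = 5\) strata, most, though not necessarily all, of the confounding bias is removed.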


Throughout Chapters 2-4, we consider three propensity score models selected to represent a range of model (mis)specification. Note that we are not demonstrating a definitive analysis that one might do in practice, in which the analyst would use all sorts of variable selection techniques, etc., to arrive at a posited model. Our objective is to illustrate how the methods work under a range of model (mis)specification.


Click on each tab below to see the model and basic model diagnostic steps.



The first model is a completely misspecified model having none of the covariates used in the data generating model

\[ \pi^{1}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1}~\text{W} + \gamma_{2}~\text{Cr})}{1+\exp(\gamma_{0} + \gamma_{1}~\text{W} + \gamma_{2}~\text{Cr})}. \]


Modeling Object


The parameters of this model will be estimated using maximum likelihood via R’s stats::glm() function. Predictions will be made using stats::predict.glm(), which by default returns predictions on the scale of the linear predictors. We will see in the coming sections that this is not the most convenient scale, so we opt to include a modification to the default input argument of stats::predict.glm() to return predictions on the scale of the response variable, i.e., the probabilities. The modeling object for this model is specified as

p1 <- modelObj::buildModelObj(model = ~ W + Cr,
                              solver.method = 'glm',
                              solver.args = list(family='binomial'),
                              predict.method = 'predict.glm',
                              predict.args = list(type='response'))

Model Diagnostics


Though we will implement our treatment effect and value estimators in such a way that the regression step is performed internally, it is convenient to do model diagnostics separately here. We will make use of modelObj::fit() to complete the regression step before considering individual treatment effect estimators.



For \(\pi^{1}(x;\gamma)\) the regression can be completed as follows:

PS1 <- modelObj::fit(object = p1, data = dataSBP, response = dataSBP$A)
PS1 <- modelObj::fitObject(object = PS1)
PS1

Call:  glm(formula = YinternalY ~ W + Cr, family = "binomial", data = data)

Coefficients:
(Intercept)            W           Cr  
   0.966434    -0.007919    -0.703766  

Degrees of Freedom: 999 Total (i.e. Null);  997 Residual
Null Deviance:      1378 
Residual Deviance: 1374     AIC: 1380

where for convenience we have made use of modelObj’s fitObject() function to strip away the modeling object framework making PS1 an object of class “glm.”

Now, let’s examine the regression results for the model under consideration.

summary(object = PS1)

Call:
glm(formula = YinternalY ~ W + Cr, family = "binomial", data = data)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.239  -1.104  -1.027   1.248   1.443  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)  
(Intercept)  0.966434   0.624135   1.548   0.1215  
W           -0.007919   0.004731  -1.674   0.0942 .
Cr          -0.703766   0.627430  -1.122   0.2620  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1377.8  on 999  degrees of freedom
Residual deviance: 1373.8  on 997  degrees of freedom
AIC: 1379.8

Number of Fisher Scoring iterations: 4

First, in comparing the null deviance (1377.8) and the residual deviance (1373.8), we see that including the independent variables does not significantly reduce the deviance. Thus, this model is not significantly better than the constant propensity score model. However, we know that the data mimic an observational study for which the propensity score is neither constant nor known. Note also that the Akaike Information Criterion (AIC) is large (1379.772). Though the AIC value alone does not tell us much about the quality of our model, we can compare it across models to assess relative quality.



The second model is an incomplete model having only one of the covariates of the true data generating model

\[ \pi^{2}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1} \text{Ch})}{1+\exp(\gamma_{0} + \gamma_{1} \text{Ch})}. \]


Modeling Object


As for \(\pi^{1}(x;\gamma)\), the parameters of this model will be estimated using maximum likelihood via R’s stats::glm() function. For convenience in later method implementations, we will again require that the predictions be returned on the scale of the probability.

The modeling object for this regression step is

p2 <- modelObj::buildModelObj(model = ~ Ch,
                              solver.method = 'glm',
                              solver.args = list(family='binomial'),
                              predict.method = 'predict.glm',
                              predict.args = list(type='response'))

The regression is completed as follows:

PS2 <- modelObj::fit(object = p2, data = dataSBP, response = dataSBP$A)
PS2 <- modelObj::fitObject(object = PS2)
PS2

Call:  glm(formula = YinternalY ~ Ch, family = "binomial", data = data)

Coefficients:
(Intercept)           Ch  
   -3.06279      0.01368  

Degrees of Freedom: 999 Total (i.e. Null);  998 Residual
Null Deviance:      1378 
Residual Deviance: 1298     AIC: 1302

Again, we use summary() to examine statistics about the fit.

summary(PS2)

Call:
glm(formula = YinternalY ~ Ch, family = "binomial", data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7497  -1.0573  -0.7433   1.1449   1.9316  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -3.062786   0.348085  -8.799   <2e-16 ***
Ch           0.013683   0.001617   8.462   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1377.8  on 999  degrees of freedom
Residual deviance: 1298.2  on 998  degrees of freedom
AIC: 1302.2

Number of Fisher Scoring iterations: 4

First, in comparing the null deviance (1377.8) and the residual deviance (1298.2), we see that including the independent variable leads to a smaller deviance than that obtained from the intercept-only model. Finally, we see that the AIC is large (1302.247), but it is less than that obtained for \(\pi^{1}(x;\gamma)\) (1379.772). This result is not unexpected, as we know that the model is only partially misspecified.



The third model we will consider is the correctly specified model used to generate the data set,

\[ \pi^{3}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1}~\text{SBP0} + \gamma_{2}~\text{Ch})}{1+\exp(\gamma_{0} + \gamma_{1}~\text{SBP0}+ \gamma_{2}~\text{Ch})}. \]

The parameters of this model will be estimated using maximum likelihood via R’s stats::glm() function. Predictions will be made using stats::predict.glm(), and we include a modification to the default input argument of stats::predict.glm() to ensure that predictions are returned on the scale of the probability.

The modeling object for this regression step is

p3 <- modelObj::buildModelObj(model = ~ SBP0 + Ch,
                              solver.method = 'glm',
                              solver.args = list(family='binomial'),
                              predict.method = 'predict.glm',
                              predict.args = list(type='response'))

The regression is completed as follows:

PS3 <- modelObj::fit(object = p3, data = dataSBP, response = dataSBP$A)
PS3 <- modelObj::fitObject(object = PS3)
PS3

Call:  glm(formula = YinternalY ~ SBP0 + Ch, family = "binomial", data = data)

Coefficients:
(Intercept)         SBP0           Ch  
  -15.94153      0.07669      0.01589  

Degrees of Freedom: 999 Total (i.e. Null);  997 Residual
Null Deviance:      1378 
Residual Deviance: 1162     AIC: 1168

Again, we use summary() to examine statistics about the fit.

summary(PS3)

Call:
glm(formula = YinternalY ~ SBP0 + Ch, family = "binomial", data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.3891  -0.9502  -0.4940   0.9939   2.1427  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -15.941527   1.299952 -12.263   <2e-16 ***
SBP0          0.076687   0.007196  10.657   <2e-16 ***
Ch            0.015892   0.001753   9.066   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1377.8  on 999  degrees of freedom
Residual deviance: 1161.6  on 997  degrees of freedom
AIC: 1167.6

Number of Fisher Scoring iterations: 3

First, we see from the null deviance (1377.8) and the residual deviance (1161.6) that including the independent variables does reduce the deviance as compared to the intercept-only model. Finally, we see that the AIC is large (1167.621), but it is less than that obtained for both \(\pi^{1}(x;\gamma)\) (1379.772) and \(\pi^{2}(x;\gamma)\) (1302.247). This result is in line with the knowledge that this is the data generating model.



To illustrate, we assume \(K=5\) and evaluate the estimator and standard error for each of the propensity score regression models below.



For model \(\pi^{1}(x;\gamma)\), the completely misspecified model,

delta_S_se(moPS = p1, data = dataSBP, y = y, K = 5)
$fitPS

Call:  glm(formula = YinternalY ~ W + Cr, family = "binomial", data = data)

Coefficients:
(Intercept)            W           Cr  
   0.966434    -0.007919    -0.703766  

Degrees of Freedom: 999 Total (i.e. Null);  997 Residual
Null Deviance:      1378 
Residual Deviance: 1374     AIC: 1380

$cj
       0%       20%       40%       60%       80%      100% 
0.0000000 0.4282948 0.4469531 0.4623706 0.4795282 1.0000000 

$grp
bin
  1   2   3   4   5 
200 198 202 200 200 

$EY
             0        1
[1,] -4.428571 14.65686
[2,] -5.218487 13.32911
[3,] -4.838384 14.85437
[4,] -3.425926 13.91304
[5,] -3.032787 14.08974

$deltaHatj
       1        2        3        4        5 
19.08543 18.54760 19.69275 17.33897 17.12253 

$deltaHat
[1] 18.35975

$sigmaHat
[1] 0.8430623

The estimated treatment effect is 18.36 (0.84) mmHg, which is larger than the true treatment effect. Notice the tight range of propensity score cutoffs ($cj). This is consistent with the fact that this model is misspecified; at a minimum, it indicates that the \(K=5\) stratification estimator under this model is not appropriate.



For model \(\pi^{2}(x;\gamma)\), the partial model,

delta_S_se(moPS = p2, data = dataSBP, y = y, K = 5)
$fitPS

Call:  glm(formula = YinternalY ~ Ch, family = "binomial", data = data)

Coefficients:
(Intercept)           Ch  
   -3.06279      0.01368  

Degrees of Freedom: 999 Total (i.e. Null);  998 Residual
Null Deviance:      1378 
Residual Deviance: 1298     AIC: 1302

$cj
       0%       20%       40%       60%       80%      100% 
0.0000000 0.3296184 0.4120421 0.4907518 0.5792204 1.0000000 

$grp
bin
  1   2   3   4   5 
199 201 200 200 200 

$EY
              0         1
[1,] -19.085714 29.620155
[2,] -10.212766 17.504673
[3,]  -6.640351 10.244186
[4,]  -1.420168  3.061728
[5,]   6.362416 -7.176471

$deltaHatj
         1          2          3          4          5 
 48.705869  27.717439  16.884537   4.481896 -13.538887 

$deltaHat
[1] 16.82918

$sigmaHat
[1] 0.4056475

The estimated treatment effect is 16.83 (0.41) mmHg, which is smaller than the true treatment effect.



And for model \(\pi^{3}(x;\gamma)\), the correctly specified model,

delta_S_se(moPS = p3, data = dataSBP, y = y, K = 5)
$fitPS

Call:  glm(formula = YinternalY ~ SBP0 + Ch, family = "binomial", data = data)

Coefficients:
(Intercept)         SBP0           Ch  
  -15.94153      0.07669      0.01589  

Degrees of Freedom: 999 Total (i.e. Null);  997 Residual
Null Deviance:      1378 
Residual Deviance: 1162     AIC: 1168

$cj
       0%       20%       40%       60%       80%      100% 
0.0000000 0.2379388 0.3743752 0.5098823 0.6679042 1.0000000 

$grp
bin
  1   2   3   4   5 
200 200 200 200 200 

$EY
              0          1
[1,] -13.500000 21.5687500
[2,]  -9.212766 15.9433962
[3,]  -6.278846 10.0625000
[4,]  -3.923077  6.5263158
[5,]   2.096970 -0.6285714

$deltaHatj
        1         2         3         4         5 
35.068750 25.156162 16.341346 10.449393 -2.725541 

$deltaHat
[1] 16.85802

$sigmaHat
[1] 0.7607441

The estimated treatment effect is 16.86 (0.76) mmHg, which is smaller than the true treatment effect.



In the table below, we see that the estimated treatment effect under these three models ranges from 16.83 mmHg to 18.36 mmHg.

(mmHg) \(\pi^{1}(x;\gamma)\) \(\pi^{2}(x;\gamma)\) \(\pi^{3}(x;\gamma)\)
\(\widehat{\delta}^{\text{*}}_{S}\) 18.36 (0.84) 16.83 (0.41) 16.86 (0.76)


IPW Estimator





The inverse probability weighted (IPW) estimator of the treatment effect, \(\widehat{\delta}^{\text{*}}_{IPW}\), is defined as

\[ \widehat{\delta}^{\text{*}}_{IPW} = n^{-1} \sum_{i=1}^{n} \left[ \frac{A_{i} Y_{i}}{\pi(X_i; \widehat{\gamma})} - \frac{(1-A_{i}) Y_{i}}{\{1-\pi(X_i; \widehat{\gamma})\}} \right], \] where \(\pi(x; \gamma)\) is a parametric model for the propensity score \(\pi(x) = P(A=1|X=x)\) and \(\widehat{\gamma}\) is a suitable estimator.
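
The motivation for the weighting is as follows. Because \(A Y = A~Y^{*}(1)\), where \(Y^{*}(1)\) denotes the outcome an individual would exhibit if, possibly contrary to fact, they received treatment 1, the assumption of no unmeasured confounders implies

\[ E\left\{\frac{A Y}{\pi(X)}\right\} = E\left[\frac{Y^{*}(1)}{\pi(X)}~E\left\{A \mid X, Y^{*}(1)\right\}\right] = E\left\{Y^{*}(1)\right\}, \]

with an analogous argument for the second term, so that the difference of the two inverse weighted averages estimates \(\delta^{\text{*}}\).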



For treatment coded as \(A \in \{0,1\}\), the estimator \(\widehat{\delta}^{\text{*}}_{IPW}\), and the maximum likelihood (ML) estimator \(\widehat{\gamma}\), the \((1,1)\) component of the sandwich estimator \(\widehat{\Sigma}\) is

\[ \widehat{\Sigma}_{11} = A_n + (C^{\intercal}_n~D^{-1}_n)~B_{n}~(C^{\intercal}_n~D^{-1}_n)^{\intercal}, \]

where \(A_{n} = n^{-1} \sum_{i=1}^{n} A_{ni}^2\) and \(B_{n} = n^{-1} \sum_{i=1}^{n} B_{ni} B_{ni}^{\intercal}\) with

\[ \begin{aligned} A_{ni} &= \frac{A_{i} Y_{i}}{\pi(X_i; \widehat{\gamma})} - \frac{(1-A_{i}) Y_{i}}{\{1-\pi(X_i; \widehat{\gamma})\}} - \widehat{\delta}^{\text{*}}_{IPW}, \\ ~\\ B_{ni} &= \frac{ A_i - \pi(X_i;\widehat{\gamma})}{\pi(X_i; \widehat{\gamma})\{1-\pi(X_i; \widehat{\gamma})\}} \frac{\partial \pi(X_{i}; \widehat{\gamma})}{\partial \gamma}. \end{aligned} \]

The derivative matrices are

\[ C_n = n^{-1}\sum_{i=1}^{n}\frac{\partial A_{ni}}{\partial {\gamma}} \quad \text{and} \quad D_n = n^{-1}\sum_{i=1}^{n}\frac{\partial B_{ni}}{\partial {\gamma}^{\intercal}}. \]
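
For the logistic model used here, \(\partial \pi(x;\gamma)/\partial \gamma = \pi(x;\gamma)\{1-\pi(x;\gamma)\}~\widetilde{x}\), where \(\widetilde{x}\) denotes the vector of model terms, so the ML score contribution simplifies to

\[ B_{ni} = \left\{A_{i} - \pi(X_{i};\widehat{\gamma})\right\}\widetilde{X}_{i} \quad \text{and} \quad D_n = -n^{-1}\sum_{i=1}^{n} \pi(X_{i};\widehat{\gamma})\{1-\pi(X_{i};\widehat{\gamma})\}~\widetilde{X}_{i}\widetilde{X}_{i}^{\intercal}. \]

These are the quantities computed by swv_ML() below.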



Anticipating later methods, we break the implementation of the treatment effect estimator and the sandwich variance estimator into multiple functions, each completing a specific component.

  • delta_IPW_se() : estimates the treatment effect using the IPW estimator and estimates its standard error using the sandwich variance estimator.
  • delta_IPW_swv(): \(\widehat{\delta}^{\text{*}}_{IPW}\) component of the M-estimating equation and its derivatives, i.e., \(A_n\) and \(C^{\intercal}_n\) of the sandwich variance expression.
  • delta_IPW(): IPW estimator of the treatment effect.
  • swv_ML(): ML contributions to the sandwich estimator and its derivatives, i.e., \(B_n\) and \(D^{-1}_n\) of the sandwich variance expression.

This is not the most efficient implementation of the variance estimator. For example, the regression step is completed twice (once in delta_IPW_swv() and once in swv_ML()). However, our objective is to provide general, reusable code. Function swv_ML() will be reused later in this chapter as well as in Chapter 3.



#------------------------------------------------------------------------------#
# IPW estimator of tx effect 
#
# ASSUMPTIONS
#   tx is denoted by A and is binary coded as {0,1}
#
# INPUTS
#  moPS : a modeling object specifying the propensity score regression step
#  data : a data.frame containing covariates and tx
#     y : a vector containing the outcome of interest
#
# RETURNS
#  a list containing
#     fitPS : a modelObjFit object containing the results of the
#             propensity score regression step
#        EY : the sample mean of the outcome under each tx option
#  deltaHat : the IPW estimator for the tx effect
#------------------------------------------------------------------------------#
delta_IPW <- function(moPS, data, y) {

  #### Propensity Score

  fitPS <- modelObj::fit(object = moPS, data = data, response = data$A)

  # estimated propensity score
  p1 <- modelObj::predict(object = fitPS, newdata = data) 
  if (is.matrix(x = p1)) p1 <- p1[,ncol(x = p1), drop = TRUE]

  #### Treatment Effect
  
  EY <- array(data = 0.0, dim = 2L, dimnames = list(c("0","1")))
  EY[1L] <- mean(x = {1.0 - data$A} * y / {1.0 - p1})
  EY[2L] <- mean(x = data$A * y / p1) 
  delta <- unname(obj = EY[2L] - EY[1L])

  return( list(   "fitPS" = fitPS,
                     "EY" = EY,
               "deltaHat" = delta) )
}


#------------------------------------------------------------------------------#
# IPW estimator of the tx effect and its standard error
#
# REQUIRES
#   swv_ML() and delta_IPW_swv()
#
# ASSUMPTIONS
#   tx is denoted by A and is binary coded as {0,1}
#   propensity score regression model is a logistic model and parameters are
#     estimated using ML
#
# INPUTS
#  moPS : a modeling object specifying the propensity score regression step
#         *** must be a logistic model ***
#  data : a data.frame containing covariates and tx
#         *** tx must be named 'A' and coded 0/1 ***
#     y : a vector containing the outcome of interest
#
# RETURNS
#  a list containing
#     fitPS : a modelObjFit object containing the results of the
#             propensity score regression step
#        EY : the sample mean of the outcome under each tx option
#  deltaHat : the IPW estimator for the tx effect
#  sigmaHat : the estimated standard error
#------------------------------------------------------------------------------#
delta_IPW_se <- function(moPS, data, y){

  #### ML components
  ML <- swv_ML(mo = moPS, data = data, y = data$A) 

  #### estimator components
  IPW <- delta_IPW_swv(moPS = moPS, data = data, y = y)

  #### 1,1 Component of Sandwich Estimator

  # ML contribution
  temp <- IPW$dMdG %*% ML$invdM
  sig11ML <- temp %*% ML$MM %*% t(x = temp)

  sig11 <- drop(x = IPW$MM + sig11ML)

  return( c(IPW$delta, "sigmaHat" = sqrt(x = sig11 / nrow(x = data))) )

}


#------------------------------------------------------------------------------#
# tx effect estimator component of the sandwich estimator IPW estimator
#
# REQUIRES
#   delta_IPW()
#
# ASSUMPTIONS
#   propensity score regression model is a logistic model
#   tx is denoted by A and is binary coded as {0,1}
#
# INPUTS:
#  moPS : a modeling object specifying the propensity score regression step
#         *** must be a logistic model ***
#  data : a data.frame containing covariates and tx
#         *** tx must be named 'A' and coded 0/1 ***
#     y : a vector containing the outcome of interest
#
# OUTPUTS:
#  a list containing
#  delta : the list returned by delta_IPW()
#     MM : M M^T matrix
#   dMdG : matrix of derivatives of M wrt gamma
#------------------------------------------------------------------------------#
delta_IPW_swv <- function(moPS, data, y) {

  # estimate treatment effect
  delta <- delta_IPW(moPS = moPS, data = data, y = y)

  # pi(x; gamma)
  p1 <- modelObj::predict(object = delta$fitPS, newdata = data)
  if (is.matrix(x = p1)) p1 <- p1[,ncol(x = p1), drop = TRUE]

  # model.matrix for propensity model
  piMM <- stats::model.matrix(object = modelObj::model(object = moPS), 
                              data = data)

  # delta component of M MT
  mmDelta <- mean(x = {data$A * y / p1 - {1.0-data$A} * y / {1.0-p1} - 
                       delta$deltaHat}^2)

  # derivative w.r.t. gamma
  dMdG <- -{ data$A * y * {1.0-p1} / p1 +
            {1.0-data$A} * y * p1 / {1.0-p1} } * piMM
  dMdG <- colMeans(x = dMdG)

  return( list("delta" = delta,
                  "MM" = mmDelta,
                "dMdG" = dMdG) )

}


#------------------------------------------------------------------------------#
# components of the sandwich estimator for a maximum likelihood estimation of 
#   a logistic regression model
#
# ASSUMPTIONS
#   mo is a logistic model with parameters to be estimated using ML
#
# INPUTS
#    mo : a modeling object specifying a regression step
#         *** must be a logistic model ***
#  data : a data.frame containing covariates and tx
#     y : a vector containing the binary outcome of interest
#
#
# RETURNS
# a list containing
#     MM : M^T M component from ML
#  invdM : inverse of the matrix of derivatives of M wrt gamma
#------------------------------------------------------------------------------#
swv_ML <- function(mo, data, y) {

  # regression step
  fit <- modelObj::fit(object = mo, data = data, response = y)

  # yHat
  p1 <- modelObj::predict(object = fit, newdata = data) 
  if (is.matrix(x = p1)) p1 <- p1[,ncol(x = p1), drop = TRUE]

  # model.matrix for model
  piMM <- stats::model.matrix(object = modelObj::model(object = mo), 
                              data = data)

  n <- nrow(x = piMM)
  p <- ncol(x = piMM)

  # ML M-estimator component
  mMLi <- {y - p1} * piMM

  # ML component of M MT
  mmML <- sapply(X = 1L:n, 
                 FUN = function(i){ mMLi[i,] %o% mMLi[i,] }, 
                 simplify = "array")

  # derivative of ML component
  dFun <- function(i){ 
            - piMM[i,] %o% piMM[i,] * p1[i] * {1.0 - p1[i]}
           } 
  dmML <- sapply(X = 1L:n, FUN = dFun, simplify = "array")

  if( p == 1L ) {
    mmML <- mean(x = mmML)
    dmML <- mean(x = dmML)
  } else {
    mmML <- rowMeans(x = mmML, dims = 2L)
    dmML <- rowMeans(x = dmML, dims = 2L)
  }

  # invert derivative ML component
  invD <- solve(a = dmML) 

  return( list("MM" = mmML, "invdM" = invD) )

}
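
As a final synthetic check, the IPW estimator can be applied to the same confounded toy data (toyPS, yToyPS) and correctly specified propensity model (pToy) constructed for the stratification estimator above:

delta_IPW_se(moPS = pToy, data = toyPS, y = yToyPS)

With the propensity model correctly specified, the returned deltaHat should again be near 10 up to sampling variation.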


The propensity score models used here are those discussed for the stratification estimator. We repeat them here for convenience.

Throughout Chapters 2-4, we consider three propensity score models selected to represent a range of model (mis)specification. Note that we are not demonstrating a definitive analysis that one might do in practice, in which the analyst would use all sorts of variable selection techniques, etc., to arrive at a posited model. Our objective is to illustrate how the methods work under a range of model (mis)specification.


Click on each tab below to see the model and basic model diagnostic steps.



The first model is a completely misspecified model having none of the covariates used in the data generating model

\[ \pi^{1}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1}~\text{W} + \gamma_{2}~\text{Cr})}{1+\exp(\gamma_{0} + \gamma_{1}~\text{W} + \gamma_{2}~\text{Cr})}. \]


Modeling Object


The parameters of this model will be estimated using maximum likelihood via R’s stats::glm() function. Predictions will be made using stats::predict.glm(), which by default returns predictions on the scale of the linear predictors. We will see in the coming sections that this is not the most convenient scale, so we opt to include a modification to the default input argument of stats::predict.glm() to return predictions on the scale of the response variable, i.e., the probabilities. The modeling object for this model is specified as

p1 <- modelObj::buildModelObj(model = ~ W + Cr,
                              solver.method = 'glm',
                              solver.args = list(family='binomial'),
                              predict.method = 'predict.glm',
                              predict.args = list(type='response'))

Model Diagnostics


Though we will implement our treatment effect and value estimators in such a way that the regression step is performed internally, it is convenient to do model diagnostics separately here. We will make use of modelObj::fit() to complete the regression step before considering individual treatment effect estimators.



For \(\pi^{1}(x;\gamma)\) the regression can be completed as follows:

PS1 <- modelObj::fit(object = p1, data = dataSBP, response = dataSBP$A)
PS1 <- modelObj::fitObject(object = PS1)
PS1

Call:  glm(formula = YinternalY ~ W + Cr, family = "binomial", data = data)

Coefficients:
(Intercept)            W           Cr  
   0.966434    -0.007919    -0.703766  

Degrees of Freedom: 999 Total (i.e. Null);  997 Residual
Null Deviance:      1378 
Residual Deviance: 1374     AIC: 1380

where for convenience we have made use of modelObj’s fitObject() function to strip away the modeling object framework making PS1 an object of class “glm.”

Now, let’s examine the regression results for the model under consideration.

summary(object = PS1)

Call:
glm(formula = YinternalY ~ W + Cr, family = "binomial", data = data)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.239  -1.104  -1.027   1.248   1.443  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)  
(Intercept)  0.966434   0.624135   1.548   0.1215  
W           -0.007919   0.004731  -1.674   0.0942 .
Cr          -0.703766   0.627430  -1.122   0.2620  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1377.8  on 999  degrees of freedom
Residual deviance: 1373.8  on 997  degrees of freedom
AIC: 1379.8

Number of Fisher Scoring iterations: 4

First, in comparing the null deviance (1377.8) and the residual deviance (1373.8), we see that including the independent variables does not appreciably reduce the deviance; that is, this model is not significantly better than a constant propensity score model. However, we know that the data mimic an observational study for which the propensity score is neither constant nor known. Note also that the Akaike Information Criterion (AIC) is large (1379.772). Though the AIC value alone does not tell us much about the quality of our model, we can compare it with that of other models to assess relative quality.
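
The deviance comparison can be made formal with a likelihood ratio test; the following is a minimal sketch, assuming PS1 is the fitted “glm” object from above.

lrt <- PS1$null.deviance - PS1$deviance
df  <- PS1$df.null - PS1$df.residual

# p-value for the likelihood ratio test against the intercept-only model
stats::pchisq(q = lrt, df = df, lower.tail = FALSE)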



The second model is an incomplete model having only one of the covariates of the true data generating model

\[ \pi^{2}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1} \text{Ch})}{1+\exp(\gamma_{0} + \gamma_{1} \text{Ch})}. \]


Modeling Object


As for \(\pi^{1}(x;\gamma)\), the parameters of this model will be estimated using maximum likelihood via R’s stats::glm() function. For convenience in later method implementations, we will again require that the predictions be returned on the scale of the probability.

The modeling object for this regression step is

p2 <- modelObj::buildModelObj(model = ~ Ch,
                              solver.method = 'glm',
                              solver.args = list(family='binomial'),
                              predict.method = 'predict.glm',
                              predict.args = list(type='response'))

The regression is completed as follows:

PS2 <- modelObj::fit(object = p2, data = dataSBP, response = dataSBP$A)
PS2 <- modelObj::fitObject(object = PS2)
PS2

Call:  glm(formula = YinternalY ~ Ch, family = "binomial", data = data)

Coefficients:
(Intercept)           Ch  
   -3.06279      0.01368  

Degrees of Freedom: 999 Total (i.e. Null);  998 Residual
Null Deviance:      1378 
Residual Deviance: 1298     AIC: 1302

Again, we use summary() to examine statistics about the fit.

summary(PS2)

Call:
glm(formula = YinternalY ~ Ch, family = "binomial", data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7497  -1.0573  -0.7433   1.1449   1.9316  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -3.062786   0.348085  -8.799   <2e-16 ***
Ch           0.013683   0.001617   8.462   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1377.8  on 999  degrees of freedom
Residual deviance: 1298.2  on 998  degrees of freedom
AIC: 1302.2

Number of Fisher Scoring iterations: 4

First, in comparing the null deviance (1377.8) and the residual deviance (1298.2), we see that including the independent variables leads to a smaller deviance than that obtained from the intercept-only model. Further, although the AIC is again large (1302.247), it is less than that obtained for \(\pi^{1}(x;\gamma)\) (1379.772). This result is not unexpected, as we know that the model is only partially misspecified.



The third model we will consider is the correctly specified model used to generate the data set,

\[ \pi^{3}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1}~\text{SBP0} + \gamma_{2}~\text{Ch})}{1+\exp(\gamma_{0} + \gamma_{1}~\text{SBP0}+ \gamma_{2}~\text{Ch})}. \]

The parameters of this model will be estimated using maximum likelihood via R’s stats::glm() function. Predictions will be made using stats::predict.glm(), and we include a modification to the default input argument of stats::predict.glm() to ensure that predictions are returned on the scale of the probability.

The modeling object for this regression step is

p3 <- modelObj::buildModelObj(model = ~ SBP0 + Ch,
                              solver.method = 'glm',
                              solver.args = list(family='binomial'),
                              predict.method = 'predict.glm',
                              predict.args = list(type='response'))

The regression is completed as follows:

PS3 <- modelObj::fit(object = p3, data = dataSBP, response = dataSBP$A)
PS3 <- modelObj::fitObject(object = PS3)
PS3

Call:  glm(formula = YinternalY ~ SBP0 + Ch, family = "binomial", data = data)

Coefficients:
(Intercept)         SBP0           Ch  
  -15.94153      0.07669      0.01589  

Degrees of Freedom: 999 Total (i.e. Null);  997 Residual
Null Deviance:      1378 
Residual Deviance: 1162     AIC: 1168

Again, we use summary() to examine statistics about the fit.

summary(PS3)

Call:
glm(formula = YinternalY ~ SBP0 + Ch, family = "binomial", data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.3891  -0.9502  -0.4940   0.9939   2.1427  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -15.941527   1.299952 -12.263   <2e-16 ***
SBP0          0.076687   0.007196  10.657   <2e-16 ***
Ch            0.015892   0.001753   9.066   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1377.8  on 999  degrees of freedom
Residual deviance: 1161.6  on 997  degrees of freedom
AIC: 1167.6

Number of Fisher Scoring iterations: 3

First, we see from the null deviance (1377.8) and the residual deviance (1161.6) that including the independent variables does reduce the deviance relative to the intercept-only model. Further, although the AIC is again large (1167.621), it is less than that obtained for both \(\pi^{1}(x;\gamma)\) (1379.772) and \(\pi^{2}(x;\gamma)\) (1302.247). This result is in line with the knowledge that this is the data generating model.
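
The AIC comparison across the three propensity score models can be collected in one step; a minimal sketch, assuming PS1, PS2, and PS3 are the fitted “glm” objects from the tabs above.

# AIC for each fitted propensity score model
sapply(X = list("PS1" = PS1, "PS2" = PS2, "PS3" = PS3), FUN = stats::AIC)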



The IPW treatment effect and estimated standard error for the previously defined propensity score models are obtained as follows.



For model \(\pi^{1}(x;\gamma)\), the completely misspecified model,

delta_IPW_se(moPS = p1, data = dataSBP, y = y)
$fitPS

Call:  glm(formula = YinternalY ~ W + Cr, family = "binomial", data = data)

Coefficients:
(Intercept)            W           Cr  
   0.966434    -0.007919    -0.703766  

Degrees of Freedom: 999 Total (i.e. Null);  997 Residual
Null Deviance:      1378 
Residual Deviance: 1374     AIC: 1380

$EY
        0         1 
-4.202015 14.232216 

$deltaHat
[1] 18.43423

$sigmaHat
[1] 0.9261142

The estimated treatment effect is 18.43 (0.93) mmHg, which is larger than the true treatment effect.



For model \(\pi^{2}(x;\gamma)\), the incomplete model,

delta_IPW_se(moPS = p2, data = dataSBP, y = y)
$fitPS

Call:  glm(formula = YinternalY ~ Ch, family = "binomial", data = data)

Coefficients:
(Intercept)           Ch  
   -3.06279      0.01368  

Degrees of Freedom: 999 Total (i.e. Null);  998 Residual
Null Deviance:      1378 
Residual Deviance: 1298     AIC: 1302

$EY
        0         1 
-6.447606 10.205682 

$deltaHat
[1] 16.65329

$sigmaHat
[1] 0.8205346

The estimated treatment effect is 16.65 (0.82) mmHg, which is smaller than the true treatment effect.



For model \(\pi^{3}(x;\gamma)\), the model used to generate the data,

delta_IPW_se(moPS = p3, data = dataSBP, y = y)
$fitPS

Call:  glm(formula = YinternalY ~ SBP0 + Ch, family = "binomial", data = data)

Coefficients:
(Intercept)         SBP0           Ch  
  -15.94153      0.07669      0.01589  

Degrees of Freedom: 999 Total (i.e. Null);  997 Residual
Null Deviance:      1378 
Residual Deviance: 1162     AIC: 1168

$EY
        0         1 
-6.562869 10.445915 

$deltaHat
[1] 17.00878

$sigmaHat
[1] 0.9855199

The estimated treatment effect is 17.01 (0.99) mmHg, which is smaller than the true treatment effect.



In the table below, we see that the estimated treatment effect under these three models ranges from 16.65 mmHg to 18.43 mmHg.

| (mmHg) | \(\pi^{1}(x;\gamma)\) | \(\pi^{2}(x;\gamma)\) | \(\pi^{3}(x;\gamma)\) |
|---|---|---|---|
| \(\widehat{\delta}^{\text{*}}_{IPW}\) | 18.43 (0.93) | 16.65 (0.82) | 17.01 (0.99) |


Doubly Robust Estimator





The doubly robust estimator for the causal treatment effect, \(\delta^{\text{*}}\), is given by

\[ \widehat{\delta}^{\text{*}}_{DR} = n^{-1} \sum_{i=1}^{n} \left[ \frac{A_{i} Y_{i}}{\pi(X_{i}; \widehat{\gamma})} - \frac{(1-A_{i}) Y_{i}}{1-\pi(X_{i}; \widehat{\gamma})} - \{A_{i} - \pi(X_{i}; \widehat{\gamma})\} \left\{ \frac{Q(X_{i},1;\widehat{\beta})}{\pi(X_{i}; \widehat{\gamma})} + \frac{Q(X_{i},0;\widehat{\beta})}{1-\pi(X_{i}; \widehat{\gamma})} \right\} \right], \]

where \(Q(x,a;\beta)\) is a parametric model for \(Q(x,a)= E(Y|X=x,A=a)\) and \(\pi(x; \gamma)\) is that for the propensity score, \(\pi(x) = P(A=1|X)\).



The \((1,1)\) component of the sandwich estimator for the variance of the doubly robust estimator is defined as

\[ \widehat{\Sigma}_{11} = A_{n} + ( D_{n}^{\intercal}~E_{n}^{-1})~B_{n}~( D_{n}^{\intercal}~E_{n}^{-1})^{\intercal} + ( G_{n}^{\intercal}~J_{n}^{-1})~C_{n}~( G_{n}^{\intercal}~J_{n}^{-1})^{\intercal}, \]

where \(A_{n} = n^{-1} \sum_{i=1}^{n} A_{ni} A_{ni}^{\intercal}\), \(B_{n} = n^{-1} \sum_{i=1}^{n} B_{ni} B_{ni}^{\intercal}\), \(C_{n} = n^{-1} \sum_{i=1}^{n} C_{ni} C_{ni}^{\intercal}\), and


\[ \begin{aligned} A_{ni} &= \frac{A_{i}~Y_{i}}{\pi(X_{i};\widehat{\gamma})} - \frac{(1-A_{i})Y_{i}}{1-\pi(X_{i};\widehat{\gamma})} - \left\{A_{i}-\pi(X_{i};\widehat{\gamma})\right\} \left\{\frac{Q(X_{i},1;\widehat{\beta})}{\pi(X_{i};\widehat{\gamma})} + \frac{Q(X_{i},0;\widehat{\beta})}{1-\pi(X_{i};\widehat{\gamma})}\right\} - \widehat{\delta}^{\text{*}}_{DR}, \\ \\ B_{ni} &= \left\{ Y_{i} - Q(X_{i},A_{i};\widehat{\beta}) \right\} \frac{ \partial Q(X_{i},A_{i};\widehat{\beta})}{\partial \beta}, \\ \\ C_{ni} &= \frac{ A_i - \pi(X_i;\widehat{\gamma})}{\pi(X_i;\widehat{\gamma})\{1-\pi(X_i;\widehat{\gamma})\}} \frac{\partial \pi(X_i;\widehat{\gamma})}{\partial \gamma}. \end{aligned} \]

And,

\[ D_{n} = n^{-1}\sum_{i=1}^{n}\frac{\partial A_{ni}}{\partial {\beta}}, \quad G_{n} = n^{-1}\sum_{i=1}^{n}\frac{\partial A_{ni}}{\partial {\gamma}}, \quad E_{n} = n^{-1}\sum_{i=1}^{n}\frac{\partial B_{ni}}{\partial {\beta}^{\intercal}}, \quad J_{n} = n^{-1}\sum_{i=1}^{n}\frac{\partial C_{ni}}{\partial {\gamma}^{\intercal}}. \]
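
For the linear outcome regression model and logistic propensity score model assumed by the implementation below, these derivatives have closed forms. As a sketch of the expressions coded in delta_DR_swv(),

\[ \frac{\partial A_{ni}}{\partial \beta} = -\{A_{i} - \pi(X_{i};\widehat{\gamma})\} \left\{ \frac{1}{\pi(X_{i};\widehat{\gamma})} \frac{\partial Q(X_{i},1;\widehat{\beta})}{\partial \beta} + \frac{1}{1-\pi(X_{i};\widehat{\gamma})} \frac{\partial Q(X_{i},0;\widehat{\beta})}{\partial \beta} \right\}, \]

and, using \(\partial \pi(x;\gamma) / \partial \gamma = \pi(x;\gamma)\{1-\pi(x;\gamma)\}~x\) for the logistic model,

\[ \frac{\partial A_{ni}}{\partial \gamma} = -\left[ \frac{A_{i}\{Y_{i} - Q(X_{i},1;\widehat{\beta})\}}{\pi(X_{i};\widehat{\gamma})^{2}} + \frac{(1-A_{i})\{Y_{i} - Q(X_{i},0;\widehat{\beta})\}}{\{1-\pi(X_{i};\widehat{\gamma})\}^{2}} \right] \pi(X_{i};\widehat{\gamma})\{1-\pi(X_{i};\widehat{\gamma})\}~X_{i}. \]

For a linear outcome model, \(\partial Q(x,a;\beta)/\partial \beta\) is simply the corresponding row of the model matrix.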



We break the implementation of the treatment effect and the sandwich estimator of the variance into multiple functions and make use of previously defined functions, swv_ML() and swv_OLS().

  • delta_DR_se(): calls delta_DR() to estimate the treatment effect using the doubly robust estimator and estimates its standard error.
  • delta_DR_swv(): \(\widehat{\delta}^{\text{*}}_{DR}\) component of the M-estimating equation and its derivatives, i.e., \(A_n\), \(D^{\intercal}_n\), and \(G^{\intercal}_n\) of the sandwich variance expression.
  • delta_DR(): doubly robust estimator of the treatment effect.
  • swv_ML(): ML contributions to the sandwich estimator and its derivatives, i.e., \(C_n\) and \(J^{-1}_n\) of the sandwich variance expression.
  • swv_OLS(): OLS contributions to the sandwich estimator and its derivatives, i.e., \(B_n\) and \(E^{-1}_n\) of the sandwich variance expression.


#------------------------------------------------------------------------------#
# doubly robust estimator of tx effect 
#
# ASSUMPTIONS
#   tx is denoted by A and is binary coded as {0,1}
#
# INPUTS
#  moOR : a modeling object specifying the outcome regression step
#  moPS : a modeling object specifying the propensity score regression step
#  data : a data.frame containing covariates and tx
#     y : a vector containing the outcome of interest
#
# RETURNS
#  a list containing
#     fitOR : a modelObjFit object containing the results of the
#             outcome regression step
#     fitPS : a modelObjFit object containing the results of the
#             propensity score regression step
#        EY : the sample mean of the outcome under each tx option
#  deltaHat : the doubly robust estimator for the tx effect
#------------------------------------------------------------------------------#
delta_DR <- function(moOR, moPS, data, y) {

  #### Propensity Score

  fitPS <- modelObj::fit(object = moPS, data = data, response = data$A)

  # estimated propensity score
  p1 <- modelObj::predict(object = fitPS, newdata = data) 
  if (is.matrix(x = p1)) p1 <- p1[,ncol(x = p1), drop = TRUE]

  #### Outcome Regression 

  fitOR <- modelObj::fit(object = moOR, data = data, response = y)

  # store tx variable
  A <- data$A 

  # estimated Q-function when all A=0
  data$A <- 0L 
  Q0 <- drop(x = modelObj::predict(object = fitOR, newdata = data))

  # estimated Q-function when all A=1
  data$A <- 1L 
  Q1 <- drop(x = modelObj::predict(object = fitOR, newdata = data))

  #### Treatment Effect

  EY <- array(data = 0.0, dim = 2L, dimnames = list(c("0","1")))
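  # the augmentation term is split between the two group means below so that
  # EY[2L] - EY[1L] equals the doubly robust contrast given above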
  aug <- {A - p1} * {Q1 / p1 + Q0 / {1.0 - p1}}
  EY[2L] <- mean(x = {A == 1L} * {y / p1 - aug})
  EY[1L] <- mean(x = {A == 0L} * {y / {1.0 - p1} + aug})
  delta <- unname(obj = EY[2L] - EY[1L])

  return( list(   "fitOR" = fitOR,
                  "fitPS" = fitPS,
               "deltaHat" = delta,
                     "EY" = EY) )

}


#------------------------------------------------------------------------------#
# doubly robust estimator of tx effect and its standard error
#
# REQUIRES
#   swv_ML(), swv_OLS(), and delta_DR_swv()
#
# ASSUMPTIONS
#   tx is denoted by A and is binary coded as {0,1}
#   outcome regression model is a linear model and parameters are estimated 
#     using OLS
#   propensity score regression model is a logistic model and parameters are
#     estimated using ML
#
# INPUTS
#  moOR : a modeling object specifying the outcome regression step
#         *** must be a linear model ***
#  moPS : a modeling object specifying the propensity score regression step
#         *** must be a logistic model ***
#  data : a data.frame containing covariates and tx
#         *** tx must be named 'A' and coded 0/1 ***
#     y : a vector containing the outcome of interest
#
# RETURNS
#  a list containing
#     fitOR : a modelObjFit object containing the results of the
#             outcome regression step
#     fitPS : a modelObjFit object containing the results of the
#             propensity score regression step
#        EY : the sample mean of the outcome under each tx option
#  deltaHat : the doubly robust estimator for the tx effect
#  sigmaHat : the estimated standard error
#------------------------------------------------------------------------------#
delta_DR_se <- function(moOR, moPS, data, y) {

  #### ML components
  ML <- swv_ML(mo = moPS, data = data, y = data$A) 

  #### OLS components
  OLS <- swv_OLS(mo = moOR, data = data, y = y) 

  #### estimator components
  DR <- delta_DR_swv(moOR = moOR, moPS = moPS, data = data, y = y)

  #### 1,1 Component of Sandwich Estimator

  # ML contribution
  temp <- DR$dMdG %*% ML$invdM
  sig11ML <- temp %*% ML$MM %*% t(x = temp)

  # OLS contribution
  temp <- DR$dMdB %*% OLS$invdM
  sig11OLS <- temp %*% OLS$MM %*% t(x = temp)

  sig11 <- drop(x = DR$MM + sig11ML + sig11OLS)

  return( c(DR$delta, "sigmaHat" = sqrt(x = sig11 / nrow(x = data))) )

}


#------------------------------------------------------------------------------#
# tx effect estimator component of the sandwich estimator 
# doubly robust estimator
#
# REQUIRES
#   delta_DR()
#
# ASSUMPTIONS
#   outcome regression model is a linear model
#   propensity score regression model is a logistic model
#   tx is denoted by A and is binary coded as {0,1}
#
# INPUTS:
#  moOR : a modeling object specifying the outcome regression step
#         *** must be a linear model ***
#  moPS : a modeling object specifying the propensity score regression step
#         *** must be a logistic model ***
#  data : a data.frame containing covariates and tx
#         *** tx must be named 'A' and coded 0/1 ***
#     y : a vector containing the outcome of interest
#
# OUTPUTS:
#  a list containing
#  delta : the list returned by delta_DR()
#     MM : M M^T matrix
#   dMdB : matrix of derivatives of M wrt beta
#   dMdG : matrix of derivatives of M wrt gamma
#------------------------------------------------------------------------------#
delta_DR_swv <- function(moOR, moPS, data, y) {

  # estimate treatment effect
  delta <- delta_DR(moOR = moOR, moPS = moPS, data = data, y = y)

  # pi(x;gammaHat)
  p1 <- modelObj::predict(object = delta$fitPS, newdata = data)
  if (is.matrix(x = p1)) p1 <- p1[,ncol(x = p1), drop = TRUE]
          
  # model.matrix for propensity model
  piMM <- stats::model.matrix(object = modelObj::model(object = moPS), 
                              data = data)
          
  A <- data$A
          
  # Q(x,A=0;betaHat)
  data$A <- 0L 
  Q0 <- drop(modelObj::predict(object = delta$fitOR, newdata = data))

  # dQ(x,A=0;betaHat)
  # derivative of a linear model is the model.matrix
  dQ0 <- stats::model.matrix(object = modelObj::model(object = moOR), 
                             data = data)

  # Q(x,A=1;betaHat)
  data$A <- 1L 
  Q1 <- drop(modelObj::predict(object = delta$fitOR, newdata = data))

  # dQ(x,A=1;betaHat)
  # derivative of a linear model is the model.matrix
  dQ1 <- stats::model.matrix(object = modelObj::model(object = moOR), 
                             data = data)

  data$A <- A

  # delta component of M-equation
  mDeltai <- data$A*y/p1 - {1.0 - data$A}*y/{1.0 - p1} - 
             {data$A - p1}*{Q1/p1 + Q0/{1.0 - p1}} - delta$deltaHat

  mmDelta <- mean(x = mDeltai^2)
          
  # derivative of delta component w.r.t. beta
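  # (the leading minus sign of dA_ni/dbeta is dropped here; it cancels in
  #  the quadratic form of the sandwich variance)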
  dMdB <- colMeans(x = {data$A - p1}*{dQ1/p1 + dQ0/{1.0 - p1}})
          
  # derivative of delta component w.r.t. gamma
  dMdG <- - data$A/p1^2*{y - Q1} - {1.0 - data$A}/{1.0 - p1}^2*{y - Q0} 
  dMdG <- colMeans(x = dMdG*p1*{1.0 - p1}*piMM) 
          
  return( list("delta" = delta,
                  "MM" = mmDelta,
                "dMdB" = dMdB,
                "dMdG" = dMdG) )

}


#------------------------------------------------------------------------------#
# components of the sandwich estimator for a maximum likelihood estimation of 
#   a logistic regression model
#
# ASSUMPTIONS
#   mo is a logistic model with parameters to be estimated using ML
#
# INPUTS
#    mo : a modeling object specifying a regression step
#         *** must be a logistic model ***
#  data : a data.frame containing covariates and tx
#     y : a vector containing the binary outcome of interest
#
#
# RETURNS
# a list containing
#     MM : M^T M component from ML
#  invdM : inverse of the matrix of derivatives of M wrt gamma
#------------------------------------------------------------------------------#
swv_ML <- function(mo, data, y) {

  # regression step
  fit <- modelObj::fit(object = mo, data = data, response = y)

  # yHat
  p1 <- modelObj::predict(object = fit, newdata = data) 
  if (is.matrix(x = p1)) p1 <- p1[,ncol(x = p1), drop = TRUE]

  # model.matrix for model
  piMM <- stats::model.matrix(object = modelObj::model(object = mo), 
                              data = data)

  n <- nrow(x = piMM)
  p <- ncol(x = piMM)

  # ML M-estimator component
  mMLi <- {y - p1} * piMM

  # ML component of M MT
  mmML <- sapply(X = 1L:n, 
                 FUN = function(i){ mMLi[i,] %o% mMLi[i,] }, 
                 simplify = "array")

  # derivative of ML component
  dFun <- function(i){ 
            - piMM[i,] %o% piMM[i,] * p1[i] * {1.0 - p1[i]}
           } 
  dmML <- sapply(X = 1L:n, FUN = dFun, simplify = "array")

  if( p == 1L ) {
    mmML <- mean(x = mmML)
    dmML <- mean(x = dmML)
  } else {
    mmML <- rowMeans(x = mmML, dim = 2L)
    dmML <- rowMeans(x = dmML, dim = 2L)
  }

  # invert derivative ML component
  invD <- solve(a = dmML) 

  return( list("MM" = mmML, "invdM" = invD) )

}


#------------------------------------------------------------------------------#
# components of the sandwich estimator for an ordinary least squares estimation
#   of a linear regression model
#
# ASSUMPTIONS
#   mo is a linear model with parameters to be estimated using OLS
#
# INPUTS
#    mo : a modeling object specifying a regression step
#         *** must be a linear model ***
#  data : a data.frame containing covariates and tx
#     y : a vector containing the outcome of interest
#
#
# RETURNS
# a list containing
#     MM : M^T M component from OLS
#  invdM : inverse of the matrix of derivatives of M wrt beta
#------------------------------------------------------------------------------#
swv_OLS <- function(mo, data, y) {

  n <- nrow(x = data)

  fit <- modelObj::fit(object = mo, data = data, response = y)

  # Q(X,A;betaHat)
  Qa <- drop(x = modelObj::predict(object = fit, newdata = data))

  # derivative of Q(X,A;betaHat)
  dQa <- stats::model.matrix(object = modelObj::model(object = mo), 
                             data = data)

  # number of parameters in model
  p <- ncol(x = dQa)

  # OLS component of M
  mOLSi <- {y - Qa}*dQa

  # OLS component of M MT
  mmOLS <- sapply(X = 1L:n, 
                  FUN = function(i){ mOLSi[i,] %o% mOLSi[i,] }, 
                  simplify = "array")

  # derivative OLS component
  dmOLS <- - sapply(X = 1L:n, 
                    FUN = function(i){ dQa[i,] %o% dQa[i,] }, 
                    simplify = "array")

  # sum over individuals
  if (p == 1L) {
    mmOLS <- mean(x = mmOLS)
    dmOLS <- mean(x = dmOLS)
  } else {
    mmOLS <- rowMeans(x = mmOLS, dim = 2L)
    dmOLS <- rowMeans(x = dmOLS, dim = 2L)
  }

  # invert derivative OLS component
  invD <- solve(a = dmOLS) 

  return( list("MM" = mmOLS, "invdM" = invD) )

}
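
As a sanity check on these components, note that they combine into the usual sandwich (robust) covariance matrix for the outcome regression coefficients. The following is a minimal sketch, assuming the linear modeling object q1 defined below and the dataSBP data set.

OLS <- swv_OLS(mo = q1, data = dataSBP, y = y)

# sandwich covariance matrix: inv(E_n) B_n inv(E_n)^T / n
covBeta <- OLS$invdM %*% OLS$MM %*% t(x = OLS$invdM) / nrow(x = dataSBP)

# robust standard errors for the OLS coefficient estimates
sqrt(x = diag(x = covBeta))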


The outcome models discussed here are those discussed for the outcome regression estimator. We repeat them here for convenience.

Throughout Chapters 2-4, we consider three outcome regression models selected to represent a range of model (mis)specification. Note that we are not demonstrating a definitive analysis that one might do in practice, in which the analyst would use all sorts of variable selection techniques, etc., to arrive at a posited model. Our objective is to illustrate how the methods work under a range of model (mis)specification.


Click on each tab below to see the model and basic model diagnostic steps.



The first model is a completely misspecified model

\[ Q^{1}(x,a;\beta) = \beta_{0} + \beta_{1} \text{W} + \beta_{2} \text{Cr} + a~(\beta_{3} + \beta_{4} \text{Cr}). \]


Modeling Object


The parameters of this model will be estimated using ordinary least squares via R’s stats::lm() function. Predictions will be made using stats::predict.lm(), which by default returns predictions on the scale of the response variable.

The modeling object for this regression step is

q1 <- modelObj::buildModelObj(model = ~ W + A*Cr,
                              solver.method = 'lm',
                              predict.method = 'predict.lm')

Model Diagnostics


Though ultimately, the regression steps will be performed within the implementations of the treatment effect and value estimators, we make use of modelObj::fit() to perform some preliminary model diagnostics.

For \(Q^{1}(x,a;\beta)\) the regression can be completed as follows

OR1 <- modelObj::fit(object = q1, data = dataSBP, response = y)
OR1 <- modelObj::fitObject(object = OR1)
OR1

Call:
lm(formula = YinternalY ~ W + A + Cr + A:Cr, data = data)

Coefficients:
(Intercept)            W            A           Cr         A:Cr  
   -6.66893      0.02784     16.46653      0.56324      2.41004  

where for convenience we have made use of modelObj’s fitObject() function to strip away the modeling object framework making OR1 an object of class “lm.”

Let’s examine the regression results for the model under consideration. First, the diagnostic plots defined for “lm” objects.

par(mfrow = c(2,2))
graphics::plot(x = OR1)

We see that the diagnostic plots for model \(Q^{1}(x,a;\beta)\) show some unusual behaviors. The two groupings of residuals in the Residuals vs Fitted and the Scale-Location plots are reflecting the fact that the model includes only the covariates W and Cr, neither of which are associated with outcome in the true regression relationship. Thus, for all practical purposes the model is fitting the two treatment means.

Now, let’s examine the summary statistics for the regression step

summary(object = OR1)

Call:
lm(formula = YinternalY ~ W + A + Cr + A:Cr, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-35.911  -7.605  -0.380   7.963  35.437 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) -6.66893    4.67330  -1.427  0.15389   
W            0.02784    0.02717   1.025  0.30564   
A           16.46653    5.96413   2.761  0.00587 **
Cr           0.56324    5.07604   0.111  0.91167   
A:Cr         2.41004    7.22827   0.333  0.73889   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11.6 on 995 degrees of freedom
Multiple R-squared:  0.3853,    Adjusted R-squared:  0.3828 
F-statistic: 155.9 on 4 and 995 DF,  p-value: < 2.2e-16

We see that the residual standard error is large and that the adjusted R-squared value is small.
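
These statistics can be extracted programmatically; a minimal sketch, assuming OR1 is the fitted “lm” object from above.

fitStats <- summary(object = OR1)

# residual standard error and adjusted R-squared quoted above
c("sigma" = fitStats$sigma, "adj.r.squared" = fitStats$adj.r.squared)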

Readers familiar with R might have noticed that the response variable specified in the call to the regression method is YinternalY. This is an internal naming convention within package modelObj; it is understood to represent the outcome of interest \(y\).



The second model is an incomplete model having only one of the covariates of the true model,

\[ Q^{2}(x,a;\beta) = \beta_{0} + \beta_{1} \text{Ch} + a~(\beta_{2} + \beta_{3} \text{Ch}). \]


Modeling Object


As for \(Q^{1}(x,a;\beta)\), the parameters of this model will be estimated using ordinary least squares via R’s stats::lm() function. Predictions will be made using stats::predict.lm(), which by default returns predictions on the scale of the response variable.

The modeling object for this regression step is

q2 <- modelObj::buildModelObj(model = ~ Ch*A,
                              solver.method = 'lm',
                              predict.method = 'predict.lm')

Model Diagnostics


For \(Q^{2}(x,a;\beta)\) the regression can be completed as follows

OR2 <- modelObj::fit(object = q2, data = dataSBP, response = y)
OR2 <- modelObj::fitObject(object = OR2)
OR2

Call:
lm(formula = YinternalY ~ Ch + A + Ch:A, data = data)

Coefficients:
(Intercept)           Ch            A         Ch:A  
    36.5101      -0.2052     -89.5245       0.5074  

where for convenience we have made use of modelObj’s fitObject() function to strip away the modeling object framework making OR2 an object of class “lm.”

First, let’s examine the diagnostic plots defined for “lm” objects.

par(mfrow = c(2,4))
graphics::plot(x = OR2)
graphics::plot(x = OR1)

where we have included the diagnostic plots for model \(Q^{1}(x,a;\beta)\) for easy comparison. We see that the residuals from the fit of model \(Q^{2}(x,a;\beta)\) do not exhibit the two groupings, reflecting the fact that \(Q^{2}(x,a;\beta)\) is only partially misspecified in that it includes the important covariate Ch.

Now, let’s examine the summary statistics for the regression step

summary(object = OR2)

Call:
lm(formula = YinternalY ~ Ch + A + Ch:A, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-16.1012  -2.7476  -0.0032   2.6727  15.1825 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  36.510110   0.933019   39.13   <2e-16 ***
Ch           -0.205226   0.004606  -44.56   <2e-16 ***
A           -89.524507   1.471905  -60.82   <2e-16 ***
Ch:A          0.507374   0.006818   74.42   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.511 on 996 degrees of freedom
Multiple R-squared:  0.907, Adjusted R-squared:  0.9068 
F-statistic:  3239 on 3 and 996 DF,  p-value: < 2.2e-16

Comparing to the diagnostics for model \(Q^{1}(x,a;\beta)\), we see that the residual standard error is smaller (4.51 vs 11.6) and that the adjusted R-squared value is larger (0.91 vs 0.38). Both of these results indicate that model \(Q^{2}(x,a;\beta)\) is a more suitable model for \(E(Y|X=x,A=a)\) than model \(Q^{1}(x,a;\beta)\).



The third model we will consider is the correctly specified model used to generate the dataset,

\[ Q^{3}(x,a;\beta) = \beta_{0} + \beta_{1} \text{Ch} + \beta_{2} \text{K} + a~(\beta_{3} + \beta_{4} \text{Ch} + \beta_{5} \text{K}). \]


Modeling Object


As for \(Q^{1}(x,a;\beta)\) and \(Q^{2}(x,a;\beta)\), the parameters of this model will be estimated using ordinary least squares via R’s stats::lm() function. Predictions will be made using stats::predict.lm(), which by default returns predictions on the scale of the response variable.

The modeling object for this regression step is

q3 <- modelObj::buildModelObj(model = ~ (Ch + K)*A,
                              solver.method = 'lm',
                              predict.method = 'predict.lm')

Model Diagnostics


For \(Q^{3}(x,a;\beta)\) the regression can be completed as follows

OR3 <- modelObj::fit(object = q3, data = dataSBP, response = y)
OR3 <- modelObj::fitObject(object = OR3)
OR3

Call:
lm(formula = YinternalY ~ Ch + K + A + Ch:A + K:A, data = data)

Coefficients:
(Intercept)           Ch            K            A         Ch:A          K:A  
   -15.6048      -0.2035      12.2849     -61.0979       0.5048      -6.6099  

where for convenience we have made use of modelObj’s fitObject() function to strip away the modeling object framework making OR3 an object of class “lm.”

Again, let’s start by examining the diagnostic plots defined for “lm” objects.

par(mfrow = c(2,4))
graphics::plot(x = OR3)
graphics::plot(x = OR2)

where we have included the diagnostic plots for model \(Q^{2}(x,a;\beta)\) for easy comparison. We see that the results for models \(Q^{2}(x,a;\beta)\) and \(Q^{3}(x,a;\beta)\) are very similar, with model \(Q^{3}(x,a;\beta)\) yielding slightly better diagnostics (e.g., smaller residuals); a result in line with the knowledge that \(Q^{3}(x,a;\beta)\) is the model used to generate the data.

Now, let’s examine the summary statistics for the regression step

summary(object = OR3)

Call:
lm(formula = YinternalY ~ Ch + K + A + Ch:A + K:A, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.0371 -1.9376  0.0051  2.0127  9.6452 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -15.604845   1.636349  -9.536   <2e-16 ***
Ch           -0.203472   0.002987 -68.116   <2e-16 ***
K            12.284852   0.358393  34.278   <2e-16 ***
A           -61.097909   2.456637 -24.871   <2e-16 ***
Ch:A          0.504816   0.004422 114.168   <2e-16 ***
K:A          -6.609876   0.538386 -12.277   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.925 on 994 degrees of freedom
Multiple R-squared:  0.961, Adjusted R-squared:  0.9608 
F-statistic:  4897 on 5 and 994 DF,  p-value: < 2.2e-16

As for model \(Q^{2}(x,a;\beta)\), we see that the residual standard error is smaller (2.93 vs 4.51) and that the adjusted R-squared value is larger (0.96 vs 0.91). Again, these results indicate that model \(Q^{3}(x,a;\beta)\) is a more suitable model than both models \(Q^{1}(x,a;\beta)\) and \(Q^{2}(x,a;\beta)\).



The propensity score models discussed here are those discussed for the stratification and IPW estimators. We repeat them here for convenience.

Throughout Chapters 2-4, we consider three propensity score models selected to represent a range of model (mis)specification. Note that we are not demonstrating a definitive analysis that one might do in practice, in which the analyst would use all sorts of variable selection techniques, etc., to arrive at a posited model. Our objective is to illustrate how the methods work under a range of model (mis)specification.


Click on each tab below to see the model and basic model diagnostic steps.



The first model is a completely misspecified model having none of the covariates used in the data generating model

\[ \pi^{1}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1}~\text{W} + \gamma_{2}~\text{Cr})}{1+\exp(\gamma_{0} + \gamma_{1}~\text{W} + \gamma_{2}~\text{Cr})}. \]


Modeling Object


The parameters of this model will be estimated using maximum likelihood via R’s stats::glm() function. Predictions will be made using stats::predict.glm(), which by default returns predictions on the scale of the linear predictors. We will see in the coming sections that this is not the most convenient scale, so we opt to include a modification to the default input argument of stats::predict.glm() to return predictions on the scale of the response variable, i.e., the probabilities. The modeling object for this model is specified as

p1 <- modelObj::buildModelObj(model = ~ W + Cr,
                              solver.method = 'glm',
                              solver.args = list(family='binomial'),
                              predict.method = 'predict.glm',
                              predict.args = list(type='response'))

Model Diagnostics


Though we will implement our treatment effect and value estimators in such a way that the regression step is performed internally, it is convenient to do model diagnostics separately here. We will make use of modelObj::fit() to complete the regression step before considering individual treatment effect estimators.



For \(\pi^{1}(x;\gamma)\) the regression can be completed as follows:

PS1 <- modelObj::fit(object = p1, data = dataSBP, response = dataSBP$A)
PS1 <- modelObj::fitObject(object = PS1)
PS1

Call:  glm(formula = YinternalY ~ W + Cr, family = "binomial", data = data)

Coefficients:
(Intercept)            W           Cr  
   0.966434    -0.007919    -0.703766  

Degrees of Freedom: 999 Total (i.e. Null);  997 Residual
Null Deviance:      1378 
Residual Deviance: 1374     AIC: 1380

where for convenience we have made use of modelObj’s fitObject() function to strip away the modeling object framework making PS1 an object of class “glm.”

Now, let’s examine the regression results for the model under consideration.

summary(object = PS1)

Call:
glm(formula = YinternalY ~ W + Cr, family = "binomial", data = data)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.239  -1.104  -1.027   1.248   1.443  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)  
(Intercept)  0.966434   0.624135   1.548   0.1215  
W           -0.007919   0.004731  -1.674   0.0942 .
Cr          -0.703766   0.627430  -1.122   0.2620  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1377.8  on 999  degrees of freedom
Residual deviance: 1373.8  on 997  degrees of freedom
AIC: 1379.8

Number of Fisher Scoring iterations: 4

First, in comparing the null deviance (1377.8) and the residual deviance (1373.8), we see that including the independent variables does not appreciably reduce the deviance; that is, this model is not significantly better than a constant propensity score model. However, we know that the data mimic an observational study for which the propensity score is neither constant nor known. Note also that the Akaike Information Criterion (AIC) is large (1379.772). Though the AIC value alone does not tell us much about the quality of our model, we can compare it with that of other models to assess relative quality.



The second model is an incomplete model having only one of the covariates of the true data generating model

\[ \pi^{2}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1} \text{Ch})}{1+\exp(\gamma_{0} + \gamma_{1} \text{Ch})}. \]


Modeling Object


As for \(\pi^{1}(x;\gamma)\), the parameters of this model will be estimated using maximum likelihood via R’s stats::glm() function. For convenience in later method implementations, we will again require that the predictions be returned on the scale of the probability.

The modeling object for this regression step is

p2 <- modelObj::buildModelObj(model = ~ Ch,
                              solver.method = 'glm',
                              solver.args = list(family='binomial'),
                              predict.method = 'predict.glm',
                              predict.args = list(type='response'))

The regression is completed as follows:

PS2 <- modelObj::fit(object = p2, data = dataSBP, response = dataSBP$A)
PS2 <- modelObj::fitObject(object = PS2)
PS2

Call:  glm(formula = YinternalY ~ Ch, family = "binomial", data = data)

Coefficients:
(Intercept)           Ch  
   -3.06279      0.01368  

Degrees of Freedom: 999 Total (i.e. Null);  998 Residual
Null Deviance:      1378 
Residual Deviance: 1298     AIC: 1302

Again, we use summary() to examine statistics about the fit.

summary(PS2)

Call:
glm(formula = YinternalY ~ Ch, family = "binomial", data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7497  -1.0573  -0.7433   1.1449   1.9316  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -3.062786   0.348085  -8.799   <2e-16 ***
Ch           0.013683   0.001617   8.462   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1377.8  on 999  degrees of freedom
Residual deviance: 1298.2  on 998  degrees of freedom
AIC: 1302.2

Number of Fisher Scoring iterations: 4

First, in comparing the null deviance (1377.8) and the residual deviance (1298.2), we see that including the independent variables leads to a smaller deviance than that obtained from the intercept-only model. Further, although the AIC is again large (1302.247), it is less than that obtained for \(\pi^{1}(x;\gamma)\) (1379.772). This result is not unexpected, as we know that the model is only partially misspecified.



The third model we will consider is the correctly specified model used to generate the data set,

\[ \pi^{3}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1}~\text{SBP0} + \gamma_{2}~\text{Ch})}{1+\exp(\gamma_{0} + \gamma_{1}~\text{SBP0}+ \gamma_{2}~\text{Ch})}. \]

The parameters of this model will be estimated using maximum likelihood via R’s stats::glm() function. Predictions will be made using stats::predict.glm(), and we include a modification to the default input argument of stats::predict.glm() to ensure that predictions are returned on the scale of the probability.

The modeling object for this regression step is

p3 <- modelObj::buildModelObj(model = ~ SBP0 + Ch,
                              solver.method = 'glm',
                              solver.args = list(family='binomial'),
                              predict.method = 'predict.glm',
                              predict.args = list(type='response'))

The regression is completed as follows:

PS3 <- modelObj::fit(object = p3, data = dataSBP, response = dataSBP$A)
PS3 <- modelObj::fitObject(object = PS3)
PS3

Call:  glm(formula = YinternalY ~ SBP0 + Ch, family = "binomial", data = data)

Coefficients:
(Intercept)         SBP0           Ch  
  -15.94153      0.07669      0.01589  

Degrees of Freedom: 999 Total (i.e. Null);  997 Residual
Null Deviance:      1378 
Residual Deviance: 1162     AIC: 1168

Again, we use summary() to examine statistics about the fit.

summary(PS3)

Call:
glm(formula = YinternalY ~ SBP0 + Ch, family = "binomial", data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.3891  -0.9502  -0.4940   0.9939   2.1427  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -15.941527   1.299952 -12.263   <2e-16 ***
SBP0          0.076687   0.007196  10.657   <2e-16 ***
Ch            0.015892   0.001753   9.066   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1377.8  on 999  degrees of freedom
Residual deviance: 1161.6  on 997  degrees of freedom
AIC: 1167.6

Number of Fisher Scoring iterations: 3

First, we see from the null deviance (1377.8) and the residual deviance (1161.6) that including the independent variables does reduce the deviance relative to the intercept-only model. Further, although the AIC is again large (1167.621), it is less than that obtained for both \(\pi^{1}(x;\gamma)\) (1379.772) and \(\pi^{2}(x;\gamma)\) (1302.247). This result is in line with the knowledge that this is the data generating model.



It is now straightforward to estimate the treatment effect and the standard error for a variety of linear outcome models and logistic regression propensity score models. Throughout this chapter, we have considered three outcome regression models and three propensity score models selected to represent a range of model (mis)specification. Below, we show the analysis for only a few of these combinations. The treatment effects and standard errors for all combinations of outcome and propensity score models are summarized under the heading Comparison.



First we consider the two completely misspecified models

\[ Q^{1}(x,a;\beta) = \beta_{0} + \beta_{1} \text{W} + \beta_{2} \text{Cr} + a~(\beta_{3} + \beta_{4} \text{Cr}), \]

and

\[ \pi^{1}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1}~\text{W} + \gamma_{2}~\text{Cr})}{1+\exp(\gamma_{0} + \gamma_{1}~\text{W} + \gamma_{2}~\text{Cr})}. \]

The modeling objects for the regression steps are

q1 <- modelObj::buildModelObj(model = ~ W + A*Cr,
                              solver.method = 'lm',
                              predict.method = 'predict.lm')
p1 <- modelObj::buildModelObj(model = ~ W + Cr,
                              solver.method = 'glm',
                              solver.args = list(family='binomial'),
                              predict.method = 'predict.glm',
                              predict.args = list(type='response'))

The estimated treatment effect and its standard error are obtained as follows:

delta_DR_se(moOR = q1, moPS = p1, data = dataSBP, y = y)
$fitOR

Call:
lm(formula = YinternalY ~ W + A + Cr + A:Cr, data = data)

Coefficients:
(Intercept)            W            A           Cr         A:Cr  
   -6.66893      0.02784     16.46653      0.56324      2.41004  


$fitPS

Call:  glm(formula = YinternalY ~ W + Cr, family = "binomial", data = data)

Coefficients:
(Intercept)            W           Cr  
   0.966434    -0.007919    -0.703766  

Degrees of Freedom: 999 Total (i.e. Null);  997 Residual
Null Deviance:      1378 
Residual Deviance: 1374     AIC: 1380

$deltaHat
[1] 18.43236

$EY
         0          1 
-10.090516   8.341846 

$sigmaHat
[1] 0.7559916

We see that the estimated treatment effect is 18.43 (0.76) mmHg.



Next we consider the scenario where one model is correctly specified and the other is not. Here, the outcome regression model is that used to generate the data

\[ Q^{3}(x,a;\beta) = \beta_{0} + \beta_{1} \text{Ch} + \beta_{2} \text{K} + a~(\beta_{3} + \beta_{4} \text{Ch} + \beta_{5} \text{K}), \] and the propensity score model is completely misspecified

\[ \pi^{1}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1}~\text{W} + \gamma_{2}~\text{Cr})}{1+\exp(\gamma_{0} + \gamma_{1}~\text{W} + \gamma_{2}~\text{Cr})}. \]

The modeling objects for the regression steps are

q3 <- modelObj::buildModelObj(model = ~ (Ch + K)*A,
                              solver.method = 'lm',
                              predict.method = 'predict.lm')
p1 <- modelObj::buildModelObj(model = ~ W + Cr,
                              solver.method = 'glm',
                              solver.args = list(family='binomial'),
                              predict.method = 'predict.glm',
                              predict.args = list(type='response'))

The estimated treatment effect and its standard error are obtained as follows:

delta_DR_se(moOR = q3, moPS = p1, data = dataSBP, y = y)
$fitOR

Call:
lm(formula = YinternalY ~ Ch + K + A + Ch:A + K:A, data = data)

Coefficients:
(Intercept)           Ch            K            A         Ch:A          K:A  
   -15.6048      -0.2035      12.2849     -61.0979       0.5048      -6.6099  


$fitPS

Call:  glm(formula = YinternalY ~ W + Cr, family = "binomial", data = data)

Coefficients:
(Intercept)            W           Cr  
   0.966434    -0.007919    -0.703766  

Degrees of Freedom: 999 Total (i.e. Null);  997 Residual
Null Deviance:      1378 
Residual Deviance: 1374     AIC: 1380

$deltaHat
[1] 16.71185

$EY
        0         1 
-6.075114 10.636739 

$sigmaHat
[1] 0.7286743

We see that the estimated treatment effect is 16.71 (0.73) mmHg, which is less than that obtained when both models were completely misspecified. The standard error is similar to that obtained previously.



Next we consider the second scenario under which one model is correctly specified and the other is not. That is, the outcome regression model is completely misspecified

\[ Q^{1}(x,a;\beta) = \beta_{0} + \beta_{1} \text{W} + \beta_{2} \text{Cr} + a~(\beta_{3} + \beta_{4} \text{Cr}), \]

and the propensity score model is that used to generate the data

\[ \pi^{3}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1}~\text{SBP0} + \gamma_{2}~\text{Ch})}{1+\exp(\gamma_{0} + \gamma_{1}~\text{SBP0}+ \gamma_{2}~\text{Ch})}. \]

The modeling objects for the regression steps are

q1 <- modelObj::buildModelObj(model = ~ W + A*Cr,
                              solver.method = 'lm',
                              predict.method = 'predict.lm')
p3 <- modelObj::buildModelObj(model = ~ SBP0 + Ch,
                              solver.method = 'glm',
                              solver.args = list(family='binomial'),
                              predict.method = 'predict.glm',
                              predict.args = list(type='response'))

The estimated treatment effect and its standard error are obtained as follows:

delta_DR_se(moOR = q1, moPS = p3, data = dataSBP, y = y)
$fitOR

Call:
lm(formula = YinternalY ~ W + A + Cr + A:Cr, data = data)

Coefficients:
(Intercept)            W            A           Cr         A:Cr  
   -6.66893      0.02784     16.46653      0.56324      2.41004  


$fitPS

Call:  glm(formula = YinternalY ~ SBP0 + Ch, family = "binomial", data = data)

Coefficients:
(Intercept)         SBP0           Ch  
  -15.94153      0.07669      0.01589  

Degrees of Freedom: 999 Total (i.e. Null);  997 Residual
Null Deviance:      1378 
Residual Deviance: 1162     AIC: 1168

$deltaHat
[1] 17.23598

$EY
         0          1 
-12.416919   4.819056 

$sigmaHat
[1] 1.111939

We see that the estimated treatment effect is 17.24 (1.11) mmHg, similar to that obtained under the reverse combination (correctly specified outcome model, misspecified propensity score model), though with a larger standard error.



In the table below, we see that the estimated treatment effect under these model combinations ranges from 16.65 mmHg to 18.43 mmHg.

| (mmHg) | \(Q^{1}(x,a;\beta)\) | \(Q^{2}(x,a;\beta)\) | \(Q^{3}(x,a;\beta)\) |
|---|---|---|---|
| \(\pi^{1}(x;\gamma)\) | 18.43 (0.76) | 16.65 (0.76) | 16.71 (0.73) |
| \(\pi^{2}(x;\gamma)\) | 16.70 (1.08) | 16.70 (0.76) | 16.74 (0.73) |
| \(\pi^{3}(x;\gamma)\) | 17.24 (1.11) | 16.72 (0.78) | 16.77 (0.74) |
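
The full grid in the table can be reproduced by looping over the model combinations; a minimal sketch, assuming the modeling objects q1, q2, q3 and p1, p2, p3 defined in the tabs above.

qList <- list("Q1" = q1, "Q2" = q2, "Q3" = q3)
pList <- list("pi1" = p1, "pi2" = p2, "pi3" = p3)

# rows: propensity score models; columns: outcome regression models
sapply(X = qList,
       FUN = function(q) {
         sapply(X = pList,
                FUN = function(p) {
                  est <- delta_DR_se(moOR = q, moPS = p, data = dataSBP, y = y)
                  sprintf("%.2f (%.2f)", est[["deltaHat"]], est[["sigmaHat"]])
                })
       })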

Comparison of Estimators





The table below compares the estimated treatment effects and standard errors for all of the estimators discussed here and under all the models considered in this chapter.

|  |  | \(\widehat{\delta}^{\text{*}}_{S}\) | \(\widehat{\delta}^{\text{*}}_{IPW}\) | \(Q^{1}(x,a;\beta)\) | \(Q^{2}(x,a;\beta)\) | \(Q^{3}(x,a;\beta)\) |
|---|---|---|---|---|---|---|
| \(\widehat{\delta}^{\text{*}}_{N}\) | 18.39 (0.75) | --- | --- | --- | --- | --- |
| \(\widehat{\delta}^{\text{*}}_{OR}\) | --- | --- | --- | 18.44 (0.76) | 16.64 (0.76) | 16.70 (0.73) |
| \(\pi^{1}(x;\gamma)\) | --- | 18.36 (0.84) | 18.43 (0.93) | 18.43 (0.76) | 16.65 (0.76) | 16.71 (0.73) |
| \(\pi^{2}(x;\gamma)\) | --- | 16.83 (0.41) | 16.65 (0.82) | 16.70 (1.08) | 16.70 (0.76) | 16.74 (0.73) |
| \(\pi^{3}(x;\gamma)\) | --- | 16.86 (0.76) | 17.01 (0.99) | 17.24 (1.11) | 16.72 (0.78) | 16.77 (0.74) |

The estimated standard errors are shown in parentheses. The first row contains the naive estimator, which does not depend on a regression model. The second row contains \(\widehat{\delta}^{\text{*}}_{OR}\) estimated under each outcome regression model. The columns headed \(\widehat{\delta}^{\text{*}}_{S}\) and \(\widehat{\delta}^{\text{*}}_{IPW}\) give those estimators under each propensity score regression model. The cells in rows \(\pi^{1}(x;\gamma)\)-\(\pi^{3}(x;\gamma)\) and columns \(Q^{1}(x,a;\beta)\)-\(Q^{3}(x,a;\beta)\) show \(\widehat{\delta}^{\text{*}}_{DR}\) for each combination of outcome and propensity score models.



We see that the estimated treatment effect tends to be smaller and less variable for methods that use the correctly specified outcome regression model, \(Q^{3}(x,a;\widehat{\beta})\).





We have carried out a simulation study to evaluate the performance of the presented methods. We generated 1000 Monte Carlo data sets, each with \(n=1000\). The table below reports the Monte Carlo average of the estimated treatment effects, with the Monte Carlo standard deviation of the estimates in parentheses. In addition, the Monte Carlo average of the standard errors as defined previously for each estimator is given in square brackets.

|  |  | \(\widehat{\delta}^{\text{*}}_{S}\) | \(\widehat{\delta}^{\text{*}}_{IPW}\) | \(Q^{1}(x,a;\beta)\) | \(Q^{2}(x,a;\beta)\) | \(Q^{3}(x,a;\beta)\) |
|---|---|---|---|---|---|---|
| \(\widehat{\delta}^{\text{*}}_{N}\) | 19.26 (0.76); [0.85] | --- | --- | --- | --- | --- |
| \(\widehat{\delta}^{\text{*}}_{OR}\) | --- | --- | --- | 19.26 (0.76); [0.76] | 17.39 (0.78); [0.77] | 17.39 (0.75); [0.74] |
| \(\pi^{1}(x;\gamma)\) | --- | 19.26 (0.76); [0.85] | 19.26 (0.76); [0.94] | 19.26 (0.76); [0.76] | 17.39 (0.78); [0.77] | 17.39 (0.75); [0.74] |
| \(\pi^{2}(x;\gamma)\) | --- | 17.57 (0.78); [0.43] | 17.37 (0.86); [0.86] | 17.38 (0.89); [1.16] | 17.39 (0.78); [0.77] | 17.39 (0.75); [0.74] |
| \(\pi^{3}(x;\gamma)\) | --- | 17.61 (0.86); [0.78] | 17.40 (1.03); [1.07] | 17.41 (1.13); [1.37] | 17.39 (0.80); [0.79] | 17.39 (0.76); [0.75] |

Recall that the true treatment effect is \(\delta^{\text{*}} = 17.4\) mmHg. We see that the naive estimator overestimates the treatment effect. Further, whenever either the propensity score model or the outcome regression model is correctly specified, the average estimated treatment effect agrees well with the true value.
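
The structure of the Monte Carlo exercise is sketched below for the doubly robust estimator under \(Q^{3}(x,a;\beta)\) and \(\pi^{3}(x;\gamma)\); generateData() is a hypothetical stand-in for the data generating mechanism of the simulation scenario, assumed to return a data set with the same structure as dataSBP together with the outcome in column y.

M <- 1000L
est <- numeric(length = M)
se  <- numeric(length = M)

for (m in 1L:M) {
  # generateData() is hypothetical; it represents the true generative model
  mcData <- generateData(n = 1000L)
  out <- delta_DR_se(moOR = q3, moPS = p3, data = mcData, y = mcData$y)
  est[m] <- out[["deltaHat"]]
  se[m]  <- out[["sigmaHat"]]
}

# Monte Carlo average estimate, standard deviation, and average standard error
c("mean" = mean(x = est), "sd" = stats::sd(x = est), "avgSE" = mean(x = se))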



This simulation demonstrates some results and subtleties that are not discussed in Chapter 2 of the book but are well-known in the literature on causal inference. To allow a general discussion of these, define

$$ X_{1} = \text{Ch}, \quad X_{2} = \text{K}, \quad X_{3} = \text{SBP0}, \quad \text{and} \quad X_{4} = \text{all other variables}. $$

A review of the simulation scenario under which the data were generated and on which this simulation is based shows that the conditional distribution of \(Y\) given \(A, X_{1}, X_{2}, X_{3}, X_{4}\) does not depend on \(X_{3}, X_{4}\) and thus is the same as the conditional distribution of \(Y\) given \(A, X_{1}, X_{2}\). Similarly, the conditional distribution of \(A\) given \(X_{1}, X_{2}, X_{3}, X_{4}\) does not depend on \(X_{2}\) and \(X_{4}\) and thus is the same as the conditional distribution of \(A\) given \(X_{1}, X_{3}\). We summarize this as

$$ \begin{align} Y | A, X_1, X_2, X_3, X_4 \sim Y | A, X_1, X_2 \\ A| X_1, X_2, X_3, X_4 \sim A | X_1, X_3; \end{align} $$

that is, \(X_1, X_2\) are the relevant individual characteristics (covariates) for the true outcome regression and \(X_1, X_3\) are the relevant covariates for the true propensity score.



Underlying this situation are the assumptions of SUTVA (consistency),

$$ Y = Y^{\text{*}}(1) A + Y^{\text{*}}(0) (1-A), $$

where \(Y^{\text{*}}(a)\) is the potential outcome if an individual were to receive treatment \(a\); and no unmeasured confounders,

$$ \{ Y^{\text{*}}(1), Y^{\text{*}}(0) \} \perp\!\!\!\perp A | X_1, X_2, X_3, X_4. $$

Because of the way the data were generated, it can be shown that, under SUTVA and the no unmeasured confounders assumption, it suffices to consider \(X_1, X_2\) (covariates associated with outcome) or \(X_1, X_3\) (covariates associated with propensity) as confounders for which an adjustment must be made to obtain valid inference on \(\delta^\text{*}\). That is, we can show that in fact

$$ \begin{align} \{ Y^{\text{*}}(1), Y^{\text{*}}(0) \} \perp\!\!\!\perp A | X_1, X_2, \\ \{ Y^{\text{*}}(1), Y^{\text{*}}(0) \} \perp\!\!\!\perp A | X_1, X_3. \end{align} $$

Moreover, because \(X_1, X_2, X_3\) were generated independently of each other in this scenario, it suffices to consider only \(X_1\) as a confounder for which an adjustment must be made; that is, it can be shown that

$$ \{ Y^{\text{*}}(1), Y^{\text{*}}(0) \} \perp\!\!\!\perp A | X_1. $$

Consequently, if we use a correctly specified model for \(E(Y | A, X_1)\) with either the outcome regression estimator or the doubly-robust (AIPW) estimator for \(\delta^{\text{*}}\), then we should obtain a consistent estimator for \(\delta^{\text{*}}\).

Similarly, if we use a correctly specified model for \(E(A | X_1)\) as the model for the propensity score in either of the IPW or doubly-robust (AIPW) estimators, then, again, we should obtain a consistent estimator for \(\delta^{\text{*}}\).

The incomplete models \(Q^2\) and \(\pi^2\) for the outcome regression and propensity score, respectively, both include \(X_1\) (Ch). Thus, the above implies that we should obtain consistent estimators for \(\delta^{\text{*}}\) using \(Q^2\) with the outcome regression estimator or the doubly-robust (AIPW) estimator (in the latter case even if the propensity model is misspecified, as in the case of \(\pi^1\)). The above also implies that we should obtain consistent estimators for \(\delta^{\text{*}}\) using \(\pi^2\) with either of the IPW or doubly-robust (AIPW) estimators (in the latter case even if the outcome regression model is misspecified, as in the case of \(Q^1\)). The simulation results in the table above are consistent with these observations.

Theory also suggests that using all of the covariates associated with outcome in either the outcome regression model or the propensity score model should result in more efficient estimation of \(\delta^{\text{*}}\). That is, including \(X_1, X_2\) in the outcome regression model or the propensity score model should improve the performance of any of the outcome regression, IPW, or doubly-robust (AIPW) estimators for \(\delta^{\text{*}}\), even though \(X_2\) is not needed to adjust for confounding. Again, this theoretical result is consistent with the simulations above.

It is also known that adjusting for covariates that are related to the propensity score but not to the outcome yields estimators that are less precise asymptotically. Thus, including \(X_3\) in the propensity score model, i.e., using the correctly specified model \(\pi^3\) in the IPW estimator, actually results in a less precise IPW estimator than that obtained using \(\pi^2\). Again, this is borne out in the simulations above.

Note, however, that, because the logistic model is nonlinear, if \(\pi^3\) is a correctly specified model for \(E(A | X_1, X_3)\), a correctly specified model for \(E(A | X_1)\) will not have the logistic form. Thus, in the simulations, \(\pi^1\) is not a correctly specified model for the true relationship \(E(A | X_1)\) implied by the simulation scenario, so, technically, the IPW estimator using \(\pi^1\) need not be consistent. However, as seen in the simulations, there is very little effect of this misspecification on the performance of the estimator.

Finally, although not demonstrated in these simulations, the theory implies that modeling the propensity score using \(X_1, X_2\) (so including the covariates in the propensity model that are associated with response) should result in an IPW estimator that is consistent and more efficient. That is, adding \(X_2\), a covariate that is related to outcome but not to treatment assignment, to the propensity score model including the confounder \(X_1\) results in improved precision. This is shown explicitly in Lunceford and Davidian (2004).