Numerical Maximum Likelihood with the stats4 Package
1 Introduction
The stats4 package included in R provides a layer of usability on top of optim for numerical maximum likelihood (ML). For example, the user can request an estimate of the covariance matrix associated with the ML estimates rather than computing it manually. According to Henningsen and Toomet (2011), stats4 has been bundled into R since 2003, but I only found out about it recently. In this post, we will present some brief examples using stats4. Note that the maxLik package presented by Henningsen and Toomet (2011) covers similar ground with additional functionality; it may also be of interest.
2 Poisson Regression Example
Consider a Poisson regression setting with \(Y_i \sim \text{Poisson}(\lambda_i)\), independently distributed for \(i = 1, \ldots, n\), where \(\lambda_i = \exp(x_i^\top \beta)\) for covariate \(x_i \in \mathbb{R}^d\) and unknown coefficient \(\beta \in \mathbb{R}^d\).
Let us simulate a dataset from this setting.
set.seed(1235)
n = 200
x = rnorm(n)
X = cbind(1, x)
d = ncol(X)
beta_true = c(1, -0.25)
lambda_true = exp(X %*% beta_true)
y = rpois(n, lambda_true)
Of course, we can use the glm function to obtain the MLE \(\hat{\beta}\) in this setting. Let us do that as a point of comparison.
glm_out = glm(y ~ x, family = poisson)
summary(glm_out)
Call:
glm(formula = y ~ x, family = poisson)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.98109 0.04392 22.338 < 2e-16 ***
x -0.25893 0.03908 -6.626 3.44e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 272.04 on 199 degrees of freedom
Residual deviance: 228.85 on 198 degrees of freedom
AIC: 749.28
Number of Fisher Scoring iterations: 5
3 Using the mle Function in stats4
Recall that the loglikelihood for this model is \[
\log \mathcal{L}(\beta) = \sum_{i=1}^n \Big\{ y_i \log \lambda_i - \lambda_i - \log \Gamma(y_i + 1) \Big\}.
\] If this were a nonstandard likelihood, we would need to code and optimize it ourselves. The stats4 package allows us to do this. Here are the arguments of the stats4::mle function, which can be used to compute the MLE.
str(mle, give.attr = F)
function (minuslogl, start, optim = stats::optim, method = if (!useLim) "BFGS" else "L-BFGS-B",
fixed = list(), nobs, lower, upper, ...)
Note that the first argument expects the negative loglikelihood; the optimization is then carried out as a minimization. Let us code it.
nloglik = function(beta = numeric(d)) {
    -sum(dpois(y, lambda = exp(X %*% beta), log = TRUE))
}
mle_out = mle(nloglik, method = "L-BFGS-B", nobs = length(y))
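As a quick sanity check, not part of the original workflow, we can code the loglikelihood formula above directly and compare it to the dpois-based version; loglik_direct below is just an illustrative name.
# Direct coding of the loglikelihood, as a check on the dpois() version
loglik_direct = function(beta) {
    lambda = exp(X %*% beta)
    sum(y * log(lambda) - lambda - lgamma(y + 1))
}
# Should agree (up to numerical tolerance) at any beta, e.g. the true value
all.equal(loglik_direct(c(1, -0.25)), -nloglik(c(1, -0.25)))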
A variety of included accessors can be used to work with the result.
summary(mle_out)
Maximum likelihood estimation
Call:
mle(minuslogl = nloglik, method = "L-BFGS-B", nobs = length(y))
Coefficients:
Estimate Std. Error
beta1 0.9810935 0.04392105
beta2 -0.2589335 0.03907591
-2 log L: 745.2773
coef(mle_out)
beta1 beta2
0.9810935 -0.2589335
logLik(mle_out)
'log Lik.' -372.6387 (df=2)
AIC(mle_out)
[1] 749.2773
BIC(mle_out)
[1] 755.874
confint(mle_out, level = 0.90)
Profiling...
5 % 95 %
beta1 0.9079239 1.052437
beta2 -0.3230551 -0.194514
vcov(mle_out)
beta1 beta2
beta1 0.0019290582 0.0004114616
beta2 0.0004114616 0.0015269270
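To connect these accessors, here is a small aside (not from the original post) showing that the standard errors and Wald z-values in the summary table can be reproduced from coef and vcov.
# Reproduce the summary table's standard errors and z-statistics
se = sqrt(diag(vcov(mle_out)))
z = coef(mle_out) / se
cbind(Estimate = coef(mle_out), SE = se, z = z,
    p = 2 * pnorm(abs(z), lower.tail = FALSE))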
profile(mle_out)
An object of class "profile.mle"
Slot "profile":
$beta1
z par.vals.beta1 par.vals.beta2
1 -3.0192356 0.8453337 -0.2894697
2 -2.5258203 0.8679603 -0.2841583
3 -2.0285449 0.8905870 -0.2789366
4 -1.5273678 0.9132136 -0.2738039
5 -1.0222468 0.9358402 -0.2687598
6 -0.5131396 0.9584668 -0.2638030
7 0.0000000 0.9810935 -0.2589335
8 0.5172066 1.0037201 -0.2541504
9 1.0385329 1.0263467 -0.2494528
10 1.5640206 1.0489733 -0.2448400
11 2.0937147 1.0716000 -0.2403112
12 2.6276606 1.0942266 -0.2358654
$beta2
z par.vals.beta1 par.vals.beta2
1 -2.5855783 0.9478931 -0.3595864
2 -2.0668516 0.9555116 -0.3394558
3 -1.5489482 0.9626386 -0.3193253
4 -1.0318503 0.9692764 -0.2991947
5 -0.5155395 0.9754273 -0.2790641
6 0.0000000 0.9810935 -0.2589335
7 0.5147983 0.9862771 -0.2388029
8 1.0288654 0.9909803 -0.2186724
9 1.5422257 0.9952050 -0.1985418
10 2.0549007 0.9989531 -0.1784112
11 2.5669123 1.0022264 -0.1582806
12 3.0782832 1.0050266 -0.1381501
Slot "summary":
Maximum likelihood estimation
Call:
mle(minuslogl = nloglik, method = "L-BFGS-B", nobs = length(y))
Coefficients:
Estimate Std. Error
beta1 0.9810935 0.04392105
beta2 -0.2589335 0.03907591
-2 log L: 745.2773
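If a graphical summary is preferred, stats4 also supplies a plot method for "profile.mle" objects; a minimal sketch (plot output not shown here):
# Plot the profile likelihood: |z| versus each parameter, with reference confidence levels
plot(profile(mle_out), absVal = TRUE)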
4 Fixed Parameters
An argument of mle that can be useful is fixed, where we can specify some parameters to be held fixed during the optimization. In our example with \(\beta = (\beta_1, \beta_2)\), let us consider another fit with the slope coefficient fixed at \(\beta_2 = 0\).
mle0_out = mle(nloglik, method = "L-BFGS-B", nobs = length(y),
    fixed = list(beta = c(NA, 0)))
print(mle0_out)
Call:
mle(minuslogl = nloglik, method = "L-BFGS-B", fixed = list(beta = c(NA,
0)), nobs = length(y))
Coefficients:
beta1 beta2
1.011601 0.000000
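As a side note not in the original fit, fixing the slope at zero leaves an intercept-only Poisson model, so the MLE of the intercept has the closed form \(\log \bar{y}\); this should match the beta1 estimate above.
# Closed-form MLE of the intercept when the slope is fixed at zero
log(mean(y))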
Note that NA was associated with \(\beta_1\) to indicate that it should not be held fixed. We can compute a likelihood ratio test of the hypothesis \(H_0: \beta_2 = 0\) versus \(H_1: \beta_2 \neq 0\) using the two fits mle_out and mle0_out.
lrt = 2 * (logLik(mle_out) - logLik(mle0_out))
pval = pchisq(lrt, df = 1, lower.tail = F)
cat("LRT was computed with test statistic", lrt, "and p-value", pval, ".\n")
LRT was computed with test statistic 43.19441 and p-value 4.956214e-11 .
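As a cross-check (the intercept-only glm fit below is new, not from the post), the same likelihood ratio test can be obtained from two nested glm fits via anova; the drop in deviance should match the lrt statistic above.
# LRT via nested glm fits; the deviance difference should equal lrt
glm0_out = glm(y ~ 1, family = poisson)
anova(glm0_out, glm_out, test = "Chisq")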
5 Transforming Parameters
A convenient property of ML estimates (MLEs) is the invariance property: if the MLE of \(\beta\) is \(\hat{\beta}\), then the MLE of a function \(g(\beta)\) is \(g(\hat{\beta})\). An associated variance estimate, obtained via the delta method, is \[
\widehat{\text{Var}}(g(\hat{\beta})) = [J(\hat{\beta})] \, \widehat{\text{Var}}(\hat{\beta}) \, [J(\hat{\beta})]^\top
\] where \(J(\beta) = \partial g(\beta) / \partial \beta\) is the Jacobian of the transformation. Here is an R function to compute the transformed variance. Note that we use the numDeriv package.
vcov_tx = function(object, tx) {
    J = numDeriv::jacobian(func = tx, x = coef(object))
    V = vcov(object)
    J %*% tcrossprod(V, J)
}
As an example, let us estimate each \(\lambda_i = \exp(x_i^\top \beta)\) and its variance, and construct an associated confidence interval with 90% nominal coverage.
alpha = 0.10
tx = function(beta) { exp(X %*% beta) }
tx_hat = tx(coef(mle_out))
sd_tx_hat = sqrt(diag(vcov_tx(mle_out, tx)))
tx_lo = tx_hat - qnorm(1 - alpha/2) * sd_tx_hat
tx_hi = tx_hat + qnorm(1 - alpha/2) * sd_tx_hat
Here are the results in a data frame.
library(tidyverse)
library(kableExtra)
data.frame(EST = tx_hat, SD = sd_tx_hat, LO = tx_lo, HI = tx_hi) %>%
mutate(TRUTH = lambda_true) %>%
head() %>%
kbl(booktabs = T)
| EST | SD | LO | HI | TRUTH |
|---|---|---|---|---|
| 3.195763 | 0.1463983 | 2.954960 | 3.436567 | 3.236515 |
| 3.720229 | 0.2166837 | 3.363816 | 4.076642 | 3.747967 |
| 2.064235 | 0.1344154 | 1.843142 | 2.285329 | 2.122321 |
| 2.591277 | 0.1170420 | 2.398760 | 2.783794 | 2.643374 |
| 2.589646 | 0.1170497 | 2.397117 | 2.782176 | 2.641767 |
| 1.718365 | 0.1510794 | 1.469862 | 1.966869 | 1.777932 |
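As a final cross-check that is not part of the original post, the earlier glm fit can produce essentially the same fitted means and delta-method standard errors on the response scale via predict with se.fit = TRUE.
# Fitted means and response-scale standard errors from glm, for comparison with tx_hat and sd_tx_hat
pred = predict(glm_out, type = "response", se.fit = TRUE)
head(cbind(glm_fit = pred$fit, glm_se = pred$se.fit, mle_fit = tx_hat, mle_se = sd_tx_hat))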