Numerical Maximum Likelihood with the stats4 Package

programming
Published December 11, 2024

1 Introduction

The stats4 package included in R provides a layer of usability on top of optim for numerical maximum likelihood (ML) estimation. For example, the user can request an estimate of the covariance matrix associated with the ML estimates rather than computing it manually. According to Henningsen and Toomet (2011), stats4 has been bundled into R since 2003, but I only found out about it recently. In this post, we present some brief examples using stats4. Note that the maxLik package presented by Henningsen and Toomet (2011) covers similar ground with additional functionality, so it may also be of interest.

2 Poisson Regression Example

Consider a Poisson regression setting with \(Y_i \sim \text{Poisson}(\lambda_i)\) independently for \(i = 1, \ldots, n\), where \(\lambda_i = \exp(x_i^\top \beta)\) for covariate \(x_i \in \mathbb{R}^d\) and unknown coefficient vector \(\beta \in \mathbb{R}^d\).

Let us simulate a dataset from this setting.

set.seed(1235)
n = 200
x = rnorm(n)                        # covariate
X = cbind(1, x)                     # design matrix with an intercept column
d = ncol(X)
beta_true = c(1, -0.25)
lambda_true = exp(X %*% beta_true)  # true Poisson rates
y = rpois(n, lambda_true)           # simulated responses

Of course we can use the glm function to obtain the MLE \(\hat{\beta}\) in this setting. Let us do that as a point of comparison.

glm_out = glm(y ~ x, family = poisson)
summary(glm_out)

Call:
glm(formula = y ~ x, family = poisson)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.98109    0.04392  22.338  < 2e-16 ***
x           -0.25893    0.03908  -6.626 3.44e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 272.04  on 199  degrees of freedom
Residual deviance: 228.85  on 198  degrees of freedom
AIC: 749.28

Number of Fisher Scoring iterations: 5

3 Using the mle Function in stats4

Recall that the loglikelihood for this model is \[ \log \mathcal{L}(\beta) = \sum_{i=1}^n \Big\{ y_i \log \lambda_i - \lambda_i - \log \Gamma(y_i + 1) \Big\}. \] If this were a nonstandard likelihood, we would need to code and optimize it ourselves; the stats4 package allows us to do exactly that. Here are the arguments of the stats4::mle function, which can be used to compute the MLE.

str(mle, give.attr = F)
function (minuslogl, start, optim = stats::optim, method = if (!useLim) "BFGS" else "L-BFGS-B", 
    fixed = list(), nobs, lower, upper, ...)  

Note that the first argument expects the negative loglikelihood; the optimization is then carried out as a minimization. When the start argument is not supplied, mle takes starting values from the default arguments of minuslogl, so we give our function the default beta = numeric(d). Let us code it.

nloglik = function(beta = numeric(d)) {
    # Negative Poisson loglikelihood with rates lambda_i = exp(x_i' beta)
    -sum(dpois(y, lambda = exp(X %*% beta), log = TRUE))
}
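As a quick sanity check, we can evaluate nloglik at the glm estimates from earlier; the result should be approximately 372.64, matching the negative loglikelihood of the glm fit.

nloglik(coef(glm_out))        # negative loglikelihood at the glm estimates
-as.numeric(logLik(glm_out))  # should be essentially the same value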

mle_out = mle(nloglik, method = "L-BFGS-B", nobs = length(y))

A variety of accessor functions can be used to work with the fitted result.

summary(mle_out)
Maximum likelihood estimation

Call:
mle(minuslogl = nloglik, method = "L-BFGS-B", nobs = length(y))

Coefficients:
        Estimate Std. Error
beta1  0.9810935 0.04392105
beta2 -0.2589335 0.03907591

-2 log L: 745.2773 
coef(mle_out)
     beta1      beta2 
 0.9810935 -0.2589335 
logLik(mle_out)
'log Lik.' -372.6387 (df=2)
AIC(mle_out)
[1] 749.2773
BIC(mle_out)
[1] 755.874
confint(mle_out, level = 0.90)
Profiling...
             5 %      95 %
beta1  0.9079239  1.052437
beta2 -0.3230551 -0.194514
vcov(mle_out)
             beta1        beta2
beta1 0.0019290582 0.0004114616
beta2 0.0004114616 0.0015269270
profile(mle_out)
An object of class "profile.mle"
Slot "profile":
$beta1
            z par.vals.beta1 par.vals.beta2
1  -3.0192356      0.8453337     -0.2894697
2  -2.5258203      0.8679603     -0.2841583
3  -2.0285449      0.8905870     -0.2789366
4  -1.5273678      0.9132136     -0.2738039
5  -1.0222468      0.9358402     -0.2687598
6  -0.5131396      0.9584668     -0.2638030
7   0.0000000      0.9810935     -0.2589335
8   0.5172066      1.0037201     -0.2541504
9   1.0385329      1.0263467     -0.2494528
10  1.5640206      1.0489733     -0.2448400
11  2.0937147      1.0716000     -0.2403112
12  2.6276606      1.0942266     -0.2358654

$beta2
            z par.vals.beta1 par.vals.beta2
1  -2.5855783      0.9478931     -0.3595864
2  -2.0668516      0.9555116     -0.3394558
3  -1.5489482      0.9626386     -0.3193253
4  -1.0318503      0.9692764     -0.2991947
5  -0.5155395      0.9754273     -0.2790641
6   0.0000000      0.9810935     -0.2589335
7   0.5147983      0.9862771     -0.2388029
8   1.0288654      0.9909803     -0.2186724
9   1.5422257      0.9952050     -0.1985418
10  2.0549007      0.9989531     -0.1784112
11  2.5669123      1.0022264     -0.1582806
12  3.0782832      1.0050266     -0.1381501


Slot "summary":
Maximum likelihood estimation

Call:
mle(minuslogl = nloglik, method = "L-BFGS-B", nobs = length(y))

Coefficients:
        Estimate Std. Error
beta1  0.9810935 0.04392105
beta2 -0.2589335 0.03907591

-2 log L: 745.2773 
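The profile shown above can also be stored and reused: stats4 provides plot and confint methods for profile.mle objects, so the profile likelihood can be visualized and profile-based intervals computed from the stored object. A brief sketch:

prof = profile(mle_out)       # signed square-root likelihood ratio profile for each parameter
plot(prof)                    # one profile plot per parameter
confint(prof, level = 0.90)   # same intervals as confint(mle_out, level = 0.90) above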

4 Fixed Parameters

A useful argument of mle is fixed, which lets us specify parameters to be held fixed during the optimization. In our example with \(\beta = (\beta_1, \beta_2)\), let us consider another fit with the slope coefficient fixed at \(\beta_2 = 0\).

mle0_out = mle(nloglik, method = "L-BFGS-B", nobs = length(y),
    fixed = list(beta = c(NA, 0)))
print(mle0_out)

Call:
mle(minuslogl = nloglik, method = "L-BFGS-B", fixed = list(beta = c(NA, 
    0)), nobs = length(y))

Coefficients:
   beta1    beta2 
1.011601 0.000000 
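As a quick check, fixing \(\beta_2 = 0\) reduces the model to an intercept-only Poisson model, whose MLE for the rate is the sample mean; the estimated intercept should therefore equal \(\log \bar{y}\).

log(mean(y))  # should match the intercept estimate above (about 1.0116)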

Note that NA was given for \(\beta_1\) to indicate that it should be left free rather than held fixed. We can compute a likelihood ratio test of the hypothesis \(H_0: \beta_2 = 0\) versus \(H_1: \beta_2 \neq 0\) using the two fits mle_out and mle0_out.

lrt = 2 * (logLik(mle_out) - logLik(mle0_out))
pval = pchisq(lrt, df = 1, lower.tail = F)
cat("LRT was computed with test statistic", lrt, "and p-value", pval, ".\n")
LRT was computed with test statistic 43.19441 and p-value 4.956214e-11 .
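This statistic can be cross-checked against the earlier glm output: the drop in deviance from the null (intercept-only) model to the fitted model is the same likelihood ratio statistic.

glm_out$null.deviance - glm_out$deviance  # 272.04 - 228.85, about 43.19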

5 Transforming Parameters

A convenient property of maximum likelihood estimates (MLEs) is invariance: if the MLE of \(\beta\) is \(\hat{\beta}\), then the MLE of a function \(g(\beta)\) is \(g(\hat{\beta})\). An associated variance estimate, obtained via the delta method, is \[ \widehat{\text{Var}}(g(\hat{\beta})) = [J(\hat{\beta})] \widehat{\text{Var}}(\hat{\beta}) [J(\hat{\beta})]^\top, \] where \(J(\beta) = \partial g(\beta) / \partial \beta\) is the Jacobian of the transformation. Here is an R function to compute the transformed variance; note that it uses the numDeriv package to evaluate the Jacobian numerically.

vcov_tx = function(object, tx) {
    J = numDeriv::jacobian(func = tx, x = coef(object))  # Jacobian of tx at the MLE
    V = vcov(object)                                      # estimated covariance of the MLE
    J %*% tcrossprod(V, J)                                # J V J^T
}

As an example, let us estimate each \(\lambda_i = \exp(x_i^\top \beta)\) and its variance, and construct an associated confidence interval with 90% nominal coverage.

alpha = 0.10
tx = function(beta) { exp(X %*% beta) }
tx_hat = tx(coef(mle_out))
sd_tx_hat = sqrt(diag(vcov_tx(mle_out, tx)))
tx_lo = tx_hat - qnorm(1 - alpha/2) * sd_tx_hat
tx_hi = tx_hat + qnorm(1 - alpha/2) * sd_tx_hat

Here are the first few results in a data frame.

library(tidyverse)
library(kableExtra)
data.frame(EST = tx_hat, SD = sd_tx_hat, LO = tx_lo, HI = tx_hi) %>%
    mutate(TRUTH = lambda_true) %>%
    head() %>%
    kbl(booktabs = T)
     EST        SD       LO       HI    TRUTH
3.195763 0.1463983 2.954960 3.436567 3.236515
3.720229 0.2166837 3.363816 4.076642 3.747967
2.064235 0.1344154 1.843142 2.285329 2.122321
2.591277 0.1170420 2.398760 2.783794 2.643374
2.589646 0.1170497 2.397117 2.782176 2.641767
1.718365 0.1510794 1.469862 1.966869 1.777932
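As a point of comparison (a brief sketch using predict.glm rather than the stats4 fit), glm can report the same response-scale estimates and delta-method standard errors through predict with se.fit = TRUE; these should closely match the EST and SD columns above.

pred = predict(glm_out, type = "response", se.fit = TRUE)
head(data.frame(EST = pred$fit, SD = pred$se.fit))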

6 References

Henningsen, Arne, and Ott Toomet. 2011. “maxLik: A Package for Maximum Likelihood Estimation in R.” Computational Statistics 26 (3): 443–58. https://doi.org/10.1007/s00180-010-0217-1.