Potential discrepancy in second derivative of log-likelihood (Hessian) for overdispersion parameter \theta in overdispersion.cpp

### **Description**

Hi `glmGamPoi` author,

I have been reviewing the mathematical formulation for the overdispersion parameter $\theta$ optimization and noticed a potential mismatch between the theoretical second derivative of the log-likelihood ($\Large\frac{d^2l}{d\theta^2}$) and its actual code implementation in `src/overdispersion.cpp`.

Could you help clarify if this is an intentional optimization (e.g., an approximation like Fisher Scoring) or a mathematical typo?

#### **1. Theoretical Second Derivative**

Based on the log-likelihood function, the first derivative with respect to $\theta$ is:

$$\frac{d l}{d \theta} = \frac{1}{\theta} \left[ \sum_{k=1}^{N}{ \left( -\frac{1}{\theta} \left( \psi(C_k + \theta^{-1}) - \psi(\theta^{-1})  \right)  + \ln(1+\mu_k\theta) + \frac{C_k - \mu_k}{\mu_k + \theta^{-1}} \right) } \right]$$

Differentiating this a second time yields the theoretical Hessian component for $\theta$:

$$\frac{d^2l}{d\theta^2} = 
\sum_{k=1}^{N} \left[ 
   \frac{2}{\theta^3}\left(\psi(C_k + \theta^{-1}) - \psi(\theta^{-1})\right) + \frac{1}{\theta^4}\left(\psi'(C_k + \theta^{-1}) - \psi'(\theta^{-1})\right) - \frac{1}{\theta^2}\ln(1 + \mu_k\theta) + \frac{\mu_k}{\theta(1+\mu_k\theta)} - \frac{\mu_k(C_k-\mu_k)}{(1+\mu_k\theta)^2}
\right]$$

#### **2. Code Implementation Analysis**

Looking at `src/overdispersion.cpp` (around line 233):

```cpp
for(size_t iter = 0; iter < count_frequencies.size(); ++iter){
      digamma_term += count_frequencies[iter] * Rf_digamma(unique_counts[iter] + theta_neg1);
      trigamma_term += count_frequencies[iter] * Rf_trigamma(unique_counts[iter] + theta_neg1);
    }
    trigamma_term *= theta_neg2;

    digamma_term -= y.size() * Rf_digamma(theta_neg1);
    trigamma_term -=  theta_neg2 * y.size() * Rf_trigamma(theta_neg1);
...

ll_part_1 += log(1 + mu[i] * theta) + (y[i] - mu[i]) / (mu[i] + theta_neg1);
ll_part_2 += (mu[i] * mu[i] * theta + y[i]) / (1 + mu[i] * theta) / (1 + mu[i] * theta);

double ll_part = -2 * theta_neg1 * (ll_part_1 - digamma_term) + (ll_part_2 + trigamma_term);

double res = ll_part + cr_term * R_pow_di(theta, 2) + (ll_part_1 - digamma_term) * theta_neg1 + cr_term2 * theta;

```

If we map these code components directly back to the math notation (ignoring the Cox-Reid adjustment terms `cr_term` for a moment), the logic computes:

$$
\begin{aligned}
\text{digamma-term} &= \psi(C_k + \theta^{-1}) - \psi(\theta^{-1}) \\
\text{trigamma-term} &= \frac{1}{\theta^2} (\psi'(C_k + \theta^{-1}) - \psi'(\theta^{-1})) \\
\text{ll-part-1} &= \ln(1 + \mu_k\theta) + \frac{C_k - \mu_k}{\mu_k + \theta^{-1}} \\
\text{ll-part-2} &= \frac{\mu^2_k\theta+C_k}{(1+\mu\theta)^2} \\
\text{ll-part} &= \frac{-2}{\theta} \left( \text{ll-part-1} - \text{digamma-term}  \right) + \text{ll-part-2} + \text{trigamma-term} \\
\text{res} &= \text{ll-part} + \frac{1}{\theta}(\text{ll-part-1} - \text{digamma-term})
\end{aligned}
$$

#### **The Discrepancy**

When expanding the final returned variable `res`, the resulting combination yields:

$$\text{res} = \sum_{k=1}^{N} \left[ \frac{1}{\theta} \left( \psi(C_k + \theta^{-1}) - \psi(\theta^{-1})  - \ln(1 + \mu_k\theta) - \frac{C_k - \mu_k}{\mu_k + \theta^{-1}} \right) + \frac{\mu^2_k\theta+C_k}{(1+\mu_k\theta)^2} + \frac{1}{\theta^2} (\psi'(C_k + \theta^{-1}) - \psi'(\theta^{-1})) \right]$$

Comparing this to the analytic second derivative ($\Large\frac{d^2l}{d\theta^2}$), the coefficients for $\Large\theta$ do not match (for example, the code ends up with a $\frac{1}{\theta}$ scaling on the digamma differences instead of $\Large\frac{2}{\theta^3}$, and $\Large\frac{1}{\theta^2}$ on the trigamma differences instead of $\Large\frac{1}{\theta^4}$).

Could you please clarify if `res` is intended to evaluate exactly to $\Large\frac{d^2l}{d\theta^2}$, or if this reflects a specific parameter transformation / expected information approximation step?

Thank you for your time and for maintaining this excellent package!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential discrepancy in second derivative of log-likelihood (Hessian) for overdispersion parameter \theta in overdispersion.cpp #74

Description

1. Theoretical Second Derivative

2. Code Implementation Analysis

The Discrepancy

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Potential discrepancy in second derivative of log-likelihood (Hessian) for overdispersion parameter \theta in overdispersion.cpp #74

Description

Description

1. Theoretical Second Derivative

2. Code Implementation Analysis

The Discrepancy

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions