Skip to content

Potential discrepancy in second derivative of log-likelihood (Hessian) for overdispersion parameter \theta in overdispersion.cpp #74

@Truongphi20

Description

@Truongphi20

Description

Hi glmGamPoi author,

I have been reviewing the mathematical formulation for the overdispersion parameter $\theta$ optimization and noticed a potential mismatch between the theoretical second derivative of the log-likelihood ($\Large\frac{d^2l}{d\theta^2}$) and its actual code implementation in src/overdispersion.cpp.

Could you help clarify if this is an intentional optimization (e.g., an approximation like Fisher Scoring) or a mathematical typo?

1. Theoretical Second Derivative

Based on the log-likelihood function, the first derivative with respect to $\theta$ is:

$$\frac{d l}{d \theta} = \frac{1}{\theta} \left[ \sum_{k=1}^{N}{ \left( -\frac{1}{\theta} \left( \psi(C_k + \theta^{-1}) - \psi(\theta^{-1}) \right) + \ln(1+\mu_k\theta) + \frac{C_k - \mu_k}{\mu_k + \theta^{-1}} \right) } \right]$$

Differentiating this a second time yields the theoretical Hessian component for $\theta$:

$$\frac{d^2l}{d\theta^2} = \sum_{k=1}^{N} \left[ \frac{2}{\theta^3}\left(\psi(C_k + \theta^{-1}) - \psi(\theta^{-1})\right) + \frac{1}{\theta^4}\left(\psi'(C_k + \theta^{-1}) - \psi'(\theta^{-1})\right) - \frac{1}{\theta^2}\ln(1 + \mu_k\theta) + \frac{\mu_k}{\theta(1+\mu_k\theta)} - \frac{\mu_k(C_k-\mu_k)}{(1+\mu_k\theta)^2} \right]$$

2. Code Implementation Analysis

Looking at src/overdispersion.cpp (around line 233):

for(size_t iter = 0; iter < count_frequencies.size(); ++iter){
      digamma_term += count_frequencies[iter] * Rf_digamma(unique_counts[iter] + theta_neg1);
      trigamma_term += count_frequencies[iter] * Rf_trigamma(unique_counts[iter] + theta_neg1);
    }
    trigamma_term *= theta_neg2;

    digamma_term -= y.size() * Rf_digamma(theta_neg1);
    trigamma_term -=  theta_neg2 * y.size() * Rf_trigamma(theta_neg1);
...

ll_part_1 += log(1 + mu[i] * theta) + (y[i] - mu[i]) / (mu[i] + theta_neg1);
ll_part_2 += (mu[i] * mu[i] * theta + y[i]) / (1 + mu[i] * theta) / (1 + mu[i] * theta);

double ll_part = -2 * theta_neg1 * (ll_part_1 - digamma_term) + (ll_part_2 + trigamma_term);

double res = ll_part + cr_term * R_pow_di(theta, 2) + (ll_part_1 - digamma_term) * theta_neg1 + cr_term2 * theta;

If we map these code components directly back to the math notation (ignoring the Cox-Reid adjustment terms cr_term for a moment), the logic computes:

$$ \begin{aligned} \text{digamma-term} &= \psi(C_k + \theta^{-1}) - \psi(\theta^{-1}) \\ \text{trigamma-term} &= \frac{1}{\theta^2} (\psi'(C_k + \theta^{-1}) - \psi'(\theta^{-1})) \\ \text{ll-part-1} &= \ln(1 + \mu_k\theta) + \frac{C_k - \mu_k}{\mu_k + \theta^{-1}} \\ \text{ll-part-2} &= \frac{\mu^2_k\theta+C_k}{(1+\mu\theta)^2} \\ \text{ll-part} &= \frac{-2}{\theta} \left( \text{ll-part-1} - \text{digamma-term} \right) + \text{ll-part-2} + \text{trigamma-term} \\ \text{res} &= \text{ll-part} + \frac{1}{\theta}(\text{ll-part-1} - \text{digamma-term}) \end{aligned} $$

The Discrepancy

When expanding the final returned variable res, the resulting combination yields:

$$\text{res} = \sum_{k=1}^{N} \left[ \frac{1}{\theta} \left( \psi(C_k + \theta^{-1}) - \psi(\theta^{-1}) - \ln(1 + \mu_k\theta) - \frac{C_k - \mu_k}{\mu_k + \theta^{-1}} \right) + \frac{\mu^2_k\theta+C_k}{(1+\mu_k\theta)^2} + \frac{1}{\theta^2} (\psi'(C_k + \theta^{-1}) - \psi'(\theta^{-1})) \right]$$

Comparing this to the analytic second derivative ($\Large\frac{d^2l}{d\theta^2}$), the coefficients for $\Large\theta$ do not match (for example, the code ends up with a $\frac{1}{\theta}$ scaling on the digamma differences instead of $\Large\frac{2}{\theta^3}$, and $\Large\frac{1}{\theta^2}$ on the trigamma differences instead of $\Large\frac{1}{\theta^4}$).

Could you please clarify if res is intended to evaluate exactly to $\Large\frac{d^2l}{d\theta^2}$, or if this reflects a specific parameter transformation / expected information approximation step?

Thank you for your time and for maintaining this excellent package!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions