Statistical properties of MLEM

What about counts?

How does non-uniformity, or plain image noise, or SNR relate to $N$, the number of collected counts? It depends.

For example, it is a well-known fact that if one uses filtered back projection (FBP), the relation is approximately: $$SNR\propto\sqrt{N}$$ see Watson, for example. However, iterative methods with regularization and smoothing produce much better images, according to, among others, Jeff Fessler at the University of Michigan. What kind of relation should one expect for such reconstruction methods?
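As a quick sanity check on the $\sqrt{N}$ behaviour, here is a minimal sketch of the underlying Poisson statistics (a flat region of Poisson counts; this only illustrates the raw count scaling, not FBP reconstruction itself, and the pixel count and seed are arbitrary):

```python
# Sketch: SNR of a flat Poisson-count region grows as sqrt(N).
# Illustrative only; FBP adds filtering on top of this raw scaling.
import numpy as np

rng = np.random.default_rng(0)

for N in [1_000, 4_000, 16_000]:
    # 10_000 pixels sharing N expected counts uniformly
    pixels = rng.poisson(lam=N / 10_000, size=10_000)
    snr = pixels.mean() / pixels.std()
    print(f"N={N:6d}  SNR={snr:5.2f}  SNR/sqrt(N)={snr/np.sqrt(N):.4f}")
```

The ratio $SNR/\sqrt{N}$ stays constant across the three count levels.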

There is some empirical evidence, for example from Chang and co-workers, that the same relation holds for iterative reconstruction, though only with the trues count rate on the horizontal axis.

On the more rigorous analytical side, one has a bunch of papers dealing with the variance and mean of estimators. Moreover, MLEM is an implicitly defined estimator, such that

$$\hat{\lambda}=\underset{\lambda}{\arg\max}\;L(\mathbf{y}\vert\lambda)$$

or some such contraption, with $L$ the likelihood of the measurement $\mathbf{y}$ given the image $\lambda$. So the mathematics gets hairy.
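For concreteness, here is a minimal sketch of the plain MLEM fixed-point iteration that implicitly solves the maximization above (toy system matrix and Poisson data; all names and sizes are illustrative, not from any particular paper):

```python
# Minimal MLEM sketch: lam <- lam / S * M^T (y / (M lam)), with S = M^T 1.
# The image estimate exists only as the limit of this iteration.
import numpy as np

def mlem(M, y, n_iter=50):
    S = M.sum(axis=0)                        # sensitivity image, S_j = sum_i M_ij
    lam = np.ones(M.shape[1])                # uniform initial estimate
    for _ in range(n_iter):
        proj = M @ lam                       # forward projection
        ratio = y / np.maximum(proj, 1e-12)  # guard against division by zero
        lam *= (M.T @ ratio) / S             # multiplicative EM update
    return lam

# Toy problem: random non-negative system matrix, Poisson-sampled data
rng = np.random.default_rng(1)
M = rng.random((200, 50))
truth = rng.random(50) * 100
y = rng.poisson(M @ truth)
print(mlem(M, y)[:5])
```

The estimate exists only as the limit of this iteration, which is exactly where the practical caveats come in.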

  • First - all implications are for converged estimators, while in reality, iterations are cut somewhere, probably far from convergence.

  • Second - there is usually background (scatter and randoms), which one typically knows little about and which does not play well with statistical predictions.

  • Third - pure, unbiased, converged MLEM image estimates are rubbish, because they are too noisy. So some level of smoothing and regularization is normally present, either as a side effect of stopping MLEM before full convergence or as explicit additional smoothing and regularization.

Here I focus on regularized, and perhaps even smoothed, estimates. Green shows how MLEM can be modified to include a regularization term. In the simplest case of an energy penalty, regularization just adds a $\beta$-weighted image to the sensitivity: $$S\to S+2\beta R\hat{\lambda}$$ where $\hat{\lambda}$ is the current image estimate and $R$ a symmetric positive-definite matrix; this covers a large family of regularizations whose contribution to the log-likelihood can be written as: $$-\beta\lambda^TR\lambda$$ And now the catch. Assuming count-independent $\beta$ and $R$, the relative importance of the regularization would change with counts: the log-likelihood grows linearly with $N$, while the quadratic penalty, with $\lambda$ itself scaling with $N$, grows quadratically. So either $\beta$ or $R$ must be count dependent, with a $1/N$ proportionality.
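As a sketch of how this plugs into the iteration above, assuming the quadratic penalty $-\beta\lambda^TR\lambda$ and Green's one-step-late scheme (the toy identity $R$ and all sizes are illustrative):

```python
# One-step-late (OSL) MLEM sketch for a quadratic penalty -beta*lam^T R lam:
# the only change to plain MLEM is the sensitivity, S -> S + 2*beta*(R @ lam).
import numpy as np

def mlem_osl(M, y, R, beta, n_iter=50):
    S = M.sum(axis=0)
    lam = np.ones(M.shape[1])
    for _ in range(n_iter):
        ratio = y / np.maximum(M @ lam, 1e-12)
        # OSL: penalty gradient 2*(R @ lam) evaluated at the current estimate
        lam *= (M.T @ ratio) / (S + 2.0 * beta * (R @ lam))
    return lam

rng = np.random.default_rng(1)
M = rng.random((200, 50))
y = rng.poisson(M @ (rng.random(50) * 100))
lam = mlem_osl(M, y, R=np.eye(50), beta=0.1)   # R = I: pure energy penalty
print(lam[:5])
```

Only the denominator changes; with $\beta$ held fixed, more counts would let the likelihood term outgrow the penalty, which is the $1/N$ argument in a nutshell.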

When Qi estimates the variance of the MAP estimator, they arrive at the following expression: $$var_j=\kappa_j^{-2}\frac{1}{N}\sum_{i=0}^{N-1}\frac{\lambda_i(j)}{(\lambda_i(j)+\beta\kappa_j^{-2}\mu_i)^2}$$ Unfortunately, $\lambda_i(j)$ in this case refers to a different entity, namely the Fourier transform of the diagonal of the system matrix (count independent), and $N$ is here the number of image bins. The parameter $\mu_i$ comes from a column of the matrix $R$, and from the definition of $\kappa_j$ one can infer a count dependence $\kappa_j\propto 1/\sqrt{N}$, where $N$ is again the number of counts, as it is from here on. Because either $R$ or $\beta$ carries the $1/N$ count dependence, the product $\beta\kappa_j^{-2}\mu_i$ is count independent (as it should be, since it appears in a direct sum with the count-independent $\lambda_i(j)$), and the total dependence of the variance on counts is: $$\sigma^2\propto \kappa_j^{-2}\propto N.$$
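This scaling can be checked numerically. A sketch with made-up, count-independent spectra standing in for $\lambda_i(j)$ and $\mu_i$, and the $\beta\propto 1/N$, $\kappa_j^2\propto 1/N$ dependencies imposed by hand:

```python
# Scaling check of the variance expression above (illustrative spectra only):
# with beta ~ 1/N and kappa_j^2 ~ 1/N, the variance should grow linearly in N.
import numpy as np

rng = np.random.default_rng(2)
lam_spec = rng.random(256) + 0.1   # stand-in for lambda_i(j), count independent
mu_spec = rng.random(256) + 0.1    # stand-in for mu_i, count independent

def var_j(N, beta0=1.0, kappa0=1.0):
    beta = beta0 / N               # 1/N count dependence of the penalty weight
    kappa2 = kappa0**2 / N         # kappa_j^2 proportional to 1/N
    # beta/kappa2 is count independent, so the summand does not change with N
    return (1 / kappa2) * np.mean(lam_spec / (lam_spec + beta * mu_spec / kappa2)**2)

for N in [1e5, 2e5, 4e5]:
    print(f"N={N:.0e}  var={var_j(N):.3e}  var/N={var_j(N)/N:.3e}")
```

The printed $var/N$ ratio stays constant, which is just the $\sigma^2\propto N$ statement.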

Furthermore, Meng shows that for MAP estimators with a desired mean gradient response $\mathbf{f}$, the Cramér-Rao bound is given as $$\sigma^2\geq\mathbf{g}^TF^{-1}\mathbf{g}$$ where $\mathbf{g}$ is the achieved mean gradient, close to $\mathbf{f}$. Since the mean gradient is count independent (being a derivative), the count dependence of the bound comes solely from $F$. As the Fisher information matrix can be written as: $$F=M^T\,\mathrm{diag}\!\left(\frac{1}{M\lambda}\right)M$$ its count dependence comes from the diagonal matrix with $1/(M\lambda)$ on the diagonal. As the activity concentration or imaging time increases, $\lambda$ grows linearly, as does the number of recorded events (should the low-background assumption hold). Therefore $$F\propto 1/N$$ and $$\sigma^2\propto N$$ even for MAP-reconstructed images filtered to a desired mean gradient. So one can afford to regularize and smooth, and in the limit of convergence the relation still holds.
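The same linear scaling can be verified directly on the bound with a toy, count-independent $M$ and $\mathbf{g}$ (all values illustrative):

```python
# Sketch: scaling the activity lam by c scales F = M^T diag(1/(M lam)) M
# by 1/c, hence the bound g^T F^{-1} g by c, i.e. sigma^2 grows linearly in N.
import numpy as np

rng = np.random.default_rng(3)
M = rng.random((120, 30))
lam0 = rng.random(30) + 0.5
g = rng.random(30)                  # count-independent mean gradient

def bound(scale):
    lam = scale * lam0              # more activity / longer scan
    F = M.T @ np.diag(1.0 / (M @ lam)) @ M
    return g @ np.linalg.solve(F, g)

for c in [1, 2, 4]:
    print(f"scale={c}  bound={bound(c):.3f}  bound/scale={bound(c)/c:.3f}")
```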

Back to the problems. On convergence: regularized images generally converge faster. Moreover, convergence need not be absolute, only sufficient for the combination of regularization and smoothing to dominate the reconstructed image, which is a much less stringent condition than convergence of bare MLEM. There is no escape from the second comment; in science one adjusts the experiment to collect data sufficiently free from background for the implications to be valid. In real life, the task of regularization and smoothing is to smooth the inconsistencies down to a bearable level. The third comment, however, is completely answered by using regularized, smoothed images.

In total, for reasonably controlled conditions and a reasonable number of iterations, I expect $\sigma^2\propto N$ to be a valid assumption.
