Jeff's image reconstruction opinions

as of Wed Jun 28 09:48:06 EDT 1995

The links in this document are out of date. I leave it here only for sentimental historical reasons. See the latest version for good links.

There are as many different methods for image reconstruction as there are researchers in the field. More, in fact, because most researchers (myself included) have proposed multiple methods! A problem with archival papers is that by the time they appear, the author has already found a better method or variation. Also, because of the strong opinions (biases?) held by many researchers (myself included), there is often little opportunity for speculation and unsubstantiated opinion in refereed papers.

This document is a collection of my current opinions on image reconstruction methods. These are subject to change, of course, but hopefully they will be more current than the papers appearing in the literature. Many readers will disagree with these opinions. Since the WWW offers the ultimate opportunity for "equal time," I encourage dissenters to post their opinions on their own WWW sites, and I will gladly install pointers here to those dissenting opinions.

How to regularize?

Everyone knows that some form of regularization is required for image reconstruction, and the predominant choices seem to be stopping rules, sieves or post-smoothing, and penalty functions.

A problem with stopping rules is that different parts of the image converge at different rates, so you get object-dependent resolution and noise characteristics that are difficult to predict. (Barrett, Wilson, and Tsui (Phys. Med. Biol., 39:833-846, 1994) have done an impressive job of analyzing ML-EM, but to my knowledge that analysis did not lead to a solution for the object-dependent resolution properties.)

Sieves are often implemented as post-filtering, and are often described as a formal regularization method by citing Snyder et al. (IEEE-T-MI 6(3):228-238, 1987), but the equivalence between post-filtering and sieves hinges on a commutability condition (eqn. (12) of that paper) which, to my knowledge, has only been shown to hold for the spatially-invariant Gaussian blur case used in that paper. The real practical problem with post-filtering is that you first have to run your (unregularized) algorithm until "convergence" (which may take hundreds of iterations, unless you use one of the many acceleration methods whose convergence properties are mostly uncertain) and then post-filter; otherwise some parts of the image may not have converged (e.g. Miller et al., JNM 33(9):1678-1684, 1992).

In contrast, there are several "fast" algorithms for penalized-likelihood objectives with proven global convergence. Also, the Washington University group that originally developed the method of sieves has found that MAP (or equivalently penalized likelihood) methods outperform the method of sieves (Butler et al., IEEE-T-MI 12(1):84-89, 1993). So, as you have guessed by now, I am a firm believer in the penalized-likelihood approach, and everything below applies only to algorithms for that case.
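To be concrete about what I mean by a penalized-likelihood objective, here is a minimal sketch in Python/NumPy notation. It is only an illustration: the dense system matrix A, data y, background means r, smoothing parameter beta, and the simple first-difference roughness penalty are all placeholders, not any particular system model or any of the penalties discussed below.

    import numpy as np

    def penalized_likelihood(x, A, y, r, beta):
        """Poisson log-likelihood minus a roughness penalty (a sketch).

        x    : current image estimate (1D vector of pixel values, nonnegative)
        A    : system matrix (dense here, purely for illustration)
        y    : measured counts
        r    : known mean background contribution (randoms, scatter, ...)
        beta : regularization parameter controlling the amount of smoothing
        """
        ybar = A @ x + r                              # predicted measurement means
        loglike = np.sum(y * np.log(ybar) - ybar)     # Poisson log-likelihood, constants dropped
        penalty = np.sum(np.diff(x) ** 2)             # simple first-difference roughness penalty
        return loglike - beta * penalty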

Estimators vs Algorithms

Many emission tomography papers discuss "image reconstruction algorithms" as if the algorithm is the estimator. This is partially true if you use stopping rules, since then the specific characteristics of the iterations determine the image properties perhaps more than the objective function does, because the user rarely iterates to convergence. With penalized objective functions, the estimator (as specified by the objective) determines the image properties, and the algorithm is only a nuisance necessary for maximizing that objective. The choice of the objective and the choice of the algorithm should ideally be kept separate: the objective should be chosen based on statistical principles, and then the algorithm should be chosen based on how fast it maximizes the chosen objective. Historically these two choices have frequently been blurred. In the following I will attempt to keep them separate.
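To illustrate that separation in code (again just a sketch, with a generic off-the-shelf bound-constrained quasi-Newton solver standing in for any convergent maximization algorithm): the estimator is determined by the objective you hand to the solver, and swapping in a different convergent algorithm should change only how fast you get the answer, not the answer itself.

    import numpy as np
    from scipy.optimize import minimize

    def estimate(objective, x_init):
        """The estimator is defined by the objective alone; the algorithm
        (here a generic bound-constrained quasi-Newton method) is just one
        way of maximizing it."""
        bounds = [(0.0, None)] * x_init.size          # nonnegativity constraint
        result = minimize(lambda x: -objective(x), x_init,
                          method="L-BFGS-B", bounds=bounds)
        return result.x

    # Hypothetical usage: two convergent algorithms applied to the same objective
    # should give (essentially) the same image.
    # x_hat = estimate(lambda x: penalized_likelihood(x, A, y, r, beta), x_fbp)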

Transmission Tomography

At low count rates, both FBP and penalized weighted least-squares (PWLS) objectives lead to systematic biases due to the nonlinearity of the logarithm (Fessler, in press). Therefore, I recommend using penalized-likelihood objectives rather than PWLS objectives (or FBP) for noisy transmission measurements. (Penalized likelihood avoids the logarithm by building the exponential into the statistical model.) Penalized likelihood also allows you to properly account for additive background effects, like cross-talk (SPECT) and randoms (PET), in the model.
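To show where the logarithm enters (and where it is avoided), here is a sketch of the two approaches for a transmission scan with blank-scan counts b, measured counts y, and background means r; these symbol names and the crude variance estimate are mine, chosen for illustration only.

    import numpy as np

    def transmission_loglike(mu, A, y, b, r):
        """Transmission Poisson log-likelihood for the model
        y_i ~ Poisson( b_i * exp(-[A mu]_i) + r_i ).
        The exponential is built into the model, so no logarithm
        of the noisy data is ever taken."""
        ybar = b * np.exp(-(A @ mu)) + r
        return np.sum(y * np.log(ybar) - ybar)        # constants dropped

    def log_transform_for_pwls(y, b, r):
        """The conventional alternative: take the log of the precorrected data
        to get line-integral estimates, plus approximate inverse-variance weights.
        Since E[log(.)] != log(E[.]), this is biased at low counts, and it fails
        outright wherever y_i <= r_i."""
        line_integrals = np.log(b / (y - r))
        weights = (y - r) ** 2 / y                    # crude delta-method approximation
        return line_integrals, weights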

What algorithm do I recommend for maximizing penalized-likelihood objectives in transmission tomography? First, do not use the transmission ML-EM algorithm of Lange and Carson (1984, JCAT)!! Ken Lange (who now works here at the Univ. of Michigan) and I have two algorithms in press that converge much faster than transmission ML-EM. Those algorithms are parallelizable, but if you have an ordinary non-parallel workstation, then a coordinate-ascent type algorithm converges faster than Lange's two new algorithms. (See also the nice paper by Bouman and Sauer, to appear in IEEE T-IP.) So for the moment I recommend the coordinate-ascent algorithm applied to a penalized-likelihood objective, using your favorite penalty. However, I have submitted a faster-converging algorithm to NSS-MIC, and once I hear it is accepted (knock on wood) I will recommend it here instead. So stay tuned!
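For reference, here is the flavor of coordinate ascent I have in mind. This is a generic sketch (not the exact algorithm in any of the papers above), where grad_j and curv_j are user-supplied functions returning the first derivative and a positive curvature of the penalized objective with respect to one pixel.

    import numpy as np

    def coordinate_ascent(x, grad_j, curv_j, n_iters=10):
        """Generic coordinate ascent with nonnegativity: update one pixel at a
        time using a 1D Newton step on the penalized objective, clipping at zero."""
        x = x.astype(float).copy()
        for _ in range(n_iters):
            for j in range(x.size):
                step = grad_j(x, j) / curv_j(x, j)    # 1D Newton step for pixel j
                x[j] = max(0.0, x[j] + step)          # enforce nonnegativity
        return x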

Emission Tomography

If you have truly Poisson data, then again I recommend a penalized likelihood objective over a penalized weighted least squares (PWLS) objective, since data-weighted least squares again leads to biases at low count rates (Fessler, in press). But if you have emission data that has been unavoidably "corrupted" by correction methods, interpolations, etc., and is no longer Poisson, then you may as well use PWLS, since the algorithms typically converge a little faster. I use PWLS for PET data that has been precorrected for random coincidences, and penalized likelihood otherwise.
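For precorrected data, the PWLS objective I have in mind looks roughly like the sketch below. The choice of weights here (the data themselves as a crude variance estimate) is only an illustration; in practice the variances should be estimated more carefully.

    import numpy as np

    def pwls_objective(x, A, y, beta, var_est=None):
        """Penalized weighted least-squares objective (written so that larger
        is better, to match the maximization viewpoint above)."""
        if var_est is None:
            var_est = np.maximum(y, 1.0)              # crude data-based variance estimate
        weights = 1.0 / var_est                       # approximate inverse variances
        resid = y - A @ x
        penalty = np.sum(np.diff(x) ** 2)             # same illustrative roughness penalty
        return -0.5 * np.sum(weights * resid ** 2) - beta * penalty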

Emission Penalized Likelihood

Currently, I recommend the SAGE algorithm for maximizing emission penalized-likelihood objectives, specifically the PML-SAGE-3 version, which supersedes the earlier SAGE algorithms. Note that the way we wrote the T-SP SAGE paper, the NSS-MIC paper, and the technical report suggested that you must have some "background events" (randoms, scatter, etc.) for SAGE to work well. That was true of PML-SAGE-1, but the current favorite version, PML-SAGE-3, works well with or without background events, although the convergence rate would probably be a bit faster when a background is included. This makes it a "win-win" situation, since you can make the statistical model more accurate and get faster convergence too! Preliminary results suggest that SAGE converges slightly faster than the coordinate-wise Newton-Raphson (CNR) method of Bouman and Sauer (IEEE T-IP, to appear), but the differences were small. SAGE has the advantage of guaranteed monotonicity, so I will continue to prefer it until someone finds a case where CNR converges faster.
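I will not try to reproduce SAGE itself here (see the papers for the real algorithm), but the monotonicity property is easy to state in code: wrap any iterative update in a check that the penalized objective never decreases. A monotone algorithm like SAGE should never trip the assertion in this sketch; an algorithm with uncertain convergence properties might.

    def run_with_monotonicity_check(x0, update, objective, n_iters=20, tol=1e-9):
        """Run a generic iterative algorithm, verifying that the objective is
        nondecreasing.  `update` maps the current image to the next iterate;
        `objective` is the penalized likelihood being maximized."""
        x = x0
        phi_old = objective(x)
        for _ in range(n_iters):
            x = update(x)
            phi_new = objective(x)
            assert phi_new >= phi_old - tol, "objective decreased: not monotone"
            phi_old = phi_new
        return x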

Penalized Weighted Least Squares

If you decide you would rather use the PWLS objective (for reasons of simplicity, non-Poisson data, or possibly algorithm speed), then I again recommend a coordinate-ascent algorithm. A nice thing about coordinate ascent is that not only does it converge very fast (if you give it a reasonable starting image such as an FBP image, not a uniform image; see the frequency-domain analyses of Sauer and Bouman (IEEE T-SP, 41(2):534-548, 1993)), it also handles nonnegativity trivially. But if nonnegativity isn't an issue, then the conjugate gradient algorithm is very attractive: if you precondition it well, it appears to converge at roughly the same rate as (properly under-relaxed) coordinate ascent, and it has the advantage of being more parallelizable. See the nice work of Delaney and Bresler (ICASSP 95) and the recent paper by Mumcuoglu and Leahy in IEEE T-MI, who also address nonnegativity.
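Since the PWLS objective is quadratic, the preconditioned conjugate gradient idea is easy to sketch. The version below uses a plain diagonal (Jacobi) preconditioner and ignores nonnegativity; the preconditioners in the work cited above are more sophisticated, so treat this only as a statement of the basic recipe.

    import numpy as np

    def pcg_pwls(H, b, x0, n_iters=20):
        """Preconditioned conjugate gradient for minimizing 0.5 x'Hx - b'x,
        where (for PWLS) H = A'WA + beta*R and b = A'Wy."""
        x = x0.astype(float).copy()
        d = np.diag(H)                                # diagonal (Jacobi) preconditioner
        r = b - H @ x                                 # residual = negative gradient
        z = r / d
        p = z.copy()
        rz = r @ z
        for _ in range(n_iters):
            Hp = H @ p
            alpha = rz / (p @ Hp)                     # exact line search for a quadratic
            x += alpha * p
            r -= alpha * Hp
            z = r / d
            rz_new = r @ z
            p = z + (rz_new / rz) * p                 # new conjugate search direction
            rz = rz_new
        return x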

Penalty Functions

Giving any opinions about this is an optimal way to alienate the greatest number of people in the shortest amount of time. So here goes.

First: convex or non-convex? Bouman and Sauer (IEEE-T-IP 3(2):162-177, 1994) showed that non-convex penalties lead to estimates that are discontinuous functions of the measurements. This means that changing the measurements by even a single photon count could change the image significantly. To me this seems unacceptable for medical imaging, although one has to admit that the reconstructed images (of piecewise-constant phantoms, usually) look very nice. Time may tell, but I am sticking to convex penalties for now.

Second: quadratic or non-quadratic? Quadratic penalties yield Butterworth-filter-like resolution properties, so the images have reduced noise relative to FBP but comparable resolution characteristics. I think this makes quadratic penalties well suited to the first couple of rounds of clinical studies comparing FBP with statistical methods. The non-quadratic penalties have a noise texture that looks very different to me, and it may take more time for clinicians to adapt to it. But for quantitative (non-visual) studies, if the object under study really is piecewise constant, or maybe piecewise smooth, then there is no doubt that non-quadratic penalties give a better bias/variance tradeoff. The unanswered questions are whether brains and tumors and such are piecewise constant, and if not, how the (different) biases of quadratic and non-quadratic penalties would compare.
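To be concrete about the quadratic versus (convex) non-quadratic distinction, here is a sketch of two potential functions applied to neighboring-pixel differences. The Huber function is just one illustrative convex, edge-preserving choice, and the threshold delta is a user-chosen parameter; neither is a specific recommendation.

    import numpy as np

    def quadratic_potential(t):
        """Quadratic potential: penalizes large differences heavily, giving
        the Butterworth-filter-like smoothing mentioned above."""
        return 0.5 * t ** 2

    def huber_potential(t, delta=1.0):
        """Convex but non-quadratic (Huber) potential: quadratic for small
        differences, linear for large ones, so edges are smoothed less."""
        at = np.abs(t)
        return np.where(at <= delta, 0.5 * t ** 2, delta * (at - 0.5 * delta))

    def roughness_penalty(x, potential):
        """Apply a potential function to the first differences of a 1D image."""
        return np.sum(potential(np.diff(x)))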

For all penalties, there is an interaction between the Poisson measurement covariance and the penalty that leads to nonuniform spatial resolution. I have proposed a modified penalty that makes the spatial resolution nearly uniform for nearly spatially-invariant tomographs (e.g. PET), and I now use it routinely. But I have not investigated spatially-variant systems like SPECT enough to comment conclusively. My intuition is that the proposed modified penalty will help somewhat to improve the resolution uniformity, but there are probably better modifications. I have more work to do there.

If you would rather have uniform noise than uniform resolution, then in principle you could use Sterling-type (co)variance approximations to predict the variance as a function of the smoothing parameter, and then adjust the parameter spatially to achieve your desired noise level. But that would actually be pretty complicated, so finding a simpler way is an open problem.

Differing opinions?

Email "fessler AT umich DOT edu"