Jeff Fessler's image reconstruction opinions

as of Tue Oct 13 11:57:25 EDT 1998

There are as many different methods for image reconstruction as there are researchers in the field. More in fact, because many researchers (myself included) have proposed multiple methods! A problem with archival papers is that by the time they appear, the authors have already found better methods or variations. Also, because of the strong opinions (biases?) held by many researchers (myself included), there is often little opportunity for speculation and unsubstantiated opinion in refereed papers. This document is a collection of my current opinions on image reconstruction methods. These are subject to change, of course, but hopefully will be more current and more frank than what appears in my papers in the literature.

Some readers will disagree with these opinions. Since the WWW offers the ultimate opportunity for "equal time," I encourage dissenters to post their opinions in their own WWW sites, and I will gladly install pointers here to the dissenting opinions.

How to regularize?

Everyone knows that some form of regularization is required for image reconstruction, and the predominant choices seem to be stopping rules, sieves or post-smoothing, and penalty functions (aka priors). A serious problem with stopping rules is that different parts of the image converge at different rates, so you get object-dependent resolution and noise characteristics that are difficult to predict. (Barrett, Wilson, and Tsui (Phys. Med. Biol., 39:833-846, 1994) did an impressive job of analyzing ML-EM. Nevertheless, to my knowledge this analysis did not lead to a solution for the object-dependent resolution properties.) Sieves are often implemented as post-filtering, and often described as a formal regularization method by citing Snyder et. al. (IEEE-T-MI 6(3):228-238, 1987), but the equivalence between post-filtering and sieves hinges on a commutability condition (eqn. (12) of the above paper) which to my knowledge only has been shown to hold in the spatially-invariant Gaussian blur case used in that paper. But the real practical problem with post-filtering is that you first have to run your (unregularized) algorithm until "convergence," (which may be hundreds of iterations, unless you use one of the many acceleration methods that mostly have uncertain convergence properties) and then post-filter, otherwise some parts of the image may not have converged (e.g. Miller et. al., JNM 33(9):1678-1684, 1992). In contrast, there are now several "fast" algorithms for maximizing penalized-likelihood objective functions with proven global convergence. Also, the Washington University group that originally developed the method of sieves has found that MAP (or equivalently penalized likelihood) methods outperform the method of sieves (Butler et. al., IEEE-T-MI 12(1):84-89, 1993). So as you have guessed by now, I am a firm believer in the penalized-likelihood approach, and everything below applies only to algorithms for that case.

Estimators vs Algorithms

Many emission tomography papers discuss "image reconstruction algorithms" as if the algorithm is the estimator . This is partially true if you use stopping rules, since then the specific characteristics of the iterations determine the image properties perhaps more than the objective function does, since the user rarely iterates to convergence. With penalized objective functions, the estimator (as specified by the objective) determines the image properties, and the algorithm is merely a nuisance that is necessary for finding the maximizer that objective. As others have argued before me, the choice of the objective and the algorithm ideally should be kept separate, i.e., the objective should be chosen based on statistical principles, and then the algorithm should be chosen based on how fast it maximizes the chosen objective. All too frequently these two distinct issues have been blurred together. In the following I will attempt to keep the two choices separate.

Transmission Tomography

When transmission scans have low counts, or when they are significantly contaminated by "background" counts (such as random coincidences in PET or emission crosstalk in SPECT), both the FBP reconstruction method and penalized weighted least squares (PWLS) methods lead to attenuation maps with systematic biases due to the nonlinearity of the logarithm (Fessler, IEEE-T-IP, 4(10):1439-50, 1995). Therefore, I strongly recommend using penalized-likelihood objectives rather than PWLS objectives (or FBP) for noisy transmission measurements. (Penalized-likelihood methods avoid the logarithm by building the exponential into the statistical model.) Penalized-likelihood methods also allow you to properly account for additive background effects like cross-talk (SPECT) and randoms (PET) into the model.

What algorithm do I recommend for maximimizing penalized-likelihood objectives in transmission tomography? First, do not use the transmission ML-EM algorithm of Lange and Carson (1984, JCAT)!! That algorithm is very slow to converge, and requires inordinate computation per iteration (and is also not guaranteed to be monotonic if the approximate M-step is used). I have published several papers describing faster converging algorithms for this problem. All of these papers have been made obsolete by our most recent method, based on Parabaloidal Surrogates. This method was presented by Hakan Erdogan at the 1998 ICIP conference, and is currently in review for IEEE T-MI. Email me if you would like a preprint. This latest method is guaranteed to be monotonic, even for the nonconvex case, and the CPU time per iteration is comparable to that of PWLS methods. So there is no longer any computational advantage of using PWLS over penalized likelihood for transmission tomography. (The algorithm as also available in ASPIRE.)

Emission Tomography

If you have truly Poisson data (which is typically the case in SPECT, and sometimes the case in PET), then again I recommend a penalized-likelihood objective rather than the penalized data-weighted least squares (PWLS) objective, since data-weighted least squares again leads to biases at low count rates (Fessler, IEEE-T-IP, 5(3):493-506, 1995). If your PET data has been precorrected for random coincidences, then I recommend the shifted Poisson method (Yavuz and Fessler, Med. Im. Anal., to appear). If you have emission data that has been unavoidably "corrupted" by other correction methods, interpolations, etc. (such as Fourier rebinning - see papers by Comtat and Kinahan), and is no longer Poisson, then you may as well use PWLS, since the algorithms for PWLS typically converge a little faster than the algorithms for penalized likelihood.

Emission Penalized Likelihood

Currently, I recommend the SAGE algorithm (Fessler and Hero, IEEE-T-IP 4(10):1417-29, 1995) for maximizing emission penalized-likelihood objectives, specifically the PML-SAGE-3 version of the algorithm, which supersedes the earlier SAGE algorithms (Fessler and Hero, IEEE-T-SP 42(10):2664-77, 1994). However, I will be presenting a Parabaloidal Surrogates algorithm at NSS/MIC 1998 that is comparable or slightly faster than SAGE, but is easier to implement, so you may prefer that method. Email me for preprints.

Penalized Weighted Least Squares

If you decide you would rather use the PWLS objective (for reasons of simplicity, non-Poisson data, or possibly algorithm speed), then I again recommend a coordinate ascent algorithm. A nice thing about coordinate ascent is that not only does it converge very fast (if you give it a reasonable starting image like an FBP image, not a uniform image, see the nice frequency-domain analyses of Sauer and Bouman (IEEE T-SP, 41(2):534-548, 1993)), it also handles nonnegativity trivially. But if nonnegativity is not an issue, then the conjugate gradient algorithm is very attractive, and if you precondition it well, then it appears to converge at roughly the same rate as (properly under-relaxed) coordinate ascent, and has the advantage of being more parallelizable. See the nice work of Delaney and Bresler (ICASSP 95) and the nice recent paper by Mumcuoglu and Leahy in IEEE T-MI, who also address nonnegativity.

Penalty Functions

Giving any opinions about this is an optimal way to alienate the greatest number of people in the shortest amount of time. So here goes. First decision: convex or non-convex? Bouman and Sauer (IEEE-T-IP 3(2):162-177, 1994) showed that non-convex penalties lead to estimates that are discontinuous functions of the measurements. This means that it is possible that changing the measurements by even just one photon count could change the image significantly if you use a nonconvex penalty. To me this seems unacceptable for medical imaging, although one has to admit that the reconstructed images (of piecewise constant phantoms, usually) look very nice. Time may tell otherwise, but I am mostly sticking to convex penalties for now. Second decision: quadratic or non-quadratic? Quadratic penalties (with 1st or 2nd-order neighborhoods) yield Butterworth-filter-like resolution properties, so the images have reduced noise relative to FBP, but comparable resolution characteristics. I think this makes quadratic penalties well suited to the first couple rounds of clinical studies comparing FBP with statistical methods. The non-quadratic penalties have noise texture that to me looks very different, and may take more time for the clinicians to adapt to. But for quantitative studies (non-visual), if the object under study is really piecewise constant or maybe piecewise smooth, then there is no doubt that non-quadratic penalties give a better bias/variance tradeoff. The questions unanswered are whether brains and tumors and such are piecewise constant, and if not, how the (different) biases of quadratic and non-quadratic penalties would compare.

For all penalties, there is an interaction between the Poisson measurement covariance and the penalty that leads to nonuniform spatial resolution. I have proposed a modified penalty makes the spatial resolution nearly uniform for nearly spatially-invariant tomographs (i.e. 2D PET), and I now use it routinely. But I have not investigated spatially-variant systems like SPECT enough to comment conclusively. My intuition is that the proposed modified penalty will somewhat help to improve the resolution uniformity, but there are probably better modifications. I have more work to do there. Web Stayman and I have recently developed a new penalty function that leads to more uniform spatial resolution, presented at ICIP 98. Email if you would like preprints of this paper.

If you would rather have uniform noise than uniform resolution, then in principle you could use Serfling-type (co)variance approximations to predict the variance as a function of the smoothing parameter, and then adjust the parameter spatially to achieve your desired noise level. But that would actually be pretty complicated, so finding a simpler way is an open problem.

Differing opinions?

Email "fessler AT umich DOT edu"
Back to Fessler home page