Jeff Fessler's image reconstruction opinions
There are as many different methods for image reconstruction
as there are researchers in the field.
More in fact, because many researchers
have proposed multiple methods!
A problem with archival papers
is that by the time they appear,
often the authors have already found better methods or variations.
Also, because of the strong opinions (biases?)
held by many researchers
there is often little opportunity for speculation
and unsubstantiated opinion in refereed papers.
This document is a collection of my current opinions
on image reconstruction methods.
These are subject to change, of course,
but hopefully will be more current and more frank
than what appears in my papers in the literature.
Some readers will disagree with these opinions.
Since the web offers the ultimate opportunity
for "equal time," I encourage dissenters to post
their opinions in their own web sites,
and I will gladly install pointers here to the dissenting opinions.
How to regularize?
Everyone knows that some form of regularization is required
for image reconstruction,
and the predominant choices seem to be
sieves or post-smoothing,
and penalty functions (aka priors).
A serious problem with stopping rules is that different parts
of the image converge at different rates,
so you get object-dependent resolution and noise characteristics
that are difficult to predict.
(Barrett, Wilson, and Tsui
(Phys. Med. Biol., 39:833-846, 1994)
did an impressive job of analyzing ML-EM.
Nevertheless, to my knowledge this analysis did not lead
to a solution for the object-dependent resolution properties.)
Sieves are often implemented as post-filtering,
and often described as a formal regularization method
Snyder et. al. (IEEE-T-MI 6(3):228-238, 1987),
but the equivalence between post-filtering and sieves
hinges on a commutability condition
(eqn. (12) of the above paper)
which to my knowledge only has been shown to hold
in the spatially-invariant Gaussian blur case used in that paper.
But the real practical problem with post-filtering
is that you first have to run
your (unregularized) algorithm until "convergence,"
(which may be hundreds of iterations,
unless you use one of the many acceleration methods
that mostly have uncertain convergence properties)
and then post-filter,
otherwise some parts of the image
may not have converged
(e.g. Miller et. al., JNM 33(9):1678-1684, 1992).
In contrast, there are now several "fast" algorithms
for maximizing penalized-likelihood objective functions
with proven global convergence.
the Washington University group
that originally developed the method of sieves
has found that MAP (or equivalently penalized likelihood) methods
outperform the method of sieves
(Butler et. al., IEEE-T-MI 12(1):84-89, 1993).
So as you have guessed by now,
I am a firm believer in the penalized-likelihood approach,
and everything below applies only to algorithms
for that case.
Estimators vs Algorithms
Many emission tomography papers discuss
"image reconstruction algorithms"
as if the algorithm is the
This is partially true
if you use stopping rules,
since then the specific characteristics
of the iterations
determine the image properties
perhaps more than the objective function does,
since the user rarely iterates to convergence.
With penalized objective functions,
(as specified by the objective)
determines the image properties,
and the algorithm
is merely a nuisance that is necessary
for finding the maximizer that objective.
As others have argued before me,
the choice of the objective and the algorithm
ideally should be kept separate,
i.e., the objective should be chosen
based on statistical principles,
and then the algorithm should be chosen
based on how fast it maximizes the chosen objective.
All too frequently
these two distinct issues
have been blurred together.
In the following I will attempt
to keep the two choices separate.
When transmission scans have low counts,
or when they are significantly contaminated
by "background" counts (such as random coincidences in PET
or emission crosstalk in SPECT),
both the FBP reconstruction method
and penalized weighted least squares (PWLS) methods
lead to attenuation maps with systematic biases
due to the nonlinearity of the logarithm
(Fessler, IEEE-T-IP, 4(10):1439-50, 1995).
I strongly recommend using penalized-likelihood objectives
rather than PWLS objectives
(or FBP) for noisy transmission measurements.
(Penalized-likelihood methods avoid the logarithm
by building the exponential into the statistical model.)
Penalized-likelihood methods also allow you to properly
account for additive background effects like
cross-talk (SPECT) and randoms (PET) into the model.
What algorithm do I recommend
for maximimizing penalized-likelihood objectives
in transmission tomography?
do not use the transmission ML-EM algorithm
of Lange and Carson (1984, JCAT)!!
That algorithm is very slow to converge,
and requires inordinate computation per iteration
(and is also not guaranteed to be monotonic
if the approximate M-step is used).
I have published several papers describing faster converging
algorithms for this problem.
All of these papers have been made obsolete
by our most recent method,
based on Parabaloidal Surrogates.
This method was presented by Hakan Erdogan
at the 1998 ICIP conference,
and is currently in review for IEEE T-MI.
Email me if you would like a preprint.
This latest method
is guaranteed to be monotonic,
even for the nonconvex case,
and the CPU time per iteration
is comparable to that of PWLS methods.
So there is no longer any computational advantage
of using PWLS over penalized likelihood
for transmission tomography.
(The algorithm as also available in
If you have truly Poisson data
(which is typically the case in SPECT,
and sometimes the case in PET),
then again I recommend a penalized-likelihood objective
rather than the
penalized data-weighted least squares (PWLS) objective,
since data-weighted least squares
again leads to
biases at low count rates
(Fessler, IEEE-T-IP, 5(3):493-506, 1995).
If your PET data has been precorrected for random coincidences,
then I recommend the shifted Poisson method
(Yavuz and Fessler, Med. Im. Anal., to appear).
If you have emission data
that has been unavoidably "corrupted"
by other correction methods, interpolations, etc.
(such as Fourier rebinning - see papers by Comtat and Kinahan),
and is no longer Poisson,
then you may as well use PWLS,
since the algorithms for PWLS
typically converge a little faster than the algorithms for penalized likelihood.
Emission Penalized Likelihood
I recommend the SAGE algorithm
(Fessler and Hero, IEEE-T-IP 4(10):1417-29, 1995)
for maximizing emission penalized-likelihood objectives,
PML-SAGE-3 version of the algorithm,
which supersedes the
earlier SAGE algorithms
(Fessler and Hero, IEEE-T-SP 42(10):2664-77, 1994).
I will be presenting a Parabaloidal Surrogates algorithm
at NSS/MIC 1998
that is comparable or slightly faster than SAGE,
but is easier to implement,
so you may prefer that method.
Email me for preprints.
Penalized Weighted Least Squares
If you decide you would rather use the PWLS objective
(for reasons of simplicity, non-Poisson data, or possibly algorithm speed),
then I again recommend
a coordinate ascent algorithm.
A nice thing about coordinate ascent is
that not only does it converge very fast
(if you give it a reasonable starting image
like an FBP image,
not a uniform image,
see the nice frequency-domain analyses
of Sauer and Bouman
(IEEE T-SP, 41(2):534-548, 1993)),
it also handles nonnegativity trivially.
But if nonnegativity is not an issue,
then the conjugate gradient algorithm
is very attractive,
and if you
then it appears to converge
at roughly the same rate
as (properly under-relaxed) coordinate ascent,
and has the advantage of being more parallelizable.
See the nice work of Delaney and Bresler (ICASSP 95)
and the nice recent paper by Mumcuoglu and Leahy in IEEE T-MI,
who also address nonnegativity.
Giving any opinions about this
is an optimal way to alienate
the greatest number of people
in the shortest amount of time.
So here goes.
First decision: convex or non-convex?
Bouman and Sauer
(IEEE-T-IP 3(2):162-177, 1994)
showed that non-convex penalties
lead to estimates that are
of the measurements.
This means that it is possible
that changing the measurements
by even just one photon count
could change the image significantly
if you use a nonconvex penalty.
To me this seems unacceptable
for medical imaging,
although one has to admit
that the reconstructed images
(of piecewise constant phantoms, usually)
look very nice.
Time may tell otherwise,
but I am mostly sticking to convex penalties for now.
Second decision: quadratic or non-quadratic?
Quadratic penalties (with 1st or 2nd-order neighborhoods)
so the images have reduced noise relative to FBP,
but comparable resolution characteristics.
I think this makes quadratic penalties
well suited to the first couple rounds
of clinical studies comparing FBP with statistical methods.
The non-quadratic penalties
have noise texture that to me looks very different,
and may take more time for the clinicians to adapt to.
But for quantitative studies (non-visual),
if the object under study is really piecewise constant
or maybe piecewise smooth,
then there is no doubt that non-quadratic penalties
give a better bias/variance tradeoff.
The questions unanswered are whether brains
and tumors and such are piecewise constant,
and if not, how the (different) biases
of quadratic and non-quadratic penalties
For all penalties,
there is an interaction between the Poisson measurement covariance
and the penalty
that leads to
nonuniform spatial resolution.
I have proposed a
makes the spatial resolution nearly uniform
for nearly spatially-invariant tomographs
(i.e. 2D PET),
and I now use it routinely.
But I have not investigated
spatially-variant systems like SPECT
enough to comment conclusively.
My intuition is that the proposed modified penalty
will somewhat help to improve the resolution uniformity,
but there are probably better modifications.
I have more work to do there.
Web Stayman and I have recently developed a new penalty function
that leads to more uniform spatial resolution,
presented at ICIP 98.
Email if you would like preprints of this paper.
If you would rather have uniform noise than uniform resolution,
then in principle you could use Sterling-type
to predict the variance as a function of the smoothing parameter,
and then adjust the parameter spatially
to achieve your desired noise level.
But that would actually be pretty complicated,
so finding a simpler way is an open problem.
Email "fessler AT umich DOT edu"
Back to Fessler home page