Jeff's image reconstruction opinions
as of
Wed Jun 28 09:48:06 EDT 1995
The links in this document are out of date.
I leave it here only for sentimental historical reasons.
See the latest version for good links.
There are as many different methods for image reconstruction
as there are researchers in the field.
More in fact, because most researchers
(myself included)
have proposed multiple methods!
A problem with archival papers
is that by the time they appear,
the author has already found a better method or variation.
Also, because of the strong opinions (biases?)
held by many researchers
(myself included),
there is often little opportunity for speculation
and unsubstantiated opinion in refereed papers.
This document is a collection of my current opinions
on image reconstruction methods.
These are subject to change, of course,
but hopefully will be more current
than the papers appearing in the literature.
Many readers will disagree with these opinions.
Since the WWW offers the ultimate opportunity
for "equal time," I encourage dissenters to post
their opinions on their own WWW sites,
and I will gladly install pointers here to the dissenting opinions.
How to regularize?
Everyone knows that some form of regularization is required
for image reconstruction,
and the predominant choices seem to be
stopping rules,
sieves or post-smoothing,
and penalty functions.
A problem with stopping rules is that different parts
of the image converge at different rates,
so you get object-dependent resolution and noise characteristics
that are difficult to predict.
(Barrett, Wilson, and Tsui
(Phys. Med. Biol., 39:833-846, 1994)
have done an impressive job of analyzing ML-EM,
but to my knowledge that analysis did not lead
to a solution for the object-dependent resolution properties.)
Sieves are often implemented as post-filtering,
and often described as a formal regularization method
by citing
Snyder et al. (IEEE-T-MI 6(3):228-238, 1987),
but the equivalence between post-filtering and sieves
hinges on a commutability condition
(eqn. (12) of the above paper)
which to my knowledge has been shown to hold
only in the spatially-invariant Gaussian-blur case used in that paper.
But the real practical problem with post-filtering
is that you first have to run
your (unregularized) algorithm until "convergence"
(which may take hundreds of iterations,
unless you use one of the many acceleration methods
that mostly have uncertain convergence properties)
and then post-filter;
otherwise some parts of the image
may not have converged
(e.g. Miller et al., JNM 33(9):1678-1684, 1992).
In contrast, there are several "fast" algorithms
for penalized likelihood objectives
with proven global convergence.
Also,
the Washington University group
that originally developed the method of sieves
has found that MAP (or equivalently penalized likelihood) methods
outperform the method of sieves
(Butler et al., IEEE-T-MI 12(1):84-89, 1993).
So as you have guessed by now,
I am a firm believer in the penalized likelihood approach,
and everything below applies only to algorithms
for that case.
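To fix notation for what follows, the generic penalized-likelihood
(equivalently, MAP with a Gibbs prior) estimator I have in mind
has the form below; the symbols are just my shorthand here,
not the notation of any one paper
(y = measurements, x = image, L = log-likelihood,
R = roughness penalty, beta = regularization parameter):

    \hat{x} = \arg\max_{x \ge 0} \Phi(x),
    \qquad
    \Phi(x) = L(y; x) - \beta \, R(x)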
Estimators vs Algorithms
Many emission tomography papers discuss
"image reconstruction algorithms"
as if the algorithm were the
estimator .
This is partially true
if you use stopping rules,
since the user rarely iterates to convergence,
so the specific characteristics
of the iterations
determine the image properties
perhaps more than the objective function does.
With penalized objective functions,
the
estimator
(as specified by the objective)
determines the image properties,
and the algorithm
is only a nuisance necessary for maximizing that objective.
The choice of the objective and the algorithm
ideally should be kept separate,
i.e., the objective should be chosen
based on statistical principles,
and then the algorithm should be chosen
based on how fast it maximizes the chosen objective.
Historically, these two choices have frequently been blurred.
In the following I will attempt
to keep the two choices separate.
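Here is a minimal sketch of what I mean by that separation,
in throwaway Python/numpy for an emission-type Poisson model.
The function and variable names are purely illustrative
(they are not from any particular package);
the point is only the structure:
the objective defines the estimator,
and any monotonic maximizer can be plugged in underneath it.

    import numpy as np

    def penalized_loglik(x, y, A, background, penalty, beta):
        """Phi(x) = Poisson log-likelihood - beta * penalty(x).
        This function defines the *estimator*; it says nothing about
        how the maximization gets done.  (Assumes the predicted means
        A x + background are strictly positive.)"""
        ybar = A @ x + background
        return np.sum(y * np.log(ybar) - ybar) - beta * penalty(x)

    def reconstruct(y, A, background, penalty, beta, maximizer, x0):
        """The *algorithm* is whatever maximizer you plug in here."""
        return maximizer(lambda x: penalized_loglik(x, y, A, background,
                                                    penalty, beta), x0)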
Transmission Tomography
At low count rates,
both FBP and the penalized weighted least squares (PWLS) objective
lead to systematic biases
due to the nonlinearity of the logarithm
(Fessler, in press).
Therefore,
I recommend using penalized likelihood objectives
rather than PWLS objectives
(or FBP) for noisy transmission measurements.
(Penalized likelihood avoids the logarithm
by building the exponential into the statistical model.)
Penalized likelihood also allows you to properly incorporate
additive background effects like
cross-talk (SPECT) and randoms (PET) into the model.
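For concreteness, the transmission statistical model I have in mind
is the usual one (my notation: mu is the attenuation image,
b_i the blank-scan counts, r_i the background term):

    y_i \sim \mathrm{Poisson}\{ b_i e^{-[A\mu]_i} + r_i \}

Penalized likelihood works directly with this mean,
whereas FBP and PWLS first form line-integral estimates
of the form \log( b_i / (y_i - r_i) ),
and pushing the noisy data through that logarithm
is where the systematic bias at low counts comes from.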
What algorithm do I recommend
for maximizing penalized-likelihood objectives
in transmission tomography?
First,
do not use the transmission ML-EM algorithm
of Lange and Carson (1984, JCAT)!!
Ken Lange (who now works here at the Univ. of Michigan)
and I have
two algorithms in press
that converge much faster
than transmission ML-EM.
Those algorithms are parallelizable,
but if you have an ordinary non-parallel workstation,
then a
coordinate-ascent type algorithm
converges faster than
Lange's two new algorithms.
(See also the nice paper
by Bouman and Sauer
(To appear in IEEE T-IP)).
So for the moment I recommend the coordinate-ascent algorithm
applied to a penalized likelihood objective,
using your favorite penalty.
However,
I submitted a faster converging algorithm
to NSS-MIC,
and once I hear it is accepted
(knock on wood)
I will recommend it here instead.
So stay tuned!
Emission Tomography
If you have truly Poisson data,
then again I recommend a penalized likelihood objective
over a
penalized weighted least squares (PWLS)
objective,
since data-weighted least squares
again leads to
biases at low count rates (Fessler, in press).
But if you have emission data
that has been unavoidably "corrupted"
by correction methods, interpolations, etc.,
and is no longer Poisson,
then you may as well use PWLS,
since the algorithms typically converge a little faster.
I use PWLS for PET data that has been
precorrected for random coincidences,
and penalized likelihood otherwise.
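For reference, the emission model and the two objectives
I am comparing are (again in my notation,
with r_i denoting the randoms/scatter background):

    y_i \sim \mathrm{Poisson}\{ [Ax]_i + r_i \}

    \Phi_{\mathrm{PL}}(x) = \sum_i \left[ y_i \log([Ax]_i + r_i) - ([Ax]_i + r_i) \right] - \beta R(x)

    \Phi_{\mathrm{PWLS}}(x) = -\tfrac{1}{2} \sum_i w_i \left( \hat{y}_i - [Ax]_i \right)^2 - \beta R(x)

Here \hat{y}_i denotes the (possibly precorrected) measurements,
and the weights w_i are typically based on an estimate of their variance;
it is those data-based weights that produce the bias at low counts.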
Emission Penalized Likelihood
Currently,
I recommend the SAGE algorithm
for maximizing emission penalized likelihood objectives,
specifically the
PML-SAGE-3 version of the algorithm,
which supersedes the earlier SAGE algorithms.
Note that the way we wrote
the T-SP SAGE paper,
the NSS-MIC paper
and
the technical report
suggested that you have to have some "background events"
(randoms, scatter, etc.)
to make SAGE work well.
That was true of PML-SAGE-1,
but the current favorite version,
PML-SAGE-3,
works well with or without background events,
although the convergence rate would probably be
a bit faster when a background is included.
That makes it a "win-win" situation,
since you can make the statistical model more accurate
and get faster convergence too!
Preliminary
results
suggest that SAGE converges slightly
faster than the coordinate-wise Newton-Raphson (CNR)
method of Bouman and Sauer
(IEEE T-IP, to appear),
but the differences were small.
SAGE has the advantage of guaranteeing monotonicity,
so I will continue to prefer it
until someone finds a case where CNR converges faster.
Penalized Weighted Least Squares
If you decide you would rather use the PWLS objective
(for reasons of simplicity, non-Poisson data, or possibly algorithm speed),
then I again recommend
a coordinate ascent algorithm.
A nice thing about coordinate ascent is
that not only does it converge very fast
(if you give it a reasonable starting image
like an FBP image, not a uniform image;
see the nice frequency-domain analyses
of Sauer and Bouman
(IEEE T-SP, 41(2):534-548, 1993)),
but it also handles nonnegativity trivially.
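To show just how trivial the nonnegativity handling is,
here is a minimal sketch of one coordinate-ascent sweep
for the PWLS objective
\Phi(x) = -\tfrac{1}{2}(y - Ax)' W (y - Ax) - \tfrac{\beta}{2} x' R x,
in throwaway Python/numpy with dense matrices.
The names are illustrative only; a real implementation
would use a sparse system matrix, precomputed column norms,
and a neighborhood-based penalty rather than a dense R.

    import numpy as np

    def pwls_coordinate_ascent_sweep(x, y, A, W, R, beta):
        """One sweep of coordinate ascent for PWLS with x >= 0.
        Each pixel update is an exact 1-D maximization of the
        quadratic objective, and the nonnegativity constraint
        is enforced by a simple clamp."""
        r = y - A @ x                                  # current residual
        for j in range(x.size):
            aj = A[:, j]
            grad = aj @ (W @ r) - beta * (R @ x)[j]    # dPhi/dx_j
            curv = aj @ (W @ aj) + beta * R[j, j]      # -d2Phi/dx_j^2
            xj_new = max(0.0, x[j] + grad / curv)      # clamp = exact constrained max
            r -= (xj_new - x[j]) * aj                  # keep residual consistent
            x[j] = xj_new
        return x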
But if nonnegativity isn't an issue,
then the conjugate gradient algorithm is very attractive;
if you precondition it well,
it appears to converge at roughly the same rate
as (properly under-relaxed) coordinate ascent,
and it has the advantage of being more parallelizable.
See the nice work of Delaney and Bresler (ICASSP 95)
and the recent paper by Mumcuoglu and Leahy in IEEE T-MI,
who also address nonnegativity.
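Since preconditioned conjugate gradients came up,
here is a bare-bones sketch for the unconstrained PWLS case,
i.e. solving (A'WA + beta R) x = A'W y,
again in throwaway Python/numpy with a simple diagonal (Jacobi) preconditioner.
The names are illustrative; a serious implementation would never
form the Hessian densely and would use a smarter preconditioner.

    import numpy as np

    def pwls_pcg(y, A, W, R, beta, n_iter=20):
        """Diagonally preconditioned CG for the unconstrained PWLS estimate."""
        H = A.T @ W @ A + beta * R        # Hessian (dense here only for clarity)
        b = A.T @ (W @ y)
        m_inv = 1.0 / np.diag(H)          # Jacobi preconditioner
        x = np.zeros(b.size)
        r = b - H @ x
        z = m_inv * r
        p = z.copy()
        for _ in range(n_iter):
            Hp = H @ p
            alpha = (r @ z) / (p @ Hp)
            x += alpha * p
            r_new = r - alpha * Hp
            z_new = m_inv * r_new
            p = z_new + ((r_new @ z_new) / (r @ z)) * p
            r, z = r_new, z_new
        return x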
Penalty Functions
Giving any opinions about this
is an optimal way to alienate
the greatest number of people
in the shortest amount of time.
So here goes.
First: convex or non-convex?
Bouman and Sauer
(IEEE-T-IP 3(2):162-177, 1994)
showed that non-convex penalties
lead to estimates that are
discontinuous functions
of the measurements.
This means that it is possible
that changing the measurements
by even just one photon count
could change the image significantly.
To me this seems unacceptable
for medical imaging,
although one has to admit
that the reconstructed images
(of piecewise constant phantoms, usually)
look very nice.
Time may tell,
but I am sticking to convex penalties for now.
Second, quadratic or non-quadratic?
Quadratic penalties yield Butterworth-filter-like
resolution properties,
so the images have reduced noise relative to FBP,
but comparable resolution characteristics.
I think this makes quadratic penalties
well suited to the first couple of rounds
of clinical studies comparing FBP with statistical methods.
The non-quadratic penalties
have noise texture that to me looks very different,
and may take more time for the clinicians to adapt to.
But for quantitative studies (non-visual),
if the object under study is really piecewise constant
or maybe piecewise smooth,
then there is no doubt that non-quadratic penalties
give a better bias/variance tradeoff.
The unanswered questions are whether brains
and tumors and such really are piecewise constant,
and if not, how the (different) biases
of quadratic and non-quadratic penalties
would compare.
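To make the quadratic vs. non-quadratic distinction concrete,
here is a sketch of a first-order neighborhood roughness penalty
in throwaway Python/numpy,
where the potential function is either quadratic
or the convex Huber function.
Huber is just one common convex edge-preserving choice,
used here purely for illustration.

    import numpy as np

    def roughness_penalty(img, delta=None):
        """R(x) = sum over horizontal/vertical neighbor pairs of psi(difference).
        delta=None gives the quadratic potential psi(t) = t^2 / 2;
        otherwise psi is the convex Huber potential, quadratic for |t| <= delta
        and linear beyond, so large neighbor differences (edges) are
        penalized less severely."""
        def psi(t):
            if delta is None:
                return 0.5 * t**2
            a = np.abs(t)
            return np.where(a <= delta, 0.5 * t**2, delta * (a - 0.5 * delta))
        dh = np.diff(img, axis=1)    # horizontal neighbor differences
        dv = np.diff(img, axis=0)    # vertical neighbor differences
        return psi(dh).sum() + psi(dv).sum()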
For all penalties,
there is an interaction between the Poisson measurement covariance
and the penalty
that leads to
nonuniform spatial resolution.
I have proposed a
modified penalty
that makes the spatial resolution nearly uniform
for nearly spatially-invariant tomographs
(i.e. PET),
and I now use it routinely.
But I have not investigated
spatially-variant systems like SPECT
enough to comment conclusively.
My intuition is that the proposed modified penalty
will help somewhat to improve the resolution uniformity,
but there are probably better modifications.
I have more work to do there.
If you would rather have uniform noise than uniform resolution,
then in principle you could use Sterling-type
(co)variance approximations
to predict the variance as a function of the smoothing parameter,
and then adjust the parameter spatially
to achieve your desired noise level.
But that would actually be pretty complicated,
so finding a simpler way is an open problem.
Differing opinions?
Email "fessler AT umich DOT edu"