Over the years, I've gotten a number of questions about evaluating surface normal results. This document is aimed at helping answer questions like "are you doing X or Y?" and especially "why isn't my number in the same ballpark?". The overall criterion introduced in [1] is as follows:

for each **valid** pixel, compute the **angular error** with respect to the **ground-truth**, and then **aggregate** this error over the test set.

Different definitions of each bolded point change results. Here, I clarify each and give a list of common issues I have seen.

**Valid:** This is crucial: while interpolated depth is perhaps useful for RGBD understanding, surface normal predictions can only be evaluated at locations actually measured by the Kinect. In the NYU dataset, these locations are given by the `rawDepths` variable.
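In NumPy terms, masking to valid pixels is a one-liner. A minimal sketch (the variable names and the zero-means-unmeasured convention here are illustrative, not actual NYU loading code):

```python
import numpy as np

# Toy stand-ins: a raw depth map where 0 marks pixels the Kinect did not
# measure (illustrative convention), and a map of angular errors in degrees.
rawDepth = np.array([[0.0, 1.2],
                     [2.3, 0.0]])
E = np.array([[10.0, 20.0],
              [30.0, 40.0]])

valid = rawDepth > 0   # evaluate only where depth was actually measured
errors = E[valid]      # 1-D array of errors at valid pixels
print(errors)          # -> [20. 30.]
```

Only the flattened `errors` array should feed into the aggregate statistics; including interpolated pixels silently changes your numbers.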

**Angular error:** this is the angle between two __unit__ vectors, or
`acos(x'*y)`. This is what is used in all of our papers and in all others that I am aware of. Given two HxWx3 maps `N` and `P`, it is quickly computed in degrees
as

`acosd(min(1,max(-1,sum(N.*P,3))))`.

Here, `sum(N.*P,3)` computes a per-pixel map of dot products, and `acosd` converts these to degrees. The min/max clamp is necessary for numeric reasons: floating-point error can push the dot product of two unit vectors slightly outside [-1, 1].
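The same computation in NumPy, as a sketch (function name is mine; the clamp mirrors the min/max in the MATLAB one-liner):

```python
import numpy as np

def angular_error_deg(N, P):
    """Per-pixel angle in degrees between two HxWx3 unit-normal maps."""
    dot = np.sum(N * P, axis=2)
    dot = np.clip(dot, -1.0, 1.0)   # guard against floating-point drift
    return np.degrees(np.arccos(dot))

# Identical normals give 0 degrees; opposite normals give 180.
N = np.zeros((1, 2, 3)); N[..., 2] = 1.0   # both normals point along +Z
P = N.copy(); P[0, 1, 2] = -1.0            # flip the second normal
print(angular_error_deg(N, P))             # angles: 0 and 180 degrees
```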

**Ground-truth:**
Surface normals are __estimated__, not computed, so there is no single
ground-truth. My recent work has used the ground-truth from [2] since it is very
high-quality.

**Aggregate:**
We introduced six metrics in [1]. If `E` is the collection of all the errors
over the dataset, you compute them via `mean(E)`, `median(E)`,
`mean(E.^2).^0.5`, and `mean(E<t)` for a threshold `t` (with the three thresholds of 11.25, 22.5, and 30 degrees, this gives six metrics in total).
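A NumPy sketch of these aggregates (the function name is mine, and the three thresholds are the ones commonly used with these metrics; note the accumulation in double precision):

```python
import numpy as np

def summarize_errors(E):
    """Aggregate a flat array of angular errors (degrees) into summary metrics."""
    E = np.asarray(E, dtype=np.float64)   # accumulate in double precision
    stats = {
        'mean':   E.mean(),
        'median': np.median(E),
        'rmse':   np.sqrt(np.mean(E ** 2)),
    }
    for t in (11.25, 22.5, 30.0):         # commonly used thresholds (degrees)
        stats[f'%<{t}'] = np.mean(E < t)  # fraction of pixels under threshold
    return stats

print(summarize_errors([5.0, 15.0, 25.0, 35.0]))
```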

**Common Issues:**
There are a number of subtleties that often cause issues the first time around:

*Check your coordinate frame:* different results and ground truths come in different coordinate frames (i.e., different meanings of X/Y/Z). Check that the ground truth and predictions look similar. The easiest way is `uint8(N*128+128)`, or `uint8(255*(N+1)/2)` if you prefer typing more.

*Clamp your dot product:* sometimes, when computing x^T y, the result is slightly more than 1 or less than -1. This gives you complex results when you take the acos.

*Normalize your ground-truth and predictions:* the formula for the angular error __does not work__ if its inputs are not normalized to unit norm. It's always easier and safer to renormalize.

*Use doubles:* computing the `mean` over a large amount of data in single precision produces severe round-off error: once the running sum is large, each small addition is mostly rounded away. Here is a MATLAB try-at-home demonstration, which simulates surface normal results over 1K 640x480 images:

```
>> z = 30+randn(640*480*1000,1)*10;
>> [mean(z),mean(single(z))]

ans =

   29.9992   14.0030
```

You get the right answer, up to noise, with doubles, and a very wrong one with singles.
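For the normalization point above, a safe renormalization in NumPy might look like this (the function name and the epsilon guard against zero-length vectors are my additions):

```python
import numpy as np

def renormalize(N, eps=1e-8):
    """Rescale an HxWx3 normal map so every vector has unit length."""
    norm = np.linalg.norm(N, axis=2, keepdims=True)
    return N / np.maximum(norm, eps)   # eps avoids division by zero

N = np.array([[[0.0, 0.0, 2.0],
               [3.0, 4.0, 0.0]]])      # lengths 2 and 5
U = renormalize(N)
print(np.linalg.norm(U, axis=2))       # -> [[1. 1.]]
```

Applying this to both the prediction and the ground-truth before computing the dot product costs almost nothing and removes a whole class of silent errors.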

**Sample Code:** As a demonstration, here is a sample evaluation script: evalSimple.m. You'll have to fill in some of the data loading, but it should give you an idea.

-David Fouhey, January 2016

References:

[1] Data-Driven 3D Primitives for Single Image Understanding. D. Fouhey, A. Gupta, M. Hebert. ICCV 2013.

[2] Discriminatively Trained Dense Surface Normal Estimation. Ľ. Ladický, B. Zeisl, M. Pollefeys. ECCV 2014.