Evaluating Surface Normal Predictions

Over the years, I've gotten a number of questions about evaluating surface normal results. This document aims to answer questions like "are you doing X or Y?" and especially "why isn't my number in the same ballpark?". The overall criterion introduced in [1] is as follows:

for each valid pixel, compute the angular error with respect
to the ground-truth and then aggregate this error over the test set.

Different definitions of each key term in this criterion (valid, angular error, ground-truth, and aggregate) change the results. Here, I clarify each one and list the common issues I have seen.

Valid: This is crucial. While interpolated depth is perhaps useful for RGBD understanding, surface normal prediction can only be evaluated at locations actually measured by the Kinect. In NYU, these locations are given by the rawDepths variable.
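As a minimal sketch of what this looks like in practice, the snippet below builds a validity mask for one toy image and keeps only the errors at valid pixels. It assumes that the raw depth map stores 0 wherever the sensor returned no measurement; that encoding is an assumption you should verify on your copy of the data. The names rawDepth, E_map, and valid are hypothetical, and the numbers are made up.

```matlab
% Toy 3x3 raw depth map in meters. Assumption: 0 marks pixels with no
% sensor reading (check this for your data before relying on it).
rawDepth = [0 1.2 1.3; 2.0 0 2.1; 0.9 1.0 0];

% Evaluate only where the sensor actually returned a depth.
valid = rawDepth > 0;

% Per-pixel angular errors in degrees for the same toy image (made up).
E_map = [5 10 15; 20 25 30; 35 40 45];

% Keep the errors at valid pixels only, as a column vector
% (MATLAB indexes column by column).
E = E_map(valid);
```

Concatenating the per-image E vectors over the whole test set gives the collection of errors that is later aggregated.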

Angular error: This is the angle between two unit vectors x and y, or acos(x'*y). This is what is used in all of our papers and in all the others that I am aware of. Given two HxWx3 maps N and P, it is quickly computed in degrees as

acosd(min(1,max(-1,sum(N.*P,3)))).

Here, sum(N.*P,3) computes a map of the dot products, and acosd returns the inverse cosine in degrees. The min/max clamp is necessary for numerical reasons: rounding can push the dot product slightly outside [-1,1].
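To make the formula concrete, here is a small worked sketch on a toy 1x2x3 pair of maps. The values are made up, and the renormalization step is added as a precaution, since the formula assumes unit-length inputs. bsxfun is used so the snippet also runs on older MATLAB releases.

```matlab
% Toy 1x2x3 normal maps: N is the ground truth, P the prediction.
N = cat(3, [0 0], [0 1], [1 0]);   % pixel 1: (0,0,1), pixel 2: (0,1,0)
P = cat(3, [0 0], [2 2], [2 0]);   % pixel 1: (0,2,2), pixel 2: (0,2,0)

% Renormalize both maps to unit length along the third dimension.
N = bsxfun(@rdivide, N, sqrt(sum(N.^2,3)));
P = bsxfun(@rdivide, P, sqrt(sum(P.^2,3)));

% Clamped dot product, then inverse cosine in degrees.
E = acosd(min(1,max(-1,sum(N.*P,3))));
```

For these inputs, the first pixel is off by 45 degrees and the second matches the ground truth exactly, so E is approximately [45 0].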

Ground-truth: Surface normals are estimated, not computed, so there is no single ground-truth. My recent work has used the ground-truth from [2] since it is very high-quality.

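Aggregate: We introduced 6 metrics in [1]: the mean, the median, the root-mean-square error, and the fraction of pixels with error below a threshold t, for t = 11.25, 22.5, and 30 degrees. If E is the collection of all the errors over the dataset, you compute them via mean(E), median(E), mean(E.^2).^0.5, and mean(E<t).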
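The aggregation can be sketched as follows on a toy error vector. The values in E are made up, and the thresholds 11.25, 22.5, and 30 degrees are assumed here as the ones commonly reported for this benchmark; the variable names are hypothetical.

```matlab
% E: all per-pixel errors over the test set, in degrees (toy values here).
% Convert to double before aggregating.
E = double([5; 10; 20; 25; 40]);

meanErr   = mean(E);             % mean error
medianErr = median(E);           % median error
rmse      = mean(E.^2).^0.5;     % root-mean-square error

% Fraction of pixels with error below each threshold.
thresholds = [11.25 22.5 30];
fracWithin = arrayfun(@(t) mean(E < t), thresholds);
```

Multiply fracWithin by 100 if you report percentages. Note that E pools the pixels of all images together before aggregating, rather than averaging per-image statistics.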

Common Issues: There are a number of subtleties that often cause issues the first time around:

• Check your coordinate frame: different results and ground truths come in different coordinate frames (i.e., meanings of X/Y/Z). Check that the ground truth and predictions look similar when displayed as images. The easiest way to convert a normal map N to an image is uint8(N*128+128), or uint8(255*(N+1)/2) if you prefer typing more.
• Clamp your dot product: Sometimes, when computing x^T y, the result is slightly more than 1 or less than -1. This gives you complex results when you do acos.
• Normalize your ground-truth and predictions: The formula for the angular error does not work if its inputs are not normalized to unit norm. It's always easier and safer to renormalize.
• Use doubles: Computing the mean over a large amount of data in single precision accumulates severe rounding error, because small terms are swamped once the running sum grows large. Here is a MATLAB try-at-home demonstration, which simulates surface normal results over 1K 640x480 images:
```
>> z = 30+randn(640*480*1000,1)*10;
>> [mean(z),mean(single(z))]

ans =

29.9992   14.0030
```
You get the right answer, up to noise, with doubles, and a very wrong one with singles.
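A few of the checks above can be sketched on toy values. This is only an illustration with made-up inputs; the variable names are hypothetical.

```matlab
% Visualization: map normals from [-1,1] to [0,255]. N is HxWx3;
% here it is a single pixel pointing along +Z.
N = cat(3, 0, 0, 1);
vis = uint8(N*128+128);          % uint8 saturates 256 to 255

% Clamping: a dot product that rounding pushed slightly above 1.
d = 1 + 1e-12;
unclamped = acosd(d);                  % complex result
clamped   = acosd(min(1,max(-1,d)));   % real result, 0 degrees

% Doubles: convert single-precision errors before aggregating.
Es = single([10; 20; 30]);
m  = mean(double(Es));
```

Displaying vis for both the ground truth and the prediction (for example with imshow) makes a coordinate-frame mismatch easy to spot, since the dominant colors will differ.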

Sample Code: As a demonstration, here is some sample evaluation code. You'll have to fill in some of the data loading, but it should give you an idea: evalSimple.m

-David Fouhey, January 2016

References:
[1] Data-Driven 3D Primitives for Single Image Understanding. D. Fouhey, A. Gupta, M. Hebert. ICCV 2013.
[2] Discriminatively Trained Dense Surface Normal Estimation. Ľubor Ladický, Bernhard Zeisl, Marc Pollefeys. ECCV 2014.