Over the years, I've gotten a number of questions about evaluating surface normal results. This document is aimed at helping answer questions like "are you doing X or Y?" and especially "why isn't my number in the same ballpark?". The overall criterion introduced in [1] is as follows:
for each valid pixel, compute the angular error with respect
to the ground-truth and then aggregate this error over the test set.
Different definitions of each of the key terms in that sentence (valid, angular error, ground-truth, aggregate) change results. Here, I clarify each and give a list of common issues I have seen.
Valid: This is crucial. While interpolated depth is perhaps useful for RGBD understanding, surface normal prediction can only be evaluated at locations actually measured by the Kinect. In NYU this is given in the rawDepths variable.
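A minimal sketch of building the validity mask. It assumes (this is not stated above) that rawDepths is an HxWxK array with one raw depth map per image and that pixels without a Kinect measurement are stored as zero. The small array here is a made-up stand-in for the real data, and the names valid and numValid are illustrative only.

```matlab
% Stand-in for NYU's rawDepths (H x W x K); real data would be loaded from the dataset file.
rawDepths = zeros(2, 3, 1);
rawDepths(:,:,1) = [0 1.2 2.5; 3.1 0 0.9];

i = 1;                            % index of the image being evaluated
% Assumption: a pixel with no Kinect measurement has raw depth 0.
valid = rawDepths(:,:,i) > 0;     % H x W logical mask
numValid = nnz(valid);            % number of pixels that will be evaluated
```

Only pixels where valid is true should contribute to any of the error statistics.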
Angular error: This is the angle between two unit vectors, or acos(x'*y). In all of our papers, and in the ones that I am aware of, this is what is used. Given two HxWx3 maps N and P, this is quickly computed in degrees as acosd(min(1,max(-1,sum(N.*P,3)))).
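Here, sum(N.*P,3) computes a map of the dot-product, and acosd converts that to an angle in degrees. The min/max is necessary for numeric reasons: rounding can push the dot-product slightly outside [-1,1], where the arccosine is no longer real-valued.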
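The computation described above can be sketched end to end as follows. The toy maps, the explicit renormalization of P, and the valid mask are assumptions added for illustration; the text only specifies the clamped dot-product and acosd.

```matlab
% Toy 1 x 2 x 3 maps: N is the ground truth, P is the prediction.
N = cat(3, [0 0], [0 1], [1 0]);   % unit normals (0,0,1) and (0,1,0)
P = cat(3, [0 0], [0 0], [2 3]);   % unnormalized predictions (0,0,2) and (0,0,3)
valid = logical([1 1]);            % hypothetical validity mask

% Assumption: predictions may not be unit length, so renormalize them
% (max(..., eps) guards against division by zero).
nrm = sqrt(sum(P.^2, 3));
P = P ./ repmat(max(nrm, eps), [1 1 3]);

% Clamp the dot-product to [-1, 1] before taking the arccosine in degrees.
E = acosd(min(1, max(-1, sum(N.*P, 3))));

% Keep only the errors at valid pixels, as a column vector.
e = E(valid);
e = e(:);
```

In this toy case the first pixel matches the ground truth and the second is perpendicular to it, so the two errors are 0 and 90 degrees.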
Ground-truth: Surface normals are estimated, not computed, so there is no single ground-truth. My recent work has used the ground-truth from [2] since it is very high-quality.
Aggregate: We introduced 6 metrics in [1]. If E is the collection of all the errors over the dataset, you compute them via mean(E), median(E), mean(E.^2).^0.5, and mean(E<t) for each of three thresholds t.
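A minimal sketch of the aggregation step. The error values are made up, and the thresholds of 11.25, 22.5 and 30 degrees are an assumption (the text above only says t); substitute whatever thresholds the comparison you are reproducing uses.

```matlab
% Hypothetical errors (degrees) pooled over all valid pixels in the test set.
E = double([5; 10; 20; 25; 40]);

% Assumed thresholds in degrees; the text above only names them "t".
thresholds = [11.25 22.5 30];

meanErr   = mean(E);
medianErr = median(E);
rmse      = sqrt(mean(E.^2));     % same as mean(E.^2).^0.5

% Fraction of pixels whose error is below each threshold.
fracBelow = zeros(size(thresholds));
for k = 1:numel(thresholds)
    fracBelow(k) = mean(E < thresholds(k));
end
```

Note that E pools pixels from the whole test set before any statistic is taken; averaging per-image means instead would give a different number whenever images have different counts of valid pixels.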
Common Issues: There are a number of subtleties that often cause issues the first time around:
>> z = 30+randn(640*480*1000,1)*10;
>> [mean(z),mean(single(z))]
ans =
   29.9992   14.0030
You get the right answer, up to noise, with doubles, and a very wrong one with singles.
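One way to sidestep the precision problem, sketched here under the assumption that per-image errors arrive as single-precision arrays, is to cast to double before summing and to keep the running totals in double. The cell array errs and the variable names are made up for illustration.

```matlab
% Hypothetical per-image error vectors stored in single precision.
errs = {single([10 20 30]), single([40 50])};

total = 0;                        % double by default
count = 0;
for i = 1:numel(errs)
    e = double(errs{i}(:));       % cast before accumulating
    total = total + sum(e);
    count = count + numel(e);
end
meanErr = total / count;          % mean over all pixels, accumulated in double
```

Running totals like this only cover the mean (and, with a sum of squares, the RMSE); the median still needs all errors collected in one place.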
Sample Code: As a demonstration, here is some sample evaluation code. You'll have to fill in some of the data loading, but it should give you an idea: evalSimple.m
[1] Data-Driven 3D Primitives for Single Image Understanding. D. Fouhey, A. Gupta, M. Hebert. ICCV 2013.
[2] Discriminatively Trained Dense Surface Normal Estimation. Ľubor Ladický, Bernhard Zeisl, Marc Pollefeys. ECCV 2014.