[One can tell it’s reviewing and letter-writing season when I escape to blogging more often..]
There’s been some discussion on the NIPS experiment, enough of it that even my neuro-scientist brother sent me a link to Eric Price’s blog post. The gist of it is that the program chairs duplicated the reviewing process for 10% of the papers, to see how many would get inconsistent decisions, and it turned out that 25.9% of them did (one of the program chairs predicted that it would be 20% and the other that it would be 25%, see also here, here and here). Eric argues that the right way to measure disagreement is to look at the fraction of papers that process A accepted that would have been rejected by process B, which comes out to more than 50%.
It’s hard for me to interpret this number. One interpretation is that it’s a failure of the refereeing process that people can’t agree on more than half of the list of accepted papers. Another viewpoint is that since the disagreement is not much larger than predicted beforehand, we shouldn’t be that surprised about it. It’s tempting when having such discussions to model papers as having some inherent quality score, where the goal of the program committee is to find all papers above a certain threshold. The truth is that different papers have different, incomparable qualities, that appeal to different subsets of people. The goal of the program committee is to curate an a diverse and intellectually stimulating program for the conference. This is an inherently subjective task, and it’s not surprising that different committees would arrive at different conclusions. I do not know what’s the “optimal” amount of variance in this process, but I would have been quite worried if it was zero, since it would be a clear sign of groupthink. Lastly, I think this experiment actually points out to an important benefit of the conference system. Unlike journals, where the editorial board tends to stay constant for a long period, in conferences one gets a fresh draw of the committee every 6 months or a year.