An important note:


Please note that we are actively experimenting with various formulas to find the one that we feel best helps users identify the most relevant papers for them, so this page is likely to change from time to time. There are an almost infinite number of ways to calculate these things, each with its pros and cons, so it is virtually certain that - no matter how we calculate them - we will not be able to please everyone. For that reason, we believe the best we can do is to always be transparent about how our ratings are calculated and the thinking behind them. While we are always open to suggestions on how we might further refine these calculations, we ask that you please understand that accommodating all such requests would not be possible.



Overall Rating for Single Review

(i.e. you rate a paper on its various dimensions - how do we then calculate the overall rating?)

The overall rating for a single review is a weighted-average of each of the individual dimension ratings, with the following weights:


Dimension         Weight
Reproducibility   25%
Logic / Design    15%
Impact            15%
Transparency      15%
Clarity           15%
Versatility       15%
Total             100%
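
As an illustration, the calculation amounts to the following (a minimal sketch in Python; the dimension keys and function name are just shorthand for this example, not anything from our actual code):

    # Illustration only: the weighted average behind the overall rating.
    WEIGHTS = {
        "reproducibility": 0.25,
        "logic_design": 0.15,
        "impact": 0.15,
        "transparency": 0.15,
        "clarity": 0.15,
        "versatility": 0.15,
    }

    def overall_rating(ratings):
        """Weighted average of the per-dimension star ratings."""
        return sum(WEIGHTS[dim] * ratings[dim] for dim in WEIGHTS)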



Aggregate Ratings of Papers and Reviews

  • I just submitted a review of a paper. How does that affect the ratings for that paper across all reviews? OR
  • I just rated a review. How does that affect the rating for that review across all ratings?


When calculating the aggregate rating for a paper or a review, we use something called the Bayesian average.


Bayesian Average


As modified from Wikipedia:


A Bayesian average is a method of estimating the mean of a population using outside information, especially a pre-existing belief,[1] that is factored into the calculation. This is a central feature of Bayesian interpretation. This is useful when the available data set is small.[2]

Calculating the Bayesian average uses the prior mean m and a constant C. C is chosen based on the typical data set size required for a robust estimate of the sample mean. The value is larger when the expected variation between data sets (within the larger population) is small. It is smaller when the data sets are expected to vary substantially from one another.

\bar{x} = \frac{Cm + \sum_{i=1}^{n} x_i}{C + n}

This is equivalent to adding C data points of value m to the data set. It is a weighted average of a prior average m and the sample average.



In our case, we currently have m = 3.2 (i.e. 3.2 stars) and C = either 0 (if there are fewer than four ratings) or 10 (if there are four or more).
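
In code, that rule looks like this (again an illustrative sketch, not our production implementation):

    def bayesian_average(ratings, m=3.2):
        """Bayesian average: (C*m + sum of ratings) / (C + n).

        C = 0 when there are fewer than four ratings, 10 otherwise,
        matching the parameters described above.
        """
        n = len(ratings)
        C = 0 if n < 4 else 10
        return (C * m + sum(ratings)) / (C + n)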


An Example


Let's say paper P currently has no reviews. A first review comes in as the following:

 

Weight   Dimension         Review 1
25%      Reproducibility   3
15%      Logic / Design    5
15%      Impact            2
15%      Transparency      2
15%      Clarity           1
15%      Versatility       3
100%     Total             2.7
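
Using the overall_rating sketch from earlier, the same total falls out directly:

    review_1 = {
        "reproducibility": 3, "logic_design": 5, "impact": 2,
        "transparency": 2, "clarity": 1, "versatility": 3,
    }
    overall_rating(review_1)   # 0.25*3 + 0.15*(5 + 2 + 2 + 1 + 3) = 2.7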


So the 'overall' rating for that first review for paper P is 2.7 stars. We now need to adjust the paper's 'aggregate' ratings (i.e. the ones we display on the main page for paper P) based upon this information. To do this, we use the Bayesian average with the parameters given above.


So, to update the 'Reproducibility' score, we use the following:

  • Since n (the number of reviews) = 1, which is still less than 4, C = 0
  • m = 3.2
  • x1 (the first rating for reproducibility) = 3


Plugging that into the formula above, we get x̄ = (0*3.2 + 3) / (0 + 1) = 3 / 1 = 3. So the updated aggregate 'reproducibility' score for the paper is 3.
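
In terms of the bayesian_average sketch above (the second call uses made-up ratings, purely to show what happens once C switches to 10):

    bayesian_average([3])            # n = 1 < 4, so C = 0: (0*3.2 + 3) / 1 = 3.0

    # Hypothetical: a fourth rating arrives, so C jumps to 10 and the
    # prior m = 3.2 pulls the aggregate toward the site-wide mean.
    bayesian_average([3, 5, 2, 2])   # (10*3.2 + 12) / 14 = 44/14 ≈ 3.14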


For a more complete example, please see the attached Excel file.