why we should use gaussian rounding

the problem started from a very random thought this morning: ‘how much is -3.5 rounded to the nearest unit?’ while this may seem a silly question at first sight, it was really intriguing to see the lost expression on my colleagues' faces, even though it lasted only a brief second. the rule of traditional rounding is straightforward:
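
a minimal sketch of that rule, assuming the usual reading (a critical digit of 0 to 4 goes down, 5 to 9 goes up) and non-negative inputs with one decimal place. the name `traditional_round` is made up for illustration:

```python
import math

def traditional_round(x: float) -> int:
    # split the number into its whole part and its critical (first decimal) digit
    whole = math.floor(x)
    critical_digit = int(round((x - whole) * 10))
    # 0-4 keeps the whole part, 5-9 moves up to the next unit
    return whole + 1 if critical_digit >= 5 else whole

print([traditional_round(v) for v in (3.0, 3.4, 3.5, 3.9)])  # [3, 3, 4, 4]
```

note that 3.0 is returned untouched: with a critical digit of 0 there is nothing to round.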

i am a traditional person, and i had no issue with this well-proven method until i was reminded: if the critical digit is 0, it’s not really rounding, it’s an exact representation. thus it seems that we are more likely to ‘round up’ than ‘round down’. there are in general two arguments/mitigations to this:

let’s debunk argument ([argument1]) first: to a statistician, of course, anything that is not significant is not significant; thus, treating the numbers as if they were continuous would not reveal their true identity behind the critical digit. and the critical digit is equally likely to be any digit from 0 to 9, each with probability 0.1; as a result, 0.4 of the time we are rounding down and 0.5 of the time up, while the remaining 0.1 (the digit 0) needs no rounding at all.
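
the counting can be checked with a few lines of arithmetic. this is only a sketch, assuming a single critical digit that is uniform over 0 to 9 and the traditional rule (5 and above go up); `signed_error` is a hypothetical helper:

```python
from fractions import Fraction

def signed_error(digit: int) -> Fraction:
    # traditional rule: digits 0-4 drop to 0 tenths, digits 5-9 go up to 10 tenths
    rounded_tenths = 0 if digit < 5 else 10
    return Fraction(rounded_tenths - digit, 10)

errors = [signed_error(d) for d in range(10)]
down = sum(e < 0 for e in errors)    # digits 1-4
up = sum(e > 0 for e in errors)      # digits 5-9
exact = sum(e == 0 for e in errors)  # digit 0
mean_bias = sum(errors, Fraction(0)) / 10

print(down, up, exact, mean_bias)  # 4 5 1 1/20
```

the downward and upward errors of the digits 1 to 4 and 6 to 9 cancel in pairs, so the whole bias of 1/20 of a unit per number comes from the digit 5.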

the approach in ([argument2]) is probably the most widely taught in primary schools. it is true that in this case the digit 5 is equally likely to be rounded up and down, provided that the numbers are equally likely to be positive and negative. for people in the financial industry, and admittedly many other fields, negatives are a rarity: people do not normally buy -3.5 litres of gas or have -2.5 dollars in their savings accounts.
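
to make this concrete, here is a sketch that assumes the rule meant here is ‘round half away from zero’ (my reading of the argument, not something spelled out above). it leans on the `decimal` module, whose `ROUND_HALF_UP` mode breaks ties away from zero; the function name is made up:

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_away_from_zero(value: str) -> int:
    # ties go away from zero: up for positives, down for negatives
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_UP))

print([round_half_away_from_zero(v) for v in ("-3.5", "-2.5", "2.5", "3.5")])
# [-4, -3, 3, 4]
```

the ties balance out only across the two signs; on all-positive data every tie still goes up.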

that is why gaussian rounding (and i was told that it is known as ’Mathematische Rundung’ in german, which you can guess the meaning of) is preferred to traditional rounding (’Kaufmaennische Rundung’, should you want to know…). its rules are as follows:
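
as a minimal sketch, assuming the usual ‘round half to even’ definition of gaussian rounding (the function name `gaussian_round` is mine, for illustration):

```python
import math

def gaussian_round(x: float) -> int:
    lower = math.floor(x)
    diff = x - lower
    if diff < 0.5:
        return lower          # closer to the lower unit
    if diff > 0.5:
        return lower + 1      # closer to the upper unit
    # exact tie: pick whichever neighbour is even
    return lower if lower % 2 == 0 else lower + 1

print([gaussian_round(v) for v in (0.5, 1.5, 2.5, 3.5, 2.4, 2.6)])
# [0, 2, 2, 4, 2, 3]
```

only the exact ties behave differently from traditional rounding; everything else goes to the nearest unit as usual.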

with this setting, the critical digit 5 has an even chance of being rounded up and down. i am very interested to see if gaussian rounding performs better than the traditional one.

notice that the built-in round() function (in python 3) and its numpy counterpart np.round() have already adopted the gaussian rounding strategy.
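
a quick check of the built-in, assuming python 3 (only the plain round() is exercised here, to keep the snippet free of dependencies):

```python
halves = [-3.5, -2.5, 0.5, 1.5, 2.5, 3.5]

# every tie lands on the even neighbour
rounded = [round(v) for v in halves]
print(rounded)  # [-4, -2, 0, 2, 2, 4]
```

so, for what it is worth, python's answer to this morning's question is -4.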

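    import math

    import numpy as np
    import pandas as pd
    import seaborn as sns

    # traditional rounding: a critical digit of 5 always goes up
    traditional_rounding = np.vectorize(
        lambda x: math.floor(x + 0.5)
    )
    # gaussian rounding: np.round sends ties to the nearest even integer
    gaussian_rounding = np.round

    # 1,000 x 1,000 random numbers in [0, 10), truncated to one decimal place
    data = np.vectorize(math.trunc)(
        np.random.rand(1000, 1000) * 100
    ) / 10

    def rounding_err(a, f):
        # relative error of the rounded total against the true total
        return abs(np.sum(f(a)) / np.sum(a) - 1)

    def rounding_errs(a, f):
        # one error per column, i.e. per sample of 1,000 numbers
        return np.apply_along_axis(rounding_err, 0, a, f)

    err_comparison = pd.DataFrame({
        'Traditional': rounding_errs(data, traditional_rounding),
        'Gaussian': rounding_errs(data, gaussian_rounding)
    })

    # overlay the two error distributions
    sns.displot(
        pd.melt(err_comparison),
        x="value",
        hue="variable",
        kde=True
    )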

the results are quite convincing: with 1,000 random samples of 1,000 random numbers each, the percentage errors in the rounding are significantly smaller for the gaussian rounding method, as shown below:

where, as expected, gaussian rounding performs better than the traditional one consistently.

an interesting piece of trivia (if the german one is not interesting enough) is that gaussian rounding is frequently referred to as ‘banker’s rounding’. it is rather understandable, as bankers, i guess, in general deal with positive numbers, and a small bias may cost them a lot.