why we should use gaussian rounding

hantian

2025-01-09

the problem started from a very random thought this morning: ‘how much is -3.5 rounded to the nearest unit?’ while this may look like a silly question at first sight, it was really intriguing to see my colleagues’ lost expressions, even though they lasted only a brief second. the rule of traditional rounding is straightforward:

  1. if the digit behind the last significant digit is 0, 1, 2, 3, 4, then round to the previous integer;

  2. else if the digit behind the last significant digit is 5, 6, 7, 8, 9, then round to the next integer.

i am a traditional person, and i had no issue with this well-proven method until i was reminded: if the critical digit is 0, it’s not really rounding, it’s an exact representation. thus it seems that we are more likely to ‘round up’ than to ‘round down’. there are in general two arguments/mitigations against this:

  1. recognising the continuity of numbers: it is actually unfair to compare only the critical digit; numbers in (0, 0.5) are rounded down and numbers in [0.5, 1) are rounded up, and the measures of the two sets are exactly the same.

  2. rounding towards/away from zero: that is, ignore the sign of the number; thus -3.5 will be rounded to -4 if rounded away from zero.

unfortunately, neither of them is entirely true, if not outright false.

let’s debunk argument 1 first: to a statistician, of course, anything that is not significant is not significant; thus, treating the numbers as if they were continuous does not reveal their true identity behind the critical digit. at the critical digit, each digit from 0 to 9 is equally likely, with probability 0.1; as a result, we round down 0.4 of the time and up 0.5 of the time (the remaining 0.1, the digit 0, is not rounded at all).
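
the counting argument above can be checked by brute force. this is only a minimal sketch, assuming values with exactly one decimal digit between 0.0 and 9.9; the helper name round_half_up is made up for illustration, and Fraction is used so that no floating-point noise creeps in:

```python
from fractions import Fraction
import math

def round_half_up(x):
    # traditional rule: a critical digit of 5 always goes to the next integer
    return math.floor(x + Fraction(1, 2))

# every one-decimal value from 0.0 to 9.9, kept exact with Fraction
values = [Fraction(n, 10) for n in range(100)]

down = sum(1 for v in values if round_half_up(v) < v)    # rounded down
up = sum(1 for v in values if round_half_up(v) > v)      # rounded up
exact = sum(1 for v in values if round_half_up(v) == v)  # critical digit 0
mean_error = sum(round_half_up(v) - v for v in values) / len(values)

print(down, up, exact)     # 40 50 10
print(float(mean_error))   # 0.05
```

so under these assumptions the traditional rule pushes the total up by 0.05 per number on average.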

the approach in argument 2 is probably the one most widely taught in primary schools. it is true that in this case the digit 5 is equally likely to be rounded up and down, provided that the numbers are equally likely to be positive and negative. for people in the financial industry, and admittedly in many other fields, negative numbers are a rarity: people do not normally buy -3.5 litres of gas or have -2.5 dollars in their savings accounts.
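
to make this concrete, here is a small sketch of rounding ties away from zero, again assuming one-decimal values; the function names and the two toy data sets are hypothetical and only serve the illustration:

```python
from fractions import Fraction
import math

def round_half_away_from_zero(x):
    # round the magnitude with the traditional rule, then restore the sign
    magnitude = math.floor(abs(x) + Fraction(1, 2))
    return magnitude if x >= 0 else -magnitude

def mean_error(values):
    # average signed rounding error over a collection of exact values
    return sum(round_half_away_from_zero(v) - v for v in values) / len(values)

positive = [Fraction(n, 10) for n in range(100)]   # 0.0 .. 9.9
symmetric = positive + [-v for v in positive]      # mirrored around zero

print(round_half_away_from_zero(Fraction(-7, 2)))  # -4
print(float(mean_error(symmetric)))                # 0.0
print(float(mean_error(positive)))                 # 0.05
```

the bias only cancels when the data are mirrored around zero; with positive numbers alone it is the same 0.05 as before.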

that is why gaussian rounding (i was told that it is known as ‘Mathematische Rundung’ in german, and you can guess what that means) is preferred to traditional rounding (‘Kaufmännische Rundung’, should you want to know…), where the rules are as follows:

  1. if the critical digit is 1, 2, 3, 4 then round to the previous integer;

  2. else if the critical digit is 6, 7, 8, 9 then round to the next integer;

  3. if, however, the critical digit is 5, then round to the nearest even number.
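
the three rules can be written down almost literally. the sketch below is one possible transcription, not a reference implementation; it assumes the input is given as a whole number of tenths (35 stands for 3.5) so that everything stays in exact integer arithmetic, and the name gaussian_round_tenths is made up:

```python
def gaussian_round_tenths(tenths):
    """round a value given in whole tenths (35 means 3.5) to an integer."""
    # floor division: whole is the integer just below (or equal to) the value,
    # digit is how many tenths the value sits above it (0..9)
    whole, digit = divmod(tenths, 10)
    if digit < 5:
        return whole        # rule 1, plus the exact case digit == 0
    if digit > 5:
        return whole + 1    # rule 2
    # rule 3: a tie goes to whichever neighbour is even
    return whole if whole % 2 == 0 else whole + 1

print([gaussian_round_tenths(t) for t in (5, 15, 25, 35, 45)])  # [0, 2, 2, 4, 4]
print(gaussian_round_tenths(-35))                               # -4

# signed error in tenths, summed over 0.0 .. 9.9
print(sum(10 * gaussian_round_tenths(t) - t for t in range(100)))  # 0
```

over the one-decimal values from 0.0 to 9.9 the signed errors cancel exactly, which is the property the tie rule is meant to buy.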

with this setting, the critical digit 5 has an even chance of being rounded up or down. i am very interested to see if gaussian rounding performs better than the traditional one.

notice that python’s built-in round() function and its numpy counterpart np.round() have already adopted the gaussian rounding strategy.
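
a quick check of the built-in, kept free of dependencies (so np.round is not exercised here); the tie values are chosen because they are exactly representable as floats:

```python
# built-in round() with a single argument returns an int
ties = [0.5, 1.5, 2.5, 3.5, 4.5]
rounded = [round(x) for x in ties]

print(rounded)       # [0, 2, 2, 4, 4]
print(round(-3.5))   # -4
```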

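    import math

    import numpy as np
    import pandas as pd
    import seaborn as sns

    # traditional rounding: a tie (critical digit 5) always goes up
    traditional_rounding = np.vectorize(
        lambda x: math.floor(x + 0.5)
    )
    # np.round sends ties to the nearest even integer
    gaussian_rounding = np.round

    # 1,000 x 1,000 random numbers in [0, 10) with exactly one decimal digit
    data = np.vectorize(math.trunc)(
        np.random.rand(1000, 1000)*100
    ) / 10
    # relative error of the rounded total against the true total
    rounding_err = lambda a, f: \
        abs(np.sum(f(a)) / np.sum(a) - 1)
    # one error per column, i.e. per sample of 1,000 numbers
    rounding_errs = lambda a, f: \
        np.apply_along_axis(rounding_err, 0, a, f)
    err_comparison = pd.DataFrame({
        'Traditional': rounding_errs(data, traditional_rounding),
        'Gaussian': rounding_errs(data, gaussian_rounding)
    })

    # overlay the two error distributions
    sns.displot(
        pd.melt(err_comparison),
        x="value",
        hue="variable",
        kde=True
    )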

the results are quite convincing: with 1,000 random samples of 1,000 random numbers each, the percentage errors of the rounded totals are clearly smaller for the gaussian rounding method, as shown below:

[figure: % errors of roundings]

where, as expected, gaussian rounding performs better than the traditional one consistently.

an interesting piece of trivia (if the german one is not interesting enough) is that gaussian rounding is frequently referred to as ‘banker’s rounding’. that is rather understandable, as bankers, i guess, generally deal with positive numbers, and a small disturbance may cost them a lot.