Universal Codes for Integers

Lloyd Allison, Arun S. Konagurthu & Daniel F. Schmidt, 'On Universal Codes for Integers: Wallace Tree, Elias Omega and Beyond', Data Compression Conference (DCC), pp.313-322, doi:10.1109/DCC50243.2021.00039, 22-26 March 2021.

N=
Ints≥  

Above: CW:=enc(N) then check that dec(CW)=N, for each of several codes – Fibonacci, Elias's omega, our omega2 and omega*, and Wallace Tree Code – all for integers ≥0 or ≥1. Note, treat the timings with great caution as who knows when javascript's garbage collector kicks in.


Above: Code-word lengths for powers of 10.


N fib1 omega1 WTC1
1 2 1* 1*
2 3* 3* 3*
3 4 3* 5
4 4* 6 5
13 7* 7* 9
16 7* 11 9
610 15* 17 15*
627 15* 17 17
1597 17* 18 17*
2057 17* 19 19
4181 19* 20 19*
6765 20 20 19*
6919 20* 20* 21
8192 20* 21 21
10946 21* 21* 21*
16384 21* 22 21*
17711 22 22 21*
23715 22* 22* 23
28657 23 22* 23
32768 23* 23* 23*
46368 24 23* 23*
6553624 28 23*
8250125*28 25*
12139326 28 25*
29051327*30 27*
31781128 30 27*

Above: Code-word lengths, early points of change of the lead.


w fib1 omega1 WTC1
1 0 0.5 0.5
2 0.25 0.5 0.5
3 0.375 0.75 0.625
4 0.5 0.75 0.625
10 0.859 0.875 0.754
100 0.999... 0.947 0.920
1000 -"- 0.957 0.975
10000 -"- 0.963 0.992
100000 -"- 0.9688 0.997
1000000 -"- 0.9692 0.9992
N s.t. |codeword(N)|≤w p(N)

Above: Cummulative probabilities upto code-word length w (bits).


Is there an ultimate code?

Is there an ultimate code, i.e., a code that gives the shortest code-words, shorter than any other code gives, to sufficiently big integers?


Above: probabilities Pr(.) under code C
Any code C can be "stretched" to make another code C': If code C gives an integer n≥1 the code-word CW(n) then code-words for C' are defined by
CW'(2n-1) = CW(n) ++ '0', and
CW'(2n) = CW(n) ++ '1'.
and if Pr(m) is the probability of m under C and Pr'(m) the probability of m under C' then
Pr'(2n-1) = Pr'(2n) = Pr(n)/2.

If C is a prefix code so is C': The end of CW(n) can be detected and we expect one more bit for CW'(2n-1) and CW'(2n).


Above: probabilities Pr'(.) under code C'

If C is efficient the area under the curve D+E+F=1 and C' is also efficient, area G+H=1. Now the region marked G is region D shrunk by 1/2 vertically and stretched ×2 horizontally, so area D=G. Consequently H=E+F and area H>F. Put another way, C' assigns more probability (eventually gives shorter code-words) than C to big enough integers. And of course C' can itself be stretched, "improved".

So in that sense, no, there is no ultimate code.


Robustness

Code:     

Above: Experiment with robustness; Ns→code-words→code-string→mutate(CW[0])→decode.
Key: || correct number of Ns, # wrong number of Ns, = late numbers correct, non-trivial remnant ⇒ v. bad (error msg shows why).

References