ent

BLUF:-

A subtle mistake(?) means that ent requires nuanced usage. The program appears to mix randomness testing with entropy measurement, but those two goals are incompatible. Use it carefully: either with the -c option for IID min.entropy measurement, or as a compact randomness test focusing on bit/byte distribution only.

ent is:-

“a program, ent, which applies various tests to sequences of bytes stored in files and reports the results of those tests. The program is useful for evaluating pseudorandom number generators for encryption…”

In “evaluating pseudorandom number generators for encryption”, the required entropy rate is 1 bit/bit or 8 bits/byte. Anything substantially less is useless for cryptography. There is no need to measure it, as the only result that concerns us is a pass/fail determination within agreed confidence bounds, à la the other standard randomness tests like dieharder. Yet ent gives no bounds and determines no p values for confidence, other than for the bit/byte distribution $ \chi^2 $.

And it can’t be used for general entropy measurement in its default setting, as it reports the wrong type of entropy. Cryptography focuses on the most conservative min.entropy $(H_{\infty})$, not Shannon entropy. ent reports Shannon entropy, which is higher than min.entropy for every sample distribution other than uniform. See Note 1. And uniform distributions are uncommon from most entropy sources.
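For reference, the two measures can be written out as below. These are the standard textbook definitions, not quoted from the ent documentation; $p_i$ is the probability of byte value $i$, and $H_{Shannon}$ is a label assumed here for ent's default figure:-

$$ H_{Shannon} = -\sum_{i=0}^{255} p_i \log_2 p_i \qquad H_{\infty} = -\log_2 \left( \max_i p_i \right) \qquad H_{\infty} \le H_{Shannon} $$

Equality holds only when every byte value that occurs is equally likely. Any peak in the distribution pulls $H_{\infty}$ down faster than it pulls $H_{Shannon}$ down.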

As an example, see the following entropy calculations for a synthetic IID Gaussian distribution which might be optimal ADC samples of Zener breakdown noise, like:-

Synthetic Zener breakdown noise samples.


$ ent /tmp/gauss.bin
Entropy = 6.369663 bits per byte.        <====

Optimum compression would reduce the size
of this 1000000 byte file by 20 percent.

Chi square distribution for 1000000 samples is 2607328.04, and randomly
would exceed this value less than 0.01 percent of the times.

Arithmetic mean value of data bytes is 126.9692 (127.5 = random).
Monte Carlo value for Pi is 3.999639999 (error 27.31 percent).
Serial correlation coefficient is -0.001901 (totally uncorrelated = 0.0).

The ent -c report for the above distribution gives $Pr(X=127) = 0.020151$, and hence $H_{\infty} = -\log_2(0.020151) = 5.633004$ bits/byte. That is only 88% of ent’s default measure. A wackier sample distribution might drop that percentage considerably lower still. And wacky distributions are certainly possible, as you can see elsewhere on this site.
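The comparison can be reproduced approximately with a short Python sketch. It is not the code used to make /tmp/gauss.bin. The mean of 127, the standard deviation of 20 and the seed are assumptions chosen only to land near the figures above, and `entropies` and `gaussian_bytes` are hypothetical helper names:-

```python
import math
import random
from collections import Counter


def entropies(data: bytes) -> tuple[float, float]:
    """Return (Shannon entropy, min-entropy) in bits/byte from byte frequencies."""
    n = len(data)
    counts = Counter(data)
    shannon = -sum(c / n * math.log2(c / n) for c in counts.values())
    # Min-entropy depends only on the single most likely byte value.
    h_min = -math.log2(max(counts.values()) / n)
    return shannon, h_min


def gaussian_bytes(n: int, mean: float = 127.0, sigma: float = 20.0,
                   seed: int = 1) -> bytes:
    """Draw n IID Gaussian samples rounded to bytes (mean/sigma/seed are assumed)."""
    rng = random.Random(seed)
    out = bytearray()
    while len(out) < n:
        v = round(rng.gauss(mean, sigma))
        if 0 <= v <= 255:  # discard the rare out-of-range sample
            out.append(v)
    return bytes(out)


if __name__ == "__main__":
    shannon, h_min = entropies(gaussian_bytes(1_000_000))
    print(f"Shannon entropy = {shannon:.6f} bits/byte")
    print(f"Min-entropy     = {h_min:.6f} bits/byte")
    print(f"Ratio           = {h_min / shannon:.2%}")
```

The exact digits depend on the seed and the sample size, so they will not match the ent output above to six decimal places. The gap between the two measures should still be clear.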


Notes:-

  1. Also on the ent page (bottom) is this:-

BUGS: Note that the “optimal compression” shown for the file is computed from the byte- or bit-stream entropy and thus reflects compressibility based on a reading frame of the chosen width (8-bit bytes or individual bits if the -b option is specified). Algorithms which use a larger reading frame, such as the Lempel-Ziv [Lempel & Ziv] algorithm, may achieve greater compression if the file contains repeated sequences of multiple bytes.

  2. A consequence of note 1 above is that an 8-bit window presupposes IID data with a relaxation period $\ngtr$ 8 bits. Sadly it is common to see ent used (incorrectly) against non-IID data sets. In those cases the default reported entropy would be much higher than the true rate.

  3. More about measuring entropy.

  4. For interest, $\frac{H_{Compression|ent}}{H_{\infty}} = \frac{6.369663}{5.633004} = 1.13 $ for this particular IID Gaussian distribution.
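The note above on the 8-bit window and non-IID data can be illustrated with a deliberately extreme case. The sketch below is hypothetical and not taken from the ent page. It builds a repeating byte counter, which is fully predictable yet has a perfectly flat byte histogram, then compares a byte-frequency Shannon estimate with what zlib (a Lempel-Ziv based compressor) achieves. `shannon_bits_per_byte` is an assumed helper name:-

```python
import math
import zlib
from collections import Counter


def shannon_bits_per_byte(data: bytes) -> float:
    """Shannon entropy from byte frequencies alone (an 8-bit reading frame)."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


# A repeating counter 0, 1, ..., 255, 0, 1, ...: non-IID and fully predictable.
counter_stream = bytes(range(256)) * 4096

h = shannon_bits_per_byte(counter_stream)
ratio = len(zlib.compress(counter_stream, 9)) / len(counter_stream)

print(f"Byte-frequency entropy = {h:.6f} bits/byte")
print(f"zlib compressed size   = {ratio:.4%} of original")
```

The byte-frequency estimate comes out at the full 8 bits/byte because every value occurs equally often, while the compressor shrinks the stream to a small fraction of its size by exploiting the repeated sequence. Real non-IID sources are less extreme, but the direction of the error is the same.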