# Entropy Analysis

“Being chaotic signifies a complexity of structure. Then the fact that a sequence is chaotic signifies that the complexity of its beginning segments grows sufficiently fast.”

-A. N. Kolmogorov.

We first check that entropy is actually being generated by Mata Hari’s entropy source. We are trying to confirm that irreducible information is being generated in a near-homogeneous manner (ergodicity), rather than by some initial chaotic start-up event. The complexity test below shows good uniformity of entropy production, with all ten normalised segments having a value of around 1.
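That uniformity claim can be spot-checked with a crude proxy: complex (chaotic) data is incompressible, so the deflate ratio of each segment should sit at about 1. This is only a sketch using zlib, not the actual complexity test above, and the data here is simulated rather than a real Mata Hari capture:

```python
import zlib

import numpy as np

rng = np.random.default_rng(7)
# Stand-in for a capture file: 1 MB of simulated random bytes
data = rng.integers(0, 256, 1_000_000, dtype=np.uint8).tobytes()

# Ten equal segments; a deflate ratio of ~1 means incompressible, i.e. complex
seg = len(data) // 10
scores = [len(zlib.compress(data[i * seg:(i + 1) * seg], 9)) / seg
          for i in range(10)]
print([round(s, 3) for s in scores])
```

Genuinely random segments all score just above 1 (deflate adds a little framing overhead to incompressible input), mirroring the flat profile in the complexity test.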

Essentially the device works due to quantization error when sampling a random (uncorrelated) source at 10-bit resolution. And the beauty is that we can form an additive noise model for the circuit, whereby the Zener-generated noise rides atop any ambient noise such as radio, EMI or anything induced from the mains supply.
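A minimal sketch of that additive noise model, using the 1.1 mV step from the text; the 5 mV avalanche amplitude, the 0.55 V operating point and the 2 mV of 50 Hz mains hum are illustrative assumptions, not measurements:

```python
import numpy as np

rng = np.random.default_rng(1)

FS = 8930      # actual sample rate in Sa/s, from the text
LSB = 1.1e-3   # ADC step, epsilon = 1.1 mV
N = 100_000

t = np.arange(N) / FS
# Additive noise model: Zener avalanche noise riding atop ambient pickup.
# The 5 mV and 2 mV amplitudes below are assumptions for illustration.
zener = rng.normal(0.55, 5e-3, N)          # avalanche noise about mid-scale
mains = 2e-3 * np.sin(2 * np.pi * 50 * t)  # 50 Hz mains-induced hum
signal = zener + mains

# 10-bit quantization: codes 0..1023, one code per LSB of input
codes = np.clip(np.round(signal / LSB), 0, 1023).astype(int)
print(int(codes.min()), int(codes.max()), round(float(codes.std()), 1))
```

Even these modest amplitudes spread the signal across several ADC codes, which is all the quantization-error mechanism needs.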

The above probability density function shows the range of quantized signal values via the internal ADC sampling at the default nominal rate of 10 kSa/s $(\epsilon = 1.1 \text{ mV}, \tau = 112 \: \mu \text{s})$, or 8.93 kSa/s actual. We have fitted a curve to it, and it appears slightly skewed, which we infer to be a log-normal distribution. This is evidence that some Zener avalanche effect occurs within our 8.2 V diode with slightly over 0.8 V of compliance (given a healthy battery).
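A log-normal fit of that kind can be sketched as follows. With the location fixed at zero, the maximum-likelihood log-normal fit is just a normal fit to the logs; the data here is a simulated stand-in for the ADC codes (the $\mu = 6.2$, $\sigma = 0.05$ parameters are assumptions), and the positive skewness is what distinguishes it from a plain Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative stand-in for the skewed ADC histogram (not the real capture)
codes = rng.lognormal(mean=6.2, sigma=0.05, size=50_000)

# MLE log-normal fit (location fixed at 0): fit a normal to log(codes)
mu, sigma = np.log(codes).mean(), np.log(codes).std()

# A log-normal's theoretical skewness is (e^{s^2} + 2) * sqrt(e^{s^2} - 1) > 0
s2 = np.exp(sigma ** 2)
theory_skew = (s2 + 2) * np.sqrt(s2 - 1)

centred = codes - codes.mean()
sample_skew = (centred ** 3).mean() / codes.std() ** 3
print(round(mu, 2), round(theory_skew, 3), round(sample_skew, 3))
```

If the sample skewness agrees with the skewness implied by the fitted $\sigma$, the log-normal hypothesis is at least self-consistent.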

But this signal can’t be used as is. ent gives a lag = 1 autocorrelation ($R$) value of 0.049733. Typically $R < 10^{-3}$ is required if the data set is to be considered IID. IIDness dramatically increases our confidence in the min-entropy ($H_{\infty}$) determination and bypasses all those dodgy NIST SP 800-90B non-IID shenanigans and hoo-ha detailed here. We will not even bother to perform an IID test on this raw data; it would fail.

So in keeping with our Three Golden Rules, we will modify our $(\epsilon, \tau)$ sampling methodology and reduce $\epsilon$ to only one bit, equivalent to 1.1 mV. We will sample as `analogRead(portNo) & 0b1`. Hopefully this will result in IID samples. Thus testing…
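Why shrinking $\epsilon$ to one bit should help can be demonstrated numerically: when a smooth, correlated signal spans many LSBs, its bottom bit is very nearly decorrelated. The sketch below computes an ent-style lag-1 serial correlation before and after keeping only the LSB; the 4-tap moving-average model of the raw stream is an assumption for illustration, not Mata Hari’s actual noise spectrum:

```python
import numpy as np

def r1(x):
    """Lag-1 serial correlation coefficient, in the style of ent's R."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

rng = np.random.default_rng(3)
n = 500_000
# Stand-in for the raw stream: band-limited noise spanning many LSBs
# (the 4-tap moving average and sigma = 20 LSB are assumptions)
raw = np.convolve(rng.normal(500.0, 20.0, n), np.ones(4) / 4, mode="valid")
codes = np.clip(np.round(raw), 0, 1023).astype(int)

print(round(r1(codes), 3))        # full-resolution codes: strongly correlated
print(round(r1(codes & 0b1), 3))  # LSB-only stream: correlation collapses
```

The full-resolution stream inherits the filter’s correlation, while the LSB stream falls well under the $10^{-3}$ IID threshold mentioned above.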

- Our slow IID test over samples taken with $\epsilon = 1$ bit.
- NIST’s IID test over samples taken with $\epsilon = 1$ bit.

These two tests confirm that our sampling regime is IID. Therefore we can compress the individual bits into a byte without worry, to increase the efficiency of transferring the entropy off-circuit. We can do this utilising a left shift and some ORs. Incidentally, the extra processing has the effect of ever so slightly decreasing the sample rate ($\tau \uparrow$) and thus enhancing IIDness, but we couldn’t exactly quantify this small difference ourselves $(\tau \approx 112 \: \mu \text{s})$. The internal Mata Hari sample loop is like so:-

```cpp
for (int16_t i = 0; i < totalSamples; i++) {
    uint8_t sample = 0;
    for (uint8_t bit = 0; bit < 8; bit++) {
        sample <<= 1;                         // make room for the next bit
        sample |= analogRead(portNo) & 0b1;   // keep only the LSB of the ADC reading
    }
    Serial.write(sample);                     // transfer the packed byte off-circuit
}
```
```
$ ent -b /tmp/mata-hari-10mb-x8bit.bin
Entropy = 0.999995 bits per bit.

Optimum compression would reduce the size of this 80000000 bit file by 0 percent.

Chi square distribution for 80000000 samples is 587.11, and randomly
would exceed this value less than 0.01 percent of the times.

Arithmetic mean value of data bits is 0.4986 (0.5 = random).                <<<<
Monte Carlo value for Pi is 3.151026060 (error 0.30 percent).
Serial correlation coefficient is -0.000459 (totally uncorrelated = 0.0).   <<<<
```

We can ignore all the metrics other than the arithmetic mean and the serial correlation. An arithmetic mean of 0.4986 suggests the expected bit bias of $\epsilon = 0.0014$, or $\epsilon = 2^{-9.5}$ away from $0.5$ towards $0$, calculated as $9.5 = \frac{\log \big( \frac{1}{0.5 - 0.4986} \big)}{\log(2)}$. Not exactly NIST’s $\epsilon \le 2^{-64}$ next-bit requirement, but then this is not a TRNG. It’s the entropy source for a TRNG. Randomness extraction will follow.

And notice the correlation. $|R| = 0.000459$ is very good indeed, being an order of magnitude below what’s expected of IID data. Which we further confirm by three independent means…

- Our slow IID test for the compressed output of 8 samples/byte.
- Our fast IID test for the compressed output of 8 samples/byte.
- NIST’s IID test for the compressed output of 8 samples/byte.

Both our and NIST’s IID tests confirm that the compressed data sample is IID. Therefore the entropy rate $(H_{\infty})$ is simply $-\log_2(p_{max})$, or $\min(H_{\text{original}}, 8 \times H_{\text{bitstring}}) = 7.947502$ bits/byte, taken from the NIST test output above.
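The two figures quoted above can be reproduced in a few lines. Note that the bias-only min-entropy bound comes out a touch above NIST’s 7.947502 bits/byte, as the NIST estimator also accounts for structure beyond simple bit bias:

```python
import math

# Bias implied by ent's arithmetic mean of the compressed bit stream
mean = 0.4986
eps = 0.5 - mean                       # bit bias epsilon = 0.0014
print(round(math.log2(1 / eps), 1))    # -> 9.5, i.e. eps = 2^-9.5

# Bias-only min-entropy for an IID source: H_inf = -log2(p_max) per bit
p_max = 0.5 + eps                      # probability of the likelier bit value
print(round(8 * -math.log2(p_max), 4))  # -> 7.9677 bits/byte (bias alone)
```

The gap between 7.9677 and 7.947502 is the entropy penalty NIST’s estimators assign to everything other than bias, which is why we quote the lower NIST figure.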