Entropy measurement
This page has been superseded by a more accurate entropy measurement technique here…
As this section deals with entropy:-

Dom. The purest kind of entropy.
Since the entropy sources on this site are to form the basis of TRNGs, Golden rule 3 must be met, namely that…
The crucial aspect is that we consider the raw .jpg file and not the opened/decoded raster image. Our JPEGs are approximately 21.4 kB each, and intuitively a file’s entropy content cannot exceed its size. A typical rendered 640 × 480 pixel image creates a lossless PNG file of 284 kB using maximum compression, which creates the false impression that the entropy may be in the hundreds of kB. What actually happens is that the JPEG decoder acts similarly to a PRNG seeded by xxx.jpg: it deterministically expands the small compressed file into a much larger raster without creating any new entropy.
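As a quick illustration of that size argument, here is a minimal sketch, assuming Pillow is installed and using a hypothetical frame called frame.jpg:

```python
# Compare the raw .jpg file size with the size of the decoded raster.
import os
from PIL import Image

jpeg_path = "frame.jpg"                        # hypothetical file name
file_size = os.path.getsize(jpeg_path)         # raw .jpg file, ~21 kB

with Image.open(jpeg_path) as img:
    width, height = img.size                   # e.g. 640 x 480
    raster_size = width * height * len(img.getbands())  # decoded bytes (~922 kB for RGB)

print(f"raw file:       {file_size:,} bytes")
print(f"decoded raster: {raster_size:,} bytes")
# The decoder expands ~21 kB deterministically into a far larger raster,
# so no entropy is added; the raw file size caps the entropy content.
```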
It’s tempting to use the traditional Shannon [1] entropy formula, H = −Σ pᵢ log₂(pᵢ), on the byte value distribution of the raw files:-

Byte value distribution in 100 JPEGs.

The measured distribution is plotted against the blue line, which is where it would sit if the bytes were uniformly distributed with an entropy of 8 bits per byte. The Shannon formula returns (under an incorrect IID assumption) an entropy measurement of 7.66 bits per byte. Yet it is perfectly random.
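For reference, the naive byte-level estimate can be reproduced in a few lines; a sketch assuming the 100 JPEGs are concatenated into a hypothetical file frames.bin:

```python
# Naive Shannon estimate over byte frequencies (incorrectly assumes IID bytes).
import math
from collections import Counter

with open("frames.bin", "rb") as f:      # hypothetical concatenated sample set
    data = f.read()

counts = Counter(data)
n = len(data)
# H = -sum(p * log2(p)) over the observed byte values.
H = -sum((c / n) * math.log2(c / n) for c in counts.values())
print(f"Shannon estimate: {H:.2f} bits per byte")   # ~7.66 on our JPEGs
```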
Fortunately, SP 800-90B’s §6.3.4, The Compression Estimate, offers a useful pointer. The compression estimate computes the entropy rate of a dataset based on how much that dataset can be compressed, exploiting the fact that nothing can be losslessly compressed below its entropy content. The estimator is based on the Maurer Universal Statistical test. The previous draft of SP 800-90B (August 2012 Draft) also recommended the Burrows-Wheeler based BZ2 compression algorithm as an entropy sanity check in its §9.4.1. We can rewrite their compression test as a simple upper bound: the entropy of a dataset can be no greater than the size of its best losslessly compressed form.

The scomp test within the TestU01 randomness test suite also uses compression, this time based on the Lempel-Ziv algorithm. And since the Lempel-Ziv algorithm consists of just 26 lines of pseudo code, it is no surprise that far more efficient and advanced compressors exist.
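A minimal sketch of such a compression sanity check, using only Python’s built-in BZ2 and LZMA codecs (neither has JPEG models, so the bound is loose) and the same hypothetical frames.bin file:

```python
# Compression-based entropy upper bound: nothing compresses below its entropy,
# so compressed bits / original bytes caps the entropy rate.
import bz2
import lzma

with open("frames.bin", "rb") as f:
    data = f.read()

for name, compress in (("bz2", bz2.compress), ("lzma", lzma.compress)):
    bound = len(compress(data)) * 8 / len(data)
    print(f"{name}: entropy <= {bound:.2f} bits per byte")
```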
Even without considering that the entropy is packaged in the binary format of the JPEG specification, the autocorrelation is large, reaching nearly 0.2 within the first 50 bytes of each frame. This can also be seen as a clear discontinuity between frames within the raw sample set:-

Autocorrelation analysis featuring JPEG boundary.
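A sketch of how such an autocorrelation check might look, assuming NumPy and the hypothetical frames.bin sample set:

```python
# Normalised byte-level autocorrelation at a range of lags. IID data should
# give values near zero at every lag.
import numpy as np

data = np.fromfile("frames.bin", dtype=np.uint8).astype(np.float64)
x = data - data.mean()

def autocorr(x, lag):
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

for lag in (1, 2, 5, 10, 50):
    print(f"lag {lag:3d}: r = {autocorr(x, lag):+.3f}")
```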
An alternative autocorrelation/IID test is our permuted compressions test, based on the 800-90B Compression Estimate and sanity check, although we use a combination of algorithms rather than just BZ2. The test is based on the principle that a permuted correlated file will not compress as well as the original correlated file. For a 1 MB file of 50 concatenated frames, the IID test fails as expected:-

Badly failing custom permutation IID test.
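The idea behind the permuted compressions test can be sketched as follows; only BZ2 and LZMA are shown rather than our full combination of algorithms, and a single permutation rather than the many used in the real test:

```python
# Compress the data set as-is and after shuffling. Correlated data compresses
# better in its original order, so a permuted copy that compresses noticeably
# worse is evidence that the bytes are not IID.
import bz2
import lzma
import random

with open("frames.bin", "rb") as f:       # hypothetical 1 MB, 50-frame file
    original = f.read()

permuted = bytearray(original)
random.shuffle(permuted)                  # destroys any serial correlation
permuted = bytes(permuted)

for name, compress in (("bz2", bz2.compress), ("lzma", lzma.compress)):
    c_orig, c_perm = len(compress(original)), len(compress(permuted))
    print(f"{name}: original {c_orig:,} B, permuted {c_perm:,} B")
```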
There is also an alternative to the alternative, in the form of a conceptual entropy shmear measure. Strangely, the life sciences seem to use compression for entropy estimation much more than typical TRNG designers do. It’s used for example in protein folding assessments, fetal heart rate characterization and even music identification.
We constructed a data set from the Photonic Instrument comprising 100 concatenated JPEGs of overall size 2,120,003 bytes. Using paq8px from the Hutter Prize compression competition, our data set compressed to 1,310,191 bytes. This level of JPEG compression is achieved by partially decoding the image back to the DCT coefficients and recompressing them with a much better algorithm than the default Huffman coding. The cmix compressor (from the same competition) could not achieve a smaller compressed size as it does not incorporate JPEG models. Even recent asymmetric numeral system based compressors cannot achieve such a high degree of compression.
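Those two figures translate directly into an upper bound on the entropy rate of the raw sample set:

```python
# paq8px result from above: 2,120,003 bytes compressed to 1,310,191 bytes.
print(1_310_191 * 8 / 2_120_003)   # ≈ 4.94 bits per byte, an upper bound
```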
Compression algorithms have improved over time, and the following graph illustrates the decreasing record sizes in the 12-year-old Hutter competition:-

Compressed size of 100MB enwik8 test file.
Logically there must be an absolute lower bound to the compressed size of any given sequence, namely its entropy, as the Pigeonhole principle always applies. Otherwise we would not be able to differentiate one piece of information from the next. The fitted purple curve does seem to suggest that there is an asymptotic minimum compressible size for the competition’s 100 MB enwik8 file; perhaps it may go as low as 10 MB. We thus divide our measured 4.94 bits per byte by a safety factor of 2, and again by 2 for ultra conservatism, uber security and just becoz’. Incidentally, the same conservative level of prediction would result in a compressed enwik8 file size of 3.82 MB (the green dot on the compression graph). We therefore posit that the Photonic Instrument’s entropy rate is 1.235 bits per byte, which to us looks like 1 bit per byte for simplicity’s sake. And 1 is something that Dom can understand. There are also benefits to this conservative entropy assessment in safely (and simply) accommodating temperature effects.
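The conservative derating applied above is simply:

```python
print(4.94 / 2 / 2)   # = 1.235 bits per byte, taken as 1 for simplicity
```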
The mean size of JPEGs from the Photonic Instrument is 21.2 kB. With an entropy rate of 1 bit per byte, each frame therefore yields roughly 21,200 bits (about 2.65 kB) of entropy.
References:-
[1] Claude E. Shannon, “A Mathematical Theory of Communication”, Bell System Technical Journal, Vol. 27, pp. 379–423, 623–656, 1948.