One of the two ways that the Chaos Device will condense entropy is via IP web cams (the other being via the sound card). Web cams produce JPEG files, and it is these that will be used as sources of raw entropy. Uniquely, we will not decode the JPEG files into images. We will use the files directly for random number extraction. Naysayers of limited vision and understanding will immediately shout that JPEG files are not good sources of random numbers. We believe that they are excellent sources of entropy. It's just that the entropy is not at 100% purity. The extraction technique will be detailed at a later time, but the following is a proof-of-concept analysis.
A web cam was set up viewing a static scene. The lighting was artificial with no other natural lighting affecting the scene. Multiple frames were taken from the camera and analysed. Now for the science bit...
In order to compare the JPEG files, we calculated the probability of each individual byte value in each file and compared the resulting distributions. This is a graph of two sequential frames, each about 38 KB in size. To highlight the variations in probabilities, we also calculated the Kullback–Leibler divergence between the distributions in accordance with the classic interpretation:
from which we get
Note that the two frames are of exactly the same physical scene. The blue trace looks really random, with little apparent bias above or below the axis. The probability divergence is solely due to noise effects within the camera's CMOS sensor and associated electronics. These effects are chiefly photon shot noise and Johnson–Nyquist thermal noise. The exact contribution of each is to be determined experimentally.
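The byte-level comparison can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the experiment. It uses the standard definition D(P||Q) = sum over i of p_i * log2(p_i / q_i), where p_i and q_i are the probabilities of byte value i in the two files. The base-2 logarithm, the add-one smoothing (so that a byte value absent from one file does not cause a division by zero) and all function names are our assumptions.

```python
import math
from collections import Counter


def byte_probabilities(data: bytes, smoothing: float = 1.0) -> list[float]:
    """Return the probability of each of the 256 byte values in `data`.

    `smoothing` is an assumed add-one (Laplace) correction so that no
    probability is exactly zero; the original analysis may differ.
    """
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    total = len(data) + 256 * smoothing
    return [(counts.get(v, 0) + smoothing) / total for v in range(256)]


def kl_terms(p: list[float], q: list[float]) -> list[float]:
    """Per-byte-value contributions p_i * log2(p_i / q_i).

    Individual terms can be positive or negative, which is what a trace
    wandering above and below the axis would show.
    """
    return [pi * math.log2(pi / qi) for pi, qi in zip(p, q)]


def kl_divergence(p: list[float], q: list[float]) -> float:
    """Kullback-Leibler divergence D(P || Q) in bits."""
    return sum(kl_terms(p, q))


def compare_files(path_a: str, path_b: str) -> float:
    """Read two files as raw bytes (no JPEG decoding) and return D(P || Q)."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        p = byte_probabilities(fa.read())
        q = byte_probabilities(fb.read())
    return kl_divergence(p, q)
```

Calling compare_files("frame1.jpg", "frame2.jpg") (hypothetical file names) on two sequential frames gives a single divergence figure, while kl_terms gives the 256 individual values for plotting.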
We actually took 330,000 frames during the experiment. Each JPEG file was SHA-1 hashed, and the resultant 160-bit hash strings were histogrammed. There was only one count against each hash string, so we conclude that each JPEG file was unique. Via a reverse Birthday Problem calculation, we estimate that it would take 10^11 unique JPEG files for a 50% chance of hash collision. Due to the extent of probability variability between JPEG files, we go out on a proverbial limb and posit that all JPEG files are unique. We mean all unique in the world. We mean that all photos taken throughout the land, by all devices, are unique, even if they look the same to the unaided eye. Try it. Put a camera on a timer and take several identical shots. Then hash them and compare. The hash values will be different. We (almost) guarantee it. And what of EXIF data, we hear you ask? The web cam we use is a simple model and does not export EXIF data, so there is no timestamp to be uniquely hashed. A test frame was also analysed here.
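The uniqueness test is easy to repeat. The sketch below hashes a collection of frames with SHA-1 and counts repeated hashes. It also shows one possible reading of the reverse Birthday Problem calculation, which is our assumption since the formula is not given here: using the usual approximation p = 1 - exp(-n^2 / 2d), it solves for the size d of the space of distinct files at which n collision-free frames would have had a probability p of colliding. The function names are hypothetical.

```python
import hashlib
import math
from collections import Counter
from typing import Iterable


def sha1_hex(data: bytes) -> str:
    """Return the 160-bit SHA-1 digest of `data` as 40 hex characters."""
    return hashlib.sha1(data).hexdigest()


def count_duplicate_hashes(frames: Iterable[bytes]) -> int:
    """Histogram the hashes and count frames whose hash was already seen."""
    histogram = Counter(sha1_hex(frame) for frame in frames)
    return sum(count - 1 for count in histogram.values())


def implied_space_size(n_unique: int, p_collision: float = 0.5) -> float:
    """Space size d at which n draws collide with probability p.

    Assumed reading of the "reverse" calculation, from the birthday
    approximation p = 1 - exp(-n^2 / (2 d)), rearranged for d.
    """
    return n_unique ** 2 / (2.0 * math.log(1.0 / (1.0 - p_collision)))


if __name__ == "__main__":
    # Stand-in data; in practice each item would be the raw bytes of a JPEG file.
    frames = [b"frame one", b"frame two", b"frame one"]
    print("duplicate hashes:", count_duplicate_hashes(frames))
    print("implied space size: %.3g" % implied_space_size(330_000))
```

With n = 330,000 and p = 0.5 this reading gives roughly 8 × 10^10, the same order of magnitude as the 10^11 figure quoted above.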