
The Entropy of a Digital Camera CCD/CMOS Sensor

Recently, Vault12 released an app for iOS that uses the mobile device's camera as a source of randomness. Unfortunately, when I put the generated binary files through the Dieharder tests, the results come out pretty bad: 20 "PASSED", 13 "WEAK", and 81 "FAILED". For a TRNG, it should be doing much better than that. Now, to be clear, I'm not writing this post to shame Vault12. I actually really love the TrueEntropy app, and am eagerly waiting for it to hit Android, so I can carry around a TRNG in my pocket. However, things can get better, and that is what this post is hopefully addressing.

Using a camera as a TRNG is nothing new. SGI patented the idea of pointing a webcam at a lava lamp, using the chaotic nature of the lava lamp itself as the source of entropy. Later, it was realized that this was unnecessary: the CCD/CMOS sensor in the camera experiences enough noise from external events to be more than sufficient. This noise shows up in the photo itself, and is appropriately referred to as "image noise".

The primary sources of noise are the following:

  • Thermal noise: caused by the thermal agitation of electrons flowing across resistive media.
  • Photon noise: caused by photons hitting the CCD/CMOS and releasing energy to neighboring electronics.
  • Shot noise: caused by current flow across diodes and bipolar transistors.
  • Flicker noise: caused by traps due to crystal defects and contaminants in the CCD/CMOS chip.
  • Radiation noise: caused by alpha, beta, gamma, x-ray, and proton radiation from radioactive sources (such as outer space) interacting with the CCD/CMOS.

Some of these noise sources can be manipulated. For example, by cooling the camera, we can limit thermal noise. A camera at 0 degrees Celsius will experience less noise than one at 30 degrees Celsius. A camera in a dark room, with fewer photons hitting the sensor, will experience less noise than one in a bright room. Radiation noise can be limited by isolating the sensor behind a radiation-protective barrier.

Let's put this to the test, and see if we can actually calculate the noise in a webcam. To do this, we'll look at a single frame with the lens cap on, where the photo is taken in a dark room, and the webcam is further enclosed in a box. We'll take the photo at about 20 degrees Celsius (cool room temperature).

In order to get a basis for the noise in the frame, we'll use Shannon Entropy from information theory. Thankfully, understanding Shannon Entropy isn't that big of a deal. My frame will be taken with OpenCV from a PlayStation 3 Eye webcam, which means the frame itself is just a big multidimensional array of numbers between 0 and 255 (each pixel only provides 8 bits of color depth).
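
To make that concrete, a captured frame can be inspected directly, since OpenCV hands it back as a numpy array (a quick sketch):

import cv2

cap = cv2.VideoCapture(0)
ret, frame = cap.read()
print(frame.shape, frame.dtype)  # e.g. (480, 640, 3) uint8: values 0 through 255

So, to calculate the Shannon Entropy of a frame, we'll do the following: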

  1. Place each number in its own unique bin of 0 through 255.
  2. Create an observed probability distribution (histogram) by counting the numbers in each bin.
  3. Normalize the distribution, creating 256 probabilities (the sum of which should equal 1).
  4. For each of the 256 bins, calculate: -p_i*log_2(p_i).
  5. Sum the 256 values.

Thankfully, I don't have to do all of this by hand; numpy provides a function that does a lot of the heavy lifting for me.
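
For intuition on the scale, here's a quick sketch of those five steps on synthetic data: a stream of uniformly random bytes lands near the theoretical ceiling of 8 bits per byte, while a constant stream carries no entropy at all.

import math
import numpy

def shannon_entropy(data):
    # Steps 1-2: bin the byte values 0-255 into a histogram
    histogram = numpy.histogram(data, bins=256, range=(0, 256))[0]
    # Step 3: normalize the counts into a probability distribution
    probabilities = histogram / histogram.sum()
    # Steps 4-5: sum -p * log2(p) over the non-empty bins
    return -sum(p * math.log(p, 2) for p in probabilities if p > 0)

print(shannon_entropy(numpy.random.randint(0, 256, 100000)))    # ~7.998
print(shannon_entropy(numpy.zeros(100000, dtype=numpy.uint8)))  # essentially 0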

So, without further ado, let's look at the code, then run it:

#!/usr/bin/python

import cv2
import math
import numpy

def max_brightness(frame):
    # Convert to HSV so the luminosity (the V channel) can be manipulated directly
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    # Push every pixel's luminosity to the maximum, making the faint noise visible
    v[v > 0] = 255
    v[v <= 0] += 255
    final_hsv = cv2.merge((h, s, v))
    frame = cv2.cvtColor(final_hsv, cv2.COLOR_HSV2BGR)
    return frame

def get_entropy(frame):
    # Bin the pixel values into a 256-bucket histogram (observed distribution)
    histogram = numpy.histogram(frame, bins=256)[0]
    histogram_length = sum(histogram)
    # Normalize the counts into probabilities, then sum -p * log2(p)
    samples_probability = [float(h) / histogram_length for h in histogram]
    entropy = -sum([p * math.log(p, 2) for p in samples_probability if p != 0])
    return entropy

cap = cv2.VideoCapture(0)
cap.set(3, 640)    # property 3 is the frame width (CAP_PROP_FRAME_WIDTH)
cap.set(4, 480)    # property 4 is the frame height (CAP_PROP_FRAME_HEIGHT)

# Capture two frames in rapid succession, then take their composite difference
ret, frame1 = cap.read()
ret, frame2 = cap.read()
frame_diff = cv2.absdiff(frame1, frame2)

cv2.imwrite('/tmp/frame1.bmp', frame1)
cv2.imwrite('/tmp/frame2.bmp', frame2)
cv2.imwrite('/tmp/frame_diff.bmp', frame_diff)

frame1_max = max_brightness(frame1)
frame2_max = max_brightness(frame2)
frame_diff_max = max_brightness(frame_diff)

cv2.imwrite('/tmp/frame1_max.bmp', frame1_max)
cv2.imwrite('/tmp/frame2_max.bmp', frame2_max)
cv2.imwrite('/tmp/frame_diff_max.bmp', frame_diff_max)

print("Entropies:")
print("    frame1: {}".format(get_entropy(frame1)))
print("    frame2: {}".format(get_entropy(frame2)))
print("    frame_diff: {}".format(get_entropy(frame_diff)))
print("    frame1_max: {}".format(get_entropy(frame1_max)))
print("    frame2_max: {}".format(get_entropy(frame2_max)))
print("    frame_diff_max: {}".format(get_entropy(frame_diff_max)))

Let's look over the code before running it. First, I'm actually capturing two frames right next to each other, then taking their composite difference. We know that a photo consists of its signal (the data most people are generally interested in) and its noise (the data they're not). By taking the composite difference between the two, I'm attempting to remove the signal. Because the frames were taken in rapid succession, provided nothing was drastically changing between the frames, most of the data will be nearly identical. So the signal should disappear.

But what about the noise? Well, as discussed above, the noise is a bit unpredictable and slightly unmanageable. Unlike my signal, the noise will be drastically different between the two frames. Because the noise in each frame is independent of the noise in the other, differencing doesn't cancel it; the noise from the two frames adds. So, rather than removing noise, I'll actually be adding noise in the difference.

The next thing you'll notice is that I'm maximizing the luminosity of every pixel in an HSV color profile. This is done purely as a visual demonstration of what the noise actually "looks like" in the frame. You can see this below (converted to PNG for space efficiency).

Frame 1

Frame 2

Difference of frames 1 & 2

Frame 1 maxed luminosity

Frame 2 maxed luminosity

Difference of frames 1 & 2 maxed luminosity


Running the Python script in my 20 degrees Celsius dark room with the lens cap on and all light removed as much as possible, I get:

$ python frame-entropy.py
Entropies:
    frame1: 0.0463253223509
    frame2: 0.0525489364555
    frame_diff: 0.0864590940377
    frame1_max: 0.0755975713815
    frame2_max: 0.0835428883103
    frame_diff_max: 0.134900632254

The "ent(1)" userspace utility confirms these findings when saving the frames as uncompressed bitmaps:

$ for I in frame*.bmp; do printf "$I: "; ent "$I" | grep '^Entropy'; done
frame1.bmp: Entropy = 0.046587 bits per byte.
frame1_max.bmp: Entropy = 0.076189 bits per byte.
frame2.bmp: Entropy = 0.052807 bits per byte.
frame2_max.bmp: Entropy = 0.084126 bits per byte.
frame_diff.bmp: Entropy = 0.086713 bits per byte.
frame_diff_max.bmp: Entropy = 0.135439 bits per byte.

It's always good to use an independent source to confirm your findings.

So, in the standard frames, I'm getting about 0.05 bits per byte of entropy. However, when taking the composite difference, that number almost doubles to about 0.09 bits per byte. This was expected: as discussed above, we're essentially taking the noise from both frames and composing it into the final frame, so the noise adds.

What actually surprised me were the entropy values after maximizing the luminosity. This may be because the adjustment spreads the pixel values across more bins of the histogram. When taking the difference of the two adjusted frames, the entropy jumps up to about 0.13 bits per byte. So, we could safely say that a composed frame with maxed luminosity that is the difference of two frames has about 0.11 bits of entropy per byte, plus or minus 0.02 bits per byte.

What does this say about the frame as a whole though? In my case, my frame is 640x480 pixels. Knowing that each pixel in my PS3 Eye webcam only occupies 1 byte or 8 bits, we can calculate the entropy per frame:

(640*480) pixels/frame * 1 byte/pixel = 307200 bytes/frame
307200 bytes/frame * 0.11 entropy bits/byte = 33792 entropy bits/frame

Each frame in my 640x480 PS3 Eye webcam provides about 33,792 bits of entropy. For comparison, SHA-256 theoretically provides a maximum of 256 bits of entropic security. Of course, we should run millions of trials, collect the data, calculate the standard deviation, and determine a truer average entropy. But this will suffice for this post.

So, now that we know this, what can we do with this data? Well, we can use it as a true random number generator, but we have to be careful. Unfortunately, the frame by itself is heavily biased. In the frame, there exists spatial correlation between adjacent pixels. In the frame difference, there exist both spatial and temporal correlations. This isn't sufficient for a secure true random number generator. As such, we need to remove the bias. There are a few ways of doing this, called "randomness extraction", "software whitening", "decorrelation", or "debiasing". Basically, we want to take a biased input and remove any trace of bias in the output.

We could use John von Neumann decorrelation, where we look at two non-overlapping consecutive bits. If the two bits are identical, then both bits are discarded. If they are different, then the most significant bit is output, while the least significant bit is discarded. This means that, at best, you are discarding half of your data, and how much is discarded depends on how badly biased the data is. We know that our frame is only providing 0.11 bits of entropy per 8 bits, so we're keeping 11 bits out of every 800. That's a lot of discarded data. One drawback with this approach, however, is if one or more bits are "stuck", such as with a dead pixel. Of course, this will lower the overall entropy of the frame, but it will also drastically impact the extractor.
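
Sketching the von Neumann extractor in Python makes the cost concrete (a minimal illustration, not production code):

def von_neumann_extract(data):
    # Unpack the input bytes into individual bits, most significant first
    bits = []
    for byte in data:
        for i in range(7, -1, -1):
            bits.append((byte >> i) & 1)
    # Walk non-overlapping pairs: 00 and 11 are discarded entirely,
    # while 01 and 10 emit their first (most significant) bit
    return [a for a, b in zip(bits[::2], bits[1::2]) if a != b]

print(von_neumann_extract(b"\xfe\x00\xff\x01"))  # 32 biased bits in, [1, 0] out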

A different approach would be to use chaos machines. This idea is relatively new and not thoroughly studied; at least, I'm struggling to find good research papers on the topic. The idea is to take advantage of the chaotic behavior of certain dynamical systems, such as a double pendulum. Due to the nature of chaos, small fluctuations in initial conditions lead to large changes in the distant future. The trick here is selecting your variables, such as the time step and the chaotic process itself, correctly. Unlike John von Neumann decorrelation, which automatically discovers the bias and discards it for you, care has to be taken to make sure that the resulting output is debiased.
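
The chaotic sensitivity itself is easy to demonstrate, even if building a trustworthy extractor out of it is not. Here's a tiny sketch of my own using the logistic map x → 4x(1 − x) in its chaotic regime: two seeds differing by one part in ten billion share nothing after a few dozen iterations.

x, y = 0.2, 0.2 + 1e-10   # two nearly identical initial conditions
for _ in range(50):
    x = 4 * x * (1 - x)   # logistic map with r = 4 (fully chaotic)
    y = 4 * y * (1 - y)
print(x, y)               # the trajectories have completely decorrelated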

A better approach is using cryptographic primitives like one-way hashing or encryption, sometimes called "Kaminsky debiasing". Because most modern cryptographic primitives are designed to emulate theoretically uniform, unbiased random noise, the security rests on whether or not that can be achieved. In our case, we could encrypt the frame with AES and feed the ciphertext as our TRNG output. Unfortunately, this means also managing a key, even though it doesn't necessarily have to be kept secret. A better idea would be to use cryptographic hash functions, like SHA-2, or even better, extendable-output functions (XOFs).

Obviously, it should go without saying that encrypting or hashing your biased input isn't increasing the entropy. This means that we need to have a good handle on what our raw entropy looks like beforehand (as we did above). We know in our case that we're getting about 34 kilobits of entropy per frame, so hashing with SHA-256 is perfectly acceptable, even if we're losing a great deal of entropy in the output. However, if we were only getting 200 bits of entropy in each frame, then while SHA-256 would still debias the data, we would still only have 200 bits of entropy in the generated output.

Really though, the best approach is an XOF. We want to output as much of our raw entropy as we can. Thankfully, NIST has standardized two XOFs as part of SHA-3: SHAKE128 and SHAKE256. An XOF allows you to output a digest of any length, where SHA-256, for example, only allows 256 bits of output. The security margin of the SHAKE128 XOF is the minimum of half the digest length or 128 bits. With about 34 kilobits of entropy per frame, I would like to have all of that entropy available in the output. As such, I can output 4 KB in the digest knowing full well that's within my entropy margin. Even though I'm discarding as much data as the John von Neumann extractor would, I'm not vulnerable to "stuck pixels" manipulating the extractor.
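
Python's hashlib (3.6 and newer) ships both SHAKE functions, and squeezing an arbitrary-length digest is a single call; the input below is just a placeholder for the raw frame bytes:

import hashlib

xof = hashlib.shake_128(b"raw biased frame bytes here")  # placeholder input
output = xof.digest(4096)  # squeeze a full 4 KB out of the XOF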

When I put all this together in Python, the pipeline is as follows (a sketch of the loop appears after the list):

  1. Take the difference of two consecutive frames, with overlapping pairs: each new frame is differenced against the previous one.
  2. Maximize the luminosity of the new composed frame.
  3. Hash the frame with SHAKE128.
  4. Output 4 KB of data as our true random noise.
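
Here's a minimal sketch of that loop, reusing the max_brightness() function from the script above (it must be in scope); treat it as illustrative rather than the exact production script:

import sys
import hashlib
import cv2

cap = cv2.VideoCapture(0)
cap.set(3, 640)    # frame width
cap.set(4, 480)    # frame height

ret, prev_frame = cap.read()
while True:
    # Step 1: difference each new frame against the previous one, so
    # every captured frame yields one composed difference frame
    ret, frame = cap.read()
    frame_diff = cv2.absdiff(prev_frame, frame)
    prev_frame = frame
    # Step 2: maximize the luminosity of the composed frame
    frame_diff_max = max_brightness(frame_diff)
    # Steps 3 and 4: hash with SHAKE128 and squeeze out 4 KB of noise
    digest = hashlib.shake_128(frame_diff_max.tobytes()).digest(4096)
    sys.stdout.buffer.write(digest)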

At 30 frames per second at a resolution of 640x480, outputting 4 KB per frame provides 120 KB of data per second, and this is exactly what I see when executing the Python script. The PS3 Eye camera also supports 60 fps at a lower resolution, so I could get 240 KBps if I can keep the same security margin of 4 KB per frame. I haven't tested this, but intuition tells me I'll have a lower security margin at the lower resolution.

Coming full circle, when we put our TRNG through the Dieharder tests, things come out vastly different from Vault12's results:

  • Vault12 TrueEntropy:
    1. PASSED: 20
    2. WEAK: 13
    3. FAILED: 81
  • My webcam TRNG:
    1. PASSED: 72
    2. WEAK: 12
    3. FAILED: 30
