Intro: About Random Numbers

The concept of generating random numbers, or randomness as a concept itself, is nothing new. Random numbers are used in practically every type of electronic or video game, gambling machines, as well as many other types digitally or computer generated ways. This will be discussing mostly how computer generated randomness is created, and not the type of what I'll call "manually random" such as a lottery machine spitting out wooden balls with numbers on them. Random numbers are not always random, they just appear to be.

What exactly is random? This philosophical question is what makes this an issue in the first place. If you have ever taken an IQ test, you've definitely been asked questions about patterns. Here is one, tell me the next number in this sequence: 2468?. If it is possible to reverse engineer the creation of a sequence of random numbers in such a way that you are capable of predicting the numbers following that sequence, then that sequence simply can not be defined as random. Therefore the random number generator that created that number sequence is capable of being exploited.


Generating Random Numbers

Clock Based

One of the more common (and easiest to exploit) ways random numbers are generated is with the internal clock of the computer itself. To create a random number digitally you have to base it off of something, and the one and only thing in a computer that changes over time is the clock itself. You can't create something random from something that is constant.

Clock based takes the internal clock of the computer and uses the "DateTime" object itself. Let's say the time right now is "November 15th 2005, 1:16:39 AM". Regardless of what time it is, you have a few different options as to how to manipulate the DateTime to your randomness-liking. You could use the literal MMDDYYhhmmss value: 111505011639 and turn that into a number itself (111,505,011,639), or you could count how many seconds are between 'right now' and a previous date of your choosing. Comparing the example DateTime to "January 1st 2000, 12:00:00 AM" would result in: 185,332,599 seconds. Determining which method to use, or other custom-methods as created by the imagination of the programmer, is usually totally up to the programmer. Whatever the case may be, you should now understand how this form of (clock) randomness is used. For the sake of examples later we will say the clock based determination for 'right now' is 111,505,011,639.

Bit Based

This method of generating random numbers really only goes one step further than clock based randomness. It works exactly the same way as clock based, with a few extra steps. Bit based is a little more complex of a method as compared to clock based, making it a little more tedious to reverse engineer, but it can still be exploited none-the-less. (I have seen this also called "seed based" because it takes the 'seed' of the end result bit string and uses it as the randomness object. For the sake of this paper, Seed based and Bit based are 2 completely different things).

Refer to how clock based randomness is created. Now, take this end result number and figure out its bits and bytes (hence why this is called Bit Based). The bit value for 12,345 is 11000000111001, the bit value for 777,777 is 10111101111000110001, the bit value for 77 is 1001101, the bit value for 5 is 101. (See how a number is converted to a bit string, and a bit string is converted into a number: click here ) However what I'm trying to demonstrate is that a string of bits has no distinct length. 5 is 3 characters long, 77 is 7 characters long, 12,345 is 14 characters long... Calculating the bit value for a large number such as the clock based example of 185,332,599 or even 111,505,011,639 is pretty large and would take more than a few seconds for a standard home PC to calculate its bit value. Speed is extremely important in applications (a programmer's program) so each random number taking many seconds to generate is 'bad programming'. To speed it up, lets say they choose to take the last 6 characters of a number larger than 6 characters. 111,505,011,639 would become 011,639 or 11,639. The bit value for 11,639 is 10110101110111. After knowing the bits, you can alter them however you like, almost like shuffling a deck of cards but not actually changing how many cards you started with or what was on the cards. For the sake of this example, we will simply be flip flopping the bits. 1234 becomes 4321.. or 10110101110111 becomes 11101110101101... and 11101110101101 equals 15277. (11639 becomes 15277)

Seed Based

This method takes a completely different approach, it relies far less heavily on a clock. To some extent this makes it harder to reverse engineer. When a computer or the program first loads, it has a starting 'seed' to work with. The starting value could be a clock based, bit based, or any other 'simple randomness' number, or it could be the same number 'hardcoded' into the program to start with the same start number every time, it doesnt really matter what the start number is altho knowing what it is is helpful to exploit it. The key to this method is how that number evolves or grows.

One of my favorite mathematical patterns is 112358... Every subsequent number is the sum of the previous 2 numbers. This is an example of how a number grows. How this seed based method number grows is partially up to the programmer of this random function. Simply creating an algebra equation to make the number grow isn't enough so the clock's seconds or milliseconds are used as multipliers or other types of variables in the equation. Custom created random number generators are tedious and painful to create, and I highly recommend if you are a programmer to just use what someone else has already created: Abuse their hard work, its the honest thing to do. However, trying to understand how this works, I'll give a simple example (as compared to the math involved to make an almost 'unbreakable' random number generator).

Example: My seed based generator will start with the seed of right now's ("Minutes"+"Seconds")+1 which will be a number that ranges from 1 to 6060. This way the starting number won't be constant: which would consequently result in the first random number request being very easily predictable. Having 6000+ possibilities as the start number is a decent number: any (still working and being used) computer processor can handle a number that size. Let's make this Seed based math have a maximum value of a wierd number such as 84,312 so the total characters of the number never exceeds 5 to make the speed of math calculations extremely fast but also so the seed will continually be 'resized', making the process of reverse engineering this generator a lot harder... From here, every single random number request will have the following formula applied to the seed where s is the current seed and n is the new seed, and remember this is just my crap example so you understand this: n = ((s^2)+"Seconds"). if n is greater than the cap value (82312) then n = n - 82312. If right now is 16 minutes and 39 seconds, then the starting seed would be 1640. Then, the secondary cap would need to be applied to the next time this used: n^2. Lets give this secondary cap a wierd number of (2471).. I'll go into more detail how this would be applied to a random number function later on.


Other Ways

During my own experience working with random number generators I have found some pretty interesting ways other programmers have come up with to make random numbers almost impossible to break. I won't go into any more detail about these, just wanted to share how there are other ways to go about generating random numbers.

User Input

When the user browses a website that needs to generate random numbers, the site records their mouse movement and stores it when they leave the page they are on. It stores the X,Y coordinates as well as the DateTime of each mouse movement, essentially making it possible to recreate the mouse movements in real time if they actually wanted to. How this applies is that you could take "DateTime"+"X"+"Y" and use that as a number. This would be similar to Clock Based method but with the additional 'security' of 2 more variables.

Audio Based

Digitally preset as many online radio stations and/or local mp3s or other sound files that are in a digital medium, into an audio player. Stream this audio player to your sound analysis software. This software will turn the audio into number values, sometimes even bits depending on how much 'quality' you want to lose in the conversion, and then use the resulting number. The rate of conversion is as high as 100 new audio-based-numbers per second. The reason for having several preset sources of sound is so that the software can 'switch stations' every so many seconds as to make the source of audio unknown. To add to this even further, you could optionally mix 1-5 stations upon each 'station switch', (mozart+madonna=53?) making the audio garbled, but also making the conversion a little more secure, less predictable, in the event that someone wants to reverse engineer this.

Light Based

Optical equipment measures light emissions from trapped mercury atoms. Now that has got to be expensive.

Internet Based

If you're familiar with RSS feeds, site scraping, or other means of extracting external data from sites, content which the public really doesn't have any means of altering (www.yahoo.com's home page, RSS feed from multiple sites), anyhow take this 'pure text' of data that should change fairly regularly, and use some simple quick math equations (such as counting the total characters, total letter 't's, etc) to generate a Seed value.


Programming Random Number Functions

Clock, Bit, Seed: It doesn't matter

Regardless of how the resulting number is generated, in almost all cases the last math applied to the resulting number is this function: If the min number is not 1 then apply the difference to the min and max to shift them correctly. Then, subtract the max number from the resulting number over and over until the (new, temporary) resulting number is less than the max number. If the remainder is zero then the random number created is the max number, otherwise the random number is what you came up with in the step before. Now, in actual math values with some examples... Lets say you want a random number between 4 and 13, including 4 and 13 as possible outcomes: Let's re-use the opening 3 examples

Preliminary

4 > 1, so (4-1)=3. (min=4, max=13) - 3... (min=1, max=10)

Clock Based

(11639) : 11639 vs 10: remainder: 9. (compensate min max variance of +3) end result = 12

Bit Based

(15277) : 15277 vs 10: remainder: 7. (compensate min max variance of +3) end result = 10

Seed Based

(1640) : [n = ((s^2)+"Seconds"))] (82312)(2471) [Right now is 16 minutes 39 seconds]
- 1640*1640=2689600, 2689600+39=2689639. 2689639 vs 82312 = 55655. 55655 vs 10 = 5.
(compensate min max variance of +3) end result = 8...
- apply the wierd number cap for the next seed: 55655 vs 2471 = 1293.. to show how the next seed would work, with another date stamp as well: The new DateTime is now 17 minutes 03 seconds.
- 1293*1293=1671849, 1671849+03=1671852. 1671852 vs 82312 = 25612. 25612 vs 10 = 2.
(compensate min max variance of +3) end result = 5...

Reusing the previous examples yet again, if I were a programmer needing to create a random number generator from scratch so a business could say 'we have our own unique methods of generating random numbers', such as an online casino, and I created a function called GetRand(min,max), and 'right now' is still the same time as it was in the examples... If my GetRand(min,max) function used the same clock based formulas as the example, GetRand(4,13) would equal 9. A bit based GetRand(4,13) would equal 7. A seed based GetRand(4,13) would equal 8. Seems pretty simple now huh? It better.


Step 2: Reverse Engineering, Hacking

Breaking it down

Quite simply, everything prior to this was for you to understand the various ways random numbers are generated and how they aren't really random: They are based on a non-constant yet predictable variable, the time of day. The hard part is figuring out what techniques, methods, and math formulas have been applied to the time of day to generate these 'random' numbers. I'm going to expand upon some previous examples, making them slightly more complicated, while also using some practical examples as to how these can be applied.

Breaking it apart

Let's say I have been contracted to build a mini slot machine from my own hardware building and software programming skills. I choose to use a fairly old processor, one that can only handle about 7 or 8 random number requests per second, based on the GetRand(min,max) function I custom created. This fact is about all that matters for this scenario example: what other hardware peices are used, their speeds, wont have any affect.

Here is another pattern of numbers. Can you figure out the pattern?

-00, 8, 91
-01, 8, 95
-02, 8, 99
-03, 7, 90
-04, 8, 94
-05, 8, 98
-06, 7, 89
-07, 8, 93
-08, 8, 97
-09, 7, 88
-10, 8, 92
-11, 8, 96
-12, 7, 87

The pattern above is quite simply the number of times 13 can go into 100 before hitting 100, and what number it stops at before the cycle continues/restarts for the next iteration. 0 is counted as a number. (Iteration, Total, LastNumber). This mini-list is a quick comparison, shows how many times per second (after the 'start time') my newly created mini-slots machine can process a random number. - To be more specific, I am not actually calculating how many times per second the GetRand function can execute, because the processor is not actually hindered by this function. The processor is actually hindered by how fast I can spam it with 'what time is it, give me the milliseconds'. This means the milliseconds themselves dont actually go up +0.01 seconds each time, they go up almost exactly +0.13 each time: thank the processor for this. Only because the processor is 'this slow' can this technique be applied in such a way that I can explain it this simply. Now lets review the next list.

Here is another pattern of numbers. This shows the exact milliseconds that are spit out as fast as the processor can handle it:

- xx - y1.y2.y3.y4.y5.y6.y7.y8
- 00 - 00.13.26.39.52.65.78.91
- 01 - 04.17.30.43.56.69.82.95
- 02 - 08.21.34.47.60.73.86.99
- 03 - 12.25.38.51.64.77.90
- 04 - 03.16.29.42.55.68.81.94
- 05 - 07.20.33.46.59.72.85.98
- 06 - 11.24.37.50.63.76.89
- 07 - 02.15.28.41.54.67.80.93
- 08 - 06.19.32.45.58.71.84.97
- 09 - 10.23.36.49.62.75.88
- 10 - 01.14.27.40.53.66.79.92
- 11 - 05.18.31.44.57.70.83.96
- 12 - 09.22.35.48.61.74.87

Now, let's give the Clock Based example a minor change. Now it will also use the milliseconds. We are also going to use a new 'right now' time to make this example easier to follow. Right now it is "November 15th 2005, 4:17:00:00 AM". This makes the value: 11150504170000. We have to strip it down to just the last 6 characters of 170000. Refering to the chart above, there is 13 rows, the 1st row being 00, 2nd row being 01, etc. The first row is second 00, the 2nd row is second 01, etc. The list above is the 'before' and the list below is the 'after', when GetRand(1,10) is applied at 4:17:xx:y? AM. (since we are working with a number system that is 10-based and looking for a random number between 1 and 10, the resemblance between this list and the one above should be pretty damn easy to spot)

--- 10 based list
- xx - y1.y2.y3.y4.y5.y6.y7.y8
- 00 - 10.03.06.09.02.05.08.01
- 01 - 04.07.10.03.06.09.02.05
- 02 - 08.01.04.07.10.03.06.09
- 03 - 02.05.08.01.04.07.10
- 04 - 03.06.09.02.05.08.01.04
- 05 - 07.10.03.06.09.02.05.08
- 06 - 01.04.07.10.03.06.09
- 07 - 02.05.08.01.04.07.10.03
- 08 - 06.09.02.05.08.01.04.07
- 09 - 10.03.06.09.02.05.08
- 10 - 01.04.07.10.03.06.09.02
- 11 - 05.08.01.04.07.10.03.06
- 12 - 09.02.05.08.01.04.07

If an online casino for example used a random number generator as simple and stupid as the one in this example, they would be bankrupt in no time. Anyhow, now that the tedious part is over, lets put it to use. Obviously with this type of Clock based method, at least in this example, only the minute/second/millisecond matters. This means there is only 3 variables to be concerned with, each with a fairly small set of variance. 0-59, 0-59, 0-99. To you this might seem like a lot (60 * 60 * 100 = 360,000) but for a computer to process it, handle a number variance that large, figure out a pattern, its not very big. Unless the variance is in the tens of millions, there isn't much to be concerned over, the time it will take somewhat sophisticated software to come up with a viable possible solution of breaking/hacking a random number generator is in minutes and not hours. To further state the unimportance of variance (360000), since in this example we will only ever be working with GetRand(1,10), all that actually matters is the milliseconds. If we were working with any other number whose value is not a factor of 10, multiple of 10, or 10,(1,2,5,10,20,30,40..) or factor of a multiple of 10 (4,8,other even numbers..), then the minutes would definitely matter, especially with prime numbers such as GetRand(1,17)

Lets start making this make sense already! Your mini slot machine is set up such that every wheel of 3 wheels has 10 possible slot outcomes, 10x10x10 possibilities... hence the need for GetRand(1,10). The list below shows which fruit/thing is on each of the 10 locations on each slot. (normally, the same fruit would repeat on the same slot wheel in several spots, but that would make it a whole lot harder for you to figure out the pattern.. so to make this easy..)

- 01 - Apple
- 02 - Banana
- 03 - Cherry
- 04 - Grapes
- 05 - Lemon
- 06 - Lime
- 07 - Orange
- 08 - "7"
- 09 - "BAR"
- 10 - "blank"

Without having any specific knowledge of the timing between each slot being stopped after the arm lever is initially pulled, you should be able to predict outcomes based on when you first actually pull it. Below is the first 3 times you pulled the lever and the outcomes. The 4th time only shows the first one, I want you to compare the charts and see if you can tell what the next 2 items are most likely to be based on the 3 previous results. (there could be an argument that there is insufficient data, but youre not being graded, i just want you to understand the concept of how to figure this out)

-lemon, grapes, bar
-bar, apple, lime
-banana, grapes, bar
-lime, ???, ???

Tips for the above pattern solving: convert the slot wheel name into a number such as lemon = 05. lemon, grapes, bar = 05,04,09... Find every occurance of the number in question on the list called '10 based list', then figure out the distance between numbers. The distance between 05 and 04, for each 05 and the nearest 04 to it, is 3 away. 04 and 09 exist 5 spots away from eachother. So, for this specific 05(3)04(5)09 example the total range is 8 spots.

Just so you aren't going crazy about the lever timing, I just randomly came up with that after pulling the lever initially, there is a 0.57 delay between each slot wheel stopping. So if I pull the slot lever at 04:17:00:00 AM, then the slot wheels will stop at 00:57, 01:04, and 01:61. However these must be rounded up to the nearest '13 mark'. For this first example this would mean they would stop at: 00:65, 01:04, 01:69. This would mean the first slot results would be 05, 04, 09, or 05(3)04(5)09. I spun the second slots at 02:33 so they stopped at 02:90, 03:47, 04:04, or after rounding they stopped at 02:99, 03:51, 04:16, or 09 01 06 or 09(4)01(5)06. The third lever pull occured at 05:65, so: 06:22, 06:79, 07:36 or rounded: 05:72, 06:24, 06:89 or 02 04 09 or 02(4)04(5)09. The 4th pull was at 08:72.. so: 09:29, 09:86, 10:43 or rounded: 09:36, 09:88, 10:53, or 06 08 03 or 06(4)08(5)03... or as the puzzle above is asking:
- 05(3)04(5)09
- 09(4)01(5)06
- 02(4)04(5)09
- 06(?)??(?)??
Well, based on this data right here, there is a 67% chance that the number after 06 is 4 spots away and its very likely the 3rd number is 5 spots away from the 2nd number. Lets play the odds. 4 spots from 6 is 08 and 5 spots from 08 is 03... as it turns out, thats the answer.



The End

I hope I have dumbed this down enough and simplified this enough to make it understandable how random numbers can be reversed engineered so long as you know the math formulas behind how they are generated in the first place. If you don't know the math, you can always just run some simulations and throw the results into a table and compare the values, such as their distance from eachother, etc. In the real world, as far as random numbers are concerned, there is usually much more sophisticated math being used and reverse engineering or hacking random number generators to take advantage of them is pretty hard to do.

If you've learned anything from this paper, I hope its that you now understand that you can not create something random from something constant, and that everything we perceive as random is usually just because as a human with a brain you can't instantly see a pattern and be able to predict the next number in the pattern, that almost everything perceived as random actually has a predictable semi-constant behind it that can be extracted thereby solving a chunk of the random-equation.



Other Sources

Home