Exercise 15
note this exercise is particularly challenging. If you are a beginniner programmer, enter at your own risk!
XOR decryption
There is a site called Project Euler that lists a set of challening mathematical / computer programming problems.
This exercise is Project Euler Problem 59, "XOR decryption":
Each character on a computer is assigned a unique code and the preferred standard is ASCII (American Standard Code for Information Interchange). For example, uppercase A = 65, asterisk (*) = 42, and lowercase k = 107.
A modern encryption method is to take a text file, convert the bytes to ASCII, then XOR each byte with a given value, taken from a secret key. The advantage with the XOR function is that using the same encryption key on the cipher text, restores the plain text; for example, 65 XOR 42 = 107, then 107 XOR 42 = 65.
You can read about how bitwise XOR works on Wikipedia: bitwise XOR or here. Below I show you code snippets in Python, R and MATLAB for how to achieve bitwise XOR using integers.
For unbreakable encryption, the key is the same length as the plain text message, and the key is made up of random bytes. The user would keep the encrypted message and the encryption key in different locations, and without both "halves", it is impossible to decrypt the message.
Unfortunately, this method is impractical for most users, so the modified method is to use a password as a key. If the password is shorter than the message, which is likely, the key is repeated cyclically throughout the message. The balance for this method is using a sufficiently long password key for security, but short enough to be memorable.
Your task has been made easy, as the encryption key consists of three lower case characters. Using p059\_cipher.txt (right click and 'Save Link/Target As…'), a file containing the encrypted ASCII codes, and the knowledge that the plain text must contain common English words, decrypt the message and find the sum of the ASCII values in the original text.
Hints
You will need three components to your program. First you will need some facility to generate candidate encryption keys. Second you will need some facility to generate plaintext, given the ciphertext and a candidate encryption key (by using the XOR operation). Third you will need some facility to analyse candidate plaintext to determine whether it's the real plaintext. We know in this case the plaintext is in English and so you'll need some facility for deciding whether a certain plaintext is in English or not.
Candidate encryption keys
We know that the encryption key consists of three lower case characters, and so we know there are \((26)(26)(26) = 17576\) possible keys. My suggestion would be to try a brute force method, e.g. for every possible encryption key, generate plaintext and decide whether it's English. One of them will be the correct one, you just have to find it.
XOR in Python, MATLAB and R
In Python the function that returns the numerical ascii code for a
letter is ord()
. So for example ord('a')
returns 97
. The
function that goes the other way, that returns a character given a
code, is chr()
so for example chr(97)
returns a
. To apply the XOR operator to two numbers we can use the ^
operator, so for example ord('a') ^ ord('b')
returns 3
.
In MATLAB the double()
function will return the ascii code for a
letter, so double('a')
returns 97
. Going the other way, the
char()
function will convert an ascii code to a character, so
char(97)
returns a
. In MATLAB the bitxor()
function will perform
the XOR operation we need here.
In R there are the charToRaw()
and rawToChar()
functions but they
require a bit of TLC to work as you wish. Here are two wrapper
functions that help. The asc()
function converts a character into
the corresponding ascii value, so asc("a")
returns 97
. The chr()
function goes the other way, so chr(97)
returns a
. Don't worry
about the details here for now.
asc <- function(x) { strtoi(charToRaw(x),16L) } chr <- function(n) { rawToChar(as.raw(n)) }
In R the XOR function bitXor()
in the bitops
package is what we
need. So first load the bitops
library using: library(bitops)
and
then the bitXor()
function will be available to us.
Is the plaintext English?
One possibility is to simply take the words in the candidate plaintext
and see if they are found in an english dictionary. On Unix machines
(Linux, Mac OS X) you have a dictionary file in:
/usr/share/dict/words
.
Another possibility is to search the candidate plaintext for common English words such as "the" or "and" and so on.
Another possibility is to examine the relative frequencies of various letters in the candidate plaintext, and compare them to known frequencies of letters in English. Wikipedia has a nice page on frequency analysis in cryptanalysis. This might help you write a function to determine whether a given candidate plaintext is English or not.
Super Hints
Here is some sample code that illustrates how to generate plaintext given the ciphertext and a candidate encryption key.
The first 5 ciphertext values in p059\_cipher.txt are
79,59,12,2,79
. Let's assume that our candidate encryption key is
abc
. Here is how you would use the candidate encryption key to
generate the first 5 characters of plaintext from the first 5 values
in the ciphertext:
In Python:
plaintext = "" plaintext += chr(79 ^ ord('a')) plaintext += chr(59 ^ ord('b')) plaintext += chr(12 ^ ord('c')) plaintext += chr(2 ^ ord('a')) plaintext += chr(79 ^ ord('b')) print "plaintext = %s" % (plaintext)
In MATLAB:
plaintext = ''; plaintext = [plaintext char(bitxor(79, double('a')))]; plaintext = [plaintext char(bitxor(59, double('b')))]; plaintext = [plaintext char(bitxor(12, double('c')))]; plaintext = [plaintext char(bitxor(2, double('a')))]; plaintext = [plaintext char(bitxor(79, double('b')))]; disp(['plaintext = ',plaintext]);
In R:
asc <- function(x) { strtoi(charToRaw(x),16L) } chr <- function(n) { rawToChar(as.raw(n)) } library(bitops) plaintext = "" plaintext = paste(plaintext, chr(bitXor(79, asc('a'))), sep="") plaintext = paste(plaintext, chr(bitXor(59, asc('b'))), sep="") plaintext = paste(plaintext, chr(bitXor(12, asc('c'))), sep="") plaintext = paste(plaintext, chr(bitXor(2, asc('a'))), sep="") plaintext = paste(plaintext, chr(bitXor(79, asc('b'))), sep="") cat("plaintext = ", plaintext,"\n")
In C:
#include <stdio.h> int main(int argc, char *argv[]) { char plaintext[6]; plaintext[0] = 79 ^ 'a'; plaintext[1] = 59 ^ 'b'; plaintext[2] = 12 ^ 'c'; plaintext[3] = 2 ^ 'a'; plaintext[4] = 79 ^ 'b'; plaintext[5] = '\0'; // null termination for the string printf("plaintext = %s\n", plaintext); return 0; }