Often there has been a need to protect information from
'prying eyes'. In the electronic age, information that could otherwise benefit
or educate a group or individual can also be used against such groups or
individuals. Industrial espionage among highly competitive businesses often
requires that extensive security measures be put into place. And, those who wish
to exercise their personal freedom, outside of the oppressive nature of
governments, may also wish to encrypt certain information to avoid suffering the
penalties of going against the wishes of those who attempt to control.
Still, the methods of data encryption and decryption are
relatively straightforward, and easily mastered. I have been doing data
encryption since my college days, when I used an encryption algorithm to store
game programs and system information files on the university mini-computer, safe
from 'prying eyes'. These were files that raised eyebrows amongst those who did
not approve of such things, but were harmless [we were always careful NOT to run
our games while people were trying to get work done on the machine]. I was
occasionally asked what this "rather large file" contained, and I once
demonstrated the program that accessed it, but you needed a password to get to
'certain files' nonetheless. And, some files needed a separate encryption
program to decipher them.
Fortunately, my efforts back then have paid off now, in
the business world. I became rather good at keeping information away from
'prying eyes', by making it very difficult to decipher. Our
S.F.T. Setup Gizmo application has an
encrypted 'authorization code' scheme, allowing companies to evaluate a
fully-featured version of the actual software before purchasing it, and (when
licensed) a similar scheme authorizes a maximum number of users based on the
license purchased by the customer. Each of these features requires some type of
encrypted data to ensure that the 'lockout' works correctly, and cannot be
bypassed, protecting both our interests and preserving the 'full-featured demo'
capability for our customers.
Traditionally, several methods can be used to encrypt
data streams, all of which can easily be implemented through software, but not
so easily decrypted when either the original or its encrypted data stream are
unavailable. (When both source and encrypted data are available, code-breaking
becomes much simpler, though it is not necessarily easy). The best encryption
methods have little effect on system performance, and may contain other benefits
(such as data compression) built in. The well-known 'PKZIP®' utility offers
both compression AND data encryption in this manner. Also DBMS packages have
often included some kind of encryption scheme so that a standard 'file copy'
cannot be used to read sensitive information that might otherwise require some
kind of password to access. They also need 'high performance' methods to encode
and decode the data.
Fortunately, the simplest of all of the methods, the
'translation table', meets this need very well. Each 'chunk' of data (usually 1
byte) is used as an offset within a 'translation table', and the resulting
'translated' value from within the table is then written into the output stream.
The encryption and decryption programs would each use a table that translates to
and from the encrypted data. In fact, the 80x86 CPU's even have an instruction
'XLAT' that lends itself to this purpose at the hardware level. While this
method is very simple and fast, the down side is that once the translation table
is known, the code is broken. Further, such a method is relatively
straightforward for code breakers to decipher - such code methods have been used
for years, even before the advent of the computer. Still, for general
"unreadability" of encoded data, without adverse effects on
performance, the 'translation table' method lends itself well.
A modification to the 'translation table' uses 2 or more
tables, based on the position of the bytes within the data stream, or on the
data stream itself. Decoding becomes more complex, since you have to reverse the
same process reliably. But, by the use of more than one translation table,
especially when implemented in a 'pseudo-random' order, this adaptation makes
code breaking relatively difficult. An example of this method might use
translation table 'A' on all of the 'even' bytes, and translation table 'B' on
all of the 'odd' bytes. Unless a potential code breaker knows that there are
exactly 2 tables, even with both source and encrypted data available the
deciphering process is relatively difficult.
Similar to using a translation table, 'data
repositioning' lends itself to use by a computer, but takes considerably more
time to accomplish. A buffer of data is read from the input, then the order of
the bytes (or other 'chunk' size) is rearranged, and written 'out of order'. The
decryption program then reads this back in, and puts them back 'in order'. Often
such a method is best used in combination with one or more of the other
encryption methods mentioned here, making it even more difficult for code
breakers to determine how to decipher your encrypted data. As an example,
consider an anagram. The letters are all there, but the order has been changed.
Some anagrams are easier than others to decipher, but a well written anagram is
a brain teaser nonetheless, especially if it's intentionally misleading.
My favorite methods, however, involve something that only
computers can do: word/byte rotation and XOR bit masking. If you rotate the
words or bytes within a data stream, using multiple and variable direction and
duration of rotation, in an easily reproducable pattern, you can quickly encode
a stream of data with a method that is nearly impossible to break. Further, if
you use an 'XOR mask' in combination with this ('flipping' the bits in certain
positions from 1 to 0, or 0 to 1) you end up making the code breaking process
even more difficult. The best combination would also use 'pseudo random'
effects, the easiest of which would involve a simple sequence like Fibbonaci
numbers. The sequence '1,1,2,3,5,...' is easily generated by adding the previous
2 numbers in the sequence to get the next. Doing modular arithmetic on the
result (i.e. Fib. sequence mod 3 to get rotation factor) and operating on
multiple byte sequences (using a prime number of bytes for rotation is usually a
good guideline) will make the code breaker's job even more difficult, adding the
'pseudo-random' effect that is easily reproduced by your decryption program.
In some cases, you may want to detect whether data has
been tampered with, and encrypt some kind of 'checksum' into the data stream
itself. This is useful not only for authorization codes
but for programs themselves. A virus that infects such a 'protected' program
would no doubt neglect the encryption algorithm and authorization/checksum
signature. The program could then check itself each time it loads, and thus
detect the presence of file corruption. Naturally, such a method would have to
be kept VERY secret, as virus programmers represent the worst of the code
breakers: those who willfully use information to do damage to others. As such,
the use of encryption is mandatory for any decent anti-virus protection
scheme.
A cyclic redundancy check is one typically used checksum
method. It uses bit rotation and an XOR mask to generate a 16-bit or 32-bit
value for a data stream, such that one missing bit or 2 interchanged bits are
more or less guaranteed to cause a 'checksum error'. This method has been used
for file transfers for a long time, such as with XMODEM-CRC. The method is
somewhat well documented, and standard. But, a deviation from the standard CRC
method might be useful for the purpose of detecting a problem in an encrypted
data stream, or within a program file that checks itself for viruses.
One very important feature of a good encryption scheme is the ability to specify a 'key' or 'password' of some kind, and have the encryption method alter itself such that each 'key' or 'password' produces a different encrypted output, which requires a unique 'key' or 'password' to decrypt. This can either be a 'symmetrical' key (both encrypt and decrypt use the same key) or 'asymmetrical' (encrypt and decrypt keys are different). The popular 'PGP' public key encryption, and the 'RSA' encryption that it's based on, uses an 'asymmetrical' key. The encryption key, the 'public key', is significantly different from the decryption key, the 'private key', such that attempting to derive the private key from the public key involves many many hours of computing time, making it impractical at best.
There are few operations in mathematics that are truly 'irreversible'. In nearly all cases, if an operation is performed on 'a', resulting in 'b', you can perform an equivalent operation on 'b' to get 'a'. In some cases you may get the absolute value (such as a square root), or the operation may be undefined (such as dividing by zero). However, in the case of 'undefined' operations, it may be possible to base a key on an algorithm such that an operation like division by zero would PREVENT a public key from being translated into a private key. As such, only 'trial and error' would remain, which would require a significant amount of processing time to create the private key from the public key.
In the case of the RSA encryption algorithm, it uses very large prime numbers to generate the public key and the private key. Although it would be possible to factor out the public key to get the private key (a trivial matter once the 2 prime factors are known), the numbers are so large as to make it very impractical to do so. The encryption algorithm itself is ALSO very slow, which makes it impractical to use RSA to encrypt large data sets. What PGP does (and most other RSA-based encryption schemes do) is encrypt a symmetrical key using the public key, then the remainder of the data is encrypted with a faster algorithm using the symmetrical key. The symmetrical itself key is randomly generated, so that the only way to get it would be by using the private key to decrypt the RSA-encrypted symmetrical key.
Example: Suppose you want to encrypt data (let's say this web page) with a key of 12345. Using your public key, you RSA-encrypt the 12345, and put that at the front of the data stream (possibly followed by a marker or preceded by a data length to distinguish it from the rest of the data). THEN, you follow the 'encrypted key' data with the encrypted web page text, encrypted using your favorite method and the key '12345'. Upon receipt, the decrypt program looks for (and finds) the encrypted key, uses the 'private key' to decrypt it, and gets back the '12345'. It then locates the beginning of the encrypted data stream, and applies the key '12345' to decrypt the data. The result: a very well protected data stream that is reliably and efficiently encrypted, transmitted, and decrypted.
Source files for a simple RSA-based encryption algorithm
can be found HERE:
ftp://ftp.funet.fi/pub/crypt/cryptography/asymmetric/rsa
It is somewhat difficult to write a front-end to get this
code to work (I have done so myself), but for the sake of illustration, the
method actually DOES work and by studying the code you can understand the
processes involved in RSA encryption. RSA, incidentally, is reportedly
patented through the year 2000, and may be extended beyond that, so commercial
use of RSA requires royalty payments to the patent holder (www.rsa.com). But studying the methods and
experimenting with it is free, and with source code being published in print
(PGP) and outside the U.S., it's a good way to learn how it works, and maybe to
help you write a better algorithm yourself.
unsigned long dw1, dw2, dw3, dwMask;
int i1;
unsigned long aRandom[256];
dw1 = {seed #1};
dw2 = {seed #2};
dwMask = {seed #3};
// this gives you 3 32-bit "seeds", or 96 bits total
for(i1=0; i1 < 256; i1++)
{
dw3 = (dw1 + dw2) ^ dwMask;
aRandom[i1] = dw3;
dw1 = dw2;
dw2 = dw3;
}
int __cdecl MySortProc(void *p1, void *p2)
{
unsigned long **pp1 = (unsigned long **)p1;
unsigned long **pp2 = (unsigned long **)p2;
if(**pp1 < **pp2)
return(-1);
else if(**pp1 > *pp2)
return(1);
return(0);
}
...
int i1;
unsigned long *apRandom[256];
unsigned long aRandom[256]; // same array as before, in this case
int aResult[256]; // results go here
for(i1=0; i1 < 256; i1++)
{
apRandom[i1] = aRandom + i1;
}
// now sort it
qsort(apRandom, 256, sizeof(*apRandom), MySortProc);
// final step - offsets for pointers are placed into output array
for(i1=0; i1 < 256; i1++)
{
aResult[i1] = (int)(apRandom[i1] - aRandom);
}
...
The result in 'aResult' should be a randomly sorted (but unique) array
of integers with values between 0 and 255, inclusive. Such an array could be
useful, for example, as a byte for byte translation table, one that could easily
and reliably be reproduced based solely upon a short length key (in this case,
the random number generator seed); however, in the spirit of the 'GUTLESS
DISCLAIMER' (below), such a table could also have other uses, perhaps as a
random character or object positioner for a game program, or as a letter
scrambler for an anagram generator.
GUTLESS DISCLAIMER: The sample code above does not in and of itself constitute an encryption algorithm, or necessarily represent a component of one. It is provided solely for the purpose of explaining some of the more obscure concepts discussed in prose within this document. Any other use is neither proscribed nor encouraged by the author of this document, S.F.T. Inc., or any individual or organization that is even remotely connected with this web site.
As a test, I created an application that I can't export
outside of the U.S., to test the concept of the encryption algorithm described
in prose, above. It has been optimized and modified in a few ways to improve
"randomness" when keys are all 1's or all 0's, and to help prevent
'short cycle' repeating sequences from revealing where common character
sequences (like white space) might be. I then encrypted the source file for the
program itself (written in C++, by the way, as a command line filter app), and
put the resulting file HERE
(UUENCODE'd version HERE).
(NOTE: if you can actually decrypt this
algorithm, you'll have the source file! That is, if you've got a few zillion
lifetimes to waste in the effort of cracking it. G'head - I *dare* you to try!
)
So, guys and gals, HAVE AT IT!
Copy this HTML file, and send to your buds.
Because of the need to ensure that only those eyes
intended to view sensitive information can ever see this information, and to
ensure that the information arrives un-altered, security systems have often been
employed in computer systems for governments, corporations, and even
individuals. Encryption schemes can be broken, but making them as hard as
possible to break is the job of a good cipher designer. All you can really do is
make it very very difficult for the code breaker to decipher your cipher. Still,
as long as both source and encrypted data are available, it will always be
possible to break your code. It just won't necessarily be easy.
RELATED SITES (many with sample code, most outside U.S.
to avoid that idiotic U.S. law)