Data Encryption Techniques


Since you're interested in ENCRYPTION, maybe you'd like these:
Microsoft '.Net' - a billion dollar boondoggle?
The U.S. EPA is LYING to us. KYOTO TREATY is FALSE SCIENCE.


Introduction

Often there has been a need to protect information from 'prying eyes'. In the electronic age, information that could otherwise benefit or educate a group or individual can also be used against such groups or individuals. Industrial espionage among highly competitive businesses often requires that extensive security measures be put into place. And, those who wish to exercise their personal freedom, outside of the oppressive nature of governments, may also wish to encrypt certain information to avoid suffering the penalties of going against the wishes of those who attempt to control.

Still, the methods of data encryption and decryption are relatively straightforward, and easily mastered. I have been doing data encryption since my college days, when I used an encryption algorithm to store game programs and system information files on the university mini-computer, safe from 'prying eyes'. These were files that raised eyebrows amongst those who did not approve of such things, but were harmless [we were always careful NOT to run our games while people were trying to get work done on the machine]. I was occasionally asked what this "rather large file" contained, and I once demonstrated the program that accessed it, but you needed a password to get to 'certain files' nonetheless. And, some files needed a separate encryption program to decipher them.

Methods of Encrypting Data

Traditionally, several methods can be used to encrypt data streams, all of which can easily be implemented through software, but not so easily decrypted when either the original or its encrypted data stream are unavailable. (When both source and encrypted data are available, code-breaking becomes much simpler, though it is not necessarily easy). The best encryption methods have little effect on system performance, and may contain other benefits (such as data compression) built in. The well-known 'PKZIP®' utility offers both compression AND data encryption in this manner. Also DBMS packages have often included some kind of encryption scheme so that a standard 'file copy' cannot be used to read sensitive information that might otherwise require some kind of password to access. They also need 'high performance' methods to encode and decode the data.

Encryption methods can be SYMMETRIC in which encryption and decryption keys are the same, or ASYMMETRIC (aka 'Public Key') in which encryption and decryption keys differ. 'Public Key' methods must be asymmetric, to the extent that the decryption key CANNOT be easily derived from the encryption key. Symmetric keys, however, usually encrypt more efficiently, so they lend themselves to encrypting large amounts of data. Asymmetric encryption is often limited to ONLY encrypting a symmetric key and other information that is needed in order to decrypt a data stream, and the remainder of the encrypted data uses the symmetric key method for performance reasons. This does not in any way diminish the security nor the ability to use a public key to encrypt the data, since the symmetric key method is likely to be even MORE secure than the asymmetric method.

For symmetric key ciphers, there are basically two types: BLOCK CIPHERS, in which a fixed length block is encrypted, and STREAM CIPHERS, in which the data is encrypted one 'data unit' (typically 1 byte) at a time, in the same order it was received in. Fortunately, the simplest of all of the symmetric key 'stream cipher' methods is the TRANSLATION TABLE (or 'S table'), which should easily meet the performance requirements of even the most performance-intensive application that requires data to be encrypted. In a translation table, each 'chunk' of data (usually 1 byte) is used as an offset within one or more arrays, and the resulting 'translated' value is then written into the output stream. The encryption and decryption programs would each use a table that translates to and from the encrypted data. 80x86 CPU's have an instruction 'XLAT' that lends itself to this purpose.

While translation tables are very simple and fast, the down side is that once the translation table is known, the code is broken. Further, such a method is relatively straightforward for code breakers to decipher - such code methods have been used for years, even before the advent of the computer. Still, for general "unreadability" of encoded data, without adverse effects on performance, the 'translation table' method lends itself well.

A modification to the 'translation table' uses 2 or more tables, based on the position of the bytes within the data stream, or on the data stream itself. Decoding becomes more complex, since you have to reverse the same process reliably. But, by the use of more than one translation table, especially when implemented in a 'pseudo-random' order, this adaptation makes code breaking relatively difficult. An example of this method might use translation table 'A' on all of the 'even' bytes, and translation table 'B' on all of the 'odd' bytes. Unless a potential code breaker knows that there are exactly 2 tables, even with both source and encrypted data available the deciphering process is relatively difficult.

Similar to using a translation table, 'data repositioning' lends itself to use by a computer, but takes considerably more time to accomplish. This type of cipher would be a trivial example of a BLOCK CIPHER. A buffer of data is read from the input, then the order of the bytes (or other 'chunk' size) is rearranged, and written 'out of order'. The decryption program then reads this back in, and puts them back 'in order'. Often such a method is best used in combination with one or more of the other encryption methods mentioned here, making it even more difficult for code breakers to determine how to decipher your encrypted data. As an example, consider an anagram. The letters are all there, but the order has been changed. Some anagrams are easier than others to decipher, but a well written anagram is a brain teaser nonetheless, especially if it's intentionally misleading.

My favorite methods, however, involve something that only computers can do: word/byte rotation and XOR bit masking. This is very common since it has relatively high ENTROPY in the resulting cipher. High entropy data is difficult to extract information from, and the higher the entropy, the better the cipher. So, if you rotate the words or bytes within a data stream, using a method that involves multiple and variable direction and duration of rotation, in an easily reproducable pattern, you can quickly encode a stream of data with a method that can be nearly impossible to break. Further, if you use an 'XOR mask' in combination with this ('flipping' the bits in certain positions from 1 to 0, or 0 to 1) you end up making the code breaking process even more difficult. The best combination would also use 'pseudo random' effects, the easiest of which might involve a simple sequence like Fibbonaci numbers, which can appear 'pseudo-random' after many iterations of 'modular' arithmetic (i.e. math that 'wraps around' after reaching a limit, like integer math on a computer). The Fibbonaci sequence '1,1,2,3,5,...' is easily generated by adding the previous 2 numbers in the sequence to get the next. Doing modular arithmetic on the result and operating on multiple byte sequences (using a prime number of bytes for block rotation, as one example) would make the code breaker's job even more difficult, adding the 'pseudo-random' effect that is easily reproduced by your decryption program.

In some cases, you may want to detect whether data has been tampered with, and encrypt some kind of 'checksum' or CRC into the data stream itself. This is useful not only for authorization codes and licenses (where encrypted data is expected to be used) but also for programs themselves. A virus that infects such a 'protected' program is likely to neglect any encryption algorithm and authorization/checksum signature that has been written to the executable binary file(s). The program (and any dynamic library) could then check itself each time it loads, and thus detect the presence of file corruption. Such a method would have to be kept VERY secret, to prevent virus programmers from exploiting it at some point.

Public Key Encryption Algorithms

One very important feature of a good encryption scheme is the ability to specify a 'key' or 'password' of some kind, and have the encryption method alter itself such that each 'key' or 'password' produces a unique encrypted output, one that also requires a unique 'key' or 'password' to decrypt. This can either be a symmetric or asymmetric key. The popular 'PGP' public key encryption, and the 'RSA' encryption that it's based on, uses an 'asymmetrical' key, allowing you to share the 'public' encryption key with everyone, while keeping the 'private' decryption key safe. The encryption key is significantly different from the decryption key, such that attempting to derive the private key from the public key involves too many hours of computing time to be practical. It would NOT be impossible, just highly unlikely, which is 'pretty good'.

There are few operations in mathematics that are truly 'irreversible'. In nearly all cases, the commutative property or an 'inverse' operation applies. if an operation is performed on 'a', resulting in 'b', you can perform an equivalent operation on 'b' to get 'a'. In some cases you may get the absolute value (such as a square root), or the operation may be undefined (such as dividing by zero). However, it may be possible to base an encryption key on an algorithm such that you cannot perform a direct calculation to get the decryption key. An operation that would cause a division by zero would PREVENT a public key from being directly translated into a private key. As such, only 'trial and error' (otherwise known as a 'brute force' attack) would remain as a valid 'key cracking' method, and it would therefore require a significant amount of processing time to create the private key from the public key.

In the case of the RSA encryption algorithm, it uses very large prime numbers to generate the public key and the private key. Although it would be possible to factor out the public key to get the private key (a trivial matter once the 2 prime factors are known), the numbers are so large as to make it very impractical to do so. The encryption algorithm itself is ALSO very slow, which makes it impractical to use RSA to encrypt large data sets. So PGP (and other RSA-based encryption schemes) encrypt a symmetrical key using the public key, then encrypt the remainder of the data with a faster algorithm using the symmetrical key. The symmetrical itself key is randomly generated, so that the only (theoretical) way to get it would be by using the private key to decrypt the RSA-encrypted symmetrical key.

Example:  Suppose you want to encrypt data (let's say this web page) with a key of 12345.  Using your public key, you RSA-encrypt the 12345, and put that at the front of the data stream (possibly followed by a marker or preceded by a data length to distinguish it from the rest of the data).  THEN, you follow the 'encrypted key' data with the encrypted web page text, encrypted using your favorite method and the key '12345'.  Upon receipt, the decrypt program looks for (and finds) the encrypted key, uses the 'private key' to decrypt it, and gets back the '12345'.  It then locates the beginning of the encrypted data stream, and applies the key '12345' to decrypt the data.  The result:  a very well protected data stream that is reliably and efficiently encrypted, transmitted, and decrypted.

Source files for a simple RSA-based encryption algorithm can be found HERE:
ftp://ftp.funet.fi/pub/crypt/cryptography/asymmetric/rsa

It is somewhat difficult to write a front-end to get this code to work (I have done so myself), but for the sake of illustration, the method actually DOES work and by studying the code you can understand the processes involved in RSA encryption.  RSA, incidentally, is reportedly patented through the year 2000, and may be extended beyond that, so commercial use of RSA requires royalty payments to the patent holder (www.rsa.com).  But studying the methods and experimenting with it is free, and with source code being published in print (PGP) and outside the U.S., it's a good way to learn how it works, and maybe to help you write a better algorithm yourself.

A 'multi-phase' S Table method, invented by me

Some time ago, in the late 1990's, developed and tested an encryption method that is (in my opinion) nearly uncrackable, so long as it is implemented properly. The reasons why will be pretty obvious when you take a look at the method itself. I originally explained it in prose due to the way US export laws were written at the time. Fortunately, for open source, it appears that it is only necessary to inform the right people of the URL (or to send them a copy of the source), which is a LOT more reasonable than it used to be. So thumb-on-nose aside, the encryption method itself is relatively straightforward, and reasonably strong (I recently analyzed it using some standard statistical methods and the entropy is pretty good, especially when I made a minor alteration to correct certain trivial cases that resulted in poor entropy, such as a block of zeros). According to THIS web site, my algorithm (which I have dubbed 'pelcg', the ROT13 for 'crypt') could be described as an asynchronous stream cipher with a symmetrical key)
So, here goes (description of method first made public on June 1, 1998):

The pseudo-random sequence can be designed by YOU to be ANYTHING that YOU want. Without details on the sequence, the source code, or the compiled binary image, the cipher key itself is worthless. PLUS, a block of identical ascii characters will translate into random garbage with (potentially) high entropy, each byte depending upon the encrypted value of the preceding byte (which is why I use the ENCRYPTED value, not the actual value, as the table index). You'll get a random set of permutations for any single character, permuations that are of random length, that effectively hide the true size of the cipher.

However, if you're at a loss for a random sequence consider a FIBBONACCI sequence, using 2 DWORD's (like from your encryption key) as "seed" numbers, and possibly a 3rd DWORD as an 'XOR' mask. An algorithm for generating a random sequence of numbers, not necessarily connected with encrypting data, might look as follows:

  unsigned long dw1, dw2, dw3, dwMask;
  int i1;
  unsigned long aRandom[256];

  dw1 = {seed #1};
  dw2 = {seed #2};
  dwMask = {seed #3};
  // this gives you 3 32-bit "seeds", or 96 bits total

  for(i1=0; i1 < 256; i1++)
  {
    dw3 = (dw1 + dw2) ^ dwMask;
    aRandom[i1] = dw3;
    dw1 = dw2;
    dw2 = dw3;
  }

If you wanted to generate a list of random sequence numbers, let's say between zero and the total number of random numbers in the list, you could try something like THIS:

int __cdecl MySortProc(void *p1, void *p2)
{
  unsigned long **pp1 = (unsigned long **)p1;
  unsigned long **pp2 = (unsigned long **)p2;
  
  if(**pp1 < **pp2)
    return(-1);
  else if(**pp1 > *pp2)
    return(1);

  return(0);
}

...

  int i1;
  unsigned long *apRandom[256];
  unsigned long aRandom[256];  // same array as before, in this case
  int aResult[256];  // results go here

  for(i1=0; i1 < 256; i1++)
  {
    apRandom[i1] = aRandom + i1;
  }
  
  // now sort it
  qsort(apRandom, 256, sizeof(*apRandom), MySortProc);
  
  // final step - offsets for pointers are placed into output array
  for(i1=0; i1 < 256; i1++)
  {
    aResult[i1] = (int)(apRandom[i1] - aRandom);
  }
  
...

The result in 'aResult' should be a randomly sorted (but unique) array of integers with values between 0 and 255, inclusive. Such an array could be useful, for example, as a byte for byte translation table, one that could easily and reliably be reproduced based solely upon a short length key (in this case, the random number generator seed); however, in the spirit of the 'GUTLESS DISCLAIMER' (below), such a table could also have other uses, perhaps as a random character or object positioner for a game program, or as a letter scrambler for an anagram generator.

GUTLESS DISCLAIMER: The sample code above does not in and of itself constitute an encryption algorithm, or necessarily represent a component of one. It is provided solely for the purpose of explaining some of the more obscure concepts discussed in prose within this document. Any other use is neither proscribed nor encouraged by the author of this document, S.F.T. Inc., or any individual or organization that is even remotely connected with this web site.

Weaknesses in Encryption

An encryption method might seem 'safe' on the outside, and even accept a ridiculously large key, but if the data it generates is NOT 'random' in appearance, there is a possibility to develop a method that exploits the 'non-random' patterns to greatly reduce the amount of time it would take to 'crack' the cipher. This kind of exploitation has already been demonstrated in several instances.

One particular method that can be used to reveal weakness is a statistical analysis of the results of the encryption. This can be done with the original or without. A method involving a statistical breakdown of byte patterns, such as the number of times any particular value appears in the encrypted output, would quickly reveal whether any potential patterns might exist. Similar 'byte A follows B' analysis could reveal the same kinds of weaknesses. This sort of analysis could even be done with a SPREADSHEET application, where high standard deviation would indicate poor entropy. Ideally, the algorithm would have an entropy similar to that of a truly random sequence. So performing your analysis FIRST on random numbers (try /dev/urandom on non-windows systems), and THEN applying the same analysis to the output of the encryption algorithm, would give you a nice indication of just how much entropy your algorithm has.

Another method involves 'predictability'. If you know that a particular sequence of data results in a particular pattern in the encryption stream, you can use these patterns to partially decrypt the content. Once a partial decrypt has been performed, knowledge of the algorithm may be enough to help you generate the key that created the cipher stream. This technique was used to help crack 'Enigma' back in World War 2, by the Bletchley Park team. Commonly used phrases like 'Heil Hitler' were used in their analysis, which in many ways is ironic. They used paper cards to create what they called 'cribs' to help visually locate these patterns within the encrypted data. It got to the point where they could read the encrypted information in 'real time', sometimes before the recipient got his copy of the unencrypted message.

Conclusion

Because of the need to ensure that only those eyes intended to view sensitive information can ever see this information, and to ensure that the information arrives un-altered, security systems have often been employed in computer systems for governments, corporations, and even individuals. Encryption schemes can be broken, but making them as hard as possible to break is the job of a good cipher designer. All you can really do is make it very very difficult for the code breaker to decipher your cipher. Still, as long as both source and encrypted data are available, it will always be possible to break your code. It just won't necessarily be easy.

RELATED SITES (many with sample code, many outside U.S. in places like Finland to avoid the old U.S. export laws. Some links are broken and have been left in the list for legacy purposes)



Back to MY web page
©1998-2013 by R. E. Frazier - all rights reserved
last updated: 04/5/2013