Wednesday, January 1, 2014

The Scan Method Of Computer Encoding


I am burdened with an extreme sense of efficiency. When I was young, I never seemed to have enough time to do all of the reading and exercise that I wanted to do and so I was always looking for ways to make things more time-efficient. This gave me a sense of efficiency that applied to everything.

One thing that I have written about before as being inefficient in the extreme is the ASCII system used to encode the alphabet, punctuation, and control characters for computing. In this outmoded system from 1968, each character is stored in a byte of eight bits. Since each bit can be recorded as either a 1 or a 0, there are 256 possible combinations for a byte, because two raised to the eighth power equals 256. These combinations are used to encode the letters of the alphabet as well as punctuation and unprinted control codes, such as carriage return. (Strictly speaking, ASCII itself defines only the first 128 codes; extended character sets fill in the rest.)

I have written quite a bit about ways to upgrade this system to make it more efficient, and it seems that every time I see it I notice yet another way that it could be improved. We could gain a lot of efficiency by agreeing on an order of all of the byte codes used in ASCII. We have an order for the letters of the alphabet, and the same concept can be applied to all of the codes.

Once we agreed on a sequential order for all of the byte codes used in ASCII, we could scan any document that was to be encoded to see which codes were present in the document. A typical document may not include letters like Q and Z or characters such as +, =, !, #, etc. The first 256 bits of an encoded document would indicate, in sequential order, which of the byte codes were present in the document: a 1 for each code that is present and a 0 for each that is not.
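As a rough illustration of that 256-bit header, here is a minimal Python sketch. The function name presence_header is my own invention for this example, and I am assuming that the agreed-upon order is simply the numeric order of the byte codes and that every character fits in one byte.

```python
def presence_header(text: str) -> list[int]:
    """Return 256 bits; bit i is 1 if byte code i occurs in the text."""
    # Assumption: one byte per character (Latin-1), so codes run 0..255.
    data = text.encode("latin-1")
    present = set(data)
    # Assumption: the agreed-upon order is the numeric order of the codes.
    return [1 if code in present else 0 for code in range(256)]


header = presence_header("I went to the store")
print(len(header))   # 256 header bits, whatever the document
print(sum(header))   # how many distinct byte codes are present
print([chr(code) for code, bit in enumerate(header) if bit])
```

For the sentence used here, ten of the 256 header bits are set, one for the space and one for each of the nine distinct letters.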

Then, for each present byte code, there would be a line of bits, each of which could be set to a 1 or a 0. These bits would be scanned to reproduce the document; each bit would indicate whether that present character was included in the corresponding scan. Reading the first bit in the line of each present character would be the first scan, reading the second bit in each line would be the second scan, and so on.

The system would be programmed to first separate out the first 256 bits, which indicate which characters are present, and then to divide the number of remaining bits by the number of present characters. This division would come out evenly, yielding a whole number, which would be the number of scans needed to replicate the document. If a scan bit for a particular present character is set to 1, the character is included in that particular scan; if it is set to 0, it is not. There would, of course, be as many scan bits as necessary after each character to complete the document.
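To make the procedure concrete, here is one way it might be sketched in Python. This is only my reading of the layout described above, not a finished format: the names encode and decode are hypothetical, the agreed-upon order is assumed to be the numeric order of the byte codes, and the bits are kept in a plain list rather than packed into bytes.

```python
def encode(text: str) -> list[int]:
    data = text.encode("latin-1")        # assumption: one byte per character
    present = sorted(set(data))          # assumed order: numeric byte value
    header = [1 if code in present else 0 for code in range(256)]
    # Start a new scan whenever the next code does not come strictly
    # later in the order than the previous one.
    scans: list[list[int]] = []
    for code in data:
        if not scans or code <= scans[-1][-1]:
            scans.append([])
        scans[-1].append(code)
    # One line of bits per present code, one bit per scan.
    body: list[int] = []
    for code in present:
        body.extend(1 if code in scan else 0 for scan in scans)
    return header + body


def decode(bits: list[int]) -> str:
    header, body = bits[:256], bits[256:]
    present = [code for code in range(256) if header[code]]
    if not present:
        return ""
    # Remaining bits divided by present characters gives the scan count.
    n_scans = len(body) // len(present)
    out = []
    for scan in range(n_scans):
        for line, code in enumerate(present):
            if body[line * n_scans + scan]:
                out.append(code)
    return bytes(out).decode("latin-1")


bits = encode("I went to the store")
print(len(bits) - 256)   # bits after the header
print(decode(bits))      # should reproduce the original sentence
```

The decoder never needs to be told the number of scans. It recovers it by the division described above, which is why the scan lines must all be the same length.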

This method would not be efficient with a single sentence. "I went to the store" would require eleven scans of ten present characters, including the space between words. Each scan would scan the present characters in the agreed-upon order of the byte codes. Taking the existing numeric order of the codes as that order, so that the space comes before the capital letters and the capitals before the lowercase letters, this sentence, with its eleven scans and an underscore to show the spaces between words, would look like this:
I
_w
ent
_t
o
_t
h
e
_st
or
e
Because the present byte codes would be scanned in the agreed-upon sequence, a single scan cannot go backwards in that sequence or repeat a letter. Whenever the text does either, another scan would be necessary. Since spaces occur so frequently in written documents, we could reassign some of the codes of non-present characters to serve as additional spaces, so that more than one space could fall within a scan, to make the process still more efficient.

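This is not efficient with a single sentence but, unlike ASCII, the efficiency improves as the document gets longer, because the fixed cost of the 256-bit header is spread over more text. With an extremely long document in which every scan included every present character, we would approach a condition of efficiency in which each letter and character would be expressed with a single bit, rather than the eight bits of the ASCII system. More generally, each scan costs one bit per present character, so the cost per character is the number of present characters divided by the average number of characters captured per scan. In contrast, ASCII gets no more efficient as the document gets longer.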
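The comparison can be put in numbers. The Python sketch below is a back-of-the-envelope calculation, not a benchmark: scan_cost_bits is a hypothetical name, the agreed-upon order is again assumed to be the numeric order of the byte codes, and eight bits per character is taken as the cost of plain byte storage.

```python
def scan_cost_bits(text: str) -> int:
    """Total bits for the scan method: 256-bit header plus scan lines."""
    data = text.encode("latin-1")        # assumption: one byte per character
    present = len(set(data))
    scans = 0
    previous = None
    for code in data:
        # A new scan starts whenever the order would go backwards or repeat.
        if previous is None or code <= previous:
            scans += 1
        previous = code
    # Each scan costs one bit for every present character.
    return 256 + scans * present


sentence = "I went to the store"
print(scan_cost_bits(sentence))   # 256 + 11 scans * 10 present = 366
print(8 * len(sentence))          # 19 characters * 8 bits = 152
```

For the single sentence the scan method needs 366 bits against 152, which bears out that it does not pay for short text. How it fares on a longer document depends on how many characters each scan captures on average, and that is easy to measure by running the same function on real text.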

We are making so much progress with processor speeds and drive capacity, but are still using the utterly inefficient coding system that has been in use since the ancient history of computing.
