The incredible progress in other areas of computing is overshadowed by our continued use of the primitive ASCII coding system for the characters of the alphabet, numbers, and punctuation. We now have the chance to make just as much progress in computer capacity and efficiency with simple arithmetic as we can with the usual chip and storage technology.
The way we have been making continuous upgrades in applications and operating systems while still using the old ASCII coding system is like developing entirely new ways of printing documents with the latest version of Microsoft Office but then delivering the documents to their destination by Pony Express. You can see an ASCII table at http://www.asciitable.com/ or read all about it on http://www.wikipedia.com/
If we could make the storage and transmission of data more efficient, it would have the same effect as improving processor and transmission speeds and hard drive capacity. I prefer the idea of Numbering Sentences that I described in the posting below but that will require some work and time to break all sentences down into numbers. There are other ways that I have noticed that could be implemented in the next generation of software.
TRIMMING BYTES
Suppose we could reduce the size of the byte used to store each ASCII character from eight bits to six? That would immediately make storage and transmission of data more efficient because it would require only 75% of today's capacity for the same amount of data. It is true that six bits gives us only 64 possible combinations, as opposed to the 256 of eight-bit bytes (of which ASCII itself defines only 128). But those eight-bit bytes were implemented in the days before spellcheck technology in word processing. Why not eliminate capital letters from the coding for the purposes of storage and transmission and then have spellcheckers capitalize the proper letters later?
I believe that we could also eliminate quite a few of the other characters presently used in ASCII coding. We could even eliminate the digit characters, 1234567890, by having numbers automatically spelled out for storage and transmission and then converted back to digits later on, if required. It is true that the word "five" takes up more space than "5", but it would reduce the number of different characters that must be encoded.
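The six-bit idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the symbol table `ALPHABET` and the helper names `pack6` and `unpack6` are made up for this example, and the decoder has to be told how many characters were stored.

```python
import string

# Hypothetical symbol table (an assumption for this sketch): space, a-z,
# and a handful of punctuation marks. Anything up to 64 symbols would fit.
ALPHABET = " " + string.ascii_lowercase + ".,;:'\"!?-()\n"
assert len(ALPHABET) <= 64

CODE = {ch: i for i, ch in enumerate(ALPHABET)}


def pack6(text: str) -> bytes:
    """Lowercase the text and store each character in 6 bits."""
    lowered = text.lower()
    bits = 0
    for ch in lowered:
        bits = (bits << 6) | CODE[ch]
    nbits = 6 * len(lowered)
    nbytes = (nbits + 7) // 8
    # Pad the final byte with zero bits on the right.
    bits <<= nbytes * 8 - nbits
    return bits.to_bytes(nbytes, "big")


def unpack6(data: bytes, length: int) -> str:
    """Recover `length` characters from 6-bit packed data."""
    bits = int.from_bytes(data, "big")
    bits >>= len(data) * 8 - 6 * length  # drop the padding bits
    chars = []
    for _ in range(length):
        chars.append(ALPHABET[bits & 0x3F])
        bits >>= 6
    return "".join(reversed(chars))


packed = pack6("Hello, world")
print(len(packed), unpack6(packed, 12))  # 12 characters fit in 9 bytes
```

Note that the capital "H" comes back as "h"; restoring capitals would be left to the spellchecker step described above.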
THE BASE SYSTEM
Now, on to another idea for increasing computer efficiency by reforming ASCII. I have a way to not only store and transmit computer data with fewer bits but to speed up processing by freeing the processor from deciphering the eight-bit bytes of ASCII. As you know, a number system can be of any base. Ours happens to be base-ten because ancient people counted things on their ten fingers. Computers are coded in binary, or base-two, and the computer world also uses hexadecimal, or base-sixteen.
Suppose we could consider any text that had to be stored or transmitted not as text, but as a number. If we could eliminate capital letters and numbers, our alphabet could be considered a base-twenty-seven number system because there are twenty-six letters and a space between words.
In fact, ASCII stored in eight-bit bytes is, in effect, a base-256 number system. If we could see any text as actually a number, all we would have to do is store and transmit that number in binary. Not only would it require far less space, it would also be much easier on the computer processor because it would be just one long number and not composed of bytes.
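As a rough sketch of treating text as a base-twenty-seven number, the Python below uses the space as digit 0 and a through z as digits 1 through 26. The function names are made up for this illustration. One assumption worth flagging: the length of the text has to be kept alongside the number, because leading spaces are leading zeros and would otherwise vanish.

```python
import string

# Digit 0 is the space, digits 1-26 are the letters a-z.
DIGITS = " " + string.ascii_lowercase
BASE = len(DIGITS)  # 27


def text_to_number(text: str) -> int:
    """Read the text as one big base-27 number."""
    n = 0
    for ch in text:
        n = n * BASE + DIGITS.index(ch)
    return n


def number_to_text(n: int, length: int) -> str:
    """Turn the number back into `length` characters."""
    chars = []
    for _ in range(length):
        n, d = divmod(n, BASE)
        chars.append(DIGITS[d])
    return "".join(reversed(chars))


n = text_to_number("mark meek")
print(n.bit_length())        # bits needed to store the number in binary
print(number_to_text(n, 9))  # mark meek
```

For the nine characters of "mark meek" the number fits in 42 bits, against 72 bits at one byte per character.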
Let's consider something simple like my name, Mark Meek. If encoded into ASCII it would require nine bytes, including the space. In other words, 72 bits. But if we devised a base-fifty number system, letting the space = zero, a = 1, b = 2, and so on up to z = 26, so that my name was considered by the computer to be a number, it would require only 49 bits, according to my calculations. Using a base-fifty number system should allow us to incorporate all the required punctuation and control characters, since capital letters and possibly numerical characters can be eliminated until later.
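The arithmetic for the name can be checked with a short Python sketch. The mapping of space = 0 and a through z = 1 through 26 comes from the text; the 23 extra symbols that fill out the base-fifty table are placeholders assumed purely for illustration, and the function name `to_number` is hypothetical.

```python
import math
import string

# Space = 0 and a-z = 1..26 as described in the text. The 23 symbols after z
# are an assumed, illustrative choice to bring the table up to 50 digits.
EXTRA = ".,;:'\"!?-()/&%$@#*+=<>\n"
DIGITS = " " + string.ascii_lowercase + EXTRA
assert len(DIGITS) == 50


def to_number(text: str) -> int:
    """Read lowercased text as one base-50 number."""
    base = len(DIGITS)
    n = 0
    for ch in text.lower():
        n = n * base + DIGITS.index(ch)
    return n


name = "Mark Meek"
n = to_number(name)

ascii_bits = 8 * len(name)    # 72
actual_bits = n.bit_length()  # 49
# Worst case for any nine-character text in base 50:
worst_case_bits = math.ceil(len(name) * math.log2(len(DIGITS)))  # 51

print(ascii_bits, actual_bits, worst_case_bits)
```

The 49-bit figure holds for this particular name because its leading digit, m = 13, is small. A nine-character text that starts with a high-valued digit could need up to 51 bits, which is still well under 72.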
THE FLOATING BASE SYSTEM
Now, let's move on. Suppose we could use a flexible, or floating, number base for each block of data that we store or transmit? It would make it even more efficient. This is done by automatically scanning the data, detecting how many different characters it contains, including invisible control characters, and using that count as the number base to encode the data as a single large number.
How many documents or blocks of text contain characters such as ! # ^ & * + = ( )? The answer is only a relative few. So, why is it necessary to have space to encode characters that are not there on each block of text? Also, a document may not contain letters such as q, x, or z, making it unnecessary to include space in the coding for them. The goal should be to make the number base for encoding the text as low as possible.
Back in my music days there was a song I liked, a line of which is "It don't come easy, you know it don't come easy." To encode this, the program would count the number of different characters, including spaces, punctuation, and control characters, and use that number as the base by which it would be encoded. This is the type of calculation at which computers excel. If a block of text were too large for the computer processor to calculate the number representing the text, it would simply break it up into two or more blocks.
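Here is a minimal Python sketch of the floating base applied to that song line, written in lowercase since capitals are to be restored later. The function names are hypothetical. It also makes one assumption explicit: the decoder needs the list of characters used and the length of the text in addition to the number itself, so those would have to be stored or sent with each block.

```python
def encode_floating(text: str):
    """Encode text as one number whose base is the count of distinct characters."""
    alphabet = sorted(set(text))  # the characters actually present become the digits
    base = len(alphabet)
    digit = {ch: i for i, ch in enumerate(alphabet)}
    n = 0
    for ch in text:
        n = n * base + digit[ch]
    return alphabet, len(text), n


def decode_floating(alphabet, length, n):
    """Reverse encode_floating using the stored alphabet and length."""
    base = len(alphabet)
    chars = []
    for _ in range(length):
        n, d = divmod(n, base)
        chars.append(alphabet[d])
    return "".join(reversed(chars))


line = "it don't come easy, you know it don't come easy."
alphabet, length, n = encode_floating(line)

print(len(alphabet))   # number base chosen for this block
print(length * 8)      # bits at one byte per character
print(n.bit_length())  # bits for the single large number
print(decode_floating(alphabet, length, n) == line)
```

The line has 48 characters but only 18 different ones, so the base is 18 and the number needs no more than 201 bits, compared with 384 bits at eight bits per character.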