Misunderstanding Computers

Why do we insist on seeing the computer as a magic box for controlling other people?
Why do we want so much to control others when we won't control ourselves?

Computer memory is just fancy paper, CPUs are just fancy pens with fancy erasers, and the network is just a fancy backyard fence.

(original post -- defining computers site)

Saturday, October 7, 2017

Languages in a Common Character Code for Information Interchange

Having said a bit about why I want to re-invent Unicode (so to speak), I want to rant a little about the overall structure, relative to languages, that I propose for this Common Code for Information Interchange, as I am calling it.

I've talked a little about the goals, and the structure, in the past. Much of what I said there I still consider valid, but I want to take a different approach here and look from the outside in a bit.

First, I plan the encoding to be organized in an open-ended way, the primary reason being that language is always changing.

Second, there will be a small subset devoted primarily to the technical needs of encoding and parsing, which I will describe in more detail in a separate rant.

Third, there will be an international or interlocality context or subset, which will be relatively small, and will attempt to include enough of each current language for international business and trade. This will appear to be a subset of Unicode, but will not be a proper subset. I have not defined much of this, but I will describe what I can separately.

Parsing rules for this international subset will be as simple as possible, which means that they will depart, to some extent at least, from the rules of any particular local context.

Third, part two: there will be spans allocated for each locality, within which context-local parsing and construction rules will operate.

Fourth, there will be room in each span for expansion, and rules to enable the expansion. Composition will be one such set of rules, and there will be room for dynamically allocating single code points for composed characters used in a document.

The methods of permanently allocating common composed characters should reflect the methods of temporary allocation.
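To make the idea of dynamic allocation a little more concrete, here is a minimal sketch of a per-document table that hands out single code points for composed characters. Everything named here is hypothetical: `DYNAMIC_BASE`, the area size, and the `ComposedAllocator` class are placeholders for illustration, not part of any defined encoding.

```python
# Hypothetical sketch: per-document allocation of single code points
# for composed characters. The base and size are invented placeholders.
DYNAMIC_BASE = 0xF0000   # assumed start of a span's expansion area
DYNAMIC_SIZE = 0x1000    # assumed number of code points available


class ComposedAllocator:
    """Maps component sequences to temporary single code points."""

    def __init__(self, base=DYNAMIC_BASE, size=DYNAMIC_SIZE):
        self.base = base
        self.size = size
        self.by_sequence = {}  # component tuple -> allocated code point
        self.by_code = {}      # allocated code point -> component tuple

    def allocate(self, components):
        """Return the code point for a sequence, allocating on first use."""
        key = tuple(components)
        if key in self.by_sequence:
            return self.by_sequence[key]
        if len(self.by_sequence) >= self.size:
            raise OverflowError("expansion area exhausted")
        code = self.base + len(self.by_sequence)
        self.by_sequence[key] = code
        self.by_code[code] = key
        return code

    def expand(self, code):
        """Return the component sequence, or the code itself if not composed."""
        return self.by_code.get(code, (code,))
```

A document would carry the table along with its text so that a reader can expand the temporary code points again. A permanent allocation could then be thought of as the same table, frozen and published.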

Fifth, as much as possible, existing encodings will be included by offset. For instance, the JIS encoding will exist as a span starting at some multiple of 65536, which I have not yet determined, and the other "traditional" encodings will also have spans at offsets of some multiple of two. The rules for parsing will change for each local span.
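Inclusion by offset can be sketched roughly as follows. The span indices in `SPAN_INDEX` are made up for the example, since the real offsets have not been determined, and the sketch assumes for simplicity that every span is exactly 65536 code points wide, which the text above does not promise.

```python
# Hypothetical sketch of "inclusion by offset".
SPAN_SIZE = 65536  # each span is assumed to start at a multiple of this

# Invented span numbers, for illustration only.
SPAN_INDEX = {"international": 0, "jis": 3}


def to_common(encoding, local_code):
    """Map a code from an existing encoding into its span."""
    if not 0 <= local_code < SPAN_SIZE:
        raise ValueError("local code does not fit in one span")
    return SPAN_INDEX[encoding] * SPAN_SIZE + local_code


def from_common(code):
    """Split a common code into (encoding name, local code)."""
    index, local_code = divmod(code, SPAN_SIZE)
    for name, span in SPAN_INDEX.items():
        if span == index:
            return name, local_code
    raise LookupError("no span registered at index %d" % index)


# JIS X 0208 places HIRAGANA LETTER A at 0x2422.
common = to_common("jis", 0x2422)
print(hex(common))          # 0x32422 with the invented span index 3
print(from_common(common))  # ('jis', 9250)
```

The span index is also what a parser would use to select the local rules, since the rules for parsing change with each span.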

I've thought about giving Unicode a span, but am not currently convinced it is possible.

Of course, this means that the encoding is assumed to require more code points than will fit comfortably in four bytes of a UTF-8 style transformation. (Four bytes of UTF-8 carry 21 bits of payload.)
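As a rough check on the four-byte limit, the sketch below computes how many payload bits a UTF-8 style sequence of a given length carries. It follows the original UTF-8 bit layout, which ran to six bytes before the standard was restricted to four. The function names are hypothetical, and nothing here says how the proposed encoding would actually be transformed.

```python
def payload_bits(n_bytes):
    """Payload bits in a UTF-8 style sequence of n_bytes (1 to 6)."""
    if n_bytes == 1:
        return 7  # 0xxxxxxx
    if not 2 <= n_bytes <= 6:
        raise ValueError("sequence length must be 1 to 6 bytes")
    # The lead byte keeps 7 - n bits; each continuation byte carries 6.
    return (7 - n_bytes) + 6 * (n_bytes - 1)


def bytes_needed(code_point):
    """Shortest UTF-8 style sequence length that can hold code_point."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    for n in range(1, 7):
        if code_point < (1 << payload_bits(n)):
            return n
    raise ValueError("code point needs more than 31 bits")


for n in range(1, 7):
    print(n, "bytes:", payload_bits(n), "bits")
```

With this layout, four bytes hold 21 bits, which is about 32 spans of 65536 code points each, and anything beyond that needs a fifth byte.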

And thinking of UTF-8 brings me to the next rant.
