Misunderstanding Computers

Why do we insist on seeing the computer as a magic box for controlling other people?
人はどうしてコンピュータを、人を制する魔法の箱として考えたいのですか?
Why do we want so much to control others when we won't control ourselves?
どうしてそれほど、自分を制しないのに、人をコントロールしたいのですか?

Computer memory is just fancy paper, CPUs are just fancy pens with fancy erasers, and the network is just a fancy backyard fence.
コンピュータの記憶というものはただ改良した紙ですし、CPU 何て特長ある筆に特殊の消しゴムがついたものにすぎないし、ネットワークそのものは裏庭の塀が少し拡大されたものぐらいです。

(original post/元の投稿 -- defining computers site/コンピュータを定義しようのサイト)

Wednesday, December 13, 2017

Facebook and LinkedIn are out of control again.

I suppose it's an inherent problem of letting the metaphorical postal roads all be run for profit.

Facebook Messenger will not boot on this ancient tablet. It gets stuck in a loop demanding that I let it see my telephone contact list. (But the tablet is not a phone.)

I suppose the proximate cause of the loop is the age of the OS. It's well beyond life support.

But there's the rub of things.

Why must we let important things like telephones be put at the mercy of the insane technical marketing update cycle?

It's this insane update cycle that is driving the human side of global warming.

Who still can't see this?

Now Facebook itself is no longer satisfied with just being Facebook. It has not shut down safely in more than a week.

And LinkedIn seems to be doing similar things, insisting, simply demanding that I turn my human network over to their engines of marketing.

Anyone willing to front me a cool USD hundred million or so to build a networking service that treats people like people and tech like tech?

Saturday, October 7, 2017

Languages in a Common Character Code for Information Interchange

Having said a bit about why I want to re-invent Unicode (so to speak), I want to rant a little about the overall structure, relative to languages, that I propose for this Common Code for Information Interchange, as I am calling it.

I've talked a little about the goals, and the structure, in the past. Much of what I said there I still consider valid, but I want to take a different approach here, look from the outside in a bit.

First, I plan the encoding to be organized in an open-ended way, the primary reason being that language is always changing.

Second, there will be a small subset devoted primarily to the technical needs of encoding and parsing, which I will describe in more detail in a separate rant.

Third, there will be an international or interlocality context or subset, which will be relatively small, and will attempt to include enough of each current language for international business and trade. This will appear to be a subset of Unicode, but will not be a proper subset. I have not defined much of this, but I will describe what I can separately.

Parsing rules for this international subset will be as simple as possible, which means that they will depart, to some extent at least, from the rules of any particular local context.

Third, part two, there will be spans allocated for each locality within which context-local parsing and construction rules will operate.

Fourth, there will be room in each span for expansion, and rules to enable the expansion. Composition will be one such set of rules, and there will be room for dynamically allocating single code points for composed characters used in a document.

The methods of permanently allocating common composed characters should reflect the methods of temporary allocation.
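The temporary-allocation side of this can be sketched in C. Everything here is hypothetical (the span base, the table layout, the function name); it only shows the shape of a per-document table that hands a composed sequence a single dynamic code point, reusing it on later occurrences:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-document table that hands out code points from a
   private span for composed characters used in the document. The span
   base and size are invented for illustration. */
#define COMPOSED_SPAN_BASE 0x200000u
#define COMPOSED_SPAN_SIZE 64u

typedef struct {
    unsigned count;
    unsigned parts[COMPOSED_SPAN_SIZE][8];  /* component code points */
    unsigned nparts[COMPOSED_SPAN_SIZE];
} composed_table;

/* Return the dynamic code point for a composition, allocating on first use. */
unsigned alloc_composed(composed_table *t, const unsigned *parts, unsigned n) {
    for (unsigned i = 0; i < t->count; i++)
        if (t->nparts[i] == n &&
            memcmp(t->parts[i], parts, n * sizeof *parts) == 0)
            return COMPOSED_SPAN_BASE + i;      /* already allocated */
    unsigned i = t->count++;
    memcpy(t->parts[i], parts, n * sizeof *parts);
    t->nparts[i] = n;
    return COMPOSED_SPAN_BASE + i;
}
```

A permanent allocation scheme would do the same lookup against a registry instead of a per-document table, which is the sense in which the two methods should mirror each other.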

Fifth, as much as possible, existing encodings will be included by offset. For instance, the JIS encoding will exist as a span starting at some multiple of 65536, which I have not yet determined, and the other "traditional" encodings will also have spans at offsets of some multiple of two. The rules for parsing will change for each local span.
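A sketch of the offset arithmetic, in C. The JIS span index is a placeholder, since the actual multiple of 65536 is explicitly not yet determined:

```c
#include <assert.h>

/* Including a legacy encoding by offset: a code point in the common code
   is the legacy value plus a span base that is some multiple of 65536.
   JIS_SPAN is a hypothetical span index, not a decided value. */
enum { SPAN_UNIT = 65536 };
enum { JIS_SPAN = 32 };

unsigned jis_to_common(unsigned jis) { return JIS_SPAN * SPAN_UNIT + jis; }
unsigned span_of(unsigned cp)        { return cp / SPAN_UNIT; }  /* selects parsing rules */
unsigned offset_in_span(unsigned cp) { return cp % SPAN_UNIT; }  /* the legacy value */
```

The point of `span_of` is that a parser can dispatch on the span number alone to pick the local parsing rules for that encoding.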

I've thought about giving Unicode a span, but am not currently convinced it is possible.

Of course, this means that the encoding is assumed to require more than will fit comfortably in four bytes after UTF-8 compression.

And thinking of UTF-8 brings me to the next rant.

A Common Code for Information Interchange

I've been thinking about this topic since I first heard of plans for what became Unicode, back in the mid 1980s.

At the time, there were many in the industry who still thought that 64K of RAM should be enough for general personal computing, and there were many people who thought 65,536 should be enough characters to cover at least all modern languages. I tried to tell as many people as I could that Japanese alone had more than 20,000 and Chinese had an estimated count in the range of a hundred thousand, and no one believed me. I also tried to tell people that we shouldn't conflate Japanese and Chinese characters, but I had a hard time convincing even myself of that.

I also tried to tell people that the Kanji radicals should be encoded first, but the Japanese standards organization wasn't doing that, so why should anyone believe it?

As I noted recently, some of the problems of the current approach to unifying the world's character sets are becoming obvious.

Each natural human language is its own context. And each language has its own set of sub-contexts which we call dialects. But neither the contexts nor sub-contexts nor the sets of sub-contexts are well-defined, mathematically speaking, which means that mathematical methods cannot be used to perfectly parse any natural language.

Therefore, programs which work with human languages are necessarily buggy. That is, we know that, no matter how carefully they are constructed, they will always contain errors.

When we combine language contexts, we combine error rates with error rates, and the result is at best multiplicative. It is not simply additive. So we really should not want to do that. But that's what Unicode tries to do -- combine character codes for all relevant languages into one over-arching set.

Actually, for all my pessimism, Unicode works about as well as we should expect it to. I just want something better, but it's hard to describe exactly what that something is. This rant is an attempt to do so.

With just US English, it's fairly easy to construct a text editor. Parsing the entered text requires around ten simple functions, and visual formatting less than ten more. Word processors are fairly straightforward, as well.

With Unicode, a simple text editor requires more like a hundred functions, interacting in ways that are anything but obvious.

And if you need to rely on what you read in the text, as I noted in the rant linked above, you find that displaying the text reliably adds significantly more complexity.

Actually, US English is almost unnaturally simple to parse (relatively speaking). That's why it has been adopted over French, Spanish, Russian, and German, and why you don't hear much of Japanese plans to make Japanese the international language, and why the Chinese Communist Party's dreams of making Chinese the international language just will never fly, no matter how significant a fraction of the world's population ostensibly speaks Chinese as a native or second language.

Memorizing 9000+ characters for basic literacy requires starting at the age of two, I hear.

The Chinese may claim a full third, but the other two thirds are not going to happily and willingly accept being forced to propagandize their children (or themselves) with that many characters just to be literate. That alone is oppressive enough to prevent a productive peace.

Even the Japanese subset of two thousand for school literacy basically requires all twelve years of the primary grades to complete.

If we could reduce that burden by teaching the radicals first (We westerners call the sub-parts of Kanji "radicals".), we might have hope to address the difficulty, but the radicals themselves are an added layer of parsing. That's multiplicative complexity, which is one of the reasons that approach has not been successful as a general approach. (It is taught, I understand, in some schools of Japanese calligraphy, but that is not a large fraction of the population.)

And the rules for assembling and parsing the radicals are anything but simple.

Now, you may be wondering why I think the radicals should be prioritized in the encoding, but the dirty secret of Kanji is that they are not a closed set, any more than English vocabulary is a closed set. Every now and then someone invents a new one.

Methods to address new coinage must be part of the basic encoding.

This is getting long, and I think I'll wrap up my rant on my motivations for considering something to supersede Unicode here.

I wrote up a summary list of overall goals about three years back, here.

As I've said elsewhere, Unicode has served a purpose until now, and will continue to do so for a few more years, but we need something better.

It needs to provide better separation for the contexts of languages.

Tuesday, September 26, 2017

A 16/32-bit Extension for the Venerable 6809

This is somewhat in the vein of my last several posts, but wandering a little. I have for several years (decades, even) been thinking of ways to extend my favorite microprocessor, the 6809, to make it useful in applications that use Unicode text and decent-sized displays.

I don't have the tools to do HTML tables right now, so I'm going to assume you have tables of 6809 instructions, registers, and addressing modes handy to refer to. Otherwise, this may not make a lot of sense.

Before I begin, I do not intend to do anything with the 6309 extensions. Don't ask me to explain much beyond the fact that success in this kind of endeavor requires keeping things as simple as possible.

I'm thinking of calling this the X61609, BTW.

The first step in this project is to double the register sizes.

The accumulators will be 16 bits wide, and the indexable registers will be 32 bits wide:

    A and B 16 bits wide.

    X, Y, U, S and PC 32 bits wide.

The direct page register will be widened on both ends, to contain a full width 32-bit direct page base address instead of just the high-order part.

    DP 32 bits wide -- full address.

The condition codes will be augmented by status bits, and will be 32 bits wide.

    CC has extra condition code and status bits.

If you are familiar with the processor and the instruction set, you are likely wondering what magic will make this work. Sign extension plays a big role.

We will need more instructions. We'll borrow a play from the original 6809 and use page escapes to encode them. That means that, in addition to page2 ($10) and page3 ($11) pre-bytes, we'll have pagex1 ($14), pagex2 ($15), and pagex3 ($18). To simplify, the codes will be mapped out the same as the 8-bit instructions.

For immediate (constant) operands, the constant portion of the instructions will be 16 bits for accumulators A and B, and 32 bits for the indexable registers X, Y, U, S, and PC, and also 32 bits for the double accumulator, D.

The double accumulator D for 16-bit instructions will be the concatenation of A:B in their full width, with A being the top 16 bits.

For 8-bit instructions, the 16-bit result will generally be the sign extension of the 8-bit result. For example,

    LDA #-1

will load the 8 bits of the argument and sign extend it, so that the entire 16 bits of A will be -1. (Bit rotate and shift instructions will be the exceptions that prove this rule.)

For 8-bit instructions involving D, D will be the concatenation of the lower 8 bits of the result A and B, sign extended, when D is the target of the instruction.

Likewise, when loading a sixteen bit value into an indexable register, it will be sign extended.

    LDX #$8000

will result in X containing $FFFF8000. Appending "W" to instructions to specify their 16-bit forms,

    LDW X #$8000

is the form required to get $8000 into X, and it will consume 6 bytes of instruction space:

    $14 $8E $00 $00 $80 $00

This may feel strange, but it should yield the best emulation of the 6809 in existing hardware designs, by decoding the bottom 32K of memory as the low half of 6809 address space and the top 32K as the high half.
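The sign-extension rule can be pinned down with a few lines of C. This is just an illustration of the widening behavior described above, not the actual silicon:

```c
#include <assert.h>
#include <stdint.h>

/* Sign extension as the widening rule: an 8-bit load fills all 16 bits
   of the accumulator, and a 16-bit load into an indexable register fills
   all 32 bits. Mirrors the LDA #-1 and LDX #$8000 examples. */
int16_t widen8(uint8_t byte)   { return (int16_t)(int8_t)byte; }
int32_t widen16(uint16_t word) { return (int32_t)(int16_t)word; }
```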

Shifts and rotates require a bit of special attention. The 6809 carry bit will be retained, and will always reflect the carry result defined according to the 6809 -- in other words, carries from bit 7 mostly, and from bit zero in right rotates, etc. There will be an additional 16-bit carry that will be set according to the 16-bit results, mostly carry from bit 15, etc., and it will provide the carry source for sixteen-bit instructions, too.
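A hedged C sketch of the dual carry for a 16-bit logical shift left, with the legacy carry taken from bit 7 and the new carry from bit 15:

```c
#include <assert.h>
#include <stdint.h>

/* A 16-bit logical shift left that produces both carries: the legacy
   6809 carry out of bit 7 and the new carry out of bit 15. */
typedef struct { uint16_t result; int carry8, carry16; } shift16;

shift16 lsl16(uint16_t v) {
    shift16 r;
    r.carry8  = (v >> 7) & 1;    /* what the 6809 calls carry */
    r.carry16 = (v >> 15) & 1;   /* carry source for 16-bit instructions */
    r.result  = (uint16_t)(v << 1);
    return r;
}
```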

For register exchange and transfer instructions, whole registers will always be exchanged or transferred, except for DP and CC. Bits 8-15 of DP will be involved in 8-bit transfers and exchanges, clearing the low byte of DP and extending the sign into the high word as necessary. DP will be allowed in exchanges and transfers with indexable registers, in which case all 32 bits will be affected.

The status register will be allowed in transfers and exchanges with D, in which case all 32 bits will be affected (at least in system mode).

Stack operations may seem to be a cause for concern, but PSH and PUL will be in both the 8- and 16-bit maps, and the 8-bit push and pull will behave as they do in the 6809, while the 16-bit forms will do the logical thing. The one point of concern is that 8-bit PULs will sign-extend their results, to maintain compatibility.

8-bit SWIs will behave as in the 6809, and 16-bit SWIs will save entire registers.

The traditional interrupt and reset vectors will be sign extended before being used. Additional 16-bit (thus 32-bit address) interrupt and reset inputs and vectors will be provided.

Finally, the index postbytes will remain almost as they are, with the exception that post-inc/dec modes will increment or decrement by 2 or 4 when used by 16-bit instructions. (I considered multiplying all offsets by two for 16-bit instructions, but it looks like that would complicate, rather than simplify things.)

Also, there will be at least three new indexing modes --

    1RRI1010 will provide 32-bit offsets from X, Y, U, and S.
    1XXI1110 will provide 32-bit offsets for PC.
    101I1111 will provide 32-bit absolute addresses for indirection. (Tentative.)
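Assuming the bit patterns above mean fixed low nibbles with RR/XX and I as variable bits, the decode could be sketched in C (the mode names are invented):

```c
#include <assert.h>

/* Hypothetical decode of the three proposed postbytes. RR selects
   X/Y/U/S, I is the indirection bit, and the low nibbles are fixed. */
enum xmode { XM_NONE, XM_OFF32, XM_PCOFF32, XM_ABS32IND };

enum xmode decode_new_postbyte(unsigned pb) {
    if ((pb & 0x8F) == 0x8A) return XM_OFF32;     /* 1RRI1010 */
    if ((pb & 0x8F) == 0x8E) return XM_PCOFF32;   /* 1XXI1110 */
    if ((pb & 0xEF) == 0xAF) return XM_ABS32IND;  /* 101I1111 */
    return XM_NONE;   /* fall through to the existing 6809 modes */
}
```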

A DP relative indexed mode would be useful, to allow using DP variables as extra index registers (6502 envy). There should be room in the undefined postbyte values. But the slickest solution, slipping this mode into the don't care assignments for PC relative, depends on how much ROMmed software there is that used something besides 0s in the don't care bits. It could be switched on and off with a (system-mode-only) status bit, perhaps.

From this point, the next priority would be memory management and support for isolating instances of emulated systems. That will be for another time.

And, if there is room left over after that, it would be nice to add bit-level primitives for multiply and divide, such that a 32-bit multiply or divide could just be a sequence of 32 of the primitives. (The primitive would conditionally add or subtract according to the bit being examined and then shift the result one, ready to examine the next.)
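The multiply half of that primitive can be modeled in C. This is a sketch under my own reading of the description (the names are invented, and the conditional-subtract side used for division is omitted): each step examines one bit, conditionally adds the multiplicand into the high half, and shifts right, so 32 steps produce a full 64-bit product.

```c
#include <assert.h>
#include <stdint.h>

/* One primitive step: look at the low bit of the multiplier (kept in
   the low half), conditionally add the multiplicand into the high half,
   then shift the whole accumulator right one bit. */
typedef struct { uint32_t hi, lo; } acc64;

acc64 mulstep(acc64 a, uint32_t multiplicand) {
    uint64_t sum = (uint64_t)a.hi + ((a.lo & 1) ? multiplicand : 0);
    acc64 r;
    r.lo = (a.lo >> 1) | (uint32_t)(sum << 31);  /* sum's low bit shifts in at the top */
    r.hi = (uint32_t)(sum >> 1);
    return r;
}

/* 32 repetitions of the primitive give a 32x32 -> 64 bit multiply. */
uint64_t mul32(uint32_t x, uint32_t y) {
    acc64 a = { 0, x };
    for (int i = 0; i < 32; i++)
        a = mulstep(a, y);
    return ((uint64_t)a.hi << 32) | a.lo;
}
```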

Friday, June 30, 2017

Keeping the Return Address Stack Separate

So far in this extended rant, I have looked at how memory maps are laid out and how attackers can exploit them. Now I want to show how a CPU could completely prevent an entire subclass of these attacks, without a whole lot of loss in processor speed.

I first came on these ideas twenty, maybe thirty years ago, when trying to figure out what made the M6809 such a magical microprocessor. (That's a subject for another day.) The M6809 was (and still is) an 8-bit microprocessor with a 16-bit address bus. That means it can address 64Kbytes of memory.

Motorola specified a memory management part called the 6829 which supported designs of up to 1 megabyte of memory. It was essentially a block of fast RAM used to translate the upper bits of the address bus, plus a latch that would select which parts of that RAM would be used for the translation, something like this:



(This is from memory, and not really complete. Hopefully, it's enough to get the concept from.)

Memory management control would provide functions like write protect and read protect, so you could keep the CPU from overwriting program code and set parts of the address space up as guard pages.

With 32 bits and more of address, memory management doesn't quite work this simply, but this is enough to get some confidence that memory management can actually and meaningfully be done.

Now, if you are familiar with the 8086, you may wonder what the difference between the 8086's segment registers and this would be. This kind of scheme provides fairly complete control over the memory map.

The 8086 segment registers only moved 64K address windows around in the physical map, and provided no read or write control. Very simple, but no real management. The 80286 provided write protect and such, but the granularity was still abysmal, mostly based on guesses about the usage of certain registers -- guesses which sort of worked with some constrained C programming language run-time models. And these guesses were frozen into silicon before they were tested. It should go without saying that such guesses miss the mark for huge segments of the industry, but Intel's sales crew has always been trained in the art of smooth-talking.

(Intel are not the only bad guys in the industry; they are just the ones who played this particular role.)

Now I knew about both of these approaches, and I knew about the split stack in Forth. And it occurred to me that, if a 6829-like MMU could talk to the CPU, and select a different task latch on accesses through the return stack pointer (S in the 6809), you could make it completely impossible to crash the return stack and overwrite return addresses.


I'm not talking about guard pages. I'm talking about return addresses that simply can't be accessed by any means except call and return. They're outside the range of addresses that application code can generate.

Of course, the OS kernel can access the stack regions by mapping them in, but we expect the OS to behave itself.

(We would provide system calls to allow an application to have the OS adjust a return address when such is necessary.

Also, since we are redesigning the CPU, we might add instructions for exceptional return states, but I would really rather not do that. It seems redundant, since split stacks make multiple return values so much easier.)

Another thing that occurred to me is that the stack regions could be mapped to separate RAM from the main memory. This would allow calls and returns that would take no more time than regular branches or jumps.

At this point in my imaginings, I'm thinking about serious redesign of the CPU. So I thought about adding one more stack register to the 6809, a dedicated call/return stack. It would never be indexed, so it would be a very simple bit of circuitry. That would free up registers for other use, including additional stacks and such.

(Well, if we allow frame pointers to be pushed with the instruction pointers, and provide instructions for walking the stack, there would be one kind of indexing -- an instruction to fetch a frame pointer at a specific level above the current one. I'll explain how this would work, to aid understanding of what's going on here:

There would be a couple of bits in the processor status area, which the OS would set before calling the application startup. The application must not be allowed to modify these bits, but, since the application must be able to confirm that the frames are present, it should be able to read them.

These bits would tell the processor which stack pointers to save with the instruction pointer on calls. The return instructions would have a bit field to determine whether to restore or discard each saved frame pointer.

"Walking the stack" would be simply a load of a specified saved frame pointer at a specified level of calling routine.

In the example shown here, the instruction GETFP sees from the status register that both LP and SP are being recorded, and multiplies the index argument by 3, then adds 2 to point to LP, checks against the return stack base register, and loads LP0 into X.

But GETFP SP,3,Y, pointing into stack that isn't there, checks the return stack base register and refuses to load a frame pointer that isn't there.

Another flag in the status register might select between generating an exception on failure and recording the failure in a status bit.

Maybe. :-/)
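To make the GETFP arithmetic concrete, here is a C model of the walk under the assumption that both SP and LP are recorded, so each frame is three cells (which is why the index is multiplied by 3, and LP sits at offset 2). Everything here -- layout, names -- is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* With both SP and LP recorded, each call frame on the return stack is
   3 cells: IP, SP, LP. rstack[0] is the cell at the current return
   stack pointer; frames grow toward higher indexes, and base_cells
   marks one past the last valid cell. */
enum { FRAME_CELLS = 3, SLOT_IP = 0, SLOT_SP = 1, SLOT_LP = 2 };

int getfp(const uint32_t *rstack, unsigned base_cells,
          unsigned level, unsigned slot, uint32_t *out) {
    unsigned idx = level * FRAME_CELLS + slot;  /* e.g. level*3 + 2 for LP */
    if (idx >= base_cells)
        return 0;              /* frame isn't there: refuse the load */
    *out = rstack[idx];
    return 1;
}
```

The refusal path corresponds to the base-register check in the text; whether it raises an exception or just sets a status bit is the flag mentioned above.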

Could we do such a thing with the 68000 or other 32-bit CPUs? Add a dedicated call/return stack and free the existing stack pointer for use as a parameter stack in a split-stack architecture? 64-bit CPUs?

Sure.

But if we intend to completely separate the return addresses, we have to add at least one bit of physical address, or we have to treat at least one of the existing address bits (the highest bit) in a special way. I think I'd personally want to lean towards adding a physical address bit, even for the 64-bit CPUs, to keep the protection simple. But, of course, there are interesting possibilities with keeping the physical and logical addresses the same size, but filtering the high bit in user mode.

And that would provide us with a new kind of level-1 cache -- 8, 16, or maybe 32 entries of spill-fill cache attached to the call/return stack, operating in parallel with a (modified) generalized level-1 cache. The interface between memory management and cache would need a bit of redesign, of course.

I'm not sure it would mix well with register renaming. At bare minimum, the call/return stack pointer would have to be completely separate from the rename-able registers.

Would this require rewriting a lot of software?

Some, but mostly just the programming language compilers would need to be worked on.

And most of the rewrite would focus on simplification of code designed to work around the bottleneck of having the return addresses mixed in with parameters.

There you have a way to completely protect the return addresses on stack.

What about other regions of memory? Can we separate them meaningfully?

That kind of thing is already being done in software used on real mainframes, so, yes. But it does have a much larger impact on existing software and on run-time speed, and it is not as simply accomplished.

But that's actually a question I want to visit when I start ranting about the ideal processor that I want to design but will probably never get a chance to. Later.

Wednesday, June 28, 2017

Proper Use of CPU Address Space

I referred to this in my overview about re-inventing the industry, but I was not very specific. Now that I've been motivated to write a rant or two about memory maps and how they can be exploited, I can write about the ideal, perfect CPU (that may be too perfect for this world), and how it works with memory.

First, these are the general addressable regions of memory that you want to be able to separate out. I'll put them in the order I've been using in the other two rants:


0x7FFFFFFFFFFFFFFF
  stack (dynamic variables, stack frames, return pointers)
0x7FFFxxxxxxxxxxxx ← SP
  gap
  guard page (Access to this page triggers OS responses.)
  gap
  heap (malloc()ed variables, etc.)
  statically allocated variables
0x4000000000000000
  application code
0x2000000000000000
  operating system code, variables, etc.
0x0000000000000000



The regions we see are
  • Stack (dynamic variables, stack frames, return pointers)
  • Heap (malloc()ed variables, etc.)
  • Statically allocated variables
  • Application code (including object code, constants, linkage tables, etc.)
Operating system code should include the same sort of regions. But that should not really be visible in the application code map. Only the linkage to the OS should be visible, and that would be clumped with the application code.

Memory management hardware provides the ability to move OS code out of the application map. Let's see how that would look:



0x7FFFFFFFFFFFFFFF
  stack (dynamic variables, stack frames, return pointers)
0x7FFFxxxxxxxxxxxx ← SP
  gap
  guard page (Access to this page triggers OS responses.)
  gap
  heap (malloc()ed variables, etc.)
  statically allocated variables
0x4000000000000000
  application code
0x0000000000000000


We used to talk about the problems of accidentally using small integers as pointers. Basically, when pointer variables get overwritten with random integers, the overwriting integers tend to be relatively small integers. Then when those integers are used as pointers, they access arbitrary stuff in low memory. We can notice that and refrain from allocating small integer space. And we realize that we have already dealt with small negative integers by buffering the wraparound into highest memory:


0xFFFFFFFFFFFFFFFF
  gap (wraparound and small negative integers)
0x8000000000000000
  stack (dynamic variables, stack frames, return pointers)
0x7FFFxxxxxxxxxxxx ← SP
  gap
  guard page (Access to this page triggers OS responses.)
  gap
  heap (malloc()ed variables, etc.)
  statically allocated variables
0x4000000000000000
  application code
0x0000000100000000
  gap (small integers)
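The small-integer guard can be expressed as a trivial predicate. This is only an illustration using the bounds from the map above; a real MMU would enforce it by simply never mapping those ranges:

```c
#include <assert.h>
#include <stdint.h>

/* Reject "pointers" that are really stray integers: small positive
   integers fall below the low gap, and small negative integers wrap
   into highest memory. Bounds follow the map above. */
int plausible_pointer(uint64_t p) {
    if (p < 0x0000000100000000ull) return 0;  /* small positive integer gap */
    if (p >= 0x8000000000000000ull) return 0; /* wraparound / negative gap */
    return 1;
}
```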


I've posted a rant about using a split stack, with a little of the explanation for why at the end. Basically, that would allow us to move those local buffers that can overflow, crash, and/or smash the stack way away from the return address stack.

Thus, even if the attacker could muck in the local variables, he would still be at least one step from overwriting a return address. That means he has to use some harder method to get control of the instruction pointer.

Stack usage patterns actually point us to using a third stack, or a stack-organized heap separate from the random allocation heap. Parameters and small local variables could be on one stack, and large local variables on the other.

In other words, scalar local/dynamic variables would be on the second stack and vector/structure local/dynamic variables on the third. This would be especially convenient for Forth and C run-times, virtually eliminating all need of function preamble and cleanup, and simplifying stack management.

Another way to use the third stack would be to just put all the local variables on it. It might be easier to understand it this way, and I'll use the parameter/locals division below. As far as the discussion below goes, the two divisions can be interchanged. (The run-time details are significant, but I'll leave that for another day. Besides, there is no reason for a single computer to limit itself to one or the other. With a little care, the approaches could even be mixed in a running process.)

But the third stack could be optional, and its use determined by the language run time support. The OS run-time support really doesn't need to see it other than as a region to be separated from the others. Here is a possible general map, using 64 bit addressing:


0xFFFFFFFFFFFFFFFF
  gap (wraparound and all negative integers)
0x8000000000000000
  gap (large positive integers)
0x7FFFFF0000000000
  gap
  return stack ← RP
  gap
0x7FFFFE0000000000
  guard page (2^40 addresses)
0x7FFFFD0000000000
  gap
  parameter stack ← SP
  gap
0x7FFFFC0000000000
  guard page (2^40 addresses)
0x7FFFFB0000000000
  gap
  local stack ← LP
  gap
0x7FFFFA0000000000
  guard page (really huge)
0x7000000000000000
  gap
  heap (malloc()ed variables, etc.)
  gap
0x4000020000000000
  guard page (2^40 addresses)
0x4000010000000000
  gap
  statically allocated variables
  gap
0x4000000000000000
  gap
  application code
  gap
0x0000010000000000
  gap (small positive integer pointer guard)
0x0000000000000000


If we choose to have stack frames, we could manage them very simply on the return stack by just pushing the local and/or parameter stack pointer when we push the IP. And we just discard them when we pop the IP. Or we can pop them, to force-balance the stack. This gets rid of pretty much all the complexity of walking the stack.

The gaps should be randomized, to make it harder for attacker code to find anything to abuse.

The regions we now have are
  • Return Stack (return address and maybe frame pointers)
  • Parameter stack (parameters only)
  • Locals Stack (dynamically allocated local variables)
  • Heap (malloc()ed variables, etc.)
  • Statically allocated process variables (globally and locally visible)
  • Application code (including object code, constants, linkage tables, etc.)
And we have large guard regions between each.

What's missing?

Multiprocessing requires a region of memory dedicated to process (or thread) shared variables, semaphores, resource monitor counters, and such. This is a separate topic, but basically the statically allocated variable area would have a section which could be protected from bare writes, with only reads and locked read-modify-write cycle instructions allowed. These would be in a separate region, so their addresses could be somewhat randomized.
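As an illustration of the locked read-modify-write discipline, here is a C11 sketch of a counting semaphore that is only ever touched through atomic operations (the atomics stand in for the bus-locked instructions; the function names are mine):

```c
#include <assert.h>
#include <stdatomic.h>

/* A counting semaphore living in the protected shared region: no bare
   stores, only atomic reads and locked read-modify-write cycles. */
int sem_try_wait(atomic_int *s) {
    int v = atomic_load(s);
    while (v > 0) {
        /* locked compare-and-swap; retries if another thread raced us */
        if (atomic_compare_exchange_weak(s, &v, v - 1))
            return 1;          /* got a slot */
    }
    return 0;                  /* none available */
}

void sem_post(atomic_int *s) { atomic_fetch_add(s, 1); }
```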

I'm not sure that it makes sense to manage allocation of shared variables in the malloc() sense, but there is room with this kind of scheme, and modern processors should support that many different regions of memory.

Also, regions of memory shared mmap-style would be in a separate region, or perhaps a guarded region for each. I'm not sure whether they would be protected in the same way as semaphores and monitor counters. It would seem, rather, that the CPU instructions would be ordinary instructions, and the mmap region would be a resource protected by semaphore- or monitor-controlled access.

We can do the same sort of thing with 32-bit addressing, although, instead of guard pages 2^40 or so in size, we would be looking at guard pages between 2^20 and maybe 2^24 in size. This would be more appropriate for some controller applications.

We could do the same thing with 16-bit addressing, but it wouldn't leave us much room for the variables and code. On the other hand, looking twice at 16-bit addressing will give us clues for further refinement of these ideas. But I think I'll save that for another rant, probably another day. I have burned up enough of today on this prolonged rant.

Monday, June 12, 2017

Reinventing computers.

I mention my bad habits a bit, but I don't really go into much detail:
One of these days I'll get someone to pay me to design a language that combines the best of Forth and C.
Then I'll be able to leap wide instruction sets with a single #ifdef, run faster than a speeding infinite loop with a #define,
and stop all integer size bugs with my bare cast.
I recall trying to get a start ages ago, on character encoding, CPUs, and programming languages, and, more recently, more on character encoding.

These are areas in which I think we have gone seriously south with our current technology.

First and foremost, we tend to view computers too much as push-button magic boxes and not enough as tools.

Early PCs came with a bit of programmability in them, such as the early ROMmed BASIC languages, and, more extensively, toolsets like the downloadable Macintosh Programmer's Workshop. Office computers also often came with the ability to be programmed. Unix (and Unix-like) minicomputers and workstations generally came with, at minimum, a decent C compiler and several desktop calculator programs.

Modern computers really don't provide such tools any more. It's not that they are not available, it's that they are presented as task-specific tools, and you often have to pay extra for them. And they are not nearly as flexible (MSExcel macros?).

Computers were not given to us to use as crutches. They were given to us to help us communicate and to help us think.

I'm not alone in my interest in retro-computing, but I think I have a somewhat unusual ultimate goal in my interests.

I want to go back and re-open certain paths of exploration that the industry has lopped off as being too unprofitable (or, really, too profitable for someone else).

One is character encoding. Unicode is too complicated. Complicated is great for big companies who want to offer a product that everyone must buy. The more complicated they can make things, the harder it is for ordinary customers to find alternatives. And that is especially true if they can use patents and copyrights on the artificial complexities that they invent, to scare the customer away from trying to solve his or her own problems -- or their own corporate problems, in the case of the corporate customer.

Computers are supposed to help us solve our own problems, not to impose our own solutions on unsuspecting other people, while making them pay for solutions that really don't solve their problems.

Now, producing something simpler than Unicode is going to be hard work, harder even than putting the original Unicode together was.

Incidentally, for all that I seem to be disparaging Unicode, the Unicode Consortium has done an admirable job, and Unicode is quite useful. They just made a conscious decision to try not to induce changes on the languages they are encoding. It's a worthy and impossible goal.

And they should keep it up. Even though it's an impossible goal, their pursuing that goal is enabling us to communicate in ways we couldn't before.

But we must begin to take the next step.

Rehashing,
  • The encoding needs to include the ability to encode a single common set of international characters/glyphs in addition to all the national encodings.
  • It needs to include
    • characters,
    • non-character numerics,
    • bitmap and vector image,
    • and other arbitrary (binary/blob) data.
  • It needs to be easily parsed with a simple, regular encoding grammar.
  • And it needs to be open-ended, allowing new words and characters to be coined on-the-fly.
Another path involves CPUs. Intel wants us all to believe that they own the pinnacle of CPU design, but, of course, that is just corporate vanity.

In the embedded world, lots of CPUs that the rest of the world has forgotten are still very much in use, because their designs are optimal in specific engineering contexts. Tradition is also influential, but there are real, tangible engineering reasons that certain non-mainstream CPUs are more effective in certain application areas. The complexity and temporal patterns of the input will favor certain interrupt architectures. The format of the data will favor specific register set constructions. Etc.

Many engineers will acknowledge the old Motorola M6809 as the most advanced 8-bit CPU ever, but it seems to have been a dead-end. ("Seems." It is still in use.) "Bits of it lived on in the 68HC12 and 68HC16." But the conventional wisdom is now that, if you need such an advanced CPU, it's cheaper to go with a low-end 32-bit ARM processor.

What got left behind was the use of a split stack.

The stack is where the CPU keeps a record of where it has been as it chases the branching trails of a problem's solution. When the CPU reaches a dead end, the stack provides an organized structure for backtracking and starting back down new branches in the trail.

Even "stackless" run-time environments tend to imitate stacks in the way they operate, because of a principle called problem context, in addition to the principle of backing out of a non-workable solution.

But the stack doesn't just track where the CPU has been. It also keeps the baggage the CPU carries with it, stuff called local (or context-local) variables. Without the data in that baggage, it does no good for the CPU to try to back up. The data is part and parcel of where it has been.

Most "modern" CPUs keep the code location records in the same memory as the context-local data. It seems more efficient, but it also means that a misbehaving program can easily lose track of both the context data and the code location at once. When that happens, there is no information to analyze about what went wrong. The machine ends up in a partially or completely undefined state.

Worse, in a hostile environment, such a partially defined state provides a chance for attacking the machine and the persistent data that it keeps on the hard disk. (Stack crashes are most effective when the state of the program has already become partially undefined.)
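The combined-stack hazard can be shown with a toy model. This Python sketch assumes a drastically simplified memory layout (one word per slot, return address sitting just above a local buffer); real stack frames are messier, but the failure mode is the same: one write past the end of a local, and the record of where to return is gone.

```python
# Toy model of a combined stack: the saved return address and a local
# buffer share one region, so writing past the buffer clobbers it.

stack = [0] * 8
stack[7] = 0x1234            # saved return address at the top of the frame
BUF_BASE = 3                 # local 4-slot buffer occupies slots 3..6

def write_buf(data):
    # No bounds check -- the classic mistake.
    for offset, value in enumerate(data):
        stack[BUF_BASE + offset] = value

write_buf([0xAA] * 5)        # one slot too many: slot 7 is overwritten
```

After the overflow, the "return address" slot holds attacker-supplied data, and the CPU has no trustworthy record of where it was.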

[JMR201706301711: I've recently written rather extensively on stack vulnerabilities and using a split stack to reduce the vulnerabilities:
]

Splitting the stack allows for more controlled recovery from error states that haven't been provided for. In the process, it reduces the surface area susceptible to attack.

The split stack also provides a more flexible run-time architecture, which can help engineers reduce errors in the code, which means fewer partially-defined states.
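A minimal sketch of the split-stack idea, in the Forth style (and as the 6809's separate S and U pointers encourage): return addresses live on one stack, parameters and locals on another, so a runaway write into the data stack cannot redirect control flow. The two-list model here is of course a simplification, not a hardware design.

```python
# Split-stack sketch: call/return bookkeeping and working data are kept
# in physically separate stacks, as in Forth or a 6809 using S and U.

return_stack = []   # return addresses only
data_stack = []     # parameters and intermediate values only

def call(pc, target):
    return_stack.append(pc + 1)   # remember where to resume
    return target                 # new program counter

def ret():
    return return_stack.pop()

# Even if the data stack is trashed, the return path stays intact:
data_stack.extend([0xAA] * 100)   # simulated runaway data writes
pc = call(10, 42)                 # "call" from location 10 to location 42
```

Overwriting the data stack here is harmless to control flow; to corrupt a return address, a bug would have to reach into a different structure entirely, which is exactly the reduced attack surface the text describes.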

There are a couple of other areas in which so-called modern CPUs in use in desktop computers and portable data devices are not well matched to their target application areas, and the programming languages (and operating systems), reflecting the hardware, are likewise not well-matched. This is especially true of the sort of problems we find ourselves trying to solve, now that we think most of the easy ones have been solved.

In order to flesh out better CPU architectures, I want to build a virtual machine for the old M6809, then add some features like system/user separation, and then design a CPU with an expanded address space and expanded data registers, following the same principles.

I'm pretty sure it will end up significantly different from the old 68K family. (The M6809 is often presented as a "little brother" to the 68K, but they were developed separately, with separate design goals. And Motorola management never really seemed to understand what they had in the 6809.)
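The kind of emulator meant above starts from a fetch-decode-execute core like the following Python sketch. Only two instructions are decoded (LDA and ADDA immediate, which really are opcodes $86 and $8B on the 6809); a real emulator would dispatch the full 6809 opcode map with all its addressing modes, condition codes, and the dual stack pointers.

```python
# Minimal fetch-decode-execute skeleton for a 6809-style emulator.
# Two illustrative instructions only; everything else is unimplemented.

class CPU:
    def __init__(self, memory):
        self.mem = memory
        self.pc = 0
        self.a = 0                # 8-bit accumulator A

    def step(self):
        op = self.mem[self.pc]    # fetch
        self.pc += 1
        if op == 0x86:            # LDA immediate (actual 6809 opcode)
            self.a = self.mem[self.pc]
            self.pc += 1
        elif op == 0x8B:          # ADDA immediate (actual 6809 opcode)
            self.a = (self.a + self.mem[self.pc]) & 0xFF
            self.pc += 1
        else:
            raise NotImplementedError(hex(op))

cpu = CPU([0x86, 0x30, 0x8B, 0x12])   # LDA #$30 ; ADDA #$12
cpu.step()
cpu.step()
```

Growing this into the full machine, then layering system/user separation on top, is mostly a matter of filling out the dispatch and modeling the register set faithfully; the skeleton itself stays this simple.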

Once I have an emulator for the new CPU, I want to develop a language that takes advantage of the CPU's features, allowing for a richer but cleaner procedural API that becomes less of a data and time bottleneck. It should also allow for a more controlled approach to multiprocessing.

And then I want to build a new operating system on what this language and CPU combination would allow, one which would allow the user to be in control of his tools, instead of the other way around.

This is what I mean when I say I am trying to re-invent the industry.