Misunderstanding Computers

Why do we insist on seeing the computer as a magic box for controlling other people?
Why do we want so much to control others when we won't control ourselves?

Computer memory is just fancy paper, CPUs are just fancy pens with fancy erasers, and the network is just a fancy backyard fence.

(original post -- defining computers site)

Thursday, November 28, 2024

How Address Function and Caches Should Interact (6801-derived Example)

In addition to daydreaming about address function, segmentation, caches and the 6809, I daydream similar things about the 6801.

But the 6801 does not have Y, U, or DP. 

So if I want to daydream about a 6801-derived CPU that provides address function output to distinguish between the sources of addresses, with something like the following,

  • 000 for code
  • 001 for interrupt vectors
  • 010 for U relative (parameter stack)
  • 011 for S relative (return address stack)
  • 100 for general data (extended/absolute, X, Y)
  • 101 for direct page
  • 110 and 111 for DMA

with customized caches for each (since the access patterns for each are so different), I'll have to daydream about extending the 6801 along a somewhat different path from the 68HC11.
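To make the encoding above concrete, here is a minimal Python sketch that maps the kind of access the CPU is making to the 3-bit function code it would drive on the function output pins. The names `AddrFunc`, `address_function`, and the access-source strings are hypothetical labels for illustration, not part of any real part or datasheet.

```python
from enum import IntEnum


class AddrFunc(IntEnum):
    """Hypothetical 3-bit address function codes, following the list above."""
    CODE = 0b000      # opcode and operand fetches
    VECTOR = 0b001    # interrupt/reset vector fetches
    U_STACK = 0b010   # U relative (parameter stack)
    S_STACK = 0b011   # S relative (return address stack)
    GENERAL = 0b100   # extended/absolute, X, Y
    DIRECT = 0b101    # direct page
    DMA0 = 0b110
    DMA1 = 0b111


# Hypothetical access sources; each would steer the access to its own cache.
_SOURCE_TO_FUNC = {
    "opcode_fetch": AddrFunc.CODE,
    "operand_fetch": AddrFunc.CODE,     # immediates and address bytes come from the code stream
    "vector_fetch": AddrFunc.VECTOR,
    "u_relative": AddrFunc.U_STACK,
    "s_push_pull": AddrFunc.S_STACK,    # JSR/RTS/interrupt stacking
    "extended": AddrFunc.GENERAL,
    "x_indexed": AddrFunc.GENERAL,
    "y_indexed": AddrFunc.GENERAL,
    "direct": AddrFunc.DIRECT,
}


def address_function(source: str) -> AddrFunc:
    """Return the function code the CPU would output for this kind of access."""
    return _SOURCE_TO_FUNC[source]


print(address_function("s_push_pull").name, format(address_function("direct"), "03b"))
```

Because the code travels with every bus cycle, each cache only has to look at the function bits to decide whether an access belongs to it.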

And wanting to be able to do things like
  • indexing into ROMmed jump tables at run time,
  • copying strings out of ROMs,
  • using X or Y to access arrays on the parameter stack,
  • and such.

just means we're going to need the more general solution. Instead of just adding address function output bits and relying on bank switching, I have to daydream about widening the address bus and extending the index registers, either with segment registers (preferably paired with limit registers) or with simple extra bits in the registers.
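As one way to picture the segment-plus-limit option, the following Python sketch translates a 16-bit effective address through a segment base and checks it against a limit register. The class names, the fault behaviour, and the 21-bit physical width are all assumptions chosen for illustration.

```python
class SegmentFault(Exception):
    """Raised when an offset falls outside the segment's limit (hypothetical trap)."""


class Segment:
    """Hypothetical segment register paired with a limit register."""

    def __init__(self, base: int, limit: int, phys_bits: int = 21):
        self.base = base                     # physical start of the segment
        self.limit = limit                   # highest valid offset within the segment
        self.mask = (1 << phys_bits) - 1     # assumed physical address width

    def translate(self, offset: int) -> int:
        offset &= 0xFFFF                     # the CPU still forms 16-bit effective addresses
        if offset > self.limit:
            raise SegmentFault(f"offset {offset:#06x} exceeds limit {self.limit:#06x}")
        return (self.base + offset) & self.mask


# A 16K ROM segment placed at physical $100000, e.g. for copying strings out of ROM.
rom = Segment(base=0x100000, limit=0x3FFF)
print(hex(rom.translate(0x0123)))
```

The limit check is what would let a runaway index into a ROMmed jump table trap instead of silently reading whatever lies beyond it.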

And if the direct page (page zero) is going to be separate from absolute addresses, I'm going to need some register to distinguish them. Maybe absolute addressing would only reach the low 64K, or maybe I'd add an absolute base page register to extend absolute addresses out to whatever maximum. And this CPU would get a DP register like the 6809's, I guess. Maybe it would only have 8 bits, and the direct page would only be in the low 64K. Or maybe even less. Or maybe even split the direct page so that the low half (I/O) and the high half (pseudo-registers) could be moved independently. I can decide all that later.

8 bits of extension would be more than enough for 6801-sized applications, but would 5 bits be enough? 5 bits of ordinary address plus 3 bits of function would be 8 bits of extension, for 24 bits total.

21 bits of address is 2 megabytes per address space. I can see that being a bit tight for some things that I'd still want to do on a fundamentally 8-bit CPU, but it's okay to leave that question for later.
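The arithmetic can be checked with a short Python sketch that packs a 3-bit function code above a 21-bit address to form a 24-bit bus value. Putting the function bits at the top is an assumed layout, picked only so that each function gets its own contiguous 2 megabyte space.

```python
FUNC_BITS = 3
ADDR_BITS = 16 + 5                      # 16-bit 6801 address plus 5 extension bits
BUS_BITS = FUNC_BITS + ADDR_BITS        # 24
EXTENSION_BITS = BUS_BITS - 16          # 8 = 5 address + 3 function
SPACE_BYTES = 1 << ADDR_BITS            # bytes per address space: 2 MiB
TOTAL_BYTES = 1 << BUS_BITS             # all eight function spaces together: 16 MiB


def bus_value(func: int, addr: int) -> int:
    """Pack a function code above an address (hypothetical layout)."""
    if not 0 <= func < (1 << FUNC_BITS):
        raise ValueError("function code out of range")
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address out of range")
    return (func << ADDR_BITS) | addr


def split_bus(value: int) -> tuple:
    """Recover (function code, address) from a packed bus value."""
    return value >> ADDR_BITS, value & ((1 << ADDR_BITS) - 1)


print(BUS_BITS, EXTENSION_BITS, SPACE_BYTES, hex(bus_value(0b011, 0x1FFFF0)))
```

With 3 bits spent on function, every extra address bit doubles each space, which is why 5 versus 8 extension bits is the real trade-off here.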

(Just for reference, the 6809 is actually fundamentally a 16-bit CPU that works 8 bits at a time. It's the LEA instructions that make it so.)

In addition to Y, we'll need a U stack pointer for the parameter stack. It would be nice to have U indexable like X and Y, but by providing TXU, TUX, TYU, and TUY, we could get the local variable addressing we need without thrashing the index registers so much. So if making U indexable added significant complexity, we could leave that out and rely on the widened X and Y to give us access to the parameters.

We can keep the constant 8-bit unsigned offsets of the 6801/HC11, but we need to add an SBX instruction to subtract B from X for real pointer math, and we need the corresponding ABY/SBY.

ABU and SBU would be nice, if we are going to include direct indexing of the parameter stack pointer.

And, incidentally, add immediate to index register is something I really wish Motorola had added to the 6801, either as 16-bit signed AIX (and AIY, and maybe AIU, in this CPU) or as unsigned 8-bit AIX/SIX (and AIY/SIY, and maybe AIU/SIU).
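The semantics of these index-register operations can be sketched in Python. ABX is the real 6801 instruction (X plus unsigned B, no flags affected); SBX, AIX, and SIX are the proposed additions, and modeling the index register as 21 bits wide is an assumption carried over from the address-width estimate. The Y and U versions would behave the same way on their own registers.

```python
INDEX_BITS = 21                          # assumed widened index register
INDEX_MASK = (1 << INDEX_BITS) - 1


def _wrap(value: int) -> int:
    """Keep a result within the index register width."""
    return value & INDEX_MASK


def abx(x: int, b: int) -> int:
    """ABX: X <- X + B, with B treated as unsigned."""
    return _wrap(x + (b & 0xFF))


def sbx(x: int, b: int) -> int:
    """Proposed SBX: X <- X - B, with B treated as unsigned."""
    return _wrap(x - (b & 0xFF))


def aix16(x: int, imm16: int) -> int:
    """Proposed AIX, signed form: the 16-bit immediate is sign-extended first."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:
        imm16 -= 0x10000
    return _wrap(x + imm16)


def aix8(x: int, imm8: int) -> int:
    """Proposed AIX, unsigned 8-bit form."""
    return _wrap(x + (imm8 & 0xFF))


def six8(x: int, imm8: int) -> int:
    """Proposed SIX, unsigned 8-bit form."""
    return _wrap(x - (imm8 & 0xFF))


print(hex(abx(0xFFFF, 1)), hex(aix16(0x10000, 0xFFFE)))
```

Sign extension only matters once the register is wider than the immediate: with a plain 16-bit X, adding $FFFE and adding -2 give the same result, but with a 21-bit X they do not.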

With all the transferring and adding, maybe we just need to trade the implicit operand instructions for register-register instructions. 

This kind of extension would give us enough to allow PC caching by filling ahead, plus filling one branch target, checking first whether the branch code is already in the cache. It might not need as much as the 6809, since total instruction length is shorter? But then there will be the long address forms, so probably 32 bytes for each branch.
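Here is one possible reading of "fill ahead plus one branch of fill" as a Python sketch: a fall-through stream buffer that refills ahead of the PC, and a second buffer pre-filled at a decoded branch target unless that target is already cached. The class names, the `see_branch` hook, and the swap-on-taken policy are assumptions, not a worked-out design.

```python
class StreamBuffer:
    """A contiguous run of code bytes copied from memory."""

    def __init__(self, base: int, data: bytes):
        self.base = base
        self.data = data

    def holds(self, addr: int) -> bool:
        return self.base <= addr < self.base + len(self.data)

    def read(self, addr: int) -> int:
        return self.data[addr - self.base]


class FillAheadCache:
    """Hypothetical PC cache: one fall-through stream plus one branch-target stream."""

    def __init__(self, memory: bytes, line: int = 32):
        self.mem = memory
        self.line = line                # bytes per stream, the 32-bytes-per-branch estimate
        self.fills = 0                  # number of stream fills from memory
        self.seq = self._fill(0)        # fall-through stream
        self.alt = None                 # stream pre-filled at one branch target

    def _fill(self, addr: int) -> StreamBuffer:
        self.fills += 1
        return StreamBuffer(addr, self.mem[addr:addr + self.line])

    def see_branch(self, target: int) -> None:
        """Called when a branch is decoded: pre-fill its target unless already cached."""
        if self.seq.holds(target) or (self.alt is not None and self.alt.holds(target)):
            return
        self.alt = self._fill(target)

    def fetch(self, pc: int) -> int:
        if not self.seq.holds(pc):
            if self.alt is not None and self.alt.holds(pc):
                self.seq, self.alt = self.alt, None   # branch taken: target stream takes over
            else:
                self.seq = self._fill(pc)             # miss: restart the stream at the PC
        byte = self.seq.read(pc)
        if not self.seq.holds(pc + 1):
            self.seq = self._fill(pc + 1)             # fill ahead of the PC
        return byte


cache = FillAheadCache(bytes(range(256)))
for pc in range(32):
    cache.fetch(pc)
cache.see_branch(0x80)
print(cache.fetch(0x80), cache.fills)
```

Real hardware would fill in the background while executing; the sketch just counts fills so the effect of the "already cached?" check is visible.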

For the return address stack cache, spill-fill with 2/4 centered hysteresis and eight entries would probably be good enough, or maybe 32 entries if we simply want to make sure most embedded applications would not need an external return address stack.
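A spill-fill stack cache can be sketched in Python as below. The exact meaning of "2/4 centered hysteresis" is not spelled out, so this sketch assumes one reading: spills and fills move a burst of half the capacity, which leaves occupancy near the middle and keeps call depth oscillating around a boundary from thrashing. The class, its counters, and the burst rule are assumptions.

```python
from typing import List, Optional


class SpillFillStack:
    """Hypothetical on-chip stack cache that spills to and fills from external RAM in bursts."""

    def __init__(self, capacity: int = 8, burst: Optional[int] = None):
        self.capacity = capacity
        self.burst = burst if burst is not None else capacity // 2  # assumed: half the cache
        self.cache: List[int] = []      # on-chip entries, oldest first
        self.memory: List[int] = []     # stand-in for the external return stack
        self.spills = 0
        self.fills = 0

    def push(self, addr: int) -> None:
        if len(self.cache) == self.capacity:
            # Spill the oldest entries so occupancy drops back toward the middle.
            self.memory.extend(self.cache[:self.burst])
            del self.cache[:self.burst]
            self.spills += 1
        self.cache.append(addr)

    def pull(self) -> int:
        if not self.cache:
            if not self.memory:
                raise IndexError("return stack underflow")
            # Fill a burst of the most recently spilled entries back on chip.
            n = min(self.burst, len(self.memory))
            self.cache = self.memory[-n:]
            del self.memory[-n:]
            self.fills += 1
        return self.cache.pop()


stack = SpillFillStack(capacity=8)
for ret in range(10):
    stack.push(ret)
print([stack.pull() for _ in range(10)], stack.spills, stack.fills)
```

With eight entries, ten nested calls cost one spill and one fill; the same structure with a byte-wide burst would serve for the parameter stack.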

64 bytes of spill-fill parameter stack cache with hysteresis would probably be plenty.

Taking a different approach to the direct page, the DP could be banked in small blocks. 32 bytes per block seems best to me, but maybe 64 if that's enough simpler. Only addresses $80 to $FF would be mapped to RAM, since addresses $00 to $7F would be for I/O.

Rather than automatic caching, the DP RAM would then just be internal RAM, 1 kilobyte addressed from $400, with the bank switching circuit dual-mapping small blocks down to blocks in the $80-$FF range. The bank switching circuit might be tied to an active process number. We can figure out the details later.
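The address translation for this banked direct page can be sketched in Python, assuming 32-byte blocks, four windows covering $80 to $FF, and one bank register per window selecting one of the 32 blocks of internal RAM at $400. The bank registers and the `select`/`translate` names are hypothetical; the internal RAM would also stay visible at its own addresses, which is the dual mapping.

```python
IO_END = 0x80                              # $00-$7F: I/O, never remapped
BLOCK = 32                                 # bytes per DP block
WINDOWS = (0x100 - IO_END) // BLOCK        # 4 windows in $80-$FF
RAM_BASE = 0x400                           # internal RAM addressed from $400
RAM_SIZE = 1024
BLOCKS = RAM_SIZE // BLOCK                 # 32 blocks of internal RAM


class DirectPageMapper:
    """Hypothetical bank-switching circuit for the upper half of the direct page."""

    def __init__(self):
        self.bank = list(range(WINDOWS))   # one bank register per window

    def select(self, window: int, block: int) -> None:
        if not (0 <= window < WINDOWS and 0 <= block < BLOCKS):
            raise ValueError("window or block out of range")
        self.bank[window] = block

    def translate(self, dp_offset: int) -> tuple:
        """Map a direct-page offset to ('io', addr) or ('ram', internal RAM address)."""
        dp_offset &= 0xFF
        if dp_offset < IO_END:
            return ("io", dp_offset)
        window = (dp_offset - IO_END) // BLOCK
        return ("ram", RAM_BASE + self.bank[window] * BLOCK + dp_offset % BLOCK)


mapper = DirectPageMapper()
mapper.select(1, 5)                        # window $A0-$BF now shows block 5 ($4A0-$4BF)
print(mapper.translate(0x10), hex(mapper.translate(0xA3)[1]))
```

Tying the bank registers to an active process number would then amount to reloading `bank` on a context switch.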

Again, maybe not even bother caching general data accesses.

How Address Function and Caches Should Interact (6809 Example)

I keep daydreaming about a 6809-derived CPU that provides address function output (like the 68000's address functions, but not the same) to allow distinguishing between the sources of addresses, say,
  • 000 for PC relative (code)
  • 001 for interrupt/reset vectors
  • 010 for U relative (parameter stack)
  • 011 for S relative (return address stack)
  • 100 for general data (extended/absolute, X, Y)
  • 101 for direct page relative
  • 110 and 111 for DMA, maybe.

And customized caches for each, since access patterns for each are so different.

And I keep remembering that with 16-bit addressing, even this would be barely big enough for a limited Fuzix.

And I keep remembering that X and Y, being general index registers, will need extra bits if you want to do things like

  • indexing into ROMmed jump tables at run time,
  • copying strings out of ROMs,
  • using X or Y to access arrays on the parameter stack,
  • and such.

And then I remember that widening the address bus and the index registers is the more general solution -- either via segment registers (which I think I would pair with limit registers) or by simply widening the registers. And then I wander off into ways of extending the address range of the extended/absolute mode and DP-relative, maybe adding a second DP-like register for I/O, and such. (Widened DP relative and IOPage relative addressing could be added via unused index mode bit patterns in the indexed mode post-byte, of course.)

(8 bits of index extension should be plenty for the 6809, but we still want to be able to encode the address function in 3 bits, so maybe 16 bits of extension total.)

And I end up forgetting the cache gadgetry that got me running back over this ground again.

In this scheme, the PC cache just needs to fill ahead, and also fill in a second direction across one branch, checking first whether the branch code is already in the cache. 32 bytes per branch, 64 bytes total, would be plenty on a 6809.

The return address stack cache just needs spill-fill with 2/4 hysteresis in the middle. Eight entries would speed calls and returns a lot; thirty-two entries would be enough to eliminate the need for return stack RAM in many embedded applications.

The parameter stack cache would have to be a bit more flexible than the return address stack cache, but would be similar in operation.

The DP cache would be a bit weird: 256 bytes of cache in two, four, or eight banks, with an address tag for each bank and some sort of mechanism to automatically save a direct page that has been switched out, perhaps after another has been switched in, and so forth.
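A minimal Python sketch of a tagged, banked DP cache follows. It assumes each bank holds one whole 256-byte direct page tagged by its DP value, that the least recently selected page is evicted when a new DP value misses, and that the evicted page is written back unconditionally (a real design would probably track dirty banks). The class and method names are hypothetical.

```python
from collections import OrderedDict

PAGE = 256                                  # bytes per direct page


class DirectPageCache:
    """Hypothetical DP cache: a few banks, each tagged with the DP value it holds."""

    def __init__(self, memory: bytearray, banks: int = 4):
        self.mem = memory                   # 64K backing store
        self.banks = banks
        self.lines = OrderedDict()          # tag (DP value) -> page copy, least recent first
        self.writebacks = 0
        self.set_dp(0)                      # like the 6809, come out of reset with DP = 0

    def set_dp(self, dp: int) -> None:
        dp &= 0xFF
        if dp in self.lines:
            self.lines.move_to_end(dp)      # hit: just mark it most recently used
        else:
            if len(self.lines) == self.banks:
                old_dp, data = self.lines.popitem(last=False)
                self.mem[old_dp * PAGE:(old_dp + 1) * PAGE] = data  # save the switched-out page
                self.writebacks += 1
            self.lines[dp] = bytearray(self.mem[dp * PAGE:(dp + 1) * PAGE])
        self.dp = dp

    def read(self, offset: int) -> int:
        return self.lines[self.dp][offset & 0xFF]

    def write(self, offset: int, value: int) -> None:
        self.lines[self.dp][offset & 0xFF] = value & 0xFF


mem = bytearray(0x10000)
cache = DirectPageCache(mem, banks=2)
cache.set_dp(0x01)
cache.write(0x10, 0xAA)
cache.set_dp(0x02)
cache.set_dp(0x03)                          # evicts DP $01, saving it to memory
print(hex(mem[0x0110]), cache.writebacks)
```

Keeping separate instances for system and user code would be one way to get the fast context switching mentioned below.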

Of course, there would be separate system and user program caches for fast process context switching.

And, with all that, maybe not even bother caching general data accesses.

(Analyzing this for the 6801 here: https://defining-computers.blogspot.com/2024/11/how-address-function-and-caches-should-interact-6801-derived-example.html.)