Misunderstanding Computers

Why do we insist on seeing the computer as a magic box for controlling other people?
Why do we want so much to control others when we won't control ourselves?

Computer memory is just fancy paper, CPUs are just fancy pens with fancy erasers, and the network is just a fancy backyard fence.

(original post -- defining computers site)

Thursday, November 28, 2024

How Address Function and Caches Should Interact (6801-derived Example)

In addition to daydreaming about address function, segmentation, caches and the 6809, I daydream similar things about the 6801.

But the 6801 does not have Y, U, or DP. 

So if I want to daydream about a 6801-derived CPU that provides address function output to distinguish the source of each address, something like the following,

  • 000 for code
  • 001 for interrupt vectors
  • 010 for U relative (parameter stack)
  • 011 for S relative (return address stack)
  • 100 for general data (extended/absolute, X, Y)
  • 101 for direct page
  • 110 and 111 for DMA

with customized caches for each, since the access patterns for each are so different, I'll have to daydream about extending the 6801 along a somewhat different path from the 68HC11.

And being able to do things like
  • indexing into ROMmed jump tables at run time,
  • copying strings out of ROMs,
  • using X or Y to access arrays on the parameter stack,
  • and such

just means we're going to use the more general solution. Instead of just adding address function output bits and relying on bank switching, I have to daydream about widening the address bus and extending the index registers, either with segment registers (preferably paired with limit registers) or simple extra bits in the registers.

And if the direct page (page zero) is going to be separate from absolute addresses, I'm going to need some register to distinguish them. Maybe absolute would only be the low 64K, or maybe I'd add an absolute base page register to extend absolute addresses out to whatever maximum. And this CPU would get a DP register like the 6809's, I guess. Maybe it would only have 8 bits, and direct page would only be in the low 64K. Or even maybe less. Or even split the direct page so that the low half (I/O) and the high half (pseudo-registers) could be moved independently. Decide all that later.

8 bits of extension would be more than enough for 6801-sized applications, but would 5 bits? 5 for ordinary address plus 3 for function would be 8 bits of extension, for 24 bits, total.

21 bits of address is 2 megabytes per address space. I can see that being a bit tight for some things that I'd still want to do on a fundamentally 8-bit CPU, but it's okay to leave that question for later.
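The arithmetic above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: that the 3 function bits sit at the top of the 24-bit bus address, with the 5 extension bits directly below them, and that the names `AddrFunc` and `bus_address` are hypothetical labels, not anything Motorola defined.

```python
from enum import IntEnum

class AddrFunc(IntEnum):
    """Address function codes from the list above (hypothetical names)."""
    CODE = 0b000
    VECTOR = 0b001
    PSTACK = 0b010   # U relative (parameter stack)
    RSTACK = 0b011   # S relative (return address stack)
    DATA = 0b100     # extended/absolute, X, Y
    DIRECT = 0b101   # direct page
    DMA0 = 0b110
    DMA1 = 0b111

FUNC_BITS = 3        # address function
EXT_BITS = 5         # ordinary address extension
LOGICAL_BITS = 16    # the original 6801 address

# Size of one address space: 21 bits, which is 2 megabytes.
SPACE_SIZE = 1 << (EXT_BITS + LOGICAL_BITS)

def bus_address(func, ext, logical):
    """Assemble a 24-bit bus address (assumed layout: func | ext | logical)."""
    if not 0 <= ext < (1 << EXT_BITS):
        raise ValueError("extension must fit in 5 bits")
    if not 0 <= logical < (1 << LOGICAL_BITS):
        raise ValueError("logical address must fit in 16 bits")
    return (int(func) << (EXT_BITS + LOGICAL_BITS)) | (ext << LOGICAL_BITS) | logical

if __name__ == "__main__":
    print(hex(SPACE_SIZE))
    print(hex(bus_address(AddrFunc.RSTACK, 0, 0x1234)))
```

With this layout each function code selects its own 2-megabyte space, which is what lets the caches be kept separate per function.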

(The 6809 is actually fundamentally 16-bit, 8-at-a-time, just for reference. It's the LEA instructions.)

In addition to Y, we'll need an additional U stack for the parameters. It would be nice to have it indexable like X and Y, but by providing TXU, TUX, TYU, and TUY, we could get the local variable addressing we need without thrashing the index registers so much. So if making U indexable induced significant complexity, we could leave that out and rely on the widened X and Y to give us access to the parameters.

We can keep the constant 8-bit unsigned offsets of the 6801/HC11, but we need to add an SBX instruction to subtract B from X to do real pointer math, and we need the corresponding ABY/SBY.

ABU and SBU would be nice, if we are going to include direct indexing of the parameter stack pointer.
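As a rough model of the pointer math I mean, here is a sketch in Python. ABX is the real 6801 instruction (X plus unsigned B, 16-bit result). SBX is the hypothetical complement from this daydream, and I am assuming it would also treat B as unsigned and wrap at 16 bits.

```python
def abx(x, b):
    """6801 ABX: add unsigned 8-bit B to 16-bit X, wrapping at 16 bits."""
    return (x + (b & 0xFF)) & 0xFFFF

def sbx(x, b):
    """Hypothetical SBX: subtract unsigned 8-bit B from 16-bit X (assumed wrap)."""
    return (x - (b & 0xFF)) & 0xFFFF

if __name__ == "__main__":
    frame = sbx(0x0100, 10)      # allocate 10 bytes below a pointer
    print(hex(frame))
    print(hex(abx(frame, 10)))   # and release them again
```

ABY/SBY and ABU/SBU would behave the same way on their own registers.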

And, incidentally, add immediate to index register is something I really wish Motorola had added for the 6801, either as 16-bit signed AIX (and AIY and maybe AIU in this CPU), or as unsigned 8-bit AIX/SIX (and AIY/SIY and maybe AIU/SIU).

With all the transferring and adding, maybe we just need to trade the implicit operand instructions for register-register instructions. 

This kind of extension would give us enough to allow PC caching by fill-ahead with one branch of fill, checking whether the branch code is in cache first. Wouldn't need as much as the 6809, since total instruction length is shorter, maybe? But then there will be the long address forms, so probably 32 bytes each branch.

Return address stack cache with spill-fill, 2/4 centered hysteresis, eight entries would probably be good enough, maybe 32 if we simply want to make sure (most) embedded applications would not need external return address stack.

64 bytes of spill-fill parameter stack cache with hysteresis would probably be plenty.
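To make the spill-fill idea concrete, here is a small Python model. It is a sketch under my own reading of the hysteresis: when the cache is full, the oldest entries spill to external memory until the cache is three-quarters full, and when the cache is empty, it refills to one-quarter full. The class name and the counters are hypothetical, just for illustration.

```python
class SpillFillStackCache:
    """Toy model of a stack cache with spill/fill hysteresis (assumed marks)."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.high_mark = (3 * capacity) // 4    # spill down to this level
        self.low_mark = max(1, capacity // 4)   # fill up to this level
        self.cache = []     # on-chip entries, top of stack is cache[-1]
        self.memory = []    # spilled entries, most recent spill is memory[-1]
        self.spills = 0     # entries written out to memory
        self.fills = 0      # entries read back from memory

    def push(self, value):
        if len(self.cache) == self.capacity:
            # Cache is full: move the oldest entries out to memory.
            count = self.capacity - self.high_mark
            self.memory.extend(self.cache[:count])
            del self.cache[:count]
            self.spills += count
        self.cache.append(value)

    def pop(self):
        if not self.cache:
            if not self.memory:
                raise IndexError("stack underflow")
            # Cache is empty: bring back the most recently spilled entries.
            count = min(self.low_mark, len(self.memory))
            self.cache = self.memory[-count:]
            del self.memory[-count:]
            self.fills += count
        return self.cache.pop()

if __name__ == "__main__":
    stack = SpillFillStackCache(8)
    for address in range(1, 21):
        stack.push(address)
    print(stack.spills, [stack.pop() for _ in range(20)], stack.fills)
```

The point of the hysteresis is that a call/return pair bouncing around the full or empty boundary does not touch external memory on every push and pop.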

Taking a different approach to the direct page, the DP could be banked in small blocks. 32 bytes per block seems best to me, but maybe 64 if it's enough simpler. Only addresses $80 to $FF would be cached to RAM, since addresses $00 to $7F would be for I/O.

Rather than automatic caching, the DP RAM would then just be internal RAM, 1 kilobyte addressed from $400, with the bank switching circuit dual-mapping small blocks down to blocks in the $80-$FF range. The bank switching circuit might be tied to an active process number. We can figure out the details later.
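A sketch of that dual-mapping, in Python. The numbers come from the paragraph above (1 kilobyte at $400, 32-byte blocks, window range $80-$FF). The rest is assumption: that each 32-byte window has its own block-select register, that I/O addresses below $80 pass through untouched, and the names `DirectPageBanker`, `select`, and `translate`.

```python
RAM_BASE = 0x400          # internal DP RAM starts here
RAM_SIZE = 0x400          # 1 kilobyte
BLOCK_SIZE = 32           # bytes per bank-switched block
WINDOW_BASE = 0x80        # $00-$7F is left for I/O
WINDOW_COUNT = (0x100 - WINDOW_BASE) // BLOCK_SIZE   # 4 windows
BLOCK_COUNT = RAM_SIZE // BLOCK_SIZE                 # 32 blocks

class DirectPageBanker:
    """Toy model: one block-select register per 32-byte window (assumed)."""

    def __init__(self):
        # Start with windows 0..3 showing blocks 0..3.
        self.window_block = list(range(WINDOW_COUNT))

    def select(self, window, block):
        if not 0 <= window < WINDOW_COUNT:
            raise ValueError("no such window")
        if not 0 <= block < BLOCK_COUNT:
            raise ValueError("no such block")
        self.window_block[window] = block

    def translate(self, dp_address):
        """Map a direct-page address to the internal RAM address it aliases."""
        if not 0 <= dp_address <= 0xFF:
            raise ValueError("not a direct-page address")
        if dp_address < WINDOW_BASE:
            return dp_address     # I/O half, not banked in this sketch
        window, offset = divmod(dp_address - WINDOW_BASE, BLOCK_SIZE)
        return RAM_BASE + self.window_block[window] * BLOCK_SIZE + offset

if __name__ == "__main__":
    banker = DirectPageBanker()
    banker.select(1, 31)
    print(hex(banker.translate(0xA5)))
```

Tying `select` to an active process number would just mean loading the four window registers from a small per-process table on a context switch.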

Again, maybe not even bother caching general data accesses.

How Address Function and Caches Should Interact (6809 Example)

I keep daydreaming about a 6809-derived CPU that provides address function output (like the 68000's address functions, but not the same) to allow distinguishing between the source of the address, say,
  • 000 for PC relative (code)
  • 001 for interrupt/reset vectors
  • 010 for U relative (parameter stack)
  • 011 for S relative (return address stack)
  • 100 for general data (extended/absolute, X, Y)
  • 101 for direct page relative
  • 110 and 111 for DMA, maybe.

And customized caches for each, since access patterns for each are so different.

And I keep remembering that with 16-bit addressing, even this would be barely big enough for limited Fuzix.

And I keep remembering that X and Y, being general index registers, will need extra bits if you want to do things like

  • indexing into ROMmed jump tables at run time,
  • copying strings out of ROMs,
  • using X or Y to access arrays on the parameter stack,
  • and such.

And then I remember that widening the address bus and the index registers is the more general solution -- either via segment registers (which I think I would pair with limit registers) or by simply widening the registers. And then I wander off into ways of extending the address range of the extended/absolute mode and DP-relative, maybe adding a second DP-like register for I/O, and such. (Widened DP relative and IOPage relative addressing could be added via unused index mode bit patterns in the indexed mode post-byte, of course.)

(8 bits of index extension should be plenty for the 6809, but we want to be able to still encode address function in 3 bits, so maybe 16 bits of extension total.)

And forgetting the cache gadgetry that got me running back over this ground again.

In this scheme, PC cache just needs to fill ahead, and fill a second direction over one branch, checking whether the branch code is in cache first. 32 bytes per branch, 64 total would be plenty on a 6809.

Return address stack cache just needs spill-fill with 2/4 hysteresis in the middle. Eight entries would speed calls and returns a lot. Thirty-two entries would be enough to eliminate the need for return stack RAM on many embedded applications.

Parameter stack would have to be a bit more flexible than the return address stack, but would be similar in operation.

The DP cache would be a bit weird: 256 bytes of cache in two, four, or eight banks, an address tag for each bank, and some sort of mechanism to automatically save a direct page that has been switched out, perhaps after another has been switched in, and so forth.

Of course, there would be system cache and user program cache for fast process context switching.

And, with all that, maybe not even bother caching general data accesses.

(Analyzing this for the 6801 here: https://defining-computers.blogspot.com/2024/11/how-address-function-and-caches-should-interact-6801-derived-example.html.)

Sunday, September 22, 2024

A Bird's-eye View of an Alternate History Roadmap of the 680X and 680X0 CPUs

About the time I wrote a critique of the 680X and 680XX CPUs, I also started writing an alternate history roadmap for how they developed, but I got mired down in details and the alternate history novel I was trying to write then.

Now, after a few years, I'm working on a tutorial, and I wanted something to refer to. (Partly for the construction of the tutorial, but also as my personal notes from which I hope to eventually synthesize some workable extensions to the 680X and 680XX, should I ever be able to properly retire and play with programmable logic devices. Probably only decodable by me.)

So I'm going to write that alternate history roadmap here, but deliberately leave out details. Each stage will list extensions in order of my perceived priority and probably order of development.

Of course, this should all be considered to be some random wacko daydreaming out loud on the Internet. 

 

68/2805

Adds a parameter stack to 6805. Implements dual-port return address stack RAM to optimize calls and returns.

 

68/3808

Adds indexable parameter stack and Y register to 68HC08, along with 68/2805 optimized call/return via dual-port return stack, etc.


68/0801

Extends the 6801, breaking object code compatibility, but maintaining one-to-one mapping from both 6800 and 6801 object code to 68/0801 object code.

(1) Add SBX as complement of ABX, to aid in stack allocation, tracing relative links, and such.

(2) Move the inherent addressing mode op-codes around to facilitate adding direct-page op-codes for the unary (read-modify-write) instructions (INC/DEC, ROL/R, A/LSL/R, etc.). This would make the direct page more effective as pseudo-registers and ease the register pinch, and provide better support for such things as virtual machine registers and (hard-coded) per-task global variables.

ROM code would not be usable as-is, but a simple re-assemble would suffice in many cases. In those cases which relied on the actual object-code values of operators (as for brittle optimizations), the assembler could flag jump and branch targets and fall-through that do not end up at valid instruction boundaries, and could analyze the value of the bytes at the target, issuing an error along with an attempt to interpret the assembled value at the target as an instruction. This would allow the engineer to determine how to either provide a similar optimization or (more likely) replace the brittle optimization with something from the improved instruction set.

(3) Add 4 bits of address function outputs to allow decoding program ROM, general data, direct page, and stack separately. This would make it possible for the sum of system/user code, data, stack, and DP (I/O, global pseudo-registers) to be greater than 64K, make bank-switching more effective, allow isolation of system and user spaces, allow isolation of the return address stack, and potentially allow adding segment registers for engineers who like segment registers.

The index register (X) would be extended by the function code bits, to allow it to index the separate address spaces. Transfer of stack to X would set the appropriate function code, and operators for setting the function code for other spaces would be provided. (How to do this with system/user space separation?)

(4) Replace the 6801's ASLD and LSRD inherent instructions with a full set of 16-bit unary/RMW shifts ASL16/LSR16 (D/DP/[X]/EXT), and add 16-bit unary/RMW INC16/DEC16 (D/DP/[X]/EXT). This would be especially useful for things like synthesizing generalized stacks and queues and such in software, and in multiply/divide by constant powers of 2, etc.

(5) Built-in dual-port RAM blocks to optimize DP RAM and return stack RAM (call/return optimization). 

(6a) Add (two sets each for user and system) two 5-bit prefix registers for the DP lower and upper halves. The lower half in system space would be nominally for I/O, and upper/non-system would be nominally for global pseudo-register RAM. This would allow allocating DP spaces within the lower 8K of the physical address space, or within an 8K separate DP space, where internal dual-port RAM and I/O registers could be provided, and external I/O space decoded.

(6b) Optional standard bank-switching schemes for code and data spaces, for 128K to 512K, or for 2M to 16M. Bank switching includes address function decoding, to support isolation of spaces, and also includes support for switching between user and system spaces safely.

(6c) For engineers who like segment registers, optional internal segment registers for user/system data, code, and return stack spaces, properly organized to support the dual-port RAM blocks for the process return stacks. (How to integrate this with separate stack and DP?) DP would not have additional segmentation.
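For the two 5-bit DP prefix registers in (6a), here is how I picture the address formation, as a Python sketch. This is my reading, not a settled design: I assume the prefix supplies the upper 5 bits above the 8-bit direct address (5 + 8 = 13 bits, which matches the 8K figure), and that bit 7 of the direct address picks which prefix applies. The function name and parameter names are hypothetical, and the separate user/system register sets are left out.

```python
def dp_effective(direct, low_prefix, high_prefix):
    """Form a 13-bit DP address from an 8-bit direct address (assumed scheme).

    Direct addresses $00-$7F use low_prefix (nominally I/O),
    $80-$FF use high_prefix (nominally pseudo-register RAM).
    """
    direct &= 0xFF
    prefix = high_prefix if direct & 0x80 else low_prefix
    return ((prefix & 0x1F) << 8) | direct

if __name__ == "__main__":
    # I/O half left in page 0, pseudo-registers moved to the top of the 8K.
    print(hex(dp_effective(0x12, 0x00, 0x1F)))
    print(hex(dp_effective(0x92, 0x00, 0x1F)))
```

Moving only the upper prefix on a task switch would give each task its own pseudo-registers while the I/O half stays put.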

 

68/0800

This would be a 68/0801 without built-in peripherals, ROM, RAM, etc., and with pin-out and timing that matches the 6800, to be used as a near-drop-in replacement for the 6800 -- requiring source code re-assembly, but with a high probability of being able to match the original timings. 

 

68/2801

These extensions to the 68/0801 would require op-code map extensions through one or more pre-bytes, and would pretty much implement all our real world's 68HC11 instruction and interrupt functionality, plus a separate parameter stack. Indexing from the second stack would still be by way of index registers.

(1) Extend the 68/0801 with a second (U) stack, providing UPSHA/B/X/Y and UPULA/B/X/Y, TUX/Y and TX/YU.

(2) Provide an extra Y index (more-or-less as in the 68HC11). Both index registers would be extended by 4 bits to support indexing the address spaces. Transfer of stack registers to X or Y would set the appropriate function bits, and operators to set the function bits for other spaces would be provided, probably as prefixed LDX/Y instructions.

(3) Add an address function code for the U stack.

(4) Add bit instructions per 68HC11, with a toggle bits instruction, as well. These would be added by way of pre-byte.

(5) MUL/DIV -- Add IDIV and FDIV per the 68HC11, using X and D. Add a 16-bit MUL X by D to X:D.
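For reference, the arithmetic of the MUL/DIV item can be modeled in Python like this. IDIV and FDIV follow the 68HC11 definitions (D divided by X, quotient to X, remainder to D, with FDIV treating the quotient as a 16-bit binary fraction). The 16-bit MUL is the hypothetical one from this roadmap, and I am assuming "X:D" means the high word of the product lands in X and the low word in D. The sketch simply rejects the cases the hardware would flag (divide by zero, FDIV overflow) instead of modeling the flags.

```python
def mul16(x, d):
    """Hypothetical 16x16 MUL: returns (new X, new D) = (high word, low word)."""
    product = (x & 0xFFFF) * (d & 0xFFFF)
    return product >> 16, product & 0xFFFF

def idiv(d, x):
    """68HC11-style IDIV: D / X, returns (quotient for X, remainder for D)."""
    if x == 0:
        raise ZeroDivisionError("IDIV by zero")
    return d // x, d % x

def fdiv(d, x):
    """68HC11-style FDIV: (D * 65536) / X, valid only when D < X."""
    if x == 0 or d >= x:
        raise ValueError("FDIV needs numerator smaller than denominator")
    return (d << 16) // x, (d << 16) % x

if __name__ == "__main__":
    print([hex(v) for v in mul16(0xFFFF, 0xFFFF)])
    print(idiv(100, 7))
    print([hex(v) for v in fdiv(1, 3)])
```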

 

68/31601

Extends the 68/2801. 

Make all instructions 16-bit internal. Extend the index registers by 4 more bits between the function bits and the logical address bits. Provide 4 bits of extension between the function code constants and the 16-bit logical address bits for PC, stacks, and Extended mode addresses.  Trade the bank switching and segmentation for page-mode memory management.

 

68/2809

6809-compatible, but with 6801-equivalent cycle counts. Also, reference the 68/0801 address space extensions.

(1) DP-relative 8- and 16-bit offset modes added to index post-byte.

(2) 8-bit page zero mode added to index post-byte.

(3) IO Page register, with 8- and 16-bit offset modes in index post-byte.

(4) Address functions for data, code, DP, IOP, return stack, parameter stack, interrupts and DMA.

(5) MUL/DIV -- X by D 16-bit MUL; FDIV and IDIV per 68HC11.

(6) 16-bit unary-operator shifts and INC/DEC.

(7) Bit instructions.

(8) Available as SOC core, like 6801 and 6805.

(9) Standard optional bank switching and/or segmentation.


68/31609

16/32-bit 68/2809, with 32-bit index, stack, and PC registers, 16-bit Extended mode offset register, and paged memory management instead of bank switching. Moves op-codes around for efficiency, adds second 16-bit accumulator. Has 32-bit ADD, SUB, CMP, MUL, and FDIV and IDIV. Dedicated 64-byte spill/fill hysteresis caches for both stacks, dedicated 256-byte PC read-ahead cache, dedicated DP and page zero caches.


68/28000

Basically the 68010 with improved timing; 32-bit offsets for indexing and branching; 32-bit multiply and divide; a system A6 register for system parameters; dedicated stack caches, 256 bytes for A6 and 64 bytes for A7, both user and system; dedicated 256-byte code cache for PC; 4K associative cache for other data; PEA that pushes on A6 or SEA that saves an effective address where the specified address register points.