In addition to daydreaming about address function, segmentation, caches and the 6809, I daydream similar things about the 6801.
But the 6801 does not have Y, U, or DP.
So if I want to daydream about a 6801-derived CPU that provides address function output to distinguish the source of each address, something like the following,
- 000 for code
- 001 for interrupt vectors
- 010 for U relative (parameter stack)
- 011 for S relative (return address stack)
- 100 for general data (extended/absolute, X, Y)
- 101 for direct page
- 110 and 111 for DMA
with customized caches for each, since access patterns for each are so different, I'll have to daydream about extending the 6801 in a somewhat different path from the 68HC11.
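To make the function codes above a little more concrete, here is a rough sketch of how they might be modeled. The bit patterns come straight from the list; the names `AddrFunc`, `SOURCE_TO_FUNC`, `function_code`, and `is_dma` are made up for illustration, and so are the strings used to name the address sources.

```python
from enum import IntEnum


class AddrFunc(IntEnum):
    """3-bit address function codes, taken from the list above."""
    CODE = 0b000      # instruction fetch
    VECTOR = 0b001    # interrupt vectors
    U_STACK = 0b010   # U relative (parameter stack)
    S_STACK = 0b011   # S relative (return address stack)
    DATA = 0b100      # general data (extended/absolute, X, Y)
    DIRECT = 0b101    # direct page
    DMA0 = 0b110      # DMA
    DMA1 = 0b111      # DMA


# Hypothetical mapping from "what generated this address" to a function code.
SOURCE_TO_FUNC = {
    "pc": AddrFunc.CODE,
    "vector": AddrFunc.VECTOR,
    "u": AddrFunc.U_STACK,
    "s": AddrFunc.S_STACK,
    "extended": AddrFunc.DATA,
    "x": AddrFunc.DATA,
    "y": AddrFunc.DATA,
    "direct": AddrFunc.DIRECT,
}


def function_code(source: str) -> AddrFunc:
    """Return the function code the CPU would drive for an address source."""
    return SOURCE_TO_FUNC[source]


def is_dma(func: AddrFunc) -> bool:
    """Both DMA codes share the top two bits (11x)."""
    return (func & 0b110) == 0b110
```

External hardware (or a per-function cache) would only need to decode these three bits to know which access pattern to expect.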
And being able to do things like
- indexing into ROMmed jump tables at run time,
- copying strings out of ROMs,
- using X or Y to access arrays on the parameter stack,
- and such
just means we're going to use the more general solution. Instead of just adding address function output bits and relying on bank switching, I have to daydream about widening the address bus and extending the index registers, either with segment registers (preferably paired with limit registers) or simple extra bits in the registers.
And if the direct page (page zero) is going to be separate from absolute addresses, I'm going to need some register to distinguish them. Maybe absolute would only be the low 64K, or maybe I'd add an absolute base page register to extend absolute addresses out to whatever maximum. And this CPU would get a DP register like the 6809's, I guess. Maybe it would only have 8 bits, and direct page would only be in the low 64K. Or even maybe less. Or even split the direct page so that the low half (I/O) and the high half (pseudo-registers) could be moved independently. Decide all that later.
8 bits of extension would be more than enough for 6801-sized applications, but would 5 bits? 5 for ordinary address plus 3 for function would be 8 bits of extension, for 24 bits, total.
21 bits of address is 2 megabytes per address space. I can see that being a bit tight for some things that I'd still want to do on a fundamentally 8-bit CPU, but it's okay to leave that question for later.
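The arithmetic is easy to check with a quick sketch. The field widths (3 function bits, 5 extension bits, 16 base bits) are the ones discussed above; putting the function bits at the top of the 24-bit bus is only my assumption for illustration, and `bus_address` is a hypothetical helper.

```python
FUNC_BITS = 3    # address function code
EXT_BITS = 5     # ordinary address extension
BASE_BITS = 16   # original 6801 address width
ADDR_BITS = EXT_BITS + BASE_BITS   # 21 bits within one address space

# Size of one address space: 2**21 bytes = 2 megabytes.
SPACE_BYTES = 1 << ADDR_BITS


def bus_address(func: int, ext: int, offset: int) -> int:
    """Pack function code, extension bits, and 16-bit offset into 24 bits.

    Layout (an assumption): [func:3][ext:5][offset:16].
    """
    if not 0 <= func < (1 << FUNC_BITS):
        raise ValueError("function code must fit in 3 bits")
    if not 0 <= ext < (1 << EXT_BITS):
        raise ValueError("extension must fit in 5 bits")
    if not 0 <= offset < (1 << BASE_BITS):
        raise ValueError("offset must fit in 16 bits")
    return (func << ADDR_BITS) | (ext << BASE_BITS) | offset
```

For example, `bus_address(0b100, 1, 0x1234)` would be a general data access at offset $1234 in the second 64K of the data space.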
(The 6809 is actually fundamentally 16-bit, 8-at-a-time, just for reference. It's the LEA instructions.)
In addition to Y, we'll need an additional U stack for the parameters. It would be nice to have it indexable like X and Y, but by providing TXU, TUX, TYU, and TUY, we could get the local variable addressing we need without thrashing the index registers so much. So if making U indexable induced significant complexity, we could leave that out and rely on the widened X and Y to give us access to the parameters.
We can keep the constant 8-bit unsigned offsets of the 6801/HC11, but we need to add an SBX instruction to subtract B from X to do real pointer math, and we need the corresponding ABY/SBY.
ABU and SBU would be nice, if we are going to include direct indexing of the parameter stack pointer.
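As a sketch of what I have in mind: ABX on the 6801 adds B to X as an unsigned 8-bit value, and the SBX here is the hypothetical inverse. The 21-bit index width assumes the "simple extra bits in the registers" option with 5 extension bits, so carries and borrows run through into the extension. None of this is a real instruction set definition.

```python
INDEX_BITS = 21                      # assumed: 16 bits plus 5 extension bits
INDEX_MASK = (1 << INDEX_BITS) - 1


def abx(x: int, b: int) -> int:
    """Add unsigned 8-bit B to the index register, wrapping at INDEX_BITS."""
    return (x + (b & 0xFF)) & INDEX_MASK


def sbx(x: int, b: int) -> int:
    """Hypothetical SBX: subtract unsigned 8-bit B from the index register."""
    return (x - (b & 0xFF)) & INDEX_MASK
```

ABY/SBY (and ABU/SBU, if U ends up indexable) would be the same operations on the other registers. The point of SBX is that walking a pointer backward no longer needs a detour through D.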
And, incidentally, add immediate to index register is something I really wish Motorola had added for the 6801, either as a 16-bit signed AIX (and AIY and maybe AIU in this CPU), or as unsigned 8-bit AIX/SIX (and AIY/SIY and maybe AIU/SIU).
With all the transferring and adding, maybe we just need to trade the implicit operand instructions for register-register instructions.
This kind of extension would give us enough to allow PC caching by fill-ahead with one branch of fill, checking whether the branch code is in cache first. Wouldn't need as much as the 6809, since total instruction length is shorter, maybe? But then there will be the long address forms, so probably 32 bytes each branch.
A return address stack cache with spill-fill and 2/4 centered hysteresis: eight entries would probably be good enough, maybe 32 if we simply want to make sure (most) embedded applications would not need an external return address stack.
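Here is a toy model of one way the spill-fill with hysteresis could behave. It is only my reading of "2/4 centered": when the cache overflows, spill the oldest entries until it is half full, and when it underflows, fill it back up to half full. The class name `SpillFillStack`, the counters, and the use of a Python list to stand in for external memory are all inventions for the sketch.

```python
class SpillFillStack:
    """Toy spill-fill stack cache that recenters at half full."""

    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self.half = capacity // 2
        self.cache = []    # on-chip entries, top of stack at the end
        self.memory = []   # stand-in for the external stack, top at the end
        self.spills = 0    # number of spill bursts
        self.fills = 0     # number of fill bursts

    def push(self, value: int) -> None:
        if len(self.cache) == self.capacity:
            # Overflow: move the oldest entries out, leaving the cache half full.
            count = self.capacity - self.half
            self.memory.extend(self.cache[:count])
            del self.cache[:count]
            self.spills += 1
        self.cache.append(value)

    def pop(self) -> int:
        if not self.cache:
            if not self.memory:
                raise IndexError("stack underflow")
            # Underflow: bring back up to half a cache of the newest entries.
            count = min(self.half, len(self.memory))
            self.cache = self.memory[-count:]
            del self.memory[-count:]
            self.fills += 1
        return self.cache.pop()
```

The reason for recentering instead of spilling or filling one entry at a time is that a call/return pattern hovering right at the edge of the cache does not cause a memory transfer on every call.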
64 bytes of spill-fill parameter stack cache with hysteresis would probably be plenty.
A different approach to the direct page: the DP could be banked in small blocks. 32 bytes per block seems best to me, but maybe 64 if it's enough simpler. Only addresses $80 to $FF would be cached to RAM, since addresses $00 to $7F would be for I/O.
Rather than automatic caching, the DP RAM would then just be internal RAM, 1 kilobyte addressed from $400, with the bank switching circuit dual-mapping small blocks down to blocks in the $80-$FF range. The bank switching circuit might be tied to an active process number. We can figure out the details later.
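A sketch of the address translation the bank switching circuit would do, assuming the 32-byte block size: the $80-$FF range becomes four windows, each pointing at one of 32 blocks in the 1 kilobyte of internal RAM at $400. The `bank_map` list (one block number per window) and the `translate` function are hypothetical, and how the map gets loaded, by process number or otherwise, is left out.

```python
BLOCK_SIZE = 32
DP_RAM_BASE = 0x400
DP_RAM_SIZE = 1024
WINDOW_BASE = 0x80   # $00-$7F stays I/O and is not remapped
NUM_WINDOWS = (0x100 - WINDOW_BASE) // BLOCK_SIZE   # 4 windows
NUM_BLOCKS = DP_RAM_SIZE // BLOCK_SIZE              # 32 blocks


def translate(dp_addr: int, bank_map: list) -> int:
    """Map a direct page address to its internal RAM address.

    bank_map[w] is the internal RAM block currently shown in window w.
    """
    if not 0 <= dp_addr <= 0xFF:
        raise ValueError("direct page address must be $00-$FF")
    if dp_addr < WINDOW_BASE:
        return dp_addr                      # I/O half passes through
    window = (dp_addr - WINDOW_BASE) // BLOCK_SIZE
    block = bank_map[window]
    if not 0 <= block < NUM_BLOCKS:
        raise ValueError("block number out of range")
    # WINDOW_BASE is a multiple of BLOCK_SIZE, so the low bits are the offset.
    return DP_RAM_BASE + block * BLOCK_SIZE + (dp_addr % BLOCK_SIZE)
```

With `bank_map = [0, 5, 31, 2]`, direct page address $A3 falls in window 1 and lands at $4A3 in the internal RAM, while $10 is left alone as I/O.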
Again, maybe not even bother caching general data accesses.