defining computers: A 16/32-bit Extension for the Venerable 6809

This is somewhat in the vein of my last several posts, but wandering a little. For several years (decades, even) I have been thinking of ways to extend my favorite microprocessor, the 6809, to make it useful in applications that use Unicode text and decent-sized displays.

I don't have the tools to do HTML tables right now, so I'm going to assume you have tables of 6809 instructions, registers, and addressing modes handy to refer to. Otherwise, this may not make a lot of sense.

Before I begin, I do not intend to do anything with the 6309 extensions. Don't ask me to explain much beyond the fact that success in this kind of endeavor requires keeping things as simple as possible.

[JMR20190309:
Greg Miller, who implemented a true cycle-accurate 6809 HDL core, has some opinions on extending the 6309 which might be relevant here, in his GitHub repository for the HDL core.

(This was brought up yesterday/today in the Facebook group for the TRS-80 Color Computer, in the course of a conversation on extending the 6809 along the same lines as the 65816 extending the 6502, along with a chart of one possible way to fill out the instruction set holes in the 6309, and some hand-waving at reasons not to go down that rabbit hole. And, speaking of the 65816, too bad it didn't have one more stack register. I'd have been really interested.)
]

I'm thinking of calling this the X61609, BTW.

The first step in this project is to double the register sizes.

The accumulators will be 16 bits wide, and the indexable registers will be 32 bits wide:

    A and B 16 bits wide.

    X, Y, U, S and PC 32 bits wide.

The direct page register will be widened on both ends, to contain a full width 32-bit direct page base address instead of just the high-order part.

   DP 32 bits wide -- full address.

[JMR20190309:
In a 16 bit address space, requiring the base address of the process-local statically allocatable variable segment/block to be set on a 256-byte boundary was a rather severe limitation to the usefulness of the DP register. Looking back on it now, it seems a bit surprising that Motorola themselves never corrected this structural error in the design.

I suppose, if we expand the addressing range to 32 bits, the limitation of having the bottom 8 bits of the DP register be zero is not as severe as with only 16 bits of addressing. There could even be an argument made for leaving it at 8 bits, keeping the resultant direct page within the bottom 64K of memory space. But I doubt it saves enough time on address cycles, and the number of gates it saves in internal registers is no longer relevant.
]

The condition codes will be augmented by status bits, and will be 32 bits wide.

    CC has extra condition code and status bits.

If you are familiar with the processor and the instruction set, you are likely wondering what magic will make this work. Sign extension plays a big role.

We will need more instructions. We'll borrow a play from the original 6809 and use page escapes to encode them. That means that, in addition to page2 ($10) and page3 ($11) pre-bytes, we'll have pagex1 ($14), pagex2 ($15), and pagex3 ($18). To simplify, the codes will be mapped out the same as the 8-bit instructions.

For immediate (constant) operands, the constant portion of the instructions will be 16 bits for accumulators A and B, and 32 bits for the indexable registers X, Y, U, S, and PC, and also 32 bits for the double accumulator, D.

The double accumulator D for 16-bit instructions will be the concatenation of A:B in their full width, with A being the top 16 bits.

For 8-bit instructions, the 16-bit result will generally be the sign extension of the 8-bit result. For example,

    LDA #-1

will load the 8 bits of the argument and sign extend it, so that the entire 16 bits of A will be -1. (Bit rotate and shift instructions will be the exceptions that prove this rule.)
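The sign-extension rule for 8-bit loads can be sketched in a few lines of Python. This is only a model of the behavior described above, not a definitive specification; the names `sign_extend` and `lda_imm8` are hypothetical and exist only for this illustration.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend the low from_bits of value into a to_bits-wide field."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        value -= 1 << from_bits          # reinterpret as a negative number
    return value & ((1 << to_bits) - 1)  # wrap into the wider register


def lda_imm8(operand: int) -> int:
    """Model of the 8-bit LDA immediate: returns the new 16-bit contents of A."""
    return sign_extend(operand, 8, 16)
```

Under this model, loading -1 (or $FF) leaves A holding $FFFF, while loading $7F leaves A holding $007F.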

For 8-bit instructions that target D, the result will be the concatenation of the lower 8 bits of A and B, sign extended.

Likewise, when loading a sixteen bit value into an indexable register, it will be sign extended.

    LDX #$8000

will result in X containing $FFFF8000. Appending "W" to instructions to specify their 16-bit forms,

    LDW X #$8000

is the form required to get $8000 into X, and it will consume 6 bytes of instruction space:

    $14 $8E $00 $00 $80 $00

This may feel strange, but it should yield the best emulation of the 6809 in existing hardware designs, by decoding the bottom 32K of memory as the low half of 6809 address space and the top 32K as the high half.
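A small Python sketch may make the address mapping concrete. It assumes that 6809 addresses are mapped into the 32-bit space by the same sign extension that the 8-bit-compatible LDX applies, and that wide immediates are stored high byte first, as on the 6809. The function names are hypothetical.

```python
def sext16_to_32(value: int) -> int:
    """Sign-extend a 16-bit value to 32 bits (result as an unsigned int)."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if (value & 0x8000) else value


def ldx_imm16(operand: int) -> int:
    """Model of the 6809-compatible LDX immediate: returns the 32-bit X."""
    return sext16_to_32(operand)


def map_6809_address(addr16: int) -> int:
    """Where a 6809 address would land in the 32-bit space under this scheme."""
    return sext16_to_32(addr16)


def wide_operand_bytes(value: int) -> bytes:
    """The four operand bytes of a 32-bit immediate, high byte first."""
    return (value & 0xFFFFFFFF).to_bytes(4, "big")
```

With these definitions, addresses $0000-$7FFF stay at the bottom of the 32-bit space, and $8000-$FFFF land at $FFFF8000-$FFFFFFFF, which is the split into bottom and top 32K described above.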

Shifts and rotates require a bit of special attention. The 6809 carry bit will be retained, and will always reflect the carry result as the 6809 defines it: in other words, carries from bit 7 mostly, and from bit zero in right rotates, etc. There will be an additional 16-bit carry that will be set according to the 16-bit results (mostly carry from bit 15, etc.), and it will also provide the carry-in source for 16-bit instructions.
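One possible reading of the two-carry scheme, sketched in Python for left shifts and rotates on a 16-bit accumulator. The flag names `c8` and `c16`, and the assumption that a 16-bit operation updates both flags from the same register value, are illustrative guesses rather than part of the design as stated.

```python
def asl16(a: int):
    """16-bit arithmetic shift left. Returns (result, c8, c16)."""
    a &= 0xFFFF
    c8 = (a >> 7) & 1    # the bit a 6809 ASL would have shifted out of bit 7
    c16 = (a >> 15) & 1  # the bit shifted out of the full 16-bit register
    return (a << 1) & 0xFFFF, c8, c16


def rol16(a: int, c16_in: int):
    """16-bit rotate left through the 16-bit carry. Returns (result, c8, c16)."""
    a &= 0xFFFF
    result = ((a << 1) | (c16_in & 1)) & 0xFFFF
    return result, (a >> 7) & 1, (a >> 15) & 1
```

For example, shifting $0080 left sets only the 6809-style carry, while shifting $8000 left sets only the 16-bit carry.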

For register exchange and transfer instructions, whole registers will always be exchanged or transferred, except for DP and CC. Bits 8-15 of DP will be involved in 8-bit transfers and exchanges, clearing the low byte of DP and extending the sign into the high word as necessary. DP will be allowed in exchanges and transfers with indexable registers, in which case all 32 bits will be affected.
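The 8-bit DP transfers might behave roughly as in the following Python sketch. It assumes that the value moved into A is sign extended like any other 8-bit result; the function names are hypothetical.

```python
def sext(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend the low from_bits of value into a to_bits-wide field."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def tfr_a_to_dp(a: int) -> int:
    """8-bit TFR A,DP: low byte of A goes to bits 8-15 of DP.

    The low byte of DP is cleared and the sign is extended upward.
    """
    return sext((a & 0xFF) << 8, 16, 32)


def tfr_dp_to_a(dp: int) -> int:
    """8-bit TFR DP,A: bits 8-15 of DP go to A, sign extended to 16 bits."""
    return sext((dp >> 8) & 0xFF, 8, 16)
```

So transferring $20 into DP would give $00002000, and transferring $C0 would give $FFFFC000, which matches where a 6809 direct page at $C000 lands after sign extension.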

The status register will be allowed in transfers and exchanges with D, in which case all 32 bits will be affected (at least in system mode).

Stack operations may seem to be a cause for concern, but PSH and PUL will be in both the 8- and 16-bit maps. The 8-bit push and pop will behave as they do in the 6809, while the 16-bit forms will do the logical thing. The one point of concern is that 8-bit PULs will sign-extend their results, to maintain compatibility.

8-bit SWIs will behave as in the 6809, and 16-bit SWIs will save entire registers.

The traditional interrupt and reset vectors will be sign extended before being used. Additional 16-bit (thus 32-bit address) interrupt and reset inputs and vectors will be provided.

Finally, the index postbytes will remain almost as they are, with the exception that post-inc/dec modes will increment or decrement by 2 or 4 when used by 16-bit instructions. (I considered multiplying all offsets by two for 16-bit instructions, but it looks like that would complicate, rather than simplify things.)
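A rough Python sketch of the step-size rule follows. The postbyte mode values are the standard 6809 ones (,R+ ,R++ ,-R ,--R); the assumption that the pre-decrement modes double along with the post-increment modes, and the names `AUTO_STEPS` and `auto_step`, are only for illustration.

```python
# Low four bits of the 6809 index postbyte for the auto inc/dec modes.
AUTO_STEPS = {
    0b0000: +1,  # ,R+
    0b0001: +2,  # ,R++
    0b0010: -1,  # ,-R
    0b0011: -2,  # ,--R
}


def auto_step(postbyte: int, wide: bool) -> int:
    """Amount the index register changes by, or 0 if the mode has no auto step.

    wide is True when the postbyte belongs to a 16-bit instruction.
    """
    mode = postbyte & 0x0F
    if not (postbyte & 0x80) or mode not in AUTO_STEPS:
        return 0  # 5-bit offset form, or a mode without inc/dec
    step = AUTO_STEPS[mode]
    return step * 2 if wide else step
```

An 8-bit instruction using ,X++ (postbyte $81) still steps by 2, while the 16-bit form of the same instruction steps by 4.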

There will also be at least three new indexing modes --

    1RRI1010 will provide 32-bit offsets from X, Y, U, and S.
    1XXI1110 will provide 32-bit offsets for PC.
    101I1111 will provide 32-bit absolute addresses for indirection. (Tentative.)
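The three bit patterns above can be recognized with a small decoder. The Python below is a sketch that assumes the usual 6809 postbyte layout (bit 7 set, RR in bits 6-5 selecting X, Y, U, or S, and I in bit 4 selecting indirection); the function name and the returned tuples are hypothetical.

```python
REGS = ("X", "Y", "U", "S")  # 6809 postbyte RR field: 00, 01, 10, 11


def decode_new_mode(postbyte: int):
    """Return (mode, base register, indirect) for the proposed modes, else None."""
    if not (postbyte & 0x80):
        return None  # 5-bit offset form, unchanged
    rr = (postbyte >> 5) & 0b11
    indirect = bool(postbyte & 0x10)
    low = postbyte & 0x0F
    if low == 0b1010:
        return ("offset32", REGS[rr], indirect)  # 1RRI1010
    if low == 0b1110:
        return ("offset32", "PC", indirect)      # 1XXI1110, RR is don't-care
    if low == 0b1111 and rr == 0b01:
        return ("absolute32", None, indirect)    # 101I1111 (tentative)
    return None  # an existing 6809 mode or still undefined
```

Note that $9F (10011111), the existing extended indirect mode, falls through to `None` here, so it would keep its 6809 meaning.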

And I guess a DP relative indexed mode would be useful, to allow using DP variables as extra index registers (6502 envy). There should be room in the undefined postbyte values. But the slickest solution, slipping this mode into the don't care assignments for PC relative, depends on how much ROMmed software there is that used something besides 0s in the don't care bits. It could be switched on and off with a (system-mode-only) status bit, perhaps.

[JMR20190309:
Specifically, I want to be able to use these two currently illegal addressing modes:

LDB [<CURRENT_BYTE] ; Think of input buffer operations.

and

LEAY <CACHED_RECORD

The former currently requires something like

PSHS X ; X probably has something important in it.
LDX <CURRENT_BYTE
LDB ,X ; Just want to check it.
PULS X

Sure, unless I can also use auto-inc or offsets on DP variables, that's a bit limited in usefulness. But the latter currently requires something like

PSHS D,Y
CLRB
TFR DP,A
TFR D,Y
PULS D
* Do stuff, and don't forget to
PULS Y

And popping Y back off the stack is sure easy to forget.
]
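To make the two wished-for modes concrete, here is a Python sketch of the effective-address arithmetic, modeled on the original 6809 (8-bit DP supplying the high byte of a 16-bit address) rather than on the widened DP. The function names and the flat `memory` array are assumptions for illustration only.

```python
def direct_address(dp: int, offset: int) -> int:
    """6809 direct addressing: DP is the high byte, the operand the low byte."""
    return ((dp & 0xFF) << 8) | (offset & 0xFF)


def ldb_direct_indirect(memory, dp: int, offset: int) -> int:
    """Hypothetical LDB [<offset]: follow a pointer stored in the direct page."""
    ptr_addr = direct_address(dp, offset)
    pointer = (memory[ptr_addr] << 8) | memory[ptr_addr + 1]  # big-endian
    return memory[pointer]


def leay_direct(dp: int, offset: int) -> int:
    """Hypothetical LEAY <offset: the address of a direct-page variable."""
    return direct_address(dp, offset)
```

A short usage example, with a pointer to $4000 stored in the direct page at $2010:

```python
memory = bytearray(0x10000)
memory[0x2010:0x2012] = bytes([0x40, 0x00])
memory[0x4000] = 0x41
ldb_direct_indirect(memory, 0x20, 0x10)  # fetches the byte at $4000
leay_direct(0x20, 0x34)                  # yields the address $2034
```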

From this point, the next priority would be memory management and support for isolating instances of emulated systems. That will be for another time.

And, if there is room left over after that, it would be nice to add bit-level primitives for multiply and divide, such that a 32-bit multiply or divide could just be a sequence of 32 of the primitives. (The primitive would conditionally add or subtract according to the bit being examined and then shift the result one, ready to examine the next.)
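The idea of building a 32-bit multiply or divide out of 32 identical steps can be sketched in Python as follows. This is the textbook unsigned shift-and-add multiply and restoring divide, offered as one plausible shape for such primitives and not as the intended design; the divide step here shifts before it subtracts, and all names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def mul_step(acc: int, multiplier: int, multiplicand: int):
    """Conditionally add, then shift the acc:multiplier pair right one bit."""
    if multiplier & 1:
        acc += multiplicand  # may briefly need a 33rd bit
    multiplier = ((multiplier >> 1) | ((acc & 1) << 31)) & MASK32
    acc >>= 1
    return acc, multiplier


def multiply32(a: int, b: int) -> int:
    """Unsigned 32x32 -> 64-bit product as a sequence of 32 primitives."""
    acc, multiplier = 0, b & MASK32
    for _ in range(32):
        acc, multiplier = mul_step(acc, multiplier, a & MASK32)
    return (acc << 32) | multiplier


def div_step(rem: int, quo: int, divisor: int):
    """Shift rem:quo left one bit, then conditionally subtract the divisor."""
    rem = (rem << 1) | (quo >> 31)
    quo = (quo << 1) & MASK32
    if rem >= divisor:
        rem -= divisor
        quo |= 1
    return rem, quo


def divide32(n: int, d: int):
    """Unsigned 32-bit divide as 32 primitives. Returns (quotient, remainder)."""
    if d & MASK32 == 0:
        raise ZeroDivisionError("divisor is zero")
    rem, quo = 0, n & MASK32
    for _ in range(32):
        rem, quo = div_step(rem, quo, d & MASK32)
    return quo, rem
```

In hardware, `acc`/`multiplier` and `rem`/`quo` would presumably live in a register pair such as the halves of D, with the extra carry bit held in the status register.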

Misunderstanding Computers

Tuesday, September 26, 2017
