defining computers: 6801

Misunderstanding Computers

Why do we insist on seeing the computer as a magic box for controlling other people?
人はどうしてコンピュータを、人を制する魔法の箱として考えたいのですか?
Why do we want so much to control others when we won't control ourselves?
どうしてそれほど、自分を制しないのに、人をコントロールしたいのですか？

Computer memory is just fancy paper, CPUs are just fancy pens with fancy erasers, and the network is just a fancy backyard fence.
コンピュータの記憶というものはただ改良した紙ですし、CPU 何て特長ある筆に特殊の消しゴムがついたものにすぎないし、ネットワークそのものは裏庭の塀が少し拡大されたものぐらいです。

(original post/元の投稿 -- defining computers site/コンピュータを定義しようのサイト)


Thursday, November 28, 2024

How Address Function and Caches Should Interact (6801-derived Example)

In addition to daydreaming about address function, segmentation, caches and the 6809, I daydream similar things about the 6801.

But the 6801 does not have Y, U, or DP.

So if I want to daydream about a 6801-derived CPU that provides address function outputs distinguishing the source of each address, something like the following,

000 for code
001 for interrupt vectors
010 for U relative (parameter stack)
011 for S relative (return address stack)
100 for general data (extended/absolute, X, Y)
101 for direct page
110 and 111 for DMA

with customized caches for each, since the access patterns for each are so different, I'll have to daydream about extending the 6801 along a somewhat different path from the 68HC11.

And being able to do things like

indexing into ROMmed jump tables at run time,
copying strings out of ROMs,
using X or Y to access arrays on the parameter stack,
and such.

means we need the more general solution. Instead of just adding address function output bits and relying on bank switching, I have to daydream about widening the address bus and extending the index registers, either with segment registers (preferably paired with limit registers) or with simple extra bits in the registers.

And if the direct page (page zero) is going to be separate from absolute addresses, I'm going to need some register to distinguish them. Maybe absolute would only be the low 64K, or maybe I'd add an absolute base page register to extend absolute addresses out to whatever maximum. And this CPU would get a DP register like the 6809's, I guess. Maybe it would only have 8 bits, and direct page would only be in the low 64K. Or even maybe less. Or even split the direct page so that the low half (I/O) and the high half (pseudo-registers) could be moved independently. Decide all that later.

8 bits of extension would be more than enough for 6801-sized applications, but would 5 bits be enough? 5 bits for ordinary address plus 3 for function would be 8 bits of extension, for 24 bits total.

21 bits of address is 2 megabytes per address space. I can see that being a bit tight for some things that I'd still want to do on a fundamentally 8-bit CPU, but it's okay to leave that question for later.
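To check the arithmetic, here is a small Python sketch of how such a bus address might be composed. The bit order (3 function bits on top, then 5 extension bits, then the ordinary 16-bit address) and the dictionary names are my assumptions for illustration; only the function-code values come from the list above.

```python
# Hypothetical 3-bit address function codes, following the list above.
FUNCTION_CODES = {
    "code": 0b000,
    "vectors": 0b001,
    "param_stack": 0b010,   # U relative
    "return_stack": 0b011,  # S relative
    "data": 0b100,          # extended/absolute, X, Y
    "direct_page": 0b101,
    "dma_a": 0b110,
    "dma_b": 0b111,
}

EXT_BITS = 5    # assumed extension bits per address space
CPU_BITS = 16   # the ordinary 6801-style address

def bus_address(function: str, extension: int, offset: int) -> int:
    """Compose function | extension | offset into one 24-bit bus address."""
    if not 0 <= extension < (1 << EXT_BITS):
        raise ValueError("extension out of range")
    if not 0 <= offset < (1 << CPU_BITS):
        raise ValueError("offset out of range")
    fc = FUNCTION_CODES[function]
    return (fc << (EXT_BITS + CPU_BITS)) | (extension << CPU_BITS) | offset

BYTES_PER_SPACE = 1 << (EXT_BITS + CPU_BITS)   # 2**21 = 2,097,152 bytes (2 MiB)
TOTAL_BITS = 3 + EXT_BITS + CPU_BITS           # 24

print(hex(bus_address("return_stack", 1, 0x7FFE)))   # 0x617ffe
```

Putting the function code on top means external hardware that ignores it just sees eight 2 MiB spaces stacked in a 16 MiB map.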

(The 6809 is actually fundamentally 16-bit, 8-at-a-time, just for reference. It's the LEA instructions.)

In addition to Y, we'll need an additional U stack for the parameters. It would be nice to have it indexable like X and Y, but by providing TXU, TUX, TYU, and TUY, we could get the local variable addressing we need without thrashing the index registers so much. So if making U indexable induced significant complexity, we could leave that out and rely on the widened X and Y to give us access to the parameters.

We can keep the constant 8-bit unsigned offsets of the 6801/HC11, but we need to add an SBX instruction to subtract B from X to do real pointer math, and we need the corresponding ABY/SBY.

ABU and SBU would be nice, if we are going to include direct indexing of the parameter stack pointer.

And, incidentally, add immediate to index register is something I really wish Motorola had added for the 6801, either as 16-bit signed AIX (and AIY and maybe AIU in this CPU), or as unsigned 8-bit AIX/SIX (and AIY/SIY and maybe AIU/SIU).

With all the transferring and adding, maybe we just need to trade the implicit operand instructions for register-register instructions.

This kind of extension would give us enough to allow PC caching by fill-ahead with one branch of fill, checking whether the branch code is already in cache first. It might not need as much as the 6809, since total instruction length is shorter? But then there will be the long address forms, so probably 32 bytes for each branch.

For the return address stack, a spill-fill cache with 2/4 centered hysteresis and eight entries would probably be good enough, maybe 32 entries if we simply want to make sure (most) embedded applications would not need an external return address stack.
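To make the spill-fill idea concrete, here is a toy Python model of a return-address stack cache. It does not try to pin down "2/4 centered hysteresis"; as a simple stand-in (my assumption), it spills the two oldest entries when a push finds the cache full and fills two back when a pop finds it empty. The class and parameter names are hypothetical.

```python
from collections import deque

class ReturnStackCache:
    """Toy return-address stack cache with spill/fill.

    Assumed policy (one simple guess, not a worked-out design): when a push
    finds the cache full, the `chunk` oldest entries spill to external memory;
    when a pop finds it empty, up to `chunk` entries are filled back.
    """

    def __init__(self, entries: int = 8, chunk: int = 2):
        self.entries, self.chunk = entries, chunk
        self.cache = deque()    # left end = oldest, right end = top of stack
        self.memory = []        # external stack; last item = most recently spilled
        self.spills = self.fills = 0

    def push(self, addr: int) -> None:
        if len(self.cache) == self.entries:
            for _ in range(self.chunk):
                self.memory.append(self.cache.popleft())
            self.spills += 1
        self.cache.append(addr)

    def pop(self) -> int:
        if not self.cache:
            if not self.memory:
                raise IndexError("return stack underflow")
            for _ in range(min(self.chunk, len(self.memory))):
                self.cache.appendleft(self.memory.pop())
            self.fills += 1
        return self.cache.pop()

rsc = ReturnStackCache()
for depth in range(12):                     # twelve nested calls
    rsc.push(0x1000 + depth)
returns = [rsc.pop() for _ in range(12)]    # unwind
print(returns == [0x1000 + d for d in reversed(range(12))], rsc.spills, rsc.fills)
# prints: True 2 2
```

Moving two entries at a time is what gives the hysteresis: after a spill the cache sits at six entries, so a call/return pair hovering at the boundary doesn't spill and fill on every transition.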

64 bytes of spill-fill parameter stack cache with hysteresis would probably be plenty.

As a different approach to the direct page, the DP could be banked in small blocks: 32 bytes per block seems best to me, but maybe 64 if that's significantly simpler. Only addresses $80 to $FF would be backed by RAM, since addresses $00 to $7F would be for I/O.

Rather than automatic caching, the DP RAM would then just be internal RAM, 1 kilobyte addressed from $400, with the bank switching circuit dual-mapping small blocks down to blocks in the $80-$FF range. The bank switching circuit might be tied to an active process number. We can figure out the details later.
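Here is a rough Python model of that dual mapping, assuming (my assumption) one bank register per 32-byte window. With 1 KB of internal RAM, there are 32 blocks, so each bank register would need 5 bits, and $80-$FF is covered by four windows. The class and field names are hypothetical.

```python
BLOCK = 32                          # bytes per bank block (the text's preferred size)
RAM_BASE, RAM_SIZE = 0x400, 1024    # 1 KB internal RAM at $400-$7FF
WINDOWS = (0x100 - 0x80) // BLOCK   # four 32-byte windows cover $80-$FF

class BankedDirectPage:
    """Toy model: $80-$FF is dual-mapped onto blocks of the internal RAM."""

    def __init__(self) -> None:
        self.ram = bytearray(RAM_SIZE)
        # Hypothetical bank registers, one per window; reset maps window i to block i.
        self.bank = list(range(WINDOWS))

    def ram_index(self, dp_addr: int):
        """RAM index for a direct-page address, or None for the I/O half."""
        if not 0 <= dp_addr <= 0xFF:
            raise ValueError("not a direct-page address")
        if dp_addr < 0x80:
            return None                       # $00-$7F stays I/O
        window, within = divmod(dp_addr - 0x80, BLOCK)
        return self.bank[window] * BLOCK + within

    def write_dp(self, dp_addr: int, value: int) -> None:
        index = self.ram_index(dp_addr)
        if index is None:
            raise NotImplementedError("I/O is not modeled")
        self.ram[index] = value & 0xFF

    def read_abs(self, addr: int) -> int:
        """Read the same RAM through its ordinary address at $400-$7FF."""
        if not RAM_BASE <= addr < RAM_BASE + RAM_SIZE:
            raise ValueError("not internal RAM")
        return self.ram[addr - RAM_BASE]

dp = BankedDirectPage()
dp.bank[0] = 5            # e.g., a process switch selects block 5 for $80-$9F
dp.write_dp(0x80, 0x42)
print(hex(RAM_BASE + dp.ram_index(0x80)))   # 0x4a0
```

Tying the bank registers to an active process number, as suggested above, would amount to reloading `bank` on a context switch.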

Again, maybe not even bother caching general data accesses.

Sunday, July 3, 2022

A Critique of Motorola's 68XX and 680XX CPUs

I want to note at the top here that this is not about which company's CPU was better. This is not about comparing CPUs at all.

And this is not disparaging Motorola. Motorola did a pretty decent job of designing each of their CPUs, especially when considering that they were not just pioneering microprocessor design. Engineers with experience designing CPUs were basically all already employed, mostly by other companies. (And many of those CPU engineers didn't really understand CPUs all that well, after all.) Motorola was also pioneering the design of CPUs in general.

The engineers at Motorola did a good job. But nobody's perfect.

Taking these in the order that Motorola produced them:

6800 Niggles:

(1) It's not hard to guess that the improper optimization of the CPX (compare X index register) instruction was an attempt to be too clever, a bad case of penny-pinching and setting arbitrary deadlines, an oversight, or any and all of the foregoing. But, as a result, the branches implementing signed and unsigned comparisons just don't do what they would be expected to do after CPX.

C (Carry) is simply not affected by CPX on the 6800 (and 6802), so the branches implementing unsigned compare, BCC, BCS, BHI, and BLS just won't work after CPX.
V (oVerflow) is the result from comparing the most-significant byte only, so the branches implementing signed comparison, BGE, BGT, BLE, and BLT fail in hard-to-predict ways after CPX.
N (Negative) is also the result from comparing the most-significant byte only. It may not seem that this is a problem for BPL (branch if plus) and BMI (branch if minus), but the programmers' manual says neither N nor V is intended for conditional branching. It seems to me that the N flag will actually be set correctly after the CPX, giving the sign of the result of the thrown-away subtraction of the argument address from the address in X. But using BPL and BMI in ordered comparison is just going to be a bit fiddly, no matter what. You probably just won't get what you thought you wanted if you use BPL or BMI after CPX.

Z (Zero) is the result of all 16 bits of the compare, so BEQ (branch if equal) and BNE (branch if not equal) after a CPX work as expected.

In the abstract sense, pointers were thought at the time to be necessarily unordered, so it sort of didn't seem to matter. Ideally, you wouldn't be comparing addresses for order. But real algorithms often do want to give pointers order, and that meant that, on the 6800, you would have to use a sequence of instructions to cover all the cases in ordered comparison, because you couldn't rely on CPX alone.
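As a sketch, here is that flag behavior modeled in Python as described above (C left untouched, N and V from the high bytes only, Z from all 16 bits), next to the 6801's full 16-bit CPX, plus the kind of byte-at-a-time unsigned test a 6800 program had to spell out instead. This models the description in this post, not a verified gate-level 6800; the function names are hypothetical.

```python
def cpx_6800(x: int, m: int, c_before: bool) -> dict:
    """CPX flags as described above: N and V from the high bytes,
    Z from all 16 bits, C left as it was."""
    xh, mh = x >> 8, m >> 8
    hi = (xh - mh) & 0xFF
    return {
        "N": bool(hi & 0x80),
        "Z": x == m,
        "V": bool((xh ^ mh) & (xh ^ hi) & 0x80),
        "C": c_before,
    }

def cpx_6801(x: int, m: int) -> dict:
    """6801 CPX: a full 16-bit compare."""
    r = (x - m) & 0xFFFF
    return {
        "N": bool(r & 0x8000),
        "Z": r == 0,
        "V": bool((x ^ m) & (x ^ r) & 0x8000),
        "C": x < m,                      # borrow out of the 16-bit subtract
    }

def bhi(flags: dict) -> bool:
    """BHI branches when C = 0 and Z = 0 (unsigned higher)."""
    return not flags["C"] and not flags["Z"]

def x_higher(x: int, m: int) -> bool:
    """The byte-at-a-time unsigned test a 6800 program had to spell out."""
    if (x >> 8) != (m >> 8):
        return (x >> 8) > (m >> 8)       # decide on the high bytes
    return (x & 0xFF) > (m & 0xFF)       # high bytes equal: decide on the low bytes

# X = $2000, operand = $1000, with C left set by some earlier instruction:
print(bhi(cpx_6800(0x2000, 0x1000, c_before=True)))   # False: wrong answer
print(bhi(cpx_6801(0x2000, 0x1000)))                  # True
print(x_higher(0x2000, 0x1000), x_higher(0x10FF, 0x1100))   # True False
```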

This mis-feature was preemptively prevented in the designs of the 68000 and the 6809, and was fixed, pretty much without issue, in the 6801. In the 6805, it's prevented by making the X register an 8-bit register anyway, more on that below.

(2) Addressing temporary variables and parameters on a stack required using X, and if you had something you needed in X, you had to save X somewhere safe -- which meant on a stack if you wanted your code to be re-entrant. But the 6800 had no instructions to directly push or pop X. That left you with a conundrum. You had to save X to use X to save X.

So you had to use a statically allocated temporary variable. Statically allocated temporaries tend to introduce race conditions even in single-processor designs, because you really don't want to take the time to block interrupts just to use the temporaries, especially for something like adjusting a stack pointer.

You can potentially work around the race conditions in some cases by having your interrupt-time stack pointers separate from your non-interrupt-time stack pointers, but that can also get pretty tricky pretty quickly.

The 6801 provides push (PSHX) and pop (PULX) instructions for X.

Stack-addressable temporary variables and parameters were supported by design in the 68000 and 6809, but not in the 6801. They were considered out of scope for the 6805, but were addressed in descendants of the 6805.

(3) This niggle is somewhat controversial, but using a single stack that combines return addresses, parameters, and temporary variables is a fiddly solution that has become widely accepted as the standard. Even though it is accepted, and learning how to set up a stack frame is something of a rite of passage, setting up stack frames to keep the regions on the stack straight consumes cycles, even when it can be done without inducing race conditions (see the above niggle about using X to address the stack).

Separating parameters and temporaries from return addresses is supported by design on the 68000 and 6809, but not on the 6801 or 6805.

(4) The lack of direct-page mode op-codes for the unary operators was, in my opinion, a serious strategic miss. Sure, you could address variables in the direct page with extended mode addressing, but it cost extra cycles, and it just felt funny.

To explain, the binary instructions (loads, stores, two-operand arithmetic and logic) all have a direct-page mode. This allows saving a byte and a cycle when working on variables in the direct page (called zero page on other processors -- addresses from 0 to 255).

The unary instructions (increment/decrement, shifts, complements, etc.) do not. The irony is that the unary instructions are the ones you use on memory when you don't want to waste time and accumulators loading and storing a result.

This may have been another attempt to save transistors by not implementing every possible op-code. But a careful re-examination of the op-code map indicates that it should have been possible without using significantly more transistors. In fact, I'm guessing it actually required more transistors to do it the way they ended up doing it.

Or it may have been an attempt to avoid running into the situation where they would need an op-code for something important but had already used all of the available codes in a particular area of the map. But, again, re-examining the op-code map would have revealed room to fit the op-codes in.

Maybe there just wasn't enough time to re-examine and reconsider the omissions before the scheduled deadlines, and they thought absolute/extended addressing should be good enough.

I'll come back to the reasons it really wasn't further down.

This one was also fixed in the designs of the 68000 and the 6809, and sort of in the 6805, but it was not addressed in the 6801.

Fixing it in the 6801 would have been an awkward after-the-fact tack-on, but I'll look at that below.

(5) The 6800 had a few instructions for inter-accumulator math -- ABA (add B to A), SBA (subtract B from A), and CBA (compare B with A, which is SBA but without storing the result).

But it's missing the logical instructions AND, OR, and EOR (Exclusive-OR) of B into A, and doesn't have any instructions at all going the other direction, A into B.

Surprisingly, this is not hard to work around in most cases, but the workarounds are case-by-case tricks with the condition codes. Otherwise, you're back to using statically allocated temporaries, and care must be taken to avoid race conditions caused by, for example, using the same temporaries during interrupt processing.

This is fixed in the design of the 68000, and eliminated from the scope of the 6805, effectively fixed in the 6809 (by the addition of stack-relative addressing for temporaries), and partially addressed in the 6801 (by adding 16-bit math, the most common place where it becomes a problem, more below).

(6) The 6800 has no native 16-bit math other than incrementing, decrementing, and comparing X, and incrementing and decrementing S. Synthesizing 16-bit math is straightforward, but -- especially without the inter-accumulator logical operators -- it does require temporary variables, requiring extra processor cycles and potentially inducing race conditions.

Also, you usually need one or more extra tests to cover partial results in one byte or the other, or a logical instruction to collect the results, and it's easy to forget or just fail to complete the math, just as with CPX.
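Here is a Python sketch of that problem: a 16-bit add synthesized from an 8-bit add of the low bytes (ADDB) and an add-with-carry of the high bytes (ADCA). After ADCA, Z describes only the high byte, so a zero test needs the extra OR of both bytes, which on the 6800 means going through a temporary. The helper names are hypothetical.

```python
def add8(a: int, b: int, carry_in: int = 0):
    """8-bit add with carry in; returns (result, carry_out, zero_flag)."""
    total = a + b + carry_in
    result = total & 0xFF
    return result, total >> 8, result == 0

def add16(a_hi: int, a_lo: int, b_hi: int, b_lo: int):
    """16-bit add on an 8-bit ALU: ADDB on the low bytes, ADCA on the high bytes."""
    lo, carry, _ = add8(a_lo, b_lo)                      # ADDB: low bytes
    hi, carry, z_after_adca = add8(a_hi, b_hi, carry)    # ADCA: high bytes + carry
    return hi, lo, carry, z_after_adca                   # Z reflects the high byte only

def result_is_zero(hi: int, lo: int) -> bool:
    """The extra step: OR the bytes together (on the 6800, via a temporary)."""
    return (hi | lo) == 0

hi, lo, carry, z = add16(0x01, 0x00, 0xFF, 0x01)   # $0100 + $FF01
print(hex((hi << 8) | lo), carry, z)   # 0x1 1 True  <- Z set, but the result is $0001
print(result_is_zero(hi, lo))          # False
```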

And you need 16-bit arithmetic to deal with 16-bit addresses.

This is solved on the 6809 and 6801 by adding 16-bit addition and subtraction. On the 68000, the problem becomes 32-bit math, and it's solved for addition and subtraction, but, oddly, not quite completely for multiplication and division, more below.

(7) To explain this last niggle: of the above niggles, (1), (2), (3), (5), and (6) can be solved in the software application/operating system design by appropriate declaration of global pseudo-register variables and globally accessible routines to handle the missing functionality, exercising care to separate variables and code for interrupt-time functions from those for non-interrupt-time functions. (These global routines and variables are a core feature of most 8-bit operating systems.)

For example, if your system design declares and systematically uses something like the following:

        ORG   $80    ; non-interrupt-time global pseudo-registers
PSTK    RMB   2      ; two bytes for parameter stack pointer
QTMP    RMB   2      ; temporary for high bytes of 32-bit quadruple accumulator
DTMP    RMB   2      ; temporary for 16-bit double accumulator
XTMP    RMB   2      ; temporary for index math and copy source pointer
YTMP    RMB   2      ; temporary for index math and copy destination pointer

        ORG   $90    ; interrupt-time global pseudo-registers
IPSTK   RMB   2      ; two bytes for parameter stack pointer
IQTMP   RMB   2      ; temporary for high bytes of 32-bit quadruple accumulator
IDTMP   RMB   2      ; temporary for 16-bit double accumulator
IXTMP   RMB   2      ; temporary for index math and copy source pointer
IYTMP   RMB   2      ; temporary for index math and copy destination pointer

... and if all the processes running on your system respect those global variable declarations, then you may at least have a way to avoid the race conditions.
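A tiny Python simulation shows why the separate interrupt-time set matters. It is hypothetical in every detail: the pseudo-registers are a dict, and the "interrupt" is simply called at the worst possible moment, between saving X and restoring it.

```python
# Direct-page pseudo-registers, modeled as a dict (names follow the listing above).
pseudo = {"XTMP": 0, "IXTMP": 0}

def interrupt_handler(temp: str) -> None:
    """An interrupt routine that parks its own X value in a temporary."""
    pseudo[temp] = 0xBEEF

def main_routine(interrupt_temp: str) -> int:
    x = 0x1234
    pseudo["XTMP"] = x                  # STX XTMP
    interrupt_handler(interrupt_temp)   # interrupt arrives between STX and LDX
    return pseudo["XTMP"]               # LDX XTMP

print(hex(main_routine("XTMP")))    # shared temporary: 0xbeef (X corrupted)
print(hex(main_routine("IXTMP")))   # separate temporary: 0x1234
```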

But that chews a piece out of the memory map for user applications.

Now, if the unary operators all had direct-page mode versions, see niggle (4) above, the processor could also define a direct-page address space function code, along with several other such function codes, allowing the system designer to optionally include hardware to separate the direct-page system resources from other resources in the address map, such as general data, stack, code, interrupt vectors, etc.

Two or three extra address lines could be provided as optional address function codes, to allow hardware to separate the spaces out.

This looks somewhat like the I/O instructions of the 8080 and 8086 families, but it would not be separate instructions; it would be separate address maps.

An example two-bit function code might be

00: general (extended/absolute) data and I/O
01: direct-page data and I/O
10: code/interrupt vectors
11: return address stack

Such extra address function signals can improve the utilization of the cramped 64 kilobyte address space, even though they would require increasing the number of pins on the processor package or multiplexing the functions onto some other signals, raising the effective count of external parts.

But they provide a place for such things as bank-switch hardware, in addition to general I/O and system globals and temporaries, without having to eat holes in general address space. And completely separating the return pointer stack from general data greatly increases the security of the system.
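Here is a Python sketch of the security point, using the example two-bit function codes above. The scenario is hypothetical: a return address sits on the stack at $01FE-$01FF, and a buggy data copy runs over the same 16-bit addresses. With one shared map the return address is destroyed; with a separate map per function code it survives.

```python
GENERAL, DIRECT, CODE, RETURN = 0b00, 0b01, 0b10, 0b11   # function codes from the list above

class Bus:
    """Toy memory system: one 64 KB map per function code, or one shared map."""

    def __init__(self, separate_maps: bool):
        count = 4 if separate_maps else 1
        self.maps = [bytearray(0x10000) for _ in range(count)]

    def _map(self, fc: int) -> bytearray:
        return self.maps[fc % len(self.maps)]   # shared map ignores the function code

    def write(self, fc: int, addr: int, value: int) -> None:
        self._map(fc)[addr & 0xFFFF] = value & 0xFF

    def read(self, fc: int, addr: int) -> int:
        return self._map(fc)[addr & 0xFFFF]

def overrun_clobbers_return(separate_maps: bool) -> bool:
    bus = Bus(separate_maps)
    bus.write(RETURN, 0x01FE, 0xC1)      # return address $C123 pushed at $01FE-$01FF
    bus.write(RETURN, 0x01FF, 0x23)
    for addr in range(0x01F0, 0x0200):   # buggy copy runs past its buffer
        bus.write(GENERAL, addr, 0x00)
    return (bus.read(RETURN, 0x01FE), bus.read(RETURN, 0x01FF)) != (0xC1, 0x23)

print(overrun_clobbers_return(separate_maps=False))   # True
print(overrun_clobbers_return(separate_maps=True))    # False
```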

I'm not sure if Motorola ever did so in any of their evolved microcontrollers, but this could also potentially allow optimizing access to direct-page pseudo-registers when direct-page RAM is provided on-chip in integrated system-on-a-chip devices like the 6801 and 6805 SOC packages.

The 68000 provides similar address function codes, but the address space on the 68000 is so much bigger than 64 kilobytes that the address function codes have been largely ignored.

Before Motorola began designing new microprocessors, such niggles in the 6800 were noticed and discussed in engineering and management within Motorola. The company decided to analyze code they had available, including internally developed code and code customers shared with them for the purpose of the analysis, looking for bottlenecks and inefficient sequences that an improved processor design could help avoid. The results of this code analysis motivated the design of the 68000 and the 6809.

The 68000 and the 6809 were designed concurrently, by different groups within Motorola.

68000 Niggles:

The 68000 significantly increases the number of both accumulators (data registers) and index registers, and directly supports common address math in the instruction set. It also widens address and data registers to 32 bits. They solved a lot of problems, but they left a few niggles.

(1) The processor was excessively complex. Having a lot of registers reduces the need for complex instructions and for instructions that operate directly on memory without going through registers, but the 68000 provided complex instructions and instructions that operate directly on memory as well.

IBM was just beginning work on the 801 (the research project that later led to the ROMP) at the time, and reduced instruction sets were not yet a common topic, so the assumption of complexity is understandable.

Still, the complexity required a lot of work to test and properly qualify products for production.

(2) They got the stack frame for memory management exceptions wrong. That is, memory management hardware turned out to work significantly better using the approach they did not initially choose to support, so the frames they had defined did not contain enough information to recover using the preferred memory management techniques. This was fixed in the 68010.

(3) The exception vector space being global made it difficult to fully separate the user program space from the system program space. This was also fixed in the 68010.

(4) Constant offsets for the indexed modes were limited to 16 bits. This seems to be another false optimization -- not fatal because they included variable (register) offsets in the addressing modes, so you could load a 32-bit offset into a data register to get what you wanted. But it still had a cost in cycle counts and register usage. This was not fixed until the 68020, and then they went overboard, making the addressing even more complex, which made the 68020 even harder to test.

(5) They added hardware multiplication and division to the 68000, but they didn't fully support 32-bit multiply and divide. This also was not fixed until the 68020. It can make such things as accessing really large data structures in memory suddenly become slow when the index into the data structure exceeds 32,767.
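On the 68000, MULU and MULS take 16-bit operands and produce a 32-bit product, so a 32-bit product has to be assembled from partial products in software. Here is that arithmetic as a Python sketch; `mulu16` just stands in for MULU and is not a real API.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def mulu16(a: int, b: int) -> int:
    """Stand-in for the 68000's MULU: unsigned 16 x 16 -> 32-bit product."""
    if not (0 <= a <= MASK16 and 0 <= b <= MASK16):
        raise ValueError("MULU operands are 16 bits")
    return a * b

def mul32_low(a: int, b: int) -> int:
    """Low 32 bits of a 32 x 32 product, built from three 16 x 16 multiplies.

    The a_hi * b_hi partial product only affects bits 32 and up, so it is skipped.
    """
    a_hi, a_lo = a >> 16, a & MASK16
    b_hi, b_lo = b >> 16, b & MASK16
    low = mulu16(a_lo, b_lo)
    cross = (mulu16(a_hi, b_lo) + mulu16(a_lo, b_hi)) & MASK16
    return (low + (cross << 16)) & MASK32

# e.g. the byte offset of element 100,000 in an array of 48-byte records
print(mul32_low(100_000, 48))   # 4800000
```

Three multiplies plus the shifts and adds is exactly the kind of sudden slowdown described above once an index no longer fits in 16 bits.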

Of the above, (4) and (5) could conceivably have been dealt with in the initial design, if management had not been pushing engineering to find corners to cut. The first three were problems that simply required experience to get right.

6809 Niggles:

The 6809 does not increase the number of accumulators, but it does add instructions that combine the two 8-bit accumulators, A and B, into a single 16-bit accumulator D for basic math -- addition, subtraction, load, and store.

On the other hand, it does increase the number of indexable registers to six, and it adds a whole class of address math that can be incorporated into the addressing portion of the instructions themselves, or can be calculated independently of other instructions.

It supports using two of the index registers as stack pointers, and thus supports stack addressing, so that race conditions can generally be completely avoided by using temporary variables on stack. (In comparison, the 68000 can use any of the 8 address registers visible to the programmer as stack pointers.)

One of the stack-pointer capable registers can be used as a frame pointer, making stack frames less of a bottleneck. Or it can be used as a separate parameter stack pointer, pretty much eliminating the bottleneck and improving security. (In comparison, the 68000 includes an instruction to generate a stack frame, which, of course, you don't need when you use properly split stacks. It also includes an entirely superfluous instruction to destroy a stack frame.)

One of the index-capable registers is the PC, which simplifies such things as mixing tables of constants in the code. (This is also supported on the 68000, making a ninth index-capable register for the 68000.)

One of the index registers (DP, for direct page) is a funky 8-bit high-byte partial index for the direct page modes it inherits from the 6800. (This is not done on the 68000, but any of the 68000's address registers can be used in a similar way, with short constant offsets for compact code and reduced cycle counts.)

All unary instructions have a direct page mode op-code, which saves byte count if not cycle count.

(1) As a minor niggle, I can't tell that not providing a full 16-bit base address for the direct addressing mode actually saved them anything in terms of transistor count and instruction cycle count, but we are probably safe in guessing that was their reasoning for doing it that way. It is still useful, although it might have been more useful to have provided finer-grain control of the base address of the direct page. (See above about using any address register in the 68000 in a similar way.)

The DP can be used, with caveat, as a base for process-local static allocations, which greatly reduces potential for inadvertent conflicts in use of global variables and race conditions.

(2) Another niggle about the direct page, the caveat, is that the direct page base is not directly supported for address math. Just finding where the direct page is pointing requires moving DP to the A accumulator and clearing the B accumulator, after which you can move it to one of the index registers. Cycle and register consuming, but not fatal.
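Here is a Python sketch of both halves of that: how a direct-mode address is formed from DP, and what the TFR DP,A / CLRB / TFR D,X sequence leaves in X. The function names are hypothetical.

```python
def direct_ea(dp: int, offset: int) -> int:
    """6809 direct mode: DP supplies the high byte, the instruction the low byte."""
    return ((dp & 0xFF) << 8) | (offset & 0xFF)

def dp_base_in_x(dp: int) -> int:
    """What TFR DP,A / CLRB / TFR D,X leaves in X."""
    a = dp & 0xFF        # TFR DP,A
    b = 0                # CLRB
    d = (a << 8) | b     # D is A:B
    return d             # TFR D,X

dp = 0x20
print(hex(direct_ea(dp, 0x34)))   # 0x2034
print(hex(dp_base_in_x(dp)))      # 0x2000
```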

(3) A third niggle about both the direct page and the indexed mode, it seems like cycle counts for both could have been better. The 6801 improved cycle counts for both, making the 6809 seem less attractive to engineers seeking speed. It would have been nice for Motorola to have followed the 6801 with an improved 6809 that fixed the DP niggles and cycle count niggles.

(4) The 6809 also does not have address function code signals. The overall design provides enough power to implement mini-computer class operating systems, but the 64 kilobyte address space then limits the size of user applications. Address function signals that allow separating code, stack, direct page, and extended data would have eased the limits significantly.

On the other hand, widening the index registers would have done even more to ease the addressing restrictions. (I've talked about that elsewhere, and I hope to examine it more carefully sometime in a rant on how the 6809 could have evolved.)

(5) Other than those niggles, the 6809 is about as powerful a design as you can get and still call a CPU an 8-bit processor. In spite of the fact that it would have meant letting the 6809 compete with the 68000 in the market, they could have used the 6809 as the base design of a family of very competitive 16-bit CPUs.

In other words, my fifth niggle is that Motorola never pursued the potential of the 6809.

(6) but not really -- 8-bit CPUs are generally focused on keeping transistor count down for 8-bit applications, so hardware multiplication and division of 16-bit numbers doesn't really make sense in an 8-bit CPU design. This is probably the reason the 6809 only had 8- by 8-bit multiplication, and also probably the reason for the irregular structure of the operation.

A similar 8-bit division of accumulator A by accumulator B yielding 8 bits of quotient and 8 bits of remainder might make sense, but I'm not sure we should want to waste the transistors.
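Without such an instruction, the division is done in software with a shift-and-subtract loop, roughly eight iterations of shift, compare, and conditional subtract. Here is a Python sketch of that restoring-division technique, producing the same quotient/remainder pair the hypothetical A-by-B instruction would.

```python
def divide8(a: int, b: int) -> tuple:
    """Restoring shift-and-subtract division: 8-bit A / B -> (quotient, remainder)."""
    if b == 0:
        raise ZeroDivisionError("divide by zero")
    quotient, remainder = 0, 0
    for bit in range(7, -1, -1):                        # dividend bits, MSB first
        remainder = (remainder << 1) | ((a >> bit) & 1)
        quotient <<= 1
        if remainder >= b:                              # divisor fits: subtract, set bit
            remainder -= b
            quotient |= 1
    return quotient, remainder

print(divide8(200, 7))   # (28, 4)
```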

16-bit multiply and divide would have been good for a true 16-bit version of the 6809, but that would include a full 16-bit instruction set.

6801 Niggles:

When the 6809 was introduced in the market, it was still a bit too much complexity in the CPU to comfortably integrate peripheral parts -- timers, serial and parallel ports, and such -- into the same semiconductor die that contained the CPU. So Motorola decided to fix just a few of the niggles of the 6800 for use as a core CPU in semi-custom designs that included on-chip peripheral devices.

(It is commonly misunderstood: the 6801 actually came after the 6809 historically, but it is best understood as a slightly improved 6800, not as a stripped-down 6809. Three steps forward, three steps back, half a step forward.)

As noted above, they fixed the CPX instruction in the 6801, but they did not fix the lack of direct-page unary instructions. They also added instructions to directly push and pop the X index register, which greatly helped when you had something in X that you needed to save before you used X for something else.

And they added the 16-bit loads, stores, and math that combined A and B into a single 16-bit double accumulator D -- similar to the 6809, which overcame a lot of the other niggles about the 6800. In particular, you don't feel the lack of an OR B with A instruction to make sure both bytes of the result are zero, because the flags are correctly set after the D accumulator instructions.

And they included the 8-bit multiply A by B from the 6809. They also included a couple of 16-bit double accumulator shifts, but only for D, not for memory, which is a very minor niggle, an engineering trade-off.

They also added an instruction to add B to X, ABX, to help calculate the addresses of fields within records.

This brings up niggle (1) -- ABX is unsigned, and they did not include a subtract B from X instruction. Being able to subtract B from X, or add a negative value in B to X, would have significantly helped with allocating local variable space on the stack. As it is, ABX is primarily useful for addressing elements within records and structures.
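A short Python sketch of the difference: ABX treats B as 0..255, so loading B with a "negative" 16 ($F0) adds 240 instead of subtracting 16. `sbx` is the hypothetical instruction wished for above, not a real 6801 op-code, and the starting value of X is just an example.

```python
def abx(x: int, b: int) -> int:
    """6801 ABX: X <- X + B, with B zero-extended (treated as 0..255)."""
    return (x + (b & 0xFF)) & 0xFFFF

def sbx(x: int, b: int) -> int:
    """Hypothetical SBX: X <- X - B. Not a real 6801 instruction."""
    return (x - (b & 0xFF)) & 0xFFFF

x = 0x01F0          # say X holds a copy of the stack pointer
locals_size = 16

print(hex(sbx(x, locals_size)))             # 0x1e0: 16 bytes reserved below X
print(hex(abx(x, (-locals_size) & 0xFF)))   # 0x2e0: B = $F0 is added as +240
```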

Although I/O devices tended to be assigned addresses in high memory on early 6800 designs, the 6801 put the built-in I/O devices in the direct page. They also put a bit of built-in RAM in the direct page, starting at $80.

But, as I noted above, niggle (2) is that they did not add direct-page mode unary instructions.

If they had done so, either they'd have broken object code compatibility with the 6800, or they'd have had to spread the direct-page op-codes into awkward places in the op-code map, which definitely would have cost transistors that they wanted for the I/O devices and such. Either way, I think it would have been worth the cost.

I put together a table showing one possible way to spread them out among unimplemented op-code locations in the inherent/branch section of the op-code table for a chapter of one of my stalled novels, and I'll just copy below a list of where I allocated the direct page op-codes:

NEG direct: $02
ROR direct: $12
ASR direct: $03
COM direct: $13
LSR direct: $14
ROL direct: $15
ASL direct: $18
DEC direct: $1A
INC direct: $1C
TST direct: $1D
JMP direct: $1E
CLR direct: $1F

That doesn't prove anything other than that there were ultimately enough op-codes available. But I'm guessing this layout could be done with a hundred or fewer extra transistors -- transistors that admittedly would then be unavailable for counters or port bits. But it could be done, and it wouldn't have cost that much.
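As a sanity check on that allocation, here is a Python sketch that compares the proposed codes against the documented 6800 inherent op-codes in $00-$1F, plus LSRD ($04) and ASLD ($05), which the 6801 added there. It also lists which proposed codes keep the low nibble of the existing unary op-codes, which might matter for decoding. Only documented op-codes are checked, not undocumented behavior.

```python
# Documented 6800 op-codes in the $00-$1F range (all inherent instructions).
USED_6800 = {
    0x01: "NOP", 0x06: "TAP", 0x07: "TPA", 0x08: "INX", 0x09: "DEX",
    0x0A: "CLV", 0x0B: "SEV", 0x0C: "CLC", 0x0D: "SEC", 0x0E: "CLI",
    0x0F: "SEI", 0x10: "SBA", 0x11: "CBA", 0x16: "TAB", 0x17: "TBA",
    0x19: "DAA", 0x1B: "ABA",
}
ADDED_6801 = {0x04: "LSRD", 0x05: "ASLD"}

# Low nibble of each unary op in the existing $4X-$7X columns.
UNARY_LOW = {"NEG": 0x0, "COM": 0x3, "LSR": 0x4, "ROR": 0x6, "ASR": 0x7,
             "ASL": 0x8, "ROL": 0x9, "DEC": 0xA, "INC": 0xC, "TST": 0xD,
             "JMP": 0xE, "CLR": 0xF}

# The allocation listed above.
PROPOSED_DIRECT = {"NEG": 0x02, "ROR": 0x12, "ASR": 0x03, "COM": 0x13,
                   "LSR": 0x14, "ROL": 0x15, "ASL": 0x18, "DEC": 0x1A,
                   "INC": 0x1C, "TST": 0x1D, "JMP": 0x1E, "CLR": 0x1F}

taken = set(USED_6800) | set(ADDED_6801)
collisions = sorted(op for op in PROPOSED_DIRECT.values() if op in taken)
duplicates = len(set(PROPOSED_DIRECT.values())) != len(PROPOSED_DIRECT)
same_low_nibble = sorted(name for name, op in PROPOSED_DIRECT.items()
                         if op & 0xF == UNARY_LOW[name])

print(collisions, duplicates, same_low_nibble)
# [] False ['ASL', 'CLR', 'COM', 'DEC', 'INC', 'JMP', 'LSR', 'TST']
```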

Also, with these in the op-code map, they could have provided this version of the CPU for compatibility, and then provided another version with the direct-page op-codes correctly laid out for customers who were willing to simply re-assemble their source code. (That's all it would have taken, but many customers wouldn't be willing to take a chance that something would sneak up and bite them.)

One possible more efficient layout would have been to repeat the addressing of the binary op-code groups. Working from the right in the op-code map, there are four columns for accumulator B binary operators and four columns for accumulator A binary operators:

$FX is extended mode B, and $BX is extended mode A;
$EX is indexed mode B, and $AX is indexed mode A;
$DX is direct page B, and $9X is direct page A;
$CX is immediate mode B, and $8X is immediate mode A.

In the existing 6800, this continues down two more for the unaries, but then you have the unary A and B instructions:

$7X is extended mode unary;
$6X is indexed mode unary;
$5X is B unary;
$4X is A unary.

Then you have inherent mode instructions in columns $3X, $1X, and $0X, with the branches in column $2X.

In a restructured op-code map, it could be done like this:

$7X is extended mode unary;
$6X is indexed mode unary;
$5X would be direct page unary;
$4X would be B unary;
$0X would be A unary.

And the inherent mode operators would be more densely packed in the $1X and $3X columns.
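One way to see why the restructured map is tidier: in it, op-code bits 5-4 select direct, indexed, or extended in the same way for all three groups ($4X-$7X, $8X-$BX, $CX-$FX). That uniformity plausibly simplifies decoding, though that's my inference, not something I can demonstrate at the transistor level. Here is a Python sketch that checks it for both column layouts described above.

```python
# Addressing-mode column for each high nibble of the op-code, per the text.
EXISTING_6800 = {
    0x0: "inherent", 0x1: "inherent", 0x2: "branch", 0x3: "inherent",
    0x4: "A unary", 0x5: "B unary", 0x6: "indexed unary", 0x7: "extended unary",
    0x8: "A immediate", 0x9: "A direct", 0xA: "A indexed", 0xB: "A extended",
    0xC: "B immediate", 0xD: "B direct", 0xE: "B indexed", 0xF: "B extended",
}

# The hypothetical restructured map: A unary to $0X, B unary to $4X,
# direct-page unary in $5X ($1X and $3X would hold the packed inherents).
PROPOSED = {**EXISTING_6800, 0x0: "A unary", 0x4: "B unary", 0x5: "direct unary"}

MODE_NAMES = {0b01: "direct", 0b10: "indexed", 0b11: "extended"}

def mode_bits_line_up(table: dict) -> bool:
    """True if op-code bits 5-4 select the same memory mode in all three groups."""
    for high in range(0x4, 0x10):
        field = high & 0b11              # op-code bits 5-4
        if field and MODE_NAMES[field] not in table[high]:
            return False
    return True

print(mode_bits_line_up(EXISTING_6800))   # False: $5X is B unary, not direct
print(mode_bits_line_up(PROPOSED))        # True
```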

~~This would require either moving the negate instructions or the halt-and-catch-fire instruction, I suppose.~~ [I'm not finding my reference that had me thinking the 6801's test instruction was at $00. Cancel that thought.] Interestingly, when Motorola laid out the op-code map for the 6809, they kept A and B in columns $4X and $5X, and put the direct page in column $0X -- and left the negate at row $X0~~, so that they had to move the test instruction~~. [Again, I'm not finding my reference on the location of the 6809's test instruction. But they did leave negate where it was.]

Also interestingly, the 6801 has a direct-page jump to subroutine, which could be put to good use for a small set of quick global routines (like stack operations?). (The op-code is $9D, which some sources say was one of the accidental test instructions in the 6800.)

Niggle (3) about the 6801 is that I think they should have split the stack. Add a parameter stack U, and then pushes and pops (PULs) would operate on the U stack, but JSR/BSR/RTS would operate on the S stack. This would make stack frames much less of a bottleneck, make it possible to reduce call and return cycle counts, and increase general code security somewhat.

(Note again that the 6809 and the 68000 both directly support this kind of split stack. It was the education system that failed to teach engineers to use it.)

And I'll note here that the 68HC11 derivative of the 6801 added, among other things, a Y index, but no parameter stack.

6805 Niggles

Really the only niggle I have with the 6805 is the lack of a separate parameter stack, and the lack of any push/pop at all in the original 6805. Motorola did add pushes and pops to some derivatives of the 6805, but they were on the same S stack as the return addresses.

The idea of an 8-bit index that could have a 16-bit base (as opposed to an offset) was novel to me when I first looked at the 6805, but it is rather useful. Instead of thinking in terms of putting a base address in X and then adding an offset, you think in terms of having a constant base address -- like an array with a known, fixed address, and the X register provides a variable offset. Indexed mode for binary operators includes no base, 8-bit base, and 16-bit base, allowing use anywhere in the address space.
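To make the base-versus-offset idea concrete, here is a small Python model (not 6805 code) of the 16-bit-offset indexed mode, where the constant in the instruction is a fixed table address and the 8-bit X register supplies the variable part. The names `ea_indexed16`, `TABLE`, and `lda_table`, and the full 64K memory, are assumptions for illustration; many real 6805 parts have a much smaller address space.

```python
# Sketch of 6805-style indexed addressing: the instruction supplies a
# 16-bit base (a fixed table address) and the 8-bit X register supplies
# the variable offset. Names and the 64K memory size are hypothetical.

MEMORY_SIZE = 0x10000            # assumption: a full 64K space for the model
memory = bytearray(MEMORY_SIZE)

def ea_indexed16(base: int, x: int) -> int:
    """Effective address for something like LDA TABLE,X."""
    if not (0 <= base <= 0xFFFF and 0 <= x <= 0xFF):
        raise ValueError("base must be 16 bits and X must be 8 bits")
    return (base + x) % MEMORY_SIZE   # X is unsigned: 0 to 255

TABLE = 0x1F00                   # hypothetical address of a ROMmed table
memory[TABLE:TABLE + 4] = bytes([10, 20, 30, 40])

def lda_table(x: int) -> int:
    """Load A from TABLE,X: the table address stays constant, X varies."""
    return memory[ea_indexed16(TABLE, x)]
```

Because X only reaches 255, any one table handled this way is limited to 256 bytes, which is part of why generalized copying is awkward on the original 6805.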

A small caveat is that unary operators do not have 16-bit base address indexed versions. This is a valid engineering tradeoff, and they cut the right corners here, fully supporting unary instructions for variables in the direct page.

The 8-bit index does not support generalized copying and other generalized functions needed to support self-hosted development environments (without self-modifying code), but that's not necessarily a problem. Hosted development environments are much more powerful tools than self-hosted. (I think a very small Tiny-BASIC interpreter could be constructed without self-modifying code, but that's more of an application than a self-hosted dev environment.)

It does make the CPX operator much simpler -- as an 8-bit operator.

Motorola ultimately extended the index with an XHI in some derivatives of the 6805, which would have allowed self-hosting for those derivatives, but we won't go there today. Also, we won't look at the 68HC11 in detail today. Nor will we do more than glance at the 68HC12 and 68HC16, even though both are quite interesting designs -- in spite of not having split stacks.

I think this is enough to show that Motorola really did do a fairly decent job with their CPU designs.

Actually comparing CPUs, by the way, requires producing a lot of parallel code implementing several real-world applications for each CPU compared. I'd like to do that someday, but I doubt I'll ever have the spare time and money to do so.

Thursday, August 26, 2021

Differences between the 6800 and the 6801, with Notes on 68HC11 and 6809

(It occurs to me that adding notes about the 68HC11 and 6809 in here would be useful, but I want to leave the simpler comparison of the 6800 and 6801 as it is. So I'm duplicating that here, and adding notes for the 6809 and 68HC11. However, I am not comparing any of these with the 6805 or its descendants here.)

I've described the differences between the 6800 and the 6801 instruction architectures at length in several other posts. (Many of those posts also take up other CPUs.) This is a high-level overview.

**68HC11: I'll note here that, much as the 6801 is object-code upwards compatible with the 6800, the 68HC11 is object-code upwards compatible with the 6801, but with some timing differences.

**6809: The 6809 is not object-code upwards compatible with either, although it is, to a great degree, assembler-source upwards compatible with both, using macros. It is not upwards source-code compatible with the 68HC11, because it lacks the 68HC11's hardware divide and direct bit-manipulation instructions. (The bit instructions generally need more than two 6809 instructions to synthesize, and the divide instructions just take too long to wink at.)

(But I still ignore the built-in ROM, RAM, and peripheral devices in the 6801 here. Those are important, but require separate treatment.)

**68HC11: (The 68HC11 has built-in RAM, ROM, and peripherals, like the 6801.

**6809: The 6809 does not, and never came in a publicly available version that did, that I know of.)

First, where the 6800 had two independent 8-bit accumulators, A and B, the 6801 has, in addition, the ability to combine them as a single 16-bit accumulator (A:B, or D) for several key instructions, load, store, add, and subtract:

LDD
STD
ADDD
SUBD

High byte is in A, low byte in B.

These are available in the full complement of binary addressing modes that the 6800 provides:

immediate (16-bit immediate value)
direct/zero page (addresses 0 to 255)
indexed (with 8-bit constant offset)
extended/absolute (addresses 0 to 65535)

Note that there is no separate CMPD. If you really need to compare two 16-bit values, you'll need to go ahead and do a destructive compare -- save D in a temporary, if necessary, and use the subtract instruction. (The way the flags work, it's only rarely necessary to do so.)

Be aware that the D register is not an actual additional register. It is simply the concatenation of A and B. If you LDD #$ABCD, A will have $AB in it and B will have $CD in it.
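One way to picture this is a Python model in which A and B are the only real storage and D is just a computed view of them. The class name `Accumulators` is made up for the sketch.

```python
class Accumulators:
    """A and B are the real registers; D is only a view of A:B (high byte in A)."""

    def __init__(self) -> None:
        self.a = 0
        self.b = 0

    @property
    def d(self) -> int:
        return (self.a << 8) | self.b

    @d.setter
    def d(self, value: int) -> None:
        value &= 0xFFFF
        self.a = value >> 8      # high byte goes to A
        self.b = value & 0xFF    # low byte goes to B

regs = Accumulators()
regs.d = 0xABCD                  # like LDD #$ABCD
```

Writing B afterwards (say, LDAB #1) changes D too, since there is nothing else for D to be.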

Registers in the 6800:
	accumulator A:8
	accumulator B:8
Index X:16
Stack Pointer SP:16
Program Counter PC:16
	Condition Codes CC:8

Registers in the 6801:
accumulator D (A:B) accumulator A accumulator B
Index X
Stack Pointer SP
Program Counter PC
	Condition Codes CC

**68HC11/6809: Both the 68HC11 and the 6809 include the CMPD instruction, with the full set of addressing modes.

**68HC11: The 68HC11 has an actual additional Y (IY) index register. Other than the additional byte and cycle implied by the prebyte for the IY instructions, it can do essentially anything the IX register can do.

Registers in the 68HC11:
accumulator D:16 (A:B) accumulator A:8 accumulator B:8
Index IX:16
Index IY:16
Stack Pointer SP:16
Program Counter PC:16
	Condition Codes CC:8

**68HC11: Like the 6800/6801, the 68HC11 has only one indexing mode, a constant 8-bit unsigned offset of 0 to 255 from the index register, but it can be used with either IX or IY:

n,X or n,Y.

**6809: The 6809 adds to the X index and S stack pointer a Y index register and a U stack pointer, and allows indexed address modes with both stack pointers. And the indexed addressing modes are significantly improved, allowing auto-inc/dec, various sizes of signed constant offsets, variable offset using the accumulators, and memory indirection, while separate instructions push and pop multiple registers on the U and S stacks. (Deep breath.) Oh. I forgot. PC is indexable, too.

**6809: And the 6809 provides a DP register that serves as the top 8 bits of address when using the direct page address mode. (Unfortunately, while memory indirection on an extended address is provided, memory indirection on a direct-page address is not. Darn.)

Registers in the 6809:
accumulator D:16 (A:B) accumulator A:8 accumulator B:8
Index X:16
Index Y:16
Indexable Return Stack Pointer S:16
Indexable User Stack Pointer U:16
Indexable Program Counter PC:16
	Direct Page Base DP:8
	Condition Codes CC:8

**6809: Indexing modes for the 6809 for all four indexable registers X, Y, U, and S include

zero offset,
constant signed offset of
- 5 bits (-16 to 15),
- 8 bits (-128 to 127),
- 16 bits (-32768 to 32767),
(signed variable) accumulator offset of
- A (-128 to 127),
- B (-128 to 127),
- D (-32768 to 32767),
auto increment/decrement by 1 or 2

**6809: In addition to the indexed modes referencing the four index registers, two additional indexed modes are provided via the index post-byte encoding:

program counter relative, with constant signed offset from PC of

8 bits (-128 to 127) or
16 bits (-32768 to 32767),

extended (absolute) memory indirect.
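The smallest of these, the 5-bit constant offset, fits entirely in the post-byte. The Python sketch below decodes it, assuming the post-byte layout as I recall it from the 6809 data sheet (bit 7 clear selects the 5-bit mode, bits 6 and 5 select X, Y, U, or S, and bits 4 through 0 hold the two's-complement offset). The function names are hypothetical; check the data sheet before relying on the bit layout.

```python
INDEX_REGISTERS = ("X", "Y", "U", "S")   # RR field: 00, 01, 10, 11

def decode_5bit_offset(postbyte: int) -> tuple[str, int]:
    """Return (register, signed offset) for a 5-bit constant offset post-byte."""
    if postbyte & 0x80:
        raise ValueError("bit 7 set: not the 5-bit offset form")
    register = INDEX_REGISTERS[(postbyte >> 5) & 0x03]
    offset = postbyte & 0x1F
    if offset & 0x10:        # sign bit of the 5-bit field
        offset -= 0x20       # sign-extend to the range -16 to 15
    return register, offset

def effective_address(base: int, offset: int) -> int:
    """Add a signed offset to a 16-bit register, wrapping at 64K."""
    return (base + offset) & 0xFFFF
```

So something like `LDA -1,X` needs only the opcode plus one post-byte, with no separate offset byte.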

**6809: The index post-byte encoding also provides one level of memory indirection on the result address for all indexed addressing modes except constant 5-bit offset mode and auto-increment/decrement by 1. (However, this does not imply double indirection for the extended memory indirect mode.) Memory indirection allows, for example, popping a pointer off a stack and loading an accumulator from the pointer without using an intermediate index register -- thus

LDX ,U++ ; pop pointer into X
LDA ,X ; use pointer to load A

can be done in one instruction, without using X:

LDA [,U++]
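Here is a rough Python model of what `LDA [,U++]` does, assuming big-endian pointers (as on all these Motorola parts) and 16-bit address wraparound. `read16` and `lda_indirect_u_postinc2`, along with the addresses used, are hypothetical names and values for the sketch.

```python
memory = bytearray(0x10000)

def read16(addr: int) -> int:
    """Big-endian 16-bit read: high byte at the lower address."""
    return (memory[addr] << 8) | memory[(addr + 1) & 0xFFFF]

def lda_indirect_u_postinc2(u: int) -> tuple[int, int]:
    """Model of LDA [,U++]: returns (value loaded into A, new U)."""
    pointer = read16(u)              # fetch the pointer sitting at the top of U
    new_u = (u + 2) & 0xFFFF         # ,U++ pops the two-byte pointer
    return memory[pointer], new_u    # one more fetch, through the pointer

U_TOP = 0x2000                       # hypothetical U stack position
memory[U_TOP:U_TOP + 2] = bytes([0x40, 0x10])   # pointer $4010 on the stack
memory[0x4010] = 0x5A                            # the byte it points to
a, u = lda_indirect_u_postinc2(U_TOP)
```

X is never touched, which is the whole point of the indirect form.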

Second, the 6801 has an 8-bit by 8-bit integer multiply,

MUL

which multiplies the A and B accumulators yielding a 16-bit result in the double accumulator D. 16-bit multiplies can be done the traditional bit-by-bit way to save a few bytes, or with four 8-bit MULs and appropriate adding of columns for a much faster result.
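The four-MUL approach is just long multiplication in base 256. A Python sketch of the column arithmetic follows; the function names are invented, and Python's unbounded integers hide the carries that the 6801 code has to propagate by hand with ADDD and ADCA/ADCB into a 4-byte result.

```python
def mul(a: int, b: int) -> int:
    """MUL: unsigned A * B, 16-bit product in D (A:B)."""
    return (a & 0xFF) * (b & 0xFF)

def mul16(x: int, y: int) -> int:
    """Unsigned 16 x 16 -> 32-bit multiply from four 8 x 8 MULs."""
    xh, xl = (x >> 8) & 0xFF, x & 0xFF
    yh, yl = (y >> 8) & 0xFF, y & 0xFF
    low = mul(xl, yl)       # column weight 1
    cross1 = mul(xh, yl)    # column weight 256
    cross2 = mul(xl, yh)    # column weight 256
    high = mul(xh, yh)      # column weight 65536
    # Add the partial products into their columns.
    return low + ((cross1 + cross2) << 8) + (high << 16)
```

For signed 16-bit multiplies, the usual approach is to fix up the high half afterwards or work on magnitudes and apply the sign at the end.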

**68HC11/6809: Both the 68HC11 and the 6809 have the 8 by 8 multiply, as well.

There is no hardware divide in the 6801. You'll have to move up to 68HC11, 68HC12, 68HC16, 68000, or Coldfire, for that.

**6809: The 6809 does not have a hardware divide, either.

**68HC11: The 68HC11 has both integer and fractional 16 bit by 16 bit hardware divide.
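As I read the 68HC11 reference manual, IDIV divides D by X and leaves the quotient in X and the remainder in D, while FDIV treats D/X as a binary fraction (radix point to the left of the quotient), so it only makes sense when D is less than X. Here is a Python sketch of those semantics with made-up function names; it raises exceptions where the hardware instead reports the problem in the condition codes.

```python
def idiv(d: int, x: int) -> tuple[int, int]:
    """Model of IDIV: unsigned D / X. Returns (new X = quotient, new D = remainder)."""
    if x == 0:
        raise ZeroDivisionError("hardware flags this in CC instead of trapping")
    return d // x, d % x

def fdiv(d: int, x: int) -> tuple[int, int]:
    """Model of FDIV: (D * 65536) / X as a 16-bit binary fraction, for D < X."""
    if not 0 <= d < x:
        raise ValueError("FDIV expects numerator < denominator")
    return (d << 16) // x, (d << 16) % x
```

Running FDIV on the remainder left by IDIV is the usual way to get fractional digits after an integer divide.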

Third, the 6801 adds two 16-bit shifts, logical shift left and right, accumulator-only:

LSLD (ASLD)
LSRD

**68HC11: These two instructions are present in the 68HC11, as well.

**6809: These two instructions are not present in the 6809.

These were considered key instructions. If you need 16-bit versions of the rest of the accumulator shifts and rotates, they are easy, and not expensive, to synthesize with shift-rotate or shift-shift pairs.
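As an example of those pairs, here is a Python model of a 16-bit arithmetic shift right built from ASRA then RORB, and a 16-bit rotate left built from ROLB then ROLA, with the carry flag passed between the halves. The helper names are made up; the point is the order: a right shift starts at the high byte, a left shift or rotate starts at the low byte.

```python
def asra(a: int) -> tuple[int, int]:
    """ASRA: shift A right, keeping bit 7; bit 0 goes to carry."""
    return (a >> 1) | (a & 0x80), a & 0x01

def rorb(b: int, carry: int) -> tuple[int, int]:
    """RORB: rotate B right through carry."""
    return (b >> 1) | (carry << 7), b & 0x01

def asrd(a: int, b: int) -> tuple[int, int, int]:
    """16-bit arithmetic shift right of A:B as ASRA, RORB."""
    a, carry = asra(a)
    b, carry = rorb(b, carry)
    return a, b, carry

def rolb(b: int, carry: int) -> tuple[int, int]:
    """ROLB: rotate B left through carry."""
    return ((b << 1) & 0xFF) | carry, b >> 7

def rola(a: int, carry: int) -> tuple[int, int]:
    """ROLA: rotate A left through carry."""
    return ((a << 1) & 0xFF) | carry, a >> 7

def rold(a: int, b: int, carry: int) -> tuple[int, int, int]:
    """16-bit rotate left of A:B through carry as ROLB, ROLA."""
    b, carry = rolb(b, carry)
    a, carry = rola(a, carry)
    return a, b, carry
```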

(Note that the 6800/6801 does not provide an arithmetic shift left distinct from the logical shift left. As an exercise for the reader, see if you can make an argument for doing so, and describe the separate behavior the arithmetic shift left should have. Heh. Not sure if I'm kidding or not. Saturation?)

Fourth, the 6801 adds a bit of index math, and push and pop for the index register:

ABX (add B to X, unsigned only)
PSHX
PULX
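ABX treats B as unsigned, so it can only move X forward. The Python sketch below contrasts it with the 6809's `LEAX B,X`, which, per the accumulator-offset range listed above, treats B as -128 to 127. The function names are hypothetical.

```python
def abx(x: int, b: int) -> int:
    """6801 ABX: X + B, with B unsigned (0 to 255); wraps at 16 bits."""
    return (x + (b & 0xFF)) & 0xFFFF

def leax_b(x: int, b: int) -> int:
    """6809 LEAX B,X: X + B, with B sign-extended (-128 to 127)."""
    b &= 0xFF
    offset = b - 0x100 if b & 0x80 else b
    return (x + offset) & 0xFFFF

# With B = $FF, ABX moves X forward by 255, but LEAX B,X moves it back by 1.
forward = abx(0x1000, 0xFF)
backward = leax_b(0x1000, 0xFF)
```

That is why there is no cheap way to step X backward by a variable amount on the 6801.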

**68HC11: The 68HC11 adds ABY, PSHY, and PULY. It also, of course, adds increment and decrement Y -- INY/DEY. (But no subtract B from X or Y, darn.)

**6809: The 6809 has two sets of pushes and pops, PSHS/U and PULS/U. Each takes a register list, so you can essentially save or restore the entire processor state on either stack in a single instruction.

**6809/68HC11: There is one important difference between the 6800/6801/68HC11 and the 6809:

**68HC11: The stack in the former is post-decrement push, just as the 6800 stack is, always pointing one byte below the top of stack (next free byte). (TSX, TSY, TXS, and TYS adjust the pointer before moving it to or from the index register, so that indexing has no surprises. But if you save S to memory, then load it to the index register, surprise!)

**6809: The stacks in the 6809 are pre-decrement push, always pointing to the last element pushed. Never any surprise on the 6809, but you do need to be careful about this when moving code from the 6800/6801/68HC11 to the 6809.
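A small Python model of the two push conventions may make the off-by-one clearer. The class and method names are invented for the sketch, and only single-byte pushes are modeled.

```python
class Stack6800:
    """6800/6801/68HC11 style: S points at the next free byte."""

    def __init__(self, memory: bytearray, s: int) -> None:
        self.memory = memory
        self.s = s

    def push(self, byte: int) -> None:
        self.memory[self.s] = byte & 0xFF   # store at S first,
        self.s = (self.s - 1) & 0xFFFF      # then post-decrement

    def tsx(self) -> int:
        """TSX copies S + 1, so 0,X addresses the byte on top of the stack."""
        return (self.s + 1) & 0xFFFF


class Stack6809:
    """6809 style: S points at the last byte pushed."""

    def __init__(self, memory: bytearray, s: int) -> None:
        self.memory = memory
        self.s = s

    def push(self, byte: int) -> None:
        self.s = (self.s - 1) & 0xFFFF      # pre-decrement first,
        self.memory[self.s] = byte & 0xFF   # then store at the new S


mem6801 = bytearray(0x10000)
stack_6801 = Stack6800(mem6801, 0x01FF)
stack_6801.push(0x42)

mem6809 = bytearray(0x10000)
stack_6809 = Stack6809(mem6809, 0x0200)
stack_6809.push(0x42)
```

Note the surprise mentioned above: the raw S value you would save to memory on the 6801 ($01FE here) is one less than what TSX hands you ($01FF).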

**6809: The 6809 adds (hang on to your seat again) a load effective address instruction that can load the result address of any indexed addressing mode into any of the four indexable registers other than the PC. Use the LEA instruction to add any signed constant offset to X, Y, U, or S, or to add either 8-bit accumulator or the double accumulator to X, Y, U, or S. PC cannot be used as a destination of LEA, but it can be used as a source, allowing such things as constant tables embedded in fully position-independent code without having to game the return stack to access them. Yes, this seriously makes up for the otherwise limited register set of the 6809. Serious magic.

There is one 6801 instruction (four op-codes, one for each of the binary addressing modes), and one only, with different semantics from the 6800:

CPX (full results in flags)

On the 6800, you could only depend on the Zero flag after a CPX. (Actually, Negative was also set, and oVerflow, too, but not by rules that were useful.) On the 6801, the Negative, oVerflow, and Carry flags are also set appropriately, so that you can use any conditional branch, not just BEQ/BNE, after a CPX and get meaningful results.
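Here is a Python sketch of what "full results in flags" means for a 16-bit compare, using the usual two's-complement subtract rules for N, Z, V, and C. `compare16`, `bhi`, and `bgt` are names made up for the sketch. Comparing $8000 with 1 shows why both families of branches matter: unsigned, $8000 is higher, but signed, it is -32768 and therefore less.

```python
def compare16(x: int, operand: int) -> dict[str, bool]:
    """Flags after a 16-bit compare (X - operand), discarding the result."""
    x &= 0xFFFF
    operand &= 0xFFFF
    result = (x - operand) & 0xFFFF
    return {
        "N": bool(result & 0x8000),                         # sign of result
        "Z": result == 0,                                   # equal
        "V": bool((x ^ operand) & (x ^ result) & 0x8000),   # signed overflow
        "C": operand > x,                                   # unsigned borrow
    }

def bhi(f: dict[str, bool]) -> bool:
    """Unsigned 'higher': branch if C = 0 and Z = 0."""
    return not (f["C"] or f["Z"])

def bgt(f: dict[str, bool]) -> bool:
    """Signed 'greater than': branch if Z = 0 and N = V."""
    return not f["Z"] and f["N"] == f["V"]

flags = compare16(0x8000, 0x0001)   # like LDX #$8000 then CPX #1
```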

** The 68HC11 adds the CPY instruction, in full addressing modes, with full 16-bit comparison semantics like the 6801 CPX.

** The 6809 adds CMPY, CMPS, and CMPU, in full addressing modes, with full 16-bit comparison semantics. (The CPX mnemonic is CMPX on the 6809.)

These improve index handling, help support stack frames, etc.

(I am not a fan of stack frames on the return address stack, but the frame pointer pushed on S could as easily be a pointer into a synthesized parameter stack. I think maintaining a synthetic parameter stack is no more expensive than maintaining stack frames in a combined stack on the 6800 or 6801.)

**6809: The 6809 U register can be used as a frame pointer in a combined stack run-time architecture. Alternatively, in a split-stack run-time, it can be pushed to the S stack on routine entry as a frame link.

The 6801 provides an additional op-code for the call instruction, adding the direct/zero page addressing mode:

JSR (direct page)

This allows the programmer to put short, critical subroutines in the direct page for more efficient calls.

(JMP does not get the additional op-code, which means that inner-interpreter loops for virtual machines do not benefit from allocation in the direct page, same as the 6800.)

**68HC11: The 68HC11 follows the 6801 with regard to JSR and JMP: only extended and indexed mode for JMP, but direct page mode added for JSR.

**6809: The 6809, on the other hand, adds direct page mode JMP as well as JSR.

**68HC11: The 68HC11 follows the 6800/6801 in not providing direct page mode addressing for unary instructions (increments/decrements, shifts, etc.).

**6809: The 6809, on the other hand, does provide the direct page mode opcodes for all unary instructions. (Unfortunately, it does not provide indirection through the direct page.)

Finally, the 6801 adds a branch never instruction:

BRN

This can be useful as a marker no-op in object code, helpful in debugging, linking, and compiler code generation.

**68HC11: The 68HC11 also includes BRN.

**6809: The 6809 also includes BRN.

**6809: The 6809 provides long versions of all branches. This, in addition to allowing PC relative indexing, provides significant support for position independent coding so that modules can be loaded anywhere in memory.

If you want more information than this and my writing seems understandable --

[EDIT 202410061839:] Beginning in February 2024, I have been putting together an assembly language tutorial for the 6800, 6801, 6809, and 68000 in parallel. (I do not include the 68HC11 in it because I don't know of an open source/libre licensed simulator for it.) You can find it in my programming fun blog:

https://joels-programming-fun.blogspot.com/2024/03/alpp-assembly-language-programming.html

You can see something of how these additions improve code in a post I put up describing 64-bit math on several of the Motorola CPUs:

https://joels-programming-fun.blogspot.com/2020/12/64-bit-addition-on-four-retro-cpus-6800.html

Also, I discussed microcontroller (plus 6809) differences in this post:

https://defining-computers.blogspot.com/2021/08/guessing-which-motorola-microcontroller-6801-6805-6811-6809.html

I have a long rant discussing the differences between the 68HC11 and the 6809, which picks up the 6801 and 68000 along the way:

https://defining-computers.blogspot.com/2018/12/68hc11-is-not-modified-6809-and-what-if.html

And I have this chapter of a novel in process (or suspended animation, not sure which), which gives a bit more of a detailed discussion of the 6800 instruction set architecture, touching on the 6801:

https://joelrees-novels.blogspot.com/2020/02/33209-little-about-6800-and-others.html

Also, with Joe H. Allen's permission, I forked his Exorsim project and added instruction set architecture support for the 6801 to it. You can find source code for a fig-Forth implementation for the 6800 and an implementation (somewhat) optimized to the 6801 in the test source code I include there:

https://osdn.net/users/reiisi/pf/exorsim6801/wiki/FrontPage

My assembler for 6800/6801 may be useful for assembling the fig-Forth source:

https://sourceforge.net/projects/asm68c/

Monday, August 23, 2021

Differences between the 6800 and the 6801 (Revisited)

I've described the differences between the 6800 and the 6801 instruction architectures at length in several other posts. This is a high-level overview.

(This same rant with notes on 68HC11 and 6809 can be found here: https://defining-computers.blogspot.com/2021/08/differences-between-6800-and-6801-with-notes-68hc11-6809.html .)

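(But I still ignore the built-in ROM, RAM, and peripheral devices in the 6801. Those are important, but require separate treatment.)

First, where the 6800 had two independent 8-bit accumulators, A and B, the 6801 can also combine them as a single 16-bit accumulator (A:B, or D) for load, store, add, and subtract: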

LDD
STD
ADDD
SUBD

High byte is in A, low byte in B.

These are available in the full complement of binary addressing modes that the 6800 provides:

immediate (16-bit immediate value)
direct/zero page (addresses 0 to 255)
indexed (with 8-bit constant offset)
extended/absolute (addresses 0 to 65535)

Be aware that the D register is not an actual additional register. It is simply the concatenation of A and B. If you LDD #$ABCD, A will have $AB in it and B will have $CD in it.

Second, the 6801 has an 8-bit by 8-bit integer multiply,

MUL

There is no hardware divide in the 6801. You'll have to move up to 68HC11, 68HC12, 68HC16, 68000, or Coldfire, for that.

Third, the 6801 adds two 16-bit shifts, logical shift left and right, accumulator-only:

LSLD (ASLD)
LSRD

Fourth, the 6801 adds a bit of index math, and push and pop for the index register:

ABX (add B to X, unsigned only)
PSHX
PULX

There is one 6801 instruction (four op-codes, one for each of the binary addressing modes), and one only, with different semantics from the 6800:

CPX (full results in flags)

These improve index handling, help support stack frames, etc.

The 6801 provides an additional op-code for the call instruction, adding the direct/zero page addressing mode:

JSR (direct page)

This allows the programmer to put short, critical subroutines in the direct page for more efficient calls.

(JMP does not get the additional op-code, which means that inner-interpreter loops for virtual machines do not benefit from allocation in the direct page, same as the 6800.)

Finally, the 6801 adds a branch never instruction:

BRN

This can be useful as a marker no-op in object code, helpful in debugging, linking, and compiler code generation.

At this point, you should suspect that the 6801 is fully object-code level upwards compatible with the 6800. It is.

If you want more information than this and my writing seems understandable --

You can see something of how these additions improve code in a post I put up describing 64-bit math on several of the Motorola CPUs:

https://joels-programming-fun.blogspot.com/2020/12/64-bit-addition-on-four-retro-cpus-6800.html

Also, I discussed microcontroller (plus 6809) differences in this post:

https://defining-computers.blogspot.com/2021/08/guessing-which-motorola-microcontroller-6801-6805-6811-6809.html

I have a long rant discussing the differences between the 68HC11 and the 6809, which picks up the 6801 and 68000 along the way:

https://defining-computers.blogspot.com/2018/12/68hc11-is-not-modified-6809-and-what-if.html

And I have this chapter of a novel in process (or suspended animation, not sure which), which gives a bit more of a detailed discussion of the 6800 instruction set architecture, touching on the 6801:

https://joelrees-novels.blogspot.com/2020/02/33209-little-about-6800-and-others.html

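Also, with Joe H. Allen's permission, I forked his Exorsim project and added instruction set architecture support for the 6801 to it. You can find source code for a fig-Forth implementation for the 6800 and an implementation (somewhat) optimized to the 6801 in the test source code I include there:

https://osdn.net/users/reiisi/pf/exorsim6801/wiki/FrontPage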

My assembler for 6800/6801 may be useful for assembling the fig-Forth source:

https://sourceforge.net/projects/asm68c/

Monday, August 9, 2021

Guessing Which Motorola Microcontroller Part It Is (6801/6805/68HC11/6809)

Wasted too much time on this.

This is extracted from my response to a post to the Facebook vintage {Computers | Microprocessors | Microcontrollers} group, asking for help identifying a Motorola-logo microcontroller found in a washing machine with an apparent custom SOC part number ZC85148L, with an apparent date stamp from early 1984.

There were many guesses as to what the ZC85148 was, and I thought I'd put my guesses and reasoning out here, to make them more available for searching:

I'm guessing either 6805/68HC05 or 6801/68HC01.

6805 was essentially a stripped-down 6800, with only one eight-bit accumulator (A), one eight-bit index (X, yes, eight-bit), bit instructions, expanded indexed modes, better power-saving stuff, lots of timers, some analog-to-digital, and other integrated I/O to choose from. At least some parts included hardware 8-bit multiply.

6801/68HC01 was exactly the 6800 with a few new 16-bit instructions for the double accumulator A:B pair, better X handling, hardware 8-bit multiply, and better power-saving mode stuff.

My reasoning is as follows:

If the chip were a bare CPU, it would need separate ROM and RAM parts on the circuit board, and such were nowhere in evidence. From the late 1970s until Motorola spun off the microprocessors business and renamed it Freescale, they provided Systems-On-a-Chip (SOC) semi-custom microcontrollers which included RAM, ROM, and I/O on chip. It's a pretty safe bet that the part was an SOC microcontroller.

I am not aware of any 6809 SOC products from Motorola, so, since the circuit board showed no evidence of ROM or RAM, I'm pretty sure that it is not a 6809 variant of any sort. (If there were any special-order 6809 SOC microcontrollers, that would be interesting to hear about.)

The 68HC11 did start shipping in 1984, so it could possibly be a 68HC11.

(68HC11 is a 6801 in HCMOS, with an additional Y index register and a pre-byte that converts X-indexed op-codes to Y-indexed opcodes, plus hardware integer and fractional divide, bit instructions, and a little bit more. Some people confuse the 68HC11 with the 6809. I discussed the differences between the 68HC11 and the 6809, comparing their architecture and lineage, in another rant here, several years back: https://defining-computers.blogspot.com/2018/12/68hc11-is-not-modified-6809-and-what-if.html. There are a few errors there I need to go back and correct sometime, but they are not errors of substance, I think.)

I don't have solid information on when the first HC08 microcontrollers started shipping, but my impression is no sooner than the late 1980s. So I'm guessing it was not an HC08 or HCS08 or any of the later extensions thereof.

(HC08s were 68HC05s with a high-byte extension for the X register and some other useful stuff, including hardware 8-bit multiply and divide.)

Other possibilities -- I understand that Motorola did second-source at least one other company's CPU in the 1970s. Whether the Intel 8501 might have been one of those, well, it seems ludicrous, but I have some conflicting memories. I think they had mostly gotten out of that business by the mid-1980s.

It is my memory that they dabbled in manufacturing IBM compatible desktop PCs in the mid-to-late 1980s, but I don't remember whether that included manufacturing their own 8086 compatible CPUs, or second-sourcing Intel's. Anyway, that ended up only for desktop, and management quickly recognized that business model was not going to be profitable for them (and had been a marketing misstep).

I did also see announcements and engineering materials for 6502 core SOCs from Motorola, somewhere around 1986 or '87, IIRC, but those also seemed to have been dropped pretty quickly. It would not have been there in 1984.

And it is also my understanding that Motorola provided manufacturing for some mil-std microcontrollers, but I think that was only to the military and maybe NASA. Those were 16-bit, and not based on any of the 68XX or 68XXX series. I suppose it would not be impossible to see something like that in a washing machine controller, but it would be overkill.

There were also 4-bit and 1-bit SOC microcontrollers that Motorola produced in the late 1970s, but I don't remember any of them in 40-pin DIP. I think they would be a bit underpowered.

I don't think the 88000 RISC series was even announced yet in 1984, and the Power architecture discussions with IBM and Apple had not even been imagined yet. There was the one-off custom implementation of the 360 architecture borrowing from the 68000. We can be sure that none of these would have been in a 40-pin chip in a washer in 1984.

Which is why my guesses come down to the 6805/68HC05 or the 6801/68HC01.

Thursday, December 20, 2018

68HC11 Is Not a Modified 6809 (and What if?)

(I have a more high-level treatment of this topic, comparing the 6801, 68HC11, and 6809 with the 6800, here: https://defining-computers.blogspot.com/2021/08/differences-between-6800-and-6801-with-notes-68hc11-6809.html.)

In the Color Computer Facebook group, somebody started a "What if?" thread.

What if Radio Shack had recognized that they had in the TRS-80 Color Computer the makings of an IBM PC killer way early in the game? etc.

(Yeah, we know hypothesis contrary to fact is a no-win game. We don't care. Call going there our happy place.)

So, I was looking up the history of the 6809 to refresh my memory so I could talk intelligently about what who shoulda done where when, and I noticed on the wikipedia page for the 6809 that somebody was talking about how the 68HC11 is a modified 6809.

Say what?!?!?!

And the wikipedia page for the 68HC11 had the same assertion.

Double-huh!?

Huh-uh. No. Where did they get that bit of history so tangled up?

(But that might explain why someone in the group would be talking about the 68HC11 as if it were a modified 6809.

For what it's worth, NXP, the successor in interest to Motorola relative to the 6800 and its descendants, implies in their on-line materials that the 68000 is a descendant of the 6800 through the 6809, which is also just plain wrong. The 68000 and 6809 were essentially developed in parallel, by separate departments. They apparently communicated with each other, but pursued similar but different paths.)

I know, you don't care. I should just log in and correct it, if I care so much.

Well, maybe I will sometime. But stick with me. Maybe it'll be more amusing than youtube videos of geeks spraying package thieves with glitter. Let's take a look.

First, let's look at the register model of the 6800, the 8-bit granddaddy of Motorola's homespun lines of CPUs:

Registers in the 6800:
	accumulator A:8
	accumulator B:8
Index X:16
Stack Pointer SP:16
Program Counter PC:16
	Condition Codes CC:8

Accumulators are kind of like displays on calculators. This calculator has two small displays, so to speak. Not two calculators, two displays. They can accumulate separate sums, but the CPU has to say which it's working on:

LDAA #1 ; Put the number 1 in accumulator A.
LDAB #110 ; Put the number 110 in accumulator B.
* Now you have 366, the days in a leap year, in the combined A:B,
* but you have to work on the number a byte at a time.
PSHB ; Saving the number on the stack
PSHA ; has to be done in two pieces.

*** By the way, all code on this page is untested. ***

*** If you find errors, let me know. ***

An index register is useful for pointing to working areas within the big blackboard of memory. The 6800 provides an indexed mode using the index register with an eight bit offset, and that's it.

LDX #YEARSTABLE

If you want to do math on the pointer, you have to store it somewhere and work on it 8 bits at a time.

STX <TEMP ; Make sure TEMP is in the direct page.
LDAA <TEMP ; Grab the high byte,
LDAB <TEMP+1 ; then the low byte.
ADDB #160 ; (2000*2) modulo 256
ADCA #15 ; (2000*2)/256
* This is a big table in a small computer.
STAB <TEMPX+1 ; Save the location for later
STAA <TEMPX
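The ADDB/ADCA pair is ordinary two-byte addition, with the carry flag linking the halves. A Python sketch (the function name is made up) confirms the constants in the comments: 2000*2 is 4000, which is $0FA0, so the low byte is 160 and the high byte is 15.

```python
def add16_bytewise(high: int, low: int, addend: int) -> tuple[int, int]:
    """Add a 16-bit constant to HIGH:LOW the 6800 way: ADDB, then ADCA."""
    low_sum = low + (addend & 0xFF)           # ADDB #low byte of addend
    carry = low_sum >> 8                      # C flag out of the low byte
    high_sum = high + (addend >> 8) + carry   # ADCA #high byte of addend
    return high_sum & 0xFF, low_sum & 0xFF

OFFSET = 2000 * 2                             # two bytes per year entry
```

For a table at $1280, for instance, the low bytes $80 + $A0 carry into the high byte, giving $2220.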

A stack is useful for tracking where you've been so you can know where to go next. It's also good for temporary storage of things. With the 6800, to do any math on the item on the top of stack, or to point at things that are buried under the top of stack requires copying the stack pointer to the index register.

TSX ; Now we can address the stack.
LDAA 0,X ; Get the number 366
LDAB 1,X
LDX <TEMPX ; Point to the entry for the year 2000.
STAA 0,X ; Put 366 in the table.
STAB 1,X

It works, but it feels a little awkward and takes lots of instructions.

The Program Counter (Instruction Pointer in other companies' parlance) is necessary to know where you are in the problem solution now. Pointing at things relative to the PC on the 6800 requires executing a moot call to the next instruction, transferring the stack pointer to the index register to get the pointer and do math on it, and loading the index register to load the index register indexed by itself. This is useful if you want to have code containing tables of constants buried in the code itself. Maybe it will help to walk through some code:

* Yes, there are times you might actually do something like this.
DAYSINMONTHS:
FCB 31,28,31,30,31,30,31,31,30,31,30,31
* Leap year check will be done elsewhere.
* Month number on stack is 16 bit integer, but less than 256.
GETDAYSINMONTH:
BSR DUMMY
DUMMY:
PULA ; Get the address of DUMMY to work on.
PULB
SUBB #DUMMY-DAYSINMONTHS ; Less than 256, okay?
SBCA #0 ; Address of DAYSINMONTHS, relocatable.
STAA <TEMPX2
STAB <TEMPX2+1
TSX
LDAB 3,X ; Skip return address, ignore high byte of month number.
* It would be nice if we could use B as an offset to X, wouldn't it?
* LDX <TEMPX2
* LDB B,X
* But we can't on the 6800.
ADDB <TEMPX2+1
BCC SKIPIT
INC <TEMPX2 ; Had a carry.
SKIPIT:
STAB <TEMPX2+1
LDX TEMPX2
LDAB 0,X ; Finally got the days in the month!
TSX
STAB 3,X ; Let's use the stack to return the count.
CLR 2,X
RTS ; We remembered to keep the return address safe, right?

Heh. You really wanted to know how to do that, didn't you?

Condition codes, by the way, are where you keep track of things like whether the last math carried or overflowed or not, and of certain modes like whether the processor should let anyone interrupt it or not. The carry bit (C) in the condition code register is what allows BCC (Branch Carry Clear) to fall through and add 1 to the high byte in the above code.

One more useful example is code to move a block of memory:

* Parameters on stack, hard limit of 2048 bytes.
* pushed in order of source, destination, count:
* 0,S ; return PC
* 2,S ; count
* 4,S ; destination
* 6,S ; source
* Slightly paranoid code.
BLOCKMOVE:
TSX
LDAB 3,X
LDAA 2,X
STAB <COUNT+1 ; Page zero globals, erk.
STAA <COUNT
BMI BLOCKMOVEEND ; Bit 15 set is way too big.
SUBB #1 ; Yes, this is necessary.
SBCA #8 ; 8*256+1 == 2049
BCC BLOCKMOVEEND ; Don't even try if too much.
LDX 4,X
STX <DESTINATION
TSX
LDX 6,X
STX <SOURCE
* We know it's a good count, 0 to 2048.
LDAB <COUNT+1 ; Low byte, easier to check the count.
BRA BLOCKMOVETEST ; Test on entry does what you want.
BLOCKMOVELOOP:
LDX <SOURCE
LDAA 0,X
INX
STX <SOURCE
LDX <DESTINATION
STAA 0,X
INX
STX <DESTINATION
BLOCKMOVETEST:
SUBB #1 ; Reflect it in carry
BCC BLOCKMOVELOOP ; Gets much harder if not test on entry.
DEC <COUNT
BPL BLOCKMOVELOOP ; Borrowed from the high byte; done only when it goes negative.
BLOCKMOVEDONE:
TSX
LDAA <DESTINATION
LDAB <DESTINATION+1
STAA 4,X
STAB 5,X
LDAA <SOURCE
LDAB <SOURCE+1
STAA 6,X
STAB 7,X
BLOCKMOVEEND:
RTS
* Leave the parameters for the calling routine.
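The loop control is the tricky part of that routine: the low byte of the count lives in B and the high byte in COUNT, and the high byte is decremented each time B borrows. This Python sketch models only that counting (no memory is copied), ending the loop when a borrow takes the high byte negative, and checks that the number of bytes moved equals the requested count, including exact multiples of 256. The function name is hypothetical.

```python
def blockmove_iterations(count: int) -> int:
    """How many bytes the split-counter loop moves for a given count."""
    b = count & 0xFF             # LDAB <COUNT+1
    high = count >> 8            # COUNT, the high byte in the direct page
    moved = 0
    while True:
        # BLOCKMOVETEST: SUBB #1 sets carry on a borrow out of B.
        borrow = b == 0
        b = (b - 1) & 0xFF
        if not borrow:
            moved += 1           # BCC BLOCKMOVELOOP: move one more byte
            continue
        high = (high - 1) & 0xFF # DEC <COUNT
        if high & 0x80:          # went negative: the high byte was already 0
            return moved
        moved += 1               # otherwise loop again and move a byte
```

A loop that stops when the high byte reaches zero, rather than when it goes negative, falls short by 256 bytes whenever the count is a nonzero multiple of 256, which is the kind of corner this model makes easy to check.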

That looks inefficient, but in the mid 1970s it wasn't that bad. But it definitely shows one of the warts of the 6800, and one of the motivations for adding index registers.

There are ways to do it quicker, especially if you know how much you are moving when you are writing the code. These tricks involve buffer alignment and size alignment and unrolling the loop, etc. That's not what we need to look at here.

(The 6800 was used in a few PCs and game machines of the late 1970s and early 1980s, for example, the APF Imagination Machine and the Tektronix 4050, and some more obscure office machines.)

Now I'm not going to get into the 6800 vs. 8080 vs. 6502 question here. (Maybe I should someday, somewhere else. Not today.) Suffice it to say that the 6800 had some charm points and some warts. All the CPUs back then did.

Now this M6800 microprocessor is useful, but Motorola's customers wanted to have an entire controller in a single part, not a bunch of parts on a PC board. And they wanted it to be a little easier to write programs for. So Motorola put together a project (around 1977) to improve the 6800, keeping it simple so that adding transistors for ROM, RAM, I/O, and timers and such would not blow the budget out of the water. They called the result the 6801.

The 6801 was actually in limited production in 1977, but they were pretty quiet about it.

(Why? Maybe they worried that some customers would be upset they had bought the less capable 6800. It's a valid worry. Customers can be unreasonable about wanting the perfect thing yesterday at tomorrow's cheap thing's price.)

Motorola talked to some select customers besides GM in 1978, and started telling general customers about it in 1979. In other words, development on the 6801 started before development on the 6809 and 68000.

Registers in the 6801:
accumulator D (A:B) accumulator A accumulator B
Index X
Stack Pointer SP
Program Counter PC
	Condition Codes CC

What's the difference from the 6800? Well, to see, we'll reproduce the code above using some of the additional instructions. First, we'll load the days in a leap year into the A and B accumulators:

LDD #366 ; The 6801 does this in one instruction!
* Much easier. Now let's point to the right place for it in YEARSTABLE:
LDX #YEARSTABLE ; We could actually put this directly in D.
* And we could have worked more difficult pointer math above,
* to shorten the code. We could also have made it not relocatable.
* So, keeping the two sets of code parallel, we will use X here, too.
PSHB ; Save the day count on the stack.
PSHA
PSHX ; This is a new instruction in the 6801!
TSX ; We can avoid putting TEMP variables in the direct page!
LDD 0,X
ADDD #(2000*2) ; All 16 bits at once!
STD 0,X
* Now we can save the days in the leap year in the table:
LDD 2,X ; There's the days!
PULX ; Of course we can pop what we push!
STD 0,X ; Done! All stored away in YEARSTABLE.
INS ; But we have to drop the count to balance the stack.
INS

That's a lot simpler, and you're less likely to forget something, even if it isn't really fewer total instructions. It runs a bit faster, too. Really simple additions made a significant difference for the 6801. Let's look at the DAYSINMONTH thing:

DAYSINMONTHS:
FCB 31,28,31,30,31,30,31,31,30,31,30,31
* Month number on stack as 16 bit integer.
* Leap year check done elsewhere.
GETDAYSINMONTH:
BSR DUMMY
DUMMY:
TSX
LDD 0,X ; Get the address of DUMMY to work on.
SUBD #DUMMY-DAYSINMONTHS
STD 0,X ; Address of DAYSINMONTHS, relocatable.
LDAB 5,X ; Ignore high byte of month number.
PULX
* It would be nice if we could use B as an offset to X, wouldn't it?
ABX ; But now, on the 6801, we can add B to X!
LDAB 0,X ; Get the days in the month!
TSX
CLRA
STD 2,X ; Let's use the stack to return the count.
RTS

13 instructions versus 19. Nice, huh?
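Here is why the BSR to the very next instruction works. BSR pushes the address of the instruction that follows it, and at run time that is wherever DUMMY actually landed. DUMMY-DAYSINMONTHS, on the other hand, is a constant the assembler can work out. The rough Python sketch below models the arithmetic. It assumes, purely for illustration, that the table starts the module, that BSR is 2 bytes, and that month numbers are 0-based. None of the offsets come from a real listing.

```python
# Hypothetical offsets within the assembled module (illustration only).
DAYSINMONTHS_OFFSET = 0x00
DUMMY_OFFSET = 0x0E           # 12 table bytes, then a 2-byte BSR

DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def table_address(load_base):
    """Address the BSR/SUBD trick finds when the module sits at load_base."""
    return_address = load_base + DUMMY_OFFSET          # what BSR DUMMY pushes
    difference = DUMMY_OFFSET - DAYSINMONTHS_OFFSET    # SUBD #DUMMY-DAYSINMONTHS
    return (return_address - difference) & 0xFFFF


def get_days_in_month(memory, load_base, month):
    """Look up a 0-based month; ABX adds B to X as an unsigned byte."""
    x = (table_address(load_base) + (month & 0xFF)) & 0xFFFF
    return memory[x]
```

The only run-time input is the pushed return address, so the same code gives the right table address at any load base.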

How much help for the block move are the additions?

* Parameters on stack, hard limit of 2048 bytes.
* pushed in order of source, destination, count:
* 0,S ; return PC
* 2,S ; count
* 4,S ; destination
* 6,S ; source
* Slightly paranoid code.
BLOCKMOVE:
TSX
LDD 2,X
STD <COUNT
BEQ BLOCKMOVEEND ; Zero count, nothing to move.
BMI BLOCKMOVEEND ; Bit 15 set is way too big.
SUBD #2049
BCC BLOCKMOVEEND ; Don't even try if too much.
LDD 4,X
STD <DESTINATION
LDD 6,X
STD <SOURCE
* We know it's a good, non-zero count, so we can test at end.
BLOCKMOVELOOP:
LDX <SOURCE
LDAA 0,X
INX
STX <SOURCE
LDX <DESTINATION
STAA 0,X
INX
STX <DESTINATION
LDD <COUNT ; Faster than doing it by halves.
SUBD #1 ; Reflect it in carry
STD <COUNT ; Cleaner than by halves, too.
BHI BLOCKMOVELOOP ; Different end condition from 6800 code.
BLOCKMOVEDONE:
TSX
LDD <DESTINATION
STD 4,X
LDD <SOURCE
STD 6,X
BLOCKMOVEEND:
RTS
* Leave the parameters for the calling routine.

The code is much clearer, and a little faster and shorter.
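Because the 6801 version tests at the bottom of the loop, the exit condition matters. The quick Python model below counts the passes of a bottom-tested loop that decrements a 16-bit count. Exiting when SUBD #1 leaves zero (what BHI does here, since a non-zero count never borrows) gives exactly N passes for N >= 1. Exiting only on borrow (BCC) makes one extra pass. A zero count makes one pass either way, which is why a bottom-tested loop wants a zero count rejected up front. The function and its parameter are illustrative names only.

```python
def do_while_passes(count, exit_on="zero"):
    """Count passes of a bottom-tested loop that decrements a 16-bit count.

    exit_on="zero"   : loop again only if no borrow and result non-zero
                       (BHI after SUBD #1: C=0 and Z=0).
    exit_on="borrow" : loop again only if no borrow (BCC: C=0).
    """
    passes = 0
    while True:
        passes += 1                        # move one byte
        borrow = count == 0                # SUBD #1 borrows only from zero
        count = (count - 1) & 0xFFFF
        if exit_on == "zero":
            again = not borrow and count != 0
        else:
            again = not borrow
        if not again:
            return passes
```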

Just as an exercise, to help convince you these are real improvements, you might want to unroll the loop once to see what happens. Use an extra variable in the zero page to remember if the count is odd, and use the ASR instruction to divide the count by two:

CLR <TAILCOUNT
LDD 2,X
ASRA
RORB
BCC BLOCKMOVEHALVE
INC <TAILCOUNT
BLOCKMOVEHALVE:
STD <COUNT

should get you started. It takes a bit of extra code, but it gets close to double the speed for counts greater than 8 bytes if you do it right.
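The Python sketch below covers the bookkeeping for that exercise. The ASRA/RORB pair halves the 16-bit count, and the odd bit drops into carry. TAILCOUNT remembers that bit. Each pass of the unrolled loop then moves two bytes. The sketch models counting and copying only. Moving the pair first and the tail byte last is an assumption about how you might finish the exercise, not the only way.

```python
def asra_rorb(a, b):
    """Shift the A:B pair right one bit, as ASRA then RORB would.

    Returns (a, b, carry); carry is the bit shifted out of B.
    """
    carry_from_a = a & 1
    a = ((a >> 1) | (a & 0x80)) & 0xFF            # ASRA keeps the sign bit
    carry = b & 1
    b = ((b >> 1) | (carry_from_a << 7)) & 0xFF   # RORB rotates the old carry in
    return a, b, carry


def unrolled_copy(src, count):
    """Copy count bytes two at a time, plus one tail byte if count is odd."""
    a, b, carry = asra_rorb(count >> 8, count & 0xFF)
    tailcount = 1 if carry else 0                 # CLR <TAILCOUNT / INC <TAILCOUNT
    pairs = (a << 8) | b                          # STD <COUNT
    dst = []
    i = 0
    for _ in range(pairs):                        # two bytes per pass
        dst.append(src[i])
        dst.append(src[i + 1])
        i += 2
    if tailcount:
        dst.append(src[i])                        # the odd byte at the end
    return dst, pairs, tailcount
```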

And, just for good measure, Motorola gave the 6801 an 8 bit by 8 bit hardware multiply that greatly sped up a lot of integer math. But we won't look at that here.

(The 6801 was used in a few PCs and game machines of the late 1970s and early 1980s. For example, Radio Shack's TRS-80 MC-10 had the 6803, which was a ROM-less 6801.)

Motorola's management realized at the time that competing in semiconductors meant really competing, so they decided to leapfrog the competition around 1977 or '78, and invested a lot of engineering time to bring out the 68000 in 1979 or '80:

Registers in the 68000:
Data register/Accumulator D0:32
Data register/Accumulator D1:32
Data register/Accumulator D2:32
Data register/Accumulator D3:32
Data register/Accumulator D4:32
Data register/Accumulator D5:32
Data register/Accumulator D6:32
Data register/Accumulator D7:32
Index register/Stack pointer A0:32
Index register/Stack pointer A1:32
Index register/Stack pointer A2:32
Index register/Stack pointer A3:32
Index register/Stack pointer A4:32
Index register/Stack pointer A5:32
Index register/Stack pointer A6:32
Index register/User stack pointer A7:32
Index register/System stack pointer A7:32
Program Counter (Indexable) PC:32
	Status/Condition Codes CC:16

8 bit and 16 bit registers were too short; 32 bits is the answer! (True, but not without some downside.)

Two accumulators were good, eight were better! (True, but not without some downside.)

One index was not enough, and not being able to index off the stack pointer is a pain; eight index/stack pointer registers is the answer! (Again true, but not without some downside. We won't look at the downsides here, however. Suffice it to say those were valid engineering trade-offs.)

As you can see, stack pointers are directly indexable. So is the PC.

This is oversimplifying but you can see a bit of pattern here. Let's see what effect that has on the code:

MOVE.W #366,D0 ; No problem!
* Now let's point to the right place for it in YEARSTABLE:
MOVE.L #YEARSTABLE,A0 ; Again, could have put it in D1.
* Actually that should be LEA YEARSTABLE,A0
* but I don't want to distract myself or you.
* Still keeping the two sets of code parallel:
MOVE.W D0,2000*2(A0) ; Done. I'm not kidding you.

That's what all the extra transistors in the 68000 do for you. More leeway to work on more difficult problems. So let's look at the DAYSINMONTH thing on the 68000:

DAYSINMONTHS:
DC.B 31,28,31,30,31,30,31,31,30,31,30,31
* Month number on stack as 32 bit integer.
* Leap year check done elsewhere.
GETDAYSINMONTH:
* The assembler can get the address difference at assemble time:
LEA DAYSINMONTHS(PC),A0 ; Relocatable. LEA will use the difference.
* Single stack to keep the code parallel.
MOVE.L 4(A7),D0 ; Month number in 32 bit integer on stack.
* CLR.L 4(A7)
* MOVE.B 0(A0,D0.L),7(A7) ; Done. I kid you not.
* Except it's better not to abuse the external memory bus so much:
CLR.L D1
MOVE.B 0(A0,D0.L),D1 ; Got the count.
MOVE.L D1,4(A7) ; Done. For real.
RTS

Heh. This is why we like the 68K.

Interested in how nicely it works for block moves?

* Parameters on stack, hard limit of half a megabyte.
* pushed in order of source, destination, count:
* 0,A7 ; return PC
* 4,A7 ; count
* 8,A7 ; destination
* 12,A7 ; source
* We could call A7 SP, but you get the point.
* Slightly paranoid code.
BLOCKMOVE:
MOVE.L 4(A7),D0
BMI BLOCKMOVEEND ; Bit 31 set is way, way too big.
SUB.L #524289,D0 ; Subtract 524289 *from* D0.
BCC BLOCKMOVEEND ; Don't even try if too much.
* We know it's a good count; test on entry handles zero.
MOVE.W 4(A7),D1 ; One of the 68000's warts.
MOVE.W 6(A7),D0 ; Just the lower 2 bytes.
MOVEA.L 8(A7),A1
MOVEA.L 12(A7),A0
BRA BLOCKMOVETEST ; Designed for test on entry.
BLOCKMOVELOOP:
MOVE.B (A0)+,(A1)+
BLOCKMOVETEST:
DBF D0,BLOCKMOVELOOP ; That wart, only the lower half counts.
DBF D1,BLOCKMOVELOOP ; Did I get this right this time?
MOVE.L A1,8(A7)
MOVE.L A0,12(A7)
BLOCKMOVEEND:
RTS
* Leave the parameters for the calling routine.
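DBF (DBRA) decrements only the low word of its data register and falls through when that word reaches -1. That makes the loop above two nested word-sized counters. Here is a Python sketch of the pass count, with the high word in D1 and the low word in D0 as in the code. The function names are illustrative.

```python
def dbf(reg):
    """Model DBF on a 32-bit data register: decrement the low word only.

    Returns (new_reg, taken); the branch is taken unless the low word is now -1.
    """
    low = (reg - 1) & 0xFFFF
    new_reg = (reg & 0xFFFF0000) | low
    return new_reg, low != 0xFFFF


def blockmove_68000_passes(count):
    """Count MOVE.B executions in the two-level DBF loop."""
    d1 = (count >> 16) & 0xFFFF     # MOVE.W 4(A7),D1 : high word
    d0 = count & 0xFFFF             # MOVE.W 6(A7),D0 : low word
    passes = 0
    # BRA BLOCKMOVETEST
    while True:
        d0, taken = dbf(d0)
        if taken:
            passes += 1             # MOVE.B (A0)+,(A1)+
            continue
        d1, taken = dbf(d1)
        if not taken:
            break
        passes += 1                 # back to BLOCKMOVELOOP with D0.W = $FFFF
    return passes
```

Each trip through the outer DBF restarts the inner counter at $FFFF, so the total comes out to the low word plus 65536 times the high word.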

Loop unrolling to four bytes at a time is pretty straightforward, but if you're going that far you'll also want to check memory alignment. It can get a little messy before it becomes obvious. And then the 68000 has the MOVEM instruction, which lets you move a block of up to 64 bytes through the registers in one pass (fewer in practice, since some registers have to hold the pointers and the count). You really have to think that one through if you try it.

Many PCs and game machines had 68000s in them -- the Atari ST, the Amiga, the original Macintosh and the Lisa which preceded the Macintosh, the Sega Genesis/Mega Drive and so on. Lots of arcade games. There were also many workstations with versions of the 68000, including early Suns, the Apollo/Domain, the NeXT, early HP 9000s, the Tandy Model 16, and a lot of workstations from less well-known companies. IBM even had a scientific computer with the 68000 (System 9000) around the time it introduced the infamous IBM PC.

Arguments that the 68000 was somehow inferior to the 8086 in any way but being initially more expensive and being produced by a company that had no intention of trying to form a monopoly with it are pure revisionist history.

Now, here is a point that is often gotten wrong -- the 6809 was not a predecessor to the 68000 (contrary to what even NXP might have you believe).

Some of the engineers thought the 6800 could use some polishing up beyond the 6801, and management allowed them to put together the 6809 at pretty much the same time the company was putting together the 68000. The 6809 was actually brought into production before the 68000, but neither is an ancestor of the other.

The 6809 looks like this:

Registers in the 6809:
accumulator D:16 (A:B) accumulator A:8 accumulator B:8
Index X:16
Index Y:16
Indexable Return Stack Pointer S:16
Indexable User Stack Pointer U:16
Indexable Program Counter PC:16
	Direct Page Base DP:8
	Condition Codes CC:8

The design team here was a bit less ambitious than the design team for the 68000. The 6809 only got half as many index/stack pointers. And the two 8-bit accumulators can be put together for 16 bit addition and subtraction like the 6801 does, but there's only 16 bits of accumulator. Some people think it's register poor.

Anyway, you can definitely see the influence of the 6800 in the 6809. The 6809 is truly descended from the 6800. (We are pretty sure the 68000 and the 6809 influenced each other, but, in parallel, going somewhat different directions.)

Influence from the 6801? Maybe, but the op-codes in the 6809 have been seriously restructured from the 6800, especially the indexing. Indexing is much more richly supported on the 6809.

[JMR202009211038:
My memory is that the 6801 project actually began after the 6809, and the influence was in the reverse, with the D double accumulator borrowed back to the 6801 and the ABX instruction implemented to help overcome lack of the LEA load effective address. But I can't find good references for that now.
]

Incidentally, many small engineering projects that can be designed with the 6809 actually take less code and run faster on the 6809 at a 1 MHz memory cycle than on the 68000 at 1 MHz memory cycle. The downside of that is the limit of expansion of functionality. There are many problems that just can't be solved with only 16 bits of address. (Thousands of characters in Unicode? Millions of colors on a screen way too large to directly address in 16 bits?) That's why you need the larger CPU for many things.

And the 68000 was produced at much higher frequencies, as well. Should give some examples of these, but not today.

Let's look at 6809 code examples per the above, instead:

LDD #366 ; Days in leap years.
* Now let's point to the right place for it in YEARSTABLE:
LDX #YEARSTABLE ; Here's why we didn't put it in D.
* Or LEAX YEARSTABLE,PCR -- but we won't mention that here.
* Keeping the several sets of code parallel, still,
STD (2000*2),X ; Done. No, I'm still not kidding you.

Less than half the transistors of the 68000, and it still nails it. Let's look at the DAYSINMONTH thing on the 6809, also:

DAYSINMONTHS:
FCB 31,28,31,30,31,30,31,31,30,31,30,31
* Month number on stack as 16 bit integer.
* Leap year check done elsewhere.
GETDAYSINMONTH:
LEAX DAYSINMONTHS,PCR ; Relocatable. The assembler knows how.
* Single stack to keep the code parallel.
* LDB 3,S ; Month number in 16 bit integer, ignore high byte.
* CLR 2,S ; Either way, really.
* LDB B,X ; Ignore high byte of offset.
* STB 3,S ; Or, go ahead and use the (zero) high byte:
LDD 2,S ; Month number in 16 bit integer.
LDB D,X ; Got the count in B.
CLRA
STD 2,S ; For real.
RTS

What about block moves on the 6809?

* Parameters on stack, hard limit of 2048 bytes.
* pushed in order of source, destination, count:
* 0,S ; return PC
* 2,S ; count
* 4,S ; destination
* 6,S ; source
* Slightly paranoid code.
BLOCKMOVE:
LDD 2,S
BEQ BLOCKMOVEEND ; Zero count, nothing to move.
BMI BLOCKMOVEEND ; Bit 15 set is way too big.
SUBD #2049
BCC BLOCKMOVEEND ; Don't even try if too much.
* We know it's a good, non-zero count, so we can test at end.
LDX 6,S
LDY 4,S
BLOCKMOVELOOP:
LDA ,X+
STA ,Y+
LDD 2,S ; Faster than doing it by halves.
SUBD #1 ; Count down.
STD 2,S ; Cleaner than by halves, too.
BNE BLOCKMOVELOOP ; Stop when the count reaches zero.
BLOCKMOVEDONE:
STY 4,S
STX 6,S
BLOCKMOVEEND:
RTS
* Leave the parameters for the calling routine.

Are you sold? Do you want to try unrolling the loop to see how hard it is? Percentagewise, it won't be as much of an increase in speed as the 6801 sees in unrolling, but it would probably still be worth the effort.

One of the downsides of both the 6809 and 68000's improved functionality is that it takes a lot of those small transistors to make them work. That leaves less room for adding I/O, ROM, RAM, etc. for one-chip stuff.

But the 6809 takes far fewer transistors than the 68000, and only needs 8 bit wide memory where the 68000 needs 16 bit wide memory.

Perhaps the most well-known PC/game machine to use the 6809 was the TRS-80 Color Computer, an oddball machine that existed, perhaps, because the 6809 was just too powerful to suppress. There were others, however, some of which you can find on Wikipedia's category page for 6809 home computers (the Dragon in the UK, models from Thomson in France, Fujitsu's FM-8 in Japan), and others still in the main Wikipedia 6809 entry. There were also multi-user business machines produced by obscure companies such as SWTP, many of which had started with the 6800. And lots of arcade machines and music synthesizers, also still mentioned on Wikipedia. And it was heavily used in aerospace. I need to gather more links for these sometime, but the Wikipedia page still has a lot. And I should mention operating systems, such as the real-time OS, OS-9 from Microware. But this page was not intended as such a list. The point is that the 6809 was and is a hidden workhorse.

The 6809 really could have been used as a core by 1983, just as the 6801 was in 1979. Just as a somewhat advanced version of the 68000 was in the late 1980s. Just as somewhat advanced versions of the 6801 (68HC12 and 68HC16) also were in the 1990s.

I understand that there may have been, among the sales crew and management, those who were scared of having Motorola competing with itself -- scared of cannibalizing 68000 sales with the very competitive 6809, if they told everyone how good the 6809 was. If it was so, it was very shortsighted. The 6809 could easily have gone head-to-head with the 8088 on a lot of smaller designs, and the 68000 would have been an almost natural upgrade path from the 6809, just as the 6809 was a natural upgrade path from the 6801.

Just as a bit of text processing can convert 6800 code into executable 6809 code, a bit of text processing can convert 6800 or 6809 code into 68000 code. It may require a little engineering time for cleanup, but you'll want to take the time to optimize the code anyway.

So the 6801 improves the 6800 model by adding instructions that treat the accumulator pair as a single double (16-bit) accumulator, with A holding the high byte and B holding the low byte. And it adds an 8 x 8 multiply with 16 bit result. This much is about all there is in common with the 6809, other than the 6800 origins.
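To show concretely why that 8 x 8 multiply helps: MUL multiplies the unsigned bytes in A and B and leaves the 16-bit product in D. Four such partial products can be combined into a 16 x 16 to 32-bit unsigned multiply. The Python sketch below models that composition. The routine shape is an assumption about how a 6801 program might do it, not Motorola library code.

```python
def mul(a, b):
    """Model of the 6801 MUL: unsigned A * B -> 16-bit result in D (A:B)."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    return a * b


def mul16(x, y):
    """Unsigned 16 x 16 -> 32-bit product built from four 8 x 8 MULs."""
    xh, xl = x >> 8, x & 0xFF
    yh, yl = y >> 8, y & 0xFF
    low = mul(xl, yl)              # weight 2**0
    mid1 = mul(xh, yl)             # weight 2**8
    mid2 = mul(xl, yh)             # weight 2**8
    high = mul(xh, yh)             # weight 2**16
    return (high << 16) + ((mid1 + mid2) << 8) + low
```

On the real part, the shifts become byte offsets when adding the partial products into a 4-byte result, with ADDD and carry propagation doing the sums.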

The 6801 also includes a few new instructions that allow adding an offset to the index register, comparing the index register, pushing and popping (PULling) the index register. These are significant improvements, improving efficiency in core areas that were bottlenecks in the 6800 for high-level functionality. Better than the 6800, not as good as the 6809, different. And done differently.

The instructions of the 6809 are not a proper superset of those of the 6801. Conversion is required.

The single indexing mode of the 6801 is exactly the same as the 6800's. And the single stack pointer is not directly indexable. Nor is the PC. Comparing that with the 6809, you see all sorts of indexing available in the 6809 to shorten and clarify code.

There were some other important improvements, such as additional interrupt vectors for built-in hardware and such. But the 6801 is very much a separate path of evolution from the 6809. The additional interrupts in the 6809 are more general. I guess I'm repeating myself.

More general. That's a good way to compare the 6809 and the 6801.

Game machines with the 6809 in them are too numerous to list here.

Now we come to the 6811, which, functionally speaking, harks back to and extends the 6801 rather than the 6809. The year of introduction was 1985.

Registers in the 68HC11:
accumulator D:16 (A:B) accumulator A:8 accumulator B:8
Index X:16
Index Y:16
Stack Pointer SP:16
Program Counter PC:16
	Condition Codes CC:8

The 68HC11 improves on the 6801 by adding another index register, Y. This significantly eases another bottleneck when handling two pointers at once, but the indexing mode is the same as the 6800's. It is not as general as the 6809's indexing modes (plural). The additional index register is supported by adding instructions, not by modifying the addressing modes.

Even after all my preaching, you might still think, oh, that's like the 6809, only missing the U. Uhm, and the DP, whatever that is.

Yes. It's missing U. Still only one stack pointer. SP is still not indexable. Neither is PC.

And no direct page register, which I haven't really demonstrated uses for here. (It's a bit of a complex topic, which I have partially addressed in other rants, erm, blog posts, in this blog.)

It is not a stripped down 6809. If it were, it would have much more complex addressing modes.

We can be sure the manufacturing process was influenced by things they learned developing the 6809, but that can also be said of the 68000, and we are not going to say it was somehow derived from the 68000. Not if we want to speak meaningfully.

It's a beefed-up 6801, not at all by way of the 6809. You still don't believe me?

Let's look again at that code we've been playing with, converted for the 68HC11:

LDD #366 ; The leap year.
LDY #YEARSTABLE ; How much will Y actually help?
PSHY ; Still have to have it in some temporary.
PSHB ; Save the day count on the stack.
PSHA
TSX
LDD 2,X ; Extra index register let us reorder the stack.
ADDD #(2000*2)
STD 2,X ; 2,X costs the same as 0,X on this CPU.
PULA ; Leap year days.
PULB
PULY ; Bring the entry address back.
STD 0,Y ; Done! All stored away in YEARSTABLE.
* And we don't have to balance the stack.

Not much of a savings here, and you could say this is contrived, but stack order issues do occur in real code. That's part of why two accumulators were useful in the original 6800.
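The stack-order juggling is easier to follow with a model of the stack itself. The Python sketch below assumes the usual 6801/68HC11 behavior: a push stores at SP and then decrements, PSHY stores the low byte first so the value sits big-endian in memory, and TSX loads X with SP+1. The YEARSTABLE address is a made-up placeholder.

```python
class Stack68HC11:
    """Minimal model of the 6801/68HC11 stack in a 64K byte memory."""

    def __init__(self, sp=0x01FF):
        self.mem = bytearray(0x10000)
        self.sp = sp

    def push8(self, value):
        self.mem[self.sp] = value & 0xFF          # store at SP ...
        self.sp = (self.sp - 1) & 0xFFFF          # ... then decrement

    def pull8(self):
        self.sp = (self.sp + 1) & 0xFFFF          # increment ...
        return self.mem[self.sp]                  # ... then load

    def push16(self, value):                      # PSHX / PSHY
        self.push8(value)                         # low byte first
        self.push8(value >> 8)                    # leaves it big-endian in memory

    def pull16(self):                             # PULX / PULY
        high = self.pull8()
        return (high << 8) | self.pull8()

    def tsx(self):
        return (self.sp + 1) & 0xFFFF             # X points at the top byte

    def word(self, addr):
        return (self.mem[addr] << 8) | self.mem[(addr + 1) & 0xFFFF]

    def store_word(self, addr, value):
        self.mem[addr] = (value >> 8) & 0xFF
        self.mem[(addr + 1) & 0xFFFF] = value & 0xFF


YEARSTABLE = 0x2000                  # hypothetical table address
s = Stack68HC11()
s.push16(YEARSTABLE)                 # PSHY
s.push8(366 & 0xFF)                  # PSHB
s.push8(366 >> 8)                    # PSHA
x = s.tsx()                          # TSX
entry = s.word(x + 2) + 2000 * 2     # LDD 2,X / ADDD #(2000*2)
s.store_word(x + 2, entry)           # STD 2,X
a = s.pull8()                        # PULA
b = s.pull8()                        # PULB
y = s.pull16()                       # PULY
print(hex(y), (a << 8) | b)          # prints: 0x2fa0 366
```

Because the count was pushed last, it sits at 0,X and the table pointer at 2,X. The pulls then come back in the right order with no stack cleanup left over.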

Will it help in DAYSINMONTHS?

DAYSINMONTHS:
FCB 31,28,31,30,31,30,31,31,30,31,30,31
* Month number on stack as 16 bit integer.
* Leap year check done elsewhere.
GETDAYSINMONTH:
BSR DUMMY
DUMMY:
TSY
LDD 0,Y ; Get the address of DUMMY to work on.
SUBD #DUMMY-DAYSINMONTHS
STD 0,Y ; Address of DAYSINMONTHS, relocatable.
LDAB 5,Y ; Ignore high byte of month number.
PULX
ABX
LDAB 0,X ; Get the days in the month.
CLRA
STD 4,Y ; Y no longer matches the stack pointer, but we can still use it.
RTS

Not much, but it does help.

The real utility of having X and Y in the 6811 is for moving large blocks of data. You can have both the source and destination pointers in registers, where in the 6801 and before you must juggle the source and destination through X.

* Parameters on stack, hard limit of 2048 bytes.
* pushed in order of source, destination, count:
* 0,S ; return PC
* 2,S ; count
* 4,S ; destination
* 6,S ; source
* Slightly paranoid code.
BLOCKMOVE:
TSX
LDD 2,X
BMI BLOCKMOVEEND ; Bit 15 set is way too big.
SUBD #2049
BCC BLOCKMOVEEND
LDD 2,X ; Bring it back.
PSHA ; Save the high byte.
LDY 4,X
LDX 6,X
* We know it's a good, non-zero count.
BRA BLOCKMOVETEST ; But test-on-entry is what we want.
BLOCKMOVELOOP:
LDAA 0,X
INX
STAA 0,Y
INY
BLOCKMOVETEST:
SUBB #1 ; Reflect it in carry
BCC BLOCKMOVELOOP ; Speed up the inner loop.
PULA
SBCA #0 ; Because I prefer BCC here.
PSHA
BCC BLOCKMOVELOOP
INS ; Drop high half of count.
PSHX
PULA
PULB
TSX
STY 4,X
STD 6,X
BLOCKMOVEEND:
RTS
* Leave the parameters for the calling routine.

Well, there's still a little bit of juggling, but the loop is much cleaner than on the 6801, even. But not at all as clean as the 6809 code. Here is where it really becomes clear that the 6811 is not a modified 6809, it's a beefed up 6801.

Incidentally, the 68HC11 defines a number of new bit manipulation instructions, and some other useful things, including hardware 16-bit by 16-bit integer and fractional divide (IDIV and FDIV), which are not available on the 6809. These instructions, and the fact that it was available as a core for integrated controllers, were the reasons for using the 68HC11.
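For reference, here is the arithmetic those two divides perform, as a Python sketch. IDIV divides D by X as unsigned 16-bit integers, leaving the quotient in X and the remainder in D. FDIV treats D as a binary fraction of X, so the quotient is (D x 65536) / X, and it only makes sense when D < X. The sketch leaves out the condition-code effects. The real instructions report division by zero and FDIV overflow in the flags rather than raising errors.

```python
def idiv(d, x):
    """68HC11 IDIV arithmetic: unsigned D / X -> (quotient for X, remainder for D)."""
    if x == 0:
        raise ZeroDivisionError("IDIV by zero (the real CPU reports it in the flags)")
    return d // x, d % x


def fdiv(d, x):
    """68HC11 FDIV arithmetic: (D * 2**16) / X as a 16-bit binary fraction."""
    if x == 0 or d >= x:
        raise ValueError("FDIV needs 0 <= D < X (the real CPU reports it in the flags)")
    numerator = d << 16
    return numerator // x, numerator % x
```

For example, FDIV with D = 1 and X = 3 gives a quotient of 21845 ($5555), which is 1/3 as a 16-bit binary fraction.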

This is further along the path of evolution that the 6801 began, but it is still on a separate path from the 6809. (Do you get the feeling I'm peeved about this? I do have reasons. Motorola should have provided 6809 as a core like the 6811. More transistors, sure, but look at all the transistors in the 68000-family controllers. Yes, I know that much of Motorola's development work was driven by customer demands, but ....)

Now, there is/was a 68HC12, which further extends the 68HC11. I don't remember some details of the register model, so I'll refrain from putting bad information here. Maybe later.

The stack pointer and, if I recall correctly, the PC, are both indexable on the 68HC12.

The 68HC12 could almost be called a modified 6809. Still missing the second stack pointer. It even has fancy indexing modes similar to some of the 6809's. But the encodings are different, and there aren't as many of them. And if we look at it carefully, it's clear that it is a beefed-up 6811, borrowing ideas from the 6809.

The 68HC12 is still less a modified 6809 and more of a modified 6800 that has some of the features of the 6809, and is missing the most important feature, the extra stack pointer. The instruction set is said to be a superset of the 6811's.

The 68HC16 extends the 68HC11 in its own direction (it actually appeared before the 68HC12), but it's still missing the extra stack pointer. It has a third index register which, we suppose, could substitute for either the DP register or the U stack pointer, but not both at once. And the indexes are effectively wider, giving an addressable range of a full megabyte.

Okay, so, the what-if game.

What if Motorola's management had recognized that they should have been pitching the 6809 as the direct competitor to the 8088 instead of the 68000? What if they had recognized the opportunity to use the 6809 in researching correct processor design?

Step one, fill in some holes in the instruction set and addressing modes.

1.1 Widen the direct page register to 16 bits, so it can actually be used to point to a local/statically allocated data segment. (This has a ripple effect in the interrupt stack frame and in the encoding and meaning of the TFR and EXG instructions. Or it requires additional PSH/PUL instructions. Maybe both. But the gain is well worth it.)

1.2 Add indirect addressing through direct page variables, so you can stash a pointer in the direct page and use it without disturbing X, Y, U, or S. (Lesson from the 6502.) Essentially adds 128 possible statically allocated local index registers.

1.3 Add integer and fractional divide (as in the 68HC11). Possibly add bit multiply and divide primitives, but those tended to be misdesigned back then, so maybe not. Make the internal address and double accumulator math functions 16 bits to speed many instructions by a cycle.

1.4 Make it available as a system-on-a-chip core, like they did with the 6801 and then 68HC11, etc.

1.5 Make the mapping MMU and the DMA functions available to be integrated with the core, and add support for both in the condition codes and interrupt architectures.
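As a rough sketch of the difference items 1.1 and 1.2 would make, the Python below first shows how direct-page effective addresses form on the real 6809: DP supplies the high byte and the instruction supplies the low byte. Next to that are two hypothetical pieces. One reads a 16-bit DP as a base that the 8-bit offset is added to. The other does an indirect load through a pointer kept in the direct page. Everything beyond the 8-bit case is an assumption about a CPU that never existed, and the function names are made up.

```python
def ea_direct_6809(dp, offset):
    """Real 6809 direct addressing: DP is the high byte, offset the low byte."""
    return ((dp & 0xFF) << 8) | (offset & 0xFF)


def ea_direct_wide(dp16, offset):
    """Hypothetical 16-bit DP (item 1.1): DP as a base, 8-bit offset added."""
    return (dp16 + (offset & 0xFF)) & 0xFFFF


def load_indirect_direct(memory, dp16, offset):
    """Hypothetical item 1.2: fetch a big-endian pointer from the direct page,
    then load the byte it points to, without touching X, Y, U, or S."""
    ptr_addr = ea_direct_wide(dp16, offset)
    pointer = (memory[ptr_addr] << 8) | memory[(ptr_addr + 1) & 0xFFFF]
    return memory[pointer]
```

With 256 bytes of direct page and 2 bytes per pointer, that is where the figure of 128 pseudo index registers comes from.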

At step 1.5, the rumoured CoCo 4 could be built without the separate GIME -- or, rather, with the GIME on the same die as the CPU.

In fact, they might have implemented the full 6829 MMU functionality on-chip, succeeding where the 6829 on a separate chip was too slow. And they might have avoided some of the less stable aspects of the 6829 design, accessing CPU control signals directly.

With care in the design of the MMU, the S stack could be made completely unreachable, blocking whole classes of stack gaming vulnerabilities. Likewise, full process separation and such could have been achieved.

And they could have gone on to experimenting with true segment registers (full 32 bit, unshifted, not like the half-baked segments of the 8086, not the big bank switches of the 68HC16) and perhaps bounds registers, to experiment with memory management without re-mapping the memory itself.

And somewhere along the line, they might have proven to themselves that their fear of competing with themselves would not have done damage, that a decent lineup with the 6809 would have actually helped 68000 sales rather than cannibalize them.

If Motorola had been researching correct CPU design on the 6809 in 1981 and '82, they would have been much less likely to have taken the detour through the full repertoire of useless addressing modes that they built in the 68020. They could have applied the lessons of the smaller cousin to the bigger, and avoided the trap of trying to compete with the feature king Intel on useless features.

We can dream, can't we?

Of a world where competition is dog-meet-dog instead of dog-eat-dog, utility of the best fit instead of survival of the artificial fittest.

It's at present only a slightly more realistic dream, I suppose, but it would be fun if I had the chance sometime to implement a FIG Forth interpreter optimized to each of the CPUs I've talked about here.

The 6800 version of the fig model was done by members of the Forth Interest Group back in the late 1970s, and I transcribed it in the mid-2000s and made it available as part of my M6800/1 assembler project, and I did a modified version for the 6809 back in the mid-1980s for college. (If you are interested in the 6809 code, contact me. I have it running on emulation, just need some time and motivation to put it up in its own tree. [JMR20201022: You can now find it here: https://osdn.net/projects/bif-6809/.]) (I started on an SH-3 conversion at 32 bits once several years back when I was trying to get myself back in the computer industry.) Really want to do a 68000 conversion at 32 bits, and the 68HC12 and 68HC16 have interesting aspects relative to implementing the thing.

And I'd like to go back and do them all as so-called subroutine-threaded. [JMR20201022: Project now in slow progress, here: https://osdn.net/projects/splitstack-runtimelib/.] Having all of that would be rather useful for comparing processor architectures and instruction sets.