(I have a more high-level treatment of this topic, comparing the 6801, 68HC11, and 6809 with the 6800, here: https://defining-computers.blogspot.com/2021/08/differences-between-6800-and-6801-with-notes-68hc11-6809.html.)
In the Color Computer Facebook group, somebody started a "What if?" thread.
What if Radio Shack had recognized that they had in the TRS-80 Color Computer the makings of an IBM PC killer way early in the game? etc.
(Yeah, we know hypothesis contrary to fact is a no-win game. We don't care. Call going there our happy place.)
So, I was looking up the history of the 6809 to refresh my memory so I could talk intelligently about what who shoulda done where when, and I noticed on the wikipedia page for the 6809 that somebody was talking about how the 68HC11 is a modified 6809.
Say what?!?!?!
And the wikipedia page for the 68HC11 had the same assertion.
Double-huh!?
Huh-uh. No. Where did they get that bit of history so tangled up?
(But that might explain why someone in the group would be talking about the 68HC11 as if it were a modified 6809.
For what it's worth, NXP, the successor in interest to Motorola relative to the 6800 and it's descendants, implies in their on-line materials that the 68000 is a descendant of the 6800 through the 6809, which is also just plain wrong. The 68000 and 6809 were essentially developed in parallel, by separate departments. They apparently communicated with each other, but pursued similar but different paths.)
I know, you don't care. I should just log in and correct it, if I care so much.
Well, maybe I will sometime. But stick with me. Maybe it'll be more amusing than youtube videos of geeks spraying package thieves with glitter. Let's take a look.
First, lets look at the register model of the 6800, the 8-bit grandaddy of Motorola's homespun lines of CPUs:
accumulator A:8 | |
accumulator B:8 | |
Index X:16 | |
Stack Pointer SP:16 | |
Program Counter PC:16 | |
Condition Codes CC:8 |
Accumulators are kind of like displays on calculators. This calculator has two small displays, so to speak. Not two calculators, two displays. They can accumulate separate sums, but the CPU has to say which it's working on:
LDAA #1 ; Put the number 1 in accumulator A.
LDAB #110 ; Put the number 110 in accumulator B.
* Now you have the year 366 in the combined A:B,
* but you have to work on the number a byte at a time.
PSHB ; Saving the number on the stack
PSHA ; has to be done in two pieces.
LDX #YEARSTABLEIf you want to do math on the pointer, you have to store it somewhere and work on it 8 bits at a time.
STX <TEMP ; Make sure TEMP is in the direct page.A stack is useful for tracking where you've been so you can know where to go next. It's also good for temporary storage of things. With the 6800, to do any math on the item on the top of stack, or to point at things that are buried under the top of stack requires copying the stack pointer to the index register.
LDAA <TEMP ; Grab the high byte,
LDAB <TEMP+1 ; then the low byte.
ADDB #160 ; (2000*2) modulo 256
ADCA #15 ; (2000*2)/256
* This is a big table in a small computer.)
STAB <TEMPX+1 ; Save the location for later
STAA <TEMPX
TSX ; Now we can address the stack.It works, but it feels a little awkward and takes lots of instructions.
LDAA 0,X ; Get the number 366
LDAB 1,X
LDX <TEMPX ; Point to the entry for the year 2000.
STAA 0,X ; Put 366 in the table.
STAB 1,X
The Program Counter (Instruction Pointer in other companies' parlance) is necessary to know where you are in the problem solution now. Pointing at things relative to the PC on the 6800 requires executing a moot call to the next instruction, transferring the stack pointer to the index register to get the pointer and do math on it, and loading the index register to load the index register indexed by itself. This is useful if you want to have code containing tables of constants buried in the code itself. Maybe it will help to walk through some code:
* Yes, there are times you might actually do something like this.Heh. You really wanted to know how to do that, didn't you?
DAYSINMONTHS:
FCB 31,28,31,30,31,30,31,31,30,31,30,31
* Leap year check will be done elsewhere.
* Month number on stack is 16 bit integer, but less than 256.
GETDAYSINMONTH:
BSR DUMMY
DUMMY:
PULA ; Get the address of DUMMY to work on.
PULB
SUBB DUMMY-DAYSINMONTHS ; Less than 256, okay?
SBCA #0 ; Address of DAYSINMONTHS, relocatable.
STAA <TEMPX2
STAB <TEMPX2+1
TSX
LDB 3,X ; Skip return address, ignore high byte of month number.
* It would be nice if we could use B as an offset to X, wouldn't it?
* LDX <TEMPX2
* LDB B,X
* But we can't on the 6800.
ADDB <TEMPX2+1
BCC SKIPIT
INC <TEMPX2 ; Had a carry.
SKIPIT:
STAB <TEMPX2+1
LDX TEMPX2
LDB 0,X ; Finally got the days in the month!
TSX
STB 3,X ; Let's use the stack to return the count.
CLR 2,X
RTS ; We remembered to keep the return address safe, right?
Condition codes, by the way, are where you keep track of things like whether the last math carried or overflowed or not, and of certain modes like whether the processor should let anyone interrupt it or not. The carry bit (C) in the condition code register is what allows BCC (Branch Carry Clear) to fall through and add 1 to the high byte in the above code.
One more useful example is moving a block of code:
* Parameters on stack, hard limit of 2048 bytes.
* pushed in order of source, destination, count:
* 0,S ; return PC
* 2,S ; count
* 4,S ; destination
* 6,S ; source
* Slightly paranoid code.
BLOCKMOVE:
TSX
LDAB 3,X
LDAA 2,X
STAB <COUNT+1 ; Page zero globals, erk.
STAA <COUNT
BMI BLOCKMOVEEND ; Bit 15 set is way too big.
SUBB #1 ; Yes, this is necessary.
SBCA #8 ; 8*256+1 == 2049
BCC BLOCKMOVEEND ; Don't even try if too much.
LDX 4,X
STX <DESTINATION
TSX
LDX 6,X
STX <SOURCE
* We know it's a good, non-zero count.
LDAB <COUNT+1 ; Low byte, easier to check the count.
BRA BLOCKMOVETEST ; Test on entry does what you want.
BLOCKMOVELOOP:
LDX <SOURCE
LDAA 0,X
INX
STX <SOURCE
LDX <DESTINATION
STAA 0,X
INX
STX <DESTINATION
BLOCKMOVETEST:
SUBB #1 ; Reflect it in carry
BCC BLOCKMOVELOOP ; Gets much harder if not test on entry.
DEC <COUNT
BMI BLOCKMOVEDONE ; For count < 256 at start.
BNE BLOCKMOVELOOP
BLOCKMOVEDONE:
TSX
LDAA <DESTINATION
LDAB <DESTINATION+1
STAA 4,X
STAB 5,X
LDAA <SOURCE
LDAB <SOURCE+1
STAA 6,X
STAB 7,X
BLOCKMOVEEND:
RTS
* Leave the parameters for the calling routine.
(The 6800 was used in a few PCs and game machines of the late 1970s and early 1980s, for example, the APF Imagination Machine and the Tektronix 4050, and some more obscure office machines.)
Now this M6800 microprocessor is useful, but Motorola's customers wanted to have an entire controller in a single part, not a bunch of parts on a PC board. And they wanted it to be a little easier to write programs for. So Motorola put together a project (around 1977) to improve the 6800, keeping it simple so that adding transistors for ROM, RAM, I/O, and timers and such would not blow budget out of the water. They called the result the 6801.
The 6801 was actually in limited production in 1977, but they were pretty quiet about it.
(Why? Maybe they worried that some customers would be upset they had bought the less capable 6800. It's a valid worry. Customers can be unreasonable about wanting the perfect thing yesterday at tomorrow's cheap thing's price.)
Motorola talked to some select customers besides GM in 1978, and started telling general customers about it in 1979. In other words, development on the 6801 started before development on the 6809 and 68000.
accumulator D (A:B)
accumulator A
accumulator B
| |
Index X | |
Stack Pointer SP | |
Program Counter PC | |
Condition Codes CC |
What's the difference from the 6800? Well, to see, we'll reproduce the code above using some of the additional instructions. First, we'll load the days in a leap year in the A and B accumulators:
LDD #366 ; The 6801 does this in one instruction!That's a lot simpler, less likely to forget something, even if it isn't really fewer total instructions. Runs a bit faster, too. Really simple additions made significant difference for the 6801. Let's look at the DAYSINMONTH thing:
* Much easier. Now let's point to the right place for it in YEARSTABLE:
LDX #YEARSTABLE ; We could actually put this directly in D.
* And we could have worked more difficult pointer math above,
* to shorten the code. We could also have made it not relocatable.
* So, keeping the two sets of code parallel, we will use X here, too.
PSHB ; Save the day count on the stack.
PSHA
PSHX ; This is a new instruction in the 6801!
TSX ; We can avoid putting TEMP variables in the direct page!
LDD 0,X
ADDD #(2000*2) ; All 16 bits at once!
STD 0,X
* Now we can save the days in the leap year in the table:
LDD 2,X ; There's the days!
PULX ; Of course we can pop what we push!
STD 0,X ; Done! All stored away in YEARSTABLE.
INS ; But we have to drop the count to balance the stack.
INS
DAYSINMONTHS:13 instructions versus 19. Nice, huh?
FCB 31,28,31,30,31,30,31,31,30,31,30,31
* Month number on stack as 16 bit integer.
* Leap year check done elsewhere.
GETDAYSINMONTH:
BSR DUMMY
DUMMY:
TSX
LDD 0,X ; Get the address of DUMMY to work on.
SUBD DUMMY-DAYSINMONTHS
STD 0,X ; Address of DAYSINMONTHS, relocatable.
LDB 5,X ; Ignore high byte of month number.
PULX
* It would be nice if we could use B as an offset to X, wouldn't it?
ABX ; But now, on the 6801, we can add B to X!
LDB 0,X ; Get the days in the month!
TSX
CLRA
STD 2,X ; Let's use the stack to return the count.
RTS
How much help for the block move are the additions?
* Parameters on stack, hard limit of 2048 bytes.
* pushed in order of source, destination, count:
* 0,S ; return PC
* 2,S ; count
* 4,S ; destination
* 6,S ; source
* Slightly paranoid code.
BLOCKMOVE:
TSX
LDD 2,X
STD <COUNT
BMI BLOCKMOVEEND ; Bit 15 set is way too big.
SUBD #2049
BCC BLOCKMOVEEND ; Don't even try if too much.
LDD 4,X
STD <DESTINATION
LDD 6,X
STD <SOURCE
* We know it's a good, non-zero count, so we can test at end.
BLOCKMOVELOOP:
LDX <SOURCE
LDAA 0,X
INX
STX <SOURCE
LDX <DESTINATION
STAA 0,X
INX
STX <DESTINATION
LDD <COUNT ; Faster than doing it by halves.
SUBD #1 ; Reflect it in carry
STD <COUNT ; Cleaner than by halves, too.
BHI BLOCKMOVELOOP ; Different end condition from 6800 code.
BLOCKMOVEDONE:
TSX
LDD <DESTINATION
STD 4,X
LDD <SOURCE
STD 6,X
BLOCKMOVEEND:
RTS
* Leave the parameters for the calling routine.
CLR <TAILCOUNT
LDD 2,X
ASRA
RORB
BCC BLOCKMOVEHALVE
INC <TAILCOUNT
BLOCKMOVEHALVE:
STD <COUNT
(The 6801 was used in a few PCs and game machines of the late 1970s and early 1980s. For example, Radio Shack's TRS-80 MC-10 had the 6803, which was a ROM-less 6801.)
Motorola's management realized at the time that competing in semiconductors meant really competing, so they decided to leapfrog the competition around 1977 or '78, and invested a lot of engineering time to bring out the 68000 in 1979 or '80:
Data register/Accumulator D0:32 | |
Data register/Accumulator D1:32 | |
Data register/Accumulator D2:32 | |
Data register/Accumulator D3:32 | |
Data register/Accumulator D4:32 | |
Data register/Accumulator D5:32 | |
Data register/Accumulator D6:32 | |
Data register/Accumulator D7:32 | |
Index register/Stack pointer A0:32 | |
Index register/Stack pointer A1:32 | |
Index register/Stack pointer A2:32 | |
Index register/Stack pointer A3:32 | |
Index register/Stack pointer A4:32 | |
Index register/Stack pointer A5:32 | |
Index register/Stack pointer A6:32 | |
Index register/User stack pointer A7:32 | |
Index register/System stack pointer A7:32 | |
Program Counter (Indexable) PC:32 | |
Status/Condition Codes CC:16 |
8 bits and 16 bit registers were too short, 32 bits is the answer! (True, but not without some downside.)
Two accumulators were good, eight were better! (True, but not without some downside.)
One index was not enough, and not being able to index off the stack pointer is a pain, eight index/stack pointer registers is the answer! (Again true, but not without some downside. We won't look at the downsides here, however. Suffice it say those were valid engineering trade-offs.)
As you can see, stack pointers are directly indexable. So is the PC.
This is oversimplifying but you can see a bit of pattern here. Let's see what effect that has on the code:
MOVE.W #366,D0 ; No problem!That's what all the extra transistors in the 68000 do for you. More leeway to work on more difficult problems. So let's look at the DAYSINMONTH thing on the 68000:
* Now let's point to the right place for it in YEARSTABLE:
MOVE.L #YEARSTABLE,A0 ; Again, could have put it in D1.
* Actually that should be LEA YEARSTABLE,A0
* but I don't want to distract myself or you.
* Still keeping the two sets of code parallel:
MOVE.W D0,2000(A0) ; Done. I'm not kidding you.
DAYSINMONTHS:
DC.B 31,28,31,30,31,30,31,31,30,31,30,31
* Month number on stack as 16 bit integer.
* Leap year check done elsewhere.
GETDAYSINMONTH:
* The assembler can get the address difference at assemble time: LEA DAYSINMONTHS(PC),A0 ; Relocatable. LEA will use the difference.
* Single stack to keep the code parallel.
MOVE.L 4(A7),D0 ; Month number in 32 bit integer on stack.
* CLR.L 4(A7)
* MOVE.B (D0,A0),7(A7) ; Done. I kid you not.
* Except it's better not to abuse the external memory bus so much:
CLR.L D1
MOVE.B (D0,A0),D1 ; Got the count.
MOVE.L D1,4(A7) ; Done. For real.
RTS
Interested in how nicely it works for block moves?
* Parameters on stack, hard limit of half a megabyte.
* pushed in order of source, destination, count:
* 0,A7 ; return PC
* 4,A7 ; count
* 8,A7 ; destination
* 12,A7 ; source
* We could call A7 SP, but you get the point.
* Slightly paranoid code.
BLOCKMOVE:
MOVE.L 4(A7),D0
BMI BLOCKMOVEEND ; Bit 31 set is way, way too big.
SUB.L #524289,D0 ; Subtract 524289 *from* D0.
BCC BLOCKMOVEEND ; Don't even try if too much.
* We know it's a good, non-zero count.
MOVE.W 4(A7),D1 ; One of the 68000's warts.
MOVE.W 6(A7),D0 ; Just the lower 2 bytes.
MOVEA.L 8(A7),A1
MOVEA.L 12(A7),A0
BRA BLOCKMOVETEST ; Designed for test on entry.
BLOCKMOVELOOP:
MOVE.B (A0)+,(A1)+
BLOCKMOVETEST:
DBF D0,BLOCKMOVELOOP ; That wart, only the lower half counts.
DBF D1,BLOCKMOVELOOP ; Did I get this right this time?
MOVEA.L A1,8(A7)
MOVEA.L A0,12(A7)
BLOCKMOVEEND:
RTS
* Leave the parameters for the calling routine.
Many PCs and game machines had 68000s in them -- the Atari ST, the Amiga, the original Macintosh and the Lisa which preceded the Macintosh, the Sega Genesis/Mega Drive and so on. Lots of arcade games. There were also many workstations with versions of the 68000, including early Suns, the Apollo/Domain, the NeXT, early HP 9000s, the Tandy Model 16, and a lot of workstations from less well-known companies. IBM even had a scientific computer with the 68000 (System 9000) at the time it introduced the infamous IBM PC.
Arguments that the 68000 was somehow inferior to the 8086 in any way but being initially more expensive and being produced by a company that had no intention of trying to form a monopoly with it are pure revisionist history.
Some of the engineers thought the 6800 could use some polishing up beyond the 6801, and management allowed them to put together the 6809 at pretty much the same time the company was putting together the 68000. The 6809 was actually brought into production before the 68000, but neither is an ancestor of the other.
The 6809 looks like this:
accumulator D:16 (A|B)
accumulator A:8
accumulator B:8
| |
Index X:16 | |
Index Y:16 | |
Indexable Return Stack Pointer S:16 | |
Indexable User Stack Pointer U:16 | |
Indexable Program Counter PC:16 | |
Direct Page Base DP:8 | |
Condition Codes CC:8 |
The design team here was a bit less ambitious than the design team for the 68000. The 6809 only got half as many index/stack pointers. And the two 8-bit accumulators can be put together for 16 bit addition and subtraction like the 6801 does, but there's only 16 bits of accumulator. Some people think it's register poor.
Anyway, you can definitely see the influence of the 6800 in the 6809. The 6809 is truly descended from the 6800. (We are pretty sure the 68000 and the 6809 influenced each other, but, in parallel, going somewhat different directions.)
Influence from the 6801? Maybe, but the op-codes in the 6809 have been seriously restructured from the 6800, especially the indexing. Indexing is much more richly supported on the 6809.
[JMR202009211038:
My memory is that the 6801 project actually began after the 6809, and the influence was in the reverse, with the D double accumulator borrowed back to the 6801 and the ABX instruction implemented to help overcome lack of the LEA load effective address. But I can't find good references for that now.
]
Incidentally, many small engineering projects that can be designed with the 6809 actually take less code and run faster on the 6809 at a 1 MHz memory cycle than on the 68000 at 1 MHz memory cycle. The downside of that is the limit of expansion of functionality. There are many problems that just can't be solved with only 16 bits of address. (Thousands of characters in Unicode? Millions of colors on a screen way too large to directly address in 16 bits?) That's why you need the larger CPU for many things.
And the 68000 was produced at much higher frequencies, as well. Should give some examples of these, but not today.
Let's look at 6809 code examples per the above, instead:
LDD #366 ; Days in leap years.
* Now let's point to the right place for it in YEARSTABLE:
LDX #YEARSTABLE ; Here's why we didn't put it in D.
* Or LEAX YEARSTABLE,PCR -- but we won't mention that here.
* Keeping the several sets of code parallel, still,
STD (2000*2),X ; Done. No, I'm still not kidding you.
DAYSINMONTHS:
FCB 31,28,31,30,31,30,31,31,30,31,30,31
* Month number on stack as 16 bit integer.
* Leap year check done elsewhere.
GETDAYSINMONTH:
LEAX DAYSINMONTHS,PCR ; Relocatable. The assembler knows how.
* Single stack to keep the code parallel.
* LDB 3,SP ; Month number in 16 bit integer, ignore high byte.
* CLR 2,SP ; Either way, really.
* LDB B,X ; Ignore high byte of offset.
* STB 3,SP ; Or, go ahead and use the (zero) high byte:
LDD 2,SP ; Month number in 16 bit integer.
LDB D,X ; Got the count in B.
CLRA
STD 2,X ; For real.
RTS
* Parameters on stack, hard limit of 2048 bytes.Are you sold? Do you want to try unrolling the loop to see how hard it is? Percentagewise, it won't be as much of an increase in speed as the 6801 sees in unrolling, but it would probably still be worth the effort.
* pushed in order of source, destination, count:
* 0,S ; return PC
* 2,S ; count
* 4,S ; destination
* 6,S ; source
* Slightly paranoid code.
BLOCKMOVE:
LDD 2,S
BMI BLOCKMOVEEND ; Bit 15 set is way too big.
SUBD #2049
BCC BLOCKMOVEEND ; Don't even try if too much.
* We know it's a good, non-zero count, so we can test at end.
LDX 6,S
LDY 4,S
BLOCKMOVELOOP:
LDA ,X+
STA ,Y+
LDD 2,S ; Faster than doing it by halves.
SUBD #1 ; Reflect it in carry
STD 2,S ; Cleaner than by halves, too.
BCC BLOCKMOVELOOP
BLOCKMOVEDONE:
STY 4,S
STX 6,S
BLOCKMOVEEND:
RTS
* Leave the parameters for the calling routine.
One of the downsides of both the 6809 and 68000's improved functionality is that it takes a lot of those small transistors to make them work. That makes less room for adding IO, ROM, RAM, etc. for one-chip stuff.
But the 6809 takes much fewer than the 68000, and only needs 8 bit wide memory where the 68000 needs sixteen bit wide memory.
Perhaps the most well-known PC/game machine to use the 6809 was the TRS-80 Color Computer, an odd-ball machine that existed, perhaps, because the 6809 was just too powerful to suppress. There were others, however, some of which you can find on Wikipedia's category page for 6809 home computers (Dragon in the UK, models from Thomson in France, Fujitsu's FM-8 in Japan, ) and others still in the Wikipedia main 6809 entry. There were also multi-user business machines produced by obscure companies such SWTP, many of which had started with the 6800. And lots of arcade machines and music synthesizers, also still mentioned on Wikipedia. And it was heavily used in Aerospace. I need to gather more links for these, sometimes, but the Wikipedia page still has a lot. And I should mention operating systems, such as the real-time OS, OS-9 from Microware. But this page was not intended as such a list. The point is that the 6809 was and is a hidden workhorse.
The 6809 really could have been used as a core by 1983, just as the 6801 was in 1979. Just as a somewhat advanced version of the 68000 was in the late 1980s. Just as somewhat advanced versions of the 6801 (68HC12 and 68HC16) also were in the 1990s.
I understand that there may have been, among the salescrew and management, those who were scared of having Motorola competing with itself -- scared of cannibalizing 68000 sales with the very competitive 6809, if they told everyone how good the 6809 was. If it was so, it was very shortsighted. The 6809 could easily have gone head-to-head with the 8088 on a lot of smaller designs, and the 68000 would have been an almost natural upgrade path from the 6809, just as the 6809 was a natural upgrade path from the 6801.
Just as a bit of text processing can convert 6800 code into executable 6809 code, a bit of text processing can convert a 6800 code or 6809 code into 68000 code. It may require a little engineering time for cleanup, but you'll want to take the time to optimize the code anyway.
So the 6801 improves the 6800 model by adding instructions that treat the accumulator pair as a single double (16-bit) accumulator, with A holding the high byte and B holding the low byte. And it adds an 8 x 8 multiply with 16 bit result. This much is about all there is in common with the 6809, other than the 6800 origins.
The 6801 also includes a few new instructions that allow adding an offset to the index register, comparing the index register, pushing and popping (PULling) the index register. These are significant improvements, improving efficiency in core areas that were bottlenecks in the 6800 for high-level functionality. Better than the 6800, not as good as the 6809, different. And done differently.
The instructions of the 6809 are not a proper superset of those of the 6801. Conversion is required.
The single indexing mode of the 6801 is the exact same as the 6800's. And the single stack pointer is not directly indexable. Nor is the PC. Comparing that with the 6809, you see all sorts of indexing available in the 6809 to shorten and clarify code.
There were some other important improvements, such as additional interrupt vectors for built-in hardware and such. But the 6801 is very much a separate path of evolution from the 6809. The additional interrupts in the 6809 are more general. I guess I'm repeating myself.
More general. That's a good way to compare the 6809 and the 6801.
Game machines with the 6809 in them are too numerous
Now we come to the 6811, which, functionally speaking, harks back to and extends the 6801 rather than the 6809. The year of introduction was 1985.
accumulator D:16 (A:B)
accumulator A:8
accumulator B:8
| |
Index X:16 | |
Index Y:16 | |
Stack Pointer SP:16 | |
Program Counter PC:16 | |
Condition Codes CC:8 |
The 68HC11 improves on the 6801 by adding another index register, Y. This significantly eases another bottleneck when handling two pointers at once, but the indexing mode is the same as the 6800's. It is not as general as the 6809's indexing modes (plural). The additional index register is supported by adding instructions, not by modifying the addressing modes.
Even after all my preaching, you might still think, oh, that's like the 6809, only missing the U. Uhm, and the DP, whatever that is.
Yes. It's missing U. Still only one stack pointer. SP is still not indexable. Neither is PC.
And no direct page register, which I haven't really demonstrated uses for here. (It's a bit of a complex topic, which I have partially addressed in other rants, erm, blog posts, in this blog.)
It is not a stripped down 6809. If it were, it would have much more complex addressing modes.
We can be sure the manufacturing process was influenced by things they learned developing the 6809, but that can also be said of the 68000, and we are not going to say it was somehow derived from the 68000. Not if we want to speak meaningfully.
It's a beefed-up 6801, not at all by way of the 6809. You still don't believe me?
Let's look again at that code we've been playing with, converted for the 68HC11:
LDD #366 ; The leap year.
LDY #YEARSTABLE ; How much will Y actually help?
PSHY ; Still have to have it in some temporary.
PSHB ; Save the day count on the stack.
PSHA
TSX
LDD 2,X ; Extra index register let us reorder the stack.
ADDD #(2000*2)
STD 2,X ; 2,X costs the same as 0,X on this CPU.
PULA ; Leap year days.
PULB
PULY ; Bring the entry address back.
STD 0,Y ; Done! All stored away in YEARSTABLE.
* And we don't have to balance the stack.
Will it help in DAYSINMONTHS?
DAYSINMONTHS:
FCB 31,28,31,30,31,30,31,31,30,31,30,31
* Month number on stack as 16 bit integer.
* Leap year check done elsewhere.
GETDAYSINMONTH:
BSR DUMMY
DUMMY:
TSY
LDD 0,Y ; Get the address of DUMMY to work on.
SUBD DUMMY-DAYSINMONTHS
STD 0,Y ; Address of DAYSINMONTHS, relocatable.
LDB 5,Y ; Ignore high byte of month number.
PULX
ABX
LDB 0,X ; Get the days in the month.
CLRA
STD 4,Y ; Y is not equal S, but we can use it.
RTS
* Parameters on stack, hard limit of 2048 bytes.
* pushed in order of source, destination, count:
* 0,S ; return PC
* 2,S ; count
* 4,S ; destination
* 6,S ; source
* Slightly paranoid code.
BLOCKMOVE:
TSX
LDD 2,X
BMI BLOCKMOVEEND ; Bit 15 set is way too big.
SUBD #2049
BCC BLOCKMOVEEND
LDD 2,X ; Bring it back.
PSHA ; Save the high byte.
LDY 4,X
LDX 6,X
* We know it's a good, non-zero count.
BRA BLOCKMOVETEST ; But test-on-entry is what we want.
BLOCKMOVELOOP:
LDAA 0,X
INX
STAA 0,Y
INY
BLOCKMOVETEST:
SUBB #1 ; Reflect it in carry
BCC BLOCKMOVELOOP ; Speed up the inner loop.
PULA
SBCA #0 ; Because I prefer BCC here.
PSHA
BCC BLOCKMOVELOOP
INS ; Drop high half of count.
PSHX
PULA
PULB
TSX
STY 4,X
STD 6,X
BLOCKMOVEEND:
RTS
* Leave the parameters for the calling routine.
Incidentally, the 68HC11 defines a number of new bit manipulation instructions, and some other useful things, including hardware byte-wide integer divide, which are not available on the 6809. These instructions, and the fact that it was available as a core for integrated controllers, were the reasons for using the 68HC11.
This is further along the path of evolution that the 6801 began, but it is still on a separate path from the 6809. (Do you get the feeling I'm peeved about this? I do have reasons. Motorola should have provided 6809 as a core like the 6811. More transistors, sure, but look at all the transistors in the 68000-family controllers. Yes, I know that much of Motorola's development work was driven by customer demands, but ....)
Now, there is/was a 68HC12, which further extends the 68HC11. I don't remember some details of the register model, so I'll refrain from putting bad information here. Maybe later.
The stack pointer and, if I recall correctly, the PC, are both indexable on the 68HC12.
The 68HC12 could almost be called a modified 6809. Still missing the second stack pointer. It even has fancy indexing modes similar to some of the 6809's. But the encodings are different, and there aren't as many of them. And if we look at it carefully, it's clear that it is a beefed-up 6811, borrowing ideas from the 6809.
The 68HC12 is still less a modified 6809 and more of a modified 6800 that has some of the features of the 6809, and is missing the most important feature, the extra stack pointer. The instruction set is said to be a superset of the 6811's.
The 68HC16 further extends the 68HC12, but it's still missing the extra stack pointer. Has a third index register which, we supposed, can either be a substitute for the DP register or the U stack pointer, but not both at once. And the indexes are wider, giving an addressable range of a full megabyte.
Okay, so, the what-if game.
What if Motorola's management had recognized that they should have been pitching the 6809 as the direct competitor to the 8088 instead of the 68000? What if they had recognized the opportunity to use the 6809 in researching correct processor design?
Step one, fill in some holes in the instruction set and addressing modes.
1.1 Widen the direct page register to 16 bits, so it can actually be used to point to a local/statically allocated data segment. (This has a ripple effect in the interrupt stack frame and in the encoding and meaning of the TFR and EXG instructions. Or it requires additional PSH/PUL instructions. Maybe both. But the gain is well worth it.)
1.2 Add indirect addressing through direct page variables, so you can stash a pointer in the direct page and use it without disturbing X, Y, U, or S. (Lesson from the 6502.) Essentially adds 128 possible statically allocated local index registers.
1.3 Add integer and fractional divide (as in 68H11). Possibly add bit multiply and divide primitives, but those tended to be misdesigned back then, so maybe not. Make the internal address and double accumulator math functions 16 bits to speed many instructions by a cycle.
1.4 Make it available as a system-on-a-chip core, like they did with the 6801 and then 68HC11, etc.
1.5 Make the mapping MMU and the DMA functions avaliable to be integrated with the core, and add support for both in the condition codes and interrupt architectures.
At step 1.5, the rumoured CoCo 4 could be built without the separate GIME -- or, rather, with the GIME on the same die as the CPU.
In fact, they might have implemented the full 6829 MMU functionality on-chip, succeeding where the 6829 on a separate chip was too slow. And they might have avoided some of the less stable aspects of the 6829 design, accessing CPU control signals directly.
With care in the design of the MMU, the S stack could be made completely unreachable, blocking whole classes of stack gaming vulnerabilities. Likewise, full process separation and such could have been achieved.
And they could have gone on to experimenting with true segment registers (full 32 bit, unshifted, not like the half-baked segments of the 8086, not the big bank switches of the 68HC16) and perhaps bounds registers, to experiment with memory management without re-mapping the memory itself.
And somewhere along the line, they might have proven to themselves that their fear of competing with themselves would not have done damage, that a decent lineup with the 6809 would have actually helped 68000 sales rather than cannibalize them.
If Motorola had been researching correct CPU design on the 6809 in 1981 and '82, they would have been much less likely to have taken the detour through the full reportoire of useless addressing modes that they built in the 68020. They could have applied the lessons of the smaller cousin to the bigger, and avoided the trap of trying to compete with the feature king Intel on useless features.
We can dream, can't we?
Of a world where competition is dog-meet-dog instead of dog-eat-dog, utility of the best fit instead of survival of the artificial fittest.
It's at present only slightly more realistic dream, I suppose, but it would be fun if I had the chance sometime to implement a FIG Forth interpreter optimized to each of the CPUs I've talked about here.
The 6800 version of the fig model was done by members of the Forth Interest Group back in the late 1970s, and I transcribed it in the mid-2000s and made it available as part of my M6800/1 assembler project, and I did a modified version for the 6809 back in the mid-1980s for college. (If you are interested in the 6809 code, contact me. I have it running on emulation, just need some time and motivation to put it up in its own tree. [JMR20201022: You can now find it here: https://osdn.net/projects/bif-6809/.]) (I started on an SH-3 conversion at 32 bits once several years back when I was trying to get myself back in the computer industry.) Really want to do a 68000 conversion at 32 bits, and the 68HC12 and 68HC16 have interesting aspects relative to implementing the thing.
And I'd like to go back and do them all as so-called subroutine-threaded. [JMR20201022: Project now in slow progress, here: https://osdn.net/projects/splitstack-runtimelib/.] Having all of that would be rather useful for comparing processor architectures and instruction sets.