defining computers: December 2018

Misunderstanding Computers

Why do we insist on seeing the computer as a magic box for controlling other people?
人はどうしてコンピュータを、人を制する魔法の箱として考えたいのですか?
Why do we want so much to control others when we won't control ourselves?
どうしてそれほど、自分を制しないのに、人をコントロールしたいのですか？

Computer memory is just fancy paper, CPUs are just fancy pens with fancy erasers, and the network is just a fancy backyard fence.
コンピュータの記憶というものはただ改良した紙ですし、CPU 何て特長ある筆に特殊の消しゴムがついたものにすぎないし、ネットワークそのものは裏庭の塀が少し拡大されたものぐらいです。

(original post/元の投稿 -- defining computers site/コンピュータを定義しようのサイト)

Thursday, December 20, 2018

68HC11 Is Not a Modified 6809 (and What if?)

(I have a more high-level treatment of this topic, comparing the 6801, 68HC11, and 6809 with the 6800, here: https://defining-computers.blogspot.com/2021/08/differences-between-6800-and-6801-with-notes-68hc11-6809.html.)

In the Color Computer Facebook group, somebody started a "What if?" thread.

What if Radio Shack had recognized that they had in the TRS-80 Color Computer the makings of an IBM PC killer way early in the game? etc.

(Yeah, we know hypothesis contrary to fact is a no-win game. We don't care. Call going there our happy place.)

So, I was looking up the history of the 6809 to refresh my memory so I could talk intelligently about what who shoulda done where when, and I noticed on the wikipedia page for the 6809 that somebody was talking about how the 68HC11 is a modified 6809.

Say what?!?!?!

And the wikipedia page for the 68HC11 had the same assertion.

Double-huh!?

Huh-uh. No. Where did they get that bit of history so tangled up?

(But that might explain why someone in the group would be talking about the 68HC11 as if it were a modified 6809.

For what it's worth, NXP, the successor in interest to Motorola relative to the 6800 and it's descendants, implies in their on-line materials that the 68000 is a descendant of the 6800 through the 6809, which is also just plain wrong. The 68000 and 6809 were essentially developed in parallel, by separate departments. They apparently communicated with each other, but pursued similar but different paths.)

I know, you don't care. I should just log in and correct it, if I care so much.

Well, maybe I will sometime. But stick with me. Maybe it'll be more amusing than youtube videos of geeks spraying package thieves with glitter. Let's take a look.

First, lets look at the register model of the 6800, the 8-bit grandaddy of Motorola's homespun lines of CPUs:

Registers in the 6800:
	accumulator A:8
	accumulator B:8
Index X:16
Stack Pointer SP:16
Program Counter PC:16
	Condition Codes CC:8

Accumulators are kind of like displays on calculators. This calculator has two small displays, so to speak. Not two calculators, two displays. They can accumulate separate sums, but the CPU has to say which it's working on:

LDAA #1 ; Put the number 1 in accumulator A.
LDAB #110 ; Put the number 110 in accumulator B.
* Now you have the year 366 in the combined A:B,
* but you have to work on the number a byte at a time.
PSHB ; Saving the number on the stack
PSHA ; has to be done in two pieces.

*** By the way, all code on this page is untested. ***

*** If you find errors, let me know. ***

An index register is useful for pointing to working areas within the big blackboard of memory. The 6800 provides an indexed mode using the index register with an eight bit offset, and that's it.

LDX #YEARSTABLE

If you want to do math on the pointer, you have to store it somewhere and work on it 8 bits at a time.

STX <TEMP ; Make sure TEMP is in the direct page.
LDAA <TEMP ; Grab the high byte,
LDAB <TEMP+1 ; then the low byte.
ADDB #160 ; (2000*2) modulo 256
ADCA #15 ; (2000*2)/256
* This is a big table in a small computer.)
STAB <TEMPX+1 ; Save the location for later
STAA <TEMPX

A stack is useful for tracking where you've been so you can know where to go next. It's also good for temporary storage of things. With the 6800, to do any math on the item on the top of stack, or to point at things that are buried under the top of stack requires copying the stack pointer to the index register.

TSX ; Now we can address the stack.
LDAA 0,X ; Get the number 366
LDAB 1,X
LDX <TEMPX ; Point to the entry for the year 2000.
STAA 0,X ; Put 366 in the table.
STAB 1,X

It works, but it feels a little awkward and takes lots of instructions.

The Program Counter (Instruction Pointer in other companies' parlance) is necessary to know where you are in the problem solution now. Pointing at things relative to the PC on the 6800 requires executing a moot call to the next instruction, transferring the stack pointer to the index register to get the pointer and do math on it, and loading the index register to load the index register indexed by itself. This is useful if you want to have code containing tables of constants buried in the code itself. Maybe it will help to walk through some code:

* Yes, there are times you might actually do something like this.
DAYSINMONTHS:
FCB 31,28,31,30,31,30,31,31,30,31,30,31
* Leap year check will be done elsewhere.
* Month number on stack is 16 bit integer, but less than 256.
GETDAYSINMONTH:
BSR DUMMY
DUMMY:
PULA ; Get the address of DUMMY to work on.
PULB
SUBB DUMMY-DAYSINMONTHS ; Less than 256, okay?
SBCA #0 ; Address of DAYSINMONTHS, relocatable.
STAA <TEMPX2
STAB <TEMPX2+1
TSX
LDB 3,X ; Skip return address, ignore high byte of month number.
* It would be nice if we could use B as an offset to X, wouldn't it?
* LDX <TEMPX2
* LDB B,X
* But we can't on the 6800.
ADDB <TEMPX2+1
BCC SKIPIT
INC <TEMPX2 ; Had a carry.
SKIPIT:
STAB <TEMPX2+1
LDX TEMPX2
LDB 0,X ; Finally got the days in the month!
TSX
STB 3,X ; Let's use the stack to return the count.
CLR 2,X
RTS ; We remembered to keep the return address safe, right?

Heh. You really wanted to know how to do that, didn't you?

Condition codes, by the way, are where you keep track of things like whether the last math carried or overflowed or not, and of certain modes like whether the processor should let anyone interrupt it or not. The carry bit (C) in the condition code register is what allows BCC (Branch Carry Clear) to fall through and add 1 to the high byte in the above code.

One more useful example is moving a block of code:

* Parameters on stack, hard limit of 2048 bytes.
* pushed in order of source, destination, count:
* 0,S ; return PC
* 2,S ; count
* 4,S ; destination
* 6,S ; source
* Slightly paranoid code.
BLOCKMOVE:
TSX
LDAB 3,X
LDAA 2,X
STAB <COUNT+1 ; Page zero globals, erk.
STAA <COUNT
BMI BLOCKMOVEEND ; Bit 15 set is way too big.
SUBB #1 ; Yes, this is necessary.
SBCA #8 ; 8*256+1 == 2049
BCC BLOCKMOVEEND ; Don't even try if too much.
LDX 4,X
STX <DESTINATION
TSX
LDX 6,X
STX <SOURCE
* We know it's a good, non-zero count.
LDAB <COUNT+1 ; Low byte, easier to check the count.
BRA BLOCKMOVETEST ; Test on entry does what you want.
BLOCKMOVELOOP:
LDX <SOURCE
LDAA 0,X
INX
STX <SOURCE
LDX <DESTINATION
STAA 0,X
INX
STX <DESTINATION
BLOCKMOVETEST:
SUBB #1 ; Reflect it in carry
BCC BLOCKMOVELOOP ; Gets much harder if not test on entry.
DEC <COUNT
BMI BLOCKMOVEDONE ; For count < 256 at start.
BNE BLOCKMOVELOOP
BLOCKMOVEDONE:
TSX
LDAA <DESTINATION
LDAB <DESTINATION+1
STAA 4,X
STAB 5,X
LDAA <SOURCE
LDAB <SOURCE+1
STAA 6,X
STAB 7,X
BLOCKMOVEEND:
RTS
* Leave the parameters for the calling routine.

That looks inefficient, but in the mid 1970s it wasn't that bad. But it definitely shows one of the warts of the 6800, and one of the motivations for adding index registers.

There are ways to do it quicker, especially if you know how much you are moving when you are writing the code. These tricks involve buffer alignment and size alignment and unrolling the loop, etc. That's not what we need to look at here.

(The 6800 was used in a few PCs and game machines of the late 1970s and early 1980s, for example, the APF Imagination Machine and the Tektronix 4050, and some more obscure office machines.)

Now I'm not going to get into the 6800 vs. 8080 vs. 6502 question here. (Maybe I should someday, somewhere else. Not today.) Suffice it to say that the 6800 had some charm points and some warts. All the CPUs back then did.

Now this M6800 microprocessor is useful, but Motorola's customers wanted to have an entire controller in a single part, not a bunch of parts on a PC board. And they wanted it to be a little easier to write programs for. So Motorola put together a project (around 1977) to improve the 6800, keeping it simple so that adding transistors for ROM, RAM, I/O, and timers and such would not blow budget out of the water. They called the result the 6801.

The 6801 was actually in limited production in 1977, but they were pretty quiet about it.

(Why? Maybe they worried that some customers would be upset they had bought the less capable 6800. It's a valid worry. Customers can be unreasonable about wanting the perfect thing yesterday at tomorrow's cheap thing's price.)

Motorola talked to some select customers besides GM in 1978, and started telling general customers about it in 1979. In other words, development on the 6801 started before development on the 6809 and 68000.

Registers in the 6801:
accumulator D (A:B) accumulator A accumulator B
Index X
Stack Pointer SP
Program Counter PC
	Condition Codes CC

What's the difference from the 6800? Well, to see, we'll reproduce the code above using some of the additional instructions. First, we'll load the days in a leap year in the A and B accumulators:

LDD #366 ; The 6801 does this in one instruction!
* Much easier. Now let's point to the right place for it in YEARSTABLE:
LDX #YEARSTABLE ; We could actually put this directly in D.
* And we could have worked more difficult pointer math above,
* to shorten the code. We could also have made it not relocatable.
* So, keeping the two sets of code parallel, we will use X here, too.
PSHB ; Save the day count on the stack.
PSHA
PSHX ; This is a new instruction in the 6801!
TSX ; We can avoid putting TEMP variables in the direct page!
LDD 0,X
ADDD #(2000*2) ; All 16 bits at once!
STD 0,X
* Now we can save the days in the leap year in the table:
LDD 2,X ; There's the days!
PULX ; Of course we can pop what we push!
STD 0,X ; Done! All stored away in YEARSTABLE.
INS ; But we have to drop the count to balance the stack.
INS

That's a lot simpler, less likely to forget something, even if it isn't really fewer total instructions. Runs a bit faster, too. Really simple additions made significant difference for the 6801. Let's look at the DAYSINMONTH thing:

DAYSINMONTHS:
FCB 31,28,31,30,31,30,31,31,30,31,30,31
* Month number on stack as 16 bit integer.
* Leap year check done elsewhere.
GETDAYSINMONTH:
BSR DUMMY
DUMMY:
TSX
LDD 0,X ; Get the address of DUMMY to work on.
SUBD DUMMY-DAYSINMONTHS
STD 0,X ; Address of DAYSINMONTHS, relocatable.
LDB 5,X ; Ignore high byte of month number.
PULX
* It would be nice if we could use B as an offset to X, wouldn't it?
ABX ; But now, on the 6801, we can add B to X!
LDB 0,X ; Get the days in the month!
TSX
CLRA
STD 2,X ; Let's use the stack to return the count.
RTS

13 instructions versus 19. Nice, huh?

How much help for the block move are the additions?

* Parameters on stack, hard limit of 2048 bytes.
* pushed in order of source, destination, count:
* 0,S ; return PC
* 2,S ; count
* 4,S ; destination
* 6,S ; source
* Slightly paranoid code.
BLOCKMOVE:
TSX
LDD 2,X
STD <COUNT
BMI BLOCKMOVEEND ; Bit 15 set is way too big.
SUBD #2049
BCC BLOCKMOVEEND ; Don't even try if too much.
LDD 4,X
STD <DESTINATION
LDD 6,X
STD <SOURCE
* We know it's a good, non-zero count, so we can test at end.
BLOCKMOVELOOP:
LDX <SOURCE
LDAA 0,X
INX
STX <SOURCE
LDX <DESTINATION
STAA 0,X
INX
STX <DESTINATION
LDD <COUNT ; Faster than doing it by halves.
SUBD #1 ; Reflect it in carry
STD <COUNT ; Cleaner than by halves, too.
BHI BLOCKMOVELOOP ; Different end condition from 6800 code.
BLOCKMOVEDONE:
TSX
LDD <DESTINATION
STD 4,X
LDD <SOURCE
STD 6,X
BLOCKMOVEEND:
RTS
* Leave the parameters for the calling routine.

The code is much clearer, and a little faster and shorter.

Just as an exercise, to help convince you these are real improvements, you might want to unroll the loop once to see what happens. Use an extra variable in the zero page to remember if the count is odd, and use the ASR instruction to divide the count by two:

CLR <TAILCOUNT
LDD 2,X
ASRA
RORB
BCC BLOCKMOVEHALVE
INC <TAILCOUNT
BLOCKMOVEHALVE:
STD <COUNT

should get you started. It takes a bit of extra code, but it gets close to double the speed for counts greater than 8 bytes if you do it right.

And, just for good measure, Motorola gave the 6801 an 8 bit by 8 bit hardware multiply that greatly sped up a lot of integer math. But we won't look at that here.

(The 6801 was used in a few PCs and game machines of the late 1970s and early 1980s. For example, Radio Shack's TRS-80 MC-10 had the 6803, which was a ROM-less 6801.)

Motorola's management realized at the time that competing in semiconductors meant really competing, so they decided to leapfrog the competition around 1977 or '78, and invested a lot of engineering time to bring out the 68000 in 1979 or '80:

Registers in the 68000:
Data register/Accumulator D0:32
Data register/Accumulator D1:32
Data register/Accumulator D2:32
Data register/Accumulator D3:32
Data register/Accumulator D4:32
Data register/Accumulator D5:32
Data register/Accumulator D6:32
Data register/Accumulator D7:32
Index register/Stack pointer A0:32
Index register/Stack pointer A1:32
Index register/Stack pointer A2:32
Index register/Stack pointer A3:32
Index register/Stack pointer A4:32
Index register/Stack pointer A5:32
Index register/Stack pointer A6:32
Index register/User stack pointer A7:32
Index register/System stack pointer A7:32
Program Counter (Indexable) PC:32
	Status/Condition Codes CC:16

8 bits and 16 bit registers were too short, 32 bits is the answer! (True, but not without some downside.)

Two accumulators were good, eight were better! (True, but not without some downside.)

One index was not enough, and not being able to index off the stack pointer is a pain, eight index/stack pointer registers is the answer! (Again true, but not without some downside. We won't look at the downsides here, however. Suffice it say those were valid engineering trade-offs.)

As you can see, stack pointers are directly indexable. So is the PC.

This is oversimplifying but you can see a bit of pattern here. Let's see what effect that has on the code:

MOVE.W #366,D0 ; No problem!
* Now let's point to the right place for it in YEARSTABLE:
MOVE.L #YEARSTABLE,A0 ; Again, could have put it in D1.
* Actually that should be LEA YEARSTABLE,A0
* but I don't want to distract myself or you.
* Still keeping the two sets of code parallel:
MOVE.W D0,2000(A0) ; Done. I'm not kidding you.

That's what all the extra transistors in the 68000 do for you. More leeway to work on more difficult problems. So let's look at the DAYSINMONTH thing on the 68000:

DAYSINMONTHS:
DC.B 31,28,31,30,31,30,31,31,30,31,30,31
* Month number on stack as 16 bit integer.
* Leap year check done elsewhere.
GETDAYSINMONTH:
* The assembler can get the address difference at assemble time: LEA DAYSINMONTHS(PC),A0 ; Relocatable. LEA will use the difference.
* Single stack to keep the code parallel.
MOVE.L 4(A7),D0 ; Month number in 32 bit integer on stack.
* CLR.L 4(A7)
* MOVE.B (D0,A0),7(A7) ; Done. I kid you not.
* Except it's better not to abuse the external memory bus so much:
CLR.L D1
MOVE.B (D0,A0),D1 ; Got the count.
MOVE.L D1,4(A7) ; Done. For real.
RTS

Heh. This is why we like the 68K.

Interested in how nicely it works for block moves?

* Parameters on stack, hard limit of half a megabyte.
* pushed in order of source, destination, count:
* 0,A7 ; return PC
* 4,A7 ; count
* 8,A7 ; destination
* 12,A7 ; source
* We could call A7 SP, but you get the point.
* Slightly paranoid code.
BLOCKMOVE:
MOVE.L 4(A7),D0
BMI BLOCKMOVEEND ; Bit 31 set is way, way too big.
SUB.L #524289,D0 ; Subtract 524289 *from* D0.
BCC BLOCKMOVEEND ; Don't even try if too much.
* We know it's a good, non-zero count.
MOVE.W 4(A7),D1 ; One of the 68000's warts.
MOVE.W 6(A7),D0 ; Just the lower 2 bytes.
MOVEA.L 8(A7),A1
MOVEA.L 12(A7),A0
BRA BLOCKMOVETEST ; Designed for test on entry.
BLOCKMOVELOOP:
MOVE.B (A0)+,(A1)+
BLOCKMOVETEST:
DBF D0,BLOCKMOVELOOP ; That wart, only the lower half counts.
DBF D1,BLOCKMOVELOOP ; Did I get this right this time?
MOVEA.L A1,8(A7)
MOVEA.L A0,12(A7)
BLOCKMOVEEND:
RTS
* Leave the parameters for the calling routine.

Loop unrolling to four bytes at a time is pretty straightforward, but if you're going that far you'll also want to check memory alignment. It can get a little messy before it becomes obvious. And then the 68000 has the MOVEM instruction that would let you push blocks of 64 bytes or more in one time through, using registers. You really have to think that one through if you try it.

Many PCs and game machines had 68000s in them -- the Atari ST, the Amiga, the original Macintosh and the Lisa which preceded the Macintosh, the Sega Genesis/Mega Drive and so on. Lots of arcade games. There were also many workstations with versions of the 68000, including early Suns, the Apollo/Domain, the NeXT, early HP 9000s, the Tandy Model 16, and a lot of workstations from less well-known companies. IBM even had a scientific computer with the 68000 (System 9000) at the time it introduced the infamous IBM PC.

Arguments that the 68000 was somehow inferior to the 8086 in any way but being initially more expensive and being produced by a company that had no intention of trying to form a monopoly with it are pure revisionist history.

Now, here is a point that is often gotten wrong -- the 6809 was not a predecessor to the 68000 (contrary to what even NXP might have you believe).

Some of the engineers thought the 6800 could use some polishing up beyond the 6801, and management allowed them to put together the 6809 at pretty much the same time the company was putting together the 68000. The 6809 was actually brought into production before the 68000, but neither is an ancestor of the other.

The 6809 looks like this:

Registers in the 6809:
accumulator D:16 (A\|B) accumulator A:8 accumulator B:8
Index X:16
Index Y:16
Indexable Return Stack Pointer S:16
Indexable User Stack Pointer U:16
Indexable Program Counter PC:16
	Direct Page Base DP:8
	Condition Codes CC:8

The design team here was a bit less ambitious than the design team for the 68000. The 6809 only got half as many index/stack pointers. And the two 8-bit accumulators can be put together for 16 bit addition and subtraction like the 6801 does, but there's only 16 bits of accumulator. Some people think it's register poor.

Anyway, you can definitely see the influence of the 6800 in the 6809. The 6809 is truly descended from the 6800. (We are pretty sure the 68000 and the 6809 influenced each other, but, in parallel, going somewhat different directions.)

Influence from the 6801? Maybe, but the op-codes in the 6809 have been seriously restructured from the 6800, especially the indexing. Indexing is much more richly supported on the 6809.

[JMR202009211038:
My memory is that the 6801 project actually began after the 6809, and the influence was in the reverse, with the D double accumulator borrowed back to the 6801 and the ABX instruction implemented to help overcome lack of the LEA load effective address. But I can't find good references for that now.
]

Incidentally, many small engineering projects that can be designed with the 6809 actually take less code and run faster on the 6809 at a 1 MHz memory cycle than on the 68000 at 1 MHz memory cycle. The downside of that is the limit of expansion of functionality. There are many problems that just can't be solved with only 16 bits of address. (Thousands of characters in Unicode? Millions of colors on a screen way too large to directly address in 16 bits?) That's why you need the larger CPU for many things.

And the 68000 was produced at much higher frequencies, as well. Should give some examples of these, but not today.

Let's look at 6809 code examples per the above, instead:

LDD #366 ; Days in leap years.
* Now let's point to the right place for it in YEARSTABLE:
LDX #YEARSTABLE ; Here's why we didn't put it in D.
* Or LEAX YEARSTABLE,PCR -- but we won't mention that here.
* Keeping the several sets of code parallel, still,
STD (2000*2),X ; Done. No, I'm still not kidding you.

Less than half the transistors of the 68000, and it still nails it. Let's look at the DAYSINMONTH thing on the 6809, also:

DAYSINMONTHS:
FCB 31,28,31,30,31,30,31,31,30,31,30,31
* Month number on stack as 16 bit integer.
* Leap year check done elsewhere.
GETDAYSINMONTH:
LEAX DAYSINMONTHS,PCR ; Relocatable. The assembler knows how.
* Single stack to keep the code parallel.
* LDB 3,SP ; Month number in 16 bit integer, ignore high byte.
* CLR 2,SP ; Either way, really.
* LDB B,X ; Ignore high byte of offset.
* STB 3,SP ; Or, go ahead and use the (zero) high byte:
LDD 2,SP ; Month number in 16 bit integer.
LDB D,X ; Got the count in B.
CLRA
STD 2,X ; For real.
RTS

What about block moves on the 6809?

* Parameters on stack, hard limit of 2048 bytes.
* pushed in order of source, destination, count:
* 0,S ; return PC
* 2,S ; count
* 4,S ; destination
* 6,S ; source
* Slightly paranoid code.
BLOCKMOVE:
LDD 2,S
BMI BLOCKMOVEEND ; Bit 15 set is way too big.
SUBD #2049
BCC BLOCKMOVEEND ; Don't even try if too much.
* We know it's a good, non-zero count, so we can test at end.
LDX 6,S
LDY 4,S
BLOCKMOVELOOP:
LDA ,X+
STA ,Y+
LDD 2,S ; Faster than doing it by halves.
SUBD #1 ; Reflect it in carry
STD 2,S ; Cleaner than by halves, too.
BCC BLOCKMOVELOOP
BLOCKMOVEDONE:
STY 4,S
STX 6,S
BLOCKMOVEEND:
RTS
* Leave the parameters for the calling routine.

Are you sold? Do you want to try unrolling the loop to see how hard it is? Percentagewise, it won't be as much of an increase in speed as the 6801 sees in unrolling, but it would probably still be worth the effort.

One of the downsides of both the 6809 and 68000's improved functionality is that it takes a lot of those small transistors to make them work. That makes less room for adding IO, ROM, RAM, etc. for one-chip stuff.

But the 6809 takes much fewer than the 68000, and only needs 8 bit wide memory where the 68000 needs sixteen bit wide memory.

Perhaps the most well-known PC/game machine to use the 6809 was the TRS-80 Color Computer, an odd-ball machine that existed, perhaps, because the 6809 was just too powerful to suppress. There were others, however, some of which you can find on Wikipedia's category page for 6809 home computers (Dragon in the UK, models from Thomson in France, Fujitsu's FM-8 in Japan, ) and others still in the Wikipedia main 6809 entry. There were also multi-user business machines produced by obscure companies such SWTP, many of which had started with the 6800. And lots of arcade machines and music synthesizers, also still mentioned on Wikipedia. And it was heavily used in Aerospace. I need to gather more links for these, sometimes, but the Wikipedia page still has a lot. And I should mention operating systems, such as the real-time OS, OS-9 from Microware. But this page was not intended as such a list. The point is that the 6809 was and is a hidden workhorse.

The 6809 really could have been used as a core by 1983, just as the 6801 was in 1979. Just as a somewhat advanced version of the 68000 was in the late 1980s. Just as somewhat advanced versions of the 6801 (68HC12 and 68HC16) also were in the 1990s.

I understand that there may have been, among the salescrew and management, those who were scared of having Motorola competing with itself -- scared of cannibalizing 68000 sales with the very competitive 6809, if they told everyone how good the 6809 was. If it was so, it was very shortsighted. The 6809 could easily have gone head-to-head with the 8088 on a lot of smaller designs, and the 68000 would have been an almost natural upgrade path from the 6809, just as the 6809 was a natural upgrade path from the 6801.

Just as a bit of text processing can convert 6800 code into executable 6809 code, a bit of text processing can convert a 6800 code or 6809 code into 68000 code. It may require a little engineering time for cleanup, but you'll want to take the time to optimize the code anyway.

So the 6801 improves the 6800 model by adding instructions that treat the accumulator pair as a single double (16-bit) accumulator, with A holding the high byte and B holding the low byte. And it adds an 8 x 8 multiply with 16 bit result. This much is about all there is in common with the 6809, other than the 6800 origins.

The 6801 also includes a few new instructions that allow adding an offset to the index register, comparing the index register, pushing and popping (PULling) the index register. These are significant improvements, improving efficiency in core areas that were bottlenecks in the 6800 for high-level functionality. Better than the 6800, not as good as the 6809, different. And done differently.

The instructions of the 6809 are not a proper superset of those of the 6801. Conversion is required.

The single indexing mode of the 6801 is the exact same as the 6800's. And the single stack pointer is not directly indexable. Nor is the PC. Comparing that with the 6809, you see all sorts of indexing available in the 6809 to shorten and clarify code.

There were some other important improvements, such as additional interrupt vectors for built-in hardware and such. But the 6801 is very much a separate path of evolution from the 6809. The additional interrupts in the 6809 are more general. I guess I'm repeating myself.

More general. That's a good way to compare the 6809 and the 6801.

Game machines with the 6809 in them are too numerous

Now we come to the 6811, which, functionally speaking, harks back to and extends the 6801 rather than the 6809. The year of introduction was 1985.

Registers in the 68HC11:
accumulator D:16 (A:B) accumulator A:8 accumulator B:8
Index X:16
Index Y:16
Stack Pointer SP:16
Program Counter PC:16
	Condition Codes CC:8

The 68HC11 improves on the 6801 by adding another index register, Y. This significantly eases another bottleneck when handling two pointers at once, but the indexing mode is the same as the 6800's. It is not as general as the 6809's indexing modes (plural). The additional index register is supported by adding instructions, not by modifying the addressing modes.

Even after all my preaching, you might still think, oh, that's like the 6809, only missing the U. Uhm, and the DP, whatever that is.

Yes. It's missing U. Still only one stack pointer. SP is still not indexable. Neither is PC.

And no direct page register, which I haven't really demonstrated uses for here. (It's a bit of a complex topic, which I have partially addressed in other rants, erm, blog posts, in this blog.)

It is not a stripped down 6809. If it were, it would have much more complex addressing modes.

We can be sure the manufacturing process was influenced by things they learned developing the 6809, but that can also be said of the 68000, and we are not going to say it was somehow derived from the 68000. Not if we want to speak meaningfully.

It's a beefed-up 6801, not at all by way of the 6809. You still don't believe me?

Let's look again at that code we've been playing with, converted for the 68HC11:

LDD #366 ; The leap year.
LDY #YEARSTABLE ; How much will Y actually help?
PSHY ; Still have to have it in some temporary.
PSHB ; Save the day count on the stack.
PSHA
TSX
LDD 2,X ; Extra index register let us reorder the stack.
ADDD #(2000*2)
STD 2,X ; 2,X costs the same as 0,X on this CPU.
PULA ; Leap year days.
PULB
PULY ; Bring the entry address back.
STD 0,Y ; Done! All stored away in YEARSTABLE.
* And we don't have to balance the stack.

Not much of a savings here, and you could say this is contrived, but stack order issues do occur in real code. That's part of why two accumulators were useful in the original 6800.

Will it help in DAYSINMONTHS?

DAYSINMONTHS:
FCB 31,28,31,30,31,30,31,31,30,31,30,31
* Month number on stack as 16 bit integer.
* Leap year check done elsewhere.
GETDAYSINMONTH:
BSR DUMMY
DUMMY:
TSY
LDD 0,Y ; Get the address of DUMMY to work on.
SUBD DUMMY-DAYSINMONTHS
STD 0,Y ; Address of DAYSINMONTHS, relocatable.
LDB 5,Y ; Ignore high byte of month number.
PULX
ABX
LDB 0,X ; Get the days in the month.
CLRA
STD 4,Y ; Y is not equal S, but we can use it.
RTS

Not much, but it does help.

The real utility of having X and Y in the 6811 is for moving large blocks of data. You can have both the source and destination pointers in registers, where in the 6801 and before you must juggle the source and destination through X.

* Parameters on stack, hard limit of 2048 bytes.
* pushed in order of source, destination, count:
* 0,S ; return PC
* 2,S ; count
* 4,S ; destination
* 6,S ; source
* Slightly paranoid code.
BLOCKMOVE:
TSX
LDD 2,X
BMI BLOCKMOVEEND ; Bit 15 set is way too big.
SUBD #2049
BCC BLOCKMOVEEND
LDD 2,X ; Bring it back.
PSHA ; Save the high byte.
LDY 4,X
LDX 6,X
* We know it's a good, non-zero count.
BRA BLOCKMOVETEST ; But test-on-entry is what we want.
BLOCKMOVELOOP:
LDAA 0,X
INX
STAA 0,Y
INY
BLOCKMOVETEST:
SUBB #1 ; Reflect it in carry
BCC BLOCKMOVELOOP ; Speed up the inner loop.
PULA
SBCA #0 ; Because I prefer BCC here.
PSHA
BCC BLOCKMOVELOOP
INS ; Drop high half of count.
PSHX
PULA
PULB
TSX
STY 4,X
STD 6,X
BLOCKMOVEEND:
RTS
* Leave the parameters for the calling routine.

Well, there's still a little bit of juggling, but the loop is much cleaner than on the 6801, even. But not at all as clean as the 6809 code. Here is where it really becomes clear that the 6811 is not a modified 6809, it's a beefed up 6801.

Incidentally, the 68HC11 defines a number of new bit manipulation instructions, and some other useful things, including hardware byte-wide integer divide, which are not available on the 6809. These instructions, and the fact that it was available as a core for integrated controllers, were the reasons for using the 68HC11.

This is further along the path of evolution that the 6801 began, but it is still on a separate path from the 6809. (Do you get the feeling I'm peeved about this? I do have reasons. Motorola should have provided 6809 as a core like the 6811. More transistors, sure, but look at all the transistors in the 68000-family controllers. Yes, I know that much of Motorola's development work was driven by customer demands, but ....)

Now, there is/was a 68HC12, which further extends the 68HC11. I don't remember some details of the register model, so I'll refrain from putting bad information here. Maybe later.

The stack pointer and, if I recall correctly, the PC, are both indexable on the 68HC12.

The 68HC12 could almost be called a modified 6809. Still missing the second stack pointer. It even has fancy indexing modes similar to some of the 6809's. But the encodings are different, and there aren't as many of them. And if we look at it carefully, it's clear that it is a beefed-up 6811, borrowing ideas from the 6809.

The 68HC12 is still less a modified 6809 and more of a modified 6800 that has some of the features of the 6809, and is missing the most important feature, the extra stack pointer. The instruction set is said to be a superset of the 6811's.

The 68HC16 further extends the 68HC12, but it's still missing the extra stack pointer. Has a third index register which, we supposed, can either be a substitute for the DP register or the U stack pointer, but not both at once. And the indexes are wider, giving an addressable range of a full megabyte.

Okay, so, the what-if game.

What if Motorola's management had recognized that they should have been pitching the 6809 as the direct competitor to the 8088 instead of the 68000? What if they had recognized the opportunity to use the 6809 in researching correct processor design?

Step one, fill in some holes in the instruction set and addressing modes.

1.1 Widen the direct page register to 16 bits, so it can actually be used to point to a local/statically allocated data segment. (This has a ripple effect in the interrupt stack frame and in the encoding and meaning of the TFR and EXG instructions. Or it requires additional PSH/PUL instructions. Maybe both. But the gain is well worth it.)

1.2 Add indirect addressing through direct page variables, so you can stash a pointer in the direct page and use it without disturbing X, Y, U, or S. (Lesson from the 6502.) Essentially adds 128 possible statically allocated local index registers.

1.3 Add integer and fractional divide (as in 68H11). Possibly add bit multiply and divide primitives, but those tended to be misdesigned back then, so maybe not. Make the internal address and double accumulator math functions 16 bits to speed many instructions by a cycle.

1.4 Make it available as a system-on-a-chip core, like they did with the 6801 and then 68HC11, etc.

1.5 Make the mapping MMU and the DMA functions avaliable to be integrated with the core, and add support for both in the condition codes and interrupt architectures.

At step 1.5, the rumoured CoCo 4 could be built without the separate GIME -- or, rather, with the GIME on the same die as the CPU.

In fact, they might have implemented the full 6829 MMU functionality on-chip, succeeding where the 6829 on a separate chip was too slow. And they might have avoided some of the less stable aspects of the 6829 design, accessing CPU control signals directly.

With care in the design of the MMU, the S stack could be made completely unreachable, blocking whole classes of stack gaming vulnerabilities. Likewise, full process separation and such could have been achieved.

And they could have gone on to experimenting with true segment registers (full 32 bit, unshifted, not like the half-baked segments of the 8086, not the big bank switches of the 68HC16) and perhaps bounds registers, to experiment with memory management without re-mapping the memory itself.

And somewhere along the line, they might have proven to themselves that their fear of competing with themselves would not have done damage, that a decent lineup with the 6809 would have actually helped 68000 sales rather than cannibalize them.

If Motorola had been researching correct CPU design on the 6809 in 1981 and '82, they would have been much less likely to have taken the detour through the full reportoire of useless addressing modes that they built in the 68020. They could have applied the lessons of the smaller cousin to the bigger, and avoided the trap of trying to compete with the feature king Intel on useless features.

We can dream, can't we?

Of a world where competition is dog-meet-dog instead of dog-eat-dog, utility of the best fit instead of survival of the artificial fittest.

It's at present only slightly more realistic dream, I suppose, but it would be fun if I had the chance sometime to implement a FIG Forth interpreter optimized to each of the CPUs I've talked about here.

The 6800 version of the fig model was done by members of the Forth Interest Group back in the late 1970s, and I transcribed it in the mid-2000s and made it available as part of my M6800/1 assembler project, and I did a modified version for the 6809 back in the mid-1980s for college. (If you are interested in the 6809 code, contact me. I have it running on emulation, just need some time and motivation to put it up in its own tree. [JMR20201022: You can now find it here: https://osdn.net/projects/bif-6809/.]) (I started on an SH-3 conversion at 32 bits once several years back when I was trying to get myself back in the computer industry.) Really want to do a 68000 conversion at 32 bits, and the 68HC12 and 68HC16 have interesting aspects relative to implementing the thing.

And I'd like to go back and do them all as so-called subroutine-threaded. [JMR20201022: Project now in slow progress, here: https://osdn.net/projects/splitstack-runtimelib/.] Having all of that would be rather useful for comparing processor architectures and instruction sets.

Tuesday, December 18, 2018

High Cost of Replacement for Once Inexpensive Solutions

Someone in a Facebook group I lurk in devoted to "vintage {Computers | Microprocessors | Microcontrollers}" shared a link to a (three year old) story about how the heating systems of a bunch of schools in Grand Rapids had been operated under the control of an inexpensive Amiga computer for thirty years, which mentioned that the school system had scheduled a budget of more than a million dollars to replace it.

The Amiga system hardware would have cost no more than USD 5000 when they bought it.

But good engineers will tell us that the Amiga wasn't the only cost.

The controls system was originally programmed by (get this) a high school student (who remained on-call to fix it for quite some time after). I suspect the hardware other than the Amiga was also laid out by students, teachers, and other school system employees, who probably did not charge the school system full engineering rates for their time.

These kinds of controls systems aren't that hard, really, if you don't try to do too much.

The article notes that bugs have been found and fixed in the intervening 30 years, and new necessary functions added, as well.

This is what we call an organic solution.

If you ask me what they should be doing, they should be de-centralizing the system. And they should avoid using packaged desktop hardware and software like the plague.

Why? Most of your packaged stuff are so big and complicated that the OS can't guarantee finite response times.

Desktop is especially offensive because, essentially, the desktop operator has way too much patience. That patience means desktop systems tend to have so many bells and whistles that the entire hardware and software can be brought to its knees trying to pick colors appropriate to advertizing content out of multiple layers of style sheets in web pages.

(You thought you were impatient? Boilers in heating systems have much less patience than you do.)

Yes, if they go to IBM and Intel and Microsoft and ask for a replacement, it will cost a lot of money. Why?

First they have to analyze the existing system. That is really expensive. The rest is not so bad, but they will charge full engineering rates for their work -- and hopefully offer some sort of warranty. (Warranties are expensive!)

Wait. If that approach is taken, replacement could easily balloon into the multiple millions of dollars, because the market leaders will almost invariably try to do it using their pre-packaged systems.

Organic solutions that are shepherded along by competent engineers are usually the most cost-effective (and most effective) solutions because only the necessary functionality is added, only when it is necessary. (Mumbling something about so-called "agile" technologies.)

The problem is that "competence" is something of the luck of the draw. Certification does not really guarantee it'll happen. (Neither do large engineering fees replete with warranty and disclaimer.)

Why doesn't certification help? Think about those tests. Any test you can start and finish in a day is very one-dimensional. It can test a narrow set of knowledge and skills, but there is not enough time in a day, or even two to reach out and begin to test the branches that will be required in real-world application.

Spot-check, at best, which depends on the idea that honest seekers of certification will be studying the whole field, not just the narrow parts that will be tested.

And those taking the certification are motivated by that piece of paper. So they don't focus on the application, only on the part that is tested.

That's the excuse that testing agencies have for hiding the contents of tests. That would not be so bad, but the kinds of test questions that can be given in a certification test are limited to very non-real-world problems. So hiding doesn't really help. Teachers who know something about the field can guess close enough.

Students want to pass. Teachers want their students to pass. So they focus on teaching how to solve the non-real-world problems that can be given in tests.

This wouldn't be all bad, if those who accept certification then go out and apprentice themselves to someone who can guide them along and help them gain the real skills.

I think you can accept this much of my argument. The next step is one you'll likely rebel against.

Good computer and software engineers to apprentice oneself to are far and few between, and they are very busy.

Why?

Even now, we really don't understand much beyond the basics of computer science and software engineering. Sixty stinking years since the early big iron-and^glass, and we are still wasting time and other precious resources re-implementing old solutions.

Why?

We haven't got the basics down right, yet.

(See my posts elsewhere in this blog for bits and pieces of pointers away from the wrong directions we've taken.)

No. Let me get that out of the paranthesis.

We need to return to the technology we've left behind.

Groups like the vintage FB group that sparked this rant aren't really just there to just play around. Young guys go join these groups for fun, they think, but these groups are where essential bits of the old tech are being passed on to a rising generation -- bits that would otherwise be lost.

Here's something else that was recently shared in that group, a video of a group restoring an IBM 1401 and some of the software that ran on it. (You may have to wait for a Grammarly or other (dispargement deleted) ad to see the video.) So they run the old compiler to compile and run an old example program to multiply and invert a large matrix of floating-point numbers. That sample program is not the real goal. The real goal is the technology they used to restore the hardware and software. Some of it uses standard solutions, some of it uses custom stuff built with modern parts, some uses new product of old designs that can still be found via the internet.

These things allow legacy systems that we don't know how to replace to continue to run. They help prevent hospitals and schools from suddenly going dark while the bean-counters aren't willing to wait for organic solutions to be put together.

It's very much useful.

But the IBM 1401 restoration is not nearly as useful as the other stuff these groups are doing, in my opinion.

(Again, look around this blog for bits and pieces of what I'm talking about. But I haven't had time to put the half of it up. One of the pieces I haven't been able to talk much about is real-time OSses. Some of the old real-time OSses still exist, just barely, with significant help from these groups.)

These groups are incubating ways to pursue the branches of tech that have been buried and left behind by the excessive competition of the marketplace and the industry of the last forty or so years.

Why aren't the industry leaders doing anything to fix this situation?

You've heard the old saying

Money is like manure. It's meant to be spread around, to help more little green things grow.

(Green being something of a pun on the color of US current tender.)

Many of the market leaders understand that.

(Pardon me for a momentary lapse of control, but I'm going to rant for a few paragraphs.)

A very few at the top, mostly those who have control of the accounting divisions, understand it differently.

Money is like pus. It tends to collect where the wounds in society are.

This is one reason we should not trust people who amass personal worth too much in excess of what it would require to retire at whatever age they are.

One million USD would be enough for a healthy forty-year old to retire reasonably comfortably right now, more than enough for a healthy sixty-year old.

I'm willing to stretch that a bit for active entrepeneurs, because they need to be able to make business decisions unhindered by the bean counters.

But I'm not going to place very much trust in anyone who has more than ten million in personal worth and is still trying to beat the competion. (... Implicitly, still trying to force his or her world-view on the market with the power such money can give.)

Well, I'm bucking the trends here.

But we need to go back to old CPUs, old systems, old toys that were really much more than toys, and find what we've lost.