Misunderstanding Computers

Why do we insist on seeing the computer as a magic box for controlling other people?
Why do we want so much to control others when we won't control ourselves?

Computer memory is just fancy paper, CPUs are just fancy pens with fancy erasers, and the network is just a fancy backyard fence.

(original post -- defining computers site)

Wednesday, July 26, 2023

[NOTE] newline insertion on paste bug in gedit

Intermittent bug: 

gedit inserts newlines on paste under certain conditions involving the contents of the search buffer.

gedit --version reports

gedit - Version 3.28.1

from the command line on Ubuntu, uname --all:

Linux {machine-name} 5.4.0-150-generic #167~18.04.1-Ubuntu SMP Wed May 24 00:51:42 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Looking it up, it might be something inherited from the GTK toolbox.

Specific details.

To reproduce: search through a text file for lines beginning with a specified string, such as a label in an assembly language source file. Leaving the search string in the search buffer, select, copy, and paste a section of several lines of text that includes the searched string.

gedit inserts blank lines (newline character sequences) before occurrences of the searched string.

Example:

MESS    DC.L    DOCOL,WARN,AT,ZBRAN
    DC.L    MESS3-*-NATWID
    DC.L    DDUP,ZBRAN    ; -DUP here is a bug from the original 6800 model, at least.
    DC.L    MESS3-*-NATWID
    DC.L    LIT16
    DC.W    4
    DC.L    OFSET,AT,BSCR,SLASH,SUB,DLINE,BRAN
    DC.L    MESS4-*-NATWID
MESS3    DC.L    PDOTQ
    DC.B    6
    DC.B    'err # '    ; 'err # '
    DC.B    0    ; hand align
    DC.L    DOT
MESS4    DC.L    SEMIS

with \nMESS in the search buffer.

But then when I tried it with 

* This should add two 64-bit numbers:
ADD64STK:
    MOVEM.L    (A6)+,D7/D6/D5/D4
ADDLO    ADD.L    D5,D7
ADDHI    ADDX.L    D4,D6
    MOVEM.L    D6/D7,-(A6)
    RTS

it quit inserting the newlines, even in the original example text.





Sunday, July 9, 2023

Website about the TI Series 0100 Calculator Chips

This is another website I want to remember. It gives a lot of detail on early Texas Instruments and related calculators and the electronics that made them work:

http://www.datamath.org/

In particular, this page is dedicated to TI's series 0100 chips, which were in production at the same time as Intel's 4004:

http://www.datamath.org/Chips/TMS0100.htm

 

 

Saturday, May 6, 2023

No, Higher Costs Were Not the Real Reason for the 8088 in the IBM 5150

[This is a reply to a comment on Hackaday: https://hackaday.io/project/190838-ibm-pc-8088-replaced-with-a-motorola-68000]

*****

For some reason, I can't reply directly to your comment with the eejournal opinion piece [https://www.eejournal.com/article/how-the-intel-8088-got-its-bus/], but I suspect my earlier comment was too brief.

Let me try that again:

The 68000 had built-in support for 8-bit peripheral devices, both in the bus signals and the instruction set. Most of the popular implementations, including the Mac, made heavy use of 8-bit parts, and Motorola had application notes on interfacing other companies' 8-bit devices as well as their own. You could mix 8-bit peripherals and 16-bit memory without stretching.

Motorola even had an app-note about interfacing the 68000 directly to 8-bit memory, but any decent engineer would have looked at the note and realized that the cost of 16-bit memory was not really enough to justify hobbling the 68000 to 8-bit memory.  That's one of the reasons the 68008 didn't come out until a couple of years later, and the primary reason that very few people used it. There was no good engineering reason for it.

Well, there was one meaningful cost of 16-bit wide memory: You couldn't really build your introductory entry-level model with 4 kB of RAM using just eight 4 kilobit dRAMs. (cough. MC-10.) You were forced to the next level up, 8 kB. 

IBM knew that the cost of RAM was coming down, and that they would be delivering relatively few with the base 16 kB RAM (16 kilobit by 8 wide) configuration. Starting at 32 kB (16 kilobit by 16) would not have killed the product. Similarly, the cost of the 68000 would come down, and they knew that.
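The arithmetic behind those minimum configurations can be sketched as follows. This is only an illustration, assuming 1-bit-wide DRAM chips with one chip per data-bus bit; the function name `bank_bytes` is made up for this note.

```python
def bank_bytes(chip_kilobits: int, bus_width_bits: int) -> int:
    """Bytes in one full bank: bus_width_bits chips, each chip_kilobits x 1.

    Assumption for illustration: every chip is 1 bit wide, so a bank
    needs exactly one chip per data-bus bit.
    """
    chip_bits = chip_kilobits * 1024
    return chip_bits * bus_width_bits // 8


# Minimum RAM for 8-bit and 16-bit buses with 4-kilobit and 16-kilobit parts.
for chip_kbit, width in [(4, 8), (4, 16), (16, 8), (16, 16)]:
    kb = bank_bytes(chip_kbit, width) // 1024
    print(f"{width:2d} chips of {chip_kbit:2d} kilobit x 1 -> {kb:2d} kB minimum")
```

With a 16-bit bus the smallest possible bank doubles, which is the only real memory cost the wider bus imposes.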

Management was scared of that.

Something you don't find easily on the Internet about the history of the IBM Instruments S9000 was when the project started. My recollection was that it started before the 5150. It was definitely not later. It had much more ambitious goals, and a much higher projected price tag, much more in line with IBM's minicomputer series. There was a reason for the time it took to develop and the price they sold it at. But even many of the sales force in the computer industry didn't understand the cost of software and other intangible development costs.

Consider how much damage the 5150 did to IBM's existing desktop and minicomputer lines. Word Processing? WordPerfect was one of the early killer apps for the 8088-based PC. Spreadsheet? Etc.

IBM management knew too well that if they sold the 5150 with a 68000 in it instead of the 8088, a lot of their minicomputer customers were going to be complaining to high heaven about the price difference. They knew the answer, but their experience showed them that too many of the customers would not believe it.

That was the real reason. They hoped the 8088 would be limited enough to give them time to maintain control of the market disruption.

I think they were wrong. But it would have taken a level of foresight and vision that very few in management within IBM had (and very few outside IBM, either) to take the bull by the horns and drive the disruption.

*****

Anyway, my point was that higher cost wasn't the real reason any more than the (at the time, much-rumored) technical deficiencies of the 68000.

Saturday, March 11, 2023

Mapping the Panasonic Let's Note Japanese Keyboard to the Hatari Emulator

I never owned an ST family computer, which might have been a tactical error on my part. It would have been close to an unadorned 68000-based machine in the way that the Radio Shack/Tandy Color Computer was an unadorned 6809-based machine.

I have been using the Atari ST emulator (simulator) Hatari as a platform for converting the fig Forth model for the 6800 to the 68000.

The keyboard on this Japanese Panasonic Let's Note does not map well to the Atari ST keyboard. The default mapping leaves important keys like equal (=) unavailable. (Some keys are available by using the FN key to select the ten-key pad that starts on the 7 key.)

 For comparison, the Atari ST (US) keyboard is laid out like this:

!@#$%^&*() _+~
1234567890 -=`

QWERTYUIOP{}
qwertyuiop[] del

ASDFGHJKL:"  |
asdfghjkl;'  \

ZXCVBNM<>?
zxcvbnm,./

And it's precisely around where the equals key is, that the keyboard mapping goes wonky.

I've been doing almost all the source code editing in Gedit, under Ubuntu, just using Hatari for assembling and test runs. But I'm now into some really difficult debugging sessions, and the mapping of the keyboard is getting in the way. 

So today I dug into Hatari's keyboard remapping, invoked something like:

hatari -k keymap.text

on the command line. 

In the source code for Hatari, there are some utilities in the tests/keymap directory for looking at what SDL sees the keyboard generating (if I got this right) -- listkeys.c and checkkeys.c. I downloaded the source code from the git repository, changed to the tests/keymap directory, ran make, and got the executables there. You don't need them in the general path; you can execute them in place with

./listkeys

and 

./checkkeys

They gave me some clues and not much more. Taking a look at the example keymap file in the source was also not very enlightening. And neither was the man page from SDL:

man SDLKey

But after working through those, and after some reading in forums and playing around with Hatari's remapping a bit, I figured out what to put in the keymap file:

Each line consists of the key you want to remap, a comma, and the SDL (?) scancode. 

Sort of. 

Figuring out the scancodes was a bit tricky. First, I tried the codes I learned from the checkkeys and listkeys utilities to see how they would work:

-,45
^,94
@,64
[,91
],93
;,59
:,58
/,47
\,92

The results weren't even close to what I wanted.

So I used a little trick involving perverse keyboard mappings. What I did was line up the alphabet keys and just arbitrarily map almost all of them to sequential codes:

-,45
a,1
b,2
c,3
d,4
f,5
g,6
h,7
j,8
k,9
l,10
m,11
n,12
o,13
p,14
q,15
r,16
s,17
u,18
v,19
w,20
y,21
z,22

(The mapping for hyphen was the one I was able to make sense of from the utilities, but it turned out not to be one I am using.)

This is better than random guessing because it allows testing a bunch of the scan codes at once, and it helps you remember which ones you've tried. You can add the ones that look like they work at the top, like I did with hyphen, so you can test them -- and so you don't forget them.

I think I gleaned one scancode from the sequence 1 to 22, then similarly gleaned a few more from the next set, starting at scancode 23 to about 43 temporarily and perversely mapped to key a through y. 

(I left e, i, t, and x undefined to allow typing the exit command, and, after the first set, I left z un-assigned so I could use ctrl-Z to invoke the debugger.)
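Typing those test files by hand gets tedious, so here is a minimal sketch of a generator for them. It only reproduces the "key,code" line format shown above; the function name `perverse_keymap` and its parameters are made up for this note, and it does not know anything about which codes Hatari will actually accept.

```python
import string


def perverse_keymap(first_code, skip="eitx"):
    """Return keymap lines "<key>,<code>" for a..z minus the skipped keys.

    Codes count up from first_code, one per remaining letter, so each
    test run covers a contiguous block of candidate scancodes.
    """
    keys = [c for c in string.ascii_lowercase if c not in skip]
    return [f"{key},{first_code + n}" for n, key in enumerate(keys)]


# First test set: codes 1 through 22 on a..z, keeping e, i, t, x for "exit".
print("\n".join(perverse_keymap(1)))
```

For the second set, `perverse_keymap(23, skip="eitxz")` also leaves z alone and covers codes 23 through 43. Redirect the output to a file and pass it to Hatari with -k as above.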

But I couldn't figure out how to map individual scancodes. Each remapping seems to be done as a pair, which is kind of awkward. 

Ultimately, I used this file:

-,12
[,26
],27
^,13

and, while it makes the equals key available, it maps it to the caret/tilde key -- which leaves caret and tilde unavailable.

It's not a good fit. It matches neither the keycaps on the PC keyboard nor the layout of the Atari ST keyboard. 

But I think it will allow me to proceed with debugging.

So I'll leave this post here for my own notes and post a link here to a Hatari forum for the developers, if I can figure out the appropriate forum.

Saturday, November 12, 2022

8080 Assembly Language Crib Sheet

The 8080 is messy. I have a fairly easy time remembering the 680X assembly languages. I don't have nearly as easy a time remembering the 8080 operators, allowed operands, flags, etc. 

So I'm putting up a crib sheet, mostly for myself:

8080 Registers (8 & 16 bit)
Temporary Registers    B    C
Temporary Registers    D    E
Index High/Low         H    L
Accumulator/Status     PSW  A
Stack Pointer          SP
Program Counter        PC

(Need to add some better short-short summary stuff here when I figure out how to organize it.)

R byte operands --
registers B,C,D,E,H,L,A
memory M pointed to by HL

Condition code flags (Program Status Word==PSW), in order --
Sign, Zero, (0), Auxiliary Carry, (0), Parity, (1), Carry
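
To keep the bit positions straight, here is a small sketch that packs and unpacks the flag byte in the order listed above (Sign in bit 7 down to Carry in bit 0). The helper names are made up for this note; the fixed bits follow the (0), (0), (1) entries in the list.

```python
# (bit position, flag name); bits 5 and 3 always read 0, bit 1 always reads 1.
FLAG_BITS = [(7, "S"), (6, "Z"), (4, "AC"), (2, "P"), (0, "C")]


def decode_flags(flag_byte):
    """Split a flag byte (as pushed by PUSH PSW) into named flag bits."""
    return {name: (flag_byte >> bit) & 1 for bit, name in FLAG_BITS}


def encode_flags(s=0, z=0, ac=0, p=0, c=0):
    """Build a flag byte from individual flags, forcing the fixed bits."""
    return (s << 7) | (z << 6) | (ac << 4) | (p << 2) | 0x02 | c


print(hex(encode_flags(z=1, p=1)))
print(decode_flags(0x46))
```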

RP 16-bit operands --
subset of register pairs B:C, D:E, H:L, SP, PSW:A

index operands
M (H:L pair)
X: B (B:C pair), D (D:E pair)

ORG D
set origin (assembly address) to absolute address D

L EQU V
define invariant value of label/symbol

L SET V
set value of label/symbol
SET labels may be redefined.

END
end

DB/DW V
define a label and allocate and store byte or word value V there

DS SZ
define a label and only reserve space of size SZ

STC/CMC {C}
set/complement carry

INR/DCR R {ZSPA}
byte increment/decrement R/M

CMA {}
complement A

DAA {ZSCPA}
decimal adjust A

NOP
No OP

MOV Rdest,Rsrc {}
move byte data R/M
But MOV M,M is not valid.
Other than disallowed M, to self is effective NOP.

MVI R,I {}
move 8-bit immediate data from instruction stream to R/M

LDA/STA D {}
load/store (move) 8-bit data at direct (absolute) 16-bit address D to A
or 8-bit data in A to direct (absolute) 16-bit address D

LDAX/STAX X {}
load/store (move) A indexed by B:C or D:E

LHLD/SHLD D {}
load/store (move) 16-bit data at direct (absolute) 16-bit address D to H:L
or 16-bit data in H:L to direct (absolute) 16-bit address D

LXI RP,I {}
move 16-bit immediate data from instruction stream to RP
Destination can be B (B:C), D (D:E), H (H:L) or SP.

ADD/ADC R {CSZPA}
add without or with carry R/M to A
ADD A is effectively shift left, but note flags.

ADI/ACI I {CSZPA}
add without or with carry immediate data to A

SUB/SBB/CMP R {CSZPA}
subtract/compare without or with borrow R/M from A
SUB A clears A and sets the flags accordingly.
Sense of C flag in compare inverted if operand signs differ.

SUI/SBI/CPI I {CSZPA}
subtract/compare without or with borrow immediate data from A
Sense of C flag in compare inverted if operand signs differ.

ANA R {CZSP}
bit-and R/M into A
Carry is always cleared.

ANI I {CZSP}
bit-and immediate data from instruction stream into A
Carry is always cleared.

XRA R {CZSPA}
bit exclusive-or R/M into A
Carry is always cleared.

XRI I {CZSPA}
bit exclusive-or data from instruction stream into A
Carry is always cleared.

ORA R {CZSP}
bit-or R/M into A
Carry is always cleared.

ORI I {CZSP}
bit-or data from instruction stream into A
Carry is always cleared.

RLC/RRC {C}
8-bit left/right rotate of A; the bit rotated out is also copied to carry
(no operand, accumulator only)

RAL/RAR {C}
9-bit left/right rotate of A through carry
(no operand, accumulator only)
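
Since I keep mixing up the two kinds of rotate, here is a rough model of the difference, written against an 8-bit accumulator value and a 1-bit carry. It is a sketch of the arithmetic only, not an emulator; the function names simply reuse the mnemonics.

```python
def rlc(a):
    """8-bit rotate left: bit 7 wraps around to bit 0 and is copied to carry."""
    carry = (a >> 7) & 1
    return ((a << 1) | carry) & 0xFF, carry


def rrc(a):
    """8-bit rotate right: bit 0 wraps around to bit 7 and is copied to carry."""
    carry = a & 1
    return (a >> 1) | (carry << 7), carry


def ral(a, carry_in):
    """9-bit rotate left: old carry enters bit 0, bit 7 becomes the new carry."""
    carry_out = (a >> 7) & 1
    return ((a << 1) | carry_in) & 0xFF, carry_out


def rar(a, carry_in):
    """9-bit rotate right: old carry enters bit 7, bit 0 becomes the new carry."""
    carry_out = a & 1
    return (a >> 1) | (carry_in << 7), carry_out


# Same input, different results: the old carry only matters for RAL/RAR.
print(rlc(0x81), ral(0x81, 0))
```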

PUSH/POP RP {},{all}
push/pop register pairs:
B (B:C), D (D:E), H (H:L), PSW (flags:A)
Condition codes only affected by POP PSW.

DAD RP {C}
16-bit add of register pair into H:L,
RP can be B:C, D:E, H:L, SP
DAD H is shift left with carry.

INX/DCX RP {}
increment/decrement register pair
RP can be B:C, D:E, H:L, SP

XCHG {}
16-bit exchange D:E with H:L

XTHL {}
16-bit exchange of top of stack with H:L

SPHL {}
16-bit move H:L to SP

PCHL {}
move H:L to PC
This is the 8080's indexed jump.

JMP D {}
jump unconditionally to direct (absolute) 16-bit address

JC/JNC D {}
jump if C (carry) set/clear to direct (absolute) 16-bit address
(Carry/No Carry)

JZ/JNZ D {}
jump if Z (zero) set/clear to direct (absolute) 16-bit address
(Zero/Not Zero)
Effectively equal/not equal after a subtract or compare.

JM/JP D {}
jump if S (sign) set/clear to direct (absolute) 16-bit address
(Minus/Plus)

JPE/JPO D {}
jump if P (parity) set/clear to direct (absolute) 16-bit address
(Even/Odd)

CALL D {}
call unconditionally to direct (absolute) 16-bit address
Push address of next instruction on stack and jump.

CC/CNC D, CZ/CNZ D, CM/CP D, CPE/CPO D
Conditional calls, same conditions as conditional JMPs.

RET {}
return unconditionally to address saved on stack
Pop top of stack into PC.

RC/RNC, RZ/RNZ, RM/RP, RPE/RPO
Conditionally return to address saved on stack,
same conditions as conditional JMPs.

RST N {}
save address of next instruction on stack and jump to address N times 8
N is 0 through 7, yielding address from 0 to 56 on 8-byte boundaries.
Effects a software version of a numbered interrupt.
Use ordinary RET or conditional return to return.
Interrupt routine must explicitly save state of all registers used.

DI/EI {}
disable/enable interrupts
Clears/sets the INTE interrupt enable flip-flop.

IN/OUT P {}
load A from/store A to 8-bit port number P
P is an address in port space between 0 and 255.

Okay, I think I got the HTML right on that without losing any of the entries.

Sunday, October 23, 2022

Security Misfeature Report for Google GMail

Had another little unpleasant surprise from Google today.

I wonder, how many would agree with me that this is a misfeature and reflects poorly on Google's changing attitudes towards privacy and security?

Here it is:



If you can't see the message, it says

It seems like you forgot to attach a file.

You wrote "is attached" in your message, but there are no files attached. Send anyway?

Cancel   OK

Seems convenient, doesn't it?

Let's think about this.

In case you missed it, here's what Gmail's deep inspection keyed on:


The sign is attached to the desk.

What do you think? Is Google going too far with this?


Sunday, July 3, 2022

A Critique of Motorola's 68XX and 680XX CPUs

I want to note at the top here, that this is not about which company's CPU was better. This is not about comparing CPUs at all.

And this is not disparaging Motorola. Motorola did a pretty decent job of designing each of their CPUs, especially considering that they were not just pioneering microprocessor design; they were also pioneering the design of CPUs in general. Engineers with experience designing CPUs were basically all already employed, mostly by other companies. (And many of those CPU engineers didn't really understand CPUs all that well, after all.)

The engineers at Motorola did a good job. But nobody's perfect. 

Taking these in the order that Motorola produced them:

6800 Niggles:

(1) It's not hard to guess that the improper optimization of the CPX (compare X index register) instruction was an attempt to be too clever, a bad case of penny-pinching and setting arbitrary deadlines, an oversight, or any and all of the foregoing. But, as a result, the branches implementing signed and unsigned comparisons just don't do what they would be expected to do after CPX.

  • C (Carry) is simply not affected by CPX on the 6800 (and 6802), so the branches implementing unsigned compare, BCC, BCS, BHI, and BLS just won't work after CPX. 
  • V (oVerflow) is the result from comparing the most-significant byte only, so the branches implementing signed comparison, BGE, BGT, BLE, and BLT fail in hard-to-predict ways after CPX. 
  • N (Negative) is also the result from comparing the most-significant byte only. It may not seem that this is a problem for BPL (branch if plus) and BMI (branch if minus), but the programmers' manual says neither N nor V are intended for conditional branching. It seems to me that the N flag will actually be set correctly after the CPX, giving the sign of the result of the thrown-away subtraction of the argument address from the address in X. But using BPL and BMI in ordered comparison is just going to be a bit fiddly, no matter what. You probably just won't get what you thought you wanted if you use BPL or BMI after CPX.

Z (Zero) is the result of all 16 bits of the result of the compare, so BEQ (branch if equal) and BNE (branch if not equal) after a CPX work as expected.

In the abstract sense, pointers were thought at the time to be necessarily unordered, so it sort of didn't seem to matter. Ideally, you wouldn't be comparing addresses for order. But real algorithms often do want to give pointers order, and that meant that, on the 6800, you would have to use a sequence of instructions to cover all the cases in ordered comparison, because you couldn't rely on CPX alone.
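
To make the failure concrete, here is a rough model of a signed "less than" branch (BLT, taken when N xor V is set) after CPX. It assumes the flag behavior exactly as described in the bullets above, with N and V taken from the subtraction of the most-significant bytes only; I have not checked it against real silicon, and the function names are made up for this note.

```python
def sub_flags(a, b, bits):
    """Return (N, V) for the two's-complement subtraction a - b at this width."""
    mask = (1 << bits) - 1
    r = (a - b) & mask
    n = r >> (bits - 1)
    # Overflow: operands differ in sign and the result's sign differs from a's.
    v = (((a ^ b) & (a ^ r)) >> (bits - 1)) & 1
    return n, v


def blt_after_6800_cpx(x, m):
    """BLT after CPX, assuming N and V come from the high bytes only."""
    n, v = sub_flags(x >> 8, m >> 8, 8)
    return bool(n ^ v)


def blt_after_full_compare(x, m):
    """BLT after an idealized full 16-bit compare, for reference."""
    n, v = sub_flags(x, m, 16)
    return bool(n ^ v)


# $0100 is less than $01FF, but the high bytes are equal, so the
# high-byte-only flags say "not less than".
print(blt_after_6800_cpx(0x0100, 0x01FF), blt_after_full_compare(0x0100, 0x01FF))
```

Whenever the high bytes are equal, the model gives N = V = 0, so every ordered branch treats the two addresses as if they were the same.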

This mis-feature was preemptively prevented in the designs of the 68000 and the 6809, and was fixed, pretty much without issue, in the 6801. In the 6805, it's prevented by making the X register an 8-bit register anyway, more on that below.

(2) Addressing temporary variables and parameters on a stack required using X, and if you had something you needed in X, you had to save X somewhere safe -- which meant on a stack if you wanted your code to be re-entrant. But the 6800 had no instructions to directly push or pop X. That left you with a conundrum. You had to save X to use X to save X. 

So you had to use a statically allocated temporary variable. Statically allocated temporaries tend to introduce race conditions even in single-processor designs, because you really don't want to take the time to block interrupts just to use the temporaries, especially for something like adjusting a stack pointer.

You can potentially work around the race conditions in some cases by having your interrupt-time stack pointers separate from your non-interrupt-time stack pointers, but that can also get pretty tricky pretty quickly.

The 6801 provides push (PSHX) and pop (PULX) instructions for X.

Stack-addressable temporary variables and parameters were supported by definition in the 68000 and 6809 designs, but not on the 6801. They were considered out of scope on the 6805, but were addressed on descendants of the 6805.

(3) This niggle is somewhat controversial, but using a single stack that combines return addresses and parameters and temporary variables is a fiddly solution that has become widely accepted as the standard. Even though it is accepted, and learning how to set up a stack frame is something of a rite-of-passage, setting up stack frames to keep the regions on stack straight consumes cycles, even when it can be done without inducing race conditions (see the above niggle about using X to address the stack).

Separating parameters and temporaries from return addresses is supported by design on the 68000 and 6809, but not on the 6801 or 6805.

(4) The lack of direct-page mode op-codes for the unary operators was, in my opinion, a serious strategic miss. Sure, you could address variables in the direct page with extended mode addressing, but it cost extra cycles, and it just felt funny. 

To explain, the binary instructions (loads, stores, two-operand arithmetic and logic) all have a direct-page mode. This allows saving a byte and a cycle when working on variables in the direct page (called zero page on other processors -- addresses from 0 to 255). 

The unary instructions (increment/decrement, shifts, complements, etc.) do not. The irony is that the unary instructions are the ones you use on memory when you don't want to waste time and accumulators loading and storing a result.

This may have been another attempt to save transistors by not implementing every possible op-code. But a careful re-examination of the op-code table layout map indicates that it should have been possible without using significantly more transistors. In fact, I'm guessing it actually required more transistors to do it the way they ended up doing it. 

Or it may have been an attempt to avoid running into the situation where they would need an op-code for something important but had already used all of the available codes in a particular area of the map. But, again, re-examining the op-code map would have revealed room to fit the op-codes in. 

Maybe there just wasn't enough time to re-examine and reconsider the omissions before the scheduled deadlines, and they thought absolute/extended addressing should be good enough. 

I'll come back to the reasons it really wasn't further down.

This one was also fixed in the designs of the 68000 and 6809, and sort-of in the 6805, but not addressed or fixed in the 6801.

Fixing it in the 6801 would have been an awkward after-the-fact tack-on, but I'll look at that below.

(5) The 6800 had a few instructions for inter-accumulator math -- ABA (add B to A), SBA (subtract B from A), and CBA (compare B with A, which is SBA but without storing the result). 

But it's missing the logical instructions AND, OR, and EOR (Exclusive-OR) of B into A, and doesn't have any instructions at all going the other direction, A into B. 

Surprisingly, this is not hard to work around in most cases, but the workarounds are case-by-case tricks with the condition codes. Otherwise, you're back to using statically allocated temporaries, and care must be taken to avoid potential race conditions by such things as using the same temporaries during interrupt processing.

This is fixed in the design of the 68000, and eliminated from the scope of the 6805, effectively fixed in the 6809 (by the addition of stack-relative addressing for temporaries), and partially addressed in the 6801 (by adding 16-bit math, the most common place where it becomes a problem, more below).

(6) The 6800 has no native 16-bit math other than incrementing, decrementing, and comparing X, and incrementing and decrementing S. Synthesizing 16-bit math is straightforward, but -- especially without the inter-accumulator logical operators -- it does require temporary variables, requiring extra processor cycles and potentially inducing race conditions.

Also, you usually need one or more extra test cases to cover partial results in one or the other byte, or the use of a logical instruction to collect the results, and it's easy to forget or just fail to complete the math, as with the problem with CPX.

And you need 16-bit arithmetic to deal with 16-bit addresses.
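
As a sketch of what "synthesizing" means here, this models a 16-bit add built from two 8-bit adds, low byte first, with the carry passed into the high-byte add (the pattern of an add followed by an add-with-carry). The function names are made up for illustration; the intermediate values stand in for the temporaries the 6800 code would need.

```python
def add8(a, b, carry_in=0):
    """8-bit add with carry in; returns (result byte, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add16_via_bytes(x, y):
    """16-bit add done a byte at a time; returns (result word, carry out)."""
    lo, carry = add8(x & 0xFF, y & 0xFF)      # low bytes, plain add
    hi, carry = add8(x >> 8, y >> 8, carry)   # high bytes, add with carry
    return (hi << 8) | lo, carry


# The carry out of the low byte is the partial result that is easy to drop.
print(add16_via_bytes(0x00FF, 0x0001))
```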

This is solved on the 6809 and 6801 by adding 16-bit addition and subtraction. On the 68000, the problem becomes 32-bit math, and it's solved for addition and subtraction, but, oddly, not quite completely for multiplication and division, more below.

(7) To explain this last niggle, of the above niggles, (1), (2), (3), (5), and (6) can be solved in the software application/operating system design by appropriate declaration of global pseudo-register variables, and globally accessible routines to handle the missing functionality, exercising care to separate variables and code for interrupt-time functions from those for non-interrupt-time functions. (These global routines and variables are a core feature of most 8-bit operating systems.)

For example, if your system design declares and systematically uses something like the following:

  ORG $80 ; non-interrupt time global pseudo-registers
PSTK RMB 2 ; two bytes for parameter stack pointer
QTMP RMB 2 ; temporary for high bytes of 32-bit quadruple accumulator
DTMP RMB 2 ;  temporary for 16-bit double accumulator
XTMP RMB 2 ; temporary for index math and copy source pointer
YTMP RMB 2 ; temporary for index math and copy destination pointer

  ORG $90 ; interrupt time global pseudo-registers
IPSTK RMB 2 ; two bytes for parameter stack pointer
IQTMP RMB 2 ; temporary for high bytes of 32-bit quadruple accumulator
IDTMP RMB 2 ;  temporary for 16-bit double accumulator
IXTMP RMB 2 ; temporary for index math and copy source pointer
IYTMP RMB 2 ; temporary for index math and copy destination pointer

... and if all the processes running on your system respect those global variable declarations, then you may at least have a way to avoid the race conditions.

But that chews a piece out of the memory map for user applications. 

Now, if the unary operators all had direct-page mode versions, see niggle (4) above, the processor could also define a direct-page address space function code, along with several other such function codes, allowing the system designer to optionally include hardware to separate the direct-page system resources from other resources in the address map, such as general data, stack, code, interrupt vectors, etc.

Two or three extra address lines could be provided as optional address function codes, to allow hardware to separate the spaces out.

This looks kind of like the I/O instructions on the 8080 and 8086 families, but it isn't separate instructions, it's separate address maps.

An example two-bit function code might be

  • 00: general (extended/absolute) data and I/O
  • 01: direct-page data and I/O
  • 10: code/interrupt vectors
  • 11: return address stack

Such extra address function signals can improve the utilization of the cramped 64 kilobyte address space, even though they would require increasing the number of pins on the processor package or multiplexing the functions onto some other signals, raising the effective count of external parts. 

But they provide a place for such things as bank-switch hardware, in addition to general I/O and system globals and temporaries, without having to eat holes in general address space. And completely separating the return pointer stack from general data greatly increases the security of the system.
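
Here is a minimal sketch of how external decoding hardware could use such a code, treating the two function-code bits as extra high-order address lines. The encoding follows the example two-bit table above; the names `FunctionCode` and `physical_address` are hypothetical, and so is the whole scheme, since the 6800 never had these signals.

```python
from enum import IntEnum


class FunctionCode(IntEnum):
    """Hypothetical two-bit address function codes from the example table."""
    GENERAL = 0b00        # general (extended/absolute) data and I/O
    DIRECT_PAGE = 0b01    # direct-page data and I/O
    CODE = 0b10           # code/interrupt vectors
    RETURN_STACK = 0b11   # return address stack


def physical_address(fc, addr16):
    """Combine a function code with a 16-bit CPU address into an 18-bit one."""
    if not 0 <= addr16 <= 0xFFFF:
        raise ValueError("CPU address must fit in 16 bits")
    return (int(fc) << 16) | addr16


# The same CPU address lands in different 64 kB spaces.
for fc in FunctionCode:
    print(f"{fc.name:12s} $0080 -> ${physical_address(fc, 0x0080):05X}")
```

A system that ignores the extra lines still sees a single ordinary 64 kilobyte map, which is why the function codes can be optional.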

I'm not sure if Motorola ever did so in any of their evolved microcontrollers, but this could also potentially allow optimizing access to direct-page pseudo-registers when direct-page RAM is provided on-chip in integrated system-on-a-chip devices like the 6801 and 6805 SOC packages.

The 68000 provides similar address function codes, but the address space on the 68000 is so much bigger than 64 kilobytes that the address function codes have been largely ignored.

Before Motorola began designing new microprocessors, such niggles in the 6800 were noticed and discussed in engineering and management within Motorola. The company decided to analyze code they had available, including internally developed code and code customers shared with them for the purpose of the analysis, looking for bottlenecks and inefficient sequences that an improved processor design could help avoid. The results of this code analysis motivated the design of the 68000 and the 6809. 

The 68000 and the 6809 were designed concurrently, by different groups within Motorola.

 

68000 Niggles:

The 68000 significantly increases the number of both accumulators (data registers) and index registers, and directly supports common address math in the instruction set. It also widens address and data registers to 32 bits. They solved a lot of problems, but they left a few niggles.

(1) The processor was excessively complex. Having a lot of registers reduced the need for complex instructions and for instructions that operated directly on memory without going through registers, but the 68000 did complex instructions and instructions that operated directly on memory, as well. 

IBM was just beginning work on the 801 (forerunner of the ROMP) at the time, and reduced instruction sets were still not a common topic, so the assumption of complexity can be understood.

Still, the complexity required a lot of work to test and properly qualify products for production. 

(2) They got the stack frame for memory management exceptions wrong. That is, memory management hardware turned out to work significantly better using the approach they did not initially choose to support, so the frames they had defined did not contain enough information to recover using the preferred memory management techniques. This was fixed in the 68010.

(3) The exception vector space being global made it difficult to fully separate the user program space from the system program space. This was also fixed in the 68010.

(4) Constant offsets for the indexed modes were limited to 16 bits. This seems to be another false optimization -- not fatal because they included variable (register) offsets in the addressing modes, so you could load a 32-bit offset into a data register to get what you wanted. But it still had a cost in cycle counts and register usage. This was not fixed until the 68020, and then they went overboard, making the addressing even more complex, which made the 68020 even harder to test.

(5) They added hardware multiplication and division to the 68000, but they didn't fully support 32 bit multiply and divide. This also was not fixed until the 68020. This can make such things as accessing really large data structures in memory suddenly become slow, when the index to the data structure exceeds 32,767.
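
To show what the workaround costs, here is a sketch of a 32-bit by 32-bit multiply (keeping the low 32 bits of the product) built from 16-bit by 16-bit multiplies, which is the size of operand the 68000's multiply instructions take. The function names are made up for this note, and the model is unsigned only.

```python
def mulu16(a, b):
    """16 x 16 -> 32 unsigned multiply, standing in for the hardware multiply."""
    if not (0 <= a <= 0xFFFF and 0 <= b <= 0xFFFF):
        raise ValueError("operands must fit in 16 bits")
    return a * b


def mul32_low(a, b):
    """Low 32 bits of a 32 x 32 product, from three 16 x 16 multiplies."""
    a_hi, a_lo = a >> 16, a & 0xFFFF
    b_hi, b_lo = b >> 16, b & 0xFFFF
    low = mulu16(a_lo, b_lo)
    # Only the low 16 bits of the cross terms land inside the 32-bit result;
    # a_hi * b_hi falls entirely above bit 31 and is not needed.
    cross = (mulu16(a_hi, b_lo) + mulu16(a_lo, b_hi)) & 0xFFFF
    return (low + (cross << 16)) & 0xFFFFFFFF


# An index just past 16 bits times an element size.
print(mul32_low(70000, 3))
```

That is three multiplies plus the shifting and adding, where a full 32-bit multiply would be one instruction.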

Of the above, (4) and (5) could conceivably have been dealt with in the initial design, if management had not been pushing engineering to find corners to cut. The first three were problems that simply required experience to get right.


6809 Niggles:

The 6809 does not increase the number of accumulators, but it does add instructions that combine the two 8-bit accumulators, A and B, into a single 16-bit accumulator D for basic math -- addition, subtraction, load, and store. 

On the other hand, it does increase the number of indexable registers to six, and it adds a whole class of address math that can be incorporated into the addressing portion of the instructions themselves, or can be calculated independently of other instructions. 

It supports using two of the index registers as stack pointers, and thus supports stack addressing, so that race conditions can generally be completely avoided by using temporary variables on stack. (In comparison, the 68000 can use any of the 8 address registers visible to the programmer as stack pointers.)

One of the stack-pointer capable registers can be used as a frame pointer, making stack frames less of a bottleneck. Or it can be used as a separate parameter stack pointer, pretty much eliminating the bottleneck and improving security. (In comparison, the 68000 includes an instruction to generate a stack frame, which, of course, you don't need when you use properly split stacks. It also includes an entirely superfluous instruction to destroy a stack frame.)

One of the index-capable registers is the PC, which simplifies such things as mixing tables of constants in the code. (This is also supported on the 68000, making a ninth index-capable register for the 68000.)

One of the index registers (DP, for direct page) is a funky 8-bit high-byte partial index for the direct page modes it inherits from the 6800. (This is not done on the 68000, but any of the 68000's address registers can be used in a similar way, with short constant offsets for compact code and reduced cycle counts.)

All unary instructions have a direct page mode op-code, which saves byte count if not cycle count.

(1) As a minor niggle, I can't see that leaving out a full 16-bit base address for the direct addressing mode actually saved them anything in terms of transistor count and instruction cycle count, but we are probably safe in guessing that such savings were their reasoning for doing it that way. The direct page is still useful, although it might have been more useful to have provided finer-grain control of its base address. (See above about using any address register in the 68000 in a similar way.)

The DP can be used, with caveat, as a base for process-local static allocations, which greatly reduces potential for inadvertent conflicts in use of global variables and race conditions.

(2) Another niggle about the direct page, the caveat, is that the direct page base is not directly supported for address math. Just finding where the direct page is pointing requires moving DP to the A accumulator and clearing the B accumulator, after which you can move it to one of the index registers. Cycle and register consuming, but not fatal.
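A minimal sketch in Python of what the DP register does and of the detour described in niggle (2). The function names are mine; the comments name the 6809 instructions each step stands for.

```python
def direct_address(dp, offset):
    """Direct mode: DP supplies the high byte, the instruction supplies the low byte."""
    return ((dp & 0xFF) << 8) | (offset & 0xFF)

def dp_base_via_d(dp):
    """Find where the direct page points, going through the accumulators."""
    a = dp & 0xFF          # TFR DP,A
    b = 0                  # CLRB
    return (a << 8) | b    # TFR D,X  (D is A:B, with A as the high byte)
```

With DP holding $20, direct_address(0x20, 0x34) gives $2034, and dp_base_via_d(0x20) gives the page base $2000. The base can only ever land on a 256-byte boundary, which is the finer-grain control that niggle (1) wishes for.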

(3) A third niggle, about both the direct page and the indexed modes: it seems like cycle counts for both could have been better. The 6801 improved cycle counts for both, making the 6809 seem less attractive to engineers seeking speed. It would have been nice for Motorola to have followed the 6801 with an improved 6809 that fixed the DP niggles and cycle count niggles.

(4) The 6809 also does not have address function code signals. The overall design provides enough power to implement mini-computer class operating systems, but the 64 kilobyte address space then limits the size of user applications. Address function signals that allow separating code, stack, direct page, and extended data would have eased the limits significantly.

On the other hand, widening the index registers would have done even more to ease the addressing restrictions. (I've talked about that elsewhere, and I hope to examine it more carefully sometime in a rant on how the 6809 could have evolved.)

(5) Other than those niggles, the 6809 is about as powerful a design as you can get and still call a CPU an 8-bit processor. In spite of the fact that it would have meant letting the 6809 compete with the 68000 in the market, they could have used the 6809 as the base design of a family of very competitive 16-bit CPUs.

In other words, my fifth niggle is that Motorola never pursued the potential of the 6809. 

(6) but not really -- 8-bit CPUs are generally focused on keeping transistor count down for 8-bit applications, so hardware multiplication and division of 16-bit numbers doesn't really make sense in an 8-bit CPU design. This is probably the reason the 6809 only had 8- by 8-bit multiplication, and also probably the reason for the irregular structure of the operation. 

A similar 8-bit division of accumulator A by accumulator B yielding 8 bits of quotient and 8 bits of remainder might make sense, but I'm not sure we should want to waste the transistors.

16-bit multiply and divide would have been good for a true 16-bit version of the 6809, but that would include a full 16-bit instruction set.

 

6801 Niggles:

When the 6809 was introduced in the market, it was still a bit too much complexity in the CPU to comfortably integrate peripheral parts -- timers, serial and parallel ports, and such -- into the same semiconductor die that contained the CPU. So Motorola decided to fix just a few of the niggles of the 6800 for use as a core CPU in semi-custom designs that included on-chip peripheral devices.

(It's something that is commonly misunderstood, that the 6801 actually came after the 6809 historically, but is best understood as a slightly improved 6800, not as a stripped-down 6809. Three steps forward, three steps back, half a step forward.)

As noted above, they fixed the CPX instruction in the 6801, but they did not fix the lack of direct-page unary instructions. They also added instructions to directly push and pop the X index register, which greatly helped when you had something in X that you needed to save before you used X for something else. 

And they added the 16-bit loads, stores, and math that combined A and B into a single 16-bit double accumulator D -- similar to the 6809, which overcame a lot of the other niggles about the 6800. In particular, you don't feel the lack of an OR B with A instruction to make sure both bytes of the result were zero, because the flags are correctly set after the D accumulator instructions. 
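The point about the flags can be sketched in Python. This is a rough model, not a cycle-accurate one, and the function names are mine. It compares a 16-bit subtract done bytewise, 6800 style (SUBB followed by SBCA), with a single 16-bit subtract from D, and reports the zero flag each would leave.

```python
def sub16_bytewise(a, b, value):
    """SUBB then SBCA: Z ends up reflecting only the high-byte result."""
    v_hi, v_lo = (value >> 8) & 0xFF, value & 0xFF
    lo = (b & 0xFF) - v_lo                      # SUBB
    borrow = 1 if lo < 0 else 0
    hi = ((a & 0xFF) - v_hi - borrow) & 0xFF    # SBCA
    return hi, lo & 0xFF, hi == 0

def subd(a, b, value):
    """16-bit subtract from D (A:B): Z reflects all 16 bits of the result."""
    d = ((((a & 0xFF) << 8) | (b & 0xFF)) - (value & 0xFFFF)) & 0xFFFF
    return d >> 8, d & 0xFF, d == 0
```

Subtracting $0102 from $0105 leaves $0003 either way, but the bytewise version reports zero because the last byte it touched was zero. That is the case where the 6800 programmer wants to OR B with A before branching.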

And they included the 8-bit multiply A by B from the 6809. They also included a couple of 16-bit double accumulator shifts, but only for D, not for memory, which is a very minor niggle, an engineering trade-off.

They also added an instruction to add B to X, ABX, to help calculate the addresses of fields within records. 

This brings up niggle (1) -- ABX is unsigned, and they did not include a subtract B from X instruction. Being able to subtract B from X, or add a negative value in B to X, would have significantly helped with allocating local variable space on the stack. As it is, ABX is primarily useful for addressing elements within records and structures.
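A short Python sketch of the difference. abx models the instruction as described here; add_signed_b_to_x is a hypothetical instruction that the 6801 does not have, shown only for contrast.

```python
def abx(x, b):
    """ABX: add B to X, treating B as unsigned (0..255)."""
    return (x + (b & 0xFF)) & 0xFFFF

def add_signed_b_to_x(x, b):
    """Hypothetical variant: treat B as a two's complement value (-128..127)."""
    b &= 0xFF
    signed = b - 0x100 if b & 0x80 else b
    return (x + signed) & 0xFFFF
```

With X at $1000 and B holding $FE, abx moves X up to $10FE, while the hypothetical signed add would move it down two bytes to $0FFE, which is what you want when carving local variable space out below a pointer.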

Although I/O devices tended to be assigned addresses in high memory on early 6800 designs, the 6801 put the built-in I/O devices in the direct page. They also put a bit of built-in RAM in the direct page, starting at $80.

But, as I noted above, niggle (2) is that they did not add direct-page mode unary instructions.

If they had done so, either they'd have broken object code compatibility with the 6800, or they'd have had to spread the direct-page op-codes in awkward places in the 6800, which definitely would have cost transistors that they wanted for the I/O devices and such. Either way, I think it would have been worth the cost.

For a chapter of one of my stalled novels, I put together a table showing one possible way to spread them out among unimplemented op-code locations in the inherent/branch section of the op-code table. Here is the list of where I allocated the direct-page op-codes:

  • NEG direct: $02
  • ROR direct: $12
  • ASR direct: $03
  • COM direct: $13
  • LSR direct: $14
  • ROL direct: $15
  • ASL direct: $18
  • DEC direct: $1A
  • INC direct: $1C
  • TST direct: $1D
  • JMP direct: $1E
  • CLR direct: $1F

That doesn't prove anything other than that there were ultimately enough op-codes available. But I'm guessing this layout could be done with a hundred or fewer extra transistors -- transistors that admittedly would then be unavailable for counters or port bits. But it could be done, and it wouldn't have cost that much.
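The claim that the slots were available can be checked mechanically. The Python sketch below compares the allocation above against the documented 6801 op-codes in the $00 through $1F range. That table is typed in from memory, so treat it as an assumption to verify against a datasheet; the names DEFINED_6801, PROPOSED_DIRECT, and collisions are mine.

```python
# Documented 6801 op-codes in $00-$1F (from memory; an assumption to verify).
DEFINED_6801 = {
    0x01: "NOP", 0x04: "LSRD", 0x05: "ASLD", 0x06: "TAP", 0x07: "TPA",
    0x08: "INX", 0x09: "DEX", 0x0A: "CLV", 0x0B: "SEV", 0x0C: "CLC",
    0x0D: "SEC", 0x0E: "CLI", 0x0F: "SEI", 0x10: "SBA", 0x11: "CBA",
    0x16: "TAB", 0x17: "TBA", 0x19: "DAA", 0x1B: "ABA",
}

# The direct-page unary allocation from the list above.
PROPOSED_DIRECT = {
    "NEG": 0x02, "ROR": 0x12, "ASR": 0x03, "COM": 0x13,
    "LSR": 0x14, "ROL": 0x15, "ASL": 0x18, "DEC": 0x1A,
    "INC": 0x1C, "TST": 0x1D, "JMP": 0x1E, "CLR": 0x1F,
}

def collisions(proposed, defined):
    """Return each proposed mnemonic that lands on an already defined op-code."""
    return {name: defined[op] for name, op in proposed.items() if op in defined}
```

If the table is right, collisions(PROPOSED_DIRECT, DEFINED_6801) comes back empty. It says nothing about what the undocumented op-codes actually did in silicon.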

Also, with these in the op-code map, they could have provided this version of the CPU for compatibility, and then provided another version with the direct-page op-codes correctly laid out for customers who were willing to simply re-assemble their source code. (That's all it would have taken, but many customers wouldn't be willing to take a chance that something would sneak up and bite them.)

One possible more efficient layout would have been to repeat the addressing of the binary op-code groups. Working from the right in the opcode map, there are four columns for accumulator B binary operators and four columns for accumulator A binary operators:

  • $FX is extended mode B, and $BX is extended mode A;
  • $EX is indexed mode B, and $AX is indexed mode A;
  • $DX is direct page B, and $9X is direct page A;
  • $CX is immediate mode B, and $8X is immediate mode A. 

In the existing 6800, this continues down two more for the unaries, but then you have the unary A and B instructions:

  • $7X is extended mode unary;
  • $6X is indexed mode unary;
  • $5X is B unary;
  • $4X is A unary.

Then you have inherent mode instructions in columns $3X, $1X, and $0X, with the branches in column $2X.

In a restructured op-code map, it could be done like this:

  • $7X is extended mode unary;
  • $6X is indexed mode unary;
  • $5X would be direct page unary;
  • $4X would be B unary;
  • $0X would be A unary.

And the inherent mode operators would be more densely packed in the $1X and $3X columns.
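The two column layouts can be put side by side in a small Python sketch, keyed on the high nibble of the op-code. The first table follows the description of the existing 6800 map; the second is the restructured map suggested here. I am assuming the branches stay in column $2X and the binary operator columns stay where they are. All names are mine.

```python
CURRENT_6800 = {
    0xF: "binary, B, extended",  0xB: "binary, A, extended",
    0xE: "binary, B, indexed",   0xA: "binary, A, indexed",
    0xD: "binary, B, direct",    0x9: "binary, A, direct",
    0xC: "binary, B, immediate", 0x8: "binary, A, immediate",
    0x7: "unary, extended",      0x6: "unary, indexed",
    0x5: "unary, B",             0x4: "unary, A",
    0x3: "inherent", 0x2: "branches", 0x1: "inherent", 0x0: "inherent",
}

# Restructured: direct-page unaries take $5X, B moves to $4X, A moves to $0X.
RESTRUCTURED = dict(CURRENT_6800)
RESTRUCTURED.update({0x5: "unary, direct", 0x4: "unary, B", 0x0: "unary, A"})

def column(opcode, layout):
    """Classify an op-code by the column (high nibble) it falls in."""
    return layout[(opcode >> 4) & 0xF]
```

For example, column(0x4F, CURRENT_6800) is "unary, A", while column(0x5F, RESTRUCTURED) is "unary, direct".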

This would require either moving the negate instructions or the halt-and-catch-fire instruction, I suppose. [I'm not finding my reference that had me thinking the 6801's test instruction was at $00. Cancel that thought.] Interestingly, when Motorola laid out the op-code map for the 6809, they kept A and B in columns $4X and $5X, and put the direct page in column $0X -- and left the negate at row $X0, so that they had to move the test instruction. [Again, I'm not finding my reference on the location of the 6809's test instruction. But they did leave negate where it was.]

Also interestingly, the 6801 has a direct-page jump to subroutine, which could be put to good use for a small set of quick global routines (like stack?). (The op-code is $9D, which some sources say was one of the accidental test instructions in the 6800.)

Niggle (3) about the 6801 is that I think they should have split the stack. Add a parameter stack U, and then pushes and pops (PULs) would operate on the U stack, but JSR/BSR/RTS would operate on the S stack. This would make stack frames much less of a bottleneck, make it possible to reduce call and return cycle counts, and increase general code security somewhat.

(Note again that the 6809 and the 68000 both directly support this kind of split stack. It was the education system that failed to teach engineers to use it.)
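A toy model in Python of the split described in niggle (3), with return addresses on S and parameters on a separate U stack. The class and method names are mine, and the U stack is hypothetical for the 6801.

```python
class SplitStackCPU:
    """Toy model: calls use the S stack, pushes and pulls use the U stack."""

    def __init__(self):
        self.s = []   # return addresses only
        self.u = []   # parameters and temporaries only

    def jsr(self, return_address):
        self.s.append(return_address & 0xFFFF)

    def rts(self):
        return self.s.pop()

    def psh(self, value):
        self.u.append(value & 0xFF)

    def pul(self):
        return self.u.pop()
```

A caller can push a parameter, call, and the callee can pull that parameter and push results without ever stepping over the return address, because the return address is simply not on the stack the data lives on.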

And I'll note here that the 68HC11 derivative of the 6801 added, among other things, a Y index, but no parameter stack.

 

6805 Niggles:

Really the only niggle I have with the 6805 is the lack of a separate parameter stack, and the lack of any push/pop at all in the original 6805. Motorola did add pushes and pops to some derivatives of the 6805, but they were on the same S stack that holds the return addresses.

The idea of an 8-bit index that could have a 16-bit base (as opposed to an offset) was novel to me when I first looked at the 6805, but it is rather useful. Instead of thinking in terms of putting a base address in X and then adding an offset, you think in terms of having a constant base address -- like an array with a known, fixed address, and the X register provides a variable offset. Indexed mode for binary operators includes no base, 8-bit base, and 16-bit base, allowing use anywhere in the address space. 
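The base-plus-index idea can be sketched in a few lines of Python. The function name and the 16-bit wrap are my assumptions for illustration; actual 6805 parts vary in how many address bits they bring out.

```python
def indexed_address(x, base=0):
    """6805-style indexed mode: a constant base (none, 8-bit, or 16-bit)
    from the instruction, plus the 8-bit X register as a variable offset."""
    return ((base & 0xFFFF) + (x & 0xFF)) & 0xFFFF
```

So indexed_address(0x10) is $0010 with no base, indexed_address(0x10, 0x80) is $0090 with an 8-bit base, and indexed_address(0xFF, 0x1F00) reaches $1FFF with a 16-bit base. A table at a known, fixed address can hold up to 256 byte-sized elements selected by X alone.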

A small caveat is that unary operators do not have 16-bit base address indexed versions. This is a valid engineering tradeoff, and they cut the right corners here, fully supporting unary instructions for variables in the direct page.

The 8-bit index does not support generalized copying and other generalized functions needed to support self-hosted development environments (without self-modifying code), but that's not necessarily a problem. Hosted development environments are much more powerful tools than self-hosted. (I think a very small Tiny-BASIC interpreter could be constructed without self-modifying code, but that's more of an application than a self-hosted dev environment.)

It does make the CPX operator much simpler -- as an 8-bit operator.

Motorola ultimately extended the index with an XHI in some derivatives of the 6805, which would have allowed self-hosting for those derivatives, but we won't go there today. Also, we won't look at the 68HC11 in detail today. Nor will we do more than glance at the 68HC12 and 68HC16, even though both are quite interesting designs -- in spite of not having split stacks.

I think this is enough to show that Motorola really did do a fairly decent job with their CPU designs.

Actually comparing CPUs, by the way, requires producing a lot of parallel code implementing several real-world applications for each CPU compared. I'd like to do that someday, but I doubt I'll ever have the spare time and money to do so.