Misunderstanding Computers

Why do we insist on seeing the computer as a magic box for controlling other people?
人はどうしてコンピュータを、人を制する魔法の箱として考えたいのですか?
Why do we want so much to control others when we won't control ourselves?
どうしてそれほど、自分を制しないのに、人をコントロールしたいのですか?

Computer memory is just fancy paper, CPUs are just fancy pens with fancy erasers, and the network is just a fancy backyard fence.
コンピュータの記憶というものはただ改良した紙ですし、CPU 何て特長ある筆に特殊の消しゴムがついたものにすぎないし、ネットワークそのものは裏庭の塀が少し拡大されたものぐらいです。

(original post/元の投稿 -- defining computers site/コンピュータを定義しようのサイト)

Thursday, November 28, 2024

How Address Function and Caches Should Interact (6801-derived Example)

In addition to daydreaming about address function, segmentation, caches and the 6809, I daydream similar things about the 6801.

But the 6801 does not have Y, U, or DP. 

So if I want to daydream about a 6801-derived CPU that provides address function output to distinguish the source of each address, something like the following,

  • 000 for code
  • 001 for interrupt vectors
  • 010 for U relative (parameter stack)
  • 011 for S relative (return address stack)
  • 100 for general data (extended/absolute, X, Y)
  • 101 for direct page
  • 110 and 111 for DMA

with customized caches for each, since access patterns for each are so different, I'll have to daydream about extending the 6801 in a somewhat different path from the 68HC11.

And being able to do things like
  • indexing into ROMmed jump tables at run time,
  • copying strings out of ROMs,
  • using X or Y to access arrays on the parameter stack,
  • and such.

just means we're going to use the more general solution. Instead of just adding address function output bits and relying on bank switching, I have to daydream about widening the address bus and extending the index registers, either with segment registers (preferably paired with limit registers) or simple extra bits in the registers.

And if the direct page (page zero) is going to be separate from absolute addresses, I'm going to need some register to distinguish them. Maybe absolute would only be the low 64K, or maybe I'd add an absolute base page register to extend absolute addresses out to whatever maximum. And this CPU would get a DP register like the 6809's, I guess. Maybe it would only have 8 bits, and direct page would only be in the low 64K. Or even maybe less. Or even split the direct page so that the low half (I/O) and the high half (pseudo-registers) could be moved independently. Decide all that later.

8 bits of extension would be more than enough for 6801-sized applications, but would 5 bits? 5 for ordinary address plus 3 for function would be 8 bits of extension, for 24 bits, total.

21 bits of address is 2 megabytes per address space. I can see that being a bit tight for some things that I'd still want to do on a fundamentally 8-bit CPU, but it's okay to leave that question for later.
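The bit budget above can be checked with a short sketch. The function-code values come from the list earlier in this post. The layout (function code in the top 3 bits, then 5 extension bits, then the 16-bit logical address) and all of the names are assumptions made for illustration, not a settled design.

```python
from enum import IntEnum


class AddrFunc(IntEnum):
    """3-bit address function codes, as listed in the post."""
    CODE = 0b000
    VECTOR = 0b001
    U_STACK = 0b010
    S_STACK = 0b011
    DATA = 0b100
    DIRECT_PAGE = 0b101
    DMA0 = 0b110
    DMA1 = 0b111


EXT_BITS = 5        # assumed: extension bits per address space
LOGICAL_BITS = 16   # the 6801's native address width
SPACE_BITS = EXT_BITS + LOGICAL_BITS  # 21 bits inside one address space


def compose(func, ext, logical):
    """Build a 24-bit bus address: function code, extension, logical address."""
    if not 0 <= ext < (1 << EXT_BITS):
        raise ValueError("extension out of range")
    if not 0 <= logical < (1 << LOGICAL_BITS):
        raise ValueError("logical address out of range")
    return (int(func) << SPACE_BITS) | (ext << LOGICAL_BITS) | logical


def space_size_bytes():
    """Bytes addressable inside a single function-code space."""
    return 1 << SPACE_BITS


if __name__ == "__main__":
    print(hex(compose(AddrFunc.DATA, 0, 0x1234)))
    print(space_size_bytes() // (1024 * 1024), "MB per address space")
```

With 5 extension bits each of the eight spaces holds 2 megabytes, and the full bus address fits in 24 bits.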

(The 6809 is actually fundamentally 16-bit, 8-at-a-time, just for reference. It's the LEA instructions.)

In addition to Y, we'll need an additional U stack for the parameters. It would be nice to have it indexable like X and Y, but by providing TXU, TUX, TYU, and TUY, we could get the local variable addressing we need without thrashing the index registers so much. So if making U indexable induced significant complexity, we could leave that out and rely on the widened X and Y to give us access to the parameters.

We can keep the constant 8-bit unsigned offsets of the 6801/HC11, but we need to add an SBX instruction to subtract B from X to do real pointer math, and we need the corresponding ABY/SBY.

ABU and SBU would be nice, if we are going to include direct indexing of the parameter stack pointer.

And, incidentally, add immediate to index register is something I really wish Motorola had added for the 6801, either as a 16-bit signed AIX (and AIY and maybe AIU in this CPU), or as unsigned 8-bit AIX/SIX (and AIY/SIY and maybe AIU/SIU).

With all the transferring and adding, maybe we just need to trade the implicit operand instructions for register-register instructions. 

This kind of extension would give us enough to allow PC caching by fill-ahead with one branch of fill, checking whether the branch code is in cache first. Wouldn't need as much as the 6809, since total instruction length is shorter, maybe? But then there will be the long address forms, so probably 32 bytes each branch.

Return address stack cache with spill-fill, 2/4 centered hysteresis, eight entries would probably be good enough, maybe 32 if we simply want to make sure (most) embedded applications would not need external return address stack.

64 bytes of spill-fill parameter stack cache with hysteresis would probably be plenty.
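A rough model of a spill-fill stack cache may make the hysteresis idea more concrete. This sketch takes one possible reading of "centered hysteresis": on overflow the oldest entries are spilled until the cache is half full, and on underflow it is refilled to half full, so calls and returns that hover around a boundary do not spill and fill on every step. The class name, the counters, and the use of Python lists for the cache and external RAM are all assumptions for illustration.

```python
class SpillFillStackCache:
    """Toy model of an on-chip stack cache backed by external RAM."""

    def __init__(self, capacity=8):
        if capacity < 2:
            raise ValueError("capacity must be at least 2")
        self.capacity = capacity
        self.half = capacity // 2
        self.cache = []    # on-chip entries, top of stack at the end
        self.memory = []   # entries spilled to external RAM, oldest first
        self.spills = 0    # number of spill bursts
        self.fills = 0     # number of fill bursts

    def push(self, value):
        if len(self.cache) == self.capacity:
            # Overflow: spill the oldest entries, leaving the cache half full.
            count = self.capacity - self.half
            self.memory.extend(self.cache[:count])
            del self.cache[:count]
            self.spills += 1
        self.cache.append(value)

    def pop(self):
        if not self.cache:
            if not self.memory:
                raise IndexError("stack underflow")
            # Underflow: refill up to half the cache from external RAM.
            count = min(self.half, len(self.memory))
            self.cache = self.memory[-count:]
            del self.memory[-count:]
            self.fills += 1
        return self.cache.pop()


if __name__ == "__main__":
    stack = SpillFillStackCache(capacity=8)
    for address in range(9):
        stack.push(address)
    print([stack.pop() for _ in range(9)], stack.spills, stack.fills)
```

Nine pushes followed by nine pops on an eight-entry cache cost one spill burst and one fill burst in this model, rather than a memory access per call or return.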

As a different approach to the direct page, the DP could be banked in small blocks. 32 bytes per block seems best to me, but maybe 64 if that is enough simpler. Only addresses $80 to $FF would be mapped to RAM, since addresses $00 to $7F would be for I/O.

Rather than automatic caching, the DP RAM would then just be internal RAM, 1 kilobyte addressed from $400, with the bank switching circuit dual-mapping small blocks down to blocks in the $80-$FF range. The bank switching circuit might be tied to an active process number. We can figure out the details later.
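The dual-mapping described above can be sketched as a small translation table. The 1 kilobyte of internal RAM at $400, the 32-byte blocks, and the $80-$FF window come from the text. The slot table, the class and method names, and the rule tying a process number to four consecutive blocks are hypothetical, chosen only to show the address arithmetic.

```python
RAM_BASE = 0x400      # internal RAM starts here (from the text)
RAM_SIZE = 0x400      # 1 kilobyte
BLOCK = 32            # bytes per bankable block
WINDOW_BASE = 0x80    # $00-$7F stays I/O, $80-$FF is the banked window
SLOTS = (0x100 - WINDOW_BASE) // BLOCK   # 4 slots in the window
BLOCKS = RAM_SIZE // BLOCK               # 32 blocks of internal RAM


class DPBankMap:
    """Maps direct-page addresses $80-$FF onto blocks of internal RAM."""

    def __init__(self):
        # By default slot n shows block n.
        self.slot_to_block = list(range(SLOTS))

    def select(self, slot, block):
        """Point one window slot at one block of internal RAM."""
        if not 0 <= slot < SLOTS or not 0 <= block < BLOCKS:
            raise ValueError("slot or block out of range")
        self.slot_to_block[slot] = block

    def select_process(self, process):
        """Hypothetical policy: process n owns blocks 4n to 4n+3."""
        for slot in range(SLOTS):
            self.select(slot, process * SLOTS + slot)

    def translate(self, dp_addr):
        """Return the address actually accessed for a direct-page address."""
        if not 0 <= dp_addr <= 0xFF:
            raise ValueError("not a direct-page address")
        if dp_addr < WINDOW_BASE:
            return dp_addr  # I/O half, not remapped
        slot, offset = divmod(dp_addr - WINDOW_BASE, BLOCK)
        return RAM_BASE + self.slot_to_block[slot] * BLOCK + offset


if __name__ == "__main__":
    dp = DPBankMap()
    dp.select_process(2)
    print(hex(dp.translate(0x80)))
```

Under these assumptions eight processes could each own a private 128-byte set of pseudo-registers without copying anything on a context switch.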

Again, maybe not even bother caching general data accesses.

How Address Function and Caches Should Interact (6809 Example)

I keep daydreaming about a 6809-derived CPU that provides address function output (like the 68000's address functions, but not the same) to distinguish the source of each address, say,
  • 000 for PC relative (code)
  • 001 for interrupt/reset vectors
  • 010 for U relative (parameter stack)
  • 011 for S relative (return address stack)
  • 100 for general data (extended/absolute, X, Y)
  • 101 for direct page relative
  • 110 and 111 for DMA, maybe.

And customized caches for each, since access patterns for each are so different.

And I keep remembering that with 16-bit addressing, even this would be barely big enough for limited Fuzix.

And I keep remembering that X and Y, being general index registers, will need extra bits if you want to do things like

  • indexing into ROMmed jump tables at run time,
  • copying strings out of ROMs,
  • using X or Y to access arrays on the parameter stack,
  • and such.

And then I remember that widening the address bus and the index registers is the more general solution -- either via segment registers (which I think I would pair with limit registers) or by simply widening the registers. And then I wander off into ways of extending the address range of the extended/absolute mode and DP-relative, maybe adding a second DP-like register for I/O, and such. (Widened DP relative and IOPage relative addressing could be added via unused index mode bit patterns in the indexed mode post-byte, of course.)

(8 bits of index extension should be plenty for the 6809, but we want to be able to still encode address function in 3 bits, so maybe 16 bits of extension total.)

And forgetting the cache gadgetry that got me running back over this ground again.

In this scheme, PC cache just needs to fill ahead, and fill a second direction over one branch, checking whether the branch code is in cache first. 32 bytes per branch, 64 total would be plenty on a 6809.

Return address stack cache just needs spill-fill with 2/4 hysteresis in the middle. Eight entries would speed calls and returns a lot. Thirty-two entries would be enough to eliminate the need for return stack RAM on many embedded applications.

Parameter stack would have to be a bit more flexible than the return address stack, but would be similar in operation.

The DP cache would be a bit weird: 256 bytes of cache, in two, four, or eight banks, with an address tag for each bank and some sort of mechanism to automatically save a direct page that has been switched out, perhaps after another has been switched in, and so forth.
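One way such a tagged, banked DP cache might behave is sketched below. It assumes 256 bytes per bank, a tag equal to the DP register value, least-recently-used replacement, and write-back of a dirty page only when its bank is reused. None of those choices is fixed by the description above, and the class and method names are made up for the example.

```python
class DPCache:
    """Toy model of a direct-page cache with one address tag per bank."""

    PAGE = 256

    def __init__(self, memory, banks=4):
        self.memory = memory                  # bytearray standing in for main RAM
        self.tags = [None] * banks            # DP value held by each bank
        self.data = [bytearray(self.PAGE) for _ in range(banks)]
        self.dirty = [False] * banks
        self.order = list(range(banks))       # least recently used bank first
        self.active = None

    def _write_back(self, bank):
        """Save a modified page to main RAM before its bank is reused."""
        if self.dirty[bank]:
            base = self.tags[bank] * self.PAGE
            self.memory[base:base + self.PAGE] = self.data[bank]
            self.dirty[bank] = False

    def switch(self, dp):
        """Load a new DP value; return True when the page was already cached."""
        hit = dp in self.tags
        if hit:
            bank = self.tags.index(dp)
        else:
            bank = self.order[0]
            self._write_back(bank)
            base = dp * self.PAGE
            self.data[bank][:] = self.memory[base:base + self.PAGE]
            self.tags[bank] = dp
        self.order.remove(bank)
        self.order.append(bank)
        self.active = bank
        return hit

    def read(self, offset):
        return self.data[self.active][offset]

    def write(self, offset, value):
        self.data[self.active][offset] = value & 0xFF
        self.dirty[self.active] = True


if __name__ == "__main__":
    ram = bytearray(65536)
    cache = DPCache(ram, banks=2)
    cache.switch(0x10)
    cache.write(5, 0xAB)
    print(cache.switch(0x20), cache.switch(0x10), hex(cache.read(5)))
```

Switching back to a page that is still resident costs no memory traffic in this model, which is the point of keeping several tagged banks.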

Of course, there would be system cache and user program cache for fast process context switching.

And, with all that, maybe not even bother caching general data accesses.

(Analyzing this for the 6801 here: https://defining-computers.blogspot.com/2024/11/how-address-function-and-caches-should-interact-6801-derived-example.html.)

Sunday, September 22, 2024

A Bird's-eye View of an Alternate History Roadmap of the 680X and 680X0 CPUs

About the time I wrote a critique of the 680X and 680XX CPUs, I also started writing an alternate history roadmap for how they developed, but I got mired down in details and the alternate history novel I was trying to write then.

Now, after a few years, I'm working on a tutorial, and I wanted something to refer to. (Partly for the construction of the tutorial, but also as my personal notes from which I hope to eventually synthesize some workable extensions to the 680X and 680XX, should I ever be able to properly retire and play with programmable logic devices. Probably only decodable by me.)

So I'm going to write that alternate history roadmap here, but deliberately leave out details. Each stage will list extensions in order of my perceived priority and probably order of development.

Of course, this should all be considered to be some random wacko daydreaming out loud on the Internet. 

 

68/2805

Adds a parameter stack to 6805. Implements dual-port return address stack RAM to optimize calls and returns.

 

68/3808

Adds indexable parameter stack and Y register to 68HC08, along with 68/2805 optimized call/return via dual-port return stack, etc.


68/0801

Extends the 6801, breaking object code compatibility, but maintaining one-to-one mapping from both 6800 and 6801 object code to 68/0801 object code.

(1) Add SBX as complement of ABX, to aid in stack allocation, tracing relative links, and such.

(2) Move the inherent addressing mode op-codes around to facilitate adding direct-page op-codes for the unary (read-modify-write) instructions (INC/DEC, ROL/R, A/LSL/R, etc.). This would make the direct page more effective as pseudo-registers and ease the register pinch, and provide better support for such things as virtual machine registers and (hard-coded) per-task global variables.

ROM code would not be usable as-is, but a simple re-assemble would suffice in many cases. In those cases which relied on the actual object-code values of operators (as for brittle optimizations), the assembler could flag jump and branch targets and fall-throughs that do not end up at valid instruction boundaries. It could also analyze the value of the bytes at the target, issuing an error along with an attempt to interpret the assembled value at the target as an instruction. This would allow the engineer to determine how to either provide a similar optimization or (more likely) replace the brittle optimization with something from the improved instruction set.
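The boundary check an assembler could perform might look something like the following sketch. It assumes the assembler can hand over a list of (address, length) pairs for the instructions it emitted and a list of jump and branch target addresses. The function name and the report format are hypothetical.

```python
def misaligned_targets(instructions, targets):
    """Return (target, enclosing_start) for targets not on an instruction boundary.

    instructions: iterable of (address, length_in_bytes) for assembled instructions
    targets: iterable of jump and branch target addresses
    enclosing_start is None when the target lies outside the assembled code.
    """
    instructions = list(instructions)
    starts = {address for address, _ in instructions}
    inside = {}
    for address, length in instructions:
        # Every byte after the first belongs to the middle of this instruction.
        for offset in range(1, length):
            inside[address + offset] = address
    report = []
    for target in sorted(set(targets)):
        if target not in starts:
            report.append((target, inside.get(target)))
    return report


if __name__ == "__main__":
    code = [(0x100, 2), (0x102, 3), (0x105, 1)]
    for target, start in misaligned_targets(code, [0x102, 0x103, 0x200]):
        where = "outside assembled code" if start is None else f"inside ${start:04X}"
        print(f"target ${target:04X} is {where}")
```

A target that lands inside another instruction is the signature of the brittle trick described above, where the operand bytes of one instruction double as the opcode of another.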

(3) Add 4 bits of address function outputs to allow decoding program ROM, general data, direct page, and stack separately, making it possible for the sum of system/user code, data, stack, and DP (I/O, global pseudo-registers) to be greater than 64K, make bank-switching more effective, allow isolation of system and user spaces, allow isolation of the return address stack, and potentially allow adding segment registers for engineers who like segment registers.

The index register (X) would be extended by the function code bits, to allow it to index the separate address spaces. Transfer of stack to X would set the appropriate function code, and operators for setting the function code for other spaces would be provided. (How to do this with system/user space separation?)

(4) Replace the 6801's ASLD and LSRD inherent instructions with a full set of 16-bit unary/RMW shifts ASL16/LSR16 (D/DP/[X]/EXT), and add 16-bit unary/RMW INC16/DEC16 (D/DP/[X]/EXT). This would be especially useful for things like synthesizing generalized stacks and queues and such in software, and in multiply/divide by constant powers of 2, etc.

(5) Built-in dual-port RAM blocks to optimize DP RAM and return stack RAM (call/return optimization). 

(6a) Add (two sets each for user and system) two 5-bit prefix registers for the DP lower and upper halves, lower half in system space would be nominally for I/O and upper/non-system would be nominally for global pseudo-register RAM. This would allow allocating DP spaces within the lower 8K of the physical address space, or within an 8K separate DP space, where internal dual-port RAM and I/O registers could be provided, and external I/O space decoded.

(6b) Optional standard bank-switching schemes for code and data spaces, for 128K to 512K, or for 2M to 16M. Bank switching includes address function decoding, to support isolation of spaces, also includes support for switching between user and system spaces safely.

(6c) For engineers who like segment registers, optional internal segment registers for user/system data, code, and return stack spaces, properly organized to support the dual-port RAM blocks for the process return stacks. (How to integrate this with separate stack and DP?) DP would not have additional segmentation.

 

68/0800

This would be a 68/0801 without built-in peripherals, ROM, RAM, etc., and with pin-out and timing that matches the 6800, to be used as a near-drop-in replacement for the 6800 -- requiring source code re-assembly, but with a high probability of being able to match the original timings. 

 

68/2801

These extensions to the 68/0801 would require op-code map extensions through one or more pre-bytes, and would pretty much implement all our real world's 68HC11 instruction and interrupt functionality, plus a separate parameter stack. Indexing from the second stack would still be by way of index registers.

(1) Extend the 68/0801 with a second (U) stack, providing UPSHA/B/X/Y and UPULA/B/X/Y, TUX/Y and TX/YU.

(2) Provide an extra Y index (more-or-less as in the 68HC11). Both index registers would be extended by 4 bits to support indexing the address spaces. Transfer of stack registers to X or Y would set the appropriate function bits, and operators to set the function bits for other spaces provided, probably as prefixed LDX/Y instructions.

(3) Add an address function code for the U stack.

(4) Add bit instructions per 68HC11, with a toggle bits instruction, as well. These would be added by way of pre-byte.

(5) MUL/DIV -- Add IDIV and FDIV per the 68HC11, using X and D. Add a 16-bit MUL of X by D, with the result in X:D.
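The arithmetic these instructions would perform can be modeled in a few lines. IDIV and FDIV follow the 68HC11 convention of leaving the quotient in X and the remainder in D. The 16-bit MUL is the proposed extension, with the high half of the product in X and the low half in D. Raising Python exceptions for division by zero and FDIV overflow is a simplification made for this sketch, since real hardware would report those through condition flags.

```python
MASK16 = 0xFFFF


def mul16(x, d):
    """Proposed MUL: X times D, 32-bit product returned as (X high, D low)."""
    product = (x & MASK16) * (d & MASK16)
    return (product >> 16) & MASK16, product & MASK16


def idiv(d, x):
    """Integer divide D by X, returning (X quotient, D remainder)."""
    if x == 0:
        raise ZeroDivisionError("IDIV by zero")
    return (d & MASK16) // x, (d & MASK16) % x


def fdiv(d, x):
    """Fractional divide: (D * 65536) / X, returning (X quotient, D remainder).

    The quotient is a 16-bit binary fraction, so D must be less than X.
    """
    if x == 0:
        raise ZeroDivisionError("FDIV by zero")
    if d >= x:
        raise OverflowError("FDIV needs D < X")
    numerator = (d & MASK16) << 16
    return numerator // x, numerator % x


if __name__ == "__main__":
    print([hex(v) for v in mul16(0xFFFF, 0xFFFF)])
    print(idiv(100, 7))
    print([hex(v) for v in fdiv(1, 2)])
```

Here fdiv(1, 2) gives a quotient of $8000, which is one half when read as a binary fraction with the radix point to the left of bit 15.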

 

68/31601

Extends the 68/2801. 

Make all instructions 16-bit internal. Extend the index registers by 4 more bits between the function bits and the logical address bits. Provide 4 bits of extension between the function code constants and the 16-bit logical address bits for PC, stacks, and Extended mode addresses.  Trade the bank switching and segmentation for page-mode memory management.

 

68/2809

6809 compatible, but 6801 equivalent cycle counts. Also, reference the 68/0801 address space extensions.

(1) DP-relative 8- and 16-bit offset modes added to index post-byte.

(2) 8-bit page zero mode added to index post-byte.

(3) IO Page register, with 8- and 16-bit offset modes in index post-byte.

(4) Address functions for data, code, DP,  IOP, return stack, parameter stack, interrupts and DMA.

(5) MUL/DIV -- X by D 16-bit MUL; FDIV and IDIV per 68HC11.

(6) 16-bit unary-operator shifts and INC/DEC.

(7) Bit instructions.

(8) Available as SOC core, like 6801 and 6805.

(9) Standard optional bank switching and/or segmentation.


68/31609

16/32-bit 68/2809, with 32-bit index, stack, and PC registers, 16-bit Extended mode offset register, and paged memory management instead of bank switching. Moves op-codes around for efficiency, adds second 16-bit accumulator. Has 32-bit ADD, SUB, CMP, MUL, and FDIV and IDIV. Dedicated 64-byte spill/fill caches with hysteresis for both stacks, dedicated 256-byte PC read-ahead cache, dedicated DP and page zero caches.


68/28000

Basically the 68010 with improved timing; 32-bit offsets for indexing and branching; 32-bit multiply and divide; a system A6 register for system parameters; dedicated stack caches, 256 bytes for A6 and 64 bytes for A7, both user and system; dedicated 256 byte code cache for PC; 4K associative cache for other data; PEA that pushes on A6 or SEA that saves an effective address where the specified address register points.



Wednesday, July 26, 2023

[NOTE] newline insertion on paste bug in gedit

Intermittent bug: 

gedit inserts newlines on paste under certain conditions involving the contents of the search buffer.

gedit --version reports

gedit - Version 3.28.1

from the command line on Ubuntu, uname --all:

Linux {machine-name} 5.4.0-150-generic #167~18.04.1-Ubuntu SMP Wed May 24 00:51:42 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Looking it up, it might be something inherited from the GTK toolkit.

Specific details.

Searching through a text file for lines beginning with a specified string, such as a label in an assembly language source file. Leaving the search string in the search buffer, select, copy, and paste a section of several lines of text including the searched string. 

gedit inserts blank lines (newline character sequences) before occurrences of searched string.

Example:

MESS    DC.L    DOCOL,WARN,AT,ZBRAN
    DC.L    MESS3-*-NATWID
    DC.L    DDUP,ZBRAN    ; -DUP here is a bug from the original 6800 model, at least.
    DC.L    MESS3-*-NATWID
    DC.L    LIT16
    DC.W    4
    DC.L    OFSET,AT,BSCR,SLASH,SUB,DLINE,BRAN
    DC.L    MESS4-*-NATWID
MESS3    DC.L    PDOTQ
    DC.B    6
    DC.B    'err # '    ; 'err # '
    DC.B    0    ; hand align
    DC.L    DOT
MESS4    DC.L    SEMIS

with \nMESS in the search buffer.

But then when I tried it with 

* This should add two 64-bit numbers:
ADD64STK:
    MOVEM.L    (A6)+,D7/D6/D5/D4
ADDLO    ADD.L    D5,D7
ADDHI    ADDX.L    D4,D6
    MOVEM.L    D6/D7,-(A6)
    RTS

it quit inserting the newlines, even in the original example text.





Sunday, July 9, 2023

Website about the TI Series 0100 Calculator Chips

This is another website I want to remember. It gives a lot of detail on early Texas Instruments and related calculators and the electronics that made them work:

http://www.datamath.org/

In particular, this page is dedicated to TI's series 0100 chips, which were in production at the same time as Intel's 4004:

http://www.datamath.org/Chips/TMS0100.htm

 

 

Saturday, May 6, 2023

No, Higher Costs Were Not the Real Reason for the 8088 in the IBM 5150

[This is a reply to a comment on Hackaday: https://hackaday.io/project/190838-ibm-pc-8088-replaced-with-a-motorola-68000]

*****

For some reason, I can't reply directly to your comment with the eejournal opinion piece [https://www.eejournal.com/article/how-the-intel-8088-got-its-bus/], but I suspect my earlier comment was too brief.

Let me try that again:

The 68000 had built-in support for 8-bit peripheral devices, both in the bus signals and the instruction set. Most of the popular implementations, including the Mac, made heavy use of 8-bit parts, and Motorola had application notes on interfacing other companies' 8-bit devices as well as their own. You could mix 8-bit peripherals and 16-bit memory without stretching.

Motorola even had an app-note about interfacing the 68000 directly to 8-bit memory, but any decent engineer would have looked at the note and realized that the cost of 16-bit memory was not really enough to justify hobbling the 68000 to 8-bit memory.  That's one of the reasons the 68008 didn't come out until a couple of years later, and the primary reason that very few people used it. There was no good engineering reason for it.

Well, there was one meaningful cost of 16-bit wide memory: You couldn't really build your introductory entry-level model with 4 kB of RAM using just eight 4 kilobit DRAMs. (cough. MC-10.) You were forced to the next level up, 8 kB.

IBM knew that the cost of RAM was coming down, and that they would be delivering relatively few with the base 16 kB RAM (16 kilobit by 8 wide) configuration. Starting at 32 kB (16 kilobit by 16) would not have killed the product. Similarly, the cost of the 68000 would come down, and they knew that.

Management was scared of that.

Something you don't find easily on the Internet about the history of the IBM Instruments S9000 was when the project started. My recollection was that it started before the 5150. It was definitely not later. It had much more ambitious goals, and a much higher projected price tag, much more in line with IBM's minicomputer series. There was a reason for the time it took to develop and the price they sold it at. But even many of the sales force in the computer industry didn't understand the cost of software and other intangible development costs.

Consider how much damage the 5150 did to IBM's existing desktop and minicomputer lines. Word Processing? WordPerfect was one of the early killer apps for the 8088-based PC. Spreadsheet? Etc.

IBM management knew too well that if they sold the 5150 with a 68000 in it instead of the 8088, a lot of their minicomputer customers were going to be complaining to high heaven about the price difference. They knew the answer, but their experience showed them that too many of the customers would not believe it.

That was the real reason. They hoped the 8088 would be limited enough to give them time to maintain control of the market disruption.

I think they were wrong. But it would have taken a level of foresight and vision that very few in management within IBM had (and very few outside IBM, either) to take the bull by the horns and drive the disruption.

*****

Anyway, my point was that higher cost wasn't the real reason any more than the (at the time, much-rumored) technical deficiencies of the 68000.

Saturday, March 11, 2023

Mapping the Panasonic Let's Note Japanese Keyboard to the Hatari Emulator

I never owned an ST family computer, which might have been a tactical error on my part. It would have been close to an unadorned 68000-based machine in the way that the Radio Shack/Tandy Color Computer was an unadorned 6809-based machine.

I have been using the Atari ST emulator (simulator) Hatari as a platform for converting the fig Forth model for the 6800 to the 68000.

The keyboard on this Japanese Panasonic Let's Note does not map well to the Atari ST keyboard. The default mapping leaves important keys like equal (=) unavailable. (Some keys are available by using the FN key to select the ten-key pad that starts on the 7 key.)

For comparison, the Atari ST (US) keyboard is laid out like this:

!@#$%^&*() _+~
1234567890 -=`

QWERTYUIOP{}
qwertyuiop[] del

ASDFGHJKL:"  |
asdfghjkl;'  \

ZXCVBNM<>?
zxcvbnm,./

And it's precisely around where the equals key is, that the keyboard mapping goes wonky.

I've been doing almost all the source code editing in Gedit, under Ubuntu, just using Hatari for assembling and test runs. But I'm now into some really difficult debugging sessions, and the mapping of the keyboard is getting in the way. 

So today I dug into Hatari's keyboard remapping, invoked something like:

hatari -k keymap.text

on the command line. 

In the source code for Hatari, there are some utilities in the tests/keymap directory for looking at what SDL sees the keyboard generating (if I got this right) -- listkeys.c and checkkeys.c. I downloaded the source code from the git repository, changed to the tests/keymap directory, ran make, and got the executables there. You don't need them in the general path; you can execute them in place with

./listkeys

and 

./checkkeys

They gave me some clues and not much more. Taking a look at the example keymap file in the source was also not very enlightening. And neither was the man page from SDL:

man SDLKey

But after working through those, and after some reading in forums and playing around with Hatari's remapping a bit, I figured out what to put in the keymap file:

Each line consists of the key you want to remap, a comma, and the SDL (?) scancode. 

Sort of. 

Figuring out the scancodes was a bit tricky. First, I tried the codes I learned from the checkkeys and listkeys utilities to see how they would work:

-,45
^,94
@,64
[,91
],93
;,59
:,58
/,47
\,92

The results weren't even close to what I wanted.

So I used a little trick involving perverse keyboard mappings. What I did was line up the alphabet keys and just arbitrarily map almost all of them to sequential codes:

-,45
a,1
b,2
c,3
d,4
f,5
g,6
h,7
j,8
k,9
l,10
m,11
n,12
o,13
p,14
q,15
r,16
s,17
u,18
v,19
w,20
y,21
z,22

(The mapping for hyphen was the one I was able to make sense of from the utilities, but it turned out not to be one I am using.)

This is better than random guessing because it allows testing a bunch of the scan codes at once, and it helps you remember which ones you've tried. You can add the ones that look like they work at the top, like I did with hyphen, so you can test them -- and so you don't forget them.

I think I gleaned one scancode from the sequence 1 to 22, then similarly gleaned a few more from the next set, starting at scancode 23 to about 43 temporarily and perversely mapped to key a through y. 

(I left e, i, t, and x undefined to allow typing the exit command, and, after the first set, I left z un-assigned so I could use ctrl-Z to invoke the debugger.)
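Typing those probe files by hand gets tedious, so a small generator can help. This sketch writes lines in the key,code form described above, skips the letters reserved for typing exit (and optionally z), and puts already-confirmed mappings at the top. The function name and its parameters are invented for the example and are not part of Hatari.

```python
import string


def probe_keymap(first_code, skip="eitx", fixed=()):
    """Return keymap text that maps letter keys to sequential codes.

    first_code: code assigned to the first unskipped letter
    skip: letters left unmapped so commands can still be typed
    fixed: (key, code) pairs already known to work, listed first
    """
    lines = [f"{key},{code}" for key, code in fixed]
    letters = [c for c in string.ascii_lowercase if c not in skip]
    lines += [f"{c},{first_code + i}" for i, c in enumerate(letters)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # First probe set: codes 1 to 22, with the hyphen mapping kept at the top.
    print(probe_keymap(1, fixed=[("-", 45)]), end="")
    # Second probe set: codes 23 to 43, leaving z free for the debugger.
    print(probe_keymap(23, skip="eitxz"), end="")
```

Saving one set of output to keymap.text and starting Hatari with the -k option, as shown earlier, tests a whole block of codes in a single run.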

But I couldn't figure out how to map individual scancodes. Each remapping seems to be done as a pair, which is kind of awkward. 

Ultimately, I used this file:

-,12
[,26
],27
^,13

and, while it makes the equals key available, it maps it to the caret/tilde key -- which leaves caret and tilde unavailable.

It's not a good fit. It matches neither the keycaps on the PC keyboard nor the layout of the Atari ST keyboard. 

But I think it will allow me to proceed with debugging.

So I'll leave this post here for my own notes and post a link here to a Hatari forum for the developers, if I can figure out the appropriate forum.