Misunderstanding Computers

Why do we insist on seeing the computer as a magic box for controlling other people?
Why do we want so much to control others when we won't control ourselves?

Computer memory is just fancy paper, CPUs are just fancy pens with fancy erasers, and the network is just a fancy backyard fence.

(original post -- defining computers site)

Friday, April 21, 2017

Model Boot-up Process Description, with Some References to Logging

This is a description of a model boot-up process for a device that contains a CPU, with some references to logging.

(This is a low-level follow-up to these posts, which may provide more useful information.)

This is just a rough model, a rough ideal, not a specification. Real devices will tend to vary from this model. It's just presented as a framework for discussion, and possibly as a model to refer to when documenting real hardware.



(1) Simple ALU/CPU test.

The first thing the CPU should do on restart is check the Arithmetic-Logic Unit, not in the grand sense, but in a limited sense.

Something like (assuming an 8-bit binary ALU) adding 165 to 90 and checking that the result comes out 255 (A5₁₆ + 5A₁₆ == FF₁₆), and then adding 1 to the result to see if the result is 0 with a carry, would be a good, quick check. This would be roughly equivalent to trying to remember what day it is when you wake up, then checking to see that you remember what the day before and the day after are.

It doesn't tell you much, but it at least tells you that your brains are trying to work.
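Just to make the idea concrete, here is a minimal sketch of such a check, written in C for readability. A real implementation would be a handful of machine instructions at the reset vector, and the set_diag_flag() and halt_safely() routines here are placeholders for whatever the actual hardware provides, not a real API:

    #include <stdint.h>

    /* Placeholders for hardware-specific actions -- assumptions, not real APIs. */
    extern void set_diag_flag(uint8_t code);
    extern void halt_safely(void);

    void alu_sanity_check(void)
    {
        volatile uint8_t a = 0xA5;   /* 165 */
        volatile uint8_t b = 0x5A;   /*  90 */
        uint8_t sum = (uint8_t)(a + b);

        if (sum != 0xFF) {           /* expect 255 */
            set_diag_flag(1);
            halt_safely();
        }

        /* Adding 1 should roll over to 0 with a carry out of an 8-bit ALU. */
        uint16_t wide = (uint16_t)sum + 1u;
        if ((uint8_t)wide != 0x00 || (wide >> 8) != 1u) {
            set_diag_flag(2);
            halt_safely();
        }
    }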

* If the ALU appears to give the wrong result, there likely won't be much that can be done -- maybe set a diagnostic flag and halt safely.

* In some devices, halting itself is not safe, and an alternative to simply halting, such as having the device securely self-destruct, may be safer. Halting safely may have non-obvious meanings.

Now, it's very likely that this test can be made a part of the next step, but we need to be conscious of it.

(2) Initial boot ROM test.

There should be an initial boot ROM that brings the CPU up. The size should be in the range of 1,000 instructions to 32,768 instructions.

Ideally, I would strongly suggest that it contain a bare-metal Forth interpreter as a debugger/monitor, but it may contain some other kind of debug/monitor. It may just contain a collection of simple Basic Input-Output library functions, but I personally do not recommend that. It needs to have some ability to interact with a technician.

And, of course, it contains the machine instructions to carry out the first several steps of the boot-up process.

This second step would then be to perform a simple, non-cryptographic checksum of the initial boot ROM.

Which means that the ROM contains its own test routines. This is clearly an example of chicken-and-egg logical circularity. It is therefore not very meaningful.

This is not the time for cryptographic checksums.
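To make that concrete, the non-cryptographic check can be something as simple as an additive checksum over the ROM image. A rough sketch, with placeholder values for the ROM's address, length, and expected sum (a real ROM would typically store the expected value outside the summed range, or arrange for the whole image to sum to a known constant):

    #include <stdint.h>
    #include <stddef.h>

    /* Assumed layout -- the address, length, and expected value are placeholders. */
    #define BOOT_ROM_BASE   ((const uint8_t *)0xFFFF0000u)
    #define BOOT_ROM_LENGTH 4096u
    #define BOOT_ROM_SUM    0x1234u   /* recorded when the ROM image was built */

    static uint16_t rom_checksum(const uint8_t *p, size_t n)
    {
        uint16_t sum = 0;
        while (n--) {
            sum = (uint16_t)(sum + *p++);   /* simple additive checksum, mod 65536 */
        }
        return sum;
    }

    int boot_rom_ok(void)
    {
        return rom_checksum(BOOT_ROM_BASE, BOOT_ROM_LENGTH) == BOOT_ROM_SUM;
    }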

* Success does not mean that the CPU is secure or safe. Failure, on the other hand, gives us another opportunity to set a diagnostic flag for a technician to find, and halt safely, whatever halting safely means.

On modern, highly integrated CPUs, this ROM is a part of the physical CPU package. It should not be re-programmable in the field.

(That's one reason it should be small -- making it small helps reduce the chance for serious bugs that can't be fixed. This smallest part of the boot process cannot be safely re-written and cannot safely be allowed to be overridden.)

For all that it should not be re-programmable in the field, the source should be available to the end-administrator, and there should be some means of verifying that the executable object in the initial boot ROM matches the source that the vendor says should be there.

(3) Internal RAM check.

Most modern CPUs will have some package internal RAM, distinct from CPU registers. It is a good idea to check these RAM locations at this point, to see that what is written out can be read back, using bit patterns that can catch short and open circuits in the RAM.

Just enough RAM should be tested to see that the initial boot-up ROM routines can run safely. If the debug/monitor is a Forth interpreter, it should have enough return stack for at least 8 levels of nested call, 16 native integers on the parameter stack, and 8 native integers of per-user variable space. That's 32 cells of RAM, or room for 32 full address words, in non-Forth terminology.

(I'm speaking roughly, more complex integrated packages will need more than that, much more in some cases. Very simple devices might actually need only half that. The engineers should be able to determine actual numbers from their spec. If they can't, they should raise a management diagnostic flag and put the project in a wait state.)
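For what it's worth, here is a rough sketch of the kind of pattern test meant here, again in C for readability. The alternating 0x55/0xAA patterns catch stuck and shorted data lines, and the address-in-address pass catches address-line faults. The base address and size are placeholders, not anybody's real part:

    #include <stdint.h>
    #include <stddef.h>

    /* Placeholder location and size of the internal RAM to be tested. */
    #define INT_RAM_BASE ((volatile uint8_t *)0x20000000)
    #define INT_RAM_SIZE 128u   /* just enough for the monitor's stacks and variables */

    int internal_ram_ok(void)
    {
        volatile uint8_t *ram = INT_RAM_BASE;
        size_t i;

        /* Alternating bit patterns catch stuck bits and shorted data lines. */
        static const uint8_t patterns[] = { 0x55, 0xAA, 0x00, 0xFF };
        for (size_t p = 0; p < sizeof patterns; p++) {
            for (i = 0; i < INT_RAM_SIZE; i++) ram[i] = patterns[p];
            for (i = 0; i < INT_RAM_SIZE; i++)
                if (ram[i] != patterns[p]) return 0;
        }

        /* Writing each cell's own (low byte of) address catches address-line faults. */
        for (i = 0; i < INT_RAM_SIZE; i++) ram[i] = (uint8_t)i;
        for (i = 0; i < INT_RAM_SIZE; i++)
            if (ram[i] != (uint8_t)i) return 0;

        return 1;
    }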

* Again, if there are errors, there is not much we can do but set a diagnostic flag and do our best to halt safely, whatever halting safely means.

(4) Lowest level diagnostic firmware.

At this point, we can be moderately confident that the debug/monitor can safely be entered, so it should be entered and initialize itself.

The next several steps should run under the control of the debug/monitor.

* Again, if the debug/monitor fails to come up in a stable state, the device should set a diagnostic flag and halt itself as safely as possible.

** This means that the debug/monitor needs a resident watchdog cycle that will operate at this level.

(5) First test/diagnostic device.

We want a low-level serial I/O (port) device of high reliability, through which the technician can read error messages and interact with the debug/monitor.

(Parallel port could work, but it would usually be a waste of I/O pins for no real gain.)

* This is the last point where we want to just set a diagnostic flag and halt as safely as possible on error. Any dangerous side-effects of having started the debug port should be addressed before halting safely at this stage.
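In practice, "high reliability" here usually means a dumb, polled serial transmitter with no interrupts and no buffering to go wrong. A sketch, with made-up register addresses and bit positions standing in for whatever the real part documents:

    #include <stdint.h>

    /* Hypothetical memory-mapped UART registers -- placeholders only. */
    #define UART_STATUS   (*(volatile uint8_t *)0x4000F000)
    #define UART_DATA     (*(volatile uint8_t *)0x4000F004)
    #define UART_TX_READY 0x01u

    static void diag_putc(char c)
    {
        while (!(UART_STATUS & UART_TX_READY))
            ;                       /* spin until the transmitter is free */
        UART_DATA = (uint8_t)c;
    }

    void diag_puts(const char *s)
    {
        while (*s) {
            if (*s == '\n') diag_putc('\r');   /* be friendly to dumb terminals */
            diag_putc(*s++);
        }
    }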

(6) Full test of CPU internal devices.

This step can be performed somewhat in parallel with the next step. Details are determined by the internal devices and the interface devices. Conceptually, however, this is a separate step.

All internal registers should be tested to the extent that it is safe to test them without starting external devices. This includes being able to write and read any segment base and limit/extent registers, but does not actually include testing their functionality.

If the CPU provides automatic testing, this is probably the stage where it should be performed (which may require suspending or shutting down, then restarting the monitor/debug processes).

Watchdog timers should be checked to the extent possible and started during this step.

If there is internal low-level ROM that remains to be tested, or if management requires cryptographic checksum checks on the initial boot ROM, this is the stage to do those.

Note that the keys used here are not, repeat, not the manufacturer's update keys. Those are separate.

However, for all that management might require cryptographic self-checks at this stage, engineers should consider such checks to be exercising the CPU and looking for broken hardware, and not related to security. There should be a manufacturer's boot key, and the checksums should be performed with the manufacturer's boot key, since the initial boot ROM is the manufacturer's code.
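If management does insist on the keyed check, it amounts to computing a keyed MAC over the initial boot ROM with the manufacturer's boot key and comparing it against a recorded value. A sketch, assuming some hmac_sha256() routine and the storage locations shown here exist in the ROM -- the names are placeholders for illustration, not a real library:

    #include <stdint.h>

    /* Placeholder locations and routine -- assumptions for illustration only. */
    extern const uint8_t  boot_rom_image[];     /* the initial boot ROM contents   */
    extern const uint32_t boot_rom_length;
    extern const uint8_t  mfg_boot_key[32];     /* hidden as discussed below       */
    extern const uint8_t  expected_mac[32];     /* recorded when the ROM was built */
    extern void hmac_sha256(const uint8_t *key, uint32_t key_len,
                            const uint8_t *msg, uint32_t msg_len,
                            uint8_t mac_out[32]);

    int boot_rom_mac_ok(void)
    {
        uint8_t mac[32];
        uint8_t diff = 0;

        hmac_sha256(mfg_boot_key, sizeof mfg_boot_key,
                    boot_rom_image, boot_rom_length, mac);

        /* Constant-time compare, so the check doesn't leak how many bytes matched. */
        for (uint32_t i = 0; i < sizeof mac; i++)
            diff |= (uint8_t)(mac[i] ^ expected_mac[i]);
        return diff == 0;
    }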

How to hide the manufacturer's boot key should be specified in the design, but, if the test port enabled in step (5) allows technician input at this step, such efforts to hide the manufacturer's key can't really prevent attack, only discourage attack.

Even if the device has a proper system/user separation, the device is in system state right now, and the key has to be readable to be used.

The key could be encrypted and hidden, spread out in odd corners of the ROM. There could be two routines to read it, and the one generally accessible through the test port could be protected by security switch/strap and/or extra password. But the supervisor, by definition, allows the contents of ROM to be read and dumped through the test port at this stage. A determined engineer would be able to analyze the code and find the internal routine, and jump to it. Therefore, this raises the bar, but does not prevent access.

Another approach to raising the bar is the provision of a boundary between system/supervisor mode and key-access mode. The supervisor could use hardware to protect the key except when in key-access mode, and could use software to shut down the test port when key-access mode is entered. This would make it much more difficult to get access to supervisor commands while the key is readable, but there are probably going to be errors in the construction that allow some windows of opportunity. It is not guaranteed that every design will be able to close off all windows of opportunity.

Such efforts to protect the boot key may be useful. They do raise the bar. But they do not really protect the boot key, only discourage access.

And legal proscriptions such as that epitome of legal irony called DMCA do not prevent people who ignore the law from getting over the bar.

Thus, the key used to checksum the initial boot ROM must not be assumed to be unknown to attackers. (And, really, we don't need to assume it is unknown, if we don't believe in fairy tales about protecting intellectual property at a distance -- as long as this initial boot ROM can't be re-written, and as long as the update keys are separate.)


The extra ROM, if it exists, should not be loaded yet, only tested.

If extra RAM is required to do the checksums, the RAM should be checked first, enough to perform the checksums.

All remaining internal RAM should be checked at this stage.

(7) Low-level I/O subsystems.

Finally, the CPU package is ready to check its own fundamental address decode, data and address buffers, and so forth. Not regular I/O devices, but the devices that give it access to low-level flash ROM, cache, working RAM, and the I/O space, in that order.

They should be powered up and given rudimentary tests.

Note that the flash ROM, cache, working RAM, and I/O devices themselves should not yet be powered up, much less tested.

Only the interfaces are powered up and tested at this step, and they must be powered up in a state that keeps the devices on the other side powered down.

* On errors here, any devices enabled to the point of error should be powered down in whatever order is safe (often in reverse order of power-up), diagnostic messages should be sent through the diagnostic port, and the device should set a diagnostic flag and enter as safe a wait state as possible.

** It may be desirable to enter a loop that repeats the diagnostic messages.

It would seem to be desirable to provide some way for a technician to interrogate the device for active diagnostic messages.

** But security will usually demand that input on the diagnostic port be shut down unless a protected hardware switch or strap for this function is set to the enabled position/state. This is one of several such security switch/straps, and the diagnostic message will reflect the strap's state to some extent.

This kind of security switch or strap is not perfect protection, but it is often sufficient, and is usually better than nothing. (Better than nothing if all involved understand it is not perfect, anyway.)

** In some cases, the security switch/straps should not exist at all, and attempts to find or force them should be met with the device's self-destruction. In other cases, lock and key are sufficient. In yet other cases, such as in-home appliance controllers, a screw panel may be sufficient, and the desired level of protection.

Straps are generally preferred to switches, to discourage uninformed users from playing with them.

*** However, attempts to protect the device from access by the device's legal owner or lawfully designated system administrator should always be considered highly suspect, and require a much higher level of engineering safety assurance. If the owner/end-admin user must be prevented from maintenance access, it should be assumed that the device simply cannot be maintained -- thus, quite possibly should self-destruct on failure.

(8) Supervisor, extended ROM, internal parameter store.

The initial boot ROM may actually be the bottom of a larger boot ROM, or there may be a separate boot ROM containing more program functions, such as low-level supervisor functions, to be loaded and used during initial boot up. This additional ROM firmware, if it exists, should be constructed to extend, but not replace the functionality in the initial boot ROM.

Since this extra initial boot ROM was tested in step (6), it should be possible to begin loading and executing things from it now. It would contain the extensions in stepped modules, starting with the modules necessary to support the bootstrap process as it proceeds.

Considering the early (classic) Macintosh, a megabyte of ROM should be able to provide a significant level of GUI interface for the supervisor, giving end-admins with a primarily visual orientation an improved ability to handle low-level administration. But since we don't have display output at this point, such functionality should be oriented toward the technician's serial port at this stage.

This supervisor would also contain the basic input/output functionality, so it could be called, really, a true "Basic Input/Output Operating System" -- BIOOS. But that would be confusing, so let's not do that. Let's just call it a supervisor.

It could also contain "advanced" hooks and virtual OS support such as a "hypervisor" has, but we won't give in to the temptation to hype it. It's just a supervisor. And most of it will not be running yet.

This remaining initial boot ROM is not an extension boot ROM such as I describe below, but considered part of the initial boot ROM.

There should be internal persistent store that is separate from the extension boot (flash) ROM, to keep track of boot parameters such as the owner's cryptographic keys and the manufacturer's update cryptographic keys for checksumming the extension flash ROM, passwords, high-level boot device specification, etc. It should all be write protected under normal operation. The part containing the true cryptographic keys for the device and such must be both read- and write-protected under normal operation, preferably requiring a security switch/strap to enable write access.

Techniques for protecting these keys have been partially discussed above. The difference is that these are the owner's keys and update keys, and those are the manufacturer's boot keys.


This parameter store should be tested and brought up at this point.

Details such as how to protect it, how to enable access, and what to do on errors are determined by the engineers' design specification.

In the extreme analysis, physical access to a device means that anything it contains can be read and used. The engineering problem is the question of what kinds of cryptological attacks are expected, and how much effort should be expended to defend the device from unauthorized access.

Sales literature and such should never attempt to hide this fact, only assert the level to which they are attempting to raise the bar.

Again, attempts to protect the device from access by the legitimate owner/end-admin should be considered detrimental to the security of the device.

* At this point, reading the owner's keys and update keys from the test port should be protected by security switch/strap and password. But, again, until the boot process has proceeded far enough to be able to switch between system and user mode, the protections have to be assumed to be imperfect.

Providing a key-access mode such as described above for the manufacturer's key should mitigate the dangers and raise the bar to something reasonable for some applications, but not for all.

Some existing applications really should never be produced and sold as products.

(As an example, consider the "portable digital purse" in many cell phones. That is an abomination. Separated from the cell phone, it might be workable, but only with specially designed integrated packages, and only if the bank always keeps a copy of the real data. Full discussion of that is well beyond the scope of this rant.)

(9) Private cache.

If there is private cache RAM local to the first boot CPU, separate from the internal RAM, it should be tested now. Or it could be scheduled and set to run mostly in a lesser privileged mode after lesser privileged modes are available.

If there are segment base and limit/extent registers, their functionality may be testable against the local cache.

In particular, if the stack register(s) have segment base and limit, and can be pointed into cache, it might be possible to test them and initialize the stacks into such cache here, providing some early stack separation.

If dedicated stack caches are provided in the hardware, they should be tested here. If they can be used in locked mode (no spills, deep enough), the supervisor should switch to them now.

* Errors at this point will be treated similarly to errors in step 7.

(10) Exit low-level boot and enter intermediate level boot process.

At this point, all resources owned by the boot-up CPU should have been tested.

Also, at this point, much of the work can and should be done in less secure modes of operation. The less time spent in system/supervisor mode, the better.

(10.1) Testing other CPUs.

If there are multiple CPUs, this is the step where they should be tested. The approach to testing the CPUs depends on their design, whether they share initial boot ROMs or are under management of the initial boot CPUs, etc.

From a functional point of view, it is useful if the first boot CPU can check the initial boot ROMs of the other CPUs before powering them up, if those ROMs are not shared. It may also be useful for the first boot CPU to initiate internal test routines on the others, and monitor their states as they complete.

At any rate, as much as possible should be done in parallel here, but care should be exercised to avoid one CPU invalidating the results of another.

* Again, errors at this point will be treated similarly to errors in step 7.

(10.2) Testing shared memory management hardware access, if it exists.

While waiting for the other CPUs to come up, any true memory management hardware should be tested and partially initialized.

At this point, only writing and reading registers should be tested, and enough initialization to allow un-mapped access.

* Again, errors at this point will be treated similarly to errors in step 7. MMU is pretty much vital, if it exists.

(10.3) Finding and testing shared RAM.

Shared main RAM should be searched for before shared cache.

As other CPUs come up, they can be allocated to test shared main RAM. (Really, modern designs should go to multiple CPUs before going to a larger address space or faster CPUs.) If there are multiple CPUs, testing RAM should be delegated to CPUs other than the first boot CPU.

This also gets tangled up in testing MMU.

Tests should be run first without address translation, then spot-checked with address translation.

As soon as enough good RAM has been found to support the return address stack and local variable store (one stack in the common case now, but preferably two in the future, a thread heap and a process heap), the supervisor OS, to the extent it exists, should be started now if it has not already been started. (See the next step.)

Otherwise, parallel checks on RAM should proceed without OS support.

Either way, the boot ROM should support checking RAM in the background as long as the device is operational. RAM which is currently allocated would be left alone, and RAM which is not currently allocated would have test patterns written to it and read back, helping erase data that programs leave behind.

Such concurrent RAM testing would be provided in the supervisor in the initial boot up ROM, but should run in a privilege-reduced state (user mode instead of system/supervisor).
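A sketch of what that reduced-privilege background test task might look like, as a loop over whatever the allocator says is currently free. The page-walking, logging, and yield calls are placeholders for the supervisor's own primitives, and a real version would have to lock each page against allocation while it is being scrubbed -- that detail is elided here:

    #include <stdint.h>
    #include <stddef.h>

    /* Placeholder supervisor primitives -- assumptions, not a real API. */
    extern size_t free_page_count(void);
    extern volatile uint8_t *free_page_address(size_t index);  /* NULL if now allocated */
    extern void log_ram_error(volatile uint8_t *addr);
    extern void yield(void);                                    /* give up the CPU      */

    #define PAGE_SIZE 4096u

    void ram_scrub_task(void)
    {
        for (;;) {                                   /* runs as long as the device is up */
            for (size_t p = 0; p < free_page_count(); p++) {
                volatile uint8_t *page = free_page_address(p);
                if (page == NULL) continue;          /* page was allocated meanwhile     */

                for (size_t i = 0; i < PAGE_SIZE; i++) {
                    page[i] = 0xAA;
                    if (page[i] != 0xAA) log_ram_error(&page[i]);
                    page[i] = 0x55;
                    if (page[i] != 0x55) log_ram_error(&page[i]);
                    page[i] = 0x00;                  /* leave no stale data behind       */
                }
                yield();                             /* stay out of real work's way      */
            }
            yield();
        }
    }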

* Usually, errors in RAM can be treated by slowing physical banks down until they work without errors, or by mapping physical banks out. Again, a log of such errors must be kept, and any errors in RAM should initiate a RAM checking process that will continue in the background as long as the device is running.

** If there are too many errors at this point, they may be treated similarly to errors in step 7.

*** Any logs kept in local RAM should be transferred to main RAM once enough main RAM is available (and known good).

(10.4) Testing shared cache.

As other CPUs come up, they can also be allocated to testing shared cache. As with testing main RAM, testing cache should be delegated to CPUs other than the first boot CPU. Also, main RAM comes before cache until there is enough known-good RAM to properly support multiple supervisor processes.

And this also gets tangled up in testing MMU.

Tests should be run first without being assigned to RAM, then again with RAM assigned.

* If there are errors in the cache, it might be okay to disable or partially disable the cache. Engineers must make such decisions.

** Errors at this point may still be treated similarly to errors in step 7, depending on engineering decisions. If it is acceptable to run with limited cache, or without cache, some logging mechanism that details the availability of cache must be set up. Such logging would be temporarily kept in internal RAM.

*** The decision about when to enable cache is something of an engineering decision, but, in many cases, once cache is known to be functional, and main RAM has also been verified, the cache can be put into operation.

In some designs, caches should not be assigned to RAM that is still being tested.

(11) Fully operational supervisor.

At this point, most of the remaining functionality of the supervisor (other than GUI and other high-level I/O) should be made available. Multi-tasking and multi-processing would both be supported (started in the previous step), with process management and memory allocation.

One additional function may become available at this point -- extending the supervisor via ROM or flash ROM.

If there is an extension ROM, the initial boot ROM knows where it is. If it is supposed to exist, the checksum should be calculated and confirmed at this point.

The key to use depends on whether the extension has been provided by the manufacturer or the end-user/owner. Manufacturer's updates should be checked with the update key (not the boot key), and owner's extensions should be checked with the owner's key.

Failure would result in a state such as in step (7).

Testing the extension proceeds as follows:

There are at least two banks of flash ROM. In the two bank configuration, one is a shadow bank and the other is an operational bank.

If the checksum of the operational bank is the same as the unwritable extension ROM, the contents are compared. If they are different, the operational bank is not loaded, and the error is logged and potentially displayed on console.

If the checksum of the operational bank is different from the unwritable ROM, it is checked against the shadow bank. If the shadow bank and the operational bank have the same checksum, the contents of the two are compared. If the contents are different, the operational bank is not loaded and the error is logged and potentially displayed on console.

If the contents are identical, the cryptographic checksum is checked for validity. If it is not valid, the operational bank is not loaded, and the error is logged and potentially displayed on console.

* If the operational bank verifies, it is loaded and boot proceeds.

** If the operational bank fails to verify, a flag in the boot parameters determines whether to continue or to drop into a maintenance mode.

If the device drops into a maintenance mode, the test port becomes active, and a request for admin password is sent out it. A flag is set, and boot proceeds in a safe mode, to bring up I/O devices safely.
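Pulling those checks together, one conservative reading of the procedure gives decision logic roughly like this. All the helper routines are placeholders (assumptions) for whatever the supervisor actually provides, not a real API:

    /* Sketch of the extension-bank verification described above. */
    extern int  same_checksum(int bank_a, int bank_b);   /* non-cryptographic         */
    extern int  same_contents(int bank_a, int bank_b);   /* byte-wise compare         */
    extern int  crypto_checksum_valid(int bank, const void *key);
    extern const void *select_key(int bank);             /* update key or owner's key */
    extern void log_and_maybe_display(const char *msg);
    extern void load_extension(int bank);

    enum { UNWRITABLE_ROM, OPERATIONAL_BANK, SHADOW_BANK };

    int verify_and_load_extension(void)
    {
        if (same_checksum(OPERATIONAL_BANK, UNWRITABLE_ROM)) {
            if (!same_contents(OPERATIONAL_BANK, UNWRITABLE_ROM)) {
                log_and_maybe_display("operational bank matches ROM checksum but not contents");
                return 0;
            }
        } else {
            if (!same_checksum(OPERATIONAL_BANK, SHADOW_BANK) ||
                !same_contents(OPERATIONAL_BANK, SHADOW_BANK)) {
                log_and_maybe_display("operational bank does not match shadow bank");
                return 0;
            }
        }

        if (!crypto_checksum_valid(OPERATIONAL_BANK, select_key(OPERATIONAL_BANK))) {
            log_and_maybe_display("operational bank fails cryptographic checksum");
            return 0;
        }

        load_extension(OPERATIONAL_BANK);
        return 1;
    }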

(When the operational bank is updated, the checksum checked and verified, and committed, the operational bank is copied directly onto the shadow bank. But that discussion is not part of this rant.)


Other approaches can be taken to maintain a valid supervisor. For instance, two shadow copies can be kept to avoid having to restore the factory extensions and go through the update process again from scratch.

The extensions can override much of the initial boot ROM, but the monitor/debugger must never be overridden. It can be extended in some ways, but it must not be overridden.

There should be no way to write to this flash ROM except by setting another protected hardware switch or strap which physically controls the write-protect circuit for the flash. This switch or strap should not be the same as mentioned in step (7), but may be physically adjacent to it, depending on the engineers' assessment of threat.

*** The initial boot ROM should not proceed to the flash ROM extensions unless said switches or straps are unset.

(12) I/O devices.

(12.1) Locating and testing normal I/O device controllers.

As known good main RAM becomes available, the boot process can shift to locating the controllers for normal I/O devices such as network controllers, rotating disk controllers, flash RAM controllers, keyboards, printers, etc.

There may be some priority to be observed when testing normal I/O device controllers, as to which to initiate first.

It also may be possible to initiate controller self-tests or allocate another CPU to test the controllers, so that locating the controllers and testing them can be done somewhat in parallel.

Timers and other such hardware resources would be more fully enabled at this point.

* Errors for most controllers should be logged, and should not cause the processor to halt. 

(12.2) Identifying and testing devices.

As controllers become available and known good, the devices attached to them should be identified, initialized, and tested.

This might also occur in parallel with finding and testing other controllers.

* Errors for most devices should be logged, and should not cause the processor to halt. 

** Some intelligence about the form and number of logs taken at this point can and should be exercised. We don't want RAM filled with messages that, for example, the network is unavailable. One message showing when problems began, and a count of error events, with a record of the last error, should be sufficient for most such errors.
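One simple way to get that behavior is a per-error-source record that keeps the first occurrence, the most recent occurrence, a count, and the last message, rather than a log line per event. A sketch, with the time source left as a placeholder:

    #include <stdint.h>

    /* Placeholder time source -- an assumption for illustration. */
    extern uint32_t uptime_ticks(void);

    struct error_record {
        uint32_t first_seen;    /* when the problem began          */
        uint32_t last_seen;     /* most recent occurrence          */
        uint32_t count;         /* how many times it has happened  */
        char     last_msg[64];  /* text of the most recent error   */
    };

    void note_error(struct error_record *rec, const char *msg)
    {
        uint32_t now = uptime_ticks();
        uint32_t i;

        if (rec->count == 0)
            rec->first_seen = now;
        rec->last_seen = now;
        rec->count++;

        /* keep only the latest message, truncated to fit */
        for (i = 0; i + 1 < sizeof rec->last_msg && msg[i]; i++)
            rec->last_msg[i] = msg[i];
        rec->last_msg[i] = '\0';
    }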

(12.3) Low-level boot logging.

As video output and persistent store become available, error events should be displayed on screen and recorded in an error message partition. Again, there should be a strategy to avoid filling the error message partition, and to allow as many error notifications as possible to remain on screen.

If the device is booting to maintenance mode, and an admin has not logged in via the test port at this point, the video device may present a console login prompt/window, as well.  Or it may present one for other reasons, such as a console request from the keyboard.

The video display could also have scrolling windows showing current system logs.

Also, parameter RAM flags may prevent console login to a local video device/keyboard pair, requiring admin login at the test port via some serial terminal device.

(12.4) High-level boot.

The supervisor would have hooks and APIs to present walled virtual run-time sessions to high-level OSses, including walled instances of itself and walled instances of Linux or BSD OSses, or Plan Nine, etc., to the extent the CPU can support such things, and to the extent the device is designed to support such things.

And parameter RAM would have flags to indicate whether a boot menu should be provided, or which high-level OSses available should boot.

If walled instances are not supported, only a single high level OS would be allowed to boot, and the supervisor would still map system calls from the high-level OS into device resources.



This is my idea of what should happen in the boot-up process. Unfortunately, most computers I am familiar with do a lot of other stuff and not enough of this.

Friday, December 2, 2016

What Is Computer Security? (Short Version)

The detailed version of this, where I plan to talk about different kinds of complexity and intractable (NP-complete) problems and such, has gotten side-tracked and mired in details.

There's a bit of irony there.

But "cybersecurity" and similar nonsense is getting bandied about in the information stream again, so I'm thinking I should try writing the short version anyway.

That means I make a bunch of assertions and fail to back them up. And leave out a bunch of references to automata and algebrae and ideal machines, etc. And, at the end, I really can't make some of the stronger arguments here.

But things need to be said, even if only ten or twenty people out of the billions in the world will ever read this rant.

First, something I use as a siggie when I post from my tablet:

(I've done the first part of this rant before.)

A CPU is just a fancy pen.

Computer memory is just fancy paper.

The network is just a fancy backyard fence.

What's fancy about the paper? 


It can be erased and written over and over, very quickly.

It has a built-in structure, so that the pen can (must) specify where it reads and writes.

It has a regular granularity, too, but I'll skip talking about that here. It isn't unimportant, but I don't have simple ways to talk about that today. And there is plenty to talk about without it.

Now, when we talk about computer memory in this way, we include the high-speed RAM, hard disks, flash and other non-volatile semiconductor storage (USB drives, SD, etc.), and such.

In the extreme, a printer that can scan what it printed can be used as a kind of memory.

And here's another hard-to-understand bit -- the network is, in a sense, a storage device. But it is different enough that we need to treat it somewhat separately.

What's fancy about the pen?

It can read as well as write.

It can't understand much about what it reads and writes, but it can read.

It can perform simple arithmetic between reading and writing.

Some people don't understand that the logic that CPUs perform is actually a simpler kind of arithmetic than even addition, subtraction, multiplication, division, and copying, but it is. Reading an address and using that address to access a specific part of memory is also not, I repeat, not hard math.

Everything a computer does is built on simple arithmetic.

Which leads to one more thing. A computer can read a list of pre-defined simple mathematical operations off of that fancy paper and perform that list. It can follow a list of simple instructions.

Haven't you always, secretly, wished you could find someone who would do exactly what you told them and nothing more, nothing less, and not complain?

Heh. When we start programming computers, there's a certain thrill in finding that computers follow the instructions we give them blindly and exactly. And then we discover that we often (usually?) don't really understand the instructions we are giving them until the computer has complained at us at length.

And until we and the computer have wasted a lot of time and other resources doing what we said instead of what we meant. And then we get our backs up. "Who gave this stupid computer permission to talk back?" And we get stubborn. And proud.

Hubris was once considered a necessary attribute in computer programmers.

Bill Gates demonstrates quite a bit of hubris.

Steve Jobs sometimes demonstrated hubris, but he moderated his megalomania with a sense of what the limits of technology were, and with a sense of the limits of general human ability to make good use of the technology at the time.

What's fancy about the backyard fence?

Somebody used to say that the Internet was just a fancy telephone directory. I think I even said that a few times.

There is part of the Internet that is, in fact, very similar to a telephone directory, and web sites can look very much like extended function advertisements in telephone directories, but there's more to the Internet than just that.

It's fun to talk to our neighbor across the backyard fence. It can also be useful. And sometimes several of your neighbors can gather at the fence.

Networks can be big. The Internet is really big. Lots of people can get together and talk.

And you can use things like "social networking" to select just a part of the big neighborhood to talk to.

(The telephone directories, yes, they are the basis of the social networking organizations. I'm giving a lot of control over my life to Google by using their products. LinkedIn is a little more up front about this and a lot less reaching, but then they let Google do a lot of the reaching for them.)

In addition to the directory, the Internet is composed of billions of host machines. The concept that each computing device attached to the Internet is actually a host has been watered down and swept under the rug by a whole host of interested parties who think they can't profit if they can't pipe everyone through their application. But without those hosts, there is nothing to put in the directories.

The Internet is often described as a highway. More accurately, it would be a large system of interconnected highways and byways and sometimes the local roads. (Oh, you take the highroad and I'll take the low road. ...) More correctly, the Internet is the set of rules by which traffic on the highways and the byways and the local roads gets routed. That includes a stripped-down directory service called the domain name system (DNS), which most users of the Internet only work with indirectly.

Access to the various hosts is at best only as reliable as the directory.

When we humans want to go somewhere, we use a higher-level directory for a variety of reasons. Mostly, we don't want to remember details like those complicated-looking URLs and URIs and IP addresses that the domain name system deals with.

Even the name of the DNS causes us to block. What is a domain and why should we be worried? (I should expand on that in this rant, but I evidently didn't.)

So we rely on Google and Bing and Yahoo and (if you are a certain type of geek) Duck Duck Go, and so forth.

Bing. You do understand that Microsoft wants to get you to use Bing for everything without realizing you are using their search engine, exactly as Bill Gates and Co. wanted to get you to use Internet Explorer for everything without realizing you were using their universal browser?

And Google's end-play becomes more understandable, right?

The high-level web is at best only as reliable as the search engine we let guide us around it.

All of this talk about the "semantic web" is little more than a ruse to get us to let other people do the work we should be doing for ourselves (in finding meaning in things). Search engines are good, but we should not turn all our maps over to one or two or even just a few institutions, even if those institutions are privately held companies rather than government or para-governmental entities.

We have to take a certain amount of responsibility for what we do, or why even bother doing it? -- much less bother doing it right?

Security --

Computers are just fancy pens and paper and a fancy backyard fence ...

From here, let's use a different metaphor for the backyard fence.

Computers are just fancy pens and paper and a fancy bulletin board ...

Well, that metaphor is not quite complete, either. Highway system? Getting closer, but the metaphor gets more mixed than it was originally. Let's imagine a very large backyard fence with cats walking along the top, taking messages up and down the fence, and a fancy bulletin board that can be copied from and to any particular place on the fence.

Getting closer, if your mind hasn't imploded.

Anybody who can get access to your pen can write anything on your paper. And read what has been written. Anyone who can get direct access to your paper doesn't need your pen.

And anyone who can get access to your piece of fence can write whatever they want on your phone book -- uhm -- directory.

That fence is a highway. Are you going to put your company safe out on the highway? Are you going to put your internal-use-only phone book out on the highway?

There are precisely two ways to protect your paper, including your directory.

One is to keep your computer away from the backyard fence, but then it can't access the neighborhood.

The other is to use encryption, but you have to use that correctly.

How does encryption help us stay somewhat secure?

If your paper is encrypted, and only you know the key to the encryption, what happens when someone else writes something on your paper?

It's not encrypted with your key.

If it's written on your paper and it's not encrypted with your key, you can guess you didn't write it. (Well, maybe you wrote it when you were too sleepy to use your key and encryption system.)

Now you know why the OS companies lately seem to be encouraging you to use encrypted file systems, whatever those are. This is what they are trying to help you deal with. Otherwise, you will rightly blame them for telling you the OS is secure when it isn't.

But someone sneaky could guess your key. Or they could find some way to watch you using it and copy it. (Think of the guy behind you at the ATM, watching you punch in your PIN out of the corner of his eye, as one example. Good thing your bank shuffles the keypad around, isn't it? Or did you not realize that was why they were doing that?)

This is why everyone has been telling you to make up "hard" passwords. Don't use easy-to-guess or easy-to-read-over-your-shoulder passwords, PINs, and so forth.

 Oh, wait. You don't encrypt that stuff. Encryption is hard work. You let your computer do all the work.

But if your computer does all the encryption for you, nothing is gained. It will also do the encryption for the neighbor who sneaks up to the fence while you aren't watching.

Even if you do it right, access with knowledge of the password you use allows the unauthorized neighbor the ability to write what he or she wants and read what he or she wants. (That's why you protect your password, right?)

There is also an implication about knowing where to read and write, but that is not as hard as we would wish. The CPU can help the nosy neighbor with that pretty much as easily as it can help us, the rightful owners of the data. (I'll have to rant about that in more detail sometime, too.)

Okay, now you are beginning to see how important passwords and PINs are. (PINs are short passwords that can't be made hard to guess. I've tried to discuss their use elsewhere in this blog, but the simple but incomplete explanation is this:
A PIN is what the bank has you use when it's okay for the bank to shut down access to your bank account on the third bad try.
So PINs are really just another kind of password, going the other direction from pass-phrases and cryptographic keys.)

How can you avoid the problems of people guessing your password?

As I explained in the rant about PINs and passwords that I linked above, you make the passwords long and avoid memes and other things everyone (including you) talks about all day long. Or you let the computer generate long streams of nonsense.

How do you remember those?

You let the computer keep them in one of its notebooks -- your "wallet" or "database of tokens" or whatever. And, so that the sneaky guy can't use your database of tokens, you encrypt it and protect it with a master password, and your master pass phrase is long and something that you think about a lot but don't talk about much.

(Please don't write your master password on a sticky note and leave it in your physical wallet or somewhere on or in your desk. At least, not unencrypted, but that's back to the encryption problem.)

Have you heard about one-time-pads (OTP)?

These are kind of cool. A one-time-pad is a list of passwords or other such tokens for a specific use. Only you and the bank (or whoever the other guy is) have the list. (You don't keep it where people can read it, okay?)

And each time you use a password, you throw it away. The next time will be the next password. So even if the sneaky guys take a movie of you using one of the passwords, that password doesn't do them any good. You can't use it again and neither can they.
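In code terms, the whole idea is just a list and an index that only moves forward. A toy sketch in C -- the passwords are made up, and the real list would live in the protected token manager, never on a machine that sits on the Internet:

    #include <stdio.h>
    #include <string.h>

    /* A tiny model of a one-time-pad password list: each entry is used once,
     * then struck out.  The passwords here are placeholders, of course. */
    #define OTP_COUNT 4
    #define OTP_LEN   16

    static char otp_list[OTP_COUNT][OTP_LEN] = {
        "k3q-88f-zzt-041", "9wd-1mc-ppa-276", "r7u-53n-eek-300", "c2x-40h-jvo-912"
    };
    static int next_otp = 0;

    /* Returns the next unused password, or NULL when the pad is exhausted. */
    const char *use_next_otp(void)
    {
        static char out[OTP_LEN];
        if (next_otp >= OTP_COUNT)
            return NULL;
        strcpy(out, otp_list[next_otp]);
        memset(otp_list[next_otp], 0, OTP_LEN);   /* strike it out -- it never works again */
        next_otp++;
        return out;
    }

    int main(void)
    {
        const char *p;
        while ((p = use_next_otp()) != NULL)
            printf("using and discarding: %s\n", p);
        printf("pad exhausted -- time to exchange a new list with the bank\n");
        return 0;
    }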

You can see that you want a computer to handle OTPs, right? That computer is that little device that looks like a calculator, the one the bank wants you to start using.

The only problem is that the bank lets a third party build those. So it's not just the bank and you. That is a problem with the way things are done now.

What the banks should do instead is basically build the devices in-house and have your local branch program yours when you go to the branch to pick it up, but that is the subject for another rant.

Actually, one of the devices I want to build is a personal OTP/password manager. That is, I want to make a design that you could look at, and then you could build it yourself. It would look something like a calculator. In fact, it might include calculator functions. But it does not hook up to the Internet.

(Or, at least, the connection is physical-wire-only, and it reminds you to take it off-line after a minute on-line. But this option really isn't good, and I only mention it to help the non-tech salescrew who will want, oh, so very much, to have that "feature" to sell theirs with.)

It could generate a list of one-time-passwords that you encrypt and take to the bank. It could copy the list to a SD card, and the bank could provide a terminal for you to plug that SD card into, to make the bank's copy. Then you and the bank can be pretty sure that only you and the bank have a copy.

And, just to be sure, the bank erases the card for you after it makes the copy, and you plug the card back into your token manager and write noise where the list used to be. (Some SD cards won't let you really erase files permanently, so the device has to use special-purpose SD cards. Forget USB flash for this.)

The idea still has problems, but it separates your passcode management from the devices that are constantly on the Internet.

Uhmm, No.

You really can't make things perfectly secure. You can just try to make getting in harder than it's worth.

Real Security

This kind of security is hard. It's doubly hard to keep it meaningful.

Constantly checking who is looking over your shoulder is not real security, so you really should not want to be doing this kind of thing too much.

Here is the basic key to real security:

Don't be valuable.

If you have to do or be something that other people think is valuable enough to steal or kill you for, give that value away as fast as you can.

Maybe you can find time to choose worthy recipients of the value. That is commendable. But, even when you can't find worthy recipients, don't wait. Give it away.

How can this be a good thing?

YOU END UP NO RICHER THAN ANYONE ELSE!

Oh!!! Horrors!!!

If you give away the value you create, others see you doing it and some of them decide to do the same thing, too. This eventually results in a world where the things people need are available. Who is going to be motivated to kill or steal when they don't have to?

It also changes your attitude, and that is the most important protection you can get. Real security is when you know that your most valuable asset is you, yourself, and you are busy putting that asset to good use for other people.

(I rant a little about this in my freedom is not free blog (for instance, here and there, too.), but I'm not the only one who can tell you about it. Look around. It may be where you least expect it.)

[JMR201612200823: Fixed a bunch of typos and added just a little to clarify a few points.]

Monday, August 27, 2012

Some Basic Hardware and OS Security

(I really wanted to do something else today.)

There have been recent movements in the IP field that I think need to be responded to. (A new huge dump of a random harvest of private data by a pseudo-anonymous group of crackers and gray-hats today, after a bunch of hoopla about a US standards body publishing its lame (pardon me for saying so out loud) list of hardware security requirements last week, among other things.)

#1 No standard DRM (digital rights management) systems in the design.

Let me repeat that. If you think system security is good, standard DRM is evil.

Reason:

A: Every system will be vulnerable. (This is axiomatic, empirically proven in system theory.)

B: In order to be meaningful, DRM is sold as universal.

C: If Universal DRM is adopted, every system using it will contain the same vulnerabilities (in addition to whatever vulnerabilities the system has on its own.)

Thus, DRM will tend to induce universal vulnerabilities. When you know how to get into one computer, you can get into them all. Huge, huge reward to the bad guys, and it works against the diversity which could otherwise protect systems with different (or no) DRM.

Because even nonstandard DRM must function at a low level, it tends to allow entities that do not have system security as a high priority access to the OS at way too low a level, which is another serious social engineering flaw, but that could be worked around somewhat, sort-of, if we really must have DRM. (I'll have to rant about that sometime, I suppose.) To make that work, DRM has to be re-thought, however, particularly the very odd concept that DRM must be enforced.

Yes, I'm pointing out the inherent flaws in the DMCA (Digital Millennium Copyright Act). Which leads to

#2 Repeal the DMCA. If possible, pass an amendment against any further national, state, or community-level attempt at attaching legal culpability to any sort of automated rights protection system. (Which is another subject for a rant sometime.)

The security implications of the DMCA are that, with the DMCA provisions against reverse-engineering, the engineering required of the end-user (or the end-users system integrator or IT specialist) to secure a system has been technically criminalized.

You can provide all sorts of fancy arguments about why this should not be so, but both my wife's and my cell (non-smart) phones are vulnerable. I can't get the information from the vendor or manufacturer that I would need to secure them. Under a DMCA-like law (I'm not sure whether the laws in Japan include such) I would be risking prison time to dig around in the phone, to figure out how to fix the vulnerabilities.

Risk going to prison, just to keep the bad guys out of my phone. Not a win, okay?

Which leads to

#3 Both 3rd party and customer admin and support allowed, and the necessary documentation and source code provided by the vendor.

The options of non-vendor support and system administration are necessary to keep the vendors honest. The option of the customer doing it him/herself is necessary to keep the vendor from trying to use licenses and non-disclosure agreements to keep all the 3rd party companies in their back pocket.

A free-as-in-libre software license, such as the GPL, should be the recommended best practice, but it is not sufficient. The vendor support has to be optional, and not just theoretically so.

#4 Free or open source OS bootstrap code (BIOS, Open Firmware, etc.), supplied with the hardware.

Generally, the manufacturer should supply both the latest tested binary image and the source code over the 'net. But if the manufacturer goes under or drops support, or if you can't get on the internet, you might still be stuck. So a copy of the bootstrap source and object code, as in the system at the time of original sale, should be supplied with the hardware.

This should go without saying. If you can't fix the software that controls how your system starts up, you can't fix bugs there, either.

More important, if someone sneaks into your system from the internet and plants low-level malware in your BIOS, you may be left with no option but to junk the motherboard if you don't have some means of re-programming the BIOS to what it should be.

I know that means the customer would end up being able to re-purpose the hardware counter to the vendor's intent, but what is wrong with that? The vendor can (and should) explicitly dis-avow warranties when the user does things like modifying the bootstrap code.

But this brings us to two more points:

#5 Built-in bootstrap code re-programmer, not available during normal operation.

#6 Physical hardware disable for the bootstrap re-programmer which cannot be re-enabled by software, at least not during normal operation.

There is a bit of a conundrum here, because the typical (and often expected) solution is to have a simple VB program that holds the user's hand while walking through the steps of re-flashing the BIOS. The problem is that, when you need to re-flash the BIOS, you don't want to run the operating system in order to do so. When the malware is in your bootstrap code/BIOS, you have to assume it's also in your OS, even in the so-called safe mode:

Why would the intruder mess with your BIOS/bootstrap code and not mess with the programs that test the integrity of the Operating System?

There are several ways to do this, but the cheapest will be to provide a backup of the bootstrap/BIOS, and a small button or strap on the motherboard, such that, when you push the button, the active bootstrap/BIOS will be automatically overwritten by the backup.

Of course, if you have patched your BIOS/bootstrap code, you lose all your patches when you do that. Which has induced manufacturers to play dodgy games with two flashable BIOSses in many current motherboards, but no real way to tell which one is the safe one.

This is the point at which the DRM advocates said, "Let's just take the whole mess out of the consumers' hands. We have to protect them from themselves."

Instead of doing a proper engineering job here, they "protect the consumer from himself"!

And this makes the manufacturer a party with the malware installer.

It's not hard to get this right, but it's distracting from the point of this rant, so I'll explain it in the previous rant.

Why should the re-programmer be surrounded by so much fuss? Won't that make keeping the bootstrap/BIOS up to date all the more unlikely?

The bootstrap/BIOS should not be updated that often. But, yes, the manufacturer should provide an opt-in notification method, and instructions which could be printed out, when a bootstrap/BIOS level update is necessary.

This gets us safely booted. What's next?

#7 CPUs. And OS structure. Which should be the subjects of other rants, I guess, even though they're the reason I started ranting today.

Briefly, Intel, in their rush to lock the competition out, have always sacrificed security for features.

Even today, the best CPUs Intel has made fail to properly separate tasks from each other. The 8086 had those stupid segment registers, and they would not have been stupid at all, except they were just ways to extend the length of pointers in an otherwise pure 16-bit chip.

They looked like rudimentary memory management, but they were nothing of the sort. Just a misfeature that was used (fraudulently, if you ask me) to sell underperforming chips.

A: CPUs need to be able to separate flow-of-control information from data. (Yeah, I know, you're raising an eyebrow and saying that sounds like FORTH.)

How this works is that the flow-of-control (return pointer and possibly a frame pointer) are stored on one stack. That stack is cached in a cache that cannot be accessed by any normal instruction, and you have a low wall against the attacker trying to overwrite return pointers. A low wall, yes, but better than none at all.

(A stack-oriented cache with hysteresis could also speed up subroutine call/return significantly, but that's a topic for another day. Yeah, I know Sun has processors that do something like this, but not quite.)

The parameter/locals stack could also have a stack-oriented (ergo, hysteric LILO spill/fill) cache. It would be larger than the cache for the flow-of-control stack, but not too large, because of the delay it would cause for task-switching.
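A rough software model of the two-stack idea, just to show the shape of it. In the hardware being argued for, the return stack and its cache would simply not be addressable by ordinary load and store instructions at all; this C toy can only imitate that by discipline:

    #include <stdint.h>
    #include <stdio.h>

    #define STACK_DEPTH 64

    static uintptr_t return_stack[STACK_DEPTH]; /* return addresses (flow of control) only */
    static int       rs_top = 0;

    static int data_stack[STACK_DEPTH];         /* parameters and locals only */
    static int ds_top = 0;

    static void      rs_push(uintptr_t ret) { return_stack[rs_top++] = ret; }
    static uintptr_t rs_pop(void)           { return return_stack[--rs_top]; }

    static void ds_push(int v) { data_stack[ds_top++] = v; }
    static int  ds_pop(void)   { return data_stack[--ds_top]; }

    int main(void)
    {
        /* "Call" a routine that adds two parameters.  Nothing written through the
         * data stack can clobber the saved return address, because it lives on a
         * stack that ordinary data accesses never touch. */
        ds_push(2);
        ds_push(3);
        rs_push((uintptr_t)0x1234);   /* stand-in for a return address */

        int b = ds_pop();
        int a = ds_pop();
        ds_push(a + b);

        printf("result: %d, return address intact: 0x%lx\n",
               ds_pop(), (unsigned long)rs_pop());
        return 0;
    }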

B: CPUs need to be able to separate task-local (thread-local) static data from global data, or, rather, from the data of the calling environment.

Task-local data is accessed without regard to other processes/threads because it is allocated per-process/thread. (This is another way of disentangling the current usage of the unified stack, actually.)

The context of the calling task or thread may be read relatively freely, but writes to it must be filtered through semaphores, critical sections, monitors and the like.

There are four segments here, which must be kept somewhat distinct. This is what segment registers should be used for, but the memory interface itself should have access mode protection, similar to the seven modes supported (but never really used) in the original M68000.

Not quite the same because the conditions for enabling read, write, and execute on those modes could not be tied to access via a specific address register (in absence of segment registers). And there were no limit registers to prevent attempts to access one segment through a long offset in another.

These are the kinds of things where Intel has let us down, and actually fought against security.

Not because we knew we needed this back thirty years ago. (I only saw it dimly, until about twenty years ago. I don't know who else sees it.) But because Intel's processor is over-tuned to the structure of the C/Unix run-time defined forty years ago, and Intel has used very questionable means to destroy the market for more flexible processors which could have been used to explore alternative run-times.

Without Intel's hypercompetitiveness, it is quite probable that the run-times that could prevent the most commonly exploited vulnerabilities in current OSses would be ready for production. As it is, we still need to explore and experiment a lot.

(Microkernels can apparently help for a while.)

There's much more that I need to say about this, but I'm out of time today.


[JMR201704211138: I had some thoughts on the low-level boot process, which might be relevant: http://defining-computers.blogspot.com/2017/04/model-boot-up-process-description-with.html.]

How to update your bootstrap code/BIOS.

No, this is not for existing systems. This is how systems should be designed to provide for updating the code that the hardware runs before anything else.

(If you want to understand why, see the later post on keeping the bootstrap code secure.)

First, your bootstrap OS has to be a bit more complete than BIOSses have tended to be in the past. Closer to Open Firmware, but usable by the average moderately-technical user and the average support guy at the local shop.

You have to have four images:

  1. Working image,
  2. Backup of latest working image,
  3. Archive image,
  4. Fallback, or fail-safe image.

The working image is the one you normally boot. It boots your normal operating system. If the hardware allows the working image to be re-programmed from the normal OS, the normal OS must only provide access to the re-programming features in a special administrator mode that requires being re-booted to.

(Getting or compiling an updated bootstrap image is a separate topic that I will try to rant about later.)

The backup of the latest working image is never booted. It's basically there because a good cryptographic checksum cannot guarantee perfectly that something hasn't been inserted by a very clever mathematically inclined attacker. It's for checking the bootstrap before the bootstrap is allowed to continue.

(Yes, that means a pre-boot boot. Naturally.)

The backup is also checked against the current working image before re-programming is allowed.

Then, after you update the bootstrap code, run some integrity/security tests, reboot, and run some more integrity tests, the new bootstrap will copy itself to the backup before the normal OS is called.

The archive is also never booted. It must be physically impossible to write to it from either the normal bootstrap or the normal OS.

The administrator will set a period, a week or two, or a month, after which a grandfather backup will be scheduled, and the pre-boot bootstrap will copy the backup to the archive. (The waiting period is to leave enough time that the bootstrap can be assumed stable.)

The fallback image, which includes the pre-boot bootstrap, must be physically impossible to write to, period. It's there for when all else has failed. It will include some command-line and (simple) menu-driven tools for testing, debugging, hunting for malware, etc.

There must be a physical button, switch, or electrical strap that will force booting to stop and wait at a command-line or menu instead of proceeding to the normal OS. In addition, an administrator tool should be provided for the normal OS, which directs the next boot to stop at the bootstrap level.

Another button, switch, or strap will direct bootup to the fallback.

Among the commands available will be one to get a new bootstrap (working) image from the manufacturer, over the network, or from some removable media. Another will provide for updating the kernel and lowest-level utilities of the normal OS without having to start any image of the normal OS.

In a brand-new, fresh-from-the-factory motherboard or system, all four images will be identical.
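A sketch of the pre-boot bootstrap's decision path over those four images. The image-handling routines and strap reads are placeholders for whatever the motherboard actually provides, and this ignores the window right after an update, before the new working image has copied itself to the backup:

    /* Placeholder routines -- assumptions for illustration, not a real firmware API. */
    extern int  images_identical(int a, int b);        /* byte-wise compare           */
    extern int  crypto_checksum_ok(int image);
    extern int  strap_stop_at_bootstrap(void);         /* physical switch or strap    */
    extern int  strap_force_fallback(void);
    extern void boot_image(int image);                 /* does not return             */
    extern void enter_maintenance_menu(void);          /* command line / simple menus */

    enum { WORKING, BACKUP, ARCHIVE, FALLBACK };

    void pre_boot(void)
    {
        if (strap_force_fallback()) {
            boot_image(FALLBACK);                      /* physically write-protected  */
        }

        /* The working image must match its never-booted backup and pass its
         * cryptographic checksum before it is allowed to continue. */
        if (!images_identical(WORKING, BACKUP) || !crypto_checksum_ok(WORKING)) {
            enter_maintenance_menu();                  /* offers re-flash from backup */
            return;
        }

        if (strap_stop_at_bootstrap()) {
            enter_maintenance_menu();
            return;
        }

        boot_image(WORKING);
    }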

So, what about the normal OS?


A similar approach might be useful in updating normal OS and application code, as well.

Some code, such as the kernel, would do well to have full multiple copies for backup. Others, mostly end-user applications, might be okay with only good checksums, but I would be inclined to use full copy backup for any mission-critical application.

If four copies of every app is overkill, two copies and a good checksum would be a next best alternative. (And preferably, don't let the application updater directly overwrite the checksums.)


[JMR201704211138: I had some further thoughts on the low-level boot process, which might be interesting: http://defining-computers.blogspot.com/2017/04/model-boot-up-process-description-with.html.]