Misunderstanding Computers

Why do we insist on seeing the computer as a magic box for controlling other people?
Why do we want so much to control others when we won't control ourselves?

Computer memory is just fancy paper, CPUs are just fancy pens with fancy erasers, and the network is just a fancy backyard fence.

(original post -- defining computers site)

Saturday, November 10, 2018

Misunderstanding Reflections on Trust

Many people misunderstand what Ken Thompson was saying when he talked about trusting other people to build your complex systems for you.

(But it's actually not a completely useless misunderstanding.)

In his Reflections on Trusting Trust, Ken talked about a certain way to hide deliberate back doors.

He was essentially saying, Look! Here's one more devious thing that your systems vendor would have a hard time defending your system from.

But he did not say his hack was in every Unix system.

I have (randomly) read yet another blog post in which the blogger, if I interpret him correctly, essentially said that that was what Ken Thompson claimed.

Now, maybe I'm misunderstanding this blogger. Or maybe he has since understood this and just not gone back to correct the post. Hey, I have a number of posts that need correction, too.

But it's a common misunderstanding.

What Ken Thompson really said was that any Unix OS compiled with a compiler directly descended, in a specific way, from one of the compilers he and some of his cohorts at Bell Labs wrote might still carry the back door he had hidden as a proof of concept many years before -- and that at least one system he tested, somewhat at random, did.

Then he used his back-door hack to point out that transparency really isn't ironclad, with the implication that non-transparent systems are going to be even worse.

So that we can stay on the same wavelength, I'm going to give some detail here of how the specific build-workflow vulnerability he gave as his example functioned. And then I'll tell you a little about how it can be blocked. If you haven't ever written library code, it may be difficult to follow, but I'll try to make it clear.

First, let me define a library.

In computer systems, a library is a set of common routines and/or functions that a compiler uses to glue the pieces of the program it is compiling for you into the system that you are compiling your program to run on -- the target system. It's essentially pre-fab code that you can use for no more than the investment of time it takes to understand the functions and their parameters.
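
For a trivial, concrete example, consider how much of even a one-line C program is really library code. Everything that formats the text and talks to the operating system comes pre-fabricated from the C library the compiler links against:

    /* hello.c -- the only code you wrote is main(); printf itself, and
     * everything beneath it that talks to the operating system, is
     * pre-fab code pulled in from the C library at link time. */
    #include <stdio.h>

    int main(void)
    {
        printf("Hello from %s.\n", "library code");
        return 0;
    }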

Perhaps you see one of the problems already. Perhaps you envision a library as a physical part, and you imagine some cloak-and-dagger type sneaking into the storeroom and swapping a batch of good parts for vulnerable ones.

Perhaps you go one step further, where some of the machines manufacturing those parts are tweaked while everyone is out to lunch, so that they manufacture vulnerable parts. You're getting a little closer.

Technicians on the assembly line are not as perceptive as the engineers, so perhaps you envision sneaking into the design archives and substituting a vulnerable design for the real one. That's getting closer.

Libraries are not parts, but they do contain parts. The parts are not physical. These days, libraries are hardly ever even written out to external media until they are copied over the network into the developers' computers (maybe yours). The machines manufacturing the libraries are not robots; they are (rather ordinary) computers running compilers, "manufacturing" the libraries by compiling them from source code.

What are these compilers that compile the compilers and their libraries?

Generally, previous versions of the compilers -- using previous versions of the libraries.

You wouldn't insert the source code of a vulnerability into a compiler's source code. That would stick out too much. It wouldn't be as obvious in libraries, but it would still be visible, and it would still stick out a little. Even if no one is looking for deliberate vulnerabilities, the process of fixing bugs and updating the compiler's functionality would tend to wash the vulnerability out, so to speak.

And actually, you probably wouldn't write a vulnerability directly into a compiler. That wouldn't buy you much. More likely, you would write a little bit of vulnerable code in a commonly used bit of library, so that the vulnerable code would be installed in operating systems or office software.

But, again, if it's there in the source, it is likely to be found and/or just washed out in the process of producing new versions.

Software exists in several forms. The two primary classes of forms are called source and object. Source is the human-readable stuff. Object is the machine-readable stuff produced by compiling the source, and not very readable by most humans. If there were some way to leave the vulnerability in the object, but not in the source, it would be rather hard to find.
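
To make the two forms concrete, here is a made-up library routine in its source form, with a note on how the object form gets produced:

    /* square.c -- the human-readable source form of a tiny library routine.
     * Compiling it (for example, "cc -c square.c") produces square.o, the
     * object form: machine code plus a symbol table.  A tool like nm will
     * show that square.o defines the symbol "square", but the logic itself
     * is only visible by disassembling the machine code. */
    int square(int x)
    {
        return x * x;
    }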

It's common practice to have a compiler compile itself. This is a good, if incomplete, test of a compiler. But it allows a sneaky trick for insertion.

When the compiler compiles itself (in other words, when the previous version compiles the next), the technician assembling the compiler libraries is effectively the compiler itself, and the design doc is the source code. Computers are notoriously dumb when it comes to noticing these things, so the saboteur merely has to avoid the methods management has put in place for detecting them.

Ken was developing some of those methods when he produced his hack, essentially as a proof of concept.

Ken's gadget was to build a machine for perpetuating the vulnerability into the vulnerability.
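
The first ingredient is the old party trick of a program that can reproduce its own text; Ken's lecture starts from exactly this kind of self-reproducing program. Here is a minimal C version, stripped of comments so that what it prints matches its own source exactly:

    #include <stdio.h>
    char *s = "#include <stdio.h>%cchar *s = %c%s%c;%cint main(void) { printf(s, 10, 34, s, 34, 10, 10); return 0; }%c";
    int main(void) { printf(s, 10, 34, s, 34, 10, 10); return 0; }

Compile it and run it, and the output is the program itself. A compiler that contains something like this can regenerate a chunk of its own code wherever it pleases.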

I don't know the specific technique he used, but if his gadget tracks certain patterns in the source, it can get a fairly accurate match on when the compiler is compiling itself. Then it can insert a copy of itself into the libraries being compiled.
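
Here is a deliberately crude sketch of the idea in C. It is not Ken's code, and every name in it is invented for illustration; a real version would live deep inside the code generator and match on structure rather than on literal strings, but the shape is the same:

    /* Hypothetical sketch of a self-perpetuating compiler back door.
     * NOT Ken Thompson's actual code; every name here is invented, and
     * the "pattern matching" is a crude strstr() test standing in for
     * something much more careful.  The emit_*() stubs just report what
     * a real, compromised code generator would silently do. */
    #include <stdio.h>
    #include <string.h>

    static void emit_backdoor_object(void)
    {
        puts("[emit login's object code WITH a hidden back door]");
    }

    static void emit_self_copy(void)
    {
        puts("[emit compiler object code WITH a copy of these two tests]");
    }

    static void emit_normal_object(void)
    {
        puts("[emit honest object code]");
    }

    /* One (hypothetical) code-generation entry point inside the compiler. */
    static void compile_translation_unit(const char *source)
    {
        if (strstr(source, "login_main(")) {
            /* It recognizes that it is compiling login, so it plants the
             * back door in the object code.  login's source never changes. */
            emit_backdoor_object();
        } else if (strstr(source, "compile_translation_unit(")) {
            /* It recognizes that it is compiling the compiler (or its
             * libraries), so it copies both of these tests into the new
             * object code -- which is how the trick survives even after
             * these lines are erased from the compiler's source. */
            emit_self_copy();
        } else {
            emit_normal_object();
        }
    }

    int main(void)
    {
        compile_translation_unit("int login_main(int argc, char **argv) { /* ... */ }");
        compile_translation_unit("static void compile_translation_unit(const char *source) { /* ... */ }");
        compile_translation_unit("int square(int x) { return x * x; }");
        return 0;
    }

A real version would key on deeper structure than a literal string, which is part of why restructuring the compiler (mentioned further down) only breaks the match by luck.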

Now as long as the source code is there, it will probably eventually get discovered, or even blindly washed out in a version update where the structure and symbols used change significantly. But these are probabilistic events. To depend on them is to relegate your security to chance.

But, in case you missed this above, the hack does not need to be present in source form. It recognizes the patterns of compiling a compiler's libraries and inserts itself whenever that happens -- as part of the compiler function, without the source code.

Once the modified library using such a technique is in place, the source code can be erased and the vulnerability will copy itself anyway, every time the compiler is used to compile itself -- unless changes in the compiler mean that the modified library is no longer used, or that the attempt to copy itself is by some chance blocked.

And if you are compiling your own compiler and you use an infected compiler, and your compiler's source code is similar enough, your compiler could carry the vulnerability, too.

Breaking that self-compile chain is one way to wash the vulnerability out.

How?

Say I bootstrap a compiler of my own, not using the vulnerable compiler. (This is possible, and many companies have done so.)

You could borrow my compiler to compile your compiler, and, since mine uses my clean libraries, the result is that the vulnerability in your compiler never gets the chance to reproduce itself in the newly compiled compiler libraries.

So, if you use a clean compiler, say one from another vendor, or a backup from before the vulnerability was inserted, the resulting compiler will be clean from that point.

There are some other ways to break the chain:

Encrypting the source would make it harder for the vulnerability to recognize itself. But this is not a perfect solution. Easy, but not perfect.

Cross compiling with random (but not known to be clean) compilers would increase the likelihood that the vulnerability gets washed.

Changing the structure of the compiler will also tend to interfere with the self-copy functionality, but that's another matter of luck.

The only sure way is to keep a bootstrap compiler in a safe somewhere. Preferably, the bootstrap compiler would be regularly compiled by hand and the result compared to other copies.
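
Here is a sketch of the kind of comparison that approach depends on. The file names are placeholders, and a raw byte-for-byte comparison only proves anything if the two builds are reproducible, with no embedded timestamps or paths:

    /* compare_cc.c -- byte-for-byte comparison of two compiler binaries,
     * say the one rebuilt from the hand-verified bootstrap in the safe
     * and the one you are about to install.  File names are placeholders.
     * This only proves anything if the builds are reproducible (no
     * embedded timestamps, paths, or other incidental differences). */
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s trusted-cc candidate-cc\n", argv[0]);
            return 2;
        }

        FILE *a = fopen(argv[1], "rb");
        FILE *b = fopen(argv[2], "rb");
        if (a == NULL || b == NULL) {
            perror("fopen");
            return 2;
        }

        long offset = 0;
        int ca, cb;
        do {
            ca = fgetc(a);
            cb = fgetc(b);
            if (ca != cb) {
                printf("binaries differ at byte %ld -- do not trust either until you know why\n", offset);
                return 1;
            }
            offset++;
        } while (ca != EOF);

        fclose(a);
        fclose(b);
        puts("binaries are byte-for-byte identical");
        return 0;
    }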

(A language called Forth, or a similar language, could help at the very bottom of this chain, but that is a subject for another post sometime.)

Wouldn't laws help?

Laws are just words in books in musty law libraries somewhere.

How does the law get into the victim's computer to protect it?

Ditto any contractual attempt.

If the legal system has no applicable laws, or if the courts lack the understanding to see how to apply existing laws to this semi-intangible thing called software, new laws and explicit clauses in contracts may help in the short term, but the fundamental problem remains.

Laws and contracts help. But they do not solve the ultimate problem.

People who will use such means to breach your trust will use them anyway, hoping or assuming they won't be caught.

So, what should we understand from Ken Thompson's proof of concept?

Is every Unix-like OS backdoored this way?

Nooo. Not, at least, by the same backdoor he inserted.

Is every copy of Unix descended from the one he worked his proof of concept out on backdoored like this?

Most likely not. At least some of the licensed vendors went in with gdb and found and excised his backdoor directly. Others used more exotic tools with object code metrics, and some probably worked out some cross-compiling protocols. And the source code control systems that have been put in place since then by every serious vendor, to help cooperative development, just happen to also help expose clandestine code meddling -- when coupled with compile triggers and automated record keeping and regression testing and so forth.

Unfortunately, no one goes back to the hand-compiled root of the chain, that I know of. I'm not sure that I would blame them. That's a lot of work that no one wants to pay you for.

Are Microsoft's OSes or Oracle/Sun's OSes or Apple's OSes immune to this approach?

No more immune than Unix. They also require the same kinds of vigilance.

Unfortunately, their source code is not as exposed to the light of day as the open source/free software community's source code is, which provides more opportunities for clandestine activities.

Is this the only kind of vulnerability from a vendor we have to worry about?

Uhm. No. Goodness no.

Every bug is a potential vulnerability. Some programming errors may not actually be accidental.

This is what Ken Thompson was trying to get us to see. There are lots of ways for these systems to develop all sorts of vulnerabilities.

Should we all therefore wipe our OSes and hand-compile our own from scratch?

Dang. I highly recommend it, if you have the time, and have someone paying your bills while you do so. But are you ready to give up your internet access until you re-implement the entire communication stack from scratch?

Blessed be. Maybe it's not such a great idea, after all.

Even though I think it would be grand fun.