defining computers: Proper Use of CPU Address Space

I referred to this in my overview about re-inventing the industry, but I was not very specific. Now that I've been motivated to write a rant or two about memory maps and how they can be exploited, I can write about the ideal, perfect CPU (one that may be too perfect for this world) and how it works with memory.

First, these are the general addressable regions of memory that you want to be able to separate out. I'll put them in the order I've been using in the other two rants:

0x7FFFFFFFFFFFFFFF
stack (dynamic variables, stack frames, return pointers)
0x7FFFxxxxxxxxxxxx ← SP
gap
guard page (Access to this page triggers OS responses.)
gap
heap (malloc()ed variables, etc.)
statically allocated variables
0x4000000000000000
application code
0x2000000000000000
operating system code, variables, etc.
0x0000000000000000

The regions we see are

Stack (dynamic variables, stack frames, return pointers)
Heap (malloc()ed variables, etc.)
application code (including object code, constants, linkage tables, etc.)

Operating system code should include the same sort of regions. But that should not really be visible in the application code map. Only the linkage to the OS should be visible, and that would be clumped with the application code.

Memory management hardware provides the ability to move OS code out of the application map. Let's see how that would look:

We used to talk about the problems of accidentally using small integers as pointers. Basically, when pointer variables get overwritten with random integers, the overwriting integers tend to be relatively small. Then, when those integers are used as pointers, they access arbitrary stuff in low memory. We can take note of that and refrain from allocating anything in the small-integer range of addresses, so that such accesses fault instead. And we realize that we have already dealt with small negative integers by buffering the wraparound into highest memory:

0xFFFFFFFFFFFFFFFF
gap (wraparound and small negative integers)
0x8000000000000000
stack (dynamic variables, stack frames, return pointers)
0x7FFFxxxxxxxxxxxx ← SP
gap
guard page (Access to this page triggers OS responses.)
gap
heap (malloc()ed variables, etc.)
statically allocated variables
0x4000000000000000
application code
0x0000000100000000
gap (small integers)
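As a rough illustration of what those two gaps buy us, here is a minimal sketch that classifies a 64-bit value used as a pointer. The two boundaries come from the map above; the function name and the result strings are my own inventions, and real hardware would raise a fault rather than return a string.

```python
# Boundaries taken from the map above; everything else is illustrative.
SMALL_INT_LIMIT = 0x0000000100000000   # top of the small-integer gap
NEGATIVE_BASE = 0x8000000000000000     # start of the wraparound/negative gap
ADDRESS_MASK = 0xFFFFFFFFFFFFFFFF      # 64-bit address space


def classify_pointer(value: int) -> str:
    """Say whether a value used as a 64-bit pointer lands in a guard gap."""
    address = value & ADDRESS_MASK     # two's-complement view of negatives
    if address < SMALL_INT_LIMIT:
        return "fault: small integer"
    if address >= NEGATIVE_BASE:
        return "fault: negative integer or wraparound"
    return "usable range"
```

A pointer clobbered with 42 or with -8 ends up in one of the gaps, while something like 0x4000000000000000 falls in the usable range.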

I've posted a rant about using a split stack, with a little of the explanation for why at the end. Basically, that would allow us to move those local buffers that can overflow, crash, and/or smash the stack way away from the return address stack.

Thus, even if the attacker could muck in the local variables, he would still be at least one step from overwriting a return address. That means he has to use some harder method to get control of the instruction pointer.
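A toy model may make the difference concrete. This is only a sketch under my own assumptions: Python lists stand in for memory, the four-slot buffer and the return address value are made up, and running off the end of a list stands in for hitting a guard page.

```python
RETURN_ADDRESS = 0x40001234  # hypothetical return address


class GuardPageFault(Exception):
    """Stands in for the OS response to touching a guard page."""


def unchecked_copy(region, start, data):
    """Copy with no check against the buffer size, like an unchecked strcpy()."""
    for offset, value in enumerate(data):
        index = start + offset
        if index >= len(region):
            raise GuardPageFault("write ran off the end of the region")
        region[index] = value


def combined_stack_call(data):
    # One stack: a 4-slot local buffer sits right next to the return address.
    stack = [0, 0, 0, 0, RETURN_ADDRESS]
    unchecked_copy(stack, 0, data)
    return stack[4]  # where the function would return to


def split_stack_call(data):
    # Two stacks: the buffer is on its own stack, far from the return address.
    locals_stack = [0, 0, 0, 0]
    return_stack = [RETURN_ADDRESS]
    faulted = False
    try:
        unchecked_copy(locals_stack, 0, data)
    except GuardPageFault:
        faulted = True
    return return_stack[0], faulted
```

Feeding five values into the four-slot buffer replaces the return address in the combined case. In the split case the return address is untouched and the overrun is noticed.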

Stack usage patterns actually point us to using a third stack, or a stack-organized heap separate from the random allocation heap. Parameters and small local variables could be on one stack, and large local variables on the other.

In other words, scalar local/dynamic variables would be on the second stack and vector/structure local/dynamic variables on the third. This would be especially convenient for Forth and C run-times, virtually eliminating all need of function preamble and cleanup, and simplifying stack management.

Another way to use the third stack would be to just put all the local variables on it. It might be easier to understand it this way, and I'll use the parameter/locals division below. As far as the discussion below goes, the two divisions can be interchanged. (The run-time details are significant, but I'll leave that for another day. Besides, there is no reason for a single computer to limit itself to one or the other. With a little care, the approaches could even be mixed in a running process.)

But the third stack could be optional, and its use determined by the language run time support. The OS run-time support really doesn't need to see it other than as a region to be separated from the others. Here is a possible general map, using 64 bit addressing:

0xFFFFFFFFFFFFFFFF
gap (wraparound and all negative integers)
0x8000000000000000
gap (large positive integers)
0x7FFFFF0000000000
gap
return stack ← RP
gap
0x7FFFFE0000000000
guard page (2⁴⁰ addresses)
0x7FFFFD0000000000
gap
parameter stack ← SP
gap
0x7FFFFC0000000000
guard page (2⁴⁰ addresses)
0x7FFFFB0000000000
gap
local stack ← LP
gap
0x7FFFFA0000000000
guard page (really huge)
0x7000000000000000
gap
heap (malloc()ed variables, etc.)
gap
0x4000020000000000
guard page (2⁴⁰ addresses)
0x4000010000000000
gap
statically allocated variables
gap
0x4000000000000000
gap
application code
gap
0x0000010000000000
gap (small positive integer pointer guard)
0x0000000000000000
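To check the arithmetic in that map, here is a small sketch that turns it into a lookup table. The boundary addresses are the ones listed above; the helper names are mine, and the table lumps the randomizable gaps inside each region together with the region itself.

```python
from bisect import bisect_right

ADDRESS_SPACE_END = 1 << 64

# (base address, region name), in ascending order, from the map above.
REGIONS = [
    (0x0000000000000000, "gap (small positive integer pointer guard)"),
    (0x0000010000000000, "application code"),
    (0x4000000000000000, "statically allocated variables"),
    (0x4000010000000000, "guard page"),
    (0x4000020000000000, "heap"),
    (0x7000000000000000, "guard page (really huge)"),
    (0x7FFFFA0000000000, "local stack"),
    (0x7FFFFB0000000000, "guard page"),
    (0x7FFFFC0000000000, "parameter stack"),
    (0x7FFFFD0000000000, "guard page"),
    (0x7FFFFE0000000000, "return stack"),
    (0x7FFFFF0000000000, "gap (large positive integers)"),
    (0x8000000000000000, "gap (wraparound and all negative integers)"),
]
BASES = [base for base, _ in REGIONS]


def region_of(address: int) -> str:
    """Name the region that a 64-bit address falls in."""
    if not 0 <= address < ADDRESS_SPACE_END:
        raise ValueError("not a 64-bit address")
    return REGIONS[bisect_right(BASES, address) - 1][1]


def region_size(index: int) -> int:
    """Number of addresses from this region's base to the next base."""
    end = BASES[index + 1] if index + 1 < len(BASES) else ADDRESS_SPACE_END
    return end - BASES[index]
```

Each of the plain guard pages comes out at exactly 2⁴⁰ addresses, as does the small positive integer guard at the bottom.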

If we choose to have stack frames, we could manage them very simply on the return stack by just pushing the local and/or parameter stack pointer when we push the IP. And we just discard them when we pop the IP. Or we can pop them, to force-balance the stack. This gets rid of pretty much all the complexity of walking the stack.
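Here is a minimal sketch of that call and return sequence, assuming a made-up machine model in which Python lists play the stacks and list lengths play the stack pointers. I only save the local stack pointer here; the other stack pointer could be saved the same way.

```python
class Machine:
    """Hypothetical machine with a return stack and a separate local stack."""

    def __init__(self):
        self.ip = 0
        self.return_stack = []
        self.local_stack = []

    def call(self, target):
        # Push the caller's local stack pointer, then the return address.
        self.return_stack.append(len(self.local_stack))
        self.return_stack.append(self.ip + 1)
        self.ip = target

    def ret(self, force_balance=True):
        # Pop the return address, then the saved local stack pointer.
        self.ip = self.return_stack.pop()
        saved_lp = self.return_stack.pop()
        if force_balance:
            # Restore the local stack to where the caller left it.
            del self.local_stack[saved_lp:]
        # Otherwise the saved pointer is simply discarded.


machine = Machine()
machine.ip = 10
machine.local_stack.append("caller local")
machine.call(100)
machine.local_stack.extend(["callee local 1", "callee local 2"])
machine.ret()  # back at ip 11 with only the caller's local left
```

No stack walking is needed: whatever the callee left on the local stack is dropped in one step at return.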

The gaps should be randomized, to make it harder for attacker code to find anything to abuse.
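One way to randomize the gaps, sketched under my own assumptions: pick a page-aligned base at random inside the window reserved for a region, so that the gaps before and after it vary from run to run. The 4096-byte page size and the function name are assumptions, and the window start is assumed to be page-aligned already.

```python
import secrets

PAGE_SIZE = 4096  # assumed page size


def randomized_base(window_start: int, window_end: int, size: int) -> int:
    """Pick a random page-aligned base so [base, base + size) fits the window."""
    slots = (window_end - window_start - size) // PAGE_SIZE + 1
    if slots <= 0:
        raise ValueError("region does not fit in the window")
    return window_start + secrets.randbelow(slots) * PAGE_SIZE
```

For example, randomized_base(0x7FFFFE0000000000, 0x7FFFFF0000000000, 1 << 20) would place a one-megabyte return stack somewhere in the return stack window from the map.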

The regions we now have are

Return Stack (return address and maybe frame pointers)
Parameter stack (parameters only)
Locals Stack (dynamically allocated local variables)
Heap (malloc()ed variables, etc.)
Statically allocated process variables (globally and locally visible)
application code (including object code, constants, linkage tables, etc.)

And we have large guard regions between each.

What's missing?

Multiprocessing requires a region of memory dedicated to process (or thread) shared variables, semaphores, resource monitor counters and such. This is a separate topic, but basically the statically allocated variable area would have a section which could be protected from bare writes, with only reads and locked read-modify-write cycle instructions allowed. These would be a separate region, so their addresses could be somewhat randomized.
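To show the access rule I have in mind, here is a software model of one variable in such a region. It is only a sketch: the class and method names are hypothetical, and a lock stands in for what would be a locked bus cycle in hardware.

```python
import threading


class GuardedCell:
    """One shared variable that allows reads and locked read-modify-write only."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        return self._value

    def read_modify_write(self, update):
        # The whole read-update-store sequence happens under the lock.
        with self._lock:
            old = self._value
            self._value = update(old)
            return old

    def write(self, value):
        # A bare write is what the protection would refuse.
        raise PermissionError("bare writes are not allowed in this region")


counter = GuardedCell(0)


def worker():
    for _ in range(1000):
        counter.read_modify_write(lambda v: v + 1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With four workers each doing a thousand locked increments, the counter ends at 4000, and a bare write attempt is rejected.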

I'm not sure that it makes sense to manage allocation of shared variables in the malloc() sense, but there is room with this kind of scheme, and modern processors should support that many different regions of memory.

Also, regions of memory shared mmap-style would be in a separate region, or perhaps a guarded region for each. I'm not sure whether they would be protected in the same way as semaphores and monitor counters. It would seem, rather, that the CPU instructions would be ordinary instructions, and the mmap region would be a resource protected by semaphore- or monitor-controlled access.

We can do the same sort of thing with 32-bit addressing, although, instead of guard pages 2⁴⁰ or so in size, we would be looking at guard pages between 2²⁰ and maybe 2²⁴ in size. This would be more appropriate for some controller applications.
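For a feel of what those sizes cost, here is a quick bit of arithmetic. The count of six guard regions is just an assumption for the example, not something fixed by the scheme.

```python
def guard_fraction(address_bits: int, guard_bits: int, guard_count: int) -> float:
    """Fraction of the address space taken up by the guard regions."""
    return guard_count * 2 ** guard_bits / 2 ** address_bits


# Six guards, an assumed count, at the sizes mentioned above.
small_32 = guard_fraction(32, 20, 6)   # 2^20 guards in a 32-bit space
large_32 = guard_fraction(32, 24, 6)   # 2^24 guards in a 32-bit space
wide_64 = guard_fraction(64, 40, 6)    # 2^40 guards in a 64-bit space
```

Six guards of 2²⁴ addresses take a little over two percent of a 32-bit space, while six guards of 2²⁰ take well under a percent, so the trade-off is between guard size and room left for code and variables.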

We could do the same thing with 16-bit addressing, but it wouldn't leave us much room for the variables and code. On the other hand, looking twice at 16-bit addressing will give us clues for further refinement of these ideas. But I think I'll save that for another rant, probably another day. I have burned up enough of today on this prolonged rant.

defining computers

Misunderstanding Computers

Wednesday, June 28, 2017

