Win3mu - Part 6 - Memory Management

Win3mu - Part 6 - Memory Management

This is Part 6 in a series of articles about building Win3mu — a 16-bit Windows 3 emulator. If you’re not sure what this is or why I’m doing it you might like to read from the start.

This post covers memory management — including a brief summary of Windows 3’s memory model and some implementation details on Win3mu’s global and local heaps.

Basic Concepts

One of the amazing things about Windows 3 is just how much it managed to do in so little memory. At the time a typical PC might have about 1Mb RAM and Windows could run on as little as 384Kb.

In order pull this off a few key concepts were introduced over the simple, almost non-existent way memory was managed in DOS. This new programming model made it possible for Windows to fairly freely move allocated memory around and to discard memory that wasn’t actively in use:

  1. When a program allocates memory it receives a handle (as opposed to a pointer).
  2. When the memory needs to be accessed, the program calls a lock function passing the handle and gets back a pointer, the memory is read/written as required and then unlocked again.
  3. While not locked, Windows can shuffle memory around in order to satisfy other memory allocations.

The Global Heap

The global heap refers to all available memory. If we ignore real-mode and just focus on protected mode we can say that each allocation from the global heap is generally represented by one protected mode selector

(Although allocations greater than 64k have multiple selectors, it’s convenient to think of each global allocation as one selector).

In protected mode, even locked memory can be moved because of the extra layer of indirection provided by the selector descriptor table. The programming model is retained however so that programs work in real mode too.

One of the important things to understand about the global heap is that it represents not just memory allocated by a program but also encompasses all the code and data segments for all loaded programs and DLLs (see the previous post about the Windows executable format). Each of those segments is mapped to a selector which comes from the global heap.

The Local Heap

As you know, a selector is a 16-bit value of which 13 bits are an index into a descriptor table. This means there is a hard limit of 8192 (2¹³) selectors — across the entire operating system.

To overcome this limit Windows also offers local heaps. A local heap is essentially a heap that’s managed inside a chunk of memory allocated from the global heap. The same programming model is used — allocate a handle and lock/unlock when used.

Just like the global heap, Windows will move unlocked allocations around in order to make new allocations fit. Unlike the global heap however locked local allocations can’t be moved because there isn’t the extra level of indirection provided by protected mode selector table.

Win3mu’s Range Allocator

In Win3mu both the global and local heaps are based on the RangeAllocator class. RangeAllocator manages an address space providing functions for allocating, locking and freeing and can move unlocked allocations around to make new allocations fit.

// Manages allocations within an "address space".
public class RangeAllocator<T>
{
  // Constructor
  public RangeAllocator(int addressSpaceSize);

  // Info accessors
  public int AddressSpaceSize { get; set; }
  public int FreeSpace { get; }
  public int EntryCount { get; }

  // Defrag the heap, returns true if anything moved
  public bool Defrag();

  // Allocate
  public Range Alloc(int size, bool moveable, bool allowDefrag);
  public bool ReAlloc(Range allocation, int newSize, bool moveable, bool allocDefrag);
  public bool ReAllocInPlace(Range alloc, int newSize);
  public void Free(Range allocation);

  // Must be provided to allow move operations
  public IMoveListener MoveListener;

  // Publicly visible view on an allocation
  public abstract class Range
  {
      public abstract int Position { get; }
      public abstract int Size { get; }
      public abstract bool Moveable { get; }
      public T User;
      public int LockCount;
  }

  // Notify the client that a range is moving
  // The local heap listens to this to move the actual data
  // behind any allocations.
  public interface IMoveListener
  {
      void Move(Range user, int newPosition);
  }
}

The structure of the range allocator is pretty simple — it’s essentially a C# List<Range> where each Range has a position, length and a flag indicating if it’s allocated or free.

  • When the heap is empty the list contains one entry spanning the entire address space that is marked unallocated.
  • To make an allocation an unallocated entry is found and split — one part is marked allocated and the other becomes a smaller free slot.
  • If an allocation doesn’t fit the heap is defragmented by moving and coalescing unlocked allocations to try and make room.
  • When an allocation is freed its entry is joined back to any free adjacent entries.

It’s not the most efficient algorithm but it’s more than good enough for Win3mu’s needs.

As mentioned, RangeAllocator is used by both heaps:

  • For the global heap a RangeAllocator is used to manage the set of allocated/available selectors (0–8192).
  • For local heaps, RangeAllocator manages address ranges within a globally allocated block of memory — ie: which sections are allocated and which are free. (0–0xFFFF)

Selectors and Allocations

The global heap manages two closely related concepts — selectors and allocations. It’s possible to allocate a selector without actually mapping it to memory and it’s also possible to allocate two selectors for the same block of memory (say one with read-only code execute rights and one with read-write access).

// Allocation represents the actual memory behind one or more selectors
public class Allocation
{
    public Allocation(uint bytes, ushort flags)
    {
        this.flags = flags;
        buffer = new byte[bytes];
    }
    public ushort flags;            // GlobalAlloc flags
    public byte[] buffer;           // The actual "memory"!
    public LocalHeap localHeap;     // Only if this allocation has a local heap
}

// This selector class is very similar to a selector descriptor table
// in a real processor.  It maps a 16-bit selector to an actual allocation.
public class Selector
{
    public ushort selectorIndex;    // Base selector index allocated from RangeAllocator
    public Allocation allocation;   // The memory behind this selector
    public bool isCode;             // Can code be executed via this selector?
    public bool readOnly;           // Read-only or read/write?

    // Generate the 16-bit selector used from 16-bit code.
    public ushort selector
    {
        get
        {
            // Upper 13-bits ar the selector index, lower three bits are ring level
            // and descriptor table flag.  These lower three bit values seem to match
            // what Windows does but aren't really important
            return (ushort)((selectorIndex << 3) | (isCode ? 0x02 : 0x03));
        }
    }
}

To make an allocation from the global heap, the following occurs:

  1. Enough consecutive selectors to cover the entire allocation (1 selector for each 64Kb span) are allocated from the RangeAllocator.
  2. An allocation object is created that stores the size of the allocation, it’s flags and a byte array buffer for the actual memory.
  3. The selector is assigned a reference to the allocation object.

Testing RangeAllocator

Although conceptually simple, testing the heap is essential because a memory corruption due to a heap management bug would be extremely difficult to track down.

Normally I like to write simple unit test cases but for something like this a brute force burn test provides a lot more confidence.

The test program:

  1. Makes random alloc, free, lock, unlock, realloc and defragment operations.
  2. Writes well known data to each allocation and checks for corruption.
  3. Goes through cycles of favoring allocations over frees, pushing the heap right up to the point where it runs out of memory and then cycles back to empty.
  4. Added debug code to the RangeAllocator class to check the integrity of its control structures before and after every operation.
100 million operations without error!

Connecting the CPU to the Global Heap

Hooking the global heap up to the CPU was relatively straight forward — it just implements the IBus' ReadByte and WriteByte methods:

// Read a byte from the global heap
public byte ReadByte(ushort seg, ushort offset)
{
    // Crack the selector
    var sel = GetSelector(seg);

    try
    {
        // Read it
        return sel.allocation.buffer[((seg >> 3) - sel.selectorIndex) << 16 | offset];
    }
    catch (NullReferenceException)
    {
        throw new Sharp86.SegmentNotPresentException();
    }
    catch (IndexOutOfRangeException)
    {
        throw new Sharp86.GeneralProtectionFaultException();
    }
}

// Write a byte to the global heap
public void WriteByte(ushort seg, ushort offset, byte value)
{
    // Crack the selector
    var sel = GetSelector(seg);
    
    // Don't allow writes to read-only/code selectors
    if (sel.readOnly || sel.isCode)
        throw new Sharp86.GeneralProtectionFaultException();

    try
    {
        // Write it
        sel.allocation.buffer[((seg >> 3) - sel.selectorIndex) << 16 | offset] = value;
    }
    catch (NullReferenceException)
    {
        throw new Sharp86.SegmentNotPresentException();
    }
    catch (IndexOutOfRangeException)
    {
        throw new Sharp86.GeneralProtectionFaultException();
    }
}

Cheating

It’s worth pointing out that Win3mu doesn’t have nearly the same demands as Windows so a few shortcuts have been taken:

  1. Win3mu only needs to handle the memory demands of one program. (Windows needed to cope with every loaded program).
  2. Win3mu can leverage C# collections to manage the contents of the heap externally to the actual heap. Windows needed to use heap memory to manage the heap. Basically this means Win3mu has a little more memory because it’s not losing that overhead.
  3. Win3mu doesn’t need to handle discardable memory. Today’s PCs easily have enough memory and because we only need to cope with one program there’s no point dealing with all the complexity of patching call stacks and function references to handle discarded code segments.
  4. Win3mu doesn’t need to manage physical memory — each allocation from the global heap is essentially allocating a new C# byte array. Windows had to manage the physical memory, updating CPU descriptor tables and more.

Recap

Time for a bit of a recap. So far I’ve got:

  • A working 80286–ish CPU emulator with support for pseudo-protected mode.
  • Ability to read Windows 3 .exe files and break out all the important information.
  • Implementations of the global and local heaps.
  • The CPU wired up to the global heap.

The next thing is to get some code loaded but before that I need to explain how the C# code will call into the emulated x86 code… and vice versa. This will be the topic of the next post.