Peripheral Control Unit

In the previous article I hooked up a speaker, and loading my game Robot Fire from a custom ROM produced the appropriate sounds. The next step was to test FPGABee with some other software, but making a custom ROM for each program was going to become tedious, so I began thinking about implementing the cassette tape interface - but ended up adding a second CPU.

Designing FPGABee's Cassette Interface

I considered two approaches for the cassette I/O:

A physical interface - hooking some I/O pins to a little circuit and playing the tape recordings from an iPod/iPhone.
A virtual interface - something that could read digital cassette recording files from memory storage and render the appropriate signals for the Microbee to listen to.

In the end I decided to go with the virtual interface - mainly because I'm better at coding than designing and building analog electronic circuits, but also because I figured the virtual interface would be easier to use. I might do the physical interface at a later date.

The next step was to figure out how to actually generate these signals at which point I had two main options:

Design a circuit in VHDL to read a data file from somewhere and generate the tape signal, or
Embed a micro-controller of some sort and write code to generate the signal.

The custom circuit option didn't really appeal because it would have been quite involved and not very flexible. Embedding a micro-controller, however, seemed like a really good idea, as I could imagine it being useful for other support functions down the track (e.g. emulating disk drives, SD card reader support, etc.). Given its potential uses, I decided to call it the "Peripheral Control Unit", aka PCU.

Dual Core FPGABee

Once I'd decided on the PCU approach I spent some time looking into the various micro-controller options. I had a few prerequisites:

small enough to fit on the Nexys 3 (along with everything else that makes up FPGABee),
fast enough to do everything that would be required of it,
easy to develop for, preferably something I was already familiar with.

I looked into quite a few embedded micro-controller cores on opencores.org before I had the idea of just using a second instance of the T80 core. A quick look at the usage reports for the current build of FPGABee confirmed it should fit. I'm certainly familiar with it and a quick web search turned up SDCC - a C compiler for Z80 that would make coding for it a lot easier than dealing with assembly language.

My only remaining concern was whether it would be fast enough to generate the audio signals accurately - at which point I had another idea: implement the actual cassette signal generation in the FPGA fabric, and have the Z80 feed it data about which signals to generate - but more about that later.

So it's not really "dual core" in the true sense of the term, but two Z80s were certainly the plan.

Implementing a Flash Memory Controller

Getting one Z80 up and running had been almost trivial, but adding a second one had one major hurdle: how to have both CPUs connected to the same flash memory. The Microbee CPU uses the flash memory for its Basic ROM, and the PCU was going to need somewhere to read its firmware from.

I considered using a core-generator ROM with the firmware embedded; however, the time required to rebuild core-generated modules would quickly become irksome. I needed to find a way to have both cores share access to the on-board flash memory - and do it in a way that wouldn't hold up the main CPU, since I didn't want to introduce weird timing delays.

So I set out to put together a Flash Memory Controller - something that could sit between the flash memory and the two CPUs and give each a separate read port - similar to the dual port RAM used in the video controller.

This is the VHDL declaration of the component, which besides the obligatory clock and reset signals includes connections to the on-board flash chip and the two read ports which connect pretty much directly to each CPU.

entity FlashMemoryController is
    Port 
    (
        reset_n : in STD_LOGIC;
        clock : in STD_LOGIC;

        -- Memory Device
        MemOE_n : out STD_LOGIC;
        MemWR_n : out STD_LOGIC;
        FlashCS_n : out STD_LOGIC;
        FlashRP_n : out STD_LOGIC;
        MemAdr : out STD_LOGIC_VECTOR(26 downto 1);
        MemDB : inout STD_LOGIC_VECTOR(15 downto 0);        

        -- Port A
        read_A_n : in STD_LOGIC;
        addr_A : in  STD_LOGIC_VECTOR (26 downto 0);
        dout_A : out  STD_LOGIC_VECTOR (7 downto 0);
        wait_A_n : out  STD_LOGIC;

        -- Port B
        read_B_n : in STD_LOGIC;
        addr_B : in  STD_LOGIC_VECTOR (26 downto 0);
        dout_B : out  STD_LOGIC_VECTOR (7 downto 0);
        wait_B_n : out  STD_LOGIC
    );
end FlashMemoryController;

The key to making this work is the pair of wait signals, which instruct each processor to stall if the controller can't provide the data in time (because it's busy servicing the other port). That said, they should never really stall because:

The access time of the flash memory is around 110ns
At 3.375MHz, the clock period is about 296ns

So even if both CPUs want access at the same time, there should be enough time to service both requests: two back-to-back flash reads take roughly 220ns, which is less than a single CPU clock period.

What this means, though, is that the memory controller needs to run at a higher clock speed than the CPUs so that it can provide finer-grained control than the slower signals coming from the CPUs. The memory controller clock runs at 100MHz.

Here's how it looks in the simulator. If you look closely you can see the 100MHz clock at the top, the slower read requests, the appropriate wait signals and the address and control lines to the flash memory chip all being driven correctly. In particular, if you look at port B you'll see it wait for the port A request to finish, holding the wait line active all the while.
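
The way one port waits while the other is serviced can be sketched as a small fixed-priority arbiter. This is only an illustration, not the actual FPGABee code: the entity name, the done input and the busy and serving_B outputs are all made up for the example, and giving port A priority is an assumption based on the behaviour described above.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical sketch (not the FPGABee source): picks which pending
-- read to service next, giving port A priority over port B.
entity PortArbiterSketch is
    port
    (
        reset_n   : in  STD_LOGIC;
        clock     : in  STD_LOGIC;   -- fast controller clock
        pending_A : in  STD_LOGIC;   -- '1' while port A has an unserviced read
        pending_B : in  STD_LOGIC;   -- '1' while port B has an unserviced read
        done      : in  STD_LOGIC;   -- pulses when the current flash read finishes
        busy      : out STD_LOGIC;   -- '1' while a flash read is in progress
        serving_B : out STD_LOGIC    -- '0' = port A selected, '1' = port B selected
    );
end PortArbiterSketch;

architecture Behavioral of PortArbiterSketch is
    signal busy_r      : STD_LOGIC := '0';
    signal serving_B_r : STD_LOGIC := '0';
begin
    process (clock, reset_n)
    begin
        if reset_n = '0' then
            busy_r      <= '0';
            serving_B_r <= '0';
        elsif rising_edge(clock) then
            if busy_r = '0' then
                -- Idle: start a read for whichever port is waiting, A first
                if pending_A = '1' then
                    serving_B_r <= '0';
                    busy_r      <= '1';
                elsif pending_B = '1' then
                    serving_B_r <= '1';
                    busy_r      <= '1';
                end if;
            elsif done = '1' then
                -- Assumes the serviced port's pending flag is cleared
                -- on this same clock edge, so it is not picked again
                busy_r <= '0';
            end if;
        end if;
    end process;

    busy      <= busy_r;
    serving_B <= serving_B_r;
end Behavioral;
```

With this arrangement a port B request that arrives while port A is being read simply stays pending (and so keeps its wait line active) until the arbiter returns to idle.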

A Hard Lesson in Cross Domain Clocks

Once the Flash Memory Controller was working in the simulator, I hooked up the second CPU, wrote a simple assembly language program for it (back to flashing LEDs again) and nothing worked. Well, one CPU might work while the other didn't, or vice versa. Certainly not the dual processor action I was expecting.

This took a long time to work out and was a hard lesson in a concept known as cross-domain clocking (more commonly called clock domain crossing, or CDC), something I'd not really come across before (remember I'm teaching myself all this as I go). There is much that can be read about this online, and I certainly don't feel qualified to give a definitive explanation of the issues, but here's my simplified, dumbed-down version of it:

When using multiple clock signals in the one design, the circuits connected to a particular clock are said to belong to that clock domain.
When a signal is passed from one clock domain to another it is known as "cross domain clocking".
When a signal travels through the circuitry of the FPGA, there is a delay known as propagation delay that is introduced by the circuitry itself. Propagation delay is determined by the number of components it travels through, the speed of the components, how the signal is routed on the chip and other factors.
Signals crossing into another clock domain are also subject to metastability: if a flip-flop samples an input that is changing too close to its clock edge, its output can take an unpredictable amount of time to settle to a valid level.
In cross-domain clocking, depending on how the edges of each clock signal align, there is a very real chance that some of the signals are in their correct new state while others are still in their previous state (due to propagation delay and metastability).

So what does all this mean for the memory controller? Well it needs to take some precautions to make sure it's reading and returning stable signals.

First, it needs to synchronize each of the incoming asynchronous read signals (an asynchronous signal in this case means one coming from the other clock domain - ie: it changes asynchronously to the memory controller's clock). This is done by simply delaying the use of that signal by one clock cycle and then manually doing edge detection on the synchronized signal. This ensures that the address lines have time to arrive and stabilize before they are used:

elsif rising_edge(clock) then

    -- Synchronize read flags to this clock domain
    sync_read_A_n <= read_A_n;
    sync_read_B_n <= read_B_n;
    prev_read_A_n <= sync_read_A_n;
    prev_read_B_n <= sync_read_B_n;

    -- Detect read_A_n going low (read asserted) and flag the read as pending
    if sync_read_A_n = '0' and prev_read_A_n = '1' then
        read_A_pending <= '1';
    end if;

    -- Detect read_B_n going low (read asserted) and flag the read as pending
    if sync_read_B_n = '0' and prev_read_B_n = '1' then
        read_B_pending <= '1';
    end if;
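
The snippet above registers each read signal once before doing edge detection. A common, more conservative variant passes the signal through two registers first, giving a metastable first stage a full clock period to settle. The following is a minimal self-contained sketch of that idea; the entity and signal names are hypothetical and it is not taken from the FPGABee source.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical helper (not part of FPGABee): brings an active-low strobe
-- into the local clock domain and pulses when it becomes asserted.
entity ReadStrobeSync is
    port
    (
        reset_n  : in  STD_LOGIC;
        clock    : in  STD_LOGIC;   -- destination (fast) clock
        async_n  : in  STD_LOGIC;   -- active-low strobe from the other domain
        asserted : out STD_LOGIC    -- high for one clock when async_n goes low
    );
end ReadStrobeSync;

architecture Behavioral of ReadStrobeSync is
    signal stage1_n : STD_LOGIC := '1';
    signal stage2_n : STD_LOGIC := '1';
    signal prev_n   : STD_LOGIC := '1';
begin
    process (clock, reset_n)
    begin
        if reset_n = '0' then
            stage1_n <= '1';
            stage2_n <= '1';
            prev_n   <= '1';
        elsif rising_edge(clock) then
            stage1_n <= async_n;    -- first stage, may go metastable
            stage2_n <= stage1_n;   -- second stage, safe to use in logic
            prev_n   <= stage2_n;   -- previous value, for edge detection
        end if;
    end process;

    -- Synchronized strobe is low now but was high one clock ago
    asserted <= '1' when stage2_n = '0' and prev_n = '1' else '0';
end Behavioral;
```

The cost is one extra cycle of latency on the fast clock, which is small next to the period of the CPU clock.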

Secondly, the memory controller needs to make sure that the outgoing data lines have stabilized before releasing the wait signals. It simply holds the wait signal for one additional clock cycle after the data out signals have been set up:

when state_output_enable =>

    -- Once chip enabled long enough, read data
    if delay_next = DELAY_OUTPUT_ENABLE then

        -- data available now, latch it into output register but hold the wait signal
        if active_port = port_A then
            dout_16_A <= MemDB;
        else
            dout_16_B <= MemDB;
        end if;

        -- Move to the sync port state
        state <= state_sync_port;

    end if;

when state_sync_port =>

    -- now that the data out has stabilized we can release the wait signals
    if active_port = port_A then
        read_A_pending <= '0';
    else
        read_B_pending <= '0';
    end if;

    -- now return to the idle state.
    state <= state_idle;

And with these few changes, suddenly both CPUs fired up and happily shared the flash memory. I now had a working Microbee running on one CPU and some flashing LED animations on the other.

Although the changes were minor, this was a major sidetrack, but at least I learned something in the process.