OpenVizsla OV3 - FPGA design

I’ve previously talked about the hardware design of the OpenVizsla OV3 USB hardware analyzer.

This time I want to give a broad overview of how the FPGA part of the design works.

The OpenVizsla FPGA design (available in the GitHub OV repository) was written using Migen, “a Python toolbox for building complex digital hardware”. With Migen, you describe logic as the result of running a Python script, which is then compiled to Verilog (and ultimately to an FPGA bitstream). This has interesting implications, as it gives you the full power of the Python language when building language features.

Let me bring up an FPGA-centric block diagram of the OpenVizsla design. Don’t panic - I will explain the individual blocks and provide some examples. You can even click parts in the block diagram to see some detailed description. Now if that isn’t magic… (And please excuse my limited website usability skills.)

(Did I tell you that you can click in the block diagram to see more details of the blocks?)

The main SDRAM chip

A regular MT48LC16M16A2 or compatible. It has a 16-bit data bus, and runs (by default) at 100 MHz, which gives a theoretical bandwidth limit of 200 MByte/s. The effective data rate will be less, because there are dead cycles when switching between pages and when doing refresh.
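To make the arithmetic behind the 200 MByte/s figure explicit, here is a small sketch. The peak numbers come from the text above; the 10% share of dead cycles is only a made-up placeholder to show how the effective rate shrinks, not a measured value.

```python
# Peak SDRAM bandwidth: one bus-wide word per clock cycle.
BUS_WIDTH_BITS = 16           # data bus width of the SDRAM chip
CLOCK_HZ = 100_000_000        # default SDRAM clock (100 MHz)


def peak_bandwidth(bus_width_bits, clock_hz):
    """Bytes per second if every single cycle transfers a word."""
    return (bus_width_bits // 8) * clock_hz


def effective_bandwidth(peak, dead_cycle_percent):
    """Bytes per second once a share of cycles is lost to page
    switches and refresh. The percentage is a hypothetical input."""
    return peak * (100 - dead_cycle_percent) // 100


peak = peak_bandwidth(BUS_WIDTH_BITS, CLOCK_HZ)
print(peak // 1_000_000, "MByte/s peak")
print(effective_bandwidth(peak, 10) // 1_000_000, "MByte/s with 10% dead cycles")
```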

Wikipedia has an article that describes the basics.

SDRAM Host Controller

This is the internal SDRAM logic. Internally, a HostIf-Interface is used to interact with the SDRAM controller.

On the HostIf, you first issue a request using the i_*-signals. This can be either a read or write (i_wr=1 is a write issue, i_wr=0 is a read issue), to a specific start address (i_addr). The controller acknowledges the issue, after seeing your i_stb=1, by asserting i_ack=1.

After that, there is the data phase. Whenever d_stb (which is controlled by the SDRAM controller) is asserted, data is transferred, unless d_term is asserted.

For writes this means that the data on d_write is written to the next address, and for reads, data is available on d_read. There is no other flow control, so the client has to take the data (or deliver the data) as fast as the controller wants. If you can’t keep up with that rate, you have to end the issue (d_term=1), and then start a new issue.

Here is an example transfer on this bus:

  • Write AD50, AD51, …, ADB3 to address 000050, 000051, …, 0000B3
  • Read back from 000055, …, 000058

The d_stb de-assertions are arbitrary, meaning that they depend on how busy the system is. A reason for d_stb being de-asserted could be that we hit a column limit (and the next page has to be selected), or that another master took over. HostIf-clients always need to support d_stb de-assertions at any time.
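The data phase described above can be sketched as a plain-Python behavioral model. This is not the Migen code from the repository: the function names, the dict standing in for the SDRAM, and the d_stb patterns are all assumptions made for illustration. It replays the example transfer (write AD50…ADB3 to 000050…0000B3, then read back 000055…000058) while d_stb drops out at arbitrary points.

```python
from itertools import cycle


def write_issue(memory, i_addr, words, d_stb_pattern):
    """Data phase of one write issue (i_wr=1) starting at i_addr."""
    addr = i_addr
    pending = list(words)
    for d_stb in d_stb_pattern:
        if not d_stb:
            continue                    # controller busy: nothing moves this cycle
        if not pending:
            break                       # client asserts d_term, the issue ends
        memory[addr] = pending.pop(0)   # d_write lands at the next address
        addr += 1


def read_issue(memory, i_addr, count, d_stb_pattern):
    """Data phase of one read issue (i_wr=0) starting at i_addr."""
    addr = i_addr
    data = []
    for d_stb in d_stb_pattern:
        if not d_stb:
            continue                    # no valid d_read this cycle
        if len(data) == count:
            break                       # client has enough and asserts d_term
        data.append(memory[addr])       # d_read is valid, take it now
        addr += 1
    return data


memory = {}                             # stand-in for the SDRAM contents
# The d_stb patterns below are arbitrary; the client must cope with any.
write_issue(memory, 0x000050, [0xAD50 + i for i in range(100)], cycle([1, 1, 0, 1]))
readback = read_issue(memory, 0x000055, 4, cycle([1, 0]))
print([hex(w) for w in readback])       # ['0xad55', '0xad56', '0xad57', '0xad58']
```

Note that the client never gets to pause the transfer: the only thing it can do is terminate and issue again later.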

  • ov_types.py defines the HostIf interface.
  • SDRAMCTL interfaces to the physical SDRAM chip, and provides a single HostIf-interface on the other side.
  • SDRAMMux time-slices the HostIf-interface to multiple masters.
  • SDRAMBIST is one such master; it first writes a test pattern, and then reads it back.
  • SDRAMBISTCfg provides a CSR interface to the SDRAMBIST module.

Migen BankArray

BankArray scans modules that inherit from AutoCSR and builds a list of Banks. Each Bank then contains a number of CSRs. A Bank decodes addresses on the CSR bus into write enables and selects the right register to return data from.
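To illustrate the address decoding, here is a rough behavioral model in plain Python. It is not Migen's Bank implementation: the base addresses, the 8-bit register width, and the choice that an unselected bank returns 0 (so that read data can simply be OR-ed together) are assumptions for this sketch. Only the register names are taken from this article.

```python
class Bank:
    """Toy model of one CSR bank: a base address plus a list of registers."""

    def __init__(self, base, names):
        self.base = base                        # hypothetical base address
        self.names = list(names)
        self.regs = {name: 0 for name in self.names}

    def _select(self, addr):
        """Decode a bus address into a register name, or None if not ours."""
        offset = addr - self.base
        if 0 <= offset < len(self.names):
            return self.names[offset]
        return None

    def write(self, addr, value):
        name = self._select(addr)
        if name is not None:                    # write enable for that register
            self.regs[name] = value & 0xFF      # assumed 8-bit registers

    def read(self, addr):
        name = self._select(addr)
        return self.regs[name] if name is not None else 0


def bus_write(banks, addr, value):
    for bank in banks:                          # every bank sees the transaction
        bank.write(addr, value)


def bus_read(banks, addr):
    result = 0
    for bank in banks:                          # unselected banks contribute 0
        result |= bank.read(addr)
    return result


banks = [
    Bank(0x000, ["LEDS_OUT", "BUTTONS_STAT"]),
    Bank(0x100, ["RANDTEST_SIZE", "RANDTEST_CFG"]),
]
bus_write(banks, 0x100, 0x40)
print(hex(bus_read(banks, 0x100)))
```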

FTDI data mux (CmdProc)

The FTDI_sync245 module connects to the FTDI chip on one side and to the CmdProc module on the other, and provides bidirectional data transfer. It handles the FTDI SyncFIFO protocol in a state machine and provides the clock domain crossing from the internal clock to the FTDI 60 MHz clock domain.

CmdProc connects to the FIFOs provided by FTDI_sync245, and connects the HOST-to-FPGA FIFO to a BusDecode instance, and the FPGA-to-HOST FIFO to a BusInterleaver instance.

The BusDecode processes the data bytes sent by the host, and generates CSR master transactions. Here is an example CSR master transaction. The data from the host was 55 80 03 00 D8 (with the last byte being a checksum), and this decodes to a CSR write at address 0003 with data being 00.

It also generates the CSR response packets; for each read and write transfer on the bus, you get a copy of the transaction back via the FTDI.
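As a sketch of what decoding such a frame could look like on the host side: the layout below is inferred purely from the single example above, so treat it as an assumption. In particular I am assuming that 55 is a fixed start byte, that the top bit of the second byte marks a write, that the remaining 15 bits are the address, and that the checksum is the byte-wise sum modulo 256 (which does match: 55 + 80 + 03 + 00 = D8). The function name is made up.

```python
def decode_frame(frame):
    """Decode one 5-byte host-to-FPGA frame into (is_write, address, data).

    Frame layout assumed from the single example in the text:
    start byte 0x55, write flag + address high, address low, data, checksum.
    """
    if len(frame) != 5:
        raise ValueError("expected exactly 5 bytes")
    start, addr_hi, addr_lo, data, checksum = frame
    if start != 0x55:
        raise ValueError("bad start byte")
    if sum(frame[:4]) & 0xFF != checksum:       # assumed: sum modulo 256
        raise ValueError("bad checksum")
    is_write = bool(addr_hi & 0x80)             # assumed: top bit selects write
    address = ((addr_hi & 0x7F) << 8) | addr_lo
    return is_write, address, data


is_write, address, data = decode_frame(bytes.fromhex("55800300D8"))
print(is_write, hex(address), hex(data))        # True 0x3 0x0
```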

In the opposite direction, the BusEncode takes multiple sources (such as the CSR response, the USB packet data, or the LFSR test data) and switches them into a single stream. It relies on each source to assert the last signal whenever a packet is complete. Switching then happens at those positions.

Here we see two pending packets (as indicated by stb being asserted in both clients), and the round-robin muxer first selects client1 (which is a CSR response packet) and then, once client1 asserts last, it switches to client2.
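The switching rule can be modelled in a few lines of plain Python. This is a behavioral sketch, not the Migen module: queues of byte lists stand in for the clients, a non-empty queue stands in for stb, and the function name is hypothetical. The important property is that the mux only moves on after it has seen last.

```python
from collections import deque


def interleave(queues):
    """Round-robin over packet queues, switching only at packet boundaries.

    Returns a list of (client_index, byte, last) tuples in output order.
    """
    out = []
    current = 0
    while any(queues):                          # some client still has stb asserted
        while not queues[current]:              # skip clients with nothing pending
            current = (current + 1) % len(queues)
        packet = queues[current].popleft()
        for i, byte in enumerate(packet):
            last = (i == len(packet) - 1)       # client flags the final byte
            out.append((current, byte, last))
        current = (current + 1) % len(queues)   # switch only after last was seen
    return out


# Two clients with one pending packet each (the contents are made up).
clients = [deque([[0x01, 0x02]]), deque([[0xA0, 0xA1, 0xA2]])]
stream = interleave(clients)
for item in stream:
    print(item)
```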

USB capture logic

This large block is in charge of:

  • Configuration of the external ULPI chip to sniff mode.
  • Packetizing the ULPI data, and insertion of control (RXCMD) data

The ULPI and data processing is decidedly out of scope of this blog entry. But to understand the big picture: The output of the cstream (whacker) is a packetized byte stream again, similar to the CSR slave results. It gets muxed into the stream sent to the analysis host in the BusEncoder.

The LFSR Generator

The LFSR stream generator has a Source that outputs a packetized LFSR stream. It is configured using two CSRs:

  • RANDTEST_SIZE: The number of LFSR bytes in each packet.
  • RANDTEST_CFG: A single ‘go’ bit. If it is set, packets will be generated, otherwise the generation will stop.
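To give an idea of what a packetized LFSR stream looks like, here is a small Python sketch. The actual LFSR width, polynomial and seed are not stated here, so I am assuming an 8-bit Galois LFSR with the well-known maximal-length mask 0xB8 and a seed of 1; the function names and the max_packets limit are also just for illustration.

```python
def lfsr_bytes(state=0x01):
    """Endless 8-bit Galois LFSR sequence (assumed polynomial, mask 0xB8)."""
    while True:
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB8


def lfsr_packets(size, go, max_packets):
    """Cut the LFSR stream into packets of `size` bytes.

    `size` plays the role of RANDTEST_SIZE and `go` the role of the
    RANDTEST_CFG go bit; max_packets only keeps the sketch finite.
    """
    source = lfsr_bytes()
    packets = []
    while go and len(packets) < max_packets:
        packets.append([next(source) for _ in range(size)])
    return packets


packets = lfsr_packets(size=4, go=True, max_packets=2)
print([[hex(b) for b in p] for p in packets])
```

On the host side, the same generator can be used to check that the received test data arrived complete and in order.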

External IO FPGA Logic

BTN_status connects to the external physical button, and provides the ability to read the button status via a CSR register BUTTONS_STAT.

LED_outputs lets you select from a number of LED sources via a per-LED LED_MUX_<n> register.

Source 0 is the LEDS_OUT register, so by default, you can display an arbitrary LED pattern by writing it to LEDS_OUT.

In the top-level module OV3, a number of LED sources are selected:

    # from ovhw/top.py:
    # GPIOs (leds/buttons)
    self.submodules.leds = LED_outputs(plat.request('leds'),
            [
                [self.bist.busy, self.ftdi_bus.tx_ind],
                [0, self.ftdi_bus.rx_ind],
                [0]
            ], active=0)

This means that LED 0 can be switched between OUT[0], the BIST busy signal, and the FTDI bus tx_ind signal. LED 1 can be switched between OUT[1], a static 0, and the FTDI bus rx_ind signal. LED 2 can be switched between OUT[2] and a static 0.

active=0 causes the LED_outputs module to invert the leds (leds_raw.eq(leds if active else ~leds)) since they are active-low.
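Here is the same selection logic as a plain-Python model, in case the mux numbering is unclear. It is a sketch, not the LED_outputs module: the function name and the way the signals are passed in as integers are assumptions. Source 0 is always the matching LEDS_OUT bit, followed by the extra sources from the list.

```python
def led_pins(leds_out, mux_sel, sources, active):
    """Return the raw pin level for each LED.

    leds_out: value of the LEDS_OUT register
    mux_sel:  one LED_MUX_<n> value per LED
    sources:  extra sources per LED, as in the top-level list
    active:   1 for active-high LEDs, 0 for active-low LEDs
    """
    pins = []
    for n, extra in enumerate(sources):
        candidates = [(leds_out >> n) & 1] + list(extra)   # source 0 is OUT[n]
        led = candidates[mux_sel[n]]
        pins.append(led if active else 1 - led)            # invert if active-low
    return pins


bist_busy, tx_ind, rx_ind = 1, 0, 1        # example signal levels (made up)
sources = [[bist_busy, tx_ind], [0, rx_ind], [0]]

# Default muxing: all LEDs show the LEDS_OUT pattern 0b101.
print(led_pins(0b101, [0, 0, 0], sources, active=0))       # [0, 1, 0]
# LED 0 shows BIST busy, LED 1 shows rx_ind, LED 2 stays on OUT[2].
print(led_pins(0b100, [1, 2, 0], sources, active=0))       # [0, 0, 0]
```

Because the LEDs are active-low, a pin level of 0 means the LED is lit.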

External LEDs

LEDs are connected with a current-limit resistor to pins P57, P58, P59 on the FPGA against VCC. To drive a LED, the pin must be driven to GND.

External Button

The external push-button connects P67 of the FPGA with GND. A 10K pull-up resistor provides a positive input when the button is not pressed.

ULPI chip

The ULPI and data processing is decidedly out of scope of this blog entry.

FTDI chip

An FT2232H is used for communication with the host.

The FTDI website has a document that describes the SyncFIFO mode that we’re using.

SyncFIFO mode allows transferring data at a very high speed using an 8-bit bidirectional bus clocked at 60 MHz, and some control lines. The actual speed depends mostly on how fast the host can process the data, but is of course inherently limited by the USB 2.0 High-Speed efficiency of about 90%.
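Putting numbers on that: the FIFO bus and the USB link happen to have the same raw rate, so the USB efficiency is what sets the ceiling. The quick calculation below uses the 90% figure from above as-is; it is an approximation, not a measurement.

```python
FIFO_BUS_BITS = 8
FIFO_CLOCK_HZ = 60_000_000           # CLKOUT from the FTDI chip
USB_HS_BITS_PER_S = 480_000_000      # USB 2.0 High-Speed signalling rate
USB_EFFICIENCY_PERCENT = 90          # "about 90%", as stated above

fifo_peak = FIFO_BUS_BITS // 8 * FIFO_CLOCK_HZ               # bytes/s on the FIFO bus
usb_raw = USB_HS_BITS_PER_S // 8                             # bytes/s on the wire
usb_usable = usb_raw * USB_EFFICIENCY_PERCENT // 100         # bytes/s of payload
ceiling = min(fifo_peak, usb_usable)

print(fifo_peak // 1_000_000, "MByte/s FIFO peak")
print(usb_usable // 1_000_000, "MByte/s usable over USB")
print(ceiling // 1_000_000, "MByte/s upper bound to the host")
```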

FTDI outputs, FPGA inputs:

  • RXF#: Data is transferred from FTDI to FPGA when both RXF# and RD# are low. RXF# is driven high when there is no data to be read.
  • TXE#: Data is transferred from FPGA to FTDI when both TXE# and WR# are low. TXE# is driven high when there is no space for data to be stored.
  • CLKOUT: 60 MHz clock generated by FTDI chip. All signals are synchronous to this clock.

FPGA outputs, FTDI inputs:

  • RD#: Acknowledges the current byte, and causes the FTDI to load the next byte onto the bus if RXF# is low during that cycle.
  • WR#: Strobe for the current byte if TXE# is low that cycle.
  • OE#: Controls bus direction. Must be asserted at least one cycle before driving RD#.
  • SIWU: “Send Immediate” pin. Can be used to flush the FTDI buffer to lower latency, but we’re not doing this right now.
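To show how these signals play together for a read, here is a cycle-by-cycle toy model in Python. It is a deliberately simple sequencer that I made up for illustration (assert OE# when RXF# goes low, assert RD# one cycle later), not the state machine in FTDI_sync245, and it ignores the write direction entirely. All signals are active-low, so 0 means asserted.

```python
def read_burst(rxf_n_trace):
    """Simulate the FPGA side of a read, one entry per CLKOUT cycle.

    rxf_n_trace: RXF# level in each cycle (0 = data available).
    Returns (oe_n, rd_n, transferred) for each cycle.
    """
    oe_n, rd_n = 1, 1                    # both de-asserted after reset
    cycles = []
    for rxf_n in rxf_n_trace:
        # A byte moves when both RXF# and RD# are low in the same cycle.
        transferred = (rxf_n == 0 and rd_n == 0)
        cycles.append((oe_n, rd_n, transferred))
        # Registered outputs for the next cycle.
        if rxf_n == 0:
            rd_n = 0 if oe_n == 0 else 1   # RD# only after OE# was low for a cycle
            oe_n = 0
        else:
            oe_n, rd_n = 1, 1              # nothing to read: release the bus
    return cycles


trace = read_burst([1, 0, 0, 0, 0, 1])
for cycle_no, (oe_n, rd_n, transferred) in enumerate(trace):
    print(cycle_no, "OE#=%d RD#=%d transferred=%s" % (oe_n, rd_n, transferred))
```

With this simple policy, two cycles pass between RXF# going low and the first byte being transferred.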

Walkthrough

On the left we see the FT2232H interface chip to the analysis host. It provides a fast bidirectional data link to a host PC running ovctl.py. The incoming command stream from the PC is parsed in CmdProc and turned into CSR (Configuration registers) master transactions. BankArray is a collection of CSR slaves. One such slave is the BTN_status component, which allows reading the state of the external hardware button; another one controls the LEDs; others allow access to the ULPI controller, invoke BIST sequences in SDRAM, or configure SDRAM-to-host streaming.

Migen automatically wires up modules to a CSR infrastructure, so it’s easy to add registers that can be accessed from the host.

Again, please click on the various elements in the block diagram to see a description and waveforms!