Timers block - Part three

In the first part of this tutorial, we discussed the implementation of a single timer.
The second part presented the implementation of a register-based timers block.

In this (third) part of the tutorial we will see a different way to implement the timers block. The register-based timers block is a rather hungry animal; let's see how many resources it needs for several configurations:


Configuration                 LUTs     FFs
Single 32 bit timer             43      33
16 x 32 bit timers block       704     528
32 x 32 bit timers block     1,408   1,056
32 x 64 bit timers block     2,848   2,080


These numbers can be obtained by changing the DATA_W and TIMERS parameters in the VHDL package file and running synthesis for each configuration. After synthesis, in Vivado, we can get the number of used resources by taking a look at "Report Utilization".

A single 32 bit timer takes 33 flip-flops, which is quite reasonable: thirty-two of them hold the timer value itself. As the quantity of timers increases (or their width, or both), the quantity of FFs and LUTs used increases linearly, which is also to be expected.
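As a quick sanity check of that linear growth, the short Python sketch below compares the flip-flop counts from the table with a simple estimate. The formula TIMERS x (DATA_W + 1) is only an observation that happens to fit the reported numbers; the function name and the "one extra flip-flop per timer" reading are assumptions, not something the synthesis report states.

```python
# Utilization figures copied from the table above: (TIMERS, DATA_W) -> (LUTs, FFs)
reported = {
    (1, 32): (43, 33),
    (16, 32): (704, 528),
    (32, 32): (1408, 1056),
    (32, 64): (2848, 2080),
}


def estimated_ffs(timers, data_w):
    # Assumption: each timer costs DATA_W counter bits plus one extra flip-flop.
    return timers * (data_w + 1)


for (timers, data_w), (_luts, ffs) in reported.items():
    estimate = estimated_ffs(timers, data_w)
    print(f"{timers:2d} x {data_w} bit: reported {ffs} FFs, estimated {estimate}")
```

The LUT counts do not follow such a tidy formula (43 for the single timer, 44 per timer in the 32 bit blocks), so the sketch only covers flip-flops.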

An alternative to this solution is to store the timers in a memory block. The logic for this memory-based block is as follows:

  1. If a timer is enabled, retrieve its last value from memory
  2. Decrement the retrieved timer value
  3. Store the updated value back in memory
  4. Repeat steps 1 to 3 for each timer. Once all timers have been taken care of, start from the first one again.

Additionally, we must take care of timer updates from the host. This can be done in two ways:

  1. Use a dual port memory block, so that the host can change the preset of a timer at any time through one port. The second port is used by the internal FPGA logic that performs steps 1 to 4.
  2. Add one more step to the four-step list above; let's call it step 5. During step 5, any pending host write is taken care of. That is, the setup value for timer 'n' is updated in the memory block with the value written by the host. The implementation that will be presented in the next parts of this tutorial is built around this second option.
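To make the five steps more concrete, here is a small behavioral model in Python. It is only a sketch of the idea, not the VHDL controller: the class and method names are invented for illustration, and it assumes that enable flags are kept outside the memory, that a timer stops when it reaches zero, and that a single host write is latched until step 5.

```python
class MemoryTimerBlock:
    """Behavioral model only; names and details are illustrative assumptions."""

    def __init__(self, timers):
        self.mem = [0] * timers          # stands in for the memory block
        self.enabled = [False] * timers  # assumption: enables kept outside the memory
        self.pending = None              # one latched host write: (index, value)

    def host_write(self, index, value):
        # The host may write at any time; the request waits until step 5.
        self.pending = (index, value)

    def service(self, n):
        """Run steps 1 to 3 for timer n, then step 5 (host access)."""
        if self.enabled[n]:
            value = self.mem[n]          # step 1: read the last value
            if value > 0:                # assumption: timers stop at zero
                value -= 1               # step 2: decrement
            self.mem[n] = value          # step 3: write back
        if self.pending is not None:     # step 5: apply a pending host write
            m, preset = self.pending
            self.mem[m] = preset
            self.pending = None

    def sweep(self):
        # step 4: visit every timer once; the caller then starts over
        for n in range(len(self.mem)):
            self.service(n)


block = MemoryTimerBlock(4)
block.enabled[2] = True
block.host_write(2, 3)   # host presets timer 2 while the controller is busy
block.sweep()
print(block.mem)         # [0, 0, 2, 0]
```

In this model the host write is applied while timer 0 is being serviced, so timer 2 already counts down once during the same sweep. One call to sweep() corresponds to one tick of the timers' time base.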

To (hopefully) make the solution clearer, a waveform is attached below presenting the steps performed by the timer block controller. As would be expected, the controller is a state machine:



The waveform shows the different states of the machine, and the read-modify-write actions of the controller, as well as the special state used for host access. Notice that while the controller is updating timer 'n', the host can write to timer 'm'. The second write pulse will occur only if there was a host access to update a timer.

When using the memory-based solution, the resources used drop sharply: only 127 LUTs and 75 flip-flops... and, of course, a BRAM block. So, is this the best possible solution?

As engineers, we know that you can rarely win at one table without losing at another. If we are gaining a lower usage of LUTs and FFs, chances are that we are losing something else. What we are losing is parallelism.

Our original timers block works in parallel: each timer register is decremented at the same time as the others, with no connection to or dependency on the operations being performed on its fellow timer registers. With this new memory-based implementation, we have serialized the operations. We have to read from memory, decrement, and write back. But not only that: since the memory is accessed sequentially, we must repeat this read-modify-write operation for all the timers. This imposes a limit on the time base of the counters.

Let's say that our system clock is 50 MHz. With the parallel timers, each one could, if we wanted, have a resolution as fine as 20 ns (the period of a 50 MHz clock). That is not the case for the memory-based timer block. The finest achievable resolution of our timers is now limited by two factors:

  • The quantity of cycles it takes to read-modify-write one timer (4 clock cycles)
  • The quantity of timers we want to implement (Notice that the width of the timers has no impact, only their quantity).

In our case we wanted to implement 32 timers. The finest achievable resolution for the memory-based timer block is then: 20 ns x 4 x 32 = 2,560 ns ~ 2.6 us.

For most applications this won't be a problem. Most applications involving a CPU won't be able to react to changes on that scale anyway. Many applications will get along happily with timers of 1 ms resolution, or 0.1 ms = 100 us, way above the limit we have calculated.
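The resolution limit can be recomputed for other configurations with a few lines of Python. This is just the arithmetic from the text wrapped in a function; the function and parameter names are made up for the example, and the 4 cycles per timer is the figure quoted above for this particular controller.

```python
def min_resolution_ns(clock_hz, cycles_per_timer, timers):
    """Finest time base of the memory-based block, in nanoseconds."""
    clock_period_ns = 1_000_000_000 / clock_hz
    # Every timer needs its own read-modify-write slot in each sweep.
    return clock_period_ns * cycles_per_timer * timers


print(min_resolution_ns(50_000_000, 4, 32))  # 2560.0
print(min_resolution_ns(50_000_000, 4, 16))  # 1280.0
```

Note that the timer width does not appear in the formula: halving the number of timers halves the limit, while going from 32 to 64 bit timers leaves it unchanged.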

In the next entry of this series we will walk through the code for this new, memory-based solution, as well as its verification. See you soon!
