Inferring Multiple BRAMs

The width-changing buffer is a very common design element.  It can be generated with CORE Generator (coregen), but coregen only produces NGC netlists for the simple RAM configurations.  That becomes an issue if you want the design to be configurable, and it also means maintaining lots of simple cores.  This article shows three possible coding styles and how they differ in implementation.
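All three examples below are excerpts from the same surrounding context.  As a point of reference, a minimal library and entity setup might look like the following; the entity name and port widths here are my assumptions, chosen to match the address and data ranges used in the processes:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;  -- provides conv_integer as used below

-- Hypothetical wrapper: 1024 x 64b write side, 512 x 128b read side.
entity width_buf is
  port (
    Clk   : in  std_logic;
    Waddr : in  std_logic_vector(9 downto 0);
    Din   : in  std_logic_vector(63 downto 0);
    Raddr : in  std_logic_vector(8 downto 0);
    Dout  : out std_logic_vector(127 downto 0)
  );
end entity width_buf;
```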

The first way is the most verbose, but it is the best.  In this example, I’m using variables to make things easier to present on the web.  This style isn’t bad, but keep in mind that statement ordering becomes important with variables.  It also prevents you from accessing the RAM extra times outside the process, something the physical elements don’t allow.

-- Correct
p_ram_proc : process (Clk) is
  type ram_t is array (0 to 511) 
    of std_logic_vector(63 downto 0);
  type ram_array is array (0 to 1) of ram_t;
  variable ram   : ram_array;
  variable wptr  : integer range 0 to 511;
  variable wslot : integer range 0 to 1;
  variable rptr  : integer range 0 to 511;
begin
  if rising_edge(Clk) then
    -- ram location
    wptr  := conv_integer(Waddr) / 2;
    wslot := conv_integer(Waddr) mod 2;
    rptr  := conv_integer(Raddr);
    -- read
    Dout(127 downto 64) <= ram(1)(rptr);
    Dout( 63 downto  0) <= ram(0)(rptr);
    -- write
    ram(wslot)(wptr) := Din;
  end if;
end process;

This coding style results in two inferred block RAMs, which is exactly what is expected: 64kbit of storage requires two 36kbit BRAMs.  The implementation actually allocates the bits unevenly, 36b to BRAM1 and 28b to BRAM2.  This is debatable; it concentrates routing into smaller areas, which might be useful in some cases.

Now for the other methods, which didn’t work out as well.  The next method treats the RAM as a single array of 64b elements, with multiple reads per cycle:

-- Bad
p_ram_proc : process (Clk) is
  type     ram_t is array (0 to 1023) 
    of std_logic_vector(63 downto 0);
  variable ram      : ram_t;
  variable wptr     : integer range 0 to 1023;
  variable rptr     : integer range 0 to 1023;
  variable rptr_pp  : integer range 0 to 1023;
begin
  if rising_edge(Clk) then
    -- ram location
    wptr     := conv_integer(Waddr);
    rptr     := 2*conv_integer(Raddr)+0;
    rptr_pp  := 2*conv_integer(Raddr)+1;
    -- read
    Dout(127 downto 64) <= ram(rptr_pp);
    Dout( 63 downto  0) <= ram(rptr);
    -- write
    ram(wptr) := Din;
  end if;
end process;

XST didn’t notice that rptr and rptr_pp were related, one always being even and the other always being odd.  As a result, it ended up using four BRAMs: two for the upper 64b of Dout and two for the lower 64b, with the same data written twice.  That duplication does allow the general dual-read case, and at least XST detected that BRAMs were applicable.

The last style was the worst.  In this case, the RAM was treated as an array of 128b words, which means partial writes and full reads.  This style shouldn’t be used.

-- Worst
p_ram_proc : process (Clk) is
  type     ram_t is array (0 to 511) 
    of std_logic_vector(127 downto 0);
  variable ram   : ram_t;
  variable wptr  : integer range 0 to 511;
  variable wslot : integer range 0 to 1;
  variable rptr  : integer range 0 to 511;
begin
  if rising_edge(Clk) then
    -- ram location
    wptr     := conv_integer(Waddr) / 2;
    wslot    := conv_integer(Waddr) mod 2;
    rptr     := conv_integer(Raddr);
    -- read
    Dout <= ram(rptr);
    -- write
    if wslot = 1 then
      ram(wptr)(127 downto 64) := Din;
    else
      ram(wptr)( 63 downto  0) := Din;
    end if; 
  end if;
end process;

The results here were terrible.  XST no longer infers block RAMs and instead infers 64kbit of distributed RAM.  On a Virtex-6 this isn’t the end of the world; at least it isn’t 64k registers.

So in the end, it is possible to infer arrays of BRAMs in VHDL.  The synthesizer handles the array-of-array-of-vectors style best, while the multi-read and partial-write styles are bad ideas.  I will note that adding a reset to the RAM contents is not supported at all; with a reset you are limited to a register-only implementation.
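For reference, the reset pattern that breaks inference is the obvious one.  Here is a sketch of what not to write; the Rst signal is an assumption, and the rest mirrors a cut-down version of the first example:

```vhdl
-- Sketch of the unsupported pattern: clearing the whole array on reset.
-- No BRAM primitive can clear its entire contents in one cycle, so the
-- synthesizer falls back to flip-flops to honor this description.
p_ram_rst : process (Clk) is
  type ram_t is array (0 to 511) of std_logic_vector(63 downto 0);
  variable ram : ram_t;
begin
  if rising_edge(Clk) then
    if Rst = '1' then
      ram := (others => (others => '0'));  -- forces register implementation
    else
      Dout(63 downto 0) <= ram(conv_integer(Raddr));
      ram(conv_integer(Waddr)) := Din;
    end if;
  end if;
end process;
```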

