I typically design my VHDL components for ease of reuse. I like things that are easy to read/write and infer a clear structure. As such, I try to avoid variables as much as possible. This is because VHDL doesn’t allow a non-blocking assignment for variables, and blocking assignments aren’t appropriate for signals/ports. While avoiding variables is very useful for synthesis, it’s detrimental for simulation speed.
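To make the blocking/non-blocking distinction concrete, here is a minimal self-checking sketch. The entity name assign_demo and the names s and v are invented for illustration and are not part of the encoder. It only shows that a variable assignment takes effect immediately, while a signal assignment is scheduled and becomes visible one delta cycle later.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo entity, simulation only (no ports).
entity assign_demo is
end entity assign_demo;

architecture sim of assign_demo is
  signal s : std_logic := '0';
begin
  p_demo : process is
    variable v : std_logic := '0';
  begin
    v := '1';   -- variable: updated immediately ("blocking")
    s <= '1';   -- signal: update is scheduled ("non-blocking")

    assert v = '1'
      report "variable should already be '1'" severity failure;
    assert s = '0'
      report "signal should still hold its old value" severity failure;

    wait for 0 ns;  -- let one delta cycle elapse

    assert s = '1'
      report "signal should now be '1'" severity failure;

    wait;  -- end of demo
  end process p_demo;
end architecture sim;
```

The simulator has to keep track of scheduled updates for every signal, whereas a variable is just a value local to its process. That is a plausible reason for the speed difference measured below, though it was not profiled here.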
I modified my Reed-Solomon encoder into a version that used a single process with only signals. I also modified it for a single process with as much variable use as possible. I then did a quick sim to ensure the outputs of all three matched. Satisfied with the results, I wrote up a quick TCL script for Xilinx’s ISIM.
I then ran ISIM for each of the three cases, three times per case. For the original code:
== Original Case ==
real 0m34.624s
real 0m34.048s
real 0m36.881s
I then ran the single process, signal only version:
== Single Process, Signals Only ==
real 0m42.295s
real 0m50.505s
real 0m45.082s
Finally, I tried the variable-heavy version:
== Single Process, Variables ==
real 0m7.971s
real 0m7.936s
real 0m8.419s
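I found these results interesting. Keep in mind that the above are the times required to both build and run the sims, and the build alone takes about 3.4 seconds. Subtracting that overhead leaves roughly 4.7 seconds of actual simulation for the variable-heavy case against roughly 32 seconds for the original, so the variable-heavy case was actually much faster than the raw numbers imply.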
The heavy use of variables certainly made a huge difference in the sim times. At the same time, the code was more difficult to modify: intermediate variables were needed, and the blocks had to appear in a specific order. Further, the lack (and danger) of shared variables means the logic cannot be split across several processes within a component, so this style favors a flatter hierarchy.
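The ordering problem can be seen in a small sketch. This is not code from the encoder: the entity delay2_var and the variables stage1 and stage2 are hypothetical, and the initial values are an assumption made only so the simulation starts from a known state. It shows a two-stage delay written with variables, where the statement order decides how many registers are inferred.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-stage delay line built from variables.
entity delay2_var is
  port (
    Clk  : in  std_logic;
    Din  : in  std_logic;
    Dout : out std_logic
  );
end entity delay2_var;

architecture rtl of delay2_var is
begin
  p_delay : process (Clk) is
    -- Initial values are an assumption for simulation only.
    variable stage1, stage2 : std_logic := '0';
  begin
    if rising_edge(Clk) then
      -- stage2 must be updated first so that it captures the value
      -- stage1 held before this clock edge.
      stage2 := stage1;
      stage1 := Din;
      -- If the two lines above are swapped, stage2 receives Din
      -- directly and the delay collapses to a single stage.
      Dout <= stage2;
    end if;
  end process p_delay;
end architecture rtl;
```

With signals the two assignments could be written in either order, which is the readability trade-off described above.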
After this, I decided to try another case. For this style, I gave every variable a “next_*” version as well as a “curr_*” version. The idea was to see what a non-blocking assign equivalent would look like. The results were predictably between the signal-heavy and variable-heavy versions:
real 0m9.159s
real 0m10.098s
real 0m9.806s
Here is an example of the non-blocking equivalent code:
-- Example of this style, reset logic not shown.
-- Output signals also not shown.
p_example_proc : process(Clk) is
  variable cx, cy : std_logic;
  variable nx, ny : std_logic;
begin
  if Clk'event and Clk = '1' then
    -- The logic is clearly defined based on the
    -- registered values of the variables, and the order
    -- of the assignments to nx, ny is no longer
    -- important.
    nx := Din xor (cy or cx);
    ny := Din xor (cy and cx);
    -- all registers assigned at the very end
    -- this style allows the assigns in any order
    cx := nx;
    cy := ny;
  end if;
end process;
I haven’t done much else with the above style. I’m sure enough testing will show some flaws with it. One issue is that it still encourages using a minimum of processes, as the variables can’t be shared between processes. It also adds an additional level of micro-management, as the curr_* variables need to be assigned at the end of the process.
From my simple testing, it does work well. In fact, the signal-heavy, variable-heavy, and this new style all synthesized to the same number of registers/LUTs.
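For comparison, the same logic in the signal-only style might look like the sketch below. The entity name example_sig, the ports X and Y, the signals x_reg and y_reg, and the initial values are all assumptions added so the fragment is self-contained. The original example omits its outputs and reset logic, so this is an illustration, not the code that was benchmarked.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical signal-only counterpart of p_example_proc.
entity example_sig is
  port (
    Clk : in  std_logic;
    Din : in  std_logic;
    X   : out std_logic;
    Y   : out std_logic
  );
end entity example_sig;

architecture rtl of example_sig is
  -- Initial values are an assumption; the original shows no reset.
  signal x_reg, y_reg : std_logic := '0';
begin
  p_example_sig : process (Clk) is
  begin
    if rising_edge(Clk) then
      -- Both right-hand sides read the values from before the
      -- clock edge, so the two assignments may appear in any order.
      x_reg <= Din xor (y_reg or x_reg);
      y_reg <= Din xor (y_reg and x_reg);
    end if;
  end process p_example_sig;

  X <= x_reg;
  Y <= y_reg;
end architecture rtl;
```

Here x_reg and y_reg play the role of cx and cy, and the scheduled signal updates do the job of the explicit nx/ny copies. Both forms describe two registers fed by the same combinational logic, which is consistent with the matching register/LUT counts reported above.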
In conclusion, the use of variables can help sim times, though possibly not to the degree this article suggests. The Reed-Solomon encoder is an interesting test case: for such a small module, it has a large number of signals, and it’s not too difficult to place them all within a single process within a single component. Most designs will use more signals to connect to components, ports, or between processes. Given these results, I have mixed feelings. I don’t currently have access to anything but ISIM, and I’m not sure if other simulators will be able to optimize the design better.