====== Circuit Synthesis ======

The process of converting a CHP description of a process into a PRS description automatically.

====== Maelstrom Guide ======

work in progress...

Maelstrom is a new logic synthesis technique (and software tool) for asynchronous circuits, developed at the AVLSI Lab at Yale. This is a guide to writing CHP so that the tool produces high-quality circuits.
 
===== Synthesizable CHP =====

Maelstrom only accepts CHP descriptions of the following form:
<code>
x0:=v0;
x1:=v1;
...
xn:=vn;
*[ true -> P ]
</code>

i.e. a (possibly empty) semi-colon separated list of initializations to a subset of variables used in the CHP, followed by an infinite repetition of an arbitrary CHP program P. The values ''vi'' must evaluate to constants. There are no restrictions on ''P'', but some forms of ''P'' will lead to circuits that are much more expensive compared to others, as we will see later. Further, the channel implementation leads to all receives being passive and all sends being active. This means that (currently) only the receiving side may probe a channel. At the process level, the ports of a process may only be channels or Booleans (see section below on external constants for usage).

===== Synthesis Commands =====
The standard synthesis steps to produce a bundled-data circuit looks like this:
 <code>
synth2 -F decomp -p proc -o mid.act in.act
synth2 -F ring -ref=1 -C bd -p decomp_proc -o out.act mid.act 
</code>
You can then simulate at PRS-level with:
<code>
actsim -ref=2 out.act ring_decomp_proc
</code>

Note that user-defined channel-types are not allowed for synthesis. Only the built-in ''chan'' type is supported. The implementation of the channel is chosen during synthesis with the ''-C'' flag that chooses the datapath type. 
If your CHP has no nested loops or multiple-channel accesses (more about this below), then you can (optionally) skip the first step and directly do:
 <code>
synth2 -F ring -C bd -p proc -o out.act in.act 
</code>
and simulate at PRS-level with:
<code>
actsim -ref=1 out.act ring_proc
</code>

This two-step process will soon be integrated into one command (TODO). To know more about what the ''-ref=n'' flags mean, please see this page: [[language:langs:refine|Refinement]].

===== Program Structure =====

The first and most important concept to keep in mind when writing a CHP program is that the structure of the program you write very directly defines the control circuit that gets generated. The syntax //is// the controller. For example, suppose you would like a process to repeat two things, one after the other, forever. It is easy to accidentally write:
<code>
s:=0;
*[
    [s=0 -> thing1
    []s=1 -> thing2
    ];
    s:=1-s
 ]
</code>

when the correct thing to do is:

<code>
*[
    thing1;
    thing2
 ]
</code>

The former is very similar to Verilog, in terms of how the state machine is defined, but CHP is more abstract than RTL. Such writing has a cost as well. The first program will have a selection block which computes guards every iteration and activates the correct ''thing'' and a loop-carry (the variable ''s'') apart from the circuit to implement the functionality that is actually needed. The second program will simply contain the circuits to implement the two ''thing''s. Note that such writing occurs more frequently when the program size grows. The figure below shows, at a block-level, the controllers that will be generated for the two programs. It should be pretty obvious as to which is the better implementation. It is helpful to internalize this bijection between the structure of the CHP and the generated control circuits.

{{ :tools:prog_compare.png?400 | Image test }}

The only time to use the state machine-style syntax is when the number of iterations spent in each state is unknown/variable. For example, consider the simple CHP that computes the 2n-th Fibonacci number.

<code>
c:=0; x:=0; y:=0;
*[
    [ c=0 -> r!x; l?c , x:=0 , y:=1
    []c>0 -> x:=x+y ; y:=x+y ; c:=c-1
    ]
 ]
</code>

Here, the number of iterations cannot be statically determined when the program is written and necessitates the state-machine style CHP. One might think to write this as a nested loop with a ''c>0'' guard. This is addressed in a section below on loops.

===== Conditionals and Ternaries =====

The selection in CHP allows for conditional statements and control flow control. However, there is a controller overhead for writing selections. In general, it is helpful to be sparing with the use of selections. Consider the following CHP:
<code>
[ x=0 -> y:=42 []else -> y:=0 ]
</code>
This is not a situation that warrants the overhead of a selection control block. The more appropriate CHP to write would be:
<code>
y:=(x=0)?(42):(0)
</code>
Although these look very similar, the hardware overhead is not. The selection controller is much more expensive than simply the combinational logic for a ternary, which is just a mux (or perhaps even cheaper, depending on the exact logical expression and how good a job the logic optimizer can do with it). However, there are cases where the usage of selections over ternaries is justified. For example, consider:

<code>
[ c    -> y:=huge_expensive_computation
[]else -> y:=0 
]
</code>

Here, suppose the slow, first branch is not taken very often in the execution of your program. Then the control will entirely skip the expensive portion and run faster on average. Rewriting this into a ternary would then actually hurt the performance, since the slow computation is put unconditionally into the execution path and slows down the circuit on every cycle. 

Further, there is an important distinction between selections written directly within a CHP body in a process and one written in a function. Selections written in functions are converted into the equivalent ternary expression in the function in-lining step and substituted. Selections in CHP bodies in a process are as they are. For example, consider:

<file act fn_vs_sel.act>
function f (int<1> c; int x) : int {
    chp {
        [ c=0 -> self:=x+1 [] c=1 -> self:=x-1 ]
    }
}

defproc p1 () {
    int x,y;
    int<1> c;
    chp {
        y:=f(c,x)
    }
}

defproc p2 () {
    int x,y;
    int<1> c;
    chp {
        [ c=0 -> y:=x+1 [] c=1 -> y:=x-1 ]    
    }
}
</file>

The CHP body in ''p1'' will contain a single ternary, but ''p2'' will contain a full selection.

Another situation where selections cannot be avoided is when there is a channel access in one of the branches but not the others. For example, 

<code>
[ c -> Y!e
[]else -> skip
]
</code>

This cannot be rewritten into a ternary operator without re-architecting the usage of the channel ''Y'' at a higher level, and often that is unwise as well, since you must try to access channels only as often as needed and not just send values just to be thrown away at the other end. 

===== Channel Accesses =====

Continuing from the previous example, it is possible that the selection is more than 2-way and the channel is only accessed in some branches but not the others. For example, consider:

<code>
[ c=0 -> Y!e0
[]c=1 -> Y!e1
[]c=2 -> x:=x+1
[]c=3 -> skip
]
</code>

Here, although the entire selection cannot be squashed down into ternaries, the channel ''Y'' is syntactically accessed at more than one point in the program. This leads to an expensive circuit. Ideally, you want every channel in your process to appear in the CHP at most once. In this case, the correct thing to do would be to assign the value to send over ''Y'' to a variable then conditionally send it later:

<code>
[ c=0 -> y:=e0
[]c=1 -> y:=e1
[]c=2 -> x:=x+1
[]c=3 -> skip
];
[c=0 | c=1 -> Y!y []else -> skip ]
</code>

In this way, the channel ''Y'' is only accessed once. In essence, channels should not be accessed multiple times when you know you're going to send something over it, but just don't know what. Instead, you must do so only know when you don't know how many times you are going to access it. The same logic extends to receives as well. 

At a higher level, it is also important to be mindful when having multiple channels that go between the same two processes in the same direction. For example, 

<code>
*[ ... ; C0!e0 , C1!e1 ; ... ]
||
*[ ... ; C0?x0 , C1?x1 ; ... ]
</code>

is incredibly wasteful. There is absolutely no need to have two channels between the same two points in the two programs. Instead, send one wider variable and assign the individual variables on the other end. Or better yet, use a structure for readability:

<code>
*[ ... ; (s.x0:=e0,s.x1:=e1) ; C!s ; ... ]
||
*[ ... ; C?t ; x0:=t.x0,x1:=t.x1 ; ... ]
</code>

===== External Constants =====

Maelstrom allows Booleans as input ports to any process to facilitate chip-level configuration bits. These booleans are assumed to be constant for the entire runtime of the circuit, i.e. these are set to a certain value before or during chip reset and never changed after that. Hence, these Booleans are also not latched, the values on the wires are used directly as if they are logical constants, such as ''Vdd'' or ''GND''.

These are useful for global configuration. For example, consider a neuromorphic core where the state-update functions of all the neurons need to be set to something. Instead of sending a config bit to each of the potentially thousands of neurons, one can simply use a ''bool'' and save on the communication and latching overhead of broadcasting a single bit from one process to thousands, since it is known that this bit will not change during the runtime of the core:

<code>
defproc neuron ( ... ; bool model[2] ) 
{
...
chp 
 {
 *[
  ...
    [ ~model[0] & ~model[1] -> state := lif(state)
    []~model[0] &  model[1] -> state := aif(state)
    [] model[0] & ~model[1] -> state := hodg_hux(state)
    [] model[0] &  model[1] -> state := izhikevich(state)
    ];
  ...
  ]
 }
}
</code>


===== Nested Loop Constructs =====

The loop construct in CHP allows for a conditional repetition of a CHP block. For example:

<code>
*[ L?x; *[x>0 -> X!x; x:=x-1] ]
</code>

generates all numbers from ''x'' to zero and sends them over ''X''. However, the controller for a nested conditional loop is expensive. On top of that, the datapath must also include muxes to correctly set values of variables from before the loop or from the previous iteration of the loop, based on whether the current iteration of the loop is the first one or not. After some experimentation, it becomes apparent that the cheapest implementation for this is to factor out nested loops into multiple parallel (unconditional) loops. For example, the process above would automatically get rewritten internally by Maelstrom into:

<code>
*[ L?x; C!y ]
||
s:=0;
*[ [s=0 -> C?y
   []s=1 & y>0 -> X!y; y:=y-1
   []s=1 & y=0 -> s=0
   ] 
 ]
</code>

Although this looks like the state-machine style CHP that was advised against, the general method to excise nested loops out recursively cannot avoid this. In general, it is better to write CHP that does not contain a loop within the top-level loop at all, since the automatic excision process may not always result in the optimal implementation.  

===== Transient Variables =====

Conventionally, one might think of every assignment to a variable ''x:=e'' to generate a latch (for the variable ''x'') to store the value that the expression ''e'' evaluates to. However, the structure of the circuit generated by Maelstrom allows for a very light-weight latching strategy. The only places in the program where latches are actually instantiated are at receives (because once the acknowledge is generated for that channel, the other side may jumble the state of those data wires and hence they must be saved) and at initial conditions (since the information from the previous iteration of the top-level loop might be lost otherwise). All assignment that occur somewhere within the program can simply be thought of as aliasing or renaming, and is thus synthesized simply as combinational logic. As a simple example, this program:

<code>
*[ L1?x, L2?y; t:=x; x:=y; y:=t; R!f(x,y) ]
</code>

and this program:

<code>
*[ L1?x, L2?y; R!f(y,x) ]
</code>

result in precisely the same circuit. This also makes the usage of structures have no overhead:

<code>
*[ ... ; L?s; x:=s.x; y:=s.y; ... ]
</code>

generates no additional latches for ''x'' and ''y'', only one for ''s''. In effect, the guidance is to be liberal with usage of temporary variables for the sake of readability etc. as simply having more assignments will not affect the circuit quality.

===== Probe Trick =====

The previous sections lead to an interesting trick. Although Maelstrom prohibits shared variables (ports to a process must be channels), this can be used to correctly implement shared variables. For example, consider the following CHP:

<code>
*[ .. ; R!x; L?s; ... ]
||
*[ R?x; big_expensive_program; L!result ]
</code>

If ''x'' is an extremely wide variable, then the second program would have a large number of latches. If the result width (i.e. width of channel ''L'') is small, then this is quite wasteful, since the value ''x'' is only used to compute something and then thrown away. A rewrite of the above program into:

<code>
*[ .. ; R!x, L?s; ... ]
||
*[ [ #R -> x:=R ]; big_expensive_program; L!result; R? ]
</code>

is much cheaper! The probed assignment waits for valid data on ''R'' then simply aliases the variable ''x'' to the data contained in ''R'' - no latches are generated. After the computation, the result is sent back and the handshake on ''R'' is completed in a data-less fashion, i.e. without storing anything into a variable. In this way, the second loop contains no latches at all (other than those necessitated by ''big_expensive_program''), resulting in a pure shared-variable type process, with the synchronization of reads and writes performed correctly using the two channels. 

===== Dynamic Array Accesses =====

Dynamic array accesses happen when the index of an arrayed variable is not a constant. In such cases, the arrayed variable is factored out into a memory and accesses are replaced with accesses to this memory block over channels. Needless to say, this is quite expensive and must be avoided unless absolutely necessary. For example, consider the following program with the variable ''c'' having bitwidth 2, and ''x'' being a size-4 array:

<code>
*[ C?c; x[c]:=x[c]+1 ]
</code>

Here, the access to the arrayed variable ''x'' is dynamic and hence this will get rewritten into a memory. To avoid a memory (which is desirable when the array size of ''x'' is small), write this as:

<code>
*[ C?c; [ ([] j : 4 : c=j -> x[j]:=x[j]+1 ] ]
</code>

which uses a syntactic replication for selections. Here, there will be 4 copies of a ''f(x)=x+1'' function being generated, which might be wasteful. A further improved way of writing this would be:

<code>
*[ C?c; 
    [ ([] j : 4 : c=j -> y:=x[j] ]; 
    y:=y+1; 
    [ ([] j : 4 : c=j -> x[j]:=y ] 
 ]
</code>

which uses a single function block. 

It is also important to realize the difference between a CHP loop and a syntactic replication loop. A CHP loop actually results in a looping control circuit. A syntactic replication loop is substituted with its expanded form as a pre-processing step. 
Suppose there is an array of ''N'' variables, all of which need to be initialized to zero. Often, one may write:

<code>
j:=0; *[ j<N -> x[j]:=0; j:=j+1 ]
</code>

Apart from the dynamic array access, this loop is entirely unnecessary and is a huge control overhead. The correct CHP to write would be:

<code>
( , j : N : x[j]:=0 )
</code>

which would get substituted to;

<code>
x[0]:=0, x[1]:=0, ... , x[N-1]:=0
</code>

which has no dynamic array accesses and is simply a list of assignments.