Differences

This shows you the differences between two versions of the page.

--- tools:chp2prs [2025/10/14 02:22] – karthi
+++ tools:chp2prs [2025/11/04 03:06] (current) – [External Constants] karthi
@@ Line 21: / Line 21: @@
 i.e. a (possibly empty) semi-colon separated list of initializations to a subset of variables used in the CHP, followed by an infinite repetition of an arbitrary CHP program P. The values ''vi'' must evaluate to constants. There are no restrictions on ''P'', but some forms of ''P'' will lead to circuits that are much more expensive compared to others, as we will see later. Further, the channel implementation leads to all receives being passive and all sends being active. This means that (currently) only the receiving side may probe a channel. At the process level, the ports of a process may only be channels or Booleans (see section below on external constants for usage).
+===== Synthesis Commands =====
+The standard synthesis steps to produce a bundled-data circuit looks like this:
+ <code>
+synth2 -F decomp -p proc -o mid.act in.act
+synth2 -F ring -ref=1 -C bd -p decomp_proc -o out.act mid.act
+</code>
+You can then simulate at PRS-level with:
+<code>
+actsim -ref=2 out.act ring_decomp_proc
+</code>
+Note that user-defined channel-types are not allowed for synthesis. Only the built-in ''chan'' type is supported. The implementation of the channel is chosen during synthesis with the ''-C'' flag that chooses the datapath type.
+If your CHP has no nested loops or multiple-channel accesses (more about this below), then you can (optionally) skip the first step and directly do:
+ <code>
+synth2 -F ring -C bd -p proc -o out.act in.act
+</code>
+and simulate at PRS-level with:
+<code>
+actsim -ref=1 out.act ring_proc
+</code>
+This two-step process will soon be integrated into one command (TODO). To know more about what the ''-ref=n'' flags mean, please see this page: [[language:langs:refine|Refinement]].
 ===== Program Structure =====
@@ Line 74: / Line 97: @@
 <code>
-[ c      -> y:=huge_expensive_computation
+[ c    -> y:=huge_expensive_computation
 []else -> y:=0
 ]
@@ Line 162: / Line 185: @@
 ===== External Constants =====
-To Do...
+Maelstrom allows Booleans as input ports to any process to facilitate chip-level configuration bits. These booleans are assumed to be constant for the entire runtime of the circuit, i.e. these are set to a certain value before or during chip reset and never changed after that. Hence, these Booleans are also not latched, the values on the wires are used directly as if they are logical constants, such as ''Vdd'' or ''GND''.
+These are useful for global configuration. For example, consider a neuromorphic core where the state-update functions of all the neurons need to be set to something. Instead of sending a config bit to each of the potentially thousands of neurons, one can simply use a ''bool'' and save on the communication and latching overhead of broadcasting a single bit from one process to thousands, since it is known that this bit will not change during the runtime of the core:
+<code>
+defproc neuron ( ... ; bool model[2] )
+{
+...
+chp
+ {
+ *[
+  ...
+    [ ~model[0] & ~model[1] -> state := lif(state)
+    []~model[0] &  model[1] -> state := aif(state)
+    [] model[0] & ~model[1] -> state := hodg_hux(state)
+    [] model[0] &  model[1] -> state := izhikevich(state)
+    ];
+  ...
+  ]
+ }
+}
+</code>
 ===== Nested Loop Constructs =====
@@ Line 185: / Line 231: @@
 </code>
-Although this looks like the state-machine style CHP that was advised against, the general method to `excise' nested loops out recursively cannot avoid this. In general, it is better to write CHP that does not contain a loop within the top-level loop at all, since the automatic excision process may not always result in the optimal implementation.
+Although this looks like the state-machine style CHP that was advised against, the general method to excise nested loops out recursively cannot avoid this. In general, it is better to write CHP that does not contain a loop within the top-level loop at all, since the automatic excision process may not always result in the optimal implementation.
+===== Transient Variables =====
+Conventionally, one might think of every assignment to a variable ''x:=e'' to generate a latch (for the variable ''x'') to store the value that the expression ''e'' evaluates to. However, the structure of the circuit generated by Maelstrom allows for a very light-weight latching strategy. The only places in the program where latches are actually instantiated are at receives (because once the acknowledge is generated for that channel, the other side may jumble the state of those data wires and hence they must be saved) and at initial conditions (since the information from the previous iteration of the top-level loop might be lost otherwise). All assignment that occur somewhere within the program can simply be thought of as aliasing or renaming, and is thus synthesized simply as combinational logic. As a simple example, this program:
 <code>
+*[ L1?x, L2?y; t:=x; x:=y; y:=t; R!f(x,y) ]
 </code>
+and this program:
+<code>
+*[ L1?x, L2?y; R!f(y,x) ]
+</code>
+result in precisely the same circuit. This also makes the usage of structures have no overhead:
+<code>
+*[ ... ; L?s; x:=s.x; y:=s.y; ... ]
+</code>
+generates no additional latches for ''x'' and ''y'', only one for ''s''. In effect, the guidance is to be liberal with usage of temporary variables for the sake of readability etc. as simply having more assignments will not affect the circuit quality.
+===== Probe Trick =====
+The previous sections lead to an interesting trick. Although Maelstrom prohibits shared variables (ports to a process must be channels), this can be used to correctly implement shared variables. For example, consider the following CHP:
+<code>
+*[ .. ; R!x; L?s; ... ]
+||
+*[ R?x; big_expensive_program; L!result ]
+</code>
+If ''x'' is an extremely wide variable, then the second program would have a large number of latches. If the result width (i.e. width of channel ''L'') is small, then this is quite wasteful, since the value ''x'' is only used to compute something and then thrown away. A rewrite of the above program into:
+<code>
+*[ .. ; R!x, L?s; ... ]
+||
+*[ [ #R -> x:=R ]; big_expensive_program; L!result; R? ]
+</code>
+is much cheaper! The probed assignment waits for valid data on ''R'' then simply aliases the variable ''x'' to the data contained in ''R'' - no latches are generated. After the computation, the result is sent back and the handshake on ''R'' is completed in a data-less fashion, i.e. without storing anything into a variable. In this way, the second loop contains no latches at all (other than those necessitated by ''big_expensive_program''), resulting in a pure shared-variable type process, with the synchronization of reads and writes performed correctly using the two channels.
+===== Dynamic Array Accesses =====
+Dynamic array accesses happen when the index of an arrayed variable is not a constant. In such cases, the arrayed variable is factored out into a memory and accesses are replaced with accesses to this memory block over channels. Needless to say, this is quite expensive and must be avoided unless absolutely necessary. For example, consider the following program with the variable ''c'' having bitwidth 2, and ''x'' being a size-4 array:
+<code>
+*[ C?c; x[c]:=x[c]+1 ]
+</code>
+Here, the access to the arrayed variable ''x'' is dynamic and hence this will get rewritten into a memory. To avoid a memory (which is desirable when the array size of ''x'' is small), write this as:
+<code>
+*[ C?c; [ ([] j : 4 : c=j -> x[j]:=x[j]+1 ] ]
+</code>
+which uses a syntactic replication for selections. Here, there will be 4 copies of a ''f(x)=x+1'' function being generated, which might be wasteful. A further improved way of writing this would be:
+<code>
+*[ C?c;
+    [ ([] j : 4 : c=j -> y:=x[j] ];
+    y:=y+1;
+    [ ([] j : 4 : c=j -> x[j]:=y ]
+ ]
+</code>
+which uses a single function block.
+It is also important to realize the difference between a CHP loop and a syntactic replication loop. A CHP loop actually results in a looping control circuit. A syntactic replication loop is substituted with its expanded form as a pre-processing step.
+Suppose there is an array of ''N'' variables, all of which need to be initialized to zero. Often, one may write:
+<code>
+j:=0; *[ j<N -> x[j]:=0; j:=j+1 ]
+</code>
+Apart from the dynamic array access, this loop is entirely unnecessary and is a huge control overhead. The correct CHP to write would be:
+<code>
+( , j : N : x[j]:=0 )
+</code>
+which would get substituted to;
+<code>
+x[0]:=0, x[1]:=0, ... , x[N-1]:=0
+</code>
+which has no dynamic array accesses and is simply a list of assignments.