Linker
The linker's task is to decide what every reference in the source code points to. Structured Text has several kinds of references:
- Variable references
x := 4
where x is a reference to the variable x.
- Type references
i : MyFunctionBlock
where MyFunctionBlock is a reference to the declared FunctionBlock.
- Program references
PLC_PRG.x := 4
where PLC_PRG is a reference to a Program-POU called PLC_PRG.
- Function references
max(a, b)
where max is a reference to a Function-POU called max.
To do so, the linker looks up the declaration whose name matches the reference's name:
   PROGRAM PLC_PRG
       VAR
┌──────► x : INT;
│
│      END_VAR
│
└────┐
     │
     x := 3;
   END_PROGRAM
The linker's results will be used by the semantic validation step and by the code-generation.
The validator decides whether the name you put at a certain location is valid. To decide this, it needs to know what the reference points to, that is, whether we expect a variable, a datatype or something else.
The code-generation needs to know what certain names mean, in order to successfully generate the IR-code that reflects the behavior of your program.
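The name matching described above can be sketched as a lookup in a table of declarations. This is only an illustration: the Declaration and Index types, the resolve function and the "try the POU's members first, then global names" order are assumptions made for this example, not RuSTy's actual implementation.

```rust
use std::collections::HashMap;

/// What a name can resolve to (illustrative; not RuSTy's actual types).
#[derive(Debug, Clone, PartialEq)]
enum Declaration {
    Variable { qualified_name: String, type_name: String },
    Program { name: String },
}

/// A toy table of all declarations, keyed by qualified name.
struct Index {
    declarations: HashMap<String, Declaration>,
}

impl Index {
    /// Resolves `name` as seen from inside the POU `scope`:
    /// the POU's own members are tried first, then global names.
    fn resolve(&self, scope: &str, name: &str) -> Option<&Declaration> {
        let local = format!("{scope}.{name}");
        self.declarations
            .get(&local)
            .or_else(|| self.declarations.get(name))
    }
}

/// Builds the table for the PLC_PRG example above.
fn example_index() -> Index {
    let mut declarations = HashMap::new();
    declarations.insert(
        "PLC_PRG".to_string(),
        Declaration::Program { name: "PLC_PRG".to_string() },
    );
    declarations.insert(
        "PLC_PRG.x".to_string(),
        Declaration::Variable {
            qualified_name: "PLC_PRG.x".to_string(),
            type_name: "INT".to_string(),
        },
    );
    Index { declarations }
}
```

Calling example_index().resolve("PLC_PRG", "x") yields the declaration of PLC_PRG.x, while an undeclared name such as y yields None, which is the case a validator would report.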
Annotated Syntax Tree
The AST generated by the parser is a fairly static data structure, so where should we store the linking information for a reference? Even if we added fields for potential linking information to the AST, Rust's ownership rules would make it hard to fill in this information piece by piece during linking. Instead, we use the arena pattern to handle the different lifetimes of the parts of an AST: the AST itself is constructed very early in the compilation process, whereas the linking information is allocated later. We don't store the linking information directly in the AST; we store it inside the arena data structure and associate it with individual AST elements.
The RuSTy linker stores the linking information in an arena called AnnotationMap. The AnnotationMap can store two kinds of annotations for any AST element: a type annotation and a type-hint annotation. As a first step, we need a way to uniquely identify every single AST node, so that its ID can serve as the key that associates the annotations stored in the AnnotationMap with that node. The parser assigns a unique ID to every statement node (note that only statements receive IDs, not every AST node).
So the expression a + 3 now looks like this:
              ┌─────────────────┐
              │ BinaryOperation │
              ├─────────────────┤
              │ operator: Plus  │
              │ ID: 1           │
              └──────┬──┬───────┘
                     │  │
               left  │  │  right
         ┌───────────┘  └──────────┐
         │                         │
         │                         │
         ▼                         ▼
┌──────────────────┐      ┌──────────────────┐
│    Reference     │      │  LiteralInteger  │
├──────────────────┤      ├──────────────────┤
│ name: 'a'        │      │ value: '3'       │
│ ID: 2            │      │ ID: 3            │
└──────────────────┘      └──────────────────┘
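A minimal sketch of how such IDs could be handed out while the tree is built. The IdProvider helper, the Statement enum and the parse_a_plus_3 function are hypothetical names chosen for this example; the numbering order (parent before children) simply follows the diagram.

```rust
/// Hands out consecutive IDs, starting at 1 (hypothetical helper).
#[derive(Default)]
struct IdProvider {
    next: usize,
}

impl IdProvider {
    fn next_id(&mut self) -> usize {
        self.next += 1;
        self.next
    }
}

/// A tiny statement tree; every node carries its own ID.
#[derive(Debug)]
enum Statement {
    BinaryOperation {
        id: usize,
        operator: &'static str,
        left: Box<Statement>,
        right: Box<Statement>,
    },
    Reference { id: usize, name: String },
    LiteralInteger { id: usize, value: i64 },
}

impl Statement {
    /// The key under which annotations for this node would be stored.
    fn id(&self) -> usize {
        match self {
            Statement::BinaryOperation { id, .. }
            | Statement::Reference { id, .. }
            | Statement::LiteralInteger { id, .. } => *id,
        }
    }
}

/// Builds the tree for `a + 3`; the parent takes its ID before its children.
fn parse_a_plus_3(ids: &mut IdProvider) -> Statement {
    let id = ids.next_id();
    let left = Statement::Reference { id: ids.next_id(), name: "a".to_string() };
    let right = Statement::LiteralInteger { id: ids.next_id(), value: 3 };
    Statement::BinaryOperation {
        id,
        operator: "+",
        left: Box::new(left),
        right: Box::new(right),
    }
}
```

Because the ID is plain data inside the node, later compilation stages can attach information to a node without needing mutable access to the tree.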
The AnnotationMap stores 5 different types of annotation:
Value
The Value-annotation indicates that this AST-Element resolves to a value with the given resulting datatype. For example, the LiteralInteger(3) node gets a Value-annotation with a resulting type of DINT.
┌─────────────────────────┐
│          Value          │
├─────────────────────────┤
│                         │
│ resulting_type: String  │
│                         │
└─────────────────────────┘
Variable
The Variable-annotation indicates that this AST-Element resolves to a variable with the given qualified name (plus some convenience information, such as whether it is a constant and whether it is an auto-deref pointer). Like the Value-annotation, it also stores the resulting datatype.
┌─────────────────────────┐
│        Variable         │
├─────────────────────────┤
│                         │
│ resulting_type: String  │
│ qualified_name: String  │
│ constant: bool          │
│ is_auto_deref: bool     │
│                         │
└─────────────────────────┘
Function
The Function-annotation indicates that this AST-Element resolves to a Function-POU (a call-statement) with the given qualified name. Similar to the value-Annotation it also saves the resulting datatype but this time as the function's return type (return_type).
┌─────────────────────────┐
│        Function         │
├─────────────────────────┤
│                         │
│ return_type: String     │
│ qualified_name: String  │
│                         │
└─────────────────────────┘
Type
The Type-annotation indicates that this AST-Element resolves to a DataType with the given name (e.g. INT in the declaration x : INT).
┌─────────────────────────┐
│          Type           │
├─────────────────────────┤
│                         │
│ type_name: String       │
│                         │
└─────────────────────────┘
Program
The Program-annotation is very similar to the Function-annotation. Since a Program has no return-value it also offers no return-type information.
┌─────────────────────────┐
│         Program         │
├─────────────────────────┤
│                         │
│ qualified_name: String  │
│                         │
└─────────────────────────┘
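The five annotation kinds map naturally onto a Rust enum stored in a side table keyed by statement ID. The following is a sketch under that assumption: the field names are taken from the boxes above, while the enum name, the type_name helper and the HashMap-based AnnotationMap layout are illustrative and may differ from RuSTy's source.

```rust
use std::collections::HashMap;

/// The five annotation kinds; field names follow the diagrams above.
#[derive(Debug, Clone, PartialEq)]
enum StatementAnnotation {
    Value { resulting_type: String },
    Variable {
        resulting_type: String,
        qualified_name: String,
        constant: bool,
        is_auto_deref: bool,
    },
    Function { return_type: String, qualified_name: String },
    Type { type_name: String },
    Program { qualified_name: String },
}

impl StatementAnnotation {
    /// The datatype a node evaluates to, if it has one (a Program has none).
    fn type_name(&self) -> Option<&str> {
        match self {
            StatementAnnotation::Value { resulting_type }
            | StatementAnnotation::Variable { resulting_type, .. } => {
                Some(resulting_type.as_str())
            }
            StatementAnnotation::Function { return_type, .. } => Some(return_type.as_str()),
            StatementAnnotation::Type { type_name } => Some(type_name.as_str()),
            StatementAnnotation::Program { .. } => None,
        }
    }
}

/// Side table keyed by statement ID, so the AST itself never has to change.
#[derive(Default)]
struct AnnotationMap {
    annotations: HashMap<usize, StatementAnnotation>,
}

impl AnnotationMap {
    fn annotate(&mut self, id: usize, annotation: StatementAnnotation) {
        self.annotations.insert(id, annotation);
    }

    fn get(&self, id: usize) -> Option<&StatementAnnotation> {
        self.annotations.get(&id)
    }
}
```

A consumer such as the validator would call get with a node's ID and match on the returned variant to learn whether the node is a value, a variable, a function, a type or a program.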
So the example expression a + 3 from above will be annotated like this (note that the linker must calculate the resulting type of the BinaryOperation by determining the bigger of both operand types):
              ┌─────────────────┐
              │ BinaryOperation │
              ├─────────────────┤
              │ operator: Plus  │
              │ ID: 1           │
              └──────┬──┬───────┘
                     │  │
               left  │  │  right
         ┌───────────┘  └──────────┐
         │                         │
         │                         │
         ▼                         ▼
┌──────────────────┐      ┌──────────────────┐
│    Reference     │      │  LiteralInteger  │
├──────────────────┤      ├──────────────────┤
│ name: 'a'        │      │ value: '3'       │
│ ID: 2            │      │ ID: 3            │
└──────────────────┘      └──────────────────┘
                             ┌────────────────────────────┐
                             │           Value            │
┌───────────────────┐        ├────────────────────────────┤
│   AnnotationMap   │   ┌───►│ resulting_type: DINT       │
│                   │   │    │                            │
├───────┬───────────┤   │    └────────────────────────────┘
│ ID: 1 │ Value     ├───┘
├───────┼───────────┤        ┌────────────────────────────┐
│ ID: 2 │ Variable  ├────┐   │          Variable          │
├───────┼───────────┤    │   ├────────────────────────────┤
│ ID: 3 │ Value     ├──┐ │   │ resulting_type: SINT       │
└───────┴───────────┘  │ └──►│ qualified_name: PLC_PRG.a  │
                       │     │ constant: false            │
                       │     │ is_auto_deref: false       │
                       │     └────────────────────────────┘
                       │
                       │     ┌────────────────────────────┐
                       │     │           Value            │
                       │     ├────────────────────────────┤
                       └────►│ resulting_type: DINT       │
                             │                            │
                             └────────────────────────────┘
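The "bigger of both types" rule for a binary operation can be sketched as follows. The sketch assumes that only the signed integer types SINT, INT, DINT and LINT are involved and that operands smaller than DINT are first promoted to DINT, as described in the Type vs. Type-Hint section; the function names are made up for this example.

```rust
/// Size in bits of a few signed IEC 61131-3 integer types.
fn bit_size(type_name: &str) -> Option<u32> {
    match type_name {
        "SINT" => Some(8),
        "INT" => Some(16),
        "DINT" => Some(32),
        "LINT" => Some(64),
        _ => None,
    }
}

/// Promotes integers smaller than DINT to DINT (assumed promotion rule).
fn promote(type_name: &str) -> Option<&str> {
    let bits = bit_size(type_name)?;
    Some(if bits < 32 { "DINT" } else { type_name })
}

/// Resulting type of `left <op> right`: the bigger of the promoted operand
/// types, or None if either operand is not one of the known integer types.
fn binary_result_type<'a>(left: &'a str, right: &'a str) -> Option<&'a str> {
    let (left, right) = (promote(left)?, promote(right)?);
    Some(if bit_size(left)? >= bit_size(right)? { left } else { right })
}
```

For the example above, binary_result_type("SINT", "DINT") returns Some("DINT"), which matches the Value annotation stored for the BinaryOperation with ID 1.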
Another example where the annotated AST carries a lot of useful information is a complex expression such as an array access or a qualified reference. Let's consider the following expression:
PLC_PRG.a.b[2]
It is annotated in the following way:
         ┌────────────────────┐
         │ QualifiedReference │
         ├────────────────────┤
         │ ID: 1              │
         └─────────┬──────────┘
                   │ elements: Vec<AstStatement>
         ┌─────────┴──────────┬─────────────────────┐
         │                    │                     │
         ▼                    ▼                     ▼
┌──────────────────┐ ┌──────────────────┐  ┌──────────────────┐
│    Reference     │ │    Reference     │  │   ArrayAccess    │
├──────────────────┤ ├──────────────────┤  ├──────────────────┤
│ name: 'PLC_PRG'  │ │ name: 'a'        │  │                  │
│ ID: 2            │ │ ID: 3            │  │ ID: 4            │
└──────────────────┘ └──────────────────┘  └─────┬──────┬─────┘
                                                 │      │
                                      reference  │      │  access
                                        ┌────────┘      └─────────┐
                                        ▼                         ▼
                               ┌──────────────────┐      ┌──────────────────┐
                               │    Reference     │      │  LiteralInteger  │
                               ├──────────────────┤      ├──────────────────┤
                               │ name: 'b'        │      │ value: '2'       │
                               │ ID: 5            │      │ ID: 6            │
                               └──────────────────┘      └──────────────────┘
                             ┌────────────────────────────┐
                             │           Value            │
                        ┌───►├────────────────────────────┤
                        │    │ resulting_type: INT        │
                        │    │                            │
                        │    └────────────────────────────┘
                        │
                        │    ┌────────────────────────────┐
┌───────────────────┐   │    │          Program           │
│   AnnotationMap   │   │ ┌─►├────────────────────────────┤
│                   │   │ │  │ qualified_name: PLC_PRG    │
├───────┬───────────┤   │ │  │                            │
│ ID: 1 │ Value     ├───┘ │  └────────────────────────────┘
├───────┼───────────┤     │
│ ID: 2 │ Program   ├─────┘  ┌────────────────────────────┐
├───────┼───────────┤        │          Variable          │
│ ID: 3 │ Variable  ├───────►├────────────────────────────┤
├───────┼───────────┤        │ resulting_type: MyStruct   │
│ ID: 4 │ Value     ├─────┐  │ qualified_name: PLC_PRG.a  │
├───────┼───────────┤     │  └────────────────────────────┘
│ ID: 5 │ Variable  ├───┐ │
├───────┼───────────┤   │ │  ┌────────────────────────────┐
│ ID: 6 │ Value     ├─┐ │ │  │           Value            │
└───────┴───────────┘ │ │ └─►├────────────────────────────┤
                      │ │    │ resulting_type: INT        │
                      │ │    │                            │
                      │ │    └────────────────────────────┘
                      │ │
                      │ │    ┌─────────────────────────────────┐
                      │ │    │            Variable             │
                      │ └───►├─────────────────────────────────┤
                      │      │ resulting_type : ARRAY[] OF INT │
                      │      │ qualified_name : MyStruct.b     │
                      │      └─────────────────────────────────┘
                      │
                      │      ┌────────────────────────────┐
                      │      │           Value            │
                      │      ├────────────────────────────┤
                      └─────►│ resulting_type: DINT       │
                             │                            │
                             └────────────────────────────┘
Type vs. Type-Hint
The AnnotationMap not only offers annotations regarding the AST-node's type, but it also offers a second type of annotation.
Consider the following snippet:
PROGRAM PLC_PRG
VAR
x : SINT;
y : INT;
z : BYTE;
END_VAR
z := x + y;
END_PROGRAM
The assignment z := x + y is loaded with different types:
- x is annotated as Variable of type SINT and will be auto-upgraded to DINT.
- y is annotated as Variable of type INT and will be auto-upgraded to DINT.
- z is annotated as Variable of type BYTE.
- x + y is annotated as Value of type DINT (the bigger of both).
To make life easier for validation and code-generation, we add an additional annotation to x + y to indicate that, while it technically results in a DINT, it should rather be treated as a BYTE, since it is going to be assigned to z.
This second annotation is called the type-hint. It indicates that, although this is not the real type of the expression, the program's semantics require the compiler to treat the expression as this type.
The expression z := x + y is annotated like this:
expression | type annotation | type-hint annotation | explanation |
---|---|---|---|
x | SINT | DINT | auto-upgraded to DINT |
y | INT | DINT | auto-upgraded to DINT |
z | BYTE | - | |
x + y | DINT | BYTE | type-hint indicates that the resulting DINT needs to be cast to BYTE |
With the help of the type-hint annotations, the validation can easily decide whether certain type-cast operations are valid. The code-generation steps can decide when to generate casts by simply comparing a node's type annotation with its type-hint annotation.
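The comparison between type and type-hint can be sketched with two side tables keyed by statement ID. The Annotations struct, the required_cast function and the IDs assigned to x, y, z and x + y are assumptions made for this example, not RuSTy's actual data layout.

```rust
use std::collections::HashMap;

/// Two side tables keyed by statement ID: the real type and the optional hint.
#[derive(Default)]
struct Annotations {
    types: HashMap<usize, String>,
    type_hints: HashMap<usize, String>,
}

impl Annotations {
    /// Returns Some((from, to)) when a cast has to be generated for this node,
    /// i.e. when a type-hint exists and differs from the node's real type.
    fn required_cast(&self, id: usize) -> Option<(&str, &str)> {
        let actual = self.types.get(&id)?;
        let hint = self.type_hints.get(&id)?;
        if actual == hint {
            None
        } else {
            Some((actual.as_str(), hint.as_str()))
        }
    }
}

/// Annotates `z := x + y` as in the table above.
/// Hypothetical IDs: x = 1, y = 2, z = 3, x + y = 4.
fn annotate_example() -> Annotations {
    let mut annotations = Annotations::default();
    for (id, type_name, hint) in [
        (1, "SINT", Some("DINT")),
        (2, "INT", Some("DINT")),
        (3, "BYTE", None),
        (4, "DINT", Some("BYTE")),
    ] {
        annotations.types.insert(id, type_name.to_string());
        if let Some(hint) = hint {
            annotations.type_hints.insert(id, hint.to_string());
        }
    }
    annotations
}
```

With these annotations, required_cast reports a SINT to DINT cast for x, a DINT to BYTE cast for x + y, and no cast for z, which carries no type-hint.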
Dependencies
When generating multiple units, the linker keeps track of a dependency tree for each unit. This means that every datatype or global variable referenced directly or indirectly by the unit is marked as a dependency. The code-generation step can then use this information to generate only the types and variables that are relevant to the unit.
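Collecting direct and indirect dependencies amounts to a reachability search over "references" edges. The sketch below assumes these edges are already available as a map from a name to the names it references; the function name and the example names used with it (such as gCounter) are hypothetical.

```rust
use std::collections::{BTreeSet, HashMap};

/// Collects every name reachable from `roots` by following the edges in
/// `references`. The roots themselves are part of the result.
fn collect_dependencies<'a>(
    references: &HashMap<&'a str, Vec<&'a str>>,
    roots: &[&'a str],
) -> BTreeSet<String> {
    let mut seen = BTreeSet::new();
    let mut pending: Vec<&str> = roots.to_vec();
    while let Some(name) = pending.pop() {
        // `insert` returns false for names that were already visited,
        // which also protects against cycles between types.
        if seen.insert(name.to_string()) {
            if let Some(children) = references.get(name) {
                pending.extend(children.iter().copied());
            }
        }
    }
    seen
}
```

If PLC_PRG references MyStruct and a global gCounter, and MyStruct references INT, the result for the root PLC_PRG contains all four names, while a type that nothing in the unit references is left out and need not be generated.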