RuSTy
RuSTy is a Structured Text (ST) compiler written in Rust and based on the LLVM compiler backend. We use the logos crate to perform lexical analysis before the custom parser runs. RuSTy emits static or shared objects as well as LLVM IR or bitcode, selected by a command line flag. We are aiming for an open-source, industry-grade ST compiler that supports at least the features of the 2nd edition of the IEC 61131-3 standard.
You might also want to refer to the API documentation.
Supported Language Concepts
POUs
- ✔ Program
- ✔ Function
- ✔ FunctionBlock
- ✔ Action
Datatypes
- ✔ IEC 61131-3 numeric types
- ✔ Strings
- ✔ Wide Strings
- ✔ Struct types
- ✔ Enum types
- ✔ Array data types
- ✔ Alias types
- ✔ Sub-ranges types
- ✔ Date and Time types
- ✔ Sized String types
- ✔ Sized Wide String types
- ✔ Initial values
Declarations
- ✔ VAR
- ✔ VAR_INPUT
- ✔ VAR_INPUT {ref}
- ✔ VAR_OUTPUT
- ✔ VAR_IN_OUT
Statements
- ✔ Assignments
- ✔ Call statements
- ✔ Implicit call arguments
- ✔ Explicit call arguments
- ✔ EXIT, CONTINUE statements
Control Structures
- ✔ IF Statement
- ✔ CASE Statement
- ✔ FOR Loops
- ✔ WHILE Loops
- ✔ REPEAT Loops
- ✔ RETURN statement
Expressions
- ✔ Arithmetic Operators
- ✔ Relational Operators
- ✔ Logical Operators
- ✔ Bitwise Operators
Build & Install
RuSTy's code can be found on GitHub.
By default a Dockerfile and a devcontainer.json file are provided. If you wish to develop natively, however, you will need some additional dependencies, namely:
- Rust
- LLVM 14
- LLVM Polly
- Build Tools (e.g. build-essential on Ubuntu)
- zlib
The next sections cover how to install these dependencies on different platforms. If you already have them, RuSTy can be built using the cargo command. For debug builds, execute cargo build; for release builds (smaller & faster), execute cargo build --release. The resulting binaries can be found at target/debug/plc and target/release/plc respectively.
Ubuntu
The specified dependencies can be installed with the following command on Ubuntu:
sudo apt install \
build-essential \
llvm-14-dev liblld-14-dev \
libz-dev \
lld \
libclang-common-14-dev
Additionally, you might need libffi7, which can be installed with sudo apt install libffi7.
Debian
Same as Ubuntu, except that additional repository sources must be added, since Debian 11 only includes LLVM packages up to version 11. To do so, follow the official documentation.
MacOS
On MacOS you need to install the Xcode Command Line Tools.
Furthermore, LLVM 14 is needed, which can be installed with Homebrew:
brew install llvm@14
After the installation you have to add /opt/homebrew/opt/llvm@14/bin to your $PATH environment variable, e.g. with the following command:
echo 'export PATH="/opt/homebrew/opt/llvm@14/bin:$PATH"' >> ~/.zshrc
Windows
Compiling RuSTy on Windows requires three dependencies:
- Windows 10 SDK
- MSVC (at the point of writing this we tested it on v142 - VS 2019 C++ x64/x86 build tools)
- LLVM 14.0.6
The first two dependencies are typically installed during the Rust installation itself. More specifically, during the installation you should have been prompted to install them. If not, you can install them via Visual Studio at any point.
The third dependency is based on a custom build which is hosted on GitHub.
Download it, extract it and add the bin/ directory to your environment variables.
In theory this should cover everything to be able to compile RuSTy (with some reboots here and there).
Installing
TODO
Troubleshooting
- Because of weak compatibility guarantees of the LLVM API, the LLVM installation must exactly match the major version of the llvm-sys crate. Currently you will need to install LLVM 14 to satisfy this constraint.
- To avoid installation conflicts on Linux/Ubuntu, make sure you don't have a default installation available (like you get by just installing llvm-dev), which may break things. If you do, make sure you have set the appropriate environment variable (LLVM_SYS_140_PREFIX=/usr/lib/llvm-14 for LLVM 14), so the build of the llvm-sys crate knows what files to grab.
Using RuSTy
The RuSTy compiler binary is called plc.
plc offers a comprehensive help via the -h (--help) option.
plc takes one output-format parameter and any number of input files.
The input files can also be written as glob patterns.
plc [OPTIONS] <input-files>... <--ir|--shared|--pic|--static|--bc>
Note that you can specify at most one output format.
If no output format switch is specified, the compiler selects --static by default.
Similarly, if you do not specify an output filename via the -o or --output options, the output filename will consist of the first input filename with a file extension appropriate for the output format.
A minimal invocation looks like this:
plc input.st
This will take the file input.st and compile it into a static object that will be written to a file named input.o.
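The default naming rule can be sketched in a few lines of Python. This is only an illustration, not plc's implementation: the helper default_output_name is hypothetical, and only the .o extension for static objects is taken from the example above. The other extensions in the table are assumed, conventional choices, and how plc treats the directory part of the first input is not specified here.

```python
from pathlib import PurePath

# Assumed mapping from output format to file extension.
# Only "static" -> ".o" comes from the text; the rest are guesses.
ASSUMED_EXTENSIONS = {
    "static": ".o",
    "shared": ".so",
    "ir": ".ll",
    "bc": ".bc",
}


def default_output_name(input_files, output_format="static"):
    """Derive an output name from the first input file (hypothetical helper)."""
    first = PurePath(input_files[0])
    # Directory handling is an assumption: this sketch keeps only the file name.
    return first.with_suffix(ASSUMED_EXTENSIONS[output_format]).name


print(default_output_name(["input.st"]))  # input.o
```

Passing -o or --output avoids relying on any of these assumptions.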
More examples:
- plc --ir file1.st file2.st will compile file1.st and file2.st.
- plc --ir file1.cfc file2.st will compile file1.cfc and file2.st.
- plc --ir src/*.st will compile all ST files in the src folder.
- plc --ir "**/*.st" will compile all ST files in the current folder and its subfolders recursively.
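To get a feeling for what the two glob patterns select, the following Python sketch builds a throwaway folder tree and matches both patterns against it. It uses Python's glob module as a stand-in; plc's own glob handling may differ in details, and the file names are made up for the demonstration.

```python
import glob
import os
import tempfile


def match(root, pattern):
    """Return the files matching a pattern, relative to root."""
    hits = glob.glob(os.path.join(root, pattern), recursive=True)
    return sorted(os.path.relpath(h, root).replace(os.sep, "/") for h in hits)


# Build a small throwaway project tree to match against.
with tempfile.TemporaryDirectory() as root:
    for rel in ("main.st", "src/a.st", "src/nested/b.st", "src/notes.txt"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    flat = match(root, "src/*.st")  # only files directly inside src
    deep = match(root, "**/*.st")   # current folder and all subfolders

print(flat)  # ['src/a.st']
print(deep)  # ['main.st', 'src/a.st', 'src/nested/b.st']
```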
Example: Building a hello world program
Writing the code
We want to print something to the terminal, so we're going to declare external functions for that.
This example is available under examples/hello_world.st in the main RuSTy repository.
- main is our entry point to the program.
- To link the program, we are going to use the system's linker using the --linker=cc argument.
- On Windows and MacOS, replace this with --linker=clang as cc is usually not available.
{external}
FUNCTION puts : DINT
VAR_INPUT {ref}
text : STRING;
END_VAR
END_FUNCTION
FUNCTION main : DINT
puts('hello, world!$N');
END_FUNCTION
Compiling with RuSTy
The RuSTy command line interface is similar to that of other compilers.
If you just want to build an object file, then do this:
plc -c hello_world.st -o hello_world.o
Optimization
plc offers four levels of optimization, which correspond to the levels established by LLVM and clang (none to aggressive, i.e. -O0 to -O3).
To select an optimization level, use the -O or --optimization flag:
plc -c "**/*.st" -O none
plc -c "**/*.st" -O less
plc -c "**/*.st" -O default
plc -c "**/*.st" -O aggressive
By default plc will use default, which corresponds to clang's -O2.
Linking an executable
Alternatively, you can compile this into an executable and run it:
plc hello_world.st -o hello_world --linker=cc
./hello_world
Please note that, unless told otherwise (option -c), RuSTy will attempt to link the generated object file into an executable by default.
- The --linker=cc flag tells RuSTy that it should link with the system's compiler driver instead of the built-in linker. This provides support to create executables.
- Additional libraries can be linked using the -l flag, additional library paths can be added with -L.
- You add library search paths by providing additional -L /path/... options. By default, this will be the current directory.
- The linker will prefer a dynamically linked library if available, and revert to a static one otherwise.
Building for separate targets
RuSTy supports building for multiple targets by specifying the --target and optionally the --sysroot option.
- Multiple targets and sysroots can be specified for the compilation simply by adding additional --target and --sysroot entries.
--target
To build and compile structured text for the right platform we need to specify the target.
As RuSTy is using LLVM, a target triple supported by LLVM needs to be selected.
The default target is the host machine's target.
So if a dev container on an x86_64-docker is used, the target is x86_64-linux-gnu.
--sysroot
plc uses the sysroot option for linking purposes.
It is considered to be the root directory for the purpose of locating headers and libraries.
- If a target and sysroot are provided, the output will always be stored in a folder with the target name (e.g. an x86_64-linux-gnu target will have the output stored in a folder called x86_64-linux-gnu).
- --sysroot parameters always have to match target parameters; there can be no sysroot without a target.
Parallel Compilation
By default, plc uses parallel compilation.
This option can be controlled with the -j or --threads flag. A value above 0 indicates the number of threads to use for the compilation.
Leaving the value unset, setting it to 0, or simply specifying -j sets the value to the maximum number of threads that can run on the current machine.
This is determined by the underlying parallelisation library, Rayon.
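The rule for the thread count can be modelled roughly as follows. This is a sketch and not plc's code: resolve_thread_count is a hypothetical helper, os.cpu_count() merely stands in for whatever default Rayon picks, and the treatment of negative values is not covered by the text.

```python
import os


def resolve_thread_count(requested=None):
    """Map a -j / --threads value to a worker count (illustrative only)."""
    if requested is not None and requested > 0:
        # Any value above 0 is used as the number of threads.
        return requested
    # Unset or 0: use as many threads as the machine offers.
    # os.cpu_count() is an assumption standing in for Rayon's default.
    return os.cpu_count() or 1


print(resolve_thread_count(4))  # 4
```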
Single module Compilation
With the introduction of parallel compilation, every unit is compiled into an object file independently and then linked together in a single module.
This behaviour might not always be desired and can be disabled using the --single-module flag.
Note that single-module output is currently much slower to produce, as it requires first generating all modules and then merging them together.
Configuration Options
plc supports different configuration options; these can be printed using the config subcommand.
config schema
Outputs the JSON schema used for the validation of the plc.json file.
config diagnostics
Outputs a JSON file with the default error severity configuration for the project. See Error Configuration for more information.
Build Configuration
In addition to the comprehensive help, plc offers a build subcommand that simplifies the build process.
Instead of passing numerous inline arguments, you can use the build subcommand along with a build description file.
The build description file needs to be saved in the JSON format.
Usage:
plc build
Note that if plc cannot find the plc.json file, it will throw an error and request the path.
The default location for the build file is the current directory.
The command for building with an explicit path looks like this:
plc build src/plc.json
Build description file (plc.json)
For the build description file to work, it must be written in the json format. All the keys used in the build description file are described in the following sections.
files
The keyword files is the equivalent of the input parameter; it lists all the ST files that need to be compiled.
The value of files is an array of strings, defined as follows:
"files" : [
"examples/hello_world.st",
"examples/hw.st",
"examples/*.gvl"
]
libraries
To link several objects into one executable, plc has the option to add libraries and automatically build and link them together.
The libraries keyword is optional.
"libraries" : [
{
"name" : "iec61131std",
"path" : "path/to/lib/",
"package" : "Copy",
"include_path" : [
"examples/hw.st",
"examples/hello_world.st"
]
}
]
output
Similarly to specifying an output file via the -o or --output option on the command line, in the build file we use "output" : "output.so" to define the output file. The default location is the current build directory (see Build Location).
compile_type
The following options can be used for the compile_type:
- Static specifies that linking/binding must be done at compile time.
- Shared (dynamic) specifies that linking/binding must be done dynamically (at runtime).
- PIC Position Independent Code (choosing this option implies that the linking will be done dynamically).
- Relocatable generates relocatable object code (for combining with other object code).
- Bitcode adds bitcode alongside machine code in the executable file.
- IR intermediate LLVM representation.
The compile format is specified in the build description file as follows: "compile_type" : "Shared".
The compile_type keyword is optional.
package_commands
The package_commands keyword is optional.
TODO
Example
{
"files" : [
"examples/hw.st",
"examples/hello_world.st",
"examples/ExternalFunctions.st",
"examples/*.dt"
],
"compile_type" : "Shared",
"output" : "proj.so",
"libraries" : [
{
"name" : "iec61131std",
"path" : "path/to/lib",
"package" : "Copy",
"include_path" : [
"examples/lib.st"
]
},
{
"name" : "other_lib",
"path" : "path/to/lib",
"package" : "System",
"include_path" : [
"examples/hello_world.st"
]
}
]
}
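A build description like the one above can also be generated from a script. The Python sketch below assembles a reduced version of the example and runs a small sanity check on the library entries before serialising it. The check_libraries helper is hypothetical and is not the schema validation that plc performs; it only encodes the four library fields and the two package values that this document mentions.

```python
import json

# A reduced version of the example; values are copied from the text.
project = {
    "files": ["examples/hw.st", "examples/hello_world.st"],
    "compile_type": "Shared",
    "output": "proj.so",
    "libraries": [
        {
            "name": "iec61131std",
            "path": "path/to/lib",
            "package": "Copy",
            "include_path": ["examples/lib.st"],
        }
    ],
}

# Fields a library entry is expected to carry, per this document.
LIBRARY_KEYS = {"name", "path", "package", "include_path"}


def check_libraries(description):
    """Return a list of problems found in the "libraries" section."""
    problems = []
    for index, lib in enumerate(description.get("libraries", [])):
        missing = LIBRARY_KEYS - lib.keys()
        if missing:
            problems.append(f"library #{index}: missing {sorted(missing)}")
        if lib.get("package") not in ("Copy", "System"):
            problems.append(f"library #{index}: unknown package value")
    return problems


problems = check_libraries(project)
text = json.dumps(project, indent=4)  # content for plc.json
print(problems)  # []
```

Writing text to a file named plc.json in the project root would then allow plc build to pick it up.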
Build Parameters
The build subcommand exposes the following optional parameters:
--build-location
The build location is the location all build files will be copied to.
By default the build location is the build folder in the root of the project (the location of the plc.json).
This can be overridden with the --build-location command line parameter.
--lib-location
The lib location is where all libraries marked with Copy will be copied.
By default it is the same as the build-location.
This can be overridden with the --lib-location command line parameter.
Environment Variables
Environment variables can be used inside the build description file; the variables are evaluated before an entry is evaluated.
In addition to externally defined variables, the build exports variables that can be referenced in the description file:
PROJECT_ROOT
The folder containing the plc.json file, i.e. the root of the project.
ARCH
The target architecture currently being built, for a multi-architecture build.
The value for ARCH will be updated for every target.
Example targets are: x86_64-pc-linux-gnu, x86_64-pc-windows-msvc, aarch64-pc-linux-musl
BUILD_LOCATION
BUILD_LOCATION is the folder where the build will be saved.
This is the value of either the --build-location parameter or the default build location.
LIB_LOCATION
LIB_LOCATION is the folder where the lib will be saved.
This is the value of either the --lib-location parameter or the build location.
Usage
To reference an environment variable in the description file, prefix the variable name with $.
Example:
{
"name" : "mylib",
"path" : "$ARCH/lib",
"package" : "System",
"include_path" : [
"examples/hello_world.st"
]
}
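The effect of the substitution can be illustrated with Python's string.Template, which happens to use the same $NAME notation. This is a sketch of the idea only: the expand helper is hypothetical, and how plc treats undefined variables or other spellings is not stated here, so this version simply leaves unknown names untouched.

```python
from string import Template


def expand(value, variables):
    """Replace $NAME references in a description value (sketch)."""
    # safe_substitute leaves unknown variables as they are; plc may differ.
    return Template(value).safe_substitute(variables)


# ARCH changes for every target of a multi-architecture build.
targets = ["x86_64-pc-linux-gnu", "aarch64-pc-linux-musl"]
paths = [expand("$ARCH/lib", {"ARCH": target}) for target in targets]

print(paths)  # ['x86_64-pc-linux-gnu/lib', 'aarch64-pc-linux-musl/lib']
```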
Validation
The build description file uses a JSON Schema file located at compiler/plc_project/schema/plc-json.schema to validate the build description before the build.
In order for the schema to be used, it has to be either in that location for source builds or copied next to the build binaries.
If the schema is not found, the schema-based validation will be skipped.
Error Configuration
Errors in a plc project can be configured by providing a JSON configuration file.
A diagnostic's severity can be changed, for example from warning to error or info and vice versa, or ignored completely.
To see a default error configuration, use plc config diagnostics.
To provide a custom error configuration, use plc --error-config <custom.json>.
Note that the --error-config option can be used with all subcommands such as build and check.
Running plc config diagnostics --error-config <custom.json> will print out the full diagnostics configuration, taking the provided overrides into account.
Error Description
Errors produced by plc can be explained using the plc explain <ErrorCode> command.
Error codes are usually provided in the diagnostic report.
General Error
This error is a catch-all error. It is usually thrown when no other error better matches the case.
General IO Error
This error describes a problem during an IO operation such as reading or writing a file. It is usually accompanied by an internal error with further details.
Parameter Error
This error describes a problem with the command parameters, such as a file required for the compilation not being found.
Duplicate Symbol
The marked symbol has been defined multiple times.
Generic LLVM Error
An unexpected error occurred during the LLVM generation phase. This is usually a follow-up problem from a different diagnostic. If it occurs without a previous diagnostic, please file a bug report.
Missing Token
During the parsing phase, an additional Token (Element) was required to correctly interpret the code. The error message usually indicates what Token was missing.
Example
In the following example the name (Identifier) of the program is missing.
PROGRAM (*name*)
END_PROGRAM
error: Unexpected token: expected Identifier but found END_PROGRAM
┌─ example.st:2:1
│
2 │ END_PROGRAM
│ ^^^^^^^^^^^ Unexpected token: expected Identifier but found END_PROGRAM
Unexpected Token
During parsing, a Token (Element) was encountered in the wrong location. This could be an indication of a misused or misspelled keyword.
Invalid Range
Mismatched Parentheses
Invalid time literal
Invalid Number
Missing Case Condition
Keywords should contain Underscores
Wrong parentheses for String delimiter
POINTER_TO is no standard keyword
Return types cannot have a default value
Classes cannot contain implementation
Duplicate Label
Classes cannot contain IN_OUT variables
Classes cannot contain a return type
POUs cannot be extended
Missing container name for action
Statement has no effect
Invalid Pragma Location
Missing return type
Unexpected return type
Unsupported return type
Empty variable block
Recursive data structure
Missing IN_OUT parameters
Invalid parameter type
Invalid number of parameters
Unresolved Constant
Invalid constant block
Invalid Constant
Cannot assign to constant
Invalid assignment
Missing type
Variable Overflow
Invalid Enum Variant
This error indicates the right-hand side in an enum assignment is invalid.
For example an enum such as TYPE Color : (red := 0, green := 1, blue := 2); END_TYPE can only take values which (internally) yield a literal integer 0, 1 or 2.
Invalid variable initializer
Assignment to Reference
Invalid array assignment
Invalid POU for VLA
Invalid VLA array access
VLA Dimension out of bounds
VLAs are always By Reference
Unresolved Reference
Illegal reference access
Expression is not assignable
Typecast error
Unknown type
Literal out of range
Literal not compatible with type
Incompatible direct access
Incompatible variable for direct access
Invalid range for direct access
Invalid range for array access
Invalid variable for array access
Direct access to variable with %
Expected literal
Invalid Nature
Unknown Nature
Unresolved Generic
Incompatible size
Invalid operation
Implicit typecast
Pointer dereference to non pointer
Array access to non array value
Address-of requires a value
General codegen error
Missing function
Missing compare function
Cannot generate string literal
Initial values were not generated
General debug error
Generic linker error
Duplicate case condition
Case condition outside of a case statement
Invalid case condition
Empty control statement
Undefined node
Unexpected node
Unconnected source
Cyclic connection
No associated connector
Unnamed control
Invalid PLC Json file
Invalid Call parameters
Incompatible reference assignment
Unsafe Enum Assignment
At runtime there is no way to guarantee that a non-const reference will not change its value to something out-of-bounds for enums. For example, consider the following:
PROGRAM main
VAR
zero : DINT := 0;
color : (red := 0, green := 1, blue := 2);
END_VAR
zero := 10;
color := zero; // Invalid because `color` accepts values from 0 to 2, but we assigned 10 to it
END_PROGRAM
Equivalent enum value used
This message indicates that the assigned enum value is not part of the enum, but is equivalent to one of the internal values of the enum.
Example:
TYPE Colors : (Red, Green, Blue, Yellow) END_TYPE
TYPE Directions : (N, S, W, E) END_TYPE
VAR_GLOBAL
col : Colors := N; //N is equivalent to Red but is not part of the enum
dir : Directions := Red; //Red is equivalent to N but is not part of the enum
END_VAR
To solve the issue, use the equivalent value that belongs to the enum of the variable (e.g. Red instead of N for a variable of type Colors).
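What "equivalent" means here can be illustrated with a small Python model of the two enums. It assumes that enum members without explicit values are numbered from 0 in declaration order, which is consistent with the comments in the example (N is equivalent to Red). The helper equivalent_member is hypothetical and only mirrors the lookup a user would do by hand.

```python
from enum import IntEnum

# Python stand-ins for the two ST enums; numbering from 0 is an assumption.
Colors = IntEnum("Colors", ["Red", "Green", "Blue", "Yellow"], start=0)
Directions = IntEnum("Directions", ["N", "S", "W", "E"], start=0)


def equivalent_member(target_enum, value):
    """Return the member of target_enum with the same internal value, or None."""
    try:
        return target_enum(int(value))
    except ValueError:
        return None


# N has the internal value 0, which Colors calls Red.
print(equivalent_member(Colors, Directions.N).name)    # Red
print(equivalent_member(Directions, Colors.Red).name)  # N
```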
Return Value Of Void Functions
Functions of type VOID cannot have an explicit return value, e.g. foo := 1 in the following example is invalid.
FUNCTION foo
foo := 1;
END_FUNCTION
Choose a type for your function, if a value must be returned.
Invalid Conditional Value
Control statements such as IF, FOR and WHILE require specific types for their condition.
If, While
IF and WHILE statements require an expression which yields a boolean; any other type is invalid and will trigger an error.
For
FOR statements require four conditional values: a counter, a start value, an end value and a step value. All of these need to be integers and share the same type.
FOR counter := start TO end BY step DO
// ...
END_FOR
Action call without parentheses
Integer Condition
This error is generated because an integer was used in an IF or WHILE statement when a boolean was expected.
See also plc explain E094
Invalid Array Range
Ranges such as ARRAY[0..-1] are invalid in ST because end values of ranges must be greater than their start values.
A valid range for the given statement would have been ARRAY[-1..0].
Libraries
RuSTy does not currently support importing source-based libraries.
Source-based libraries can, however, be compiled together with the application as normal files.
Precompiled libraries or system functions can be added using compilation flags or an entry in the plc.json file.
System functions can also be added by declaring an External Function for each POU in that library.
Library Structure
A library is defined by:
- A set of st interfaces, each interface represents a function that has been precompiled.
In a POU, the interface is the definition and variable section, e.g.:
(*Interface for program example *)
PROGRAM example
VAR_INPUT
a,b,c : DINT;
END_VAR
(* End of interface *)
(* Implementation *)
END_PROGRAM
- A binary file for each architecture the library has been built for (x86_64-linux-gnu, aarch64-linux-gnu, ..)
Linking libraries using the plc command line
To include a library when using the plc command line interface, the include files can be added using the -i flag.
Each POU, Global Variable, or Datatype defined in the included files will be added to the project.
POUs and Global variables included with the -i flag are marked as external; the implementation part of a POU is ignored.
To link the library, two options are then available: Shared and Static libraries.
Shared Libraries
A shared library (i.e. extension .so) can be linked using the -l flag.
For a library called mylib, when the flag -lmylib is passed, the linker will search for a file called libmylib.so.
Note that the lib<LibName>.so format is required by the linker on Unix-like systems.
The library locations used by the linker are the default search locations of the linker (i.e. /usr/lib, /lib); additional paths can be provided using the -L flag (e.g. -L/opt/lib will make the linker also search for files in /opt/lib).
Additional library locations can be provided by supplying additional -L entries.
Additionally, the environment variable LD_LIBRARY_PATH can be defined to append entries to the linker's search location. More information can be found here.
Static Libraries
Static libraries compiled as object files can be linked by simply passing the object file (i.e. extension .o) as an input (similar to other .st files).
Archive files (i.e. extension .a) can be linked similarly to Shared Libraries using the -l flag.
If the application is being compiled with the --static flag (or no shared library (.so) is found), the linker will use the archive file.
If neither a shared object (.so) nor an archive file (.a) is found, compilation will fail.
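The lookup rules above can be modelled in a few lines of Python. This is a simplified sketch of what a Unix-style linker does for -l<name>, not the code of any actual linker: resolve_library is a hypothetical helper, the per-directory search order (shared object first, then archive) is an assumption borrowed from common linker behaviour, and default system locations are left out.

```python
import os
import tempfile


def resolve_library(name, search_paths, static=False):
    """Return the first matching library file, or None if nothing is found."""
    # A shared object is preferred unless static linking was requested.
    suffixes = [".a"] if static else [".so", ".a"]
    for directory in search_paths:
        for suffix in suffixes:
            candidate = os.path.join(directory, f"lib{name}{suffix}")
            if os.path.isfile(candidate):
                return candidate
    return None  # the caller would report a link failure here


with tempfile.TemporaryDirectory() as lib_dir:
    open(os.path.join(lib_dir, "libiec.a"), "w").close()
    only_archive = os.path.basename(resolve_library("iec", [lib_dir]))
    open(os.path.join(lib_dir, "libiec.so"), "w").close()
    prefers_shared = os.path.basename(resolve_library("iec", [lib_dir]))
    forced_static = os.path.basename(resolve_library("iec", [lib_dir], static=True))
    missing = resolve_library("other", [lib_dir])

print(only_archive, prefers_shared, forced_static, missing)
# libiec.a libiec.so libiec.a None
```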
Command line example
To compile a file called input.st, including a header and linking a library called libiec.so from /lib:
plc input.st -i iec/header.st -L/lib/ -liec
Linking libraries using the Build Description File plc.json
Libraries can be added to a project managed with a Build Description File.
To add a library to the project, the "libraries" section can be used.
A library entry requires a name, a path, the package behaviour, and a set of files to include (include_path).
name
The name of the library to be linked. This will be used by the linker to find the library.
A library with the name mylib must have an equivalent compiled file called libmylib.so.
Note: archive files (ending with .a) are currently not supported.
path
The location of the library to be linked. The path can be either absolute or relative to the project.
package
The packaging option for the library, i.e. whether the library should be copied or is already available on the system.
The value "Copy" indicates that the given library should be copied to the Library Location.
The value "System" indicates that the given library exists on the system and does not need to be copied.
include_path
A list of files (can include globs) that should be included with the project. Each POU, Global Variable, or Datatype defined in the included files will be added to the project. POUs and Global variables included in the list are marked as external, the implementation part of a POU is ignored.
Library Location
Libraries marked as Copy will be copied during the compilation to the defined Library Location.
By default this is the same as the Build Location unless overridden by the --lib-location parameter.
Using environment variables
Since libraries can be compiled for multiple targets, the lib path can contain environment variables to disambiguate the compile location.
$ARCH can be used as a placeholder in the path to indicate the currently compiled target.
During linking, if no .so file with the name lib<name>.so is found, the compilation will fail.
Configuration Example (plc.json)
A configuration example for a Copy library called mylib and a System library called std:
"libraries" : [
{
"name" : "mylib",
"path" : "libs/$ARCH/",
"package" : "Copy",
"include_path" : [
"simple_program.st"
]
},
{
"name" : "std",
"path" : "libs/$ARCH/",
"package" : "System",
"include_path" : [
"include/*.st"
]
}
]
External Functions
A POU (PROGRAM, FUNCTION, FUNCTION_BLOCK) can be marked as external, which will cause the compiler to ignore its implementation.
{external}
FUNCTION log : DINT
VAR_IN_OUT
message : STRING[1024];
END_VAR
VAR_INPUT
type : (Err,Warn,Info) := Info;
END_VAR
END_FUNCTION
At compilation time, the function log will be defined as an externally available function, and can be called from ST code.
Note: At linking time, a log function with a compatible signature must be available on the system.
Calling C functions
ST code can call into foreign functions natively.
To achieve this, the called function must be defined in a C compatible API, e.g. extern "C" blocks.
The interface of the function has to:
- either be included with the -i flag
- or be declared in ST using the {external} keyword
When including multiple header files/function interfaces, the -i flag must precede each individual file, e.g. -i file1.st -i file2.st -i file3.st. Alternatively, when including an entire folder with -i '/liblocation/*.st', the path must be put in quotes; otherwise the shell may expand the pattern into a list of files that are not each preceded by -i.
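When plc is launched from a script rather than typed by hand, the argument list can be assembled so that every include file gets its own -i. The Python sketch below shows one way to do that; include_args is a hypothetical helper, and the file names are placeholders taken from the example above. Patterns that match nothing on disk are passed through unchanged.

```python
import glob


def include_args(patterns):
    """Expand each pattern and put -i in front of every resulting file."""
    args = []
    for pattern in patterns:
        # Keep the literal text when the pattern matches nothing on disk.
        matches = sorted(glob.glob(pattern)) or [pattern]
        for path in matches:
            args += ["-i", path]
    return args


command = ["plc", "input.st"] + include_args(["file1.st", "file2.st"])
print(command)
# ['plc', 'input.st', '-i', 'file1.st', '-i', 'file2.st']
```

Such a list can be handed to subprocess.run without going through a shell, which sidesteps the quoting issue entirely.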
Example
Given a min function defined in C as follows:
int min(int a, int b) {
//...
}
an interface of that function in ST can be defined as:
{external}
FUNCTION min : DINT
VAR_INPUT
a : DINT;
b : DINT;
END_VAR
END_FUNCTION
Variadic arguments
Some foreign functions, especially ones defined in C, could be variadic functions.
These functions are usually defined with the last parameter ..., which signifies that the function can be called with any number of additional arguments.
An example of a variadic function is printf.
Calling a variadic function is supported in ST. To mark an external function as variadic, you can add a parameter of type ... to the VAR_INPUT block.
Variadic function example
Given the printf function defined as:
int printf( const char *restrict format, ... );
the ST interface can be defined as:
{external}
FUNCTION printf : DINT
VAR_INPUT {ref}
format : STRING;
END_VAR
VAR_INPUT
args : ...;
END_VAR
END_FUNCTION
Runnable example
With the printf function available on the system, there is no need to declare the C function.
An ST program called ExternalFunctions.st with the following code can be declared:
(*ExternalFunctions.st*)
(**
* The printf function's interface, marked as external since
* it is defined directly along other ST functions
*)
{external}
FUNCTION printf : DINT
VAR_INPUT {ref}
format : STRING;
END_VAR
VAR_INPUT
args: ...;
END_VAR
END_FUNCTION
(**
* The main function of the program prints a demo to the standard out
* The function main is implemented at this location and thus not marked
* as {external}
*)
FUNCTION main : DINT
VAR
tmp : DINT;
END_VAR
tmp := 1;
printf('Value %d, %d, %d$N', tmp, tmp * 10, tmp * 100);
main := tmp;
END_FUNCTION
Compiling the previous code with the following command:
plc ExternalFunctions.st -o ExternalFunctions --linker=clang
will yield an executable called ExternalFunctions.
We use clang to link the generated object file and generate an executable since the embedded linker cannot generate executable files.
The executable can then be started with ./ExternalFunctions.
Program Organization Unit (POU)
Definition
A POU is an executable unit available in an IEC 61131-3 application. It can be defined as either a Program, a Function, a Function Block, or an Action.
Methods on classes are also considered POUs but are not covered by this document.
A POU is defined as:
<POU Type> name
(* parameters *)
(* code *)
END_<POU Type>
Parameters
POUs can use input, output, or in/out parameters to exchange data with the outside.
Such parameters are defined in a variable block delimited by VAR_<TYPE> and END_VAR.
Supported parameter types are VAR_INPUT, VAR_INPUT {ref}, VAR_OUTPUT and VAR_IN_OUT.
Input
Input parameters are typically copied into the target POU to be stored and read for later references.
A definition for input parameters is as follows:
VAR_INPUT
a : INT;
END_VAR
In some cases, especially when passing large strings or arrays, or when interacting with foreign code (see External Functions), it is more efficient to avoid copying the variable values and just use a pointer to the required input.
This can be done either using the in/out variables or by specifying a special property ref on the input block.
Example:
VAR_INPUT {ref}
a : STRING;
END_VAR
Note that passing the ref property will convert all variables in that block to pointers, and should only be used in Functions.
In Out
In/Out parameters are required parameters that are always passed by reference. They can be modified by the called POU, and the changes are applied directly to the passed variable. An In/Out parameter must always be passed in a POU call and cannot be stored.
Output
Output parameters are used to return the result(s) of the POU call. They are passed by reference, but are optional. If an output parameter is not passed in a call, its value is not persisted.
Variables
In addition to parameters, a POU contains local variables. These can either be stored in the POU for later reference (VAR) or only created for a single call (VAR_TEMP).
In a function, all local variables are temporary.
Specialization
In addition to the default behavior, each type of POU has some special cases.
Function
Functions are stateless sequences of callable code. They are not backed by any structs, and cannot hold any state across multiple calls. A function's input parameter can be passed by value, or by reference.
Functions also support a return type; the resulting definition is:
FUNCTION fnName : <TYPE>
(* parameters *)
VAR_INPUT (* by value *)
x : INT;
END_VAR
VAR_INPUT {ref} (* by reference *)
r : INT;
END_VAR
(* temporary variables *)
VAR
y : INT;
END_VAR
VAR_TEMP
z : INT;
END_VAR
(* code *)
END_FUNCTION
Program
A Program is a static (i.e. GLOBAL) STRUCT that holds its state across multiple calls.
A Program exists once, and only once in an application, and subsequent calls to a program will change and store the passed parameters as well as internal variables.
A program does not support passing input parameters by reference.
Example:
PROGRAM prg
(* parameters *)
VAR_INPUT
x : INT;
END_VAR
(* persisted variables *)
VAR
y : INT;
END_VAR
(* temporary variables *)
VAR_TEMP
z : INT;
END_VAR
(* code *)
END_PROGRAM
Function Block
A function block is a STRUCT that can be instantiated multiple times using different variables (i.e. instances).
A function block instance can hold its state (including input parameters) across multiple calls, but does not share any state with different instances.
A function block does not support passing input parameters by reference.
FUNCTION_BLOCK fb
(* parameters *)
VAR_INPUT
x : INT;
END_VAR
(* persisted variables *)
VAR
y : INT;
END_VAR
(* temporary variables *)
VAR_TEMP
z : INT;
END_VAR
(* code *)
END_FUNCTION_BLOCK
Action
An action is represented by a parent struct, and does not define its own interface (VAR blocks). An action can only be defined for Programs and Function Blocks.
An action can be defined in three different ways: in a container (ACTIONS) directly below the POU, in a named ACTIONS container, or using a qualified name on the action.
Example:
FUNCTION_BLOCK fb
(* parameters *)
VAR_INPUT
x : INT;
END_VAR
(* persisted variables *)
VAR
y : INT;
END_VAR
(* temporary variables *)
VAR_TEMP
z : INT;
END_VAR
(* code *)
END_FUNCTION_BLOCK
ACTIONS (* implicitly belongs to FB *)
ACTION act
(* code *)
END_ACTION
END_ACTIONS
ACTIONS fb (* explicitly belongs to FB *)
ACTION act2
(* code *)
END_ACTION
END_ACTIONS
ACTION fb.act3 (* linked to FB with name definition *)
(* code *)
END_ACTION
Variables
Constants
Variable declaration blocks can be declared as CONSTANT. All variables of a constant declaration block become constants. Constant variables cannot be changed and need to be initialized.
Example
TYPE OneInt : INT := 1; END_TYPE
VAR_GLOBAL CONSTANT
MAX_SIZE : INT := 99;
MIN_LEN : INT := 1;
counter : OneInt; (* 1 *)
END_VAR
PROGRAM PLC_PRG
VAR CONSTANT
DEFAULT_INPUT : BOOL := FALSE;
END_VAR
END_PROGRAM
Variable Initialization
Initializers of variables are evaluated at compile time. Therefore they can only consist of literals, other constants or expressions consisting of a combination of them. Note that initializers must not contain recursive definitions.
If a variable has no initializer, it is initialized with its datatype's default value if one is defined, or else with 0.
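The compile-time evaluation rule above can be sketched as a small evaluator. This is only an illustration and not RuSTy's implementation: the `Expr` enum, the `eval` function and the constant names in the usage note are hypothetical, and only literals, references to other constants and subtraction are modelled. A set of constants that are currently being evaluated is used to reject recursive definitions.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, minimal model of an initializer expression.
enum Expr {
    Literal(i64),
    Constant(String),
    Sub(Box<Expr>, Box<Expr>),
}

/// Evaluates an initializer at "compile time".
/// `in_progress` holds the constants whose initializers are currently being
/// evaluated, so a constant that (indirectly) refers to itself is reported.
fn eval(
    expr: &Expr,
    constants: &HashMap<String, Expr>,
    in_progress: &mut HashSet<String>,
) -> Result<i64, String> {
    match expr {
        Expr::Literal(value) => Ok(*value),
        Expr::Sub(left, right) => {
            let l = eval(left, constants, in_progress)?;
            let r = eval(right, constants, in_progress)?;
            Ok(l - r)
        }
        Expr::Constant(name) => {
            let initializer = constants
                .get(name)
                .ok_or_else(|| format!("unknown constant {name}"))?;
            if !in_progress.insert(name.clone()) {
                return Err(format!("recursive definition of {name}"));
            }
            let value = eval(initializer, constants, in_progress);
            in_progress.remove(name);
            value
        }
    }
}
```

With constants `MIN_LEN := 1`, `MAX_LEN := 100` and `SIZE := MAX_LEN - MIN_LEN` registered in the map, evaluating `SIZE` yields 99, while two constants that refer to each other produce an error instead of looping forever.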
Array Initialization
Arrays can be initialized using array literals. If the array-initial value does not contain all required elements, the array's inner type's default value will be used to fill the missing values.
Example
TYPE SignalValue : INT := -1; END_TYPE
VAR_GLOBAL CONSTANT
MIN_LEN : INT := 1;
MAX_LEN : INT := 100;
SIZE : INT := MAX_LEN - MIN_LEN;
END_VAR
PROGRAM PLC_PRG
VAR_INPUT
signals: ARRAY[0..SIZE] OF SignalValue := [99, 99]; (* rest is -1 *)
END_VAR
...
END_PROGRAM
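The fill rule for partially initialized arrays can be illustrated with a few lines of Rust. This is a sketch under the assumption that the array is simply pre-filled with the inner type's default value and then overwritten from the front; `init_array` is a hypothetical helper, not a RuSTy function.

```rust
/// Builds the initial contents of an array with `len` elements.
/// Elements not covered by `initial` keep the inner type's default value.
fn init_array(len: usize, initial: &[i16], default: i16) -> Vec<i16> {
    let mut values = vec![default; len];
    // zip stops at the shorter side, so surplus slots stay at `default`.
    for (slot, value) in values.iter_mut().zip(initial) {
        *slot = *value;
    }
    values
}
```

For the example above, `ARRAY[0..SIZE]` with `SIZE = 99` has 100 elements, so `init_array(100, &[99, 99], -1)` yields two elements of 99 followed by 98 elements of -1.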
Datatypes
Numeric types
A variety of numeric types exist with different sizes and properties complying with IEC61131.
Overview
Type name | Size | Properties |
---|---|---|
SINT | 8 bit | signed |
USINT | 8 bit | unsigned |
INT | 16 bit | signed |
UINT | 16 bit | unsigned |
DINT | 32 bit | signed |
UDINT | 32 bit | unsigned |
LINT | 64 bit | signed |
ULINT | 64 bit | unsigned |
REAL | 32 bit | float |
LREAL | 64 bit | float |
When such a variable is declared without being initialized, it will
be default-initialized with a value of 0
or 0.0
respectively.
A word on integer literals
Integer literals can be prefixed with either 2#
(binary), 8#
(octal) or 16#
(hexadecimal).
They will then be treated with regard to the respective number system.
Examples:
- i1 : DINT := 42; - declares and initializes a 32-bit signed integer with value 42.
- i1 : DINT := 2#101010; - declares and initializes a 32-bit signed integer with value 42.
- i1 : DINT := 8#52; - declares and initializes a 32-bit signed integer with value 42.
- i1 : DINT := 16#2A; - declares and initializes a 32-bit signed integer with value 42.
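How such prefixed literals map to a value can be sketched as follows. The function name `parse_int_literal` is hypothetical and this is not RuSTy's parser; it only handles the three prefixes listed above and, as an assumption, treats underscores as digit separators.

```rust
/// Parses an ST integer literal such as `42`, `2#101010`, `8#52` or `16#2A`.
/// Returns `None` for an unsupported base or malformed digits.
fn parse_int_literal(literal: &str) -> Option<i64> {
    // Assumption: underscores only group digits and carry no meaning.
    let cleaned: String = literal.chars().filter(|c| *c != '_').collect();
    match cleaned.split_once('#') {
        Some((base, digits)) => {
            let radix = match base {
                "2" => 2,
                "8" => 8,
                "16" => 16,
                _ => return None,
            };
            i64::from_str_radix(digits, radix).ok()
        }
        // No prefix: a plain decimal literal.
        None => cleaned.parse::<i64>().ok(),
    }
}
```

All four literals from the examples above evaluate to the same value with this sketch.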
Strings
Overview
Type name | Size | Encoding |
---|---|---|
STRING | n+1 | UTF-8 |
WSTRING | 2n+2 | UTF-16 |
When such a variable is declared without being initialized, it will be default-initialized with a value of '' or "" respectively (empty strings).
STRING
RuSTy treats STRING
s as byte-arrays storing UTF-8 character bytes with a Null-terminator (0-byte) at the end.
So a String of size n requires n+1 bytes to account for the Null-terminator.
A STRING
literal is surrounded by single-ticks '
.
A String has a well defined length which can be defined similar to the array-syntax.
A String-variable myVariable: STRING[20]
declares a byte array of length 21, to store 20 utf8 character bytes.
When declaring a STRING
, the length-attribute is optional. The default length is 80.
Examples:
- s1 : STRING; - declares a String of length 80.
- s2 : STRING[20]; - declares a String of length 20.
- s3 : STRING := 'Hello World'; - declares a String of length 80 and initializes it with the UTF-8 characters and a null-terminator at the end.
- s4 : STRING[55] := 'Foo Baz'; - declares a String of length 55 and initializes it with the UTF-8 characters and a null-terminator at the end.
WSTRING (Wide Strings)
RuSTy treats WSTRING
s as byte-arrays storing UTF-16 character bytes with two Null-terminator bytes at the end.
The bytes are stored in Little Endian encoding.
A Wide-String of size n requires 2 * (n+1) bytes to account for the 2 bytes per UTF-16 character and the Null-terminator.
A WSTRING literal is surrounded by double-ticks ".
A WSTRING
has a well defined length which can be defined similar to the array-syntax.
A WSTRING variable myVariable: WSTRING[20] declares a byte array of length 42, to store 20 UTF-16 characters.
When declaring a WSTRING
, the length-attribute is optional. The default length is 80.
Examples:
- ws1 : WSTRING; - declares a Wide-String of length 80.
- ws2 : WSTRING[20]; - declares a Wide-String of length 20.
- ws3 : WSTRING := "Hello World"; - declares a Wide-String of length 80 and initializes it with the UTF-16 characters and a UTF-16 null-terminator at the end.
- ws4 : WSTRING[55] := "Foo Baz"; - declares a Wide-String of length 55 and initializes it with the UTF-16 characters and a UTF-16 null-terminator at the end.
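The memory layout described for STRING and WSTRING can be sketched in Rust. The helper names below are hypothetical, and rejecting over-long input (instead of truncating it) is an assumption made for this example only.

```rust
/// Bytes reserved for STRING[n]: n character bytes plus one null-terminator.
fn string_byte_size(n: usize) -> usize {
    n + 1
}

/// Bytes reserved for WSTRING[n]: two bytes per UTF-16 unit plus a
/// two-byte null-terminator.
fn wstring_byte_size(n: usize) -> usize {
    2 * (n + 1)
}

/// Encodes `text` as little-endian UTF-16 into a zero-filled buffer sized for
/// WSTRING[n]. Returns `None` if the text needs more than n UTF-16 units.
fn encode_wstring(text: &str, n: usize) -> Option<Vec<u8>> {
    let units: Vec<u16> = text.encode_utf16().collect();
    if units.len() > n {
        return None;
    }
    let mut buffer = vec![0u8; wstring_byte_size(n)];
    for (i, unit) in units.iter().enumerate() {
        buffer[2 * i..2 * i + 2].copy_from_slice(&unit.to_le_bytes());
    }
    Some(buffer)
}
```

For `WSTRING[20]` this produces a 42-byte buffer; the unused tail stays zeroed, which also provides the terminator.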
Date and Time
Overview
Type name | Size | Internally stored as |
---|---|---|
TIME | 64 bit | Timespan in nanoseconds |
TIME_OF_DAY | 64 bit | Nanoseconds since Jan 1, 1970 UTC |
DATE | 64 bit | Nanoseconds since Jan 1, 1970 UTC |
DATE_AND_TIME | 64 bit | Nanoseconds since Jan 1, 1970 UTC |
Note that RuSTy already treats TIME
, TIME_OF_DAY
, DATE
and DATE_AND_TIME
as 64 bit numbers.
Therefore the long variants LTIME, LTOD, LDATE and LDT are mere aliases to the original types.
DATE
The DATE
datatype is used to represent a Date in the Gregorian Calendar.
Such a value is stored as an i64 with a precision in nanoseconds and denotes the number of nanoseconds
that have elapsed since January 1, 1970 UTC not counting leap seconds.
DATE literals start with DATE#
or D#
followed by a date in the format of yyyy-mm-dd
.
Examples:
d1 : DATE := DATE#2021-05-02;
d2 : DATE := DATE#1-12-24;
d3 : DATE := D#2000-1-1;
DATE_AND_TIME
The DATE_AND_TIME
datatype is used to represent a certain point in time in the Gregorian Calendar.
Such a value is stored as an i64
with a precision in nanoseconds and denotes the
number of nanoseconds that have elapsed since January 1, 1970 UTC not counting leap seconds.
DATE_AND_TIME literals start with DATE_AND_TIME#
or DT#
followed by a date and time in the
format of yyyy-mm-dd-hh:mm:ss
.
Note that only the seconds-segment can have a fraction denoting the milliseconds.
Examples:
d1 : DATE_AND_TIME := DATE_AND_TIME#2021-05-02-14:20:10.25;
d2 : DATE_AND_TIME := DATE_AND_TIME#1-12-24-00:00:1;
d3 : DATE_AND_TIME := DT#1999-12-31-23:59:59.999;
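The internal representation of DATE and DATE_AND_TIME values can be reproduced with a short calculation. The sketch below is not taken from RuSTy; the function names are hypothetical, and the day count uses the well-known civil-calendar formula for the proleptic Gregorian calendar. Leap seconds are ignored, as stated above.

```rust
const NANOS_PER_SECOND: i64 = 1_000_000_000;
const SECONDS_PER_DAY: i64 = 86_400;

/// Days between 1970-01-01 and the given Gregorian date.
fn days_since_epoch(year: i64, month: i64, day: i64) -> i64 {
    // Treat March as the first month so the leap day falls at the year's end.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let year_of_era = y.rem_euclid(400);
    let shifted_month = (month + 9) % 12; // March = 0, ..., February = 11
    let day_of_year = (153 * shifted_month + 2) / 5 + day - 1;
    let day_of_era = year_of_era * 365 + year_of_era / 4 - year_of_era / 100 + day_of_year;
    era * 146_097 + day_of_era - 719_468
}

/// Value of a DATE literal in nanoseconds since 1970-01-01.
fn date_to_nanos(year: i64, month: i64, day: i64) -> i64 {
    days_since_epoch(year, month, day) * SECONDS_PER_DAY * NANOS_PER_SECOND
}

/// Value of a DATE_AND_TIME literal; `nanos` is the fractional second part.
fn date_and_time_to_nanos(
    year: i64, month: i64, day: i64,
    hour: i64, minute: i64, second: i64, nanos: i64,
) -> i64 {
    date_to_nanos(year, month, day)
        + (hour * 3_600 + minute * 60 + second) * NANOS_PER_SECOND
        + nanos
}
```

Under these assumptions, `DATE#2021-05-02` corresponds to 1,619,913,600 seconds after the epoch, expressed in nanoseconds, and `DT#1999-12-31-23:59:59.999` lies one millisecond before 946,684,800 seconds.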
TIME_OF_DAY
The TIME_OF_DAY
datatype is used to represent a specific moment in time in a day.
Such a value is stored as an i64
value with a precision in nanoseconds and denotes the
number of nanoseconds that have elapsed since January 1, 1970 UTC not counting leap seconds.
Hence this value is stored as a DATE_AND_TIME
with the day fixed to 1970-01-01.
TIME_OF_DAY
literals start with TIME_OF_DAY#
or TOD#
followed by a time in the
format of hh:mm:ss
.
Note that only the seconds-segment can have a fraction denoting the milliseconds.
Examples:
t1 : TIME_OF_DAY := TIME_OF_DAY#14:20:10.25;
t2 : TIME_OF_DAY := TIME_OF_DAY#0:00:1;
t3 : TIME_OF_DAY := TOD#23:59:59.999;
TIME
The TIME
datatype is used to represent a time-span.
A TIME
value is stored as an i64
value with a precision in nanoseconds.
TIME literals start with TIME# or T# followed by the TIME segments.
Supported segments are:
- d ... f64 days
- h ... f64 hours
- m ... f64 minutes
- s ... f64 seconds
- ms ... f64 milliseconds
- us ... f64 microseconds
- ns ... u32 nanoseconds
Note that only the last segment of a TIME
literal can have a fraction.
Examples:
t1 : TIME := TIME#2d4h6m8s10ms;
t2 : TIME := T#2d4.2h;
t3 : TIME := T#-10s4ms16ns;
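The conversion of TIME segments into the stored nanosecond value can be sketched as follows. The function names and the pre-split `(value, unit)` input are assumptions for this example; a real lexer would first have to split the literal into segments, and a leading minus sign is assumed to apply to the whole literal.

```rust
/// Nanoseconds per unit for each supported TIME segment.
fn unit_to_nanos(unit: &str) -> Option<i64> {
    match unit {
        "d" => Some(86_400_000_000_000),
        "h" => Some(3_600_000_000_000),
        "m" => Some(60_000_000_000),
        "s" => Some(1_000_000_000),
        "ms" => Some(1_000_000),
        "us" => Some(1_000),
        "ns" => Some(1),
        _ => None,
    }
}

/// Sums the segments of a TIME literal into nanoseconds.
/// Each segment is converted and rounded on its own, then added as an integer
/// so that large totals do not lose precision in floating point.
fn time_to_nanos(negative: bool, segments: &[(f64, &str)]) -> Option<i64> {
    let mut total: i64 = 0;
    for &(value, unit) in segments {
        let factor = unit_to_nanos(unit)?;
        total += (value * factor as f64).round() as i64;
    }
    Some(if negative { -total } else { total })
}
```

With this sketch, `T#2d4.2h` is passed as `&[(2.0, "d"), (4.2, "h")]` and `T#-10s4ms16ns` as `negative = true` with the three segments `10 s`, `4 ms` and `16 ns`.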
Other types
The BOOL
type can either be assigned TRUE
or FALSE
.
The type __VOID
is the empty type and has an undefined size.
Type name | Size | Properties |
---|---|---|
BOOL | 8 bit | signed |
__VOID | undefined | |
Bit datatypes are defined as follows:
Type name | Size | Properties |
---|---|---|
BYTE | 8 bit | unsigned |
WORD | 16 bit | unsigned |
DWORD | 32 bit | unsigned |
LWORD | 64 bit | unsigned |
Direct (Bit) Access on Variables
The IEC61131-3 Standard allows reading specific Bits
, Bytes
, Words
or DWords
from an ANY_BIT
type.
RuSTy supports this functionality and extends it to support all INT
types.
Constant based Direct Access
To access a bit sequence in a variable, a direct access instruction %<Type><Value>
is used.
Type
is the bit sequence size required and is described as follows:
Type | Size (bits) | Example |
---|---|---|
X | 1 | %X1 |
B | 8 | %B1 |
W | 16 | %W1 |
D | 32 | %D1 |
For Bit access, the %X is optional.
Example
FUNCTION main : DINT
VAR
variable : LWORD;
bitTarget : BOOL;
bitTarget2 : BOOL;
byteTarget : BYTE;
wordTarget : WORD;
dwordTarget : DWORD;
END_VAR
variable := 16#AB_CD_EF_12_34_56_78_90;
bitTarget := variable.%X63; (*Access last bit*)
byteTarget := variable.%B7; (*Access last byte*)
wordTarget := variable.%W3; (*Access last word*)
dwordTarget := variable.%D1; (*Access last dword*)
(*Chaining an access is also allowed *)
bitTarget2 := variable.%D1.%W1.%B1.%X1;
END_FUNCTION
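The effect of a direct access can be modelled with a shift and a mask. The sketch below is an illustration under the assumption that index 0 addresses the least significant bit, byte, word or dword, which matches the "last" comments in the example above; `Access` and `direct_access` are hypothetical names, not RuSTy code.

```rust
/// The four direct-access kinds and their widths in bits.
#[derive(Clone, Copy)]
enum Access {
    Bit,
    Byte,
    Word,
    DWord,
}

impl Access {
    fn width(self) -> u32 {
        match self {
            Access::Bit => 1,
            Access::Byte => 8,
            Access::Word => 16,
            Access::DWord => 32,
        }
    }
}

/// Extracts the `index`-th chunk of the given kind from `value`.
/// Returns `None` if the chunk would lie outside of 64 bits.
fn direct_access(value: u64, access: Access, index: u32) -> Option<u64> {
    let width = access.width();
    let shift = index.checked_mul(width)?;
    if shift.checked_add(width)? > 64 {
        return None;
    }
    let mask = (1u64 << width) - 1;
    Some((value >> shift) & mask)
}
```

A chained access such as `variable.%D1.%W1.%B1.%X1` corresponds to feeding each result into the next call.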
Variable based Direct Access
While the IEC61131-3 Standard only defines variable access using constant int literals,
RuSTy additionally supports access using Variables.
The Syntax for a variable based access is %<Type><Variable>
.
The provided variable has to be a direct reference variable (non-qualified).
Short-hand access for Bit (without the %X modifier) is not allowed.
Example
FUNCTION main : DINT
VAR
variable : LWORD;
access_var : INT;
bitTarget : BOOL;
bitTarget2 : BOOL;
byteTarget : BYTE;
wordTarget : WORD;
dwordTarget : DWORD;
END_VAR
variable := 16#AB_CD_EF_12_34_56_78_90;
access_var := 63;
bitTarget := variable.%Xaccess_var; (*Access last bit*)
access_var := 7;
byteTarget := variable.%Baccess_var; (*Access last byte*)
access_var := 3;
wordTarget := variable.%Waccess_var; (*Access last word*)
access_var := 1;
dwordTarget := variable.%Daccess_var; (*Access last dword*)
(*Chaining an access is also allowed *)
bitTarget2 := variable.%Daccess_var.%Waccess_var.%Baccess_var.%Xaccess_var;
END_FUNCTION
Architecture
Overview
RuSTy is a compiler for IEC61131-3 languages. At the moment, ST and CFC ("FBD") are supported. It utilizes the LLVM compiler infrastructure and contributes a Structured Text frontend that translates Structured Text into LLVM's language-independent intermediate representation (IR). CFC uses an M2M transformation and reuses most of the ST frontend for compilation. Further optimization and native code generation are performed by the existing LLVM infrastructure, namely LLVM's common optimizer and the platform-specific backend (see here).
┌──────────────────┐ ┌───────────────┐ ┌────────────────┐
│ │ │ │ │ │
│ RuSTy │ │ LLVM Common │ │ LLVM Backend │
│ ├───►│ ├───►│ │
│ LLVM Frontend │ │ Optimizer │ │ (e.g Clang) │
│ │ │ │ │ │
└──────────────────┘ └───────────────┘ └────────────────┘
RuSTy thus forms the frontend part of the LLVM compiler infrastructure. This means that the compiler benefits from LLVM's existing optimizations as well as from all available backend target platforms.
Rusty Frontend Architecture
Ultimately the goal of a compiler frontend is to translate the original source code into the infrastructure's intermediate representation (in this case we're talking about LLVM IR). RuSTy treats this task as a compilation step of its own. While a fully fledged compiler generates machine code as a last step, RuSTy generates LLVM IR assembly code.
Structured Text
┌────────┐ ┌────────┐
│ Source │ │ LLVM │
│ │ │ IR │
│ Files │ │ │
└───┬────┘ └────────┘
│ ▲
▼ │
┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌──────┴─────┐
│ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │
│ Parser ├──►│ Indexer ├──►│ Linker ├──►│ Validation ├──►│ Codegen │
│ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │
└────────────┘ └────────────┘ └────────────┘ └────────────┘ └────────────┘
CFC/FBD
┌────────┐ ┌────────┐
│ Source │ │ LLVM │
│ │ │ IR │
│ Files │ │ │
└───┬────┘ └────────┘
│ ▲
▼ │
┌────────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌──────┴─────┐
│ │ │ │ │ │ │ │ │ │
│ Model-to-Model │ │ │ │ │ │ │ │ │
│ Transformation ├───►│ Indexer ├──►│ Linker ├──►│ Validation ├──►│ Codegen │
│ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │
└────────────────┘ └────────────┘ └────────────┘ └────────────┘ └────────────┘
Parser
The role of the parser is to turn source code, which is fed in as a string (in the form of files), into a tree representation of that source code. This tree is typically called the Abstract Syntax Tree (AST). Parsing consists of two distinct stages. The first one is the lexical analysis, which is performed by the lexer. After lexing, we perform the syntactical analysis (the parser proper) to construct the syntax tree.
┌──┐
┌──────────────┐ │ │
│ │ └──┘
│ Source Code │ / \
│ │ ┌─────────┐ ┌──────────┐ / \
│ ────────── │ │ │ │ │ ┌──┐ ┌──┐
│ ├───► Lexer │ │ Parser ├────►│ │ │ │
│ ───────── │ │ │ │ │ └──┘ └──┘
│ │ └────┬────┘ └──────────┘ /\ /\
│ ──── │ │ ▲ / \ / \
│ │ │ │ ┌──┐ ┌──┐ ┌──┐ ┌──┐
│ ──────── │ ▼ │ │ │ │ │ │ │ │ │
│ │ ┌───────────────────────┴──┐ └──┘ └──┘ └──┘ └──┘
│ │ │ │
└──────────────┘ │ ┌───┐ ┌───┐ ┌───┐ ┌───┐ │ Abstract Syntax
│ │ T │ │ T │ │ T │ │...│ │ Tree
│ └───┘ └───┘ └───┘ └───┘ │
│ │
└──────────────────────────┘
Token-Stream
Lexer
The lexer performs the lexical analysis. This step turns the source-string into a sequence of well known tokens. The Lexer (or sometimes also called tokenizer) splits the source-string into tokens (or words). Each token has a distinct type which corresponds to a grammar's element. Typical token-types are keywords, numbers, identifiers, brackets, dots, etc. So with the help of this token-stream it is much easier for the parser to spot certain patterns. E.g. a floating-point number consists of the token-sequence: number, dot, number.
The lexer is implemented in the lexer
-module.
It uses the logos crate to create a lexer that is able to identify all different terminal-symbols.
Compared to other languages, Structured Text has a quite high number of keywords and other tokens, so RuSTy's lexer identifies a quite large number of different tokens.
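To make the idea of a token stream concrete, here is a minimal hand-written tokenizer. It is not RuSTy's lexer, which is generated with the logos crate; the `Token` variants and the `tokenize` function are hypothetical and cover only identifiers, integer numbers, `:=` and `;`.

```rust
/// A tiny, hypothetical subset of token types.
#[derive(Debug, PartialEq)]
enum Token {
    Identifier(String),
    Number(String),
    KeywordAssignment, // :=
    Semicolon,
}

/// Splits the source string into tokens, skipping whitespace.
fn tokenize(source: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = source.chars().collect();
    let mut tokens = Vec::new();
    let mut pos = 0;
    while pos < chars.len() {
        let c = chars[pos];
        if c.is_whitespace() {
            pos += 1;
        } else if c.is_ascii_alphabetic() || c == '_' {
            // Consume the whole identifier.
            let start = pos;
            while pos < chars.len() && (chars[pos].is_ascii_alphanumeric() || chars[pos] == '_') {
                pos += 1;
            }
            tokens.push(Token::Identifier(chars[start..pos].iter().collect()));
        } else if c.is_ascii_digit() {
            // Consume a run of digits.
            let start = pos;
            while pos < chars.len() && chars[pos].is_ascii_digit() {
                pos += 1;
            }
            tokens.push(Token::Number(chars[start..pos].iter().collect()));
        } else if c == ':' && chars.get(pos + 1) == Some(&'=') {
            tokens.push(Token::KeywordAssignment);
            pos += 2;
        } else if c == ';' {
            tokens.push(Token::Semicolon);
            pos += 1;
        } else {
            return Err(format!("unexpected character '{c}' at offset {pos}"));
        }
    }
    Ok(tokens)
}
```

Feeding `a := 3;` into this sketch yields four tokens: an identifier, the assignment keyword, a number and a semicolon.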
Parser
The parser takes the token stream and creates the corresponding AST that represents the source code in a structured, hierarchical way.
The parser is implemented in the parser
module whereas the model for the AST is implemented in the ast
module.
AST - Abstract Syntax Tree
The abstract syntax tree is a tree representation of the source code.
Some parser implementations use a generic tree-data-structure consisting of Nodes
which can have an arbitrary number of children.
These nodes usually have dynamic properties like a type and an optional value and sometimes they even have dynamic properties stored in a map to make this representation even more flexible.
While this approach needs very little source code we decided to favour a less flexible approach. The RuSTy-AST models every single ast-node as its own struct with all necessary fields including the possible child-nodes. While this approach needs much more code and hand-written changes, its benefits lie in the clearness and simplicity of the data-structure. Every element of the AST is easily identified, debugged and understood. E.g. while in a generic node based AST it is easily possible to have a binary-statement with no, one, or seven child-nodes, the RuSTy-AST enforces the structure of every node. So the RuSTy-Binary-Statement has exactly two children. It is impossible to construct it differently.
Example
So an assignment a := 3;
will be parsed with the help of the following Structures:
struct Reference {
    name: String,
}
struct LiteralInteger {
    value: i128,
}
struct Assignment {
    left: Box<AstStatement>,
    right: Box<AstStatement>,
}
Recursive Descent Parser
There are a lot of different frameworks to generate parsers from formal grammars. While they generate highly optimized parsers, we wanted more control over and a better understanding of the parsing process and the resulting AST. Since we were pretty new to Rust at that point in time, writing the parser by hand also gave us more practice. Using a parser-generator framework will definitely be an option for future improvements.
As for now, the parser is a hand-written recursive descent parser inside the parser
-module.
As the parser reads the token stream Reference
, KeywordEquals
, Number
, Semicolon
it instantiates the corresponding syntax tree:
┌─────────────────┐
│ Assignment │
└──────┬──┬───────┘
left │ │ right
┌───────────┘ └──────────┐
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ Reference │ │ LiteralInteger │
├──────────────────┤ ├──────────────────┤
│ name: 'a' │ │ value: '3' │
└──────────────────┘ └──────────────────┘
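The step from token stream to syntax tree can be sketched with a very small recursive descent parser. The token names follow the stream mentioned above, but the enum layout, the function names and the restriction to a single assignment are assumptions made for this example, not RuSTy's actual parser.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Reference(String),
    KeywordEquals,
    Number(i128),
    Semicolon,
}

/// Simplified AST with one variant per node type.
#[derive(Debug, PartialEq)]
enum AstStatement {
    Reference { name: String },
    LiteralInteger { value: i128 },
    Assignment { left: Box<AstStatement>, right: Box<AstStatement> },
}

/// Parses a reference or an integer literal and returns the remaining tokens.
fn parse_expression(tokens: &[Token]) -> Result<(AstStatement, &[Token]), String> {
    match tokens.split_first() {
        Some((Token::Reference(name), rest)) => {
            Ok((AstStatement::Reference { name: name.clone() }, rest))
        }
        Some((Token::Number(value), rest)) => {
            Ok((AstStatement::LiteralInteger { value: *value }, rest))
        }
        other => Err(format!("expected expression, found {:?}", other.map(|(t, _)| t))),
    }
}

/// Consumes exactly the expected token.
fn expect<'a>(tokens: &'a [Token], expected: &Token) -> Result<&'a [Token], String> {
    match tokens.split_first() {
        Some((token, rest)) if token == expected => Ok(rest),
        other => Err(format!("expected {:?}, found {:?}", expected, other.map(|(t, _)| t))),
    }
}

/// assignment := expression KeywordEquals expression Semicolon
fn parse_assignment(tokens: &[Token]) -> Result<(AstStatement, &[Token]), String> {
    let (left, rest) = parse_expression(tokens)?;
    let rest = expect(rest, &Token::KeywordEquals)?;
    let (right, rest) = parse_expression(rest)?;
    let rest = expect(rest, &Token::Semicolon)?;
    Ok((AstStatement::Assignment { left: Box::new(left), right: Box::new(right) }, rest))
}
```

Each grammar rule becomes one function, and every function returns the node it built together with the tokens it did not consume. Because `Assignment` has exactly two boxed children, a malformed node cannot be constructed.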
Indexer
The indexing step is responsible for building and maintaining the Symbol-Table (also called Index). The Index contains all known referable objects such as variables, data-types, POUs, Functions, etc. The Symbol-Table also maintains additional information about every referable object, such as the object's type, the object's datatype, etc.
Indexing is performed by the index module. It contains the index itself (a.k.a. Symbol Table), the visitor which collects all global names and their additional information as well as a data structure that handles compile time constant expressions (constant_expressions).
The Index (Symbol Table)
The index stores information about all referable elements of the program. Depending on the type of element, we store different meta-information alongside the name of the element.
Index Field | Description |
---|---|
global_variables | All global variables accessible via their name. |
enum_global_variables | All enum elements accessible via their name (as if they were global variables, e.g. 'RED') |
member_variables | Member variables of structured types (Structs, Functionblocks, etc.). This map allows querying all members of a container by name. |
implementations | All callable implementations (Programs, Functions, Actions, Functionblocks) accessible by their name. |
pous | All pous (Programs, Functions, Functionblocks) with additional information. |
type_index | All data-types (intrinsic and complex) accessible via their name |
constant_expressions | The results of constant expressions that can be evaluated at compile time (e.g. the initializer of a constant: VAR_GLOBAL CONSTANT TAU : LREAL := 3.1415 * 2; END_VAR ) |
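A heavily reduced model of such an index could look like the following. Only two of the fields from the table are modelled, and the struct layout and method names are assumptions for this sketch rather than RuSTy's actual API.

```rust
use std::collections::HashMap;

/// Reduced, hypothetical variable entry.
#[derive(Debug, Clone, PartialEq)]
struct VariableIndexEntry {
    name: String,
    qualified_name: String,
    data_type_name: String,
    is_constant: bool,
}

/// Reduced, hypothetical index with global and member variables only.
#[derive(Default)]
struct Index {
    global_variables: HashMap<String, VariableIndexEntry>,
    /// container name -> (member name -> entry)
    member_variables: HashMap<String, HashMap<String, VariableIndexEntry>>,
}

impl Index {
    fn register_global(&mut self, name: &str, data_type_name: &str, is_constant: bool) {
        let entry = VariableIndexEntry {
            name: name.to_string(),
            qualified_name: name.to_string(),
            data_type_name: data_type_name.to_string(),
            is_constant,
        };
        self.global_variables.insert(name.to_string(), entry);
    }

    fn register_member(&mut self, container: &str, name: &str, data_type_name: &str) {
        let entry = VariableIndexEntry {
            name: name.to_string(),
            qualified_name: format!("{container}.{name}"),
            data_type_name: data_type_name.to_string(),
            is_constant: false,
        };
        self.member_variables
            .entry(container.to_string())
            .or_default()
            .insert(name.to_string(), entry);
    }

    fn find_global(&self, name: &str) -> Option<&VariableIndexEntry> {
        self.global_variables.get(name)
    }

    fn find_member(&self, container: &str, name: &str) -> Option<&VariableIndexEntry> {
        self.member_variables.get(container)?.get(name)
    }
}
```

Keeping members in a nested map keyed by their container is what allows a lookup such as "member `x` of `PLC_PRG`" without scanning all variables.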
There are four different types of entries in the index:
- VariableIndexEntry The VariableIndexEntry holds information about every Variable in the source code and offers additional information relevant for linking, validation and code-generation.
┌─────────────────────────────┐ ┌─────────────────┐
│ VariableIndexEntry │ │ <enum> │
│ │ │ VariableType │
├─────────────────────────────┤ var_type ├─────────────────┤
│ │ │ - Local │
│ - name: String ├─────────────►│ - Temp │
│ - qualified_name: String │ │ - Input │
│ - is_constant: bool │ │ - Output │
│ - location_in_parent: u32 │ │ - InOut │
│ - data_type_name: String │ │ - Global │
│ │ │ - Return │
└───────────┬─────────────────┘ └─────────────────┘
│
│initial_value
│
│
│ ┌──────────────────┐
│ │ ConstExpression │
│ 0..1 ├──────────────────┤
└───────────►│ │
│ ... │
│ │
└──────────────────┘
- PouIndexEntry The PouIndexEntry offers information about all Program-Organization-Units. The index entry offers information like the name of an instance-struct, the name of the registered implementation, etc.
┌──────────────────────────┐
│ <abstract> │
│ POUIndexEntry │
├──────────────────────────┤
│ │
└──────────────────────────┘
▲
│
│
│ ┌──────────────────────────┐ ┌──────────────────────────┐
│ │ ProgramIndexEntry │ │ GenericParameter │
│ ├──────────────────────────┤ ├──────────────────────────┤
│ │ - name: String │ │ - name: String │
├─────┤ - instanceStruct: String ├──┬──►│ - typeNature: TypeNature │
│ │ │ │ │ │
│ │ │ │ │ │
│ └──────────────────────────┘ │ └──────────────────────────┘
│ │
│ │
│ │
│ ┌──────────────────────────┐ │
│ │ FunctionIndexEntry │ │ generics
│ ├──────────────────────────┤ │
│ │ - name: String │ │
├─────┤ ├──┤
│ │ │ │
│ │ │ │
│ └──────────────────────────┘ │
│ │
│ │
│ │
│ ┌──────────────────────────┐ │
│ │ FunctionBlockIndexEntry │ │
│ ├──────────────────────────┤ │
│ │ - name: String ├──┤
├─────┤ - instanceStruct: String │ │
│ │ │ │
│ │ │ │
│ └──────────────────────────┘ │
│ │
│ │
│ │
│ ┌──────────────────────────┐ │
│ │ ClassIndexEntry │ │
│ ├──────────────────────────┤ │
│ │ - name: String │ │
└─────┤ - instanceStruct: String ├──┘
│ │
│ │
└──────────────────────────┘
- ImplementationIndexEntry The ImplementationIndexEntry offers information about any callable implementation (Program, Functionblock, Function, etc.). It also offers metadata about the implementation type, the name of the method to call and the name of the parameter-struct (this-struct) to pass to the function.
┌───────────────────────┐
┌──────────────────────────┐ │ <enum> │
│ ImplementationIndexEntry │ │ ImplementationType │
├──────────────────────────┤ type │ │
│ ├─────────────►├───────────────────────┤
│ - call_name: String │ │ - Program │
│ - type_name: String │ │ - Function │
│ │ │ - FunctionBlock │
└──────────────────────────┘ │ - Action │
│ - Class │
│ - Method │
│ │
└───────────────────────┘
- DataType The entry for a DataType offers information about any data-type supported by the program to be compiled (internal data types as well as user defined data types). For each data-type we offer additional information such as its initial value, its type-nature (in terms of generic functions - e.g: ANY_INT) and some additional information about the type's internal structure and size (e.g. is it a number/array/struct/etc).
┌─────────────┐ ┌────────────────────┐
│ DataType │ │ ConstantExpression │
├─────────────┤ initial_value ├────────────────────┤
│ ├──────────────────►│ │
│ - name │ │ ... │
│ ├─────────┐ │ │
└──────┬──────┘ │ └────────────────────┘
│ │
│ │ ┌────────────────────┐
│ │ │ TypeNature │
│ │ ├────────────────────┤
│ information │ │ - Any │
│ └────────►│ - Derived │
│ nature │ - Elementary │
│ │ - Num │
▼ │ - Int │
┌───────────────────────┐ │ - Signed │
│ <abstract> │ │ - ... │
│ DataTypeInformation │ └────────────────────┘
├───────────────────────┤
│ │
└───────────────────────┘
▲
│
│
│
┌────────────────┬───────┴───────┬──────────────┬──────────────┐
│ │ │ │ │
┌────────┴───────┐ ┌──────┴──────┐ ┌──────┴─────┐ ┌─────┴──────┐ ┌────┴─────┐
│ Struct │ │ Array │ │ Integer │ │ String │ │ ... │
├────────────────┤ ├─────────────┤ ├────────────┤ ├────────────┤ ├──────────┤
│ - name │ │- name │ │ - name │ │ - size │ │ ... │
│ - members │ │- inner_type │ │ - signed │ │ - encoding │ │ │
│ │ │- dimensions │ │ - size │ │ │ │ │
└────────────────┘ └─────────────┘ └────────────┘ └────────────┘ └──────────┘
Linker
The linker's task is to decide where all references in the source code point to. There are different references in Structured Text:
- variable references
x := 4
where x is a reference to the variable x. - type references
i : MyFunctionBlock
where MyFunctionBlock is a reference to the declared FunctionBlock. - Program references
PLC_PRG.x := 4
where PLC_PRG is a reference to a Program-POU called PLC_PRG. - Function references
max(a, b)
where max is a reference to a Function-POU called max.
So the linker decides where a reference points to. A reference has a corresponding declaration that matches the reference's name:
PROGRAM PLC_PRG
VAR
┌──────► x : INT;
│
│ END_VAR
│
└────┐
│
x := 3;
END_PROGRAM
The linker's results will be used by the semantic validation step and by the code-generation.
The validator decides whether the name you put at a certain location is valid or not. In order to decide whether a certain reference is valid or not, we need to know where it is pointing to, so whether we expect a variable, a datatype or something different.
The code-generation needs to know what certain names mean, in order to successfully generate the IR-code that reflects the behavior of your program.
Annotated Syntax Tree
The AST generated by the parser is a pretty static data-structure. So where should we store the linking information for a reference? Even if we would add fields for potential linking-information to the AST, the ownership concepts of Rust would give us a hard time to fill this information piece by piece during linking. So what we end up doing, is to use the arena-pattern to handle the different lifetimes of the parts of an AST (the AST itself is constructed very early in the compilation process, where the linking information is allocated later). We don't store the linking information directly in the AST, but we store it inside the mentioned arena-data-structure and link it with certain AST-elements.
The RuSTy linker stores the linking information in an arena called AnnotationMap. The AnnotationMap can store two types of annotations for any AST-element. So the first step is that we need a way to uniquely identify every single AST-node so we can use this ID as a key for the annotations stored in the AnnotationMap to automatically associate it with the given AST-Node. The parser assigns a unique ID to every Statement-Tree-Node (Note that we only assign IDs to Statements, not every AST-Node).
So the expression a + 3
now looks like this:
┌─────────────────┐
│ BinaryOperation │
├─────────────────┤
│ operator: Plus │
│ ID: 1 │
└──────┬──┬───────┘
│ │
left │ │ right
┌───────────┘ └──────────┐
│ │
│ │
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ Reference │ │ LiteralInteger │
├──────────────────┤ ├──────────────────┤
│ name: 'a' │ │ value: '3' │
│ ID: 2 │ │ ID: 3 │
└──────────────────┘ └──────────────────┘
The AnnotationMap stores 5 different types of annotation:
Value
The Value-annotation indicates that this AST-Element resolves to a value with the given resulting datatype. So for example the LiteralInteger(3) node gets a Value-Annotation with a resulting type of DINT.
┌─────────────────────────┐
│ Value │
├─────────────────────────┤
│ │
│ resulting_type: String │
│ │
└─────────────────────────┘
Variable
The Variable-annotation indicates that this AST-Element resolves to a variable with the given qualified name (and some comfort-information like whether it is a constant and whether it is an auto-deref pointer). Similar to the value-Annotation it also saves the resulting datatype.
┌─────────────────────────┐
│ Variable │
├─────────────────────────┤
│ │
│ resulting_type: String │
│ qualified_name: String │
│ constant: bool │
│ is_auto_deref: bool │
│ │
└─────────────────────────┘
Function
The Function-annotation indicates that this AST-Element resolves to a Function-POU (a call-statement) with the given qualified name. Similar to the value-Annotation it also saves the resulting datatype but this time as the function's return type (return_type).
┌─────────────────────────┐
│ Function │
├─────────────────────────┤
│ │
│ return_type: String │
│ qualified_name: String │
│ │
└─────────────────────────┘
Type
The Type-annotation indicates that this AST-Element resolves to a DataType (e.g. a Declaration: x: INT) with the given name.
┌─────────────────────────┐
│ Type │
├─────────────────────────┤
│ │
│ type_name: String │
│ │
└─────────────────────────┘
Program
The Program-annotation is very similar to the Function-annotation. Since a Program has no return-value it also offers no return-type information.
┌─────────────────────────┐
│ Program │
├─────────────────────────┤
│ │
│ qualified_name: String │
│ │
└─────────────────────────┘
So the example expression from above, a + 3, will be annotated like this: (Note that the resulting type of the Binary-Operation must be calculated by the linker by determining the bigger of both types.)
┌─────────────────┐
│ BinaryOperation │
├─────────────────┤
│ operator: Plus │
│ ID: 1 │
└──────┬──┬───────┘
│ │
left │ │ right
┌───────────┘ └──────────┐
│ │
│ │
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ Reference │ │ LiteralInteger │
├──────────────────┤ ├──────────────────┤
│ name: 'a' │ │ value: '3' │
│ ID: 2 │ │ ID: 3 │
└──────────────────┘ └──────────────────┘
┌────────────────────────────┐
│ Value │
┌───────────────────┐ ├────────────────────────────┤
│ AnnotationMap │ ┌───►│ resulting_type: DINT │
│ │ │ │ │
├───────┬───────────┤ │ └────────────────────────────┘
│ ID: 1 │ Value ├───┘
├───────┼───────────┤ ┌────────────────────────────┐
│ ID: 2 │ Variable ├────┐ │ Variable │
├───────┼───────────┤ │ ├────────────────────────────┤
│ ID: 3 │ Value ├──┐ │ │ resulting_type: SINT │
└───────┴───────────┘ │ └──►│ qualified_name: PLC_PRG.a │
│ │ constant: false │
│ │ is_auto_deref: false │
│ └────────────────────────────┘
│
│ ┌────────────────────────────┐
│ │ Value │
│ ├────────────────────────────┤
└────►│ resulting_type: DINT │
│ │
└────────────────────────────┘
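The relationship between node IDs and annotations shown in the diagram can be sketched as a map keyed by ID. The enum mirrors the five annotation kinds described above, but the concrete Rust types, the `AstId` alias and the method names are assumptions for this example.

```rust
use std::collections::HashMap;

/// Hypothetical ID type assigned to statements by the parser.
type AstId = usize;

#[derive(Debug, PartialEq)]
enum StatementAnnotation {
    Value { resulting_type: String },
    Variable { resulting_type: String, qualified_name: String, constant: bool, is_auto_deref: bool },
    Function { return_type: String, qualified_name: String },
    Type { type_name: String },
    Program { qualified_name: String },
}

/// Annotations live next to the AST instead of inside it.
#[derive(Default)]
struct AnnotationMap {
    type_map: HashMap<AstId, StatementAnnotation>,
}

impl AnnotationMap {
    fn annotate(&mut self, id: AstId, annotation: StatementAnnotation) {
        self.type_map.insert(id, annotation);
    }

    fn get(&self, id: AstId) -> Option<&StatementAnnotation> {
        self.type_map.get(&id)
    }

    /// Name of the datatype a node resolves to, if it has one.
    fn get_type_name(&self, id: AstId) -> Option<&str> {
        match self.get(id)? {
            StatementAnnotation::Value { resulting_type }
            | StatementAnnotation::Variable { resulting_type, .. } => Some(resulting_type.as_str()),
            StatementAnnotation::Function { return_type, .. } => Some(return_type.as_str()),
            StatementAnnotation::Type { type_name } => Some(type_name.as_str()),
            StatementAnnotation::Program { .. } => None,
        }
    }
}
```

Because the map only refers to nodes by ID, the linker can fill it long after the AST has been built without needing mutable access to the tree.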
Another example where the annotated AST carries a lot of useful information is with complex expressions like array-expressions or qualified references. Let's consider the following statement:
PLC_PRG.a.b[2]
It is annotated in the following way:
┌────────────────────┐
│ QualifiedReference │
├────────────────────┤
│ ID: 1 │
└─────────┬──────────┘
│ elements: Vec<AstStatement>
┌─────────┴──────────┬─────────────────────┐
│ │ │
▼ ▼ ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Reference │ │ Reference │ │ ArrayAccess │
├──────────────────┤ ├──────────────────┤ ├──────────────────┤
│ name: 'PLC_PRG' │ │ name: 'a' │ │ │
│ ID: 2 │ │ ID: 3 │ │ ID: 4 │
└──────────────────┘ └──────────────────┘ └─────┬──────┬─────┘
│ │
reference │ │ access
┌────────┘ └─────────┐
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ Reference │ │ LiteralInteger │
├──────────────────┤ ├──────────────────┤
│ name: 'b' │ │ value: '2' │
│ ID: 5 │ │ ID: 6 │
└──────────────────┘ └──────────────────┘
                             ┌────────────────────────────┐
                             │           Value            │
                        ┌───►├────────────────────────────┤
                        │    │ resulting_type: INT        │
                        │    │                            │
                        │    └────────────────────────────┘
                        │
                        │    ┌────────────────────────────┐
┌───────────────────┐   │    │          Program           │
│   AnnotationMap   │   │ ┌─►├────────────────────────────┤
│                   │   │ │  │ qualified_name: PLC_PRG    │
├───────┬───────────┤   │ │  │                            │
│ ID: 1 │ Value     ├───┘ │  └────────────────────────────┘
├───────┼───────────┤     │
│ ID: 2 │ Program   ├─────┘  ┌────────────────────────────┐
├───────┼───────────┤        │          Variable          │
│ ID: 3 │ Variable  ├───────►├────────────────────────────┤
├───────┼───────────┤        │ resulting_type: MyStruct   │
│ ID: 4 │ Value     ├─────┐  │ qualified_name: PLC_PRG.a  │
├───────┼───────────┤     │  └────────────────────────────┘
│ ID: 5 │ Variable  ├───┐ │
├───────┼───────────┤   │ │  ┌────────────────────────────┐
│ ID: 6 │ Value     ├─┐ │ │  │           Value            │
└───────┴───────────┘ │ │ └─►├────────────────────────────┤
                      │ │    │ resulting_type: INT        │
                      │ │    │                            │
                      │ │    └────────────────────────────┘
                      │ │
                      │ │    ┌─────────────────────────────────┐
                      │ │    │            Variable             │
                      │ └───►├─────────────────────────────────┤
                      │      │ resulting_type : ARRAY[] OF INT │
                      │      │ qualified_name : MyStruct.b     │
                      │      └─────────────────────────────────┘
                      │
                      │      ┌────────────────────────────┐
                      │      │           Value            │
                      │      ├────────────────────────────┤
                      └─────►│ resulting_type: DINT       │
                             │                            │
                             └────────────────────────────┘
Type vs. Type-Hint
The AnnotationMap not only offers annotations regarding the AST-node's type, but it also offers a second type of annotation.
Consider the following snippet:
PROGRAM PLC_PRG
VAR
x : SINT;
y : INT;
z : BYTE;
END_VAR
z := x + y;
END_PROGRAM
The assignment z := x + y is loaded with different types:
- x is annotated as Variable of type SINT and will be auto-upgraded to DINT.
- y is annotated as Variable of type INT and will be auto-upgraded to DINT.
- z is annotated as Variable of type BYTE.
- x + y is annotated as Value of type DINT (the bigger of both).
In order to make life easier for validation and code-generation we add an additional annotation to x + y to indicate that, while it technically results in a DINT, it should rather be treated as a BYTE since it is going to be assigned to z.
This second annotation is called the type-hint. It indicates that, while this is technically not the real type of the expression, the program's semantics require the compiler to treat it as this type.
The expression z := x + y is annotated like this:
expression | type annotation | type-hint annotation | explanation |
---|---|---|---|
x | SINT | DINT | auto-upgraded to DINT |
y | INT | DINT | auto-upgraded to DINT |
z | BYTE | - | |
x + y | DINT | BYTE | type-hint indicates that the resulting DINT needs to be cast to BYTE |
With the help of the type-hint annotations, the validation can easily decide whether certain type-cast operations are valid. The code-generation steps can decide when to generate casts by simply comparing a node's type annotation with its type-hint annotation.
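The comparison of type and type-hint can be sketched in a few lines of Rust. This is a minimal illustration and not RuSTy's actual API: the `Annotations` struct, its methods and the node IDs 1 to 4 are made up for this example, and types are reduced to plain names.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the annotation map.
/// Types are represented by their names only.
#[derive(Default)]
struct Annotations {
    types: HashMap<usize, &'static str>,      // node ID -> type annotation
    type_hints: HashMap<usize, &'static str>, // node ID -> type-hint annotation
}

impl Annotations {
    /// Records the type and, optionally, the type-hint of a node.
    fn annotate(&mut self, id: usize, ty: &'static str, hint: Option<&'static str>) {
        self.types.insert(id, ty);
        if let Some(hint) = hint {
            self.type_hints.insert(id, hint);
        }
    }

    /// Returns `Some((from, to))` if the node carries a type-hint that differs
    /// from its type, i.e. if a cast would have to be generated.
    fn required_cast(&self, id: usize) -> Option<(&'static str, &'static str)> {
        let ty = *self.types.get(&id)?;
        let hint = *self.type_hints.get(&id)?;
        if ty != hint {
            Some((ty, hint))
        } else {
            None
        }
    }
}

/// Annotates `z := x + y` as in the table above (node IDs are invented).
fn annotate_example() -> Annotations {
    let mut annotations = Annotations::default();
    annotations.annotate(1, "SINT", Some("DINT")); // x
    annotations.annotate(2, "INT", Some("DINT")); // y
    annotations.annotate(3, "BYTE", None); // z
    annotations.annotate(4, "DINT", Some("BYTE")); // x + y
    annotations
}
```

With these annotations, `required_cast(4)` reports a cast from DINT to BYTE for x + y, while `required_cast(3)` returns `None` because z carries no type-hint.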
Dependencies
When generating multiple units, the Linker keeps track of a dependency-tree for each unit. This means that every datatype or global variable referenced directly or indirectly by the module is marked as a dependency. This information can then be used during codegen to generate only the types and variables that are relevant to the unit.
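Collecting direct and indirect dependencies amounts to a reachability search. The following sketch assumes the references are available as simple (from, to) name pairs; the function name and this representation are hypothetical and only serve to illustrate the idea.

```rust
use std::collections::{BTreeSet, HashMap};

/// Returns every name reachable from `roots`, following the `(from, to)`
/// reference pairs transitively. The roots themselves are part of the result.
fn collect_dependencies<'a>(
    roots: &[&'a str],
    edges: &[(&'a str, &'a str)],
) -> BTreeSet<&'a str> {
    // Group the direct references by the element that contains them.
    let mut direct: HashMap<&str, Vec<&'a str>> = HashMap::new();
    for &(from, to) in edges {
        direct.entry(from).or_default().push(to);
    }

    let mut seen = BTreeSet::new();
    let mut stack: Vec<&'a str> = roots.to_vec();
    while let Some(name) = stack.pop() {
        // `insert` returns false for names that were already visited,
        // which also protects against cyclic references.
        if seen.insert(name) {
            if let Some(targets) = direct.get(name) {
                stack.extend(targets.iter().copied());
            }
        }
    }
    seen
}
```

A unit that only contains PLC_PRG, which in turn uses MyStruct and a global variable, would end up with exactly those elements (plus whatever MyStruct references) and nothing else from the project.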
Validation
The validation module implements the semantic validation step of the compiler. The validator is a hand-written visitor that offers a callback for every visited AST-node, which is then used to perform the different validation tasks.
The validation rules are implemented in dedicated validator-structs:
Validator | Responsibilities |
---|---|
global_validator | Semantic rules on the level of declarations as a whole (e.g. name-conflicts) |
pou_validator | Semantic rules on the level of programs, functions and function-blocks. |
recursive_validator | Semantic rules on the level of recursion (e.g. struct referencing itself) |
stmt_validator | Semantic rules on the level of statements (e.g. invalid type-casts). |
variable_validator | Semantic rules on the level of variable declarations (e.g. empty var-blocks, empty structs, etc.). |
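The visitor-with-callbacks idea can be sketched as follows. Everything here is a simplification: the `Node` enum, the `Validator` trait and the rule checked by `StatementValidator` are invented for this example and do not mirror the real validator-structs.

```rust
/// Heavily reduced AST; the real AST has many more node kinds.
enum Node {
    Assignment { target_type: String, value_type: String },
    Block(Vec<Node>),
}

/// Callback interface offered by the visitor (hypothetical name).
trait Validator {
    fn validate(&mut self, node: &Node);
}

/// Example rule, made up for this sketch: a STRING value must not be
/// assigned to a target of a different type.
struct StatementValidator {
    problems: Vec<String>,
}

impl Validator for StatementValidator {
    fn validate(&mut self, node: &Node) {
        if let Node::Assignment { target_type, value_type } = node {
            if value_type == "STRING" && target_type != "STRING" {
                self.problems
                    .push(format!("cannot assign {value_type} to {target_type}"));
            }
        }
    }
}

/// Walks the tree and hands every node to each registered validator.
fn visit(node: &Node, validators: &mut [&mut dyn Validator]) {
    for validator in validators.iter_mut() {
        validator.validate(node);
    }
    if let Node::Block(children) = node {
        for child in children {
            visit(child, validators);
        }
    }
}
```

Keeping the traversal in one place means that a new rule only needs a new `Validator` implementation; the walking logic stays untouched.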
Diagnostics
Problems (semantic or syntactic) are represented as Diagnostics 1. Diagnostics carry information on the exact location inside the source-string (start- & end-offset), a custom message and a unique error-number to identify the problem.
There are 3 types of Diagnostics:
Diagnostic | Description |
---|---|
SyntaxError | A syntax error is a diagnostic that is created by the parser if it discovers a token-stream that does not match the language's grammar. |
GeneralError | General errors are problems that occurred during the compilation process and that cannot be linked to a malformed input (e.g. file-I/O problems, internal LLVM errors, etc.). |
Improvement | Problems that may not prevent successful compilation but are still considered a flaw in the source-code (e.g. using the proprietary POINTER TO instead of the norm-compliant REF_TO). |
Note: The diagnostics are subject to change since they don't elegantly represent the different types of problems (e.g. semantic problems).
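As a rough mental model, the three kinds of diagnostics could be expressed as a Rust enum. The field names, the error numbers and the choice to give GeneralError no source location are assumptions made for this sketch; the actual definition in the code base may look different.

```rust
use std::ops::Range;

/// Illustrative only: variants follow the table above, fields are assumed.
#[derive(Debug, PartialEq)]
enum Diagnostic {
    SyntaxError { message: String, range: Range<usize>, err_no: u32 },
    GeneralError { message: String, err_no: u32 },
    Improvement { message: String, range: Range<usize>, err_no: u32 },
}

impl Diagnostic {
    /// Improvements are the only diagnostics that may still allow
    /// a successful compilation.
    fn prevents_compilation(&self) -> bool {
        !matches!(self, Diagnostic::Improvement { .. })
    }

    /// Returns the source text between start- and end-offset, if the
    /// diagnostic carries a location that lies inside `source`.
    fn snippet<'s>(&self, source: &'s str) -> Option<&'s str> {
        match self {
            Diagnostic::SyntaxError { range, .. }
            | Diagnostic::Improvement { range, .. } => source.get(range.clone()),
            Diagnostic::GeneralError { .. } => None,
        }
    }
}
```

For the source line `x : POINTER TO INT;`, an Improvement with the offsets 4..14 would point at the text `POINTER TO`.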
Code-Generation
The codegen module contains all code that turns the parsed and verified code represented as an AST into llvm-ir code. To generate the IR we use a crate that wraps the native llvm C-API.
The code-generator is basically a transformation from the ST-AST into an IR-Tree representation. Therefore the AST is traversed in a visitor-like way and transformed simultaneously.
The code generation is split into specialized sub-generators for different tasks:
Generator | Responsibilities |
---|---|
pou_generator | The pou-generator takes care of generating the program organization units (Programs, FunctionBlocks, Functions) including their signature and body. More specialized tasks are delegated to other generators. |
data_type_generator | Generates complex datatypes like Structs, Arrays, Enums, Strings, etc. |
variable_generator | Generates global variables and their initialization. |
statement_generator | Generates everything of the body of a POU except expressions. Non-expressions include: IFs, Loops, Assignments, etc. |
expression_generator | Generates expressions (everything that possibly resolves to a value) including: call-statements, references, array-access, etc. |
Generating POUs
Generating POUs (Programs, Function-Blocks, Functions) must generate the POU's body itself, as well as the POU's interface (or state) variables. In this segment we focus on generating the interface for a POU. Further information about generating a POU's body can be found [here].
Programs
A program is a static POU with some code attached. This means that there is exactly one instance. No matter where it is called from, every caller uses the exact same instance, which means that you may see the residuals of the last caller in the program's variables when you call it yourself.
PROGRAM prg
VAR
x : DINT;
y : DINT;
END_VAR
END_PROGRAM
The program's interface is persistent across calls, so we store it in a global variable.
Therefore the code-generator creates a dedicated struct-type called prg_interface.
A global variable called prg_instance is generated to store the program's state across calls.
This global instance variable is passed as a this pointer to calls to the prg function.
%prg_interface = type { i32, i32 }
@prg_instance = global %prg_interface zeroinitializer
define void @prg(%prg_interface* %this) {
entry:
ret void
}
FunctionBlocks
A FunctionBlock is a POU that is instantiated in a declaration. So in contrast to Programs, a FunctionBlock can have multiple instances. Nevertheless the code-generator uses a very similar strategy. A struct-type for the FunctionBlock's interface is created but no global instance-variable is allocated. Instead the function block can be used as a DataType to declare instances like in the following example:
FUNCTION_BLOCK foo
VAR_INPUT
x, y : INT;
END_VAR
END_FUNCTION_BLOCK
PROGRAM prg
VAR
f : foo;
END_VAR
END_PROGRAM
So for the given example, we see the code-generator creating a type for the FunctionBlock's state (foo_interface).
The instance of foo declared in prg's interface shows up in the program's generated interface struct-type (prg_interface).
; ModuleID = 'main'
source_filename = "main"
%prg_interface = type { %foo_interface }
%foo_interface = type { i16, i16 }
@prg_instance = global %prg_interface zeroinitializer
define void @foo(%foo_interface* %0) {
entry:
ret void
}
define void @prg(%prg_interface* %0) {
entry:
ret void
}
Functions
Functions are generated very similarly to programs and function-blocks. The main difference is that no instance-global is allocated and the function's interface-type cannot be used as a datatype to declare your own instances. Instead, an instance of the function's interface-type is allocated whenever the function is called and lives only for the duration of that call. Otherwise the code generated for functions is comparable to the code presented above for programs and function-blocks.
Generating Data Types
IEC61131-3 languages offer a wide range of data types. Next to the built-in intrinsic data types, we support the following user-defined data types:
Range Types
For range types we don't generate special code. Internally the new data type just becomes an alias for the derived type.
Pointer Types
For pointer types we don't generate special code. Internally the new data type just becomes an alias for the pointer-type.
Struct Types
Struct types translate directly to LLVM struct datatypes. We generate a new datatype with the user-type's name for the struct.
TYPE MyStruct:
STRUCT
a: DINT;
b: INT;
END_STRUCT
END_TYPE
This struct simply generates an LLVM struct type:
%MyStruct = type { i32, i16 }
Enum Types
Enumerations are represented as DINT.
TYPE MyEnum: (red, yellow, green);
END_TYPE
For every element of the enum we generate a global variable with the element's value.
@red = global i32 0
@yellow = global i32 1
@green = global i32 2
Array Types
Array types are generated as fixed-size LLVM array types. Note that array types must be fixed-size in ST:
TYPE MyArray: ARRAY[0..9] OF INT;
END_TYPE
VAR_GLOBAL
x : MyArray;
y : ARRAY[0..5] OF REAL;
END_VAR
Custom array data types are not reflected as dedicated types on the llvm-level.
@x = global [10 x i16] zeroinitializer
@y = global [6 x float] zeroinitializer
Multi dimensional arrays
Arrays can be declared as multi-dimensional:
VAR_GLOBAL
x : ARRAY[0..5, 2..5, 0..1] OF INT;
END_VAR
The compiler will flatten this type of array to a single dimension. To accomplish that, it calculates the total length by multiplying the sizes of all dimensions:
0..5 x 2..5 x 0..1
6 x 4 x 2 = 48
So the array x : ARRAY[0..5, 2..5, 0..1] OF INT; will be generated as:
@x = global [48 x i16] zeroinitializer
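The flattening can be illustrated with a small Rust sketch. The `Dimension` type and both functions are hypothetical helpers. The offset calculation assumes a row-major layout in which the last dimension varies fastest, which matches the order of the initializers in the example that follows, but it is an assumption and not taken from the code-generator.

```rust
/// Inclusive bounds of one array dimension, e.g. `2..5` in ST.
#[derive(Clone, Copy)]
struct Dimension {
    start: i64,
    end: i64,
}

impl Dimension {
    /// Number of elements in this dimension (both bounds are inclusive).
    fn len(&self) -> usize {
        (self.end - self.start + 1) as usize
    }
}

/// Total number of elements of the flattened array:
/// the product of the sizes of all dimensions.
fn flattened_len(dims: &[Dimension]) -> usize {
    dims.iter().map(Dimension::len).product()
}

/// Offset of an element inside the flattened array, assuming a row-major
/// layout. Returns `None` for a wrong number of indices or an index that
/// lies outside its dimension.
fn flattened_index(dims: &[Dimension], indices: &[i64]) -> Option<usize> {
    if dims.len() != indices.len() {
        return None;
    }
    let mut offset = 0usize;
    for (dim, &index) in dims.iter().zip(indices) {
        if index < dim.start || index > dim.end {
            return None;
        }
        offset = offset * dim.len() + (index - dim.start) as usize;
    }
    Some(offset)
}
```

Under this layout the element x[0, 2, 0] sits at offset 0 and the element x[5, 5, 1] at the last offset of the flattened array.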
This means that such a multidimensional array must be initialized like a single-dimensional array:
- wrong
VAR_GLOBAL
wrong_array : ARRAY[1..3, 0..2] OF INT := [ [10, 11, 12],
                                            [20, 21, 22],
                                            [30, 31, 32]];
END_VAR
- correct
VAR_GLOBAL
correct_array : ARRAY[1..3, 0..2] OF INT := [ 10, 11, 12,
                                              20, 21, 22,
                                              30, 31, 32];
END_VAR
Nested Arrays
Note that arrays declared as
x : ARRAY[0..2] OF ARRAY[0..2] OF INT
are different from the multi-dimensional arrays discussed in this section. Nested arrays are represented as multi-dimensional arrays on the LLVM-IR level and must also be initialized using nested array-literals!
String Types
String types are generated as fixed-size array types.
VAR_GLOBAL
str : STRING[20];
wstr : WSTRING[20];
END_VAR
Strings can be represented in two different encodings: UTF-8 (STRING) or UTF-16 (WSTRING).
@str = global [21 x i8] zeroinitializer
@wstr = global [21 x i16] zeroinitializer
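Note that both arrays hold one element more than the declared length of 20. The following sketch shows how a text could be laid out in such buffers. It assumes that the extra element is a terminating zero and that overly long texts are simply truncated; both are assumptions for illustration, and the function names are made up.

```rust
/// Lays out `text` as UTF-8 in a zero-filled buffer of `declared_len + 1`
/// code units (the layout of `STRING[declared_len]` in this sketch).
fn encode_string(text: &str, declared_len: usize) -> Vec<u8> {
    let mut buffer = vec![0u8; declared_len + 1];
    // Truncate on a character boundary so no UTF-8 sequence is cut in half.
    let mut end = text.len().min(declared_len);
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    buffer[..end].copy_from_slice(&text.as_bytes()[..end]);
    buffer
}

/// Lays out `text` as UTF-16 in a zero-filled buffer of `declared_len + 1`
/// code units (the layout of `WSTRING[declared_len]` in this sketch).
/// For brevity, truncation may split a surrogate pair.
fn encode_wstring(text: &str, declared_len: usize) -> Vec<u16> {
    let mut buffer = vec![0u16; declared_len + 1];
    for (slot, unit) in buffer
        .iter_mut()
        .take(declared_len)
        .zip(text.encode_utf16())
    {
        *slot = unit;
    }
    buffer
}
```

Both functions return 21 code units for a declared length of 20, matching the `[21 x i8]` and `[21 x i16]` arrays above.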
CFC (Continuous Function Chart)
RuSTy is compatible with CFC, as per the FBD part detailed in the IEC61131-3 XML-exchange format. The CFC implementation borrows extensively from the ST compiler-pipeline, with the exception that the lexical analysis and parsing phases are replaced by a model-to-model conversion process. This involves converting the XML into a structured model, which is then converted into ST AST statements.
The next chapter will walk you through the CFC implementation, giving you a better understanding of the underlying code.
Model-to-Model Conversion
As previously mentioned, the lexical and parsing phases are replaced by a model-to-model conversion process which consists of two steps:
- Transform the input file (XML) into a data-model
- Transform the data-model into an AST
XML to Data-Model
Consider the heavily minified CFC file MyProgram.cfc, which translates to the CFC chart below.
       x                      MyAdd
┌─────────────┐        ┌─────────────────┐
│             │        │    exec_id:0    │
│             ├───────►│ a               │                 z
│ local_id: 1 │        │ ref_local_id: 1 │          ┌──────────────┐
└─────────────┘        │                 │          │  exec_id: 1  │
       y               │                 ├─────────►│              │
┌─────────────┐        │                 │          │ref_local_id:3│
│             │        │                 │          └──────────────┘
│             ├───────►│ b               │             local_id: 4
│ local_id: 2 │        │ ref_local_id: 2 │
└─────────────┘        └─────────────────┘
                           local_id: 3
The initial phase of the transformation process involves streaming the entire input file.
During the streaming process, whenever important keywords such as block are encountered, they are directly mapped into a corresponding model structure.
For example, when reaching the line <block localId="3" ...> within the XML file, we generate a model that can be represented as follows:
struct Block {
    localId: 3,
    type_name: "MyAdd",
    instance_name: None,
    execution_order_id: 0,
    variables: [
        InputVariable { ... }, // x, with localId = 1
        InputVariable { ... }, // y, with localId = 2
        OutputVariable { ... }, // MyAdd eventually becoming `z := MyAdd`, with z having a localId = 4
    ]
}
This process is repeated for every element in the input file which has a corresponding model implementation. For more information on implementation details, see the model folder.
The CFC programming language uses blocks and their interconnections to establish the program's logic flow. The execution sequence of the blocks and the links between them are expressed through the corresponding localId, refLocalId and executionOrderId attributes.
We therefore have to order the elements by their execution ID before proceeding to the next phase.
Otherwise the generated AST statements would be out of order and hence semantically incorrect.
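The ordering step can be sketched like this. The `Element` struct is a hypothetical, reduced stand-in for the model structures. Elements without an execution order ID (such as plain input variables) are skipped in this sketch, since they only show up as operands of other elements.

```rust
/// Hypothetical, reduced model element.
#[derive(Debug, Clone, PartialEq)]
struct Element {
    local_id: usize,
    name: String,
    execution_order_id: Option<usize>,
}

/// Returns the elements that carry an execution order ID,
/// sorted ascending by that ID.
fn in_execution_order(elements: &[Element]) -> Vec<&Element> {
    let mut ordered: Vec<&Element> = elements
        .iter()
        .filter(|element| element.execution_order_id.is_some())
        .collect();
    ordered.sort_by_key(|element| element.execution_order_id);
    ordered
}
```

For the chart above this yields the MyAdd block first, followed by the output variable z, regardless of the order in which the elements appear in the XML file.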
Data-Model to AST
The final part of the model-to-model transformation takes the input from the previous step and transforms it into an AST which the compiler pipeline understands and can generate code from.
Consider the previous block example - the transformer first encounters the element with the executionOrderId of 0, which is a call to MyAdd.
We then check and transform each parameter, the inputs a and b corresponding to the variables x and y respectively. The result of this transformation looks as follows:
CallStatement {
    operator: MyAdd,
    parameters: [x, y]
}
Next, we process the element with an executionOrderId of 1, which corresponds to an assignment of the previous call's result to z. This update modifies the generated AST as follows:
AssignmentStatement {
    left: z,
    right: CallStatement {
        operator: MyAdd,
        parameters: [x, y]
    }
}
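The two steps can be condensed into a small recursive transformation. The `Node` and `Ast` enums below are hypothetical and much smaller than the real model and AST types. The sketch resolves connections by a linear search over the localIds used in MyProgram.cfc (see the Appendix) and assumes that the connections do not form a cycle.

```rust
/// Reduced AST, just enough for the example.
#[derive(Debug, Clone, PartialEq)]
enum Ast {
    Reference(String),
    Call { operator: String, parameters: Vec<Ast> },
    Assignment { left: Box<Ast>, right: Box<Ast> },
}

/// Reduced data-model; `inputs` and `source` hold refLocalIds.
enum Node {
    InVariable { local_id: usize, expression: String },
    Block { local_id: usize, type_name: String, inputs: Vec<usize> },
    OutVariable { local_id: usize, expression: String, source: usize },
}

/// Looks up a model element by its localId.
fn find(nodes: &[Node], id: usize) -> Option<&Node> {
    nodes.iter().find(|node| match node {
        Node::InVariable { local_id, .. }
        | Node::Block { local_id, .. }
        | Node::OutVariable { local_id, .. } => *local_id == id,
    })
}

/// Transforms the element with the given localId into an AST statement.
/// Returns `None` if a referenced localId does not exist.
fn transform(nodes: &[Node], id: usize) -> Option<Ast> {
    match find(nodes, id)? {
        Node::InVariable { expression, .. } => Some(Ast::Reference(expression.clone())),
        Node::Block { type_name, inputs, .. } => {
            // Every input connection becomes one call parameter.
            let parameters = inputs
                .iter()
                .map(|&input| transform(nodes, input))
                .collect::<Option<Vec<_>>>()?;
            Some(Ast::Call { operator: type_name.clone(), parameters })
        }
        Node::OutVariable { expression, source, .. } => Some(Ast::Assignment {
            left: Box::new(Ast::Reference(expression.clone())),
            right: Box::new(transform(nodes, *source)?),
        }),
    }
}
```

Transforming the block yields the call shown first, and transforming the output variable z wraps that same call into the assignment.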
While this explanation covers the handling of blocks and variables, there are other elements (e.g. control-flow) that are not discussed here. For more information on implementation details, see plc_xml/src/xml_parser.
Finally, after transforming all elements into their respective AST statements, the result is passed to the indexer and subsequently enters the next stages of the compiler pipeline, as described in the architecture documentation.
Appendix
MyAdd.st
FUNCTION MyAdd : DINT
VAR_INPUT
x, y : DINT;
END_VAR
MyAdd := x + y;
END_FUNCTION
MyProgram.cfc
<pou xmlns="http://www.plcopen.org/xml/tc6_0201" name="myProgram" pouType="program">
<content>
PROGRAM myProgram
VAR
x, y, z : DINT;
END_VAR
</content>
<body>
<FBD>
<inVariable localId="1" height="20" width="80" negated="false">
<expression>x</expression>
</inVariable>
<inVariable localId="2" height="20" width="80" negated="false">
<expression>y</expression>
</inVariable>
<block localId="3" width="74" height="60" typeName="MyAdd" executionOrderId="0">
<inputVariables>
<variable formalParameter="x" negated="false">
<connectionPointIn>
<connection refLocalId="1"/>
</connectionPointIn>
</variable>
<variable formalParameter="y" negated="false">
<connectionPointIn>
<connection refLocalId="2"/>
</connectionPointIn>
</variable>
</inputVariables>
<outputVariables>
<variable formalParameter="MyAdd" negated="false"/>
</outputVariables>
</block>
<outVariable localId="4" height="20" width="80" executionOrderId="1" negated="false" storage="none">
<position x="680" y="160"/>
<connectionPointIn>
<connection refLocalId="3" formalParameter="MyAdd"/>
</connectionPointIn>
<expression>z</expression>
</outVariable>
</FBD>
</body>
</pou>