RuSTy
RuSTy is a structured text (ST) compiler written in Rust and based on the LLVM compiler backend. We use the logos crate to perform lexical analysis before the custom parser runs. RuSTy emits static or shared objects as well as LLVM IR or bitcode, selected by a command line flag. We are aiming towards an open-source, industry-grade ST compiler supporting at least the features of the 2nd edition of the IEC 61131 standard.
You might also want to refer to the API documentation.
Supported Language Concepts
POUs
- ✔ Program
- ✔ Function
- ✔ FunctionBlock
- ✔ Action
Datatypes
- ✔ IEC 61131-3 numeric types
- ✔ Strings
- ✔ Wide Strings
- ✔ Struct types
- ✔ Enum types
- ✔ Array data types
- ✔ Alias types
- ✔ Sub-ranges types
- ✔ Date and Time types
- ✔ Sized String types
- ✔ Sized Wide String types
- ✔ Initial values
Declarations
- ✔ VAR
- ✔ VAR_INPUT
- ✔ VAR_INPUT {ref}
- ✔ VAR_OUTPUT
- ✔ VAR_IN_OUT
Statements
- ✔ Assignments
- ✔ Call statements
- ✔ Implicit call arguments
- ✔ Explicit call arguments
- ✔ EXIT, CONTINUE statements
Control Structures
- ✔ IF Statement
- ✔ CASE Statement
- ✔ FOR Loops
- ✔ WHILE Loops
- ✔ REPEAT Loops
- ✔ RETURN statement
Expressions
- ✔ Arithmetic Operators
- ✔ Relational Operators
- ✔ Logical Operators
- ✔ Bitwise Operators
Build & Install
RuSTy's code can be found on GitHub.
By default a Dockerfile and a devcontainer.json file are provided. If you wish to develop natively, however, you will need some additional dependencies, namely:
- Rust
- LLVM 14
- LLVM Polly
- Build Tools (e.g. build-essential on Ubuntu)
- zlib
The next sections cover how to install these dependencies on different platforms. If you already have them, RuSTy can be built using the cargo command. For debug builds, execute cargo build; for release builds (smaller and faster), execute cargo build --release. The resulting binaries can be found at target/debug/plc and target/release/plc respectively.
Ubuntu
The specified dependencies can be installed with the following command on Ubuntu:
sudo apt install \
build-essential \
llvm-14-dev liblld-14-dev \
libz-dev \
lld \
libclang-common-14-dev
Additionally you might need libffi7, which can be installed with sudo apt install libffi7.
Debian
Same as Ubuntu, except that additional repository sources have to be added, since Debian 11 only includes LLVM packages up to version 11. To do so, follow the official documentation.
MacOS
On MacOS you need to install the Xcode Command Line Tools.
Furthermore LLVM 14 is needed, which can be easily installed with homebrew:
brew install llvm@14
After the installation you have to add /opt/homebrew/opt/llvm@14/bin to your $PATH environment variable, e.g. with the following command:
echo 'export PATH="/opt/homebrew/opt/llvm@14/bin:$PATH"' >> ~/.zshrc
Windows
Compiling RuSTy on Windows requires three dependencies:
- Windows 10 SDK
- MSVC (at the time of writing, tested with v142 - VS 2019 C++ x64/x86 build tools)
- LLVM 14.0.6
The first two dependencies are typically installed during the Rust installation itself. More specifically, during the installation you should have been prompted to install them. If not, you can install them via Visual Studio at any point.
The third dependency is based on a custom build which is hosted on GitHub.
Download it, extract it and add the bin/ directory to your environment variables.
In theory this should cover everything needed to compile RuSTy (with some reboots here and there).
Installing
TODO
Troubleshooting
- Because of weak compatibility guarantees of the LLVM API, the LLVM installation must exactly match the major version of the llvm-sys crate. Currently you will need to install LLVM 14 to satisfy this constraint.
- To avoid installation conflicts on Linux/Ubuntu, make sure you don't have a default installation available (like you get by just installing llvm-dev), which may break things. If you do, make sure you have set the appropriate environment variable (LLVM_SYS_140_PREFIX=/usr/lib/llvm-14 for LLVM 14), so the build of the llvm-sys crate knows what files to grab.
Using RuSTy
The RuSTy compiler binary is called plc.
plc offers a comprehensive help via the -h (--help) option.
plc takes one output-format parameter and any number of input-files.
The input files can also be written as glob patterns.
plc [OPTIONS] <input-files>... <--ir|--shared|--pic|--static|--bc>
Note that you can specify at most one output format.
If no output format switch has been specified, the compiler will select --static by default.
Similarly, if you do not specify an output filename via the -o or --output options, the output filename will consist of the first input filename, but with an appropriate file extension depending on the output file format.
A minimal invocation looks like this:
plc input.st
This will take in the file input.st and compile it into a static object that will be written to a file named input.o.
More examples:
- plc --ir file1.st file2.st will compile file1.st and file2.st.
- plc --ir file1.cfc file2.st will compile file1.cfc and file2.st.
- plc --ir src/*.st will compile all ST files in the src-folder.
- plc --ir "**/*.st" will compile all ST-files in the current folder and its subfolders recursively.
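To get a feeling for which files a pattern such as src/*.st or **/*.st selects, the following Python sketch expands both patterns over a small, made-up directory tree. It only approximates the idea with Python's standard glob module; plc's own pattern handling may differ in details, and the file names are hypothetical.

```python
import glob
import os
import tempfile


def expand_inputs(patterns, root):
    """Expand glob patterns relative to root; returns sorted, de-duplicated relative paths."""
    found = set()
    for pattern in patterns:
        for path in glob.glob(os.path.join(root, pattern), recursive=True):
            if os.path.isfile(path):
                found.add(os.path.relpath(path, root).replace(os.sep, "/"))
    return sorted(found)


def demo():
    # Hypothetical project layout, created only for this illustration.
    with tempfile.TemporaryDirectory() as root:
        for rel in ("main.st", "src/a.st", "src/deep/b.st", "src/notes.txt"):
            path = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()
        return expand_inputs(["src/*.st"], root), expand_inputs(["**/*.st"], root)
```

In this layout the first pattern only picks up src/a.st, while the recursive pattern also finds main.st and src/deep/b.st.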
Example: Building a hello world program
Writing the code
We want to print something to the terminal, so we're going to declare external functions for that.
This example is available under examples/hello_world.st in the main RuSTy repository.
- main is our entry point to the program.
- To link the program, we are going to use the system's linker using the --linker=cc argument.
- On Windows and MacOS, replace this with --linker=clang as cc is usually not available.
{external}
FUNCTION puts : DINT
VAR_INPUT {ref}
text : STRING;
END_VAR
END_FUNCTION
FUNCTION main : DINT
puts('hello, world!$N');
END_FUNCTION
Compiling with RuSTy
The RuSTy command line interface is similar to that of other compilers.
If you just want to build an object file, then do this:
plc -c hello_world.st -o hello_world.o
Optimization
plc offers 4 levels of optimization which correspond to the levels established by LLVM and clang (none to aggressive, i.e. -O0 to -O3).
To select an optimization level, use the -O or --optimization flag:
plc -c "**/*.st" -O none
plc -c "**/*.st" -O less
plc -c "**/*.st" -O default
plc -c "**/*.st" -O aggressive
By default plc will use default, which corresponds to clang's -O2.
Linking an executable
Instead, you can also compile this into an executable and run it:
plc hello_world.st -o hello_world --linker=cc
./hello_world
Please note that RuSTy will attempt to link the generated object file by default to generate an executable if you didn't specify something else (option -c).
- The --linker=cc flag tells RuSTy that it should link with the system's compiler driver instead of the built in linker. This provides support to create executables.
- Additional libraries can be linked using the -l flag, additional library paths can be added with -L.
- You add library search paths by providing additional -L /path/... options. By default, this will be the current directory.
- The linker will prefer a dynamically linked library if available, and revert to a static one otherwise.
Building for separate targets
RuSTy supports building for multiple targets by specifying the --target and optionally the --sysroot option.
- Multiple targets and sysroots can be specified for the compilation simply by adding additional --target and --sysroot entries.
--target
To build and compile structured text for the right platform we need to specify the target.
As RuSTy is using LLVM, a target triple supported by LLVM needs to be selected.
The default target is the host machine's target.
So if a dev container on an x86_64-docker is used, the target is x86_64-linux-gnu.
--sysroot
plc uses the sysroot option for linking purposes.
It is considered to be the root directory for the purpose of locating headers and libraries.
- If a target and sysroot are provided, the output will always be stored in a folder with the target name (e.g. an x86_64-linux-gnu target will have the output stored in a folder called x86_64-linux-gnu).
- --sysroot parameters always have to match target parameters; there can be no sysroot without a target.
Parallel Compilation
By default, plc uses parallel compilation.
This option can be controlled with the -j or --threads flag. A value above 0 will indicate the number of threads to use for the compilation.
Leaving the value unset, setting it to 0 or simply specifying -j sets the value to the maximum number of threads that can run on the current machine.
This is determined by the underlying parallelisation library Rayon.
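The rule for -j can be summarised in a few lines. The Python sketch below is only an illustration of the behaviour described above, not plc's implementation: it uses os.cpu_count() as a stand-in for the maximum that Rayon would pick, and the function name is hypothetical.

```python
import os


def resolve_thread_count(requested=None):
    """Return the number of worker threads for a -j/--threads value.

    None models the flag being unset or given without a value.
    """
    # Any value above 0 is used as-is.
    if requested is not None and requested > 0:
        return requested
    # Unset or 0: fall back to the machine's maximum (approximated here by the
    # CPU count). Negative values are not covered by the text; this sketch
    # simply treats them like 0.
    return os.cpu_count() or 1
```

For example, resolve_thread_count(4) yields 4, while resolve_thread_count(0) and resolve_thread_count() both fall back to the machine's maximum.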
Single module Compilation
With the introduction of parallel compilation, every unit is compiled into an object file independently and then linked together in a single module.
This behaviour might not always be desired and can be disabled using the --single-module flag.
Note that a single module is currently much slower to produce, as it requires first generating all modules and then merging them together.
Configuration Options
plc supports different configuration options; these can be printed using the config subcommand.
config schema
Outputs the JSON schema used for the validation of the plc.json file.
config diagnostics
Outputs a JSON file with the default error severity configuration for the project. See Error Configuration for more information.
Build Configuration
In addition to the comprehensive help, plc offers a build subcommand that simplifies the build process.
Instead of having numerous inline arguments, using the build subcommand along with a build description file makes passing the arguments easier.
The build description file needs to be saved in the json format.
Usage:
plc build
Note that if plc cannot find the plc.json file, it will throw an error and request the path.
The default location for the build file is the current directory.
The command for building with an additional path looks like this:
plc build src/plc.json
Build description file (plc.json)
For the build description file to work, it must be written in the json format. All the keys used in the build description file are described in the following sections.
files
The keyword files is the equivalent to the input parameter, which adds all the ST files that need to be compiled.
The value of files is an array of strings, defined as follows:
"files" : [
"examples/hello_world.st",
"examples/hw.st",
"examples/*.gvl"
]
libraries
To link several objects into one executable, plc has the option to add libraries and automatically build and link them together.
The libraries keyword is optional.
"libraries" : [
{
"name" : "iec61131std",
"path" : "path/to/lib/",
"package" : "Copy",
"include_path" : [
"examples/hw.st",
"examples/hello_world.st"
]
}
]
output
Similarly to specifying an output file via the -o or --output option on the command line, in the build file we use "output" : "output.so" to define the output file. The default location is the current build directory (see Build Location).
compile_type
The following options can be used for the compile_type:
- Static specifies that linking/binding must be done at compile time.
- Shared (dynamic) specifies that linking/binding must be done dynamically (at runtime).
- PIC Position Independent Code (choosing this option implies that the linking will be done dynamically).
- Relocatable generates relocatable object code (for combining with other object code).
- Bitcode adds bitcode alongside machine code in the executable file.
- IR intermediate llvm representation.
The compile format is specified in the build description file as follows: "compile_type" : "Shared".
The compile_type keyword is optional.
package_commands
The package_commands keyword is optional.
TODO
Example
{
"files" : [
"examples/hw.st",
"examples/hello_world.st",
"examples/ExternalFunctions.st",
"examples/*.dt"
],
"compile_type" : "Shared",
"output" : "proj.so",
"libraries" : [
{
"name" : "iec61131std",
"path" : "path/to/lib",
"package" : "Copy",
"include_path" : [
"examples/lib.st"
]
},
{
"name" : "other_lib",
"path" : "path/to/lib",
"package" : "System",
"include_path" : [
"examples/hello_world.st"
]
}
]
}
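If the build description is generated by a script rather than written by hand, the keys described above map directly onto a dictionary. The following Python sketch is one possible way to do that; the helper names are hypothetical, and it only covers the keys shown in this section.

```python
import json


def build_description(files, output, compile_type=None, libraries=None):
    """Assemble a plc.json-style dictionary from the keys described above."""
    description = {"files": list(files), "output": output}
    # compile_type and libraries are optional, so they are only written when given.
    if compile_type is not None:
        description["compile_type"] = compile_type
    if libraries:
        description["libraries"] = list(libraries)
    return description


def to_json(description):
    """Serialize the description the way it would be stored in plc.json."""
    return json.dumps(description, indent=4)
```

Calling to_json(build_description(["examples/hw.st", "examples/*.dt"], "proj.so", compile_type="Shared")) produces a document with the same shape as the example above, minus the libraries section.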
Build Parameters
The build subcommand exposes the following optional parameters:
--build-location
The build location is the location all build files will be copied to.
By default the build location is the build folder in the root of the project (the location of the plc.json).
This can be overridden with the --build-location command line parameter.
--lib-location
The lib location is where all libraries marked with Copy will be copied.
By default it is the same as the build-location.
This can be overridden with the --lib-location command line parameter.
Environment Variables
Environment variables can be used inside the build description file; the variables are evaluated before an entry is evaluated.
In addition to externally defined variables, the build exports variables that can be referenced in the description file:
PROJECT_ROOT
The folder containing the plc.json file, i.e. the root of the project.
ARCH
The target architecture currently being built, for a multi architecture build.
The value for ARCH will be updated for every target.
Example targets are: x86_64-pc-linux-gnu, x86_64-pc-windows-msvc, aarch64-pc-linux-musl
BUILD_LOCATION
BUILD_LOCATION is the folder where the build will be saved.
This is the value of either the --build-location parameter or the default build location.
LIB_LOCATION
LIB_LOCATION is the folder where the lib will be saved.
This is the value of either the --lib-location parameter or the build location.
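The fallback chain between these variables can be written down compactly. The sketch below is an illustration in Python under the assumption that the defaults behave exactly as described above (build folder inside the project root, lib location following the build location); the function name is hypothetical and paths are treated as plain POSIX-style strings.

```python
from pathlib import PurePosixPath


def resolve_locations(project_root, build_location=None, lib_location=None):
    """Derive the exported location variables from the optional parameters."""
    root = PurePosixPath(project_root)
    # --build-location wins; otherwise the "build" folder next to plc.json is used.
    build = PurePosixPath(build_location) if build_location else root / "build"
    # --lib-location wins; otherwise libraries go to the build location.
    lib = PurePosixPath(lib_location) if lib_location else build
    return {
        "PROJECT_ROOT": str(root),
        "BUILD_LOCATION": str(build),
        "LIB_LOCATION": str(lib),
    }
```

With only a project root of /work/proj, both BUILD_LOCATION and LIB_LOCATION resolve to /work/proj/build; passing a lib location changes LIB_LOCATION only.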
Usage
To reference an environment variable in the description file, reference the variable with a preceding $.
Example:
{
"name" : "mylib",
"path" : "$ARCH/lib",
"package" : "System",
"include_path" : [
"examples/hello_world.st"
]
}
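The effect of the substitution for a multi-target build can be sketched with Python's string.Template, which uses the same $NAME notation. This is only an illustration of the idea, not plc's implementation; in particular, leaving unknown variables untouched is an assumption of this sketch, and the helper names are hypothetical.

```python
from string import Template


def expand_entry(value, variables):
    """Replace $NAME references in a description entry."""
    # safe_substitute leaves unknown $NAMES untouched instead of raising
    # (an assumption of this sketch, not documented plc behaviour).
    return Template(value).safe_substitute(variables)


def paths_per_target(path_template, targets):
    """Evaluate one path entry once per target, updating ARCH each time."""
    return {target: expand_entry(path_template, {"ARCH": target}) for target in targets}
```

For the entry "$ARCH/lib" and the targets x86_64-pc-linux-gnu and aarch64-pc-linux-musl, this yields x86_64-pc-linux-gnu/lib and aarch64-pc-linux-musl/lib respectively.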
Validation
The build description file uses a JSON Schema file located at compiler/plc_project/schema/plc-json.schema to validate the build description before the build.
In order for the schema to be used, it has to be either in that location for source builds or copied next to the build binaries.
If the schema is not found, the schema based validation will be skipped.
Error Configuration
Errors in a plc project can be configured by providing a json configuration file.
A diagnostic's severity can be changed, for example from warning to error or info and vice-versa, or the diagnostic can be ignored completely.
To see a default error configuration use plc config diagnostics.
To provide a custom error configuration use plc --error-config <custom.json>.
Note that the --error-config option can be used with all subcommands such as build and check.
Running plc config diagnostics --error-config <custom.json> will print out the full diagnostics configuration taking the provided overrides into account.
Error Description
Errors produced by plc can be explained using the plc explain <ErrorCode> command.
Error codes are usually provided in the diagnostic report.
General Error
This error is a catch all error. It is usually thrown when no other error better matches the case.
General IO Error
This error describes a problem during an IO operation such as reading or writing a file. It is usually accompanied by an internal error with further details.
Parameter Error
This error describes a problem with the command parameters, such as a file required for the compilation not being found.
Duplicate Symbol
The marked symbol has been defined multiple times.
Generic LLVM Error
An unexpected error occurred during the LLVM generation phase. This is usually a follow-up problem from a different diagnostic. If it occurs without a previous diagnostic, please file a bug report.
Missing Token
During the parsing phase, an additional Token (Element) was required to correctly interpret the code. The error message usually indicates what Token was missing.
Example
In the following example the name (Identifier) of the program is missing.
PROGRAM (*name*)
END_PROGRAM
error: Unexpected token: expected Identifier but found END_PROGRAM
┌─ example.st:2:1
│
2 │ END_PROGRAM
│ ^^^^^^^^^^^ Unexpected token: expected Identifier but found END_PROGRAM
Unexpected Token
During parsing, a Token (Element) was encountered in the wrong location. This could be an indication of a misused or misspelled keyword.
Invalid Range
Mismatched Parentheses
Invalid time literal
Invalid Number
Missing Case Condition
Keywords should contain Underscores
Wrong parenthesis for String delimiter
POINTER_TO is no standard keyword
Return types cannot have a default value
Classes cannot contain implementation
Duplicate Label
Classes cannot contain IN_OUT variables
Classes cannot contain a return type
POUs cannot be extended
Missing container name for action
Statement has no effect
Invalid Pragma Location
Missing return type
Unexpected return type
Unsupported return type
Empty variable block
Recursive data structure
Missing IN_OUT parameters
Invalid parameter type
Invalid number of arguments
An invalid number of arguments was passed to a POU. For example
FUNCTION foo
(* ... *)
END_FUNCTION
FUNCTION main : DINT
foo('bar'); // Error, foo isn't expecting any arguments
END_FUNCTION
Note that for FUNCTIONs the argument count must match the parameter list and can only be bigger if a variadic parameter is present. For stateful POUs variadic parameters are not supported, thus the argument count must be equal to or less than the size of the parameter list, depending on whether optional arguments such as VAR_INPUT or VAR_OUTPUT were passed or not.
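The counting rule can be restated as a small predicate. The Python sketch below is a simplified model of the rule as worded above, not the compiler's validation code; the function and parameter names are hypothetical, and param_count is assumed to exclude the variadic parameter itself.

```python
def argument_count_ok(arg_count, param_count, *, is_function, variadic=False):
    """Check an argument count against the rule described above.

    param_count counts the declared, non-variadic parameters.
    """
    if is_function:
        if variadic:
            # Extra arguments are absorbed by the variadic parameter.
            return arg_count >= param_count
        return arg_count == param_count
    # Stateful POUs: optional arguments may be left out, but never added.
    return arg_count <= param_count
```

Under this model the call foo('bar') from the example fails the check (one argument, no parameters), while a variadic function with one fixed parameter accepts three arguments.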
Unresolved Constant
Invalid constant block
Invalid Constant
Cannot assign to constant
Invalid assignment
Missing type
Variable Overflow
Invalid Enum Variant
This error indicates the right-hand side in an enum assignment is invalid.
For example an enum such as TYPE Color : (red := 0, green := 1, blue := 2); END_TYPE can only take values which (internally) yield a literal integer 0, 1 or 2.
Invalid variable initializer
Assignment to Reference
Invalid array assignment
Invalid POU for VLA
Invalid VLA array access
VLA Dimension out of bounds
VLAs are always By Reference
Unresolved Reference
Illegal reference access
Expression is not assignable
Typecast error
Unknown type
Use of undeclared type-identifier.
Literal out of range
Literal not compatible with type
Incompatible direct access
Incompatible variable for direct access
Invalid range for direct access
Invalid range for array access
Invalid variable for array access
Direct access to variable with %
Expected literal
Invalid Nature
Unknown Nature
Unresolved Generic
Incompatible size
Invalid operation
Implicit typecast
Pointer dereference to non pointer
Array access to non array value
Address-of requires a value
General codegen error
Missing function
Missing compare function
Cannot generate string literal
Initial values were not generated
General debug error
Generic linker error
Duplicate case condition
Case condition outside of a case statement
Invalid case condition
Empty control statement
Undefined node
Unexpected node
Unconnected source
Cyclic connection
No associated connector
Unnamed control
Invalid PLC Json file
Invalid Call parameters
Incompatible reference assignment
Unsafe Enum Assignment
At runtime there is no way to guarantee that a non-const reference will not change its value to something out-of-bounds for enums. For example consider the following
PROGRAM main
VAR
zero : DINT := 0;
color : (red := 0, green := 1, blue := 2);
END_VAR
zero := 10;
color := zero; // Invalid because `color` accepts values from 0 to 2, but we assigned 10 to it
END_PROGRAM
Equivalent enum value used
This message indicates that the assigned enum value is not part of the enum, but is equivalent to one of the internal values of the enum.
Example:
TYPE Colors : (Red, Green, Blue, Yellow) END_TYPE
TYPE Directions : (N, S, W, E) END_TYPE
VAR_GLOBAL
col : Colors := N; //N is equivalent to Red but is not part of the enum
dir : Directions := Red; //Red is equivalent to N but is not part of the enum
END_VAR
To solve the issue, use the equivalent value from the variable's own enum, e.g. Red instead of N for col.
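Why N and Red are considered equivalent follows from the implicit numbering of enum members. The following Python sketch models this under the assumption that members without an explicit value are numbered from 0 in declaration order; it is an illustration only, and the helper names are hypothetical.

```python
def enum_values(names):
    """Number enum members from 0 in declaration order (assumed default)."""
    return {name: index for index, name in enumerate(names)}


def equivalent_member(foreign_name, foreign_enum, target_enum):
    """Find the member of target_enum with the same internal value, if any."""
    value = foreign_enum[foreign_name]
    for name, candidate in target_enum.items():
        if candidate == value:
            return name
    return None


COLORS = enum_values(["Red", "Green", "Blue", "Yellow"])
DIRECTIONS = enum_values(["N", "S", "W", "E"])
```

Here equivalent_member("N", DIRECTIONS, COLORS) returns "Red", which is the member that should be written in the initializer of col.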
Return Value Of Void Functions
Functions of type VOID cannot have an explicit return value, e.g. foo := 1 in the following example is invalid.
FUNCTION foo
foo := 1;
END_FUNCTION
If a value must be returned, choose a return type for your function.
Invalid Conditional Value
Control statements such as IF, FOR and WHILE require specific types for their condition.
If, While
IF and WHILE statements require an expression which yields a boolean; any other type is invalid and will trigger an error.
For
FOR statements require four conditional values: a counter, a start value, an end value and a step value. All of these need to be integers and share the same type.
FOR counter := start TO end BY step DO
// ...
END_FOR
Action call without parentheses
Integer Condition
This error is generated because an integer was used in an IF or WHILE statement, when a boolean was expected.
See also plc explain E094
Invalid Array Range
Ranges such as ARRAY [0..-1] are invalid in ST because end values of ranges must be greater than their start values.
A valid range for the given statement would have been ARRAY[-1..0].
Invalid REF= assignment
REF= assignments are considered valid if the left-hand side of the assignment is a pointer variable and the right-hand side is a variable of the type that is being referenced.
For example, assignments such as the following are invalid:
VAR
foo : DINT;
bar : DINT;
qux : SINT;
refFoo : REFERENCE TO DINT;
END_VAR
refFoo REF= 5; // `5` is not a variable
foo REF= bar; // `foo` is not a pointer
refFoo REF= qux; // `refFoo` and `qux` have different types, DINT vs SINT
Invalid REFERENCE TO declaration
REFERENCE TO variable declarations are considered valid if the referenced type is not of the following form:
foo : REFERENCE TO REFERENCE TO (* ... *)
foo : ARRAY[...] OF REFERENCE TO (* ... *)
foo : REF_TO REFERENCE TO (* ... *)
Immutable Variable Address
Alias variables are immutable with regard to their pointer address, thus re-assigning an address will return an error. For example, the following code will not compile:
FUNCTION main
VAR
foo AT bar : DINT;
bar : DINT;
baz : DINT;
END_VAR
foo := baz; // Valid, because we are changing the pointer's dereferenced value
foo REF= baz; // Invalid, `foo` is immutable with regard to its pointer address
END_FUNCTION
Template variable does not exist
A variable was configured in a VAR_CONFIG block, but the variable cannot be found in the code.
Erroneous code example:
VAR_CONFIG
main.foo.bar AT %IX1.0 : BOOL;
END_VAR
PROGRAM main
VAR
foo : foo_fb;
END_VAR
END_PROGRAM
FUNCTION_BLOCK foo_fb
VAR
qux AT %I* : BOOL;
END_VAR
END_FUNCTION_BLOCK
In this example a variable named bar is configured, however the function block foo_fb does not contain a bar variable. The code should have been main.foo.qux AT %IX1.0 : BOOL instead for it to be valid.
Template variable without hardware binding
A template variable must contain a hardware binding.
Erroneous code example:
VAR_CONFIG
main.foo.bar AT %IX1.0 : BOOL;
END_VAR
PROGRAM main
VAR
foo : foo_fb;
END_VAR
END_PROGRAM
FUNCTION_BLOCK foo_fb
VAR
bar : BOOL;
END_VAR
END_FUNCTION_BLOCK
In this example the VAR_CONFIG block declares the bar variable inside foo_fb as a template variable. However, bar does not have a hardware binding. For the example to be considered valid, bar should have been declared as e.g. bar AT %I* : BOOL.
Immutable Hardware Binding
Variables configured in a VAR_CONFIG block cannot override their hardware binding.
Erroneous code example:
VAR_CONFIG
main.foo.bar AT %IX1.0 : BOOL;
END_VAR
PROGRAM main
VAR
foo : foo_fb;
END_VAR
END_PROGRAM
FUNCTION_BLOCK foo_fb
VAR
bar AT %IX1.5 : BOOL;
END_VAR
END_FUNCTION_BLOCK
In this example the VAR_CONFIG block configures bar to have a hardware address IX1.0.
However, at the same time the bar inside the POU foo_fb assigns a hardware address IX1.5.
For the code to be considered valid, bar should have been declared as bar AT %I* : BOOL.
Config Variable With Incomplete Address
Variables defined in a VAR_CONFIG block, i.e. config variables, must specify a complete address.
Erroneous code example:
VAR_CONFIG
main.foo.bar AT %I* : BOOL;
END_VAR
In this example main.foo.bar has specified a placeholder hardware address.
For the example to be considered valid, a specific address such as %IX1.0 should have been declared.
CONSTANT keyword in POU
The CONSTANT keyword is not allowed for POU declarations, only variables can be CONSTANT.
Erroneous code example:
FUNCTION FOO : BOOL CONSTANT
VAR_INPUT
END_VAR
// ...
END_FUNCTION
VAR_EXTERNAL blocks have no effect
Variables declared in a VAR_EXTERNAL block are currently ignored and the referenced globals will be used instead.
Example:
VAR_GLOBAL
myArray : ARRAY [0..10] OF INT;
myString: STRING;
END_VAR
FUNCTION main
VAR_EXTERNAL CONSTANT
myArray : ARRAY [0..10] OF INT;
END_VAR
myArray[5] := 42;
myString := 'Hello, world!';
END_FUNCTION
In this example, even though myArray is declared as VAR_EXTERNAL CONSTANT, the CONSTANT constraint will be ignored and the global myArray will be mutated. The global myString can be read from and written to from within main even though it is not declared in a VAR_EXTERNAL block.
Missing configuration for template variable
A template variable was left unconfigured.
Erroneous code example:
VAR_CONFIG
main.foo.bar AT %IX1.0 : BOOL;
END_VAR
PROGRAM main
VAR
foo : foo_fb;
END_VAR
END_PROGRAM
FUNCTION_BLOCK foo_fb
VAR
bar AT %I* : BOOL;
qux AT %I* : BOOL;
END_VAR
END_FUNCTION_BLOCK
In this example a variable named main.foo.qux is declared as a template, however the VAR_CONFIG-block does not contain an address-configuration for it. Each template variable needs to be configured, otherwise it could lead to segmentation faults at runtime.
Template variable is configured multiple times
A template variable is configured more than once, leading to ambiguity.
Erroneous code example:
VAR_CONFIG
main.foo.bar AT %IX1.0 : BOOL;
main.foo.bar AT %IX1.1 : BOOL;
END_VAR
PROGRAM main
VAR
foo : foo_fb;
END_VAR
END_PROGRAM
FUNCTION_BLOCK foo_fb
VAR
bar AT %I* : BOOL;
END_VAR
END_FUNCTION_BLOCK
In this example a variable named main.foo.bar has multiple configurations in the VAR_CONFIG-block. It is not clear which address this variable should map to - only a single configuration entry per instance-variable is allowed.
Stateful member variable initialized with temporary reference
Stack-local variables do not yet exist at the time of initialization. Additionally, pointing to a temporary variable will lead to a dangling pointer as soon as it goes out of scope - potential use after free.
Erroneous code example:
FUNCTION_BLOCK foo
VAR
a : REF_TO BOOL := REF(b);
END_VAR
VAR_TEMP
b : BOOL;
END_VAR
END_FUNCTION_BLOCK
Libraries
RuSTy does not currently have support for importing source based libraries.
Source based libraries can, however, be compiled together with the application as normal files.
Precompiled libraries or system functions can be added using compilation flags or an entry in the plc.json file.
System functions can also be added by declaring an External Function for each POU in that library.
Library Structure
A library is defined by:
- A set of st interfaces, each interface represents a function that has been precompiled.
In a POU, the interface is the definition and variable section e.g:
(* Interface for program example *)
PROGRAM example
VAR_INPUT
a,b,c : DINT;
END_VAR
(* End of interface *)
(* Implementation *)
END_PROGRAM
- A binary file for each architecture the library has been built for (x86_64-linux-gnu, aarch64-linux-gnu, ..)
Linking libraries using the plc command line
To include a library when using the plc command line interface, the include files can be added using the -i flag.
Each POU, Global Variable, or Datatype defined in the included files will be added to the project.
POUs and Global variables included with the -i flag are marked as external, the implementation part of a POU is ignored.
To link the library, two options are then available: Shared and Static libraries.
Shared Libraries
A shared library (i.e. extension .so) can be linked using the -l flag.
For a library called mylib, when the flag -lmylib is passed, the linker will search for a file called libmylib.so.
Note that the lib<LibName>.so format is required by the linker for unix like systems.
The library locations used by the linker are the default search locations of the linker (i.e. /usr/lib, /lib); additional paths can be provided using the -L flag (e.g. -L/opt/lib will make the linker also search for files in /opt/lib).
Additional library locations can be provided by supplying additional -L entries.
Additionally, the environment variable LD_LIBRARY_PATH can be defined to append entries to the linker's search location. More information can be found here.
Static Libraries
Static libraries compiled as object files can be linked by simply passing the object file (i.e. extension .o) as an input (similar to other .st files).
Archive files (i.e. extension .a) can be linked similarly to Shared Libraries using the -l flag.
If the application is being compiled with the --static flag (or no shared library (.so) is found), the linker will use the archive file.
If neither a shared object (.so) nor an archive file (.a) is found, compilation will fail.
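The lookup described in the last two sections can be modelled in a few lines. The Python sketch below is a simplified illustration of how a linker typically resolves -l<name> against its search directories; it is not the linker's or plc's actual code, and the function names are hypothetical.

```python
import os
import tempfile


def find_library(name, search_paths, static=False):
    """Return the first matching library file, or None if nothing is found."""
    # With static=True only the archive is considered; otherwise the shared
    # object is preferred and the archive serves as the fallback.
    suffixes = [".a"] if static else [".so", ".a"]
    for directory in search_paths:
        for suffix in suffixes:
            candidate = os.path.join(directory, f"lib{name}{suffix}")
            if os.path.isfile(candidate):
                return candidate
    return None


def demo():
    # Hypothetical library folder, created only for this illustration.
    with tempfile.TemporaryDirectory() as directory:
        for filename in ("libiec.so", "libiec.a", "libmylib.a"):
            open(os.path.join(directory, filename), "w").close()
        found = [find_library(n, [directory]) for n in ("iec", "mylib", "missing")]
        found.append(find_library("iec", [directory], static=True))
        return [os.path.basename(path) if path else None for path in found]
```

In the demo folder, iec resolves to libiec.so, mylib falls back to libmylib.a, missing is not found at all (the case in which compilation fails), and a static lookup of iec picks libiec.a.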
Command line example
To compile a file called input.st including a header and linking a library called libiec.so from /lib:
plc input.st -i iec/header.st -L/lib/ -liec
Linking libraries using the Build Description File plc.json
Libraries can be added to a project managed with a Build Description File.
To add a library to the project, the "libraries" section can be used.
A library entry requires a name, a path, the package behaviour, and a set of files to include (include_path).
name
The name of the library to be linked. This will be used by the linker to find the library.
A library with the name mylib must have an equivalent compiled file called libmylib.so.
Note, archive files (ending with .a) are currently not supported.
path
The location of the library to be linked. The path can be either absolute or relative to the project.
package
The packaging option for the library, i.e. whether the library should be copied or is already available on the system.
The value "Copy" indicates that the given library should be copied to the Library Location.
The value "System" indicates that the given library exists on the system and does not need to be copied.
include_path
A list of files (can include globs) that should be included with the project. Each POU, Global Variable, or Datatype defined in the included files will be added to the project. POUs and Global variables included in the list are marked as external, the implementation part of a POU is ignored.
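Before handing a generated plc.json to the compiler, a script can sanity-check each library entry against the four keys described above. The Python sketch below is a hypothetical helper, not part of plc, and much less thorough than the schema validation; it only knows the two package values described in this section, so other values may exist that it would wrongly flag.

```python
import json

REQUIRED_KEYS = ("name", "path", "package", "include_path")
# Only the values described in this section; an assumption of this sketch.
PACKAGE_VALUES = ("Copy", "System")


def check_library_entry(entry):
    """Return a list of human-readable problems found in one library entry."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]
    if "package" in entry and entry["package"] not in PACKAGE_VALUES:
        problems.append(f"unknown package value: {entry['package']}")
    if "include_path" in entry and not isinstance(entry["include_path"], list):
        problems.append("include_path must be a list of files")
    return problems


EXAMPLE = json.loads("""
{
    "name" : "mylib",
    "path" : "libs/$ARCH/",
    "package" : "Copy",
    "include_path" : [ "simple_program.st" ]
}
""")
```

check_library_entry(EXAMPLE) returns an empty list, while an entry without a path or with include_path given as a single string is reported.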
Library Location
Libraries marked as Copy will be copied during the compilation to the defined Library Location.
By default this is the same as the Build Location unless overridden by the --lib-location parameter.
Using environment variables
Since libraries can be compiled for multiple targets, the lib path can contain environment variables to disambiguate the compile location.
$ARCH can be used as a placeholder in the path to indicate the currently compiled target.
During linking, if no .so file with the name lib<name>.so is found, the compilation will fail.
Configuration Example (plc.json)
A configuration example for a Copy library called mylib and a System library called std:
"libraries" : [
{
"name" : "mylib",
"path" : "libs/$ARCH/",
"package" : "Copy",
"include_path" : [
"simple_program.st"
]
},
{
"name" : "std",
"path" : "libs/$ARCH/",
"package" : "System",
"include_path" : [
"include/*.st"
]
}
]
External Functions
A POU (PROGRAM, FUNCTION, FUNCTION_BLOCK) can be marked as external, which will cause the compiler to ignore its implementation.
{external}
FUNCTION log : DINT
VAR_IN_OUT
message : STRING[1024];
END_VAR
VAR_INPUT
type : (Err,Warn,Info) := Info;
END_VAR
END_FUNCTION
At compilation time, the function log
will be defined as an externally available function, and can be called from ST
code.
Note: At linking time, a
log
function with a compatible signature must be available on the system.
Calling C functions
ST
code can call into foreign functions natively.
To achieve this, the called function must be defined in a C
compatible API, e.g. extern "C"
blocks.
The interface of the function has to:
- either be included with the -i flag
- or be declared in ST using the {external} keyword
When including multiple header files/function interfaces, the -i
flag must precede each individual file, e.g. -i file1.st -i file2.st -i file3.st
. Alternatively, when including an entire folder with -i '/liblocation/*.st'
, the path must be put in quotes, otherwise the command-line might parse the arguments in a way that is incompatible (i.e. does not precede each file with -i
).
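A build script that assembles the command line could apply this rule as sketched below. The helper `include_args` is hypothetical and only demonstrates that each file gets its own `-i` flag; it does not call the compiler.

```rust
/// Hypothetical helper: builds the argument list so that every interface
/// file is preceded by its own `-i` flag.
fn include_args(files: &[&str]) -> Vec<String> {
    files
        .iter()
        .flat_map(|file| vec!["-i".to_string(), file.to_string()])
        .collect()
}
```

For the three files from the text, joining the result with spaces gives `-i file1.st -i file2.st -i file3.st`.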
Example
Given a min
function defined in C
as follows:
int min(int a, int b) {
//...
}
an interface of that function in ST
can be defined as:
{external}
FUNCTION min : DINT
VAR_INPUT
a : DINT;
b : DINT;
END_VAR
END_FUNCTION
Variadic arguments
Some foreign functions, especially ones defined in C
, could be variadic functions.
These functions are usually defined with the last parameter ...
, and signify that a function can be called with unlimited parameters.
An example of a variadic function is printf
.
Calling a variadic function is supported in ST
. To mark an external function as variadic, you can add a parameter of type ...
to the VAR_INPUT
block.
Variadic function example
Given the printf
function defined as:
int printf( const char *restrict format, ... );
the ST
interface can be defined as:
{external}
FUNCTION printf : DINT
VAR_INPUT {ref}
format : STRING;
END_VAR
VAR_INPUT
args : ...;
END_VAR
END_FUNCTION
Runnable example
With the printf
function available on the system, there is no need to declare
the C function.
An ST
program called ExternalFunctions.st
with the following code can be declared:
(*ExternalFunctions.st*)
(**
* The printf function's interface, marked as external since
* it is defined directly along other ST functions
*)
{external}
FUNCTION printf : DINT
VAR_INPUT {ref}
format : STRING;
END_VAR
VAR_INPUT
args: ...;
END_VAR
END_FUNCTION
(**
* The main function of the program prints a demo to the standard out
* The function main is implemented at this location and thus not marked
* as {external}
*)
FUNCTION main : DINT
VAR
tmp : DINT;
END_VAR
tmp := 1;
printf('Value %d, %d, %d$N', tmp, tmp * 10, tmp * 100);
main := tmp;
END_FUNCTION
Compiling the previous code with the following command:
plc ExternalFunctions.st -o ExternalFunctions --linker=clang
will yield an executable called ExternalFunctions
.
We use clang to link the generated object file and generate an executable since the embedded linker cannot generate executable files.
The executable can then be started with ./ExternalFunctions
.
API guidelines
1. Introduction
- Purpose: Provide plc library developers with information on how an interface for an application written in IEC61131-3 should be designed and why.
- Scope: This guideline applies to all developers writing libraries for use in IEC61131-3 applications.
2. API Guidelines
2.1 VAR_IN_OUT
instead of pointers
If a function takes a parameter for the purpose of reading and writing to it, a VAR_IN_OUT can be used instead of a pointer in the VAR_INPUT.
2.2 FUNCTION
and FUNCTION_BLOCK
FUNCTION
and FUNCTION_BLOCK
have similar properties, but they have fundamentally different representation in the compiler.
A FUNCTION
is defined in a similar manner to a C
function:
- It has no backing struct
- values defined inside it will only persist for the duration of the function call
Example:
FUNCTION myFunc : DINT
VAR_INPUT
x : DINT;
END_VAR
END_FUNCTION
int32_t myFunc(int32_t);
In contrast, a FUNCTION_BLOCK
is backed by a struct and is globally accessible by a defined instance.
To declare a FUNCTION_BLOCK
, a backing struct has to be declared and passed as a reference to the function block implementation.
FUNCTION_BLOCK myFb
VAR_INPUT
x : DINT;
END_VAR
END_FUNCTION_BLOCK
typedef struct {
int32_t x;
} myFunctStr;
void myFb(myFunctStr*);
2.2.1 Parameters
FUNCTION
and FUNCTION_BLOCK
may define input parameters. These are passed using the VAR_INPUT
or VAR_IN_OUT
blocks.
The difference between the two blocks is how the values are passed.
A VAR_INPUT
variable is passed by value, while a VAR_IN_OUT
variable is passed by reference.
In general, it is recommended to use a VAR_IN_OUT
for data that needs to be both read and written, while VAR_INPUT
should be reserved to read only values.
NOTE: In
FUNCTION
s complex datatypes are handled as pointers. They are however copied and changes in the function will have no effect on the actual variable.
Examples:
FUNCTION
:
FUNCTION myFunc : DINT
VAR_INPUT
myInt : DINT;
myString : STRING[255];
END_VAR
VAR_INPUT {ref}
myRefStr : STRING;
END_VAR
VAR_IN_OUT
myInOutInt : DINT;
END_VAR
END_FUNCTION
int32_t myFunc(int32_t myInt, char* myString, char* myRefStr, int32_t* myInOutInt);
FUNCTION_BLOCK
:
FUNCTION_BLOCK myFb
VAR_INPUT
myInt : DINT;
myString : STRING[255];
END_VAR
VAR_IN_OUT
myInOutInt : DINT;
END_VAR
END_FUNCTION_BLOCK
typedef struct {
int32_t myInt;
char myString[256];
int32_t* myInOutInt;
} myFbStruct;
void myFb(myFbStruct* myFbInstance);
2.2.2 Private members
A FUNCTION_BLOCK
often requires local (private) members to hold data across executions. These members have to be declared in the struct.
As a side effect, these variables are visible to the users.
For example:
FUNCTION_BLOCK Count
VAR
current : DINT;
END_VAR
END_FUNCTION_BLOCK
typedef struct {
int32_t current;
} CountStruct;
void Count(CountStruct* countInst) {
countInst->current = countInst->current + 1;
}
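A library written in Rust could implement the same function block as sketched below. The names `CountStruct` and `count` are illustrative, and exporting the function under the symbol name `Count` is deliberately left out; the sketch only shows the backing struct and the by-pointer instance.

```rust
/// Backing struct of the `Count` function block. `#[repr(C)]` keeps the
/// field layout compatible with the C struct.
#[repr(C)]
pub struct CountStruct {
    pub current: i32,
}

/// Body of the function block. The instance is passed by pointer, so its
/// state survives between calls.
pub extern "C" fn count(instance: *mut CountStruct) {
    // A null instance is ignored instead of being dereferenced.
    if let Some(inst) = unsafe { instance.as_mut() } {
        inst.current = inst.current.wrapping_add(1);
    }
}
```

Because `current` is a plain struct field, any caller holding the instance can read or overwrite it, which is the visibility side effect mentioned above.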
2.2.3 Return values
A FUNCTION
defines a return value in the signature, while a FUNCTION_BLOCK
relies on VAR_OUTPUT
definitions.
Example:
FUNCTION myFunc : DINT
VAR_INPUT
x : DINT;
END_VAR
VAR_IN_OUT
y : DINT;
END_VAR
END_FUNCTION
The C interface would look like:
int32_t myFunc(int32_t x, int32_t* y);
The return type for a function can also include complex datatypes, such as strings, arrays and structs. Internally, complex return types are treated as reference parameters (pointers).
For complex return types, the function signature expects the return value as the first parameter.
Example:
FUNCTION myFunc : STRING
VAR_INPUT
x : DINT;
END_VAR
VAR_IN_OUT
y : DINT;
END_VAR
END_FUNCTION
The C interface would look like:
void myFunc(char* out, int32_t x, int32_t* y);
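As an illustration, a Rust implementation matching this signature might look like the sketch below. It assumes the caller provides a return buffer with the default STRING size of 81 bytes; the function name `my_func` and the text it writes are made up for the example.

```rust
/// Sketch of `FUNCTION myFunc : STRING` implemented in Rust.
/// `out` is the caller-provided return buffer, assumed to hold 81 bytes
/// (80 characters + terminator).
pub extern "C" fn my_func(out: *mut u8, x: i32, y: *mut i32) {
    const CAPACITY: usize = 81;
    if out.is_null() || y.is_null() {
        return;
    }
    let text = format!("x = {}", x);
    let bytes = text.as_bytes();
    // Leave room for the terminating 0-byte.
    let len = bytes.len().min(CAPACITY - 1);
    unsafe {
        std::ptr::copy_nonoverlapping(bytes.as_ptr(), out, len);
        *out.add(len) = 0;
        // VAR_IN_OUT: the change is visible to the caller.
        *y = (*y).wrapping_add(1);
    }
}
```

The return value is written through the first parameter, so the function itself returns nothing.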
A FUNCTION_BLOCK
should use VAR_OUTPUT
for return values. Avoid using a
pointer in the VAR_INPUT
as a return value.
Example:
FUNCTION_BLOCK myFb
VAR_INPUT
x : DINT;
END_VAR
VAR_IN_OUT
y : DINT;
END_VAR
VAR_OUTPUT
myOut: DINT;
myOut2: STRING[255];
END_VAR
END_FUNCTION_BLOCK
The C interface would look like:
typedef struct {
int32_t x;
int32_t* y;
int32_t myOut;
char myOut2[256];
} myFbStruct;
void myFb(myFbStruct* myFbInst);
2.2.4 When to use a FUNCTION
vs. FUNCTION_BLOCK
A FUNCTION
can be well integrated into the API because of its return value which
can be nested into expressions. They however don't keep data over subsequent
executions. If you need to store static data use a FUNCTION_BLOCK
or use
VAR_IN_OUT
.
NOTE: Do not use PROGRAMs in your libraries. PROGRAMs have static instances. These are reserved for applications and should not be used in libraries.
2.3 Datatypes
The IEC61131-3 Standard defines several datatypes with their intended uses. To stay standard compliant, an API/Library should try and follow these guidelines.
2.3.1 Type sizes
Datatypes are generally convertible to a C equivalent. With the compiler defaulting to 64bit, some sizes were also fixed to 64bit.
Below is a table of types and how they can be used from C:
type | C equivalent | size (bits) | comment |
---|---|---|---|
BOOL | bool | 8 | |
BYTE | uint8_t | 8 | intended to be used as bit sequence and not as a number |
SINT | int8_t | 8 | |
USINT | uint8_t | 8 | |
WORD | uint16_t | 16 | |
INT | int16_t | 16 | |
UINT | uint16_t | 16 | |
DINT | int32_t | 32 | |
DWORD | uint32_t | 32 | |
UDINT | uint32_t | 32 | |
LINT | int64_t | 64 | |
LWORD | uint64_t | 64 | |
ULINT | uint64_t | 64 | |
REAL | float_t | 32 | |
LREAL | double_t | 64 | |
TIME | time_t | 64 | Note that all time and date types are 64 bit |
LTIME | time_t | 64 | |
DATE | time_t | 64 | |
LDATE | time_t | 64 | |
DATE_AND_TIME | time_t | 64 | |
LDATE_AND_TIME | time_t | 64 | |
DT | time_t | 64 | |
LDT | time_t | 64 | |
TIME_OF_DAY | time_t | 64 | |
LTIME_OF_DAY | time_t | 64 | |
TOD | time_t | 64 | |
LTOD | time_t | 64 | |
POINTER TO type | *type | 64 | The Pointer size is equivalent to LWORD and not DWORD |
REF_TO type | *type | 64 | Prefer this type to POINTER TO for standard compliance |
STRING | uint8_t[] | var | UTF-8 String, null terminated. Default is 80 chars + 1 termination byte |
WSTRING | uint16_t[] | var | UTF-16 (wide) String, null terminated. Default is 80 chars + 1 termination character (2 bytes) |
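For libraries written in Rust, a few rows of the table could be mirrored as type aliases. The alias names below are hypothetical; only the widths are taken from the table.

```rust
use std::mem::size_of;

// Rust counterparts for a selection of rows from the table above.
pub type Byte = u8;
pub type Int = i16;
pub type Dint = i32;
pub type Lword = u64;
pub type Real = f32;
pub type Lreal = f64;
/// All time and date types are 64 bit wide.
pub type Time = i64;
/// Default STRING: 80 characters + 1 termination byte.
pub type IecString = [u8; 81];

/// Size of a type in bits, the unit used in the table.
pub const fn bits<T>() -> usize {
    size_of::<T>() * 8
}
```

A check such as `bits::<Dint>() == 32` can then guard an interface against accidental type changes.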
2.3.2 Using Types in interfaces
When deciding on a type to use for a FUNCTION
, FUNCTION_BLOCK
, or STRUCT
use a type that reflects the intention of the API:
- A bit sequence should be in a BIT type like WORD and not in a numeric type like INT.
- A variable representing a time should be stored in the appropriate time type and not an LINT or LWORD.
- A pointer should be stored as a REF_TO and not as an LWORD where possible.
- (W)STRINGs and ARRAYs stored in VAR, VAR_INPUT, and VAR_OUTPUT sections of FUNCTION_BLOCKs are stored in the FUNCTION_BLOCK, and are passed by value.
  - A VAR_IN_OUT block can be used to force a type to be passed as a pointer. Note that VAR_IN_OUT is a read-write variable and changes to the parameter will change it for the caller.
  - FUNCTIONs expecting an ARRAY parameter can use the ARRAY[*] syntax (Variable sized array). The same functionality will be available for STRING. It is however not yet implemented.
2.4 Struct alignment
Struct alignment in plc follows the default behaviour of C.
When developing a library in C, a normal struct can be declared.
In languages other than C, the struct has to be C compatible. For example, in Rust the #[repr(C)] attribute can be used to make the struct C compatible.
Example:
TYPE myStruct:
STRUCT
x : DINT;
y : REF_TO DINT;
z : ARRAY[0..255] OF BYTE;
END_STRUCT
END_TYPE
The C
struct would look like:
typedef struct {
int32_t x;
int32_t* y;
char z[256];
} myStruct;
The Rust struct would look like:
use std::os::raw::c_char;

#[repr(C)]
pub struct myStruct {
    x: i32,
    y: *mut i32,
    z: [c_char; 256],
}
2.5 FUNCTION_BLOCK
initialization
Not yet implemented.
Program Organization Unit (POU)
Definition
A POU is an executable unit available in an IEC61131-3 application. It can be defined as either a Program, a Function, a Function Block, or an Action.
Methods on classes are also considered POUs but are not covered by this document.
A POU is defined as:
<POU Type> name
(* parameters *)
(* code *)
END_<POU Type>
Parameters
POUs can use input, output, or in/out parameters to pass data to the outside.
Such parameters are defined in a variable block delimited by VAR_<TYPE> and END_VAR.
Supported parameter types are VAR_INPUT, VAR_INPUT {ref}, VAR_OUTPUT and VAR_IN_OUT.
Input
Input parameters are typically copied into the target POU to be stored and read for later references.
A definition for input parameters is as follows:
VAR_INPUT
a : INT;
END_VAR
In some cases, especially when passing large strings or arrays, or when interacting with foreign code (see External Functions) it is more efficient to avoid copying the variable values and just use a pointer to the required input.
This can be done either using the in/out variables or by specifying a special property ref
on the input block.
Example:
VAR_INPUT {ref}
a : STRING;
END_VAR
Note that passing the ref property will convert all variables in that block to pointers, and should only be used in Functions.
In Out
In/Out parameters are required parameters that are always passed by reference. They can be modified by the called POU, and the changes are applied directly to the passed variable. An In/Out parameter must always be passed in a POU call and cannot be stored.
Output
Output parameters are used to return the result(s) of the POU call. They are passed by reference, but are optional. If an output parameter is not passed in a call, its value is not persisted.
Variables
In addition to parameters, a POU contains local variables. These can either be stored in the POU for later reference (VAR) or only created for a single call (VAR_TEMP).
In a function, all local variables are temporary.
Specialization
In addition to the default behavior, each type of POU has some special cases.
Function
Functions are stateless sequences of callable code. They are not backed by any structs, and cannot hold any state across multiple calls. A function's input parameter can be passed by value, or by reference.
Functions also support a return type, the resulting definition is:
FUNCTION fnName : <TYPE>
(* parameters *)
VAR_INPUT (* by value *)
x : INT;
END_VAR
VAR_INPUT {ref} (* by reference *)
x : INT;
END_VAR
(* temporary variables *)
VAR
y : INT;
END_VAR
VAR_TEMP
z : INT;
END_VAR
(* code *)
END_FUNCTION
Program
A Program is a static (i.e. GLOBAL) STRUCT that holds its state across multiple calls.
A Program exists once, and only once in an application, and subsequent calls to a program will change and store the passed parameters as well as internal variables.
A program does not support passing input parameters by reference.
Example:
PROGRAM prg
(* parameters *)
VAR_INPUT
x : INT;
END_VAR
(* persisted variables *)
VAR
y : INT;
END_VAR
(* temporary variables *)
VAR_TEMP
z : INT;
END_VAR
(* code *)
END_PROGRAM
Function Block
A function block is a STRUCT that can be initialized multiple times using different variables (i.e. instances).
A function block instance can hold its state (including input parameters) across multiple calls, but does not share any state with different instances.
A function block does not support passing input parameters by reference.
FUNCTION_BLOCK fb
(* parameters *)
VAR_INPUT
x : INT;
END_VAR
(* persisted variables *)
VAR
y : INT;
END_VAR
(* temporary variables *)
VAR_TEMP
z : INT;
END_VAR
(* code *)
END_FUNCTION_BLOCK
Action
An action is represented by a parent struct, and does not define its own interface (VAR blocks). An action can only be defined for Programs and Function Blocks.
An action can be defined in three different ways: in a container (ACTIONS) directly below the POU, in a named ACTIONS container, or using a qualified name on the action.
Example:
FUNCTION_BLOCK fb
(* parameters *)
VAR_INPUT
x : INT;
END_VAR
(* persisted variables *)
VAR
y : INT;
END_VAR
(* temporary variables *)
VAR_TEMP
z : INT;
END_VAR
(* code *)
END_FUNCTION_BLOCK
ACTIONS (* implicitly belongs to FB *)
ACTION act
(* code *)
END_ACTION
END_ACTIONS
ACTIONS fb (* explicitly belongs to FB *)
ACTION act2
(* code *)
END_ACTION
END_ACTIONS
ACTION fb.act3 (* linked to FB with name definition *)
(* code *)
END_ACTION
Variables
Constants
Variable declaration blocks can be declared as CONSTANT. All variables of a constant declaration block become constants. Constant variables cannot be changed and need to be initialized.
Example
TYPE OneInt : INT := 1; END_TYPE
VAR_GLOBAL CONSTANT
MAX_SIZE : INT := 99;
MIN_LEN : INT := 1;
counter : OneInt; (* 1 *)
END_VAR
PROGRAM PLC_PRG
VAR CONSTANT
DEFAULT_INPUT : BOOL := FALSE;
END_VAR
END_PROGRAM
Variable Initialization
Initializers of variables are evaluated at compile time. Therefore they can only consist of literals, other constants or expressions consisting of a combination of them. Note that initializers must not contain recursive definitions.
If a variable has no initializer, the variable may be initialized with its datatype's default value or else with 0.
Array Initialization
Arrays can be initialized using array literals. If the array-initial value does not contain all required elements, the array's inner type's default value will be used to fill the missing values.
Example
TYPE SignalValue : INT := -1; END_TYPE
VAR_GLOBAL CONSTANT
MIN_LEN : INT := 1;
MAX_LEN : INT := 100;
SIZE : INT := MAX_LEN - MIN_LEN;
END_VAR
PROGRAM PLC_PRG
VAR_INPUT
signals: ARRAY[0..SIZE] OF SignalValue := [99, 99]; (* rest is -1 *)
END_VAR
...
END_PROGRAM
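The fill rule for partially initialized arrays can be modelled in a few lines of Rust. The function `init_array` is a hypothetical helper written for this explanation; it is not part of the compiler.

```rust
/// Fills an array of `len` elements with the explicitly given initial
/// values; missing elements fall back to the element type's default.
fn init_array(len: usize, initial: &[i16], type_default: i16) -> Vec<i16> {
    let mut values = vec![type_default; len];
    for (slot, value) in values.iter_mut().zip(initial) {
        *slot = *value;
    }
    values
}
```

In the example above, SIZE evaluates to 99, so ARRAY[0..SIZE] has 100 elements: `init_array(100, &[99, 99], -1)` starts with two 99 values followed by 98 times -1.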
Pointer Initialization
A pointer variable can be initialized with the address of a global reference or an IEC-address using the AT
or REFERENCE TO
syntax. REF_TO
pointers can be initialized using the built-in REF
function in its initializer.
This initialization, however, does not take place during compile time. Instead, each pointer initialized with an address will be zero-initialized to a null pointer by default. The compiler collects all pointer initializations during compilation and creates internal initializer functions for each POU. These functions are then called in a single overarching project-initialization function, which can be called either manually in your main function or by a runtime. Additionally, global variables — whether they are initialized pointers or POU instances containing pointer initializers — are also assigned within this overarching function.
This function follows a naming scheme (__init___<project name>) that varies slightly depending on whether a build config (plc.json) was used.
- When using a build config (plc.json), the project name is used.
  Build config snippet:
  { "name": "myProject", "files": [] }
  Resulting symbol:
  __init___myProject()
- When compiling without a build config, the name of the first file passed via CLI is used as the base for the name.
  CLI command:
  # build command
  plc myFile1.st myFile2.st
  Resulting symbol:
  __init___myFile1_st()
It is important to note that if there are pointer initializations present in your project, failing to call the initialization function in your runtime or in main
will result in null pointer dereferences at runtime.
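A runtime that needs to look up the initialization symbol could derive its name as sketched below. The helper is hypothetical, and the rule for turning a file name into a symbol (replacing `.` with `_`) is an assumption inferred from the two cases described above.

```rust
/// Hypothetical helper mirroring the naming scheme of the
/// project-initialization function.
fn init_symbol_name(project_name: Option<&str>, first_file: &str) -> String {
    let base = match project_name {
        // With a build config, the project name is used as is.
        Some(name) => name.to_string(),
        // Assumption: without one, `.` in the first file name becomes `_`.
        None => first_file.replace('.', "_"),
    };
    format!("__init___{}", base)
}
```

`init_symbol_name(Some("myProject"), "myFile1.st")` gives `__init___myProject`, while `init_symbol_name(None, "myFile1.st")` gives `__init___myFile1_st`.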
Example
myProject.st:
VAR_GLOBAL
myGlobal : STRING;
END_VAR
PROGRAM prog
VAR
myString : REF_TO STRING := REF(myGlobal);
myOtherString : REFERENCE TO STRING REF= myGlobal;
myAlias AT myGlobal: STRING;
myAnalogSignal AT %IX1.0 : REAL;
END_VAR
// ...
END_PROGRAM
FUNCTION main: DINT
__init___myProject_st();
prog();
END_FUNCTION
Datatypes
Numeric types
A variety of numeric types exist with different sizes and properties complying with IEC61131.
Overview
Type name | Size | Properties |
---|---|---|
SINT | 8 bit | signed |
USINT | 8 bit | unsigned |
INT | 16 bit | signed |
UINT | 16 bit | unsigned |
DINT | 32 bit | signed |
UDINT | 32 bit | unsigned |
LINT | 64 bit | signed |
ULINT | 64 bit | unsigned |
REAL | 32 bit | float |
LREAL | 64 bit | float |
When such a variable is declared without being initialized, it will
be default-initialized with a value of 0
or 0.0
respectively.
A word on integer literals
Integer literals can be prefixed with either 2#
(binary), 8#
(octal) or 16#
(hexadecimal).
They will then be treated with regard to the respective number system.
Examples:
- i1 : DINT := 42; declares and initializes a 32bit signed integer with value 42.
- i1 : DINT := 2#101010; declares and initializes a 32bit signed integer with value 42.
- i1 : DINT := 8#52; declares and initializes a 32bit signed integer with value 42.
- i1 : DINT := 16#2A; declares and initializes a 32bit signed integer with value 42.
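How such literals map to values can be illustrated with a small Rust function. It is a simplified sketch, not the lexer's actual code; the name `parse_integer_literal` is made up, and `_` is treated as a digit separator as in literals like `16#AB_CD`.

```rust
/// Parses an ST integer literal such as `42`, `2#101010` or `16#2A`.
/// Returns `None` for malformed input or an unsupported base.
fn parse_integer_literal(literal: &str) -> Option<i64> {
    // `_` may be used as a visual separator between digits.
    let cleaned = literal.replace('_', "");
    match cleaned.split_once('#') {
        Some((base, digits)) => {
            let radix = match base {
                "2" => 2,
                "8" => 8,
                "16" => 16,
                _ => return None,
            };
            i64::from_str_radix(digits, radix).ok()
        }
        None => cleaned.parse().ok(),
    }
}
```

All four literals from the examples evaluate to the same value, 42.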
Strings
Overview
Type name | Size | Encoding |
---|---|---|
STRING | n+1 | UTF-8 |
WSTRING | 2n+2 | UTF-16 |
When such a variable is declared without being initialized, it will be default-initialized with a value of '' or "" respectively (empty strings).
STRING
RuSTy treats STRING
s as byte-arrays storing UTF-8 character bytes with a Null-terminator (0-byte) at the end.
So a String of size n requires n+1 bytes to account for the Null-terminator.
A STRING
literal is surrounded by single-ticks '
.
A String has a well defined length which can be defined similar to the array-syntax.
A String-variable myVariable: STRING[20]
declares a byte array of length 21, to store 20 utf8 character bytes.
When declaring a STRING
, the length-attribute is optional. The default length is 80.
Examples:
- s1 : STRING; declares a String of length 80.
- s2 : STRING[20]; declares a String of length 20.
- s3 : STRING := 'Hello World'; declares a String of length 80 and initializes it with the utf8 characters and a null-terminator at the end.
- s4 : STRING[55] := 'Foo Baz'; declares a String of length 55 and initializes it with the utf8 characters and a null-terminator at the end.
WSTRING (Wide Strings)
RuSTy treats WSTRING
s as byte-arrays storing UTF-16 character bytes with two Null-terminator bytes at the end.
The bytes are stored in Little Endian encoding.
A Wide-String of size n requires 2 * (n+1) bytes to account for the 2 bytes per utf16 character and the Null-terminators.
A WSTRING literal is surrounded by double-ticks ".
A WSTRING
has a well defined length which can be defined similar to the array-syntax.
A WSTRING-variable myVariable: WSTRING[20] declares a byte array of length 42, to store 20 utf16 characters.
When declaring a WSTRING
, the length-attribute is optional. The default length is 80.
Examples:
- ws1 : WSTRING; declares a Wide-String of length 80.
- ws2 : WSTRING[20]; declares a Wide-String of length 20.
- ws3 : WSTRING := "Hello World"; declares a Wide-String of length 80 and initializes it with the utf16 characters and a utf16-null-terminator at the end.
- ws4 : WSTRING[55] := "Foo Baz"; declares a Wide-String of length 55 and initializes it with the utf16 characters and a utf16-null-terminator at the end.
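The size rules for both string types, and the little-endian layout of a WSTRING buffer, can be checked with the following Rust sketch. The function names are illustrative, and truncating by UTF-16 code units is a simplification chosen for the example.

```rust
/// Bytes needed for a `STRING[n]`: n characters + 1 null terminator.
fn string_byte_size(n: usize) -> usize {
    n + 1
}

/// Bytes needed for a `WSTRING[n]`: 2 bytes per character, including
/// the terminator.
fn wstring_byte_size(n: usize) -> usize {
    2 * (n + 1)
}

/// Encodes `text` into the buffer of a `WSTRING[n]` (UTF-16, little endian).
/// Text that does not fit is cut off; the rest of the buffer stays 0.
fn encode_wstring(text: &str, n: usize) -> Vec<u8> {
    let mut buffer = vec![0u8; wstring_byte_size(n)];
    for (i, unit) in text.encode_utf16().take(n).enumerate() {
        buffer[2 * i..2 * i + 2].copy_from_slice(&unit.to_le_bytes());
    }
    buffer
}
```

For the declarations above, `string_byte_size(20)` is 21 and `wstring_byte_size(20)` is 42.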
Date and Time
Overview
Type name | Size | Internally stored as |
---|---|---|
TIME | 64 bit | Timespan in nanoseconds |
TIME_OF_DAY | 64 bit | Nanoseconds since Jan 1, 1970 UTC |
DATE | 64 bit | Nanoseconds since Jan 1, 1970 UTC |
DATE_AND_TIME | 64 bit | Nanoseconds since Jan 1, 1970 UTC |
Note that RuSTy already treats TIME
, TIME_OF_DAY
, DATE
and DATE_AND_TIME
as 64 bit numbers.
Therefore the long counterparts LTIME, LTOD, LDATE and LDT are mere aliases to the original types.
DATE
The DATE
datatype is used to represent a Date in the Gregorian Calendar.
Such a value is stored as an i64 with a precision in nanoseconds and denotes the number of nanoseconds
that have elapsed since January 1, 1970 UTC not counting leap seconds.
DATE literals start with DATE#
or D#
followed by a date in the format of yyyy-mm-dd
.
Examples:
d1 : DATE := DATE#2021-05-02;
d2 : DATE := DATE#1-12-24;
d3 : DATE := D#2000-1-1;
DATE_AND_TIME
The DATE_AND_TIME
datatype is used to represent a certain point in time in the Gregorian Calendar.
Such a value is stored as an i64
with a precision in nanoseconds and denotes the
number of nanoseconds that have elapsed since January 1, 1970 UTC not counting leap seconds.
DATE_AND_TIME literals start with DATE_AND_TIME#
or DT#
followed by a date and time in the
format of yyyy-mm-dd-hh:mm:ss
.
Note that only the seconds-segment can have a fraction denoting the milliseconds.
Examples:
d1 : DATE_AND_TIME := DATE_AND_TIME#2021-05-02-14:20:10.25;
d2 : DATE_AND_TIME := DATE_AND_TIME#1-12-24-00:00:1;
d3 : DATE_AND_TIME := DT#1999-12-31-23:59:59.999;
TIME_OF_DAY
The TIME_OF_DAY
datatype is used to represent a specific moment in time in a day.
Such a value is stored as an i64
value with a precision in nanoseconds and denotes the
number of nanoseconds that have elapsed since January 1, 1970 UTC not counting leap seconds.
Hence this value is stored as a DATE_AND_TIME
with the day fixed to 1970-01-01.
TIME_OF_DAY
literals start with TIME_OF_DAY#
or TOD#
followed by a time in the
format of hh:mm:ss
.
Note that only the seconds-segment can have a fraction denoting the milliseconds.
Examples:
t1 : TIME_OF_DAY := TIME_OF_DAY#14:20:10.25;
t2 : TIME_OF_DAY := TIME_OF_DAY#0:00:1;
t3 : TIME_OF_DAY := TOD#23:59:59.999;
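To make the internal representation of DATE, TIME_OF_DAY and DATE_AND_TIME concrete, the following Rust sketch computes the stored nanosecond values. The function names are hypothetical, the calendar arithmetic is a standard days-from-civil-date calculation rather than RuSTy's own code, and returning `None` when the value does not fit into an i64 is a choice made for this example.

```rust
const NANOS_PER_SECOND: i64 = 1_000_000_000;
const SECONDS_PER_DAY: i64 = 86_400;

/// Days between 1970-01-01 and the given Gregorian date.
fn days_since_epoch(year: i64, month: i64, day: i64) -> i64 {
    // Treat March as the first month so that the leap day ends the year.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let year_of_era = y.rem_euclid(400);
    let shifted_month = if month > 2 { month - 3 } else { month + 9 };
    let day_of_year = (153 * shifted_month + 2) / 5 + day - 1;
    let day_of_era = year_of_era * 365 + year_of_era / 4 - year_of_era / 100 + day_of_year;
    era * 146_097 + day_of_era - 719_468
}

/// Value of `DATE#yyyy-mm-dd`; `None` if it does not fit into an i64.
fn date_nanos(year: i64, month: i64, day: i64) -> Option<i64> {
    days_since_epoch(year, month, day).checked_mul(SECONDS_PER_DAY * NANOS_PER_SECOND)
}

/// Value of `TOD#hh:mm:ss`, with the fraction given in nanoseconds.
fn time_of_day_nanos(hour: i64, minute: i64, second: i64, fraction_nanos: i64) -> i64 {
    (hour * 3_600 + minute * 60 + second) * NANOS_PER_SECOND + fraction_nanos
}

/// Value of `DT#yyyy-mm-dd-hh:mm:ss`: the date part plus the time of day.
fn date_and_time_nanos(date: i64, time_of_day: i64) -> Option<i64> {
    date.checked_add(time_of_day)
}
```

For example, `date_nanos(2021, 5, 2)` corresponds to DATE#2021-05-02 and `time_of_day_nanos(14, 20, 10, 250_000_000)` to TOD#14:20:10.25; adding both gives the value of DT#2021-05-02-14:20:10.25.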
TIME
The TIME
datatype is used to represent a time-span.
A TIME
value is stored as an i64
value with a precision in nanoseconds.
TIME literals start with TIME# or T# followed by the TIME segments.
Supported segments are:
- d ... f64 days
- h ... f64 hours
- m ... f64 minutes
- s ... f64 seconds
- ms ... f64 milliseconds
- us ... f64 microseconds
- ns ... u32 nanoseconds
Note that only the last segment of a TIME
literal can have a fraction.
Examples:
t1 : TIME := TIME#2d4h6m8s10ms;
t2 : TIME := T#2d4.2h;
t3 : TIME := T#-10s4ms16ns;
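The value stored for a TIME literal is the sum of its segments in nanoseconds. The Rust sketch below illustrates this; it takes already separated segments instead of parsing the literal text, and the function names are hypothetical.

```rust
/// Nanoseconds per unit of a TIME segment; `None` for an unknown unit.
fn nanos_per_unit(unit: &str) -> Option<f64> {
    Some(match unit {
        "d" => 86_400_000_000_000.0,
        "h" => 3_600_000_000_000.0,
        "m" => 60_000_000_000.0,
        "s" => 1_000_000_000.0,
        "ms" => 1_000_000.0,
        "us" => 1_000.0,
        "ns" => 1.0,
        _ => return None,
    })
}

/// Sums the segments of a TIME literal into nanoseconds.
/// `negative` reflects a leading `-` in the literal.
fn time_nanos(negative: bool, segments: &[(f64, &str)]) -> Option<i64> {
    let mut total = 0i64;
    for (value, unit) in segments {
        // Round each segment so that fractions such as 4.2h stay exact.
        let nanos = (value * nanos_per_unit(unit)?).round() as i64;
        total = total.checked_add(nanos)?;
    }
    Some(if negative { -total } else { total })
}
```

T#2d4h6m8s10ms would be passed as `time_nanos(false, &[(2.0, "d"), (4.0, "h"), (6.0, "m"), (8.0, "s"), (10.0, "ms")])`.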
Other types
The BOOL
type can either be assigned TRUE
or FALSE
.
The type __VOID
is the empty type and has an undefined size.
Type name | Size | Properties |
---|---|---|
BOOL | 8 bit | signed |
__VOID | undefined | |
Bit datatypes are defined as follows:
Type name | Size | Properties |
---|---|---|
BYTE | 8 bit | unsigned |
WORD | 16 bit | unsigned |
DWORD | 32 bit | unsigned |
LWORD | 64 bit | unsigned |
Direct (Bit) Access on Variables
The IEC61131-3 Standard allows reading specific Bits
, Bytes
, Words
or DWords
from an ANY_BIT
type.
RuSTy supports this functionality and extends it to support all INT types.
Constant based Direct Access
To access a bit sequence in a variable, a direct access instruction %<Type><Value>
is used.
Type
is the bit sequence size required and is described as follows:
Type | Size (bits) | Example |
---|---|---|
X | 1 | %X1 |
B | 8 | %B1 |
W | 16 | %W1 |
D | 32 | %D1 |
For Bit access, the %X is optional.
Example
FUNCTION main : DINT
VAR
variable : LWORD;
bitTarget : BOOL;
bitTarget2 : BOOL;
byteTarget : BYTE;
wordTarget : WORD;
dwordTarget : DWORD;
END_VAR
variable := 16#AB_CD_EF_12_34_56_78_90;
bitTarget := variable.%X63; (*Access last bit*)
byteTarget := variable.%B7; (*Access last byte*)
wordTarget := variable.%W3; (*Access last word*)
dwordTarget := variable.%D1; (*Access last dword*)
(*Chaining an access is also allowed *)
bitTarget2 := variable.%D1.%W1.%B1.%X1;
END_FUNCTION
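The effect of these accesses can be reproduced with shifts and masks. The Rust sketch below assumes that indices count from the least significant end, which matches the comments in the example (%X63 being the last bit of an LWORD); the function `direct_access` is hypothetical.

```rust
/// Extracts the `index`-th chunk of `width` bits, counted from the least
/// significant end. Returns `None` if the chunk lies outside 64 bits.
fn direct_access(value: u64, width: u32, index: u32) -> Option<u64> {
    let shift = width.checked_mul(index)?;
    let end = shift.checked_add(width)?;
    if width == 0 || width > 32 || end > 64 {
        return None;
    }
    let mask = (1u64 << width) - 1;
    Some((value >> shift) & mask)
}
```

With the value from the example, `direct_access(0xABCD_EF12_3456_7890, 8, 7)` corresponds to `variable.%B7` and returns `0xAB`. A chained access applies the function to the result of the previous step.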
Variable based Direct Access
While the IEC61131-3 Standard only defines variable access using constant int literals,
RuSTy additionally supports access using Variables.
The Syntax for a variable based access is %<Type><Variable>
.
The provided variable has to be a direct Reference variable (non Qualified).
Short hand access for Bit (Without the
%X
modifier) is not allowed.
Example
FUNCTION main : DINT
VAR
variable : LWORD;
access_var : INT;
bitTarget : BOOL;
bitTarget2 : BOOL;
byteTarget : BYTE;
wordTarget : WORD;
dwordTarget : DWORD;
END_VAR
variable := 16#AB_CD_EF_12_34_56_78_90;
access_var := 63;
bitTarget := variable.%Xaccess_var; (*Access last bit*)
access_var := 7;
byteTarget := variable.%Baccess_var; (*Access last byte*)
access_var := 3;
wordTarget := variable.%Waccess_var; (*Access last word*)
access_var := 1;
dwordTarget := variable.%Daccess_var; (*Access last dword*)
(*Chaining an access is also allowed *)
bitTarget2 := variable.%Daccess_var.%Waccess_var.%Baccess_var.%Xaccess_var;
END_FUNCTION
Architecture
Overview
RuSTy is a compiler for IEC61131-3 languages. At the moment, ST and CFC ("FBD") are supported. It utilizes the LLVM compiler infrastructure and contributes a Structured Text frontend that translates Structured Text into LLVM's language independent intermediate representation (IR). CFC uses a M2M-transformation and reuses most of the ST frontend for compilation. The further optimization and native code generation is performed by the existing LLVM infrastructure, namely LLVM's common optimizer and the platform specific backend (see here).
┌──────────────────┐ ┌───────────────┐ ┌────────────────┐
│ │ │ │ │ │
│ RuSTy │ │ LLVM Common │ │ LLVM Backend │
│ ├───►│ ├───►│ │
│ LLVM Frontend │ │ Optimizer │ │ (e.g Clang) │
│ │ │ │ │ │
└──────────────────┘ └───────────────┘ └────────────────┘
So RuSTy consists of the frontend part of the llvm compiler-infrastructure. This means that this compiler can benefit from llvm's existing compiler-optimizations, as well as all backend target platforms available.
Rusty Frontend Architecture
Ultimately the goal of a compiler frontend is to translate the original source code into the infrastructure's intermediate representation (in this case we're talking about LLVM IR). RuSTy treats this task as a compilation step of its own. While a fully fledged compiler generates machine code as a last step, RuSTy generates LLVM IR assembly code.
Structured Text
┌────────┐ ┌────────┐
│ Source │ │ LLVM │
│ │ │ IR │
│ Files │ │ │
└───┬────┘ └────────┘
│ ▲
▼ │
┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌──────┴─────┐
│ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │
│ Parser ├──►│ Indexer ├──►│ Linker ├──►│ Validation ├──►│ Codegen │
│ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │
└────────────┘ └────────────┘ └────────────┘ └────────────┘ └────────────┘
CFC/FBD
┌────────┐ ┌────────┐
│ Source │ │ LLVM │
│ │ │ IR │
│ Files │ │ │
└───┬────┘ └────────┘
│ ▲
▼ │
┌────────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌──────┴─────┐
│ │ │ │ │ │ │ │ │ │
│ Model-to-Model │ │ │ │ │ │ │ │ │
│ Transformation ├───►│ Indexer ├──►│ Linker ├──►│ Validation ├──►│ Codegen │
│ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │
└────────────────┘ └────────────┘ └────────────┘ └────────────┘ └────────────┘
Parser
The role of the parser is to turn source-code which is fed as a string (in the form of files) into a tree-representation of that source-code. This tree is typically called the Abstract Syntax Tree (AST). The step of parsing consists of two distinct stages. The first one is the lexical analysis (Lexer) which is performed by a lexer. After lexing we perform the syntactical analysis (Parser) to construct the syntax tree.
┌──┐
┌──────────────┐ │ │
│ │ └──┘
│ Source Code │ / \
│ │ ┌─────────┐ ┌──────────┐ / \
│ ────────── │ │ │ │ │ ┌──┐ ┌──┐
│ ├───► Lexer │ │ Parser ├────►│ │ │ │
│ ───────── │ │ │ │ │ └──┘ └──┘
│ │ └────┬────┘ └──────────┘ /\ /\
│ ──── │ │ ▲ / \ / \
│ │ │ │ ┌──┐ ┌──┐ ┌──┐ ┌──┐
│ ──────── │ ▼ │ │ │ │ │ │ │ │ │
│ │ ┌───────────────────────┴──┐ └──┘ └──┘ └──┘ └──┘
│ │ │ │
└──────────────┘ │ ┌───┐ ┌───┐ ┌───┐ ┌───┐ │ Abstract Syntax
│ │ T │ │ T │ │ T │ │...│ │ Tree
│ └───┘ └───┘ └───┘ └───┘ │
│ │
└──────────────────────────┘
Token-Stream
Lexer
The lexer performs the lexical analysis. This step turns the source-string into a sequence of well known tokens. The Lexer (or sometimes also called tokenizer) splits the source-string into tokens (or words). Each token has a distinct type which corresponds to a grammar's element. Typical token-types are keywords, numbers, identifiers, brackets, dots, etc. So with the help of this token-stream it is much easier for the parser to spot certain patterns. E.g. a floating-point number consists of the token-sequence: number, dot, number.
The lexer is implemented in the lexer
-module.
It uses the logos crate to create a lexer that is able to identify all different terminal-symbols.
Compared to other languages, Structured Text has a quite high number of keywords and other tokens, so RuSTy's lexer identifies a quite large number of different tokens.
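To give an impression of what tokenizing means, here is a deliberately tiny hand-written lexer in Rust. It is not the logos-based lexer RuSTy uses; it only knows the four token kinds needed for a statement like `a := 3;`, and the token names are chosen for this sketch.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Reference(String),
    Number(String),
    KeywordEquals, // `:=`
    Semicolon,
}

/// Splits the source string into tokens, or reports the first character
/// that does not start a known token.
fn tokenize(source: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = source.chars().collect();
    let mut tokens = Vec::new();
    let mut pos = 0;
    while pos < chars.len() {
        let c = chars[pos];
        if c.is_whitespace() {
            pos += 1;
        } else if c.is_ascii_alphabetic() || c == '_' {
            let start = pos;
            while pos < chars.len() && (chars[pos].is_ascii_alphanumeric() || chars[pos] == '_') {
                pos += 1;
            }
            tokens.push(Token::Reference(chars[start..pos].iter().collect()));
        } else if c.is_ascii_digit() {
            let start = pos;
            while pos < chars.len() && chars[pos].is_ascii_digit() {
                pos += 1;
            }
            tokens.push(Token::Number(chars[start..pos].iter().collect()));
        } else if c == ':' && chars.get(pos + 1) == Some(&'=') {
            tokens.push(Token::KeywordEquals);
            pos += 2;
        } else if c == ';' {
            tokens.push(Token::Semicolon);
            pos += 1;
        } else {
            return Err(format!("unexpected character '{}' at position {}", c, pos));
        }
    }
    Ok(tokens)
}
```

Calling `tokenize("a := 3;")` produces a reference, the assignment keyword, a number and a semicolon, in that order.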
Parser
The parser takes the token stream and creates the corresponding AST that represents the source code in a structured, hierarchical way.
The parser is implemented in the parser
module whereas the model for the AST is implemented in the ast
module.
AST - Abstract Syntax Tree
The abstract syntax tree is a tree representation of the source code.
Some parser implementations use a generic tree-data-structure consisting of Nodes
which can have an arbitrary number of children.
These nodes usually have dynamic properties like a type and an optional value and sometimes they even have dynamic properties stored in a map to make this representation even more flexible.
While this approach needs very little source code we decided to favour a less flexible approach. The RuSTy-AST models every single ast-node as its own struct with all necessary fields including the possible child-nodes. While this approach needs much more code and hand-written changes, its benefits lie in the clearness and simplicity of the data-structure. Every element of the AST is easily identified, debugged and understood. E.g. while in a generic node based AST it is easily possible to have a binary-statement with no, one, or seven child-nodes, the RuSTy-AST enforces the structure of every node. So the RuSTy-Binary-Statement has exactly two children. It is impossible to construct it differently.
Example
So an assignment a := 3; will be parsed with the help of the following structures:
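// A reference to a named element, e.g. the variable `a`.
struct Reference {
    name: String,
}
// An integer literal, e.g. `3`.
struct LiteralInteger {
    value: i128,
}
// An assignment always has exactly two children.
struct Assignment {
    left: Box<AstStatement>,
    right: Box<AstStatement>,
}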
Recursive Descent Parser
There are a lot of different frameworks to generate parsers from formal grammars. While they generate highly optimized parsers, we felt we wanted more control and more understanding of the parsing process and the resulting AST. Since we were pretty new to Rust at that point in time, writing the parser by hand also gave us more practice and a stronger feeling of control and understanding. Using a parser-generator framework will definitely be an option for future improvements.
As for now, the parser is a hand-written recursive descent parser inside the parser module.
As the parser reads the token stream Reference, KeywordEquals, Number, Semicolon it instantiates the corresponding syntax tree:
┌─────────────────┐
│ Assignment │
└──────┬──┬───────┘
left │ │ right
┌───────────┘ └──────────┐
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ Reference │ │ LiteralInteger │
├──────────────────┤ ├──────────────────┤
│ name: 'a' │ │ value: '3' │
└──────────────────┘ └──────────────────┘
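The following is a minimal sketch of how a recursive descent parser can turn such a token stream into the tree shown above. It is not RuSTy's parser: the `Token` and `AstStatement` enums, the `Parser` struct and its methods are hypothetical and only cover a single assignment statement. Each grammar rule becomes one function, and each function consumes exactly the tokens of its rule.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Identifier(String),
    Number(i128),
    KeywordAssignment, // `:=`
    Semicolon,
}

#[derive(Debug, PartialEq)]
pub enum AstStatement {
    Reference { name: String },
    LiteralInteger { value: i128 },
    Assignment { left: Box<AstStatement>, right: Box<AstStatement> },
}

pub struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    pub fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, pos: 0 }
    }

    /// Returns the current token (if any) and moves on to the next one.
    fn advance(&mut self) -> Option<Token> {
        let token = self.tokens.get(self.pos).cloned();
        self.pos += 1;
        token
    }

    /// Consumes one token and fails if it is not the expected one.
    fn expect(&mut self, expected: Token) -> Result<(), String> {
        let found = self.advance();
        if found.as_ref() == Some(&expected) {
            Ok(())
        } else {
            Err(format!("expected {:?}, found {:?}", expected, found))
        }
    }

    /// statement := reference ':=' expression ';'
    pub fn parse_statement(&mut self) -> Result<AstStatement, String> {
        let left = self.parse_reference()?;
        self.expect(Token::KeywordAssignment)?;
        let right = self.parse_expression()?;
        self.expect(Token::Semicolon)?;
        Ok(AstStatement::Assignment { left: Box::new(left), right: Box::new(right) })
    }

    /// reference := identifier
    fn parse_reference(&mut self) -> Result<AstStatement, String> {
        match self.advance() {
            Some(Token::Identifier(name)) => Ok(AstStatement::Reference { name }),
            other => Err(format!("expected identifier, found {:?}", other)),
        }
    }

    /// expression := number | identifier
    fn parse_expression(&mut self) -> Result<AstStatement, String> {
        match self.advance() {
            Some(Token::Number(value)) => Ok(AstStatement::LiteralInteger { value }),
            Some(Token::Identifier(name)) => Ok(AstStatement::Reference { name }),
            other => Err(format!("expected expression, found {:?}", other)),
        }
    }
}
```

Because `Assignment` has exactly two boxed children, a malformed token stream can only end in an error and never in a half-built node.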
Indexer
The indexing step is responsible for building and maintaining the Symbol-Table (also called Index). The Index contains all known referable objects such as variables, data-types, POUs, Functions, etc. The Symbol-Table also maintains additional information about every referable object such as the object's type, the object's datatype, etc.
Indexing is performed by the index module. It contains the index itself (a.k.a. Symbol Table), the visitor which collects all global names and their additional information as well as a data structure that handles compile time constant expressions (constant_expressions).
The Index (Symbol Table)
The index stores information about all referable elements of the program. Depending on the type of element, we store different meta-information alongside the name of the element.
Index Field | Description |
---|---|
global_variables | All global variables accessible via their name. |
enum_global_variables | All enum elements accessible via their name (as if they were global variables, e.g. 'RED') |
member_variables | Member variables of structured types (Structs, Functionblocks, etc.). This map allows to query all members of a container by name. |
implementations | All callable implementations (Programs, Functions, Actions, Functionblocks) accessible by their name. |
pous | All pous (Programs, Functions, Functionblocks) with additional information. |
type_index | All data-types (intrinsic and complex) accessible via their name |
constant_expressions | The results of constant expressions that can be evaluated at compile time (e.g. the initializer of a constant: VAR_GLOBAL CONST TAU := 3.1415 * 2; END_VAR ) |
There are 4 different types of entries in the index:
- VariableIndexEntry The VariableIndexEntry holds information about every Variable in the source code and offers additional information relevant for linking, validation and code-generation.
┌─────────────────────────────┐ ┌─────────────────┐
│ VariableIndexEntry │ │ <enum> │
│ │ │ VariableType │
├─────────────────────────────┤ var_type ├─────────────────┤
│ │ │ - Local │
│ - name: String ├─────────────►│ - Temp │
│ - qualified_name: String │ │ - Input │
│ - is_constant: bool │ │ - Output │
│ - location_in_parent: u32 │ │ - InOut │
│ - data_type_name: String │ │ - Global │
│ │ │ - Return │
└───────────┬─────────────────┘ └─────────────────┘
│
│initial_value
│
│
│ ┌──────────────────┐
│ │ ConstExpression │
│ 0..1 ├──────────────────┤
└───────────►│ │
│ ... │
│ │
└──────────────────┘
- PouIndexEntry The PouIndexEntry offers information about all Program-Organization-Units. The index entry offers information like the name of an instance-struct, the name of the registered implementation, etc.
┌──────────────────────────┐
│ <abstract> │
│ POUIndexEntry │
├──────────────────────────┤
│ │
└──────────────────────────┘
▲
│
│
│ ┌──────────────────────────┐ ┌──────────────────────────┐
│ │ ProgramIndexEntry │ │ GenericParameter │
│ ├──────────────────────────┤ ├──────────────────────────┤
│ │ - name: String │ │ - name: String │
├─────┤ - instanceStruct: String ├──┬──►│ - typeNature: TypeNature │
│ │ │ │ │ │
│ │ │ │ │ │
│ └──────────────────────────┘ │ └──────────────────────────┘
│ │
│ │
│ │
│ ┌──────────────────────────┐ │
│ │ FunctionIndexEntry │ │ generics
│ ├──────────────────────────┤ │
│ │ - name: String │ │
├─────┤ ├──┤
│ │ │ │
│ │ │ │
│ └──────────────────────────┘ │
│ │
│ │
│ │
│ ┌──────────────────────────┐ │
│ │ FunctionBlockIndexEntry │ │
│ ├──────────────────────────┤ │
│ │ - name: String ├──┤
├─────┤ - instanceStruct: String │ │
│ │ │ │
│ │ │ │
│ └──────────────────────────┘ │
│ │
│ │
│ │
│ ┌──────────────────────────┐ │
│ │ ClassIndexEntry │ │
│ ├──────────────────────────┤ │
│ │ - name: String │ │
└─────┤ - instanceStruct: String ├──┘
│ │
│ │
└──────────────────────────┘
- ImplementationIndexEntry The ImplementationIndexEntry offers information about any callable implementation (Program, Functionblock, Function, etc.). It also offers metadata about the implementation type, the name of the method to call and the name of the parameter-struct (this-struct) to pass to the function.
┌───────────────────────┐
┌──────────────────────────┐ │ <enum> │
│ ImplementationIndexEntry │ │ ImplementationType │
├──────────────────────────┤ type │ │
│ ├─────────────►├───────────────────────┤
│ - call_name: String │ │ - Program │
│ - type_name: String │ │ - Function │
│ │ │ - FunctionBlock │
└──────────────────────────┘ │ - Action │
│ - Class │
│ - Method │
│ │
└───────────────────────┘
- DataType The entry for a DataType offers information about any data-type supported by the program to be compiled (internal data types as well as user defined data types). For each data-type we offer additional information such as its initial value, its type-nature (in terms of generic functions - e.g. ANY_INT) and some additional information about the type's internal structure and size (e.g. is it a number/array/struct/etc).
┌─────────────┐ ┌────────────────────┐
│ DataType │ │ ConstantExpression │
├─────────────┤ initial_value ├────────────────────┤
│ ├──────────────────►│ │
│ - name │ │ ... │
│ ├─────────┐ │ │
└──────┬──────┘ │ └────────────────────┘
│ │
│ │ ┌────────────────────┐
│ │ │ TypeNature │
│ │ ├────────────────────┤
│ information │ │ - Any │
│ └────────►│ - Derived │
│ nature │ - Elementary │
│ │ - Num │
▼ │ - Int │
┌───────────────────────┐ │ - Signed │
│ <abstract> │ │ - ... │
│ DataTypeInformation │ └────────────────────┘
├───────────────────────┤
│ │
└───────────────────────┘
▲
│
│
│
┌────────────────┬───────┴───────┬──────────────┬──────────────┐
│ │ │ │ │
┌────────┴───────┐ ┌──────┴──────┐ ┌──────┴─────┐ ┌─────┴──────┐ ┌────┴─────┐
│ Struct │ │ Array │ │ Integer │ │ String │ │ ... │
├────────────────┤ ├─────────────┤ ├────────────┤ ├────────────┤ ├──────────┤
│ - name │ │- name │ │ - name │ │ - size │ │ ... │
│ - members │ │- inner_type │ │ - signed │ │ - encoding │ │ │
│ │ │- dimensions │ │ - size │ │ │ │ │
└────────────────┘ └─────────────┘ └────────────┘ └────────────┘ └──────────┘
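As a rough illustration of how such a symbol table can be organized, the sketch below keeps two of the maps from the table above and a reduced `VariableIndexEntry`. All names and signatures are hypothetical simplifications and not RuSTy's actual index API. Keys are lower-cased because identifiers in Structured Text are case-insensitive, and members of the surrounding container are looked up before globals.

```rust
use std::collections::HashMap;

/// Reduced set of variable kinds for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum VariableType { Local, Input, Output, InOut, Global }

#[derive(Debug, Clone, PartialEq)]
pub struct VariableIndexEntry {
    pub name: String,
    pub qualified_name: String,
    pub data_type_name: String,
    pub var_type: VariableType,
}

#[derive(Default)]
pub struct Index {
    /// global name -> entry
    global_variables: HashMap<String, VariableIndexEntry>,
    /// container name -> (member name -> entry)
    member_variables: HashMap<String, HashMap<String, VariableIndexEntry>>,
}

impl Index {
    pub fn register_global(&mut self, name: &str, data_type: &str) {
        let entry = VariableIndexEntry {
            name: name.to_string(),
            qualified_name: name.to_string(),
            data_type_name: data_type.to_string(),
            var_type: VariableType::Global,
        };
        self.global_variables.insert(name.to_lowercase(), entry);
    }

    pub fn register_member(&mut self, container: &str, name: &str, data_type: &str, var_type: VariableType) {
        let entry = VariableIndexEntry {
            name: name.to_string(),
            qualified_name: format!("{}.{}", container, name),
            data_type_name: data_type.to_string(),
            var_type,
        };
        self.member_variables
            .entry(container.to_lowercase())
            .or_default()
            .insert(name.to_lowercase(), entry);
    }

    /// Resolves `name` as seen from inside `container`: members shadow globals.
    pub fn find_variable(&self, container: &str, name: &str) -> Option<&VariableIndexEntry> {
        let key = name.to_lowercase();
        self.member_variables
            .get(&container.to_lowercase())
            .and_then(|members| members.get(&key))
            .or_else(|| self.global_variables.get(&key))
    }
}
```

A lookup such as `index.find_variable("PLC_PRG", "x")` then returns the entry whose `qualified_name` is `PLC_PRG.x`, provided that member was registered before.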
Linker
The linker's task is to decide where all references in the source code point to. There are different references in Structured Text:
- variable references: x := 4, where x is a reference to the variable x.
- type references: i : MyFunctionBlock, where MyFunctionBlock is a reference to the declared FunctionBlock.
- Program references: PLC_PRG.x := 4, where PLC_PRG is a reference to a Program-POU called PLC_PRG.
- Function references: max(a, b), where max is a reference to a Function-POU called max.
So the linker decides where a reference points to. A reference has a corresponding declaration that matches the reference's name:
PROGRAM PLC_PRG
VAR
┌──────► x : INT;
│
│ END_VAR
│
└────┐
│
x := 3;
END_PROGRAM
The linker's results will be used by the semantic validation step and by the code-generation.
The validator decides whether the name you put at a certain location is valid or not. In order to decide whether a certain reference is valid or not, we need to know where it is pointing to, so whether we expect a variable, a datatype or something different.
The code-generation needs to know what certain names mean, in order to successfully generate the IR-code that reflects the behavior of your program.
Annotated Syntax Tree
The AST generated by the parser is a pretty static data-structure. So where should we store the linking information for a reference? Even if we added fields for potential linking-information to the AST, the ownership concepts of Rust would give us a hard time filling in this information piece by piece during linking. So what we end up doing is to use the arena-pattern to handle the different lifetimes of the parts of an AST (the AST itself is constructed very early in the compilation process, whereas the linking information is allocated later). We don't store the linking information directly in the AST, but we store it inside the mentioned arena-data-structure and link it with certain AST-elements.
The RuSTy linker stores the linking information in an arena called AnnotationMap. The AnnotationMap can store two types of annotations for any AST-element. So the first step is that we need a way to uniquely identify every single AST-node, so we can use this ID as a key for the annotations stored in the AnnotationMap and associate them with the given AST-Node. The parser assigns a unique ID to every Statement-Tree-Node (note that we only assign IDs to Statements, not every AST-Node).
So the expression a + 3 now looks like this:
┌─────────────────┐
│ BinaryOperation │
├─────────────────┤
│ operator: Plus │
│ ID: 1 │
└──────┬──┬───────┘
│ │
left │ │ right
┌───────────┘ └──────────┐
│ │
│ │
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ Reference │ │ LiteralInteger │
├──────────────────┤ ├──────────────────┤
│ name: 'a' │ │ value: '3' │
│ ID: 2 │ │ ID: 3 │
└──────────────────┘ └──────────────────┘
The AnnotationMap stores 5 different types of annotation:
Value
The Value-annotation indicates that this AST-Element resolves to a value with the given resulting datatype. So for example the LiteralInteger(3) node gets a Value-Annotation with a resulting type of DINT.
┌─────────────────────────┐
│ Value │
├─────────────────────────┤
│ │
│ resulting_type: String │
│ │
└─────────────────────────┘
Variable
The Variable-annotation indicates that this AST-Element resolves to a variable with the given qualified name (and some comfort-information like whether it is a constant and whether it is an auto-deref pointer). Similar to the value-Annotation it also saves the resulting datatype.
┌─────────────────────────┐
│ Variable │
├─────────────────────────┤
│ │
│ resulting_type: String │
│ qualified_name: String │
│ constant: bool │
│ is_auto_deref: bool │
│ │
└─────────────────────────┘
Function
The Function-annotation indicates that this AST-Element resolves to a Function-POU (a call-statement) with the given qualified name. Similar to the value-Annotation it also saves the resulting datatype but this time as the function's return type (return_type).
┌─────────────────────────┐
│ Function │
├─────────────────────────┤
│ │
│ return_type: String │
│ qualified_name: String │
│ │
└─────────────────────────┘
Type
The Type-annotation indicates that this AST-Element resolves to a DataType (e.g. a Declaration: x: INT) with the given name.
┌─────────────────────────┐
│ Type │
├─────────────────────────┤
│ │
│ type_name: String │
│ │
└─────────────────────────┘
Program
The Program-annotation is very similar to the Function-annotation. Since a Program has no return-value it also offers no return-type information.
┌─────────────────────────┐
│ Program │
├─────────────────────────┤
│ │
│ qualified_name: String │
│ │
└─────────────────────────┘
So the example expression from above, a + 3, will be annotated like this: (Note that the resulting type of the Binary-Operation must be calculated by the linker by determining the bigger of both types.)
┌─────────────────┐
│ BinaryOperation │
├─────────────────┤
│ operator: Plus │
│ ID: 1 │
└──────┬──┬───────┘
│ │
left │ │ right
┌───────────┘ └──────────┐
│ │
│ │
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ Reference │ │ LiteralInteger │
├──────────────────┤ ├──────────────────┤
│ name: 'a' │ │ value: '3' │
│ ID: 2 │ │ ID: 3 │
└──────────────────┘ └──────────────────┘
┌────────────────────────────┐
│ Value │
┌───────────────────┐ ├────────────────────────────┤
│ AnnotationMap │ ┌───►│ resulting_type: DINT │
│ │ │ │ │
├───────┬───────────┤ │ └────────────────────────────┘
│ ID: 1 │ Value ├───┘
├───────┼───────────┤ ┌────────────────────────────┐
│ ID: 2 │ Variable ├────┐ │ Variable │
├───────┼───────────┤ │ ├────────────────────────────┤
│ ID: 3 │ Value ├──┐ │ │ resulting_type: SINT │
└───────┴───────────┘ │ └──►│ qualified_name: PLC_PRG.a │
│ │ constant: false │
│ │ is_auto_deref: false │
│ └────────────────────────────┘
│
│ ┌────────────────────────────┐
│ │ Value │
│ ├────────────────────────────┤
└────►│ resulting_type: DINT │
│ │
└────────────────────────────┘
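The annotation of a + 3 can be sketched with a map keyed by the statement ID. This is a simplified illustration and not RuSTy's AnnotationMap: only two of the five annotation kinds are modelled, and `bit_size` and `binary_result_type` are hypothetical helpers. The promotion rule used here (the bigger of both operand types, but at least DINT) is an assumption derived from the examples in this chapter, restricted to a few signed integer types.

```rust
use std::collections::HashMap;

pub type AstId = usize;

#[derive(Debug, Clone, PartialEq)]
pub enum Annotation {
    Value { resulting_type: String },
    Variable { resulting_type: String, qualified_name: String, constant: bool, is_auto_deref: bool },
}

impl Annotation {
    /// Both annotation kinds carry a resulting type.
    pub fn resulting_type(&self) -> &str {
        match self {
            Annotation::Value { resulting_type } | Annotation::Variable { resulting_type, .. } => resulting_type,
        }
    }
}

/// The annotations live outside the AST and are keyed by statement ID.
#[derive(Default)]
pub struct AnnotationMap {
    type_map: HashMap<AstId, Annotation>,
}

impl AnnotationMap {
    pub fn annotate(&mut self, id: AstId, annotation: Annotation) {
        self.type_map.insert(id, annotation);
    }

    pub fn get(&self, id: AstId) -> Option<&Annotation> {
        self.type_map.get(&id)
    }
}

/// Bit width of a few signed integer types; a stand-in for a real type lookup.
fn bit_size(type_name: &str) -> Option<u32> {
    match type_name {
        "SINT" => Some(8),
        "INT" => Some(16),
        "DINT" => Some(32),
        "LINT" => Some(64),
        _ => None,
    }
}

/// Assumed rule: the bigger of both operand types, promoted to at least DINT.
pub fn binary_result_type(left: &str, right: &str) -> Option<String> {
    let bits = bit_size(left)?.max(bit_size(right)?).max(32);
    let name = if bits > 32 { "LINT" } else { "DINT" };
    Some(name.to_string())
}
```

With the operands annotated under IDs 2 and 3, the linker step for ID 1 boils down to reading both resulting types, calling `binary_result_type`, and storing a `Value` annotation with the result.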
Another example where the annotated AST carries a lot of useful information is with complex expressions like array-expressions or qualified references. Let's consider the following statement:
PLC_PRG.a.b[2]
It is annotated in the following way:
┌────────────────────┐
│ QualifiedReference │
├────────────────────┤
│ ID: 1 │
└─────────┬──────────┘
│ elements: Vec<AstStatement>
┌─────────┴──────────┬─────────────────────┐
│ │ │
▼ ▼ ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Reference │ │ Reference │ │ ArrayAccess │
├──────────────────┤ ├──────────────────┤ ├──────────────────┤
│ name: 'PLC_PRG' │ │ name: 'a' │ │ │
│ ID: 2 │ │ ID: 3 │ │ ID: 4 │
└──────────────────┘ └──────────────────┘ └─────┬──────┬─────┘
│ │
reference │ │ access
┌────────┘ └─────────┐
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ Reference │ │ LiteralInteger │
├──────────────────┤ ├──────────────────┤
│ name: 'b' │ │ value: '2' │
│ ID: 5 │ │ ID: 6 │
└──────────────────┘ └──────────────────┘
┌────────────────────────────┐
│ Value │
┌───►├────────────────────────────┤
│ │ resulting_type: INT │
│ │ │
│ └────────────────────────────┘
│
│ ┌────────────────────────────┐
┌───────────────────┐ │ │ Program │
│ AnnotationMap │ │ ┌─►├────────────────────────────┤
│ │ │ │ │ qualified_name: PLC_PRG │
├───────┬───────────┤ │ │ │ │
│ ID: 1 │ Value ├───┘ │ └────────────────────────────┘
├───────┼───────────┤ │
│ ID: 2 │ Program ├─────┘ ┌────────────────────────────┐
├───────┼───────────┤ │ Variable │
│ ID: 3 │ Variable ├───────►├────────────────────────────┤
├───────┼───────────┤ │ resulting_type: MyStruct │
│ ID: 4 │ Value ├─────┐ │ qualified_name: PLC_PRG.a │
├───────┼───────────┤ │ └────────────────────────────┘
│ ID: 5 │ Variable ├───┐ │
├───────┼───────────┤ │ │ ┌────────────────────────────┐
│ ID: 6 │ Value ├─┐ │ │ │ Value │
└───────┴───────────┘ │ │ └─►├────────────────────────────┤
│ │ │ resulting_type: INT │
│ │ │ │
│ │ └────────────────────────────┘
│ │
│ │ ┌─────────────────────────────────┐
│ │ │ Variable │
│ └───►├─────────────────────────────────┤
│ │ resulting_type : ARRAY[] OF INT │
│ │ qualified_name : MyStruct.b │
│ └─────────────────────────────────┘
│
│ ┌────────────────────────────┐
│ │ Value │
│ ├────────────────────────────┤
└─────►│ resulting_type: DINT │
│ │
└────────────────────────────┘
Type vs. Type-Hint
The AnnotationMap not only offers annotations regarding the AST-node's type, but it also offers a second type of annotation.
Consider the following snippet:
PROGRAM PLC_PRG
VAR
x : SINT;
y : INT;
z : BYTE;
END_VAR
z := x + y;
END_PROGRAM
The assignment z := x + y is loaded with different types:
- x is annotated as Variable of type SINT and will be auto-upgraded to DINT.
- y is annotated as Variable of type INT and will be auto-upgraded to DINT.
- z is annotated as Variable of type BYTE.
- x + y is annotated as Value of type DINT (the bigger of both).
In order to make life easier for validation and code-generation we add an additional annotation to x + y to indicate that while it technically results in a DINT, it should rather be treated as a BYTE since it is going to be assigned to z.
This second annotation is called the type-hint. It indicates that while it technically is not the real type of this expression, the program's semantic wants the compiler to treat it as this type.
The expression z := x + y is annotated like this:
expression | type annotation | type-hint annotation | explanation |
---|---|---|---|
x | SINT | DINT | auto-upgraded to DINT |
y | INT | DINT | auto-upgraded to DINT |
z | BYTE | - | |
x + y | DINT | BYTE | type-hint indicates that the resulting DINT needs to be cast to BYTE |
With the help of the type-hint annotations the validation can very easily decide whether certain type-cast operations are valid. The code-generation steps can easily decide when to generate casts, by simply comparing a node's type annotation and its type-hint annotation.
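The comparison of type and type-hint can be sketched as follows. The two-map layout and the `required_cast` helper are hypothetical and only illustrate the idea; the IDs in the usage note are arbitrary.

```rust
use std::collections::HashMap;

pub type AstId = usize;

/// Sketch of an annotation store with a separate map for type-hints.
#[derive(Default)]
pub struct AnnotationMap {
    types: HashMap<AstId, String>,
    type_hints: HashMap<AstId, String>,
}

impl AnnotationMap {
    pub fn annotate_type(&mut self, id: AstId, type_name: &str) {
        self.types.insert(id, type_name.to_string());
    }

    pub fn annotate_type_hint(&mut self, id: AstId, type_name: &str) {
        self.type_hints.insert(id, type_name.to_string());
    }

    /// Returns `Some((from, to))` if the node carries a type-hint that differs
    /// from its real type, i.e. if a cast has to be generated or validated.
    pub fn required_cast(&self, id: AstId) -> Option<(&str, &str)> {
        let actual = self.types.get(&id)?;
        let hint = self.type_hints.get(&id)?;
        if actual == hint {
            None
        } else {
            Some((actual.as_str(), hint.as_str()))
        }
    }
}
```

For the table above, a node annotated with type DINT and type-hint BYTE yields `Some(("DINT", "BYTE"))`, while z, which has no type-hint, yields `None`.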
Dependencies
When generating multiple units, the Linker will keep track of a dependency-tree for the unit. This means that every datatype or global variable referenced directly or indirectly by the module will be marked as a dependency. This information can then be used during the codegen phase to only generate types and variables that are relevant to the unit.
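Collecting direct and indirect dependencies amounts to a reachability search. The sketch below assumes the direct references are already available as a map from a name to the names it references; the function name and the flat set it returns are hypothetical simplifications and do not mirror RuSTy's data structures.

```rust
use std::collections::{BTreeSet, HashMap};

/// Collects everything reachable from `roots` through `direct` references.
/// The roots themselves are part of the result; cycles are handled because
/// every name is expanded at most once.
pub fn collect_dependencies(direct: &HashMap<String, Vec<String>>, roots: &[&str]) -> BTreeSet<String> {
    let mut resolved = BTreeSet::new();
    let mut pending: Vec<String> = roots.iter().map(|root| root.to_string()).collect();
    while let Some(name) = pending.pop() {
        // `insert` returns false if the name was already visited
        if resolved.insert(name.clone()) {
            if let Some(children) = direct.get(&name) {
                pending.extend(children.iter().cloned());
            }
        }
    }
    resolved
}
```

Anything that is not in the returned set, for example a type that is declared but never referenced by the unit, can be skipped during code-generation for that unit.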
Validation
The validation module implements the semantic validation step of the compiler. The validator is a hand-written visitor that offers a callback when visiting the single AST-nodes to then perform the different validation tasks.
The validation rules are implemented in dedicated validator-structs:
Validator | Responsibilities |
---|---|
global_validator | Semantic rules on the level of declarations as a whole (e.g. name-conflicts) |
pou_validator | Semantic rules on the level of programs, functions and function-blocks. |
recursive_validator | Semantic rules on the level of recursion (e.g. struct referencing itself) |
stmt_validator | Semantic rules on the level of statements (e.g. invalid type-casts). |
variable_validator | Semantic rules on the level of variable declarations (e.g. empty var-blocks, empty structs, etc.). |
Diagnostics
Problems (semantic or syntactic) are represented as Diagnostics. Diagnostics carry information on the exact location inside the source-string (start- & end-offset), a custom message and a unique error-number to identify the problem.
There are 3 types of Diagnostics:
Diagnostic | Description |
---|---|
SyntaxError | A syntax error is a diagnostic that is created by the parser if it discovers a token-stream that does not match the language's grammar. |
GeneralError | General errors are problems that occurred during the compilation process and that cannot be linked to a malformed input (e.g. file-I/O problems, internal LLVM errors, etc.). |
Improvement | Problems that may not prevent successful compilation but are still considered a flaw in the source-code (e.g. using the proprietary POINTER TO instead of the norm-compliant REF_TO). |
Note: The diagnostics are subject to change since they don't elegantly represent the different types of problems (e.g. semantic problems).
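A diagnostic as described above could be modelled like this. The struct layout, the `prevents_compilation` rule and the output format of `render` are hypothetical and only meant to show how location, message and error-number travel together; RuSTy's real Diagnostic type differs.

```rust
use std::ops::Range;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DiagnosticKind {
    SyntaxError,
    GeneralError,
    Improvement,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Diagnostic {
    pub kind: DiagnosticKind,
    pub message: String,
    /// start- and end-offset (in bytes) inside the source-string
    pub range: Range<usize>,
    /// error-number; the values used with this sketch are made up
    pub err_no: u32,
}

impl Diagnostic {
    /// In this sketch only improvements let the compilation continue.
    pub fn prevents_compilation(&self) -> bool {
        self.kind != DiagnosticKind::Improvement
    }

    /// Formats the diagnostic together with the offending piece of source.
    pub fn render(&self, source: &str) -> String {
        let snippet = source.get(self.range.clone()).unwrap_or("");
        format!(
            "{:?} E{:03}: {} (at {}..{}: '{}')",
            self.kind, self.err_no, self.message, self.range.start, self.range.end, snippet
        )
    }
}
```

Keeping the offsets instead of a line/column pair leaves it to the reporting code to decide how the location is presented.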
Code-Generation
The codegen module contains all code that turns the parsed and verified code represented as an AST into llvm-ir code. To generate the IR we use a crate that wraps the native llvm C-API.
The code-generator is basically a transformation from the ST-AST into an IR-Tree representation. Therefore the AST is traversed in a visitor-like way and transformed simultaneously.
The code generation is split into specialized sub-generators for different tasks:
Generator | Responsibilities |
---|---|
pou_generator | The pou-generator takes care of generating the programming organization units (Programs, FunctionBlocks, Functions) including their signature and body. More specialized tasks are delegated to other generators. |
data_type_generator | Generates complex datatypes like Structs, Arrays, Enums, Strings, etc. |
variable_generator | Generates global variables and their initialization. |
statement_generator | Generates everything of the body of a POU except expressions. Non-expressions include: IFs, Loops, Assignments, etc. |
expression_generator | Generates expressions (everything that possibly resolves to a value) including: call-statements, references, array-access, etc. |
Generating POUs
Generating POUs (Programs, Function-Blocks, Functions) must generate the POU's body itself, as well as the POU's interface (or state) variables. In this segment we focus on generating the interface for a POU. Further information about generating a POU's body can be found [here].
Programs
A program is a static POU with some code attached. This means that there is exactly one instance. No matter where it is called from, every caller uses the exact same instance, which means that you may see the residuals of the last caller in the program's variables when you call it yourself.
PROGRAM prg
VAR
x : DINT;
y : DINT;
END_VAR
END_PROGRAM
The program's interface is persistent across calls, so we store it in a global variable.
Therefore the code-generator creates a dedicated struct-type called prg_interface. A global variable called prg_instance is generated to store the program's state across calls. This global instance variable is passed as a this pointer to calls to the prg function.
%prg_interface = type { i32, i32 }
@prg_instance = global %prg_interface zeroinitializer
define void @prg(%prg_interface* %this) {
entry:
ret void
}
FunctionBlocks
A FunctionBlock is a POU that is instantiated in a declaration. So in contrast to Programs, a FunctionBlock can have multiple instances. Nevertheless the code-generator uses a very similar strategy. A struct-type for the FunctionBlock's interface is created but no global instance-variable is allocated. Instead the function block can be used as a DataType to declare instances like in the following example:
FUNCTION_BLOCK foo
VAR_INPUT
x, y : INT;
END_VAR
END_FUNCTION_BLOCK
PROGRAM prg
VAR
f : foo;
END_VAR
END_PROGRAM
So for the given example, we see the code-generator creating a type for the FunctionBlock's state (foo_interface). The declared instance of foo in prg's interface is seen in the program's generated interface struct-type (prg_interface).
; ModuleID = 'main'
source_filename = "main"
%prg_interface = type { %foo_interface }
%foo_interface = type { i16, i16 }
@prg_instance = global %prg_interface zeroinitializer
define void @foo(%foo_interface* %0) {
entry:
ret void
}
define void @prg(%prg_interface* %0) {
entry:
ret void
}
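The calling convention behind this IR can be mimicked in plain Rust: the state lives in a struct and the body is a function that receives a mutable reference to that struct, much like the pointer parameter above. This is only an analogy to illustrate the layout. The `calls` counter is a hypothetical extra field that is not part of the ST example; it is added to make the persistent state observable.

```rust
/// State of the function block `foo` (compare `%foo_interface`).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct FooInterface {
    pub x: i16,
    pub y: i16,
    /// hypothetical field, not part of the ST example
    pub calls: u32,
}

/// State of the program `prg`; it embeds one instance of `foo`
/// (compare `%prg_interface = type { %foo_interface }`).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct PrgInterface {
    pub f: FooInterface,
}

/// Body of `foo`: all state is reached through the passed reference.
pub fn foo(this: &mut FooInterface) {
    this.calls += 1;
}

/// Body of `prg`: assigns the inputs of its instance `f` and calls it.
pub fn prg(this: &mut PrgInterface) {
    this.f.x = 1;
    this.f.y = 2;
    foo(&mut this.f);
}
```

Calling `prg` twice on the same `PrgInterface` value leaves `f.calls` at 2, which corresponds to the state a program keeps between calls, while a second, separately declared `FooInterface` starts from its own zero-initialized state.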
Functions
Functions are generated very similarly to programs and function-blocks. The main difference is that no instance-global is allocated and the function's interface-type cannot be used as a datatype to declare your own instances. Instances of the function's interface-type are allocated whenever the function is called, for the lifetime of a single call. Otherwise the code generated for functions is comparable to the code presented above for programs and function-blocks.
Generating Data Types
IEC61131-3 languages offer a wide range of data types. Next to the built-in intrinsic data types, we support the following user defined data types:
Range Types
For range types we don't generate special code. Internally the new data type just becomes an alias for the derived type.
Pointer Types
For pointer types we don't generate special code. Internally the new data type just becomes an alias for the pointer-type.
Struct Types
Struct types translate directly to llvm struct datatypes. We generate a new datatype with the user-type's name for the struct.
TYPE MyStruct:
STRUCT
a: DINT;
b: INT;
END_STRUCT
END_TYPE
This struct simply generates an llvm struct type:
%MyStruct = type { i32, i16 }
Enum Types
Enumerations are represented as DINT.
TYPE MyEnum: (red, yellow, green);
END_TYPE
For every enum's element we generate a global variable with the element's value.
@red = global i32 0
@yellow = global i32 1
@green = global i32 2
Array Types
Array types are generated as fixed sized llvm array types - note that Array types must be fixed sized in ST:
TYPE MyArray: ARRAY[0..9] OF INT;
END_TYPE
VAR_GLOBAL
x : MyArray;
y : ARRAY[0..5] OF REAL;
END_VAR
Custom array data types are not reflected as dedicated types on the llvm-level.
@x = global [10 x i16] zeroinitializer
@y = global [6 x float] zeroinitializer
Multi dimensional arrays
Arrays can be declared as multi-dimensional:
VAR_GLOBAL
x : ARRAY[0..5, 2..5, 0..1] OF INT;
END_VAR
The compiler will flatten these types of arrays to a single dimension. To accomplish that, it calculates the total length by multiplying the sizes of all dimensions:
0..5 x 2..5 x 0..1
6 x 4 x 2 = 48
So the array x : ARRAY[0..5, 2..5, 0..1] OF INT;
will be generated as:
@x = global [48 x i16] zeroinitializer
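The length calculation, and the matching index calculation, can be sketched as follows. `Dimension`, `flattened_len` and `flattened_index` are hypothetical names. The index function assumes a row-major layout in which the last dimension varies fastest; that layout is an assumption of this sketch and is not stated by the text above.

```rust
/// One dimension of an ST array, given by inclusive bounds such as `2..5`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Dimension {
    pub start: i64,
    pub end: i64,
}

impl Dimension {
    /// Number of elements in this dimension, e.g. 4 for `2..5`.
    /// Assumes `start <= end`.
    pub fn len(&self) -> usize {
        (self.end - self.start + 1) as usize
    }
}

/// Total number of elements: the product of the sizes of all dimensions.
pub fn flattened_len(dims: &[Dimension]) -> usize {
    dims.iter().map(Dimension::len).product()
}

/// Offset of an element in the flattened array (row-major, assumed).
/// Returns `None` if an index is out of bounds or the number of
/// indices does not match the number of dimensions.
pub fn flattened_index(dims: &[Dimension], indices: &[i64]) -> Option<usize> {
    if dims.len() != indices.len() {
        return None;
    }
    let mut offset = 0usize;
    for (dim, &index) in dims.iter().zip(indices) {
        if index < dim.start || index > dim.end {
            return None;
        }
        offset = offset * dim.len() + (index - dim.start) as usize;
    }
    Some(offset)
}
```

For the declaration above, `flattened_len` multiplies the sizes 6, 4 and 2, and `flattened_index` maps the first element `[0, 2, 0]` to offset 0.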
This means that such a multidimensional array must be initialized like a single-dimensional array:
- wrong
VAR_GLOBAL
wrong_array : ARRAY[1..3, 0..2] OF INT := [ [10, 11, 12],
[20, 21, 22],
[30, 31, 32]];
END_VAR
- correct
VAR_GLOBAL
correct_array : ARRAY[1..3, 0..2] OF INT := [ 10, 11, 12,
20, 21, 22,
30, 31, 32];
END_VAR
Nested Arrays
Note that arrays declared as
x : ARRAY[0..2] OF ARRAY[0..2] OF INT
are different from the multi-dimensional arrays discussed in this section. Nested arrays are represented as multi-dimensional arrays on the LLVM-IR level and must also be initialized using nested array-literals!
String Types
String types are generated as fixed sized array types.
VAR_GLOBAL
str : STRING[20];
wstr : WSTRING[20];
END_VAR
Strings can be represented in two different encodings: UTF-8 (STRING) or UTF-16 (WSTRING).
@str = global [21 x i8] zeroinitializer
@wstr = global [21 x i16] zeroinitializer
CFC (Continuous Function Chart)
RuSTy is compatible with CFC, as per the FBD part detailed in the IEC61131-3 XML-exchange format. The CFC implementation borrows extensively from the ST compiler-pipeline, with the exception that the lexical analysis and parsing phases are replaced by a model-to-model conversion process. This involves converting the XML into a structured model, which is then converted into ST AST statements.
The next chapter will walk you through the CFC implementation, giving you a better understanding of the underlying code.
Model-to-Model Conversion
As previously mentioned, the lexical and parsing phases are replaced by a model-to-model conversion process which consists of two steps:
- Transform the input file (XML) into a data-model
- Transform the data-model into an AST
XML to Data-Model
Consider the heavily minified CFC file MyProgram.cfc, which translates to the CFC chart below.
x MyAdd
┌─────────────┐ ┌─────────────────┐
│ │ │ exec_id:0 │
│ ├───────►│ a │ z
│ local_id: 0 │ │ ref_local_id: 0 │ ┌──────────────┐
└─────────────┘ │ │ │ exec_id: 1 │
y │ ├─────────►│ │
┌─────────────┐ │ │ │ref_local_id:2│
│ │ │ │ └──────────────┘
│ ├───────►│ b │ local_id: 3
│ local_id:1 │ │ ref_local_id: 1 │
└─────────────┘ └─────────────────┘
local_id: 2
The initial phase of the transformation process involves streaming the entire input file.
During the streaming process, whenever important keywords such as block are encountered, they are directly mapped into a corresponding model structure.
For example, when reaching the line <block localId="3" ...> within the XML file, we generate a model that can be represented as follows:
struct Block {
    localId: 3,
    type_name: "MyAdd",
    instance_name: None,
    execution_order_id: 0,
    variables: [
        InputVariable { ... }, // x, with localId = 1
        InputVariable { ... }, // y, with localId = 2
        OutputVariable { ... }, // MyAdd eventually becoming `z := MyAdd`, with z having a localId = 4
    ]
}
This process is repeated for every element in the input file which has a corresponding model implementation. For more information on implementation details, see the model folder.
Since the CFC programming language utilizes blocks and their interconnections to establish the program's logic flow, with the sequencing of block execution and inter-block links represented through the corresponding localId, refLocalId and executionOrderId, we have to order each element by their execution ID before proceeding to the next phase.
Otherwise the generated AST statements would be out of order and hence semantically incorrect.
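The ordering step can be sketched as a filter followed by a stable sort. The `Node` struct and the `execution_order` function are hypothetical; the sketch assumes that elements without an execution ID (such as plain input variables) do not produce statements of their own and are therefore left out.

```rust
/// Reduced model of a CFC element for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub local_id: usize,
    /// `None` for elements that are only referenced by others
    pub execution_order_id: Option<usize>,
    pub name: String,
}

/// Returns the elements that carry an execution ID, ordered by that ID.
pub fn execution_order(nodes: &[Node]) -> Vec<&Node> {
    let mut ordered: Vec<&Node> = nodes
        .iter()
        .filter(|node| node.execution_order_id.is_some())
        .collect();
    // `sort_by_key` is stable, so elements with equal IDs keep their file order
    ordered.sort_by_key(|node| node.execution_order_id);
    ordered
}
```

Applied to the elements of MyProgram.cfc from the appendix, the block MyAdd (ID 0) is placed before the output variable z (ID 1), regardless of the order in which they appear in the file.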
Data-Model to AST
The final part of the model-to-model transformation takes the input from the previous step and transforms it into an AST which the compiler pipeline understands and can generate code from.
Consider the previous block example - the transformer first encounters the element with the executionOrderId of 0, which is a call to myAdd.
We then check and transform each parameter, input a and b corresponding to the variables x and y respectively. The result of this transformation looks as follows:
CallStatement {
operator: myAdd,
parameters: [x, y]
}
Next, we process the element with an executionOrderId of 1, which corresponds to an assignment of the previous call's result to z. This update modifies the generated AST as follows:
AssignmentStatement {
left: z,
right: CallStatement {
operator: myAdd,
parameters: [x, y]
}
}
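The transformation of connected elements into nested statements can be sketched with a recursive lookup over the model, keyed by localId. The `Node` and `AstStatement` enums and the `transform` function are hypothetical and much smaller than the real model in plc_xml. The sketch assumes that the connections contain no cycles.

```rust
use std::collections::HashMap;

/// Reduced data-model: each element refers to its sources by localId.
pub enum Node {
    Input { expression: String },
    Block { type_name: String, input_refs: Vec<usize> },
    Output { expression: String, ref_local_id: usize },
}

#[derive(Debug, PartialEq)]
pub enum AstStatement {
    Reference(String),
    Call { operator: String, parameters: Vec<AstStatement> },
    Assignment { left: Box<AstStatement>, right: Box<AstStatement> },
}

/// Transforms the element with the given localId, following its connections.
/// Returns `None` if a referenced localId does not exist in the model.
pub fn transform(nodes: &HashMap<usize, Node>, local_id: usize) -> Option<AstStatement> {
    match nodes.get(&local_id)? {
        Node::Input { expression } => Some(AstStatement::Reference(expression.clone())),
        Node::Block { type_name, input_refs } => {
            // every connected input becomes one call parameter
            let parameters = input_refs
                .iter()
                .map(|ref_id| transform(nodes, *ref_id))
                .collect::<Option<Vec<_>>>()?;
            Some(AstStatement::Call { operator: type_name.clone(), parameters })
        }
        Node::Output { expression, ref_local_id } => Some(AstStatement::Assignment {
            left: Box::new(AstStatement::Reference(expression.clone())),
            right: Box::new(transform(nodes, *ref_local_id)?),
        }),
    }
}
```

With the localIds from the appendix (x = 1, y = 2, MyAdd = 3, z = 4), transforming element 4 produces an assignment to z whose right side is the call to MyAdd with the parameters x and y.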
While this explanation covers the handling of blocks and variables, there are other elements (e.g. control-flow) that are not discussed here. For more information on implementation details, see plc_xml/src/xml_parser.
Finally, after transforming all elements into their respective AST statements, the result is passed to the indexer and subsequently enters the next stages of the compiler pipeline, as described in the architecture documentation.
Appendix
MyAdd.st
FUNCTION MyAdd : DINT
VAR_INPUT
x, y : DINT;
END_VAR
MyAdd := x + y;
END_FUNCTION
MyProgram.cfc
<pou xmlns="http://www.plcopen.org/xml/tc6_0201" name="myProgram" pouType="program">
<content>
PROGRAM myProgram
VAR
x, y, z : DINT;
END_VAR
</content>
<body>
<FBD>
<inVariable localId="1" height="20" width="80" negated="false">
<expression>x</expression>
</inVariable>
<inVariable localId="2" height="20" width="80" negated="false">
<expression>y</expression>
</inVariable>
<block localId="3" width="74" height="60" typeName="MyAdd" executionOrderId="0">
<inputVariables>
<variable formalParameter="x" negated="false">
<connectionPointIn>
<connection refLocalId="1"/>
</connectionPointIn>
</variable>
<variable formalParameter="y" negated="false">
<connectionPointIn>
<connection refLocalId="2"/>
</connectionPointIn>
</variable>
</inputVariables>
<outputVariables>
<variable formalParameter="MyAdd" negated="false"/>
</outputVariables>
</block>
<outVariable localId="4" height="20" width="80" executionOrderId="1" negated="false" storage="none">
<position x="680" y="160"/>
<connectionPointIn>
<connection refLocalId="3" formalParameter="MyAdd"/>
</connectionPointIn>
<expression>z</expression>
</outVariable>
</FBD>
</body>
</pou>