RuSTy

RuSTy is a structured text (ST) compiler written in Rust and based on the LLVM compiler backend. We use the logos crate to perform lexical analysis before the custom parser runs. RuSTy emits static or shared objects as well as LLVM IR or bitcode, selected by a command line flag. We are aiming for an open-source, industry-grade ST compiler supporting at least the features of the 2nd edition of the IEC 61131 standard.

You might also want to refer to the API documentation.

Supported Language Concepts

POUs

  • ✔ Program
  • ✔ Function
  • ✔ FunctionBlock
  • ✔ Action

Datatypes

  • ✔ IEC 61131-3 numeric types
  • ✔ Strings
  • ✔ Wide Strings
  • ✔ Struct types
  • ✔ Enum types
  • ✔ Array data types
  • ✔ Alias types
  • ✔ Sub-range types
  • ✔ Date and Time types
  • ✔ Sized String types
  • ✔ Sized Wide String types
  • ✔ Initial values

Declarations

  • ✔ VAR
  • ✔ VAR_INPUT
  • ✔ VAR_INPUT {ref}
  • ✔ VAR_OUTPUT
  • ✔ VAR_IN_OUT

Statements

  • ✔ Assignments
  • ✔ Call statements
  • ✔ Implicit call arguments
  • ✔ Explicit call arguments
  • ✔ EXIT, CONTINUE statements

Control Structures

  • ✔ IF Statement
  • ✔ CASE Statement
  • ✔ FOR Loops
  • ✔ WHILE Loops
  • ✔ REPEAT Loops
  • ✔ RETURN statement

Expressions

  • ✔ Arithmetic Operators
  • ✔ Relational Operators
  • ✔ Logical Operators
  • ✔ Bitwise Operators

Build & Install

RuSTy's code can be found on GitHub. By default, a Dockerfile and a devcontainer.json file are provided. If you wish to develop natively, however, you will need some additional dependencies, namely:

  • Rust
  • LLVM 14
  • LLVM Polly
  • Build Tools (e.g. build-essential on Ubuntu)
  • zlib

The next sections cover how to install these dependencies on different platforms. If you already have them, RuSTy can be built using the cargo command. For debug builds, execute cargo build; for release builds (smaller and faster), execute cargo build --release. The resulting binaries can be found at target/debug/plc and target/release/plc respectively.

Ubuntu

The specified dependencies can be installed with the following command on Ubuntu:

sudo apt install                \
    build-essential             \
    llvm-14-dev liblld-14-dev   \
    libz-dev                    \
    lld                         \
    libclang-common-14-dev

Additionally you might need libffi7, which can be installed with sudo apt install libffi7.

Debian

Same as Ubuntu, except that additional repository sources must be added, since Debian 11 only includes LLVM packages up to version 11. To do so, follow the official documentation.

MacOS

On MacOS you need to install the Xcode Command Line Tools.

Furthermore, LLVM 14 is needed, which can be installed with Homebrew:

brew install llvm@14

After the installation you have to add /opt/homebrew/opt/llvm@14/bin to your $PATH environment variable, e.g. with the following command:

echo 'export PATH="/opt/homebrew/opt/llvm@14/bin:$PATH"' >> ~/.zshrc

Windows

Compiling RuSTy on Windows requires three dependencies:

  1. Windows 10 SDK
  2. MSVC (at the time of writing, tested with v142 - VS 2019 C++ x64/x86 build tools)
  3. LLVM 14.0.6

The first two dependencies are typically installed during the Rust installation itself. More specifically during the installation you should have been prompted to install them. If not, you'll be able to install them via Visual Studio at any point. The third dependency is based on a custom build which is hosted on GitHub. Download it, extract it and add the bin/ directory to your environment variables. In theory this should cover everything to be able to compile RuSTy (with some reboots here and there).

Installing

TODO

Troubleshooting

  • Because of weak compatibility guarantees of the LLVM API, the LLVM installation must exactly match the major version of the llvm-sys crate. Currently you will need to install LLVM 14 to satisfy this constraint.
  • To avoid installation conflicts on Linux/Ubuntu, make sure you don't have a default installation available (like you get by just installing llvm-dev), which may break things. If you do, make sure you have set the appropriate environment variable (LLVM_SYS_140_PREFIX=/usr/lib/llvm-14 for LLVM 14), so the build of the llvm-sys crate knows what files to grab.

Using RuSTy

The RuSTy compiler binary is called plc.

plc offers comprehensive help via the -h (--help) option. plc takes one output-format parameter and any number of input files. The input files can also be written as glob patterns.

plc [OPTIONS] <input-files>... <--ir|--shared|--pic|--static|--bc>

Note that you can specify at most one output format. If no output format switch is specified, the compiler selects --static by default.

Similarly, if you do not specify an output filename via the -o or --output options, the output filename will consist of the first input filename, but with an appropriate file extension depending on the output file format.

A minimal invocation looks like this: plc input.st. This takes the file input.st and compiles it into a static object that is written to a file named input.o.

More examples:

  • plc --ir file1.st file2.st will compile file1.st and file2.st.
  • plc --ir file1.cfc file2.st will compile file1.cfc and file2.st.
  • plc --ir src/*.st will compile all ST files in the src-folder.
  • plc --ir "**/*.st" will compile all ST-files in the current folder and its subfolders recursively.

Example: Building a hello world program

Writing the code

We want to print something to the terminal, so we're going to declare external functions for that. This example is available under examples/hello_world.st in the main RuSTy repository.

  • main is our entry point to the program.
  • To link the program, we are going to use the system's linker using the --linker=cc argument.
  • On Windows and MacOS, replace this with --linker=clang as cc is usually not available.
{external}
FUNCTION puts : DINT
VAR_INPUT {ref}
    text : STRING;
END_VAR
END_FUNCTION

FUNCTION main : DINT
    puts('hello, world!$N');
END_FUNCTION

Compiling with RuSTy

The RuSTy command line interface is similar to that of other compilers.

If you just want to build an object file, then do this:

plc -c hello_world.st -o hello_world.o

Optimization

plc offers 4 levels of optimization, which correspond to the levels established by LLVM and clang (none to aggressive, i.e. -O0 to -O3).

To use an optimization, the flag -O or --optimization is required:

  • plc -c "**/*.st" -O none
  • plc -c "**/*.st" -O less
  • plc -c "**/*.st" -O default
  • plc -c "**/*.st" -O aggressive

By default plc will use default which corresponds to clang's -O2.

Linking an executable

Instead, you can also compile this into an executable and run it:

plc hello_world.st -o hello_world --linker=cc
./hello_world

Please note that RuSTy will attempt to link the generated object file by default to generate an executable if you didn't specify something else (option -c).

  • The --linker=cc flag tells RuSTy that it should link with the system's compiler driver instead of the built in linker. This provides support to create executables.
  • Additional libraries can be linked using the -l flag, and additional library paths can be added with -L.
  • You add library search paths by providing additional -L /path/... options. By default, this will be the current directory.
  • The linker will prefer a dynamically linked library if available, and revert to a static one otherwise.

Building for separate targets

RuSTy supports building for multiple targets by specifying the --target and optionally the --sysroot option.

  • Multiple targets and sysroot can be specified for the compilation simply by adding additional --target and --sysroot entries.

--target

To build and compile structured text for the right platform, we need to specify the target. As RuSTy is using LLVM, a target triple supported by LLVM needs to be selected. The default target is the host machine's target. So if a dev container on an x86_64-docker is used, the target is x86_64-linux-gnu.

--sysroot

plc uses the sysroot option for linking purposes. It is considered to be the root directory for the purpose of locating headers and libraries.

  • If a target and sysroot are provided, the output will always be stored in a folder with the target name (e.g. an x86_64-linux-gnu target will have the output stored in a folder called x86_64-linux-gnu).
  • --sysroot parameters must always match target parameters; there can be no sysroot without a target.

Parallel Compilation

By default, plc uses parallel compilation.

This option can be controlled with the -j or --threads flag. A value above 0 indicates the number of threads to use for the compilation. Leaving the value unset, setting it to 0, or simply specifying -j sets the value to the maximum number of threads that can run on the current machine. This is determined by the underlying parallelisation library, Rayon.

Single module Compilation

With the introduction of parallel compilation, every unit is compiled into an object file independently and then linked together in a single module. This behaviour might not always be desired and can be disabled using the --single-module flag.

Note that the single module flag is currently much slower to produce as it requires first generating all modules and then merging them together.

Configuration Options

plc supports different configuration options. These can be printed using the config subcommand.

config schema

Outputs the JSON schema used for the validation of the plc.json file.

config diagnostics

Outputs a JSON file with the default error severity configuration for the project. See Error Configuration for more information.

Build Configuration

In addition to the comprehensive help, plc offers a build subcommand that simplifies the build process.
Instead of having numerous inline arguments, using the build subcommand along with a build description file makes passing the arguments easier.
The build description file needs to be saved in the json format.

Usage: plc build

Note that if plc cannot find the plc.json file, it will throw an error and request the path. The default location for the build file is the current directory.

The command for building with an additional path looks like this: plc build src/plc.json

Build description file (plc.json)

For the build description file to work, it must be written in the json format. All the keys used in the build description file are described in the following sections.

files

The keyword files is the equivalent to the input parameter, which adds all the ST files that need to be compiled.

The value of files is an array of strings, defined as follows:

"files" : [
    "examples/hello_world.st",
    "examples/hw.st",
    "examples/*.gvl"
]

libraries

To link several objects into one executable plc has the option to add libraries and automatically build and link them together.
The libraries keyword is optional.

"libraries" : [
    {
        "name" : "iec61131std",
        "path" : "path/to/lib/",
        "package" : "Copy",
        "include_path" : [
            "examples/hw.st",
            "examples/hello_world.st"
        ]
    }
]

output

Similarly to specifying an output file via the -o or --output option on the command line, in the build file we use "output" : "output.so" to define the output file. The default location is the current build directory (see Build Location).

compile_type

The following options can be used for the compile_type :

  • Static specifies that linking/binding must be done at compile time.
  • Shared (dynamic) specifies that linking/binding must be done dynamically (at runtime).
  • PIC Position Independent Code (Choosing this option implies that the linking will be done dynamically).
  • Relocatable generates relocatable object code (for combining with other object code).
  • Bitcode adds bitcode alongside machine code in executable file.
  • IR intermediate llvm representation.

The compile format is specified in the build description file as follows: "compile_type" : "Shared". The compile_type keyword is optional.

package_commands

The package_commands keyword is optional.

TODO

Example

{
    "files" : [
        "examples/hw.st",
        "examples/hello_world.st",
        "examples/ExternalFunctions.st",
        "examples/*.dt"
    ],
    "compile_type" : "Shared",
    "output" : "proj.so",
    "libraries" : [
        {
            "name" : "iec61131std",
            "path" : "path/to/lib",
            "package" : "Copy",
            "include_path" : [
                "examples/lib.st"
            ]
        },
        {
            "name" : "other_lib",
            "path" : "path/to/lib",
            "package" : "System",
            "include_path" : [
                "examples/hello_world.st"
            ]
        }
    ]
}

Build Parameters

The build subcommand exposes the following optional parameters:

--build-location

The build location is the location all build files will be copied to.
By default the build location is the build folder in the root of the project (the location of the plc.json).
This can be overridden with the --build-location command line parameter.

--lib-location

The lib location is where all libraries marked with Copy will be copied.
By default it is the same as the build-location.
This can be overridden with the --lib-location command line parameter.

Environment Variables

Environment variables can be used inside the build description file. The variables are resolved before an entry is evaluated.

In addition to externally defined variables, the build exports variables that can be referenced in the description file:

PROJECT_ROOT

The folder containing the plc.json file, i.e. the root of the project.

ARCH

The target architecture currently being built, for a multi architecture build. The value for ARCH will be updated for every target.

Example targets are: x86_64-pc-linux-gnu, x86_64-pc-windows-msvc, aarch64-pc-linux-musl

BUILD_LOCATION

BUILD_LOCATION is the folder where the build will be saved. This is the value of either the --build-location parameter or the default build location.

LIB_LOCATION

LIB_LOCATION is the folder where the lib will be saved. This is the value of either the --lib-location parameter or the build location.

Usage

To reference an environment variable in the description file, reference the variables with a preceding $.

Example:

{
 "name" : "mylib",
 "path" : "$ARCH/lib",
 "package" : "System",
 "include_path" : [
  "examples/hello_world.st"
 ]
}

Validation

The build description file uses a JSON Schema file located at compiler/plc_project/schema/plc-json.schema to validate the build description before the build. In order for the schema to be used, it has to be either in that location for source builds or copied next to the build binaries. If the schema is not found, the schema-based validation will be skipped.

Error Configuration

Errors in a plc project can be configured by providing a JSON configuration file. A diagnostic's severity can be changed, for example from warning to error or info and vice versa, or the diagnostic can be ignored completely. To see a default error configuration, use plc config diagnostics. To provide a custom error configuration, use plc --error-config <custom.json>. Note that the --error-config option can be used with all subcommands such as build and check. Running plc config diagnostics --error-config <custom.json> will print out the full diagnostics configuration, taking the provided overrides into account.

Error Description

Errors produced by plc can be explained using the plc explain <ErrorCode> command. Error codes are usually provided in the diagnostic report.

General Error

This error is a catch all error. It is usually thrown when no other error better matches the case.

General IO Error

This error describes a problem during an IO operation such as reading or writing a file. It is usually accompanied by an internal error with further details.

Parameter Error

This error describes a problem with the command parameters, such as a file required for the compilation not being found.

Duplicate Symbol

The marked symbol has been defined multiple times.

Generic LLVM Error

An unexpected error occurred during the LLVM generation phase. This is usually a follow-up problem from a different diagnostic. If it occurs without a previous diagnostic, please file a bug report.

Missing Token

During the parsing phase, an additional Token (Element) was required to correctly interpret the code. The error message usually indicates what Token was missing.

Example

In the following example the name (Identifier) of the program is missing.

PROGRAM (*name*)
END_PROGRAM
error: Unexpected token: expected Identifier but found END_PROGRAM
  ┌─ example.st:2:1
  │
2 │ END_PROGRAM
  │ ^^^^^^^^^^^ Unexpected token: expected Identifier but found END_PROGRAM

Unexpected Token

During parsing, a Token (Element) was encountered in the wrong location. This could be an indication of a misused or misspelled keyword.

Invalid Range

Mismatched Parentheses

Invalid time literal

Invalid Number

Missing Case Condition

Keywords should contain Underscores

Wrong parenthesis for String delimiter

POINTER_TO is no standard keyword

Return types cannot have a default value

Classes cannot contain implementation

Duplicate Label

Classes cannot contain IN_OUT variables

Classes cannot contain a return type

POUs cannot be extended

Missing container name for action

Statement has no effect

Invalid Pragma Location

Missing return type

Unexpected return type

Unsupported return type

Empty variable block

Recursive data structure

Missing IN_OUT parameters

Invalid parameter type

Invalid number of parameters

Unresolved Constant

Invalid constant block

Invalid Constant

Cannot assign to constant

Invalid assignment

Missing type

Variable Overflow

Invalid Enum Variant

This error indicates the right-hand side in an enum assignment is invalid. For example an enum such as TYPE Color : (red := 0, green := 1, blue := 2); END_TYPE can only take values which (internally) yield a literal integer 0, 1 or 2.
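As a minimal sketch of the rule above, reusing the Color type from the description (the program and variable names are illustrative assumptions, and the comments only restate what the description implies):

```st
TYPE Color : (red := 0, green := 1, blue := 2); END_TYPE

PROGRAM example
VAR
    c : Color;
END_VAR
    c := green; (* a variant of Color *)
    c := 2;     (* a literal integer that matches blue *)
    (* c := 7; matches none of 0, 1 or 2 and would be expected to trigger this error *)
END_PROGRAM
```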

Invalid variable initializer

Assignment to Reference

Invalid array assignment

Invalid POU for VLA

Invalid VLA array access

VLA Dimension out of bounds

VLAs are always By Reference

Unresolved Reference

Illegal reference access

Expression is not assignable

Typecast error

Unknown type

Literal out of range

Literal not compatible with type

Incompatible direct access

Incompatible variable for direct access

Invalid range for direct access

Invalid range for array access

Invalid variable for array access

Direct access to variable with %

Expected literal

Invalid Nature

Unknown Nature

Unresolved Generic

Incompatible size

Invalid operation

Implicit typecast

Pointer dereference to non pointer

Array access to non array value

Address-of requires a value

General codegen error

Missing function

Missing compare function

Cannot generate string literal

Initial values were not generated

General debug error

Generic linker error

Duplicate case condition

Case condition outside of a case statement

Invalid case condition

Empty control statement

Undefined node

Unexpected node

Unconnected source

Cyclic connection

No associated connector

Unnamed control

Invalid PLC Json file

Invalid Call parameters

Incompatible reference assignment

Unsafe Enum Assignment

At runtime there is no way to guarantee that a non-const reference will not change its value to something out-of-bounds for enums. For example, consider the following:

PROGRAM main
    VAR
        zero  : DINT := 0;
        color : (red := 0, green := 1, blue := 2);
    END_VAR

    zero := 10;
    color := zero; // Invalid because `color` accepts values from 0 to 2, but we assigned 10 to it
END_PROGRAM

Equivalent enum value used

This message indicates that the assigned enum value is not part of the enum, but is equivalent to one of the internal values of the enum.

Example:

TYPE Colors : (Red, Green, Blue, Yellow) END_TYPE
TYPE Directions : (N, S, W, E) END_TYPE

VAR_GLOBAL
    col : Colors := N; //N is equivalent to Red but is not part of the enum
    dir : Directions := Red; //Red is equivalent to N but is not part of the enum
END_VAR

To solve the issue, use the equivalent value that belongs to the enum in question.
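A corrected version of the declarations above might look like the following sketch, in which each variable is initialised with a variant of its own enum:

```st
TYPE Colors : (Red, Green, Blue, Yellow) END_TYPE
TYPE Directions : (N, S, W, E) END_TYPE

VAR_GLOBAL
    col : Colors := Red;   (* Red is part of Colors *)
    dir : Directions := N; (* N is part of Directions *)
END_VAR
```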

Return Value Of Void Functions

Functions of type VOID cannot have an explicit return value, e.g. foo := 1 in the following example is invalid.

FUNCTION foo
    foo := 1;
END_FUNCTION

If a value must be returned, choose a return type for your function.
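For instance, a sketch of the same function with a return type added (DINT is only an illustrative choice):

```st
FUNCTION foo : DINT
    foo := 1; (* valid: foo now has a return type *)
END_FUNCTION
```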

Invalid Conditional Value

Control statements such as IF, FOR and WHILE require specific types for their condition.

If, While

IF and WHILE statements require an expression which yields a boolean, any other type is invalid and will trigger an error.
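A short sketch of conditions that satisfy this rule, assuming a DINT variable x (the function and variable names are hypothetical):

```st
FUNCTION countdown : DINT
VAR
    x : DINT := 3;
END_VAR
    (* x > 0 is a comparison and therefore yields a BOOL *)
    WHILE x > 0 DO
        x := x - 1;
    END_WHILE

    (* x = 0 also yields a BOOL; using the integer x alone as the condition would be invalid *)
    IF x = 0 THEN
        countdown := 1;
    END_IF
END_FUNCTION
```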

For

FOR statements require four conditional values: a counter, a start value, an end value and a step value. All of these need to be integers and share the same type.

FOR counter := start TO end BY step DO
// ...
END_FOR
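As a concrete sketch, the following hypothetical function adds up the even numbers from 0 to 10. The counter is a DINT and the start, end and step values are integer literals:

```st
FUNCTION sum_even : DINT
VAR
    i     : DINT;
    total : DINT := 0;
END_VAR
    (* counter i, start 0, end 10, step 2 *)
    FOR i := 0 TO 10 BY 2 DO
        total := total + i;
    END_FOR
    sum_even := total; (* 0 + 2 + 4 + 6 + 8 + 10 = 30 *)
END_FUNCTION
```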

Action call without parentheses

Integer Condition

This error is generated because an integer was used in an IF or WHILE statement, when a boolean was expected.

See also plc explain E094

Invalid Array Range

Ranges such as ARRAY [0..-1] are invalid in ST because end values of ranges must be greater than their start values. A valid range for the given statement would have been ARRAY[-1..0].
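A sketch of a declaration that uses the valid range mentioned above (the variable name is hypothetical):

```st
VAR_GLOBAL
    values : ARRAY[-1..0] OF DINT; (* start -1, end 0: two elements *)
END_VAR
```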

Libraries

RuSTy does not currently have support for importing source based libraries.

Source based libraries can, however, be compiled together with the application as normal files.

Precompiled libraries or system functions can be added using compilation flags or an entry in the plc.json file.

System functions can also be added using External Function for each POU in that library.

Library Structure

A library is defined by:

  • A set of st interfaces, each interface represents a function that has been precompiled.

In a POU, the interface is the definition and variable section, e.g.:

(*Interface for program example *)
PROGRAM example
VAR_INPUT
 a,b,c : DINT;
END_VAR
(* End of interface *)

(* Implementation *)
END_PROGRAM

  • A binary file for each architecture the library has been built for (x86_64-linux-gnu, aarch64-linux-gnu, ...)

Linking libraries using the plc command line

To include a library when using the plc command line interface, the include files can be added using the -i flag.

Each POU, Global Variable, or Datatype defined in the included files will be added to the project. POUs and Global variables included with the -i flag are marked as external; the implementation part of a POU is ignored.

To link the library, two options are then available: Shared and Static libraries.

Shared Libraries

A shared library (i.e. extension .so) can be linked using the -l flag.
For a library called mylib, when the flag -lmylib is passed, the linker will search for a file called libmylib.so.

Note that the lib<LibName>.so format is required by the linker for unix like systems.

The library locations used by the linker are the default search locations of the linker (i.e. /usr/lib, /lib). Additional paths can be provided using the -L flag (e.g. -L/opt/lib will make the linker also search for files in /opt/lib).
Additional library locations can be provided by supplying additional -L entries.
Additionally, the environment variable LD_LIBRARY_PATH can be defined to append entries to the linker's search location. More information can be found here.

Static Libraries

Static libraries compiled as object files can be linked by simply passing the object file (i.e. extension .o) as an input (similar to other .st files).

Archive files (i.e. extension .a) can be linked similarly to Shared Libraries using the -l flag. If the application is being compiled with the --static flag (or no shared library (.so) is found), the linker will use the archive file.

If neither a shared object (.so) nor an archive file (.a) is found, compilation will fail.

Command line example

To compile a file called input.st including a header and linking a library called libiec.so from /lib:

plc input.st -i iec/header.st -L/lib/ -liec

Linking libraries using the Build Description File plc.json

Libraries can be added to a project managed with a Build Description File.
To add a library to the project, the "libraries" section can be used. A library entry requires a name, a path, the package behaviour, and a set of files to include (include_path).

name

The name of the library to be linked. This will be used by the linker to find the library.
A library with the name mylib must have an equivalent compiled file called libmylib.so.

Note, archive files (ending with .a) are currently not supported.

path

The location of the library to be linked. The path can be either absolute or relative to the project.

package

The packaging option for the library, i.e. whether the library should be copied or is already available on the system.
The value "Copy" indicates that the given library should be copied to the Library Location.
The value "System" indicates that the given library exists on the system and does not need to be copied.

include_path

A list of files (can include globs) that should be included with the project.
Each POU, Global Variable, or Datatype defined in the included files will be added to the project. POUs and Global variables included in the list are marked as external, the implementation part of a POU is ignored.

Library Location

Libraries marked as Copy will be copied during the compilation to the defined Library Location. By default this is the same as the Build Location unless overridden by the --lib-location parameter.

Using environment variables

Since libraries can be compiled for multiple targets, the lib path can contain environment variables to disambiguate the compile location. $ARCH can be used as a placeholder in the path to indicate the currently compiled target.


During linking, if no .so file with name lib<name>.so is found, the compilation will fail.

Configuration Example (plc.json)

A configuration example for a Copy library called mylib and a System library called std:

"libraries" : [
    {
        "name" : "mylib",
        "path" : "libs/$ARCH/",
        "package" : "Copy",
        "include_path" : [
            "simple_program.st"
        ]
    },
    {
        "name" : "std",
        "path" : "libs/$ARCH/",
        "package" : "System",
        "include_path" : [
            "include/*.st"
        ]
    }
]

External Functions

A POU (PROGRAM, FUNCTION, FUNCTION_BLOCK) can be marked as external, which will cause the compiler to ignore its implementation.

{external}
FUNCTION log : DINT
VAR_IN_OUT
  message : STRING[1024];
END_VAR
VAR_INPUT
  type : (Err,Warn,Info) := Info;
END_VAR
END_FUNCTION

At compilation time, the function log will be defined as an externally available function, and can be called from ST code.

Note: At linking time, a log function with a compatible signature must be available on the system.

Calling C functions

ST code can call into foreign functions natively. To achieve this, the called function must be defined in a C compatible API, e.g. extern "C" blocks.

The interface of the function has to:

  • either be included with the -i flag
  • or be declared in ST using the {external} keyword

When including multiple header files/function interfaces, the -i flag must precede each individual file, e.g. -i file1.st -i file2.st -i file3.st. Alternatively, when including an entire folder with -i '/liblocation/*.st', the path must be put in quotes, otherwise the command-line might parse the arguments in a way that is incompatible (i.e. does not precede each file with -i).

Example

Given a min function defined in C as follows:

int min(int a, int b) {
//...
}

an interface of that function in ST can be defined as:

{external}
FUNCTION min : DINT
VAR_INPUT
  a : DINT;
  b : DINT;
END_VAR
END_FUNCTION

Variadic arguments

Some foreign functions, especially ones defined in C, could be variadic functions.

These functions are usually defined with ... as the last parameter, which signifies that the function can be called with a variable number of arguments.

An example of a variadic function is printf.

Calling a variadic function is supported in ST. To mark an external function as variadic, you can add a parameter of type ... to the VAR_INPUT block.

Variadic function example

Given the printf function defined as:

int printf( const char *restrict format, ... );

the ST interface can be defined as:

{external}
FUNCTION printf : DINT
VAR_INPUT {ref}
  format : STRING;
END_VAR
VAR_INPUT
  args : ...;
END_VAR
END_FUNCTION

Runnable example

With the printf function available on the system, there is no need to declare the C function.

An ST program called ExternalFunctions.st with the following code can be declared:

(*ExternalFunctions.st*)

(**
 * The printf function's interface, marked as external since
 * it is defined directly along other ST functions
 *)
{external}
FUNCTION printf : DINT
VAR_INPUT {ref}
    format : STRING;
END_VAR
VAR_INPUT
    args: ...;
END_VAR
END_FUNCTION


(**
* The main function of the program prints a demo to the standard out
* The function main is implemented at this location and thus not marked
* as {external}
*)
FUNCTION main : DINT
VAR
    tmp : DINT;
END_VAR
  tmp := 1;
  printf('Value %d, %d, %d$N', tmp, tmp * 10, tmp * 100);
  main := tmp;
END_FUNCTION

Compiling the previous code with the following command:

plc ExternalFunctions.st -o ExternalFunctions --linker=clang

will yield an executable called ExternalFunctions.

We use clang to link the generated object file and generate an executable since the embedded linker cannot generate executable files.

The executable can then be started with ./ExternalFunctions.

Program Organization Unit (POU)

Definition

A POU is an executable unit available in an IEC61131-3 application. It can be defined as either a Program, a Function, a Function Block, or an Action.

Methods on classes are also considered POUs but are not covered by this document.

A POU is defined as:

<POU Type> name
(* parameters *)

(* code *)
END_<POU Type>

Parameters

POUs can use input, output, or in/out parameters to pass data to the outside. Such parameters are defined in a variable block delimited by VAR_<TYPE> and END_VAR. Supported parameter types are VAR_INPUT, VAR_INPUT {ref}, VAR_OUTPUT and VAR_IN_OUT.

Input

Input parameters are typically copied into the target POU to be stored and read for later references.

A definition for input parameters is as follows:

VAR_INPUT
    a : INT;
END_VAR

In some cases, especially when passing large strings or arrays, or when interacting with foreign code (see External Functions) it is more efficient to avoid copying the variable values and just use a pointer to the required input. This can be done either using the in/out variables or by specifying a special property ref on the input block.

Example:

VAR_INPUT {ref}
    a : STRING;
END_VAR

Note that passing the ref property will convert all variables in that block to pointers, and should only be used in Functions.

In Out

In/Out parameters are required parameters that are always passed by reference. They can be modified by the called POU, and the changes are applied directly to the passed variable. An In/Out parameter must always be passed in a POU call and cannot be stored.
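A minimal sketch of this behaviour, using hypothetical function and variable names:

```st
FUNCTION increment : DINT
VAR_IN_OUT
    value : DINT; (* passed by reference *)
END_VAR
    value := value + 1; (* modifies the caller's variable directly *)
    increment := value;
END_FUNCTION

FUNCTION main : DINT
VAR
    counter : DINT := 41;
END_VAR
    increment(counter); (* the in/out argument must be supplied *)
    main := counter;    (* counter was changed by the call *)
END_FUNCTION
```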

Output

Output parameters are used to return the result(s) of the POU call. They are passed by reference, but are optional. If an output parameter is not passed in a call, its value is not persisted.
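The following sketch illustrates an optional output, assuming a hypothetical function block Divider. Only quotient is assigned in the call, so the remainder result is not handed to the caller:

```st
FUNCTION_BLOCK Divider
VAR_INPUT
    dividend, divisor : DINT;
END_VAR
VAR_OUTPUT
    quotient  : DINT;
    remainder : DINT;
END_VAR
    quotient  := dividend / divisor;
    remainder := dividend MOD divisor;
END_FUNCTION_BLOCK

PROGRAM prg
VAR
    div : Divider;
    q   : DINT;
END_VAR
    (* quotient is written to q; remainder is left out of the call *)
    div(dividend := 7, divisor := 2, quotient => q);
END_PROGRAM
```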

Variables

In addition to parameters, a POU contains local variables. These can either be stored in the POU for later reference (VAR) or only created for a single call (VAR_TEMP). In a function, all local variables are temporary.

Specialization

In addition to the default behavior, each type of POU has some special cases.

Function

Functions are stateless sequences of callable code. They are not backed by any structs, and cannot hold any state across multiple calls. A function's input parameters can be passed by value or by reference.

Functions also support a return type, the resulting definition is:

    FUNCTION fnName : <TYPE>
    (* parameters *)
    VAR_INPUT (* by value *)
        x : INT;
    END_VAR
    VAR_INPUT {ref} (* by reference *)
        w : INT;
    END_VAR
    (* temporary variables *)
    VAR
        y : INT;
    END_VAR
    VAR_TEMP
        z : INT;
    END_VAR

    (* code *)
    END_FUNCTION

Program

A Program is a static (i.e. GLOBAL) STRUCT that holds its state across multiple calls. A Program exists once, and only once, in an application, and subsequent calls to a program will change and store the passed parameters as well as internal variables. A program does not support passing input parameters by reference.

Example:

PROGRAM prg
(* parameters *)
VAR_INPUT
    x : INT;
END_VAR
(* persisted variables *)
VAR
    y : INT;
END_VAR
(* temporary variables *)
VAR_TEMP
    z : INT;
END_VAR
(* code *)
END_PROGRAM

Function Block

A function block is a STRUCT that can be initialized multiple times using different variables (i.e. instances). A function block instance can hold its state (including input parameters) across multiple calls, but does not share any state with other instances. A function block does not support passing input parameters by reference.

FUNCTION_BLOCK fb
(* parameters *)
VAR_INPUT
    x : INT;
END_VAR
(* persisted variables *)
VAR
    y : INT;
END_VAR
(* temporary variables *)
VAR_TEMP
    z : INT;
END_VAR
(* code *)
END_FUNCTION_BLOCK

Action

An action is represented by a parent struct, and does not define its own interface (VAR blocks). An action can only be defined for Programs and Function Blocks.

An action can be defined in one of three ways: in a container (ACTIONS) directly below the POU, in a named ACTIONS container, or using a qualified name on the action.

Example:

FUNCTION_BLOCK fb
(* parameters *)
VAR_INPUT
    x : INT;
END_VAR
(* persisted variables *)
VAR
    y : INT;
END_VAR
(* temporary variables *)
VAR_TEMP
    z : INT;
END_VAR
(* code *)
END_FUNCTION_BLOCK

ACTIONS (* implicitly belongs to FB *)
    ACTION act
    (* code *)
    END_ACTION
END_ACTIONS

ACTIONS fb (* explicitly belongs to FB *)
    ACTION act2
    (* code *)
    END_ACTION
END_ACTIONS

ACTION fb.act3 (* linked to FB with name definition *)
(* code *)
END_ACTION

Variables

Constants

Variable declaration blocks can be declared as CONSTANT. All variables of a constant declaration block become constants. Constant variables cannot be changed and need to be initialized.

Example

TYPE OneInt : INT := 1; END_TYPE

VAR_GLOBAL CONSTANT
    MAX_SIZE : INT := 99;
    MIN_LEN : INT := 1;
    counter : OneInt;  (* 1 *)
END_VAR

PROGRAM PLC_PRG
    VAR CONSTANT
        DEFAULT_INPUT : BOOL := FALSE;
    END_VAR
END_PROGRAM

Variable Initialization

Initializers of variables are evaluated at compile time. Therefore they can only consist of literals, other constants or expressions consisting of a combination of them. Note that initializers must not contain recursive definitions.

If a variable has no initializer, the variable may be initialized with its datatype's default value or else with 0.
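The compile-time evaluation described above can be sketched in Rust as follows. This is only an illustration under stated assumptions, not RuSTy's actual implementation: the `Expr` type, the `eval` function and the error messages are hypothetical names, and only integer addition and subtraction are modelled. The `visiting` set shows one way to reject recursive definitions.

```rust
use std::collections::{HashMap, HashSet};

/// A tiny initializer expression (hypothetical model): a literal, a reference
/// to another constant, or a binary operation on two sub-expressions.
#[derive(Debug, Clone)]
pub enum Expr {
    Literal(i64),
    Const(String),
    Add(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
}

/// Evaluates `expr` against the table of constant initializers.
/// `visiting` holds the constants currently being resolved, so a constant
/// that (directly or indirectly) refers to itself is reported as an error.
pub fn eval(
    expr: &Expr,
    constants: &HashMap<String, Expr>,
    visiting: &mut HashSet<String>,
) -> Result<i64, String> {
    match expr {
        Expr::Literal(value) => Ok(*value),
        Expr::Const(name) => {
            if !visiting.insert(name.clone()) {
                return Err(format!("recursive definition of '{}'", name));
            }
            let initializer = constants
                .get(name)
                .ok_or_else(|| format!("unknown constant '{}'", name))?;
            let value = eval(initializer, constants, visiting);
            visiting.remove(name);
            value
        }
        Expr::Add(left, right) => {
            let (a, b) = (eval(left, constants, visiting)?, eval(right, constants, visiting)?);
            a.checked_add(b).ok_or_else(|| "overflow in initializer".to_string())
        }
        Expr::Sub(left, right) => {
            let (a, b) = (eval(left, constants, visiting)?, eval(right, constants, visiting)?);
            a.checked_sub(b).ok_or_else(|| "overflow in initializer".to_string())
        }
    }
}
```

Calling `eval` with an empty `HashSet` on a constant whose initializer refers back to itself returns an `Err` instead of looping forever.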

Array Initialization

Arrays can be initialized using array literals. If the array-initial value does not contain all required elements, the array's inner type's default value will be used to fill the missing values.

Example

TYPE SignalValue : INT := -1; END_TYPE

VAR_GLOBAL CONSTANT
    MIN_LEN : INT := 1;
    MAX_LEN : INT := 100;

    SIZE : INT := MAX_LEN - MIN_LEN;
END_VAR

PROGRAM PLC_PRG
    VAR_INPUT
        signals: ARRAY[0..SIZE] OF SignalValue := [99, 99]; (* rest is -1 *)
    END_VAR

    ...
END_PROGRAM
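The fill rule for partially initialized arrays can be illustrated with a small Rust sketch. The function name `init_array` and the choice of `i16` (matching INT) are assumptions made for this example; it does not show how RuSTy generates the initializer internally.

```rust
/// Builds the initial contents of an array with `len` elements.
/// Elements given in `literal` are used first; any remaining slots are
/// filled with `type_default`, the default value of the element type.
/// Surplus literal elements are ignored in this sketch.
pub fn init_array(len: usize, literal: &[i16], type_default: i16) -> Vec<i16> {
    let mut values = vec![type_default; len];
    for (slot, value) in values.iter_mut().zip(literal) {
        *slot = *value;
    }
    values
}
```

For the declaration above, `init_array(100, &[99, 99], -1)` yields two elements of 99 followed by 98 elements of -1, since ARRAY[0..99] has 100 elements.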

Datatypes

Numeric types

A variety of numeric types exist with different sizes and properties complying with IEC61131.

Overview

Type name    Size      Properties
SINT         8 bit     signed
USINT        8 bit     unsigned
INT          16 bit    signed
UINT         16 bit    unsigned
DINT         32 bit    signed
UDINT        32 bit    unsigned
LINT         64 bit    signed
ULINT        64 bit    unsigned
REAL         32 bit    float
LREAL        64 bit    float

When such a variable is declared without being initialized, it will be default-initialized with a value of 0 or 0.0 respectively.

A word on integer literals

Integer literals can be prefixed with either 2# (binary), 8# (octal) or 16# (hexadecimal). They will then be treated with regard to the respective number system.

Examples:

  • i1 : DINT := 42; - declares and initializes a 32bit signed integer with value 42.
  • i1 : DINT := 2#101010; - declares and initializes a 32bit signed integer with value 42.
  • i1 : DINT := 8#52; - declares and initializes a 32bit signed integer with value 42.
  • i1 : DINT := 16#2A; - declares and initializes a 32bit signed integer with value 42.
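As a rough illustration of how the radix prefixes map to values, the following Rust sketch parses such literals with the standard library. The function name `parse_int_literal` is hypothetical, and treating `_` as a digit separator is an assumption based on literals like 16#AB_CD used elsewhere in this book; RuSTy's own lexer and parser may handle these cases differently.

```rust
/// Parses an integer literal such as `42`, `2#101010`, `8#52` or `16#2A`.
/// Returns `None` if the prefix is unsupported or the digits are invalid.
pub fn parse_int_literal(text: &str) -> Option<i128> {
    let (radix, digits) = match text.split_once('#') {
        Some(("2", digits)) => (2, digits),
        Some(("8", digits)) => (8, digits),
        Some(("16", digits)) => (16, digits),
        Some(_) => return None,
        None => (10, text),
    };
    // Assumption: underscores only separate digit groups and carry no value.
    let digits: String = digits.chars().filter(|c| *c != '_').collect();
    i128::from_str_radix(&digits, radix).ok()
}
```

All four example literals above evaluate to the same value when passed through this function.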

Strings

Overview

Type name    Size    Encoding
STRING       n+1     UTF-8
WSTRING      2n+2    UTF-16

When such a variable is declared without being initialized, it will be default-initialized with a value of '' or "" respectively (empty strings).

STRING

RuSTy treats STRINGs as byte-arrays storing UTF-8 character bytes with a Null-terminator (0-byte) at the end. So a String of size n requires n+1 bytes to account for the Null-terminator. A STRING literal is surrounded by single quotes '.

A String has a well-defined length, which is declared using a syntax similar to arrays. A String-variable myVariable: STRING[20] declares a byte array of length 21 to store 20 UTF-8 character bytes. When declaring a STRING, the length-attribute is optional. The default length is 80.

Examples:

  • s1 : STRING; - declares a String of length 80.
  • s2 : STRING[20]; - declares a String of length 20.
  • s3 : STRING := 'Hello World'; - declares a String of length 80 and initializes it with the UTF-8 characters and a null-terminator at the end.
  • s4 : STRING[55] := 'Foo Baz'; - declares a String of length 55 and initializes it with the UTF-8 characters and a null-terminator at the end.
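The storage layout described above can be sketched in Rust. The names `string_storage_size` and `encode_string` are hypothetical, and rejecting a value that does not fit is a choice made for this sketch; what RuSTy does with an oversized initializer is not covered here.

```rust
/// Number of characters of a STRING declared without an explicit length.
pub const DEFAULT_STRING_LEN: usize = 80;

/// Bytes reserved for a `STRING[len]`: `len` bytes plus the null-terminator.
pub fn string_storage_size(len: usize) -> usize {
    len + 1
}

/// Lays out `value` in the buffer of a `STRING[len]`, padding with 0-bytes.
/// Returns `None` if the UTF-8 bytes of `value` do not fit (sketch behavior).
pub fn encode_string(value: &str, len: usize) -> Option<Vec<u8>> {
    let bytes = value.as_bytes();
    if bytes.len() > len {
        return None;
    }
    let mut buffer = vec![0u8; string_storage_size(len)];
    buffer[..bytes.len()].copy_from_slice(bytes);
    Some(buffer)
}
```

For example, `encode_string("Foo Baz", 55)` produces a 56-byte buffer that starts with the seven character bytes followed by 0-bytes.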

WSTRING (Wide Strings)

RuSTy treats WSTRINGs as byte-arrays storing UTF-16 character bytes with two Null-terminator bytes at the end. The bytes are stored in Little Endian encoding. A Wide-String of size n requires 2 * (n+1) bytes to account for the 2 bytes per UTF-16 character and the Null-terminators. A WSTRING literal is surrounded by double quotes ".

A WSTRING has a well-defined length, which is declared using a syntax similar to arrays. A WSTRING-variable myVariable: WSTRING[20] declares a byte array of length 42 to store 20 UTF-16 characters. When declaring a WSTRING, the length-attribute is optional. The default length is 80.

Examples:

  • ws1 : WSTRING; - declares a Wide-String of length 80.
  • ws2 : WSTRING[20]; - declares a Wide-String of length 20.
  • ws3 : WSTRING := "Hello World"; - declares a Wide-String of length 80 and initializes it with the UTF-16 characters and a UTF-16 null-terminator at the end.
  • ws4 : WSTRING[55] := "Foo Baz"; - declares a Wide-String of length 55 and initializes it with the UTF-16 characters and a UTF-16 null-terminator at the end.
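A comparable Rust sketch for wide strings shows the 2 * (n+1) byte layout with Little Endian code units. The function names are hypothetical, the length is counted in UTF-16 code units, and returning `None` for values that do not fit is an assumption of this sketch rather than documented RuSTy behavior.

```rust
/// Bytes reserved for a `WSTRING[len]`: two bytes per UTF-16 code unit
/// plus two bytes for the null-terminator.
pub fn wstring_storage_size(len: usize) -> usize {
    2 * (len + 1)
}

/// Lays out `value` as Little Endian UTF-16 in the buffer of a `WSTRING[len]`.
/// Returns `None` if `value` needs more than `len` code units (sketch behavior).
pub fn encode_wstring(value: &str, len: usize) -> Option<Vec<u8>> {
    let units: Vec<u16> = value.encode_utf16().collect();
    if units.len() > len {
        return None;
    }
    let mut buffer = vec![0u8; wstring_storage_size(len)];
    for (i, unit) in units.iter().enumerate() {
        buffer[2 * i..2 * i + 2].copy_from_slice(&unit.to_le_bytes());
    }
    Some(buffer)
}
```

With this layout, the low byte of each code unit comes first, so the character H (0x0048) is stored as the bytes 0x48, 0x00.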

Date and Time

Overview

Type name        Size      Internally stored as
TIME             64 bit    Timespan in nanoseconds
TIME_OF_DAY      64 bit    Nanoseconds since Jan 1, 1970 UTC
DATE             64 bit    Nanoseconds since Jan 1, 1970 UTC
DATE_AND_TIME    64 bit    Nanoseconds since Jan 1, 1970 UTC

Note that RuSTy already treats TIME, TIME_OF_DAY, DATE and DATE_AND_TIME as 64 bit numbers. Therefore the long counterparts LTIME, LTOD, LDATE and LDT are mere aliases of the original types.

DATE

The DATE datatype is used to represent a Date in the Gregorian Calendar. Such a value is stored as an i64 with a precision in nanoseconds and denotes the number of nanoseconds that have elapsed since January 1, 1970 UTC not counting leap seconds. DATE literals start with DATE# or D# followed by a date in the format of yyyy-mm-dd.

Examples:

  • d1 : DATE := DATE#2021-05-02;
  • d2 : DATE := DATE#1-12-24;
  • d3 : DATE := D#2000-1-1;
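To make the internal representation concrete, here is a Rust sketch that converts a Gregorian date into nanoseconds since January 1, 1970 using a well-known days-from-civil calculation. The function names are hypothetical and this is not RuSTy's own conversion code. Since a signed 64-bit nanosecond count only spans roughly 292 years around 1970, the sketch returns `None` for dates outside that range; how RuSTy treats such values is not covered here.

```rust
const NANOS_PER_DAY: i64 = 86_400 * 1_000_000_000;

/// Days between 1970-01-01 and the given date in the proleptic Gregorian calendar.
pub fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Treat January and February as months 11 and 12 of the previous year,
    // so the leap day falls at the end of the shifted year.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let year_of_era = y.rem_euclid(400);
    let shifted_month = if month > 2 { month - 3 } else { month + 9 };
    let day_of_year = (153 * shifted_month + 2) / 5 + day - 1;
    let day_of_era = year_of_era * 365 + year_of_era / 4 - year_of_era / 100 + day_of_year;
    era * 146_097 + day_of_era - 719_468
}

/// Nanoseconds since the epoch for a date, or `None` if it does not fit into an i64.
pub fn date_to_nanos(year: i64, month: i64, day: i64) -> Option<i64> {
    days_from_civil(year, month, day).checked_mul(NANOS_PER_DAY)
}

/// Parses a literal of the form `DATE#yyyy-mm-dd` or `D#yyyy-mm-dd`.
pub fn parse_date_literal(text: &str) -> Option<i64> {
    let rest = text.strip_prefix("DATE#").or_else(|| text.strip_prefix("D#"))?;
    let mut parts = rest.split('-').map(|part| part.parse::<i64>().ok());
    let (year, month, day) = (parts.next()??, parts.next()??, parts.next()??);
    if parts.next().is_some() || !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    date_to_nanos(year, month, day)
}
```

For instance, `parse_date_literal("DATE#2021-05-02")` corresponds to 18749 days after the epoch, each counted as 86,400 seconds.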

DATE_AND_TIME

The DATE_AND_TIME datatype is used to represent a certain point in time in the Gregorian Calendar. Such a value is stored as an i64 with a precision in nanoseconds and denotes the number of nanoseconds that have elapsed since January 1, 1970 UTC not counting leap seconds. DATE_AND_TIME literals start with DATE_AND_TIME# or DT# followed by a date and time in the format of yyyy-mm-dd-hh:mm:ss.

Note that only the seconds-segment can have a fraction denoting the milliseconds.

Examples:

  • d1 : DATE_AND_TIME := DATE_AND_TIME#2021-05-02-14:20:10.25;
  • d2 : DATE_AND_TIME := DATE_AND_TIME#1-12-24-00:00:1;
  • d3 : DATE_AND_TIME := DT#1999-12-31-23:59:59.999;

TIME_OF_DAY

The TIME_OF_DAY datatype is used to represent a specific moment in time in a day. Such a value is stored as an i64 value with a precision in nanoseconds and denotes the number of nanoseconds that have elapsed since January 1, 1970 UTC not counting leap seconds. Hence this value is stored as a DATE_AND_TIME with the day fixed to 1970-01-01. TIME_OF_DAY literals start with TIME_OF_DAY# or TOD# followed by a time in the format of hh:mm:ss.

Note that only the seconds-segment can have a fraction denoting the milliseconds.

Examples:

  • t1 : TIME_OF_DAY := TIME_OF_DAY#14:20:10.25;
  • t2 : TIME_OF_DAY := TIME_OF_DAY#0:00:1;
  • t3 : TIME_OF_DAY := TOD#23:59:59.999;

TIME

The TIME datatype is used to represent a time-span. A TIME value is stored as an i64 value with a precision in nanoseconds. TIME literals start with TIME# or T# followed by the TIME segments.

Supported segments are:

  • d ... f64 days
  • h ... f64 hours
  • m ... f64 minutes
  • s ... f64 seconds
  • ms ... f64 milliseconds
  • us ... f64 microseconds
  • ns ... u32 nanoseconds

Note that only the last segment of a TIME literal can have a fraction.

Examples:

  • t1 : TIME := TIME#2d4h6m8s10ms;
  • t2 : TIME := T#2d4.2h;
  • t3 : TIME := T#-10s4ms16ns;
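The way the segments add up to a single nanosecond count can be sketched in Rust. The names `unit_nanos` and `parse_time_literal` are hypothetical; the sketch accumulates in f64 and rounds at the end, and it does not enforce the rule that only the last segment may carry a fraction, so it is more permissive than the description above.

```rust
/// Nanoseconds per unit, keyed by the segment suffix.
fn unit_nanos(unit: &str) -> Option<f64> {
    Some(match unit {
        "d" => 86_400.0 * 1e9,
        "h" => 3_600.0 * 1e9,
        "m" => 60.0 * 1e9,
        "s" => 1e9,
        "ms" => 1e6,
        "us" => 1e3,
        "ns" => 1.0,
        _ => return None,
    })
}

/// Converts a literal such as `T#2d4h6m8s10ms` into nanoseconds.
pub fn parse_time_literal(text: &str) -> Option<i64> {
    let rest = text.strip_prefix("TIME#").or_else(|| text.strip_prefix("T#"))?;
    let (sign, mut rest) = match rest.strip_prefix('-') {
        Some(unsigned) => (-1.0, unsigned),
        None => (1.0, rest),
    };
    if rest.is_empty() {
        return None;
    }
    let mut total = 0.0_f64;
    while !rest.is_empty() {
        // Split off the numeric part (digits and an optional fraction) ...
        let number_len = rest
            .find(|c: char| !(c.is_ascii_digit() || c == '.'))
            .unwrap_or(rest.len());
        let (number, tail) = rest.split_at(number_len);
        // ... and the unit suffix that follows it.
        let unit_len = tail
            .find(|c: char| !c.is_ascii_alphabetic())
            .unwrap_or(tail.len());
        let (unit, next) = tail.split_at(unit_len);
        total += number.parse::<f64>().ok()? * unit_nanos(unit)?;
        rest = next;
    }
    Some((sign * total).round() as i64)
}
```

The first example, T#2d4h6m8s10ms, adds up to 187,568 seconds plus 10 milliseconds, which this function returns as a nanosecond count.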

Other types

The BOOL type can either be assigned TRUE or FALSE. The type __VOID is the empty type and has an undefined size.

Type name    Size         Properties
BOOL         8 bit        signed
__VOID       undefined

Bit datatypes are defined as follows:

Type name    Size      Properties
BYTE         8 bit     unsigned
WORD         16 bit    unsigned
DWORD        32 bit    unsigned
LWORD        64 bit    unsigned

Direct (Bit) Access on Variables

The IEC61131-3 Standard allows reading specific Bits, Bytes, Words or DWords from an ANY_BIT type. RuSTy supports this functionality and extends it to support all INT types.

Constant based Direct Access

To access a bit sequence in a variable, a direct access instruction %<Type><Value> is used.

Type is the bit sequence size required and is described as follows:

Type    Size    Example
X       1       %X1
B       8       %B1
W       16      %W1
D       32      %D1

For Bit access, the %X is optional.

Example

FUNCTION main : DINT
VAR
    variable    : LWORD;
    bitTarget   : BOOL;
    bitTarget2  : BOOL;
    byteTarget  : BYTE;
    wordTarget  : WORD;
    dwordTarget : DWORD;
END_VAR

variable    := 16#AB_CD_EF_12_34_56_78_90;
bitTarget   := variable.%X63; (*Access last bit*)
byteTarget  := variable.%B7; (*Access last byte*)
wordTarget  := variable.%W3; (*Access last word*)
dwordTarget := variable.%D1; (*Access last dword*)
(*Chaining an access is also allowed *)
bitTarget2  := variable.%D1.%W1.%B1.%X1;

END_FUNCTION
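The effect of these access instructions can be approximated with shifts in Rust. This sketch assumes that the index counts from the least significant bit, byte, word or dword, which matches the "last bit" comment on %X63 above but is otherwise an assumption; the function names are hypothetical, and the index is expected to be within the range of the value.

```rust
/// `%X<index>`: reads one bit, counting from the least significant bit.
pub fn bit(value: u64, index: u32) -> bool {
    (value >> index) & 1 == 1
}

/// `%B<index>`: reads one byte (8 bit).
pub fn byte(value: u64, index: u32) -> u8 {
    (value >> (8 * index)) as u8
}

/// `%W<index>`: reads one word (16 bit).
pub fn word(value: u64, index: u32) -> u16 {
    (value >> (16 * index)) as u16
}

/// `%D<index>`: reads one dword (32 bit).
pub fn dword(value: u64, index: u32) -> u32 {
    (value >> (32 * index)) as u32
}
```

A chained access such as variable.%D1.%W1 then corresponds to `word(dword(variable, 1) as u64, 1)`, where each step narrows the value before the next index is applied.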

Variable based Direct Access

While the IEC61131-3 Standard only defines variable access using constant int literals, RuSTy additionally supports access using variables. The syntax for a variable based access is %<Type><Variable>. The provided variable has to be a direct reference variable (non-qualified).

Short hand access for Bit (Without the %X modifier) is not allowed.

Example

FUNCTION main : DINT
VAR
    variable    : LWORD;
    access_var  : INT;
    bitTarget   : BOOL;
    bitTarget2  : BOOL;
    byteTarget  : BYTE;
    wordTarget  : WORD;
    dwordTarget : DWORD;
END_VAR
variable    := 16#AB_CD_EF_12_34_56_78_90;
access_var := 63;
bitTarget   := variable.%Xaccess_var; (*Access last bit*)
access_var := 7;
byteTarget  := variable.%Baccess_var; (*Access last byte*)
access_var := 3;
wordTarget  := variable.%Waccess_var; (*Access last word*)
access_var := 1;
dwordTarget := variable.%Daccess_var; (*Access last dword*)
(*Chaining an access is also allowed *)
bitTarget2  := variable.%Daccess_var.%Waccess_var.%Baccess_var.%Xaccess_var;
END_FUNCTION

Architecture

Overview

RuSTy is a compiler for IEC61131-3 languages. At the moment, ST and CFC ("FBD") are supported. It utilizes the LLVM compiler infrastructure and contributes a Structured Text frontend that translates Structured Text into LLVM's language independent intermediate representation (IR). CFC uses a M2M-transformation and reuses most of the ST frontend for compilation. The further optimization and native code generation is performed by the existing LLVM infrastructure, namely LLVM's common optimizer and the platform specific backend (see here).

    ┌──────────────────┐    ┌───────────────┐    ┌────────────────┐
    │                  │    │               │    │                │
    │      RuSTy       │    │  LLVM Common  │    │  LLVM Backend  │
    │                  ├───►│               ├───►│                │
    │  LLVM Frontend   │    │   Optimizer   │    │   (e.g Clang)  │
    │                  │    │               │    │                │
    └──────────────────┘    └───────────────┘    └────────────────┘

So RuSTy consists of the frontend part of the llvm compiler-infrastructure. This means that this compiler can benefit from llvm's existing compiler-optimizations, as well as all backend target platforms available.

Rusty Frontend Architecture

Ultimately the goal of a compiler frontend is to translate the original source code into the infrastructure's intermediate representation (in this case we're talking about LLVM IR). RuSTy treats this task as a compilation step of its own. While a fully fledged compiler generates machine code as a last step, RuSTy generates LLVM IR assembly code.

Structured Text

      ┌────────┐                                                          ┌────────┐
      │ Source │                                                          │  LLVM  │
      │        │                                                          │   IR   │
      │ Files  │                                                          │        │
      └───┬────┘                                                          └────────┘
          │                                                                    ▲
          ▼                                                                    │
    ┌────────────┐   ┌────────────┐   ┌────────────┐   ┌────────────┐   ┌──────┴─────┐
    │            │   │            │   │            │   │            │   │            │
    │            │   │            │   │            │   │            │   │            │
    │   Parser   ├──►│   Indexer  ├──►│   Linker   ├──►│ Validation ├──►│   Codegen  │
    │            │   │            │   │            │   │            │   │            │
    │            │   │            │   │            │   │            │   │            │
    └────────────┘   └────────────┘   └────────────┘   └────────────┘   └────────────┘

CFC/FBD

         ┌────────┐                                                            ┌────────┐
         │ Source │                                                            │  LLVM  │
         │        │                                                            │   IR   │
         │ Files  │                                                            │        │
         └───┬────┘                                                            └────────┘
             │                                                                      ▲
             ▼                                                                      │
    ┌────────────────┐    ┌────────────┐   ┌────────────┐   ┌────────────┐   ┌──────┴─────┐
    │                │    │            │   │            │   │            │   │            │
    │ Model-to-Model │    │            │   │            │   │            │   │            │
    │ Transformation ├───►│   Indexer  ├──►│   Linker   ├──►│ Validation ├──►│   Codegen  │
    │                │    │            │   │            │   │            │   │            │
    │                │    │            │   │            │   │            │   │            │
    └────────────────┘    └────────────┘   └────────────┘   └────────────┘   └────────────┘

Parser

The role of the parser is to turn source code, which is fed in as a string (in the form of files), into a tree representation of that source code. This tree is typically called the Abstract Syntax Tree (AST). Parsing consists of two distinct stages. The first is the lexical analysis, performed by the lexer, which produces a token stream. The second is the syntactical analysis, performed by the parser, which constructs the syntax tree from that token stream.

                                                                   ┌──┐
       ┌──────────────┐                                            │  │
       │              │                                            └──┘
       │  Source Code │                                            /  \
       │              │   ┌─────────┐        ┌──────────┐         /    \
       │  ──────────  │   │         │        │          │     ┌──┐      ┌──┐
       │              ├───►  Lexer  │        │  Parser  ├────►│  │      │  │
       │  ─────────   │   │         │        │          │     └──┘      └──┘
       │              │   └────┬────┘        └──────────┘      /\        /\
       │  ────        │        │                  ▲           /  \      /  \
       │              │        │                  │        ┌──┐ ┌──┐ ┌──┐ ┌──┐
       │  ────────    │        ▼                  │        │  │ │  │ │  │ │  │
       │              │   ┌───────────────────────┴──┐     └──┘ └──┘ └──┘ └──┘
       │              │   │                          │
       └──────────────┘   │  ┌───┐ ┌───┐ ┌───┐ ┌───┐ │       Abstract Syntax
                          │  │ T │ │ T │ │ T │ │...│ │            Tree
                          │  └───┘ └───┘ └───┘ └───┘ │
                          │                          │
                          └──────────────────────────┘
                                 Token-Stream

Lexer

The lexer performs the lexical analysis. This step turns the source-string into a sequence of well known tokens. The Lexer (or sometimes also called tokenizer) splits the source-string into tokens (or words). Each token has a distinct type which corresponds to a grammar's element. Typical token-types are keywords, numbers, identifiers, brackets, dots, etc. So with the help of this token-stream it is much easier for the parser to spot certain patterns. E.g. a floating-point number consists of the token-sequence: number, dot, number.

The lexer is implemented in the lexer-module. It uses the logos crate to create a lexer that is able to identify all different terminal-symbols. Compared to other languages, Structured Text has a quite high number of keywords and other tokens, so RuSTy's lexer identifies a quite large number of different tokens.
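To illustrate what a token stream looks like, here is a deliberately small hand-written tokenizer in Rust. It does not use the logos crate and is not RuSTy's lexer; the `Token` variants and the `tokenize` function are hypothetical and cover only identifiers, numbers, `:=`, `;` and `.`.

```rust
/// A handful of token types (illustrative subset only).
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Identifier(String),
    Number(String),
    KeywordAssignment, // :=
    Semicolon,
    Dot,
}

/// Splits `source` into tokens, skipping whitespace.
pub fn tokenize(source: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = source.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c.is_ascii_alphabetic() || c == '_' {
            // Identifiers: a letter or underscore followed by letters, digits, underscores.
            let start = i;
            while i < chars.len() && (chars[i].is_ascii_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            tokens.push(Token::Identifier(chars[start..i].iter().collect()));
        } else if c.is_ascii_digit() {
            // Numbers: a run of decimal digits.
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            tokens.push(Token::Number(chars[start..i].iter().collect()));
        } else if c == ':' && chars.get(i + 1) == Some(&'=') {
            tokens.push(Token::KeywordAssignment);
            i += 2;
        } else if c == ';' {
            tokens.push(Token::Semicolon);
            i += 1;
        } else if c == '.' {
            tokens.push(Token::Dot);
            i += 1;
        } else {
            return Err(format!("unexpected character '{}' at offset {}", c, i));
        }
    }
    Ok(tokens)
}
```

With this subset, the input `3.14` is split into a number, a dot and another number, which is the token sequence for a floating-point number mentioned above.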

Parser

The parser takes the token stream and creates the corresponding AST that represents the source code in a structured, hierarchical way. The parser is implemented in the parser module whereas the model for the AST is implemented in the ast module.

AST - Abstract Syntax Tree

The abstract syntax tree is a tree representation of the source code. Some parser implementations use a generic tree-data-structure consisting of Nodes which can have an arbitrary number of children. These nodes usually have dynamic properties like a type and an optional value and sometimes they even have dynamic properties stored in a map to make this representation even more flexible.

While this approach needs very little source code we decided to favour a less flexible approach. The RuSTy-AST models every single ast-node as its own struct with all necessary fields including the possible child-nodes. While this approach needs much more code and hand-written changes, its benefits lie in the clearness and simplicity of the data-structure. Every element of the AST is easily identified, debugged and understood. E.g. while in a generic node based AST it is easily possible to have a binary-statement with no, one, or seven child-nodes, the RuSTy-AST enforces the structure of every node. So the RuSTy-Binary-Statement has exactly two children. It is impossible to construct it differently.

Example

So an assignment a := 3; will be parsed with the help of the following Structures:

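struct Reference {
    name: String,
}

struct LiteralInteger {
    value: i128,
}

// `AstStatement` (not shown here) stands for any AST node; boxing the
// children gives the recursive structure a fixed size.
struct Assignment {
    left: Box<AstStatement>,
    right: Box<AstStatement>,
}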
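A self-contained variant of these structures can be written as a single Rust enum, which makes it easy to see how the fixed shape of each node is enforced by the type system. Modelling `AstStatement` as an enum with struct-like variants is an assumption for this sketch, and the `to_source` helper is purely illustrative.

```rust
/// Every node kind has a fixed set of fields (hypothetical, reduced AST).
#[derive(Debug, PartialEq)]
pub enum AstStatement {
    Reference { name: String },
    LiteralInteger { value: i128 },
    Assignment { left: Box<AstStatement>, right: Box<AstStatement> },
}

/// Builds the tree for an assignment of an integer literal to a variable.
pub fn assignment(name: &str, value: i128) -> AstStatement {
    AstStatement::Assignment {
        left: Box::new(AstStatement::Reference { name: name.to_string() }),
        right: Box::new(AstStatement::LiteralInteger { value }),
    }
}

/// Renders a node back into source-like text by walking the tree.
pub fn to_source(node: &AstStatement) -> String {
    match node {
        AstStatement::Reference { name } => name.clone(),
        AstStatement::LiteralInteger { value } => value.to_string(),
        AstStatement::Assignment { left, right } => {
            format!("{} := {}", to_source(left), to_source(right))
        }
    }
}
```

Because `Assignment` has exactly the fields `left` and `right`, an assignment with zero or three children cannot be expressed with this type.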

Recursive Descent Parser

There are a lot of different frameworks to generate parsers from formal grammars. While they generate highly optimized parsers, we wanted more control over and a better understanding of the parsing process and the resulting AST. Since we were pretty new to Rust at that point in time, writing the parser by hand also gave us more practice. Using a parser-generator framework will definitely be an option for future improvements.

As for now, the parser is a hand-written recursive descent parser inside the parser-module.

As the parser reads the token stream Reference, KeywordEquals, Number, Semicolon it instantiates the corresponding syntax tree:

                      ┌─────────────────┐
                      │   Assignment    │
                      └──────┬──┬───────┘
                   left      │  │     right
                 ┌───────────┘  └──────────┐
                 ▼                         ▼
        ┌──────────────────┐     ┌──────────────────┐
        │    Reference     │     │  LiteralInteger  │
        ├──────────────────┤     ├──────────────────┤
        │    name: 'a'     │     │    value: '3'    │
        └──────────────────┘     └──────────────────┘
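The recursive descent idea can be shown with a minimal Rust sketch in which each grammar rule becomes one method. The token names follow the stream mentioned above, but the `Parser` type, its methods and the reduced grammar are hypothetical and much simpler than RuSTy's parser module.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Reference(String),
    KeywordEquals,
    Number(i128),
    Semicolon,
}

#[derive(Debug, PartialEq)]
pub enum AstStatement {
    Reference { name: String },
    LiteralInteger { value: i128 },
    Assignment { left: Box<AstStatement>, right: Box<AstStatement> },
}

pub struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    pub fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, pos: 0 }
    }

    /// Returns the current token (if any) and moves to the next one.
    fn advance(&mut self) -> Option<Token> {
        let token = self.tokens.get(self.pos).cloned();
        self.pos += 1;
        token
    }

    /// Consumes the current token and checks that it is the expected one.
    fn expect(&mut self, expected: Token) -> Result<(), String> {
        let found = self.advance();
        if found.as_ref() == Some(&expected) {
            Ok(())
        } else {
            Err(format!("expected {:?}, found {:?}", expected, found))
        }
    }

    /// statement := expression KeywordEquals expression Semicolon
    pub fn parse_statement(&mut self) -> Result<AstStatement, String> {
        let left = self.parse_expression()?;
        self.expect(Token::KeywordEquals)?;
        let right = self.parse_expression()?;
        self.expect(Token::Semicolon)?;
        Ok(AstStatement::Assignment { left: Box::new(left), right: Box::new(right) })
    }

    /// expression := Reference | Number
    fn parse_expression(&mut self) -> Result<AstStatement, String> {
        match self.advance() {
            Some(Token::Reference(name)) => Ok(AstStatement::Reference { name }),
            Some(Token::Number(value)) => Ok(AstStatement::LiteralInteger { value }),
            other => Err(format!("expected reference or number, found {:?}", other)),
        }
    }
}
```

Feeding the four tokens of `a := 3;` into `Parser::new(...).parse_statement()` yields an `Assignment` node whose children are a `Reference` and a `LiteralInteger`.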

Indexer

The indexing step is responsible for building and maintaining the Symbol-Table (also called Index). The Index contains all known referable objects such as variables, data-types, POUs, Functions, etc. The Symbol-Table also maintains additional information about every referable object, such as the object's type, the object's datatype, etc.

Indexing is performed by the index module. It contains the index itself (a.k.a. Symbol Table), the visitor which collects all global names and their additional information as well as a data structure that handles compile time constant expressions (constant_expressions).

The Index (Symbol Table)

The index stores information about all referable elements of the program. Depending on the type of element, we store different meta-information alongside the name of the element.

Index Field              Description
global_variables         All global variables accessible via their name.
enum_global_variables    All enum elements accessible via their name (as if they were global variables, e.g. 'RED').
member_variables         Member variables of structured types (Structs, Functionblocks, etc.). This map allows querying all members of a container by name.
implementations          All callable implementations (Programs, Functions, Actions, Functionblocks) accessible by their name.
pous                     All POUs (Programs, Functions, Functionblocks) with additional information.
type_index               All data-types (intrinsic and complex) accessible via their name.
constant_expressions     The results of constant expressions that can be evaluated at compile time (e.g. the initializer of a constant: VAR_GLOBAL CONSTANT TAU : LREAL := 3.1415 * 2; END_VAR).
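A symbol table along these lines can be sketched in Rust with nested hash maps. Only two of the fields listed above are modelled, and the `VariableEntry` type, the method names and the plain (case-sensitive) string keys are assumptions for illustration rather than the layout of RuSTy's index module.

```rust
use std::collections::HashMap;

/// Reduced variable entry (hypothetical subset of the real information).
#[derive(Debug, Clone, PartialEq)]
pub struct VariableEntry {
    pub name: String,
    pub data_type_name: String,
    pub is_constant: bool,
}

/// A minimal index with global variables and members of containers.
#[derive(Debug, Default)]
pub struct Index {
    global_variables: HashMap<String, VariableEntry>,
    /// container name -> (member name -> entry)
    member_variables: HashMap<String, HashMap<String, VariableEntry>>,
}

impl Index {
    pub fn register_global(&mut self, entry: VariableEntry) {
        self.global_variables.insert(entry.name.clone(), entry);
    }

    pub fn register_member(&mut self, container: &str, entry: VariableEntry) {
        self.member_variables
            .entry(container.to_string())
            .or_default()
            .insert(entry.name.clone(), entry);
    }

    pub fn find_global(&self, name: &str) -> Option<&VariableEntry> {
        self.global_variables.get(name)
    }

    pub fn find_member(&self, container: &str, name: &str) -> Option<&VariableEntry> {
        self.member_variables.get(container)?.get(name)
    }
}
```

Later compilation steps can then ask the index questions such as "is there a member x in PLC_PRG, and what is its datatype?" without walking the AST again.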

There are four different types of entries in the index:

  • VariableIndexEntry The VariableIndexEntry holds information about every Variable in the source code and offers additional information relevant for linking, validation and code-generation.
        ┌─────────────────────────────┐              ┌─────────────────┐
        │  VariableIndexEntry         │              │     <enum>      │
        │                             │              │   VariableType  │
        ├─────────────────────────────┤   var_type   ├─────────────────┤
        │                             │              │  - Local        │
        │  - name: String             ├─────────────►│  - Temp         │
        │  - qualified_name: String   │              │  - Input        │
        │  - is_constant: bool        │              │  - Output       │
        │  - location_in_parent: u32  │              │  - InOut        │
        │  - data_type_name: String   │              │  - Global       │
        │                             │              │  - Return       │
        └───────────┬─────────────────┘              └─────────────────┘
                    │
                    │initial_value
                    │
                    │
                    │            ┌──────────────────┐
                    │            │ ConstExpression  │
                    │       0..1 ├──────────────────┤
                    └───────────►│                  │
                                 │ ...              │
                                 │                  │
                                 └──────────────────┘
  • PouIndexEntry The PouIndexEntry offers information about all Program-Organization-Units. The index entry offers information like the name of an instance-struct, the name of the registered implementation, etc.
┌──────────────────────────┐
│       <abstract>         │
│       POUIndexEntry      │
├──────────────────────────┤
│                          │
└──────────────────────────┘
             ▲
             │
             │
             │     ┌──────────────────────────┐      ┌──────────────────────────┐
             │     │    ProgramIndexEntry     │      │    GenericParameter      │
             │     ├──────────────────────────┤      ├──────────────────────────┤
             │     │ - name: String           │      │ - name: String           │
             ├─────┤ - instanceStruct: String ├──┬──►│ - typeNature: TypeNature │
             │     │                          │  │   │                          │
             │     │                          │  │   │                          │
             │     └──────────────────────────┘  │   └──────────────────────────┘
             │                                   │
             │                                   │
             │                                   │
             │     ┌──────────────────────────┐  │
             │     │    FunctionIndexEntry    │  │ generics
             │     ├──────────────────────────┤  │
             │     │ - name: String           │  │
             ├─────┤                          ├──┤
             │     │                          │  │
             │     │                          │  │
             │     └──────────────────────────┘  │
             │                                   │
             │                                   │
             │                                   │
             │     ┌──────────────────────────┐  │
             │     │ FunctionBlockIndexEntry  │  │
             │     ├──────────────────────────┤  │
             │     │ - name: String           ├──┤
             ├─────┤ - instanceStruct: String │  │
             │     │                          │  │
             │     │                          │  │
             │     └──────────────────────────┘  │
             │                                   │
             │                                   │
             │                                   │
             │     ┌──────────────────────────┐  │
             │     │    ClassIndexEntry       │  │
             │     ├──────────────────────────┤  │
             │     │ - name: String           │  │
             └─────┤ - instanceStruct: String ├──┘
                   │                          │
                   │                          │
                   └──────────────────────────┘
  • ImplementationIndexEntry The ImplementationIndexEntry offers information about any callable implementation (Program, Functionblock, Function, etc.). It also offers metadata about the implementation type, the name of the method to call and the name of the parameter-struct (this-struct) to pass to the function.
                                                  ┌───────────────────────┐
        ┌──────────────────────────┐              │       <enum>          │
        │ ImplementationIndexEntry │              │   ImplementationType  │
        ├──────────────────────────┤     type     │                       │
        │                          ├─────────────►├───────────────────────┤
        │ - call_name: String      │              │   - Program           │
        │ - type_name: String      │              │   - Function          │
        │                          │              │   - FunctionBlock     │
        └──────────────────────────┘              │   - Action            │
                                                  │   - Class             │
                                                  │   - Method            │
                                                  │                       │
                                                  └───────────────────────┘
  • DataType The entry for a DataType offers information about any data-type supported by the program to be compiled (internal data types as well as user defined data types). For each data-type we offer additional information such as its initial value, its type-nature (in terms of generic functions, e.g. ANY_INT) and some additional information about the type's internal structure and size (e.g. is it a number/array/struct/etc).
                      ┌─────────────┐                   ┌────────────────────┐
                      │  DataType   │                   │ ConstantExpression │
                      ├─────────────┤   initial_value   ├────────────────────┤
                      │             ├──────────────────►│                    │
                      │ - name      │                   │  ...               │
                      │             ├─────────┐         │                    │
                      └──────┬──────┘         │         └────────────────────┘
                             │                │
                             │                │         ┌────────────────────┐
                             │                │         │ TypeNature         │
                             │                │         ├────────────────────┤
                             │ information    │         │ - Any              │
                             │                └────────►│ - Derived          │
                             │                nature    │ - Elementary       │
                             │                          │ - Num              │
                             ▼                          │ - Int              │
                      ┌───────────────────────┐         │ - Signed           │
                      │    <abstract>         │         │ - ...              │
                      │  DataTypeInformation  │         └────────────────────┘
                      ├───────────────────────┤
                      │                       │
                      └───────────────────────┘
                                  ▲
                                  │
                                  │
                                  │
         ┌────────────────┬───────┴───────┬──────────────┬──────────────┐
         │                │               │              │              │
┌────────┴───────┐ ┌──────┴──────┐ ┌──────┴─────┐  ┌─────┴──────┐  ┌────┴─────┐
│ Struct         │ │  Array      │ │ Integer    │  │  String    │  │ ...      │
├────────────────┤ ├─────────────┤ ├────────────┤  ├────────────┤  ├──────────┤
│ - name         │ │- name       │ │ - name     │  │ - size     │  │ ...      │
│ - members      │ │- inner_type │ │ - signed   │  │ - encoding │  │          │
│                │ │- dimensions │ │ - size     │  │            │  │          │
└────────────────┘ └─────────────┘ └────────────┘  └────────────┘  └──────────┘

Linker

The linker's task is to decide what every reference in the source code points to. Structured Text has several kinds of references:

  • Variable references: in x := 4, x is a reference to the variable x.
  • Type references: in i : MyFunctionBlock, MyFunctionBlock is a reference to the declared FunctionBlock.
  • Program references: in PLC_PRG.x := 4, PLC_PRG is a reference to a Program-POU called PLC_PRG.
  • Function references: in max(a, b), max is a reference to a Function-POU called max.

So the linker decides where a reference points to. A reference has a corresponding declaration that matches the reference's name:

        PROGRAM PLC_PRG
             VAR

        ┌──────► x : INT;
        │
        │    END_VAR
        │
        └────┐
             │
             x := 3;
        END_PROGRAM

The linker's results will be used by the semantic validation step and by the code-generation.

The validator decides whether the name you put at a certain location is valid. To decide whether a reference is valid, we need to know what it points to, that is, whether we expect a variable, a datatype or something different.

The code-generation needs to know what certain names mean, in order to successfully generate the IR-code that reflects the behavior of your program.
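
The name matching described above can be sketched as a lookup in an index of declarations. This is a minimal illustration, not RuSTy's implementation: the Index struct, its variables field and the resolve function are hypothetical names, and the rule that POU-local declarations shadow global ones is an assumption made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical index: qualified name of a declared variable -> its type name.
struct Index {
    variables: HashMap<String, String>,
}

impl Index {
    /// Resolves `name` as seen from inside the POU `pou`.
    /// Assumption: POU-local declarations shadow global ones.
    /// Returns the qualified name of the declaration and its type name.
    fn resolve(&self, pou: &str, name: &str) -> Option<(String, &str)> {
        let qualified = format!("{}.{}", pou, name);
        if let Some(ty) = self.variables.get(&qualified) {
            return Some((qualified, ty.as_str()));
        }
        // Fall back to a global declaration with the same name.
        self.variables
            .get(name)
            .map(|ty| (name.to_string(), ty.as_str()))
    }
}
```

With PLC_PRG.x declared as INT, resolving x from inside PLC_PRG yields the qualified name PLC_PRG.x together with the type INT. A name without any matching declaration yields None, which is the case the validator would report.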

Annotated Syntax Tree

The AST generated by the parser is a fairly static data structure, so where should we store the linking information for a reference? Even if we added fields for potential linking information to the AST, Rust's ownership rules would make it hard to fill in this information piece by piece during linking. Instead, we use the arena pattern to handle the different lifetimes of the parts involved (the AST itself is constructed very early in the compilation process, whereas the linking information is allocated later). We don't store the linking information directly in the AST; we store it in the arena data structure and associate it with individual AST elements.

The RuSTy linker stores the linking information in an arena called AnnotationMap. The AnnotationMap can store two types of annotations for any AST-element. The first step is therefore to uniquely identify every single AST-node, so that this ID can serve as the key that associates an annotation stored in the AnnotationMap with its AST-node. The parser assigns a unique ID to every Statement-Tree-Node (note that we only assign IDs to Statements, not to every AST-Node).

So the expression a + 3 now looks like this:

                      ┌─────────────────┐
                      │ BinaryOperation │
                      ├─────────────────┤
                      │  operator: Plus │
                      │  ID: 1          │
                      └──────┬──┬───────┘
                             │  │
                   left      │  │     right
                 ┌───────────┘  └──────────┐
                 │                         │
                 │                         │
                 ▼                         ▼
        ┌──────────────────┐     ┌──────────────────┐
        │    Reference     │     │  LiteralInteger  │
        ├──────────────────┤     ├──────────────────┤
        │    name: 'a'     │     │    value: '3'    │
        │    ID: 2         │     │    ID: 3         │
        └──────────────────┘     └──────────────────┘

The AnnotationMap stores 5 different types of annotation:

  • Value The Value-annotation indicates that this AST-Element resolves to a value with the given resulting datatype. For example, the LiteralInteger(3) node gets a Value-annotation with a resulting type of DINT.
        ┌─────────────────────────┐
        │   Value                 │
        ├─────────────────────────┤
        │                         │
        │  resulting_type: String │
        │                         │
        └─────────────────────────┘
  • Variable The Variable-annotation indicates that this AST-Element resolves to a variable with the given qualified name (and some comfort-information like whether it is a constant and whether it is an auto-deref pointer). Similar to the value-Annotation it also saves the resulting datatype.
        ┌─────────────────────────┐
        │   Variable              │
        ├─────────────────────────┤
        │                         │
        │  resulting_type: String │
        │  qualified_name: String │
        │  constant: bool         │
        │  is_auto_deref: bool    │
        │                         │
        └─────────────────────────┘
  • Function The Function-annotation indicates that this AST-Element resolves to a Function-POU (a call-statement) with the given qualified name. Similar to the value-Annotation it also saves the resulting datatype but this time as the function's return type (return_type).
        ┌─────────────────────────┐
        │   Function              │
        ├─────────────────────────┤
        │                         │
        │  return_type: String    │
        │  qualified_name: String │
        │                         │
        └─────────────────────────┘
  • Type The Type-annotation indicates that this AST-Element resolves to a DataType (e.g. a Declaration: x: INT) with the given name.
        ┌─────────────────────────┐
        │   Type                  │
        ├─────────────────────────┤
        │                         │
        │  type_name: String      │
        │                         │
        └─────────────────────────┘
  • Program The Program-annotation is very similar to the Function-annotation. Since a Program has no return-value it also offers no return-type information.
        ┌─────────────────────────┐
        │   Program               │
        ├─────────────────────────┤
        │                         │
        │  qualified_name: String │
        │                         │
        └─────────────────────────┘

The example expression a + 3 from above is annotated like this (note that the linker must calculate the resulting type of the BinaryOperation by determining the bigger of both operand types):

                  ┌─────────────────┐
                  │ BinaryOperation │
                  ├─────────────────┤
                  │  operator: Plus │
                  │  ID: 1          │
                  └──────┬──┬───────┘
                         │  │
               left      │  │     right
             ┌───────────┘  └──────────┐
             │                         │
             │                         │
             ▼                         ▼
    ┌──────────────────┐     ┌──────────────────┐
    │    Reference     │     │  LiteralInteger  │
    ├──────────────────┤     ├──────────────────┤
    │    name: 'a'     │     │    value: '3'    │
    │    ID: 2         │     │    ID: 3         │
    └──────────────────┘     └──────────────────┘



                             ┌────────────────────────────┐
                             │        Value               │
┌───────────────────┐        ├────────────────────────────┤
│    AnnotationMap  │   ┌───►│  resulting_type: DINT      │
│                   │   │    │                            │
├───────┬───────────┤   │    └────────────────────────────┘
│ ID: 1 │ Value     ├───┘
├───────┼───────────┤        ┌────────────────────────────┐
│ ID: 2 │ Variable  ├────┐   │        Variable            │
├───────┼───────────┤    │   ├────────────────────────────┤
│ ID: 3 │ Value     ├──┐ │   │  resulting_type: SINT      │
└───────┴───────────┘  │ └──►│  qualified_name: PLC_PRG.a │
                       │     │        constant: false     │
                       │     │   is_auto_deref: false     │
                       │     └────────────────────────────┘
                       │
                       │     ┌────────────────────────────┐
                       │     │        Value               │
                       │     ├────────────────────────────┤
                       └────►│  resulting_type: DINT      │
                             │                            │
                             └────────────────────────────┘
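
The relationship between node IDs and annotations can be sketched in Rust as a map that lives outside the AST. The field names follow the diagrams above, but the type names (AstId, StatementAnnotation), the type_map field and the helper functions are assumptions made for this illustration and may differ from the real implementation.

```rust
use std::collections::HashMap;

/// Unique ID the parser assigns to each statement node.
type AstId = usize;

/// The five annotation kinds described above (fields as shown in the diagrams).
#[derive(Debug, Clone, PartialEq)]
enum StatementAnnotation {
    Value { resulting_type: String },
    Variable {
        resulting_type: String,
        qualified_name: String,
        constant: bool,
        is_auto_deref: bool,
    },
    Function { return_type: String, qualified_name: String },
    Type { type_name: String },
    Program { qualified_name: String },
}

/// Hypothetical stand-in for the arena: annotations are keyed by node ID
/// instead of being stored inside the AST.
#[derive(Default)]
struct AnnotationMap {
    type_map: HashMap<AstId, StatementAnnotation>,
}

impl AnnotationMap {
    fn annotate(&mut self, id: AstId, annotation: StatementAnnotation) {
        self.type_map.insert(id, annotation);
    }

    fn get(&self, id: AstId) -> Option<&StatementAnnotation> {
        self.type_map.get(&id)
    }

    /// Name of the type a node resolves to, if its annotation carries one.
    fn type_name(&self, id: AstId) -> Option<&str> {
        match self.get(id)? {
            StatementAnnotation::Value { resulting_type }
            | StatementAnnotation::Variable { resulting_type, .. } => Some(resulting_type.as_str()),
            StatementAnnotation::Function { return_type, .. } => Some(return_type.as_str()),
            StatementAnnotation::Type { type_name } => Some(type_name.as_str()),
            StatementAnnotation::Program { .. } => None,
        }
    }
}

/// Builds the annotations for `a + 3` as drawn in the diagram above.
fn annotate_a_plus_3() -> AnnotationMap {
    let mut map = AnnotationMap::default();
    map.annotate(1, StatementAnnotation::Value { resulting_type: "DINT".into() });
    map.annotate(2, StatementAnnotation::Variable {
        resulting_type: "SINT".into(),
        qualified_name: "PLC_PRG.a".into(),
        constant: false,
        is_auto_deref: false,
    });
    map.annotate(3, StatementAnnotation::Value { resulting_type: "DINT".into() });
    map
}
```

Because the AST only carries IDs, it can stay immutable after parsing while the map is filled incrementally during linking.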

Another example where the annotated AST carries a lot of useful information is with complex expressions like array-expressions or qualified references. Let's consider the following statement:

PLC_PRG.a.b[2]

It is annotated in the following way:

                ┌────────────────────┐
                │ QualifiedReference │
                ├────────────────────┤
                │ ID: 1              │
                └─────────┬──────────┘
                          │          elements: Vec<AstStatement>
                ┌─────────┴──────────┬─────────────────────┐
                │                    │                     │
                ▼                    ▼                     ▼
        ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
        │    Reference     │  │    Reference     │  │   ArrayAccess    │
        ├──────────────────┤  ├──────────────────┤  ├──────────────────┤
        │  name: 'PLC_PRG' │  │  name: 'a'       │  │                  │
        │  ID: 2           │  │  ID: 3           │  │    ID: 4         │
        └──────────────────┘  └──────────────────┘  └─────┬──────┬─────┘
                                                          │      │
                                             reference    │      │       access
                                                 ┌────────┘      └─────────┐
                                                 ▼                         ▼
                                        ┌──────────────────┐   ┌──────────────────┐
                                        │    Reference     │   │  LiteralInteger  │
                                        ├──────────────────┤   ├──────────────────┤
                                        │  name: 'b'       │   │    value: '2'    │
                                        │  ID: 5           │   │    ID: 6         │
                                        └──────────────────┘   └──────────────────┘


                                     ┌────────────────────────────┐
                                     │        Value               │
                                ┌───►├────────────────────────────┤
                                │    │  resulting_type: INT       │
                                │    │                            │
                                │    └────────────────────────────┘
                                │
                                │    ┌────────────────────────────┐
        ┌───────────────────┐   │    │       Program              │
        │    AnnotationMap  │   │ ┌─►├────────────────────────────┤
        │                   │   │ │  │  qualified_name: PLC_PRG   │
        ├───────┬───────────┤   │ │  │                            │
        │ ID: 1 │ Value     ├───┘ │  └────────────────────────────┘
        ├───────┼───────────┤     │
        │ ID: 2 │ Program   ├─────┘  ┌────────────────────────────┐
        ├───────┼───────────┤        │        Variable            │
        │ ID: 3 │ Variable  ├───────►├────────────────────────────┤
        ├───────┼───────────┤        │ resulting_type: MyStruct   │
        │ ID: 4 │ Value     ├─────┐  │ qualified_name: PLC_PRG.a  │
        ├───────┼───────────┤     │  └────────────────────────────┘
        │ ID: 5 │ Variable  ├───┐ │
        ├───────┼───────────┤   │ │  ┌────────────────────────────┐
        │ ID: 6 │ Value     ├─┐ │ │  │        Value               │
        └───────┴───────────┘ │ │ └─►├────────────────────────────┤
                              │ │    │ resulting_type: INT        │
                              │ │    │                            │
                              │ │    └────────────────────────────┘
                              │ │
                              │ │    ┌─────────────────────────────────┐
                              │ │    │        Variable                 │
                              │ └───►├─────────────────────────────────┤
                              │      │ resulting_type : ARRAY[] OF INT │
                              │      │ qualified_name : MyStruct.b     │
                              │      └─────────────────────────────────┘
                              │
                              │      ┌────────────────────────────┐
                              │      │        Value               │
                              │      ├────────────────────────────┤
                              └─────►│ resulting_type: DINT       │
                                     │                            │
                                     └────────────────────────────┘

Type vs. Type-Hint

The AnnotationMap not only offers annotations regarding the AST-node's type, but it also offers a second type of annotation.

Consider the following snippet:

PROGRAM PLC_PRG
   VAR
      x : SINT;
      y : INT;
      z : BYTE;
   END_VAR

   z := x + y;

END_PROGRAM

The assignment z := x + y is loaded with different types:

  • x is annotated as Variable of type SINT and will be auto-upgraded to DINT.
  • y is annotated as Variable of type INT and will be auto-upgraded to DINT.
  • z is annotated as Variable of type BYTE.
  • x + y is annotated as Value of type DINT (the bigger of both).

To make life easier for validation and code-generation, we add an additional annotation to x + y to indicate that, while it technically results in a DINT, it should rather be treated as a BYTE since it is going to be assigned to z. This second annotation is called the type-hint. It indicates that, although this is not the real type of the expression, the program's semantics want the compiler to treat it as this type.

The expression z := x + y is annotated like this:

expression   type annotation   type-hint annotation   explanation
----------   ---------------   --------------------   -----------
x            SINT              DINT                   auto-upgraded to DINT
y            INT               DINT                   auto-upgraded to DINT
z            BYTE              -
x + y        DINT              BYTE                   type-hint indicates that the resulting DINT needs to be cast to BYTE

With the help of the type-hint annotations, the validation can easily decide whether certain type-cast operations are valid. The code-generation steps can decide when to generate casts by simply comparing a node's type annotation and its type-hint annotation.
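
The comparison of type and type-hint can be sketched as follows. NodeTypes and required_cast are hypothetical names used only for this illustration; the sketch works on type names alone and does not check whether a cast is actually legal.

```rust
/// Type annotation and optional type-hint annotation of one AST node (names only).
struct NodeTypes<'a> {
    type_name: &'a str,
    type_hint: Option<&'a str>,
}

/// Returns the cast target if code-generation has to emit a cast for this node.
/// A cast is only needed when a hint exists and differs from the node's real type.
fn required_cast<'a>(node: &NodeTypes<'a>) -> Option<&'a str> {
    match node.type_hint {
        Some(hint) if hint != node.type_name => Some(hint),
        _ => None,
    }
}
```

Applied to the example, x (SINT with hint DINT) and x + y (DINT with hint BYTE) each require a cast, while z (BYTE, no hint) does not.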

Dependencies

When generating multiple units, the Linker keeps track of a dependency-tree for each unit. This means that every datatype or global variable referenced directly or indirectly by the module is marked as a dependency. This information can then be used during code generation to only generate types and variables that are relevant to the unit.
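
Collecting direct and indirect dependencies amounts to a reachability search. The following is a minimal sketch under stated assumptions: collect_dependencies is a hypothetical function, the references map (name to directly referenced names) is an invented input format, and the result includes the starting names themselves.

```rust
use std::collections::{BTreeSet, HashMap};

/// Collects everything reachable from `roots` in a "references" graph.
/// `references` maps a type or global variable name to the names it refers to directly.
fn collect_dependencies<'a>(
    roots: &[&'a str],
    references: &HashMap<&'a str, Vec<&'a str>>,
) -> BTreeSet<String> {
    let mut found = BTreeSet::new();
    let mut pending: Vec<&'a str> = roots.to_vec();
    while let Some(name) = pending.pop() {
        // Only follow a name the first time it is seen; this also terminates on cycles.
        if found.insert(name.to_string()) {
            if let Some(direct) = references.get(name) {
                pending.extend(direct.iter().copied());
            }
        }
    }
    found
}
```

Anything that is not in the returned set is irrelevant to the unit and can be skipped when generating it.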

Validation

The validation module implements the semantic validation step of the compiler. The validator is a hand-written visitor that offers a callback for each individual AST-node it visits, in which the different validation tasks are performed.

The validation rules are implemented in dedicated validator-structs:

Validator            Responsibilities
-------------------  ----------------
global_validator     Semantic rules on the level of declarations as a whole (e.g. name-conflicts).
pou_validator        Semantic rules on the level of programs, functions and function-blocks.
recursive_validator  Semantic rules on the level of recursion (e.g. struct referencing itself).
stmt_validator       Semantic rules on the level of statements (e.g. invalid type-casts).
variable_validator   Semantic rules on the level of variable declarations (e.g. empty var-blocks, empty structs, etc.).

Diagnostics

Problems (semantic or syntactic) are represented as Diagnostics [1]. Diagnostics carry information on the exact location inside the source-string (start- & end-offset), a custom message and a unique error-number to identify the problem.

There are 3 types of Diagnostics:

Diagnostic    Description
------------  -----------
SyntaxError   A syntax error is a diagnostic created by the parser when it discovers a token-stream that does not match the language's grammar.
GeneralError  General errors are problems that occurred during the compilation process and cannot be linked to malformed input (e.g. file-I/O problems, internal LLVM errors, etc.).
Improvement   Problems that may not prevent successful compilation but are still considered a flaw in the source-code (e.g. using the proprietary POINTER TO instead of the norm-compliant REF_TO).

[1] The diagnostics are subject to change since they don't elegantly represent the different types of problems (e.g. semantic problems).
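
To make the description concrete, a diagnostic could be modelled roughly as below. This is only a sketch: the struct layout, the field names, the render function and the error number used in the usage example are hypothetical and do not claim to mirror RuSTy's actual Diagnostic type.

```rust
use std::ops::Range;

/// The three kinds of diagnostics listed above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DiagnosticKind {
    SyntaxError,
    GeneralError,
    Improvement,
}

/// Hypothetical shape of a diagnostic: kind, message, source range and error number.
#[derive(Debug, Clone, PartialEq)]
struct Diagnostic {
    kind: DiagnosticKind,
    message: String,
    /// Start and end offset (in bytes) inside the source string.
    range: Range<usize>,
    err_no: u32,
}

impl Diagnostic {
    /// Formats the diagnostic together with the offending piece of source code.
    fn render(&self, source: &str) -> String {
        let snippet = source.get(self.range.clone()).unwrap_or("<invalid range>");
        format!(
            "{:?} E{:03}: {} at {}..{} ('{}')",
            self.kind, self.err_no, self.message, self.range.start, self.range.end, snippet
        )
    }
}
```

For example, an Improvement covering the offsets 8..18 of the source VAR x : POINTER TO INT; END_VAR points exactly at the text POINTER TO.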

Code-Generation

The codegen module contains all code that turns the parsed and verified code, represented as an AST, into LLVM IR. To generate the IR we use a crate that wraps the native LLVM C-API.

The code-generator is basically a transformation from the ST-AST into an IR-Tree representation. Therefore the AST is traversed in a visitor-like way and transformed simultaneously.

The code generation is split into specialized sub-generators for different tasks:

Generator             Responsibilities
--------------------  ----------------
pou_generator         The pou-generator takes care of generating the programming organization units (Programs, FunctionBlocks, Functions) including their signature and body. More specialized tasks are delegated to other generators.
data_type_generator   Generates complex datatypes like Structs, Arrays, Enums, Strings, etc.
variable_generator    Generates global variables and their initialization.
statement_generator   Generates everything of the body of a POU except expressions. Non-expressions include: IFs, Loops, Assignments, etc.
expression_generator  Generates expressions (everything that possibly resolves to a value) including: call-statements, references, array-access, etc.

Generating POUs

Generating POUs (Programs, Function-Blocks, Functions) must generate the POU's body itself, as well as the POU's interface (or state) variables. In this segment we focus on generating the interface for a POU. Further information about generating a POU's body can be found [here].

Programs

A program is a static POU with some code attached. This means that there is exactly one instance: no matter where it is called from, every caller uses the exact same instance. As a result, you may see the residuals of the last caller in the program's variables when you call it yourself.

PROGRAM prg
    VAR
        x : DINT;
        y : DINT;
    END_VAR

END_PROGRAM

The program's interface is persistent across calls, so we store it in a global variable. Therefore the code-generator creates a dedicated struct-type called prg_interface. A global variable called prg_instance is generated to store the program's state across calls. This global instance variable is passed as a this pointer to calls to the prg function.

%prg_interface = type { i32, i32 }

@prg_instance = global %prg_interface zeroinitializer

define void @prg(%prg_interface* %this) {
entry:
  ret void
}

FunctionBlocks

A FunctionBlock is a POU that is instantiated in a declaration. In contrast to Programs, a FunctionBlock can therefore have multiple instances. Nevertheless, the code-generator uses a very similar strategy: a struct-type for the FunctionBlock's interface is created, but no global instance-variable is allocated. Instead, the function block can be used as a DataType to declare instances, as in the following example:

FUNCTION_BLOCK foo
  VAR_INPUT
    x, y : INT;
  END_VAR
END_FUNCTION_BLOCK

PROGRAM prg
  VAR
    f : foo;
  END_VAR
END_PROGRAM

For the given example, the code-generator creates a type for the FunctionBlock's state (foo_interface). The instance of foo declared in prg's interface shows up in the program's generated interface struct-type (prg_interface).

; ModuleID = 'main'
source_filename = "main"

%prg_interface = type { %foo_interface }
%foo_interface = type { i16, i16 }

@prg_instance = global %prg_interface zeroinitializer

define void @foo(%foo_interface* %0) {
entry:
  ret void
}

define void @prg(%prg_interface* %0) {
entry:
  ret void
}

Functions

Functions are generated very similarly to programs and function blocks. The main difference is that no instance-global is allocated and the function's interface-type cannot be used as a datatype to declare your own instances. Instead, an instance of the function's interface-type is allocated whenever the function is called and lives only for the duration of that call. Otherwise the code generated for functions is comparable to the code presented above for programs and function-blocks.

Generating Data Types

IEC61131-3 languages offer a wide range of data types. Next to the built-in intrinsic data types, we support the following user-defined data types:

Range Types

For range types we don't generate special code. Internally the new data type just becomes an alias for the derived type.

Pointer Types

For pointer types we don't generate special code. Internally the new data type just becomes an alias for the pointer-type.

Struct Types

Struct types translate directly to LLVM struct datatypes. We generate a new datatype with the user-type's name for the struct.

TYPE MyStruct:
  STRUCT
    a: DINT;
    b: INT;
  END_STRUCT
END_TYPE

This struct simply generates an LLVM struct type:

%MyStruct = type { i32, i16 }
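
The mapping from member types to an LLVM struct definition, which is also what the interface types of programs and function blocks look like, can be sketched as plain string generation. This is an illustration only: llvm_type and struct_definition are hypothetical helpers, the type table covers just the elementary types used in this chapter, and the real generator builds the types through the LLVM API rather than by formatting text.

```rust
/// Maps a few elementary ST types to LLVM IR types (only the ones used in this chapter).
fn llvm_type(st_type: &str) -> Option<&'static str> {
    match st_type {
        "SINT" | "BYTE" => Some("i8"),
        "INT" => Some("i16"),
        "DINT" => Some("i32"),
        "REAL" => Some("float"),
        _ => None,
    }
}

/// Renders the LLVM struct definition for a struct type or a POU interface.
/// Returns `None` if any member type is not in the table above.
fn struct_definition(name: &str, member_types: &[&str]) -> Option<String> {
    let members: Option<Vec<&str>> = member_types.iter().map(|t| llvm_type(t)).collect();
    Some(format!("%{} = type {{ {} }}", name, members?.join(", ")))
}
```

Calling struct_definition("MyStruct", &["DINT", "INT"]) reproduces the definition shown above for MyStruct.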

Enum Types

Enumerations are represented as DINT.

TYPE MyEnum: (red, yellow, green);
END_TYPE

For each element of the enum we generate a global variable holding the element's value.

@red = global i32 0
@yellow = global i32 1
@green = global i32 2
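
As a rough sketch of this numbering, the helper below renders one global per element. enum_globals is a hypothetical function, and it assumes that no element has an explicit value, so elements are simply numbered from 0 in declaration order.

```rust
/// Renders one LLVM global per enum element.
/// Assumption: elements carry no explicit values and are numbered from 0 in declaration order.
fn enum_globals(elements: &[&str]) -> Vec<String> {
    elements
        .iter()
        .enumerate()
        .map(|(value, name)| format!("@{} = global i32 {}", name, value))
        .collect()
}
```

For MyEnum, enum_globals(&["red", "yellow", "green"]) yields the three globals listed above.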

Array Types

Array types are generated as fixed-size LLVM array types (note that array types must be fixed-size in ST):

TYPE MyArray: ARRAY[0..9] OF INT;
END_TYPE

VAR_GLOBAL
  x : MyArray;
  y : ARRAY[0..5] OF REAL;
END_VAR

Custom array data types are not reflected as dedicated types on the llvm-level.

@x = global [10 x i16] zeroinitializer
@y = global [6 x float] zeroinitializer

Multi dimensional arrays

Arrays can be declared as multi-dimensional:

VAR_GLOBAL
  x : ARRAY[0..5, 2..5, 0..1] OF INT;
END_VAR

The compiler flattens this kind of array to a single dimension. To do so, it calculates the total length by multiplying the sizes of all dimensions:

    0..5 x 2..5 x 0..1
      6  x   4  x   2  = 48

So the array x : ARRAY[0..5, 2..5, 0..1] OF INT; will be generated as:

@x = global [48 x i16] zeroinitializer
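
The length calculation can be written down in a few lines of Rust. The function names are hypothetical, and the offset calculation additionally assumes a row-major layout (the last index varies fastest), which the text above does not state.

```rust
use std::ops::RangeInclusive;

/// Number of elements in one dimension such as `2..5`.
fn dimension_len(dim: &RangeInclusive<i64>) -> i64 {
    dim.end() - dim.start() + 1
}

/// Total length of the flattened array: the product of all dimension sizes.
fn flattened_len(dims: &[RangeInclusive<i64>]) -> i64 {
    dims.iter().map(dimension_len).product()
}

/// Offset of one element inside the flattened array.
/// Assumption: row-major layout, i.e. the last index varies fastest.
/// Returns `None` if the number of indices is wrong or an index is out of bounds.
fn flattened_offset(dims: &[RangeInclusive<i64>], indices: &[i64]) -> Option<i64> {
    if dims.len() != indices.len() {
        return None;
    }
    let mut offset = 0;
    for (dim, &index) in dims.iter().zip(indices) {
        if !dim.contains(&index) {
            return None;
        }
        offset = offset * dimension_len(dim) + (index - dim.start());
    }
    Some(offset)
}
```

For the dimensions 0..5, 2..5 and 0..1, flattened_len computes 6 x 4 x 2 = 48 elements, and under the row-major assumption the last element x[5, 5, 1] ends up at offset 47.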

This means that such a multidimensional array must be initialized like a single-dimensional array:

  • wrong
VAR_GLOBAL
  wrong_array : ARRAY[1..3, 0..2] OF INT := [ [10, 11, 12],
                                              [20, 21, 22],
                                              [30, 31, 32]];
END_VAR
  • correct
VAR_GLOBAL
  correct_array : ARRAY[1..3, 0..2] OF INT := [ 10, 11, 12,
                                                20, 21, 22,
                                                30, 31, 32];
END_VAR

Nested Arrays

Note that arrays declared as x : ARRAY[0..2] OF ARRAY[0..2] OF INT are different from the multi-dimensional arrays discussed in this section. Nested arrays are represented as multi-dimensional arrays on the LLVM-IR level and must also be initialized using nested array-literals!

String Types

String types are generated as fixed-size LLVM array types.

VAR_GLOBAL
    str  : STRING[20];
    wstr : WSTRING[20];
END_VAR

Strings can be represented in two different encodings: UTF-8 (STRING) or UTF-16 (WSTRING).

@str = global [21 x i8] zeroinitializer
@wstr = global [21 x i16] zeroinitializer

CFC (Continuous Function Chart)

RuSTy is compatible with CFC, as per the FBD part detailed in the IEC61131-3 XML-exchange format. The CFC implementation borrows extensively from the ST compiler-pipeline, with the exception that the lexical analysis and parsing phases are replaced by a model-to-model conversion process. This involves converting the XML into a structured model, which is then converted into ST AST statements.

The next chapter will walk you through the CFC implementation, giving you a better understanding of the underlying code.

Model-to-Model Conversion

As previously mentioned, the lexical and parsing phases are replaced by a model-to-model conversion process which consists of two steps:

  1. Transform the input file (XML) into a data-model
  2. Transform the data-model into an AST

XML to Data-Model

Consider the heavily minified CFC file MyProgram.cfc, which translates to the CFC chart below.

                   x                      MyAdd
            ┌─────────────┐        ┌─────────────────┐
            │             │        │    exec_id:0    │
            │             ├───────►│ a               │                 z
            │ local_id: 1 │        │ ref_local_id: 1 │          ┌──────────────┐
            └─────────────┘        │                 │          │  exec_id: 1  │
                   y               │                 ├─────────►│              │
            ┌─────────────┐        │                 │          │ref_local_id:3│
            │             │        │                 │          └──────────────┘
            │             ├───────►│ b               │             local_id: 4
            │ local_id:2  │        │ ref_local_id: 2 │
            └─────────────┘        └─────────────────┘
                                       local_id: 3

The initial phase of the transformation process involves streaming the entire input file. During the streaming process, whenever important keywords such as block are encountered, they are directly mapped into a corresponding model structure. For example, when reaching the line <block localId="3" ...> within the XML file, we generate a model that can be represented as follows:

Block {
    local_id: 3,
    type_name: "MyAdd",
    instance_name: None,
    execution_order_id: 0,
    variables: [
        InputVariable  { ... }, // x, with localId = 1
        InputVariable  { ... }, // y, with localId = 2
        OutputVariable { ... }, // MyAdd eventually becoming `z := MyAdd`, with z having a localId = 4
    ]
}

This process is repeated for every element in the input file which has a corresponding model implementation. For more information on implementation details, see the model folder.

Since the CFC programming language uses blocks and their interconnections to establish the program's logic flow, and the sequencing of block execution and inter-block links is represented through the corresponding localId, refLocalId and executionOrderId attributes, we have to order the elements by their execution ID before proceeding to the next phase. Otherwise the generated AST statements would be out of order and hence semantically incorrect.
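
The ordering step can be sketched as a filter followed by a sort. The Element struct is a hypothetical, trimmed-down model element; treating elements without an execution order ID (such as input variables) as not executable on their own is an assumption made for this example.

```rust
/// Hypothetical, trimmed-down model element: only what is needed for ordering.
#[derive(Debug, Clone, PartialEq)]
struct Element {
    name: String,
    local_id: u32,
    /// `None` for elements that are not executed on their own (e.g. input variables).
    execution_order_id: Option<u32>,
}

/// Returns the executable elements sorted by their execution order ID.
fn in_execution_order(elements: &[Element]) -> Vec<&Element> {
    let mut ordered: Vec<&Element> = elements
        .iter()
        .filter(|e| e.execution_order_id.is_some())
        .collect();
    ordered.sort_by_key(|e| e.execution_order_id);
    ordered
}
```

Regardless of the order in which the elements appear in the XML file, the block MyAdd (execution order 0) is processed before the output variable z (execution order 1).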

Data-Model to AST

The final part of the model-to-model transformation takes the input from the previous step and transforms it into an AST which the compiler pipeline understands and can generate code from. Consider the previous block example: the transformer first encounters the element with the executionOrderId of 0, which is a call to MyAdd. We then check and transform each parameter, with the inputs a and b corresponding to the variables x and y respectively. The result of this transformation looks as follows:

CallStatement {
    operator: MyAdd,
    parameters: [x, y]
}

Next, we process the element with an executionOrderId of 1, which corresponds to an assignment of the previous call's result to z. This step updates the generated AST as follows:

AssignmentStatement {
    left: z,
    right: CallStatement {
        operator: MyAdd,
        parameters: [x, y]
    }
}
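
The two transformation steps can be sketched with a tiny stand-in AST. The Ast enum and the functions transform_block, transform_out_variable and to_st are hypothetical and far simpler than the compiler's real AST; they only illustrate how the call produced in the first step gets wrapped into an assignment in the second.

```rust
/// Tiny stand-in AST, just enough for the example above.
#[derive(Debug, Clone, PartialEq)]
enum Ast {
    Reference(String),
    Call { operator: String, parameters: Vec<Ast> },
    Assignment { left: Box<Ast>, right: Box<Ast> },
}

/// Step 1: a block becomes a call with the connected variables as parameters.
fn transform_block(type_name: &str, inputs: &[&str]) -> Ast {
    Ast::Call {
        operator: type_name.to_string(),
        parameters: inputs
            .iter()
            .map(|name| Ast::Reference(name.to_string()))
            .collect(),
    }
}

/// Step 2: an out-variable connected to a block result becomes `target := <call>`.
fn transform_out_variable(target: &str, source: Ast) -> Ast {
    Ast::Assignment {
        left: Box::new(Ast::Reference(target.to_string())),
        right: Box::new(source),
    }
}

/// Renders the AST as ST source text, which makes the result easy to inspect.
fn to_st(ast: &Ast) -> String {
    match ast {
        Ast::Reference(name) => name.clone(),
        Ast::Call { operator, parameters } => {
            let args: Vec<String> = parameters.iter().map(to_st).collect();
            format!("{}({})", operator, args.join(", "))
        }
        Ast::Assignment { left, right } => format!("{} := {}", to_st(left), to_st(right)),
    }
}
```

Transforming the block with the inputs x and y and then wrapping the result for the out-variable z gives an AST that renders as z := MyAdd(x, y).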

While this explanation covers the handling of blocks and variables, there are other elements (e.g. control-flow), that are not discussed here. For more information on implementation details, see plc_xml/src/xml_parser.

Finally, after transforming all elements into their respective AST statements, the result is passed to the indexer and subsequently enters the next stages of the compiler pipeline, as described in the architecture documentation.

Appendix

MyAdd.st

FUNCTION MyAdd : DINT
    VAR_INPUT
        x, y : DINT;
    END_VAR

    MyAdd := x + y;
END_FUNCTION

MyProgram.cfc

<pou xmlns="http://www.plcopen.org/xml/tc6_0201" name="myProgram" pouType="program">
    <content>
        PROGRAM myProgram
            VAR
                x, y, z : DINT;
            END_VAR
    </content>
    <body>
        <FBD>
            <inVariable localId="1" height="20" width="80" negated="false">
                <expression>x</expression>
            </inVariable>
            <inVariable localId="2" height="20" width="80" negated="false">
                <expression>y</expression>
            </inVariable>
            <block localId="3" width="74" height="60" typeName="MyAdd" executionOrderId="0">
                <inputVariables>
                    <variable formalParameter="x" negated="false">
                        <connectionPointIn>
                            <connection refLocalId="1"/>
                        </connectionPointIn>
                    </variable>
                    <variable formalParameter="y" negated="false">
                        <connectionPointIn>
                            <connection refLocalId="2"/>
                        </connectionPointIn>
                    </variable>
                </inputVariables>
                <outputVariables>
                    <variable formalParameter="MyAdd" negated="false"/>
                </outputVariables>
            </block>
            <outVariable localId="4" height="20" width="80" executionOrderId="1" negated="false" storage="none">
                <position x="680" y="160"/>
                <connectionPointIn>
                    <connection refLocalId="3" formalParameter="MyAdd"/>
                </connectionPointIn>
                <expression>z</expression>
            </outVariable>
        </FBD>
    </body>
</pou>