When the accelerator is executed via the Accelerator Engine, it produces a zip file containing a set of files. The purpose of the engine section is to describe precisely how the contents of that zip file are to be created.

accelerator:
   ... 
engine:
  <transform-definition>  # <--- focus of this document

Why 'Transforms'?

The result of running an accelerator is produced somehow from the contents of the accelerator itself. It is made up out of subsets of the files taken from the accelerator <root> directory (and its subdirectories). The files can be copied verbatim or transformed in a number of ways before being added to the result.

As such the yaml notation in the engine section defines a transformation that takes as input a set of files (the stuff in the <root> directory of the accelerator) and produces as output another set of files (the files to be put into the zip).

Every transform has a type. Different types of transform have different behaviors and different yaml properties that control precisely what they do.

Let's look at a simple example to make this clearer. A transform of type Include is a 'filter'. It takes as input a set of files and produces as output a subset of those files, retaining only those files whose path matches any one of a given list of patterns.

So if our accelerator has something like this:

engine:
  type: Include
  patterns: ['**/*.java']

Then this accelerator produces a zip file containing all the .java files from the accelerator <root> (or its subdirectories) but nothing else.
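To make the 'filter' idea concrete, here is a minimal Python sketch of what an Include does to a set of files. It is only a model, not the engine's implementation: the set of files is represented as a list of (path, content) pairs, the helper name include is made up for illustration, and Python's fnmatch is used as a rough stand-in for the engine's path patterns (so '*.java' here plays the role of '**/*.java').

```python
from fnmatch import fnmatch

# Hypothetical model: a "set of files" is an ordered list of (path, content) pairs.
files = [
    ("pom.xml", "<artifactId>hello-fun</artifactId>"),
    ("src/main/java/App.java", "class App {}"),
    ("README.md", "# hello-fun"),
]

def include(files, patterns):
    """Keep only the files whose path matches at least one pattern."""
    return [(p, c) for (p, c) in files if any(fnmatch(p, pat) for pat in patterns)]

# fnmatch's '*' also matches '/', so '*.java' selects .java files at any depth.
java_only = include(files, ["*.java"])
```

In this model the output is always a subset of the input, and file contents are never touched.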

Transforms can also operate on the contents of a file (instead of just selecting it for inclusion).

For example:

type: ReplaceText
substitutions:
- text: hello-fun
  with: "#artifactId"

This transform looks for all occurrences of a string hello-fun in all its input files and replaces them with an artifactId (which is the result of evaluating a SpEL expression).
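The following Python sketch models this behaviour under simplifying assumptions. The function replace_text and the options dictionary are hypothetical, and the "#artifactId" expression is resolved with a plain dictionary lookup rather than a real SpEL evaluator.

```python
def replace_text(files, substitutions, options):
    """Apply every substitution to the content of every input file."""
    out = []
    for path, content in files:
        for sub in substitutions:
            # Assumption: "#name" is looked up in `options`; real SpEL is far richer.
            value = options[sub["with"].lstrip("#")]
            content = content.replace(sub["text"], value)
        out.append((path, content))
    return out

files = [("pom.xml", "<artifactId>hello-fun</artifactId>")]
result = replace_text(
    files,
    [{"text": "hello-fun", "with": "#artifactId"}],
    {"artifactId": "my-app"},
)
```

Note that paths are left alone; only the contents change.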

Combining Transforms

It should be obvious from the above examples that transforms like ReplaceText and Include are too 'primitive' to be useful just by themselves.

They are meant to be used as smaller building blocks to be composed into more complex accelerators.

To combine transforms we provide two operators called Chain and Merge. These operators are 'recursive' in the sense that they compose a number of 'child' transforms to create a more complex transform. This allows building up arbitrarily deep/complex trees of nested transform definitions.

Let's try to understand what each of these two operators does with an example; and then also try to understand the typical way that they would be used together.

Chain

Since transforms are functions whose input and output are of the same type (a set of files), you can take the output of one function and feed it as input to another. This is what Chain does. In mathematical terms, Chain is function composition.

For an example of when/why we might want to do that, consider the ReplaceText transform again. Used by itself, it replaces text strings in all the accelerator input files. But what if we wanted to apply this replacement only to a subset of the files? Then we can use an Include filter to select only a subset of files of interest and chain that subset into ReplaceText. For example:

type: Chain
transformations: 
- type: Include
  patterns: ['**/pom.xml']
- type: ReplaceText
  substitutions:
  - text: hello-fun
    with: "#artifactId"
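Viewed as function composition, the Chain above can be sketched in a few lines of Python. This is an illustrative model only: chain, include_suffix and replace_text are made-up helpers, files are (path, content) pairs, and a simple suffix check replaces real pattern matching.

```python
from functools import reduce

def chain(*transforms):
    """Compose transforms left to right: each output feeds the next input."""
    return lambda files: reduce(lambda acc, t: t(acc), transforms, files)

def include_suffix(suffix):
    # Stand-in for Include: keep files whose path ends with `suffix`.
    return lambda files: [(p, c) for p, c in files if p.endswith(suffix)]

def replace_text(old, new):
    # Stand-in for ReplaceText: rewrite the content of every file it receives.
    return lambda files: [(p, c.replace(old, new)) for p, c in files]

files = [("pom.xml", "hello-fun"), ("App.java", "// hello-fun")]
pom_only = chain(include_suffix("pom.xml"), replace_text("hello-fun", "my-app"))
result = pom_only(files)
```

Only pom.xml reaches the replacement step, so App.java is neither modified nor present in the result.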

Merge

Chaining Include into ReplaceText limits the scope of ReplaceText to a subset of the input files. But unfortunately it also eliminates all the other files from the result. For example:

engine:
  type: Chain
  transformations: 
  - type: Include
    patterns: ['**/pom.xml']
  - type: ReplaceText
    substitutions:
    - text: hello-fun
      with: "#artifactId"

The above accelerator produces a zip file that only contains pom.xml files and nothing else.

What if we also wanted to have other files in that zip? Maybe we want to include some Java files as well (but we don't want to apply the same text replacement to them).

It may be tempting to write something like this:

engine:
  type: Chain
  transformations: 
  - type: Include  # <--- only `pom.xml` files can pass through this section of the chain
    patterns: ['**/pom.xml']
  - type: ReplaceText
    ...
  # <--- at this point in the chain the sourceset only has `pom.xml` files included.
  - type: Include  # <--- only `.java` files can pass through this section of the chain
    patterns: ['**/*.java'] 
  # <--- ... so at this point in the chain there are no more files

Unfortunately that doesn't work. Chaining non-overlapping includes like this always yields an empty result: the first include retains only pom.xml files, and those files are fed to the next transform in the chain. The second include retains only .java files, but since only pom.xml files are left in its input at this point, the result is an empty set.

This is where Merge comes in. A Merge takes the outputs of several transforms executed independently on the same input sourceset and combines or 'merges' them together into a single sourceset.

So for example:

engine:
  type: Merge
  sources: 
  - type: Chain
    transformations:
    - type: Include
      patterns: ['**/pom.xml']
    - type: ReplaceText
      ...
  - type: Include
    patterns: ['**/*.java']

The above accelerator produces a result which includes both:

  • the pom.xml files with some text replacements applied to them.
  • verbatim copies of all the .java files.
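The difference between the failed Chain and the working Merge can be reproduced with a small Python model. The helpers below are hypothetical stand-ins (suffix checks instead of patterns, (path, content) pairs instead of real files), and merge simply concatenates the outputs of its children in order.

```python
from functools import reduce

def chain(*transforms):
    return lambda files: reduce(lambda acc, t: t(acc), transforms, files)

def merge(*sources):
    """Run every source on the same input and concatenate the outputs in order."""
    return lambda files: [f for source in sources for f in source(files)]

def include_suffix(suffix):
    return lambda files: [(p, c) for p, c in files if p.endswith(suffix)]

def replace_text(old, new):
    return lambda files: [(p, c.replace(old, new)) for p, c in files]

files = [("pom.xml", "hello-fun"), ("App.java", "// hello-fun"), ("README.md", "x")]

# Chaining two non-overlapping includes leaves nothing.
chained = chain(include_suffix("pom.xml"), include_suffix(".java"))(files)

# Merging gives each child the full input, so both subsets survive.
merged = merge(
    chain(include_suffix("pom.xml"), replace_text("hello-fun", "my-app")),
    include_suffix(".java"),
)(files)
```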

Shortened notation

It gets cumbersome and verbose to combine transforms like Include, Exclude and ReplaceText with explicit Chain and Merge operators. Also there is a 'natural' and very common composition pattern to using them (i.e. select an interesting subset using includes/excludes; apply a chain of additional transformations to the subset; then merge the result with other stuff produced by other transforms).

That is why we provide a 'swiss army knife' transform (aka the Combo transform) that combines 'Include', 'Exclude', 'Merge' and 'Chain' together in a natural way. Here's an example / template of what it looks like:

type: Combo  # <--- Combo is the default type, so this line can be omitted.
include: ['**/*.txt', '**/*.md']
exclude: ['**/secret/*']
merge: 
- <transform-definition>
- ...
chain:
- <transform-definition>
- ...

Each of the properties in this Combo transform is optional (as long as you specify at least one).

Notice how each of the properties include, exclude, merge and chain corresponds to the name of a type of transform (but spelled with lower case letters).

Intuitively if you specify only one of the properties the Combo transform behaves exactly the same as if you used that type of transformation by itself.

So, for example:

merge: ...

Behaves the same as:

type: Merge
sources: ...

When you do specify multiple properties at the same time then the 'combo' transform composes them together in a "logical way" using a combination of Merge and Chain under the hood.

So for example:

include: ['**/*.txt', '**.md']
chain: 
- type: ReplaceText
  ...

Is the same as:

type: Chain
transformations:
- type: Include
  patterns: ['**/*.txt', '**.md']
- type: Chain
  transformations:
  - type: ReplaceText
    ...

When you use all of the properties of Combo at once:

include: I
exclude: E
merge: 
- S1
- S2
chain:
- T1
- T2

This is equivalent to:

type: Chain
transformations:
- type: Include
  patterns: I
- type: Exclude
  patterns: E
- type: Merge
  sources: 
  - S1
  - S2
- T1
- T2
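The equivalence above can also be expressed as a small Python sketch. It models only the desugaring shown here (include, then exclude, then merge, then the chained steps); it deliberately leaves out conflict resolution and anything else the real Combo may do. All names (combo, include_if, exclude_if, merge_sources, chain_steps) are invented for illustration.

```python
from functools import reduce

def chain(steps):
    return lambda files: reduce(lambda acc, step: step(acc), steps, files)

def merge(sources):
    return lambda files: [f for source in sources for f in source(files)]

def include(pred):
    return lambda files: [(p, c) for p, c in files if pred(p)]

def exclude(pred):
    return lambda files: [(p, c) for p, c in files if not pred(p)]

def combo(include_if=None, exclude_if=None, merge_sources=None, chain_steps=None):
    """Desugar the Combo properties into a single Chain, in the order shown above."""
    steps = []
    if include_if is not None:
        steps.append(include(include_if))
    if exclude_if is not None:
        steps.append(exclude(exclude_if))
    if merge_sources:
        steps.append(merge(merge_sources))
    steps.extend(chain_steps or [])
    return chain(steps)

upper = lambda files: [(p, c.upper()) for p, c in files]  # stand-in for T1
files = [("a.txt", "x"), ("secret/b.txt", "y"), ("c.java", "z")]
transform = combo(
    include_if=lambda p: p.endswith(".txt"),
    exclude_if=lambda p: p.startswith("secret/"),
    chain_steps=[upper],
)
result = transform(files)
```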

TODO: Add a boxes and arrows 'picture' of the above combo transform?

A Combo of one?

Note that you can use Combo as a convenient shorthand for a single type of transform. Combining multiple types is its main purpose, but nothing requires you to do so. For example:

include: ["**/*.java"]

This is a Combo transform (remember: type: Combo is optional). But rather than combining multiple types of transforms, it only defines the include property. This makes it behave exactly the same as an Include transform:

type: Include
patterns: ["**/*.java"]

Therefore usually it is more convenient to use a Combo transform to denote a single Include, Exclude, Chain or Merge transform as it is slightly shorter to write it as a Combo than writing it with an explicit type: property.

A Common Pattern with Merge Transforms

It is a common and useful pattern to use merges with overlapping contents to apply a transformation to a subset of files and then replace these changed files within a bigger context.

For example:

engine:
  merge: # <--- a merge (using Combo transform shortcut notation)
  - include: ["**/*"] # Transform A selects all files (this includes 'pom.xml')
  - include: ["**/pom.xml"] # Transform B (selects only poms and replaces some text in them)
    chain: 
    - type: ReplaceText
      substitutions: ...

The above accelerator will copy all files from the accelerator <root> whilst applying some text replacements only to pom.xml files (other files are copied verbatim). Let's understand exactly how this works by picking it apart.

Transform A is applied to the files from accelerator <root>. It selects all files. Note that this also includes pom.xml files.

Transform B is also applied to the files from accelerator <root> (remember that Merge passes the same input independently to each of its child transforms). Transform B selects pom.xml files and replaces some text in them.

So both Transform A and Transform B output pom.xml files. The fact that both result sets contain the same file (and with different contents in them in this case) is a kind of conflict that has to be resolved. By default, Combo follows a very simple rule to resolve such conflicts: just take the contents from the last child. So essentially it behaves as if you overlaid both result sets one after another into the same location, and the contents of the latter overwrite any files that were already placed there by the earlier one.

In this example that means that, while both Transform A and Transform B produce contents for pom.xml, the contents from Transform B 'win', so you get the version of the pom.xml that has text replacements applied to it (rather than the verbatim copy from Transform A).
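The 'last one wins' rule can be modelled in Python as follows. This is a sketch under assumptions: the merged output is written out by hand as a list of (path, content) pairs, and use_last is a hypothetical helper that keeps the content of the last occurrence of each path. How the engine orders the surviving files is not specified here, so the check below only looks at contents.

```python
def use_last(files):
    """Keep one entry per path; the content of the last occurrence wins."""
    latest = {}
    for path, content in files:
        latest[path] = content  # later entries overwrite earlier ones
    return list(latest.items())

# Output of the merge: Transform A's files first, then Transform B's.
merged = [
    ("pom.xml", "hello-fun"),      # verbatim copy from Transform A
    ("App.java", "class App {}"),  # verbatim copy from Transform A
    ("pom.xml", "my-app"),         # modified copy from Transform B
]
resolved = use_last(merged)
```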

Conditional Transforms

Every <transform-definition> can have a condition attribute.

  - condition: "#k8sConfig == 'k8s-resource-simple'"
    include: [ "kubernetes/app/*.yaml" ]
    chain:
      - type: ReplaceText
        substitutions:
         - text: hello-fun
           with: "#artifactId"

When a transform's condition is false then that particular transform is 'disabled'. What this means is that it gets replaced with a transform that 'does nothing'. Now what that exactly means is a little subtle because, perhaps surprisingly, 'doing nothing' actually means something different depending on the context you are in.

  • When in the context of a 'Merge' a disabled transform behaves like something that returns an empty set. Intuition: a Merge adds stuff together using a kind of union; adding an empty set to a union essentially does nothing.

  • When in the context of a 'Chain' however, a disabled transform behaves like the 'identity' function instead (i.e. lambda (x) => x). Intuition: when you chain functions together a value is passed through all functions in succession. Thus, each function in the chain gets the chance to 'do something' by returning a different/modified value. So, if you are a function in a chain, then to 'do nothing', means 'return the input you received unchanged as your output'.

If this all sounds a bit confusing, fortunately there is an easy 'rule of thumb' you can use to understand and predict the effect a disabled transform will have in the context of your accelerator definition.

The rule is this: if a transform's condition evaluates to false, then just pretend it isn't there. In other words, your accelerator will behave the same as if you just deleted (or commented out) that transform's yaml text entirely from the accelerator definition file.
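The rule of thumb can be checked against the two 'do nothing' behaviours with a short Python model. Everything here is illustrative: chain and merge are simplified stand-ins, identity and empty represent a disabled transform in each context, and upper is an arbitrary enabled transform.

```python
from functools import reduce

def chain(steps):
    return lambda files: reduce(lambda acc, step: step(acc), steps, files)

def merge(sources):
    return lambda files: [f for source in sources for f in source(files)]

identity = lambda files: list(files)  # a disabled transform inside a Chain
empty = lambda files: []              # a disabled transform inside a Merge
upper = lambda files: [(p, c.upper()) for p, c in files]  # some enabled transform

files = [("a.json", "x")]

# In both contexts, the disabled transform gives the same result as deleting it.
chain_with_disabled = chain([identity, upper])(files)
chain_without = chain([upper])(files)
merge_with_disabled = merge([empty, upper])(files)
merge_without = merge([upper])(files)
```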

Let's look at two different examples to illustrate both cases.

Conditional 'Merge' transform

Our first example has a conditional transform in a Merge context:

merge: # <--- a merge (in Combo shortcut notation)
  # transform A <--- The transform of interest
  - condition: "#k8sConfig == 'k8s-resource-simple'"
    include: [ "kubernetes/app/*.yaml" ]
    chain:
      ...
  # transform B
  - include: [ "pom.xml" ]
    chain:
      ...

If the condition of Transform A above is false it gets replaced with an 'empty set' because it is being used in a Merge context. This has the same effect as if the whole of Transform A was deleted or commented out:

merge:
  # transform A evaluates to false so it behaves like it was commented out
  # - condition: "#k8sConfig == 'k8s-resource-simple'"
  #  include: [ "kubernetes/app/*.yaml" ]
  #  chain:
  #    ...
  # transform B
  - include: [ "pom.xml" ]
    chain: 
      ...

The result in this example is that if the condition is false, only the pom.xml file will end up in the result.

Conditional 'Chain' transform

In our next example some conditional transforms are used in a Chain context:

merge:
- include: [ '**/*.json' ]
  chain: # <--- a Chain context
  - type: ReplaceText # Transform A
    condition: '#customizeJson'  
    substitutions: ...
  - type: JsonPrettyPrint # Transform B
    condition: '#prettyJson'
    indent: '#jsonIndent'

Note: the JsonPrettyPrint transform type is purely 'hypothetical'. We could have such a transform but we don't provide it at the moment.

In the above example both Transform A and Transform B are conditional and used in a Chain context. Transform A is chained after the include transform, whereas Transform B is chained after Transform A. When either of these conditions is false, the corresponding transform will behave like the identity function (whatever set of files it gets as input is exactly what it returns as output).

You can see this behaves in accordance with our 'rule of thumb'. For example, if Transform A's condition is false, it behaves just as if Transform A wasn't there: Transform A is chained after include so it receives the include's result, returns it unchanged, and this is passed to Transform B. In other words, the result of the include is passed as is to Transform B. This is exactly what would also happen if Transform A wasn't there.

A small Gotcha with using Conditionals in Merge Transforms

As discussed above, it is a useful pattern to use merges with overlapping contents. But you have to be careful using this in combination with conditional transforms.

Reconsider our earlier example:

engine:
  merge: # <--- a merge context (in a Combo transform)
  - include: ["**/*"] # Transform A selects all files (this includes 'pom.xml')
  - include: ["**/pom.xml"] # Transform B (selects only poms and replaces some text in them)
    chain: 
    - type: ReplaceText
      substitutions: ...

Now we add a little twist. Let's say we only wanted to include pom files if the user selects a useMaven option.

We might be tempted to simply add a 'condition' to Transform B so as to disable it when that option isn't selected:

engine:
  merge: # <--- a merge context (in a Combo transform)
  - include: ["**/*"] # Transform A selects all files (this includes 'pom.xml')
  - condition: '#useMaven'
    include: ["**/pom.xml"] # Transform B (selects only poms and replaces some text in them)
    chain: 
    - type: ReplaceText
      substitutions: ...

Sadly, this doesn't do what you might expect. The final result will still contain pom.xml files. To understand why, remember the 'rule of thumb' for disabled transforms. The rule says that, if a transform is disabled, we pretend it simply isn't there. So when #useMaven is false the example reduces to:

engine:
  merge: # <--- a merge context (in a Combo transform)
  - include: ["**/*"] # Transform A selects all files (this includes 'pom.xml')

This accelerator simply copies all files from the accelerator <root>, including pom.xml.
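The pitfall is easy to reproduce in a small Python model. As before, this is a sketch rather than the engine's behaviour: build is a hypothetical function that assembles the merge, leaving out Transform B when use_maven is false (which is what the rule of thumb says a disabled transform amounts to).

```python
def include(pred):
    return lambda files: [(p, c) for p, c in files if pred(p)]

def replace_text(old, new):
    return lambda files: [(p, c.replace(old, new)) for p, c in files]

def chain2(first, second):
    return lambda files: second(first(files))

def merge(sources):
    return lambda files: [f for source in sources for f in source(files)]

is_pom = lambda p: p.endswith("pom.xml")

def build(use_maven):
    sources = [include(lambda p: True)]  # Transform A: all files
    if use_maven:  # a disabled child is simply left out of the merge
        sources.append(chain2(include(is_pom), replace_text("hello-fun", "my-app")))
    return merge(sources)

files = [("pom.xml", "hello-fun"), ("App.java", "class App {}")]
without_maven = build(use_maven=False)(files)
```

Transform A still lets pom.xml through, so disabling Transform B only removes the text replacement, not the file.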

There are several ways to avoid this pitfall. One is to make sure the pom.xml files are not included in Transform A by explicitly excluding them:

  ...
  - include: ["**/*"] 
    exclude: ["**/pom.xml"] # <--- Added this line
  ...

Another way is to apply the 'exclusion of pom.xml' conditionally in a Chain after the main transform:

engine:
  merge: # a merge (in `Combo` shortcut notation)
  - include: ["**/*"] # Transform A selects all files (this includes 'pom.xml')
  - include: ["**/pom.xml"] # Transform B (selects only poms and replaces some text in them)
    chain: 
    - type: ReplaceText
      substitutions: ...
  chain: # <-- chained after 'merge' by 'Combo'
  - condition: '!#useMaven' # remove poms if user doesn't want them.
    exclude: ['**/pom.xml']
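A Python model of this second approach might look as follows. It is a sketch with made-up helpers: the merge always runs both children, and an exclude step is appended to the outer chain only when use_maven is false. Conflict resolution is left out, so with use_maven true the model output still contains both versions of pom.xml.

```python
from functools import reduce

def chain(steps):
    return lambda files: reduce(lambda acc, step: step(acc), steps, files)

def merge(sources):
    return lambda files: [f for source in sources for f in source(files)]

def include(pred):
    return lambda files: [(p, c) for p, c in files if pred(p)]

def exclude(pred):
    return lambda files: [(p, c) for p, c in files if not pred(p)]

def replace_text(old, new):
    return lambda files: [(p, c.replace(old, new)) for p, c in files]

is_pom = lambda p: p.endswith("pom.xml")

def build(use_maven):
    merged = merge([
        include(lambda p: True),                                        # Transform A
        chain([include(is_pom), replace_text("hello-fun", "my-app")]),  # Transform B
    ])
    steps = [merged]
    if not use_maven:  # condition: '!#useMaven'
        steps.append(exclude(is_pom))
    return chain(steps)

files = [("pom.xml", "hello-fun"), ("App.java", "class App {}")]
no_maven = build(use_maven=False)(files)
with_maven = build(use_maven=True)(files)
```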

Merge conflict

A subtlety that needs explaining is that the representation of the 'Set of files' upon which transforms operate is 'richer' than what can be physically stored on a typical file system. A key difference is that our 'Set of Files' allows for multiple files with the same path to exist at the same time. Of course, when files are initially read from a physical file system, or a zip file, this situation does not arise. However, as transforms are applied to this input, it is possible to produce results that have more than one file with the same path (and different contents).

We have in fact already seen some typical examples where this happens through a merge operation. Recall this example:

merge: # <--- a merge context (in a Combo transform)
- include: ["**/*"] # Transform A selects all files (this includes 'pom.xml')
- include: ["**/pom.xml"] # Transform B (selects only poms and replaces some text in them)
  chain: 
  - type: ReplaceText
    substitutions: ...

The result of the above merge will have two files with path pom.xml (assuming there was a pom.xml file in the input). Transform A produces a pom.xml that is a verbatim copy of the input file; Transform B produces a modified copy with some text replaced in it.

It is not possible to have two files on disk with the same path. Therefore this conflict has to be resolved before we can write the result to disk (or pack it into a zip file).

As the example shows, merges are likely to give rise to these conflicts. So it is somewhat intuitive to call this a 'Merge conflict'. It is however important to understand that these kinds of conflicts can also arise from other operations such as, for example, RewritePath:

type: RewritePath
regex: '.*\.md'
rewriteTo: "'docs/README.md'"

The above example will rename any .md file to docs/README.md. Assuming the input contains more than one .md file, then the output will contain multiple files with path docs/README.md. Again we have a conflict because we can only have one such file exist in a physical file system or zip file.
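The way a path rewrite creates duplicates can be sketched in Python. The helper rewrite_path is hypothetical, and the sketch assumes the regex must match the whole path (hence re.fullmatch) and that the dot before md is meant literally; the engine's actual matching rules may differ.

```python
import re

def rewrite_path(files, regex, new_path):
    """Give every file whose whole path matches `regex` the path `new_path`."""
    return [(new_path if re.fullmatch(regex, p) else p, c) for p, c in files]

files = [
    ("README.md", "# Intro\n"),
    ("docs/GUIDE.md", "# Guide\n"),
    ("pom.xml", "<project/>"),
]
result = rewrite_path(files, r".*\.md", "docs/README.md")
paths = [p for p, _ in result]
```

Both Markdown files now share one path while keeping their own contents, which is exactly the conflict described above.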

Resolving 'Merge' Conflicts

By default when a conflict arises the engine doesn't really do anything with it. Our internal representation for Set of Files allows for multiple files with the same path. Thus, the engine simply carries on manipulating the files as is. This isn't really a problem until the files need to be materialized to disk (or a zip file). If a conflict is still present at that time then an error will be raised.

This means that if your accelerator produces these kinds of conflicts then they need to be resolved before files can be materialized to disk. To this end we provide the UniquePath transform. This transform allows specifying explicitly what should be done when more than one file has the same path. For example:

chain:
- type: RewritePath
  regex: '.*\.md'
  rewriteTo: "'docs/README.md'"
- type: UniquePath
  strategy: Append

The result of the above transform is that all .md files are gathered up and concatenated into a single file at path docs/README.md. Other possible resolution strategies could be that you keep only the contents of one of the files (see Conflict Resolution).
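An Append-style resolution can be modelled in Python like this. The function unique_path_append is a hypothetical stand-in: it concatenates contents of same-path files in the order they appear, and it assumes no separator is inserted between them, which may not match what the engine does.

```python
def unique_path_append(files):
    """Concatenate the contents of files that share a path, in input order."""
    combined = {}
    for path, content in files:
        combined[path] = combined.get(path, "") + content
    return list(combined.items())

# Input as it might look after the RewritePath step (two files, one path).
files = [
    ("docs/README.md", "# Intro\n"),
    ("docs/README.md", "# Guide\n"),
    ("pom.xml", "<project/>"),
]
result = unique_path_append(files)
```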

Note that the Combo transform also comes with some convenience support for conflict resolution built in (it automatically selects the UseLast strategy if none is explicitly supplied). This means that in practice you probably will rarely, if ever, need to explicitly specify a conflict resolution strategy.

Understanding file ordering

As mentioned above, our 'Set of Files' representation is richer than the files on a typical file system. We already stated that it allows for multiple files with the same path. Another way in which it is 'richer' is that the files in the set are 'ordered' (i.e. a 'FileSet' is actually more like an ordered List than like an unordered Set).

In most situations, the order of files in a 'FileSet' doesn't really matter. However in conflict resolution it is actually significant. If we look at the RewritePath example again, you might ask in what order the various .md files will be appended to each other. This ordering is directly determined by the order of the files in the input set.

That raises the question 'so what is that order?'. In general, when files are read from disk to create a FileSet, we cannot assume a specific order. Yes, the files will be read and processed in some sequential order, but the actual order is not well-defined: it depends on implementation details of the underlying file system. The accelerator engine therefore does not guarantee a specific order in this case. It only guarantees that it preserves whatever ordering it gets from the file system, and processes files in accordance with that order.

As an accelerator author you should probably avoid relying on the file order produced from reading directly from a file system. Thus the RewritePath example above is something you probably shouldn't do... unless you do not particularly care about the ordering of all the different sections of the produced README.md file.

If, however, you do care and want to control the order explicitly, then you can make use of the fact that Merge will process its children in order and reflect this order in the resulting output Set of Files. For example:

chain:
  - merge:
      - include: ['README.md']
      - include: ['DEPLOYMENT.md']
        chain:
          - type: RewritePath
            rewriteTo: "'README.md'"
  - type: UniquePath
    strategy: Append

In this example we know without a doubt that README.md (from the first child of merge) comes before DEPLOYMENT.md (from the second child of merge). So in this example we can control the merge order directly (by changing the order of the merge children).
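The effect of child order in a merge can be seen in a small Python model. The helpers (pick, rename, merge, append) are illustrative stand-ins for Include, RewritePath, Merge and UniquePath with the Append strategy. The input list is deliberately given in the 'wrong' order to show that the merge children, not the input, decide the final order.

```python
def pick(name):
    return lambda files: [(p, c) for p, c in files if p == name]

def rename(new_path):
    return lambda files: [(new_path, c) for _, c in files]

def merge(sources):
    return lambda files: [f for source in sources for f in source(files)]

def append(files):
    combined = {}
    for path, content in files:
        combined[path] = combined.get(path, "") + content
    return list(combined.items())

# The file system happened to list DEPLOYMENT.md first.
files = [("DEPLOYMENT.md", "Deploy steps\n"), ("README.md", "Overview\n")]

merged = merge([
    pick("README.md"),                                          # first child
    lambda fs: rename("README.md")(pick("DEPLOYMENT.md")(fs)),  # second child
])(files)
result = append(merged)
```

Swapping the two children of the merge would swap the two sections in the combined file.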

Conclusion

This concludes our 'Gentle' introduction. This introduction was focussed on an intuitive understanding of the <transform-definition> notation, which is used to describe precisely how the accelerator engine should generate new project content from the files in the accelerator root.

From here on you may want to move on to reading one of the following more detailed documents:

  • An exhaustive Reference of all built-in transform types,
  • A sample, commented accelerator.yaml to learn from a concrete example.