Workflow language#

fasthep-flow workflows describe analysis intent declaratively.

Rather than explicitly writing:

  • event loops

  • scheduling logic

  • task orchestration

  • execution ordering

users describe:

  • data sources

  • transformations

  • histogramming

  • rendering

  • outputs

  • dependencies

The workflow engine then determines how workflows should be validated, normalised, planned, and executed.

Important

FAST-HEP workflows describe intent rather than implementation details.


A minimal workflow#

A typical workflow begins as a human-authored YAML file:

author.yaml

For example:

version: 1.0

use:
  profiles:
    - registry
    - fasthep_carpenter:registry
    - fasthep_render:registry
    - fasthep_workshop:registry

data:
  datasets:
    - name: toy
      eventtype: mc
      files:
        - toy://dy

sources:
  events:
    kind: workshop.toy_source
    stream_type: event_stream

analysis:
  stages:
    - id: BasicVars
      op: hep.define
      params:
        variables:
          - name: Muon_Pt
            expr: "sqrt(Muon_Px ** 2 + Muon_Py ** 2)"

This workflow:

  1. loads workflow profiles

  2. defines a dataset

  3. creates an event stream

  4. derives a new variable

The workflow itself does not describe:

  • execution order

  • scheduling

  • backend configuration

  • parallel execution details

These are inferred automatically during workflow compilation.


Workflow structure#

FAST-HEP workflows are organised into sections.

Common top-level sections include:

Section

Purpose

use

profiles, registries, presets

data

datasets and their defaults

sources

input stream definitions

styles

reusable rendering definitions

analysis

transforms and workflow stages

observers

diagnostics and inspection

strategies

backend/runtime tuning

Not every workflow requires every section.

For example:

  • simple workflows may omit styles

  • local workflows may omit strategies

  • workflows without rendering may omit render definitions entirely


Profiles and registries#

Profiles load workflow capabilities into the current workflow.

For example:

use:
  profiles:
    - registry
    - fasthep_carpenter:registry
    - fasthep_render:registry

These profiles register:

  • operations

  • sources

  • sinks

  • hooks

  • rendering implementations

Profiles allow workflows to remain modular and composable.

Note

The first profile, registry, is the built-in fasthep-flow registry.

Future presets may hide explicit registry configuration from user-facing workflows while preserving fully resolved execution plans internally.

For more information, see Profiles and Registries.


Datasets#

Datasets describe logical analysis inputs.

For example:

data:
  datasets:
    - name: dy
      eventtype: mc
      files:
        - data/DY.root

Datasets may represent:

  • local ROOT files

  • parquet datasets

  • distributed storage

  • virtual datasets

  • generated tutorial data

The meaning of dataset files depends on the source implementation.

For example:

files:
  - toy://dy

is interpreted internally by the workshop toy source rather than referencing a physical file.

Note

Sources may interpret datasets in different ways.

A source may:

  • read local files

  • prepend caches or redirectors

  • stream remote datasets

  • query databases

  • generate synthetic data on the fly

  • construct derived streams from other inputs

The only required contract is that sources produce workflow streams compatible with downstream operations.

In practice this typically means stream records behave like structured event or tabular data objects, for example dictionaries, awkward arrays, or backend-specific stream representations understood by the active runtime.


Sources#

Sources introduce streams into workflows.

For example:

sources:
  events:
    kind: workshop.toy_source

This source creates an event stream named:

events

which downstream workflow stages consume.

Sources may:

  • read ROOT trees

  • stream parquet datasets

  • generate synthetic events

  • query databases

  • construct derived streams


Declarative execution#

FAST-HEP workflows describe what should be computed rather than how computations should be scheduled.

For example:

- id: BasicVars
  op: hep.define
  params:
    variables:
      - name: Muon_Pt
        expr: "sqrt(Muon_Px ** 2 + Muon_Py ** 2)"

This workflow stage defines:

Muon_Pt

from:

Muon_Px
Muon_Py

The workflow engine automatically infers:

  • required inputs

  • data dependencies

  • execution ordering

  • downstream consumers

without requiring users to manually wire execution graphs together:

(Muon_Px, Muon_Py) Muon_Pt

This separation between workflow intent and execution strategy allows workflows to remain portable across runtime backends.

Note

The expression syntax used here is currently provided by fasthep-carpenter through the hep.define operation.

The syntax intentionally remains close to numexpr while adding a number of domain-specific functions and symbols commonly used in analysis workflows.

For example:

expr: "sqrt(Muon_Px ** 2 + Muon_Py ** 2)" is interpreted by the operation implementation rather than by fasthep-flow itself.

Different operations may choose to support different expression syntaxes or evaluation models.

Users are also free to implement custom operations with entirely different expression systems.


Streams and artifacts#

FAST-HEP workflows distinguish between streams and artifacts.

  • stream: flowing event/tabular data

  • artifact: produced object or output

Examples of streams include:

  • event records

  • awkward arrays

  • tabular data

  • partitioned datasets

Examples of artifacts include:

  • histograms (pkl)

  • plots (PNG)

  • tables

  • reports

  • other output files

This distinction is important because different operation types consume and produce different workflow objects.


Operations#

Workflow stages are implemented as operations.

Common operation categories include:

Category

Purpose

sources

introduce streams

transforms

derive or modify data

sinks

persist or aggregate outputs

renderers

generate plots and reports

hooks

react to runtime lifecycle events

observers

inspect workflow state

For example:

- id: MuonPt
  op: hep.hist

fills a histogram artifact from an event stream.

Detailed operation specifications are documented in Operations and Specs.


Workflow compilation#

author.yaml workflows are not executed directly.

Instead, workflows are compiled through several stages:

        flowchart TD

    subgraph Compile["Compilation and planning"]
        Author["author.yaml"]
        Profiles["profiles and registries"]
        Normalised["normalised workflow"]
        Dependency["dependency inference"]
        Plan["execution plan"]

        Author --> Normalised
        Profiles --> Normalised
        Normalised --> Dependency
        Dependency --> Plan
    end

    subgraph Execute["Runtime execution"]
        Runtime["runtime execution"]
        Outputs["artifacts and outputs"]

        Runtime --> Outputs
    end

    Plan --> Runtime
    

This compilation process allows workflows to be:

  • validated before execution

  • serialised and inspected

  • transformed into backend-specific plans

  • optimised independently of workflow logic

  • executed reproducibly across environments

For more information, see:


Language philosophy#

The FAST-HEP workflow language is designed around several core principles.

  1. Portable
    Workflow definitions remain independent of execution infrastructure and runtime backends.

  2. Reproducible
    Execution plans and workflow state can be serialised and preserved.

  3. Inspectable
    Workflows can be validated, normalised, and planned before runtime execution begins.

  4. Declarative
    Users describe analysis intent rather than implementation details.

  5. Extensible
    Capabilities are composed dynamically through profiles, registries, and operation specifications.


YAML reference#

This page introduces the concepts and structure of the workflow language.

For the complete YAML reference, see: