Execution Infrastructure#

Execution infrastructure describes where and how a workflow runs.

FAST-HEP separates analysis logic from execution details. The same workflow can run:

locally on a laptop
on a university HTCondor cluster
on a Slurm supercomputer
on heterogeneous CPU/GPU resources
or on future distributed platforms

The analysis itself does not need to change.

Overview#

Execution infrastructure consists of four layers:

Workflow
    ↓
Resources
    ↓
Pools
    ↓
Backend + Strategy

Workflow#

The workflow describes the scientific analysis:

analysis:
  stages:
    - id: SelectMuons
      op: hep.select
      ...

This layer should contain as little infrastructure-specific logic as possible.

Resources#

Resources describe the capabilities required by a stage.

Examples:

execution:
  require: gpu

execution:
  require: high_memory

Resources are labels rather than concrete batch-system requests.

Because a label names a capability instead of a site-specific request, the same workflow remains portable across sites.
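
To illustrate why labels keep a workflow portable, the sketch below resolves the same label into different concrete requests for two batch systems. The lookup table and the `translate` function are hypothetical and not part of FAST-HEP. Only the HTCondor submit commands (`request_gpus`, `request_memory`) and the Slurm options (`--gres`, `--mem`) are real.

```python
# Hypothetical per-site translation tables: label -> concrete batch request.
# The table contents and the function name are illustrative assumptions,
# not FAST-HEP's actual implementation.
SITE_REQUESTS = {
    "htcondor": {
        "gpu": ["request_gpus = 1"],
        "high_memory": ["request_memory = 64GB"],
    },
    "slurm": {
        "gpu": ["--gres=gpu:1"],
        "high_memory": ["--mem=64G"],
    },
}


def translate(label: str, site: str) -> list[str]:
    """Return the site-specific request lines for a resource label."""
    try:
        return SITE_REQUESTS[site][label]
    except KeyError:
        raise ValueError(f"site {site!r} has no mapping for label {label!r}") from None


# The workflow only ever says `require: gpu`; each site resolves it differently.
print(translate("gpu", "htcondor"))  # ['request_gpus = 1']
print(translate("gpu", "slurm"))     # ['--gres=gpu:1']
```

The workflow file never contains either request string, which is why it can move between sites unchanged.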

Pools#

Pools describe groups of workers with specific capabilities.

Example:

execution:
  pools:
    default:
      workers: 20
      resources:
        cpus: 1
        memory: 4GB

    high_memory:
      workers: 2
      resources:
        cpus: 4
        memory: 64GB

    gpu:
      workers: 1
      resources:
        gpus: 1
        memory: 16GB

A stage requesting:

execution:
  require: high_memory

will be routed to the high_memory pool.

A stage requesting:

execution:
  require: gpu

will be routed to the gpu pool.
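
The routing rule above can be sketched in a few lines of Python. This is a minimal illustration and not FAST-HEP's code: the `route` function is hypothetical, and sending a stage without a `require` label to the `default` pool is an assumption that the text above does not state.

```python
# Pool definitions mirroring the YAML example above.
POOLS = {
    "default": {"workers": 20, "resources": {"cpus": 1, "memory": "4GB"}},
    "high_memory": {"workers": 2, "resources": {"cpus": 4, "memory": "64GB"}},
    "gpu": {"workers": 1, "resources": {"gpus": 1, "memory": "16GB"}},
}


def route(stage: dict, pools: dict) -> str:
    """Return the name of the pool a stage should run in.

    Assumption: a stage without `execution.require` falls back to "default".
    """
    label = stage.get("execution", {}).get("require", "default")
    if label not in pools:
        raise KeyError(f"stage {stage['id']!r} requires unknown pool {label!r}")
    return label


print(route({"id": "BuildIndex", "execution": {"require": "high_memory"}}, POOLS))  # high_memory
print(route({"id": "SelectMuons"}, POOLS))  # default
```

Failing early on an unknown label is a design choice in this sketch; it surfaces a typo in `require` before any worker is started.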

Backends and Strategies#

FAST-HEP separates execution into two concepts: the backend, which executes tasks, and the strategy, which determines how workers are created.

Backend#

The backend is responsible for executing tasks.

Examples:

execution:
  backend: local

execution:
  backend: dask

Strategy#

The strategy determines how workers are created.

Examples:

execution:
  strategy: htcondor

execution:
  strategy: slurm

A backend may support multiple strategies.

For example:

execution:
  backend: dask
  strategy: htcondor

and

execution:
  backend: dask
  strategy: slurm

both use Dask to execute tasks, but deploy workers through different batch systems.
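
The separation can be pictured as a configuration object that holds the two choices independently. The sketch below is an assumption-laden illustration: the `ExecutionConfig` class and the `SUPPORTED` table are hypothetical, and which combinations FAST-HEP actually accepts is not stated in this document.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical support table: which strategies each backend accepts.
# None stands for "no strategy given". These entries are assumptions.
SUPPORTED = {
    "local": {None},
    "dask": {None, "htcondor", "slurm"},
}


@dataclass(frozen=True)
class ExecutionConfig:
    backend: str
    strategy: Optional[str] = None

    def validate(self) -> "ExecutionConfig":
        """Check that the backend exists and supports the chosen strategy."""
        if self.backend not in SUPPORTED:
            raise ValueError(f"unknown backend {self.backend!r}")
        if self.strategy not in SUPPORTED[self.backend]:
            raise ValueError(
                f"backend {self.backend!r} does not support strategy {self.strategy!r}"
            )
        return self


# Same backend, different worker deployment.
a = ExecutionConfig("dask", "htcondor").validate()
b = ExecutionConfig("dask", "slurm").validate()
print(a.backend == b.backend, a.strategy == b.strategy)  # True False
```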

Heterogeneous Worker Pools#

Traditional analysis workflows often split processing into separate jobs:

preprocess data on high-memory nodes
run analysis on standard nodes
run inference on GPU nodes

FAST-HEP allows these stages to exist within a single workflow.

Example:

analysis:
  stages:

    - id: BuildIndex
      op: custom.build_index
      execution:
        require: high_memory

    - id: TrainModel
      op: custom.train
      execution:
        require: gpu

    - id: ProducePlots
      op: hep.hist

The execution system routes each stage to appropriate workers automatically.
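
One way to picture the result is a plan that groups stages by the pool they run in. The sketch below builds such a plan for the workflow above. `build_plan` is a hypothetical helper, and placing `ProducePlots` (which declares no requirement) in a pool named `default` is an assumption.

```python
from collections import defaultdict

# The three stages from the example above, as plain dictionaries.
STAGES = [
    {"id": "BuildIndex", "op": "custom.build_index", "execution": {"require": "high_memory"}},
    {"id": "TrainModel", "op": "custom.train", "execution": {"require": "gpu"}},
    {"id": "ProducePlots", "op": "hep.hist"},
]


def build_plan(stages: list[dict]) -> dict[str, list[str]]:
    """Group stage ids by target pool, preserving workflow order."""
    plan = defaultdict(list)
    for stage in stages:
        # Assumption: stages without a requirement go to "default".
        pool = stage.get("execution", {}).get("require", "default")
        plan[pool].append(stage["id"])
    return dict(plan)


print(build_plan(STAGES))
# {'high_memory': ['BuildIndex'], 'gpu': ['TrainModel'], 'default': ['ProducePlots']}
```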

Worker Environments#

Workers need access to:

Python packages
FAST-HEP components
experiment software
analysis code

FAST-HEP supports packaging worker environments independently from workflow definitions.

Examples include:

shared filesystems
packed Pixi environments

In either case, the workflow definition remains unchanged.

Credentials#

Some workflows require access to protected data and authenticate with X.509 proxies or SciTokens. Credentials are treated as execution infrastructure rather than analysis logic, so workflows remain portable while the execution system handles credential transfer and setup.
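
As a rough sketch of what credential setup might involve on the submitting machine, the function below looks for credential files through the conventional environment variables `X509_USER_PROXY` (X.509 proxy) and `BEARER_TOKEN_FILE` (bearer tokens such as SciTokens). The `find_credentials` helper is hypothetical; how FAST-HEP actually discovers and ships credentials is not described here.

```python
import os
from pathlib import Path
from typing import Mapping


def find_credentials(env: Mapping[str, str] = os.environ) -> dict[str, Path]:
    """Collect credential files that exist on the submitting machine.

    Only paths named by the environment are considered; this sketch does
    not look at default locations or validate the credentials themselves.
    """
    candidates = {
        "x509_proxy": env.get("X509_USER_PROXY"),
        "bearer_token": env.get("BEARER_TOKEN_FILE"),
    }
    return {
        name: Path(value)
        for name, value in candidates.items()
        if value and Path(value).is_file()
    }


# Files found here would be handed to the execution layer for transfer,
# so the workflow definition never mentions them.
print(sorted(find_credentials()))
```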

Execution Modifiers#

Execution modifiers adapt how a stage executes.

Examples:

execution:
  modifiers:
    - gpu.preload

execution:
  modifiers:
    - cuda.jit

Modifiers are applied at runtime and may:

preload data onto GPUs
wrap operations
collect diagnostics
perform profiling

Modifiers are attached to stages and travel through the execution plan as metadata.
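
The idea of a modifier that wraps an operation at runtime can be sketched as follows. Everything here is illustrative: the `diag.timing` name, the `MODIFIERS` registry, and `apply_modifiers` are hypothetical and do not describe FAST-HEP's actual modifier interface.

```python
import time
from typing import Callable

Op = Callable[[list], list]

# Diagnostics collected by modifiers, keyed by operation name.
DIAGNOSTICS: dict[str, float] = {}


def timing(op: Op) -> Op:
    """Wrap an operation and record its wall-clock time in seconds."""
    def wrapped(data):
        start = time.perf_counter()
        try:
            return op(data)
        finally:
            DIAGNOSTICS[op.__name__] = time.perf_counter() - start
    return wrapped


# Hypothetical registry mapping modifier names to wrappers.
MODIFIERS: dict[str, Callable[[Op], Op]] = {"diag.timing": timing}


def apply_modifiers(op: Op, stage: dict) -> Op:
    """Wrap `op` with each modifier listed in the stage metadata, in order."""
    for name in stage.get("execution", {}).get("modifiers", []):
        op = MODIFIERS[name](op)
    return op


def select_positive(data):
    return [x for x in data if x > 0]


stage = {"id": "Select", "execution": {"modifiers": ["diag.timing"]}}
run = apply_modifiers(select_positive, stage)
print(run([-1, 2, 3]))                   # [2, 3]
print("select_positive" in DIAGNOSTICS)  # True
```

Because the modifier names live in the stage metadata, the operation itself (`select_positive` here) stays free of profiling or device-specific code.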