Execution Infrastructure#
Execution infrastructure describes where and how a workflow runs.
FAST-HEP separates analysis logic from execution details. The same workflow can run:
locally on a laptop
on a university HTCondor cluster
on a Slurm supercomputer
on heterogeneous CPU/GPU resources
or on future distributed platforms
The analysis itself does not need to change.
Overview#
Execution infrastructure consists of four layers:
Workflow
↓
Resources
↓
Pools
↓
Backend + Strategy
Workflow#
The workflow describes the scientific analysis:
analysis:
stages:
- id: SelectMuons
op: hep.select
...
This layer should contain as little infrastructure-specific logic as possible.
Resources#
Resources describe the capabilities required by a stage.
Examples:
execution:
require: gpu
execution:
require: high_memory
Resources are labels rather than concrete batch-system requests.
This allows workflows to remain portable across sites.
Pools#
Pools describe groups of workers with specific capabilities.
Example:
execution:
pools:
default:
workers: 20
resources:
cpus: 1
memory: 4GB
high_memory:
workers: 2
resources:
cpus: 4
memory: 64GB
gpu:
workers: 1
resources:
gpus: 1
memory: 16GB
A stage requesting:
execution:
require: high_memory
will be routed to the high_memory pool.
A stage requesting:
execution:
require: gpu
will be routed to the gpu pool.
Backends and Strategies#
FAST-HEP separates execution into two concepts:
Backend#
The backend is responsible for executing tasks.
Examples:
execution:
backend: local
execution:
backend: dask
Strategy#
The strategy determines how workers are created.
Examples:
execution:
strategy: htcondor
execution:
strategy: slurm
A backend may support multiple strategies.
For example:
execution:
backend: dask
strategy: htcondor
and
execution:
backend: dask
strategy: slurm
both use Dask, but deploy workers differently.
Heterogeneous Worker Pools#
Traditional analysis workflows often split processing into separate jobs:
preprocess data on high-memory nodes
run analysis on standard nodes
run inference on GPU nodes
FAST-HEP allows these stages to exist within a single workflow.
Example:
analysis:
stages:
- id: BuildIndex
op: custom.build_index
execution:
require: high_memory
- id: TrainModel
op: custom.train
execution:
require: gpu
- id: ProducePlots
op: hep.hist
The execution system routes each stage to appropriate workers automatically.
Worker Environments#
Workers need access to:
Python packages,
FAST-HEP components,
experiment software,
analysis code.
FAST-HEP supports packaging worker environments independently from workflow definitions.
Examples include:
shared filesystems
packed Pixi environments
The workflow remains unchanged.
Credentials#
Some workflows require access to protected data and authenticate via X509 proxies or Scitokens. Credentials are treated as execution infrastructure rather than analysis logic. This allows workflows to remain portable while execution systems handle credential transfer and setup.
Execution Modifiers#
Execution modifiers adapt how a stage executes.
Examples:
execution:
modifiers:
- gpu.preload
execution:
modifiers:
- cuda.jit
Modifiers are applied at runtime and may:
preload data onto GPUs
wrap operations
collect diagnostics
perform profiling
Modifiers are attached to stages and travel through the execution plan as metadata.