# Execution Infrastructure

Execution infrastructure describes **where** and **how** a workflow runs.

FAST-HEP separates analysis logic from execution details. The same workflow can run:

* locally on a laptop
* on a university HTCondor cluster
* on a Slurm supercomputer
* on heterogeneous CPU/GPU resources
* or on future distributed platforms

The analysis itself does not need to change.

## Overview

Execution infrastructure consists of four layers:

```text
Workflow
   ↓
Resources
   ↓
Pools
   ↓
Backend + Strategy
```

### Workflow

The workflow describes the scientific analysis:

```yaml
analysis:
  stages:
    - id: SelectMuons
      op: hep.select
      ...
```

This layer should contain as little infrastructure-specific logic as possible.

### Resources

Resources describe the capabilities required by a stage.

Examples:

```yaml
execution:
  require: gpu
```

```yaml
execution:
  require: high_memory
```

Resources are labels rather than concrete batch-system requests. This allows workflows to remain portable across sites.

### Pools

Pools describe groups of workers with specific capabilities.

Example:

```yaml
execution:
  pools:
    default:
      workers: 20
      resources:
        cpus: 1
        memory: 4GB
    high_memory:
      workers: 2
      resources:
        cpus: 4
        memory: 64GB
    gpu:
      workers: 1
      resources:
        gpus: 1
        memory: 16GB
```

A stage requesting:

```yaml
execution:
  require: high_memory
```

will be routed to the `high_memory` pool.

A stage requesting:

```yaml
execution:
  require: gpu
```

will be routed to the `gpu` pool.

## Backends and Strategies

FAST-HEP separates execution into two concepts:

### Backend

The backend is responsible for executing tasks.

Examples:

```yaml
execution:
  backend: local
```

```yaml
execution:
  backend: dask
```

### Strategy

The strategy determines how workers are created.

Examples:

```yaml
execution:
  strategy: htcondor
```

```yaml
execution:
  strategy: slurm
```

A backend may support multiple strategies.
For example:

```yaml
execution:
  backend: dask
  strategy: htcondor
```

and

```yaml
execution:
  backend: dask
  strategy: slurm
```

both use Dask, but deploy workers differently.

## Heterogeneous Worker Pools

Traditional analysis workflows often split processing into separate jobs:

1. preprocess data on high-memory nodes
2. run analysis on standard nodes
3. run inference on GPU nodes

FAST-HEP allows these stages to exist within a single workflow.

Example:

```yaml
analysis:
  stages:
    - id: BuildIndex
      op: custom.build_index
      execution:
        require: high_memory

    - id: TrainModel
      op: custom.train
      execution:
        require: gpu

    - id: ProducePlots
      op: hep.hist
```

The execution system routes each stage to appropriate workers automatically.

## Worker Environments

Workers need access to:

* Python packages
* FAST-HEP components
* experiment software
* analysis code

FAST-HEP supports packaging worker environments independently from workflow definitions.

Examples include:

* shared filesystems
* packed Pixi environments

The workflow remains unchanged.

## Credentials

Some workflows require access to protected data and authenticate via X.509 proxies or SciTokens.

Credentials are treated as execution infrastructure rather than analysis logic. This allows workflows to remain portable while execution systems handle credential transfer and setup.

## Execution Modifiers

Execution modifiers adapt how a stage executes.

Examples:

```yaml
execution:
  modifiers:
    - gpu.preload
```

```yaml
execution:
  modifiers:
    - cuda.jit
```

Modifiers are applied at runtime and may:

* preload data onto GPUs
* wrap operations
* collect diagnostics
* perform profiling

Modifiers are attached to stages and travel through the execution plan as metadata.
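
To make the idea of modifiers travelling through the plan as metadata more concrete, here is a minimal Python sketch. It is not the FAST-HEP implementation: `plan`, `PlannedStage`, `apply_modifiers`, the `MODIFIERS` registry, and the `diagnostics.count_calls` modifier name are all hypothetical. It also assumes that a stage without a `require` label falls back to a pool named `default`, which the text above does not state.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical registry: modifier name -> function that wraps an operation.
MODIFIERS: dict[str, Callable[[Callable], Callable]] = {}


def modifier(name: str):
    """Register a wrapper under a modifier name (illustrative only)."""
    def register(wrapper):
        MODIFIERS[name] = wrapper
        return wrapper
    return register


@modifier("diagnostics.count_calls")  # hypothetical modifier name
def count_calls(op):
    """Wrap an operation and count how often it is invoked."""
    def wrapped(data):
        wrapped.calls += 1
        return op(data)
    wrapped.calls = 0
    return wrapped


@dataclass
class PlannedStage:
    stage_id: str
    pool: str
    # Modifier names are carried as plain metadata until runtime.
    modifiers: list[str] = field(default_factory=list)


def plan(stages: list[dict], pools: dict) -> list[PlannedStage]:
    """Resolve each stage's `require` label to a pool name."""
    planned = []
    for stage in stages:
        execution = stage.get("execution", {})
        # Assumption: stages without a label use the `default` pool.
        pool = execution.get("require", "default")
        if pool not in pools:
            raise KeyError(f"stage {stage['id']!r} requires unknown pool {pool!r}")
        planned.append(
            PlannedStage(stage["id"], pool, list(execution.get("modifiers", [])))
        )
    return planned


def apply_modifiers(op: Callable, names: list[str]) -> Callable:
    """Wrap `op` with each named modifier, in order, at runtime."""
    for name in names:
        op = MODIFIERS[name](op)
    return op
```

Calling `plan` on the parsed `analysis.stages` list together with the `execution.pools` mapping yields one `PlannedStage` per stage. A worker would then call `apply_modifiers` on the stage's operation just before running it, so the workflow definition itself never needs to import or reference the wrapper code.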