Resource Pools#

Resource pools are the mechanism FAST-HEP uses to route work to appropriate workers.

They allow a single workflow to use multiple types of compute resources without splitting the analysis into separate jobs.

Motivation#

Scientific workflows often contain stages with very different requirements.

Examples include:

lightweight filtering,
memory-intensive preprocessing,
machine learning inference,
GPU-accelerated calculations.

Traditionally these stages are executed as separate workflows or submitted to different queues manually.

FAST-HEP allows them to coexist within a single workflow.

Resources vs Pools#

A useful way to think about the execution system is:

Resources describe what a stage needs.

Pools describe what workers exist.

Resources#

Stages request resources:

analysis:
  stages:
    - id: BuildIndex
      op: custom.build_index
      execution:
        require: high_memory

    - id: Inference
      op: custom.inference
      execution:
        require: gpu

The stage does not know:

which batch system is used,
how many workers exist,
which machine will execute it.

It only describes what it needs.

Pools#

Pools describe available worker groups:

execution:
  pools:

    default:
      workers: 20
      resources:
        cpus: 1
        memory: 4GB

    high_memory:
      workers: 2
      resources:
        cpus: 8
        memory: 128GB

    gpu:
      workers: 1
      resources:
        gpus: 1
        memory: 16GB

Each pool creates workers with specific capabilities.

Routing#

When a stage requests:

execution:
  require: high_memory

FAST-HEP schedules that stage onto workers from the corresponding pool.

Likewise:

execution:
  require: gpu

will only execute on workers belonging to the GPU pool.

Conceptually:

Stage
    ↓
Required resource
    ↓
Matching pool
    ↓
Worker

A Typical Example#

Consider a workflow that:

builds an event index,
performs GPU inference,
creates histograms.

analysis:
  stages:

    - id: BuildIndex
      op: custom.index
      execution:
        require: high_memory

    - id: RunInference
      op: custom.inference
      execution:
        require: gpu

    - id: MuonPt
      op: hep.hist

With pools:

execution:
  pools:

    default:
      workers: 10

    high_memory:
      workers: 2
      resources:
        memory: 128GB

    gpu:
      workers: 1
      resources:
        gpus: 1

Execution proceeds automatically:

BuildIndex
    → high_memory workers

RunInference
    → gpu workers

MuonPt
    → default workers

No manual job splitting is required.

Pool Profiles#

Many sites use the same resource configurations repeatedly.

Profiles can be used to avoid repetition.

For example:

execution:
  pools:

    default:
      use: standard_worker

    gpu:
      use: gpu_worker

where profiles define the actual worker configuration.

This allows sites to standardise resource definitions while analyses remain portable.

Pool-Specific Configuration#

Pools may also carry backend-specific configuration.

For example:

execution:
  pools:

    high_memory:
      workers: 2

      resources:
        memory: 128GB

      config:
        walltime: 04:00:00

The exact configuration options depend on the selected execution strategy.

For example:

HTCondor

may expose different scheduling options.

Heterogeneous Execution#

The most important feature of resource pools is heterogeneous execution.

A single workflow can use:

CPU workers
High-memory workers
GPU workers

simultaneously.

This allows workflows to express the natural structure of an analysis rather than forcing users to divide it into separate batch submissions.

Resource Pools and Dask#

When using the Dask backend, pools are translated into worker groups.

Each pool advertises resources to Dask:

resource.default
resource.high_memory
resource.gpu

Stages are automatically annotated with the required resources.

Dask then routes tasks to matching workers.

The analysis author does not need to manage this routing manually.