FAST-HEP-flow

Introduction

fasthep-flow is a package for describing data analysis workflows in YAML and converting them into a workflow DAG that can be run by software like Dask. It is designed to be used with the fast-hep package ecosystem, but can be used independently.

The goal of this package is to define a workflow, e.g. a HEP analysis, in a YAML file, and then convert that YAML file into a workflow DAG. This DAG can then be run on a local machine, on an HTCondor cluster such as CERN’s (via Dask), or on Google Cloud Composer.

fasthep-flow’s YAML files draw inspiration from Continuous Integration (CI) pipelines and Ansible Playbooks to define the workflow, where independent tasks can be run in parallel. fasthep-flow will check the parameters of each task, and then generate the DAG. The DAG will have a task for each instruction, and the dependencies between the tasks will be defined by the needs key in the YAML file. More on this under Configuration.
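To illustrate how a needs key can turn a list of tasks into a DAG, here is a minimal sketch using only the Python standard library (graphlib, Python 3.9+). It is not fasthep-flow's implementation or its configuration schema. The task names, the dictionary layout, and the helpers build_dag and parallel_stages are all hypothetical; only the idea of a needs key comes from the description above. The dictionary stands in for what a YAML workflow file might look like once loaded.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks, shaped like what a YAML workflow file might load into.
# Only the `needs` key comes from the text above; every other name is illustrative.
workflow = {
    "read_data": {"needs": []},
    "select_events": {"needs": ["read_data"]},
    "fill_histograms": {"needs": ["select_events"]},
    "compute_cutflow": {"needs": ["select_events"]},
    "make_report": {"needs": ["fill_histograms", "compute_cutflow"]},
}


def build_dag(tasks):
    """Map each task name to the set of task names it depends on."""
    dag = {}
    for name, spec in tasks.items():
        needs = set(spec.get("needs", []))
        unknown = needs - set(tasks)
        if unknown:
            raise ValueError(f"{name} needs undefined task(s): {sorted(unknown)}")
        dag[name] = needs
    return dag


def parallel_stages(dag):
    """Group tasks into stages; tasks in the same stage do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose needs are all done
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = parallel_stages(build_dag(workflow))
for index, stage in enumerate(stages):
    print(index, stage)
```

In this sketch, fill_histograms and compute_cutflow land in the same stage because both only need select_events, so a scheduler would be free to run them in parallel.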

Documentation