FAST-HEP-flow
Introduction
fasthep-flow is a package for describing data analysis workflows in YAML and converting them into a workflow DAG (directed acyclic graph) that can be run by software like Dask. It is designed to be used with the fast-hep package ecosystem, but can be used independently.
The goal of this package is to define a workflow, e.g. a HEP analysis, in a YAML file, and then convert that YAML file into a workflow DAG. This DAG can then be run on a local machine, on a cluster such as CERN’s HTCondor batch system (via Dask), or on Google Cloud Composer.
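To make "running a DAG" concrete, here is a minimal sketch that uses only the Python standard library. It is a stand-in for what a scheduler such as Dask does, not how fasthep-flow or Dask are implemented: each task starts once the tasks it depends on have finished, and independent tasks run concurrently in a thread pool. The function run_dag, the example task names, and the convention that each task receives its dependencies' results as a dict are all illustrative assumptions.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dag(tasks, needs, max_workers=4):
    """Hypothetical local executor.

    tasks: task name -> callable taking a dict of its dependencies' results
    needs: task name -> list of task names that must finish first
    """
    graph = {name: list(needs.get(name, [])) for name in tasks}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}  # future -> task name
        while sorter.is_active():
            # Submit every task whose dependencies have all finished.
            for name in sorter.get_ready():
                inputs = {dep: results[dep] for dep in graph[name]}
                running[pool.submit(tasks[name], inputs)] = name
            # Block until at least one running task completes.
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in finished:
                name = running.pop(future)
                results[name] = future.result()  # re-raises if the task failed
                sorter.done(name)  # may unlock downstream tasks
    return results


if __name__ == "__main__":
    tasks = {
        "load": lambda inputs: list(range(10)),
        "evens": lambda inputs: [x for x in inputs["load"] if x % 2 == 0],
        "odds": lambda inputs: [x for x in inputs["load"] if x % 2 == 1],
        "total": lambda inputs: sum(inputs["evens"]) + sum(inputs["odds"]),
    }
    needs = {"evens": ["load"], "odds": ["load"], "total": ["evens", "odds"]}
    print(run_dag(tasks, needs)["total"])  # 45
```

Because evens and odds each depend only on load, they are submitted together and may run at the same time; total is only submitted after both have finished.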
fasthep-flow’s YAML files draw inspiration from Continuous Integration (CI) pipelines and Ansible Playbooks: the workflow is defined as a set of tasks, and tasks that do not depend on each other can run in parallel. fasthep-flow checks the parameters of each task and then generates the DAG. The DAG has one task for each instruction, and the dependencies between tasks are defined by the needs key in the YAML file. More on this under Configuration.
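The relationship between tasks and the needs key can be sketched as follows, with a plain Python dict standing in for the parsed YAML file. Only the needs key comes from this page; the task names, the command key, the required-parameter check, and build_stages are hypothetical and do not reflect fasthep-flow's actual schema or API. The sketch checks each task, turns every task into a DAG node whose needs entries become its incoming edges, and groups the nodes into stages whose members can run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow standing in for a parsed YAML file.
# Only the "needs" key is taken from the fasthep-flow docs.
WORKFLOW = {
    "read_data": {"command": "read", "needs": []},
    "select_muons": {"command": "select", "needs": ["read_data"]},
    "select_jets": {"command": "select", "needs": ["read_data"]},
    "make_histograms": {
        "command": "histogram",
        "needs": ["select_muons", "select_jets"],
    },
}

REQUIRED_KEYS = {"command"}  # hypothetical per-task parameter check


def build_stages(workflow):
    """Validate each task, then group tasks into stages that can run in parallel."""
    for name, task in workflow.items():
        missing = REQUIRED_KEYS - task.keys()
        if missing:
            raise ValueError(f"task {name!r} is missing {sorted(missing)}")
        for dep in task.get("needs", []):
            if dep not in workflow:
                raise ValueError(f"task {name!r} needs unknown task {dep!r}")

    # One node per task; each entry in "needs" is a predecessor of that node.
    graph = {name: task.get("needs", []) for name, task in workflow.items()}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the needs form a cycle

    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all dependencies already satisfied
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    for i, stage in enumerate(build_stages(WORKFLOW)):
        print(i, stage)
```

Here select_muons and select_jets land in the same stage because neither needs the other, while make_histograms ends up in a later stage because it needs both.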
Tip
fasthep-flow is still in early development, and the API is not yet stable. Please report any issues you find on the GitHub issue tracker.
Curious how this looks in action? Have a quick look at the CMS Public Tutorial example.