Variations of existing stages

Often it will be necessary to clone a stage or re-run the whole workflow with a change to one variable. This might be due to new calibration constants, new systematics, or updated or new algorithms. In these cases it is useful to be able to define a new stage that is almost identical to an existing stage, but with a few changes.

fasthep-flow provides a way to do this through the variations key in the YAML file. In the examples in this section, the key holds a list of entries, each with a name for the variation and a changes block. The changes are defined in the same way as the stage itself, but only the settings that differ need to be given. The changes are applied to the stage, and the resulting new stage is then added to the workflow.

Here’s an example of a variation:

stages:
  - name: my_stage
    type: fasthep_flow.operators.BashOperator
    kwargs:
      bash_command: echo "Hello World!"
variations:
  - name: my_stage [variation]
    changes:
      kwargs:
        bash_command: echo "Hello Universe!"

In this example, the my_stage [variation] stage is a variation of the my_stage stage. The only change is that the bash_command argument is changed from echo "Hello World!" to echo "Hello Universe!". Since the workflow has only one stage, the new stage will be added to the workflow in parallel to the original stage.
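
As a rough illustration, and assuming the changes are merged key by key into a copy of the original stage (the exact merge rules are not spelled out here), the example above would behave as if both stages had been written out in full. The expanded form below is only a sketch of that assumption, not something you need to write yourself:

stages:
  - name: my_stage
    type: fasthep_flow.operators.BashOperator
    kwargs:
      bash_command: echo "Hello World!"
  # Assumed expanded form of the variation: type is inherited unchanged,
  # only bash_command differs. This stage runs in parallel to my_stage,
  # not after it, even though it is listed second here.
  - name: my_stage [variation]
    type: fasthep_flow.operators.BashOperator
    kwargs:
      bash_command: echo "Hello Universe!"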

Changing data

Let’s say you are measuring the invariant mass of two particles, and you have a stage that calculates the mass. Since the invariant mass depends on the momenta and energies of the two particles, each of these quantities is likely to have its own calibration, with associated systematic uncertainties. Luckily, the up- and down-systematics are already included in the data, but under different names.

Let’s take the example from the CMS Public Tutorial, where we have a stage that calculates the invariant mass of two muons. The stage looks like this:

- name: Muon Invariant Mass
  type: fasthep_carpenter.operators.DiObjectMass
  kwargs:
    four_momenta: ["Muon_Px", "Muon_Py", "Muon_Pz", "Muon_E"]
    output: "DiMuonMass"
    when:
      all:
        - "NIsoMuon >= 2"
        - "Muon_Charge[0] == -Muon_Charge[1]"

The systematics are included in the data as Muon_Px_up and Muon_Px_down, Muon_Py_up and Muon_Py_down, and so on. We can use the variations key to define a new stage that uses the up-systematics:

variations:
  - name: Muon Invariant Mass [up]
    changes:
      kwargs:
        four_momenta: ["Muon_Px_up", "Muon_Py_up", "Muon_Pz_up", "Muon_E_up"]
  - name: Muon Invariant Mass [down]
    changes:
      kwargs:
        four_momenta:
          ["Muon_Px_down", "Muon_Py_down", "Muon_Pz_down", "Muon_E_down"]
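
To make the effect concrete, here is a sketch of what the [up] variation would amount to if written out as a full stage. It assumes that keys not mentioned under changes (type, output, when) are inherited unchanged from the original stage and that the four_momenta list is replaced as a whole. Both points are assumptions for illustration:

# Assumed expanded form of "Muon Invariant Mass [up]"
- name: Muon Invariant Mass [up]
  type: fasthep_carpenter.operators.DiObjectMass
  kwargs:
    # replaced by the variation
    four_momenta: ["Muon_Px_up", "Muon_Py_up", "Muon_Pz_up", "Muon_E_up"]
    # inherited from the original stage (assumption)
    output: "DiMuonMass"
    when:
      all:
        - "NIsoMuon >= 2"
        - "Muon_Charge[0] == -Muon_Charge[1]"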

Since this is a full analysis example, this stage is not in isolation. Before it we have the Input data and Create variables stages, and after it we have the Creating histograms, Select events, Creating histograms after selection, and Output data stages. As before, each new stage is added to the workflow in parallel to the original stage, but fasthep-flow also creates new stages for all subsequent stages and adds them to the workflow in parallel to the originals.

Alternatively, if you have lots of variations, you might want to use the source key to define the location of the variations:

variations:
  source: /path/to/variation_*.yaml