gwin_make_workflow: A parameter estimation workflow generator

Introduction

The executable gwin_make_workflow is a workflow generator that sets up a parameter estimation analysis.

Workflow configuration file

A simple workflow configuration file

[workflow]
; basic information used by the workflow generator
file-retention-level = all_triggers
h1-channel-name = H1:DCS-CALIB_STRAIN_C02
l1-channel-name = L1:DCS-CALIB_STRAIN_C02

[workflow-ifos]
; the IFOs to analyze
h1 =
l1 =

[workflow-datafind]
; how the workflow generator should get frame data
datafind-h1-frame-type = H1_HOFT_C02
datafind-l1-frame-type = L1_HOFT_C02
datafind-method = AT_RUNTIME_SINGLE_FRAMES
datafind-check-segment-gaps = raise_error
datafind-check-frames-exist = raise_error
datafind-check-segment-summary = no_test

[workflow-inference]
; how the workflow generator should setup inference nodes
num-events = 1
plot-1d-mass = mass1 mass2 mchirp q
plot-1d-orientation = ra dec tc polarization inclination coa_phase
plot-1d-distance = distance redshift

[executables]
; paths to executables to use in workflow
inference = ${which:run_gwin}
inference_posterior = ${which:gwin_plot_posterior}
inference_prior = ${which:gwin_plot_prior}
inference_rate = ${which:gwin_plot_acceptance_rate}
inference_samples = ${which:gwin_plot_samples}
inference_table = ${which:gwin_table_summary}
plot_spectrum = ${which:pycbc_plot_psd_file}
results_page = ${which:pycbc_make_html_page}

[datafind]
; datafind options
urltype = file

[inference]
; command line options; use --help for more information
sample-rate = 2048
low-frequency-cutoff = 20
strain-high-pass = 15
pad-data = 8
psd-estimation = median
psd-segment-length = 16
psd-segment-stride = 8
psd-inverse-length = 4
processing-scheme = cpu
checkpoint-interval = 1000
sampler = kombine
nwalkers = 5000
update-interval = 500
n-independent-samples = 5000
burn-in-function = max_posterior
save-psd =
save-strain =
save-stilde =
nprocesses = 12
resume-from-checkpoint =

[pegasus_profile-inference]
; pegasus profile for inference nodes
condor|request_memory = 20G
condor|request_cpus = 12

[inference_posterior]
; command line options; use --help for more information
plot-density =
plot-contours =
plot-marginal =

[inference_prior]
; command line options; use --help for more information

[inference_rate]
; command line options; use --help for more information

[inference_samples]
; command line options; use --help for more information

[inference_table]
; command line options; use --help for more information

[plot_spectrum]
; command line options; use --help for more information

[results_page]
; command line options; use --help for more information
analysis-title = "GWin Test"

Inference configuration file

You will also need a configuration file with sections that tell gwin how to construct the priors. A simple inference configuration file is

[model]
name = gaussian_noise

[variable_params]
; parameters to vary in inference sampler
tc =
mass1 =
mass2 =
distance =
coa_phase =
inclination =
ra =
dec =
polarization =

[static_params]
; parameters that do not vary in inference sampler
approximant = SEOBNRv2_ROM_DoubleSpin
f_lower = 28.0

[prior-tc]
; how to construct prior distribution
name = uniform
min-tc = 1126259462.2
max-tc = 1126259462.6

[prior-mass1]
; how to construct prior distribution
name = uniform
min-mass1 = 10.
max-mass1 = 80.

[prior-mass2]
; how to construct prior distribution
name = uniform
min-mass2 = 10.
max-mass2 = 80.

[prior-distance]
; how to construct prior distribution
name = uniform
min-distance = 10
max-distance = 500

[prior-coa_phase]
; how to construct prior distribution
name = uniform_angle
; uniform_angle defaults to [0,2pi), so we
; don't need to specify anything here

[prior-inclination]
; how to construct prior distribution
name = sin_angle

[prior-ra+dec]
; how to construct prior distribution
name = uniform_sky

[prior-polarization]
; how to construct prior distribution
name = uniform_angle
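
The sin_angle prior weights the inclination by sin(i), which is what an isotropic distribution of source orientations implies. As a standalone illustration (not gwin's internals), these angle priors can be sampled by inverse-CDF:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# sin_angle prior: p(i) = sin(i)/2 on [0, pi).  Its CDF is
# F(i) = (1 - cos(i))/2, so inverse-CDF sampling gives i = arccos(1 - 2u).
inclination = np.arccos(1.0 - 2.0 * rng.uniform(size=n))

# uniform_angle prior: flat on [0, 2*pi).
polarization = rng.uniform(0.0, 2.0 * np.pi, size=n)

# Sanity check: under the sin prior, cos(i) is uniform on [-1, 1],
# so its sample mean should be close to zero.
assert abs(np.cos(inclination).mean()) < 0.01
```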

A simple configuration file for parameter estimation on the ringdown is:

[model]
name = gaussian_noise

[variable_params]
; parameters to vary in inference sampler
tc =
f_0 =
tau =
amp =
phi =

[static_params]
; parameters that do not vary in inference sampler
approximant = FdQNM
ra = 2.21535724066
dec = -1.23649695537
polarization = 0.
f_lower = 28.0
f_final = 512

[prior-tc]
; how to construct prior distribution
name = uniform
min-tc = 1126259462.4
max-tc = 1126259462.5

[prior-f_0]
; how to construct prior distribution
name = uniform
min-f_0 = 200.
max-f_0 = 300.

[prior-tau]
; how to construct prior distribution
name = uniform
min-tau = 0.0008
max-tau = 0.020

[prior-amp]
; how to construct prior distribution
name = uniform
min-amp = 0
max-amp = 1e-20

[prior-phi]
; how to construct prior distribution
name = uniform_angle
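
The FdQNM template these options select models the ringdown as a damped sinusoid with central frequency f_0 and damping time tau. A minimal time-domain sketch of that signal shape (illustrative only; as its name suggests, the approximant itself is evaluated in the frequency domain):

```python
import numpy as np

def ringdown(t, f_0=250.0, tau=0.004, amp=1e-21, phi=0.0):
    # Damped sinusoid: h(t) = amp * exp(-t/tau) * cos(2*pi*f_0*t + phi)
    return amp * np.exp(-t / tau) * np.cos(2.0 * np.pi * f_0 * t + phi)

# Sample at 2048 Hz, matching the workflow's sample-rate setting.
t = np.arange(0.0, 0.05, 1.0 / 2048)
h = ringdown(t)

# The signal never exceeds the decaying envelope amp * exp(-t/tau).
assert np.all(np.abs(h) <= 1e-21 * np.exp(-t / 0.004) + 1e-30)
```

The prior ranges above (f_0 in [200, 300] Hz, tau in [0.8, 20] ms) bracket typical values for the dominant quasi-normal mode of a GW150914-like remnant.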

If you want to use another variable parameter in the inference sampler, add its name to [variable_params] and add a prior section as shown above.
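
For example, to also sample over the aligned spin of the primary (a hypothetical addition; the parameter name follows PyCBC's spin1z convention and the range is illustrative), add spin1z to [variable_params] and a matching prior section:

```
[prior-spin1z]
; how to construct prior distribution
name = uniform
min-spin1z = -0.9
max-spin1z = 0.9
```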

The number of likelihood evaluations needed grows rapidly with the number of variable parameters, so to keep computational costs to a minimum we can choose to marginalize over time, distance, and/or phase. This removes the need to sample over those dimensions without affecting the posteriors of the remaining parameters.
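
The effect can be seen with a toy two-parameter posterior (a purely illustrative Gaussian, not a real likelihood): integrating out one parameter on a grid leaves the other parameter's posterior unchanged, so a sampler need not explore that dimension.

```python
import numpy as np

# Toy two-parameter posterior: a mass-like parameter 'm' and distance 'd'.
m = np.linspace(20.0, 40.0, 201)
d = np.linspace(10.0, 500.0, 400)
M, D = np.meshgrid(m, d, indexing="ij")
post = np.exp(-0.5 * ((M - 30.0) / 2.0) ** 2 - 0.5 * ((D - 300.0) / 50.0) ** 2)

# Marginalize over distance: integrate the posterior along the d axis.
post_m = post.sum(axis=1) * (d[1] - d[0])
post_m /= post_m.sum() * (m[1] - m[0])   # normalize to a PDF in m

# The 1D marginal peaks at the same m as the full 2D posterior, so a
# sampler that only explores m loses nothing about m's posterior.
assert np.isclose(m[np.argmax(post_m)], 30.0, atol=0.1)
```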

We can turn on likelihood marginalization by specifying the marginalized_gaussian_noise model in the [model] section. For example, we can modify the simple configuration file above to include distance marginalization:

[model]
name = marginalized_gaussian_noise
distance_marginalization =

[marginalized_prior-distance]
; how to construct prior distribution for distance marginalization
name = uniform
min-distance = 50.
max-distance = 5000.

[variable_params]
; parameters to vary in inference sampler
tc =
mass1 =
mass2 =
distance =
coa_phase =
inclination =
ra =
dec =
polarization =

[static_params]
; parameters that do not vary in inference sampler
approximant = SEOBNRv2_ROM_DoubleSpin
f_lower = 28.0

[prior-tc]
; how to construct prior distribution
name = uniform
min-tc = 1126259462.2
max-tc = 1126259462.6

[prior-mass1]
; how to construct prior distribution
name = uniform
min-mass1 = 10.
max-mass1 = 80.

[prior-mass2]
; how to construct prior distribution
name = uniform
min-mass2 = 10.
max-mass2 = 80.

[prior-distance]
; how to construct prior distribution
name = uniform
min-distance = 10
max-distance = 500

[prior-coa_phase]
; how to construct prior distribution
name = uniform_angle
; uniform_angle defaults to [0,2pi), so we
; don't need to specify anything here

[prior-inclination]
; how to construct prior distribution
name = sin_angle

[prior-ra+dec]
; how to construct prior distribution
name = uniform_sky

[prior-polarization]
; how to construct prior distribution
name = uniform_angle

If you also want to marginalize over time and/or phase, add the corresponding entry to the [model] section and a prior under a [marginalized_prior-${VARIABLE}] section, as shown above. In that case, for this example, you would no longer need to sample over the phase and coalescence time.
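
For example, to also marginalize over the coalescence phase, the [model] section would gain a phase_marginalization entry with a matching marginalized prior (a sketch following the pattern of the distance example above; the option names are assumed to follow that pattern):

```
[model]
name = marginalized_gaussian_noise
distance_marginalization =
phase_marginalization =

[marginalized_prior-coa_phase]
; how to construct prior distribution for phase marginalization
name = uniform_angle
```

With this in place, coa_phase would be dropped from [variable_params].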

When working with real data, it is necessary to marginalize over calibration uncertainty. The model and parameters describing the calibration uncertainty can be supplied in a separate ini file, e.g.:

# Details of set up as given in the O1 Binary Black Hole Paper https://arxiv.org/abs/1606.04856
[calibration]
h1_model = cubic_spline
h1_minimum_frequency = 10
h1_maximum_frequency = 1024
h1_n_points = 5
l1_model = cubic_spline
l1_minimum_frequency = 10
l1_maximum_frequency = 1024
l1_n_points = 5

[variable_params]
recalib_amplitude_h1_0 =
recalib_amplitude_h1_1 =
recalib_amplitude_h1_2 =
recalib_amplitude_h1_3 =
recalib_amplitude_h1_4 =
recalib_phase_h1_0 =
recalib_phase_h1_1 =
recalib_phase_h1_2 =
recalib_phase_h1_3 =
recalib_phase_h1_4 =
recalib_amplitude_l1_0 =
recalib_amplitude_l1_1 =
recalib_amplitude_l1_2 =
recalib_amplitude_l1_3 =
recalib_amplitude_l1_4 =
recalib_phase_l1_0 =
recalib_phase_l1_1 =
recalib_phase_l1_2 =
recalib_phase_l1_3 =
recalib_phase_l1_4 =

[prior-recalib_amplitude_h1_0]
name = gaussian
recalib_amplitude_h1_0_mean = 0
recalib_amplitude_h1_0_var = 0.0023

[prior-recalib_amplitude_h1_1]
name = gaussian
recalib_amplitude_h1_1_mean = 0
recalib_amplitude_h1_1_var = 0.0023

[prior-recalib_amplitude_h1_2]
name = gaussian
recalib_amplitude_h1_2_mean = 0
recalib_amplitude_h1_2_var = 0.0023

[prior-recalib_amplitude_h1_3]
name = gaussian
recalib_amplitude_h1_3_mean = 0
recalib_amplitude_h1_3_var = 0.0023

[prior-recalib_amplitude_h1_4]
name = gaussian
recalib_amplitude_h1_4_mean = 0
recalib_amplitude_h1_4_var = 0.0023

[prior-recalib_amplitude_l1_0]
name = gaussian
recalib_amplitude_l1_0_mean = 0
recalib_amplitude_l1_0_var = 0.0068

[prior-recalib_amplitude_l1_1]
name = gaussian
recalib_amplitude_l1_1_mean = 0
recalib_amplitude_l1_1_var = 0.0068

[prior-recalib_amplitude_l1_2]
name = gaussian
recalib_amplitude_l1_2_mean = 0
recalib_amplitude_l1_2_var = 0.0068

[prior-recalib_amplitude_l1_3]
name = gaussian
recalib_amplitude_l1_3_mean = 0
recalib_amplitude_l1_3_var = 0.0068

[prior-recalib_amplitude_l1_4]
name = gaussian
recalib_amplitude_l1_4_mean = 0
recalib_amplitude_l1_4_var = 0.0068

[prior-recalib_phase_h1_0]
name = gaussian
recalib_phase_h1_0_mean = 0
recalib_phase_h1_0_var = 0.0030

[prior-recalib_phase_h1_1]
name = gaussian
recalib_phase_h1_1_mean = 0
recalib_phase_h1_1_var = 0.0030

[prior-recalib_phase_h1_2]
name = gaussian
recalib_phase_h1_2_mean = 0
recalib_phase_h1_2_var = 0.0030

[prior-recalib_phase_h1_3]
name = gaussian
recalib_phase_h1_3_mean = 0
recalib_phase_h1_3_var = 0.0030

[prior-recalib_phase_h1_4]
name = gaussian
recalib_phase_h1_4_mean = 0
recalib_phase_h1_4_var = 0.0030

[prior-recalib_phase_l1_0]
name = gaussian
recalib_phase_l1_0_mean = 0
recalib_phase_l1_0_var = 0.0054

[prior-recalib_phase_l1_1]
name = gaussian
recalib_phase_l1_1_mean = 0
recalib_phase_l1_1_var = 0.0054

[prior-recalib_phase_l1_2]
name = gaussian
recalib_phase_l1_2_mean = 0
recalib_phase_l1_2_var = 0.0054

[prior-recalib_phase_l1_3]
name = gaussian
recalib_phase_l1_3_mean = 0
recalib_phase_l1_3_var = 0.0054

[prior-recalib_phase_l1_4]
name = gaussian
recalib_phase_l1_4_mean = 0
recalib_phase_l1_4_var = 0.0054
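
The cubic_spline model perturbs the waveform with smooth frequency-dependent amplitude and phase corrections built from the spline nodes. A minimal sketch of that correction follows; the log-frequency node spacing and the reading of the *_var options as variances are assumptions of this illustration, and SciPy's CubicSpline stands in for the model's actual interpolant.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Five calibration nodes spaced logarithmically between the configured
# minimum and maximum frequencies (10 Hz and 1024 Hz).
log_f_nodes = np.linspace(np.log10(10.0), np.log10(1024.0), 5)

rng = np.random.default_rng(1)
delta_amp = rng.normal(0.0, np.sqrt(0.0023), size=5)    # H1 amplitude nodes
delta_phase = rng.normal(0.0, np.sqrt(0.0030), size=5)  # H1 phase nodes

amp_of = CubicSpline(log_f_nodes, delta_amp)
phase_of = CubicSpline(log_f_nodes, delta_phase)

def calibrated(h_f, f):
    # Apply a smooth amplitude and phase correction to a
    # frequency-domain strain h(f).
    log_f = np.log10(f)
    return h_f * (1.0 + amp_of(log_f)) * np.exp(1j * phase_of(log_f))

f = np.linspace(20.0, 512.0, 100)
h_cal = calibrated(np.ones_like(f, dtype=complex), f)

# With all node offsets at zero the correction reduces to the identity.
identity = CubicSpline(log_f_nodes, np.zeros(5))
assert np.allclose(1.0 + identity(np.log10(f)), 1.0)
```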

Generate the workflow

To generate a workflow you will need your configuration files. The workflow generator combines all the files given in INFERENCE_CONFIG_PATH into a single configuration file, inference.ini. We set the following environment variables for this example:

# name of the workflow
WORKFLOW_NAME="r1"

# path to output dir
OUTPUT_DIR=output

# input configuration files
CONFIG_PATH=workflow.ini
INFERENCE_CONFIG_PATH="gwin.ini calibration.ini"

Specify a directory to save the HTML pages:

# directory that will be populated with HTML pages
HTML_DIR=${HOME}/public_html/inference_test

If you want to run on the loudest triggers from a PyCBC coincident search workflow, run:

# run workflow generator on triggers from workflow
gwin_make_workflow --workflow-name ${WORKFLOW_NAME} \
    --config-files ${CONFIG_PATH} \
    --inference-config-file ${INFERENCE_CONFIG_PATH} \
    --output-dir ${OUTPUT_DIR} \
    --output-file ${WORKFLOW_NAME}.dax \
    --output-map ${OUTPUT_MAP_PATH} \
    --bank-file ${BANK_PATH} \
    --statmap-file ${STATMAP_PATH} \
    --single-detector-triggers ${SNGL_H1_PATHS} ${SNGL_L1_PATHS} \
    --config-overrides workflow:start-time:${WORKFLOW_START_TIME} \
                       workflow:end-time:${WORKFLOW_END_TIME} \
                       workflow-inference:data-seconds-before-trigger:8 \
                       workflow-inference:data-seconds-after-trigger:8 \
                       results_page:output-path:${HTML_DIR} \
                       results_page:analysis-subtitle:${WORKFLOW_NAME}

Here, ${BANK_PATH} is the path to the template bank HDF file, ${STATMAP_PATH} is the path to the combined statmap HDF file, ${SNGL_H1_PATHS} and ${SNGL_L1_PATHS} are the paths to the merged single-detector HDF files, and ${WORKFLOW_START_TIME} and ${WORKFLOW_END_TIME} are the start and end times of the coincidence workflow.

Otherwise, you can run on a specific GPS end time with the --gps-end-time option:

# run workflow generator on specific GPS end time
gwin_make_workflow --workflow-name ${WORKFLOW_NAME} \
    --config-files ${CONFIG_PATH} \
    --inference-config-file ${INFERENCE_CONFIG_PATH} \
    --output-dir ${OUTPUT_DIR} \
    --output-file ${WORKFLOW_NAME}.dax \
    --output-map ${OUTPUT_MAP_PATH} \
    --gps-end-time ${GPS_END_TIME} \
    --config-overrides workflow:start-time:$((${GPS_END_TIME}-16)) \
                       workflow:end-time:$((${GPS_END_TIME}+16)) \
                       workflow-inference:data-seconds-before-trigger:2 \
                       workflow-inference:data-seconds-after-trigger:2 \
                       inference:psd-start-time:$((${GPS_END_TIME}-300)) \
                       inference:psd-end-time:$((${GPS_END_TIME}+748)) \
                       results_page:output-path:${HTML_DIR} \
                       results_page:analysis-subtitle:${WORKFLOW_NAME}

Here, ${GPS_END_TIME} is the GPS end time of the trigger.

For the CBC example above, define the environment variables GPS_END_TIME=1126259462 and OUTPUT_MAP_PATH=output.map.

Plan and execute the workflow

Finally, plan and submit the workflow with:

# submit workflow
pycbc_submit_dax --dax ${WORKFLOW_NAME}.dax \
    --accounting-group ligo.dev.o3.cbc.explore.test \
    --enable-shared-filesystem