General aspects of using McMule

In this section, we collect a few general points of interest regarding McMule. In particular, we discuss heuristics for how much statistics is necessary for different contributions in Section Statistics. This is followed by a more in-depth discussion of the analysis strategy in Section Analysis.

Flavour

McMule implements generic processes that are connected to physical processes by setting the masses of the particles. For example, m2enng0 can refer to \(\mu\to e\nu\bar\nu\gamma\), \(\tau\to e\nu\bar\nu\gamma\), or \(\tau\to\mu\nu\bar\nu\gamma\), depending on the values of Mm and Me. These parameters are controlled by flavour as set during initflavour(). initflavour() also sets the centre-of-mass energy scms for \(2\to n\) processes. We provide flavours for all relevant combinations as well as a few presets for scms, as shown in Table 3. For processes that support it, appending a + or - switches the charge of one of the involved flavours.

Table 3 The different flavours in McMule; the columns t, m, and e give the mapping of the generic particles.

Flavour    |  t  |  m  |  e  | Default centre-of-mass energy
-----------+-----+-----+-----+------------------------------
mu-e       |     |  μ  |  e  | no default
tau-mu     |     |  τ  |  μ  | no default
tau-e      |     |  τ  |  e  | no default
tau-mue    |  τ  |  μ  |  e  | no default
tau-emu    |  τ  |  e  |  μ  | no default
mu-0       |     |  μ  |  0  | no default
tau-0      |     |  τ  |  0  | no default
e-p        |     |  p  |  e  | no default
mu-p       |     |  p  |  μ  | no default
muone      |     |  μ  |  e  | \(m_e^2+m_\mu^2+2m_e\times150\,{\rm GeV}\)
mesa       |     |  p  |  e  | \(m_e^2+m_p^2+2m_p\times155\,{\rm MeV}\)
belleTau   |     |  τ  |  e  | \(M_{\Upsilon(4S)}^2\)
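The default scms values in Table 3 correspond to a beam hitting a target at rest. The following small Python sketch is purely illustrative (McMule sets these values internally in initflavour()); the mass values are PDG numbers inserted by hand to make the formulas concrete.

# Illustrative sketch: reproducing the default scms values of Table 3.
me, mm, mp = 0.000510998950, 0.1056583755, 0.93827208816  # e, mu, p masses in GeV

scms_muone = me**2 + mm**2 + 2 * me * 150.0    # 150 GeV muon beam on an electron at rest
scms_mesa  = me**2 + mp**2 + 2 * mp * 0.155    # 155 MeV electron beam on a proton at rest

print(scms_muone, scms_mesa)                   # squared centre-of-mass energies in GeV^2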

Statistics

McMule is a Monte Carlo program. This means it samples the integrand at \(N\) (pseudo-)random points to obtain an estimate of the integral. However, because it uses the adaptive Monte Carlo integration routine vegas [15], we split \(N=i\times n\) into \(i\) iterations (itmx), each with \(n\) points (ncall). After each iteration, vegas adapts the way it will sample the next iteration based on the results of the previous one. Hence, the performance of the integration is a subtle interplay between \(i\) and \(n\) – it is no longer sufficient to consider only their product \(N\).

Further, we always perform the integration in two steps: a pre-conditioning with \(i_{\text{ad}}\times n_{\text{ad}}\) (itmx_ad and ncall_ad, respectively), which is used to optimise the integration strategy and whose result is discarded, and a main integration that benefits from the integrator’s understanding of the integrand.

Of course there are no one-size-fits-all rules for how to choose \(i\) and \(n\) for the pre-conditioning and the main run. However, the following heuristics have proven helpful (a small numerical sketch follows the list):

  • \(n\) is always much larger than \(i\). For very simple integrands, \(n=\mathcal{O}(10\cdot 10^3)\) and \(i=\mathcal{O}(10)\).

  • Increasing \(n\) reduces errors that can be thought of as systematic because it allows the integrator to ‘discover’ new features of the integrand. Increasing \(i\), on the other hand, will rarely have that effect and only improves the statistical error. This is especially true for distributions.

  • There is no real limit on \(n\), except that it has to fit into the datatype used – integrations with \(n=\mathcal{O}(2^{31}-1)\) are not too uncommon – while \(i\) is rarely (much) larger than 100.

  • For very stringent cuts it can happen that typical values of \(n_{\text{ad}}\) are too small for any point to pass the cuts. In this case vegas will return NaN, indicating that no events were found. Barring mistakes in the definition of the cuts, a pre-pre-conditioning with extremely large \(n\) but \(i=1\!-\!2\) can be helpful.

  • \(n\) also needs to be large enough for vegas to reliably find all features of the integrand. It is rarely obvious whether it did, though it sometimes becomes clear when increasing \(n\) or looking at intermediary results as a function of the already-completed iterations.

  • The main run should always have larger \(i\) and \(n\) than the pre-conditioning. Judging how much larger is a delicate game, though \(i/i_{\text{ad}} = \mathcal{O}(5)\) and \(n/n_{\text{ad}} = \mathcal{O}(10\!-\!50)\) have proven helpful.

  • If, once the integration is completed, the result is unsatisfactory, consider the following strategies:

    • A large \(\chi^2/\rm{d.o.f.}\) indicates that \(n\) is too small. Try to increase \(n_{\text{ad}}\) and, perhaps to a lesser extent, \(n\).

    • Increase \(i\). Often it is a good idea to consciously set \(i\) to a value so large that the integrator will never reach it and to keep looking at ‘intermediary’ results.

    • If the error is small enough for the application but the result seems incorrect (for example because the \(\xi_{c}\) dependence does not vanish), massively increase \(n\).

  • Real corrections need much more statistics in both \(i\) and \(n\) (\(\mathcal{O}(10)\) times more for \(n\), \(\mathcal{O}(2)\) for \(i\)) than the corresponding LO calculations because of the higher-dimensional phase-space.

  • Virtual corrections have the same number of dimensions as the LO calculation and can get by with only a modest increase to account for the added functional complexity.

  • vegas tends to underestimate the numerical error.

These guidelines are often helpful but should not be considered infallible as they are just that – guidelines.
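To make the ratios quoted above concrete, the following Python sketch picks main-run statistics from a given pre-conditioning. It is purely illustrative; the numbers are hypothetical and not a McMule default.

# Hypothetical example of choosing statistics following the heuristics above.
ncall_ad, itmx_ad = 100_000, 10      # pre-conditioning: n_ad points, i_ad iterations

ncall = 20 * ncall_ad                # n / n_ad = O(10-50)
itmx  =  5 * itmx_ad                 # i / i_ad = O(5)

assert ncall <= 2**31 - 1            # n has to fit into the integer datatype used
print(f"pre-conditioning: {itmx_ad} x {ncall_ad}, main run: {itmx} x {ncall}")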

McMule is not parallelised; however, because Monte Carlo integrations require a random seed anyway, it is possible to calculate multiple estimates of the same integral using different random seeds \(z_1\) and to combine the results obtained this way. This also allows for a better, more reliable understanding of the error estimate.
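This combination of seeds is performed automatically by pymule (cf. Section Analysis); conceptually it amounts to a weighted average with a \(\chi^2\) compatibility check, as in the following generic sketch with hypothetical numbers.

import numpy as np

# Generic sketch of combining estimates of the same integral obtained with
# different random seeds z_1 (hypothetical numbers; pymule does this merging
# automatically).
values = np.array([1.0023, 1.0019, 1.0031])   # estimates from three seeds
errors = np.array([0.0004, 0.0005, 0.0004])   # their vegas error estimates

weights = 1 / errors**2
mean    = np.sum(weights * values) / np.sum(weights)
error   = 1 / np.sqrt(np.sum(weights))
chi2dof = np.sum(weights * (values - mean)**2) / (len(values) - 1)

print(mean, error, chi2dof)                    # chi2/d.o.f. should be O(1)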

Analysis

Once the Monte Carlo has run, an offline analysis of the results is required. This entails loading, averaging, and combining the data. This is automatised in pymule, but the basic steps are:

  1. Load the data into a suitable analysis framework such as python.

  2. Combine the different random seeds into one result per contribution and \(\xi_{c}\). The \(\chi^2/{\rm d.o.f.}\) of this merging must be small. Otherwise, try to increase the statistics or choose a different phase-space parametrisation.

  3. Add all contributions that combine into one of the physical contributions (18). This includes any partitioning done in Section Phase-space generation.

  4. (optional) At N\(^\ell\)LO, perform a fit[1] (a small numerical sketch of such a fit is given after this list)

    (2)\[\sigma_{n+j}^{(\ell)} = c_0^{(j)} + c_1^{(j)} \log\xi_{c} + c_2^{(j)} \log^2\xi_{c} + \cdots + c_\ell^{(j)} \log^\ell\xi_{c} = \sum_{i=0}^\ell c_i^{(j)}\log^i\xi_{c}\]

    This has the advantage that it very clearly quantifies any residual \(\xi_{c}\) dependence. We will come back to this issue in Section Study of \(\xi_{c}\) dependence.

  5. Combine all physical contributions of (17) into \(\sigma^{(\ell)}(\xi_{c})\) which has to be \(\xi_{c}\) independent.

  6. Perform detailed checks of the \(\xi_{c}\) independence. This is especially important the first time a particular configuration is run. Beyond NLO, it is also extremely helpful to check whether the sum of the fits (2) is compatible with a constant, i.e. whether for all \(1\le i\le\ell\)

    (3)\[\Bigg| \frac{\sum_{j=0}^\ell c_i^{(j)} } {\sum_{j=0}^\ell \delta c_i^{(j)} } \Bigg| < 1\]

    where \(\delta c_i^{(j)}\) is the error estimate on the coefficient \(c_i^{(j)}\).[2] pymule’s mergefkswithplot() can be helpful here.

    If (3) is not satisfied, or only barely so, try to run the Monte Carlo again with an increased \(n\).

  7. Merge the different estimates of (17) from the different \(\xi_{c}\) into one final number \(\sigma^{(\ell)}\). The \(\chi^2/{\rm d.o.f.}\) of this merging must be small.

  8. Repeat the above for any distributions produced, though bin-wise fitting as in Point 4 is rarely necessary or helpful.

    If a total cross section is \(\xi_{c}\) independent but the distributions (or a cross section obtained after applying cuts) are not, this is a hint that the distribution (or the applied cuts) is not IR safe.
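As an illustration of steps 4 to 6 at NLO (\(\ell=1\)), the following sketch fits the real and virtual contributions as polynomials in \(\log\xi_{c}\), cf. (2), and applies the check (3) to the \(\log\xi_{c}\) coefficient. The numbers are hypothetical and np.polyfit is just one way to perform the fit; pymule provides its own tools for this, such as mergefkswithplot().

import numpy as np

# Hypothetical NLO example (l = 1): fit sigma_{n+1}^{(1)} and sigma_n^{(1)} as
# polynomials in log(xi_c) and check that the log(xi_c) terms cancel in the sum.
xic = np.array([0.3, 0.1, 0.03, 0.01])
L   = np.log(xic)

sig_R, err_R = np.array([ 4.21,  6.40,  8.82, 11.01]), np.full(4, 0.02)  # j = 1
sig_V, err_V = np.array([-2.91, -5.11, -7.51, -9.72]), np.full(4, 0.01)  # j = 0

def logfit(y, err):
    # weighted fit y = c0 + c1 * log(xi_c); returns (c0, c1) and their errors
    (c1, c0), cov = np.polyfit(L, y, 1, w=1/err, cov='unscaled')
    dc1, dc0 = np.sqrt(np.diag(cov))
    return (c0, c1), (dc0, dc1)

(c0R, c1R), (dc0R, dc1R) = logfit(sig_R, err_R)
(c0V, c1V), (dc0V, dc1V) = logfit(sig_V, err_V)

# check (3) for i = 1: the summed log coefficient must vanish within its error
print(abs(c1R + c1V) / (dc1R + dc1V) < 1)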

These steps have been almost completely automatised in pymule and Mathematica, though all steps of this pipeline could easily be implemented in any other language by following the specification of the file format below (Section Differential distributions and intermediary state files).
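For orientation, a minimal pymule sketch of steps 1 to 5 could look as follows. The function names setup, sigma, mergefks, and scaleset are taken from the McMule tutorials; the folder name, process names, and normalisation are placeholders, and the exact call pattern may differ between versions.

# Minimal pymule sketch (assumption: tutorial-style interface as named above).
from pymule import *

setup(folder='out.tar.bz2')          # tarball containing the vegas state files

# sigma() loads a contribution and merges its random seeds, mergefks()
# combines the xi_c-dependent pieces, scaleset() applies a normalisation.
LO  = scaleset(mergefks(sigma('m2enng0')), alpha**3)
NLO = scaleset(mergefks(sigma('m2enngR'), sigma('m2enngV')), alpha**4)

print(LO, NLO)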

Manual compilation

You might need to compile McMule manually if you are not using a sufficiently recent Linux distribution or want to work on the code yourself. In this case, you first need to obtain a copy of the McMule source code. We recommend the following approach

$ git clone --recursive https://gitlab.com/mule-tools/mcmule

To build McMule, you will need

  • Python 3.8 or newer

  • Meson 0.64.0 or newer

  • ninja 1.8.2 or newer

  • GFortran 4.8 or newer

Now you need to configure and build McMule using meson and ninja

$ meson setup build
$ ninja -C build

Note that this will distribute the build over as many CPUs as your machine has, which can cause memory issues. If you do not want that, add the -j <number of jobs> flag to the ninja command. Despite the parallelisation, a full build of McMule can take up to 1h, depending on your machine. If you only need to compile some parts of McMule (such as Bhabha scattering), you can control which process groups are built

$ meson setup build -Dgroups=mue,ee

If you need debug symbols, you can disable optimisation

$ meson setup build --buildtype=debug

Alternatively, we provide a Docker container [17] for easy deployment and legacy results (cf. Section Basics of containerisation). In multi-user environments, udocker [12] can be used instead. In either case, a pre-compiled copy of the code can be obtained by calling

$ docker pull registry.gitlab.com/mule-tools/mcmule  # requires Docker to be installed
$ udocker pull registry.gitlab.com/mule-tools/mcmule # requires udocker to be installed

Running in a container

Linux containers are an emerging technology in the software engineering world. The main idea behind such containerisation is to bundle all dependencies with a piece of software when shipping it. This allows the software to be executed regardless of the Linux distribution running on the host system, without having to install any software beyond the containerisation tool. This is possible without any measurable loss in performance. For these reasons, containerising McMule allows the code to be easily deployed on any modern computer, including systems running macOS or Windows (albeit with a loss of performance), and all results to be perfectly reproducible.

A popular containerisation tool is Docker [17]. Unfortunately, Docker requires some processes to be executed in privileged mode which is rarely available in the multi-user environments usually found on computing infrastructures. This led to the creation of udocker [12] which circumvents these problems.

udocker can be installed by calling

Warning

It might be advisable to first point the variable UDOCKER_DIR to a folder on a drive without quota, as udocker requires sizeable disk space.

$ curl https://raw.githubusercontent.com/indigo-dc/udocker/master/udocker.py > udocker
$ chmod u+rx ./udocker
$ ./udocker install

Once Docker or udocker has been installed, McMule can be downloaded by simply calling

$ docker pull yulrich/mcmule  # requires Docker to be installed
$ udocker pull yulrich/mcmule # requires udocker to be installed

This automatically fetches the latest public release of McMule deemed stable by the McMule collaboration. We will discuss some technical details behind containerisation in Section Basics of containerisation.

McMule can be run containerised on a specified user.f95, which is compiled automatically into mcmule. This is possible both directly and using menu files, as discussed above. To run McMule directly on a specified user.f95, simply call

$ ./tools/run-docker.sh -i yulrich/mcmule:latest -u path/to/user.f95  -r

This requests the same input already discussed in Table 1. To run a containerised menu file, add an image command before the first conf command in the menu file

image yulrich/mcmule:latest path/to/user.f95
conf m2enng.toml

run 70998 0.500000 m2enngR tau-e 0
...

Note that only one image command per menu file is allowed. After this, the menu file can be executed normally, though the drive where Docker or udocker is installed needs to be shared between all nodes working on the job. It is recommended that all legacy results be produced with udocker or Docker.