11. Specification for Parallel Execution
Parallel computation includes distributed-memory parallelization and shared-memory parallelization, both of which are supported in PHITS. A hybrid mode combining these two approaches is also available.
For distributed-memory parallelization, an MPI implementation must be installed on your computer. For shared-memory parallelization, no additional software is required. However, for the same number of CPU cores, distributed-memory parallelization often yields shorter computation times.
Switching between single and parallel execution is controlled by compiler options, and separate executable files must be created for each mode. For details, see Section 10.
In distributed-memory parallelization, jobs are assigned to the CPU cores in units of batches, and the main core collects the results after all cores have completed their batch calculations. Because every core reads the geometry and tally data independently, the required memory is roughly that of a single execution multiplied by the number of cores. This approach is therefore unsuitable for calculations that need large amounts of memory, such as voxel phantom simulations. In addition, because result collection waits for all cores to finish, computation time may become unnecessarily long when processing times differ greatly among cores, for example when the number of histories per batch (maxcas) is small.
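The effect of batch-wise scheduling on wall time and memory can be illustrated with a deliberately simplified model. The sketch below assumes that batches are processed in rounds of one batch per computing core and that each round ends only when its slowest batch finishes; the function names and numbers are hypothetical and do not describe PHITS internals.

```python
# Simplified model (an assumption, not PHITS internals): batches are handed out
# in rounds, one batch per computing core, and results are collected only after
# every core in the round has finished its batch.

def wall_time(batch_times, n_workers):
    """Estimate wall time when each round waits for its slowest batch."""
    total = 0.0
    for start in range(0, len(batch_times), n_workers):
        round_times = batch_times[start:start + n_workers]
        total += max(round_times)  # the round ends when the slowest core finishes
    return total


def memory_estimate(single_run_mb, n_cores):
    """Each core holds its own copy of the geometry and tally data."""
    return single_run_mb * n_cores


balanced = [10.0, 10.0, 10.0, 10.0]
imbalanced = [4.0, 4.0, 4.0, 28.0]  # same 40 CPU-seconds, one slow batch
print(wall_time(balanced, 4))    # 10.0
print(wall_time(imbalanced, 4))  # 28.0
print(memory_estimate(500, 4))   # 2000
```

With the same total amount of work, the imbalanced case takes almost three times as long under this model, which is why very small maxcas values (and hence noisy per-batch times) can hurt.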
In shared-memory parallelization, jobs are assigned to the cores in units of histories, and all cores share the geometry and tally data during the computation, so memory usage is comparable to that of single execution. However, because simultaneous writes to shared memory can cause contention, computation time may become unnecessarily long for calculations that write results frequently, such as those using [t-sed].
11.1. Distributed-Memory Parallelization
11.1.1. Setup
To perform distributed-memory (MPI) parallel computation, an MPI implementation must be installed.
On Windows, when running parallel computations on a single PC, use the MPI included in the Intel oneAPI package.
For installation, refer to phits/document/Install-IntelFortran-OneAPI-en.pdf and Section 10.1.
When running parallel computations across multiple Windows PCs, refer to Windows-MPI-setup-jp.docx in the phits/document/mpi folder.
For macOS and Linux, install an MPI implementation by following the installation instructions provided for that implementation.
11.1.2. Execution Using Batch Files or Shell Scripts
To run the MPI parallel version of PHITS on a single PC, add $MPI=M (M is the number of parallel processes) before the first section of the PHITS input file.
For example, to efficiently use a PC with four CPU cores:
$MPI = 4
Note that the actual number of processing elements (PEs) used in parallel execution is M+1, since one PE is used for control.
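Preparing an input file for MPI execution amounts to placing one extra line at the top. The minimal Python sketch below prepends the $MPI line and computes the resulting number of PEs (M computing processes plus one control PE). The helper names and the short input fragment are hypothetical.

```python
def add_mpi_directive(input_text, m):
    """Prepend '$MPI = M' so that it precedes the first section of the input."""
    if m < 1:
        raise ValueError("M must be at least 1")
    return f"$MPI = {m}\n" + input_text


def total_pes(m):
    """M computing processes plus one PE used for control."""
    return m + 1


# Hypothetical input fragment used only for illustration.
deck = "[ Parameters ]\n maxcas = 1000\n maxbch = 8\n"
print(add_mpi_directive(deck, 4).splitlines()[0])  # $MPI = 4
print(total_pes(4))                                # 5
```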
After saving the input file in this form, PHITS can be executed in MPI parallel mode using the standard execution procedure.
On Windows, when running for the first time, you will be prompted to enter a user name and password.
Hybrid parallelization with OpenMP is currently not supported with this method. If both $OMP (for OpenMP) and $MPI (for MPI) are specified in the input file, the one written later takes precedence.
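The precedence rule for $OMP and $MPI is effectively "the last directive wins". The sketch below illustrates that rule only; it assumes directives have the form $OMP = N or $MPI = N and appear before the first section (taken here as the first line starting with "["). It is not the PHITS input parser.

```python
import re

# Assumed directive format: "$OMP = N" or "$MPI = N" (spaces optional).
DIRECTIVE = re.compile(r"^\s*\$(OMP|MPI)\s*=\s*(\d+)")


def effective_parallel_mode(input_text):
    """Return (mode, N) for the last $OMP/$MPI line before the first section."""
    chosen = None
    for line in input_text.splitlines():
        if line.lstrip().startswith("["):
            break  # stop at the first section, e.g. "[ Parameters ]"
        match = DIRECTIVE.match(line)
        if match:
            chosen = (match.group(1), int(match.group(2)))  # a later line overrides
    return chosen


print(effective_parallel_mode("$OMP = 8\n$MPI = 4\n[ Parameters ]\n"))  # ('MPI', 4)
```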
11.1.3. Execution from the Command Line
When PHITS is run from a command-line interface, a typical execution command is:
mpirun -np 5 phits_LinIfort_MPI
Here, mpirun is the launcher of the installed MPI implementation, the number after -np specifies the total number of processing elements (PEs), and phits_LinIfort_MPI is the PHITS executable. With -np 5, four PEs perform the calculation and one PE is used for control.
On systems with a job scheduler, submit this command using the system-specific method, such as qsub.
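Because one PE is reserved for control, the value passed to -np is one more than the number of computing PEs. The sketch below assembles the command from the computing-PE count; the helper name is hypothetical, the executable name is the one used above, and the command is only built, not run (it could be passed to subprocess.run or written into a job script for the scheduler).

```python
import shlex


def mpirun_command(computing_pes, executable="phits_LinIfort_MPI"):
    """Build the mpirun argument list: computing PEs plus one control PE."""
    if computing_pes < 1:
        raise ValueError("need at least one computing PE")
    return ["mpirun", "-np", str(computing_pes + 1), executable]


print(shlex.join(mpirun_command(4)))  # mpirun -np 5 phits_LinIfort_MPI
```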
In distributed-memory parallel mode, PHITS automatically reads the input file name from a file named phits.in. This file name is fixed.
Write the input file name in the first line of phits.in as follows:
file = input_file_name
Because the input file name is taken from phits.in, input redirection from the shell is not supported. This restriction applies only to distributed-memory parallel mode.
Alternatively, you can write file=phits.in on the first line of phits.in and place the entire input data below it.
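Both layouts of phits.in can be produced with a small helper. In this sketch, write_phits_in and the file name shield.inp are hypothetical; only the fixed name phits.in and the form of the first line come from the text above.

```python
from pathlib import Path


def write_phits_in(workdir, input_name=None, input_body=None):
    """Create phits.in in workdir using one of the two layouts described above."""
    if (input_name is None) == (input_body is None):
        raise ValueError("give exactly one of input_name or input_body")
    if input_name is not None:
        content = f"file = {input_name}\n"        # point to a separate input file
    else:
        content = "file=phits.in\n" + input_body  # embed the whole input below
    path = Path(workdir) / "phits.in"             # the name phits.in is fixed
    path.write_text(content)
    return path
```

For example, write_phits_in(".", input_name="shield.inp") would create ./phits.in whose only line is file = shield.inp.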
11.1.4. Specification of maxcas and maxbch
In distributed-memory parallel computation, PHITS parallelizes the calculation in units of batches.
Therefore, the number of batches (maxbch) should be specified as a multiple of the number of processing elements used for computation (total PEs minus one, since one PE is used for control).
If this condition is not satisfied, PHITS automatically adjusts maxcas and maxbch so that maxbch becomes a multiple of (PE − 1) while the total number of histories (maxcas × maxbch) remains approximately unchanged. In such cases, a comment is printed at the end of the input echo in the output.
In distributed-memory parallel mode, batch-wise information is output every (number of batches × (PE − 1)), and intermediate termination can also be performed in these units.
For restart calculations (istdev < 0), maxcas is automatically set to the value used in the previous calculation and is not changed; only maxbch is adjusted to be a multiple of (PE − 1). As a result, the total number of histories may differ from the specified value.
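The adjustment of maxbch can be sketched as follows. The rounding rule here (round maxbch up to the next multiple of the computing PEs, then rescale maxcas so that maxcas × maxbch stays close to the original total) is an assumption chosen for illustration; PHITS's actual rule may differ, and the comment in the input echo reports the values it really uses.

```python
import math


def adjust_batches(maxcas, maxbch, workers, restart=False):
    """Make maxbch a multiple of workers (= total PEs - 1).

    Hypothetical rounding rule for illustration; PHITS may round differently.
    """
    new_maxbch = math.ceil(maxbch / workers) * workers
    if restart:
        # maxcas must match the previous run, so only maxbch changes
        return maxcas, new_maxbch
    total = maxcas * maxbch
    new_maxcas = max(1, round(total / new_maxbch))  # keep maxcas*maxbch near total
    return new_maxcas, new_maxbch


print(adjust_batches(1000, 10, 4))                # (833, 12)
print(adjust_batches(1000, 10, 4, restart=True))  # (1000, 12)
```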
11.1.5. Handling of Abnormal Termination
If the program terminates abnormally on one of the PEs, that PE is removed, and the calculation continues with the remaining PEs.
The final result is obtained by summing the results from the remaining PEs. The status of each PE is reported in the batch information and in the calculation summary.
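The rule for combining results after an abnormal termination (sum over the surviving PEs only) can be illustrated with a toy data layout. The dictionary format and the status strings below are hypothetical; real PHITS results are written to files and collected by the main PE.

```python
def combine_results(per_pe):
    """Sum tally values over PEs whose status is 'ok'.

    per_pe maps a PE number to (status, list_of_tally_values); this layout
    is hypothetical and only mirrors the rule described in the text.
    """
    survivors = [pe for pe, (status, _) in sorted(per_pe.items()) if status == "ok"]
    if not survivors:
        raise RuntimeError("no PE finished successfully")
    n_bins = len(per_pe[survivors[0]][1])
    totals = [0.0] * n_bins
    for pe in survivors:
        for i, value in enumerate(per_pe[pe][1]):
            totals[i] += value
    return survivors, totals


results = {
    1: ("ok", [1.0, 2.0]),
    2: ("abnormal", [9.9, 9.9]),  # removed PE: not included in the sum
    3: ("ok", [0.5, 1.5]),
}
print(combine_results(results))  # ([1, 3], [1.5, 3.5])
```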
11.1.6. Output File Names for Dump, dumpall, and [t-userdefined]
When distributed-memory parallelization is used, the output files for Dump, dumpall, and [t-userdefined] are written separately for each PE, and a number identifying the PE (e.g., .005) is appended to each file name.
If the number of PEs is large enough to require four or five digits, the suffix becomes correspondingly longer.
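The suffix rule can be sketched as zero-padding the PE number. The sketch assumes a minimum width of three digits (as in .005) that grows with the number of digits of the total PE count; the exact width PHITS chooses for large runs is not specified above, so treat this as an approximation. The base file name dump.dat is hypothetical.

```python
def pe_suffix(pe, total_pes):
    """Zero-padded suffix: at least three digits, more if the PE count needs them.

    Assumption: the padding width follows the number of digits in total_pes.
    """
    width = max(3, len(str(total_pes)))
    return f".{pe:0{width}d}"


def split_name(base, pe, total_pes):
    """Append the PE suffix to the full output file name."""
    return base + pe_suffix(pe, total_pes)


print(split_name("dump.dat", 5, 64))    # dump.dat.005
print(split_name("dump.dat", 5, 2048))  # dump.dat.0005
```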
11.1.7. Specification of Input Files in PHITS
A typical example of a file read by PHITS is a source file generated by Decay-Turtle.
This example file is about 2.6 MB in size, so simultaneous access by all PEs is unlikely to place a significant load on the network. However, larger files (on the order of 100 MB) may cause problems when they are accessed repeatedly by many PEs.
In such cases, it is recommended to copy the data file to each PE’s working directory (e.g., /wk/j9999/turtle/sours.dat) in advance, and specify it in the PHITS input as:
file = /wk/j9999/turtle/sours.dat
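Staging the data file can be scripted before the run. The sketch below copies one shared file into each working directory under the same relative path, so that a single file = ... line in the input is valid for every PE. It assumes the target directories are reachable from where the script runs; on a cluster whose nodes have separate local disks, it would be run once on each node (for example, from the job script). The helper name and any source path passed to it are hypothetical.

```python
import shutil
from pathlib import Path


def stage_source_file(src, workdirs, relative="turtle/sours.dat"):
    """Copy one shared data file into each working directory.

    Every PE can then read its own local copy at the same path
    instead of all PEs reading the shared file over the network.
    """
    copies = []
    for workdir in workdirs:
        dest = Path(workdir) / relative
        dest.parent.mkdir(parents=True, exist_ok=True)  # create turtle/ if missing
        shutil.copy2(src, dest)  # copy2 also preserves timestamps
        copies.append(dest)
    return copies
```

For example, stage_source_file("/shared/turtle/sours.dat", ["/wk/j9999"]) would place the copy at /wk/j9999/turtle/sours.dat, matching the input line shown above.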