Better logging
By default, the standard output and standard error of the LAM model are written to a single $RUN_DIR/**/out_execution file.
This is not ideal because logs from different MPI processes and components (LMDZ, DYNAMICO, etc.) are all mixed together.
Fortunately, we can improve the situation with the following tips.
Use srun process labels
SLURM's srun command can tag each log line with the (MPI) process that produced it.
In the Job_<JobName> script generated by libIGCM (e.g. Job_CREATE-amip-ERA5-LAM.01), add the -l or --label argument to srun,
which prepends the task (process) number, not the system process ID, to each log line:
- -l, --label: Prepend task (process) number to each line of the standard output and error streams (see the srun --label documentation).
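A quick way to see the effect of --label before touching the Job script is to run a trivial command under srun inside a SLURM allocation (for example one obtained with salloc). The second command is only a sketch of the change in the Job script: ./model.exe and the redirection are hypothetical placeholders, so keep whatever executable and arguments your script already passes to srun and only add the flag.

```shell
# Quick check inside a SLURM allocation: each output line should start with
# a task number between 0 and 3 (line order is not guaranteed).
srun --ntasks=4 --label hostname

# In the Job_<JobName> script, add --label (or -l) to the existing srun call.
# "./model.exe" and the redirection below are hypothetical placeholders.
srun --label ./model.exe > out_execution 2>&1
```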
This should produce a $RUN_DIR/**/out_execution log similar to this:
22: USING DEFAULTS : area_radius1 = 3360.00000000000
26: USING DEFAULTS : area_radius1 = 3360.00000000000
0: USING DEFAULTS : area_radius1 = 3360.00000000000
0: GETIN area_radius1 = 3360.00000000000
12: USING DEFAULTS : area_rotation_pre = 0.000000000000000E+000
12: USING DEFAULTS : area_rotation = 0.000000000000000E+000
The labels range from 0 up to N - 1, where N is the number of tasks (MPI processes), i.e. the size of MPI_COMM_WORLD.
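Because every line carries its task label, the output of a single task can already be pulled out with standard tools. The sketch below rebuilds the sample log shown above and keeps only the lines of task 12, stripping the label. The RANK variable and the rank_<N>.log file name are hypothetical choices for illustration; the optional leading spaces in the pattern allow for the padding srun may add to labels when there are many tasks.

```shell
# Sample labelled log, copied from the example above.
cat > out_execution <<'EOF'
22: USING DEFAULTS : area_radius1 = 3360.00000000000
26: USING DEFAULTS : area_radius1 = 3360.00000000000
0: USING DEFAULTS : area_radius1 = 3360.00000000000
0: GETIN area_radius1 = 3360.00000000000
12: USING DEFAULTS : area_rotation_pre = 0.000000000000000E+000
12: USING DEFAULTS : area_rotation = 0.000000000000000E+000
EOF

# Keep only the lines of one task and remove its "<task>: " label.
# "^ *12: " does not match "2: " or "22: ", so other tasks are excluded.
RANK=12
grep -E "^ *${RANK}: " out_execution | sed -E "s/^ *${RANK}: //" > "rank_${RANK}.log"
cat "rank_${RANK}.log"
```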
Separating the labelled logs
The log file can then be unmixed with the ipsl_slurm_logs script from the ipsl-common Python package:
pip install git+https://gitlab.in2p3.fr/patryk.kiepas/ipsl-common.git
mkdir separated_logs/
ipsl_slurm_logs <YOUR_LOG_FILE> --output-dir separated_logs/
The separated logs should appear in separated_logs/output_<ID>.log files.
Run ipsl_slurm_logs --help to find out about the other options.
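If installing the package is not an option, a rough equivalent can be written with awk. This is only a sketch of the same idea, not how ipsl_slurm_logs is implemented, and its output may differ: it writes every line labelled "<task>: " to separated_logs/output_<task>.log with the label removed, and collects unlabelled lines in a hypothetical separated_logs/unlabelled.log file.

```shell
# Sample labelled log (same lines as the example above).
cat > out_execution <<'EOF'
22: USING DEFAULTS : area_radius1 = 3360.00000000000
26: USING DEFAULTS : area_radius1 = 3360.00000000000
0: USING DEFAULTS : area_radius1 = 3360.00000000000
0: GETIN area_radius1 = 3360.00000000000
12: USING DEFAULTS : area_rotation_pre = 0.000000000000000E+000
12: USING DEFAULTS : area_rotation = 0.000000000000000E+000
EOF

mkdir -p separated_logs
awk -v dir=separated_logs '
  # Labelled line: RLENGTH is the length of the "<task>: " prefix.
  match($0, /^ *[0-9]+: /) {
    id = substr($0, 1, RLENGTH)     # e.g. "12: "
    gsub(/[ :]/, "", id)            # e.g. "12"
    print substr($0, RLENGTH + 1) > (dir "/output_" id ".log")
    next
  }
  # Any other line (e.g. output written outside srun).
  { print > (dir "/unlabelled.log") }
' out_execution
ls separated_logs
```

Each output file is truncated the first time awk opens it in a run, so re-running the command regenerates the files. With a very large number of tasks, some awk implementations may run into the limit on open files.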
The potential of separated logs
Separated logs make it easier to debug model errors, such as paprs bad order,
which may occur only at specific grid points (often on the domain boundary) and therefore only in the logs of a few processes.
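Once the logs are separated, finding the processes that hit such an error is a simple search. The sketch assumes the abort message contains the text paprs bad order; the two sample files are made-up placeholders standing in for real separated logs.

```shell
# Hypothetical separated logs, for illustration only.
mkdir -p separated_logs
printf 'some other output\n' > separated_logs/output_0.log
printf 'some other output\npaprs bad order\n' > separated_logs/output_7.log

# Which tasks reported the error?
grep -l 'paprs bad order' separated_logs/output_*.log

# Show the matching lines with line numbers and the 3 preceding lines.
grep -n -B 3 'paprs bad order' separated_logs/output_*.log
```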
Use sbatch output
We can also change how sbatch creates the log files, so that the output is written to separate per-process files.
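As a related illustration of per-process files, srun itself accepts filename patterns in its --output and --error options: %j expands to the job ID and %t to the task number (rank), which creates one file per task. A minimal sketch, where ./model.exe and the out_execution.* / err_execution.* names are hypothetical placeholders for what your Job script actually runs and writes:

```shell
# One stdout file and one stderr file per task, e.g. out_execution.<jobid>.0,
# out_execution.<jobid>.1, ... "./model.exe" is a hypothetical placeholder.
srun --output=out_execution.%j.%t --error=err_execution.%j.%t ./model.exe
```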