# Scalability

To measure how well a GEOS-Chem simulation scaled, compute the ratio of CPU time to wall time. Both metrics are available from your scheduler's job accounting information. The sections below describe how to obtain them and how to calculate the ratio.

## Computing scalability from SLURM scheduler output

After your job has finished, type:

```
sacct -l -j JOBID
```

The -l option returns the “long” output information for your job. You may also specify which information you would like to obtain for your job. For example:

```
sacct -j JOBID --format=JobID,JobName,User,Partition,NNodes,NCPUS,MaxRSS,TotalCPU,Elapsed
```

From the returned output, note the values for TotalCPU and Elapsed. For example:

```
       JobID    JobName      User  Partition   NNodes      NCPUS     MaxRSS   TotalCPU    Elapsed
------------ ---------- --------- ---------- -------- ---------- ---------- ---------- ----------
53901011     HEMCO+Hen+ ryantosca      jacob        1          8        16? 1-03:35:33   04:15:53
53901011.ba+      batch                             1          8   6329196K 1-03:35:33   04:15:53
```

Note that there are two entries. The first line describes the job as a whole, including the partition to which you submitted it (i.e. jacob). The second line describes the batch step (labeled batch), which is the step in which your job script actually ran.
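If you want to pull TotalCPU and Elapsed out of this table programmatically rather than by eye, one possible approach is sketched below. It assumes the default fixed-width layout shown above, where the row of dashes marks the width of each column. The function name `parse_sacct_table` is hypothetical and is not part of SLURM or of the GEOS-Chem scripts.

```python
import re

# Sample text copied from the sacct example above.
SAMPLE = """\
       JobID    JobName      User  Partition   NNodes      NCPUS     MaxRSS   TotalCPU    Elapsed
------------ ---------- --------- ---------- -------- ---------- ---------- ---------- ----------
53901011     HEMCO+Hen+ ryantosca      jacob        1          8        16? 1-03:35:33   04:15:53
53901011.ba+      batch                             1          8   6329196K 1-03:35:33   04:15:53
"""


def parse_sacct_table(text):
    """Parse sacct's fixed-width table into a list of dicts, one per row.

    Assumes line 1 is the header, line 2 is the row of dashes, and every
    later non-blank line is a data row.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header, ruler, rows = lines[0], lines[1], lines[2:]

    # Each run of dashes in the ruler gives one column's character span.
    spans = [m.span() for m in re.finditer(r"-+", ruler)]
    names = [header[a:b].strip() for a, b in spans]

    # Slice every row by the same spans, so blank fields stay blank
    # instead of shifting later columns to the left.
    return [
        {name: row[a:b].strip() for name, (a, b) in zip(names, spans)}
        for row in rows
    ]


rows = parse_sacct_table(SAMPLE)
print(rows[0]["Partition"], rows[0]["TotalCPU"], rows[0]["Elapsed"])
# prints: jacob 1-03:35:33 04:15:53
```

Slicing by column position rather than splitting on whitespace matters here because the batch row leaves the User and Partition fields empty.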

A good measure of how well your job scales across multiple CPUs is the ratio of CPU time to wall-clock time. You can compute this by taking the ratio of the SLURM quantities

```
TotalCPU [s] / Elapsed [s]
```

as reported by the sacct command.

From the above example:

```
CPU time / wall time = 1d 03h 35m 33s / 4h 15m 53s
                     = 99333 s / 15353 s
                     = 6.4699
```

A theoretically ideal job running on 8 CPUs would have a CPU time / wall time ratio of exactly 8. In practice this is never attained, because of file I/O and system overhead. Dividing the CPU time / wall time ratio computed above by the number of CPUs used (8 in this example) gives an estimate of how efficiently your job ran compared to ideal performance:

```
% of ideal performance = ( 6.4699 / 8 ) * 100 = 80.87%
```
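The arithmetic above can be reproduced with a short script. The following is a minimal sketch, assuming the time strings use the `[DD-][HH:]MM:SS` form seen in the sacct output; the function names `slurm_time_to_seconds` and `parallel_efficiency` are hypothetical and are not the GEOS-Chem Support Team's implementation.

```python
def slurm_time_to_seconds(stamp):
    """Convert a SLURM time string such as '1-03:35:33' to seconds.

    Assumes the form [DD-][HH:]MM:SS, optionally with fractional seconds.
    """
    days = 0
    if "-" in stamp:
        day_part, stamp = stamp.split("-", 1)
        days = int(day_part)

    # Fold the colon-separated fields left to right: each step promotes
    # the running total by one unit (hours -> minutes -> seconds).
    seconds = 0.0
    for field in stamp.split(":"):
        seconds = seconds * 60 + float(field)
    return days * 86400 + seconds


def parallel_efficiency(total_cpu, elapsed, ncpus):
    """Return (CPU time / wall time, percent of ideal performance)."""
    ratio = slurm_time_to_seconds(total_cpu) / slurm_time_to_seconds(elapsed)
    return ratio, 100.0 * ratio / ncpus


# Values taken from the sacct example above.
ratio, percent = parallel_efficiency("1-03:35:33", "04:15:53", 8)
print(f"CPU time / wall time   = {ratio:.4f}")     # 6.4699
print(f"% of ideal performance = {percent:.2f}%")  # 80.87%
```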

### Scripts for parsing SLURM scheduler output

To simplify reporting the scalability for GEOS-Chem jobs from SLURM scheduler output, the GEOS-Chem Support Team have written two Perl scripts, jobstats and jobinfo (upon which jobstats relies). All you need to supply is the SLURM job ID #, as shown below:

```
> jobstats 53901011

SLURM JobID #         : 53901011
Job Name              : HEMCO+Henry.run
Submit time           : 2015-12-17 11:16:16
Start  time           : 2015-12-17 11:16:16
End    time           : 2015-12-17 15:32:09
Partition             : jacob
Node                  : regal12
CPUs                  : 8
Memory                : 6.3292 GB
CPU  Time             : 1-03:35:33  (      99333 s)
Wall Time             : 04:15:53    (      15353 s)
CPU  Time / Wall Time : 6.4699      ( 80.87% ideal)
```

--Bob Yantosca (talk) 17:33, 21 December 2015 (UTC)

## Computing scalability from Grid Engine scheduler output

After your job has finished, type:

```
qacct -j JOBID
```

From the returned output, note the values for cpu and ru_wallclock. For example:

```
==============================================================
qname        bench
hostname     titan-10.as.harvard.edu
group        mpayer
owner        mpayer
project      NONE
department   defaultdepartment
jobname      v10-01-public-release-Run1.run
jobnumber    81969
account      sge
priority     0
qsub_time    Thu Jun 18 17:14:15 2015
start_time   Thu Jun 18 17:14:55 2015
end_time     Fri Jun 19 01:01:48 2015
granted_pe   bench
slots        8
failed       0
exit_status  0
ru_wallclock 28013
ru_utime     189568.938
ru_stime     1718.925
ru_minflt    5437936
ru_majflt    23
ru_nswap     0
ru_inblock   36810536
ru_oublock   834224
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     390660
ru_nivcsw    19093052
cpu          191287.863
mem          1266832.593
io           30.376
iow          0.000
maxvmem      6.817G
arid         undefined
```

Calculate the CPU time / wall time ratio using:

```
cpu / ru_wallclock
```

From the above example:

```
CPU time / wall time = 191287.863 s / 28013 s = 6.8285
```

A theoretically ideal job running on 8 CPUs would have a CPU time / wall time ratio of exactly 8. In practice this is never attained, because of file I/O and system overhead. Dividing the CPU time / wall time ratio computed above by the number of CPUs (aka "slots") used (8 in this example) gives an estimate of how efficiently your job ran compared to ideal performance:

```
% of ideal performance = ( 6.8285 / 8 ) * 100 = 85.36%
```
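The same calculation can be scripted for Grid Engine. The sketch below assumes that qacct prints one whitespace-separated key/value pair per line, as in the example above. The function names `parse_qacct` and `grid_engine_efficiency` are hypothetical, and the sample text is abridged from the output shown earlier. Stripping a trailing "s" from the numeric fields is a defensive assumption, in case your Grid Engine version appends a unit suffix.

```python
# Abridged from the qacct example above (only the fields needed here).
SAMPLE = """\
==============================================================
qname        bench
jobnumber    81969
slots        8
ru_wallclock 28013
cpu          191287.863
"""


def parse_qacct(text):
    """Parse 'qacct -j' output into a dict of field name -> string value."""
    record = {}
    for line in text.splitlines():
        # Skip the '=====' separator and any line without a value.
        parts = line.split(None, 1)
        if len(parts) == 2 and not line.startswith("="):
            key, value = parts
            record[key] = value.strip()
    return record


def grid_engine_efficiency(record):
    """Return (cpu / ru_wallclock, percent of ideal performance)."""
    # Assumption: a trailing 's' unit suffix, if present, can be dropped.
    cpu = float(record["cpu"].rstrip("s"))
    wall = float(record["ru_wallclock"].rstrip("s"))
    ratio = cpu / wall
    return ratio, 100.0 * ratio / int(record["slots"])


record = parse_qacct(SAMPLE)
ratio, percent = grid_engine_efficiency(record)
print(f"CPU time / wall time   = {ratio:.4f}")     # 6.8285
print(f"% of ideal performance = {percent:.2f}%")  # 85.36%
```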

--Bob Yantosca (talk) 17:34, 21 December 2015 (UTC)

### Scripts for parsing Grid Engine output

To simplify reporting the scalability for GEOS-Chem jobs from Grid Engine scheduler output, the GEOS-Chem Support Team have written a Perl script (scale). All you need to supply is the Grid Engine job ID, as shown below:

```
> scale 81969

SGE JobID #       : 81969
Submitted at      : Thu Jun 18 17:14:15 2015
Run began at      : Thu Jun 18 17:14:55 2015
Run ended at      : Fri Jun 19 01:01:48 2015
Ran on host       : titan-10.as.harvard.edu
Ran in queue      : bench
CPU  Time [s]     : 191287.863 s
CPU  Time [h:m:s] : 53:08:07
Wall Time [s]     : 28013 s
Wall Time [h:m:s] : 07:46:53
Scalability       : 6.8285
```

--Bob Yantosca (talk) 17:20, 21 December 2015 (UTC)