Scalability

From Geos-chem

On this page we describe the scalability calculation used in the 1-month benchmark simulations.

Overview

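To calculate how well a run scaled, we divide the CPU time by the wall time. This ratio is the average number of CPUs kept busy during the run, so the ideal value equals the number of CPUs assigned to the job. The CPU time and wall time can be obtained from the job information reported by your scheduler. The steps to obtain this information and calculate scalability are described below.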
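Because CPU time divided by wall time is the average number of CPUs kept busy, the result can be compared directly against the CPU count of the job. The minimal sketch below assumes you already have both times in seconds; the helper names `scalability` and `fraction_of_ideal` are hypothetical and not part of GEOS-Chem or any scheduler.

```python
def scalability(cpu_seconds: float, wall_seconds: float) -> float:
    """Return CPU time / wall time, i.e. the average number of busy CPUs."""
    if wall_seconds <= 0:
        raise ValueError("wall time must be positive")
    return cpu_seconds / wall_seconds


def fraction_of_ideal(cpu_seconds: float, wall_seconds: float, ncpus: int) -> float:
    """Hypothetical extra: compare the ratio with its ideal value (the CPU count)."""
    return scalability(cpu_seconds, wall_seconds) / ncpus


# Values from the SLURM example on this page (a job on 8 CPUs).
print(f"scalability = {scalability(190011, 26935):.4f}")              # scalability = 7.0544
print(f"fraction of ideal = {fraction_of_ideal(190011, 26935, 8):.3f}")  # fraction of ideal = 0.882
```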

SLURM scheduler

After your job has finished, type:

  sacct -l -j JOBID

The -l option returns the “long” output for your job. Alternatively, use the --format option to request only the fields you need. For example:

  sacct -j JOBID --format=JobID,JobName,User,Partition,NNodes,NCPUS,MaxRSS,AveCPU,Elapsed

From the returned output, note the values for AveCPU and Elapsed. For example:

         JobID    JobName      User  Partition   NNodes      NCPUS     MaxRSS     AveCPU    Elapsed 
  ------------ ---------- --------- ---------- -------- ---------- ---------- ---------- ---------- 
  44603804     v10-01.run msulpriz+      jacob        1          8                         07:28:55 
  44603804.ba+      batch                             1          8      5.59G 2-04:46:51   07:28:55
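If you prefer to script this step, sacct's --noheader and --parsable2 options print the requested fields separated by "|" without a header or column padding. The sketch below is a hypothetical helper, not part of GEOS-Chem; it assumes sacct is on your PATH, that the output columns follow the --format list, and (as in the example above) that the batch step, whose JobID ends in ".batch", is the one carrying AveCPU.

```python
import subprocess

# Fields requested from sacct; the parser assumes output columns follow this order.
FIELDS = ["JobID", "AveCPU", "Elapsed"]


def query_sacct(jobid: str) -> str:
    """Run sacct for one job and return its pipe-delimited output (needs SLURM)."""
    cmd = ["sacct", "-j", jobid, "--noheader", "--parsable2",
           "--format=" + ",".join(FIELDS)]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def parse_sacct(text: str) -> list[dict[str, str]]:
    """Split each output line (one job step per line) into a field -> value dict."""
    return [dict(zip(FIELDS, line.split("|")))
            for line in text.strip().splitlines()]


def batch_step(rows: list[dict[str, str]]) -> dict[str, str]:
    """Return the batch step (JobID ending in '.batch')."""
    for row in rows:
        if row["JobID"].endswith(".batch"):
            return row
    raise LookupError("no .batch step found in sacct output")


# Sample text built from the example values above; in practice use query_sacct("JOBID").
sample = "44603804||07:28:55\n44603804.batch|2-04:46:51|07:28:55\n"
step = batch_step(parse_sacct(sample))
print(step["AveCPU"], step["Elapsed"])  # 2-04:46:51 07:28:55
```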

Convert both values to seconds (sacct prints them as [DD-]HH:MM:SS) and calculate the scalability using:

  AveCPU in seconds / Elapsed in seconds

From the above example:

  Scalability = 2-04:46:51 / 07:28:55 = 190011 s / 26935 s = 7.0544
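The conversion to seconds is the only non-trivial step. A minimal sketch, assuming sacct's [DD-][HH:]MM:SS layout and allowing fractional seconds, which CPU-time fields may include; `slurm_time_to_seconds` is a hypothetical helper name:

```python
def slurm_time_to_seconds(value: str) -> float:
    """Convert a SLURM time such as '2-04:46:51', '07:28:55' or '28:55' to seconds."""
    days = 0
    if "-" in value:                      # optional 'DD-' day prefix
        day_part, value = value.split("-", 1)
        days = int(day_part)
    seconds = 0.0
    for part in value.split(":"):         # fold HH, MM, SS from left to right
        seconds = seconds * 60 + float(part)
    return days * 86400 + seconds


ave_cpu = slurm_time_to_seconds("2-04:46:51")    # 190011.0
elapsed = slurm_time_to_seconds("07:28:55")      # 26935.0
print(f"Scalability = {ave_cpu / elapsed:.4f}")  # Scalability = 7.0544
```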

SGE scheduler

After your job has finished, type:

  qacct -j JOBID

From the returned output, note the values for cpu and ru_wallclock. For example:

  ==============================================================
  qname        bench               
  hostname     titan-10.as.harvard.edu
  group        mpayer              
  owner        mpayer              
  project      NONE                
  department   defaultdepartment   
  jobname      v10-01-public-release-Run1.run
  jobnumber    81969               
  taskid       undefined
  account      sge                 
  priority     0                   
  qsub_time    Thu Jun 18 17:14:15 2015
  start_time   Thu Jun 18 17:14:55 2015
  end_time     Fri Jun 19 01:01:48 2015
  granted_pe   bench               
  slots        8                   
  failed       0    
  exit_status  0                   
  ru_wallclock 28013
  ru_utime     189568.938   
  ru_stime     1718.925     
  ru_maxrss    5941376             
  ru_ixrss     0
  ru_ismrss    0                   
  ru_idrss     0                   
  ru_isrss     0                   
  ru_minflt    5437936             
  ru_majflt    23                  
  ru_nswap     0                   
  ru_inblock   36810536            
  ru_oublock   834224              
  ru_msgsnd    0                   
  ru_msgrcv    0                   
  ru_nsignals  0                   
  ru_nvcsw     390660              
  ru_nivcsw    19093052            
  cpu          191287.863
  mem          1266832.593       
  io           30.376            
  iow          0.000             
  maxvmem      6.817G
  arid         undefined

Calculate the scalability using:

  cpu / ru_wallclock

From the above example:

  Scalability = 191287.863 / 28013 = 6.8285
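The same calculation can be scripted by reading the "key value" lines that qacct -j prints. This is a minimal sketch; `parse_qacct` and `sge_scalability` are hypothetical helpers, the sample text is an excerpt of the example output above, and stripping a trailing "s" is only a precaution in case your Grid Engine version appends a unit to these values.

```python
def parse_qacct(text: str) -> dict[str, str]:
    """Turn 'key   value' lines from qacct -j into a dict (first token is the key)."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("="):
            continue                          # skip blank lines and the ==== separator
        key, _, value = line.partition(" ")
        record[key] = value.strip()
    return record


def _seconds(value: str) -> float:
    """Parse a number of seconds, tolerating a trailing 's' unit."""
    return float(value.rstrip("s"))


def sge_scalability(record: dict[str, str]) -> float:
    """cpu (CPU seconds) divided by ru_wallclock (wall seconds)."""
    return _seconds(record["cpu"]) / _seconds(record["ru_wallclock"])


sample = """
==============================================================
slots        8
ru_wallclock 28013
ru_utime     189568.938
ru_stime     1718.925
cpu          191287.863
"""
print(f"Scalability = {sge_scalability(parse_qacct(sample)):.4f}")  # Scalability = 6.8285
```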

--Melissa Sulprizio 22:04, 11 September 2015 (UTC)