Technical How-To’s & Notes
Monitoring CPU usage of your jobs on the HBSGrid
There are two ways to monitor your running job:
- Run the Unix command top on the compute node where your job is executing, or
- Query LSF via bjobs for current running jobs
Both have advantages and disadvantages, which we'll discuss below. Either method gives you near real-time feedback on how your code is behaving while the job is running.
Running top on the compute node:
- Use bjobs to figure out what execution host (EXEC_HOST) your job is running on.
- For each host, get an interactive session on that machine (replace EXEC_HOST with the actual machine name, e.g. rhrcsnod05):
  bsub -q interactive -Is -W 24:00 -R "rusage[mem=1000]" -m EXEC_HOST /bin/bash
  (This command gets a bash shell on the named machine with 1 core for 24 hrs with 1000 MB of RAM.)
- Run top, and watch for your processes.
- (Optional) Press u and enter your username to view only your processes.
- (Optional) Press 1 (the number one) to see the utilization of each CPU core or hyperthread.
- Press q or Ctrl-C to stop monitoring.
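The steps above can be sketched as two small shell helpers. This is a minimal sketch, not part of the original instructions: the function names interactive_cmd and my_top_snapshot are hypothetical, and the snapshot helper assumes the procps version of top, where -b selects batch mode, -n 1 limits output to one iteration, and -u filters by user.

```shell
#!/bin/bash
# Hypothetical helper: print (not run) the interactive-session command from the
# steps above for a given execution host, so it can be reviewed before use.
interactive_cmd() {
    local host="$1"
    printf 'bsub -q interactive -Is -W 24:00 -R "rusage[mem=1000]" -m %s /bin/bash\n' "$host"
}

# Hypothetical helper: take one non-interactive snapshot of your own processes.
# Assumes procps top: -b batch mode, -n 1 single iteration, -u user filter.
my_top_snapshot() {
    top -b -n 1 -u "$(id -un)"
}
```

For example, interactive_cmd rhrcsnod05 prints the bsub command for that host. Once inside the interactive session, my_top_snapshot | head -n 20 gives a one-off view that can be saved to a file, which the interactive top display cannot easily do.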
Monitoring execution via bjobs output:
This is less precise than top, because LSF (the scheduler) collects runtime data from the execution hosts only periodically, so the information lags behind what the job is actually doing. Also, you may not have information for the first 1 to 5 minutes of your job.
- Use bjobs to figure out the job IDs for your jobs.
- For each job, use bjobs -l jobID to get job details.
- Look at the IDLE_FACTOR statistic. For a 1 core job at 100% efficiency, this will be 1. For a 2 core job, this will be 2, etc. An example of threading out (the job using more CPU than the cores it requested) would be asking for 2 cores and seeing a value of > 2.5 (e.g. 4, 6, etc).
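The IDLE_FACTOR check can also be scripted. The sketch below rests on several assumptions: that bjobs -l prints a line of the form "IDLE_FACTOR(cputime/runtime):   0.98" (verify this against your own output), that the helper names idle_factor and check_job are hypothetical, and that the thresholds are illustrative only. The 1.25x upper margin mirrors the "2 cores, value > 2.5" example above; the 0.5x lower margin is an arbitrary choice.

```shell
#!/bin/bash
# Hypothetical helper: read `bjobs -l` output on stdin and print the last
# IDLE_FACTOR value reported. Assumes a line of the form
#   IDLE_FACTOR(cputime/runtime):   0.98
idle_factor() {
    awk '/IDLE_FACTOR/ { v = $NF } END { if (v != "") print v }'
}

# Hypothetical helper: compare an IDLE_FACTOR value with the cores requested.
# Thresholds are assumptions: above 1.25x the request counts as "over",
# below 0.5x counts as "under".
check_job() {
    local cores="$1" factor="$2"
    awk -v c="$cores" -v f="$factor" 'BEGIN {
        if (f > c * 1.25)      print "over: using more CPU than requested"
        else if (f < c * 0.5)  print "under: cores are mostly idle"
        else                   print "ok"
    }'
}
```

A possible use, with jobID replaced by a real job ID and 2 by the number of cores you requested: check_job 2 "$(bjobs -l jobID | idle_factor)". Because LSF reports usage with a lag, an empty result in the first few minutes of a job is expected.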