Skip to Main Content
HBS Home
  • About
  • Academic Programs
  • Alumni
  • Faculty & Research
  • Baker Library
  • Giving
  • Harvard Business Review
  • Initiatives
  • News
  • Recruit
  • Map / Directions
Research Computing Services
  • Online Requests
  • FAQ
  • Blog
  • Contact Us
  • About Us
  • Faculty Projects
  • Training
  • Compute Cluster & Data Storage
  • Data Practices
  • Help
  • …→
  • Harvard Business School→
  • Research Computing Services→
  • Compute Cluster and Data Storage
    • Compute Cluster and Data Storage
    • Compute Cluster
    • Data Storage
    • Database Server
    • Other Research Computing Environments
    →
  • Compute Cluster
    • Compute Cluster
    • Technical Benefits and Features
    • Quick Start
    • Requesting an Account
    • Logging In
    • Copying & Extracting Files
    • Running Jobs
    • Software Tools
    →
  • Running Jobs
    • Running Jobs
    • Guidelines for Choosing Resources
    • Running a Program/Submitting a Job
    • Monitoring Jobs & Troubleshooting
    • Scaling Work
    →
  • Scaling Work→

Compute Cluster

Compute Cluster

  • Technical Benefits and Features
  • Quick Start
  • Requesting an Account
  • Logging In
  • Copying & Extracting Files
  • Running Jobs
  • Software Tools

Scaling Work

Scaling Work

  • Compute Cluster
    • Technical Benefits and Features
    • Quick Start
    • Requesting an Account
    • Logging In
    • Copying & Extracting Files
    • Running Jobs
      • Guidelines for Choosing Resources
      • Running a Program/Submitting a Job
      • Monitoring Jobs & Troubleshooting
      • Scaling Work
    • Software Tools
  • Data Storage
  • Database Server
  • Other Research Computing Environments
3ms
There are numerous ways to scale up your work on the HBSGrid, including parallel processing and GPUs.

Parallel Processing

Also commonly called parallel computing or multicore processing, using multiple cores (CPUs) to analyze data is an efficient way to get more work done in less time. But only under certain circumstances! There two basic ways to use multiple cores: implicit parallelism built in to your application or library, and explicit parallelism that you program and manage yourself. Explicit parallelism can be achieved using application/language native tools or using LSF job arrays.

Requesting multiple CPUs on the HBSGrid

When using parallel processing on shared compute systems, you need to indicate to the scheduler, the system software that manages workloads, that you wish to use multiple cores. On your personal desktop or laptop, this isn't necessary, as you control all the resources on that machine. However, on a compute cluster, you only control the resources that the scheduler has given you, and it has given you only the resources that you've requested, whether this is done explicitly via a custom job submission script, or implicitly using a default values or default submission scripts available on the HBS compute grid. This is due to the fact that jobs (work sessions) from multiple (and possibly) different people, are often running side-by-side on a given compute node on the compute cluster.

To use multiple CPUs on the HBSGrid you can start your application using a wrapper script {link to Working with wrapper (default submission) scripts subsection} or a custom submission script {link to Working with custom submission scripts subsection} and specify the number of CPUs you will use. For example, starting R with the command Rstudio -n 5 in the terminal when on a login node will call the wrapper script (Rstudio-5g; the RAM footprint -5g (5 GB RAM) is optional for the smallest one) and start R with 5 CPUs reserved. Note that one cannot use the NoMachine drop-down menus to start parallel jobs except for using Stata: The menus indicate if 1-core (Stata-SE), 4-core (Stata-MP4), or 8-core (Stata-MP8). If you wish to use multiple cores with other tools, you must issue the appropriate commands yourself in the terminal to request and reserve these resources from the scheduler.

Implicit Parallelism

Implicit parallelism is easiest to use but limited to the features offered by your application or programming language. Most of the applications commonly used for data analysis on the HBSGrid provide some degree of implicit parallelization. The system-wide installation of Rstudio / Microsoft R Open uses the Intel Math Kernel Lbrary (MKL) for fast multi-threaded computations.  The system-wide installation of Spyder / Python also use MKL to speed up some computations. Similarly, many Stata commands have been parallelized, as have some Matlab algorithms. Note that for all these applications only some computations use implicit parallelization and many computations will only use a single CPU. To speed up other computations you may be able to use explicit parallelization.

Most system-wide applications started on the HBSGrid via wrapper scripts will use the number of CPUs you specify when starting your application. For example, starting R via Rstudio -n 5 from the command line will start R with MKL correctly configured to use five cores.

Explicit Parallelism

Explicit parallelism can be achieved using application or library features to leverage multiple CPUs on a single compute node, or using LSF job arrays to leverage multiple CPUs across multiple compute nodes. For this to work, your script or code must be parallelizable -- it can be broken into parts that can execute independently. This is often the case with for loops or functions that can perform work independently. A good example is the apply functions in R (apply(), lapply(), etc.).

As when using implicit parallelism, you must request the number of CPUs you will use when submitting a job. We also recommend that you do not statically indicate ('hard code') the number of cores that you'll be using in your code. Instead, set this value dynamically based on job/runtime environment variables that are set as the job executes. You'll see examples of this in the following sections.

Finally, one must also factor in memory requirements for explicit parallelization. If the parallelization all happens within one job (e.g. Python's muliprocessing, certain approaches with Rparallel or Rfuture, or others), one must also determine how the memory will be consumed for each fork/thread/process/branch of the code. If each has it's own copy of the data and data structures, then memory requirements will increase significantly based on the number of parallel executions. Conversely, if each shares the data in memory with the parent program, then significantly less memory will be needed. Each application/programming framework works differently; consult the documentation and adjust the memory requirement appropriately when submitting the job.

Explicit parallelism uses application-specific libraries and features, and is described below for each of the most commonly used programs on the HBSGrid cluster.

MATLAB

Introduction

The following has been adapted from FAS RC’s Parallel MATLAB page (https://docs.rc.fas.harvard.edu/kb/parallel-matlab-pct-dcs/). As the Odyssey cluster uses a different workload manager, the code has been adapted to the workload manager on the HBS compute grid.

This page is intended to help you with running parallel MATLAB codes on the HBS compute grid. Parallel processing with MATLAB is performed with the help of two products, Parallel Computing Toolbox (PCT) and Distributed Computing Server (DCS). HBS is licensed only for use of the PCT.

Supported Versions: On the HBS compute grid, the following versions of MATLAB with the Parallel Computing Toolbox (PCT) are:

MATLAB Version Executable name
MATALB 2018a 64-bit matlab

Maximum Workers: PCT uses workers, MATLAB computational engines, to execute parallelized applications and their parts on CPU cores. Each compute node on the Grid has 32 physical cores; therefore (in theory) users should request no more than 32 cores when using MATLAB with PCT. However, due to current user resource limits, you should request no more than 12 (interactive) or 16 (batch) cores. If you request more than this, your job will not run as it will sit in a PEND state.

Code Example

The following simple code illustrates the use of PCT to calculate pi via a parallel Monte-Carlo method. This example also illustrates the use of parfor (parallel for) loops. In this scheme, suitable for loops could be simply replaced by parallel parfor loops without other changes to the code:

hLog = fopen( [mfilename, '.log'] , 'w' ); % Create log file

% Launch parallel pool with as many workers as requested
hPar = parpool( 'local' , str2num( getenv('LSB_MAX_NUM_PROCESSORS') ) );

% Report number of workers
fprintf( hLog , 'Number of workers = %d\n' , hPar.NumWorkers )

% Main code
R = 1; darts = 1e7; count = 0; % Prepare settings
tic; % Start timer

parfor i = 1:darts
x = R * rand(1);
y = R * rand(1);
if x^2 + y^2 <= R^2
count = count + 1
end
explanation_parallel_code

myPI = 4 * count / darts;
T = toc; % Stop timer

% Log results
fprintf( hLog , 'The computed value of pi is %2.7f\n' , myPI );
fprintf( hLog , 'Executed in %8.2f seconds\n' , T );

% shutdown pool, close log file, and exit
delete(gcp);
fclose(hLog);
exit;

Code with Job Submission Script

To run the above code (named code.m) using 5 CPU cores with the Grid's default wrapper scripts, in the terminal use the following command:

matlab -n5 code.m

Using custom job submission commands, the following will similarly submit the job with 5 cores for 10 mins with 100 MB of memory:

bsub -q short -N -n 5 -W 10 -R "rusage[mem=100]" -M 100 -hl matlab -r "run('code.m');"

This will cause a log file to be created called code.log owing to the first line in our MATLAB code, hLog=fopen( [mfilename, '.log'] , 'w' );

If you do not use MATLAB's mfilename function, then you may also enter the following command to have output sent to an unnamed output file:

bsub -q short -N -n 5 -W 10 -R ”rusage[mem=100]” -M -hl 100 matlab \< code.m

The < is escaped here so that it becomes part of the MATLAB command, not the bsub command.

If you wish to use a submission script to run this code and include LSF job option parameters, create a text file named code.sh containing the following:

____________________________________________________________

#!/bin/bash
#
#BSUB -q short
#BSUB -N
#BSUB -W 10
#BSUB -R" rusage[mem=100]"
#BSUB -W 100
matlab -r "run('code.m');"
____________________________________________________________

Once your script is ready, you may run it with 5 cores by entering:

bsub -n 5 < ./code.sh

The < character is used here so that the #BSUB directives in the script file are parsed by LSF.

Explanation of Parallel Code

Starting and stopping the parallel pool

The parpool function is used to initiate the parallel pool. To dynamically set the number of workers to the CPU cores you requested, we ask MATLAB to query the LSF environment variable LSB_MAX_NUM_PROCESSORS:

hPar = parpool( 'local', str2num( getenv( 'LSB_MAX_NUM_PROCESSORS' ) ) );

Once the parallelized portion of your code has been run, you should explicitly close the parallel pool and release the workers as follows:

delete(gcp); % Shutdown parallel pool

Parallelized portion of the code

The actual portion of the code that takes advantage of multiple CPUs is the parfor loop (http://www.mathworks.com/help/distcomp/parfor.html). A parfor loop behaves similarly to a for loop, though various iterations of the loop are passed to different workers. It is therefore important that iterations due not rely on the output of any other iteration in the same loop.

____________________________________________________________

parfor i = 1:darts
x = R * rand(1);
y = R * rand(1);
if x^2 + y^2 <= R^2
count = count + 1
end
end
____________________________________________________________
Python

Introduction

This section is intended to help you with running parallel python codes on the HBSGrid cluster or on your local multicore machine. The version of python on the cluster uses MKL to automatically parallelize some computations. Python started via a wrapper script will use the number of CPUs you specify when starting your application. For example, starting Python via python3 -n 5 from the command line will start Python with MKL correctly configured to use 5 cores. You can configure the number of cores MKL uses with the mkl-service package.

In addition to the implicit parallelization provided by MKL, you can explicitly parallelize your own analysis code using the 'multiprocessing' package. Instructions and examples are provided below. Note that this guide does NOT cover distributed computing, which distributes the workload over multiple machines.

Maximum Workers: Most compute nodes on the cluster have at least 32 physical cores; therefore (in theory) users should request no more than 32 cores. For short queue jobs, you may request the use of up to 16 cores, while the limit remains at 12 cores for long queue jobs. Nota Bene! The number of workers are dynamically determined by asking LSF (the scheduler) how many cores you have reserved via the LSB_MAX_NUM_PROCESSORS environment variable. DO NOT use multiprocessing.cpu_count() or similar; instead retrieve the values of this environment variable, e.g., os.getenv(LSB_MAX_NUM_PROCESSORS).

Example: Parallel Processing Basics

This sample code will provide a basic introduction to parallel processing. You will be shown how to set up your parallel pool with the appropriate number of workers, how to define which function is to be run in parallel, and how to gather the results.

For this example, we will calculate the square of a list of numbers in parallel.

____________________________________________________________
# file: fork_process.py

# This code both defines the function (f) to run
# and also (in __main__) forks a new process for each worker

import sys
import os
import multiprocessing
import time

def f(x):
  pid=os.getpid()
  print("{}:{}".format(pid,x*x))
  return x*x

if __name__ == "__main__":
  numList=range(1,100)
  procs = [multiprocessing.Process(target=f, args=(x,)) for x in numList]

  for p in procs:
    p.start()
    p.join()

____________________________________________________________


# And we get these outputs:

3728:1
10236:4
11508:9
8348:16
3012:25
13244:36
2528:49
8440:64
5184:81
11168:100
6848:121
13292:144
........

Example: Parallel Processing with Pools

____________________________________________________________
# File pool_process.py

# This code assumes the same function (f) as above
# but instead (in __main__) uses the requested cores to create
# persistent workers (process pool) to handle the spread of work

import sys
import os
import multiprocessing
import time

def f(x):
  pid=os.getpid()
  print("{}:{}".format(pid,x*x))
  return x*x

if __name__ == '__main__':

  numList=range(1,100)
  num_workers = os.getenv("LSB_MAX_NUM_PROCESSORS")

  p = multiprocessing.Pool(num_workers)
  result = p.map(f,numList)
  p.close()
  p.join()

____________________________________________________________


# And we get these outputs:

14452:1
14452:4
14452:9
14452:16
14452:25
14452:36
......
14452:2304
14452:2401
2940:2809
2940:2916
14452:2500
2940:3025
2940:3136
14452:2601
6452:3249
2940:3721
2940:3844

Code with Job Submission Script

To run the above code (named test.py) using 5 CPU cores via the cluster's default wrapper scripts, in the terminal use the following command:

python -n 5 pool_process.py

Submit the two examples code files above as follows:

bsub -q micro -n 5 -W 10 -R ”rusage[mem=1024]” -M 1024 python fork_process.py bsub -q micro -n 5 -W 10 -R ”rusage[mem=1024]” -M 1024 python pool_process.py

If you wish to use a submission script to run this code and include LSF job option parameters, create a text file named code.sh containing the following:

____________________________________________________________

#!/bin/bash
#
#BSUB -q micro
#BSUB -W 10
#BSUB -R" rusage[mem=1024]"
#BSUB -W 1024

pool_process.py
____________________________________________________________

Note that one can use one of micro mini short long as queue names with increasing run time limits.

Once your script is ready, you may run it with 5 cores by entering:

bsub -n 5 ./code.sh

Note, the < character is no longer needed when submitting jobs for LSF to parse #BSUB directives; this is done by default.

R

Implicit Parallelization

The system-wide installation of Rstudio / Microsoft R Open on the Grid uses the Intel Math Kernel Lbrary (MKL) for fast multithreaded computations. R started on the Grid via a wrapper script will use the number of CPUs you specify when starting your application. For example, starting R via Rstudio -n 5 from the command line will start R with MKL correctly configured to use 5 cores. You can set the number of cores used by MKL using the setMKLthreads function in the RevoUtilsMath package; more information about MKL in R is available. Some popular R packages, including data.table, also provide some degree of implicit parallelization. The number of threads used by data.table can be set using the setDTthreads function.

Explicit Parallelization

It is also possible to explicilty parallelize your own analysis code. There are a large number of R packages available for parallel computing.

The future package is simple, easy to use, and can make use of several backends to enable parallelization across CPUs, add-hoc clusters, HPC clusters (including LSF on the grid) via future.batchtools and others. A number of front-ends are available, including future.apply and furrr.

The foreach package is another popular option with a number of available backends, including doFuture that allows you to use foreach as a future frontend.

For a more comprehensive survey of parallel computing in R refer to the High Performance Computing Task View.

Code Examples

The following code examples were adapted from the Texas Advanced Computing Center (TACC) seminar "R for High Performance Computing" given through XSEDE (now ACCESS).

Below are a number of very simple examples to highlight how the frameworks can be included in your code. Nota Bene! The number of workers are dynamically determined by asking LSF (the scheduler) how many cores you have reserved via the LSB_MAX_NUM_PROCESSORS environment variable. DO NOT use the mc.detectcores() routine or anything similar, as this will clobber your code as well as any other code running on the same compute node.

All the following examples will use the following example function :

myProc <- function(size=10000000) {
#Load a large vector
vec <- rnorm(size)

#Now sum the vec values
return(sum(vec))
}

It is important not to use more cores than we've reserved:

## detect the number of CPUs we are allowed to use
n_cores <- as.integer(Sys.getenv('LSB_MAX_NUM_PROCESSORS'))
## use multiprocess backend
plan(multiprocess, workers = n_cores)

The future.apply package provides *apply functions that use future backends.

## replicate in parallel
library(future.apply)
future_replicate(10, myProc())

The doFuture package makes it easy to write parallel loops:

>library(doFuture)
registerDoFuture()
foreach (i = 1:10, .combine = c) %dopar% { myProc() }

Scheduler Submission (Job) Script

If submitted via the terminal, the following batch submission script will submit your R code to the compute grid and will allocate 4 CPU cores for the work (as well as 5 GB of RAM for a run time limit of 12 hrs). If your code is written as above, using LSB_MAX_NUM_PROCESSORS, then your code will detect that 4 cores have been allocated. (Please note the second, long command may wrap on the page, but should be submitted as one line)

#!/bin/bash
bsub -n 4 -q long -W 12:00 -R "rusage[mem=5000]" -M 5000 R CMD BATCH my_parallel_code.R
Stata

Introduction

StataMP provides implicilty parallel implementations of many functions, which are documented its 330 page Stata/MP Performance Report, describing which functions are parallelized and each's efficiency (how perfectly parallelized a given function is):

Stata/MP is the version of Stata that is programmed to take full advantage of multicore and multiprocessor computers. It is exactly like Stata/SE in all ways except that it distributes many of Stata’s most computationally demanding tasks across all the cores in your computer and thereby runs faster—much faster.

They could be impressive. But a caveat:

With multiple cores, one might expect to achieve the theoretical upper bound of doubling the speed by doubling the number of cores—2 cores run twice as fast as 1, 4 run twice as fast as 2, and so on. However, there are three reasons why such perfect scalability cannot be expected: 1) some calculations have parts that cannot be partitioned into parallel processes; 2) even when there are parts that can be partitioned, determining how to partition them takes computer time; and 3) multicore/multiprocessor systems only duplicate processors and cores, not all the other system resources.

In general:

Stata/MP achieved 75% parallelization efficiency overall and 85% efficiency among estimation commands... Speed is more important for problems that are quantified as large in terms of the size of the dataset or some other aspect of the problem, such as the number of covariates. On large problems, Stata/MP with 2 cores runs half of Stata’s commands at least 1.7 times faster than on a single core. With 4 cores, the same commands run at least 2.4 times faster than on a single core. NOTE: This is already a drop to 60% efficiency on 4 cores.

How to Utilize This?

This parallelization benefit is mostly realized in running code in batch mode. If using Stata interactively, Stata is predominantly waiting for user input, and so the parallelization gains diminish rapidly. If one intends to do intense, focused work for short periods of time (up to a few days) and subsequently exit the software, choosing multiple cores is fine. But if you plan to run an interactive session over the course of the day or two, please select Stata-SE, as the multiple cores that you have requested are reserved only for you and will sit idle during this time, decreasing the resources available to other people.

No additional work is needed for you to utilize the multiple CPU cores in your code. Stata will handle this transparently for you. But you do need to ensure that you ask the compute grid to reserve the cores for your use:

Using NoMachine (interactive only):

From the Applications menu, select the Stata-SE menus for single-core or Stata-MP4 menus for 4-core Stata. Under each, select the appropriate memory footprint for your work (see Choosing Resources). An example screenshot can be see here. The wrapper scripts that drive these menu items include all the necessary commands to start Stata with the designated number of CPU cores within your session.

Using PAC (batch only):

Note that PAC is not yet available for Grid 2.5. Please contact RCS if you have any questions.

Using the command-line (interactive or batch):

Both interactive and batch jobs can started from the command line, and both default, wrapper scripts and custom submission scripts can be employed. (Again, wrapper scripts are great for out-of-the-box configs; custom scripts are needed for atypical RAM or CPU requirements.) For example:

# interactive (GUI) Stata-MP4 with 5 GB footprint via default wrapper
xstata-mp4-5g

# interactive (GUI) Stata-MP4, 35 GB, for 12 hours via custom submission script
bsub -q long_int -n 4 -Ip -W 12:00 -R "rusage[mem=35000]" -M 35000 -hl xstata-mp4

# batch Stata-MP4 with 5 GB footprint via default wrapper
stata-mp4-5g -b do myfile.do

# batch Stata-MP4, 35 GB, for 12 hours via custom submission script
bsub -q long -n 4 -W 12:00 -R "rusage[mem=35000]" -M 35000 -hl stata-mp4 -b do myfile.do

Explicit Parallelization

Explicit parallelization in Stata can be achieved using the parallel module.

For other environments, or if you have any questions, please contact RCS.

GPU Computing

About GPU Computing

A GPU (graphics processing unit) is a processor that is great at handling specialized computations. We can contrast this to the Central Processing Unit (CPU), which is great at handling general computations. CPUs power most of the computations performed on the devices we use daily.

GPU can be faster at completing tasks than CPU. However, it is not true for every case. The performance hugely depends on the type of computation being performed. GPUs are great at tasks that can be run in parallel ....and are often used for Machine Learning types of'embarrassingly parallel' tasks (== a huge task can be broken down into many smaller ones that are completely independent of one another).

-- Taken and adapted from Why Deep Learning Uses GPUs

The HBSGrid cluster has five NVIDIA Tesla V100 graphics processing units (GPUs) attached to one compute node. Computational workflows that make use of GPUs can see significant speedups in execution time, though one's code must be written using frameworks that will leverage these special resources (e.g. Tensorflow, PyTorch, etc). The GPU node is available for both interactive and batch sessions.

At this point in time, one must submit jobs via custom LSF commands. That is, wrapper scripts cannot be used to start interactive or batch jobs on the GPU node. Due to the unusual nature of the compute node's architecture, there are some decisions one must make that can affect how soon a job is dispatched and how efficient the processing can be, both of which are explained below.

Submitting Jobs

To request any GPU resources as a part of your job submission, you must include the -gpu flag and options and we recommend that you submit to the gpu queue (-q gpu). Your job can be either for interactive or batch sessions as your work requires.

The easiest route is to use the default GPU configuration, -gpu -. For example:

module load AI && bsub -q gpu -gpu - -Is -M 5000 -hl spypder

This command will load the AI/Machine Learning Python environment and submit a job to the gpu queue with default GPU options that launches the Spyder IDE. The default GPU options are "num=1:aff=no:mode=shared:mps=no:j_exclusive=no" (with quotes). If you wish to do something other than the default, simply indicate the option name and its preferred value, or supply the whole option string. For example,

# one parameter
module load AI
bsub -q gpu -gpu "aff=yes" -Is -M 5000 -hl spyder

OR

# full parameter list
module load AI
bsub -q gpu -gpu "num=1:aff=yes:mode=shared:mps=no:j_exclusive=no" -Is -M 5000 -hl spyder

Again, as with all job submissions, one can specify the parameters on the command line or include them in a job submission script.

Note: In December 2020, we will begin phasing out separate AI/ML environments and instead include the most common frameworks (Tensorflow, Keras, OpenCV, etc) as a part of our standard python environments. Please see our Machine Learning page for more information.

Common GPU Options and Their Definitions

The full range of options for use of the GPU resources are documented at LSF's Submitting Jobs that Require GPU Resources page. These five options should handle most use cases (defaults are in boldface type; text below is copied or paraphrased from the LSF page):

num= (default =1): The number of GPUs to request. Note that after your job is dispatched, no matter which GPU one is allocated, the GPUs will be indexed starting from 0. And for security purposes, we are enforcing GPU sandboxing via Linux CGROUPS so one cannot use an incorrect index.

aff=no | yes: CPU-GPU affinity. This indicates whether or not the job should enforce strict GPU-CPU affinity binding. That is, the GPU allocated is on the same socket (group of CPU cores) as the GPU. This GPU-CPU affinity translates to higher communication rate, and thus better performance. If set to no, LSF does not bind the job core on the CPU socket to the GPU, but does ensure that the job is pinned to one or more cores (it does not bounce around == less performance) and that CGROUPs are active (job is sandboxed). NOTE: due to the unusual nature of the compute node, if you request aff=yes and the node has filled the lower 48 cores, your job will not dispatch until some of the lower cores are released. This is due to the fact that the upper 16 cores do not share the same CPU socket with any GPU. If you wish use aff=yes and are submitting an interactive job (concerned about immediate job dispatching), we advise that you use bhosts | grep -i nod12 to see how busy the GPU node is.

mode=shared | exclusive_process: The GPU mode when the job is running, either shared or exclusive_process. The shared mode corresponds to the NVIDIA DEFAULT compute mode -- multiple processes can use the GPU simultaneously. Individual threads of each process may submit work to the GPU simultaneously. The exclusive_process mode corresponds to the NVIDIA EXCLUSIVE_PROCESS compute mode -- the GPU is assigned to only one process at a time, and individual process threads may submit work to the GPU concurrently.

mps=no | yes: Enables or disables the NVIDIA Multi-Process Service (MPS) for the GPUs that are allocated to the job. We are not using this service at this time. If you have a need for this or feel that it should be in play, please contact RCS to consult with us on this.

j_exclusive=no| yes: Specifies whether the allocated GPUs can be used by other jobs. When the mode is set to exclusive_process, the j_exclusive=yes option is set automatically.

Further Resources

For more information, please see:

  • NVidia Telsa V100 GPUs
  • LSF documentation on GPU resources and jobs
  • LSF example job submission commands
ǁ
Campus Map
Research Computing Services (RCS) 
Harvard Business School
Baker Library, B90, 25 Harvard Way
Boston, MA 02163
Phone: 617.495.6100
Email: research@hbs.edu
→Map & Directions
→More Contact Information
→Terms Of Service
  • Make a Gift
  • Site Map
  • Jobs
  • Harvard University
  • Trademarks
  • Policies
  • Accessibility
  • Digital Accessibility
Copyright © President & Fellows of Harvard College