9. Practical considerations
The best way to configure and run gemBS depends on the computing systems that are available. We will consider three situations:

A single workstation
A computer cluster with a shared file system
A distributed system without a shared file system
The different characteristics of these systems change the optimal way to arrange the analyses, but some common considerations apply to all three. The individual computational units (a workstation, a cluster node or a compute instance) must have sufficient memory and access to enough disk space to perform the calculations. Memory requirements for gemBS are quite high; this is a design choice, as much of its speed derives from its large memory footprint. To run gemBS comfortably on a human-sized genome it is recommended to have a minimum of 48Gb of RAM and at least 0.5 - 1 Tb of available disk space. These numbers could be reduced somewhat, but at the risk of some analyses stopping due to lack of memory or disk space.
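As a rough pre-flight check, the guideline figures above can be compared against the machine's actual resources before starting an analysis. The snippet below is an illustrative sketch for Linux systems, not part of gemBS; the 48Gb/500Gb thresholds simply restate the recommendations from this section.

```shell
# Illustrative pre-flight check against the suggested gemBS minimums
# (48 Gb RAM, ~500 Gb free disk). Thresholds are guidelines, not
# hard limits enforced by gemBS.
req_mem_gb=48
req_disk_gb=500

# Total RAM in Gb, read from /proc/meminfo (Linux only)
mem_gb=$(awk '/MemTotal/ {printf "%d", $2/1024/1024}' /proc/meminfo)

# Free space (in Gb) on the filesystem holding the working directory
disk_gb=$(df --output=avail -BG . | tail -1 | tr -dc '0-9')

[ "$mem_gb" -ge "$req_mem_gb" ] || \
    echo "warning: only ${mem_gb} Gb RAM available (${req_mem_gb} Gb recommended)"
[ "$disk_gb" -ge "$req_disk_gb" ] || \
    echo "warning: only ${disk_gb} Gb free disk (${req_disk_gb} Gb recommended)"
```

On machines below the thresholds the analysis may still run, but the warnings flag the risk of jobs stopping mid-way.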
9.1. Running gemBS on a single workstation
This is the simplest case: in general the user can do the entire analysis with gemBS without having to perform additional scripting or needing external workflow management tools. After the configuration step has been completed, it should be sufficient to simply go through the following commands:
gemBS map
gemBS call
gemBS extract
gemBS report
or, more simply:

gemBS run
The main decision for the user is the number of parallel jobs to run at each stage (apart from the mapping stage). This will depend on the amount of memory and the number of computing cores available. As a quick rule of thumb, allocating 2-3 cores and 6-8Gb of RAM per job for the calling and extraction phases should be sufficient (although this will depend on characteristics of the experiment such as the coverage). Note that there is no point in running multiple jobs for the mapping process; GEM3 can efficiently use all of the available cores, and running a single process allows GEM3 to share the index across threads, so the additional memory requirement per thread is minimized.
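The rule of thumb above can be turned into a simple calculation: take the smaller of the job counts allowed by cores (2-3 per job) and by memory (6-8Gb per job). The sketch below is illustrative and assumes a Linux machine; the divisors use the conservative end of the ranges and are not figures enforced by gemBS.

```shell
# Illustrative sizing for the calling/extraction stages, using the
# rule of thumb of ~3 cores and ~8 Gb RAM per parallel job.
cores=$(nproc)
mem_gb=$(awk '/MemTotal/ {printf "%d", $2/1024/1024}' /proc/meminfo)

jobs_by_cores=$((cores / 3))
jobs_by_mem=$((mem_gb / 8))

# The binding constraint is whichever resource allows fewer jobs
jobs=$(( jobs_by_cores < jobs_by_mem ? jobs_by_cores : jobs_by_mem ))
[ "$jobs" -ge 1 ] || jobs=1   # always run at least one job

echo "suggested parallel jobs for calling/extraction: $jobs"
```

For example, a workstation with 16 cores and 64Gb of RAM would be core-limited at 5 parallel jobs; adding RAM beyond 48Gb mainly helps mapping, not the job count.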