LSF, the Load Sharing Facility, is a subsystem for submitting, scheduling, executing, monitoring, and controlling a workload of batch jobs across the compute servers of a cluster. It manages large programs that cannot be run interactively on a single machine because they require too much CPU time, memory, or other system resources; such programs must instead be run in batch, as batch jobs. LSF takes care of this batch management: based on each job's specifications, it starts a job when enough system resources are available for the job to complete. Until then, the job request waits in a queue.
An executable file is submitted to run on the compute nodes with the LSF command bsub. The bsub command submits a job for batch execution and assigns it a unique numerical job ID. LSF runs the job on a compute node, or nodes, that satisfy all of the job's requirements, once all conditions on the job, host, queue, and cluster are met. If LSF cannot run all jobs immediately, its scheduling policies determine the order of dispatch. Jobs are started and suspended according to the current system load.
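The submission step can be scripted. The following is a minimal Python sketch, not an official LSF interface: it builds a bsub command line and extracts the numerical job ID from bsub's confirmation message. The options used (-J job name, -n number of tasks, -W wall-clock limit as [hours:]minutes, -o/-e output and error files with %J replaced by the job ID, -q queue) are standard bsub options, but the script name, job name, and queue name "normal" are hypothetical and site-specific. The confirmation format "Job <12345> is submitted to queue <normal>." is an assumption based on typical LSF output and may differ between versions or sites. Only submit() actually calls bsub, so it works only on a host where LSF is installed.

```python
import re
import shlex
import subprocess


def build_bsub_command(script, job_name, queue=None, cores=1, walltime="1:00"):
    """Return a bsub argument list for `script` (defaults are hypothetical)."""
    cmd = [
        "bsub",
        "-J", job_name,             # job name
        "-n", str(cores),           # number of tasks/cores requested
        "-W", walltime,             # wall-clock limit, [hours:]minutes
        "-o", f"{job_name}.%J.out", # stdout file; LSF replaces %J with the job ID
        "-e", f"{job_name}.%J.err", # stderr file
    ]
    if queue is not None:
        cmd += ["-q", queue]        # queue names depend on the site
    cmd.append(script)
    return cmd


# Assumption: bsub confirms with text like "Job <12345> is submitted to queue <normal>."
JOB_ID_RE = re.compile(r"Job <(\d+)> is submitted")


def parse_job_id(bsub_output):
    """Extract the numerical job ID from bsub's confirmation message."""
    match = JOB_ID_RE.search(bsub_output)
    if match is None:
        raise ValueError(f"no job ID found in: {bsub_output!r}")
    return int(match.group(1))


def submit(script, job_name, **kwargs):
    """Run bsub (requires LSF on this host) and return the new job ID."""
    cmd = build_bsub_command(script, job_name, **kwargs)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_job_id(result.stdout)


if __name__ == "__main__":
    # Show the command that would be submitted, without contacting LSF
    print(shlex.join(build_bsub_command("./run_simulation.sh", "sim01",
                                        queue="normal", cores=4)))
    print(parse_job_id("Job <12345> is submitted to queue <normal>."))  # 12345
```

Keeping command construction and output parsing separate from the subprocess call makes both easy to check on a machine without LSF.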
These are the most important LSF user commands:
bsub - Submit a batch job to the LSF system
bkill - Kill a running job
bjobs - See the status of jobs in the LSF queue
bpeek - Display the output and error produced so far by an unfinished job
bhist - History of one or more LSF jobs
bqueues - Information about LSF batch queues
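As an illustration of how bjobs can be used from a script, here is a minimal Python sketch that waits for a job to finish by polling its status. It assumes an LSF version whose bjobs supports the -noheader and -o "stat" options (available in recent LSF releases) so that only the status field is printed; DONE and EXIT are the LSF states for a job that finished successfully or with an error. The function names and the 30-second polling interval are hypothetical choices. The command runner and sleep function are parameters so the logic can be tested without a cluster.

```python
import subprocess
import time

# LSF states of a job that has finished: DONE (success) or EXIT (failure)
FINISHED_STATES = {"DONE", "EXIT"}


def bjobs_status_command(job_id):
    """Ask bjobs for the status field only (assumes -noheader and -o support)."""
    return ["bjobs", "-noheader", "-o", "stat", str(job_id)]


def run_command(cmd):
    """Run a command on an LSF host and return its standard output."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def wait_for_job(job_id, poll_seconds=30.0, runner=run_command, sleep=time.sleep):
    """Poll bjobs until the job reaches DONE or EXIT; return that state."""
    while True:
        state = runner(bjobs_status_command(job_id)).strip()
        if state in FINISHED_STATES:
            return state
        sleep(poll_seconds)  # still PEND, RUN, suspended, etc.
```

Because bjobs only remembers finished jobs for a limited time, a job that ended long ago may no longer be found, and run_command then raises an error; bhist is the command for looking up such older jobs.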