CBI User FAQ

Welcome to the CBI User FAQ

Here our staff has collected information and how-to guides covering common tasks and the software we make available: just about everything a user needs to know to use our systems.

We strive to provide our users with the best and latest information. If you find anything out of date or in need of clarification, or if you have any questions or comments, please email us at cbiadmin@utsa.edu.

Basics of Linux

Much of the scientific software we offer does not have a graphical interface and often does not run on Windows at all. Most of it runs on the command line, and even access to our servers is typically through a remote terminal or shell. It is therefore important that all users become familiar with Linux and the command line. We have several Linux workstations available and can even assist in installing Linux on your own systems. To help new users become more comfortable with Linux, Linux.org has a great course for beginners. We recommend users read through their Beginners Level Course, with a focus on the following sections:

  • What is Linux?
  • Fundamental Linux Knowledge
  • Great Linux Commands
  • Editing Files

They also offer Intermediate and Advanced levels beyond that, and of course Google is a great resource for finding more information. Don't forget that the CBI staff is available to answer your questions. There is also a great printable cheat sheet of common commands, thanks to FossWire.

CBI Presentations and Workshops

Presentations that provide an overview of our services. Please visit the FAQ for more detailed information.

We also make available past CBI workshops providing tutorials on a variety of topics.

Using the CBI Cluster

Advanced Cluster Usage

CBI Support Policy

The primary management objective of the Computational Biology Core Facility is to maximally advance research and education related to core expertise for our affiliated members. The organization of the facility is predicated on the assumption that computational techniques will be increasingly important for state-of-the-art biological research. We also recognize that students, faculty, and staff within the institutions that the core serves have a wide range of existing expertise, needs and resources. For this reason, the core use policy is designed to be as organizationally flexible and adaptive as possible.

It is important to minimize barriers to use of core facilities, especially for new projects and laboratory groups just learning to incorporate computational techniques into their research. For this reason, the Directors of the facility will continue to seek funding to provide, as far as possible, open and free access. At the same time, we recognize both the need and the value inherent in receiving direct support from research groups making heavy or central use of the core facility.

The transition from free use to pay-as-you-go use is probably best understood in the context of the three tiered level of support around which the facility is organized and managed:

Tier 1: Tier one support refers to basic use of software and hardware with minimal expenditure of staff resources. Software and hardware are made freely available to anyone with a need. The facility maintains a large array of software of general interest to the community. Some of this software is free, while some requires yearly site licenses. With fee-for-use software, we evaluate the likely general usefulness of the software and invest accordingly. We are particularly mindful of opportunities to maximize the value of the investment across institutions. Core purchases have already saved individual laboratories multiple thousands of dollars. Occasionally users request additional software to support specific lab groups. Freely available software is obtained and installed at no charge. If there is a charge for the software, we pay in full, partially pay, or request payment from the requesting lab based on our assessment of the likely general use of the software as well as the ability of the requesting laboratory to pay.

Tier 2: Tier 2 involves training beyond basic installation and creation of user accounts. Core facility staff organizes educational workshops focused on specific software packages and/or computational techniques. Workshop topics are established based on the perceived needs of the community by the staff and occasionally on requests from users. In many cases experts are brought in from outside San Antonio to lead these workshops. To date, all workshops have been free and the expectation is that most or all will continue to be free.

Tier 3: Tier 3 involves direct investment of staff or affiliated faculty expertise to support research. At this level of support, a staff member participates directly in computational efforts in individual laboratories as full collaborators. The initial establishment of Tier 3 efforts is not explicitly dependent on the prior existence of funds to pay for the service. However, as these collaborations frequently result in the generation of new grant proposals, we then work with PIs to include support for the facility in those grants. An important measure of the success of the facility is how well it incubates new research directions and contributes to successful new grant applications. PIs whose projects are heavily dependent on the CBI services are expected to include support for facility staff, software and hardware in their grants. Well funded PIs who make heavy use of core services are also expected to support the facility. The ability to provide intensive Tier 3 support at little or no cost depends on core funding.

Although at present the facility is not used by businesses or outside organizations, we anticipate the possibility of such use at any of the three levels identified. Decisions about fees for services for outside groups will be assessed on a case-by-case basis. At no time, however, will the ability of an outside organization to pay for services exclude use by affiliated organizations (at present UTSA and the UTHSCSA). The principal focus and the top priority for use will always be given to members of affiliated organizations.

In summary, the Computational Biology Core Facility is not designed or intended to be a profit center. The facility is constructed as a collaborative, intellectual / scientific enterprise that focuses on supporting research and incubating new research directions. While we expect and encourage our users, especially those with the means to do so, to contribute to core operation, we also anticipate that core services will always be subsidized by sources of support obtained by the directors themselves, as well as by the institutions served.

Installing R Libraries

Due to the way R libraries are distributed, they are difficult to install and maintain centrally from an administration standpoint. To get around this, this howto shows how to install R libraries within your home directory.

First create a directory within your home directory to store the libraries, such as /home/username/Rlibs. Then create a file named .Renviron within your home directory (i.e. /home/username/.Renviron) and add the following to that file:

R_LIBS="/home/username/Rlibs"

Now any libraries installed will be placed there.

R has two ways to install packages: either R downloads them directly from the CRAN repository, or the user downloads the source file and R builds it. For most packages, the online method should work. For example, start R and run:

> install.packages('packageName')

Some packages may not work with this method and report that they are unavailable. In this case, go to CRAN, download the source directly, and have R build it. For example:

$ R CMD INSTALL package.tar.gz

If a package has dependencies, you'll want to install those packages first. If you run into any issues with dependencies outside of R that are required for building a particular package, contact the staff for assistance.
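
As a quick end-to-end sketch of the setup and both installation methods, the commands below can be run from a terminal; the package name, tarball filename, and CRAN mirror are placeholders for illustration, so substitute the package you actually need.

mkdir -p ~/Rlibs
echo 'R_LIBS="/home/username/Rlibs"' >> ~/.Renviron
# Method 1: let R download and build the package from CRAN (placeholder package name)
Rscript -e "install.packages('packageName', repos='https://cran.r-project.org')"
# Method 2: download the source tarball from CRAN yourself and have R build it (placeholder filename)
R CMD INSTALL packageName_1.0.tar.gz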

Installing virtualenv

virtualenv is a Python tool that allows you to build a local Python environment where you can test code and install Python packages without touching the system-wide packages. Setting up this tool in the context of the cluster takes a few steps. Note: always use qlogin first to get a terminal on a node in the cluster.

  1. Setup directories

    Create a 'bin' directory in your home directory via 'mkdir ~/bin'

    Edit ~/.bash_profile and add $HOME/bin to that PATH variable, for example:

    PATH=$HOME/bin:$PATH

    Logout and back in for the change to take effect
  2. Download virtualenv from pypi

    Extract the downloaded tarball into bin

    Create a symlink to the virtualenv script like so:

    ln -s virtualenv-x.x.x/virtualenv.py virtualenv
  3. (Optional) If using Biopython, first load it via 'module load biopython'
  4. Create an environment via 'virtualenv --system-site-packages /path/to/ENV'

    Or just use 'ENV' and it will create an environment right within the current working directory
  5. Load the new environment via 'source /path/to/ENV/bin/activate'

Now you should see your environment name on the prompt in the terminal and when you run python or install python packages, it will be within the context of your new python environment.

Note: This creates an environment that also makes available any packages already installed on the system. If you want to create a blank environment, omit '--system-site-packages' 

Note: If using Biopython, also make sure you run 'module load biopython' first. If you want this done automatically, you can add the command to ~/.bash_profile.
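
Putting the steps above together, a typical first-time setup might look like the sketch below; the virtualenv version number and the environment name 'myenv' are placeholders, so adjust them to match what you actually downloaded and where you want the environment to live.

qlogin                                        # always work on a compute node
mkdir -p ~/bin
# add PATH=$HOME/bin:$PATH to ~/.bash_profile, then log out and back in
cd ~/bin
tar xzf ~/virtualenv-x.x.x.tar.gz             # tarball downloaded from pypi
ln -s virtualenv-x.x.x/virtualenv.py virtualenv
module load biopython                         # optional, only if you need Biopython
virtualenv --system-site-packages ~/myenv     # 'myenv' is an example environment name
source ~/myenv/bin/activate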

MATLAB Compiling Guide

Attachment: matlabcompilingandgridusageatcbi.pdf (442.16 KB)

Matlab provides the ability to compile Matlab functions into native executables. This is ideal for Matlab codes that are suitable for grid-type deployments, where there are many independent units of work that can be submitted to a cluster such as the CBI Linux cluster via the Sun Grid Engine scheduler. The key benefit is that the compiled executable does not require Matlab licenses to run, enabling large-scale distributed jobs to run without interfering with other users' ability to use Matlab. In addition, this makes it possible to distribute Matlab algorithms to others who may not have a Matlab license.

The compiled executable dynamically links against the runtime library provided by MathWorks, the MCR (Matlab Compiler Runtime). This library contains the functionality of Matlab accessible from the compiled executable. Many Matlab codes can be converted with a minimal set of changes to the source code. An introduction to the process of creating a standalone executable version of a Matlab code is provided in this tutorial.
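
As a minimal sketch of the compile step (the attached PDF has the full walkthrough), the commands below compile a hypothetical function myAnalysis.m into a standalone executable; the function name, the mcc path, and the MCR location are assumptions for illustration only.

/share/apps/matlab/bin/mcc -m myAnalysis.m -o myAnalysis   # compile the function (path assumed)
# mcc also generates a wrapper script, run_myAnalysis.sh, which takes the MCR location as its first argument
./run_myAnalysis.sh /path/to/MCR arg1 arg2                 # remaining arguments are passed to the function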

Mathematica Fonts

Attachment: mathfonts.zip (5.85 MB)

Mathematica fonts are available for download, though only to UTSA users of the Mathematica software.

Remote Access

For those who need to access our services remotely, and this will be the case for using the cluster or a compute server like Bishop, you need an SSH client to get a remote terminal. Two popular SSH clients for Windows are PuTTY and Tunnelier. Both also include SFTP functionality for uploading and downloading files from the server.

To use either program, you'll need three things: the hostname (server name), your username, and your password. The hostname would be something like cheetah.cbi.utsa.edu for the cluster, and the username and password are of course your CBI credentials. Once connected, you'll be presented with a command line interface, also called a terminal or shell. For those unfamiliar with Linux, we have a beginner's tutorial available.
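
If you are connecting from a Mac or Linux machine instead, the built-in ssh and sftp command-line clients do the same job, for example:

ssh username@cheetah.cbi.utsa.edu     # open a remote terminal on the cluster head node
sftp username@cheetah.cbi.utsa.edu    # transfer files to/from your home directory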

Using Screen

GNU Screen is a terminal multiplexer. You can use it to run multiple instances of an interactive shell in the same "window". It is the same concept as running multiple tabs in a web browser: instead of clicking a "+" symbol to create another tab, you use keyboard shortcuts. You can also detach and reattach screens as you like. Another feature of screen is that it will keep programs running if you accidentally close the terminal. CBI has screen installed on the bishop server, so if you would like to use this feature please ssh into bishop.

Using Screen: First, start screen simply by typing "screen" at the command line. You'll notice that [screen 0: shell] username@bishop appears in the title of the terminal. This means you have successfully started screen. Now you are able to create new screen windows and use other features of screen. You create a new window by pressing ctrl+a and then "c". You'll notice that the terminal flashes and [screen 1:] will now be the title of your terminal. This means you have created a new window. You can switch between your windows with several different shortcut commands.

  1. ctrl+a n or ctrl+a p, which switches to the next or previous window
  2. ctrl+a 0-9, where the number is the window number shown in the terminal title

Detaching and Reattaching: You can use screen to keep programs running in the background even after you close the terminal. To do this, simply run the program in one of your screen windows and then detach the screen by pressing ctrl+a and then "d". You'll notice that you're back in the original terminal where you started screen. To bring back the screen you detached, just type "screen -r" in the terminal.

Keyboard Shortcuts: The following are some of the common shortcuts you can use while using screen. All shortcuts in screen are preceded by ctrl+a and they are case-sensitive.

  • c - Creates new window.
  • 0 thru 9 - Switches between windows
  • n - Switch to next window
  • p - Switch to previous window
  • A - Change the name of the window
  • d - Detach the current screen
  • k - Kills the current window
  • w - Lists all the windows

For more detailed information about shortcuts, run the command "man screen" in your terminal.

Closing Screen: In order to fully close screen you have to close every window that has been opened; otherwise it will keep running in the background. You can check all open screen sessions by entering the command "screen -list".
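
As a short example of a typical workflow (the session name 'longjob' and the program name are just placeholders):

screen -S longjob            # start a new screen session named 'longjob'
./my_long_program            # start your work inside the session
# press ctrl+a then d to detach; the program keeps running on bishop
screen -list                 # later, list the sessions still running
screen -r longjob            # reattach to the named session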

X Forwarding on Mac

While the Mac is *nix based, you'll still need to install X Window software before you can do X forwarding. First download and install XQuartz, the Mac version of the X Window System.

XQuartz

Once it's installed, find and run XQuartz via Spotlight.

When you launch XQuartz, by default it will start XTerm, an X Window-enabled terminal.

Now that you have XTerm open, use ssh with the -Y option to use X Forwarding when connecting to the cluster.

When you then use qlogin, X traffic will automatically be forwarded, so when you launch matlab you'll be greeted by its graphical interface.
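
Put together, the session in the XTerm window looks roughly like this (the full MATLAB path may be needed if matlab is not on your PATH):

ssh -Y username@cheetah.cbi.utsa.edu   # -Y enables trusted X forwarding
qlogin                                 # get an interactive session on a compute node
matlab &                               # the MATLAB desktop should appear on your Mac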

X Forwarding on Windows

With Xming and PuTTY, you will be able to access GUI-based programs running on the cluster from your Windows machine.

Software Tools:

  1. Xming
  2. PuTTY

Installation Steps:

Download and install Xming from the link above and follow the installation instructions. Once installed, you'll find Xming in the Start Menu.

Running XLaunch will let you configure certain options, such as whether to run multiple windows or full screen, and set the display number. It's best to stick with the defaults, but remember the display number, as it is used by PuTTY later on.

Finally, select "Start no client" and continue. Xming will now be running in the background and you'll find its icon in the notification area of the task bar.

Download and install PuTTY from the link above and follow the installation instructions. Once installed, start PuTTY.

Note that Xming must be running before connecting to the server. In the PuTTY configuration, enter the hostname and leave the port at the default 22. You can also save this session configuration for easy future use.

Next go to the category Connection -> SSH -> X11. Enable X11 forwarding and enter the X display location localhost:X, where 'X' is the display number set when starting Xming. Leave the rest as is.

Then go back to Session and click Open. You'll be asked for your username and password. After you enter them, you are connected and have a running shell on the server. Now you can start any graphical program on the server and have its interface displayed on your Windows machine. Note: 3D functionality will not work.

NOTICE: X Forwarding is an older protocol that was built when networks were physically small and located in the same building. We do not guarantee that X Forwarding will work for any user off campus. As a general rule, the farther away you are, the worse it will perform.

Accessing Windows Server via Remote Desktop

Remote Desktop is a feature of Windows and can be found in the Start Menu under Programs > Accessories > Remote Desktop Connection. Launch the program and click on the Options button to expand the available options. Enter the full domain name of the server you need to access in the Computer field, followed by your username. Your username must be in the form: CBI.UTSA.EDU\username

You can have the program save this information by selecting "Allow me to save credentials," and you can also save the connection settings as an RDP file that works like a shortcut with the relevant settings already filled in. The rest of the options are fine as is.

When working within a remote desktop session, you will be presented with a full Windows desktop on the server. Please note that browsing the web is forbidden while on the server. Also, do not save files on the desktop or in the My Documents folder within the remote desktop session. The D: drive is available for those who need it as a temporary work location. Be sure to copy your work back to your home directory before logging out.

A nice feature of remote desktop is that it keeps your session active if you close the remote desktop program without logging out or if you lose your network connection. This means that if you log back in, your session (i.e. running programs) will still be there. Please note, though, that it is good practice to close all programs and log out completely every now and then instead of leaving a session running indefinitely.

NOTICE: Remote Desktop access is available on campus only. Also note that it is restricted to authorized users who have made arrangements for this special case.

Account Benefits

Why join the CBI?

By joining the CBI, you get access to our lab facilities, located at BSE 3.114, where we have two labs: the high-priority research lab with reservable Windows workstations, and our open classroom lab with Linux workstations. All workstations feature multi-core CPUs, large memory capacity, and large flat panel displays.

The lab is available 24/7 via card access. Staff assistance is available from 9am - 5pm, Monday through Friday.

Besides our physical workspace and high-end workstations, our compute facilities are available as well, in particular our high-performance cluster. Check our list of available hardware for more information.

Coupled with great hardware is our software selection. Any open-source, freely available software can be requested for installation. Commercial software can be installed if a license can be acquired, or if you or your lab already have a license, it can be installed on our systems. Check our list of available software for more information.

Besides our hardware and software, we provide several services to users and their labs. Check our list of available services for more information.

Core Facility Services

The Core Facility is operational and has begun to offer support for research investigators at both campuses. Services include help with experimental design, selection of appropriate computational methodologies, and most importantly, assistance with the analysis of data. We are providing support for experimental designs that involve DNA expression microarrays. Appropriate planning at the design stage results in the most efficient use of resources, such as picking the minimum number of array chips that need to be used to address a particular set of hypotheses. Several laboratory scientists have consulted with the Core Facility staff in the design of experiments focused on stem cell biology, differential DNA expression across species, 2D gel analysis of protein expression, and neuroscience.

We are providing assistance in developing statistical analytic plans for experiments. Many bioinformatics investigations were formerly implemented in a less than efficient manner, primarily because of the lack of statistical expertise in this area. Bioinformatics experiments were often planned with no consideration of quality control variation, fundamental experimental design, or a statistical analysis plan. The Program is now providing statistical assistance to researchers in the design of genomic and proteomic experiments. Our input includes helping investigators choose the most appropriate experimental design to satisfy the needs of their project, ensuring that the study aims and endpoints are clearly defined, that sample size calculations are used to guide the utilization of precious laboratory resources, and that appropriate analytic plans are included. Laboratories that have begun collaborations with the Core Facility include

Accessing your Home directory

On Campus:

Here are instructions on how to access your home directory while on the UTSA campus.

On Windows:

1. Open My Computer, then click on "Tools" on the toolbar at the top of the window. Then click "Map Network Drive."


2. At the prompt, enter the Folder as:

\\cajal.cbi.utsa.edu\homes
Note: Please notice the use of the backslash.

If you are accessing a group share, enter the group name in place of homes.

3. A prompt will appear asking for your username and password. (Note: There is a rare case where one might have to enter the username in the form CBI.UTSA.EDU\username)

Note: Even if you are mounting a group share, you will enter your own username and password.

4. After clicking OK, a drive icon should now appear in your My Computer window indicating the drive you've mounted.

On Mac OS X:

1. Click on Finder and then select "Go" from the menu bar at the top of the screen; from there select "Connect to Server."

2. When you select this option, a prompt will come up. At the prompt, type in the file server for your home directory:

smb://cbi.utsa.edu;username@cajal.cbi.utsa.edu/sharename

where username is your own unique username and sharename is the directory you want to access on cajal. If you are accessing your personal share, it is the same as your username. If you are accessing a group share, sharename will be the name of the group and username stays the same.

3. When prompted for your password, enter it into the field and hit enter. If you are mounting a group share, you will still enter your own password.

4. An icon for your mounted drive should be present in the Finder browser. You can also access the share from your Terminal under /Volumes/username, where username is your unique username.

Off Campus:

If one has access to the UTSA VPN, then simply use the VPN and follow the instructions for Accessing your Home directory on-campus.

Otherwise, off-campus access is only allowed via an SSH (Secure Shell) connection. Any SSH client capable of SFTP (Secure File Transfer Protocol) will do, but for this example Tunnelier will be described. It is free for personal use from Bitvise.

After launching Tunnelier, you will see the Host, Username, and Password boxes in the control panel. Use the host cajal.cbi.utsa.edu and enter your CBI username and password in their corresponding fields, then click Login. (Note: SSH off campus can be slow depending on your network speed and amount of traffic.)

By default Tunnelier will launch both a terminal window and the SFTP window. This setting can be changed in the Options tab, On Login section. If the SFTP window is not visible, select Open New SFTP Window from the left.

The Local files pane shows the files on your machine; the default is the desktop. The Remote files pane shows the files on the server; the default is your home directory (i.e. /home/username). Simply select the file you wish to upload and click Upload on the bottom toolbar. You can select more than one file at a time; these are uploaded one at a time via the Queue, which can be viewed via the top toolbar.
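
For those who prefer a command-line client over Tunnelier, the same transfers can be done with sftp from any SSH-capable terminal; the file names below are placeholders:

sftp username@cajal.cbi.utsa.edu
sftp> put results.csv        # upload a local file to your home directory
sftp> get data.tar.gz        # download a file from the server
sftp> exit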

Sun Grid Engine Tutorial

Attachments: matlab-test.tar (10 KB), openmpi-test.tar (10 KB)

The Sun Grid Engine is a queue and scheduler that accepts jobs and runs them on the cluster for the user. There are three types of jobs available: interactive, batch, and parallel.

It is assumed you are logged into the cluster and know how to create and edit files. It should also be noted that you should never assume a program is on the path; always call programs by their full path name (i.e. /usr/local/bin/xxxx). This also helps when a script doesn't work and needs to be debugged.

Running Batch (Serial) Jobs with SGE:

A batch (or serial) job is one that is run on a single node. This is in contrast to the case where a single job is run on many nodes in an interconnected fashion, generally using MPI to communicate between individual processes. If you are running the same program on the cluster as you would on your desktop, chances are you will want to use a serial job.

One thing to keep in mind when creating jobs is your directory structure. It's a good idea to organize the files needed for a job into a single folder. If there are read-only files needed by multiple jobs, using symlinks is a good idea so there are no duplicate files taking up extra space. An example of a good directory structure could be:

Project1/
Project1/jobA
Project1/jobB

In this example, we will run a matlab script:

  1. Create a directory to hold your job file and any associated data(matlab scripts, etc).
  2. Open a new file, in this case we will call it matlab-test.job
  3. #!/bin/bash
    # The name of the job, can be anything, simply used when displaying the list of running jobs
    #$ -N matlab-test
    # Giving the name of the output log file
    #$ -o matlabTest.log
    # Combining output/error messages into one file
    #$ -j y
    # One needs to tell the queue system to use the current directory as the working directory
    # Or else the script may fail as it will execute in your top level home directory /home/username
    #$ -cwd
    # Now comes the commands to be executed
    /share/apps/matlab/bin/matlab -nodisplay -nodesktop -nojvm -r matlab-test
    # Note that what follows -r is not the name of the m-file but the name of the routine
    exit 0
  4. Save this job script and submit to the queue with qsub matlab-test.job
  5. Now you can check the status of your script with “qstat” which will return a list of your running/queued jobs

When the job has completed you can check the output of the job in the file named above, matlabTest.log. NOTE: You may see the following in the output:

“Warning: no access to tty (Bad File descriptor).
Thus no job control in this shell.”

This is normal and can be ignored. In the case of matlab, you may also see a message about shopt; again, this is normal and can be ignored. Attached are the sample job and matlab script.

Running Interactive Jobs with SGE:

An interactive job is one where you are running a program interactively on a node. This is good for building and testing scripts. It is not the place to run long-running, very computationally intensive, or other jobs better suited to a batch job. An example would be the development of a matlab script: you can launch an interactive job, develop the script, and write the job file, but when it comes to running the job itself, it needs to be submitted as a batch job. To run an interactive job, simply type qlogin.

Running Parallel Jobs with SGE: A parallel job is where a single job is run on many nodes in an interconnected fashion, generally using MPI to communicate in between individual processes. If you are running the same program on the cluster as you would on your desktop, chances are you will want to use a serial job, not a parallel job. Parallel jobs generally are only for specially designed programs which will only work on machines with cluster management software installed.

Also, not just any program can run in parallel; it must be programmed as such and compiled against a particular MPI library. In this case we build a simple program that passes a message between processes and compile it against OpenMPI, the main MPI library on the cluster.

Also note that the scheduler will only accept parallel jobs requesting between 4 and 8 slots. It is currently set up to start parallel processes on a single node to limit the overhead of inter-process communication over the network, which adds considerable run time to a job. For most jobs, more slots is not always better.

  1. Like the batch job, create a directory to hold this job and related files
  2. Open a new file and create the job script:
    #!/bin/bash
    #$ -N openmpi-test
    # Here we tell the queue that we want the orte parallel environment and request 4 slots
    # This option takes the following form: -pe nameOfEnv min-Max
    # Where you request a min and max number of slots
    #$ -pe orte 4-8
    # For parallel jobs, its a good idea to use even numbers.
    #$ -cwd
    #$ -j y
    mpirun -n $NSLOTS mpi-ring
    exit 0
    
  3. And like above, submit the job with qsub and check on it with qstat

NOTES: There are a few queue commands to know:

  • List all running jobs: “qstat -u \*”
  • List all running jobs per node: “qstat -u \* -f”
  • To delete a job: “qdel jobID”
  • To list any queue messages: “qstat -j”
  • Should a job be marked for deletion but stay in the queue for a while, contact CBI.
  • There is a known bug in the scheduler that sometimes causes it to not respond, resulting in the following message:
    commlib error: got select error (Connection refused) unable to send message to qmaster using port 536 on host "cheetah.cbi.utsa.edu": got send error

    This can be safely ignored. Simply wait a minute and retry your command.

SGE Environment Options And Environment Variables:

When a Sun Grid Engine job is run, a number of variables are preset into the job’s script environment, as listed below.

  • ARC - The Sun Grid Engine architecture name of the node on which the job is running; the name is compiled into the sge_execd binary
  • COMMD_PORT
  • SGE_ROOT - The Sun Grid Engine root directory as set for sge_execd before start-up or the default /usr/SGE
  • SGE_CELL - The Sun Grid Engine cell in which the job executes
  • SGE_JOB_SPOOL_DIR - The directory used by sge_shepherd(8) to store job-related data during job execution
  • SGE_O_HOME - The home directory path of the job owner on the host from which the job was submitted
  • SGE_O_HOST - The host from which the job was submitted
  • SGE_O_LOGNAME - The login name of the job owner on the host from which the job was submitted
  • SGE_O_MAIL - The content of the MAIL environment variable in the context of the job submission command
  • SGE_O_PATH - The content of the PATH environment variable in the context of the job submission command
  • SGE_O_SHELL - The content of the SHELL environment variable in the context of the job submission command
  • SGE_O_TZ - The content of the TZ environment variable in the context of the job submission command
  • SGE_O_WORKDIR - The working directory of the job submission command
  • SGE_CKPT_ENV - Specifies the checkpointing environment (as selected with the qsub -ckpt option) under which a checkpointing job executes
  • SGE_CKPT_DIR - Only set for checkpointing jobs; contains path ckpt_dir (see the checkpoint manual page) of the checkpoint interface
  • SGE_STDERR_PATH - The path name of the file to which the standard error stream of the job is diverted; commonly used for enhancing the output with error messages from prolog, epilog, parallel environment start/stop or checkpointing scripts
  • SGE_STDOUT_PATH - The path name of the file to which the standard output stream of the job is diverted; commonly used for enhancing the output with messages from prolog, epilog, parallel environment start/stop or checkpointing scripts
  • SGE_TASK_ID - The task identifier in the array job represented by this task
  • ENVIRONMENT - Always set to BATCH; this variable indicates that the script is run in batch mode
  • HOME - The user’s home directory path from the passwd file
  • HOSTNAME - The host name of the node on which the job is running
  • JOB_ID - A unique identifier assigned by the sge_qmaster when the job was submitted; the job ID is a decimal integer up to 99999
  • JOB_NAME - The job name, built from the qsub script filename, a period, and the digits of the job ID; this default may be overwritten by qsub -N
  • LOGNAME - The user’s login name from the passwd file
  • NHOSTS - The number of hosts in use by a parallel job
  • NQUEUES - The number of queues allocated for the job (always 1 for serial jobs)
  • NSLOTS - The number of queue slots in use by a parallel job

Advanced Jobs

Using other MPI Environments: Besides the default MPI environment for OpenMPI, MPICH2 is installed on the system at /opt/mpich2/gnu. To set up your environment to use MPICH2 instead of OpenMPI, you'll have to alter your shell environment. To do so, use your text editor to edit /home/username/.bash_profile and add the following:

export PATH=/opt/mpich2/gnu/bin:$PATH
export LD_LIBRARY_PATH=/opt/mpich2/gnu/lib:$LD_LIBRARY_PATH
export LD_RUN_PATH=/opt/mpich2/gnu/lib:$LD_RUN_PATH

This adds MPICH2 to the path and to the library path. When compiling programs, be sure to tell the configure script where mpicc/mpif90/etc. are located by using the full path. Launching an MPICH2 job: the job script is similar, but includes a few extra directives needed for MPICH2:

#!/bin/bash
#$ -N jobName
#$ -cwd
#$ -S /bin/bash
#$ -pe mpich2 min-Max
export MPICH2_ROOT=/opt/mpich2/gnu
export PATH=$MPICH2_ROOT/bin:$PATH
export MPD_CON_EXT="sge_$JOB_ID.$SGE_TASK_ID"
/opt/mpich2/gnu/bin/mpiexec -machinefile $TMPDIR/machines -n $NSLOTS /path/to/program
exit 0

Job Submission Strategies

The simplest use case of the cluster is submitting a single job and letting the scheduler/queue launch it without any user intervention. Yet this is not much different than simply using your workstation to run a program, so the following job submission strategies can be used. Not all of them are supported by every program, and in the case of multi-threaded or parallel jobs, the software must be developed specifically for that situation.

Job Dependencies: This feature allows you to specify that a job must wait for another job to complete before running. This is useful when you need to run multiple jobs in succession and each job depends on the output of a previous job. Use the following example when submitting a job:

$ qsub -hold_jid <<jobID/jobName>> jobScript

You can use either a job ID number or a job name when specifying the job dependency. In the case of using a job name, you can have several jobs with the same name, and then the job submitted with hold_jid jobName will wait for all jobs with that name to complete first, as shown in the example below.
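
For example, a two-step pipeline where the analysis must wait for a preparation step could be submitted like this (the job names and script names are placeholders):

qsub -N prep prep.job                # first job, given the name 'prep'
qsub -hold_jid prep analysis.job     # will not start until all jobs named 'prep' have completed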

Job Array: The job array is a feature of the grid engine that allows one to programmatically submit many identical jobs from a single job script. The primary example would be a user who has a simulation they want to run many times with variations on the starting parameters. Without the job array, one would have to create an individual job script for each run.

What follows is a basic example; more information can be found in the grid engine documentation.

Take this simple job script:

#!/bin/bash
/path/to/program -i /path/to/input -o /path/to/output

If one had 1000 input files, they must be numbered to match the task IDs, for example input.1 to input.1000. The job script with the job array feature would then look like this:

#!/bin/bash
#$ -t 1-1000
/path/to/program -i /path/to/input.$SGE_TASK_ID -o /path/to/output.$SGE_TASK_ID

Here the grid engine sets an environment variable $SGE_TASK_ID for each task, and the shell replaces that variable with its value when the script executes. This schedules 1000 tasks under one particular job ID. Using qdel on that job ID would kill all tasks, while using qdel jobid.taskid would kill only that particular task.
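
Note that $SGE_TASK_ID is not zero-padded (task 1 expands to '1', not '0001'), so if your input files use padded names like input.0001 you can pad the ID inside the job script; a small sketch:

#!/bin/bash
#$ -t 1-1000
# Pad the task ID to four digits so it matches input.0001 ... input.1000
PADDED=$(printf "%04d" $SGE_TASK_ID)
/path/to/program -i /path/to/input.$PADDED -o /path/to/output.$PADDED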

Multi-Threaded Programs: Some programs come with the feature to specify the number of threads to use. To use the cluster fairly and not overload a node, here is an example job script:

#!/bin/bash
# This selects the mt parallel environment and
# requests eight slots (i.e. 1 slot = 1 core)
#$ -pe mt 8
/path/to/program -p $NSLOTS -i /path/to/input

NOTE: No more than 8 slots may be requested, but DO NOT submit jobs with a smaller slot count than what the program will actually use.

Job Limitations: Currently there are two limitations placed on the grid engine: no user can have more than 50 running jobs, and the maximum number of slots per job is 8. There is, however, a way around the second limitation. The number of slots per job is limited to protect parallel jobs from incurring the large overhead that network communication adds when using MPI. To accommodate all users, we provide a way around this limitation, with the warning that you are on your own.

Example job script:

#!/bin/bash 
#$ -pe orte2 20 
mpirun -n $NSLOTS /path/to/program -i /path/to/input 

Running ABINIT

Various versions of ABINIT can be found in /share/apps/abinit/. Please note that ABINIT has been built with a compiler different from the system version and thus requires certain libraries to be loaded first. Use the following as an example:

#!/bin/bash
#$ -N tpaw1_!
#$ -o $JOB_NAME-$JOB_ID.log
#$ -cwd
#$ -j y
#$ -pe orte 4
#$ -V
#$ -S /bin/bash
source /share/apps/gcc/4.6.1/gcc-4.6.1-source.sh
mpirun -n $NSLOTS /share/apps/abinit/6.8.1/bin/abinit < tpaw1_1.files

Running Columbus

Under Construction

Running GAMESS

GAMESS contains parallel code that utilizes its own built-in messaging system (i.e. it does not use MPI libraries). A separate parallel environment has been set up to restrict the requested slots to a single node. This maintains high-speed inter-process communication that would otherwise slow down if communication had to pass over the network.

The typical way to run a cluster job is to create a folder where everything pertaining to that job will be located. This helps to organize job data. Within this folder place your input file. It must have the form inputFile.inp, otherwise the program will not run. Then create your job script; use the following as an example:

#!/bin/bash
# Job Name
#$ -N gamessTest
# Run in the current working directory
#$ -cwd
# Requests 2 slots
#$ -pe gamess 2
/share/apps/gamess/rungms inputFile.inp
exit 0

Running MEME

While MEME is available via a web interface, the web version does not take advantage of the parallel processing capability that MEME gains when built against MPI.

To run MEME on the cluster, use the following as an example job script:

#!/bin/bash
# Using the mpich2 MPI library
# Requesting N number of slots
#$ -pe mpich2 N
#$ -N jobName
#$ -cwd
#$ -S /bin/bash
# Setup the mpich2 environment
export MPICH2_ROOT="/opt/mpich2/gnu"
export PATH="$MPICH2_ROOT/bin:$PATH"
export MPD_CON_EXT="sge_$JOB_ID.$SGE_TASK_ID"
# The following is an example of MEME
# It looks for dna motifs in the input file using the OOPS model with the palindrome switch on
/share/apps/meme/bin/meme -p $NSLOTS crp0.s -dna -mod oops -pal
exit 0

In case you need to pass options to mpirun, use quotes after the '-p' option to include those options:

/share/apps/meme/bin/meme -p "$NSLOTS mpiOptions" ...

Running NAMD

A base install of NAMD has been set up from the prebuilt binaries configured for multicore (i.e. single node) use. The following is an example job script for running NAMD on the general all.q queue:

#$ -N namdJob
#$ -cwd
#$ -pe mt 8
namd2 +p$NSLOTS inputFile.namd
exit 0

For those who wish to use the multicore-CUDA version, here is an example job script:

#$ -N namdCudaJob
#$ -cwd
#$ -q gpu.q
#$ -l gpu
module load namd-cuda
namd2 +idlepoll +p12 inputFile.namd
exit 0

For those who wish to use the Infiniband-SMP version, here is an example job script (note that it uses a different queue and parallel environment):

#$ -N namdIBjob
#$ -cwd
#$ -q ib.q
#$ -l ib_only
# When using the namd parallel environment
# Only select a number of slots that is divisible by 4
#$ -pe namd 16
module load namd-ib
# IB nodes only have 4 cores
# Calculate number of nodes and thus number of PEs per node
NODES=$(($NSLOTS/4))
PPN=$(($NSLOTS/$NODES))

charmrun ++remote-shell ssh ++scalable-start ++p $NSLOTS ++PPN $PPN ++nodelist $TMPDIR/namd-machines $(which namd2) +setcpuaffinity $(pwd)/inputFile.namd

Running NWCHEM

First Run:

For first-time users, you need to set up your NWChem environment. Make a copy of the file

/usr/local/share/NWChem-6.3/data/default.nwchemrc

to your home directory. For example:

cp /usr/local/share/NWChem-6.3/data/default.nwchemrc ~/.nwchemrc

Important Note for SCF, DFT, MP2, CCSD Modules:

These modules by default write a lot of their intermediate calculations to disk. In the past this was because systems were tight on memory or compute capability, and writing calculations to disk saved time when they were needed again later. Today this is no longer the case, and in fact it can slow down the entire cluster due to the increased IO traffic over the network. Please use these settings to disable the caching of this data to disk.

scf
semidirect memsize 200000000 filesize 0 # this uses 1.6 GB of memory and no disk for caching integrals
end

dft
direct # all integrals are computed on the fly
end

mp2
scratchdisk 1024 # use 1 GB of disk per process for intermediates
end

ccsd
nodisk # most integrals are computed on-the-fly
end

Example Job Script:

# Give your job a name, this gets saved to $JOB_NAME
#$ -N nwchem-job
# Tell the grid engine to use 8 slots for this job
#$ -pe orte 8
# Tell grid engine to work within the current working directory
# This is saved to $SGE_O_WORKDIR
#$ -cwd
# Join output/error output into single log file
#$ -j y
# Using grid engine variables to name log file
#$ -o $JOB_NAME-$JOB_ID.log
# This log file is where output from nwchem will be captured
mpirun -n $NSLOTS nwchem inputFile.nw
# If you wish to capture the nwchem output manually
# use this as an example
#mpirun -n $NSLOTS nwchem inputFile.nw > $SGE_O_WORKDIR/outputFile.log

Using local disk on node:

It may be desirable to have your job run on the local disk of a node instead of writing to the network disk. The grid engine automatically creates a directory for each job, and this location can be used for this purpose. First stage your files to $TMPDIR, then change into that directory with cd and run NWChem there. $TMPDIR will be destroyed after the job completes. In this example all output from NWChem is captured in the job output log file:

#$ -N nwchem-localdisk
#$ -cwd
#$ -pe orte 8
#$ -j y
#$ -o $JOB_NAME.$JOB_ID.log
cp inputFile.nw $TMPDIR
cd $TMPDIR
mpirun -n $NSLOTS nwchem inputFile.nw

Running Siesta

There are two versions of siesta installed: a parallel version and a serial version with netcdf support.

Use the following as an example of a parallel siesta job:

#!/bin/bash
#$ -N MoS2
#$ -o MoS2.log
#$ -cwd
#$ -j y
#$ -pe orte 4
mpirun -n $NSLOTS siesta < MoS2.fdf

Use the following as an example of a serial siesta job with netcdf support:

 #!/bin/bash
#$ -N serial-siesta-netcdf
#$ -o $JOB_NAME-$JOB_ID.log 
#$ -j y
#$ -cwd
siesta-netcdf < MoS2.fdf

Special Queue: Big Memory Nodes

The big memory nodes are not part of the general queue and require special steps to access. These nodes are only for large-memory workloads, and as such, if any other jobs are found running on these nodes, they will be terminated.

Since the primary resource of interest is the memory, as well as the system's memory bandwidth, these nodes are set up with only one slot. There is no parallel usage of the bigmem nodes.

Interactive Access
It's possible to access a bigmem node interactively via the following command:

$ qlogin -q bigmem.q -l bigmem

Job Script
Use the following to enable a job script to run on the bigmem queue:

#!/bin/bash
....
#$ -q bigmem.q
#$ -l bigmem
....

Special Queue: GPU Computing

The GPU-equipped nodes are not part of the general queue and require special steps to access. These nodes are only for GPU-enabled workloads, and as such, if any other jobs are found running on these nodes, they will be terminated.

Since the primary resource of interest is the GPU, these nodes are set up with only one slot. There is no parallel usage of the GPU nodes.

Interactive Access:

It's possible to access a GPU node interactively via the following command:

$ qlogin -q gpu.q -l gpu

Job Script:

Use the following code snip as an example of a job script to run on the gpu queue:

#$ -N gpuJob
...
#$ -q gpu.q
#$ -l gpu
...

Special Queue: Infiniband Nodes

The Infiniband nodes are not part of the general queue and require special steps to access.  These nodes are only for MPI workloads and as such, if any other jobs are found running on these nodes, they will be terminated. 

Infiniband is a special type of networking fabric that has very low latency compared to standard Ethernet-based networks. It enables larger-scale MPI jobs that can spread over several nodes. The maximum number of slots a user can request is 60, and there are three parallel environments to choose from.

Parallel Environments:

  • orte: This environment is the standard parallel environment. It restricts jobs to be no larger than a single node in that queue; in the ib.q queue, that would be 4 slots. OpenMPI would then use shared memory for interprocess communication, not Infiniband.
  • orte2: This environment will allow you to specify up to the maximum number of slots per user (60). It distributes the slots for the job by first filling up a node and then moving to the next available node. OpenMPI would then use both Infiniband and shared memory for interprocess communication.
  • orte3: This environment can also allow up to 60 slots. It distributes the slots in a round-robin fashion, placing a single process per node until it has gone all the way around and starts again. OpenMPI would then use Infiniband, and if more than one process is started on a node, shared memory.

Use the following code snip as an example of a job script to run on the Infiniband queue:

#$ -N ibJob
...
#$ -pe orte3 4-16
#$ -q ib.q
#$ -l ib_only
...
mpirun -np $NSLOTS /path/to/executable inputFile

NOTE:

It should be noted that bigger is not always better: requesting more slots for your job will not always equate to faster runtimes. It depends on the algorithm in use and how the program is structured. It's best to check the documentation and run tests to get an idea of how well your workload will perform at higher slot counts.

Also note that some software packages have a separate version built for Infiniband. Please check the corresponding software page for details.

Special Queue: Private Nodes

Some of the nodes in our cluster are owned exclusively by a faculty member and are only available to their students and collaborators. Access to these nodes must be requested beforehand by contacting us. Once you are on the access list, you can submit jobs to the private queue using the following:

qlogin -q private.q

or by adding the following to their job script:

#$ -q private.q

Note that there are no parallel environments available through the grid engine on these nodes. To run MPI jobs, simply hardcode the number of processes to start into the script, as in the sketch below.
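
For example, a minimal MPI job script for the private queue might look like the following; the process count of 4 and the program path are just placeholders:

#!/bin/bash
#$ -N privateJob
#$ -cwd
#$ -j y
#$ -q private.q
# No parallel environment on these nodes, so the number of MPI processes is hardcoded
mpirun -np 4 /path/to/program -i /path/to/input
exit 0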

Advanced: Available scratch space options

There are two options when it comes to using a scratch space instead of your home directory when running a job. This is useful when you have many jobs or when you are close to the quota on your home directory. The two options are the local disk on the node and the Lustre parallel file system; each has its pros and cons.

Local Disk: The local disk is great when you have many small or temporary files and your jobs are running on a single node. The grid engine creates a directory for each job, available as $TMPDIR, and it is destroyed when the job completes. The main issue is that this space is limited, as the local disk in each node is only a few tens of gigabytes.

Lustre File System: This parallel network file system is a good choice when you have a distributed job running on many nodes. This file system also has many terabytes available and no quotas on usage. Please be aware, though, that this space is not backed up and any valuable data should be moved out as needed. Also, should total use on this file system get too high, we will begin deleting files to make space. Another issue to be aware of is that Lustre is not great with many small files, as this can greatly increase the network overhead and slow the system down. It is best suited for large files.

How these spaces are used depends on how you build your job scripts. File staging can be done in different ways, either manually before a job is submitted or programmatically within a job script. It's best to first submit simple, short jobs to test for bugs in any job scripts that programmatically stage files. The Lustre file system is available on all nodes, and files can easily be staged manually when logged onto a node via qlogin. The local disk space will only be visible from a running job script. Be sure to include commands to clean up after your program completes. In the case of the local disk space, be sure to copy out any generated results files, as $TMPDIR will be deleted upon job completion.
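
As a sketch of programmatic staging to a node's local disk, the job script below copies its input into $TMPDIR, runs there, and copies the results back before the scratch directory is destroyed; the program and file names are placeholders:

#!/bin/bash
#$ -N staged-job
#$ -cwd
#$ -j y
# Stage input into the per-job scratch directory the grid engine creates
cp input.dat $TMPDIR
cd $TMPDIR
/path/to/program -i input.dat -o results.dat
# Copy results back to the submission directory; $TMPDIR is deleted when the job ends
cp results.dat $SGE_O_WORKDIR/
exit 0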