Contact Information
- Name: CAEN Staff
- Email: caen@umich.edu
- Phone: (734) 764-CAEN
Grid options explored
Most people at the College use CAEN computing resources interactively to display and utilize engineering applications and productivity software. However, many researchers and students require additional resources in order to run computationally intensive background programs for purposes such as validating circuits, solving computational fluid dynamics problems, or even doing Monte Carlo simulations of cancer treatments.
CAEN offers three options for running computationally intensive background and parallel jobs, each with its own benefits and drawbacks. The newest option is the Portable Batch System (PBS). PBS is now available on all CAEN lab PC desktops that are booted into Linux. This setup closely matches the batch queuing environment on the Linux-based compute clusters in the College's Center for Advanced Computing (CAC).
| Job Requirements | Jobs running more than 6 hours | Jobs running 6 hours or less |
|---|---|---|
| Need access to the full suite of CAEN-licensed software | SGE | PBS on CAEN lab hosts |
| Need a subset of CAEN-licensed software, or use your own software | PBS on the compute clusters | PBS on CAEN lab hosts (recommended) or PBS on the compute clusters |
| Need high-speed networking for parallel programs | PBS on the compute clusters | PBS on the compute clusters |
The above table summarizes the three options available for long-running or otherwise non-interactive compute jobs at the College.
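To make the decision logic in the table concrete, here is a small Python sketch that encodes it. The names `Requirement` and `recommend` are hypothetical, written only for this illustration; the sketch treats a job of exactly six hours as short, matching the article's "six hours or less" guidance for the lab hosts.

```python
from enum import Enum


class Requirement(Enum):
    """One member per row of the table (hypothetical names)."""
    FULL_CAEN_SOFTWARE = "full suite of CAEN-licensed software"
    SUBSET_OR_OWN_SOFTWARE = "subset of CAEN software, or your own software"
    HIGH_SPEED_NETWORK = "high-speed networking for parallel programs"


# Approximate length of the overnight Linux window in the CAEN labs.
LAB_WINDOW_HOURS = 6


def recommend(requirement: Requirement, hours: float) -> str:
    """Return the batch option the table suggests for a job of the given length."""
    short = hours <= LAB_WINDOW_HOURS  # fits in the overnight lab window
    if requirement is Requirement.FULL_CAEN_SOFTWARE:
        return "PBS on CAEN lab hosts" if short else "SGE"
    if requirement is Requirement.SUBSET_OR_OWN_SOFTWARE:
        # For short jobs the lab hosts are recommended; the clusters also work.
        return "PBS on CAEN lab hosts" if short else "PBS on the compute clusters"
    # High-speed networking is only available on the compute clusters.
    return "PBS on the compute clusters"
```

For example, `recommend(Requirement.FULL_CAEN_SOFTWARE, 48)` returns `"SGE"`, while the same requirement with a 4-hour job returns `"PBS on CAEN lab hosts"`.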
PBS on the CAEN lab machines allows access to the full suite of CAEN-licensed software. However, the lab desktops are typically booted into Windows for much of the day, and PBS cannot run while they are. Unoccupied desktops in the CAEN labs are automatically rebooted into Linux in the middle of the night for about six hours, so if your job will complete in six hours or less, PBS is a good option for you. The labs also contain more Linux desktops than Solaris desktops, so jobs in the PBS queue are likely to start sooner than those in the Solaris-based SGE queue (described later in this article). Note: PBS is still being deployed in the CAEN labs and is expected to be ready for use by the end of January.
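As a rough illustration, the Python sketch below writes a minimal PBS job script for a lab host and refuses walltime requests longer than the roughly six-hour overnight window. The `#PBS` directives shown (`-N`, `-l nodes`, `-l walltime`, `-M`, `-m`, `-j`) and the `$PBS_O_WORKDIR` variable are standard PBS/Torque features, but the function names, the e-mail address, and the assumption that the lab queue needs no `-q` queue name are illustrative only; CAEN's own PBS documentation is the authority on actual queue settings.

```python
from datetime import timedelta

# Approximate overnight Linux window in the CAEN labs (from the article).
LAB_MAX_WALLTIME = timedelta(hours=6)


def format_walltime(t: timedelta) -> str:
    """Format a duration as PBS-style HH:MM:SS (hours may exceed 24)."""
    total = int(t.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def lab_pbs_script(job_name: str, command: str, walltime: timedelta, email: str) -> str:
    """Build a minimal PBS script for a CAEN lab host (hypothetical helper)."""
    if walltime > LAB_MAX_WALLTIME:
        raise ValueError("job will not fit in the overnight lab window; consider the clusters")
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",
        "#PBS -l nodes=1",
        f"#PBS -l walltime={format_walltime(walltime)}",
        f"#PBS -M {email}",
        "#PBS -m ae",         # mail when the job aborts or ends
        "#PBS -j oe",         # merge stderr into stdout
        "cd $PBS_O_WORKDIR",  # start in the directory qsub was run from
        command,
    ]
    return "\n".join(lines) + "\n"
```

Saving the returned text to a file such as `job.pbs` and running `qsub job.pbs` on a CAEN Linux host would then submit it, assuming `qsub` is on your path.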
Another option for running computationally intensive background programs is the batch system on the compute clusters. The cluster computing environment at the College is maintained by the CAC in conjunction with CAEN, and has been available for many years in one form or another. Cluster jobs run on either Linux or Apple OS X systems. These computers are not available for interactive use, and only one job is allocated to a computer (or node) at a time.
The CAC clusters use the same PBS batch system that is used on the CAEN Linux computers, so a PBS command file written for one system will work on the other with only slight modifications. The CAC clusters can support very long-running jobs (up to two weeks): because the nodes have no console users, they can run for long periods without being rebooted. In addition, the CAC cluster nodes have between 2GB and 6GB of RAM.
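One of the "slight modifications" when moving a PBS command file between the lab hosts and the clusters is typically the walltime request. The sketch below assumes the limits quoted in this article (about six hours in the labs, up to two weeks on the clusters); the real queue limits may differ, and the `parse_walltime` and `retarget` helpers are hypothetical.

```python
import re
from datetime import timedelta

# Approximate limits quoted in the article; actual queue limits may differ.
MAX_WALLTIME = {
    "lab": timedelta(hours=6),      # overnight Linux window in the CAEN labs
    "cluster": timedelta(days=14),  # longest jobs on the CAC clusters
}

# Matches a whole line such as "#PBS -l walltime=05:30:00".
WALLTIME_RE = re.compile(r"^#PBS -l walltime=(\d+):(\d{2}):(\d{2})[ \t]*$", re.MULTILINE)


def parse_walltime(script: str) -> timedelta:
    """Return the walltime requested in a PBS script."""
    match = WALLTIME_RE.search(script)
    if match is None:
        raise ValueError("script has no '#PBS -l walltime=HH:MM:SS' line")
    hours, minutes, seconds = (int(g) for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def retarget(script: str, target: str, walltime: timedelta) -> str:
    """Rewrite the walltime line, checking it against the target system's limit."""
    limit = MAX_WALLTIME[target]
    if walltime > limit:
        raise ValueError(f"{walltime} exceeds the {target} limit of {limit}")
    total = int(walltime.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    new_line = f"#PBS -l walltime={hours:02d}:{minutes:02d}:{seconds:02d}"
    updated, count = WALLTIME_RE.subn(new_line, script, count=1)
    if count == 0:
        raise ValueError("script has no '#PBS -l walltime=HH:MM:SS' line")
    return updated
```

A job that outgrows the lab window could then be moved with `retarget(script, "cluster", timedelta(days=3))`, which rewrites the request as `72:00:00`.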
Furthermore, if you need dedicated access to compute resources and don't want to wait in the queue, you can rent nodes on the CAC clusters. For additional information, refer to the CAC web site at cac.engin.umich.edu/access/subscription.html. Like the other batch queuing systems, the clusters have some limitations. The first is that access to CAEN-licensed software is limited, so many people write their own programs to run on the clusters, including parallel programs using MPI. Another drawback is that these systems are heavily used, so if your job is too long to fit into a queue with quick turnaround, it may wait several days before starting.
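For MPI programs on the clusters, the PBS request usually names a node count and a processors-per-node count, and the MPI launcher is told how many processes to start. This sketch assumes Torque-style `nodes=N:ppn=P` syntax and a launcher invoked as `mpirun -np`; the exact launcher and its options depend on the MPI installation on the CAC clusters, and the helper name and example values are hypothetical.

```python
def mpi_cluster_script(job_name: str, program: str, nodes: int, ppn: int, walltime: str) -> str:
    """Build a PBS script for an MPI job on the clusters (hypothetical helper).

    walltime is an HH:MM:SS string, e.g. "48:00:00".
    """
    if nodes < 1 or ppn < 1:
        raise ValueError("nodes and ppn must both be at least 1")
    nprocs = nodes * ppn  # one MPI process per requested processor
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # Torque-style resource request
        f"#PBS -l walltime={walltime}",
        "#PBS -j oe",
        "cd $PBS_O_WORKDIR",
        # Launcher name and flags are an assumption; they vary by MPI installation.
        f"mpirun -np {nprocs} {program}",
    ]
    return "\n".join(lines) + "\n"
```

For example, `mpi_cluster_script("cfd", "./solver", nodes=4, ppn=2, walltime="48:00:00")` requests 4 nodes with 2 processors each and starts 8 MPI processes.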
The oldest and least-recommended option for running batch jobs is the Sun Grid Engine (SGE). The SGE job scheduler has been available at CAEN for two years now, and is documented on the CAEN web site at www.engin.umich.edu/caen/grid. As presently configured, it only schedules jobs on the Sun Solaris computers in CAEN computer labs. When a job is submitted to the SGE queue, SGE finds an open Sun workstation in a lab and starts that job there. If someone logs into that computer at the console, the SGE-owned job is suspended until the person using the console is done, at which time the SGE-owned job is resumed. When the job is complete, the submitter receives an email notification.
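An SGE job script uses `#$` lines rather than `#PBS` lines. The sketch below builds one that asks SGE to run the job in the submission directory and to send mail when it ends, mirroring the completion notification described above. `-S`, `-N`, `-cwd`, `-M`, and `-m e` are standard SGE `qsub` options; the function name and the values passed in are illustrative, and CAEN's SGE setup may require additional options documented at www.engin.umich.edu/caen/grid.

```python
def sge_script(job_name: str, command: str, email: str) -> str:
    """Build a minimal SGE job script (hypothetical helper)."""
    lines = [
        "#!/bin/sh",
        "#$ -S /bin/sh",      # shell SGE should use to run the job
        f"#$ -N {job_name}",
        "#$ -cwd",            # run in the directory qsub was called from
        f"#$ -M {email}",
        "#$ -m e",            # send mail when the job ends
        command,
    ]
    return "\n".join(lines) + "\n"
```

No run-time limit is requested here, since the article describes the lab Suns as available around the clock.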
SGE has several disadvantages compared with the other two batch job scheduling systems. It is limited by the small (and shrinking) number of Suns in the CAEN labs. It also differs from the more widely deployed PBS system, which makes batch script files harder to port. SGE does, however, have some advantages: SGE jobs have access to all CAEN-licensed software, and the Suns in the labs are available 24 hours a day and are rarely rebooted. Remember, though, that CAEN will retire most of the Suns in the labs, and with them SGE, over the next 12 to 18 months, so any investment you make in using SGE will be lost in the near future.
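The portability problem is easiest to see by translating directives. This hypothetical helper converts a few common PBS directives to their usual SGE counterparts (`walltime` to `h_rt`, `-j oe` to `-j y`, `cd $PBS_O_WORKDIR` to `-cwd`) and reports anything it cannot translate, such as `nodes=` requests, for manual review. It is a sketch covering only a handful of options, not a complete converter.

```python
from typing import List, Optional, Tuple


def translate_directive(args: str) -> Optional[str]:
    """Map one PBS directive (text after '#PBS ') to an SGE line, if known."""
    flag, _, value = args.partition(" ")
    if flag in ("-N", "-M", "-m"):
        # Job name, mail address, and mail events use the same flags in SGE.
        return f"#$ {flag} {value}"
    if flag == "-j" and value == "oe":
        return "#$ -j y"  # merge stderr into stdout
    if flag == "-l" and value.startswith("walltime="):
        return f"#$ -l h_rt={value[len('walltime='):]}"  # hard run-time limit
    return None  # no automatic equivalent; needs a human decision


def pbs_to_sge(pbs_script: str) -> Tuple[str, List[str]]:
    """Return (SGE script, list of PBS lines that were not translated)."""
    out: List[str] = []
    unhandled: List[str] = []
    for line in pbs_script.splitlines():
        if line.startswith("#PBS "):
            sge = translate_directive(line[len("#PBS "):].strip())
            if sge is None:
                unhandled.append(line)
            else:
                out.append(sge)
        elif line.strip() == "cd $PBS_O_WORKDIR":
            out.append("#$ -cwd")  # SGE's way of starting in the submit directory
        else:
            out.append(line)
    return "\n".join(out) + "\n", unhandled
```

Any lines returned in the second value, such as a `#PBS -l nodes=1` request, would still need to be rewritten by hand.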
CAEN is always working to improve support for all aspects of computing in the College of Engineering, so you can expect improvements in this area as well. For more news, watch the CAC and CAEN web sites and publications.

