Computing and Processing on DECF Clusters

(Revised 3/3/08)


Introduction
The DECF clusters are available to anyone with a DECF account for running processes. The clusters discussed on this page are 1111, 1171, Reindeer, and Archipelagos. The 1111 Cluster is reserved for in-lab use and for running short jobs ONLY. Longer, resource-intensive processes should be run on the 1171*, Reindeer, or Archipelagos clusters, which are better equipped to handle them.
*Except on boogie, bump, chacha, charleston, fandango, fever, and flamenco. These are slower computers with only 1 CPU.
Interpreting Loads
Check Ganglia for computers with lower loads to run your jobs on. The red computers have HIGH loads and the blue computers have LOW or NO load. You can check the load on a computer by typing "top" at the command prompt. "top" displays the active processes, how much CPU and memory they are using, and the load.
  • How do you interpret load?
    Load is roughly the number of active processes, and it should be read relative to the number of CPUs available. On a computer with 1 CPU, a load of 1.00 means one process is using all of the CPU's resources. On a computer with 2 CPUs, a load of 2.00 means two processes are using all of the CPUs' resources. 1111 machines have 1 CPU, Reindeer and Archipelagos machines have 2 CPUs, and most of the 1171 machines have 4 CPUs. Click on a machine name in Ganglia for the specifications of that machine.
  • What is a "high" load?
    A "high" load is one that exceeds the number of CPUs available. (For example, a load of 4.00 on a computer with 1 CPU is high because roughly 4 processes are all trying to use the resources of 1 CPU.) A computer running at a high load can become overloaded as it tries to accommodate all the processes, which can cause it to crash and reboot, or crash and never recover, rendering it useless.
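The rule of thumb above (a load is "high" when it exceeds the number of CPUs) can be checked with a short script before starting a job. This is only a sketch: the function names are made up for illustration, and it assumes a Unix machine where Python's os.getloadavg() is available. It reports the same 1-minute load that "top" shows.

```python
import os


def load_per_cpu(load, cpus):
    """Return the load divided by the number of CPUs (1.0 means fully busy)."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return load / cpus


def is_high_load(load, cpus):
    """A load is "high" when it exceeds the number of CPUs available."""
    return load > cpus


def report():
    """Print the 1-minute load of the current machine and whether it is high."""
    # os.getloadavg() returns the 1-, 5-, and 15-minute load averages (Unix only).
    one_minute, _, _ = os.getloadavg()
    # os.cpu_count() may return None, so fall back to 1 CPU.
    cpus = os.cpu_count() or 1
    status = "HIGH" if is_high_load(one_minute, cpus) else "ok"
    print(f"load {one_minute:.2f} on {cpus} CPU(s): {status}")


if __name__ == "__main__":
    report()
```

If the script reports HIGH, pick a different machine from Ganglia rather than adding another job to this one.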
Running Jobs
DO NOT RUN JOBS ON KEPLER.
Kepler is a login server and is extremely slow. Any jobs running on Kepler will be killed without prior notice.

Run jobs one at a time.
DO NOT run multiple jobs simultaneously; they do not run faster and they overload the computers and the file server.
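One way to queue several jobs without running them at the same time is a small wrapper that starts each job only after the previous one has exited. The sketch below is an illustration, not a DECF-provided tool: run_sequentially is a hypothetical helper, and the two placeholder jobs stand in for your own commands.

```python
import subprocess
import sys


def run_sequentially(commands):
    """Run each command to completion before starting the next one.

    Returns the list of exit codes, in the order the commands were given.
    """
    exit_codes = []
    for command in commands:
        # subprocess.run() blocks until the command exits,
        # so only one job is active at any time.
        result = subprocess.run(command)
        exit_codes.append(result.returncode)
    return exit_codes


if __name__ == "__main__":
    # Placeholder jobs for illustration; replace them with your own commands.
    jobs = [
        [sys.executable, "-c", "print('job 1 done')"],
        [sys.executable, "-c", "print('job 2 done')"],
    ]
    print(run_sequentially(jobs))
```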

Because of how "load" works, when you run two jobs simultaneously on one CPU, each job gets about half the processing resources it would get running by itself, and the load on the computer goes up. Running the jobs one after the other (instead of simultaneously) takes just as much time or even less, and the load on the computers stays lower.
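The arithmetic behind this can be worked through with a simplified model. The sketch below assumes a single CPU that is split equally among all running jobs, and it ignores switching overhead and file server contention, both of which would only make the simultaneous case worse. The function names are illustrative.

```python
def sequential_finish_times(durations):
    """Finish time of each job when jobs run one after the other on 1 CPU."""
    finish = []
    elapsed = 0.0
    for duration in durations:
        elapsed += duration
        finish.append(elapsed)
    return finish


def shared_finish_times(durations):
    """Finish times when all jobs start together and split 1 CPU equally."""
    # Process jobs from shortest to longest: the shortest one finishes first.
    order = sorted(range(len(durations)), key=lambda i: durations[i])
    finish = [0.0] * len(durations)
    elapsed = 0.0
    work_done = 0.0  # CPU time every still-running job has received so far
    remaining = len(durations)
    for i in order:
        # Each of the remaining jobs gets 1/remaining of the CPU, so
        # delivering the missing work takes `remaining` times as long.
        elapsed += (durations[i] - work_done) * remaining
        work_done = durations[i]
        finish[i] = elapsed
        remaining -= 1
    return finish


if __name__ == "__main__":
    jobs = [60, 60]  # two jobs that each need 60 minutes of CPU time
    print("one at a time:", sequential_finish_times(jobs))
    print("simultaneous: ", shared_finish_times(jobs))
```

Under this model, two 60-minute jobs run one at a time finish at 60 and 120 minutes. Run simultaneously, both finish at 120 minutes: the total time is the same, the first result arrives an hour later, and the load is 2 instead of 1 the whole time.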

The file server houses EVERYBODY's files, so it ABSOLUTELY CANNOT CRASH. Running two or more jobs at the same time increases the load on the file server because it has to handle twice as many (or more) data transfers, which can overload it.

Users found running multiple jobs at the same time will have their jobs killed and their accounts locked. Users running large jobs on 1111 machines will also have their jobs killed. This is a necessary step to keep from overloading our computers and servers.

 


Comments to consult@newton.berkeley.edu
© 1998-2009 UC Regents