Argon queue system under extreme load

The Argon queuing system is currently under an extremely heavy load. This is causing slow response times and timeouts when running SGE commands. We are investigating the cause.

Update, 3:43 PM: The problem has been identified. Many people are submitting a large number of high throughput jobs, spread across both the all.q queue and various investor queues. In addition to the normal job launches and completions, this caused a high number of job evictions and migrations from the all.q queue. The scheduler was also reporting the scheduling priority of each job to the primary queue process. This is a very expensive operation, and the volume of priority updates was causing SGE responsiveness to suffer. To relieve the pressure, the reporting of scheduler job priorities to the primary queue process has been turned off. This should restore the responsiveness of SGE commands, at the cost of the priorities of pending jobs no longer being available to qstat. Note that the scheduler will still schedule jobs based on their relative priorities; those values are just not visible in the qstat output.
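
For anyone who wants to check whether pending-job priorities are currently being reported, the following is a minimal Python sketch, not a description of the exact change made on Argon. It assumes the setting involved is the standard SGE scheduler parameter report_pjob_tickets, which is shown in the output of "qconf -ssconf"; this notice does not name the parameter, so treat that as a guess. The function names and the sample text are hypothetical and only illustrate the parsing.

```python
def parse_sched_conf(text):
    """Parse `qconf -ssconf`-style output (one "name value" pair per line)."""
    settings = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


def pending_priorities_reported(text):
    """Return True/False for report_pjob_tickets, or None if it is absent.

    Assumption: report_pjob_tickets is the setting that was turned off.
    """
    value = parse_sched_conf(text).get("report_pjob_tickets")
    if value is None:
        return None
    return value.upper() == "TRUE"


# Hypothetical excerpt of scheduler configuration output, for illustration only.
sample = """algorithm default
schedule_interval 0:0:15
report_pjob_tickets FALSE
"""

print(pending_priorities_reported(sample))
```

On a login node, the same function could be given the real output of "qconf -ssconf" (for example, captured with subprocess.run) instead of the sample text. Because qconf is itself an SGE command, it may also respond slowly while the system is under load.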