Temporary limit on the number of jobs per user on Neon

Update August 6, 2014
Reporting of pending job priorities to the qmaster has been restored, so priorities should once again appear in the qstat output.

Update July 31, 2014
The per-user job limit has been removed. To reduce the load on the scheduler, reporting of pending job tickets to the qmaster has been turned off. This affects the output of qstat: the qmaster will not know the priorities of pending jobs, so they will show as 0. As a result, pending jobs will be listed in order of submission time rather than priority.

Note that only the way qstat reports pending jobs will change. The scheduler still calculates the actual priorities of pending jobs every scheduling cycle, so to the scheduler they are not 0. The difference is that those values are no longer sent to the qmaster for reference, display, or sorting by qstat. This makes the qstat listing of pending jobs somewhat less useful, but it should not affect how jobs are scheduled. For running jobs, the qstat output still shows what each job's priority was at launch. The priorities of running jobs are fixed, whereas the priorities of pending jobs must be recalculated every scheduling cycle, and coordinating those values with the qmaster for a large number of pending jobs on every cycle is very expensive. This change reduces the load on the qmaster and keeps the scheduler running while we determine the cause of the problem. Again, during troubleshooting it affects only the output of qstat, not the scheduling of jobs.

Update July 30, 2014
The problem apparently has not been completely solved. A limit has been reinstated, this time at 20,000 jobs per user.

Update July 22, 2014, 6:37 PM
The issue that was causing the SGE qmaster to fail appears to have been resolved, so the limit on the number of active jobs per user has been lifted.

A temporary limit of 5,000 active jobs per user has been put in place on Neon. It serves as a backstop while a scalability issue involving HTP jobs with large numbers of job dependencies is investigated.