Panic Mode On (102) Server Problems?

Message boards : Number crunching : Panic Mode On (102) Server Problems?
Profile William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1780600 - Posted: 20 Apr 2016, 7:38:27 UTC - in response to Message 1780562.  

I haven't received any GPU tasks at all today since the outage - nothing but GBT VLARs being downloaded. Setting the log options for scheduling shows an acknowledgement of the GPU deficit, but BOINC refuses to download any GPU work. Something has changed in the scheduler, I think. Maybe Eric put something in place that changes the rules for Nvidia. These entries in the log are suspicious, and something I've never seen before.

Keith-Windows7

36177 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] reserving 0.330000 of coproc NVIDIA
36178 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] add to run list: 13au10aa.23534.4598.16.43.43_0 (NVIDIA GPU, FIFO) (prio -0.983826)
36179 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] reserving 0.330000 of coproc NVIDIA
36180 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] add to run list: 13au10aa.23534.4598.16.43.95_0 (NVIDIA GPU, FIFO) (prio -0.992389)
36181 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] reserving 0.330000 of coproc NVIDIA
36182 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] add to run list: 14se10ad.23501.4975.12.39.107_1 (NVIDIA GPU, FIFO) (prio -1.000953)
36183 Milkyway@Home 4/19/2016 11:04:24 PM [cpu_sched_debug] reserving 0.500000 of coproc NVIDIA
36184 Milkyway@Home 4/19/2016 11:04:24 PM [cpu_sched_debug] add to run list: de_modfit_fast_15_2s_136_ModfitConstraints1_2_1453826702_38380416_0 (NVIDIA GPU, FIFO) (prio -1.007310)
36185 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] reserving 0.330000 of coproc NVIDIA
36186 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] add to run list: 13au10aa.23534.4598.16.43.167_1 (NVIDIA GPU, FIFO) (prio -1.009516)
36187 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.21706.6234.15.42.25_1
36188 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.21706.6234.15.42.77_1
36189 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.23534.4598.16.43.97_0
36190 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.21706.6234.15.42.27_1
36191 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.23534.4598.16.43.41_1
36192 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 14se10ad.23501.4975.12.39.95_0
36193 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.23534.4598.16.43.115_1
36194 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.23534.4598.16.43.169_1
36195 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.23534.4598.16.43.113_1
36196 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.21706.6234.15.42.95_0
36197 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 14se10ad.23501.8247.12.39.88_0
36198 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.21706.11142.15.42.10_0
36199 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.21706.11142.15.42.163_1
36200 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.21706.11142.15.42.7_1
36201 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.21706.15641.15.42.175_0
36202 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 14se10ad.23501.15609.12.39.174_1


I'm sitting at about 100 GPU tasks on each machine, and I should be at 200 tasks per machine since they both have two GTX 970s. I'm fully loaded at 100 CPU tasks per machine. I don't think the MB splitters are pushing out only VLAR tasks right now, which would be the only other reason Nvidia cards aren't getting GPU work. Can anybody else confirm similar scheduler requests?

[Edit] So maybe the spigot has re-opened. I'm getting GPU work again on Pipsqueek, up to about 170 right now. Hope I see the same thing happen on the main cruncher. But I'm seeing the 'insufficient NVIDIA' messages on Pipsqueek as well. Very weird recovery from the project outage today. Still think something's changed in the scheduler.


I think BOINC is just telling you that it cannot run more tasks on the GPUs you have with the 0.33 and 0.5 settings you made - basically it ran out of GPU capacity to run more tasks, but for some reason it still checked a dozen more tasks to see whether it could fit them in.
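The capacity arithmetic that log suggests can be sketched roughly like this. This is a simplified model, not BOINC's actual scheduler code; the task names are abbreviated from the log above, and the fractions come from the 0.33/0.5 per-task GPU settings being discussed:

```python
# Simplified sketch of the coprocessor reservation behaviour suggested by the
# [cpu_sched_debug] log: each task reserves a fraction of the GPU pool, and
# once the remaining capacity can't fit the next task's fraction, every
# further candidate is reported as "insufficient NVIDIA".
def schedule(tasks, num_gpus):
    free = float(num_gpus)          # total GPU capacity, e.g. 2 x GTX 970
    run_list, insufficient = [], []
    for name, fraction in tasks:    # FIFO order, as in the log
        if fraction <= free + 1e-9:
            free -= fraction
            run_list.append(name)
        else:
            insufficient.append(name)
    return run_list, insufficient

# Abbreviated names and fractions taken from the log excerpt above.
tasks = [
    ("13au10aa...43_0", 0.33), ("13au10aa...95_0", 0.33),
    ("14se10ad...107_1", 0.33), ("de_modfit...", 0.50),
    ("13au10aa...167_1", 0.33), ("13au10aa...25_1", 0.33),
]
run, skipped = schedule(tasks, num_gpus=2)
# Five tasks fit (0.33*4 + 0.5 = 1.82 of 2.0); the sixth is "insufficient".
```

With two GPUs, the same five reservations the log shows go on the run list, and the sixth task no longer fits, which matches the pattern in the log.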
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1780600
Old man
Volunteer tester
Joined: 19 Sep 07
Posts: 29
Credit: 3,025,264
RAC: 0
Mongolia
Message 1780666 - Posted: 20 Apr 2016, 12:28:25 UTC

Here too: 0 GPU tasks for my GTX 460. But maybe I'll get some one day.
ID: 1780666
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1780683 - Posted: 20 Apr 2016, 13:25:37 UTC - in response to Message 1780666.  

Here too: 0 GPU tasks for my GTX 460. But maybe I'll get some one day.

There are tasks around:

20/04/2016 14:11:13 | SETI@home | Sending scheduler request: To fetch work.
20/04/2016 14:11:13 | SETI@home | Reporting 1 completed tasks
20/04/2016 14:11:13 | SETI@home | Requesting new tasks for NVIDIA GPU
20/04/2016 14:11:13 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
20/04/2016 14:11:13 | SETI@home | [sched_op] NVIDIA GPU work request: 17602.47 seconds; 0.00 devices
20/04/2016 14:11:13 | SETI@home | [sched_op] Intel GPU work request: 0.00 seconds; 0.00 devices
20/04/2016 14:11:16 | SETI@home | Scheduler request completed: got 12 new tasks
20/04/2016 14:11:16 | SETI@home | [sched_op] estimated total NVIDIA GPU task duration: 17819 seconds

but they come in clumps, and you often don't get any if you don't ask at exactly the right moment. Once they start flowing, BOINC usually manages to keep the cache topped up, but if you let a host run dry, it can stay dry for a long time before catching the moment - especially if you have a backup project in place.

The best chance of keeping the cache topped up with BOINC v7 is to set a very low or zero value for "Store up to an additional ... days of work".
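A rough model of why that setting matters (this is a simplified sketch of BOINC's work-fetch idea, not the actual client code; the function name and buffer formula are illustrative):

```python
# Rough model of BOINC v7 work fetch: the client only asks for work when its
# buffered estimate drops below the minimum buffer, and then asks for enough
# to reach min + additional. A large "additional days" value means long gaps
# between requests -- easy to miss the brief windows when GPU tasks exist.
# A zero/low value keeps the client asking on nearly every scheduler contact.
def work_request(buffered_secs, min_days, additional_days, device_instances=1):
    min_buffer = min_days * 86400 * device_instances
    max_buffer = (min_days + additional_days) * 86400 * device_instances
    if buffered_secs >= min_buffer:
        return 0.0                      # no request this scheduler RPC
    return max_buffer - buffered_secs   # seconds of work to ask for
```

With `additional_days=0`, finishing even one task dips the buffer below the line and triggers another small request, so a host polls far more often during a shortage.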
ID: 1780683
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1780730 - Posted: 20 Apr 2016, 15:39:30 UTC - in response to Message 1780683.  

Correct. I have had the setting for additional days of work set to zero for over a year now, with just a four-day cache set. I ran all the way down to about 25 tasks on the main cruncher last night before I turned in and hoped for the best overnight. As I posted earlier, Pipsqueek had managed to almost snag a full cache. Both machines are full this morning. So I assume there was just a very low amount of viable work for Nvidia cards after the outage, out of the ~500,000 or so tasks available, and everyone was fighting over getting it.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1780730
WezH
Volunteer tester
Joined: 19 Aug 99
Posts: 576
Credit: 67,033,957
RAC: 95
Finland
Message 1780734 - Posted: 20 Apr 2016, 15:46:05 UTC

There are only 4 pfb splitters running, which may cause a shortage of GPU tasks... It seems the GBT files are VLARs only(?)
ID: 1780734
rob smith Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22205
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1780735 - Posted: 20 Apr 2016, 15:46:53 UTC

...the critical number isn't the half million sitting in the queue but the 100 in the ready-to-send buffer. The current Arecibo work is dominated by VLARs, so Nvidia GPUs are tending to run low. Once the Beta testing of VLAR and guppi tasks on Nvidia GPUs has completed, the situation should (hopefully) improve dramatically.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1780735
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1780740 - Posted: 20 Apr 2016, 15:56:27 UTC - in response to Message 1780735.  

Thanks for explaining the situation. Based on my reading of the Beta forum threads, progress is made VERY slowly over at Beta. I suspect we will have to live with the current situation for quite a long while before the new applications are approved for Main and we get a new Lunatics installer that incorporates them. It also seems that you will have to make some major adjustments to how you run CPU and GPU tasks. From what I have read so far, all of the new applications that handle VLAR tasks require a full CPU core to feed each work unit on the GPU.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1780740
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1780743 - Posted: 20 Apr 2016, 16:11:26 UTC - in response to Message 1780734.  

There are only 4 pfb splitters running, which may cause a shortage of GPU tasks... It seems the GBT files are VLARs only(?)


Yes, most of ´em.


With each crime and every kindness we birth our future.
ID: 1780743
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1780754 - Posted: 20 Apr 2016, 16:50:08 UTC - in response to Message 1780740.  
Last modified: 20 Apr 2016, 16:50:26 UTC

Thanks for explaining the situation. Based on my reading of the Beta forum threads, progress is made VERY slowly over at Beta. I suspect we will have to live with the current situation for quite a long while before the new applications are approved for Main and we get a new Lunatics installer that incorporates them. It also seems that you will have to make some major adjustments to how you run CPU and GPU tasks. From what I have read so far, all of the new applications that handle VLAR tasks require a full CPU core to feed each work unit on the GPU.


Not necessarily...

Using the command lines, we have been able to get the CPU usage down to about 2-3% of a core for each work unit, with run times extended by maybe 3-5 minutes depending on the machine.

It's a big improvement from where it was.

It's up to Raistmer and the other developers to decide how they would implement the new app and what changes would need to be made.
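For reference, the per-task GPU and CPU reservations being discussed in this thread are typically set through BOINC's app_config.xml. A minimal sketch follows; the application name and the numbers are illustrative (check the app names in your own app_info.xml), not a recommended configuration:

```xml
<app_config>
  <app>
    <name>setiathome_v8</name>
    <gpu_versions>
      <!-- 0.33 = run three tasks per GPU, matching the reservations in the log above -->
      <gpu_usage>0.33</gpu_usage>
      <!-- fraction of a CPU core budgeted to feed each GPU task -->
      <cpu_usage>0.1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

The client re-reads this file on "Options → Read config files", so the fractions can be tuned without restarting BOINC.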
ID: 1780754
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1780765 - Posted: 20 Apr 2016, 17:12:54 UTC - in response to Message 1780754.  

Thanks for the update. But I see another problem with scheduling: how long do you think it will take for the project scientists to come up with a better scheduling mechanism and finer-grained plan classes to handle all the various generations of hardware? From what I have gathered over at Beta, there is no mechanism in place, or even proposed, to decide what your computer is sent with regard to the SoG and OpenCL applications, or which is more appropriate. Each has its own benefits and drawbacks. Depending on hardware and even AR, one or the other is better suited to crunching any particular work unit. That is why I foresee a LONG wait for the new applications to be brought to Main and implemented. Hence my pessimistic comment that we might be in this situation for quite a while.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1780765
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1780778 - Posted: 20 Apr 2016, 18:06:42 UTC - in response to Message 1780765.  

Thanks for the update. But I see another problem with scheduling: how long do you think it will take for the project scientists to come up with a better scheduling mechanism and finer-grained plan classes to handle all the various generations of hardware? From what I have gathered over at Beta, there is no mechanism in place, or even proposed, to decide what your computer is sent with regard to the SoG and OpenCL applications, or which is more appropriate. Each has its own benefits and drawbacks. Depending on hardware and even AR, one or the other is better suited to crunching any particular work unit. That is why I foresee a LONG wait for the new applications to be brought to Main and implemented. Hence my pessimistic comment that we might be in this situation for quite a while.

Some projects allow users to select apps by plan class in their project preferences. However, SETI@home tends to run a less-modified version of BOINC.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1780778
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1780821 - Posted: 20 Apr 2016, 21:45:00 UTC

Cache seems to be staying full with GPU MBs on my 4 crunchers. So far, so good ...
ID: 1780821
Bruce
Volunteer tester
Joined: 15 Mar 02
Posts: 123
Credit: 124,955,234
RAC: 11
United States
Message 1781403 - Posted: 22 Apr 2016, 23:09:07 UTC

Has anybody noticed that GBT data is now going out to nVidia cards? I started getting them about an hour ago.
We'll see what happens.
Bruce
ID: 1781403
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1781405 - Posted: 22 Apr 2016, 23:10:04 UTC - in response to Message 1781403.  
Last modified: 22 Apr 2016, 23:14:28 UTC

Yup, you are correct. Let's see how they do with cuda50.
ID: 1781405
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1781409 - Posted: 22 Apr 2016, 23:16:49 UTC - in response to Message 1781405.  

I don't see the VLAR tag attached to the end of the work unit.
ID: 1781409
Profile Brent Norman Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1781415 - Posted: 22 Apr 2016, 23:30:10 UTC

I have only run through 2 GBT guppi NVidia GPU tasks so far, but they seem about the same as on the CPU, maybe 20% faster.
ID: 1781415
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1781418 - Posted: 22 Apr 2016, 23:45:16 UTC

These were not VLAR, as Tut has pointed out.

At least some of *MESSIER031* don't appear to be VLAR GBTs:


WU true angle range is : 0.307247


So there should be some work units for the NV GPUs from these, at least.
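For anyone checking their own stderr output, the classification being discussed can be sketched as below. The cutoff values are the community figures commonly quoted for SETI@home multibeam; treat them as assumptions rather than authoritative project constants:

```python
# Hypothetical classifier for the "WU true angle range" (AR) value printed in
# task stderr. Cutoffs are commonly quoted community values, not verified
# project constants -- assumptions for illustration only.
VLAR_CUTOFF = 0.12   # below this: very low angle range (slow on NVIDIA CUDA)
VHAR_CUTOFF = 1.127  # above this: very high angle range ("shorties")

def classify_ar(true_angle_range):
    if true_angle_range < VLAR_CUTOFF:
        return "VLAR"
    if true_angle_range > VHAR_CUTOFF:
        return "VHAR"
    return "midrange"

print(classify_ar(0.307247))  # the MESSIER031 task quoted above -> "midrange"
```

By this sketch, an AR of 0.307247 is comfortably above the VLAR cutoff, consistent with Zalster's observation that these work units carry no VLAR tag.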
ID: 1781418
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1781423 - Posted: 22 Apr 2016, 23:56:55 UTC

There seem to be a large number of overflows; at least one has validated: http://setiathome.berkeley.edu/result.php?resultid=4877733743
The normal ones look like this: http://setiathome.berkeley.edu/result.php?resultid=4877628357
ID: 1781423
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1781434 - Posted: 23 Apr 2016, 0:02:35 UTC - in response to Message 1781423.  

2 of 17 had overflows on mine; the rest went to completion.
ID: 1781434
rob smith Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22205
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1781588 - Posted: 23 Apr 2016, 8:41:04 UTC

It looks as if there is a problem with the stats pages...
When I look at mine, I see a number of CPU tasks visible on the "all tasks" page, but when I look at the detail pages (running, waiting for validation, etc.) all tasks are shown as having been run on one of the GPUs.
While this is more of an annoyance than a problem if it is restricted to the stats pages, if the issue has crept into other parts of the system, such as granting credit, it could well cause problems.
(This includes tasks that were correctly attributed when I checked yesterday evening, about 12 hours ago...)
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1781588



 
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.