Panic Mode On (102) Server Problems?

Message boards : Number crunching : Panic Mode On (102) Server Problems?
Profile William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1780600 - Posted: 20 Apr 2016, 7:38:27 UTC - in response to Message 1780562.  

I haven't received any GPU tasks at all today since the outage - nothing but GBT VLARs being downloaded. Setting the log options for scheduling shows an acknowledgement of the GPU deficit, but BOINC refuses to download any GPU work. Something has changed in the scheduler, I think. Maybe Eric put something in place that changes the rules for Nvidia. These entries in the log are suspicious, and something I've never seen before.

Keith-Windows7

36177 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] reserving 0.330000 of coproc NVIDIA
36178 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] add to run list: 13au10aa.23534.4598.16.43.43_0 (NVIDIA GPU, FIFO) (prio -0.983826)
36179 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] reserving 0.330000 of coproc NVIDIA
36180 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] add to run list: 13au10aa.23534.4598.16.43.95_0 (NVIDIA GPU, FIFO) (prio -0.992389)
36181 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] reserving 0.330000 of coproc NVIDIA
36182 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] add to run list: 14se10ad.23501.4975.12.39.107_1 (NVIDIA GPU, FIFO) (prio -1.000953)
36183 Milkyway@Home 4/19/2016 11:04:24 PM [cpu_sched_debug] reserving 0.500000 of coproc NVIDIA
36184 Milkyway@Home 4/19/2016 11:04:24 PM [cpu_sched_debug] add to run list: de_modfit_fast_15_2s_136_ModfitConstraints1_2_1453826702_38380416_0 (NVIDIA GPU, FIFO) (prio -1.007310)
36185 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] reserving 0.330000 of coproc NVIDIA
36186 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] add to run list: 13au10aa.23534.4598.16.43.167_1 (NVIDIA GPU, FIFO) (prio -1.009516)
36187 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.21706.6234.15.42.25_1
36188 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.21706.6234.15.42.77_1
36189 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.23534.4598.16.43.97_0
36190 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.21706.6234.15.42.27_1
36191 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.23534.4598.16.43.41_1
36192 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 14se10ad.23501.4975.12.39.95_0
36193 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.23534.4598.16.43.115_1
36194 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.23534.4598.16.43.169_1
36195 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.23534.4598.16.43.113_1
36196 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.21706.6234.15.42.95_0
36197 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 14se10ad.23501.8247.12.39.88_0
36198 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.21706.11142.15.42.10_0
36199 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.21706.11142.15.42.163_1
36200 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.21706.11142.15.42.7_1
36201 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 13au10aa.21706.15641.15.42.175_0
36202 SETI@home 4/19/2016 11:04:24 PM [cpu_sched_debug] insufficient NVIDIA for 14se10ad.23501.15609.12.39.174_1


I'm sitting at about 100 GPU tasks on each machine, and I should be at 200 tasks per machine since they both have two GTX 970s. I'm fully loaded at 100 CPU tasks per machine. I don't think the MB splitters are pushing out only VLAR tasks right now, which would be the only other reason Nvidia cards aren't getting GPU work. Can anybody else confirm similar scheduler requests?

[Edit] So maybe the spigot has re-opened. I'm getting GPU work again on Pipsqueek, up to about 170 right now. Hope I see the same thing happen on the main cruncher. But I'm seeing the 'insufficient NVIDIA' messages on Pipsqueek as well. Very weird recovery from the project outage today. Still think something's changed in the scheduler.


I think BOINC is just telling you that it cannot run more tasks on the GPUs you have with the 0.33 and 0.5 settings you made - basically it ran out of GPU capacity to run more tasks, but for some reason it still checked a dozen more tasks to see whether it could fit them in.
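The capacity arithmetic that log suggests can be sketched roughly like this. This is a simplified model, not BOINC's actual scheduler code; the task names are abbreviated from the log above, and the fractions come from the 0.33/0.5 per-task GPU settings being discussed:

```python
# Simplified sketch of the coprocessor reservation behaviour suggested by the
# [cpu_sched_debug] log: each task reserves a fraction of the GPU pool, and
# once the remaining capacity can't fit the next task's fraction, every
# further candidate is reported as "insufficient NVIDIA".
def schedule(tasks, num_gpus):
    free = float(num_gpus)          # total GPU capacity, e.g. 2 x GTX 970
    run_list, insufficient = [], []
    for name, fraction in tasks:    # FIFO order, as in the log
        if fraction <= free + 1e-9:
            free -= fraction
            run_list.append(name)
        else:
            insufficient.append(name)
    return run_list, insufficient

# Abbreviated names and fractions taken from the log excerpt above.
tasks = [
    ("13au10aa...43_0", 0.33), ("13au10aa...95_0", 0.33),
    ("14se10ad...107_1", 0.33), ("de_modfit...", 0.50),
    ("13au10aa...167_1", 0.33), ("13au10aa...25_1", 0.33),
]
run, skipped = schedule(tasks, num_gpus=2)
# Five tasks fit (0.33*4 + 0.5 = 1.82 of 2.0); the sixth is "insufficient".
```

With two GPUs, the same five reservations the log shows go on the run list, and the sixth task no longer fits, which matches the pattern in the log.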
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1780600
Old man
Volunteer tester
Joined: 19 Sep 07
Posts: 29
Credit: 3,025,264
RAC: 0
Mongolia
Message 1780666 - Posted: 20 Apr 2016, 12:28:25 UTC

Here too: 0 GPU tasks for my GTX 460. But maybe I'll get some one day.
ID: 1780666
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1780683 - Posted: 20 Apr 2016, 13:25:37 UTC - in response to Message 1780666.  

Here too: 0 GPU tasks for my GTX 460. But maybe I'll get some one day.

There are tasks around:

20/04/2016 14:11:13 | SETI@home | Sending scheduler request: To fetch work.
20/04/2016 14:11:13 | SETI@home | Reporting 1 completed tasks
20/04/2016 14:11:13 | SETI@home | Requesting new tasks for NVIDIA GPU
20/04/2016 14:11:13 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
20/04/2016 14:11:13 | SETI@home | [sched_op] NVIDIA GPU work request: 17602.47 seconds; 0.00 devices
20/04/2016 14:11:13 | SETI@home | [sched_op] Intel GPU work request: 0.00 seconds; 0.00 devices
20/04/2016 14:11:16 | SETI@home | Scheduler request completed: got 12 new tasks
20/04/2016 14:11:16 | SETI@home | [sched_op] estimated total NVIDIA GPU task duration: 17819 seconds

but they come in clumps, and you often don't get any if you don't ask at exactly the right moment. Once they start flowing, BOINC usually manages to keep the cache topped up, but if you let a host run dry, it can stay dry for a long time before catching the moment - especially if you have a backup project in place.

The best chance of keeping the cache topped up with BOINC v7 is to set a very low or zero value for "Store up to an additional ... days of work".
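A rough model of why that setting matters (this is a simplified sketch of BOINC's work-fetch idea, not the actual client code; the function name and buffer formula are illustrative):

```python
# Rough model of BOINC v7 work fetch: the client only asks for work when its
# buffered estimate drops below the minimum buffer, and then asks for enough
# to reach min + additional. A large "additional days" value means long gaps
# between requests -- easy to miss the brief windows when GPU tasks exist.
# A zero/low value keeps the client asking on nearly every scheduler contact.
def work_request(buffered_secs, min_days, additional_days, device_instances=1):
    min_buffer = min_days * 86400 * device_instances
    max_buffer = (min_days + additional_days) * 86400 * device_instances
    if buffered_secs >= min_buffer:
        return 0.0                      # no request this scheduler RPC
    return max_buffer - buffered_secs   # seconds of work to ask for
```

With `additional_days=0`, finishing even one task dips the buffer below the line and triggers another small request, so a host polls far more often during a shortage.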
ID: 1780683
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1780730 - Posted: 20 Apr 2016, 15:39:30 UTC - in response to Message 1780683.  

Correct. I have had the setting for additional days of work set to zero for over a year now, with just a four-day cache set. I ran all the way down to about 25 tasks on the main cruncher last night before I turned in and hoped for the best overnight. As I posted earlier, Pipsqueek had managed to almost snag a full cache. Both machines are full this morning. So I assume there was just a very low amount of viable work for Nvidia cards after the outage, out of the ~500,000 or so tasks available, and everyone was fighting over getting it.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1780730
WezH
Volunteer tester
Joined: 19 Aug 99
Posts: 576
Credit: 67,033,957
RAC: 95
Finland
Message 1780734 - Posted: 20 Apr 2016, 15:46:05 UTC

There are only 4 pfb splitters running, which may cause a shortage of GPU tasks... It seems the GBT files are VLARs only(?)
ID: 1780734
rob smith Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22205
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1780735 - Posted: 20 Apr 2016, 15:46:53 UTC

...the critical number isn't the half million sitting in the queue but the 100 in the ready-to-send buffer. The current Arecibo work is dominated by VLARs, so Nvidia GPUs are tending to run low. Once the Beta testing of VLAR and guppi tasks on Nvidia GPUs has completed, the situation should (hopefully) improve dramatically.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1780735
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1780740 - Posted: 20 Apr 2016, 15:56:27 UTC - in response to Message 1780735.  

Thanks for explaining the situation. Based on my reading of the Beta forum threads, progress is made VERY slowly over at Beta. I suspect we will have to live with the current situation for quite a long while before the new applications are approved for Main and we get a new Lunatics installer that incorporates them. It also seems that you will have to make some major adjustments to how you run CPU and GPU tasks. From what I have read so far, all of the new applications that handle VLAR tasks require a full CPU core to feed each work unit on the GPU.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1780740
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1780743 - Posted: 20 Apr 2016, 16:11:26 UTC - in response to Message 1780734.  

There are only 4 pfb splitters running, which may cause a shortage of GPU tasks... It seems the GBT files are VLARs only(?)


Yes, most of ´em.


With each crime and every kindness we birth our future.
ID: 1780743
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1780754 - Posted: 20 Apr 2016, 16:50:08 UTC - in response to Message 1780740.  
Last modified: 20 Apr 2016, 16:50:26 UTC

Thanks for explaining the situation. Based on my reading of the Beta forum threads, progress is made VERY slowly over at Beta. I suspect we will have to live with the current situation for quite a long while before the new applications are approved for Main and we get a new Lunatics installer that incorporates them. It also seems that you will have to make some major adjustments to how you run CPU and GPU tasks. From what I have read so far, all of the new applications that handle VLAR tasks require a full CPU core to feed each work unit on the GPU.


Not necessarily...

Using the command lines, we have been able to get the CPU usage down to about 2-3% of a core for each work unit, with run times extended by maybe 3-5 minutes depending on the machine.

It's a big improvement from where it was.

It's up to Raistmer and the other developers to decide how they would implement the new app and what changes would need to be made.
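For reference, the per-task GPU and CPU reservations being discussed in this thread are typically set through BOINC's app_config.xml. A minimal sketch follows; the application name and the numbers are illustrative (check the app names in your own app_info.xml), not a recommended configuration:

```xml
<app_config>
  <app>
    <name>setiathome_v8</name>
    <gpu_versions>
      <!-- 0.33 = run three tasks per GPU, matching the reservations in the log above -->
      <gpu_usage>0.33</gpu_usage>
      <!-- fraction of a CPU core budgeted to feed each GPU task -->
      <cpu_usage>0.1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

The client re-reads this file on "Options → Read config files", so the fractions can be tuned without restarting BOINC.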
ID: 1780754
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1780765 - Posted: 20 Apr 2016, 17:12:54 UTC - in response to Message 1780754.  

Thanks for the update. But I see another problem with scheduling: how long do you think it will take for the project scientists to come up with a better scheduling mechanism and finer-grained plan classes to handle all the various generations of hardware? From what I have gathered over at Beta, there is no mechanism in place, or even proposed, to decide what your computer is sent with regard to the SoG and OpenCL applications, or which is more appropriate. Each has its own benefits and drawbacks. Depending on hardware and even AR, one or the other is better suited to crunching any particular work unit. That is why I foresee a LONG wait for the new applications to be brought to Main and implemented. Hence my pessimistic comment that we might be in this situation for quite a while.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1780765
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1780778 - Posted: 20 Apr 2016, 18:06:42 UTC - in response to Message 1780765.  

Thanks for the update. But I see another problem with scheduling: how long do you think it will take for the project scientists to come up with a better scheduling mechanism and finer-grained plan classes to handle all the various generations of hardware? From what I have gathered over at Beta, there is no mechanism in place, or even proposed, to decide what your computer is sent with regard to the SoG and OpenCL applications, or which is more appropriate. Each has its own benefits and drawbacks. Depending on hardware and even AR, one or the other is better suited to crunching any particular work unit. That is why I foresee a LONG wait for the new applications to be brought to Main and implemented. Hence my pessimistic comment that we might be in this situation for quite a while.

Some projects allow users to select apps by plan class in their project preferences. However, SETI@home tends to run a less-modified version of BOINC.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1780778
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1780821 - Posted: 20 Apr 2016, 21:45:00 UTC

Cache seems to be staying full with GPU MBs on my 4 crunchers. So far, so good ...
ID: 1780821
Bruce
Volunteer tester
Joined: 15 Mar 02
Posts: 123
Credit: 124,955,234
RAC: 11
United States
Message 1781403 - Posted: 22 Apr 2016, 23:09:07 UTC

Has anybody noticed that GBT data is now going out to nVidia cards? I started getting them about an hour ago.
We'll see what happens.
Bruce
ID: 1781403
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1781405 - Posted: 22 Apr 2016, 23:10:04 UTC - in response to Message 1781403.  
Last modified: 22 Apr 2016, 23:14:28 UTC

Yup, you are correct. Let's see how they do with cuda50.
ID: 1781405
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1781409 - Posted: 22 Apr 2016, 23:16:49 UTC - in response to Message 1781405.  

I don't see the VLAR tag attached to the end of the work unit.
ID: 1781409
Profile Brent Norman Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1781415 - Posted: 22 Apr 2016, 23:30:10 UTC

I have only run through 2 GBT guppi NVidia GPU tasks so far, but they seem about the same as on the CPU, maybe 20% faster.
ID: 1781415
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1781418 - Posted: 22 Apr 2016, 23:45:16 UTC

These were not VLAR, as Tut has pointed out.

At least some of *MESSIER031* don't appear to be VLAR GBTs:


WU true angle range is : 0.307247


So there should be some work units for the NV GPUs from these, at least.
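For anyone checking their own stderr output, the classification being discussed can be sketched as below. The cutoff values are the community figures commonly quoted for SETI@home multibeam; treat them as assumptions rather than authoritative project constants:

```python
# Hypothetical classifier for the "WU true angle range" (AR) value printed in
# task stderr. Cutoffs are commonly quoted community values, not verified
# project constants -- assumptions for illustration only.
VLAR_CUTOFF = 0.12   # below this: very low angle range (slow on NVIDIA CUDA)
VHAR_CUTOFF = 1.127  # above this: very high angle range ("shorties")

def classify_ar(true_angle_range):
    if true_angle_range < VLAR_CUTOFF:
        return "VLAR"
    if true_angle_range > VHAR_CUTOFF:
        return "VHAR"
    return "midrange"

print(classify_ar(0.307247))  # the MESSIER031 task quoted above -> "midrange"
```

By this sketch, an AR of 0.307247 is comfortably above the VLAR cutoff, consistent with Zalster's observation that these work units carry no VLAR tag.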
ID: 1781418
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1781423 - Posted: 22 Apr 2016, 23:56:55 UTC

There seem to be a large number of overflows; at least one has validated: http://setiathome.berkeley.edu/result.php?resultid=4877733743
The normal ones look like this: http://setiathome.berkeley.edu/result.php?resultid=4877628357
ID: 1781423
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1781434 - Posted: 23 Apr 2016, 0:02:35 UTC - in response to Message 1781423.  

2 of 17 had overflows on mine; the rest went to completion.
ID: 1781434
rob smith Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22205
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1781588 - Posted: 23 Apr 2016, 8:41:04 UTC

It looks as if there is a problem with the stats pages...
When I look at mine, I see a number of CPU tasks visible on the "all tasks" page, but when I look at the detail pages (running, waiting for validation, etc.) all tasks are shown as having been run on one of the GPUs.
While this is more of an annoyance than a problem if it is restricted to the stats pages, if the issue has crept into other parts of the system, such as granting credit, it could well cause problems.
(This includes tasks that were correctly attributed when I checked yesterday evening, about 12 hours ago...)
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1781588



 
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.