Panic Mode On (104) Server Problems?

Profile Dr Grey

Joined: 27 May 99
Posts: 154
Credit: 104,147,344
RAC: 21
United Kingdom
Message 1841608 - Posted: 12 Jan 2017, 9:26:32 UTC - in response to Message 1841607.  

So generating a surplus of 4/sec which is enough to fill a 100 wu cache every 25 seconds. That's fast. But with 162,000 active hosts, it will take a while to get ahead of the pack.

As long as it continues to produce work at that rate. Sometimes it's faster, but at other times (like the most recent update) it's slower.
Only 26/sec. Nowhere near fast enough.
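
For anyone who wants to check the arithmetic in the quoted figures, here is a minimal sketch. The worst-case estimate assumes, purely for illustration, that every one of the 162,000 active hosts needs a full 100 WU refill, which is almost certainly pessimistic. The function and variable names are made up.

```python
def seconds_to_fill(cache_size, surplus_per_sec):
    """Seconds for a given surplus rate to produce one cache worth of work."""
    if surplus_per_sec <= 0:
        raise ValueError("no surplus, so the backlog never shrinks")
    return cache_size / surplus_per_sec


# Figures quoted in the thread: 4 WU/sec surplus, 100 WU cache, 162,000 hosts.
per_cache_seconds = seconds_to_fill(100, 4.0)

# Illustrative worst case: every active host needs a complete refill.
active_hosts = 162_000
worst_case_days = per_cache_seconds * active_hosts / 86_400

print(per_cache_seconds, round(worst_case_days, 1))
```

At 26/sec the surplus may well be zero or negative, in which case the function raises rather than returning a meaningless time.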


It's interesting though, that the average turnaround time is as high as 33 hours. With a lot of people struggling with low queues you'd expect it to be much lower. That suggests that the bulk of machines don't run anywhere near dry during the outage so the deficit is probably not all that bad.
ID: 1841608 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1841611 - Posted: 12 Jan 2017, 10:06:31 UTC - in response to Message 1841608.  
Last modified: 12 Jan 2017, 10:09:42 UTC

It's interesting though, that the average turnaround time is as high as 33 hours. With a lot of people struggling with low queues you'd expect it to be much lower. That suggests that the bulk of machines don't run anywhere near dry during the outage so the deficit is probably not all that bad.

The older and slower machines such as my Core 2 Duo can take over a week to crunch 100WUs.
However, the top machines will run out of work in a matter of hours with the 100WU limit. And given the backoffs of the current BOINC Managers, if you're not there to manually hammer the retry button once you run out of work, with the current shortage you'll be lucky to pick up any WUs.

Basically, the more efficient the machine, the greater the difficulty in getting work once you've run out. As it is, the cache on my main machine is slowly shrinking; there just isn't enough work available to maintain it. At the present rate (without a sudden top-up) I'll be out of GPU work in about 8-10 hours.
As high as the current work return rate is (115,000/hr), if all the machines that are currently out of work had some, the return rate would be considerably higher than it presently is.
Grant
Darwin NT
ID: 1841611 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1841616 - Posted: 12 Jan 2017, 11:28:53 UTC

One problem is how the scheduler gives out work when it's running a low cache - It's all or nothing.

Instead of giving 10 to 20 people, it gives 200 to the first who asks for it.

I was sitting dry here for an hour, then got 94, next request 140 more, then poof, I'm suddenly bursting. It would make more sense to give out a little bit to all, but as has been said before, the scheduler code is a mess already. Ask Eric with his v8.23 rollout ....
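
To make the difference concrete, here is a toy comparison of the two policies. It is only a sketch and not the actual scheduler code: the function names, the per-request cap of 10, and the pool of 200 tasks are all assumptions chosen to match the numbers above.

```python
def all_or_nothing(available, requests):
    """Serve requests in arrival order, giving each as much as it asks for."""
    grants = []
    for wanted in requests:
        given = min(wanted, available)
        grants.append(given)
        available -= given
    return grants


def capped_share(available, requests, cap):
    """Serve requests in arrival order, but never more than `cap` per request."""
    grants = []
    for wanted in requests:
        given = min(wanted, cap, available)
        grants.append(given)
        available -= given
    return grants


# Twenty hosts each ask for 200 tasks while only 200 are available.
requests = [200] * 20
print(all_or_nothing(200, requests))        # the first host takes everything
print(capped_share(200, requests, cap=10))  # each of the twenty hosts gets 10
```

The capped version hands out the same total, just spread across more hosts, at the cost of each host having to come back more often.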
ID: 1841616 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1841629 - Posted: 12 Jan 2017, 14:24:12 UTC - in response to Message 1841571.  

Well, this has been interesting. My Windows 10 machine has a full cache of work. My Windows 7 machines have zero GPU work. Looks like the outage is going to cause at least a 3-4 day lack of GPU work for my fast machines.


. . I have been having a lot of trouble getting work on all three of my rigs. Most of the time "No tasks available", then I'll get a partial cache refill. But tonight the servers seem to be dispatching work more consistently. I am hoping that problem has been resolved.

Stephen

:)
ID: 1841629 · Report as offensive
Profile Dr Grey

Joined: 27 May 99
Posts: 154
Credit: 104,147,344
RAC: 21
United Kingdom
Message 1841642 - Posted: 12 Jan 2017, 17:20:50 UTC

Now it's 31.5/sec returned in the last hour with a creation rate of 36.7/sec. They are fighting a valiant battle.
Noticing also that the average turnaround time has dropped to 30.5 hours from 33 earlier so the shorter queues are showing.
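
A quick sanity check on those numbers, as a minimal sketch. The rates and turnaround times are the ones quoted in this post; the variable names are just for illustration.

```python
created_per_sec = 36.7    # creation rate quoted above
returned_per_sec = 31.5   # results returned per second over the last hour

# Net rate at which unsent work can accumulate.
surplus_per_sec = created_per_sec - returned_per_sec
surplus_per_hour = surplus_per_sec * 3600

# Relative drop in average turnaround time, 33 hours down to 30.5 hours.
turnaround_drop = (33.0 - 30.5) / 33.0

print(round(surplus_per_sec, 1), round(surplus_per_hour), round(turnaround_drop * 100, 1))
```

That works out to a surplus of roughly 5.2 WU/sec, or a little under 19,000 WUs an hour, with turnaround down by about 7.6%.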
ID: 1841642 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22199
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1841644 - Posted: 12 Jan 2017, 17:28:03 UTC

Don't forget that the distribution scheduler runs on a 3 second cycle, and only "plays" with 100 WU at a time.
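
Taking those two figures at face value, a rough ceiling on the hand-out rate falls out directly. This is only a back-of-envelope sketch: it assumes the 100 WUs are completely refreshed every 3-second cycle and that nothing else limits dispatch, and the names are made up for illustration.

```python
CYCLE_SECONDS = 3      # scheduler cycle length quoted above
WUS_PER_CYCLE = 100    # WUs available to hand out per cycle, quoted above

# Upper bound on dispatch rate if every cycle is fully used (assumption).
max_dispatch_per_sec = WUS_PER_CYCLE / CYCLE_SECONDS


def hosts_served_per_cycle(request_size):
    """How many hosts one cycle can satisfy if each takes `request_size` WUs."""
    return WUS_PER_CYCLE // request_size


print(round(max_dispatch_per_sec, 1))
print(hosts_served_per_cycle(100), hosts_served_per_cycle(10))
```

On those assumptions the ceiling is about 33 WU/sec, in the same range as the 31.5/sec return rate reported earlier in the thread, and a single host asking for 100 can empty a whole cycle.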
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1841644 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1841647 - Posted: 12 Jan 2017, 17:57:42 UTC

No complaints from the kitty crew.
The splitters and servers have so far been doing a yeoman's job of keeping up with the crunching hordes doing Arecibo shorties.
Not had a completely full cache, but it has been bouncing up and down between my upper limit of 2600 and about 2450. Mostly in the upper range.
And my rigs are just crunching the bejeezus out of them shorties.

Meow!
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1841647 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1841648 - Posted: 12 Jan 2017, 17:59:38 UTC - in response to Message 1841629.  

Well, this has been interesting. My Windows 10 machine has a full cache of work. My Windows 7 machines have zero GPU work. Looks like the outage is going to cause at least a 3-4 day lack of GPU work for my fast machines.


. . I have been having a lot of trouble getting work on all three of my rigs. Most of the time "No tasks available", then I'll get a partial cache refill. But tonight the servers seem to be dispatching work more consistently. I am hoping that problem has been resolved.

Stephen

:)

I'm making some progress on one of my Win 7 machines. Currently at 46 out of 200 GPU tasks on board. However, my daily driver has been only able to get a maximum download of 4 tasks every 15 minutes for the last two days. I finish those up every 11 minutes so go dry to zero constantly. Not making any headway on this machine.
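
Putting rough numbers on that, as a minimal sketch: it assumes the 4 tasks arrive together every 15 minutes and take 11 minutes in total, as described above. The variable names are illustrative only.

```python
tasks_per_fetch = 4       # most tasks received per download
fetch_interval_min = 15   # minutes between successful downloads
crunch_time_min = 11      # minutes to finish those 4 tasks

# Fraction of each cycle the machine is actually crunching.
busy_fraction = crunch_time_min / fetch_interval_min

# Minutes spent idle in each 15-minute cycle.
idle_min_per_cycle = fetch_interval_min - crunch_time_min

# Tasks per fetch needed just to stay busy until the next download.
tasks_needed_per_fetch = tasks_per_fetch * fetch_interval_min / crunch_time_min

print(round(busy_fraction, 2), idle_min_per_cycle, round(tasks_needed_per_fetch, 2))
```

So the machine sits idle about 4 minutes in every 15, and it would take 6 tasks per download (about 5.45, rounded up) just to break even, before building any cache.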
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1841648 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1841711 - Posted: 12 Jan 2017, 22:57:24 UTC
Last modified: 12 Jan 2017, 23:18:08 UTC

There is still a large number of Ghosts assigned to a number of the machines with AMD/ATI GPUs. If they could be freed it would inject a couple hundred thousand WUs all at once. This is a sample machine, https://setiathome.berkeley.edu/results.php?hostid=8001648. The 8.23 Plan Class was removed from the Application list, but all the Ghosts remain, https://setiathome.berkeley.edu/apps.php. I'm still trying to determine why an App that isn't on the List is still having tasks sent,
Sent: 12 Jan 2017, 22:40:49 UTC · Deadline: 6 Mar 2017, 7:36:27 UTC · Status: In progress · Run time: --- · CPU time: --- · Credit: --- · Application: SETI@home v8 v8.23 (opencl_ati5_SoG_nocal) windows_intelx86

I suppose it's being worked on. Since beginning this post another Plan Class has been added. Now at 8.24; 8.24 (opencl_ati5_SoG_nocal)
ID: 1841711 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1841729 - Posted: 12 Jan 2017, 23:50:09 UTC

As of 23:40:04 UTC no work is being produced, resends only, so is it an indication that the GBT server may be about to come online, or are there probs in paradise?

Cheers.
ID: 1841729 · Report as offensive
Profile Qui-Gon
Volunteer tester
Joined: 15 May 99
Posts: 2940
Credit: 19,199,902
RAC: 11
United States
Message 1841731 - Posted: 12 Jan 2017, 23:59:22 UTC

I've turned off two machines and reduced a third one to 50%. Anyway, I haven't been able to get new work for a few days.
ID: 1841731 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1841733 - Posted: 13 Jan 2017, 0:14:06 UTC - in response to Message 1841729.  
Last modified: 13 Jan 2017, 0:14:39 UTC

As of 23:40:04 UTC no work is being produced, resends only, so is it an indication that the GBT server may be about to come online, or are there probs in paradise?

Cheers.


I have 2 fresh splits from 23:17 UTC. Looks like they started it, then shut it down.
ID: 1841733 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1841744 - Posted: 13 Jan 2017, 1:26:53 UTC

I received a handful more fresh BLCs. Looks like they're back.

P.S. I think I just heard a kitty sigh (loudly)
ID: 1841744 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1841748 - Posted: 13 Jan 2017, 1:51:53 UTC - in response to Message 1841744.  

I received a handful more fresh BLCs. Looks like they're back.

P.S. I think I just heard a kitty sigh (loudly)

LOL.....
It was bound to get back to 'normal' sooner or later.

Meowsigh.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1841748 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1841762 - Posted: 13 Jan 2017, 2:48:22 UTC - in response to Message 1841744.  

I received a handful more fresh BLCs. Looks like they're back.

P.S. I think I just heard a kitty sigh (loudly)

Several ... :)
ID: 1841762 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1841767 - Posted: 13 Jan 2017, 3:20:44 UTC - in response to Message 1841762.  

I received a handful more fresh BLCs. Looks like they're back.

P.S. I think I just heard a kitty sigh (loudly)

Several ... :)

Yah, well.....................
It's been a bit of rather fine fun, from a crunching standpoint.
But, all of you really in tune with the project realize that Guppies carry some very good science with them.

The fallout shall be seen in my descending RAC in the coming days when the kitties have had the crunchers do all the simple Arecibo work in their caches.
No worries.............the kitties won't cry.
But they are not used to being attacked by Guppies.
The kitties usually pursue the fish, not the other way around.........LOL.
They'll get over it.
Glad that Centurion is back on the mend.

Meow.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1841767 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1841792 - Posted: 13 Jan 2017, 4:35:33 UTC - in response to Message 1841767.  

Glad to get the Guppies back. My i7 ran out of work a couple of times during the day; there just wasn't enough Arecibo work available to keep it going.
Grant
Darwin NT
ID: 1841792 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1841805 - Posted: 13 Jan 2017, 5:26:42 UTC - in response to Message 1841731.  

I've turned off two machines and reduced a third one to 50%. Anyway, I haven't been able to get new work for a few days.


. . You should be OK now, the Guppies are flowing again.

Stephen

:)
ID: 1841805 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1841807 - Posted: 13 Jan 2017, 5:28:06 UTC - in response to Message 1841729.  

As of 23:40:04 UTC no work is being produced, resends only, so is it an indication that the GBT server may be about to come online, or are there probs in paradise?

Cheers.



. . Guppies for everyone :) It seems your first diagnosis was spot on Doc :)

Stephen

:)
ID: 1841807 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1841808 - Posted: 13 Jan 2017, 5:30:02 UTC - in response to Message 1841748.  

I received a handful more fresh BLCs. Looks like they're back.

P.S. I think I just heard a kitty sigh (loudly)

LOL.....
It was bound to get back to 'normal' sooner or later.

Meowsigh.


. . I too have mixed feelings. It's good that more work is flowing but it is hard to raise a hearty cheer for Blc5s.

Stephen

:)
ID: 1841808 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.