Panic Mode On (77) Server Problems?


Message boards : Number crunching : Panic Mode On (77) Server Problems?

rob smith (Project donor)
Volunteer tester
Joined: 7 Mar 03
Posts: 8535
Credit: 59,443,323
RAC: 85,867
United Kingdom
Message 1291603 - Posted: 5 Oct 2012, 13:29:46 UTC

A quick look at the server status page: tapes available, but the splitters aren't splitting them, and no tasks available.

As it's Friday afternoon here I'll pull up a chair, pop the top on a beer and sip it quietly...
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Link
Joined: 18 Sep 03
Posts: 828
Credit: 1,571,842
RAC: 247
Germany
Message 1291607 - Posted: 5 Oct 2012, 13:37:18 UTC - in response to Message 1291603.

The splitters are splitting them; the current result creation rate for MB is 32.5953/sec. Apparently that's not enough to build up any ready-to-send buffer, but it is enough to max out the bandwidth.
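
As a rough sanity check, that rate is about what a 100 Mbit link can carry. A minimal sketch, assuming a typical MB workunit download of roughly 366 KB (a ballpark figure of mine, not something from the status page):

# Rough sanity check: does ~32.6 results/sec fill a 100 Mbit link?
# ASSUMPTION: one MB workunit download is ~366 KB; that figure is a
# ballpark, not taken from the server status page.
WU_BYTES = 366 * 1024              # assumed bytes per MB workunit
RATE = 32.5953                     # results/sec from the status page
mbits = RATE * WU_BYTES * 8 / 1e6
print(f"{mbits:.1f} Mbit/s")       # ~97.7 Mbit/s, i.e. a saturated 100 Mbit pipe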
____________
.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5868
Credit: 60,598,482
RAC: 47,541
Australia
Message 1291763 - Posted: 5 Oct 2012, 20:54:15 UTC - in response to Message 1291501.

However, in the last hour I've received 60 tasks for the CPU... the likes of which I haven't seen for well over a week

I'm still getting work, but not that much. Probably every 4th or 5th request results in work. So my caches continue to shrink, but much more slowly than they have been.


Although the splitters are still limited in what they can produce, luckily most of the current WUs aren't shorties, so my cache has actually grown overnight. Not enough to build up a cache of CPU work (I'm always only hours from running out), but at least my cache of GPU work has stopped shrinking.
Now if they could just crank the splitters up a couple of notches...
____________
Grant
Darwin NT.

Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 679
Credit: 5,929,851
RAC: 3,905
New Zealand
Message 1291776 - Posted: 5 Oct 2012, 21:14:25 UTC - in response to Message 1291763.


Now if they could just crank the splitters up a couple of notches...

From Tech News, 2nd Oct.:
However, one sudden crisis at the end of the day today: the air conditioning in the building seems to have gone kaput. Our server closet is just fine (phew!) but we do have several servers not in the closet and they are burning up. We are shutting a few of the less necessary ones off for the evening. Hopefully the a/c will be fixed before too long.

I'd say this is the reason the splitters are running slow. As I type, the current result creation rate is 31.2241/sec (as of 6 minutes ago) and the ready-to-send buffer is 0.
____________

Live in NZ? Why not join Smile City?

JohnDK (Project donor)
Volunteer tester
Joined: 28 May 00
Posts: 862
Credit: 46,803,644
RAC: 75,492
Denmark
Message 1291793 - Posted: 5 Oct 2012, 21:41:44 UTC - in response to Message 1291763.

Now if they could just crank the splitters up a couple of notches...

Since the Cricket bandwidth graph is maxed as it is, I think they should leave things as they are.

Wiggo
Joined: 24 Jan 00
Posts: 7358
Credit: 96,904,003
RAC: 66,647
Australia
Message 1291797 - Posted: 5 Oct 2012, 21:57:07 UTC - in response to Message 1291793.

Now if they could just crank the splitters up a couple of notches...

Since the Cricket bandwidth graph is maxed as it is, I think they should leave things as they are.

The splitters could probably produce more work, except that the file "24my12ad" has been stuck at 14 for nearly 24 hrs now.

Cheers.
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5868
Credit: 60,598,482
RAC: 47,541
Australia
Message 1291807 - Posted: 5 Oct 2012, 22:33:06 UTC - in response to Message 1291793.
Last modified: 5 Oct 2012, 22:38:11 UTC

Now if they could just crank the splitters up a couple of notches...

Since the Cricket bandwidth graph is maxed as it is, I think they should leave things as they are.

The problem is that people's caches aren't being refilled. More work being split would mean they could be refilled, even with the heavier network traffic (and congestion).
____________
Grant
Darwin NT.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5868
Credit: 60,598,482
RAC: 47,541
Australia
Message 1291810 - Posted: 5 Oct 2012, 22:37:42 UTC - in response to Message 1291776.
Last modified: 5 Oct 2012, 22:38:36 UTC

From Tech News, 2nd Oct.:
However, one sudden crisis at the end of the day today: the air conditioning in the building seems to have gone kaput. Our server closet is just fine (phew!) but we do have several servers not in the closet and they are burning up. We are shutting a few of the less necessary ones off for the evening. Hopefully the a/c will be fixed before too long.

I'd say this is the reason the splitters are running slow. As I type, the current result creation rate is 31.2241/sec (as of 6 minutes ago) and the ready-to-send buffer is 0.

I noticed that, but while those machines were shut down, none of them were used for splitting work, feeding or scheduling. This is some other problem.
And even with the reduced availability of work, I'm still getting Scheduler timeouts.
____________
Grant
Darwin NT.

Link
Joined: 18 Sep 03
Posts: 828
Credit: 1,571,842
RAC: 247
Germany
Message 1291943 - Posted: 6 Oct 2012, 7:53:10 UTC - in response to Message 1291807.

Now if they could just crank the splitters up a couple of notches...

Since the Cricket bandwidth graph is maxed as it is, I think they should leave things as they are.

The problem is that people's caches aren't being refilled. More work being split would mean they could be refilled, even with the heavier network traffic (and congestion).

And how does it help them to get tasks assigned which they can't download? I have to agree with JohnDK here: right now they are sending out exactly as much as they can, and more will only make things worse. You won't get more through a network connection by pushing it harder; you'll get less.
____________
.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5868
Credit: 60,598,482
RAC: 47,541
Australia
Message 1291945 - Posted: 6 Oct 2012, 8:03:33 UTC - in response to Message 1291943.

And how does it help them to get tasks assigned which they can't download? I have to agree with JohnDK here: right now they are sending out exactly as much as they can, and more will only make things worse. You won't get more through a network connection by pushing it harder; you'll get less.

That's generally the case.
The interesting thing is that when there is more work available, I'm able to get it and eventually download it.
That helps reduce the load on the Scheduler because it doesn't have to deal with nearly as many requests.

At the moment my caches are refilling, but at the present rate it will take a couple of weeks, if there are no outages or hiccups between now and then.
Normally, in spite of the load, even after an extended outage it usually takes only 8-12 hours for my caches to be filled.
I haven't had a full cache for over 3 weeks now.
____________
Grant
Darwin NT.

Morten Ross
Volunteer tester
Joined: 30 Apr 01
Posts: 183
Credit: 378,289,433
RAC: 0
Norway
Message 1291957 - Posted: 6 Oct 2012, 9:20:18 UTC

Something is definitely changing with regard to work distribution, as I broke the 100 mark today for the number of tasks assigned after one request:

06/10/2012 11:10:37 | SETI@home | Sending scheduler request: To fetch work.
06/10/2012 11:10:37 | SETI@home | Reporting 31 completed tasks, requesting new tasks for CPU and NVIDIA
06/10/2012 11:10:40 | SETI@home | Computation for task 12mr10ab.30517.20926.140733193388047.10.170_0 finished
06/10/2012 11:10:40 | SETI@home | Starting task 12mr10ab.30517.20926.140733193388047.10.158_0 using setiathome_enhanced version 610 (cuda_fermi) in slot 31
06/10/2012 11:10:43 | SETI@home | Finished upload of 12mr10ab.30517.20926.140733193388047.10.152_0_0
06/10/2012 11:10:43 | SETI@home | Started upload of 12mr10ab.30517.20926.140733193388047.10.170_0_0
06/10/2012 11:10:51 | SETI@home | Finished upload of 12mr10ab.30517.20926.140733193388047.10.170_0_0
06/10/2012 11:11:27 | SETI@home | Scheduler request completed: got 113 new tasks
____________
Morten Ross

Link
Joined: 18 Sep 03
Posts: 828
Credit: 1,571,842
RAC: 247
Germany
Message 1291958 - Posted: 6 Oct 2012, 9:24:49 UTC - in response to Message 1291945.

And how does it help them to get tasks assigned which they can't download? I have to agree with JohnDK here: right now they are sending out exactly as much as they can, and more will only make things worse. You won't get more through a network connection by pushing it harder; you'll get less.

That's generally the case.
The interesting thing is that when there is more work available, I'm able to get it and eventually download it.
That helps reduce the load on the Scheduler because it doesn't have to deal with nearly as many requests.

Not if the scheduler replies get lost due to an overloaded network connection, as has happened quite often recently. Then the resend-lost-tasks mechanism has to work a lot, many VLAR tasks time out, and new replacement results have to be created and sent to someone else (so the scheduler has to send out the same tasks more than once). I might be wrong, but that does not sound like less load to me; it's more, and not only for the scheduler.
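
(For what it's worth, the resend-lost-tasks mechanism works roughly like this; a minimal sketch of the idea only, with made-up names, not BOINC's actual code:)

# Sketch: the client lists the tasks it already has in its scheduler
# request; the server re-sends anything its database says is in
# progress on that host but missing from the list. Names here are
# hypothetical, for illustration only.
def resend_lost_results(in_progress_on_host, reported_by_client):
    known = set(reported_by_client)
    lost = [t for t in in_progress_on_host if t not in known]
    # each lost task must be re-attached to the reply and re-sent,
    # so one dropped reply turns into a whole extra round of work
    return lost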
____________
.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5868
Credit: 60,598,482
RAC: 47,541
Australia
Message 1291962 - Posted: 6 Oct 2012, 9:43:04 UTC - in response to Message 1291958.

The interesting thing is that when there is more work available, I'm able to get it and eventually download it.
That helps reduce the load on the Scheduler because it doesn't have to deal with nearly as many requests.

Not if the scheduler replies get lost due to an overloaded network connection, as has happened quite often recently. Then the resend-lost-tasks mechanism has to work a lot, many VLAR tasks time out, and new replacement results have to be created and sent to someone else (so the scheduler has to send out the same tasks more than once). I might be wrong, but that does not sound like less load to me; it's more, and not only for the scheduler.


The Scheduler timeouts are a major problem, and I too expect they are the major cause of ghost WUs and resends.
However, they don't appear to be related to network traffic load. I've been getting a lot of Scheduler "timeout reached" messages for the last 3 weeks, both when we were having upload issues and during the present lack of work produced by the splitters.
Yet in the past, after multi-day outages when the ready-to-send buffer was 200,000+ WUs and download speeds were lucky to be 2 kB/s, Scheduler "timeout reached" messages were few and far between. Usually it was "couldn't contact Scheduler" (or similar).


Whatever the present issue is with the Scheduler timeouts, it's not due to the download traffic, and it's not due to the change in download server software, as the problems were occurring before that was implemented.
____________
Grant
Darwin NT.

Wiggo
Joined: 24 Jan 00
Posts: 7358
Credit: 96,904,003
RAC: 66,647
Australia
Message 1291965 - Posted: 6 Oct 2012, 9:52:59 UTC - in response to Message 1291962.

Well, I can't complain now, as all 3 of my rigs have their caches back up to scratch.

Cheers.
____________

Mike (Project donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 24551
Credit: 33,885,214
RAC: 24,327
Germany
Message 1291966 - Posted: 6 Oct 2012, 9:53:08 UTC - in response to Message 1291958.

And how does it help them to get tasks assigned which they can't download? I have to agree with JohnDK here: right now they are sending out exactly as much as they can, and more will only make things worse. You won't get more through a network connection by pushing it harder; you'll get less.

That's generally the case.
The interesting thing is that when there is more work available, I'm able to get it and eventually download it.
That helps reduce the load on the Scheduler because it doesn't have to deal with nearly as many requests.

Not if the scheduler replies get lost due to an overloaded network connection, as has happened quite often recently. Then the resend-lost-tasks mechanism has to work a lot, many VLAR tasks time out, and new replacement results have to be created and sent to someone else (so the scheduler has to send out the same tasks more than once). I might be wrong, but that does not sound like less load to me; it's more, and not only for the scheduler.


I totally agree with that.

____________

.clair.
Volunteer moderator
Joined: 4 Nov 04
Posts: 1300
Credit: 23,063,663
RAC: 474
United Kingdom
Message 1291972 - Posted: 6 Oct 2012, 10:29:41 UTC - in response to Message 1291957.

Something is definitely changing with regard to work distribution, as I broke the 100 mark today for the number of tasks assigned after one request:

06/10/2012 11:10:37 | SETI@home | Sending scheduler request: To fetch work.
06/10/2012 11:10:37 | SETI@home | Reporting 31 completed tasks, requesting new tasks for CPU and NVIDIA
06/10/2012 11:10:40 | SETI@home | Computation for task 12mr10ab.30517.20926.140733193388047.10.170_0 finished
06/10/2012 11:10:40 | SETI@home | Starting task 12mr10ab.30517.20926.140733193388047.10.158_0 using setiathome_enhanced version 610 (cuda_fermi) in slot 31
06/10/2012 11:10:43 | SETI@home | Finished upload of 12mr10ab.30517.20926.140733193388047.10.152_0_0
06/10/2012 11:10:43 | SETI@home | Started upload of 12mr10ab.30517.20926.140733193388047.10.170_0_0
06/10/2012 11:10:51 | SETI@home | Finished upload of 12mr10ab.30517.20926.140733193388047.10.170_0_0
06/10/2012 11:11:27 | SETI@home | Scheduler request completed: got 113 new tasks

I saw a "got 95" yesterday; that's the most I have ever seen in one go.
I didn't think it was possible to get more than 100 at a time, unless the queue has been enlarged!

Morten Ross
Volunteer tester
Joined: 30 Apr 01
Posts: 183
Credit: 378,289,433
RAC: 0
Norway
Message 1291992 - Posted: 6 Oct 2012, 11:10:52 UTC - in response to Message 1291972.

Something is definitely changing with regard to work distribution, as I broke the 100 mark today for the number of tasks assigned after one request:

06/10/2012 11:10:37 | SETI@home | Sending scheduler request: To fetch work.
06/10/2012 11:10:37 | SETI@home | Reporting 31 completed tasks, requesting new tasks for CPU and NVIDIA
06/10/2012 11:10:40 | SETI@home | Computation for task 12mr10ab.30517.20926.140733193388047.10.170_0 finished
06/10/2012 11:10:40 | SETI@home | Starting task 12mr10ab.30517.20926.140733193388047.10.158_0 using setiathome_enhanced version 610 (cuda_fermi) in slot 31
06/10/2012 11:10:43 | SETI@home | Finished upload of 12mr10ab.30517.20926.140733193388047.10.152_0_0
06/10/2012 11:10:43 | SETI@home | Started upload of 12mr10ab.30517.20926.140733193388047.10.170_0_0
06/10/2012 11:10:51 | SETI@home | Finished upload of 12mr10ab.30517.20926.140733193388047.10.170_0_0
06/10/2012 11:11:27 | SETI@home | Scheduler request completed: got 113 new tasks

I saw a "got 95" yesterday; that's the most I have ever seen in one go.
I didn't think it was possible to get more than 100 at a time, unless the queue has been enlarged!

I've just maxed out at 131:
06/10/2012 12:58:34 | SETI@home | Sending scheduler request: To fetch work.
06/10/2012 12:58:34 | SETI@home | Requesting new tasks for CPU
06/10/2012 13:00:02 | SETI@home | Scheduler request completed: got 131 new tasks
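
(If anyone wants to tally these from their own logs: a small sketch, assuming the event log has been saved as stdoutdae.txt, the BOINC client's usual log file; the line format is the same as the excerpts above.)

# Tally "got N new tasks" per scheduler reply from a BOINC event log.
import re

counts = []
with open("stdoutdae.txt") as log:           # adjust path if needed
    for line in log:
        m = re.search(r"Scheduler request completed: got (\d+) new tasks", line)
        if m:
            counts.append(int(m.group(1)))

if counts:
    print(f"replies with work: {len(counts)}, biggest single reply: {max(counts)}")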

____________
Morten Ross

Link
Joined: 18 Sep 03
Posts: 828
Credit: 1,571,842
RAC: 247
Germany
Message 1291995 - Posted: 6 Oct 2012, 11:19:18 UTC - in response to Message 1291962.

The Scheduler timeouts are a major problem, and I too expect they are the major cause of ghost WUs and resends.
However, they don't appear to be related to network traffic load. I've been getting a lot of Scheduler "timeout reached" messages for the last 3 weeks, both when we were having upload issues and during the present lack of work produced by the splitters.
Yet in the past, after multi-day outages when the ready-to-send buffer was 200,000+ WUs and download speeds were lucky to be 2 kB/s, Scheduler "timeout reached" messages were few and far between. Usually it was "couldn't contact Scheduler" (or similar).


Whatever the present issue is with the Scheduler timeouts, it's not due to the download traffic, and it's not due to the change in download server software, as the problems were occurring before that was implemented.

Well, from here we can only guess what the current reason is, but the general rule of thumb is not to push the entire system harder than its slowest part (here the 100 Mbit connection to the outside world) can take, maybe even a bit less than that. That is where it usually works best.
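
The "push harder, get less" effect is easy to illustrate with a toy retry model (purely illustrative numbers, not measurements of the actual link):

# Toy model: past the bottleneck's capacity, some transfers stall,
# hit their timeout, and the bandwidth they consumed is wasted,
# because the same data has to be sent again later.
def goodput(offered, capacity=100.0, waste_per_timeout=0.5):
    if offered <= capacity:
        return offered                         # everything gets through
    overload = (offered - capacity) / offered  # fraction that can't fit
    wasted = capacity * overload * waste_per_timeout
    return capacity - wasted                   # useful throughput only

for load in (80, 100, 150, 300):
    print(f"offered {load:3} Mbit/s -> goodput {goodput(load):5.1f} Mbit/s")
# goodput peaks at capacity, then falls as retries eat the link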
____________
.

fscheel
Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1291998 - Posted: 6 Oct 2012, 11:30:38 UTC

Is there a utility or something that will easily show me how many and what type of tasks are presently on my machine?

Fred E. (Project donor)
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,139,004
RAC: 1
United States
Message 1292000 - Posted: 6 Oct 2012, 11:38:02 UTC
Last modified: 6 Oct 2012, 11:48:25 UTC

Is there a utility or something that will easily show me how many and what type of tasks are presently on my machine?

BOINC Tasks will do that and more; you can monitor all your computers from one place. It doesn't replace BOINC Manager, it just provides a better user interface with task counts, the sum of estimated completion times, etc.
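
If you'd rather not install anything, the stock boinccmd tool that ships with the BOINC client can give a quick count too. A rough sketch; the "project URL:" field parsed below is from boinccmd's usual --get_tasks output, so check your version's output if the counts come back empty:

# Count in-progress tasks per project via boinccmd --get_tasks.
import subprocess
from collections import Counter

out = subprocess.run(["boinccmd", "--get_tasks"],
                     capture_output=True, text=True).stdout

counts = Counter()
for line in out.splitlines():
    line = line.strip()
    if line.startswith("project URL:"):      # one per task block
        counts[line.split(":", 1)[1].strip()] += 1

for proj, n in counts.most_common():
    print(f"{n:5d} tasks  {proj}")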
____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
