Panic Mode On (77) Server Problems?

Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1639
Credit: 12,921,799
RAC: 89
New Zealand
Message 1291776 - Posted: 5 Oct 2012, 21:14:25 UTC - in response to Message 1291763.  


Now if they could just crank the splitters up a couple of notches...

From Tech news 2nd Oct.
However one sudden crisis at the end of the day today: the air conditioning in the building seems to have gone kaput. Our server closet is just fine (phew!) but we do have several servers not in the closet and they are burning up. We are shutting a few of the less necessary ones off for the evening. Hopefully the a/c will be fixed before too long.

I'd say this is the reason for the splitters running slow. As I type, the current result creation rate is 31.2241/sec (as of 6 minutes ago) and the ready-to-send buffer is 0.
ID: 1291776
JohnDK
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1291793 - Posted: 5 Oct 2012, 21:41:44 UTC - in response to Message 1291763.  

Now if they could just crank the splitters up a couple of notches...

Since the cricket graph is maxed out as it is, I think they should leave things as they are.
ID: 1291793
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1291797 - Posted: 5 Oct 2012, 21:57:07 UTC - in response to Message 1291793.  

Now if they could just crank the splitters up a couple of notches...

Since the cricket graph is maxed out as it is, I think they should leave things as they are.

The splitters could probably produce more work, except that the file "24my12ad" has been stuck at 14 for nearly 24 hours now.

Cheers.
ID: 1291797
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1291807 - Posted: 5 Oct 2012, 22:33:06 UTC - in response to Message 1291793.  
Last modified: 5 Oct 2012, 22:38:11 UTC

Now if they could just crank the splitters up a couple of notches...

Since the cricket graph is maxed out as it is, I think they should leave things as they are.

The problem is that people's caches aren't being refilled. Splitting more work would let them refill, even with the heavier network traffic (and congestion) that it would bring.
Grant
Darwin NT
ID: 1291807
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1291810 - Posted: 5 Oct 2012, 22:37:42 UTC - in response to Message 1291776.  
Last modified: 5 Oct 2012, 22:38:36 UTC

From Tech news 2nd Oct.
However one sudden crisis at the end of the day today: the air conditioning in the building seems to have gone kaput. Our server closet is just fine (phew!) but we do have several servers not in the closet and they are burning up. We are shutting a few of the less necessary ones off for the evening. Hopefully the a/c will be fixed before too long.

I'd say this is the reason for the splitters running slow. As I type, the current result creation rate is 31.2241/sec (as of 6 minutes ago) and the ready-to-send buffer is 0.

I noticed that, and while those machines were shut down, none of them were used for splitting work, feeding or scheduling. This is some other problem.
And even with the reduced availability of work, I'm still getting Scheduler timeouts.
Grant
Darwin NT
ID: 1291810
Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1291943 - Posted: 6 Oct 2012, 7:53:10 UTC - in response to Message 1291807.  

Now if they could just crank the splitters up a couple of notches...

Since the cricket graph is maxed out as it is, I think they should leave things as they are.

The problem is that people's caches aren't being refilled. Splitting more work would let them refill, even with the heavier network traffic (and congestion) that it would bring.

And how does it help them to get tasks assigned that they can't download? I have to agree with JohnDK here: right now they are sending out exactly as much as they can, and more will only make things worse. You won't get more through a network connection by pushing it harder; you'll get less.
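To see why, here is a minimal toy model in Python of goodput collapse on a saturated link; the capacity and retry figures are made up for illustration and are not SETI@home's actual numbers:

# Toy model: useful throughput (goodput) vs offered load on a link.
# Hypothetical numbers; retries waste capacity once the link saturates.
def goodput(offered, capacity=100.0, retry_overhead=0.5):
    if offered <= capacity:
        return offered            # below capacity, everything gets through
    # Above capacity the excess times out, and retries of the timed-out
    # transfers burn capacity without delivering anything new.
    wasted = min(capacity, (offered - capacity) * retry_overhead)
    return capacity - wasted

for load in (50, 100, 150, 200, 300):
    print(f"offered {load:3d} -> goodput {goodput(load):5.1f}")
# Past saturation, goodput falls as offered load rises.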
ID: 1291943
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1291945 - Posted: 6 Oct 2012, 8:03:33 UTC - in response to Message 1291943.  

And how does it help them to get tasks assigned that they can't download? I have to agree with JohnDK here: right now they are sending out exactly as much as they can, and more will only make things worse. You won't get more through a network connection by pushing it harder; you'll get less.

That's generally the case.
The interesting thing is that when there is more work available, I'm able to get it & eventually download it.
That helps reduce the load on the Scheduler because it doesn't have to deal with nearly as many requests.

At the moment, my caches are re-filling, but at the present rate it will take a couple of weeks, if there are no outages or hiccups between now & then.
Normally, in spite of the load, even after an extended outage it usually takes only 8-12 hours for my caches to be filled.
I haven't had a full cache for over 3 weeks now.
Grant
Darwin NT
ID: 1291945
Morten Ross
Volunteer tester
Joined: 30 Apr 01
Posts: 183
Credit: 385,664,915
RAC: 0
Norway
Message 1291957 - Posted: 6 Oct 2012, 9:20:18 UTC

Something is definitely changing with regard to work distribution, as I broke the 100 mark today for the number of tasks assigned after a request:

06/10/2012 11:10:37 | SETI@home | Sending scheduler request: To fetch work.
06/10/2012 11:10:37 | SETI@home | Reporting 31 completed tasks, requesting new tasks for CPU and NVIDIA
06/10/2012 11:10:40 | SETI@home | Computation for task 12mr10ab.30517.20926.140733193388047.10.170_0 finished
06/10/2012 11:10:40 | SETI@home | Starting task 12mr10ab.30517.20926.140733193388047.10.158_0 using setiathome_enhanced version 610 (cuda_fermi) in slot 31
06/10/2012 11:10:43 | SETI@home | Finished upload of 12mr10ab.30517.20926.140733193388047.10.152_0_0
06/10/2012 11:10:43 | SETI@home | Started upload of 12mr10ab.30517.20926.140733193388047.10.170_0_0
06/10/2012 11:10:51 | SETI@home | Finished upload of 12mr10ab.30517.20926.140733193388047.10.170_0_0
06/10/2012 11:11:27 | SETI@home | Scheduler request completed: got 113 new tasks
Morten Ross
ID: 1291957
Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1291958 - Posted: 6 Oct 2012, 9:24:49 UTC - in response to Message 1291945.  

And how does it help them to get tasks assigned that they can't download? I have to agree with JohnDK here: right now they are sending out exactly as much as they can, and more will only make things worse. You won't get more through a network connection by pushing it harder; you'll get less.

That's generally the case.
The interesting thing is that when there is more work available, I'm able to get it & eventually download it.
That helps reduce the load on the Scheduler because it doesn't have to deal with nearly as many requests.

Not if the scheduler replies get lost due to an overloaded network connection, as has happened quite often recently. Then the "resend lost tasks" mechanism has to do a lot of work, many VLAR tasks time out, and new replacement results have to be created and sent to someone else (so the scheduler has to send out the same tasks more than once). I might be wrong, but that doesn't sound like less load to me; it's more, and not only for the scheduler.
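A toy sketch of that amplification, under deliberately simplified assumptions (the names below are invented for illustration; the real BOINC scheduler logic is more involved):

# A lost scheduler reply leaves "ghost" tasks: the server has marked
# them sent, but the client never received them, so each one later
# costs an extra scheduler transaction to detect and resend.
server_view = []   # tasks the server believes the client holds
client_view = []   # tasks the client actually received

def scheduler_rpc(tasks, reply_lost):
    server_view.extend(tasks)        # marked as sent regardless
    if not reply_lost:
        client_view.extend(tasks)    # reply arrived intact

scheduler_rpc(["wu_a", "wu_b"], reply_lost=True)   # congested link
ghosts = [t for t in server_view if t not in client_view]
print("ghost tasks needing resend:", ghosts)
# Net effect: the same task is handled more than once, so a lossy
# link increases total scheduler load rather than reducing it.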
ID: 1291958
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1291962 - Posted: 6 Oct 2012, 9:43:04 UTC - in response to Message 1291958.  

The interesting thing is that when there is more work available, I'm able to get it & eventually download it.
That helps reduce the load on the Scheduler because it doesn't have to deal with nearly as many requests.

Not if the scheduler replies get lost due to an overloaded network connection, as has happened quite often recently. Then the "resend lost tasks" mechanism has to do a lot of work, many VLAR tasks time out, and new replacement results have to be created and sent to someone else (so the scheduler has to send out the same tasks more than once). I might be wrong, but that doesn't sound like less load to me; it's more, and not only for the scheduler.


The Scheduler timeouts are a major problem, and I too expect they are the major cause of ghost WUs & resends.
However, they don't appear to be related to network traffic load. I've been getting a lot of Scheduler "timeout reached" messages for the last 3 weeks, both when we were having upload issues and during the present lack of work from the splitters.
Yet in the past, after multi-day outages, when the ready-to-send buffer was 200,000+ WUs & download speeds were lucky to reach 2kB/s, Scheduler "timeout reached" messages were few & far between. Usually it was "couldn't contact Scheduler" (or similar).

Whatever the present issue is with the Scheduler timeouts, it's not due to the download traffic, and it's not due to the change in download server software, as the problems were occurring before that was implemented.
Grant
Darwin NT
ID: 1291962
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1291965 - Posted: 6 Oct 2012, 9:52:59 UTC - in response to Message 1291962.  

Well, I can't complain now, as all 3 of my rigs have their caches back up to scratch.

Cheers.
ID: 1291965
Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1291966 - Posted: 6 Oct 2012, 9:53:08 UTC - in response to Message 1291958.  

And how does it help them to get tasks assigned that they can't download? I have to agree with JohnDK here: right now they are sending out exactly as much as they can, and more will only make things worse. You won't get more through a network connection by pushing it harder; you'll get less.

That's generally the case.
The interesting thing is that when there is more work available, I'm able to get it & eventually download it.
That helps reduce the load on the Scheduler because it doesn't have to deal with nearly as many requests.

Not if the scheduler replies get lost due to an overloaded network connection, as has happened quite often recently. Then the "resend lost tasks" mechanism has to do a lot of work, many VLAR tasks time out, and new replacement results have to be created and sent to someone else (so the scheduler has to send out the same tasks more than once). I might be wrong, but that doesn't sound like less load to me; it's more, and not only for the scheduler.


I totally agree on that.



With each crime and every kindness we birth our future.
ID: 1291966
.clair.

Joined: 4 Nov 04
Posts: 1300
Credit: 55,390,408
RAC: 69
United Kingdom
Message 1291972 - Posted: 6 Oct 2012, 10:29:41 UTC - in response to Message 1291957.  

Something is definitely changing with regard to work distribution, as I broke the 100 mark today for the number of tasks assigned after a request:

06/10/2012 11:10:37 | SETI@home | Sending scheduler request: To fetch work.
06/10/2012 11:10:37 | SETI@home | Reporting 31 completed tasks, requesting new tasks for CPU and NVIDIA
06/10/2012 11:10:40 | SETI@home | Computation for task 12mr10ab.30517.20926.140733193388047.10.170_0 finished
06/10/2012 11:10:40 | SETI@home | Starting task 12mr10ab.30517.20926.140733193388047.10.158_0 using setiathome_enhanced version 610 (cuda_fermi) in slot 31
06/10/2012 11:10:43 | SETI@home | Finished upload of 12mr10ab.30517.20926.140733193388047.10.152_0_0
06/10/2012 11:10:43 | SETI@home | Started upload of 12mr10ab.30517.20926.140733193388047.10.170_0_0
06/10/2012 11:10:51 | SETI@home | Finished upload of 12mr10ab.30517.20926.140733193388047.10.170_0_0
06/10/2012 11:11:27 | SETI@home | Scheduler request completed: got 113 new tasks

I saw a "got 95" yesterday; that is the most I have ever seen in one go.
I did not think it was possible to get more than 100 at a time, unless the queue has been enlarged!!
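For what it's worth, that per-request cap is a server-side setting: the stock BOINC scheduler reads a limit from its config.xml, roughly the maximum tasks returned per scheduler request (exact behaviour varies by server version). A sketch of the fragment, value hypothetical:

<boinc>
  <config>
    <!-- upper bound on tasks sent per scheduler request -->
    <max_wus_to_send>100</max_wus_to_send>
  </config>
</boinc>

So "got 113" or "got 131" would suggest the staff raised that cap (or a related limit), rather than anything changing client-side.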
ID: 1291972
Morten Ross
Volunteer tester
Joined: 30 Apr 01
Posts: 183
Credit: 385,664,915
RAC: 0
Norway
Message 1291992 - Posted: 6 Oct 2012, 11:10:52 UTC - in response to Message 1291972.  

Something is definitely changing with regard to work distribution, as I broke the 100 mark today for the number of tasks assigned after a request:

06/10/2012 11:10:37 | SETI@home | Sending scheduler request: To fetch work.
06/10/2012 11:10:37 | SETI@home | Reporting 31 completed tasks, requesting new tasks for CPU and NVIDIA
06/10/2012 11:10:40 | SETI@home | Computation for task 12mr10ab.30517.20926.140733193388047.10.170_0 finished
06/10/2012 11:10:40 | SETI@home | Starting task 12mr10ab.30517.20926.140733193388047.10.158_0 using setiathome_enhanced version 610 (cuda_fermi) in slot 31
06/10/2012 11:10:43 | SETI@home | Finished upload of 12mr10ab.30517.20926.140733193388047.10.152_0_0
06/10/2012 11:10:43 | SETI@home | Started upload of 12mr10ab.30517.20926.140733193388047.10.170_0_0
06/10/2012 11:10:51 | SETI@home | Finished upload of 12mr10ab.30517.20926.140733193388047.10.170_0_0
06/10/2012 11:11:27 | SETI@home | Scheduler request completed: got 113 new tasks

I saw a "got 95" yesterday; that is the most I have ever seen in one go.
I did not think it was possible to get more than 100 at a time, unless the queue has been enlarged!!

I've just maxed out at 131:
06/10/2012 12:58:34 | SETI@home | Sending scheduler request: To fetch work.
06/10/2012 12:58:34 | SETI@home | Requesting new tasks for CPU
06/10/2012 13:00:02 | SETI@home | Scheduler request completed: got 131 new tasks

Morten Ross
ID: 1291992
Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1291995 - Posted: 6 Oct 2012, 11:19:18 UTC - in response to Message 1291962.  

The Scheduler timeouts are a major problem, and I too expect they are the major cause of ghost WUs & resends.
However, they don't appear to be related to network traffic load. I've been getting a lot of Scheduler "timeout reached" messages for the last 3 weeks, both when we were having upload issues and during the present lack of work from the splitters.
Yet in the past, after multi-day outages, when the ready-to-send buffer was 200,000+ WUs & download speeds were lucky to reach 2kB/s, Scheduler "timeout reached" messages were few & far between. Usually it was "couldn't contact Scheduler" (or similar).

Whatever the present issue is with the Scheduler timeouts, it's not due to the download traffic, and it's not due to the change in download server software, as the problems were occurring before that was implemented.

Well, from here we can only guess what the current reason is, but the general rule of thumb is not to push the entire system harder than its slowest part (here the 100 Mbit connection to the outside world) can take, maybe even a bit less than that. That's usually where it works best.
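As a back-of-envelope check of that bottleneck, assuming a typical multibeam workunit of roughly 366 KB (an approximate figure, with protocol overhead ignored):

# How many workunits per second can a 100 Mbit/s link carry?
link_kbytes_per_s = 100 * 1000 / 8   # 100 Mbit/s = 12,500 KB/s
wu_kbytes = 366                      # assumed multibeam WU size
print(round(link_kbytes_per_s / wu_kbytes))   # ~34 WUs/sec

That ceiling sits just above the ~31/sec result creation rate quoted earlier in the thread, which fits the picture of the link, rather than the splitters, being the limit while the cricket graph is pegged.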
ID: 1291995
fscheel

Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1291998 - Posted: 6 Oct 2012, 11:30:38 UTC

Is there a utility or something that will easily show me how many and what type of tasks are presently on my machine?

ID: 1291998
Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1292000 - Posted: 6 Oct 2012, 11:38:02 UTC
Last modified: 6 Oct 2012, 11:48:25 UTC

Is there a utility or something that will easily show me how many and what type of tasks are presently on my machine?

BoincTasks will do that and more; you can monitor all your computers from one place. It doesn't replace BOINC Manager, it just provides a better user interface with task counts, the sum of estimated completion times, etc.
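If a quick count is all that's needed, the stock boinccmd tool that ships with the client can also list tasks. A minimal Python sketch, assuming boinccmd is on the PATH and the client is running locally (output field names can vary between client versions):

# Count the tasks the local BOINC client currently holds.
import subprocess
from collections import Counter

out = subprocess.run(["boinccmd", "--get_tasks"],
                     capture_output=True, text=True, check=True).stdout
lines = [ln.strip() for ln in out.splitlines()]

names = [ln for ln in lines if ln.startswith("name:")]
states = Counter(ln.split(":", 1)[1].strip()
                 for ln in lines if ln.startswith("state:"))

print(len(names), "tasks on this machine")
print(dict(states))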
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1292000
fscheel

Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1292005 - Posted: 6 Oct 2012, 11:55:44 UTC - in response to Message 1292000.  

Is there a utility or something that will easily show me how many and what type of tasks are presently on my machine?

BoincTasks will do that and more; you can monitor all your computers from one place. It doesn't replace BOINC Manager, it just provides a better user interface with task counts, the sum of estimated completion times, etc.


Thanks, I'll give that a try.
ID: 1292005
Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1292010 - Posted: 6 Oct 2012, 12:06:08 UTC

Is there a utility or something that will easily show me how many and what type of tasks are presently on my machine?

BoincTasks will do that and more; you can monitor all your computers from one place. It doesn't replace BOINC Manager, it just provides a better user interface with task counts, the sum of estimated completion times, etc.

Thanks, I'll give that a try.

This page shows how to set it up for multiple computers and has some more screenshots.
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1292010
Tim
Volunteer tester
Joined: 19 May 99
Posts: 211
Credit: 278,575,259
RAC: 0
Greece
Message 1292068 - Posted: 6 Oct 2012, 15:38:27 UTC

I finally did it. I managed to finish a daily quota... :-)


6/10/2012 6:32:54 μμ SETI@home Scheduler request completed: got 0 new tasks
6/10/2012 6:32:54 μμ SETI@home Message from server: No tasks sent
6/10/2012 6:32:54 μμ SETI@home Message from server: No tasks are available for SETI@home Enhanced
6/10/2012 6:32:54 μμ SETI@home Message from server: This computer has finished a daily quota of 1 tasks
ID: 1292068