Panic Mode On (38) Server problems

Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1034388 - Posted: 19 Sep 2010, 8:21:44 UTC - in response to Message 1034379.  

In an earlier discussion of the causes of ghosts, either Joe or Claggy were talking about corrupted sched_request_ack_xml(?) files. And then today we had the situation where a sched_request_xml file was so big, with 2700 Tasks reporting, that it would not go through to the server, and had to be emailed to Dr. A for manual upload.

It just seems to me that the larger these files are, the more likely they are to get "stuck" in the pipe during max traffic, and that reducing their size might be a partial solution.

Donald
Infernal Optimist / Submariner, retired
ID: 1034388
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1034394 - Posted: 19 Sep 2010, 8:36:38 UTC - in response to Message 1034388.  

In an earlier discussion of the causes of ghosts, either Joe or Claggy were talking about corrupted sched_request_ack_xml(?) files. And then today we had the situation where a sched_request_xml file was so big, with 2700 Tasks reporting, that it would not go through to the server, and had to be emailed to Dr. A for manual upload.

It just seems to me that the larger these files are, the more likely they are to get "stuck" in the pipe during max traffic, and that reducing their size might be a partial solution.

You could be right.....and so far as I know, Vyper has still not been able to report his results successfully.......pending whatever solution DA may or may not be able to come up with.
Still awaiting news on that saga.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1034394
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1034399 - Posted: 19 Sep 2010, 8:49:09 UTC - in response to Message 1034394.  

In an earlier discussion of the causes of ghosts, either Joe or Claggy were talking about corrupted sched_request_ack_xml(?) files. And then today we had the situation where a sched_request_xml file was so big, with 2700 Tasks reporting, that it would not go through to the server, and had to be emailed to Dr. A for manual upload.

It just seems to me that the larger these files are, the more likely they are to get "stuck" in the pipe during max traffic, and that reducing their size might be a partial solution.

You could be right.....and so far as I know, Vyper has still not been able to report his results successfully.......pending whatever solution DA may or may not be able to come up with.
Still awaiting news on that saga.


Since edits don't carry over into reply quotes, let the record show that it was 8448 Tasks reporting, not 2700, and that the file size was 31 MB. Unusually large, even after an extended outage, but still.. too large a file going either direction could get caught or be corrupted.
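
For scale, here is a back-of-the-envelope check of those figures. It is only a sketch under stated assumptions: "31 MB" is taken as 31 × 10^6 bytes, the request file is assumed to grow linearly with the number of tasks reported, and the cap of 500 tasks per request is a hypothetical value chosen for illustration, not a BOINC setting.

```python
import math

REPORTED_TASKS = 8448        # task count given in the post
REQUEST_BYTES = 31 * 10**6   # "31 MB", assumed here to mean decimal megabytes


def bytes_per_task(total_bytes: int, tasks: int) -> float:
    """Average request size per reported task, assuming linear growth."""
    return total_bytes / tasks


def requests_needed(tasks: int, max_tasks_per_request: int) -> int:
    """Number of requests needed if reporting were capped (hypothetical cap)."""
    return math.ceil(tasks / max_tasks_per_request)


per_task = bytes_per_task(REQUEST_BYTES, REPORTED_TASKS)
print(f"about {per_task / 1000:.1f} kB per reported task")

# A hypothetical cap of 500 tasks per request, chosen only for illustration.
print(requests_needed(REPORTED_TASKS, 500), "requests at 500 tasks each")
```

Under those assumptions each reported task costs a few kilobytes of request, so a capped request would be on the order of a couple of megabytes instead of 31 MB, at the price of more round trips to the scheduler.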

Donald
Infernal Optimist / Submariner, retired
ID: 1034399
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1034440 - Posted: 19 Sep 2010, 11:39:46 UTC - in response to Message 1034388.  
Last modified: 19 Sep 2010, 12:29:00 UTC

In an earlier discussion of the causes of ghosts, either Joe or Claggy were talking about corrupted sched_request_ack_xml(?) files. And then today we had the situation where a sched_request_xml file was so big, with 2700 Tasks reporting, that it would not go through to the server, and had to be emailed to Dr. A for manual upload.

It just seems to me that the larger these files are, the more likely they are to get "stuck" in the pipe during max traffic, and that reducing their size might be a partial solution.


The ghosts I received since the servers came up this week were from replies with 55, 29, 48 and 27 tasks respectively.
I also think limiting the number of tasks in each sched_reply might help, though I'm not sure it would be a full fix, as the server would then have to process more requests.

My last set from last week was 14 Astropulse tasks. The problem here is that Astropulse tasks are getting sent out in cycles: you can't get any for hours,
then you get 20 or more in one request, just after everyone else has got theirs too, at which point download speeds have already dropped.

Claggy
ID: 1034440
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1034445 - Posted: 19 Sep 2010, 11:56:25 UTC - in response to Message 1034388.  

..and had to be emailed to Dr. A for manual upload.

When David asks for such a file to be sent to him, he won't manually upload the contents of said file into the database or anything. He'll use it for testing: checking the backend to see what happens when he tries to send such a file.

At the end of the day, the user that has the problem will still need to report all those tasks himself; or at least, his BOINC needs to do so.
ID: 1034445
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1034482 - Posted: 19 Sep 2010, 14:31:42 UTC - in response to Message 1034352.  

Yes, limits of 40/320 are in place, but I'm wondering more about this:

SETI@home 19.09.2010 08:38:50 Scheduler request completed: got 58 new tasks

I thought there was also a limit of about 20 per request? It seems this doesn't apply any more.

Yes, the project's config.xml used to have max_wus_to_send set to 20, and that used to be applied directly. But changeset 18255 revised it so the number is now scaled by number of CPUs and GPUs (*gpu_multiplier). The 20 is probably still there, but is being treated as 20 per CPU and 160 per GPU.

There are only 100 feeder slots so that's the absolute maximum which could be sent for one request, and for new work there are pairs of tasks for each WU. Getting more than 50 means some are unpaired reissues (_2 and up).
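
The scaling described above can be sketched roughly as follows. This is not the BOINC scheduler code: the function names are hypothetical, the `gpu_multiplier` of 8 is inferred from "20 per CPU and 160 per GPU", and adding the CPU and GPU contributions together is an assumption. Only the base value of 20 and the 100 feeder slots come from the post.

```python
FEEDER_SLOTS = 100  # absolute per-request ceiling mentioned in the post


def max_tasks_per_request(ncpus: int, ngpus: int,
                          max_wus_to_send: int = 20,
                          gpu_multiplier: int = 8) -> int:
    """Scaled per-request limit, capped by the feeder slots.

    Assumption: CPU and GPU contributions are simply added together.
    """
    scaled = max_wus_to_send * (ncpus + gpu_multiplier * ngpus)
    return min(scaled, FEEDER_SLOTS)


def includes_reissues(tasks_received: int) -> bool:
    """More than half the feeder slots in one reply implies unpaired reissues,
    following the reasoning in the post (new work sits in the slots in pairs)."""
    return tasks_received > FEEDER_SLOTS // 2


print(max_tasks_per_request(ncpus=1, ngpus=0))  # single CPU, no GPU
print(max_tasks_per_request(ncpus=4, ngpus=1))  # scaled limit exceeds the slots
print(includes_reissues(58))                    # the 58-task reply quoted above
```

On this reading, any host with more than a few cores or a single GPU is limited by the feeder slots, not by max_wus_to_send.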
Joe
ID: 1034482
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1034549 - Posted: 19 Sep 2010, 18:47:10 UTC


Probably time to up the server side limits.
Network traffic has finally eased off (apart from the AP bursts), but the inbound traffic has been steadily climbing- most likely from hosts that have reached the Server Side limit & still trying for work.
Grant
Darwin NT
ID: 1034549
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1034550 - Posted: 19 Sep 2010, 18:48:59 UTC - in response to Message 1034549.  


Probably time to up the server side limits.
Network traffic has finally eased off (apart from the AP bursts), but the inbound traffic has been steadily climbing- most likely from hosts that have reached the Server Side limit & still trying for work.

I said that hours ago before I passed out, my friend.

"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1034550
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1034644 - Posted: 20 Sep 2010, 0:20:44 UTC

I remember the first time I noticed I could get more than 20 tasks in one request. It was nice when I needed to get 150-225 tasks to fill the cache back when I was doing MB work. Used to take a ton of requests, but then it would only take 4-6 requests instead of 10+.

Presently, my record for most APs in one request is 23. Just about all of the requests are 1-3 though, but every now and then, I can get 5-15 at a time.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1034644
Blake Bonkofsky
Volunteer tester
Joined: 29 Dec 99
Posts: 617
Credit: 46,383,149
RAC: 0
United States
Message 1034647 - Posted: 20 Sep 2010, 0:31:13 UTC

Looks like the limits have been raised up, as I have picked up 250ish WU's within the last few hours. I had been banging off of the limit all weekend.
ID: 1034647
Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1034713 - Posted: 20 Sep 2010, 3:09:30 UTC
Last modified: 20 Sep 2010, 3:09:55 UTC

New message from the server !!

20/09/2010 12:21:27 SETI@home Message from server: Resent lost task 24ap10ae.5264.1058.11.10.50_2

Looks like something has finally been done about the ghost problem - Yeehaa

T.A.
ID: 1034713
Zeus Fab3r
Joined: 17 Jan 01
Posts: 649
Credit: 275,335,635
RAC: 597
Serbia
Message 1034717 - Posted: 20 Sep 2010, 3:14:53 UTC - in response to Message 1034713.  
Last modified: 20 Sep 2010, 4:01:41 UTC

YES indeed !!!!!!!

20-Sep-10 05:12:27	SETI@home	Scheduler request completed: got 20 new tasks
20-Sep-10 05:12:27	SETI@home	[sched_op_debug] Server version 611
20-Sep-10 05:12:27	SETI@home	Message from server: Resent lost task D%my10adep23.24198.12.10.222_2
20-Sep-10 05:12:27	SETI@home	Message from server: Resent lost task 31dc09af.17708.8411.12.10.90_3
20-Sep-10 05:12:27	SETI@home	Message from server: Resent lost task 31dc09af.17708.8411.12.10.102_2


Edit: But something is very wrong... I was tracking 26 ghosts after scheduler request timeout when I saw that message, but instead of those 26 I just got 20 tasks that are not on my rig's list at all ?! So, will I crunch someone else's wu's in the near future?

Edit 2: Things are going from bad to worse. I just got a bunch of fresh .vlars on my GPU ;-< Task 1712880409

Who the hell is General Failure and why is he reading my harddisk?¿
ID: 1034717
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1034829 - Posted: 20 Sep 2010, 13:45:44 UTC - in response to Message 1034717.  
Last modified: 20 Sep 2010, 13:47:08 UTC

YES indeed !!!!!!!

20-Sep-10 05:12:27	SETI@home	Scheduler request completed: got 20 new tasks
20-Sep-10 05:12:27	SETI@home	[sched_op_debug] Server version 611
20-Sep-10 05:12:27	SETI@home	Message from server: Resent lost task D%my10adep23.24198.12.10.222_2
20-Sep-10 05:12:27	SETI@home	Message from server: Resent lost task 31dc09af.17708.8411.12.10.90_3
20-Sep-10 05:12:27	SETI@home	Message from server: Resent lost task 31dc09af.17708.8411.12.10.102_2


Edit: But something is very wrong... I was tracking 26 ghosts after scheduler request timeout when I saw that message, but instead of those 26 I just got 20 tasks that are not on my rig's list at all ?! So, will I crunch someone else's wu's in the near future?

Edit 2: Things are going from bad to worse. I just got a bunch of fresh .vlars on my GPU ;-< Task 1712880409


Ohh.. well.. - a new feature was enabled and, at the same time, another old feature was (unintentionally) disabled.. :-(
ID: 1034829
Helli_retiered
Volunteer tester
Joined: 15 Dec 99
Posts: 707
Credit: 108,785,585
RAC: 0
Germany
Message 1034831 - Posted: 20 Sep 2010, 13:56:16 UTC - in response to Message 1034647.  

Looks like the limits have been raised up, as I have picked up 250ish WU's within the last few hours. I had been banging off of the limit all weekend.


Yes, fine. But as always for me - as a Power Cruncher - very late. I'm not
sure that I can download >4,000 workunits in the next 24 hours. :-|

Helli
A loooong time ago: First Credits after SETI@home Restart
ID: 1034831
SciManStev (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Jun 99
Posts: 6652
Credit: 121,090,076
RAC: 0
United States
Message 1034848 - Posted: 20 Sep 2010, 14:48:46 UTC - in response to Message 1034831.  

Looks like the limits have been raised up, as I have picked up 250ish WU's within the last few hours. I had been banging off of the limit all weekend.


Yes, fine. But as always for me - as a Power Cruncher - very late. I'm not
sure that I can download >4,000 workunits in the next 24 hours. :-|

Helli


I have that problem too. It is very difficult to download a lot of work at what turns out to be the last minute. You really have to stay with your rig and babysit it. I hope the new server will allow things to run with higher limits earlier on.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1034848
Helli_retiered
Volunteer tester
Joined: 15 Dec 99
Posts: 707
Credit: 108,785,585
RAC: 0
Germany
Message 1034850 - Posted: 20 Sep 2010, 14:54:11 UTC - in response to Message 1034848.  
Last modified: 20 Sep 2010, 14:54:33 UTC

You said it right, Steve. I'll sit here (maybe for the next 4 hours) and push the Retry button because
I can't wait 11:32:45 hours for the next automatic connect. ;-)

Helli
A loooong time ago: First Credits after SETI@home Restart
ID: 1034850
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1034865 - Posted: 20 Sep 2010, 15:41:37 UTC - in response to Message 1034850.  

My machines will be downloading until tomorrow's outage.. and with bad luck they will still have a lot of backlogged DLs during the outage..

I have only DSL light (384/64 kbit/s).. :-(

For example, my GPU machine needs at least 5 hours after every 3-day outage to UL the results of those 3 days.. and during this time it makes no work requests..
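
A rough upper bound on what that upload window can carry, as a sketch only: it assumes the 64 kbit/s upstream is fully used for the whole 5 hours, that 1 kbit is 1000 bits, and that there is no protocol overhead or retrying, none of which is stated in the post.

```python
def max_transfer_bytes(kbit_per_s: float, hours: float) -> float:
    """Upper bound on bytes moved over a link at a given rate and duration."""
    bytes_per_second = kbit_per_s * 1000 / 8
    return bytes_per_second * hours * 3600


upload_bound = max_transfer_bytes(64, 5)    # 64 kbit/s upstream for 5 hours
download_bound = max_transfer_bytes(384, 1) # 384 kbit/s downstream for 1 hour

print(f"upload bound over 5 h:   {upload_bound / 1e6:.0f} MB")
print(f"download bound over 1 h: {download_bound / 1e6:.1f} MB")
```

So three days of results come to at most about 144 MB of uploads on this line, and real throughput would be lower than the bound.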

After the 940 BE made the UL, then I enable the network for the E7600 machine..

Everyone has his own special problems.. ;-)

ID: 1034865
Helli_retiered
Volunteer tester
Joined: 15 Dec 99
Posts: 707
Credit: 108,785,585
RAC: 0
Germany
Message 1034872 - Posted: 20 Sep 2010, 16:09:15 UTC - in response to Message 1034865.  
Last modified: 20 Sep 2010, 16:10:03 UTC

...Everyone has his own special problems.. ;-)


Yupp.

And I don't think that Oscar makes any difference to this situation.
IMHO the 100 Mbit line in combination with this 3-day-in-7-days outage
is the bottleneck. Maybe a 5-day-in-14-days outage would be a better choice for us.
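
To compare the two schedules, here is a small sketch of the arithmetic. It only computes the share of time the servers would be reachable and how many post-outage recoveries fall in a 14-day window; it says nothing about whether the 100 Mbit line could refill caches fast enough, and the function name is just for illustration.

```python
from fractions import Fraction


def uptime_fraction(outage_days: int, cycle_days: int) -> Fraction:
    """Share of each cycle during which the servers are up."""
    return Fraction(cycle_days - outage_days, cycle_days)


current = uptime_fraction(3, 7)    # 3-day outage every 7 days
proposed = uptime_fraction(5, 14)  # 5-day outage every 14 days

print(f"3 days in 7:  up {float(current):.1%}, 2 recoveries per 14 days")
print(f"5 days in 14: up {float(proposed):.1%}, 1 recovery per 14 days")
```

The longer cycle gives about 64% uptime instead of about 57% and halves the number of recovery rushes, but each host would need to cache five days of work instead of three.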

Helli
A loooong time ago: First Credits after SETI@home Restart
ID: 1034872
SciManStev (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Jun 99
Posts: 6652
Credit: 121,090,076
RAC: 0
United States
Message 1034887 - Posted: 20 Sep 2010, 16:42:49 UTC - in response to Message 1034872.  

...Everyone has his own special problems.. ;-)


Yupp.

And I don't think that Oscar makes any difference to this situation.
IMHO the 100 Mbit line in combination with this 3-day-in-7-days outage
is the bottleneck. Maybe a 5-day-in-14-days outage would be a better choice for us.

Helli


I was thinking that after they shuffle the servers around, and add Oscar, the need for an extended down time might be somewhat alleviated. I am thinking that 96 Gig of RAM can process a whole lot of science. I am most likely wrong, but if the outage is so NTPC can work while not being bombarded with new files, then the ability to handle the files and do science may reduce the need for down time. Actually, I don't really know.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1034887
X-Files 27
Joined: 17 May 99
Posts: 104
Credit: 111,191,433
RAC: 0
Canada
Message 1034888 - Posted: 20 Sep 2010, 16:43:21 UTC

Two things are broken:

1st) .VLARs are being sent to GPUs

2nd) Hosts with the "Use CPU" pref set to No are getting CPU tasks.
ID: 1034888


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.