Panic Mode On (38) Server problems
Author | Message |
---|---|
Donald L. Johnson Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20 |
In an earlier discussion of the causes of ghosts, either Joe or Claggy was talking about corrupted sched_request_ack_xml(?) files. And then today we had a situation where a sched_request_xml file was so big, with 2700 tasks reporting, that it would not go through to the server and had to be emailed to Dr. A for manual upload. It just seems to me that the larger these files are, the more likely they are to get "stuck" in the pipe during peak traffic, and that reducing their size might be a partial solution. Donald Infernal Optimist / Submariner, retired |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
Quote: "In an earlier discussion of the causes of ghosts, either Joe or Claggy was talking about corrupted sched_request_ack_xml(?) files. And then today we had a situation where a sched_request_xml file was so big, with 2700 tasks reporting, that it would not go through to the server and had to be emailed to Dr. A for manual upload." You could be right... and as far as I know, Vyper has still not been able to report his results successfully, pending whatever solution DA may or may not be able to come up with. Still awaiting news on that saga. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Donald L. Johnson Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20 |
Quote: "In an earlier discussion of the causes of ghosts, either Joe or Claggy was talking about corrupted sched_request_ack_xml(?) files. And then today we had a situation where a sched_request_xml file was so big, with 2700 tasks reporting, that it would not go through to the server and had to be emailed to Dr. A for manual upload." Since edits don't carry over into reply quotes, let the record show that it was 8448 tasks reporting, not 2700, and that the file size was 31 MB. Unusually large, even after an extended outage, but still, too large a file going in either direction could get stuck or be corrupted. Donald Infernal Optimist / Submariner, retired |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Quote: "In an earlier discussion of the causes of ghosts, either Joe or Claggy was talking about corrupted sched_request_ack_xml(?) files. And then today we had a situation where a sched_request_xml file was so big, with 2700 tasks reporting, that it would not go through to the server and had to be emailed to Dr. A for manual upload." The ghosts I received since the servers came up this week were from replies with 55, 29, 48 and 27 tasks respectively. I also think limiting the tasks in the sched_replies might help, though I'm not sure it would be a full fix, as the server would then have to process more requests. My last set from last week was 14 Astropulse tasks. The problem there is that Astropulse tasks are being sent out in cycles: you can't get any for hours, then you get 20 or more in one request, just after everyone else has got theirs too, by which point download speeds have already dropped. Claggy |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Quote: "...and had to be emailed to Dr. A for manual upload." When David asks for such a file to be sent to him, he won't manually upload its contents into the database or anything. He'll be using it to test what happens (checking the backend, seeing what it does) when he tries to send such a file. At the end of the day, the user who has the problem will still need to report all those tasks himself, or at least his BOINC client will need to do so. |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Quote: "Yes, limits of 40/320 are in place, but I'm wondering more about this:" Yes, the project's config.xml used to have max_wus_to_send set to 20, and that used to be applied directly. But changeset 18255 revised it so the number is now scaled by the number of CPUs and GPUs (multiplied by gpu_multiplier for GPUs). The 20 is probably still there, but it is being treated as 20 per CPU and 160 per GPU. There are only 100 feeder slots, so that's the absolute maximum that could be sent for one request, and for new work there are pairs of tasks for each WU. Getting more than 50 means some are unpaired reissues (_2 and up). Joe |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Probably time to raise the server-side limits. Network traffic has finally eased off (apart from the AP bursts), but inbound traffic has been steadily climbing, most likely from hosts that have reached the server-side limit and are still trying for work. Grant Darwin NT |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
I said that hours ago before I passed out, my friend. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
I remember the first time I noticed I could get more than 20 tasks in one request. It was nice when I needed to get 150-225 tasks to fill the cache back when I was doing MB work. Used to take a ton of requests, but then it would only take 4-6 requests instead of 10+. Presently, my record for most APs in one request is 23. Just about all of the requests are 1-3 though, but every now and then, I can get 5-15 at a time. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
Blake Bonkofsky Joined: 29 Dec 99 Posts: 617 Credit: 46,383,149 RAC: 0 |
Looks like the limits have been raised up, as I have picked up 250ish WU's within the last few hours. I had been banging off of the limit all weekend. |
Terror Australis Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44 |
New message from the server!! `20/09/2010 12:21:27 SETI@home Message from server: Resent lost task 24ap10ae.5264.1058.11.10.50_2` Looks like something has finally been done about the ghost problem. Yeehaa! T.A. |
Zeus Fab3r Joined: 17 Jan 01 Posts: 649 Credit: 275,335,635 RAC: 597 |
YES indeed !!!!!!! `20-Sep-10 05:12:27 SETI@home Scheduler request completed: got 20 new tasks` `20-Sep-10 05:12:27 SETI@home [sched_op_debug] Server version 611` `20-Sep-10 05:12:27 SETI@home Message from server: Resent lost task D%my10adep23.24198.12.10.222_2` `20-Sep-10 05:12:27 SETI@home Message from server: Resent lost task 31dc09af.17708.8411.12.10.90_3` `20-Sep-10 05:12:27 SETI@home Message from server: Resent lost task 31dc09af.17708.8411.12.10.102_2` Edit: But something is very wrong... I was tracking 26 ghosts after a scheduler request timeout when I saw that message, but instead of those 26 I just got 20 tasks that are not in my rig's list at all?! So will I be crunching someone else's WUs in the near future? Edit 2: Things are going from bad to worse. I just got a bunch of fresh VLARs on my GPU ;-< Task 1712880409 Who the hell is General Failure and why is he reading my harddisk?¿ |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Quote: "YES indeed !!!!!!!" Oh well... a new feature enabled and, unintentionally, another old feature disabled at the same time... :-( |
Helli_retiered Joined: 15 Dec 99 Posts: 707 Credit: 108,785,585 RAC: 0 |
Quote: "Looks like the limits have been raised up, as I have picked up 250ish WU's within the last few hours. I had been banging off of the limit all weekend." Yes, fine. But as always for me, as a power cruncher, it comes very late. I'm not sure I can download >4,000 workunits in the next 24 hours. :-| Helli A loooong time ago: First Credits after SETI@home Restart |
SciManStev Joined: 20 Jun 99 Posts: 6652 Credit: 121,090,076 RAC: 0 |
Quote: "Looks like the limits have been raised up, as I have picked up 250ish WU's within the last few hours. I had been banging off of the limit all weekend." I have that problem too. It is very difficult to download a lot of work at what turns out to be the last minute. You really have to stay with your rig and babysit it. I hope the new server will allow things to run with higher limits earlier on. Steve Warning, addicted to SETI crunching! Crunching as a member of GPU Users Group. GPUUG Website |
Helli_retiered Joined: 15 Dec 99 Posts: 707 Credit: 108,785,585 RAC: 0 |
You said it right, Steve. I'm sitting here (maybe for the next 4 hours) pushing the Retry button, because I can't wait 11:32:45 hours for the next automatic connection. ;-) Helli A loooong time ago: First Credits after SETI@home Restart |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
My machines will keep downloading until tomorrow's outage... and with bad luck they will still have a lot of backlogged downloads during the outage. I only have DSL light (384/64 kbit/s) :-( For example, my GPU machine needs at least 5 hours after every 3-day outage to upload all the results from those 3 days, and during that time it makes no work requests. Only after the 940 BE has finished uploading do I enable the network for the E7600 machine. Everyone has his own special problems... ;-) |
Helli_retiered Joined: 15 Dec 99 Posts: 707 Credit: 108,785,585 RAC: 0 |
Quote: "...Everyone has his own special problems... ;-)" Yup. And I don't think Oscar will make any difference to this situation. IMHO the 100 Mbit line in combination with this 3-days-out-of-7 outage is the bottleneck. Maybe a 5-days-out-of-14 outage would be a better choice for us. Helli A loooong time ago: First Credits after SETI@home Restart |
SciManStev Joined: 20 Jun 99 Posts: 6652 Credit: 121,090,076 RAC: 0 |
Quote: "...Everyone has his own special problems... ;-)" I was thinking that after they shuffle the servers around and add Oscar, the need for an extended downtime might be somewhat alleviated. I am thinking that 96 GB of RAM can process a whole lot of science. I am most likely wrong, but if the outage is there so NTPC can work without being bombarded with new files, then the extra capacity to handle the files and do the science may reduce the need for downtime. Actually, I don't really know. Steve Warning, addicted to SETI crunching! Crunching as a member of GPU Users Group. GPUUG Website |
X-Files 27 Joined: 17 May 99 Posts: 104 Credit: 111,191,433 RAC: 0 |
Two things are broken: 1) VLARs are being sent to GPUs. 2) Hosts with the "Use CPU" preference set to No are getting CPU tasks. |
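Donald L. Johnson's corrected figures (a 31 MB sched_request file reporting 8448 tasks) can be put into per-task terms with a quick calculation. This is a rough sketch, not BOINC code: it assumes "MB" means 2^20 bytes, treats the whole file as task reports (the real file also carries host and project details, so the per-task figure is an overestimate), and uses a hypothetical cap of 500 reports per request to illustrate the trade-off Claggy raises, namely that smaller messages mean more of them.

```python
import math

# Figures from the thread; "31 MB" is assumed to mean 31 * 2**20 bytes.
REQUEST_BYTES = 31 * 2**20
TASKS_REPORTED = 8448


def avg_bytes_per_task(file_bytes: int, n_tasks: int) -> float:
    """Average request size per reported task (overestimates: ignores header data)."""
    return file_bytes / n_tasks


def requests_needed(n_tasks: int, per_request_cap: int) -> int:
    """Scheduler requests needed if each may report at most per_request_cap tasks."""
    return math.ceil(n_tasks / per_request_cap)


HYPOTHETICAL_CAP = 500  # illustrative only, not a real BOINC setting

per_task = avg_bytes_per_task(REQUEST_BYTES, TASKS_REPORTED)
batches = requests_needed(TASKS_REPORTED, HYPOTHETICAL_CAP)
print(f"~{per_task:.0f} bytes per task, {batches} requests of up to {HYPOTHETICAL_CAP} tasks")
```

At roughly 3.8 KB per task, each 500-task request would stay under 2 MB, but the backlog would take 17 round trips instead of one, which is exactly the extra scheduler load Claggy is wary of.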
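Josef W. Segur's explanation of the per-request limit reduces to a small formula. The sketch below models it only as described in his post: max_wus_to_send (20) applied per CPU, multiplied by gpu_multiplier for each GPU (8 here, inferred from his "160 per GPU"), and capped by the 100 feeder slots. The function names are hypothetical, and the real scheduler applies further checks (the client's work request, per-host limits and so on) that are not modelled. The second function turns his pairing argument into a lower bound, assuming that fresh WUs occupy two feeder slots each and that a host never receives two tasks from the same WU.

```python
def per_request_limit(ncpus: int, ngpus: int,
                      max_wus_to_send: int = 20,
                      gpu_multiplier: int = 8,
                      feeder_slots: int = 100) -> int:
    """Most tasks one scheduler reply could carry under the scaling described above."""
    scaled = max_wus_to_send * (ncpus + gpu_multiplier * ngpus)
    # A reply can only draw on tasks currently sitting in the feeder's slots.
    return min(scaled, feeder_slots)


def min_reissues(tasks_received: int, feeder_slots: int = 100) -> int:
    """Lower bound on reissued (_2 and up) tasks in one reply.

    Assumes each fresh WU occupies two feeder slots (its _0 and _1 tasks) and
    that a host gets at most one task per WU, so at most feeder_slots // 2
    of the tasks in a single reply can be fresh.
    """
    return max(0, tasks_received - feeder_slots // 2)


print(per_request_limit(ncpus=1, ngpus=0))   # 20
print(per_request_limit(ncpus=2, ngpus=0))   # 40
print(per_request_limit(ncpus=4, ngpus=1))   # 100 (240 before the feeder cap)
print(min_reissues(55))                      # 5
```

Under these assumptions, Claggy's 55-task reply would have contained at least 5 reissues, while his 29-, 48- and 27-task replies could have been entirely fresh work.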
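Sutaru Tsureku's and Helli's posts both come down to link arithmetic: a 64 kbit/s uplink that needs about 5 hours after each 3-day outage, and the suggestion that a 5-days-out-of-14 outage would suit the 100 Mbit line better than 3 days out of 7. The sketch below works both out as ceilings, assuming full link utilisation, no protocol overhead and 1 kbit = 1000 bits. The helper names are hypothetical, and nothing here models real SETI@home traffic.

```python
def max_bytes(bits_per_second: float, hours: float) -> float:
    """Ceiling on data a link can move: full utilisation, no protocol overhead."""
    return bits_per_second / 8 * hours * 3600


def uptime_fraction(outage_days: int, cycle_days: int) -> float:
    """Share of each outage cycle during which the servers are reachable."""
    return (cycle_days - outage_days) / cycle_days


# Sutaru Tsureku: 64 kbit/s uplink, about 5 hours to clear a 3-day backlog.
dsl_ceiling = max_bytes(64_000, 5)            # 144,000,000 bytes

# Helli: 100 Mbit/s line, current 3-in-7 versus proposed 5-in-14 outages.
per_up_day = max_bytes(100_000_000, 24)       # 1.08e12 bytes per day of uptime
current = uptime_fraction(3, 7)               # 4/7
proposed = uptime_fraction(5, 14)             # 9/14

for name, frac in (("3 in 7", current), ("5 in 14", proposed)):
    up_days = frac * 14                       # up-days per fortnight
    print(f"{name}: {up_days:.0f} up-days, at most {up_days * per_up_day / 1e12:.2f} TB per 14 days")
```

Under these assumptions Sutaru's 5-hour upload window can carry at most about 144 MB, and Helli's schedule would give the servers 9 rather than 8 up-days per fortnight (12.5 % more time on the line). Whether that would actually ease the bottleneck depends on demand figures the thread does not give.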
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.