Panic Mode On (20) Server problems
Author | Message |
---|---|
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Since this morning (maybe 12 hours now) the GPU cruncher has been sitting idle. It had a ~4 day WU cache. I now have ~1,600 results ready for UL, but they can't go home: maybe one result gets through every few minutes. Very well.. if it continues like this, in a few days (one week?) the PC will request new work. Oh well.. Will this UL traffic continue? For how long? OTOH, the UL traffic will either get better in future.. or not.. The GPU cruncher wants to send ~800 results / day to Berkeley [normal ARs]. If that isn't possible, I can switch off the GPU cruncher.. |
jay_e Joined: 6 Apr 03 Posts: 62 Credit: 1,072,112 RAC: 0 |
So far, I've waited since Sunday. Hi Grant, thanks for the info!! One WU made it through overnight. Now I know that I should try to force the uploads.... I got the rest to go by using the BOINC Manager advanced view: Advanced -> "Do Network Communication", over and over and over. For every three or four cycles, maybe one WU made it. I run SETI@home on another two laptops in two cities, one on a cable modem and the other on DSL. Both had the problem of WUs not uploading. The same solution worked: "Do Network Communication" over and over. Jay |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13751 Credit: 208,696,464 RAC: 304 |
Things appear to be working OK at the moment, but the network traffic for the past 18 hours or so looks rather odd, to say the least. Outbound traffic: a couple of short, sharp drops, then nice & steady until it took a huge dive for a few hours. It came back up again, but still with a few sharp drops here & there. As for inbound traffic: it started off OK & gradually increased, then it became a sine wave of gradually increasing amplitude. It leveled off for a short while but then started to become a bit jagged. Grant Darwin NT |
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
Grrrr................... Uploads failing again. Bandwidth is maxed out again but AP work is NOT being generated or sent out. So what gives???? [edit] In fact I never did get all my uploads in. They seem to error out instantly and go back into the delayed backoff mode. Boinc....Boinc....Boinc....Boinc.... |
W-K 666 Joined: 18 May 99 Posts: 19093 Credit: 40,757,560 RAC: 67 |
"Grrrr..................." See Eric K's post 918637. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13751 Credit: 208,696,464 RAC: 304 |
Inbound traffic just hit 32.48Mb/s. I think we have a new record. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13751 Credit: 208,696,464 RAC: 304 |
Boy that upload server is copping a hammering. Usual rate is around 45-50,000 results per hour. It's been averaging 100,000 for 14 hours now (hitting a peak of 250,590). Grant Darwin NT |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
"Boy that upload server is copping a hammering." It always happens. I think they keep a special stash of "shorty only" tapes on the shelf, so they can slip a few on after an outage and really gum the works up. Edit - did you see that blip earlier when the main database was doing over 1,000 queries a second? That was the validators playing catchup. |
Joseph Monk Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0 |
I now have a nice backlog of work, enough to last me a few days at least, and uploads seem to be going in on a regular basis... just wish I could get some AP units so my CPUs get busy too. |
Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
Grrrr................... . . . WK - THAT Link seems to go to 'POST' [must be because of ALL the 'Moving' around of Posts / Messages DOH!] BOINC Wiki . . . Science Status Page . . . |
Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
"Grrrr..................." Then it was perhaps this one (918297). (They only differ in two digits ;-) Regards, Gundolf [edit]Ohhh, and Eric K embedded the live diagram, very naughty :-) [/edit] Computers aren't everything in life. (Just kidding) SETI@home classic workunits 3,758 SETI@home classic CPU time 66,520 hours |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
"[edit]Ohhh, and Eric K embedded the live diagram, very naughty :-) [/edit]" I think we can trust Eric to know how much of the Cricket Admin's (and the campus's) bandwidth SETI can afford to use - you can't embed that graph by mistake, it takes a fair amount of effort. It's only regenerated once every four minutes, so it isn't exactly live streaming. Even if it was, it would have precisely zero effect on the Hurricane Electric link carrying our data. |
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
I will attempt to explain again what is happening here. Only about 25% of my uploads are going through. The rest end up with................ Upload pending, retry in xx:xx:xx I know that this is normal during a recovery phase. But now, 3 days after the normal Tuesday outage, are we still in a recovery phase? I check my computers with Boinc View. It shows 50 or so in this backoff state. I highlight all of them and tell Boinc View to retry the uploads. 1 in 4 is uploaded, the rest INSTANTLY go into a deeper backoff. Normally at this point in time all my work is uploaded and I am not experiencing any backoffs. So I continue to massage the retry option in Boinc View. Within a few minutes they are all uploaded again. My point is that if uploads are so easily done manually now, why were these work units in a backoff state to begin with? They should have uploaded themselves on the first try. Even now the transfers tab is loading up with retries again. Yet if I retry manually 25% will upload, and again in a few minutes all transfers are done. So my basic contention is that 75% of uploads are now failing that would normally get through fine at this point in the recovery period. Eric's change has dramatically improved the uploads but now I see a different problem. Boinc....Boinc....Boinc....Boinc.... |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
We're still showing historically high data rates (86 Mb/s out, 26 Mb/s in), so some retries are to be expected: Eric's change means that they now happen instantly instead of after 21 seconds. This isn't recovery, this is an old-fashioned shorty storm, aided and abetted by CUDA. |
Vistro Joined: 6 Aug 08 Posts: 233 Credit: 316,549 RAC: 0 |
My parents just (reluctantly) let me install SETI in the editing suite, but the server claims it has no jobs, and it won't look at CUDA. But I think that might be a reb00t thing. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
"Boy that upload server is copping a hammering." I think they have a new assistant helping select the next tapes: his name is Murphy. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
"Even now the transfers tab is loading up with retries again. Yet if I retry manually 25% will upload and again in a few minutes all transfers are done." The change improves uploads by quickly rejecting the ones that can't be serviced promptly, instead of paying the overhead of hanging on to them and hoping the server can get to them in time. This works because every time an upload succeeds, there is a little less traffic. Eventually, enough will get through that the traffic will drop to the point where every upload succeeds, and then life will be good. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
"Boy that upload server is copping a hammering." Either that, or a grad student with a typical student sense of humour. |
Heflin Joined: 22 Sep 99 Posts: 81 Credit: 640,242 RAC: 0 |
"It shows 50 or so in this backoff state. I highlight all of them and tell Boinc View to retry the uploads. 1 in 4 is uploaded, the rest INSTANTLY go into a deeper backoff." So you are the reason that others' upload requests are failing. The fail and backoff process is there for a reason. Manually forcing retries REPEATEDLY just hammers the servers, making it worse for everyone. As long as you have more work to process, chill out and let the backoff process work. SETI@home since 1999 "Set it, and Forget it!" |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
"It shows 50 or so in this backoff state. I highlight all of them and tell Boinc View to retry the uploads. 1 in 4 is uploaded, the rest INSTANTLY go into a deeper backoff." There are two "saving graces" for a case like this: 1) There aren't enough of us worrying over this to hit the retry button over and over. 2) Eventually, all of his work will upload, and he'll be out of the way again. Edit: unfortunately, we have to accept human behavior for what it is. It isn't a violation of the rules, but it isn't exactly good form either. |
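
The disagreement in the thread over manual retries versus letting backoff do its job can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not BOINC's actual retry logic: the names `backoff_delay` and `simulate`, the tick-based server model, and every parameter value (200 clients, 5 accepted uploads per tick, a 64-tick cap) are hypothetical and chosen for illustration.

```python
import random


def backoff_delay(failures, base=1, cap=64, rng=random):
    """Ticks to wait after `failures` consecutive failed attempts (>= 1).

    Randomised exponential backoff (an assumed scheme, not BOINC's own):
    the window doubles with each failure up to `cap`, and the wait is
    drawn uniformly from 1..window.
    """
    window = min(cap, base * 2 ** (failures - 1))
    return rng.randint(1, window)


def simulate(clients, capacity, use_backoff, seed=0, max_ticks=100_000):
    """Return (ticks_to_drain, rejected_attempts) for a toy upload server.

    Each client has one result to upload. Per tick the server accepts at
    most `capacity` of the attempts it receives and rejects the rest.
    Without backoff, a rejected client retries on the very next tick,
    like a user pressing the retry button over and over.
    """
    rng = random.Random(seed)
    next_try = [0] * clients      # tick at which each client tries again
    failures = [0] * clients      # consecutive failures per client
    pending = set(range(clients))
    rejected = 0
    for tick in range(max_ticks):
        if not pending:
            return tick, rejected
        attempts = [c for c in sorted(pending) if next_try[c] <= tick]
        rng.shuffle(attempts)     # the server picks winners at random
        for c in attempts[:capacity]:
            pending.discard(c)
        for c in attempts[capacity:]:
            rejected += 1
            failures[c] += 1
            wait = backoff_delay(failures[c], rng=rng) if use_backoff else 1
            next_try[c] = tick + wait
    raise RuntimeError("backlog did not drain")


hammer = simulate(clients=200, capacity=5, use_backoff=False)
polite = simulate(clients=200, capacity=5, use_backoff=True)
print("retry every tick:", hammer)
print("with backoff:    ", polite)
```

In this model the backlog drains either way, because every accepted upload leaves a little less traffic behind it. The difference is in how many doomed attempts the server has to turn away along the way: retrying every tick maximises them, while backoff trades a longer drain time for far fewer rejected requests. A real server that spends resources on each rejected connection would feel that difference much more than this toy one does.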