Panic Mode On (17) Server problems |
Forum : Number crunching : Panic Mode On (17) Server problems
| Author | Message |
|---|---|
|
New thread. | |
| ID: 908012 | | |
"Results ready to send" is 47 minutes old! Server status: Results ready to send 43,228 / 0 (as of 49m); Current result creation rate 14.2274/sec / 0.0315/sec (as of 6m); Results out in the field 3,992,415 / 114,054 (as of 49m). Still getting no WUs, and my laptop has run out of work. Also, "Results out in the field" is at 3,992,415; not many days ago it was over 4.5 million. | |
| ID: 908026 | | |
|
| |
| ID: 908027 | | |
|
Got 11 multibeam work units about an hour ago. | |
| ID: 908052 | | |
|
Back to no new work since I started this morning. I know I will get some, so I suppose I'll just have to sit and wait. | |
| ID: 908119 | | |
Back to no new work since I started this morning. I know I will get some, so I suppose I'll just have to sit and wait. . . . always the BEST solution ---> WAIT. Found it true in every case. | |
| ID: 908120 | | |
|
I have two CUDA tasks left to run before I head to work. Looks like the GPU will be quiet today. | |
| ID: 908156 | | |
|
Yes, I see one-third fewer tasks listed on each of my two computers. I figure it'll work itself out soon. If not, there's Einstein. | |
| ID: 908177 | | |
Back to no new work since I started this morning. I know I will get some, so I suppose I'll just have to sit and wait. Still no new work; that's 10 hours now, and I still have five to do. The Cricket graph, last time I looked, was up and down, spiking here and there. Server stats say over 90,000 still can't get any. Shall I change my account so that it just does MBs? | |
| ID: 908185 | | |
|
Waah Hoo! We're back... | |
| ID: 908210 | | |
Waah Hoo! We're back... My machines are all set for network access only between 6:00pm and 11:00pm (I'm in the same time zone as Berkeley), and they do manage to top off enough to keep crunching. | |
| ID: 908253 | | |
|
Apparently the problem hasn't been mitigated. Still no work. | |
| ID: 908255 | | |
|
| |
| ID: 908261 | | |
|
I got 12 MBs for my CPU but now I can't get any for my GPU. :( I know it was just bad timing but why did it have to ask for more 6.03s when I am completely out of 6.08s? I still had 6.03s and an AP V5 on my machine. | |
| ID: 908267 | | |
You can switch it on right now, if you attach to another project as it has been suggested you do since the day BOINC went live. | |
| ID: 908282 | | |
You can switch it on right now, if you attach to another project as it has been suggested you do since the day BOINC went live. No thanks. If you look at my account data/profile, you'll see I'm 100% SETIan, with no other BOINC project. | |
| ID: 908285 | | |
You can switch it on right now, if you attach to another project as it has been suggested you do since the day BOINC went live. And I submit that your insistence to participate in only one project is the reason why you are out of work, not that Seti@home is having server issues, which it often does. | |
| ID: 908286 | | |
And I submit that your insistence to participate in only one project is the reason why you are out of work, not that Seti@home is having server issues, which it often does. It depends on what kind of person you are. I only want to pay my electricity bill for SETI@home. If you'd like to pay your electricity bill just to heat your room, you can support this or that BOINC project. ;-) | |
| ID: 908288 | | |
|
Sutaru And I submit that your insistence to participate in only one project is the reason why you are out of work, not that Seti@home is having server issues, which it often does. Regards | |
| ID: 908294 | | |
And I submit that your insistence to participate in only one project is the reason why you are out of work, not that Seti@home is having server issues, which it often does. If you were to sign up with another project, say, World Community Grid, you could set its resource share at 1 and set Seti at 1000. Thus, for every 1000 hours of work you do on Seti, you'd do 1 hour of work on WCG. That will hardly make an impact on your overall Seti stats. Furthermore, it will keep your processors hot if Seti goes down for an extended period. If you are concerned about your GPU, then do the same thing with GPU Grid. It also wouldn't hurt to increase your cache from 5 days to 10 days. I ordinarily wouldn't care, except I feel that signing up for a backup project or two is more productive than making multiple posts in multiple threads on multiple forums about the exact same subject. ;-) | |
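The 1000:1 suggestion above relies on resource shares dividing crunching time proportionally among projects that have work. The sketch below uses a simplified long-run model (an assumption: BOINC's real client scheduler is more involved and this ignores short-term effects); the function `cpu_time_split` and its `available` parameter are hypothetical, for illustration only.

```python
def cpu_time_split(shares, total_hours, available=None):
    """Split crunching hours in proportion to resource share.

    Simplified model (assumption): projects not in `available` (e.g. out of
    work during an outage) get nothing, and their time goes to the rest.
    """
    active = {name: share for name, share in shares.items()
              if available is None or name in available}
    total_share = sum(active.values())
    return {name: total_hours * share / total_share
            for name, share in active.items()}


if __name__ == "__main__":
    shares = {"SETI@home": 1000, "WCG": 1}
    print(cpu_time_split(shares, 1001))                   # normal operation
    print(cpu_time_split(shares, 24, available={"WCG"}))  # SETI@home down for a day
```

In this model the backup project only takes a meaningful share of time when SETI@home has nothing to send.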
| ID: 908295 | | |
... I would like to do that, but it wasn't possible. I had a 10-day cache on the HDD for two GPUs. Then I installed the last two GPUs (4 GPUs in total), which brought it down to a 5-day cache [for about one month]. The preferences weren't changed, so the setting was still a 10-day cache. But the maximum reached was 5 days; it went down to 2 or 1 day, filled up again to the maximum of 5 days, went down again to 2 or 1 day, then back up to 5 days, and now it's down to 0 days. Every time I had a 5-day cache, Berkeley had server problems and the cache went down. I guess the PC has too much performance for SETI@home. ;-) ------------------------------------------------------------------ @ Pappa: Yes, of course I will be patient. I'm a SETI@home member and have learned this trait. ;-) | |
| ID: 908299 | | |
|
The good news is that the bandwidth hasn't maxed out at all after this outage . . . :) | |
| ID: 908341 | | |
You can switch it on right now, if you attach to another project as it has been suggested you do since the day BOINC went live. So? Consider crunching for SETI BETA. They have a slightly different mix of servers, and they aren't necessarily down when SETI "main" is down. | |
| ID: 908342 | | |
The good news is that the bandwidth hasn't maxed out at all after this outage . . . :) That's a good one! LOL | |
| ID: 908345 | | |
|
The bad news is . . . | |
| ID: 908346 | | |
That's OK, you haven't seen me when I get home at night and in the privacy of my house slip into a more comfortable shape! [Evil chuckle] and I thought it was just my cataracts playing up and fatigue when I've seen your outline wavering at 3am :-) | |
| ID: 908350 | | |
|
So global over-farming of WUs for RAC has caused a global famine (plus OC and optimized apps). | |
| ID: 908356 | | |
So global over farming of WU’s for RAC has caused a global famine. (plus OC and op apps). Seti is looking for more computing power, not less, my friend. They just have to get their server situation sorted...and they are working on that. | |
| ID: 908360 | | |
|
WTH lol, that was a strange stretch of nothing for several hours (Daemon History). | |
| ID: 908372 | | |
|
Well, I got a server-down message, as well as this: | |
| ID: 908374 | | |
|
Panic over here, just got 20 MB tasks. | |
| ID: 908377 | | |
Panic over here, just got 20 MB tasks. Got some myself now, so it seems to be coming back. | |
| ID: 908384 | | |
|
Why is this project so often down? They have Sun servers and ten years of experience, but this doesn't seem to be enough yet. | |
| ID: 908385 | | |
Why is this project so often down? They have Sun servers and ten years of experience, but this doesn't seem to be enough yet. Because in most environments, they'd have BRAND NEW Sun servers, all of them with all kinds of excess capacity. The Sun servers they had were mostly hand-me-downs from other organizations. They've largely been replaced by various Intel-based servers, some white-box, some prototypes that are "retired" by the manufacturers, some sent in as gifts by various people. They aren't the hardware that SETI@Home would have specified or bought; they're what showed up. A well-funded project would have more people. They'd have people whose only job is operating the servers. Given short staff and very little money, they're doing exceptionally well. I'll also point out (again) that we need to consider BOINC as a system, and that the BOINC client does not require 100% uptime at the servers. It will (in general) cache work and keep crunching. 90% uptime should be more than enough. | |
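Whether 90% uptime is enough depends on how the downtime is distributed relative to the cache. The toy simulation below assumes (hypothetically) a single-core host that tops its queue up to the full cache size every hour the server is up and crunches one hour of work per hour; `idle_hours` is an illustrative name, not part of BOINC.

```python
def idle_hours(server_up, cache_hours):
    """Count the hours a single-core host has nothing to crunch.

    Toy model (assumption): whenever the server is up, the queue is topped
    up to `cache_hours`; each hour one hour of queued work is consumed.
    """
    queue = 0
    idle = 0
    for up in server_up:
        if up:
            queue = cache_hours  # top off to the configured cache size
        if queue >= 1:
            queue -= 1           # crunch one hour of work
        else:
            idle += 1            # nothing left to crunch this hour
    return idle


if __name__ == "__main__":
    # Ten days at 90% uptime: 216 hours up, then a single 24-hour outage.
    schedule = [True] * 216 + [False] * 24
    print(idle_hours(schedule, 4 * 24))  # 4-day cache rides out the outage
    print(idle_hours(schedule, 12))      # 12-hour cache runs dry
```

With the same 90% uptime, one long outage hurts a small cache far more than many short ones do: in this model a 12-hour cache never goes idle if the downtime comes as one-hour gaps.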
| ID: 908388 | | |
|
Servers back down.....I'm sure they are sorting things again. | |
| ID: 908394 | | |
|
I'm not quite near panic yet. Main rig still has 200+ MBs and 2 APs on it. That ends up being about 3 days of work. Second rig has about 20 MBs and 1 AP on it. That's about 2 days. One Linux box only has 1 AP in its queue, so that's 2 days left remaining for that task. Second Linux box has quite a few MBs and at least one AP, so at least 2 days on that. | |
| ID: 908395 | | |
|
I was able to get a few MBs last night, but has anyone seen a new CUDA unit sent out? | |
| ID: 908397 | | |
|
Anyone noticed the new website code at play?... | |
| ID: 908398 | | |
Anyone noticed the new website code at play?...Other than our spiffy little flags? | |
| ID: 908402 | | |
|
I am now on the 24-hour wait. Luckily I have WCG to do, and I still have one to upload sometime tomorrow. | |
| ID: 908405 | | |
Anyone noticed the new website code at play?... Some weird effect of CSS on the main page, all black with blue letters before it loads correctly, you mean? | |
| ID: 908412 | | |
Yeahhh... that will go over well with enthusiasts like me who have been working hard to throw everything they've got at it. Somehow I don't think people would like SETI@home saying to them: "We want your help, but only so much. We're cutting you off at 10 CPUs." In fact, if SETI@home were to adopt such a policy, I would abandon the project. There are plenty of other worthy DC projects which would never do such a thing. | |
| ID: 908415 | | |
|
My rig has been trying to upload 100 tasks most of the day. | |
| ID: 908417 | | |
|
I see only about three hours of tasks left. It'll be Einstein or else I'll have to turn off my machines tonight. | |
| ID: 908424 | | |
|
I have 4 GPUs to feed, so it looks like Aqua for me, just to keep my room warm lol | |
| ID: 908426 | | |
|
Fixed! | |
| ID: 908436 | | |
|
Don't know what is fixed but server status is now showing 7 hrs. | |
| ID: 908437 | | |
|
Yep sorry, was seeing things. | |
| ID: 908444 | | |
I still think the math behind the quota needs to be revised. Sure, one bad task can still be -1, but a good task needs to be something like +1 or +2 instead of x2. It takes 100+ bad tasks to get down to a quota of 1, but only 8 good ones to get back to 100. That sounds like it defeats the purpose of the quota altogether. $0.02 [edit: and my panic is gone now. The one Linux box that only had an AP to crunch requested 86400 seconds of work (nice number there...) and got 20 MBs. Good to go now.] | |
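The asymmetry described above can be checked with a small sketch. It assumes a per-CPU daily quota capped at 100 (the figure used in this thread) and floored at 1, dropping by 1 per bad result and doubling per good one, and it compares the proposed +2 recovery per good result. The function names are hypothetical, and the real server rules may differ in detail, so the exact counts are illustrative only.

```python
MAX_QUOTA = 100  # per-CPU daily quota cap assumed from this thread
MIN_QUOTA = 1    # assumed floor


def after_bad(quota):
    """One bad result: quota drops by 1, never below the floor."""
    return max(MIN_QUOTA, quota - 1)


def after_good_doubling(quota):
    """One good result under the current scheme: quota doubles, up to the cap."""
    return min(MAX_QUOTA, quota * 2)


def after_good_additive(quota, step=2):
    """One good result under the proposed scheme: quota grows by `step`."""
    return min(MAX_QUOTA, quota + step)


def results_needed(start, update, target, limit=10_000):
    """Count how many results it takes for `update` to move `start` to `target`."""
    quota, count = start, 0
    while quota != target:
        if count >= limit:
            raise ValueError("target not reachable with this update rule")
        quota = update(quota)
        count += 1
    return count


if __name__ == "__main__":
    print("bad results, 100 -> 1:", results_needed(100, after_bad, 1))
    print("good results, 1 -> 100 (x2):", results_needed(1, after_good_doubling, 100))
    print("good results, 1 -> 100 (+2):", results_needed(1, after_good_additive, 100))
```

Under these assumptions it takes 99 bad results to throttle a host all the way down, 7 good ones to restore it with doubling, and 50 with a +2 step.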
| ID: 908445 | | |
Don't know what is fixed but server status is now showing 7 hrs. Before they took the website down and took the servers offline for maintenance, the status time was stuck at [As of 17 Jun 2009 12:20:10 UTC]; now at least it's updating its status every 10 minutes, even though the stats haven't been updated (yet). Work is getting through: I've got about 60 WUs downloaded and another 20 downloading. You might have to get some uploaded before you'll request more. Claggy | |
| ID: 908446 | | |
|
No point in having quotas if the scheduler gets things totally screwed. | |
| ID: 908447 | | |
I still think the math behind the quota needs to be revised. Sure, one bad task can still be -1, but a good task needs to be something like +1 or +2 instead of x2. Takes 100+ bad tasks to get down to a quota of 1, but it only takes 8 good ones to get back to 100. Sounds like it defeats the purpose of the quota altogether. It really depends on what you're trying to achieve. If the purpose is to keep a machine that returns consistently bad results from chewing through work, it's good enough. It'll take a while to throttle it down, so an occasional validator error or math bug won't hurt machines that are not broken. It also means a broken machine (or maybe a bug-fix to the application) can recover quite quickly once it's fixed. If you did away with the -1/x2 mechanism entirely, things would probably still be okay, there would just be a lot more reissues. | |
| ID: 908487 | | |
|
I was also thinking of proposing that a new user's quota be at 10, and go from there. That way they don't go and fill up their cache and then abandon ship, leaving all of those tasks to time out and be reissued. | |
| ID: 908510 | | |
I was also thinking of proposing that a new user's quota be at 10, and go from there. That way they don't go and fill up their cache and then abandon ship, leaving all of those tasks to time out and be reissued. I'd think even lower would be fine, maybe as low as two. Each validated WU would double-up, and they'd get to full quota fairly quickly. | |
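A quick way to see how fast a new host would ramp up under this proposal is sketched below. It assumes, as the post says, that each validated result doubles the quota, and that the cap is 100 per CPU per day (the figure used elsewhere in this thread); `validated_to_full_quota` is a hypothetical helper.

```python
def validated_to_full_quota(start, cap=100):
    """Validated results needed for a new host's quota to reach `cap`,
    assuming each validated result doubles the quota (capped at `cap`)."""
    if start < 1:
        raise ValueError("starting quota must be at least 1")
    quota, validated = start, 0
    while quota < cap:
        quota = min(cap, quota * 2)
        validated += 1
    return validated


if __name__ == "__main__":
    for start in (2, 10):
        print(start, "->", validated_to_full_quota(start), "validated results")
```

Under these assumptions a host starting at 2 reaches full quota after 6 validated results, and one starting at 10 after 4.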
| ID: 908520 | | |
|
:) PAUSE !

```
18/06/2009 00:00:36 Resuming network activity
18/06/2009 00:00:36 SETI@home Started upload of 01mr09ad.27455.4980.11.8.163_1_0
18/06/2009 00:00:36 SETI@home Started upload of 24fe09ac.31767.5389.3.8.130_0_0
18/06/2009 00:00:36 SETI@home Sending scheduler request: To fetch work.
18/06/2009 00:00:36 SETI@home Requesting new tasks
18/06/2009 00:00:41 SETI@home Scheduler request completed: got 0 new tasks
18/06/2009 00:00:41 SETI@home Message from server: (Project has no jobs available)
18/06/2009 00:00:44 SETI@home [error] Error reported by file upload server: can't open file
18/06/2009 00:00:44 SETI@home Temporarily failed upload of 24fe09ac.31767.5389.3.8.130_0_0: transient upload error
18/06/2009 00:00:44 SETI@home Backing off 1 min 0 sec on upload of 24fe09ac.31767.5389.3.8.130_0_0
18/06/2009 00:00:47 SETI@home [error] Error reported by file upload server: can't open file
18/06/2009 00:00:47 SETI@home Temporarily failed upload of 01mr09ad.27455.4980.11.8.163_1_0: transient upload error
18/06/2009 00:00:47 SETI@home Backing off 1 min 0 sec on upload of 01mr09ad.27455.4980.11.8.163_1_0
18/06/2009 00:01:44 SETI@home Started upload of 24fe09ac.31767.5389.3.8.130_0_0
18/06/2009 00:01:47 SETI@home Started upload of 01mr09ad.27455.4980.11.8.163_1_0
18/06/2009 00:01:56 SETI@home Sending scheduler request: To fetch work.
18/06/2009 00:01:56 SETI@home Requesting new tasks
18/06/2009 00:02:01 SETI@home Scheduler request completed: got 1 new tasks
18/06/2009 00:02:03 SETI@home Started download of 12mr09ac.29680.16841.4.8.223
18/06/2009 00:02:09 Project communication failed: attempting access to reference site
18/06/2009 00:02:09 SETI@home Temporarily failed upload of 01mr09ad.27455.4980.11.8.163_1_0: connect() failed
18/06/2009 00:02:09 SETI@home Backing off 1 min 12 sec on upload of 01mr09ad.27455.4980.11.8.163_1_0
18/06/2009 00:02:09 SETI@home Finished download of 12mr09ac.29680.16841.4.8.223
18/06/2009 00:02:11 BOINC can't access Internet - check network connection or proxy configuration.
18/06/2009 00:02:38 Project communication failed: attempting access to reference site
18/06/2009 00:02:38 SETI@home Temporarily failed upload of 24fe09ac.31767.5389.3.8.130_0_0: connect() failed
18/06/2009 00:02:38 SETI@home Backing off 1 min 0 sec on upload of 24fe09ac.31767.5389.3.8.130_0_0
18/06/2009 00:02:40 Internet access OK - project servers may be temporarily down.
18/06/2009 00:02:56 Suspending network activity - user request
```
| |
| ID: 908521 | | |
I was also thinking of proposing that a new user's quota be at 10, and go from there. That way they don't go and fill up their cache and then abandon ship, leaving all of those tasks to time out and be reissued. I've also thought that the initial DCF should be pretty high. That would keep inefficient machines from over-requesting work initially, and allow more work once things settle in. The obvious problem is the high projected time, but it should be possible to explain that (or show a "displayed duration" based on a "display-DCF"). | |
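The effect of a high initial duration correction factor (DCF) on a first work request can be sketched roughly as follows. This assumes runtime estimates scale linearly with DCF (estimated FLOPs divided by host speed, times DCF) and that the server sends whole tasks until the requested seconds are covered; the parameter names and the task/host numbers in the example are hypothetical.

```python
import math


def estimated_runtime(fpops_est, host_flops, dcf):
    """Estimated seconds per task: nominal runtime scaled by the DCF (assumed model)."""
    return fpops_est / host_flops * dcf


def tasks_to_fill(requested_seconds, runtime_estimate):
    """Whole tasks needed to cover the requested seconds of work."""
    return math.ceil(requested_seconds / runtime_estimate)


if __name__ == "__main__":
    one_day = 86_400  # seconds of work requested
    for dcf in (1.0, 4.0):
        est = estimated_runtime(3.6e13, 1e10, dcf)  # hypothetical task and host
        print(f"DCF {dcf}: {est:.0f} s/task, {tasks_to_fill(one_day, est)} tasks")
```

Here the host starting at DCF 4 gets a quarter as many tasks for the same one-day request; presumably the DCF would then come down as real runtimes are measured.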
| ID: 908524 | | |
:) PAUSE ! Looking at what is happening with uploads at the moment, suspending network for a few hours seems like the best option. Those of my uploads that are managing to make contact are getting to 100% but then failing to get the final "ack" from the database so it looks like a database access issue again. F. | |
| ID: 908527 | | |
:) PAUSE ! Yup, I've got the same problems with the "transient upload error". I'm sure it will sort itself out in due time. If not this week or year, but surely the next :-) | |
| ID: 908529 | | |
|
I was just looking at the cricket graph and there is still about 30mbit leftover (if you add inbound and outbound together, it shouldn't be more than 100mbit..in a theoretical sense), but none of my 40 pending uploads will go through. I don't think it's a lack of bandwidth, I think there's just too many requests happening for apache to handle all of them. | |
| ID: 908570 | | |
I was just looking at the cricket graph and there is still about 30mbit leftover (if you add inbound and outbound together, it shouldn't be more than 100mbit..in a theoretical sense), but none of my 40 pending uploads will go through. I don't think it's a lack of bandwidth, I think there's just too many requests happening for apache to handle all of them. I finally removed/aborted all 12 of my stuck uploads. Yes, I lost some 600 credits, but all my cores were out of jobs. Immediately thereafter BOINC downloaded 24 new WUs. So I don't think it's a matter of lack of bandwidth, but a lack of something else, database-wise. Sten-Arne | |
| ID: 908575 | | |
I still think the math behind the quota needs to be revised. Sure, one bad task can still be -1, but a good task needs to be something like +1 or +2 instead of x2. Takes 100+ bad tasks to get down to a quota of 1, but it only takes 8 good ones to get back to 100. Sounds like it defeats the purpose of the quota altogether. Wouldn't it cause less? | |
| ID: 908591 | | |
I was just looking at the cricket graph and there is still about 30mbit leftover (if you add inbound and outbound together, it shouldn't be more than 100mbit..in a theoretical sense), but none of my 40 pending uploads will go through. I don't think it's a lack of bandwidth, I think there's just too many requests happening for apache to handle all of them. That will help them out: download, wait 1 hr, then abort. Repeat until the problem is fixed. LOL | |
| ID: 908595 | | |
|
Does this mean all of our results are invalid? OH NOES! | |
| ID: 908600 | | |
|
I did notice the number of tapes left online is a reasonable number now. It is shrinking fast and we still haven't heard anything about new data being recorded/shipped, so unless there's a few more tapes at off-site storage, we could actually run out.. ::gasp:: | |
| ID: 908608 | | |
|
Cosmic_Ocean wrote: I did notice the number of tapes left online is a reasonable number now. It is shrinking fast and we still haven't heard anything about new data being recorded/shipped, so unless there's a few more tapes at off-site storage, we could actually run out.. ::gasp:: Well, if the current situation persists (WU upload problems and Validate errors for those unlucky WUs that got reported in), there would be many many reissues. | |
| ID: 908631 | | |
|
| |
| ID: 908636 | | |
I still think the math behind the quota needs to be revised. Sure, one bad task can still be -1, but a good task needs to be something like +1 or +2 instead of x2. Takes 100+ bad tasks to get down to a quota of 1, but it only takes 8 good ones to get back to 100. Sounds like it defeats the purpose of the quota altogether. Letting every broken host have 100 work units per CPU every day would mean more "broken" results returned every day, and more reissues. | |
| ID: 908643 | | |
|
Hi. | |
| ID: 908648 | | |
|
I now have about 300 6.08 Cuda trying to upload and about 30 mins left to crunch then this i7 is out of Seti work. | |
| ID: 908653 | | |
Can anybody tell me what is going on with SETI? There's still some sort of problem with the servers. I suspect it's a download problem- download traffic has dropped to lower than normal levels, even though there is plenty of work available to download. But there is a huge amount of inbound traffic, yet it's impossible to actually return any results because of that traffic. | |
| ID: 908676 | | |
|
Luckily I have only one to upload; unluckily it has been trying since yesterday, and now it's getting HTTP errors, which is usual after an outage. I have to wait another few hours before I can see if I get any work; yesterday my communication was deferred for 24 hours, which is OK. After it has checked for a while, I'll let you know if I get any, around 18:30 BST. | |
| ID: 908698 | | |
|
Well, I just shut the old P4 down because I can't upload, but that's OK. Now is the time to blow the dust bunnies out of it. | |
| ID: 908701 | | |
|
| |
| ID: 908702 | | |
|
I tried a few times to get work for the P4 but no go, so I just shut 'er down. | |
| ID: 908707 | | |
|
I know there are problems which are leading to an inability to upload or download work - thats fine, these things happen. My query revolves around the status page, the ones I check are | |
| ID: 908717 | | |
. . . *My* 0.02 cents . . . Wow, that's little! Where do they have hundredths of cents? ;-) | |
| ID: 908740 | | |
|
http://bluenorthernsoftware.com/scarecrow/sahstats/graphs.php?t=48 | |
| ID: 908741 | | |
|
. . . *My* 0.02 cents . . . eh Gundolf ;)) maybe THIS: $00.02 [cents] | |
| ID: 908743 | | |
You can switch it on right now, if you attach to another project as it has been suggested you do since the day BOINC went live. So? Hmm, if the servers were separated, maybe it would be a good idea. But now (maybe always) SETI@home Beta Test has the same problems as SETI@home. | |
| ID: 908754 | | |
|
I just added a new cruncher to my account. It sent out work requests twice for the 2-day cache that I have set and successfully downloaded 17 WUs without any trouble. So downloads don't seem to be an issue. However, my other crunchers have hundreds of completed WUs waiting for upload, and unless these get reported, no new work will be downloaded. I've suspended network activity on all of them for the time being. | |
| ID: 908755 | | |
|
FYI | |
| ID: 908774 | | |
|
The folks at Climate make this suggestion periodically as well -- makes sense for those doing single project work (which doesn't make all that much sense as a primary focus of the BOINC project was multi-project support so that project specific outages would be less bothersome). | |
| ID: 908786 | | |
Hmm.. if the servers would be separated.. maybe it would be a well idea.. The problem is that a lot of the storage is through the NAS box, or through mounts on various drives -- and those work best on a LAN. The interdependencies mean that you have more than one critical server for some functions. The best idea would be for SETI to find a great big pile of money (or a great big pile of brand new servers, with the right OS preloaded and driver issues all resolved). Then they could reduce the interdependencies. | |
| ID: 908787 | | |
The folks at Climate make this suggestion periodically as well -- makes sense for those doing single project work (which doesn't make all that much sense as a primary focus of the BOINC project was multi-project support so that project specific outages would be less bothersome). Please add your support to trac ticket #139. [Yes, two years old - but stranger things have happened recently] | |
| ID: 908794 | | |
|
I've been able to do some uploads now. Reporting goes well. I noticed on the server status page that they have put in a mb_splitter_try server on bambi. | |
| ID: 908798 | | |
FYI Hey Mike, your nice rig is also a topic in another thread: Technical News : Comedy (Jun 17 2009) - Message 908762 :-) BTW, my GPU cruncher is also offline: upload not possible -> no work request -> idle | |
| ID: 908813 | | |
|
Yes, I had posted it over there. Someone moved it here. | |
| ID: 908815 | | |
I've been able to do some uploads now. Reporting goes well. I noticed on the server status page that they have put in a mb_splitter_try server on bambi. Yeah, and the Splitter Status gives 18(!) channels in progress ;-) Can't believe that's right though. | |
| ID: 908816 | | |
Yes, I had posted it over there. Someone moved it here. If a mod moves a post, you get an email explaining why and which mod did it. I guess this must be enabled in your SETI@home preferences: "Is it OK for SETI@home and your team (if any) to email you?" yes. But don't post about mod activities in the forum. Email the mods if you feel it wasn't a good action. | |
| ID: 908819 | | |
Yes, I had posted it over there. Someone moved it here. If you don't have that enabled: Posting rule #8: "Do not post comments related to specific moderator actions. The moderators may be contacted at seti_moderators (the small sign) ssl . berkeley . edu" :-) | |
| ID: 908822 | | |
|
I have been getting the "may be down" message for the last two days. My work units are just sitting there saying uploading. Any help would get my thanks. | |
| ID: 908826 | | |
|
Luckily I had some downloads and the master file download because of this little problem | |
| ID: 908830 | | |
|
| |
| ID: 908833 | | |
... or a "test" version that doesn't do a "count" against the database. | |
| ID: 908839 | | |
The folks at Climate make this suggestion periodically as well -- makes sense for those doing single project work (which doesn't make all that much sense as a primary focus of the BOINC project was multi-project support so that project specific outages would be less bothersome). Thanks Richard - will get to that after I speak with Eric [Korpela] | |
| ID: 908846 | | |
|
LOTS of MB's coming in now. Even Cuda, which I haven't had for a couple of days. | |
| ID: 908855 | | |
|
I have somewhere around 100-150 WUs trying to upload and I keep getting the error message that the servers may be temporarily down. Yet, when I check the server status page, things look fine. What gives? | |
| ID: 908858 | | |
I have somewhere around 100-150 WUs trying to upload and I keep getting the error message that the servers may be temporarily down. Yet, when I check the server status page, things look fine. What gives? Check the sticky SERVER OUTAGE: 6/18/09 | |
| ID: 908860 | | |
Have been getting the message may be down for the last two days. My work units is just setting there saying uploading. any help would get my thanks. Please refer to the Server Outage Announcement | |
| ID: 908862 | | |
|
Yup - looks like someone took the cork out the bottle - they are pouring out, my backlog disappeared in a blink and got refilled. I suspect the only issue left now is the sheer number of updates and refills to be done to catch up may cause some delay, but the cavalry has definitely turned up :) | |
| ID: 908863 | | |
|
I had about 10 WUs upload, while 20 more finished and are trying to upload. I think one of the mechanisms of BOINC is that it will not request more work until all the uploads are done, correct? | |
| ID: 908866 | | |
Yup - looks like someone took the cork out the bottle - they are pouring out, my backlog disappeared in a blink and got refilled. I suspect the only issue left now is the sheer number of updates and refills to be done to catch up may cause some delay, but the cavalry has definitely turned up :) Yes indeed they seem to be pouring in. However, now we will soon face another problem: No more WU's to send out. Looking at the server status page, there are not that many files left to split, and bearing in mind that there is a problem down in Arecibo, with getting new files up to Berkeley, I predict that by Friday or maybe Saturday all files will be split and all WU's sent out. Up your caches folks take WU's while there are any :-) Sten-Arne | |
| ID: 908867 | | |
|
They're shorties but work is work... | |
| ID: 908869 | | |
|
| |
| ID: 908873 | | |
|
Now, if I could only get these 328 uploads to go through, then I could take advantage of the pipes being open again!! Only 15 MB - just about 3 hours - and no CUDA left in my 3-day cache :-( | |
| ID: 908877 | | |
I had about 10 WUs upload, while 20 more finished and are trying to upload. I think one of the mechanisms of BOINC is that it will not request more work until all the uploads are done, correct? Close. It doesn't quite wait until they've all gone: just until the number remaining is down to twice the number of CPU cores (yes, cpu - CUDA doesn't count here). That means I can see my four rigs requesting upwards of 2M seconds of work combined once all the uploads are done. That will be a lot. Some would regard it as polite if you left a few for the rest of us - and didn't clog up Matt's server with tasks you won't get round to for a couple of weeks! | |
| ID: 908884 | | |
Yes indeed they seem to be pouring in. I (unfortunately) have to agree with you. On my non-CUDA C2D machines I normally have the setting at 1 / 1.5. However, given this last issue at SETI, I've increased it to 1 / 5.0 for now (couldn't bear to change it to 10) until things stabilize again, then I will change it back. At least I got my machine at work going again, but I powered off my machine at home when this issue occurred (both units ran dry). Too bad I hit my limit (200/day for my C2D), but only 4 more hours until 00:00 UTC.... | |
| ID: 908899 | | |
|
I have my work cache set at 1 / 10, and I can't remember the last time I ran out of CUDA work. Some might say that hoarding 10 days of work is selfish, but my computers never have any problems crunching them. I would rather have a bit of a wait to get credits than run out of work. I'm just kicking back and waiting for things to straighten themselves out. | |
| ID: 908909 | | |
I have my work cache set at 1 / 10, and I can't remember the last time I ran out of CUDA work. I've got mine set to 0.1 / 4, and I've only run out of work a couple of times since Seti moved to the BOINC platform. | |
| ID: 909059 | | |
I had about 10 WUs upload, while 20 more finished and are trying to upload. I think one of the mechanisms of BOINC is that it will not request more work until all the uploads are done, correct? Well something definitely got kicked in the server closet. Just uploaded and reported 90 tasks. I tried three uploads one at a time and they went right through, so I selected all and hit "retry now" and they all cleared out in under a minute, then reported them all, requesting 1,109,803 seconds of work (just one rig requested half of the combined total I mentioned earlier..). Got no tasks. That's fine though, I still have a day or two of work left on this rig. And for the record, all of my rigs have always been set for 0.1/4. | |
| ID: 909062 | | |
|
Just got home from a 10 hour shift....... | |
| ID: 909064 | | |
|
| |
| ID: 909066 | | |
|
Between 0700 and 1400 UTC, my main cruncher has managed to get its work requests down to ~400,000 seconds. Only took 134 new MBs to go from 1.1M to 400k, but it is going down. Just wait until the two APs are gone. Those will account for a decent amount of work request by themselves. | |
| ID: 909160 | | |
|
I've cleared my upload, but can't report the 45 tasks on my 2X4. Tried looking at the Cricket graphs (don't trust the server status page) - they're not loading at all?! | |
| ID: 909167 | | |
|
Must have been a hiccup. Just cleared them. Still no Cricket graphs, though. | |
| ID: 909170 | | |
Must have been a hiccup. Just cleared them. Still no Cricket graphs, though. LOL...first Seti, now the Crickets. Although work requests are being met with sporadic success, the kitties have been hunting down enough to keep the crunchers going, so far. ____________ 4 kitties on a Seti mission...Meeeeeeooowwwrrrrr!!! The Genuine Kittyman..........accept no substitutes. | |
| ID: 909189 | | |
Must have been a hiccup. Just cleared them. Still no Cricket graphs, though. Remember that cricket is a tool for the Berkeley CNS department, and isn't exactly meant for us. They could have moved it, or stopped it, or whatever they please. ____________ | |
| ID: 909198 | | |
|
I have had BOINC running for several days, but after it finished a chunk of data . . . nothing is coming down the line. My iMac is idle most of the day, so it is free to work on SETI... why is SETI not sending data to me immediately? | |
| ID: 909205 | | |
Must have been a hiccup. Just cleared them. Still no Cricket graphs, though. I would hope that they do not mind us using it.... It is a very helpful tool for us too, and we have referenced it for years now. Maybe Eric or Matt could inquire. ____________ 4 kitties on a Seti mission...Meeeeeeooowwwrrrrr!!! The Genuine Kittyman..........accept no substitutes. | |
| ID: 909207 | | |
|
Any idea when/if we are ever going to get any sort of reliability again? Please no stuff about "no money" or "second hand servers" etc., please (I'm fed up of hearing it). | |
| ID: 909220 | | |
Any idea when/if we are ever going to get any sort of reliability again? Please no stuff about "no money" or "second hand servers" etc., please (I'm fed up of hearing it). There is no firm answer there, my friend. The project recently received a donation of some Intel servers, which they are working on bringing online, and that may help. Once the current excess of MB work is crunched up, there should be some more AP work available again, and that may help somewhat. Just rest assured that they are working on it every day to try to find solutions....just read Matt's technical notes for some background on what they are faced with from time to time and some of the things they are trying. ____________ 4 kitties on a Seti mission...Meeeeeeooowwwrrrrr!!! The Genuine Kittyman..........accept no substitutes. | |
| ID: 909224 | | |
Must have been a hiccup. Just cleared them. Still no Cricket graphs, though. I was just pointing out that they probably don't think of this as a "service" to any outside group, but as an internal measurement tool. They could make changes at any time, and probably not even realize that some of us watch it. ____________ | |
| ID: 909227 | | |
Any idea when/if we are ever going to get any sort of reliability again? Please no stuff about "no money" or "second hand servers" etc., please (I'm fed up of hearing it). Something less than 90% reliability is probably fine for BOINC. If the BOINC servers are down, the BOINC client will just retry. It's only an issue when there are extended problems. ... and even then, my two active crunchers have not run out of work during these bumpy times. P.S. I'm kind of tired of the pessimistic "are we ever going to have any reliability" or "why doesn't BOINC work" when work is flowing and credit is being granted, and improvements are (slowly) coming. ____________ | |
| ID: 909230 | | |
|
Nah, if you check the others available, you'll see they all stopped at the same time. Unless they changed all of them to a new position... http://fragment1.berkeley.edu/newcricket/ | |
| ID: 909231 | | |
Nah, if you check the others available, you'll see they all stopped at the same time. Unless they changed all of them to a new position... http://fragment1.berkeley.edu/newcricket/ When I see "newcricket" in the URL, it implies that there is an old cricket someplace, and maybe they moved "newcricket" to the old URL (and this is abandoned in place). It's probably easier to find the new graphs than it is to ask CNS what they did. ____________ | |
| ID: 909232 | | |
|
LOL had to laugh at this | |
| ID: 909234 | | |
LOL had to laugh at this LMAO ! ____________ SETI@Home Informational message -9 result_overflow. With a general disability of 80%, he makes a great effort for the community and to express himself; thank you for being understanding. | |
| ID: 909258 | | |
|
| |
| ID: 909293 | | |
|
do we all have a graphic like this one? and how can we see or go to it? | |
| ID: 909297 | | |
do we all have a graphic like this one? and how can we see or go to it? It's not my own graph.. ;-) It's the network traffic at Berkeley, from the router that handles traffic in and out. BTW, click on the pic (graph) for more info.. :-) ____________ U aren't, or U aren't happy in Ur team? seti.international ! :-) | |
| ID: 909298 | | |
|
Thanks for the info | |
| ID: 909300 | | |
do we all have a graphic like this one? and how can we see or go to it? Graphing bits per second is slightly more interesting because we don't know how many bits are in a packet (we can guess), but we know it is a 100 megabit wire. ____________ | |
| ID: 909312 | | |
|
| |
| ID: 909318 | | |
Between 0700 and 1400 UTC, my main cruncher has managed to get its work requests down to ~400,000 seconds. Only took 134 new MBs to go from 1.1M to 400k, but it is going down. Just wait until the two APs are gone. Those will account for a decent amount of work request by themselves. And then between 1400 UTC and 2245 UTC, another 130 MBs arrived, and the work requests have been satisfied. It stopped asking for more work since it caught up. ____________ Linux laptop uptime (844 days as of 2009-11-24) I was being vague to avoid being misleading | |
| ID: 909345 | | |
|
Is the validator mostly working? I want to start crunching without having tons of points discarded. | |
| ID: 909356 | | |
Is the validator mostly working? It was sorted out yesterday. ____________ Grant Darwin NT. | |
| ID: 909362 | | |
|
mb_splitter 12, 13 & 14 are disabled; any idea why? Too much load on the system maybe? | |
| ID: 909471 | | |
mb_splitter 12, 13 & 14 are disabled; any idea why? Too much load on the system maybe? Only six splitters are shown as running, but there are 19 channels in progress. I guess they've spread the load onto other servers, but haven't got round to displaying them on the status page yet. More important fish to fry first. | |
| ID: 909474 | | |
mb_splitter 12, 13 & 14 are disabled; any idea why? Too much load on the system maybe? From all appearances, having all of the splitters running is unusual. They seem to adjust the number of splitters based on demand -- so that work is split at or a little above the rate it is being assigned. ____________ | |
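Ned's observation can be written as a toy controller: run just enough splitters to produce work a little faster than the scheduler hands it out. Everything here is hypothetical (the function name, the 10% headroom, the per-splitter rate); the project's actual splitter management isn't described in this thread.

```python
import math


def splitters_needed(assign_rate, per_splitter_rate, available, headroom=1.1):
    """Toy controller: enough splitters to outpace assignment by `headroom`.

    assign_rate and per_splitter_rate are results/second; all parameters
    are illustrative guesses, not the project's real settings.
    """
    if per_splitter_rate <= 0:
        raise ValueError("per_splitter_rate must be positive")
    wanted = math.ceil(assign_rate * headroom / per_splitter_rate)
    return max(0, min(wanted, available))  # can't run more splitters than exist
```

A rule like this leaves some splitters idle whenever demand is low, which would be consistent with seeing a few of them marked as disabled on the status page.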
| ID: 909567 | | |
|
2 weeks ago there were 130+ tapes waiting to be split. I have checked three times today and it seems we are eating through 1 tape every 4-6 hours? The list is now down to 30 tapes and all of the data being split is from 2009 recordings, which leaves older 2008 recordings to do. Seems to me that we may be out of data sooner rather than later. | |
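A back-of-the-envelope check of that worry, using only the figures in the post and assuming the rate stays constant with no new tapes loaded:

$$
30\ \text{tapes} \times (4\ \text{to}\ 6)\ \tfrac{\text{h}}{\text{tape}} = 120\ \text{to}\ 180\ \text{h} \approx 5\ \text{to}\ 7.5\ \text{days}
$$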
| ID: 909719 | | |
2 weeks ago there were 130+ tapes waiting to be split. I have checked three times today and it seems we are eating through 1 tape every 4-6 hours? The list is now down to 30 tapes and all of the data being split is from 2009 recordings, which leaves older 2008 recordings to do. Seems to me that we may be out of data sooner rather than later. Last I heard there was a "field trip" down to Arecibo to install/upgrade one of the recorders, and they were supposed to be taking a look at the RAID controller in the box that does the recording to disk. Other than that, I haven't heard anything more. There may still be some tapes in off-site storage that can be pulled down, but if I remember one of Matt's posts correctly, he said there's a ton of data at that location, but it doesn't have any of the blanking information built in, so he has to get the software radar blanker working better to be able to split those tapes. Of course that only affects AP, but we don't want to load up 100+ tapes and turn the AP splitters off and end up with the inverse situation we're trying to come out of (MB fell behind AP, but it could easily go the other way unless both splitters are running simultaneously). ____________ Linux laptop uptime (844 days as of 2009-11-24) I was being vague to avoid being misleading | |
| ID: 909733 | | |
2 weeks ago there were 130+ tapes waiting to be split. I have checked three times today and it seems we are eating through 1 tape every 4-6 hours? The list is now down to 30 tapes and all of the data being split is from 2009 recordings, which leaves older 2008 recordings to do. Seems to me that we may be out of data sooner rather than later. The latest word was that the problem has shifted to getting the CDT data acquisition card working with the newer version of the OS they had to install to get the RAID working... The problem with radar noise was noted before AP; it tended to make a huge number of triplets and/or spikes in S@H Enhanced work. AP may actually be less impacted than Enhanced because it has built-in code to replace noisy sections with shaped random data. As far as what data remains to be split, on May 14 Matt said: 1225 of the 1787 archived data files are from 2008 or later, and of these 249 have yet to be split. So we got plenty of numbers to crunch until we get the data recorder working again. My estimate is they probably still have over 200 not yet pulled back from NERSC, and since they will provide both MB and AP data each should supply crunching for about half a day. So perhaps 100 days to the real panic? Joe | |
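Joe's closing estimate, written out with his own figures (over 200 files still at NERSC, each supplying roughly half a day of combined MB and AP crunching):

$$
200\ \text{files} \times 0.5\ \tfrac{\text{day}}{\text{file}} \approx 100\ \text{days}
$$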
| ID: 909864 | | |
Of course that only affects AP, but we don't want to load up 100+ tapes and turn the AP splitters off and end up with the inverse situation we're trying to come out of (MB fell behind AP, but it could easily go the other way unless both splitters are running simultaneously). The "situation we're trying to come out of" was caused by the scheduler assigning work out of proportion to what is split. Seems like that has been solved, and we're playing "catch up" with MB. They could add tapes with mixed work, but it won't really change the mix. If there are tapes in the archive that are suitable for MB but can't be split for AP, that's a different situation, and we'll just have to grin and bear it, because any time you try to "force" one or the other, you have to make up for it later. ____________ | |
| ID: 909875 | | |
|
Seems to be a download problem. Wrong size isn't one I've seen before. | |
| ID: 910623 | | |
Seems to be a download problem. Wrong size isn't one I've seen before. The system doesn't seem to have come up fully yet, after the weekly outage. There seems to be certain things not working yet. Take a cup of coffee, and/or a walk ( a long long walk) :-) Sten-Arne | |
| ID: 910626 | | |
Seems to be a download problem. Wrong size isn't one I've seen before. I'm getting them too. ____________ Calm Chaos Forum...Join Calm Chaos Now | |
| ID: 910627 | | |
|
Mine have uploaded ok. | |
| ID: 910631 | | |
Mine have uploaded ok. Yeah, but these were downloads | |
| ID: 910636 | | |
Mine have uploaded ok. True and they aren't ghosts either. Same number of units on the web as in the manager for the machine. I've plenty of work so I'm not in a panic. Just concerned if it got wide spread. A zero length might be a splitter problem. Oh uploads are working from the same machine so it isn't a communication issue. Anyone know how long BOINC will re-try before it finally gives up trying to fetch them? ____________ | |
| ID: 910652 | | |
|
Had a slew of downloads pending..They just went through one after the other.
Picked up some work from other projects; will try again this evening. Haven't looked at the crickets, probably the normal recovery logjam. Keep Crunchin'! :) ____________ "The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov | |
| ID: 910655 | | |
Had a slew of downloads pending..They just went through one after the other. Cricket graphs report:
Average bits in (for the day): Cur: 3.35 Mbits/sec
Average bits out (for the day): Cur: 9.31 Mbits/sec | |
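Since the link is the 100 megabit wire mentioned earlier in the thread, Cricket readings convert directly into link utilization. A trivial sketch, assuming 100 Mbit/s is the capacity that matters:

```python
LINK_MBITS = 100.0  # capacity of the Berkeley link, as stated earlier in the thread


def utilization(mbits_per_sec, link_mbits=LINK_MBITS):
    """Fraction of the link in use for a given Cricket reading in Mbit/s."""
    return mbits_per_sec / link_mbits
```

The "Cur" readings above work out to about 3.35% in and 9.31% out, so at that moment the wire itself was far from full.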
| ID: 910664 | | |
|
. . . Server status: scheduler process - anakin - Disabled | |
| ID: 910666 | | |
|
I got this; | |
| ID: 910676 | | |
I got this; That would have been a problem at your end (your computer, modem or ISP). Everything else though is a problem at Seti. The system started to come back up after the outage, then it fell over. Heavily. ____________ Grant Darwin NT. | |
| ID: 910677 | | |
|
Then why, when I had done nothing, did I get "connect failed" messages from then onwards, and then an upload? Weird, isn't it? | |
| ID: 910679 | | |
Then why, when I had done nothing, did I get "connect failed" messages from then onwards, and then an upload? Weird, isn't it? Not really. Whatever the glitch was with your computer/modem/ISP/whatever resolved itself before the next attempt to communicate with the project. Uploads are still working, but contacting the Scheduler isn't possible & any downloads presently allocated won't download. ____________ Grant Darwin NT. | |
| ID: 910682 | | |
|
What on Earth is wrong with Seti now? One of my machines is completely out of work - no tasks at all - and the other is not getting any new ones. This project seems to have caught a LOT of Gremlins lately. | |
| ID: 910683 | | |
Can't anybody fix this? PLEASE! It's normal for the servers to be down for a few hours of a Tuesday for maintenance. And it's not unusual for other problems to occur from time to time. That's why it's possible to maintain a cache. I've got a 4 day cache & have only run out of work a couple of times since Seti moved to BOINC (4, 5 years ago?). And even then the longest period of time without work was a few hours. So the servers not being up 24/7 isn't a problem that needs fixing. ____________ Grant Darwin NT. | |
| ID: 910684 | | |
|
No contact with the servers, no units available for download, and Matt's post in the Technical News "Long Outage June23 2009," has disappeared | |
| ID: 910696 | | |
|
I posted something here last night, and it has disappeared...things are really screwed up!!!! | |
| ID: 910699 | | |
|
I too posted something today, well two messages, and they have disappeared. Having trouble downloading again, mainly "connect failed", or files that were expected and arrived with zero bytes. | |
| ID: 910701 | | |
|
How can there be no work available? It just took me ages to get my work uploaded and now nothing is coming in. It's the first time I'm really out of work. | |
| ID: 910702 | | |
I posted something here last night, and it has disappeared...things are really screwed up!!!! Yes, some things got messed up! I found at least two threads where it has been impossible to open/read, but the failure has been reported already. Hope they can fix it soon. Cheers! ____________ International Year of Astronomy 2009 Año Internacional de la Astronomía 2009 | |
| ID: 910704 | | |
I too posted something today, well two messages, and they have disappeared. Having trouble downloading again, mainly "connect failed", or files that were expected and arrived with zero bytes. I had to shut down my 2nd (8-core) cruncher because no work is being sent. Ya know, I came back to SETI because I got an email, and built 2 octocore machines to process WUs...seems like a waste right now. Why can't these probs be fixed? Is the h/w too old and flaky? Is the s/w inadequate for the load? In particular, is MySQL strong enough? Any thoughts or ideas? Thanks! ____________ | |
| ID: 910707 | | |
|
Server status shows the replica is shut down. I just returned 16 WUs and the server says it has a lot of WUs now, when at 7:00 am EDT there were none. | |
| ID: 910715 | | |
|
We are getting download errors like those of last week. | |
| ID: 910720 | | |
|
Has anyone else noticed missing posts on the forums? I saw a few threads where posts appear to have disappeared. | |
| ID: 910728 | | |
|
Has anyone else noticed missing posts on the forums? I saw a few threads where posts appear to have disappeared. Yes Sir - All of Yesterday's Posts and PMs have Virtually gone away as well . . . ____________ BOINC Wiki . . .Science Status Page . . . | |
| ID: 910729 | | |
|
I have a different problem. I got assigned work, but they just refused to download properly. Anyone else seeing this wrong size error? Is the server sending an erroneous signal to BOINC that the download is complete while in reality nothing was transferred? 6/24/2009 10:34:03 PM SETI@home [error] File 06ap09ab.20174.20931.14.8.253 has wrong size: expected 375338, got 0 ____________ | |
| ID: 910738 | | |
|
I have 6 trying to download since last night, and I'm out of CUDA work. I have my cache set for 6 days; maybe I need to push it to the max. | |
| ID: 910742 | | |
I have a different problem. I got assigned work, but they just refused to download properly. Anyone else seeing this wrong size error? Is the server sending an erroneous signal to BOINC that the download is complete while in reality nothing was transferred? 06ap09ab.20174.20931.14.8.253 has wrong size: expected 375338, got 0 This means the server tried to send the file, but you did not receive any of it. BOINC will keep trying until you do receive it. EDIT: Fixed tags. ____________ Flying high with Team Sicituradastra. | |
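If your log has a lot of these, a small script can pick them out and flag the zero-byte ones (the task was assigned, but nothing arrived). This is a hypothetical helper, not part of BOINC; the regex assumes the message format quoted above.

```python
import re

# Assumed format: "... File <name> has wrong size: expected <n>, got <m>"
WRONG_SIZE = re.compile(r"File (\S+) has wrong size: expected (\d+), got (\d+)")


def check_download(log_line):
    """Return details of a wrong-size error, or None if the line isn't one."""
    match = WRONG_SIZE.search(log_line)
    if match is None:
        return None
    expected, got = int(match.group(2)), int(match.group(3))
    return {
        "file": match.group(1),
        "expected": expected,
        "got": got,
        "empty": got == 0,  # zero bytes: the server-side handoff failed entirely
    }
```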
| ID: 910743 | | |
|
yes, same problem here | |
| ID: 910744 | | |
I too posted something today, well two messages, and they have disappeared. Having trouble downloading again, mainly "connect failed", or files that were expected and arrived with zero bytes. There are two main issues with the back-end reliability: 1) Yes, a lot of the hardware is either hand-me-downs or prototypes being tested by the manufacturer; in either case that means special and careful attention when issues do arise, usually hardware issues. 2) Most of the software is made in-house for a specific purpose, and MySQL is probably being used more intensely than it was ever intended to be used. For either one of those two reasons, when problems arise, it takes a while to figure out what failed, and then to develop a solution to fix it by means of a band-aid, or in the best-case scenario, keep it from being a problem again. The only problem with fixing problems is that by fixing one problem, you often create 5 new ones. ____________ Linux laptop uptime (844 days as of 2009-11-24) I was being vague to avoid being misleading | |
| ID: 910763 | | |
|
Got 20 tasks and 4 of them downloaded. Hopefully it's a sign things are starting to clear up. | |
| ID: 910766 | | |
Has anyone else noticed missing posts on the forums? I saw a few threads where posts appear to have disappeared. I've seen a thread that has gone away. It is obvious that something crashed hard right after it came back up from the outage and they have had to pull the backup. Now it looks like it crashed again as there is no I/O on cricket. I know it will get fixed. Wishing them godspeed. ____________ | |
| ID: 910768 | | |
|
I cannot connect to the server. I'm receiving "project communication failed" error messages. What is happening? | |
| ID: 910772 | | |
I thought MySQL might be the problem - is there any chance someone could/would donate a more Enterprise-Oriented DB? SQLServer? Oracle? Other? Any corporate DB users/admins out there who have a feel for what might work better? ____________ | |
| ID: 910780 | | |
|
There seemed to be a few server problems earlier; none of my hosts were able to report tasks all morning. | |
| ID: 910791 | | |
|
BOINC Database Engine State # As of
Master database queries/second 696* 0m
Replica seconds behind master 62,807* 0m
[* seconds] Hmm.. maybe it has something to do with the DB that some posts/threads are not available? Also: "wrong size" error @ download, "no jobs available" @ work request. Also, did the Berkeley crew mod the forum? Now more functions (user-friendly BBCode buttons) are available in the post view.. ____________ U aren't, or U aren't happy in Ur team? seti.international ! :-) | |
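Converting the replica lag from the status table into hours:

$$
\frac{62{,}807\ \text{s}}{3600\ \text{s/h}} \approx 17.4\ \text{h}
$$

If the forum pages were being read from the replica (an assumption; the thread doesn't say), a lag of that size would fit posts from the previous day appearing to vanish.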
| ID: 910807 | | |
|
Much more user friendly | |
| ID: 910812 | | |
|
The two stuck from last night before crash are still stuck with a wrong size error | |
| ID: 910818 | | |
|
Does anyone know when the current issues are likely to be fixed? I am out of work units for the CPU. It is trying to download about 11 units but they fail to download, and even if they did, they would only take a few hours to process, as the CPU does 8 at once. | |
| ID: 910831 | | |
|
Does anyone know when the current issues are likely to be fixed? I am out of work units for the CPU. It is trying to download about 11 units but they fail to download, and even if they did, they would only take a few hours to process, as the CPU does 8 at once. | |
| ID: 910832 | | |
Much more user friendly How is something that doesn't seem to work a large percentage of the time (recently, anyway) user friendly in any way, shape or form???? ____________ | |
| ID: 910834 | | |
|
FYI: I just started up my second 8-core (which I had shut down before because of no work) and I "got" about 50 WUs. Unfortunately, they are not downloading to my machine; they show up in both the Transfers tab and the Tasks tab. | |
| ID: 910835 | | |
|
I'm getting this on my servers, single, dual and quad's. | |
| ID: 910839 | | |
The key word here is "errors" -- each time you have an error, your quota is reduced by 1. If you've intentionally "errored" 90 VLARs, then your daily quota is 10. Return one valid work unit, and it'll double to 20, etc. ____________ | |
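Ned's rule, written as a hypothetical model: an error costs one from the daily quota, a valid result doubles it. The cap of 100 is only inferred from his "90 errors leaves 10" example (a C2D was quoted earlier at 200/day, so the real cap likely scales with cores), and the floor of 1 is an assumption.

```python
def update_quota(quota, valid, max_quota=100, min_quota=1):
    """Hypothetical daily-quota rule: -1 per error, x2 per valid result.

    max_quota=100 is inferred from the 90-errors -> 10 example; min_quota=1
    is an assumption so a host can always earn its way back.
    """
    if valid:
        return min(quota * 2, max_quota)
    return max(quota - 1, min_quota)
```

Starting from 100, ninety errors leave 10, and the next valid result brings it to 20, which is the arithmetic Ned describes.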
| ID: 910840 | | |
Much more user friendly Follow the thread to the previous message. He's referring to the buttons to automatically add BBCODE tags. If you aren't referring to the buttons that add BBCODE tags, then you're responding to the wrong topic. Edit: I've said this many times, and this is a good time to repeat it. If you are referring to the servers being out of work, that isn't meant to be "user friendly" -- those transactions are between the BOINC client and the BOINC servers. SETI strongly suggests that you crunch for more than one project, so that during lean times here you will not run out. Alternately, you can shut down for a day. The work assigned (but not yet downloaded) will be here tomorrow -- or sooner. ... but I'd suggest another project, or a bigger cache, or both. ____________ | |
| ID: 910841 | | |
|
Thanks Ned, I was referring to the BUTTONS | |
| ID: 910845 | | |
Much more user friendly Actually, I was referring to the (apparent) fact that I am not getting tasks d/l that I have been (supposedly) handed by the servers, at least if the BOINC Tasks and Transfers tabs are correct. And I had already shut down this machine for most of a day, earlier. <snark/> And another project didn't invite me back, SETI did </snark> ____________ | |
| ID: 910855 | | |
Much more user friendly I know what you were referring to -- but if you review the thread you'll see that the post you quoted is about something different (and much more user friendly). As for work: I don't know what you're talking about, as I've been able to get work within the past hour. Could it be a setting? Could you have an inappropriate app_info.xml, maybe left over from before? Is it impatience? I don't know. What I do know is that a very small, dedicated staff is working very hard to keep things running smoothly, and being snarky doesn't make a difference either way. ____________ | |
| ID: 910858 | | |
Thanks Ned, I was referring to the BUTTONS Yep. I followed the thread back and it was IOTTMCO. The buttons are much more user friendly. I could be snarky and complain that they don't work if you block scripting, but that's my choice and not something wrong with the project. ____________ | |
| ID: 910860 | | |
Much more user friendly You are right about the original quote - my bad. As far as BOINC 6.6.36 goes, I have changed no settings (except queue length, from 2 days -> 5 days) for several days; it was working fine for a while after the Tuesday PM, then stopped actual d/l and u/l of Tasks. As I said before, I have about 50 tasks trying to d/l... EDIT: I know the staff is working hard, but it is frustrating for me (as well as others) to be caught in the midst of whatever-this-is. ____________ | |
| ID: 910867 | | |
Why be frustrated when the project is doing exactly what they have promised? They have always promised that there will be outages, and there will be time when there is no work. They've suggested running BOINC on computers that will be on anyway. BOINC can carry a large cache (to bridge outages) and BOINC handles multiple projects so that projects don't need 99.999% reliability. If you want to run computers just to crunch, then you are certainly welcome, but you do so at your own risk. If you're frustrated, it is because your expectation is not what the project promises to deliver. I know I'm not a big cruncher by any means, but I've always got work, and I turn my computers off and on based solely on my needs, not SETI or BOINC. Sorry if that isn't what you wanted to hear. ____________ | |
| ID: 910871 | | |
|
Me too. | |
| ID: 910876 | | |
Me too. I'm saying very much the opposite. ____________ | |
| ID: 910877 | | |
If you are still having upload/download problems at this time, stop and restart BOINC. I had two machines that were stuck in that loop... If it is a Service Mode/Protected install, make sure you stop the service. The other thing that could be affecting downloads is that Matt has added the other download server back into the mix, so DNS could be/is somewhat whacked out. * Shut down BOINC, open a command window and type: ipconfig /flushdns, then wait about 30 seconds and restart BOINC. Then force a manual update. Regards ____________ Please consider a Donation to the Seti Project. | |
| ID: 910878 | | |
eh Pappa - shouldn't you also type: ipconfig /registerdns AFTER the flushdns and BEFORE restarting . . . ____________ BOINC Wiki . . .Science Status Page . . . | |
| ID: 910908 | | |
eh Pappa - shouldn't you also type: ipconfig /registerdns AFTER the flushdns and BEFORE restarting . . . Only if you're running Active Directory. It has nothing to do with DNS in the real world. ____________ | |
| ID: 910912 | | |
IPCONFIG /REGISTERDNS is only needed if you want to update your Host A record to your DNS server, which has nothing to do with flushing your local DNS resolver cache. So no, you do not have to type IPCONFIG /REGISTERDNS each time you flush it. ____________ BOINC FAQ Service BOINC & Optimized SETI download repository | |
| ID: 910913 | | |
eh Pappa - shouldn't you also type: ipconfig /registerdns AFTER the flushdns and BEFORE restarting . . . I think I would argue this one with you, Ned. It's not for AD only, but for any server that supports DNS Dynamic Updates, which is mostly associated with local networks, and very often ignored by ISPs. The only reason it's ignored by your ISP is that your ISP actually communicates with your "modem"/router. This is why if you change "modems" or routers with your ISP, you usually have to force a reboot/reset to get it to work, because the ISP expects the same PHY address each time and the reset sends a sort of /REGISTERDNS to get things working again. All /REGISTERDNS does is force an update of your DNS Server's Host A record pertaining to your computer's address. /REGISTERDNS is mostly only useful if you change your IP address manually and you would like to update the record immediately without a reboot. It does have uses with DNS in the real world because it ensures that Host A records, which will usually get updated automatically, remain accurate. For instance, when you change the IP address of a member web server and want to update its Host A record on the primary DNS server on the network right away, so that it isn't inaccurate even for the up to 5 minutes it might otherwise take. ____________ BOINC FAQ Service BOINC & Optimized SETI download repository | |
| ID: 910915 | | |
The only reason it's ignored by your ISP is that your ISP actually communicates with your "modem"/router. When I pick up my home phone and dial the support number for "my" ISP, my office phone rings (home phone on line 2, office phone on line 1). If (wearing my ISP hat) it needs to be in DNS (like the A records for the upload/download servers) it needs a static IP. If it has a dynamic IP, it's a workstation on a LAN and probably shouldn't be visible beyond the LAN. I see lots of strange problems caused by "LAN" DNS and public DNS for a company trying to work in the same namespace. I have a number of customers who can't see their own web sites because their "LAN Consultant" put their domain name in Active Directory. BOINC won't care if your machine has an A record or not -- IPCONFIG /REGISTERDNS will not update the ssl.berkeley.EDU zone. ____________ | |
| ID: 910923 | | |
|
Pappa: | |
| ID: 910926 | | |
eh Pappa - shouldn't you also type: ipconfig /registerdns AFTER the flushdns and BEFORE restarting . . . I am sitting behind a Firewall/Router to Cable. I do not have an internal WINS or DNS servers active as I have not brought the Domain Controller back online. Generally "ipconfig /flushdns" is to blow out my "resolver cache" on "this machine." This fits with 99.x% of the users. If the above mentioned machines were active on my network then /registerdns would let WINS and the Others know something has happened and start a round of updates. The DNS Server would have pointers to Seti's DNS Servers (we won't go into that detail) ____________ Please consider a Donation to the Seti Project. | |
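What /flushdns does, as a toy model: the local resolver keeps each answer until its TTL runs out, so a changed server address can be missed until the entry expires or the cache is flushed. This illustrates the idea only; it is not how the Windows DNS Client service is implemented, and `ResolverCache` plus the injected `lookup`/`clock` functions are hypothetical.

```python
import time


class ResolverCache:
    """Toy TTL cache standing in for a machine's local DNS resolver cache."""

    def __init__(self, lookup, clock=time.monotonic):
        self._lookup = lookup  # hypothetical: name -> (address, ttl_seconds)
        self._clock = clock    # injectable so the behaviour can be tested
        self._entries = {}     # name -> (address, expires_at)

    def resolve(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and entry[1] > now:
            return entry[0]  # served from cache, even if the server has changed
        address, ttl = self._lookup(name)
        self._entries[name] = (address, now + ttl)
        return address

    def flush(self):
        """Rough analogue of `ipconfig /flushdns`: forget every cached answer."""
        self._entries.clear()
```

Until the TTL runs out, `resolve` keeps returning the old address; `flush` forces the next lookup to go back to the DNS server. It does nothing about a stale entry held further upstream (for example in the router), which is why power-cycling the router is also suggested in this thread.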
| ID: 910927 | | |
If it has a dynamic IP, it's a workstation on a LAN and probably shouldn't be visible beyond the LAN. I don't know about that. Strictly speaking maybe, but there are a handful of situations I can think of where servers without static IP addresses need to be accessed by other employees and therefore need to be visible in DNS. One scenario would be employee printer sharing. Another would be a local intranet web server where the DHCP or DNS server fails and is subsequently restored, and you have employees calling the internal help desk line complaining that they can't access resources. The easy and usual answer is to take a coffee break and it'll "magically" be there soon, but for more important employees, such as the guy who signs my paycheck, I tend to use a few extra tools such as /REGISTERDNS to get it up and running quickly for him. I see lots of strange problems caused by "LAN" DNS and public DNS for a company trying to work in the same namespace. I have a number of customers who can't see their own web sites because their "LAN Consultant" put their domain name in Active Directory. Sounds like their "LAN Consultant" didn't research everything thoroughly enough. BOINC won't care if your machine has an A record or not -- IPCONFIG /REGISTERDNS will not update the ssl.berkeley.EDU zone. Fully agreed. Using /REGISTERDNS does not somehow reload valid entries into the local resolver cache. They will get reloaded automatically through the normal lookup process (and hopefully not through a poisoned source!). ____________ BOINC FAQ Service BOINC & Optimized SETI download repository | |
| ID: 910931 | | |
Pappa: Try to reboot you machines. I had similar problems, but after a reboot they started to download the stuck WU's. Sten-Arne | |
| ID: 910937 | | |
If it has a dynamic IP, it's a workstation on a LAN and probably shouldn't be visible beyond the LAN. Shared printers should not be publicly available -- I should not be able to print to your printer (I definitely should not be able to try and print solid black on your printer, not that I'd even think of such a thing). If a workstation needs to access a shared resource, it should be on the same LAN, or it should be using some technology that extends the LAN to its location (e.g., a VPN). Anything else is too open to abuse. Your other examples are also LAN examples -- and while I guess I might allow that we're talking about the "real world" in those cases, it's a relatively small "real world" compared to the one that starts at the dot. ____________ | |
| ID: 910942 | | |
Pappa: The quick look shows that both machines are Vista Ultimate. I should ask how they are connected to the internet, the brand name of the router, the BOINC version, etc. Depending on "timing," if BOINC was stopped on both machines and you flushed DNS, it should have dumped the IPs that were held on the local machine. All I can say is that your router to the Internet has potentially cached something that is incorrect. If this is the case, then stop BOINC and run ipconfig /flushdns; then, before you start anything, go over to the router and unplug the power for 20 seconds. Plug the power back in and allow it to reconnect to the Internet. Then, on your machine, open a web browser and type in setiathome.berkeley.edu

This is overkill, but you could then, in the same CMD window, type ipconfig /displaydns and scroll until you find the record for Seti, which would look something like this:

setiweb.ssl.berkeley.edu
----------------------------------------
Record Name . . . . . : setiweb.ssl.berkeley.edu
Record Type . . . . . : 5
Time To Live . . . . : 277
Data Length . . . . . : 4
Section . . . . . . . : Answer
CNAME Record . . . . : setiathome.ssl.berkeley.edu

Then start BOINC and press the Update button. It knows the IP address for the Seti DNS servers.

setiboinc.ssl.berkeley.edu
----------------------------------------
Record Name . . . . . : setiboinc.ssl.berkeley.edu
Record Type . . . . . : 1
Time To Live . . . . : 288
Data Length . . . . . : 4
Section . . . . . . . : Answer
A (Host) Record . . . : 208.68.240.20

Last but not least, restart your computers and then unplug the router for 20 seconds. Then start the transfer retries. If you still get something like this...

6/24/2009 4:10:00 PM SETI@home Beta Test [error] File 03ja09ad.8874.7434.8.11.71 has wrong size: expected 375329, got 0

...then the system is still overloaded and having handoff errors from the scheduler to the download server(s). ____________ Please consider a Donation to the Seti Project. | |
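If you want to check a long /displaydns dump mechanically rather than by scrolling, something like the Python sketch below could pull out each record. It assumes the English-language layout shown in the sample in this post (a label padded with dots, then " : ", then the value); other Windows versions or locales may label fields differently, so treat `parse_displaydns` and `summarize` as hypothetical helpers. The numeric types are the standard DNS codes: 1 is an A record and 5 is a CNAME.

```python
from typing import Dict, List

# Standard DNS resource-record type codes (RFC 1035): 1 = A, 5 = CNAME
RECORD_TYPES = {1: "A", 5: "CNAME"}


def parse_displaydns(text: str) -> List[Dict[str, str]]:
    """Split `ipconfig /displaydns`-style text into one dict per record (assumed layout)."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if " : " not in line:
            continue  # skip host-name headers, dashed separators and blank lines
        key, value = line.split(" : ", 1)
        key = key.rstrip(" .")  # "Record Name . . . . ." -> "Record Name"
        if key == "Record Name" and current:
            records.append(current)  # a new record starts here
            current = {}
        current[key] = value.strip()
    if current:
        records.append(current)
    return records


def summarize(record: Dict[str, str]) -> str:
    """One-line view: name, type, and the answer field."""
    code = record["Record Type"]
    rtype = RECORD_TYPES.get(int(code), "type " + code)
    # The answer's label varies by type, e.g. "A (Host) Record" or "CNAME Record"
    answer = next((v for k, v in record.items() if k.endswith("Record")), "?")
    return f"{record['Record Name']} {rtype} {answer}"
```

For the setiboinc entry in the post, `summarize` would produce a line naming the host, type A, and 208.68.240.20.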
| ID: 910945 | | |
| ID: 910956 | | |
| ID: 910960 | | |
This is where a 10-day cache is a killer. What I suspect in your case is that your ISP is very slow in updating DNS. I will have to think about how we got around this before. Matt had set a very short TTL, and there were many ISPs that did not honor it. When I do lookups on both download servers, I get the Berkeley IP (blocked) and the external IP that the scheduler would hand off. As I have sent a note to Matt, it would be best to be patient for a bit. ____________ Please consider a Donation to the Seti Project. | |
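The effect of resolvers that ignore a short TTL can be shown with a little arithmetic. In this Python sketch, `resolver_min_ttl` is a hypothetical floor that some caching resolver might impose; the 300-second and 3600-second values used in the usage note are illustrative, not measurements of any real ISP.

```python
def effective_ttl(record_ttl: int, resolver_min_ttl: int = 0) -> int:
    """TTL a caching resolver actually applies if it enforces a minimum (hypothetical knob)."""
    # A resolver that honors the record uses record_ttl; one with a floor
    # stretches short TTLs up to that floor.
    return max(record_ttl, resolver_min_ttl)


def still_serving_old_answer(seconds_since_cached: float, record_ttl: int,
                             resolver_min_ttl: int = 0) -> bool:
    """True while a resolver that cached the old address would keep handing it out."""
    return seconds_since_cached < effective_ttl(record_ttl, resolver_min_ttl)
```

With a 300-second record TTL, a resolver that honors it drops the old address after five minutes, while one enforcing a 3600-second floor keeps serving it for up to an hour, which is how a client can keep chasing a server address that has already changed.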
| ID: 910965 | | |
If it has a dynamic IP, it's a workstation on a LAN and probably shouldn't be visible beyond the LAN. The DNS system is used by many companies internally for many reasons. It isn't exclusive to the internet, even though it was originally designed for it. There are ways of segregating the internal DNS system from the external (or Internet) DNS system. Just because all the computers on my LAN are registered in my local DNS doesn't mean they are accessible through the internet. Some companies even run their own "root" servers if they do not want their employees using the internet; though that's quite rare these days, it's not unheard of. ____________ BOINC FAQ Service BOINC & Optimized SETI download repository | |
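Keeping internal names out of the public DNS is usually called split-horizon (or split-brain) DNS: the same server, or two separate servers, answer differently depending on where the query comes from. Here is a minimal Python sketch, assuming two hand-written views; the zone contents and the `resolve` helper are hypothetical placeholders (10.0.0.0/8 is private address space and 192.0.2.0/24 is reserved for documentation).

```python
from typing import Dict, Optional

# Hypothetical internal view: everything on the LAN is registered here
INTERNAL_VIEW: Dict[str, str] = {
    "printer1.corp.local": "10.0.0.25",
    "www.example.org": "10.0.0.80",   # inside clients reach the web server directly
}

# Hypothetical external view: only names meant to be public are published
EXTERNAL_VIEW: Dict[str, str] = {
    "www.example.org": "192.0.2.80",
}


def resolve(name: str, client_is_internal: bool) -> Optional[str]:
    """Answer from the view that matches where the query came from."""
    view = INTERNAL_VIEW if client_is_internal else EXTERNAL_VIEW
    return view.get(name)  # None: this view does not know the name
```

An outside query for the printer gets nothing back, while the public web name resolves in both views, just to different addresses.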
| ID: 910966 | | |
If it has a dynamic IP, it's a workstation on a LAN and probably shouldn't be visible beyond the LAN. setiboincdata.ssl.berkeley.EDU isn't on a corporate LAN, so the LAN discussion, while interesting, doesn't apply to "why can't I upload/download." ____________ | |
| ID: 910975 | | |
If it has a dynamic IP, it's a workstation on a LAN and probably shouldn't be visible beyond the LAN. Actually I Love this! For DNS to work properly in Windows AD, the person configuring it has to understand DNS and how DNS publishes to the world (infrastructure). As far as I know, that part is well documented in DNS and BIND. While Windows AD takes care of a "lot" of things, you still have to know how DNS works. For AD domains that also serve public web sites, it gets a little trickier. The common reason they are not visible is that AD DNS is not allowed to talk to the gateway (two-way). Pick your misconfigured router/firewall/DNS. If the primary name server is inaccessible, then it is DEAD! Yes, there are other, even bigger PhooPahsss or tricks... LOL ____________ Please consider a Donation to the Seti Project. | |
| ID: 910980 | | |
Well, I took the suggestion about shutting down my SETI machines, recycling my router, etc. I'm very pleased to report that it worked!!!! | |
| ID: 910983 | | |
This is where a 10-day cache is a killer. What I suspect in your case is that your ISP is very slow in updating DNS. I will have to think about how we got around this before. Matt had set a very short TTL, and there were many ISPs that did not honor it. Sorry.. I would if BOINC could.. BOINC V6.6.36 can't manage a work cache > ~ 3,500 WUs. [If you go over this size.. like I posted.. BOINC goes crazy.] http://setiathome.berkeley.edu/forum_thread.php?id=54289 For some months I have posted here in the SETI@home forum, and also in the BOINC/dev forum, that BOINC versions > V6.6.11 can't manage a cache > ~ 2,500 WUs. CUDA performance got very bad.. I guess because in higher BOINC versions the CUDA CPU support is very bad. [CPU support went very much down compared with BOINC V6.6.11.] So.. I'm down to 2 days.. set 3 days. [currently ~ 1,600 WUs on the HDD]. But BOINC can't fill up the cache because of download errors. Yes.. it's really disappointing that BOINC can't manage a large WU cache. Yes.. sorry if it sounds negative.. But.. I'm really disappointed that BOINC can't support a large.. in my case.. ~ 8,000 WU cache.. ~ 10 days. Before, with my old QX6700.. 10 days.. ~ 700 WUs.. everything was fine. But now every day I hope my PC will not run out of work. Ohh.. well.. Yes.. my GPU cruncher has too much performance for the SETI@home server. ____________ U aren't, or U aren't happy in Ur team? seti.international ! :-) | |
| ID: 910986 | | |
Actually it does, to an extent. While I cannot remember if there are 6 or 8 name servers involved in this whole fiasco... It means BOINC was able to find the 208.xxx.xxx.xxx address, not the 128.xxx.xxx.xxx one, which is the Berkeley LAN (web side). As I recall, setiboincdata.ssl.berkeley.EDU is the "returned" scheduler DNS name. So, unlike the lookups for bruno or vader (the two download servers), which return the proper 208.xxx.xxx.xxx addresses, a failure here would mean that your computer could not find the machine, or that the handoff from the scheduler did not give a resolvable name or IP of the correct server to contact for the actual download. ____________ Please consider a Donation to the Seti Project. | |
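The rule of thumb in this post (a 208.x.x.x answer is the outside address BOINC needs, a 128.x.x.x answer is the Berkeley LAN/web side) is easy to turn into a quick check. The Python sketch below only mirrors that first-octet shorthand; `classify_seti_address` is a hypothetical helper, and real network allocations are defined by prefixes, not by the first octet alone.

```python
import ipaddress


def classify_seti_address(address: str) -> str:
    """Label an IPv4 address using the thread's first-octet rule of thumb."""
    # Raises ValueError if the string is not a valid IPv4 address
    first_octet = ipaddress.IPv4Address(address).packed[0]
    if first_octet == 208:
        return "external"  # outside-facing 208.x.x.x address
    if first_octet == 128:
        return "campus"    # 128.x.x.x, the Berkeley LAN (web) side per the post
    return "unknown"
```

Feed it whatever address your resolver returns, for example the result of Python's `socket.gethostbyname()` for the host name in question.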
| ID: 911004 | | |
Those who had the same problems have solved the issue with a simple reboot of their machines. So, have you rebooted your troubled PCs? It's a DNS issue coming from the fact that the project shut down one of the download servers and then had to turn it on again. REBOOT MAN, REBOOT, AND STOP COMPLAINING :-) | |
| ID: 911010 | | |
If it has a dynamic IP, it's a workstation on a LAN and probably shouldn't be visible beyond the LAN. I had already agreed that this basic premise is true. I was just commenting on your statement that /REGISTERDNS is only for Windows AD and that it isn't used in the real world. This comment, of course, took the conversation off topic, but I felt it was worth pointing out that there are real-world uses for /REGISTERDNS and the DNS system other than its applications on the Internet. ____________ BOINC FAQ Service BOINC & Optimized SETI download repository | |
| ID: 911012 | | |
Those who had the same problems have solved the issue with a simple reboot of their machines. So, have you rebooted your troubled PCs? Message 910956 ..yes man.. I did the reboot.. ;-) But maybe I also need to 'reboot' [ON/OFF] the DSL router? But.. that will not help if my ISP is too slow.. to 'see' the new IP address of Berkeley.. or whatever.. ____________ U aren't, or U aren't happy in Ur team? seti.international ! :-) | |
| ID: 911017 | | |
First, BOINC is a work in progress! Yes, there can and will be problems... This is where a 10-day cache is a killer. What I suspect in your case is that your ISP is very slow in updating DNS. I will have to think about how we got around this before. Matt had set a very short TTL, and there were many ISPs that did not honor it. ____________ Please consider a Donation to the Seti Project. | |
| ID: 911021 | | |
Those who had the same problems have solved the issue with a simple reboot of their machines. So, have you rebooted your troubled PCs? Reboot everything you've got to reboot :-) | |
| ID: 911022 | | |
Actually I Love this! Actually, the whole reason for having secondary name servers is so that when the primary is down, everything else can potentially still work, and in practice, that's true. Let's say you have a registered domain "pappa.org" -- the simple solution is that www.pappa.org is handled in the traditional way as documented in DNS and BIND (and others), and your Active Directory domain would be pappa.local. That way, you don't have any confusion between the ".local" namespace and the globally resolvable ".org" namespace. ... and the .org name servers are public. I should not be able to reach in from the globally routable IP space and query anything in "pappa.local" because that would let me build a complete list of resources on the domain. ____________ | |
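The point about secondaries can be sketched as a client that asks each authoritative server in turn and accepts the first one that answers. This is a simplification under stated assumptions: `query_with_failover`, `make_server` and `ServerDown` are hypothetical, the servers are simulated in-process, and real resolvers choose among the listed name servers (often by measured response time) rather than strictly in order.

```python
from typing import Callable, Dict, List, Optional


class ServerDown(Exception):
    """Raised by a simulated name server that does not answer at all."""


NameServer = Callable[[str], Optional[str]]


def make_server(zone: Dict[str, str], up: bool = True) -> NameServer:
    """Build a fake authoritative server holding a copy of `zone`."""
    def server(name: str) -> Optional[str]:
        if not up:
            raise ServerDown(name)
        # None means "authoritatively no such name", which is still an answer
        return zone.get(name)
    return server


def query_with_failover(name: str, servers: List[NameServer]) -> Optional[str]:
    """Try each server in order; the first one that responds settles the query."""
    for server in servers:
        try:
            return server(name)
        except ServerDown:
            continue  # no response: move on to the next (secondary) server
    raise ServerDown(f"no name server answered for {name}")
```

With the primary down, a secondary holding a copy of the zone still answers, so the name keeps resolving; only when every listed server is unreachable does the lookup fail.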
| ID: 911032 | | |
Actually I Love this! In some cases a "limited DNS server or the appliance is placed on the edge for Global Names." It does not replicate anything *.local. It also presumes that, besides the A record for pappa.org (and the route), I have a CNAME record for www.pappa.org that resolves. Anything inside my boundary will not resolve (except locally). I know this is boring the living daylights out of most users, as they do not know what is being talked about "other" than what is currently affecting them. It is all majik (incorrectly mislabeled as the Internet to Seti)! So for Seti it is a mixed bag. As long as the 208.xxx.xxx.xxx IPs are findable worldwide from BOINC, we should be okay (with time). When a change is made, it takes time to get the rest of the world in sync. Sorry to cut and run, have to go pick up the wife. ____________ Please consider a Donation to the Seti Project. | |
| ID: 911059 | | |
In some cases a "limited DNS server or the appliance is placed on the edge for Global Names." It does not replicate anything *.local. It also presumes that, besides the A record for pappa.org (and the route), I have a CNAME record for www.pappa.org that resolves. Anything inside my boundary will not resolve (except locally). I know this is boring the living daylights out of most users, as they do not know what is being talked about "other" than what is currently affecting them. It is all majik (incorrectly mislabeled as the Internet to Seti)! So for Seti it is a mixed bag. As long as the 208.xxx.xxx.xxx IPs are findable worldwide from BOINC, we should be okay (with time). When a change is made, it takes time to get the rest of the world in sync. Sorry to cut and run, have to go pick up the wife. For names like setiboincdata.ssl.berkeley.EDU, the TTL is five minutes. The "refresh" interval is 1 hour. No matter what combination of devices is used, when they make a change to an A record, old data should be completely gone in 65 minutes, tops. That assumes that a secondary updated just before the change (and can continue to answer for another 60 minutes) and that the "worst case" machine consistently hits the last secondary to update -- plus the five minutes for local caching. ____________ | |
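The 65-minute bound in this post is simply the secondary's refresh interval plus one TTL of client caching. As a small Python function under the same assumptions as the post (it ignores SOA retry/expire timers, NOTIFY-triggered early updates, and resolvers that don't honor the TTL); `worst_case_stale_minutes` is a hypothetical name:

```python
def worst_case_stale_minutes(refresh_minutes: float, record_ttl_minutes: float) -> float:
    """Upper bound on how long an old A record can still be seen after a change."""
    # Worst case: a secondary refreshed just before the change, so it keeps serving
    # the old record for a full refresh interval; a client that queries it at the
    # last moment then caches that old answer for one more TTL.
    return refresh_minutes + record_ttl_minutes
```

`worst_case_stale_minutes(60, 5)` gives 65, matching the figure in the post.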
| ID: 911083 | | |
Remember to Buckle your seatbelt......it makes it harder for the aliens to suck you out of the car!!! | |
| ID: 911095 | | |
I am still getting the "expected [number], got 0" error and also connect failed. Is it because the servers are getting maxed out and I have to wait until they can send me my downloads? I also got a couple of HTTP errors, which I assume are the same as connect failed. I can upload but not download my last few. Luckily I am doing other BOINC projects, so hopefully it will sort itself out again. | |
| ID: 911100 | | |
I've got about 70+ work units queued up to download. Only 2 of them have done so; all the rest give connect() failed messages. | |
| ID: 911102 | | |
Yeah, all of my rigs have downloads waiting to happen, and I looked at the cricket graph and we're not maxed out..but I do notice that the replica is offline, though that has nothing to do with WU downloads. | |
| ID: 911105 | | |
I had problems on a couple of my systems getting the downloads to start earlier, even after a reboot. So I did an ipconfig /flushdns from the command prompt and that cleared them. | |
| ID: 911108 | | |
I did an ipconfig /flushdns from the command prompt and that cleared them. Just gave that a go, no joy. I'll go back to the "See what happens overnight" method. ____________ Grant Darwin NT. | |
| ID: 911121 | | |
| Copyright © 2009 University of California |