Lunchtime Review (Jun 02 2008)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Early Sunday morning I discovered the assimilators were all failing. Immediate analysis uncovered zero smoking guns. All the assimilators were choking on the same subset of results, and all while inserting pulses. Plus the actual processes were seg-faulting before they could produce any useful error codes. Checking the failing result files and database entries showed nothing obvious (all different sizes, submitted at different times, created by different clients, etc.). I did all I could do, then told the other guys (Bob, Jeff, Eric) - Bob's checking the database now for any subtle weird behaviour (once again, I found no obvious problems yesterday) and Jeff's recompiling the assimilator code (perhaps a version that outputs useful error information). In the meantime, the assimilation queue grows, and our disk usage grows with it (as we haven't deleted anything in over a day) - sooner rather than later I'll have to stop the splitters to prevent storage disasters. I'll update this thread if we figure out what's up on that front. The only other real gripe right now is that our data recorder system at Arecibo is only seeing one of two data drives. Not a tragedy - we can still record data, but this will put additional strain on the operators down there until we figure out why. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
KWSN THE Holy Hand Grenade! Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
Got a weird error, myself... got "6/2/2008 2:16:21 PM|SETI@home|Giving up on download of 07mr08ad.20495.8661.7.8.140: file not found" on 8 of 18 WUs... no recent problems with the client (5.10.30), no problems with my Internet connection (the other 10 downloaded OK...). Hello, from Albany, CA!... |
JDWhale Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
"Got a weird error, myself... got" Ditto... About 50% of downloads are failing on my end... same message in log. Here's a result [Edit] Problem seems resolved. Downloads proceeded smoothly until the daily quota was reached earlier than expected. I guess the failed DLs count against quota :-( [/edit] |
Swibby Bear Joined: 1 Aug 01 Posts: 246 Credit: 7,945,093 RAC: 0 |
I encountered the same problem as KWSN - roughly 50% of my downloads failed in exactly the same way. The rest seem to have downloaded fine. |
gomeyer Joined: 21 May 99 Posts: 488 Credit: 50,370,425 RAC: 0 |
Not fixed yet. I have one machine that is failing 100% of downloads and another that failed about 50% until it hit the server limit. Both are quads and both are hardwired. [edit] Oops, the current failure is a different error message: "Temporarily Failed . . ." This will probably be OK once things are back up again. The previous failure was indeed the same as reported above. [/edit] Update later. You're probably aware of this already, but just in case: all downloads are now stopped. |
BMaytum Joined: 3 Apr 99 Posts: 104 Credit: 4,382,041 RAC: 2 |
".... Update later. You're probably aware of this already, but just in case; All downloads are now stopped." I just downloaded 6 WUs, no problem. Sabertooth Z77, i7-3770K@4.2GHz, GTX680, W8.1Pro x64 P5N32-E SLI, C2D E8400@3Ghz, GTX580, Win7SP1Pro x64 & PCLinuxOS2015 x64 |
Speedy Joined: 26 Jun 04 Posts: 1643 Credit: 12,921,799 RAC: 89 |
"In the meantime, the assimilation queue grows, and our disk usage grows with it (as we haven't deleted anything in over a day) - sooner rather than later I'll have to stop the splitters to prevent storage disasters." If it were possible to have an outline of how full the storage drives are on the Server status page, would people find this information useful? Thank you for your views on this. Matt & team, good luck with sorting out the storage issues. I'm sure you will get it sorted. Cheers, Speedy |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"... Plus the actual processes were seg-faulting before they could produce any useful error codes..." It might be worth mentioning, just in case it's relevant, the difficulty Crunch3r was having building Linux Science apps recently (on Fedora IIRC). My understanding, perhaps incorrect, is that symptoms included unexplained segmentation faults and lots of dependency weirdness. I believe he solved that by switching distro .... Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
ML1 Joined: 25 Nov 01 Posts: 20359 Credit: 7,508,002 RAC: 20 |
"... the difficulty Crunch3r was having building Linux Science apps recently (on Fedora IIRC). My understanding, perhaps incorrect, is that symptoms included unexplained segmentation faults, and lots of dependency weirdness. I believe he solved that by switching distro ...." Is not the code from Berkeley now based on Ubuntu? Or is that only for the BOINC client itself? Good luck, Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
Urs Echternacht Joined: 15 May 99 Posts: 692 Credit: 135,197,781 RAC: 211 |
"... Plus the actual processes were seg-faulting before they could produce any useful error codes...It might be worth mentioning, just in case it's relevant, the difficulty Crunch3r was having building Linux Science apps recently (on Fedora IIRC). My understanding, perhaps incorrect, is that symptoms included unexplained segmentation faults, and lots of dependency weirdness. I believe he solved that by switching distro ...." Sorry, Jason, this time you might have misunderstood something. Crunch3r fixed the issues he had with the current BOINCapi by using an earlier version of those files. _\|/_ U r s |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Sorry, Jason, this time you might have understand something wrong. Crunch3r fixed the issues he had with the current BOINCapi by using an earlier version of them files." Ah okay, so all the chat about distros and libs was not relevant here. Thanks, Urs, for clearing that up. Jason [Edit: @Martin, I believe I heard that somewhere too, but we're porting code based on Macs, which in turn is based on, and updated from, older Berkeley sources, so anything is possible.] "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.