Lunchtime Review (Jun 02 2008)

Message boards : Technical News : Lunchtime Review (Jun 02 2008)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 762156 - Posted: 2 Jun 2008, 18:58:32 UTC

Early Sunday morning I discovered the assimilators were all failing. Immediate analysis uncovered zero smoking guns. All the assimilators were choking on the same subset of results, and all while inserting pulses. Plus the actual processes were seg-faulting before they could produce any useful error codes. Checking the failing result files and database entries showed nothing obvious (all different sizes, submitted at different times, created by different clients, etc.). I did all I could do, then told the other guys (Bob, Jeff, Eric). Bob's checking the database now for any subtle weird behaviour (I found no obvious problems yesterday) and Jeff's recompiling the assimilator code (perhaps into a version that outputs useful error information). In the meantime, the assimilation queue grows, and our disk usage grows with it (as we haven't deleted anything in over a day). Sooner rather than later I'll have to stop the splitters to prevent storage disasters. I'll update this thread if we figure out what's up on that front.

The only other real gripe right now is that our data recorder system at Arecibo is only seeing one of two data drives. Not a tragedy - we can still record data but this will put additional strain on the operators down there until we figure out why.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 762156
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 762199 - Posted: 2 Jun 2008, 21:29:08 UTC

Got a weird error, myself... got
"6/2/2008 2:16:21 PM|SETI@home|Giving up on download of 07mr08ad.20495.8661.7.8.140: file not found"

on 8 of 18 WU's ... no recent problems with the client, (5.10.30) no problems with my I-net connection. (the other 10 downloaded OK...)

.

Hello, from Albany, CA!...
ID: 762199
JDWhale
Volunteer tester
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 762213 - Posted: 2 Jun 2008, 22:01:48 UTC - in response to Message 762199.  
Last modified: 2 Jun 2008, 22:34:07 UTC

Got a weird error, myself... got
"6/2/2008 2:16:21 PM|SETI@home|Giving up on download of 07mr08ad.20495.8661.7.8.140: file not found"

on 8 of 18 WU's ... no recent problems with the client, (5.10.30) no problems with my I-net connection. (the other 10 downloaded OK...)


Ditto...

About 50% of downloads are failing on my end... same message in log.

Here's a result


[Edit] Problem seems resolved. Downloads proceeded smoothly 'till daily quota was reached earlier than expected. I guess the failed DLs count against quota :-( [/edit]
ID: 762213
Swibby Bear

Joined: 1 Aug 01
Posts: 246
Credit: 7,945,093
RAC: 0
United States
Message 762215 - Posted: 2 Jun 2008, 22:03:49 UTC

I encountered the same problem as KWSN - roughly 50% of my downloads failed in exactly the same way. The rest seem to have downloaded fine.
ID: 762215
gomeyer
Volunteer tester

Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 762248 - Posted: 2 Jun 2008, 23:07:54 UTC
Last modified: 2 Jun 2008, 23:43:01 UTC

Not fixed yet. I have one machine that is failing 100% of downloads and another that did about 50% until it hit the server limit. Both quads and both hardwired.
[edit] Oops, the current failure is a different error message. "Temporarily Failed . . ." This will probably be OK once things are back up again. The previous failure was indeed the same as reported above. [/edit]

Update later. You're probably aware of this already, but just in case: all downloads are now stopped.
ID: 762248
BMaytum
Volunteer tester
Joined: 3 Apr 99
Posts: 104
Credit: 4,382,041
RAC: 2
United States
Message 762310 - Posted: 3 Jun 2008, 2:30:34 UTC - in response to Message 762248.  

.... Update later. You're probably aware of this already, but just in case: all downloads are now stopped.


I just downloaded 6 WUs, no problem.

Sabertooth Z77, i7-3770K@4.2GHz, GTX680, W8.1Pro x64
P5N32-E SLI, C2D E8400@3Ghz, GTX580, Win7SP1Pro x64 & PCLinuxOS2015 x64
ID: 762310
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 762359 - Posted: 3 Jun 2008, 6:36:35 UTC - in response to Message 762156.  
Last modified: 3 Jun 2008, 6:44:31 UTC

In the meantime, the assimilation queue grows, and our disk usage grows with it (as we haven't deleted anything in over a day) - sooner rather than later I'll have to stop the splitters to prevent storage disasters.
- Matt

If it were possible to show an outline of how full the storage drives are on the Server status page, would people find this information useful?
Thank you for your views on this.

Matt & team, good luck with sorting the storage issues. I'm sure you will get it sorted.

Cheers
Speedy
ID: 762359
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 762397 - Posted: 3 Jun 2008, 12:19:44 UTC - in response to Message 762156.  

... Plus the actual processes were seg-faulting before they could produce any useful error codes...
It might be worth mentioning, just in case it's relevant, the difficulty Crunch3r was having building Linux Science apps recently (on Fedora IIRC). My understanding, perhaps incorrect, is that symptoms included unexplained segmentation faults, and lots of dependency weirdness. I believe he solved that by switching distro ....

Jason


"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 762397
ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20291
Credit: 7,508,002
RAC: 20
United Kingdom
Message 762476 - Posted: 3 Jun 2008, 16:11:57 UTC - in response to Message 762397.  

... the difficulty Crunch3r was having building Linux Science apps recently (on Fedora IIRC). My understanding, perhaps incorrect, is that symptoms included unexplained segmentation faults, and lots of dependency weirdness. I believe he solved that by switching distro ....

Is not the code from Berkeley now based on Ubuntu?

Or is that only for the Boinc client itself?

Good luck,
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 762476
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 762508 - Posted: 3 Jun 2008, 21:53:49 UTC - in response to Message 762397.  
Last modified: 3 Jun 2008, 21:54:18 UTC

... Plus the actual processes were seg-faulting before they could produce any useful error codes...
It might be worth mentioning, just in case it's relevant, the difficulty Crunch3r was having building Linux Science apps recently (on Fedora IIRC). My understanding, perhaps incorrect, is that symptoms included unexplained segmentation faults, and lots of dependency weirdness. I believe he solved that by switching distro ....

Jason

Sorry, Jason, this time you might have understood something wrong. Crunch3r fixed the issues he had with the current BOINCapi by using an earlier version of those files.
_\|/_
U r s
ID: 762508
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 762642 - Posted: 4 Jun 2008, 2:05:21 UTC - in response to Message 762508.  
Last modified: 4 Jun 2008, 2:23:23 UTC

Sorry, Jason, this time you might have understood something wrong. Crunch3r fixed the issues he had with the current BOINCapi by using an earlier version of those files.
Ah okay, so all the chat about distros and libs was not relevant here. Thanks Urs for clearing that up.

Jason

[Edit: @Martin, I believe I heard that somewhere too, but we're porting code based on Macs, which in turn is based on, and updated from, older Berkeley sources so anything is possible.]
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 762642
