Extended outage Jul 20 2010 - problems

Message boards : Number crunching : Extended outage Jul 20 2010 - problems

Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 1018032 - Posted: 21 Jul 2010, 1:36:24 UTC

Hi

If there are problems that appeared as a result of the early start time, or other things that have not been talked about, please post them here. If you have a major suggestion with a bit of reasoning, you are welcome to post in Outage Observations/Comments - Read the Rules Below. Actually, what I digest and have fed back to the Seti staff is getting looked at. The hard part is the balance between what needs to be accomplished and what you the "users" expect.

As a Moderator, I will sticky this thread until Friday when the outage is released. This becomes the location to post observations/issues/suggestions.

Otherwise for a bit of comic relief, Panic Mode (xx version) is good (never trust anything you see there).

If you like to just relax, the Seti Cafe is Open.

For those that have something to say, the Political Forum is sometimes a bit much (flak jackets and fireproof pants are recommended).

Regards

Pappa

Please consider a Donation to the Seti Project.

ID: 1018032 · Report as offensive
Profile Blurf
Volunteer tester

Joined: 2 Sep 06
Posts: 8962
Credit: 12,678,685
RAC: 0
United States
Message 1018042 - Posted: 21 Jul 2010, 2:33:57 UTC

My Observation: It's seti's system--they can start up/shut down whenever they choose--just appreciate the updates.


ID: 1018042 · Report as offensive
Profile Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 1018048 - Posted: 21 Jul 2010, 3:20:44 UTC

We have a political forum but not a Sports forum??

And it's not the same thing!!

Boinc....Boinc....Boinc....Boinc....
ID: 1018048 · Report as offensive
Profile Uli
Volunteer tester
Joined: 6 Feb 00
Posts: 10923
Credit: 5,996,015
RAC: 1
Germany
Message 1018051 - Posted: 21 Jul 2010, 3:24:56 UTC
Last modified: 21 Jul 2010, 3:27:55 UTC

Good thought Geek@Play
Pluto will always be a planet to me.

Seti Ambassador
Not too late to order an Anni Shirt
ID: 1018051 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1018056 - Posted: 21 Jul 2010, 3:51:42 UTC

I don't mind the early start to the outage but I am a little worried about the problems that popped up just before they shut down. By that I mean this message...
7/20/2010 10:50:01 AM SETI@home [error] Error reported by file upload server: can't open file /home/boincadm/projects/sah/upload/3e5/06jn10aa.23528.19699.12.10.7_1_0: Read-only file system
and the Validate errors that started to appear. I know they have a script they can run for the validate errors but will we have to list them all or will they be able to do them themselves?
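
For anyone who wants to check a saved copy of their own message log for the same upload error, here is a minimal Python sketch. It assumes each line looks like the one quoted above (date, time, AM/PM, project name, a bracketed tag, then the text). The helper names `parse_line` and `is_read_only_upload_error` are made up for illustration and are not part of BOINC.

```python
import re
from datetime import datetime

# Assumed layout of one message line, based only on the line quoted above:
#   <date> <time> <AM/PM> <project> [<tag>] <text>
LINE_RE = re.compile(
    r"^(?P<stamp>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)\s+"
    r"(?P<project>\S+)\s+"
    r"\[(?P<tag>[^\]]+)\]\s+"
    r"(?P<text>.*)$"
)


def parse_line(line):
    """Split one message line into its parts, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "when": datetime.strptime(m.group("stamp"), "%m/%d/%Y %I:%M:%S %p"),
        "project": m.group("project"),
        "tag": m.group("tag"),
        "text": m.group("text"),
    }


def is_read_only_upload_error(entry):
    """True for an [error] entry from the upload server that ends in 'Read-only file system'."""
    return (
        entry is not None
        and entry["tag"] == "error"
        and "file upload server" in entry["text"]
        and entry["text"].endswith("Read-only file system")
    )


sample = (
    "7/20/2010 10:50:01 AM SETI@home [error] Error reported by file upload server: "
    "can't open file /home/boincadm/projects/sah/upload/3e5/06jn10aa.23528.19699.12.10.7_1_0: "
    "Read-only file system"
)
entry = parse_line(sample)
print(entry["when"], entry["project"], is_read_only_upload_error(entry))
```

Any line that does not match the assumed layout returns `None`, so messages in other formats are simply skipped.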
Another thing that would be nice is if they could find some way to handle all the ghosts we are getting. It is a pain to have to run down our caches and detach to clear them, but if we leave them they interfere with our getting new tasks when we come back from the outage. Also, do time outs count as errors for the daily quota?

Ok, that's all I can think of to bi....uhh complain about.



PROUD MEMBER OF Team Starfire World BOINC
ID: 1018056 · Report as offensive
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 1018073 - Posted: 21 Jul 2010, 5:01:15 UTC

I did get time to read a lot of the various threads. It would appear that a portion of the cause of all of the odd things that people saw, including the early shutdown, was the Boinc Database crashing.

I have not asked, so what I offer is speculation... So with Matt on Staycation I would guess that those left were very busy.

Regards

Please consider a Donation to the Seti Project.

ID: 1018073 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1018091 - Posted: 21 Jul 2010, 7:36:04 UTC - in response to Message 1018073.  

I did get time to read a lot of the various threads. It would appear that a portion of the cause of all of the odd things that people saw, including the early shutdown, was the Boinc Database crashing.

Pappa,

Whatever it was, it didn't feel like a database crash from here. First there was a network outage, cutting off uploads and web server access ('page not found'), and from the Cricket graph downloads too, though I can't confirm that from personal observation. None of those has anything to do with BOINC, and a database crash causes different symptoms.

Then there were file storage problems - the upload area reporting itself to be read-only, and the validator failing to find previously-uploaded and previously-accessible result files (as Joe has pointed out). Again, nothing to do with database access there.

The speculation I put in my PM to Jeff - and it is only speculation (no reply as yet), from 6,000 miles away, and to be taken with a pinch of salt - was a power surge or brownout which triggered some, but not all, of the mess of interconnected devices in the server closet to reboot themselves. So some machines carried on as normal - web server, upload server: they're on UPS, I think - but other devices, such as the big network storage unit, weren't ready for a while. That's what happens after a brownout on my little network at home: different devices have different susceptibilities.
ID: 1018091 · Report as offensive
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1018102 - Posted: 21 Jul 2010, 10:23:28 UTC - in response to Message 1018091.  
Last modified: 21 Jul 2010, 10:35:54 UTC

As I already posted in another thread, I have 40 MB tasks that are labelled Detached; they were uploaded on 13 July (the previous outage).
One host, my biggest, still appears to be dead, so I took the GTS250 out of that host and put it into another one in place of an 8500GT.
This host now has a GTS250.
Uploads are stuck again, but a look at the server status page shows that only the BOINC, MB and AP databases and the data-driven web pages are online and running; the rest, including upload/download, is disabled or not running.

Could it be the switch of the CUDA card that made them error out? Maybe 4 MB WUs, but 40 tasks?!
Or were they 'ghost' WUs, or did something go wrong during upload, which I doubt!
A little off topic :)
I hope to get some more tasks. The AP tasks at Beta are finished; most of them still have to be validated, but are OK. Only 1 AP errored; the rest were computed with BROOK+ OpenCL rev. 434 without any error.
ID: 1018102 · Report as offensive
divedude

Joined: 5 Jun 06
Posts: 9
Credit: 4,394,705
RAC: 0
United States
Message 1018560 - Posted: 23 Jul 2010, 1:52:06 UTC

I set my account to download and process 10 days of data, and only 1 of my systems is receiving all the work units. I have not received new WUs in several days. What is up?
ID: 1018560 · Report as offensive
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 1018591 - Posted: 23 Jul 2010, 4:44:35 UTC - in response to Message 1018560.  

I set my account to download and process 10 days of data, and only 1 of my systems is receiving all the work units. I have not received new WUs in several days. What is up?


Every week we have an extended outage, Tuesday to Friday. This is to work on server issues that cannot be completed in about 7 hours, and to start doing some science that has been put off for ages.

Friday, tomorrow afternoon... Everyone should be at the point where things start to flow again.

If I seem a bit "loopy," I have done about 700 miles of driving in the last three days. None of it was "pretty." Don't ask...

Regards

Please consider a Donation to the Seti Project.

ID: 1018591 · Report as offensive
Profile soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1018653 - Posted: 23 Jul 2010, 12:37:51 UTC

Okay.. I suppose the outage will start to come up soon, I have coffee brewing, and the mad rush to upload is on the horizon...

Are there any major changes we might want to watch out for?
Janice
ID: 1018653 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1018656 - Posted: 23 Jul 2010, 12:48:44 UTC - in response to Message 1018653.  

Okay.. I suppose the outage will start to come up soon, I have coffee brewing, and the mad rush to upload is on the horizon...

Are there any major changes we might want to watch out for?

None that I know of. Just watch out for new posts by Jeff Cobb or the team. Let's hope he's remembered to have a look at the upload server filesystem, and reset the validators, before going live.
ID: 1018656 · Report as offensive
Bernd Noessler

Joined: 15 Nov 09
Posts: 99
Credit: 52,635,434
RAC: 0
Germany
Message 1018657 - Posted: 23 Jul 2010, 12:55:29 UTC - in response to Message 1018653.  


But maybe we will see the .vlar tasks, if they updated the seti splitters.

ID: 1018657 · Report as offensive
Scarecrow

Joined: 15 Jul 00
Posts: 4520
Credit: 486,601
RAC: 0
United States
Message 1018660 - Posted: 23 Jul 2010, 13:15:12 UTC

Place your bets.

or
ID: 1018660 · Report as offensive
Profile Bryan Wallace
Volunteer tester

Joined: 28 Jul 00
Posts: 22
Credit: 52,559,173
RAC: 0
United States
Message 1018669 - Posted: 23 Jul 2010, 13:46:40 UTC

yeah, having 10 days' worth of data is highly recommended, especially for high-output systems. nothing is more frustrating than watching your i7 rig with a gtx 260 sitting idle for 3 days...

keep in mind, however, that if you have a slower computer without a gpu cruncher then you probably only need 2-3 days instead.
ID: 1018669 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1018737 - Posted: 23 Jul 2010, 16:30:37 UTC - in response to Message 1018657.  


But maybe we will see the .vlar tasks, if they updated the seti splitters.

So Jeff says.
ID: 1018737 · Report as offensive
Profile soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1018769 - Posted: 23 Jul 2010, 17:46:11 UTC

I have to say.. this has been one of the smoothest outages I have seen overall. Pending numbers seem to be going way up, I am not sure why, but other than the crash going into the outage, it seems to have brought few surprises.

In addition, the perception that feedback is being taken into consideration..
Well it goes a long way and it was evident.





Janice
ID: 1018769 · Report as offensive
Profile Helli_retiered
Volunteer tester
Joined: 15 Dec 99
Posts: 707
Credit: 108,785,585
RAC: 0
Germany
Message 1018821 - Posted: 23 Jul 2010, 19:11:37 UTC

Well, the MDB Queries are relatively high - near 1,200. And the Upload Rate is over 45 Mbits/sec - I haven't seen such a high value...

Helli

A loooong time ago: First Credits after SETI@home Restart
ID: 1018821 · Report as offensive



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.