Extended outage Jul 20 2010 - problems



Message boards : Number crunching : Extended outage Jul 20 2010 - problems

Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 1018032 - Posted: 21 Jul 2010, 1:36:24 UTC

Hi

If there are problems that appeared as a result of the early start time, or other things that have not been talked about, please post them here. If you have a major suggestion with a bit of reasoning, you are welcome to post in Outage Observations/Comments - Read the Rules Below. What I digest and feed back to the Seti staff is actually getting looked at. The hard part is balancing what needs to be accomplished against what you, the "users," expect.

As a Moderator, I will sticky this thread until Friday, when the outage ends. This becomes the place to post observations, issues, and suggestions.

Otherwise, for a bit of comic relief, Panic Mode (xx version) is good (never trust anything you see there).

If you like to just relax, the Seti Cafe is Open.

For those who have something to say, the Political Forum is sometimes a bit much (flak jackets and fireproof pants are recommended).

Regards

Pappa

____________
Please consider a Donation to the Seti Project.

Blurf
Volunteer tester
Joined: 2 Sep 06
Posts: 7618
Credit: 7,027,026
RAC: 814
United States
Message 1018042 - Posted: 21 Jul 2010, 2:33:57 UTC

My observation: it's Seti's system; they can start up and shut down whenever they choose. Just appreciate the updates.


Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,144,272
RAC: 279
United States
Message 1018048 - Posted: 21 Jul 2010, 3:20:44 UTC

We have a political forum but not a Sports forum??

And it's not the same thing!!

____________
Boinc....Boinc....Boinc....Boinc....

Uli (Project donor)
Volunteer tester
Joined: 6 Feb 00
Posts: 10024
Credit: 5,470,340
RAC: 115
Germany
Message 1018051 - Posted: 21 Jul 2010, 3:24:56 UTC
Last modified: 21 Jul 2010, 3:27:55 UTC

Good thought, Geek@Play.
____________
Pluto will always be a planet to me.
Order your 15th Seti Anniversary Shirt today. Just PM me for details.
Cash Donation Specialist

Seti Ambassador

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 16,212,255
RAC: 4,912
United States
Message 1018056 - Posted: 21 Jul 2010, 3:51:42 UTC

I don't mind the early start to the outage, but I am a little worried about the problems that popped up just before they shut down. By that I mean this message...

7/20/2010 10:50:01 AM SETI@home [error] Error reported by file upload server: can't open file /home/boincadm/projects/sah/upload/3e5/06jn10aa.23528.19699.12.10.7_1_0: Read-only file system
I also mean the validate errors that started to appear. I know they have a script they can run for the validate errors, but will we have to list them all, or will they be able to find them themselves?
Another thing: it would be nice if they could find some way to handle all the ghosts we are getting. It is a pain to have to run down our caches and detach to clear them, but if we leave them, they interfere with our getting new tasks when we come back from the outage. Also, do timeouts count as errors for the daily quota?

Ok, that's all I can think of to bi....uhh complain about.

____________


PROUD MEMBER OF Team Starfire World BOINC

Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 1018073 - Posted: 21 Jul 2010, 5:01:15 UTC

I did get time to read a lot of the various threads, and it would appear that part of the cause of all the odd things people saw, including the early shutdown, was the BOINC database crashing.

I have not asked, so what I offer is speculation... With Matt on staycation, I would guess that those left were very busy.

Regards

____________
Please consider a Donation to the Seti Project.

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8763
Credit: 52,715,353
RAC: 18,446
United Kingdom
Message 1018091 - Posted: 21 Jul 2010, 7:36:04 UTC - in response to Message 1018073.

As I did get time to read a lot of the various threads. It would appear that a portion of the cause of all of the odd things that people saw including the early shutdown was the result of the Boinc Database crashing.

Pappa,

Whatever it was, it didn't feel like a database crash from here. First there was a network outage, cutting off uploads and web server access ('page not found') and, judging from the Cricket graph, downloads too, though I can't confirm that from personal observation. None of those has anything to do with BOINC, and a database crash causes different symptoms.

Then there were file storage problems - the upload area reporting itself to be read-only, and the validator failing to find previously uploaded and previously accessible result files (as Joe has pointed out). Again, nothing to do with database access there.

The speculation I put in my PM to Jeff - and it is only speculation (no reply as yet), from 6,000 miles away, and to be taken with a pinch of salt - was a power surge or brownout which triggered some, but not all, of the mess of interconnected devices in the server closet to reboot themselves. So some machines carried on as normal - web server, upload server: they're on UPS, I think - but other devices, such as the big network storage unit, weren't ready for a while. That's what happens after a brownout on my little network at home: different devices have different susceptibilities.

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,902,797
RAC: 257
Netherlands
Message 1018102 - Posted: 21 Jul 2010, 10:23:28 UTC - in response to Message 1018091.
Last modified: 21 Jul 2010, 10:35:54 UTC

As I already posted in another thread, 40 MB tasks are showing a 'Detached' label, even though they were uploaded on 13 July (during the previous outage).
One host, my biggest, still appears to be dead, so I took its GTS250 and swapped it for the 8500GT in another host.
That host now has a GTS250.
Uploads are stuck again, and a look at the server status page shows that only the BOINC, MB and AP databases and the data-driven web pages are online and running; the rest, including upload/download, is disabled or not running.

Could it be the swap of the CUDA card that errored them out? Maybe 4 MB WUs, but 40 tasks?!
Or they were 'ghost' WUs, or something went wrong during upload, which I doubt!
A little off topic :)
I hope to get some more tasks. My AP tasks at Beta are finished; most of them still have to be validated, but they are OK. Only 1 AP errored; all were computed with BROOK+ OpenCL rev. 434, otherwise without any error.

divedude
Joined: 5 Jun 06
Posts: 9
Credit: 2,074,528
RAC: 0
United States
Message 1018560 - Posted: 23 Jul 2010, 1:52:06 UTC

I set my account to download 10 days of data to process, and only 1 of my systems is receiving all the work units. I have not received new WUs in several days. What is up?

Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 1018591 - Posted: 23 Jul 2010, 4:44:35 UTC - in response to Message 1018560.

I set my account to download 10 days data to download and process and only 1 of my systems is rcving all the work units. Have not rcvd new WU's in several days. What is up?


Every week we have extended outages, Tuesday to Friday. This is to work on server issues that cannot be completed in about 7 hours, and to start doing some science that has been put off for ages.

By Friday, tomorrow afternoon, everyone should be at the point where things start to flow again.

If I seem a bit "loopy," it's because I have done about 700 miles of driving in the last three days. None of it was "pretty." Don't ask...

Regards

____________
Please consider a Donation to the Seti Project.

soft^spirit
Joined: 18 May 99
Posts: 6374
Credit: 28,647,395
RAC: 516
United States
Message 1018653 - Posted: 23 Jul 2010, 12:37:51 UTC

Okay, I suppose things will start to come back up from the outage soon. I have coffee brewing, and the mad rush to upload is on the horizon...

Are there any major changes we might want to watch out for?
____________

Janice

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8763
Credit: 52,715,353
RAC: 18,446
United Kingdom
Message 1018656 - Posted: 23 Jul 2010, 12:48:44 UTC - in response to Message 1018653.

Okay.. I suppose the outtage will start to come up soon, I have coffee brewing, and the mad rush to upload on the horizon...

Are there any major changes we might want to watch out for?

None that I know of. Just watch out for new posts by Jeff Cobb or the team. Let's hope he's remembered to have a look at the upload server filesystem, and reset the validators, before going live.

Bernd Noessler
Joined: 15 Nov 09
Posts: 99
Credit: 52,635,434
RAC: 0
Germany
Message 1018657 - Posted: 23 Jul 2010, 12:55:29 UTC - in response to Message 1018653.


But maybe we will see the .vlar tasks, if they have updated the Seti splitters.

Scarecrow
Joined: 15 Jul 00
Posts: 4395
Credit: 459,613
RAC: 15
United States
Message 1018660 - Posted: 23 Jul 2010, 13:15:12 UTC

Place your bets.

or

Bryan Wallace
Volunteer tester
Joined: 28 Jul 00
Posts: 22
Credit: 52,559,173
RAC: 203
United States
Message 1018669 - Posted: 23 Jul 2010, 13:46:40 UTC

Yeah, having 10 days' worth of data is highly recommended, especially for high-output systems. Nothing is more frustrating than watching your i7 rig with a GTX 260 sit idle for 3 days...

Keep in mind, however, that if you have a slower computer without a GPU cruncher, you probably only need 2-3 days instead.

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8763
Credit: 52,715,353
RAC: 18,446
United Kingdom
Message 1018737 - Posted: 23 Jul 2010, 16:30:37 UTC - in response to Message 1018657.


But maybe we will see the .vlar tasks. If they updated the seti splitters.

So Jeff says.

soft^spirit
Posts: 6374
Credit: 28,647,395
RAC: 516
United States
Message 1018769 - Posted: 23 Jul 2010, 17:46:11 UTC

I have to say, this has been one of the smoothest outages I have seen overall. Pending numbers seem to be going way up, and I am not sure why, but other than the crash going into the outage, it has brought few surprises.

In addition, the perception that feedback is being taken into consideration goes a long way, and it was evident.

____________

Janice

Helli (Project donor)
Volunteer tester
Joined: 15 Dec 99
Posts: 704
Credit: 91,905,667
RAC: 28,383
Germany
Message 1018821 - Posted: 23 Jul 2010, 19:11:37 UTC

Well, the MDB queries are relatively high, near 1,200. And the upload rate is over 45 Mbit/s; I haven't seen such a high value...

Helli

____________
A loooong time ago: My first Credits


Copyright © 2014 University of California