Extended outage Jul 20 2010 - problems


Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 1018032 - Posted: 21 Jul 2010, 1:36:24 UTC

Hi

If there are problems that appeared as a result of the early start time, or other things that have not been talked about, please post them here. If you have a major suggestion with a bit of reasoning, you are welcome to post in Outage Observations/Comments - Read the Rules Below. What I digest and have fed back to the Seti staff is actually getting looked at. The hard part is the balance between what needs to be accomplished and what you the "users" expect.

As a Moderator, I will sticky this thread until Friday when the outage is released. This becomes the location to post observations/issues/suggestions.

Otherwise for a bit of comic relief, Panic Mode (xx version) is good (never trust anything you see there).

If you like to just relax, the Seti Cafe is Open.

For those that have something to say the Political Forum is sometimes a bit much (flakjackets and fireproof pants are recommended).

Regards

Pappa

____________
Please consider a Donation to the Seti Project.

Profile Blurf
Volunteer tester
Joined: 2 Sep 06
Posts: 7269
Credit: 6,275,844
RAC: 2,189
United States
Message 1018042 - Posted: 21 Jul 2010, 2:33:57 UTC

My Observation: It's seti's system--they can start up/shut down whenever they choose--just appreciate the updates.
____________


Profile Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2460
Credit: 83,938,252
RAC: 29,899
United States
Message 1018048 - Posted: 21 Jul 2010, 3:20:44 UTC

We have a political forum but not a Sports forum??

And it's not the same thing!!

____________
Boinc....Boinc....Boinc....Boinc....

Profile Uli
Volunteer tester
Joined: 6 Feb 00
Posts: 9360
Credit: 4,940,283
RAC: 3,348
Germany
Message 1018051 - Posted: 21 Jul 2010, 3:24:56 UTC
Last modified: 21 Jul 2010, 3:27:55 UTC

Good thought Geek@Play
____________
Pluto will always be a planet to me.

Cash Donation Specialist

Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 13,738,567
RAC: 11,979
United States
Message 1018056 - Posted: 21 Jul 2010, 3:51:42 UTC

I don't mind the early start to the outage but I am a little worried about the problems that popped up just before they shut down. By that I mean this message...

7/20/2010 10:50:01 AM SETI@home [error] Error reported by file upload server: can't open file /home/boincadm/projects/sah/upload/3e5/06jn10aa.23528.19699.12.10.7_1_0: Read-only file system
and the Validate errors that started to appear. I know they have a script they can run for the validate errors but will we have to list them all or will they be able to do them themselves?
Another thing that would be nice is if they could find some way to handle all the ghosts we are getting. It is a pain to have to run down our caches and detach to clear them, but if we leave them they interfere with our getting new tasks when we come back from the outage. Also, do time outs count as errors for the daily quota?
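
For anyone who wants to pull the affected uploads out of their own message log rather than list them by eye, here is a minimal sketch in Python. It assumes every such message has the same layout as the single line quoted above, and the function names are made up for illustration; they are not part of BOINC.

```python
import re

# Layout inferred from the one log line quoted above (an assumption):
#   <date> <time> <AM/PM> <project> [error] Error reported by file upload
#   server: can't open file <path>: <reason>
UPLOAD_ERROR = re.compile(
    r"^(?P<timestamp>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) "
    r"(?P<project>\S+) \[error\] Error reported by file upload server: "
    r"can't open file (?P<path>\S+): (?P<reason>.+)$"
)


def parse_upload_error(line):
    """Return the fields of an upload-server error line, or None if no match."""
    match = UPLOAD_ERROR.match(line.strip())
    return match.groupdict() if match else None


def read_only_failures(lines):
    """Collect result file names whose upload failed with a read-only error."""
    names = []
    for line in lines:
        fields = parse_upload_error(line)
        if fields and fields["reason"] == "Read-only file system":
            # Keep only the file name, not the server-side directory.
            names.append(fields["path"].rsplit("/", 1)[-1])
    return names
```

Feeding it the lines of a saved message log would give a list of result names that could be posted in one go, if the staff do turn out to need them.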

Ok, that's all I can think of to bi....uhh complain about.

____________


PROUD MEMBER OF Team Starfire World BOINC

Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 1018073 - Posted: 21 Jul 2010, 5:01:15 UTC

I did get time to read a lot of the various threads. It would appear that a portion of the cause of all of the odd things that people saw, including the early shutdown, was the Boinc Database crashing.

I have not asked, so what I offer is speculation... So with Matt on Staycation I would guess that those left were very busy.

Regards

____________
Please consider a Donation to the Seti Project.

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8275
Credit: 45,006,375
RAC: 13,657
United Kingdom
Message 1018091 - Posted: 21 Jul 2010, 7:36:04 UTC - in response to Message 1018073.

I did get time to read a lot of the various threads. It would appear that a portion of the cause of all of the odd things that people saw, including the early shutdown, was the Boinc Database crashing.

Pappa,

Whatever it was, it didn't feel like a database crash from here. First there was a network outage, cutting off uploads and web server access ('page not found'), and from the Cricket graph downloads too, though I can't confirm that from personal observation. None of those has anything to do with BOINC, and a database crash causes different symptoms.

Then there were file storage problems - the upload area reporting itself to be read-only, and the validator failing to find previously-uploaded and previously-accessible result files (as Joe has pointed out). Again, nothing to do with database access there.

The speculation I put in my PM to Jeff - and it is only speculation (no reply as yet), from 6,000 miles away, and to be taken with a pinch of salt - was a power surge or brownout which triggered some, but not all, of the mess of interconnected devices in the server closet to reboot themselves. So some machines carried on as normal - web server, upload server: they're on UPS, I think - but other devices, such as the big network storage unit, weren't ready for a while. That's what happens after a brownout on my little network at home: different devices have different susceptibilities.

Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3232
Credit: 31,585,541
RAC: 0
Netherlands
Message 1018102 - Posted: 21 Jul 2010, 10:23:28 UTC - in response to Message 1018091.
Last modified: 21 Jul 2010, 10:35:54 UTC

As I already posted in another thread, I have 40 MB tasks which have a Detached label; they were uploaded on 13 July (previous outage).
One host, my biggest, still appears dead, so I took a GTS250 from that host and swapped it for an 8500GT in another one.
This host now has a GTS250.
Uploads are stuck again, but a look at the Server page shows that only the BOINC, MB and AP databases and the data-driven web pages are online and running; the rest, including upload/download, is disabled or not running.

Could it be the switch (of the CUDA card) that errored out maybe 4 MB WUs, but 40 tasks?!
Or they were 'ghost WUs', or something went wrong during upload, which I doubt!
A little off topic :)
Hope to get some more tasks. The AP tasks @ Beta are finished; most of them still have to be validated, but are OK. Only 1 AP errored; the rest, all computed with BROOK+ OpenCL rev. 434, ran without any error.
____________


Knight Who Says Ni N!, OUT numbered.................

divedude
Joined: 5 Jun 06
Posts: 9
Credit: 2,074,528
RAC: 0
United States
Message 1018560 - Posted: 23 Jul 2010, 1:52:06 UTC

I set my account to download 10 days of data to process, and only 1 of my systems is receiving all the work units. I have not received new WUs in several days. What is up?
____________

Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 1018591 - Posted: 23 Jul 2010, 4:44:35 UTC - in response to Message 1018560.

I set my account to download 10 days of data to process, and only 1 of my systems is receiving all the work units. I have not received new WUs in several days. What is up?


Every week we have an extended outage, Tuesday to Friday. This is to work on server issues that cannot be completed in about 7 hours, and to start doing some science that has been put off for ages.

Friday, tomorrow afternoon... Everyone should be at the point where things start to flow again.

If I seem a bit "loopy," I have done about 700 miles of driving in the last three days. None of it was "pretty." Don't ask...

Regards

____________
Please consider a Donation to the Seti Project.

Profile soft^spirit
Joined: 18 May 99
Posts: 6374
Credit: 28,216,786
RAC: 128
United States
Message 1018653 - Posted: 23 Jul 2010, 12:37:51 UTC

Okay.. I suppose the outage will start to come up soon, I have coffee brewing, and the mad rush to upload is on the horizon...

Are there any major changes we might want to watch out for?
____________

Janice

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8275
Credit: 45,006,375
RAC: 13,657
United Kingdom
Message 1018656 - Posted: 23 Jul 2010, 12:48:44 UTC - in response to Message 1018653.

Okay.. I suppose the outage will start to come up soon, I have coffee brewing, and the mad rush to upload is on the horizon...

Are there any major changes we might want to watch out for?

None that I know of. Just watch out for new posts by Jeff Cobb or the team. Let's hope he's remembered to have a look at the upload server filesystem, and reset the validators, before going live.

Bernd Noessler
Joined: 15 Nov 09
Posts: 99
Credit: 52,635,315
RAC: 0
Germany
Message 1018657 - Posted: 23 Jul 2010, 12:55:29 UTC - in response to Message 1018653.


But maybe we will see the .vlar tasks. If they updated the seti splitters.

Profile Scarecrow
Joined: 15 Jul 00
Posts: 4376
Credit: 451,359
RAC: 366
United States
Message 1018660 - Posted: 23 Jul 2010, 13:15:12 UTC

Place your bets.


Profile Bryan Wallace
Volunteer tester
Joined: 28 Jul 00
Posts: 22
Credit: 51,871,903
RAC: 5,297
United States
Message 1018669 - Posted: 23 Jul 2010, 13:46:40 UTC

yeah, having 10 days' worth of data is highly recommended, especially for high-output systems. nothing is more frustrating than watching your i7 rig with a gtx 260 sitting idle for 3 days...

keep in mind, however, that if you have a slower computer without a gpu cruncher then you probably only need 2-3 days instead.
____________

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8275
Credit: 45,006,375
RAC: 13,657
United Kingdom
Message 1018737 - Posted: 23 Jul 2010, 16:30:37 UTC - in response to Message 1018657.


But maybe we will see the .vlar tasks. If they updated the seti splitters.

So Jeff says.

Profile soft^spirit
Joined: 18 May 99
Posts: 6374
Credit: 28,216,786
RAC: 128
United States
Message 1018769 - Posted: 23 Jul 2010, 17:46:11 UTC

I have to say.. this has been one of the smoothest outages I have seen overall. Pending numbers seem to be going way up, and I am not sure why, but other than the crash going into the outage, it all seems to have brought few surprises.

In addition, the perception that feedback is being taken into consideration..
Well it goes a long way and it was evident.

____________

Janice

Profile Helli
Volunteer tester
Joined: 15 Dec 99
Posts: 697
Credit: 77,754,871
RAC: 79,878
Germany
Message 1018821 - Posted: 23 Jul 2010, 19:11:37 UTC

Well, the MDB Queries are relatively high - near 1,200. And the Upload Rate is over 45 Mbits/sec - I haven't seen such a high value before...

Helli

____________


Copyright © 2014 University of California