Panic Mode On (33) Server problems

Message boards : Number crunching : Panic Mode On (33) Server problems

kittyman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1006629 - Posted: 20 Jun 2010, 17:44:02 UTC

Well, I, for one, have calmed down.
I have to resign myself to the fact that since I have chosen to crunch Seti, and Seti only, I must accept the downtime of the project gracefully.

And there will be some times when my rigs cannot get enough work on hand to keep them running during extended or sequential project failures.

Such is what I have chosen to do.

Welcome to life in Setiland.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1006629
Blurf
Volunteer tester
Joined: 2 Sep 06
Posts: 8962
Credit: 12,678,685
RAC: 0
United States
Message 1006643 - Posted: 20 Jun 2010, 18:53:58 UTC - in response to Message 1006642.  

Well, I, for one, have calmed down.
I have to resign myself to the fact that since I have chosen to crunch Seti, and Seti only, I must accept the downtime of the project gracefully.

And there will be some times when my rigs cannot get enough work on hand to keep them running during extended or sequential project failures.

Such is what I have chosen to do.

Welcome to life in Setiland.


Mark, that is the most marvellous post I think I have ever seen you make. I always knew it was worth believing in you.

Take care now.

Chris S.


The calm is nice to see and appreciated.


ID: 1006643
Dave Cummings
Volunteer tester
Joined: 16 May 09
Posts: 219
Credit: 1,193,729
RAC: 0
United Kingdom
Message 1006698 - Posted: 20 Jun 2010, 22:27:44 UTC

Glad to see you back.
ID: 1006698
Highlander
Joined: 5 Oct 99
Posts: 167
Credit: 37,987,668
RAC: 16
Germany
Message 1006799 - Posted: 21 Jun 2010, 6:30:33 UTC

funny msg:

21.06.2010 08:19:04 [error] Error reported by file upload server: Server is out of disk space

It's no wonder, with over 1 million unvalidated WUs.
- Performance is not a simple linear function of the number of CPUs you throw at the problem. -
ID: 1006799
[B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1175
Credit: 4,754,897
RAC: 0
United Kingdom
Message 1006802 - Posted: 21 Jun 2010, 6:42:46 UTC - in response to Message 1006799.  

funny msg:

21.06.2010 08:19:04 [error] Error reported by file upload server: Server is out of disk space

It's no wonder, with over 1 million unvalidated WUs.


Got this message as well today, with a transient upload error. I will just sit and wait until it has been fixed.
ID: 1006802
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65738
Credit: 55,293,173
RAC: 49
United States
Message 1006830 - Posted: 21 Jun 2010, 7:50:52 UTC - in response to Message 1006802.  

funny msg:

21.06.2010 08:19:04 [error] Error reported by file upload server: Server is out of disk space

It's no wonder, with over 1 million unvalidated WUs.


Got this message as well today, with a transient upload error. I will just sit and wait until it has been fixed.

I turned off network access in Boinc earlier when I heard you all talking about it. I'm sure it'll get fixed on Monday or Tuesday, so we wait.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1006830
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19048
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1006831 - Posted: 21 Jun 2010, 8:35:19 UTC

Had to happen sooner or later.
If the validators are switched off, then the results are not being transferred to the science database, so no space is being emptied to make room for further uploads.
ID: 1006831
Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1006836 - Posted: 21 Jun 2010, 8:54:41 UTC

Actually, it would help much more if the assimilators did their job. As long as they are not working, validating the results doesn't help free up any disk space.
ID: 1006836
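The two posts above describe an ordering: result files can only be removed once the work has been validated and then assimilated into the science database. The toy model below is a minimal sketch of that ordering only. The `Stage` enum, `Workunit` class, and `run_*` functions are hypothetical names for illustration, not BOINC's actual server code (which tracks these states in its database and removes files with a separate file_deleter daemon).

```python
from dataclasses import dataclass
from enum import Enum, auto


class Stage(Enum):
    # Hypothetical, simplified stages; not BOINC's real state fields.
    RETURNED = auto()     # result uploaded, waiting for the validator
    VALIDATED = auto()    # canonical result chosen, waiting for the assimilator
    ASSIMILATED = auto()  # science data stored; files may now be deleted


@dataclass
class Workunit:
    name: str
    stage: Stage = Stage.RETURNED
    files_on_disk: bool = True


def run_validator(wus: list[Workunit]) -> None:
    """Advance returned work to VALIDATED."""
    for wu in wus:
        if wu.stage is Stage.RETURNED:
            wu.stage = Stage.VALIDATED


def run_assimilator(wus: list[Workunit]) -> None:
    """Advance validated work to ASSIMILATED."""
    for wu in wus:
        if wu.stage is Stage.VALIDATED:
            wu.stage = Stage.ASSIMILATED


def run_file_deleter(wus: list[Workunit]) -> int:
    """Delete files only for assimilated work; return how many were freed."""
    freed = 0
    for wu in wus:
        if wu.stage is Stage.ASSIMILATED and wu.files_on_disk:
            wu.files_on_disk = False
            freed += 1
    return freed


wus = [Workunit(f"wu_{i}") for i in range(3)]
run_validator(wus)
print(run_file_deleter(wus))  # 0: validated, but nothing assimilated yet
run_assimilator(wus)
print(run_file_deleter(wus))  # 3: now the files can go
```

In this model, running only the validator leaves every file on disk, which is the point made about the assimilators.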
PhonAcq
Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 1006860 - Posted: 21 Jun 2010, 12:28:27 UTC

Now we have really hit the end of the world. Not only are the project developers playfully screwing around with the boinc structure, apparently affecting not just SETI but many other projects per some other posts around here, but Scarecrow is off-line due to thunderstorms in Nebraska, or so it would seem. What's next? A rain of fire and the face of evil leading us to the End of Days?
ID: 1006860
hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 1006869 - Posted: 21 Jun 2010, 13:04:25 UTC - in response to Message 1006860.  

Now we have really hit the end of the world. Not only are the project developers playfully screwing around with the boinc structure, apparently affecting not just SETI but many other projects per some other posts around here, but Scarecrow is off-line due to thunderstorms in Nebraska, or so it would seem. What's next? A rain of fire and the face of evil leading us to the End of Days?

Oh no...(Putting on tin hat)...It is the end.
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 1006869
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1006877 - Posted: 21 Jun 2010, 13:28:07 UTC

9 million result files averaging 25KB would be about 225GB of disk space. I don't know how big bruno's fiber channel array is, but would have guessed at least 2TB.
                                                               Joe
ID: 1006877
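Joe's estimate is simple multiplication. A minimal sketch of it, assuming decimal units (1 GB = 1,000,000 KB, 1 TB = 1000 GB) and treating the 2 TB array size as his guess rather than a known figure:

```python
# Check the result-file estimate from the post above.
# Assumption: decimal units (1 GB = 1,000,000 KB; 1 TB = 1000 GB).
result_files = 9_000_000
avg_result_kb = 25
guessed_array_tb = 2  # Joe's guess at the array size, not a known figure

results_gb = result_files * avg_result_kb / 1_000_000
share_of_array = results_gb / (guessed_array_tb * 1000)

print(f"{results_gb:.0f} GB of result files")
print(f"about {share_of_array:.0%} of a {guessed_array_tb} TB array")
```

On that guess, the result files alone would take up only about a ninth of the array, which is why the replies look elsewhere for the missing space.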
Kieron Walsh
Joined: 2 Mar 00
Posts: 74
Credit: 43,502,325
RAC: 112
United Kingdom
Message 1006879 - Posted: 21 Jun 2010, 13:43:52 UTC - in response to Message 1006869.  

"Oh no...(Putting on tin hat)...It is the end."

Not again!!!

;-)

ID: 1006879
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1006882 - Posted: 21 Jun 2010, 13:59:50 UTC - in response to Message 1006877.  

9 million result files averaging 25KB would be about 225GB of disk space. I don't know how big bruno's fiber channel array is, but would have guessed at least 2TB.
                                                               Joe

The problem is more likely to be the (8,672,596 - 1,058,977) = 7,613,619 workunit data files @ 367 KB - that's 2.6 TB that can't be deleted until the validator decides whether they need to be resent or not.

Not to mention the 68,714 Astropulse results in the same state - that's another half a terabyte for the 8 MB data files.

Errors in both directions on those figures, of course - there will be some triple returns still inconclusive, and some datafiles with no returns at all (both tasks still out in the field), but the order of magnitude should be right if my calculator is working properly...
ID: 1006882
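Richard's figures can be reproduced directly. This sketch rests on one assumption: the quoted 2.6 TB comes out when KB, MB, and TB are read as binary units (1 KB = 1024 bytes); with decimal units the same product is about 2.8 TB. Either way the order of magnitude holds, as the post says.

```python
# Reproduce the disk-space arithmetic from the post above.
# Assumption: KB/MB/TB are binary units (powers of 1024).
KB = 1024
MB = 1024 * KB
TB = 1024 ** 4

# Figures as given in the post.
wu_files = 8_672_596 - 1_058_977  # workunit data files that can't be deleted yet
wu_bytes = wu_files * 367 * KB    # 367 KB per workunit data file

ap_files = 68_714                 # Astropulse results in the same state
ap_bytes = ap_files * 8 * MB      # 8 MB per Astropulse data file

print(f"{wu_files:,} workunit files: {wu_bytes / TB:.1f} TB")
print(f"{ap_files:,} Astropulse files: {ap_bytes / TB:.2f} TB")
# For comparison, the workunit total in decimal units (1 TB = 10**12 bytes).
print(f"decimal units: {wu_files * 367_000 / 10**12:.1f} TB")
```

This gives about 2.6 TB for the workunit files and roughly 0.52 TB for the Astropulse files, matching "another half a terabyte".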
Bill Walker
Joined: 4 Sep 99
Posts: 3868
Credit: 2,697,267
RAC: 0
Canada
Message 1006883 - Posted: 21 Jun 2010, 14:00:51 UTC - in response to Message 1006877.  

9 million result files averaging 25KB would be about 225GB of disk space. I don't know how big bruno's fiber channel array is, but would have guessed at least 2TB.
                                                               Joe


But how much of the 2TB is available for the validator? Bruno does some other stuff, and maybe the disk is full of other files, like saved games and photos from last summer's vacation (based on a quick survey of what's filled up my disc).

But seriously, how long will it take to clear the validator backlog? Maybe the Tuesday shutdown will come a day early this week.

ID: 1006883
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1006939 - Posted: 21 Jun 2010, 16:23:59 UTC - in response to Message 1006882.  

9 million result files averaging 25KB would be about 225GB of disk space. I don't know how big bruno's fiber channel array is, but would have guessed at least 2TB.
                                                               Joe

The problem is more likely to be the (8,672,596 - 1,058,977) = 7,613,619 workunit data files @ 367 KB - that's 2.6 TB that can't be deleted until the validator decides whether they need to be resent or not.

Not to mention the 68,714 Astropulse results in the same state - that's another half a terabyte for the 8 MB data files.

Errors in both directions on those figures, of course - there will be some triple returns still inconclusive, and some datafiles with no returns at all (both tasks still out in the field), but the order of magnitude should be right if my calculator is working properly...

I agree, it's whatever else is being stored in bruno's array that is likely causing the error. We can only guess how they've got storage mapped.

Here's a very unlikely thought: a while back, IIRC, Eric posted about a difficulty with the sah_validate processes running briefly each time they were started, then falling over. Suppose that same difficulty has returned, but they have a cron job set to restart them periodically, and every time they crash they have consumed some of bruno's disk array space...
                                                                Joe
ID: 1006939
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1007060 - Posted: 21 Jun 2010, 20:54:08 UTC - in response to Message 1006939.  

...
Here's a very unlikely thought: a while back, IIRC, Eric posted about a difficulty with the sah_validate processes running briefly each time they were started, then falling over. Suppose that same difficulty has returned, but they have a cron job set to restart them periodically, and every time they crash they have consumed some of bruno's disk array space...
                                                                Joe

I remember that. I believe it was just for AP though.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1007060

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.