Project Status - 08/26/2005 4pm PST

Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 158159 - Posted: 27 Aug 2005, 15:20:54 UTC

Hi

You are not alone.
They are working on it.
Maybe they will open the door for a few hours.

greetz Mike



With each crime and every kindness we birth our future.
ID: 158159 · Report as offensive
Blanckaert
Volunteer tester

Joined: 30 May 99
Posts: 15
Credit: 393,159
RAC: 14
United States
Message 158160 - Posted: 27 Aug 2005, 15:22:44 UTC - in response to Message 158154.  

Rom, I'm sitting here with 685 work units waiting to upload. Some have drifted past due by a day or two, and many others are getting close. Will Berkeley take the down time into account and accept these work units as finished on time, or am I just generating heat at a time when CALISO would prefer that I use my "appliances" sparingly?


I'm guessing they will still accept them... I know if a quorum ain't reached and a late WU shows up and meets the quorum it'll be accepted...

(But I'm sure someone will correct me...)
Mark

ID: 158160 · Report as offensive
lynxtra
Joined: 3 Sep 04
Posts: 137
Credit: 273,636
RAC: 0
United Kingdom
Message 158162 - Posted: 27 Aug 2005, 15:29:57 UTC - in response to Message 158052.  

Is this why I am not getting any work units at the moment?


Yes! Everything on the front end is down ... so that's why you can't upload AND download.

Thanks Rom for your update.

One thing I don't get is why you don't drain the WFV queue to at least 500 000 before coming back live. You're going to get 1 000 000 results back as soon as you open the uploads, so isn't it risky?

The file system is sized for 1 000 000 results (1024*1024), if I'm not mistaken, so it will be slow for a long time before catching up, and stopping the project with 2 000 000 results will take much more time than with the current 1 000 000 waiting.

I believe that finishing the cleanup before coming online would be safer.

how long is this situation going to be like this?
ID: 158162 · Report as offensive
Profile KWSN - MajorKong
Volunteer tester
Joined: 5 Jan 00
Posts: 2892
Credit: 1,499,890
RAC: 0
United States
Message 158169 - Posted: 27 Aug 2005, 15:57:09 UTC - in response to Message 158162.  


how long is this situation going to be like this?


I don't think anyone, not even the Devs, knows for sure. Short answer (without meaning to be a smartass) is "until it isn't."

Of course, everyone hopes it is sooner rather than later.
https://youtu.be/iY57ErBkFFE

#Texit

Don't blame me, I voted for Johnson(L) in 2016.

Truth is dangerous... especially when it challenges those in power.
ID: 158169 · Report as offensive
gomeyer
Volunteer tester

Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 158170 - Posted: 27 Aug 2005, 16:04:30 UTC

Murphy is alive and well and living at my house.

While we are all waiting for these latest issues to be resolved, as with most people all of my crunchers ran out of work. No problem you say? The results will eventually be returned, credited, and added to the science db?

Well, not quite. Murphy couldn't resist this golden opportunity to totally trash the hard disk on one of my faster machines. Even booting from the Windoz distribution CD and going into Console mode, I am unable to overwrite the files that Windoz complains about while trying to boot. I couldn't even rename or delete them and try again to copy new files. Finally, "Chkdsk /R" reports "one or more unrecoverable errors". And of course worst of all, Console mode won't even read the BOINC directory to save those files off to a flash drive.

Can you say Toast?

So now, in addition to losing a full queue's worth of new workunits (bad enough but not tragic), I've also lost almost 4 days' worth of crunching. Disaster? End of the world? Hell no, it just sucks. Oh well, life's a bitch and then you die.

Guess I'm on my way out the door to buy a new drive.

This is NOT aimed at anyone in particular, but my message is: Don't complain, this could be worse.
ID: 158170 · Report as offensive
Profile [B^S] Spydermb
Volunteer tester
Joined: 16 Jul 99
Posts: 496
Credit: 10,860,148
RAC: 0
United States
Message 158172 - Posted: 27 Aug 2005, 16:10:04 UTC

Just added to Front Page News
August 27, 2005
Outage Update: We have been offline since Tuesday, August 23 and plan to stay offline until after the weekend. This is because of a disk failure as well as an unwieldy backlog of results to validate and delete. We apologize for the inconvenience, but credit that has been pending for weeks is now being granted. We want to be as "caught up" as possible before coming back on line. The process of catching up is taking much longer than expected. More information in Technical News.


Oh well, enjoy the weekend.
BOINC SYNERGY is an International Team and We Welcome All BOINC Participants!
BOINC Synergy Click to Join BOINC Synergy
ID: 158172 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 158174 - Posted: 27 Aug 2005, 16:12:45 UTC - in response to Message 158091.  

Hi folks ... are there any plans to add new or better hardware to the existing setup? There have been many problems with the availability of the system all year long, and you people at SETI are giving your best to keep it going ...
Warm Regards from Buenos Aires

Sometimes, more hardware is not the solution.

The problem is the sheer number of files that have accumulated for various reasons: the "antique" work units mentioned in the technical news, a backlog caused by work on the science database (the back-end we don't see), etc.

I can think of at least two ways to avoid the large number of files, and others have come up with additional ideas.

My personal favorite: put instructions in the master fetch file telling the clients to upload/download/report more slowly (or more quickly). If the validators/assimilators/deleters start getting full, slow down the BOINC clients.

(My favorite algorithm is p-Persistence -- see "Computer Networks" by Andrew Tanenbaum).
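
To make the throttling idea concrete, here is a minimal Python sketch. It is only an illustration, not anything taken from the BOINC code base: the function names, the `target` backlog, and the `p_min` floor are all made up for the example. The assumption is that the server publishes a probability p based on how full the back-end queues are, and that each client, in each time slot, contacts the server with probability p and otherwise waits for the next slot, which is the p-persistence rule.

```python
import random


def p_from_backlog(backlog, target, p_min=0.05):
    """Hypothetical server-side rule: full speed while the backlog is at or
    below the target, then shrink p in proportion to how far over we are."""
    if backlog <= target:
        return 1.0
    return max(p_min, target / backlog)


def should_contact_server(p, rng=random):
    """p-persistence: in a given time slot, go ahead with probability p,
    otherwise defer to the next slot."""
    return rng.random() < p


def slots_until_contact(p, rng=random, max_slots=10_000):
    """Count how many slots a client waits before it contacts the server."""
    for slot in range(1, max_slots + 1):
        if should_contact_server(p, rng):
            return slot
    return max_slots


if __name__ == "__main__":
    # Example: the queues hold twice the (assumed) comfortable backlog.
    p = p_from_backlog(backlog=1_000_000, target=500_000)
    print("published p:", p)
    print("slots waited by one client:", slots_until_contact(p))
```

With p = 0.5, about half the clients would connect in any given slot, so the load on the validators, assimilators, and deleters would drop without any client being locked out for good.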
ID: 158174 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 158176 - Posted: 27 Aug 2005, 16:15:34 UTC - in response to Message 158172.  

Just added to Front Page News
August 27, 2005
Outage Update: We have been offline since Tuesday, August 23 and plan to stay offline until after the weekend. This is because of a disk failure as well as an unwieldy backlog of results to validate and delete. We apologize for the inconvenience, but credit that has been pending for weeks is now being granted. We want to be as "caught up" as possible before coming back on line. The process of catching up is taking much longer than expected. More information in Technical News.


Oh well, enjoy the weekend.

Note that this was updated today, Saturday. Earlier in this thread Rom gave us his update, and it sounds like they are going to be turning processes off and on all weekend to "coax" the work through the system -- giving the validators unrestricted access, then giving the transitioners or assimilators or deleters unrestricted access.

Less competition between processes should be faster.
ID: 158176 · Report as offensive
Profile [B^S] Spydermb
Volunteer tester
Joined: 16 Jul 99
Posts: 496
Credit: 10,860,148
RAC: 0
United States
Message 158179 - Posted: 27 Aug 2005, 16:21:53 UTC - in response to Message 158176.  
Last modified: 27 Aug 2005, 16:23:42 UTC

Note that this was updated today, Saturday. Earlier in this thread Rom gave us his update, and it sounds like they are going to be turning processes off and on all weekend to "coax" the work through the system -- giving the validators unrestricted access, then giving the transitioners or assimilators or deleters unrestricted access.

Less competition between processes should be faster.



Thanks Ned, I missed that information...
ID: 158179 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 158185 - Posted: 27 Aug 2005, 16:39:02 UTC - in response to Message 158179.  

Note that this was updated today, Saturday. Earlier in this thread Rom gave us his update, and it sounds like they are going to be turning processes off and on all weekend to "coax" the work through the system -- giving the validators unrestricted access, then giving the transitioners or assimilators or deleters unrestricted access.

Less competition between processes should be faster.


Thanks Ned, I missed that information...


For something like this, you don't need someone camping out in the office. You do something, you go do something else for a few hours, then you come back and see what happened.

If it was good, you either leave it running, or maybe change the mix a little to see if you can get more speed, then come back again in a few hours and see what happened.

It looks like someone is doing just that -- and for the most part, doing it through SSH or something similar.

ID: 158185 · Report as offensive
BarryAZ

Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 158189 - Posted: 27 Aug 2005, 16:49:33 UTC

I may be wrong, but I believe this is turning out to be the longest sustained outage of the year, so one would expect (especially after what has been a troubled two months here) to find folks at the edge of their 'patience is a virtue' quota.

I am also sure the folks at Berkeley are doing all that they are capable of to get this project going again. Realizing that, I think we all should cut the Berkeley folks some slack by not tossing stones at them -- I suspect they are at least as close to the edge of their patience quota as many of us.
ID: 158189 · Report as offensive
Profile dre
Volunteer tester
Joined: 17 May 99
Posts: 26
Credit: 429,124
RAC: 0
United States
Message 158195 - Posted: 27 Aug 2005, 16:59:06 UTC
Last modified: 27 Aug 2005, 17:00:10 UTC

The validator really seems to be working now. In the last 15 minutes the waiting for validation queue has gone down by almost 18,000 WUs. Let's hope the worst is over and that the hardware will now be able to keep up with the demands placed on it.
ID: 158195 · Report as offensive
Blablablabla

Joined: 23 Jan 05
Posts: 1
Credit: 9,117,463
RAC: 0
Netherlands
Message 158196 - Posted: 27 Aug 2005, 17:02:46 UTC

Just a thought:

It seems the problems are related to pending credit, right???
When 4 results are returned before the deadline, the credit is calculated through some formula (that I just cannot figure out...). So credit is pending until the fourth result is in. What puzzles me is that when a result returns past the deadline, you throw away the formula and grant credit according to an "in-the-middle" principle (results: 12, 23, 20; granted: 20).

Why not throw away the formula and use the "in-the-middle" crediting? When the fourth result is in before the deadline, it is granted the same credit as the other three.

This would mean:
- 3 results are enough to grant credit
- no more calculation of credit, just sort the 3 results and grant #2.


As I said.....just a thought.

BTW: I just love a regular update.
ID: 158196 · Report as offensive
Profile KWSN - MajorKong
Volunteer tester
Joined: 5 Jan 00
Posts: 2892
Credit: 1,499,890
RAC: 0
United States
Message 158205 - Posted: 27 Aug 2005, 17:17:52 UTC - in response to Message 158196.  

Just a thought:

It seems the problems are related to pending credit, right???
When 4 results are returned before the deadline, the credit is calculated through some formula (that I just cannot figure out...). So credit is pending until the fourth result is in. What puzzles me is that when a result returns past the deadline, you throw away the formula and grant credit according to an "in-the-middle" principle (results: 12, 23, 20; granted: 20).

Why not throw away the formula and use the "in-the-middle" crediting? When the fourth result is in before the deadline, it is granted the same credit as the other three.

This would mean:
- 3 results are enough to grant credit
- no more calculation of credit, just sort the 3 results and grant #2.


As I said.....just a thought.

BTW: I just love a regular update.


To the best of my understanding, that is pretty much the way things work now. The only time the 'formula' is used is when the 4th result comes back before the first three get validated. Then, the high and low are discarded, and the 2 remaining ones are averaged. Of course, this all assumes that all of the returns pass validation.

ID: 158205 · Report as offensive
Blanckaert
Volunteer tester

Joined: 30 May 99
Posts: 15
Credit: 393,159
RAC: 14
United States
Message 158210 - Posted: 27 Aug 2005, 17:23:01 UTC
Last modified: 27 Aug 2005, 17:37:58 UTC

Well, something is slowly working...

From the Stat pages at 1610UTC and 1710UTC...

the Ready to Send is down 3,088
In Progress is down 958
Waiting Validation is down 19,178

So with luck, maybe the system will be cleared by Monday, hardware to replace the failed disk will be in place, and we may have a good run next week... (well, after the system attempts to catch up on the backlog that is waiting, with people ready to hold down their Enter keys....)

*gone to get a beer, and wait for the flames that may or may not come...*
*and make some popcorn....*


**EDIT**
From the Stat pages at 1710UTC and 1730UTC...

the Ready to Send is down 2,691
In Progress is down 656
Waiting Validation is down 14,294

So it is coming down.... slowly... but hey it's progress....

**End Edit**
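
For anyone who wants to turn those stat-page differences into a rate, here is a small Python sketch. The two Waiting Validation drops and the time windows are the ones quoted above; the helper names and the backlog passed to `hours_to_clear` are placeholders of mine, not figures from the stat pages.

```python
def drain_rate(results_cleared, minutes):
    """Average number of results cleared per minute over a time window."""
    return results_cleared / minutes


def hours_to_clear(backlog, rate_per_minute):
    """Hours needed to work through a backlog at a constant drain rate."""
    return backlog / rate_per_minute / 60


# Waiting Validation drop between 16:10 and 17:10 UTC (60 minutes)
first_window = drain_rate(19_178, 60)   # about 320 results per minute

# Waiting Validation drop between 17:10 and 17:30 UTC (20 minutes)
second_window = drain_rate(14_294, 20)  # about 715 results per minute

if __name__ == "__main__":
    print(f"first window:  {first_window:.1f} results/min")
    print(f"second window: {second_window:.1f} results/min")
    # Hypothetical backlog, only to show how the estimate would be made.
    assumed_backlog = 1_000_000
    print(f"hours to clear {assumed_backlog:,} at the second rate: "
          f"{hours_to_clear(assumed_backlog, second_window):.1f}")
```

By this arithmetic the validation queue was draining a bit more than twice as fast in the second window as in the first, though two samples are not much to go on.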


Mark

ID: 158210 · Report as offensive
chrisjohnston
Volunteer tester

Joined: 31 Aug 99
Posts: 385
Credit: 91,410
RAC: 0
United States
Message 158213 - Posted: 27 Aug 2005, 17:25:07 UTC - in response to Message 158196.  

Why not throw away the formula and use the "in-the-middle" crediting? When the fourth result is in before the deadline, it is granted the same credit as the other three.


Isn't that how it works? If the three are similar enough, then it grants credit on 3; if there is a problem, it takes the fourth.
- cJ

ID: 158213 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 158215 - Posted: 27 Aug 2005, 17:30:59 UTC - in response to Message 158196.  

Just a thought:

{snip}

Why not throw away the formula and use the "in-the-middle" crediting? When the fourth result is in before the deadline, it is granted the same credit as the other three.

This would mean:
- 3 results are enough to grant credit
- no more calculation of credit, just sort the 3 results and grant #2.

As I said.....just a thought.


That's pretty close to what it does now:

When there are three results, you toss the highest and lowest claims and grant the one in the middle. That's it, the work is done, and the work can be transitioned/deleted.

When there are more than three, the highest and lowest are tossed, and the granted credit is the average of the remaining claims.
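
As a rough Python sketch of the rule as described in this thread (not the project's actual validator code; the function name is made up, and it assumes every claim in the list has already passed validation):

```python
def granted_credit(claimed):
    """Drop the lowest and highest claimed credit and average what is left.

    With exactly three claims this is simply the middle one.
    """
    if len(claimed) < 3:
        raise ValueError("need at least three validated results")
    ordered = sorted(claimed)
    remaining = ordered[1:-1]  # toss the lowest and the highest claim
    return sum(remaining) / len(remaining)


if __name__ == "__main__":
    # The three-result example from earlier in the thread: 12, 23, 20.
    print(granted_credit([12, 23, 20]))
    # A four-result case: 12 and 23 are tossed, 18 and 20 are averaged.
    print(granted_credit([12, 18, 20, 23]))
```

For the 12, 23, 20 example the function returns 20, the same value the "in-the-middle" proposal would grant, which is why the two schemes look so similar.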

I don't think the problem is caused by the normal validation of work. It seems that the problem is caused by the abnormal accumulation of results.

ID: 158215 · Report as offensive
xx
Volunteer tester

Joined: 23 May 99
Posts: 166
Credit: 3,450,910
RAC: 0
United States
Message 158218 - Posted: 27 Aug 2005, 17:36:57 UTC

If the scheduler is off, how is it possible that the "ready to send" queue has decreased?
ID: 158218 · Report as offensive
Blanckaert
Volunteer tester

Joined: 30 May 99
Posts: 15
Credit: 393,159
RAC: 14
United States
Message 158219 - Posted: 27 Aug 2005, 17:39:55 UTC - in response to Message 158218.  

If the scheduler is off, how is it possible that the "ready to send" queue has decreased?



Because as it validates WUs, it may find that sending a fifth or sixth result isn't needed anymore, since it now has the results it needs. (Or it may have had a 7th or 8th result to send that it didn't need, as the 5th or 6th was in the Waiting to Validate queue.)
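
Here is a toy Python model of that effect. It is my own simplification, not how the BOINC server code is written: the `WorkUnit` class, its fields, and the quorum of 3 are assumptions for illustration. The point is only that a validation pass can shrink "ready to send" with the scheduler switched off.

```python
from dataclasses import dataclass


@dataclass
class WorkUnit:
    quorum: int = 3     # validated results needed before the WU is done
    validated: int = 0  # results that have passed validation so far
    unsent: int = 0     # extra results created but still "ready to send"


def on_validation_pass(wu, newly_validated):
    """Record newly validated results and return how many unsent results
    are cancelled because the quorum has now been met."""
    wu.validated += newly_validated
    if wu.validated >= wu.quorum:
        cancelled = wu.unsent
        wu.unsent = 0
        return cancelled
    return 0


if __name__ == "__main__":
    # Two results already validated, two spare copies waiting to be sent.
    wu = WorkUnit(validated=2, unsent=2)
    dropped = on_validation_pass(wu, newly_validated=1)
    print("removed from ready to send:", dropped)
```

In this model, the third validated result completes the quorum, so the two spare copies leave the queue without ever being handed to a client.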


Mark

ID: 158219 · Report as offensive
Profile Sir Ulli
Volunteer tester
Joined: 21 Oct 99
Posts: 2246
Credit: 6,136,250
RAC: 0
Germany
Message 158221 - Posted: 27 Aug 2005, 17:43:08 UTC

btw

my credit is climbing.. :)

Since my last contact with the Berkeley servers I got 1,000 new CS. Thanks to the Firefox plugin for letting me notice this.


Greetings from Germany NRW
Ulli




ID: 158221 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.