Validator queue keeps growing......

Message boards : Number crunching : Validator queue keeps growing......

Previous · 1 · 2 · 3 · 4 · Next

kelpie
Joined: 3 Apr 05
Posts: 4
Credit: 65,055
RAC: 0
United Kingdom
Message 152969 - Posted: 18 Aug 2005, 10:42:47 UTC - in response to Message 152964.  

<blockquote>To push the boundaries of Computer Science and SETI ever further and harder!</blockquote>

lol hmm... pushing the boundaries of the Uni bank account, maybe ;¬)
ID: 152969

Prognatus
Joined: 6 Jul 99
Posts: 1600
Credit: 391,546
RAC: 0
Norway
Message 152988 - Posted: 18 Aug 2005, 12:08:38 UTC
Last modified: 18 Aug 2005, 12:12:41 UTC

From Technical News Aug 18:
<blockquote>We are looking at ways to speed this up. And, of course, ways to keep these files from building up again.</blockquote>
Although I'm not pretending to be an expert on these matters, and I have the utmost confidence that Berkeley is on top of this issue and has the needed expertise to handle it, I'll take this as an invitation for solution suggestions from this community.

From a bystander's view, and from what Berkeley officials have previously said, it seems to me there may be serious design flaws in the validation routine, and that Berkeley should perhaps rethink whether the current routine has room for improvement. (This is more a hunch than a conclusion drawn from sufficient information, but maybe someone more knowledgeable will be able to expand on this thought - or invalidate it...)

I'll try to explain what I mean. This is a quote from Technical News, Aug 11:
<blockquote>[...] when looking for a file, the system does a hash on the filename to find which of the 1000 subdirectories this file should be in.</blockquote>
Given that it has been said on several occasions that the chief problem for the server is massive directory lookups, it seems natural to ask whether a smarter way of doing this might take stress off the server and hence help clear the validation queue.

Why not turn the process around somehow and read the uploaded results sequentially, instead of hashing into a directory lookup?

Maybe the next database in the validation process runs on a less stressed server, so one could hash into that instead? Just a thought.
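The directory scheme from the Technical News quote can be sketched roughly as below. The actual hash function BOINC uses isn't given in this thread, so an MD5 of the filename, reduced modulo the 1000 subdirectories, is assumed here purely for illustration (the filename is invented too):

```python
import hashlib

NUM_SUBDIRS = 1000  # the "1000 subdirectories" mentioned in Technical News


def upload_subdir(filename: str) -> str:
    """Map a result filename to one of NUM_SUBDIRS upload subdirectories.

    The real hash BOINC uses is not specified in this thread; MD5 of the
    filename, reduced modulo the directory count, is assumed purely for
    illustration.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return f"{int(digest, 16) % NUM_SUBDIRS:03d}"


# The same filename always lands in the same subdirectory, so a lookup
# only has to scan roughly 1/1000th of the files on disk.
print(upload_subdir("12ja04aa.12345.67.89.result"))
```

The point of such a scheme: a million waiting results spread over 1000 directories means about a thousand entries per directory lookup instead of a million in one, which is why the lookup pattern matters so much here.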

ID: 152988

Rudy Ackerman
Joined: 12 Apr 00
Posts: 15
Credit: 24,776
RAC: 0
United States
Message 153003 - Posted: 18 Aug 2005, 12:55:50 UTC
Last modified: 18 Aug 2005, 12:56:21 UTC

I'm doing my part to help the problem... I've stopped processing SETI work. I'm only doing Einstein work units. When I first added Einstein, I did it to fill the gaps when SETI was down. But then I noticed something: Einstein was never down. Server status is always green; go figure, either it works really well or they do a really good job of hiding the problems.
"hokey religions and ancient weapons are no match for a good blaster"


ID: 153003

Jesse Viviano
Joined: 27 Feb 00
Posts: 100
Credit: 3,949,583
RAC: 0
United States
Message 153006 - Posted: 18 Aug 2005, 13:05:59 UTC - in response to Message 152960.  

<blockquote>Maybe this is another reason to disallow people from using BOINC versions 4.25 and earlier. When I was using version 4.25 for Windows when I started, I noticed that I had to manually force an update in order to report completed results. They fixed that problem in versions 4.43 and greater so that a completed result is automatically reported immediately after being successfully uploaded. In the old versions, the BOINC client would wait until it decided that it needed to download new work before reporting the uploaded result. Any automatic reporting would have to take place then, piggybacking on the download request, unless the user noticed this and forced an update. Could those results be ones that were uploaded before the deadline and reported after the deadline due to that now-fixed bug?</blockquote>


<blockquote>You have it backwards here; the problem is/was in a few versions right around 4.45. Results are not supposed to be reported immediately. This is fixed (again) in the current development clients; finished results wait for reporting as they should.

There never was a client that would not report automatically as far as I know, including 4.25. It was just delayed.</blockquote>

What advantage is there in delaying the reporting? If finished work units are reported earlier, the back-end servers get an earlier shot at validating, assimilating, and deleting them. This helps the project keep its disks clean and prevents backlogs due to overly large directories, as long as there is no backlog somewhere further down the finished-work-unit pipeline (which SETI@home is unfortunately experiencing). Therefore I think it is silly not to report a finished work unit immediately.
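The two reporting policies being debated can be modelled with a toy client. This is not real BOINC code; `ToyClient` and its methods are invented here just to contrast immediate reporting (the 4.43+ behaviour described above) with reporting deferred until the next work request (the older behaviour):

```python
from collections import deque


class ToyClient:
    """Toy model of the two reporting policies; not real BOINC code."""

    def __init__(self, report_immediately: bool):
        self.report_immediately = report_immediately
        self.pending = deque()   # uploaded but not yet reported
        self.reported = []       # results the scheduler knows about

    def upload_finished(self, result: str) -> None:
        self.pending.append(result)
        if self.report_immediately:
            self._flush()  # 4.43+ style: report right after upload

    def request_work(self) -> None:
        # Older clients piggyback pending reports on the work request.
        self._flush()

    def _flush(self) -> None:
        while self.pending:
            self.reported.append(self.pending.popleft())


old = ToyClient(report_immediately=False)
old.upload_finished("wu_1")
print(old.reported)  # empty: the server cannot start validating wu_1 yet

new = ToyClient(report_immediately=True)
new.upload_finished("wu_1")
print(new.reported)  # wu_1 reported: validation/assimilation can start
```

The deferred variant only reports once `request_work()` fires, which is exactly the window in which an on-time upload can turn into a past-deadline report.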
ID: 153006

Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 153007 - Posted: 18 Aug 2005, 13:06:25 UTC

Oh, the suspense is killing me... ;)

Waiting for validation 998,014

Which will mean that at the next count (15 minutes or so from now), we'll have that 1 million WUs waiting for validation. :-D

Then what shall we do with our suspense...?
ID: 153007

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 153011 - Posted: 18 Aug 2005, 13:15:55 UTC - in response to Message 153003.  

<blockquote>I'm doing my part to help the problem... I've stopped processing SETI work.</blockquote>

Is the aim just to get these "credits", or to help the project with its scientific problem? Sooner or later everyone gets their credits. The more work is done, the more the project benefits. So what are these thoughts about stopping participation?!
ID: 153011

Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 153020 - Posted: 18 Aug 2005, 13:44:16 UTC
Last modified: 18 Aug 2005, 13:44:31 UTC

That milestone has been reached. It's official: as of 18 Aug 2005 13:40:09 UTC, we have Waiting for validation 1,000,448...

Virtual drinks are on me. Let's head to the bar! :)
ID: 153020

Big Blue
Volunteer tester
Joined: 8 Feb 05
Posts: 16
Credit: 2,721,283
RAC: 0
Germany
Message 153021 - Posted: 18 Aug 2005, 13:44:27 UTC

It's done:
Waiting for validation 1,000,448

I stopped SETI too.
ID: 153021

cliff west
Joined: 7 May 01
Posts: 211
Credit: 16,180,728
RAC: 15
United States
Message 153025 - Posted: 18 Aug 2005, 13:51:07 UTC
Last modified: 18 Aug 2005, 13:51:43 UTC

Just wait till it hits 1,100,000; then the countdown begins... say, by the end of next week ;)
ID: 153025

Ken Phillips m0mcw
Volunteer tester
Joined: 2 Feb 00
Posts: 267
Credit: 415,678
RAC: 0
United Kingdom
Message 153027 - Posted: 18 Aug 2005, 13:55:47 UTC - in response to Message 153021.  
Last modified: 18 Aug 2005, 13:58:15 UTC

<blockquote>It's done:
Waiting for validation 1,000,448

I stopped SETI too.</blockquote>


Methinks that stopping running seti@home is a bit redundant at the moment, as there are clear signs (unless I'm mistaken) that the Berkeley folks are experimenting with throttling back the 'ready to send' queue, which has been falling steadily since about 20:00 UTC on Wednesday. Scarcer downloads will, after a lag, automatically mean scarcer uploads, which should then allow the deleters to catch up without having to stop the project.
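The throttling effect described here can be sanity-checked with a toy queue model. All the rates below are invented for illustration (not measured SETI figures); the point is only that once the arrival rate drops below the deleters' service rate, the backlog drains:

```python
def backlog_over_time(start, arrivals_per_day, deletions_per_day, days):
    """Toy model: track a deletion backlog under constant daily rates.

    The rates are made-up illustration numbers, not measured SETI figures.
    """
    backlog, history = start, [start]
    for _ in range(days):
        backlog = max(0, backlog + arrivals_per_day - deletions_per_day)
        history.append(backlog)
    return history


# Uploads throttled below the deleters' capacity: the backlog drains.
print(backlog_over_time(1_000_000, 150_000, 250_000, 5))
# Uploads above capacity: the backlog keeps growing.
print(backlog_over_time(1_000_000, 300_000, 250_000, 5))
```

Whatever the real numbers are, the sign of (arrivals - deletions) is what decides whether the million-result queue shrinks or grows.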

I for one am not suspending or detaching. I've plenty to do for other projects, and I have to agree that the staffers need to 'live fix it' in their own way; otherwise, don't you think they would have asked us to 'back off' already?

Ken P.
Ken Phillips

BOINC question? Look here



"The beginning is the most important part of the work." - Plato
ID: 153027

Spectrum
Joined: 14 Jun 99
Posts: 468
Credit: 53,129,336
RAC: 0
Australia
Message 153028 - Posted: 18 Aug 2005, 13:58:55 UTC

I think if the administrators asked for a bit of a break, I would gracefully comply.
ID: 153028

PhonAcq
Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 153081 - Posted: 18 Aug 2005, 15:03:48 UTC - in response to Message 152862.  

@Ned

<blockquote>Sounds like a major BOINC gap here; on the one hand it is not able to give latecomers the credit they were promised, but on the other hand it says that bogus result files are being passed through to the u/l storage. Wouldn't the other projects have this problem too?</blockquote>

<blockquote>If you return work after the deadline, there are no promises.

This work is so far past the deadline that all results have been moved to the science database and deleted from the BOINC database.</blockquote>

The first way I read the tech news, there is a BOINC structural error if the WU is closed to new results (assimilated) and yet subsequent results can keep getting uploaded.

The other way I read it is that the 'deadline' may not be part of the process; the WUs can be processed and assimilated before all the issued results are returned and before the deadline.

May this Farce be with You
ID: 153081

John Cropper
Joined: 3 May 00
Posts: 444
Credit: 416,933
RAC: 0
United States
Message 153086 - Posted: 18 Aug 2005, 15:09:53 UTC - in response to Message 153081.  

I've "No new work"'d several of my machines in anticipation of some sort of slow-down/catch-up maneuver by Berkeley personnel.

OTOH, the draw-down could also be because people are "bulking up" for the anticipated downtime today.

Either way, when the validator queue starts to drain, credit totals will shoot up rapidly, quelling some of that gang's loudest complaints, which is a good thing... :o)

Stewie: So, is there any tread left on the tires? Or at this point would it be like throwing a hot dog down a hallway?

Fox Sunday (US) at 9PM ET/PT
ID: 153086

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 153093 - Posted: 18 Aug 2005, 15:19:35 UTC - in response to Message 152934.  

<blockquote>Seems we will hit 1M before today's outage. What is the next target?</blockquote>

Let's be bold and set 1.5M as the next big target.


ID: 153093

ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 21436
Credit: 7,508,002
RAC: 20
United Kingdom
Message 153096 - Posted: 18 Aug 2005, 15:23:32 UTC - in response to Message 152988.  
Last modified: 18 Aug 2005, 15:29:44 UTC

<blockquote>Although I'm not pretending to be an expert on these matters, and I also have the utmost confidence that Berkeley is on top of this issue...</blockquote>

Then please read what they've posted and do a little research into what their descriptions describe.

<blockquote>From a bystander's view, and from what has previously been said by Berkeley officials, it seems to me there may be serious design flaws...</blockquote>

Undoubtedly there are flaws in the BOINC system. That is why it is under development, and it is real research. But I strongly disagree with the "serious flaws" description. Can you explain further?

<blockquote>Why not turn the process around somehow and read the uploaded results sequentially, instead of hashing into a directory lookup?

Maybe the next database in the validation process is running on a less stressed server, so one can hash into that instead? Just a thought.</blockquote>

Yes... Well...

We've moved on a long way from using sequential mag-tape files for working storage and overnight batch processing.

Take a look at HDD usage, and at how filesystems work.

Just because something doesn't run as expected doesn't mean that it is fatally broken.

Regards,
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 153096

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 153101 - Posted: 18 Aug 2005, 15:28:46 UTC - in response to Message 153003.  

I've not done that -- though I have definitely backed off on the resource share SETI gets from my farm. About a month ago I was generating something like 3000 'credits' daily for SETI, 1000 for Einstein and 0 for Climate. These days it is more like 1500 for SETI, 2750 for Einstein and 1250 for Climate. SETI was getting 75% of my BOINC CPU then, in the interim I shifted a batch of workstations from SETI classic to BOINC, but focused on setting them up with Einstein or Climate. In the past couple of weeks, I've reduced SETI resource share as well so SETI is down to less than 30% of my CPU cycles at the moment.


<blockquote>I'm doing my part to help the problem... I've stopped processing SETI work. I'm only doing Einstein work units. When I first added Einstein, I did it to fill the gaps when SETI was down. But then I noticed something: Einstein was never down. Server status is always green; go figure, either it works really well or they do a really good job of hiding the problems.</blockquote>


ID: 153101

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 153104 - Posted: 18 Aug 2005, 15:37:25 UTC - in response to Message 153011.  

The aim for me is to help multiple projects with scientific problems. That is one of the most useful aspects of BOINC. At the moment, with the validation queue issues here, where frankly lost results seem increasingly likely (and lost validation is lost work for SETI as well as for the contributors), it makes sense to re-allocate CPU cycles to projects where the work is getting validated reliably.

Another way of looking at this is that, from all reports, SETI is struggling with its existing workload (which is several times larger than that of any other project). This is a function of the volume of data SETI processes, the less-than-optimum hardware, and (let's face it) code which quite possibly is not up to the task of handling the large number of users. On top of that, there is a potential group of new users (still running SETI classic) which might well overwhelm SETI BOINC. Further, SETI BOINC has physical space constraints plus physical plant constraints.

It makes sense for those generating a lot of work (I'm certainly one of them) to offer their CPU cycles to the other BOINC projects, which are not at this point overloaded and, as a result, are far more reliable at the moment.




<blockquote>Is the aim just to get these "credits", or to help the project with its scientific problem? Sooner or later everyone gets their credits. The more work is done, the more the project benefits. So what are these thoughts about stopping participation?!</blockquote>


ID: 153104

Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 153143 - Posted: 18 Aug 2005, 17:03:32 UTC
Last modified: 18 Aug 2005, 17:08:54 UTC

I say, if you really wanna help... DON'T let up on the SETI WUs, but instead of crunching the old SETI WUs, join SETI Beta and load up on some WUs that take from 10 to 50 hours to crunch. You'll reduce the load on SETI, help develop the new application, and maybe even crunch some Astropulse WUs.

tony
[edit] It doubles the sensitivity of the search, and YES, you get much larger credit than SETI gives. (Although I don't know if we get to keep the credit.)

ID: 153143

Ken Phillips m0mcw
Volunteer tester
Joined: 2 Feb 00
Posts: 267
Credit: 415,678
RAC: 0
United Kingdom
Message 153165 - Posted: 18 Aug 2005, 17:44:12 UTC - in response to Message 153143.  
Last modified: 18 Aug 2005, 17:46:01 UTC

<blockquote>I say, if you really wanna help... DON'T let up on the SETI WUs, but instead of crunching the old SETI WUs, join SETI Beta and load up on some WUs that take from 10 to 50 hours to crunch. You'll reduce the load on SETI, help develop the new application, and maybe even crunch some Astropulse WUs.

tony
[edit] It doubles the sensitivity of the search, and YES, you get much larger credit than SETI gives. (Although I don't know if we get to keep the credit.)</blockquote>


Cheers Tony,

I've been intrigued about how to join the beta project for a while now. I'm now signed up, but this is all I get when I try to attach; is this normal at the moment, or have I got some rummaging around to do? :-)

Ken P.

[edit]Oops! Dodgy link fixed[/edit]
Ken Phillips

BOINC question? Look here



"The beginning is the most important part of the work." - Plato
ID: 153165

Rudy Ackerman
Joined: 12 Apr 00
Posts: 15
Credit: 24,776
RAC: 0
United States
Message 153180 - Posted: 18 Aug 2005, 18:29:29 UTC
Last modified: 18 Aug 2005, 18:30:08 UTC

As for the credit thing, I stopped downloading new SETI work days ago, but as I have so many WUs pending, I'm still getting about the same number of credits per day, so my daily average hasn't changed. I bet I have 7-10 days of backlogged WUs.

Let's hear it for 2,000,000

But then, it's always easy to throw rocks at others' work.


"hokey religions and ancient weapons are no match for a good blaster"


ID: 153180


 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.