Validator queue keeps growing......

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 152770 - Posted: 17 Aug 2005, 23:56:27 UTC - in response to Message 152353.  

We can all hope that, but I fear it is hoping for whirrelled peas.


We're planning on doing some rigorous file deleting during the database outage tomorrow. I hope we can turn the corner before the validation queue hits 1M!

- Matt


ID: 152770 · Report as offensive
eL_nino
Joined: 11 Mar 04
Posts: 79
Credit: 999,964
RAC: 0
Croatia
Message 152771 - Posted: 17 Aug 2005, 23:57:25 UTC

I hope that they will not just delete these unvalidated results and start from zero... If they do that they will lose many, many crunchers :(
ID: 152771 · Report as offensive
BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 152779 - Posted: 18 Aug 2005, 0:24:17 UTC - in response to Message 152353.  

Well, looking at the trend, that gives you about 24 hours from now... Good luck.


We're planning on doing some rigorous file deleting during the database outage tomorrow. I hope we can turn the corner before the validation queue hits 1M!

- Matt


ID: 152779 · Report as offensive
Qui-Gon
Volunteer tester
Joined: 15 May 99
Posts: 2940
Credit: 19,199,902
RAC: 11
United States
Message 152801 - Posted: 18 Aug 2005, 1:23:05 UTC
Last modified: 18 Aug 2005, 1:23:34 UTC

There is an update on the Technical News page describing what the Berkeley crew is doing about the validation backlog.
ID: 152801 · Report as offensive
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 21533
Credit: 7,508,002
RAC: 20
United Kingdom
Message 152810 - Posted: 18 Aug 2005, 1:46:49 UTC - in response to Message 152801.  

There is an update on the Technical News page ... about the validation backlog.

All nicely explained.

Happy crunchin',
Martin

:)

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 152810 · Report as offensive
PhonAcq
Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 152824 - Posted: 18 Aug 2005, 2:13:05 UTC

From 8/17 tech news...
In addition, there are a great many result files in our upload directories that have no corresponding row in the database. These disassociated result files will never be deleted by the file deleter program. Such results can appear when a workunit has reached its quorum number of returned results and is passed through validation, assimilation, file (both workunit and result) deletion and finally DB purging and *then* one or more results come in (perhaps they were slowed down by running intermittently on a laptop). The disassociated results are the bulk of what needs deleting.


Sounds like a major BOINC gap here: on the one hand it can't give latecomers the credit they were promised, but on the other hand it says that bogus result files are being passed through to the upload storage. Wouldn't the other projects have this problem too?
May this Farce be with You
ID: 152824 · Report as offensive
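
For readers trying to picture the "disassociated" files described in the tech news quoted above, here is a minimal Python sketch of the idea: list the files in an upload directory and flag the ones with no matching database row. It is not the project's actual cleanup code. The `result` table with a `name` column, the file names, and the in-memory SQLite database (standing in for whatever database the server really uses) are all assumptions made for illustration.

```python
import os
import sqlite3
import tempfile


def find_disassociated(upload_dir, conn):
    """Return names of files in upload_dir that have no row in the result table.

    The table name `result` and column `name` are hypothetical.
    """
    known = {row[0] for row in conn.execute("SELECT name FROM result")}
    with os.scandir(upload_dir) as entries:
        on_disk = {entry.name for entry in entries if entry.is_file()}
    # Anything on disk that the database no longer knows about is an orphan.
    return sorted(on_disk - known)


def demo():
    # Stand-in database holding two results that are still tracked.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE result (name TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO result VALUES (?)", [("wu1_0",), ("wu1_1",)])

    with tempfile.TemporaryDirectory() as upload_dir:
        # "wu0_3" plays the late result whose row was already purged.
        for name in ("wu1_0", "wu1_1", "wu0_3"):
            open(os.path.join(upload_dir, name), "w").close()
        return find_disassociated(upload_dir, conn)


if __name__ == "__main__":
    print(demo())
```

Calling `demo()` returns only the file that arrived after its row was purged. A real cleanup would presumably also check file age before deleting anything, so that an upload still in progress is not removed.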
Jim Baize
Volunteer tester
Joined: 6 May 00
Posts: 758
Credit: 149,536
RAC: 0
United States
Message 152843 - Posted: 18 Aug 2005, 2:35:13 UTC - in response to Message 152824.  

From 8/17 tech news...
In addition, there are a great many result files in our upload directories that have no corresponding row in the database. These disassociated result files will never be deleted by the file deleter program. Such results can appear when a workunit has reached its quorum number of returned results and is passed through validation, assimilation, file (both workunit and result) deletion and finally DB purging and *then* one or more results come in (perhaps they were slowed down by running intermittently on a laptop). The disassociated results are the bulk of what needs deleting.


Sounds like a major BOINC gap here: on the one hand it can't give latecomers the credit they were promised, but on the other hand it says that bogus result files are being passed through to the upload storage. Wouldn't the other projects have this problem too?



They may very well have the same problem but just don't realize it yet. Perhaps it showed up on SETI first because of the sheer volume of WUs and users.
ID: 152843 · Report as offensive
Jesse Viviano
Joined: 27 Feb 00
Posts: 100
Credit: 3,949,583
RAC: 0
United States
Message 152854 - Posted: 18 Aug 2005, 2:54:13 UTC - in response to Message 152824.  

From 8/17 tech news...
In addition, there are a great many result files in our upload directories that have no corresponding row in the database. These disassociated result files will never be deleted by the file deleter program. Such results can appear when a workunit has reached its quorum number of returned results and is passed through validation, assimilation, file (both workunit and result) deletion and finally DB purging and *then* one or more results come in (perhaps they were slowed down by running intermittently on a laptop). The disassociated results are the bulk of what needs deleting.


Sounds like a major BOINC gap here: on the one hand it can't give latecomers the credit they were promised, but on the other hand it says that bogus result files are being passed through to the upload storage. Wouldn't the other projects have this problem too?

Maybe this is another reason to disallow people from using BOINC versions 4.25 and earlier. When I started out with version 4.25 for Windows, I noticed that I had to manually force an update in order to report completed results. They fixed that problem in versions 4.43 and greater, so that the client automatically reports a completed result immediately after successfully uploading it. In the old versions, the BOINC client would wait until it decided that it needed to download new work before reporting the uploaded result. Any automatic reporting would then have to piggyback on the download request, unless the user noticed this and forced an update. Could those results be ones that were uploaded before the deadline and reported after the deadline due to that now-fixed bug?
ID: 152854 · Report as offensive
jorgeur
Joined: 1 Dec 03
Posts: 10
Credit: 594,581
RAC: 0
Brazil
Message 152859 - Posted: 18 Aug 2005, 3:04:19 UTC

hi,



when directory or file access is the bottleneck,
we need to use
faster disks and a faster disk controller,
or add more memory;

how about adding some memory?
if none is available, maybe take some from another, less loaded machine ...
next shortage, maybe?

ID: 152859 · Report as offensive
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 152862 - Posted: 18 Aug 2005, 3:06:46 UTC - in response to Message 152824.  

From 8/17 tech news...
In addition, there are a great many result files in our upload directories that have no corresponding row in the database. These disassociated result files will never be deleted by the file deleter program. Such results can appear when a workunit has reached its quorum number of returned results and is passed through validation, assimilation, file (both workunit and result) deletion and finally DB purging and *then* one or more results come in (perhaps they were slowed down by running intermittently on a laptop). The disassociated results are the bulk of what needs deleting.


Sounds like a major BOINC gap here: on the one hand it can't give latecomers the credit they were promised, but on the other hand it says that bogus result files are being passed through to the upload storage. Wouldn't the other projects have this problem too?

If you return work after the deadline, there are no promises.

This work is so far past the deadline that all results have been moved to the science database and deleted from the BOINC database.
ID: 152862 · Report as offensive
Czech Crunchers Unit
Joined: 2 Jul 99
Posts: 5
Credit: 62,227,636
RAC: 0
Czech Republic
Message 152892 - Posted: 18 Aug 2005, 5:01:31 UTC

Waiting for validation 970,375

... soon to 1M ...
ID: 152892 · Report as offensive
SunMicrosystemsLLG
Joined: 4 Jul 05
Posts: 102
Credit: 1,360,617
RAC: 0
United Kingdom
Message 152918 - Posted: 18 Aug 2005, 6:30:35 UTC - in response to Message 152859.  

hi,

when directory or file access is the bottleneck,
we need to use
faster disks and a faster disk controller,
or add more memory;

how about adding some memory?
if none is available, maybe take some from another, less loaded machine ...
next shortage, maybe?



Surely the best fix (if the directories don't need to be so large, which by the sound of it they don't) is to control their size and not let them get so big in the first place.

Which is what they are doing.

Adding more hardware to overcome the problem by brute force masks the problem rather than fixes it.
ID: 152918 · Report as offensive
BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 152921 - Posted: 18 Aug 2005, 6:43:12 UTC - in response to Message 152801.  

Yup -- well perhaps the efforts during the outage tomorrow will have a favorable impact.


There is an update on the Technical News page describing what the Berkeley crew is doing about the validation backlog.


ID: 152921 · Report as offensive
BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 152922 - Posted: 18 Aug 2005, 6:45:53 UTC - in response to Message 152854.  

I've got a few workstations running 4.45 -- but on several other workstations when I tried to deploy 4.45 it installed fine, but when I tried to join a project it would query the project and 'fail' me and unjoin from the project. So I either stayed with 4.19 or installed 4.19 clean. Something about 4.45 doesn't like me I guess.



Maybe this is another reason to disallow people from using BOINC versions 4.25 and earlier. When I started out with version 4.25 for Windows, I noticed that I had to manually force an update in order to report completed results. They fixed that problem in versions 4.43 and greater, so that the client automatically reports a completed result immediately after successfully uploading it. In the old versions, the BOINC client would wait until it decided that it needed to download new work before reporting the uploaded result. Any automatic reporting would then have to piggyback on the download request, unless the user noticed this and forced an update. Could those results be ones that were uploaded before the deadline and reported after the deadline due to that now-fixed bug?


ID: 152922 · Report as offensive
Bronco
Volunteer tester
Joined: 22 Jun 05
Posts: 123
Credit: 19,340
RAC: 0
France
Message 152934 - Posted: 18 Aug 2005, 8:35:46 UTC

Seems it will hit 1M before today's outage. What is the next target?
"In a world without walls and fences, who needs windows and gates ?"
for the team
ID: 152934 · Report as offensive
kelpie
Joined: 3 Apr 05
Posts: 4
Credit: 65,055
RAC: 0
United Kingdom
Message 152935 - Posted: 18 Aug 2005, 8:37:21 UTC

Can someone explain why the project needs to keep growing like Topsy if it's already struggling under the weight of results? With the limited resources at their disposal, would Seti not be better off slowing the project down a bit, taking on fewer new members for a while and letting the numbers thin by natural wastage rather than trying to cope with the staggering tidal wave of WU results by going through numerous complicated hardware upgrades? Just wondering
ID: 152935 · Report as offensive
Bronco
Volunteer tester
Joined: 22 Jun 05
Posts: 123
Credit: 19,340
RAC: 0
France
Message 152937 - Posted: 18 Aug 2005, 8:54:33 UTC

Good question. Maybe they don't want to stop taking new members because not all of the old SETI crunchers have registered yet? But, once again, just decreasing the number of WUs sent to crunchers would let the system keep doing what it can do ...

Looking at the evolution of the WFV queue over the past 8 days, nothing indicates that the queue is ever going to decrease. So good luck to the team, I still expect a silver bullet ...
"In a world without walls and fences, who needs windows and gates ?"
for the team
ID: 152937 · Report as offensive
PT
Joined: 19 May 99
Posts: 231
Credit: 902,910
RAC: 0
United Kingdom
Message 152951 - Posted: 18 Aug 2005, 9:46:03 UTC

985,452... and ticking...!
ID: 152951 · Report as offensive
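
As a rough check on the "soon to 1M" estimates, the two counts posted in this thread (970,375 at 05:01:31 UTC and 985,452 at 09:46:03 UTC on 18 Aug 2005) can be extrapolated linearly. The sketch below assumes both figures are readings of the same "waiting for validation" counter taken at roughly the post times, and that growth stays constant. The function names are made up for illustration.

```python
from datetime import datetime, timedelta


def growth_per_second(t0, n0, t1, n1):
    """Average queue growth between two (time, count) readings."""
    return (n1 - n0) / (t1 - t0).total_seconds()


def eta_to_target(t0, n0, t1, n1, target):
    """Linearly extrapolate when the queue reaches `target`.

    Returns None if the queue is flat or shrinking.
    """
    rate = growth_per_second(t0, n0, t1, n1)
    if rate <= 0:
        return None
    return t1 + timedelta(seconds=(target - n1) / rate)


# Readings taken from the two posts in this thread (post times, UTC).
T0 = datetime(2005, 8, 18, 5, 1, 31)
T1 = datetime(2005, 8, 18, 9, 46, 3)
N0, N1 = 970_375, 985_452

ETA = eta_to_target(T0, N0, T1, N1, 1_000_000)

if __name__ == "__main__":
    per_hour = growth_per_second(T0, N0, T1, N1) * 3600
    print(f"growth: {per_hour:.0f} results/hour, 1M reached around {ETA} UTC")
```

With these two readings the growth works out to a little over 3,000 results per hour, which would put the 1M mark in the early afternoon (UTC) of the same day if nothing changes.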
Keck_Komputers
Volunteer tester
Joined: 4 Jul 99
Posts: 1575
Credit: 4,152,111
RAC: 1
United States
Message 152960 - Posted: 18 Aug 2005, 10:01:04 UTC - in response to Message 152854.  

Maybe this is another reason to disallow people from using BOINC versions 4.25 and earlier. When I started out with version 4.25 for Windows, I noticed that I had to manually force an update in order to report completed results. They fixed that problem in versions 4.43 and greater, so that the client automatically reports a completed result immediately after successfully uploading it. In the old versions, the BOINC client would wait until it decided that it needed to download new work before reporting the uploaded result. Any automatic reporting would then have to piggyback on the download request, unless the user noticed this and forced an update. Could those results be ones that were uploaded before the deadline and reported after the deadline due to that now-fixed bug?


You have it backwards here: the problem is/was in a few versions right around 4.45. Results are not supposed to be reported immediately. This is fixed (again) in the current development clients, where finished results wait for reporting as they should.

There never was a client that would not report automatically as far as I know, including 4.25. It was just delayed.
BOINC WIKI

BOINCing since 2002/12/8
ID: 152960 · Report as offensive
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 21533
Credit: 7,508,002
RAC: 20
United Kingdom
Message 152964 - Posted: 18 Aug 2005, 10:26:07 UTC - in response to Message 152935.  

Can someone explain why the project needs to keep growing like Topsy if it's already struggling under the weight of results?...

To push the boundaries of Computer Science and SETI ever further harder!

Or, they may just enjoy a good challenge ;)

Cheers,
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 152964 · Report as offensive
 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.