Father Padilla Meets the Perfect Gnat (Dec 03 2007)

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 688715 - Posted: 4 Dec 2007, 16:17:14 UTC - in response to Message 688712.  

.....As for funky deadlines, bogus credit, etc. take that up with BOINC. I don't follow that too closely.

- Matt

Begging your pardon, Matt, but deadlines are very much a SETI (project) issue, and not a BOINC issue.

They're set by the splitter/scheduler process, so perhaps we should ask Eric when he's been unchained from his grant proposal writing desk (and been let out for some recovery time, LOL).
ID: 688715
the silver surfer
Joined: 24 Feb 01
Posts: 131
Credit: 3,739,307
RAC: 0
Austria
Message 688717 - Posted: 4 Dec 2007, 16:19:01 UTC

Just a question: Why does SETI need two valid results and, for example, ROSETTA only one?
(What's the difference in the calculation process?)

KURT



ID: 688717
[KWSN]John Galt 007
Volunteer tester
Joined: 9 Nov 99
Posts: 2444
Credit: 25,086,197
RAC: 0
United States
Message 688720 - Posted: 4 Dec 2007, 16:26:27 UTC - in response to Message 688717.  

Just a question: Why does SETI need two valid results and, for example, ROSETTA only one?
(What's the difference in the calculation process?)

KURT



IIRC, Rosetta generates thousands of workunits per model and sends these out to everyone. That way you only need 1 result per workunit, but thousands of results per model.
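
A back-of-envelope way to see the difference (a sketch; the counts below are invented for illustration, not project figures):

    # Invented numbers: compare results needed under SETI-style redundancy
    # (2 copies of every WU) vs. a Rosetta-style ensemble (1 copy of each of
    # many WUs, with the redundancy living in the ensemble itself).
    seti_wus = 100
    seti_results = seti_wus * 2            # initial replication of 2

    rosetta_wus_per_model = 1000           # thousands of independent runs
    rosetta_results = rosetta_wus_per_model * 1

    print(seti_results, rosetta_results)   # 200 results vs. 1000 results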
Clk2HlpSetiCty:::PayIt4ward

ID: 688720
the silver surfer
Joined: 24 Feb 01
Posts: 131
Credit: 3,739,307
RAC: 0
Austria
Message 688722 - Posted: 4 Dec 2007, 16:34:48 UTC

@John

Understood, thank you very much!

KURT



ID: 688722
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 688726 - Posted: 4 Dec 2007, 16:43:55 UTC - in response to Message 688712.  

This is all healthy discussion. Two quick points:

1. There really is no way to verify results unless we have *at least* two results per workunit. Being this is a scientific project, we need verification.

2. We already have a solution to the disk space problem: adding more disk space. We'll need to do this anyway as Astropulse is just around the bend and will require even more workunit storage. We're not going to go back to 3/2 unless we really have to - it's a huge increase in bandwidth consumption (a current bottleneck) and wasted computing resources - and we're trying to be as "green" as possible. As for funky deadlines, bogus credit, etc. take that up with BOINC. I don't follow that too closely.

- Matt


Three rebuttals:
#2: If the current system is not running optimally, then adding disk space is merely a band-aid. I would think the project should be driven to run optimally before scaling up to the next level. This means enforcing a BOINC 5.10 rule.
#2: With the ability to perform a server-side WU abort, a 3/2 rule might work: the late third result could be aborted once a quorum forms. It is green in that you don't have to run more disks. It does have the bandwidth problem, of course, but is that actually a problem?
#1: I would like to see the analysis that says one result per WU couldn't work for SETI. Granted, false negatives are lost if only one result is analyzed. But positives (valid and false positives) could be checked/validated by re-issuing WUs whenever a client reports a positive signal (see the sketch below). I would think that additional WUs would then be generated near the one giving the positive response, which adds to the confirmation process. In the meantime, we would process a lot more data and use a lot less disk space and energy.
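
A minimal sketch of the re-issue-on-positive idea in #1 (names, threshold, and score distribution are all hypothetical; nothing like this exists in the real scheduler):

    import random

    SIGNAL_THRESHOLD = 24.0    # hypothetical "interesting signal" cutoff

    def on_result_returned(wu_id, score, reissue_queue):
        """Single replication: only reported positives trigger re-checks."""
        if score >= SIGNAL_THRESHOLD:
            # Confirm by re-issuing the same WU to a different host, plus
            # the neighboring WUs to strengthen the confirmation.
            reissue_queue.extend([wu_id, wu_id - 1, wu_id + 1])
        # Negatives are accepted as-is; false negatives are the accepted cost.

    queue = []
    for wu in range(1000):
        on_result_returned(wu, random.expovariate(1 / 5.0), queue)
    print(len(queue), "re-issues out of 1000 single-copy results")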
ID: 688726
DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 688728 - Posted: 4 Dec 2007, 16:58:07 UTC - in response to Message 688712.  

This is all healthy discussion. Two quick points:

1. There really is no way to verify results unless we have *at least* two results per workunit. Being this is a scientific project, we need verification.

2. We already have a solution to the disk space problem: adding more disk space. We'll need to do this anyway as Astropulse is just around the bend and will require even more workunit storage. We're not going to go back to 3/2 unless we really have to - it's a huge increase in bandwidth consumption (a current bottleneck) and wasted computing resources - and we're trying to be as "green" as possible. As for funky deadlines, bogus credit, etc. take that up with BOINC. I don't follow that too closely.

- Matt


Since you had no rebuttal to my suggestion of doubling WU size (see above), I'll take that to mean you are considering it. It's good to hear that the project has secured more disk space; that is going to help overall.
ID: 688728
Uioped1
Volunteer tester
Joined: 17 Sep 03
Posts: 50
Credit: 1,179,926
RAC: 0
United States
Message 688733 - Posted: 4 Dec 2007, 17:13:52 UTC - in response to Message 688717.  
Last modified: 4 Dec 2007, 17:15:25 UTC

Just a question: Why does SETI need two valid results and, for example, ROSETTA only one?
(What's the difference in the calculation process?)

KURT


Rosetta actually has something akin to a replication of many thousands; however, they cannot use the BOINC replication feature due to the nature of their algorithm. If you look at the results scatter chart you will see this illustrated. Unlike S@H, which takes a result and calculates an exact figure for the number of pulses, etc. in that result, the Rosetta algorithm guesses at what the best folding is. Then the project scientists take all of those guesses and, through the use of statistical models, produce a result that has a quantifiable certainty of being within a known distance of the 'correct' result.

The key difference is that S@H does not guess, whereas Rosetta does. Guesses are by nature only approximations and will not always be the same, so there can be no validation of an individual guess. A bad guesser is instead eliminated through statistics.
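
To illustrate eliminating a bad guesser statistically rather than by pairwise replication (a sketch with made-up energy values; this is not Rosetta's actual method):

    # Made-up energy values; one host's "guess" is way off.
    energies = [-42.1, -41.8, -42.3, -41.9, -12.5, -42.0]

    mean = sum(energies) / len(energies)
    std = (sum((e - mean) ** 2 for e in energies) / len(energies)) ** 0.5

    # Keep only guesses within 2 standard deviations of the ensemble mean;
    # no individual guess is ever validated pairwise.
    kept = [e for e in energies if abs(e - mean) <= 2 * std]
    print(kept)  # the -12.5 outlier is dropped without recomputing anything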


1. There really is no way to verify results unless we have *at least* two results per workunit. Being this is a scientific project, we need verification.


One feature that would be really nice would be the ability to have a replication of less than two. I suppose this would have to be tracked at the host level, creating a new set of data-tracking issues... Imagine a replication of 1.25, i.e., every fourth result returned by a host would be validated by a different host. Even better would be a dynamic replication figure, where hosts that returned bad results would have more of their results validated (or, better, would get fewer results between validated ones); a sketch follows.
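
A sketch of what such a fractional, per-host policy might look like (purely hypothetical; BOINC has no such feature, and all names here are invented):

    class HostValidationPolicy:
        """Hypothetical: validate only every Nth result from a host."""

        def __init__(self, check_every=4):      # replication 1.25 = every 4th
            self.check_every = check_every
            self.count = 0

        def needs_second_opinion(self):
            self.count += 1
            return self.count % self.check_every == 0

        def record_outcome(self, was_valid):
            # Dynamic part: bad results tighten scrutiny, good ones relax it.
            if was_valid:
                self.check_every = min(8, self.check_every + 1)
            else:
                self.check_every = max(1, self.check_every // 2)

    policy = HostValidationPolicy()
    checks = sum(policy.needs_second_opinion() for _ in range(100))
    print(checks, "of 100 results cross-validated")  # 25, i.e. replication 1.25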

The implication that results are not scientifically valid unless every one has been checked twice is hogwash, though.
ID: 688733
Ncrab

Joined: 1 Dec 07
Posts: 10
Credit: 57,389
RAC: 0
Brazil
Message 688734 - Posted: 4 Dec 2007, 17:15:56 UTC - in response to Message 688712.  

...
1. There really is no way to verify results unless we have *at least* two results per workunit. Being this is a scientific project, we need verification.



Sounds definitive. And that is the reason I am still thinking about it.

But it is not easy for me to be explicit enough writing in English.
About "checksums": the intent is only a kind of "encoding" of the partial results that the Check-WU job has produced.
The idea is that the Check-WU would not recalculate the entire WU, but only key processing steps (maybe tracking some existing dependencies...), and would feed those results (or only fragments of the output generated in those steps) into a later "checksum-like" comparison.

Clearly, not all of the redundancy would be achieved, but a level of assurance would be, and that level can be tuned by selecting the appropriate key processing steps.

What about a beta doing this, to compare against the errors actually caught with the 2/2 scheme?
ID: 688734
Uioped1
Volunteer tester
Joined: 17 Sep 03
Posts: 50
Credit: 1,179,926
RAC: 0
United States
Message 688739 - Posted: 4 Dec 2007, 17:23:22 UTC - in response to Message 688728.  


Since you had no rebuttal to my suggestion of doubling WU size (see above), I'll take that to mean you are considering it. It's good to hear that the project has secured more disk space; that is going to help overall.


Unfortunately, when a result is sent to the client, the workunit is not removed from the server. The WU is actually stored on both the host computer and on the server, so increasing the time it takes a host to process a WU also increases the time it is stored on the server. Admittedly, if WUs take more time to complete you theoretically need fewer total WUs in process at a given time, but I suspect the gains would be wiped out by longer periods where a result returned by a fast host sits waiting on a slower host. Mostly, increasing the WU time helps with bandwidth.
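
One way to put numbers on this is Little's law: data in flight = issue rate x residence time x size per WU. A sketch, with all figures invented; only the proportions matter:

    # All figures invented; only the proportions matter.
    def inflight_gb(size_kb, wus_per_sec, residence_days):
        """Little's law: average data sitting on the server at any moment."""
        return size_kb * wus_per_sec * residence_days * 86400 / 1024 ** 2

    print(inflight_gb(350, 10, 5))   # baseline: ~1442 GB in flight
    # Doubling WU size halves the issue rate for the same data throughput,
    # so server storage is roughly unchanged unless residence time changes:
    print(inflight_gb(700, 5, 5))    # same total; the real win is bandwidth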
ID: 688739
Uioped1
Volunteer tester
Joined: 17 Sep 03
Posts: 50
Credit: 1,179,926
RAC: 0
United States
Message 688741 - Posted: 4 Dec 2007, 17:33:35 UTC - in response to Message 688734.  



But it is not easy for me to be explicit enough writing in English.
About "checksums": the intent is only a kind of "encoding" of the partial results that the Check-WU job has produced.
The idea is that the Check-WU would not recalculate the entire WU, but only key processing steps (maybe tracking some existing dependencies...), and would feed those results (or only fragments of the output generated in those steps) into a later "checksum-like" comparison.

Clearly, not all of the redundancy would be achieved, but a level of assurance would be, and that level can be tuned by selecting the appropriate key processing steps.

What about a beta doing this, to compare against the errors actually caught with the 2/2 scheme?



I think I understand what you're getting at... You're saying produce a trace of the program execution without actually performing all the calculations. You have one result in a WU that does the actual calculations, and a second that verifies the first did what it said it did, hopefully much more quickly. This may or may not be possible; I am not familiar enough with the processing involved to say.

I think that in general you will have to do a significant portion of the calculation in order to truly validate the result, and as the app becomes more and more optimized the fraction will approach 1. Given the difficulty of making that validator program, it is probably not worth the effort.
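
For what it's worth, a toy version of the trace-checksum scheme under discussion (entirely hypothetical; the real SETI@home apps do nothing like this):

    import hashlib

    def crunch(data):
        """Stand-in for the real analysis: log intermediate key results."""
        checkpoints, acc = [], 0.0
        for i, x in enumerate(data):
            acc += x * x                    # pretend this is one FFT stage
            if i % 250 == 0:                # log only selected key steps
                checkpoints.append(round(acc, 6))
        return acc, checkpoints

    def trace_digest(checkpoints):
        """Compact, checksum-like fingerprint of the intermediate results."""
        return hashlib.md5(repr(checkpoints).encode()).hexdigest()

    data = [i / 1000.0 for i in range(1000)]
    result, trace = crunch(data)
    # A Check-WU on another host would recompute only the checkpointed
    # stages and compare fingerprints rather than redoing everything:
    assert trace_digest(trace) == trace_digest(crunch(data)[1])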
ID: 688741
DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 688745 - Posted: 4 Dec 2007, 20:36:30 UTC - in response to Message 688739.  
Last modified: 4 Dec 2007, 20:38:58 UTC

Unfortunately, when a result is sent to the client, the workunit is not removed from the server. The WU is actually stored on both the host computer and on the server, so increasing the time it takes a host to process a WU also increases the time it is stored on the server. Admittedly, if WUs take more time to complete you theoretically need fewer total WUs in process at a given time, but I suspect the gains would be wiped out by longer periods where a result returned by a fast host sits waiting on a slower host. Mostly, increasing the WU time helps with bandwidth.


My understanding is that there is, for each WU:
Database entries
The WU file itself
Results returned by clients (awaiting validation)

The change I was proposing should reduce the number of "results returned by clients awaiting validation" because 1) each result would cover the size of two current WUs, and 2) it would take longer for each result to be uploaded (longer crunch time). This would cause clients to download fewer WUs over time. With fewer results "in the wild", the disk space requirements should lessen. Is this not correct?
ID: 688745
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 688759 - Posted: 4 Dec 2007, 21:47:29 UTC - in response to Message 688726.  


Three rebuttals:
#2: If the current system is not running optimally, then adding disk space is merely a band-aid. I would think the project should be driven to run optimally before scaling up to the next level. This means enforcing a BOINC 5.10 rule.


Rebuttal to your rebuttal:

Not sure why you think 5.10.x is required. Sure, it has some nicey-nice things, but the enforced minimum should be the lowest version that provides what's actually needed. As such, 5.8.17 would be the ideal candidate for a required minimum version, as it supports server-side aborts.

I bring this up because my experience with installing 5.10.28 was not very pleasant (it didn't work), and getting back to 5.8.16 was a chore (3 installs to finally stabilize it). I'd have to update to .17, but I'd be a lot more comfortable with that than going to 5.10.x...
ID: 688759
TXR13
Volunteer tester

Joined: 25 Mar 02
Posts: 7
Credit: 201,180
RAC: 0
Canada
Message 688775 - Posted: 4 Dec 2007, 23:09:55 UTC - in response to Message 688759.  

5.8.17 would be the ideal candidate for a required minimum version, as it supports server-side aborts.


I could be wrong, of course, but doesn't 5.8.16 also support server-side aborts? I haven't looked at the changelog for those versions, but all my Linux clients run .16, and several of them working on SETI Beta got server-side abort messages just fine not two weeks ago...

Personally, I like the idea of a 5.8.16 minimum version, if it supports server-side aborts, especially since it's listed on the BOINC download page as a recommended version. 5.8.17 is available, of course, but you have to go hunting for it, which might be off-putting for some users. And of course, the 5.10.14+ clients force you to sit on your completed results for a full 24 hours before reporting, which annoys me greatly.

I like Linux at 5.8.16, and Windows at 5.10.13. :) Of course, that influences my thoughts about what a minimum version should be, but I do agree with the main thought, which is that the minimum version should be raised.
ID: 688775
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 688786 - Posted: 4 Dec 2007, 23:33:12 UTC - in response to Message 688759.  


Three rebuttals:
#2: If the current system is not running optimally, then adding disk space is merely a band-aid. I would think the project should be driven to run optimally before scaling up to the next level. This means enforcing a BOINC 5.10 rule.


Rebuttal to your rebuttal:

Not sure why you think 5.10.x is required. Sure, it has some nicey-nice things, but the enforced minimum should be the lowest version that provides what's actually needed. As such, 5.8.17 would be the ideal candidate for a required minimum version, as it supports server-side aborts.

I bring this up because my experience with installing 5.10.28 was not very pleasant (it didn't work), and getting back to 5.8.16 was a chore (3 installs to finally stabilize it). I'd have to update to .17, but I'd be a lot more comfortable with that than going to 5.10.x...

There is no 5.8.17 for Windows; you'd need 5.9.3 or later to enable server aborts.

5.10.x allows queue settings up to 20 days, whereas older versions topped out at 10 days. As we can't see those preference settings for other users' hosts, it is unclear how much that contributes to long turnarounds.
Joe
ID: 688786
Pooh Bear 27
Volunteer tester
Joined: 14 Jul 03
Posts: 3224
Credit: 4,603,826
RAC: 0
United States
Message 688794 - Posted: 4 Dec 2007, 23:50:08 UTC - in response to Message 688775.  

And of course, the 5.10.14+ clients force you to sit on your completed results for a full 24 hours before reporting, which annoys me greatly.

This is on purpose to stop the pounding of the system. It was meant to be this way in the first place, and is finally working the way it's supposed to.

It doesn't hurt anything.


My movie https://vimeo.com/manage/videos/502242
ID: 688794
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 688807 - Posted: 5 Dec 2007, 0:18:46 UTC - in response to Message 688786.  


There is no 5.8.17 for Windows; you'd need 5.9.3 or later to enable server aborts.


OK. Then I would guess I'd need to update to 5.9.xx or 5.10.xx, or, if 5.8.17 is supported on Linux, fire up my Ubuntu VM and run through the VM. Due to my painful experience with 5.10.28, I decided that as long as none of the 3 projects I participate in needed anything newer than 5.8.16, I was not going to use anything newer. SETI and Einstein both use only IR=2 (so aborts are generally not needed), and LHC doesn't have the abort functionality turned on...

I don't know why 5.10.28 was such a pain for me, but it was. I had shut down the manager, and there isn't a service on this system (my AMD), so I don't know why the install choked. Getting back up took an install and a repair reinstall, followed by another repair reinstall after the next reboot, just to get going again with 5.8.16.

ID: 688807
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 688811 - Posted: 5 Dec 2007, 0:33:04 UTC - in response to Message 688794.  

And of course, the 5.10.14+ clients force you to sit on your completed results for a full 24 hours before reporting, which annoys me greatly.

This is on purpose to stop the pounding of the system. It was meant to be this way in the first place, and is finally working the way it's supposed to.

It doesn't hurt anything.


Actually, it does hurt if your result is the one that would form a quorum. The hold forces a delay in forming the quorum, and thus in determining validity, declaring a canonical result, and moving the work down the assimilation pathway so it can be cleared from storage.

What I'm seeing develop is that some BOINC developers are not fully thinking out their ideas. I appreciate that the goal (Rom's?) was to cut down on the database hits from result reporting. However, one must consider the effects of such a change all the way down the chain. If they thought about the consequences for result storage and determined that the benefits outweighed the risks, then great. If, however, nobody even thought about that (much as it is apparent that little thought was put into auto-hiding messages when someone is banned), then that is a rushed design process and is ultimately not good for BOINC as a whole.

Brian
ID: 688811
Ncrab

Joined: 1 Dec 07
Posts: 10
Credit: 57,389
RAC: 0
Brazil
Message 688812 - Posted: 5 Dec 2007, 0:33:11 UTC - in response to Message 688741.  

I think I understand what you're getting at... You're saying produce a trace of the program execution without actually performing all the calculations.


If I understand correctly, it would be a trace of the results of the program execution, but maybe we are saying the same thing.

Given the difficulty of making that validator program, it is probably not worth the effort.


The program actually already does this job; it just does not log the "trace of the results" for future validation.
ID: 688812
TXR13
Volunteer tester

Joined: 25 Mar 02
Posts: 7
Credit: 201,180
RAC: 0
Canada
Message 688851 - Posted: 5 Dec 2007, 2:56:47 UTC - in response to Message 688794.  

And of course, the 5.10.14+ clients force you to sit on your completed results for a full 24 hours before reporting, which annoys me greatly.

This is on purpose to stop the pounding of the system. It was meant to be this way in the first place, and is finally working the way it's supposed to.

It doesn't hurt anything.


I'm not saying it hurts anything. I know it doesn't hurt anything. It just annoys me, mainly because I like turning work around promptly. I'm not trying to hammer the servers, and I'm perfectly willing to wait the normal two and a half hours between result upload and result reporting. I just don't want to wait 24 hours. It seems overly long to me.

How is it pounding the system to be waiting a couple hours in the first place? How does increasing that to 24 hours help?
ID: 688851
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 688854 - Posted: 5 Dec 2007, 3:21:24 UTC - in response to Message 688851.  


I'm not saying it hurts anything. I know it doesn't hurt anything.


Actually, as I pointed out, it can hurt the storage situation currently being discussed, if what is waiting to be reported is what would form a quorum.


How is it pounding the system to be waiting a couple hours in the first place? How does increasing that to 24 hours help?


See this post from Joe to me for the database overhead savings from reporting more than one result at a time.
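
The shape of that saving is easy to sketch (the per-request costs below are invented, not measured server figures):

    # Invented costs: each scheduler RPC has a fixed overhead (connection,
    # host lookup, etc.) plus a small per-result cost for database updates.
    RPC_OVERHEAD_MS = 50.0
    PER_RESULT_MS = 5.0

    def reporting_cost_ms(total_results, batch_size):
        rpcs = -(-total_results // batch_size)    # ceiling division
        return rpcs * RPC_OVERHEAD_MS + total_results * PER_RESULT_MS

    # 12 results a day, reported one at a time vs. held and batched:
    print(reporting_cost_ms(12, 1))    # 660.0 ms of server work
    print(reporting_cost_ms(12, 12))   # 110.0 ms, one-sixth the load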
ID: 688854