Message boards :
Technical News :
Father Padilla Meets the Perfect Gnat (Dec 03 2007)
Author | Message |
---|---|
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
"...As for funky deadlines, bogus credit, etc. take that up with BOINC. I don't follow that too closely." Begging your pardon, Matt, but deadlines are very much a SETI (project) issue, and not a BOINC issue. They're set by the splitter/scheduler process, so perhaps we should ask Eric when he's been unchained from his grant-proposal writing desk (and been let out for some recovery time, LOL). |
the silver surfer Send message Joined: 24 Feb 01 Posts: 131 Credit: 3,739,307 RAC: 0 |
Just a question: why does SETI need two valid results and, for example, Rosetta only one? (What's the difference in the calculation process?) KURT |
[KWSN]John Galt 007 Send message Joined: 9 Nov 99 Posts: 2444 Credit: 25,086,197 RAC: 0 |
"Just a question: why does SETI need two valid results and, for example, Rosetta only one?" IIRC, Rosetta generates thousands of workunits per model and sends these out to everyone. That way you only need 1 result per workunit, but thousands of results per model. Clk2HlpSetiCty:::PayIt4ward |
the silver surfer Send message Joined: 24 Feb 01 Posts: 131 Credit: 3,739,307 RAC: 0 |
@John Understood, thank you very much! KURT |
PhonAcq Send message Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
"This is all healthy discussion. Two quick points:" Two rebuttals: Re point #2: if the current system is not running optimally, then adding disk space is merely a band-aid. I would think the project should be driven to run optimally before scaling it up to the next level. This means enforcing a BOINC 5.10 rule. Also re #2: with the ability to perform a server-side WU abort, a 3/2 rule might work: late WUs could be terminated when the third copy goes out. It is green in that you don't have to add more disk space. It does have the bandwidth problem, of course, but is that actually a problem? Re point #1: I would like to see the analysis that says 1 result per WU couldn't work for SETI. Granted, false negatives are lost if only one result is analyzed. But positives (valid and false positives) could be checked/validated by re-issuing WUs when a positive signal is reported by a client. I would think that additional WUs would then be generated near the one giving a positive response, which adds to the confirmation process. In the meantime, we would get through and process a lot more data, and use a lot less disk space and energy. |
DJStarfox Send message Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
"This is all healthy discussion. Two quick points:" Since you had no rebuttal to my suggestion of doubling WU size (see above), I'll take that to mean you are considering it. It's good to hear that the project has secured more disk space; that is going to help overall. |
Uioped1 Send message Joined: 17 Sep 03 Posts: 50 Credit: 1,179,926 RAC: 0 |
"Just a question: why does SETI need two valid results and, for example, Rosetta only one?" Rosetta actually has something akin to a replication of many thousands; however, they cannot use the BOINC replication feature due to the nature of their algorithm. If you look at the results scatter chart you will see this illustrated. Unlike S@H, which takes a result and calculates an exact figure for the number of pulses, etc., in that result, the Rosetta algorithm guesses at what the best folding is. Then the project scientists take all of those guesses and, through the use of statistical models, produce a result that has a finite certainty of being within a finite distance of the 'correct' result. The key difference is that S@H does not guess, whereas Rosetta does. Guesses by nature are only an approximation and will not always be the same, so there can be no validation of an individual guess. A bad guesser is instead eliminated through statistics.
One feature that would be really nice would be the ability to have a replication of less than two. I suppose this would have to be grouped at the host level, creating a new set of data-tracking issues... Imagine if you had a replication of 1.25, i.e. every fourth result returned by a host would be validated by a different host. Even better would be a dynamic replication figure, where hosts that returned bad results would have more of their results validated (or, better, just get fewer results between validated ones). The implication that results are not scientifically valid unless every one has been checked twice is hogwash, though. |
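The dynamic-replication idea above could be sketched roughly like this. This is purely illustrative scheduler logic, not actual BOINC code; the class, thresholds, and intervals are all invented for the sake of the example:

```python
class Host:
    """Tracks a host's validation history for dynamic replication."""
    def __init__(self, name):
        self.name = name
        self.results_returned = 0
        self.bad_results = 0

    def error_rate(self):
        if self.results_returned == 0:
            return 1.0  # unknown host: treat as untrusted at first
        return self.bad_results / self.results_returned

def replication_interval(host):
    """How many results between spot-checks, based on the host's record.
    A clean host gets checked every 4th result (replication ~1.25);
    a flaky host gets every result double-checked (replication 2.0)."""
    rate = host.error_rate()
    if rate > 0.05:
        return 1   # validate every result
    elif rate > 0.01:
        return 2   # validate every other result
    else:
        return 4   # validate every fourth result

def needs_validation(host):
    """Decide whether a second copy of this host's next result is issued."""
    host.results_returned += 1
    return host.results_returned % replication_interval(host) == 0
```

So a host with a clean record would get only one in four of its results cross-checked, while a new or error-prone host would stay at full replication until it earned trust.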
Ncrab Send message Joined: 1 Dec 07 Posts: 10 Credit: 57,389 RAC: 0 |
... Sounds definitive, and that is why I am still thinking it over. But it is not easy for me to be explicit enough writing in English. About "checksums": the intent is only a kind of "encoding" of the partial results the check-WU job has produced. The idea was that the check-WU would not recalculate the entire WU, but only key processes (maybe tracking some existing dependencies...), and grab those results (or only fragments of the data generated in those processes) for a later "checksum-like" comparison. Clearly, not all of the redundancy would be achieved, but some level of assurance would, and that level can be tuned by selecting the appropriate key processes. What about a beta doing this, to compare against the errors actually caught with the 2/2 scheme? |
Uioped1 Send message Joined: 17 Sep 03 Posts: 50 Credit: 1,179,926 RAC: 0 |
Unfortunately, when a result is sent to a client the workunit is not removed from the server. The WU is actually stored both on the host computer and on the server, so increasing the length of time it takes a host to process a WU also increases the length of time it is stored on the server. Admittedly, if WUs take more time to complete you theoretically need fewer total WUs in process at a given time, but I suspect the gains would be wiped out by longer periods where a WU has been returned by a fast host and is waiting on a slower host. Mostly, increasing the WU time helps with bandwidth. |
Uioped1 Send message Joined: 17 Sep 03 Posts: 50 Credit: 1,179,926 RAC: 0 |
I think I understand what you're getting at... You're saying: produce a trace of the program execution without actually performing all the calculations. You have one result in a WU that does the actual calculations, and a second that verifies the first did what it said it did, hopefully much more quickly. This may or may not be possible; I am not familiar enough with the processing involved to say. I think that in general you would have to do a significant portion of the calculation in order to truly validate the result, and as the app becomes more and more optimized that fraction will approach 1. Given the difficulty of making that validator program, it is probably not worth the effort. |
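The checksum-of-partial-results scheme being discussed might look something like this in outline. Everything here is invented for illustration (the "spike" and "power" steps are stand-ins, not the real S@H signal processing):

```python
import hashlib

def crunch_with_trace(data):
    """Process a workunit, hashing intermediate values at a few key steps
    instead of keeping the full computation for later re-checking."""
    trace = hashlib.sha256()
    spikes = [x for x in data if x > 3.0]   # stand-in for spike finding
    trace.update(repr(sorted(spikes)).encode())
    power = sum(x * x for x in data)        # stand-in for pulse power
    trace.update(repr(round(power, 6)).encode())
    return {"spikes": spikes, "power": power, "checksum": trace.hexdigest()}

def spot_check(data, reported_checksum):
    """A checker redoes only the key steps and compares checksums,
    rather than comparing full result files bit by bit."""
    return crunch_with_trace(data)["checksum"] == reported_checksum
```

Note that this sketch actually illustrates the objection above: the checker still has to redo the "key steps" to recompute the checksum, so the savings depend entirely on how small a fraction of the total work those steps are.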
DJStarfox Send message Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
"Unfortunately, when a result is sent to the client the workunit is not removed from the server... Mostly, increasing the WU time helps with bandwidth." My understanding is that there is, for each WU: database entries, the WU file itself, and results returned by clients (awaiting validation). The change I was proposing should reduce the number of results awaiting validation, because 1) each result will now cover the size of 2 WUs, and 2) it will take longer for each result to be uploaded (longer crunch time). This will cause clients to download fewer WUs over time. With fewer results "in the wild", the disk space requirements should lessen. Is this not correct? |
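As a back-of-the-envelope check on that reasoning: by Little's law, the average number of results in flight equals the issue rate times the average turnaround time. Doubling WU size halves the issue rate but also lengthens turnaround, so the two effects partly offset. The numbers below are made up purely for illustration, not project figures:

```python
def results_in_flight(issue_rate_per_hour, turnaround_hours):
    # Little's law: average number in the system = arrival rate x residence time
    return issue_rate_per_hour * turnaround_hours

# Hypothetical baseline: 10,000 results/hour issued, 48 h average turnaround
baseline = results_in_flight(10_000, 48)

# Doubled WU size: half the issue rate, but turnaround grows too (say 72 h,
# not a full 96 h, since queue/download overhead doesn't double with crunch time)
doubled = results_in_flight(5_000, 72)

# Fewer rows and files in flight, though each file is twice the size,
# so raw disk usage may not fall as much as the row count does.
print(baseline, doubled)
```

Under these assumed numbers the in-flight count drops from 480,000 to 360,000, which supports the "fewer results in the wild" intuition for database rows, while the disk-space saving depends on how much of each entry's footprint scales with WU size.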
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
Rebuttal to your rebuttal: I'm not sure why you think 5.10.x is required. Sure, it has some nicey-nice things, but the absolute bottom end that provides the needed feature is what should be enforced. As such, 5.8.17 would be the ideal candidate for a required minimum version, as it supports server-side aborts. I bring this up because my experience with installing 5.10.28 was not very pleasant (it didn't work), and getting back to 5.8.16 was a chore (3 installs to finally stabilize it). I'd have to update to .17, but I'd be a lot more comfortable with that than with going to 5.10.x... |
TXR13 Send message Joined: 25 Mar 02 Posts: 7 Credit: 201,180 RAC: 0 |
"5.8.17 would be the ideal candidate for a required minimum version as it supports the server-side aborts." I could be wrong, of course, but doesn't 5.8.16 also support server-side aborts? I haven't looked at the changelog for those versions, but all my Linux clients run .16, and several of them working on SETI Beta got server-side abort messages just fine not two weeks ago... Personally, I like the idea of a 5.8.16 minimum version, if it supports server-side aborts, especially since it's listed on the BOINC download page as a recommended version. 5.8.17 is available, of course, but you have to go hunting for it, which might be off-putting for some users. And of course, the 5.10.14+ clients force you to sit on your completed results for a full 24 hours before reporting, which annoys me greatly. I like Linux at 5.8.16, and Windows at 5.10.13. :) Of course, that influences my thoughts about what a minimum version should be, but I do agree with the main thought, which is that the minimum version should be raised. |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
There is no 5.8.17 for Windows, you'd need 5.9.3 or later to enable server aborts. 5.10.x allows queue settings up to 20 days, where older versions topped out at 10 days. As we can't see those preference settings for other users' hosts it is unclear how much that contributes to long turnarounds. Joe |
Pooh Bear 27 Send message Joined: 14 Jul 03 Posts: 3224 Credit: 4,603,826 RAC: 0 |
"And of course, the 5.10.14+ clients force you to sit on your completed results for a full 24 hours before reporting, which annoys me greatly." This is on purpose, to stop the pounding of the system. It was meant to be this way in the first place, and it is finally working the way it's supposed to. It doesn't hurt anything. |
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
OK. Then I would guess I'd need either to update to 5.9.x or 5.10.x, or, if 5.8.17 is supported on Linux, to fire up my Ubuntu VM and go through the VM. Due to my painful experience with 5.10.28, I decided that as long as none of the 3 projects I participate in needed me to be using something newer than 5.8.16, I was not going to use anything newer. SETI and Einstein both use only IR=2 (so aborts are generally not needed), and LHC doesn't have the abort functionality turned on... I don't know why 5.10.28 was such a pain for me, but it was. I had shut down the manager, and there isn't a service on this system (my AMD), so I don't know why the install choked. Getting back up and running took an install and then a repair reinstall, followed by another repair reinstall after a reboot, just to get going again with 5.8.16. |
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
"And of course, the 5.10.14+ clients force you to sit on your completed results for a full 24 hours before reporting, which annoys me greatly." Actually, it does hurt if your result is the one that would form a quorum. It means there is a forced delay in forming the quorum, and thus in being able to determine validity, declare a canonical result, and move the work down the assimilation pathway so it can be cleared from storage. What I'm seeing develop is that some BOINC developers are not fully thinking out their ideas. I appreciate that the goal (Rom's?) was to cut down on the database hits from result reporting. However, one must consider the effects of such a change all the way down the chain. If they thought about the consequences for result storage and determined that the benefits outweighed the risks, then great. If, however, nobody even thought about that (much as it is apparent that not much thought was put into auto-hiding messages when someone is banned), then that is a rushed design process and is ultimately not good for BOINC as a whole. Brian |
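The storage effect described here is simple arithmetic: with a quorum of 2, a workunit can't be validated and cleared until the slower-reporting host reports, so any report deferral adds directly to the WU's lifetime on disk. A tiny sketch with made-up times:

```python
def time_to_quorum(report_times_hours):
    """With a quorum of 2, validation can start only when the second
    result is reported; the WU stays on the server until then."""
    return sorted(report_times_hours)[1]

# Two hosts finish and report at 10 h and 30 h after issue.
prompt = time_to_quorum([10, 30])

# Same hosts, but each sits on its finished result for 24 h before reporting.
deferred = time_to_quorum([10 + 24, 30 + 24])

print(prompt, deferred)  # the WU occupies server storage 24 h longer
```

Under these invented numbers the quorum forms at 30 h without deferral and 54 h with it, so the reporting delay translates one-for-one into extra result-storage time.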
Ncrab Send message Joined: 1 Dec 07 Posts: 10 Credit: 57,389 RAC: 0 |
"I think I understand what you're getting at... You're saying produce a trace of the program execution without actually performing all the calculations." If I understand correctly, it would be a trace of the results of the program execution, but maybe we are saying the same thing. "Given the difficulty of making that validator program, it is probably not worth the effort." The program actually already does this job; it just does not log the "trace of the results" for future validation. |
TXR13 Send message Joined: 25 Mar 02 Posts: 7 Credit: 201,180 RAC: 0 |
"And of course, the 5.10.14+ clients force you to sit on your completed results for a full 24 hours before reporting, which annoys me greatly." I'm not saying it hurts anything. I know it doesn't hurt anything. It just annoys me, mainly because I like turning work around promptly. I'm not trying to hammer the servers, and I'm perfectly willing to wait the normal two and a half hours between result upload and result reporting. I just don't want to wait 24 hours. It seems overly long, to me. How is waiting a couple of hours pounding the system in the first place? How does increasing that to 24 hours help? |
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
Actually, as I pointed out, it can hurt the storage situation that is currently being discussed if what is waiting to be reported is what forms a quorum.
See this post from Joe to me to see the database overhead savings by reporting more than one result at a time. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.