Calibrating Client v5.3.12.tx37 - fair credits in all projects

Message boards : Number crunching : Calibrating Client v5.3.12.tx37 - fair credits in all projects

trux
Volunteer tester
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 226153 - Posted: 5 Jan 2006, 4:16:38 UTC
Last modified: 5 Jan 2006, 4:19:09 UTC

I feel this topic may trigger a huge controversy and flame wars, so before releasing the new BOINC core client v5.3.6.tx20, I thought I would first give you some information and watch the reactions :)

I have a new version of the BOINC core client that calibrates result times (not the benchmark), using the already present duration_correction_factor plus an additional algorithm that takes into account other aspects related to credit, benchmarks, and WU time.

By default, the client works exactly like the standard one. It is compiled optimized, but not aggressively - no CPU- or SSE-specific optimization is used, so the Windows client can be used on all common CPUs. For the same reason, it should claim reasonably fair credits for most projects.

Then there is a configuration option that enables automatic calibration of the reported final WU time, so that the claimed credit reflects the real, fair value of the WU. It can work with one project, several, or all of them.

Normally, the reference WU, when calculated by a reference machine, results in a credit of 32.32 Cobblestones. Approximately the same credit should be granted for any full-length S@H WU. Unfortunately, due to the optimized S@H applications, this is rarely the case. This is why people hunt for clients with higher benchmarks, and some even try to manipulate them manually. That is rather unfair, especially if the host works on multiple projects: the credit claimed for work at the other projects may then be far above the normal level, which is certainly not fair, should be avoided, and can even trigger alarms or bans on the project server.

I spoke about the need for some calibration system many months ago, with my very first core client. Properly, to be fair and secure, it needs to be driven from the server. I hoped the official team would come up with a solution, but they have far too many tasks on their schedule to deal with this. Besides that, at S@H Enhanced the problem seems to be solved by counting the FP operations instead of the time spent on them (well, it is solved only until the next optimizer brings a version that crunches the same results with half the FP operations, but that's another topic :).

However, even with the means we already have, we can make it fairer than it is currently, when the claimed credit for the same unit from different hosts may vary by a factor of 10 or more.


I wrote the necessary changes for the self-calibrating client and am currently testing it. So far it looks like it may work quite well, but I need the servers to be up to complete the tests. OK, now how it works:

When the Credit Calibration option is enabled in the configuration file, the client modifies the final WU time so that it respects the rules and the real value of the WU better than plain benchmarks do. It uses a built-in correction coefficient that self-adjusts its value with each WU. Additionally, the ratio of the FPOPS and IOPS benchmarks is taken into the calculation. The resulting Claimed Credit (CC) for full-length S@H units should be close to the ideal 32 Cobblestones, and proportionally smaller for shorter units.
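For illustration only, a self-adjusting correction coefficient of the kind described can be sketched like this (a toy model in Python; the function name, the update rule, and the learning rate are my assumptions, not trux's actual code):

```python
def update_correction(correction, granted, claimed, rate=0.1):
    """Move the coefficient one step toward the granted/claimed ratio,
    so repeated WUs converge on fair claims without abrupt jumps.
    Illustrative only - not the client's actual update rule."""
    if claimed > 0:
        correction += rate * (granted / claimed - correction)
    return correction

# Example: the client claims 64 credits but the project grants 32;
# over successive work units the coefficient drifts toward 0.5.
c = 1.0
for _ in range(50):
    c = update_correction(c, granted=32.0, claimed=64.0)
```

The point of a gradual update rather than a one-shot correction is that any single WU's granted credit is noisy; averaging over many results keeps one outlier from swinging the claim.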

In a similar way, the client can also adjust the final WU time, and thus the CC, at other projects. However, since the client's benchmarks run faster than the official one's, at projects with no optimized application the correction will usually produce a corrected final WU time shorter than the time actually spent on the calculation. That also means a smaller Claimed Credit. And that is exactly the wanted result and the real purpose - we do not want to claim unfairly high credits at projects with no optimization.


I am really curious about the reactions. That includes comments from the trolls who jump up each time a new optimized core comes out. If the reactions are too negative, or if the official team jumps in with a veto, I'll certainly not publish the client; otherwise it is ready.


PS EDIT: you may use the small voting boxes under this post to express your opinion:
+ you are looking forward to the release
- you think it is illegal or unfair and should be dumped

trux
BOINC software
Freediving Team
Czech Republic
ID: 226153
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 226158 - Posted: 5 Jan 2006, 4:25:35 UTC

First, it looks as if S@H is moving to Flop Counting which will eliminate the need for this correction.

I would like to see some actual numbers - I would expect that this client would almost always produce the median results.

If it does stand up after analysis, I would hope that you would be willing to submit the code back to Berkeley to be incorporated as part of BOINC.


BOINC WIKI
ID: 226158
trux
Volunteer tester
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 226163 - Posted: 5 Jan 2006, 4:32:55 UTC - in response to Message 226158.  
Last modified: 5 Jan 2006, 4:40:17 UTC

First, it looks as if S@H is moving to Flop Counting which will eliminate the need for this correction.
Not entirely - read my post, I am speaking about it too.

I would like to see some actual numbers - I would expect that this client would almost always produce the median results.
No, certainly not.

If it does stand up after analysis, I would hope that you would be willing to submit the code back to Berkeley to be incorporated as part of BOINC.
Sure, my source code is always available on my server, and it will of course be the same with this version too.

trux
BOINC software
Freediving Team
Czech Republic
ID: 226163
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 226166 - Posted: 5 Jan 2006, 4:45:46 UTC - in response to Message 226163.  

First, it looks as if S@H is moving to Flop Counting which will eliminate the need for this correction.
Not entirely - read my post, I am speaking about it too.
OK, I am still missing the discussion of Flops Counting by the project apps.

I would like to see some actual numbers - I would expect that this client would almost always produce the median results.
No, certainly not.
Why no numbers? Run it with an optimized app for a while and post the claimed credits along with the claimed credits for the other computers. A couple of dozen should be fairly convincing.

If it does stand up after analysis, I would hope that you would be willing to submit the code back to Berkeley to be incorporated as part of BOINC.
Sure, my source code is always available on my server, and it will be the same with this version, of course, too

The BOINC development team always wants CVS diffs emailed to them for proposed patches. They will not really accept anything else, mostly because they don't have the time to browse through several different code bases trying to find the relevant fixes.


BOINC WIKI
ID: 226166
trux
Volunteer tester
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 226173 - Posted: 5 Jan 2006, 4:55:45 UTC - in response to Message 226166.  
Last modified: 5 Jan 2006, 4:57:33 UTC

OK, I am still missing the discussion of Flops Counting by the project apps.
In my initial post, I say that Flop counting will work only until someone comes up with a new cruncher that uses far fewer FP operations to get the same result. That means it will require some adjustment too.

Why no numbers?
Why be such a bully? :) As soon as I have a couple dozen results, and as soon as the servers validate them, I'll certainly post them. There is nothing to hide :)

The BOINC development team always wants CVS diffs emailed to them for proposed patches. They will not really accept anything else. Mostly because they don't have teh time to browse through several different code bases to try to find the relevant fixes.
If they are really interested, it is certainly not a big deal to make the diff. As a volunteer developer, you can even make it yourself and submit to them - I'll be only thankful.

trux
BOINC software
Freediving Team
Czech Republic
ID: 226173
Skip Davis
Joined: 22 Dec 00
Posts: 44
Credit: 2,565,939
RAC: 0
United States
Message 226206 - Posted: 5 Jan 2006, 6:22:00 UTC

Trux, please send me an email @ skippr_d@yahoo.com
ID: 226206
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 226242 - Posted: 5 Jan 2006, 9:55:45 UTC - in response to Message 226173.  

OK, I am still missing the discussion of Flops Counting by the project apps.
In my initial post, I say that Flop counting will work only until someone comes up with a new cruncher that uses far fewer FP operations to get the same result. That means it will require some adjustment too.

Um, not really.

The FLOPS counting has a function call where the counted number of operations for a block of code is submitted. So the point is that they have "pre-counted" the operations in a block. How many actual CPU instructions are executed is not relevant and not counted. So, if you vectorize a section of code that is 2,000 operations, it will still show as 2,000, even if the vector operation is done as one instruction, because you are still doing 2,000 floating point operations.

So, optimization should not change the output numbers, assuming of course that the code is not "adjusted" ... :)
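In other words, the count is attached to the algorithm, not to the hardware. A toy model in Python (all names are illustrative; in BOINC the application reports its pre-counted totals through the C API, not like this):

```python
flops_counted = 0.0   # running total the application would report

def fft_block(data):
    """Do one block of work and credit its *algorithmic* cost.
    The 2,000 is pre-counted from the math, so the total is the
    same whether the arithmetic runs scalar, vectorized, or cached."""
    global flops_counted
    flops_counted += 2000.0
    # ... the actual (possibly SIMD) number crunching would go here ...

for _ in range(10):
    fft_block(None)
# flops_counted is now 20000.0 regardless of how fast the blocks ran
```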

I don't quite see the point of your effort when the enhanced SETI@Home Science Application is around the corner ... also Einstein@Home and Rosetta@Home have at least indicated they will look into adding FLOP counting to their applications also ...

I will be interested in the numbers too ...
ID: 226242
trux
Volunteer tester
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 226317 - Posted: 5 Jan 2006, 13:14:38 UTC - in response to Message 226242.  
Last modified: 5 Jan 2006, 13:47:51 UTC

The FLOPS counting has a function call where the counted number of operations for a block of code is submitted. So, the point being is that they have "pre-counted" the operations in a block. How many actual CPU operations are done is not relevant and not counted. So, if you vectorize a section of code that is 2,000 Operations, it will still show as 2,000, even if the vector operation is done as one instruction. Because you are still doing 2,000 floating point operations.
You misunderstood, Paul. As I wrote, Flop counting will become useless the moment a developer comes up with a clever trick that modifies the algorithm in such a way that it uses a different number of operations. I mean something similar to what happened when Hans Dorn with Harals Naparstd introduced the caching, or even something more radical. I know that caching is already built into the Enhanced version, but that does not mean new improvements cannot be found. Once a new, more efficient algorithm is developed, there is no reason it should claim less credit than the older software for the same units.


I don't quite see the point of your effort when the enhanced SETI@Home Science Application is around the corner ...
It has been around the corner for months, and although I may be wrong, I have the feeling it can stay there a couple of moons more. Even then, I bet a lot of people will hesitate to upgrade and will wait until the last moment, when the old client is banned in some way. And that can take even more time.

In the meantime, the situation is unacceptable, since you have huge differences in the claimed credit for the very same WU: for example 5 at one host and 100 at another. I do not find that all right.

trux
BOINC software
Freediving Team
Czech Republic
ID: 226317
trux
Volunteer tester
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 226323 - Posted: 5 Jan 2006, 13:21:59 UTC - in response to Message 226206.  
Last modified: 5 Jan 2006, 13:29:22 UTC

Trux, please send me an email @ skippr_d@yahoo.com
You can contact me at the address BOINC on my domain truXoft.com, but please do so only if you have something important to tell me. I receive many hundreds of emails daily and wouldn't like people to start contacting me with questions they can post here. I use email mainly for my daily job, and for time reasons I may have to ignore anything unimportant. No beta tester offers, please! No requests for code, please - it will be posted at the same moment as the software (if the client is ever released).

trux
BOINC software
Freediving Team
Czech Republic
ID: 226323
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 226329 - Posted: 5 Jan 2006, 13:55:15 UTC

You misunderstood, Paul. As I wrote, Flop counting will become useless the moment a developer comes up with a clever trick that modifies the algorithm in such a way that it uses a different number of operations. I mean something similar to what happened when Hans Dorn with Harals Naparstd introduced the caching, or even something more radical. I know that caching is already built into the Enhanced version, but that does not mean new improvements cannot be found. Once a new, more efficient algorithm is developed, there is no reason it should claim less credit than the older software for the same units.

Ah, ok, got it now ...
ID: 226329
skab
Joined: 13 Mar 03
Posts: 18
Credit: 2,874,929
RAC: 0
United States
Message 226335 - Posted: 5 Jan 2006, 14:15:43 UTC

"the client will modify the final WU time" This statement concerns me to some degree. Are you saying that the WU time as reported will be adjusted or just adjusted in tha calculation for credit.
Other than that I think this would be a fair solution to this problem, I see posts about people holding back completed wu's so that the slower computers can set the credit and they don't get shorted on there credits. When I look at some of my machines the wu's are only claiming 3 or 4 credits were other machines are claiming closer to 30 credits and the granted credits end up in the 20's.
It almost sounds like this would bring us back to getting credit for the work performed as the seti classic was thought to do instead of the time used to do the work.

SETI, ONLY SETI, ALWAYS SETI!!
ID: 226335
Ingleside
Volunteer developer
Joined: 4 Feb 03
Posts: 1546
Credit: 15,832,022
RAC: 57
Norway
Message 226336 - Posted: 5 Jan 2006, 14:16:55 UTC - in response to Message 226317.  

You misunderstood, Paul. As I wrote, Flop counting will become useless the moment a developer comes up with a clever trick that modifies the algorithm in such a way that it uses a different number of operations. I mean something similar to what happened when Hans Dorn with Harals Naparstd introduced the caching, or even something more radical. I know that caching is already built into the Enhanced version, but that does not mean new improvements cannot be found. Once a new, more efficient algorithm is developed, there is no reason it should claim less credit than the older software for the same units.


Well, if an application's implementation of "flops counting" is to just multiply the loop count by a constant after a loop, then even if the inner workings of the loop are changed so that, instead of actually calculating anything, it just looks up info in a previously cached table, you'll still get the exact same flops count...

Even if someone finds an improvement so that, for example, half the time the loop doesn't need to run at all, this just means you'll need to change the constant you multiply by at the end. If a "flops loop" can be removed entirely, either add a constant, or change the constant in another flops-counting loop.


You'll likely not get exactly the same flops count as earlier, but a 1% variation, for example, isn't really a problem, since the variation in CPU time when re-running the exact same result on the same computer can be much higher.
ID: 226336
trux
Volunteer tester
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 226346 - Posted: 5 Jan 2006, 15:00:52 UTC - in response to Message 226336.  
Last modified: 5 Jan 2006, 15:32:04 UTC

Well, if an application's implementation of "flops counting" is to just multiply the loop count by a constant after a loop, then even if the inner workings of the loop are changed so that, instead of actually calculating anything, it just looks up info in a previously cached table, you'll still get the exact same flops count...
Yup, in some cases you can simply use a correction coefficient in the project application, assuming the loop count stays the same. But you can also have code improvements that completely change the structure; change the order of execution; extract and calculate some parts of the code out of order; or calculate with a dynamically variable number of loops, depending on the real-time situation; etc. As development continues, loop or operation counting will prove as unreliable as the current CPU time counting. You have to face it. Operation counting is certainly better, but it is not a miraculous solution.
trux
BOINC software
Freediving Team
Czech Republic
ID: 226346
trux
Volunteer tester
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 226347 - Posted: 5 Jan 2006, 15:09:03 UTC - in response to Message 226335.  
Last modified: 5 Jan 2006, 15:14:47 UTC

"the client will modify the final WU time" This statement concerns me to some degree.
The Claimed Credit is calculated on the server side, not on the client side. In the case of the current S@H application, it is based on two pieces of data: the benchmarks (fpops and iops) and the final WU time. Modifying either of them was an option when developing the calibrating client, but in the end, for several reasons, I opted for the time modification. I am sure there will be people asking for it the other way, but currently it is simply so. The real time is reported by the client too, though, and will be visible in the details of the unit on the result page.
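For reference, the benchmark-times-time claim follows the Cobblestone definition: 100 credits per day of computing at a 1e9 ops/s average benchmark. A sketch of that arithmetic (the function name is mine, not BOINC's):

```python
def claimed_credit(cpu_seconds, fpops_bench, iops_bench):
    """Claimed credit = CPU time x average benchmark speed, scaled so
    one full day (86,400 s) at 1e9 ops/s earns 100 Cobblestones."""
    avg_ops_per_sec = (fpops_bench + iops_bench) / 2.0
    return cpu_seconds * (avg_ops_per_sec / 1e9) * 100.0 / 86400.0

# With both benchmarks at 2e9 ops/s, reporting 13,824 s claims exactly
# the ideal 32 credits - which is why adjusting the reported time can
# steer the claim toward the fair value.
credit = claimed_credit(13824.0, 2e9, 2e9)   # 32.0
```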


It almost sounds like this would bring us back to getting credit for the work performed as the seti classic was thought to do instead of the time used to do the work.
Not at all! At Classic, you received the same credit for every WU, regardless of how long it actually was: the same for a 5-second interrupted WU as for a full-length one. This was the exact reason measured credits were introduced in BOINC. That does not happen with the calibrated client: it claims close to the reference value of 32 for a full-length WU, but proportionally less for shorter or noisy WUs.

trux
BOINC software
Freediving Team
Czech Republic
ID: 226347
Ingleside
Volunteer developer
Joined: 4 Feb 03
Posts: 1546
Credit: 15,832,022
RAC: 57
Norway
Message 226361 - Posted: 5 Jan 2006, 15:37:09 UTC - in response to Message 226347.  
Last modified: 5 Jan 2006, 15:40:01 UTC

The Claimed Credit is calculated on the server side, not on the client side. In the case of the current S@H application, it is based on two pieces of data: the benchmarks (fpops and iops) and the final WU time. Modifying either of them was an option when developing the calibrating client, but in the end, for several reasons, I opted for the time modification. I am sure there will be people asking for it the other way, but for the moment it is simply so. The real time is reported by the client too, though, and will be visible in the details of the unit on the result page.


Well, if you're starting to mess with the reported cpu-times you're asking for problems...


Besides, as BOINC alpha has shown, even re-running the exact same WU on the same computer can give over 30% variation in CPU time, so how would you accurately calibrate anything based on CPU time?
ID: 226361
trux
Volunteer tester
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 226365 - Posted: 5 Jan 2006, 15:51:09 UTC - in response to Message 226361.  
Last modified: 5 Jan 2006, 16:17:28 UTC

Well, if you're starting to mess with the reported cpu-times you're asking for problems...
Depends what type of problems you mean. Some of them I am ready to face and resist. Others may indeed mean a change of approach. This is exactly one of the reasons I am not yet releasing the client, but testing it myself and opening this discussion in advance.

Besides, as BOINC alpha has shown, even re-running the exact same wu on same computer can give over 30% variation in cpu-time, so how would you accurately calibrate anything based on cpu-time?
Similarly to how the built-in time estimation is done, with some modifications and improvements. I am aware that it is not 100% exact, but it is still by far more precise than the current benchmarking, which gives Claimed Credits varying by thousands of percent across different hosts calculating the very same unit.

trux
BOINC software
Freediving Team
Czech Republic
ID: 226365 · Report as offensive
Ingleside
Volunteer developer

Send message
Joined: 4 Feb 03
Posts: 1546
Credit: 15,832,022
RAC: 57
Norway
Message 226374 - Posted: 5 Jan 2006, 16:12:35 UTC - in response to Message 226365.  

Depends what type of problems you mean. Some of them I am ready to face and resist. Others may indeed mean the change of the approach. This is exactly one of the reasons I am not yet releasing the client but testing it myself, and opening this discussion in advance.


Impossible CPU times are corrected and logged server-side, so your client will cause a jump in the number of logged results - and a project could then decide to start penalizing these users as trying to cheat...

ID: 226374
trux
Volunteer tester
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 226382 - Posted: 5 Jan 2006, 16:24:49 UTC - in response to Message 226374.  

Impossible CPU times are corrected and logged server-side, so your client will cause a jump in the number of logged results - and a project could then decide to start penalizing these users as trying to cheat...
The correction is not necessarily going to fall outside realistic ranges, and the total CPU time may still remain quite acceptable, especially on machines not running 24/7. However, I admit that it could indeed be a concern anyway, and may lead me to change my approach and adjust the benchmarks instead, although that has other disadvantages. As I wrote, this is the reason I am currently testing it. Can you point me to the part of the server source code that handles it?

trux
BOINC software
Freediving Team
Czech Republic
ID: 226382
Ingleside
Volunteer developer
Joined: 4 Feb 03
Posts: 1546
Credit: 15,832,022
RAC: 57
Norway
Message 226398 - Posted: 5 Jan 2006, 16:55:06 UTC - in response to Message 226382.  
Last modified: 5 Jan 2006, 16:56:36 UTC

Can you point me to the part of the source code in the server software that handles it?


sched/handle_request.C line 602-617.
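For readers without the source handy, the gist of such a server-side sanity check can be sketched as follows (purely illustrative Python, not the actual handle_request.C logic; the real code's bounds and bookkeeping differ):

```python
def sanitize_cpu_time(reported_seconds, wall_clock_seconds):
    """Clamp an impossible reported CPU time and flag the result.
    Illustrative only - not the real BOINC scheduler code."""
    flagged = False
    if reported_seconds < 0.0 or reported_seconds > wall_clock_seconds:
        # CPU time can't be negative or exceed the elapsed wall-clock time
        reported_seconds = min(max(reported_seconds, 0.0), wall_clock_seconds)
        flagged = True   # logged; a spike in flags draws attention to the host
    return reported_seconds, flagged
```

A client that systematically rewrites its reported times would trip a check of this kind often enough to stand out in the logs, which is the risk Ingleside is pointing at.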


ID: 226398
trux
Volunteer tester
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 226586 - Posted: 5 Jan 2006, 23:44:49 UTC - in response to Message 226398.  

sched/handle_request.C line 602-617.
Thanks, Ingleside, this is valuable information. It indeed looks like I may have to change the method and use the benchmarks instead.

trux
BOINC software
Freediving Team
Czech Republic
ID: 226586
©2020 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.