AP Optimized r555 vs r1797
Author | Message |
---|---|
William Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
Ok, if you say it 'works' with scientific notation and 'does not work' when typing out the digits how exactly do you define 'work'? Do you really want me to answer to this one? So, without getting upset myself (at least I try not to), let me clarify a few points. a) I don't think you are an idiot or a 'f***wit'. If you got the impression I was implying that or being patronising or something, will you please give me credit for being a non-native speaker? b) Well, of course you wouldn't have posted that it made a difference if it didn't. And I'm sure you were puzzled as well and tried both a few times. Which leaves us with the initial question: Why does it work in most cases, but refuse to take non-scientific notation in some? If anything the scientific notation should be problematic. c) When I said 'scientific is easier and less error-prone' it was for the benefit of the lurkers and the people who write readmes. I wasn't about to imply that you can't count. d) When I asked for clarification of what 'work' and 'doesn't work' exactly mean, I was actually expecting a 'with scientific notation the estimates change to good figures - without they don't change at all'. I was after what happens with non-scientific notation - nothing? something but not the right thing? I ain't clairvoyant and I'm not the one sitting in front of the rig [wish I were in some cases, would make things sooo much easier...]. At this point you (and others) have found a problem. Now we get to the difficult bit of trying to figure out when the problem occurs. If you can't repro it yourself (and that's all I wanted to imply with 'worked for me' - that it's not broken as such, only sometimes), then you are at the mercy of the abilities of the person experiencing the problem. Before any bug can be fixed, you first have to find the circumstances under which it occurs. At least with such elusive bugs as this. I can't fix it either. I am just very good at tripping over bugs and analysing boinc. 
And I can write bug reports that David understands ;) So, and now I'll have a look at your next post with the technical bits. A person who won't read has no advantage over one who can't read. (Mark Twain) |
William Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
After my outburst above, here is some information that might prove useful. Right. So boinc was seeing the value... What matters in the end is what ends up in client_state.xml. You are basically replacing a server generated <flops> entry (==APR) with your own value. At the same time Boinc switches the mechanism it uses to finetune and deliver estimates. I then removed the flops entry completely and the ETA's blew out to 258 hours. Nearly 100 Hours more than the original estimate. Yes, that's a known issue with adding/removing flops, which is why the recommendation has always been to change them on an empty cache. The exact reason is a bit intricate and the last dcf/flops discussion was at least a year ago. Also boinc changes over time. So I don't remember the exact details now. It has to do with how the estimates are calculated and what fields boinc uses under which circumstances. However, when I fired up the NV_r1761 app for the first time, the ETA's were ~160 hours. I entered the flops value in long format and the ETA's did not change. You wouldn't by chance remember (or be able to find) the exact post? There should be no difference in the mechanism between MB and AP. The only difference is the magnitude of the <flops> entered. I then entered the flops value this way and on a restart the ETA's were within five minutes of the actual crunching time. No, not new installs, just new entries on the app details page. Will probably also occur when changing card to something with very different speed as then APR doesn't match. @ Sleepy. Since you would have only completed a few units. Could you please repeat the experiment I did and report what you find ? We have a suspicion that your boinc version might be part of the problem. 
In any case, as I pointed out at the start, what matters is what ends up in client_state.xml. So please try the following (and @Sleepy every change of app_info.xml requires a boinc restart to work): Look at client_state.xml and find the <app_version> entry for AP. Make a note of the <flops> entry. It should match what you put in the scientific notation - but with all digits. Change to long notation. Check client_state.xml again - what does the field read now? Change back to scientific and recheck. Sorry if I'm making you jump through hoops. We're trying to figure out if the value is passed along properly or if maybe some capping is interfering. A person who won't read has no advantage over one who can't read. (Mark Twain) |
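The check William describes (find the `<app_version>` entry for AP in client_state.xml and note its `<flops>`) can be scripted. What follows is only a minimal sketch. It assumes client_state.xml parses as well-formed XML and that `<app_name>`, `<plan_class>` and `<flops>` sit directly inside each `<app_version>`, as they do in the app_info.xml posted in this thread. The function name `read_flops` and the `SAMPLE` snippet are made up for illustration and are not taken from any real client_state.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened stand-in for client_state.xml.
SAMPLE = """<client_state>
  <app_version>
    <app_name>astropulse_v6</app_name>
    <version_num>604</version_num>
    <plan_class>opencl_nvidia_100</plan_class>
    <flops>486000000000.000000</flops>
  </app_version>
</client_state>"""


def read_flops(xml_text, app_name):
    """Return a list of (plan_class, flops) for every <app_version> of app_name."""
    root = ET.fromstring(xml_text)
    found = []
    for av in root.iter("app_version"):
        if av.findtext("app_name", "").strip() != app_name:
            continue
        flops = av.findtext("flops")
        # flops is None when the entry has no <flops> child at this level.
        value = float(flops) if flops is not None else None
        found.append((av.findtext("plan_class", "").strip(), value))
    return found


if __name__ == "__main__":
    # For a real check, pass in the text of client_state.xml instead of SAMPLE.
    for plan_class, flops in read_flops(SAMPLE, "astropulse_v6"):
        print(plan_class, flops)
```

Running it once after each change of notation (with a BOINC restart in between) gives the before/after values asked for above.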
Sleepy Joined: 21 May 99 Posts: 219 Credit: 98,947,784 RAC: 28,360 |
@ Sleepy. Since you would have only completed a few units. Could you please repeat the experiment I did and report what you find ? In my case it happened as you described. Now, in scientific format, I can (as it should normally be) set any completion time I wish by changing the flops value. I tried before and it did not work with the non-scientific format. I have only completed 2 AP WUs since I put everything right, so I cannot tell how everything will behave in the long run. I will try to test this in some more months (? at the rate I can get AP WUs this seems to be a reasonable time frame). Happy crunching! :-) Sleepy |
Horacio Joined: 14 Jan 00 Posts: 536 Credit: 75,967,266 RAC: 0 |
While I'm using an older version of BOINC and I don't have an AMD GPU, I did a quick and easy test to see if there is any difference between the notations. What I did was suspend all crunching and network activity so as not to mess things up with the weird flops values I used during the tests. Then I changed the flops using both notations, with values 10 times greater each time (in my case it was 3.29GF, 32.9GF, 329GF, 3290GF and 32900GF), so the biggest value was a string with 14 characters. In each case, after restarting BOINC the speed was acknowledged correctly by BOINC and the remaining estimated time was reduced (but not linearly, which I guess is the right thing because it also uses the already elapsed time). The point is that, independently of the notation used, the remaining estimated time changed to the same value each time. All this on BOINC 6.10 viewed remotely with BM 6.12 (which gives the speed on the properties window of the tasks). It took less than 5 mins to test all this and everything went back to the original state after restoring the flops to the original value. It might be worth doing the same test on a newer version of BOINC and specifically on a host with an AMD GPU with an AP in progress... |
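For reference, Horacio's five test values can be written out in both notations with a few lines of Python. This is only a sketch. The helper name `both_notations` is made up, and Python's `float()` stands in for whatever parser a given BOINC version uses. So it shows that the two spellings denote the same number, not how BOINC reads them.

```python
# Horacio's five test values, in GFLOPS.
GFLOPS_VALUES = ["3.29", "32.9", "329", "3290", "32900"]


def both_notations(gflops_text):
    """Return (scientific, long) spellings in FLOPS of a GFLOPS figure.

    Assumes at most nine digits after the decimal point.
    """
    scientific = f"{gflops_text}e9"
    # Long form: shift the decimal point nine places using string/integer math,
    # so no floating-point rounding is involved.
    whole, _, frac = gflops_text.partition(".")
    long_form = str(int(whole + frac.ljust(9, "0")))
    return scientific, long_form


if __name__ == "__main__":
    for g in GFLOPS_VALUES:
        sci, long_form = both_notations(g)
        # The last column reports whether both spellings parse to the same float.
        print(sci, long_form, len(long_form), float(sci) == float(long_form))
```

The long form of the largest value comes out at 14 characters, which matches the string length Horacio mentions.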
Filipe Joined: 12 Aug 00 Posts: 218 Credit: 21,281,677 RAC: 20 |
After receiving a couple of PM's on this subject, may I point out that for Astropulse the flops entry has to be in scientific notation format. i.e XXXXe0x, where x is the number of zeros after the integer eg, 9 for Gigaflops, 8 for 100's of Megaflops etc. Where in the app_info.xml should I add the flops counter? <app_info> <app> <name>astropulse_v6</name> </app> <file_info> <name>AP6_win_x86_SSE2_OpenCL_NV_r1761.exe</name> <executable/> </file_info> <file_info> <name>libfftw3f-3.dll</name> </file_info> <file_info> <name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</name> </file_info> <app_version> <app_name>astropulse_v6</app_name> <version_num>604</version_num> <platform>windows_intelx86</platform> <avg_ncpus>0.04</avg_ncpus> <max_ncpus>0.2</max_ncpus> <plan_class>cuda_opencl_100</plan_class> <coproc> <type>CUDA</type> <count>1</count> <flops>486e09</flops> </coproc> <file_ref> <file_name>AP6_win_x86_SSE2_OpenCL_NV_r1761.exe</file_name> <main_program/> </file_ref> <file_ref> <file_name>libfftw3f-3.dll</file_name> </file_ref> <file_ref> <file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name> <open_name>ap_cmdline.txt</open_name> </file_ref> </app_version> <app_version> <app_name>astropulse_v6</app_name> <version_num>604</version_num> <platform>windows_intelx86</platform> <avg_ncpus>0.04</avg_ncpus> <max_ncpus>0.2</max_ncpus> <plan_class>opencl_nvidia_100</plan_class> <coproc> <type>CUDA</type> <count>1</count> <flops>486e09</flops> </coproc> <file_ref> <file_name>AP6_win_x86_SSE2_OpenCL_NV_r1761.exe</file_name> <main_program/> </file_ref> <file_ref> <file_name>libfftw3f-3.dll</file_name> </file_ref> <file_ref> <file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name> <open_name>ap_cmdline.txt</open_name> </file_ref> </app_version> </app_info> |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
Where in the app_info.xml should I add the flops counter? NOT inside the <coproc> block. By convention, usually on the line after the <plan_class>, but anywhere at that level (directly inside <app_version>) will do. Refer to "The format of app_info.xml" if in doubt. |
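Richard's placement rule can be checked or applied mechanically. The sketch below is one hypothetical way to do it with Python's standard `xml.etree.ElementTree`. The function name `move_flops_out_of_coproc` and the shortened `SAMPLE` are made up for illustration, and it assumes app_info.xml is well-formed XML. It moves a `<flops>` found inside `<coproc>` to the line after `<plan_class>` within the same `<app_version>`.

```python
import xml.etree.ElementTree as ET

# Shortened version of the posted app_info.xml, with <flops> misplaced in <coproc>.
SAMPLE = (
    "<app_info><app_version>"
    "<app_name>astropulse_v6</app_name>"
    "<plan_class>opencl_nvidia_100</plan_class>"
    "<coproc><type>CUDA</type><count>1</count><flops>486e09</flops></coproc>"
    "</app_version></app_info>"
)


def move_flops_out_of_coproc(app_info_xml):
    """Relocate any <flops> inside <coproc> to the <app_version> level."""
    root = ET.fromstring(app_info_xml)
    for av in root.iter("app_version"):
        for coproc in av.findall("coproc"):
            for flops in coproc.findall("flops"):
                coproc.remove(flops)
                if av.find("flops") is not None:
                    continue  # a correctly placed entry already exists; keep it
                # Put it right after <plan_class> when present, else at the end.
                children = list(av)
                plan = av.find("plan_class")
                idx = children.index(plan) + 1 if plan is not None else len(children)
                av.insert(idx, flops)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(move_flops_out_of_coproc(SAMPLE))
```

Editing the file by hand works just as well. The point is only that `<flops>` ends up as a sibling of `<plan_class>` and `<coproc>`, not a child of `<coproc>`.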
Filipe Joined: 12 Aug 00 Posts: 218 Credit: 21,281,677 RAC: 20 |
Thank you Richard. It's working now. |
trader Joined: 25 Jun 00 Posts: 126 Credit: 4,968,173 RAC: 0 |
Couldn't someone just put an extra-large number in there and use that to get more AP work? (I know the 100 WU per physical CPU limit will apply.) |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Couldn't someone just put an extra-large number in there and use that to get more AP work? (I know the 100 WU per physical CPU limit will apply.) What's the point, when you can increase your cache size in number of days and get extra work that way? More likely, all the work will end up as Maximum Time Exceeded because the number was too big and the runtimes weren't fast enough. Claggy |
HAL9000 Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Couldn't someone just put an extra-large number in there and use that to get more AP work? (I know the 100 WU per physical CPU limit will apply.) You can put whatever you want in there. However, you may run into -177 errors initially. After a while your DCF may be adjusted to correct for the value, or, if you are also doing CPU tasks, it will bounce around all over the place. In the end a drastically off number is a disadvantage for you. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu) |
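The arithmetic behind the last two replies can be sketched as follows. It rests on the commonly described BOINC behaviour that the runtime estimate is roughly `rsc_fpops_est / flops` (scaled by DCF on clients that use it) and that a task is aborted with error -177 once its elapsed time passes `rsc_fpops_bound / flops`. Treat both formulas as assumptions here, since the details vary between client versions. All the numbers are invented for illustration and are not real Astropulse workunit values.

```python
def estimate_seconds(rsc_fpops_est, flops, dcf=1.0):
    """Estimated runtime: work estimate divided by claimed speed, times DCF."""
    return rsc_fpops_est / flops * dcf


def time_limit_seconds(rsc_fpops_bound, flops):
    """Elapsed-time limit: the work bound divided by the claimed speed."""
    return rsc_fpops_bound / flops


def exceeds_limit(actual_runtime_s, rsc_fpops_bound, flops):
    """True when the real runtime would pass the limit (the -177 case)."""
    return actual_runtime_s > time_limit_seconds(rsc_fpops_bound, flops)


if __name__ == "__main__":
    # Hypothetical workunit whose bound is 10x its estimate.
    fpops_est, fpops_bound = 1e15, 1e16
    honest_flops = 100e9      # claimed speed that matches the real speed
    inflated_flops = 100e12   # claimed speed 1000x too large
    actual_runtime = estimate_seconds(fpops_est, honest_flops)

    for label, flops in (("honest", honest_flops), ("inflated", inflated_flops)):
        print(label,
              estimate_seconds(fpops_est, flops),
              time_limit_seconds(fpops_bound, flops),
              exceeds_limit(actual_runtime, fpops_bound, flops))
```

With the inflated value the estimate shrinks, so more work is requested, but the time limit shrinks by the same factor. Under these assumptions, that is why the tasks would end as Maximum Time Exceeded instead of finishing.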
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.