AP Optimized r555 vs r1797

Profile William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1355756 - Posted: 11 Apr 2013, 16:40:06 UTC - in response to Message 1355693.  

Ok, if you say it 'works' with scientific notation and 'does not work' when typing out the digits how exactly do you define 'work'?

In my case, once the flops value was entered in the format XXXXe0x it brought the ETA down from ~150 Hrs to 38 minutes with nothing else being touched. BOINC immediately downloaded more AP units for the GPU instead of waiting till the running unit completed before requesting more units and only being given one.

Fellas please. Give me some credit for not being a complete idiot !

I can find my way around an app_info file, I can count to 9 without using my fingers and I do make sure of my facts before posting.

In the past I've had some good wins, which include finding the reason for a quirk in the use of the original Rescheduling program and (not SAH related) a workaround for a bug in the Fedora installation program which had all the "gurus" stumped. There are others, but I can't recall them atm.

I agree it should not make any difference what format the flops value is entered in but evidently it does and I'm not the only one to find this.

I've used non-scientific notation and it did its job.

So have I ! But in this case it didn't work. Do you really think I would have posted if it did ?

Because I'm not a programmer, I can't fix it. So I'm passing it on to those whose area of expertise is in that domain.

So stop inferring I'm a f***wit and if something needs to be fixed, fix it.

T.A.


Do you really want me to answer to this one?

So, without getting upset myself (at least I try not to), let me clarify a few points.
a) I don't think you are an idiot or a 'f***wit'. If you got the impression I was inferring that or being patronising or something, will you please give me credit for being a non-native speaker?

b) Well, of course you wouldn't have posted that it made a difference if it didn't. And I'm sure you were puzzled as well and tried both a few times. Which leaves us with the initial question: Why does it work in most cases, but refuse to take non-scientific in some? If anything, the scientific notation should be the problematic one.

c) When I said 'scientific is easier and less error-prone' it was for the benefit of the lurkers and the people who write readmes. I wasn't about to imply that you can't count.

d) When I asked for clarification of what 'work' and 'doesn't work' exactly mean, I was actually expecting something like 'with scientific notation the estimates change to good figures - without they don't change at all'. I was after what happens with non-scientific notation - nothing? Something, but not the right thing?
I ain't clairvoyant and I'm not the one sitting in front of the rig [wish I were in some cases, would make things sooo much easier...].
At this point you (and others) have found a problem. Now we get to the difficult bit of trying to figure out when the problem occurs. If you can't repro it yourself (and that's all I wanted to imply with 'worked for me' - that it's not broken as such, only sometimes), then you are at the mercy of the abilities of the person experiencing the problem.
Before any bug can be fixed, you first have to find the circumstances under which it occurs. At least with such elusive bugs as this.

I can't fix it either. I am just very good at tripping over bugs and analysing BOINC. And I can write bug reps that David understands ;)

So, and now I'll have a look at your next post with the technical bits.
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1355756
Profile William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1355775 - Posted: 11 Apr 2013, 17:10:38 UTC - in response to Message 1355716.  

After my outburst above, here is some information that might prove useful.

The ETA's for GPU units on my rig were currently sitting at 43 minutes.

I stopped the rig and edited the app_info file, entering the flops value in the long format. On restart the GPU ETA's were 50 minutes.

Right. So boinc was seeing the value...
What matters in the end is what ends up in client_state.xml.
You are basically replacing a server generated <flops> entry (==APR) with your own value. At the same time Boinc switches the mechanism it uses to finetune and deliver estimates.

I then removed the flops entry completely and the ETA's blew out to 258 hours. Nearly 100 Hours more than the original estimate.

Yes, that's a known issue with adding/removing flops, which is why the recommendation has always been to change them on an empty cache.
The exact reason is a bit intricate and the last DCF/flops discussion was at least a year ago. Also, BOINC changes over time, so I don't remember the exact details now. It has to do with how the estimates are calculated and which fields BOINC uses under which circumstances.

However, when I fired up the NV_r1761 app for the first time, the ETA's were ~160 hours. I entered the flops value in long format and the ETA's did not change.

I double checked my app_info file and the flops entry was in the correct position and had the correct number of zeros.

I then went looking, both here and on Lunatics because I vaguely remembered reading something that the flops value for AP was calculated differently than it was for MB. It was during this search that I found the reference that said for AP the flops entry had to be in scientific notation.

You wouldn't by chance remember (or be able to find) the exact post?
There should be no difference in the mechanism between MB and AP. The only difference is the magnitude of the <flops> entered.

I then entered the flops value this way and on a restart the ETA's were within five minutes of the actual crunching time.

Since then the rig has completed around 400 GPU units. So any server side calculations would now have had a chance to stabilise and under these conditions the long format entry works.

Maybe this problem only occurs on new installs of the app ?

No, not new installs, just new entries on the app details page. It will probably also occur when changing the card to something with a very different speed, as then the APR doesn't match.

@ Sleepy. Since you would have only completed a few units. Could you please repeat the experiment I did and report what you find ?

Edited for clarity

T.A.

We have a suspicion that your BOINC version might be part of the problem.
In any case, as I pointed out at the start, what matters is what ends up in client_state.xml

So please try the following (and @Sleepy every change of app_info.xml requires a boinc restart to work):

Look at client_state.xml and find the <app_version> entry for AP. make a note of the <flops> entry. It should match what you put in the scientific notation - but with all digits.

change to long notation

check client_state.xml again - what does the field read now?

change back to scientific and recheck.

Sorry if I'm making you jump through hoops. We're trying to figure out if the value is passed along properly or if maybe some capping is interfering.
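
For anyone who wants to script the check above rather than search the file by hand, here is a minimal sketch in Python. It assumes client_state.xml parses as well-formed XML and that each <app_version> block there carries <app_name>, <plan_class> and <flops> children, like the app_info.xml entries in this thread. The function name `list_flops` and the sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET


def list_flops(state_xml, app_name):
    """Return (plan_class, flops) pairs for every <app_version> of app_name."""
    root = ET.fromstring(state_xml)
    found = []
    for av in root.iter("app_version"):
        if av.findtext("app_name", "").strip() != app_name:
            continue
        plan_class = av.findtext("plan_class", "").strip()
        flops_text = av.findtext("flops")
        # Python's float() reads both "486e09" and "486000000000.000000".
        flops = float(flops_text) if flops_text is not None else None
        found.append((plan_class, flops))
    return found


# Hypothetical excerpt; a real client_state.xml holds far more than this.
SAMPLE = """
<client_state>
  <app_version>
    <app_name>astropulse_v6</app_name>
    <version_num>604</version_num>
    <plan_class>opencl_nvidia_100</plan_class>
    <flops>486000000000.000000</flops>
  </app_version>
</client_state>
"""

for plan_class, flops in list_flops(SAMPLE, "astropulse_v6"):
    print(plan_class, flops)
```

To use it on a real host, read the file from the BOINC data directory and pass its text in place of `SAMPLE`. Run it once per notation (after each BOINC restart) and compare the numbers.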
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1355775
Sleepy
Volunteer tester
Joined: 21 May 99
Posts: 219
Credit: 98,947,784
RAC: 28,360
Italy
Message 1355777 - Posted: 11 Apr 2013, 17:17:17 UTC - in response to Message 1355716.  

@ Sleepy. Since you would have only completed a few units. Could you please repeat the experiment I did and report what you find ?


In my case it happened as you described.
Now, in scientific format, I can (as it should normally be) set any completion time I wish by changing the flops value.
I tried before and it did not work with the non-scientific format.

I have only completed 2 AP WUs since I put everything right, so I cannot tell how everything will behave in the long run.

I will try to test this in some more months (? at the rate I can get AP WUs this seems to be a reasonable time frame).

Happy crunching! :-)

Sleepy

ID: 1355777
Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1355781 - Posted: 11 Apr 2013, 17:20:58 UTC - in response to Message 1355756.  

While I'm using an older version of BOINC and I don't have an AMD GPU, I did a quick and easy test to see if there is any difference between the notations.

What I did was suspend all crunching and network activity, so the weird flops values I used during the tests wouldn't mess anything up.
Then I changed the flops using both notations, with values 10 times greater each time (in my case 3.29GF, 32.9GF, 329GF, 3290GF and 32900GF), so the biggest value was a string of 14 characters. In each case, after restarting BOINC the speed was acknowledged correctly by BOINC and the remaining estimated time was reduced (but not linearly, which I guess is right because it also uses the already elapsed time).
The point is that, independently of the notation used, the remaining estimated time changed to the same value each time.

All this on BOINC 6.10, viewed remotely with BM 6.12 (which gives the speed in the properties window of the tasks).
It took less than 5 minutes to test all this, and everything went back to the original state after restoring the flops to the original value.

It might be worth doing the same test on a newer version of BOINC, specifically on a host with an AMD GPU with an AP in progress...
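
The two spellings used in that test can be generated and compared with a few lines of Python. This is only a sketch of the arithmetic. It uses Python's own `float()` parser, not BOINC's, so it shows that the two strings denote the same number, not how any given client version reads them. The helper name `flops_strings` is made up.

```python
def flops_strings(gigaflops):
    """Return (long_form, scientific_form) for a speed given in GFLOPS."""
    flops = round(gigaflops * 1e9)
    return str(flops), f"{gigaflops:g}e09"


for gf in (3.29, 32.9, 329, 3290, 32900):
    long_form, sci_form = flops_strings(gf)
    # Both spellings should parse to the same number.
    print(long_form, sci_form, float(long_form) == float(sci_form))
```

The long form for 32900GF comes out as the 14-character string mentioned above, which is where a miscounted zero is easiest to make.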
ID: 1355781
Filipe
Joined: 12 Aug 00
Posts: 218
Credit: 21,281,677
RAC: 20
Portugal
Message 1361059 - Posted: 25 Apr 2013, 13:55:34 UTC

After receiving a couple of PM's on this subject, may I point out that for Astropulse the flops entry has to be in scientific notation format, i.e. XXXXe0x, where x is the number of zeros after the integer, e.g. 9 for Gigaflops, 8 for 100's of Megaflops etc.

Thus the entry for my GTX470 (1120GF) is...
<flops>1120e09</flops>

For a GTX550Ti (486GF) it would be
<flops>486e09</flops>

for a GTX 580 (1679GF)
<flops>1679e09</flops>


Where in the app_info.xml should I add the flops counter?


<app_info>
<app>
<name>astropulse_v6</name>
</app>
<file_info>
<name>AP6_win_x86_SSE2_OpenCL_NV_r1761.exe</name>
<executable/>
</file_info>
<file_info>
<name>libfftw3f-3.dll</name>
</file_info>
<file_info>
<name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</name>
</file_info>
<app_version>
<app_name>astropulse_v6</app_name>
<version_num>604</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>cuda_opencl_100</plan_class>
<coproc>
<type>CUDA</type>
<count>1</count>
<flops>486e09</flops>
</coproc>
<file_ref>
<file_name>AP6_win_x86_SSE2_OpenCL_NV_r1761.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3.dll</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v6</app_name>
<version_num>604</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_100</plan_class>
<coproc>
<type>CUDA</type>
<count>1</count>
<flops>486e09</flops>
</coproc>
<file_ref>
<file_name>AP6_win_x86_SSE2_OpenCL_NV_r1761.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3.dll</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
</app_info>
ID: 1361059
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1361060 - Posted: 25 Apr 2013, 14:01:57 UTC - in response to Message 1361059.  

Where in the app_info.xml should I add the flops counter?

NOT inside the <coproc> block. By convention, usually on the line after the <plan_class>, but anywhere at that level will do.

Refer to The format of app_info.xml if in doubt.
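
If you have several <app_version> blocks to fix, the move can be scripted. Below is a rough Python sketch that lifts any <flops> found inside <coproc> up to the <app_version> level, right after <plan_class>. The function name `lift_flops` and the trimmed-down sample are illustrative only. Back up your real app_info.xml before trying anything like this, and remember that BOINC needs a restart to pick up the change.

```python
import xml.etree.ElementTree as ET


def lift_flops(app_info_xml):
    """Move any <flops> found inside <coproc> up to the <app_version> level."""
    root = ET.fromstring(app_info_xml)
    for av in root.iter("app_version"):
        coproc = av.find("coproc")
        if coproc is None:
            continue
        flops = coproc.find("flops")
        if flops is None:
            continue
        coproc.remove(flops)
        plan_class = av.find("plan_class")
        children = list(av)
        # Place it right after <plan_class> when present, else at the end.
        if plan_class is not None:
            pos = children.index(plan_class) + 1
        else:
            pos = len(children)
        av.insert(pos, flops)
    return ET.tostring(root, encoding="unicode")


# Trimmed-down, hypothetical app_info.xml with the misplaced entry.
SAMPLE = (
    "<app_info><app_version>"
    "<app_name>astropulse_v6</app_name>"
    "<plan_class>opencl_nvidia_100</plan_class>"
    "<coproc><type>CUDA</type><count>1</count><flops>486e09</flops></coproc>"
    "</app_version></app_info>"
)

print(lift_flops(SAMPLE))
```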
ID: 1361060
Filipe
Joined: 12 Aug 00
Posts: 218
Credit: 21,281,677
RAC: 20
Portugal
Message 1361062 - Posted: 25 Apr 2013, 14:04:12 UTC

Thank you Richard.
It's working now.
ID: 1361062
Profile trader
Volunteer tester
Joined: 25 Jun 00
Posts: 126
Credit: 4,968,173
RAC: 0
United States
Message 1361529 - Posted: 26 Apr 2013, 18:14:40 UTC - in response to Message 1361062.  

Couldn't someone just put an extra large number in there and use that to get more AP work? (I know the 100 WU per physical CPU limit will apply.)
ID: 1361529
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1361531 - Posted: 26 Apr 2013, 18:18:51 UTC - in response to Message 1361529.  

Couldn't someone just put an extra large number in there and use that to get more AP work? (I know the 100 WU per physical CPU limit will apply.)

What's the point, when you can increase your cache size in number of days and get extra work that way? More likely, all the work will end up as Maximum Time Exceeded because the number was too big and the runtimes weren't fast enough.

Claggy

ID: 1361531
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1361532 - Posted: 26 Apr 2013, 18:21:35 UTC - in response to Message 1361529.  

Couldn't someone just put an extra large number in there and use that to get more AP work? (I know the 100 WU per physical CPU limit will apply.)

You can put whatever you want in there. However, you may run into -177 errors initially. After a while your DCF may be adjusted to correct for the value, or if you are also doing CPU tasks it will bounce around all over the place.
In the end a drastically off number is a disadvantage for you.
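
A rough sketch of why an inflated value backfires, on the assumption that the client divides a per-task operation count by <flops> to get both the runtime estimate and the abort limit (the workunit fields involved are, as far as I know, rsc_fpops_est and rsc_fpops_bound). DCF is ignored here, and every number below is hypothetical and chosen only to show the effect.

```python
def estimate_seconds(rsc_fpops_est, flops):
    """Estimated runtime, ignoring any DCF correction."""
    return rsc_fpops_est / flops


def limit_seconds(rsc_fpops_bound, flops):
    """Elapsed time after which the task would be aborted."""
    return rsc_fpops_bound / flops


FPOPS_EST = 1.0e15     # hypothetical operation estimate for one task
FPOPS_BOUND = 1.0e16   # hypothetical bound, 10x the estimate
REAL_RUNTIME = 2400.0  # hypothetical actual crunching time in seconds

for flops in (486e9, 486e11):  # a realistic entry vs. one inflated 100x
    limit = limit_seconds(FPOPS_BOUND, flops)
    estimate = estimate_seconds(FPOPS_EST, flops)
    print(flops, estimate, limit, "aborted" if REAL_RUNTIME > limit else "ok")
```

With the inflated entry the limit drops below the time the card really needs, so the task would be killed before it can finish, which matches the -177 / Maximum Time Exceeded outcome described above.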
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1361532
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.