Same WU takes more time in Lunatics (ATI) than only CPU ?


Message boards : Number crunching : Same WU takes more time in Lunatics (ATI) than only CPU ?

Zapiao
Volunteer tester
Joined: 29 Oct 01
Posts: 110
Credit: 122,278
RAC: 0
Portugal
Message 1284605 - Posted: 16 Sep 2012, 21:21:58 UTC

So what's the point of installing Lunatics?
____________
By your command !!!

Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1284627 - Posted: 16 Sep 2012, 22:18:23 UTC

First, you must mean "similar" work unit; it's impossible to run the "same" workunit on your system twice.

Second, I'm curious as to how you arrived at this assumption, given that there are only 5 pending WUs and one valid WU on your two machines that have ATI cards, and all of those were processed on the CPU.

You DO seem to have a lot of errors, though.
____________

Zapiao
Volunteer tester
Joined: 29 Oct 01
Posts: 110
Credit: 122,278
RAC: 0
Portugal
Message 1284630 - Posted: 16 Sep 2012, 22:34:11 UTC - in response to Message 1284627.

Because the remaining time on the CPU was about 10h, and with Lunatics activated I got WUs that need 40h to complete. I thought that by using my ATI I would have WUs completed in minutes.
____________
By your command !!!

Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1284639 - Posted: 16 Sep 2012, 23:32:53 UTC - in response to Message 1284630.

Because the remaining time on the CPU was about 10h, and with Lunatics activated I got WUs that need 40h to complete. I thought that by using my ATI I would have WUs completed in minutes.


You might have received AP units for your GPU. Those take longer than regular MB units. But any "time" shown is simply a "guess" by BOINC. The actual run time will probably be considerably less.

It looks like you now have some MB ATI units. Let them run (don't abort them) and then compare them to MB CPU units. And don't even look at the completion time estimates.
____________

Link
Joined: 18 Sep 03
Posts: 829
Credit: 1,571,941
RAC: 250
Germany
Message 1284727 - Posted: 17 Sep 2012, 7:36:20 UTC - in response to Message 1284639.
Last modified: 17 Sep 2012, 7:39:52 UTC

and then compare them to MB CPU units

... with a similar angle range; see the line "WU true angle range is" in the stderr output. It's pointless to compare a 0.4 WU with a VHAR (AR>1.0).
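As a rough illustration of that check, here is a minimal Python sketch. It assumes the stderr line reads "WU true angle range is" followed by an optional colon and a number; the function names and the 0.05 tolerance are hypothetical choices for this example, not anything defined by the apps.

```python
import re

# Assumed line format: "WU true angle range is : 0.412345" (the colon is optional here).
AR_LINE = re.compile(r"WU true angle range is\s*:?\s*([0-9]*\.?[0-9]+)")

def angle_range(stderr_text):
    """Return the angle range found in a task's stderr text, or None if absent."""
    match = AR_LINE.search(stderr_text)
    return float(match.group(1)) if match else None

def comparable(ar_a, ar_b, tolerance=0.05):
    """True if two tasks have similar enough angle ranges to compare run times.

    The tolerance is a hypothetical value; VHAR means AR > 1.0, as in the post.
    """
    if (ar_a > 1.0) != (ar_b > 1.0):
        return False  # never compare a VHAR task with a non-VHAR one
    return abs(ar_a - ar_b) <= tolerance
```

With this, a 0.40 task and a 0.42 task count as comparable, while a 0.4 task and a 1.5 (VHAR) task do not.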
____________
.

ignorance is no excuse
Joined: 4 Oct 00
Posts: 9529
Credit: 44,433,321
RAC: 0
Korea, North
Message 1284799 - Posted: 17 Sep 2012, 14:23:55 UTC

I have to assume that he is referring to his laptop PC. That has an APU, where the CPU and GPU are on the same chip and share resources. The GPU will not be very fast. It may be about as fast or as slow as your CPU WUs.
____________
In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope

End terrorism by building a school

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,895,921
RAC: 3,665
Netherlands
Message 1284819 - Posted: 17 Sep 2012, 15:42:43 UTC - in response to Message 1284799.
Last modified: 17 Sep 2012, 15:50:00 UTC

I have to assume that he is referring to his laptop PC. That has an APU, where the CPU and GPU are on the same chip and share resources. The GPU will not be very fast. It may be about as fast or as slow as your CPU WUs.


APU C-60.


You cannot compare a Core i7-3960X plus an AMD/ATI 7900 (Tahiti)
with a C-60 APU, optimized or not.

Valid WU,
both SETI Enhanced stock app.

On your i3-M380 and AMD/ATI 5000 (Cedar) you're running the optimized app for AMD/ATI:
OpenCL Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Max compute units: 2
Max work group size: 128

State: All (340) · In progress (22) · Pending (5) · Valid (2) · Invalid (0) · Error (311)
Application: All (340) · Astropulse v505 (0) · AstroPulse v6 (0) · SETI@home Enhanced (340) · SETI@home v7 (0)

I'm guessing both hosts are laptops, which all have serious problems with
heat, and thus with power consumption. That directly influences your computing power,
R(ecent) A(verage) C(redit), or 'throughput'.

Mobile versions are also always limited by their battery life; that's why desktops will outperform mobile devices, at this time.
____________

Zapiao
Volunteer tester
Joined: 29 Oct 01
Posts: 110
Credit: 122,278
RAC: 0
Portugal
Message 1284820 - Posted: 17 Sep 2012, 15:44:05 UTC - in response to Message 1284799.

I was afraid of that. I thought this ATI was good for crunching, but I was wrong.
____________
By your command !!!

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,895,921
RAC: 3,665
Netherlands
Message 1284826 - Posted: 17 Sep 2012, 15:53:12 UTC - in response to Message 1284820.

I was afraid of that. I thought this ATI was good for crunching, but I was wrong.


Well, if they don't get too hot they will work, but they will always be slower compared
to their desktop counterparts.


____________

Zapiao
Volunteer tester
Joined: 29 Oct 01
Posts: 110
Credit: 122,278
RAC: 0
Portugal
Message 1284851 - Posted: 17 Sep 2012, 16:55:14 UTC - in response to Message 1284826.

I was afraid of that. I thought this ATI was good for crunching, but I was wrong.


Well, if they don't get too hot they will work, but they will always be slower compared
to their desktop counterparts.


My ATI is now at about 68 °C.
____________
By your command !!!

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3504
Credit: 47,777,220
RAC: 46,840
Russia
Message 1284971 - Posted: 17 Sep 2012, 21:39:40 UTC - in response to Message 1284820.

I was afraid of that. I thought this ATI was good for crunching, but I was wrong.


Did you get a performance-per-dollar or performance-per-watt estimate, or is your conclusion based exclusively on BOINC's time-to-complete estimates, which are nowhere near reality for the first few tasks?

ATI is as good a crunching device as the others are. Which is better or worse can only be argued once a measure for comparison is established. By the way, a device that is good by one measure can be bad by another.

And about 68 °C: that's quite a low temperature for a C-60. It is quite comfortable at 80 °C too.

I have been crunching on a C-60 netbook for almost a year and can say that this APU is definitely faster than my previous Atom-based one. Its GPU part is faster in SETI (both AP and MB apps) than its 2 CPU cores. By the performance-per-dollar measure it's better. Would it be better by the performance-per-watt measure? Maybe, but I can't answer because I have not checked.
So be careful with conclusions based on very little experience.

Mike
Project donor
Volunteer tester
Joined: 17 Feb 01
Posts: 24552
Credit: 33,894,182
RAC: 24,135
Germany
Message 1284985 - Posted: 17 Sep 2012, 22:36:34 UTC

Thank you Raistmer.
I was going to say that much earlier.

____________

cov_route
Joined: 13 Sep 12
Posts: 296
Credit: 7,432,177
RAC: 12,987
Canada
Message 1285320 - Posted: 19 Sep 2012, 3:24:43 UTC

The first AP unit I got for my AMD 6670 1GB card (5 days ago) was puzzling. It ran at ~0% GPU load and took something like 24 hours to finish. The load graph showed bumps up to maybe 10% every few minutes.

Long story short, I discovered if I set unroll 8, ffa_block 16384, and ffa_block_fetch 8192 the AP units run at 85% and a typical run time for the 20 or so I've completed is 1.5 hours.

I didn't expect I'd have to tweak the stock app, but in my case I definitely had to, basically to make it work at all.

So I'm just wondering if Zapiao has to do some tweakage like I did.

Zapiao
Volunteer tester
Joined: 29 Oct 01
Posts: 110
Credit: 122,278
RAC: 0
Portugal
Message 1285322 - Posted: 19 Sep 2012, 3:33:19 UTC - in response to Message 1285320.
Last modified: 19 Sep 2012, 3:43:58 UTC

Can you explain how to do that tweakage? Where do I set "unroll 8, ffa_block 16384, and ffa_block_fetch 8192"?
____________
By your command !!!

cov_route
Joined: 13 Sep 12
Posts: 296
Credit: 7,432,177
RAC: 12,987
Canada
Message 1285332 - Posted: 19 Sep 2012, 4:25:53 UTC - in response to Message 1285322.

I can tell you what I did, I'm hardly the expert. Raistmer is, he wrote the code in case you didn't know.

I'm running Windows 7, so in C:\ProgramData\BOINC\projects\setiathome.berkeley.edu there is a file called ap_cmdline_6.04_windows_intelx86__opencl_ati.txt. You can use that file to set parameters for the Astropulse app when it starts up. That's the first thing I know. I'll give an example of how to type the values into that file later on.

The second thing I know is the three parameters I've seen talked about: -unroll, -ffa_block, and -ffa_block_fetch. Of those three, unroll is most important. The first thing I did was adjust that one so I got the GPU load up off the floor.

Question is, what value to use? Again from poking around I got the impression it should be set in relation to how many compute units the GPU has. How do you find *that* out? Well, I looked in the stderr.txt file for a running Astropulse job. Where is that? C:\ProgramData\BOINC\slots\<number> where <number> is some small integer like 0, 1, 2. One of those slots will be for the Astropulse job, you just have to look in each one for files that look like they belong to Astropulse. Problem is, if there is no running job, there is no slot.

I found from stderr.txt I have 6 compute units. If you can't find out how many you have maybe just start increasing unroll from the default value which is 2 until you find a value that works well.
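The lookup described above could be sketched in Python roughly as follows. This is only an illustration: it assumes each slot directory holds a stderr.txt and that the line reads "Max compute units: N", as in the stderr excerpt quoted earlier in this thread. The function name is made up for the example.

```python
import re
from pathlib import Path

# Line format as it appears in the stderr excerpt quoted earlier in the thread.
COMPUTE_UNITS = re.compile(r"Max compute units:\s*(\d+)")

def find_compute_units(slots_dir):
    """Map slot directory name -> compute units, for slots whose stderr.txt reports them."""
    found = {}
    for stderr_file in sorted(Path(slots_dir).glob("*/stderr.txt")):
        text = stderr_file.read_text(errors="ignore")
        match = COMPUTE_UNITS.search(text)
        if match:
            found[stderr_file.parent.name] = int(match.group(1))
    return found
```

On the setup described here you would call it as find_compute_units(r"C:\ProgramData\BOINC\slots"). An empty result means no running job has written that line yet.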

So here is how to set unroll, open that file from paragraph 2 with notepad and type "-unroll 3" without the quotes. Use 3 or whatever value you are trying out. Then save.

Boy, this is getting long. OK, so next I would test it on a running AP job (probably not the professional way of doing it) by suspending the whole SETI@home project from BOINC and then resuming it. That restarts all your jobs, including Astropulse, which will pick up your new parameters. Then look at your GPU load using GPU-Z (free download) or Catalyst Control Center to see if it's any better. I ended up using 8, which is 2 more than my number of compute units.

The other two parameters affect how much video memory your job uses. I'm not sure, but if you set them too high you might crash the job. The defaults are -ffa_block 1024 -ffa_block_fetch 512. What you should do is make sure ffa_block is always twice ffa_block_fetch and then increase them by factors of two. So first I'd try -ffa_block 2048 -ffa_block_fetch 1024, and so forth.

So your parameters file would look like, for example

-unroll 4 -ffa_block 2048 -ffa_block_fetch 1024

GPU-Z will tell you how much memory your job is using; if you know what the limit is, you can stop increasing the ffa values before you exceed it. Otherwise, maybe don't mess with them, because they are secondary in importance to -unroll. You can just have a file that sets unroll, and the other parameters will take their defaults.

As I said, that's what I did based on reading the message boards and it seems to work for me, but I'm not the expert.
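For anyone who wants to generate the candidate settings rather than type them, here is a small Python sketch of the rules in this post: ffa_block is always twice ffa_block_fetch, and both are doubled starting from the defaults. The function names and the stopping value are hypothetical; only the flag names and default values come from the post.

```python
def ap_cmdline(unroll, ffa_block_fetch=None):
    """Build the one-line contents of the Astropulse parameter file."""
    parts = [f"-unroll {unroll}"]
    if ffa_block_fetch is not None:
        # Rule of thumb from the post: ffa_block is always twice ffa_block_fetch.
        parts.append(f"-ffa_block {2 * ffa_block_fetch}")
        parts.append(f"-ffa_block_fetch {ffa_block_fetch}")
    return " ".join(parts)

def ffa_candidates(start_fetch=512, max_fetch=8192):
    """Yield (ffa_block, ffa_block_fetch) pairs, doubling from the defaults."""
    fetch = start_fetch
    while fetch <= max_fetch:
        yield 2 * fetch, fetch
        fetch *= 2
```

Calling ap_cmdline(4, 1024) gives the example line shown above, and ap_cmdline(3) gives a file that sets only -unroll. Try one candidate at a time and watch memory use before moving to the next.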

Zapiao
Volunteer tester
Joined: 29 Oct 01
Posts: 110
Credit: 122,278
RAC: 0
Portugal
Message 1285339 - Posted: 19 Sep 2012, 5:01:59 UTC - in response to Message 1285332.

I don't crunch Astropulse, so will it work for SETI?
____________
By your command !!!

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3504
Credit: 47,777,220
RAC: 46,840
Russia
Message 1285553 - Posted: 19 Sep 2012, 17:24:21 UTC - in response to Message 1285339.

I don't crunch Astropulse, so will it work for SETI?


1. No, it would not; the MB app uses a different set of parameters for tuning.
2. The C-60 is a low-end GPU, so it should work quite well with the stock defaults (for AP). Or are you trying to speed up your secondary ATI cruncher, the HD5xxx-based one?
3. Since you said you run not AP but "SETI" (MB, i.e. MultiBeam, presumably), you are running with app_info, not stock.

I see no completed GPU tasks for the C-60 host, but a few for the HD5xxx host.
From stderr:

Number of period iterations for PulseFind setted to:20
Number of app instances per device setted to:1

You could try setting the number of instances for the app to 2 and decreasing the number of iterations (if you do not experience GUI lag).

All this can be done via the <cmdline> field in the app_info.xml file for the MultiBeam app:
<cmdline>-instances_per_device 2 -period_iterations_num 10</cmdline>
Also, <count>0.5</count> should be used for r390 of the MultiBeam app you are running if you want to run 2 instances at once.
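To show how the two settings relate, here is a minimal Python sketch that builds both fragments. It assumes the GPU count is simply 1 divided by the number of instances, which matches the 0.5 given here for 2 instances; the function name is hypothetical, and the fragments still have to be placed by hand in the right spots of app_info.xml.

```python
def multibeam_settings(instances_per_device, period_iterations):
    """Return the <cmdline> and <count> fragments for a MultiBeam app_info.xml entry."""
    if instances_per_device < 1:
        raise ValueError("instances_per_device must be at least 1")
    cmdline = (f"-instances_per_device {instances_per_device} "
               f"-period_iterations_num {period_iterations}")
    # Assumption: each instance claims an equal share of one GPU.
    count = 1 / instances_per_device
    return {
        "cmdline": f"<cmdline>{cmdline}</cmdline>",
        "count": f"<count>{count:g}</count>",
    }
```

multibeam_settings(2, 10) reproduces the two fragments quoted above.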


Zapiao
Volunteer tester
Joined: 29 Oct 01
Posts: 110
Credit: 122,278
RAC: 0
Portugal
Message 1285617 - Posted: 19 Sep 2012, 19:57:34 UTC - in response to Message 1285553.
Last modified: 19 Sep 2012, 19:58:14 UTC

What happens when the period iterations are increased?
____________
By your command !!!

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3504
Credit: 47,777,220
RAC: 46,840
Russia
Message 1285621 - Posted: 19 Sep 2012, 20:05:25 UTC - in response to Message 1285617.
Last modified: 19 Sep 2012, 20:05:48 UTC

What happens when the period iterations are increased?

One big kernel call is divided into a few smaller ones. This helps to fight GUI lag but can decrease performance a little, though not by much.
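As a toy model of that splitting (not the app's actual code), the Python sketch below divides a range of work items into one chunk per kernel call. The function name and the item counts are made up for illustration.

```python
def split_into_calls(total_items, period_iterations):
    """Split total_items into period_iterations roughly equal (start, end) chunks."""
    base, extra = divmod(total_items, period_iterations)
    chunks = []
    start = 0
    for i in range(period_iterations):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks
```

With 1 iteration the whole range goes out as a single long call; with 4 iterations each call covers a quarter of it, so the GPU gets back to the desktop more often at the cost of some extra launch overhead.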

Zapiao
Volunteer tester
Joined: 29 Oct 01
Posts: 110
Credit: 122,278
RAC: 0
Portugal
Message 1285737 - Posted: 20 Sep 2012, 5:21:35 UTC
Last modified: 20 Sep 2012, 5:48:23 UTC

How can I configure BOINC to use only the GPU in the cc_config file?
____________
By your command !!!


Copyright © 2014 University of California