MultiBeam application for ATi GPUs released

Message boards : Number crunching : MultiBeam application for ATi GPUs released

Saaby900T
Joined: 24 Dec 10
Posts: 76
Credit: 4,971,171
RAC: 0
United States
Message 1073032 - Posted: 1 Feb 2011, 8:03:09 UTC - in response to Message 1073022.  

I'm talking in terms of RAC.
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 451
Credit: 431,396,357
RAC: 553
Australia
Message 1073033 - Posted: 1 Feb 2011, 8:05:33 UTC

Given that the OpenCL application has only just been released, I don't think there's enough data to make an accurate assessment on that.
Soli Deo Gloria (Glory to God alone)
Saaby900T
Joined: 24 Dec 10
Posts: 76
Credit: 4,971,171
RAC: 0
United States
Message 1073042 - Posted: 1 Feb 2011, 9:10:47 UTC - in response to Message 1073033.  

Given that it was in beta for a few months, I figure they might have a rough idea of how they compare.
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34257
Credit: 79,922,639
RAC: 80
Germany
Message 1073062 - Posted: 1 Feb 2011, 10:48:21 UTC

I've been using it for a couple of weeks now.
It's at ~10,000 RAC.

But it can increase easily: if I run only Astropulses, I can go up to 20,000 with no problem.

The problem is getting enough of them.

Let's say I'm crunching 24 APs a day at an average of 800 credits each; that's 19,200.
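Mike's estimate is daily credit = APs per day × average credit per AP. RAC is an average rather than a daily total, so it only approaches that figure over time. The Python sketch below assumes BOINC's RAC decays exponentially with a one-week half-life and approximates it with one update per day; the function names are hypothetical, and the real client updates RAC whenever credit is granted.

```python
import math

# Assumption: RAC decays with a one-week half-life, modelled here
# with a single update per day for simplicity.
HALF_LIFE_DAYS = 7.0


def daily_credit(tasks_per_day: float, avg_credit_per_task: float) -> float:
    """Credit earned per day at a steady task rate."""
    return tasks_per_day * avg_credit_per_task


def rac_after(days: int, daily: float, start_rac: float = 0.0) -> float:
    """Approximate RAC after `days` days of constant daily credit."""
    keep = math.exp(-math.log(2) / HALF_LIFE_DAYS)  # fraction of the old average kept each day
    rac = start_rac
    for _ in range(days):
        rac = rac * keep + daily * (1 - keep)
    return rac


if __name__ == "__main__":
    daily = daily_credit(24, 800)  # Mike's AP example
    print(daily)                   # 19200
    print(round(rac_after(7, daily)))   # about half-way after one week
    print(round(rac_after(60, daily)))  # close to the steady state
```

Under these assumptions, a host that jumps to 19,200 credits/day shows only about half of that as RAC after one week, so RAC comparisons for a freshly released app need some patience.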



With each crime and every kindness we birth our future.
Saaby900T
Joined: 24 Dec 10
Posts: 76
Credit: 4,971,171
RAC: 0
United States
Message 1073069 - Posted: 1 Feb 2011, 11:22:56 UTC - in response to Message 1073062.  

So how would a 6870 scale? My best guess: 12,000 RAC (with mostly MB and some AP).
So how would a 6970 scale? My guess: 12,000-15,000 RAC (with mostly MB and some AP).
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 1073076 - Posted: 1 Feb 2011, 12:07:55 UTC - in response to Message 1073069.  

My 5850 and AMD 940 get 12,500 with a very large pending list.

I would assume you'd be able to run 4-6 MB tasks at once on a 6950 or 6970. That's going to at least double output, and since it's clocked a whole lot higher, it will complete the work faster. I would hazard a guess at around 25k or more, including the CPU work.

I've been comparing my GPU on other projects and have found that it's not quite as fast as an NV 275. However, on SETI, with its ability to run multiple WUs at a time, it seems more like a Fermi 460.
The one advantage ATI cards have is that we can run AP WUs on our GPUs. I finish a non-blanked WU in 90 minutes; that is, I can finish 2 unblanked APs in 90 minutes because I can run 2 at a time.
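skildude's AP numbers translate into throughput like this: running tasks side by side divides the effective time per task by the number of concurrent tasks. A minimal Python sketch, assuming every concurrent task takes the same wall-clock time and work never runs out (the function names are hypothetical):

```python
def effective_minutes_per_task(wall_minutes: float, concurrent: int) -> float:
    """Average minutes per finished task when `concurrent` tasks run side by side."""
    if concurrent < 1:
        raise ValueError("concurrent must be at least 1")
    return wall_minutes / concurrent


def tasks_per_day(wall_minutes: float, concurrent: int) -> float:
    """Tasks finished per day, assuming a steady supply of work."""
    return 24 * 60 / effective_minutes_per_task(wall_minutes, concurrent)


print(effective_minutes_per_task(90, 2))  # 45.0
print(tasks_per_day(90, 2))               # 32.0
```

The same arithmetic applies to MB: running more instances only pays off if each task's wall-clock time grows by less than the instance count.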

I suspect the NV 5XX series and ATI 6XXX series will be quite comparable on SETI.

As stated before, OpenCL for the ATI cards is just getting started.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1073097 - Posted: 1 Feb 2011, 13:54:26 UTC - in response to Message 1072671.  
Last modified: 1 Feb 2011, 13:59:27 UTC

The app_info.xml file contains these entries:

<cmdline>-period_iterations_num X -instances_per_device X</cmdline>

Is this for both ATI apps (MB + AP)?

Which values should I insert if I have an HD6850?

I know what instances_per_device means, but what does period_iterations_num mean?


Thanks!

No, that cmd line is just for MB.
From Raistmer's Lunatics post (which he linked in the first post):

-period_iterations_num <N> splits the single longest PulseFind kernel call into N calls
-period_iterations_num 1 (default value)
If you see lags in the GUI or even driver restarts, add this parameter with a value >1 (integer numbers).

I.e., if you experience problems, increase that number. I assume you lose some speed, though.


Thanks.

But... I still don't know what it means. If I have driver restarts or other problems, use a higher number. OK...

But what does -period_iterations_num mean?

I'm not a coder, I'm a 'user'.

If I translate it with my little helper, Google Translate, it means 'time repeats number' or something. That doesn't make me any smarter.
Does it mean that if the app has an error, it restarts up to the set number of times, and errors out/aborts if that doesn't succeed?
Or is the number the seconds between restarts?
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1073099 - Posted: 1 Feb 2011, 13:57:21 UTC - in response to Message 1073032.  
Last modified: 1 Feb 2011, 14:08:20 UTC

I'm talking in terms of RAC.


One of my five GTX260 OCs has a S@h RAC of ~18,000, with MB CUDA only.
Miep
Volunteer moderator
Joined: 23 Jul 99
Posts: 2412
Credit: 351,996
RAC: 0
Message 1073106 - Posted: 1 Feb 2011, 14:20:41 UTC - in response to Message 1073097.  

-period_iterations_num <N> splits the single longest PulseFind kernel call into N calls


Thanks.

But... I still don't know what it means. If I have driver restarts or other problems, use a higher number. OK...

But what does -period_iterations_num mean?

I'm not a coder, I'm a 'user'.

If I translate it with my little helper, Google Translate, it means 'time repeats number' or something. That doesn't make me any smarter.
Does it mean that if the app has an error, it restarts up to the set number of times, and errors out/aborts if that doesn't succeed?
Or is the number the seconds between restarts?


I suppose only Raistmer knows what it really does in terms of coding.
Iterations commonly refers to how often you perform a calculation.
'splits the single longest PulseFind kernel call into N calls'
suggests to me that it subdivides the kernel calls into a manageable number.

Say your longest call is 1024: with 1 you make one large 1024 call, with 2 you make two 512 calls, etc. Smaller calls are 'easier' to calculate but slow things down, because you are doing more of them.

As to why this is advisable with lags/restarts: smaller calls = less power required per call, more left for the system.
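Miep's 1024 example can be sketched in Python. This is only a hypothetical illustration of splitting one big piece of work into N consecutive smaller calls, assuming the work is a flat range of items; it is not the actual PulseFind code, and `split_call` is a made-up name.

```python
from typing import List, Tuple


def split_call(total: int, n: int) -> List[Tuple[int, int]]:
    """Split one call covering `total` items into `n` consecutive (start, length) chunks."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    base, extra = divmod(total, n)
    chunks = []
    start = 0
    for i in range(n):
        length = base + (1 if i < extra else 0)  # spread any remainder over the first chunks
        chunks.append((start, length))
        start += length
    return chunks


print(split_call(1024, 1))  # [(0, 1024)]
print(split_call(1024, 2))  # [(0, 512), (512, 512)]
```

The total work is unchanged; what changes is how long each individual call keeps the GPU busy, which is why a larger N trades some speed (more calls, more overhead) for a more responsive system.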

I can also write this out again in German, if that would help?
Carola
-------
I'm multilingual - I can misunderstand people in several languages!
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1073160 - Posted: 1 Feb 2011, 15:58:29 UTC - in response to Message 1073106.  

Thanks!

Sure, I don't know much about how the apps work, but now it's a little bit clearer.


No, you don't need to explain it in German as well. ;-)

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1073212 - Posted: 1 Feb 2011, 17:36:01 UTC - in response to Message 1072811.  

What OS do you use?
I've seen such a report before, but the reason is unknown. It's a completely "CPU"-based part (namely, the app should wait for an OS mutex object and report failure if this wait fails). Is there any specific config on your host?

EDIT: what rights does the user account that the science app runs under have?
Too low a privilege level will result in an inability to work with OS objects like mutexes. Check which user account BOINC uses to run the app.
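The pattern Raistmer describes (wait for a synchronization object, and report a failure instead of hanging if the wait does not succeed) can be sketched with Python's `threading.Lock`. This is only an in-process analogy: the real app waits on an OS mutex object on Windows, and the timeout value and the `acquire_or_report` name are assumptions for illustration.

```python
import threading


def acquire_or_report(lock: threading.Lock, timeout_s: float) -> bool:
    """Wait up to `timeout_s` seconds for `lock`; report failure instead of blocking forever."""
    if lock.acquire(timeout=timeout_s):
        return True
    print(f"error: could not acquire lock within {timeout_s} s")
    return False


if __name__ == "__main__":
    lock = threading.Lock()
    print(acquire_or_report(lock, 0.1))  # True: the lock was free
    print(acquire_or_report(lock, 0.1))  # False: already held, so the wait times out
    lock.release()
```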


Hi Raistmer,

I am using Win7x64.
Boinc tells me that it is running under the admin account.
I have lots of time tonight and will try to figure it out.

Thank you

EDIT: this is going on now:

mb_6.10...exe is running and eating up one full core, but there's no activity on the GPU.

Looks like you have a dual-GPU setup. Have you tried whether the app works with only 1 GPU installed?
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1073213 - Posted: 1 Feb 2011, 17:37:34 UTC - in response to Message 1072947.  


Reinstalled the BOINC client, updated the driver. No fancy XML config; I only tried to disable the onboard graphics with <ati_ignore.... which worked.


What about physically removing the GPU?
Rabbit&Carrot
Joined: 3 Oct 03
Posts: 25
Credit: 80,178,117
RAC: 0
Korea, South
Message 1073236 - Posted: 1 Feb 2011, 23:32:31 UTC

First of all, I want to thank Raistmer and SubSpace (and the many other volunteer beta testers) who made it possible for us to run MB WUs on ATI HD5xxx graphics cards. I can now crunch on an HD 5970 with MB_6.10_win_SSE3_ATI_HD5_r177.exe installed!

It's been only two days, but I have a few questions.

I've just found that most of my WUs finish computing in 30-50 seconds with a -9 overflow. They are still pending and waiting for their wingmen, but I assume each of them gives only a very small credit (less than 0.5). I don't understand why this computer receives so many WUs that end in a -9 overflow. Or did I set something up wrong?

According to Raistmer's explanation of app_info, N in "-instances_per_device <N>" is the number of application instances per single device. I changed it from the default value of one to two so that I could run four WUs simultaneously on my HD 5970, which has two GPUs, but only two WUs run at the same time. Am I understanding correctly what that number means?
baron_iv
Volunteer tester
Joined: 4 Nov 02
Posts: 109
Credit: 104,905,241
RAC: 0
United States
Message 1073245 - Posted: 2 Feb 2011, 0:05:26 UTC

I am currently looking into ways to benchmark both ATI and NVidia cards for distributed computing tasks. I am in contact with a couple of people and we're trying to figure out the best way to go about it. Ideally, I'd like to do something like Guru3d.com does with their reviews, but instead of using 3D games/benchmarks, I'd run pre-configured distributed computing applications so that we would have some way to compare all of the GPUs... ultimately, to help us make better decisions when buying a GPU for distributed computing. It would be a pretty big job, but I already have the CPU/motherboard combos, so I can test on different platforms (AMD vs Intel). Typically there aren't a lot of differences as far as GPU computing is concerned, but it's still something people may want to know.

I don't know if the milkyway@home ATI client works in any way similarly to SETI's client, but the performance of milkyway on ATI is very, very impressive. It leaves NVidia cards in the proverbial dust. I can run three milkyway instances on a single GPU and they finish in 4 minutes... so a bit over 1 minute per task. Obviously, the work is VERY different and the programming is likely very different. I know that milkyway uses double precision whereas SETI uses single precision, and double precision is why AMD/ATI cards are so much faster there... NVidia seems to have gimped the double-precision performance of their desktop GPUs so that they could sell the Tesla series at greatly inflated prices. I don't know if Raistmer has time to look at the milkyway application to see if there are any performance tweaks he could use to increase performance on SETI. Overall, he has really done an amazing job with the ATI/AMD apps. He was kind enough to let me beta test, and each new build was faster than the previous one, which was great. Perhaps there is more performance to be squeezed out? I don't know; I know zilch about programming for GPUs.

I am still getting an occasional driver restart on my 6950, but overall it's very fast and performing up to par. At this point, I would still recommend that someone buy a 5870 (or any of the 58xx series) over a newer 68xx or 69xx. Given the price of the new 1GB 6950 model, I'd be hard pressed to buy anything above that; I believe it's an even better value than the 460 or 560. As far as AP goes, it's rock solid on the 6950. I never have any driver restarts with it unless I am also running the MB app (which causes the restart). I think these are mostly growing pains, though, and Raistmer said that the MB app may not work properly on the 6800/6900 series, so I was forewarned. ;)


-baron_iv
Proud member of:
GPU Users Group
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1073262 - Posted: 2 Feb 2011, 1:15:16 UTC - in response to Message 1073236.  

I changed it from the default value of one to two so that I could run four WUs simultaneously on my HD 5970, which has two GPUs, but only two WUs run at the same time. Am I understanding correctly what that number means?

Your device has 2 GPU cores that are visible as 2 GPUs.
When you set the corresponding parameter to 2, you should be able to run 2 instances on each GPU, which should result in 4 instances of MB running simultaneously.
Also, check that you haven't forgotten to set <count> to 0.5 instead of 1 (otherwise BOINC will run only 1 task per GPU).
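The arithmetic behind the `<count>` setting, as a rough Python sketch: BOINC reserves `count` of a GPU for each task, so about 1/count tasks fit on each GPU it sees. This is a simplified model (it only handles fractional shares and ignores CPU limits and the app's own -instances_per_device setting), and `concurrent_tasks` is a hypothetical name.

```python
import math


def concurrent_tasks(num_gpus: int, count_per_task: float) -> int:
    """Tasks BOINC can start at once when each task reserves `count_per_task` of one GPU."""
    if not 0 < count_per_task <= 1:
        raise ValueError("this sketch only handles count values in (0, 1]")
    per_gpu = math.floor(1 / count_per_task + 1e-9)  # epsilon absorbs float rounding, e.g. for 1/3
    return num_gpus * per_gpu


print(concurrent_tasks(2, 1.0))  # 2: HD 5970 with <count>1</count>
print(concurrent_tasks(2, 0.5))  # 4: HD 5970 with <count>0.5</count>
```

That matches Rabbit&Carrot's report: with <count> left at 1, each of the HD 5970's two GPUs gets one task, so only two run regardless of -instances_per_device.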
But keep in mind that the HD5970 is actually not supported by AMD as an OpenCL device. They support only a single core of this GPU (it's a shame, I know).
During the beta test, a workaround for this problem was found. Here is a quote from AMD's forum:

Hi,

This is an update using 2xHD5970:

- Adding "GPU_USE_SYNC_OBJECTS" does some magic and 2 instances run at about 70/80% each (numbers vary); but if I launch 3 instances, the 3rd runs at 40%; when I launched 4 instances, the system crashed violently (maybe the PSU was not enough for the 4 GPUs). This is independent of which combination of GPUs it uses.

- I see no time difference when I run an instance alone versus in parallel with another instance, which is consistent with the GPU usage reported by aticonfig.

So, I believe that in SDK 2.3 we will not see a drop in performance if we use GPU_USE_SYNC_OBJECTS and limit the multi-GPU setup to 1xHD5970 using both GPUs (and the other good practices that we have discovered so far...)

best regards,

Alfonso


It was tested by Morten with ATi MB, and it looks like it helps get rid of invalid results on the secondary core.

You need to add an environment variable named GPU_USE_SYNC_OBJECTS.
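A Python sketch of how an environment variable such as GPU_USE_SYNC_OBJECTS reaches a program: child processes inherit the environment of whatever launches them, so for BOINC it has to be set where the BOINC client (and therefore the science apps it starts) will see it, for example as a system-wide variable set before the client starts. The value "1" is an assumption (the post only names the variable), and `run_with_sync_objects` is a hypothetical helper.

```python
import os
import subprocess
import sys


def run_with_sync_objects(cmd, value="1"):
    """Run `cmd` with GPU_USE_SYNC_OBJECTS added to a copy of the current environment."""
    env = dict(os.environ)
    env["GPU_USE_SYNC_OBJECTS"] = value  # assumed value; the post only names the variable
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in child process that just prints what it inherited.
    child = [sys.executable, "-c", "import os; print(os.environ.get('GPU_USE_SYNC_OBJECTS'))"]
    print(run_with_sync_objects(child).stdout.strip())  # 1
```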
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1073263 - Posted: 2 Feb 2011, 1:21:48 UTC - in response to Message 1073245.  


I am still getting an occasional driver restart on my 6950, but overall it's very fast and performing up to par. At this point, I would still recommend that someone buy a 5870 (or any of the 58xx series) over a newer 68xx or 69xx. Given the price of the new 1GB 6950 model, I'd be hard pressed to buy anything above that; I believe it's an even better value than the 460 or 560. As far as AP goes, it's rock solid on the 6950. I never have any driver restarts with it unless I am also running the MB app (which causes the restart). I think these are mostly growing pains, though, and Raistmer said that the MB app may not work properly on the 6800/6900 series, so I was forewarned. ;)


So far, it's known that the "usual" non-HD5 version works on HD6xxx without problems.
Unfortunately, it doesn't use the shared memory features of the new GPUs and runs slower. Whether the benefit of stable operation outweighs the slower computation, I don't know.
I still hope that the current instability of HD6xxx GPUs will be fixed in a new driver release.
AP runs stably because it doesn't use shared memory either (in design, it's roughly equivalent to the non-HD5 version of ATi MB).
Saaby900T
Joined: 24 Dec 10
Posts: 76
Credit: 4,971,171
RAC: 0
United States
Message 1073330 - Posted: 2 Feb 2011, 5:33:46 UTC - in response to Message 1073076.  

My 5850 and AMD 940 get 12,500 with a very large pending list.

I would assume you'd be able to run 4-6 MB tasks at once on a 6950 or 6970. That's going to at least double output, and since it's clocked a whole lot higher, it will complete the work faster. I would hazard a guess at around 25k or more, including the CPU work.

I've been comparing my GPU on other projects and have found that it's not quite as fast as an NV 275. However, on SETI, with its ability to run multiple WUs at a time, it seems more like a Fermi 460.
The one advantage ATI cards have is that we can run AP WUs on our GPUs. I finish a non-blanked WU in 90 minutes; that is, I can finish 2 unblanked APs in 90 minutes because I can run 2 at a time.

I suspect the NV 5XX series and ATI 6XXX series will be quite comparable on SETI.

As stated before, OpenCL for the ATI cards is just getting started.


How much different/better would the 5870 be compared to yours?
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 1073431 - Posted: 2 Feb 2011, 13:33:08 UTC - in response to Message 1073330.  

I think you should get at least 15-20% more out of the 5870.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
garfield
Volunteer tester
Joined: 4 Jan 02
Posts: 45
Credit: 7,409,265
RAC: 65
Austria
Message 1073465 - Posted: 2 Feb 2011, 16:06:46 UTC

First of all: Many thanks to everyone involved in developing this app.
I'm running this on my second system, which is a mixed ATI/nVidia system.
E8400, Win7 x64, 6GB RAM, BOINC 6.10.58, HD6950, GTX460; graphics drivers updated three days ago.
I had two system crashes; Windows rebooted and displayed the message 'System rebooted after ATI Driver Crashed'. The task the SETI ATI app was running at the time did not validate, but ~50 others worked fine.
It's well known that the ATI drivers are buggy and that the OpenCL implementation has numerous errors and needs a lot of optimization work; nevertheless, it's a big step forward.
Just for reference: WUs are a little slower on ATI than on nVidia (~26 min vs. ~22 min).
Aker
Joined: 2 Nov 01
Posts: 24
Credit: 2,030,727
RAC: 0
United States
Message 1073529 - Posted: 2 Feb 2011, 20:03:42 UTC

I've got an old 4670, using 11.1 drivers on 32-bit XP. I'm able to run the ATI AP OpenCL app just fine with the default settings (just completed a WU).

I tried using the posted app_info.xml snippet with default settings. Both the non-HD5 and HD5 apps show up as running in BOINC, but CPU usage stays at max, GPU load stays at 0%, and the progress bar never updates. It just sits there running with no errors.

I tried increasing period_iterations_num and setting -hp, with no change in the result.
Before reinstalling drivers I did have a few error out, but I'm not sure if that was the same cause. (example)

Not sure if there is anything to be done, but I wanted to thank you for the work you've put into developing these apps. :)


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.