OpenCL NV MultiBeam v8 SoG edition for Windows

Cruncher-American (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1764988 - Posted: 14 Feb 2016, 21:22:28 UTC

Sounds good to me.

However, it might be pertinent to think about just how many different apps you have for each of the different platforms, and (maybe) settle on fewer, given the lack of resources. You guys have to sleep, don't you?

It's better (IMO) to support more platforms than more versions of each app for each platform, in the interest of more folks being able to do SETI (which is what the project is really for, IIRC).
ID: 1764988 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1764995 - Posted: 14 Feb 2016, 21:41:20 UTC - in response to Message 1764994.  

Or in the next one?

Maybe in next one if it proves its usability.

Usability, well yes.....

After having done about 4000 WUs with the SoG app here on main, without any invalids or errors whatsoever, and having seen that, on my system at least, it's considerably faster than CUDA, I would say that it has certainly proven its usability (at least on my system).

But until it is released here on main as stock, we will never know how it reacts in the wild, so to speak.


SoG is host dependent.
It's slower on most AMD GPUs but faster on some Nvidia cards.
Only time will tell.
It would be interesting to see how it does on a Titan.


With each crime and every kindness we birth our future.
ID: 1764995 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1764996 - Posted: 14 Feb 2016, 21:44:35 UTC - in response to Message 1764994.  

Or in the next one?

Maybe in next one if it proves its usability.

Usability, well yes.....

After having done about 4000 WUs with the SoG app here on main, without any invalids or errors whatsoever, and having seen that, on my system at least, it's considerably faster than CUDA, I would say that it has certainly proven its usability (at least on my system).

Thanks for providing a test case.
Somewhat more extended testing is currently under way on beta, with fairly good results APR-wise.
I'll provide a new build to beta soon with a special "lightweight" path for low-end devices to decrease lag.
But my offline tests with a GT720 entry-level GPU (quite limited for now, because it's a friend's device I have little access to) show that, at least in the default config, the new build will definitely be slower on such devices than CUDA42 (strange, but that GPU prefers 42 over 50). Still, I hope BOINC's "natural selection" mechanism will provide an overall speed improvement by leaving the best-suited build for each particular host in the long run. On high-end GPUs the tests are more positive.
ID: 1764996 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1764999 - Posted: 14 Feb 2016, 22:06:29 UTC - in response to Message 1764995.  

Did you mean Titan X, Titan Black or just a plain Titan?

I know someone with a Black; I could see if he wants to give it a shot.

I can move my Titan X machine over, but the 980Tis are pretty close to the Titan X in performance. If you want, I could give it a try.

But I think the issue is going to be around how big a CPU the user has.

My 980Tis were limited by an 8-core AMD, so I couldn't get past 3 work units per card on a multi-GPU system.

It would be sometime tonight before I can clear my cache and make the move.
ID: 1764999 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1765013 - Posted: 14 Feb 2016, 22:38:24 UTC - in response to Message 1764999.  
Last modified: 14 Feb 2016, 22:38:36 UTC

Did you mean Titan X, Titan Black or just a plain Titan?

I know someone with a Black; I could see if he wants to give it a shot.

I can move my Titan X machine over, but the 980Tis are pretty close to the Titan X in performance. If you want, I could give it a try.

But I think the issue is going to be around how big a CPU the user has.

My 980Tis were limited by an 8-core AMD, so I couldn't get past 3 work units per card on a multi-GPU system.

It would be sometime tonight before I can clear my cache and make the move.


All types of Titan.

Would be great if you could give it a try.


With each crime and every kindness we birth our future.
ID: 1765013 · Report as offensive
OTS
Volunteer tester

Joined: 6 Jan 08
Posts: 369
Credit: 20,533,537
RAC: 0
United States
Message 1765023 - Posted: 14 Feb 2016, 22:58:49 UTC

Any idea when there will be a Linux/Nvidia SoG version on beta to test?
ID: 1765023 · Report as offensive
Chris Adamek
Volunteer tester

Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1765027 - Posted: 14 Feb 2016, 23:04:17 UTC

I get equal or better performance with what I consider to be mid-range to low end cards. I've got an aging 570 and a 750ti in this machine. Just running one wu at a time, don't really notice any screen lag. Getting 95-97% gpu utilization.

CPU time is very low on larger ARs:

http://setiathome.berkeley.edu/show_host_detail.php?hostid=7251681

Chris
ID: 1765027 · Report as offensive
Cruncher-American (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1765033 - Posted: 14 Feb 2016, 23:46:58 UTC - in response to Message 1764994.  

After having done about 4000 WUs with the SoG app here on main, without any invalids or errors whatsoever, and having seen that, on my system at least, it's considerably faster than CUDA, I would say that it has certainly proven its usability (at least on my system).

But until it is released here on main as stock, we will never know how it reacts in the wild, so to speak.


Not on mine!

One of my machines is a 4790K (with HT OFF) with 2 x GTX980, and currently is doing 43K RAC stock apps vs your 23K RAC with 4790K with HT ON and 1 x GTX 980 running SoG. Seems to me that SoG has essentially NO advantage over the stock v8, then, to a first approximation. (I am running 3 WUs/GPU, but it has almost the same RAC as when I was running 2/GPU, judging by the slope of the STATS tab line in BOINC before and after that change).
ID: 1765033 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1765093 - Posted: 15 Feb 2016, 4:38:41 UTC - in response to Message 1765033.  

Definitely interesting to see the pros & cons of the different approaches. Been wrestling with similar things in Cuda development, and come to the conclusion that one-size-fits-all isn't going to work without considerable work on architecture and options, with tools [development and user] to support that. Maintaining 5 builds on Windows only was manageable, but incorporation of performance code supplied by Petri, on top of other planned improvements and other platforms, is mandating a move to a plugin architecture and a reduced build count.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1765093 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1765102 - Posted: 15 Feb 2016, 5:23:19 UTC - in response to Message 1765093.  

Speaking of pros and cons..

A couple of disclaimers. I do use command lines (ignore -instances_per_device 2; I use an app_config to override this)

but don't have -use_sleep

So I have 4 SoG running on each of the 4 Titans.

My initial concern about CPU is looking to be right.

Total CPU (16 hyperthreaded cores) starts at 30-40% and rapidly rises to about 75% average with peaks of 85% of all cores (this for 16 work units) until work is ready to report.

Kernel activity accounts for almost all of the CPU load. (SIV64X looks like a red panic sign across all 16 cores except 1.)

Without knowing how this works, it looks like the kernel usage builds and stays at a high level until the work unit is done, but doesn't go all the way back down when a new one starts. Does that sound right? As new work is started it drops lower, but never to the initial value. It usually stays around 50% of all cores and then builds again toward 85% as the work progresses.

Why is that important? Because it doesn't leave any room for CPU work, since the entire CPU is being used to support the GPUs.

On the pro side, it does seem to process lower angles faster. It's hard to compare yet, since I seem to be getting smaller angles now than I had been getting for most of the weekend.

I'll keep trying to get comparable work units.
ID: 1765102 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1765121 - Posted: 15 Feb 2016, 7:14:33 UTC - in response to Message 1765033.  


Not on mine!

One of my machines is a 4790K (with HT OFF) with 2 x GTX980, and currently is doing 43K RAC stock apps vs your 23K RAC with 4790K with HT ON and 1 x GTX 980 running SoG. Seems to me that SoG has essentially NO advantage over the stock v8, then, to a first approximation. (I am running 3 WUs/GPU, but it has almost the same RAC as when I was running 2/GPU, judging by the slope of the STATS tab line in BOINC before and after that change).

Do the math: 23*2 = 46 > 43.
ID: 1765121 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1765127 - Posted: 15 Feb 2016, 7:20:03 UTC - in response to Message 1765102.  


Why is that important? Because it doesn't leave any room for CPU work, since the entire CPU is being used to support the GPUs.

Try raising the CPU apps' priority above "below normal". How does the picture change?
ID: 1765127 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1765129 - Posted: 15 Feb 2016, 7:22:55 UTC - in response to Message 1765127.  
Last modified: 15 Feb 2016, 7:25:12 UTC

Here is my command line (ignore -instances_per_device):

-sbs 384 -instances_per_device 2 -period_iterations_num 40 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 16 -oclfft_tune_cw 16 -hp -no_cpu_lock


Edit..

I do not currently have any CPU apps running due to concern of usage by GPU

Edit 2..

I may try some tomorrow if you like but it's really late here and I'm headed to bed, going to leave it like this until I check it later today
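
For anyone comparing tuning strings between hosts, here is a minimal Python sketch that splits a command line like the one above into flags and values and lists the differences against a second one. It only tokenizes the text and makes no claim about what any flag does. The helper names `parse_flags` and `diff_flags` and the second command line are made up for illustration.

```python
def parse_flags(cmdline):
    """Group a space-separated option string into {flag: [values]}."""
    options = {}
    current = None
    for token in cmdline.split():
        if token.startswith("-"):
            # A new flag starts; values that follow belong to it.
            current = token
            options[current] = []
        elif current is not None:
            options[current].append(token)
    return options


def diff_flags(a, b):
    """Return {flag: (values_in_a, values_in_b)} for flags that differ."""
    return {
        flag: (a.get(flag), b.get(flag))
        for flag in sorted(set(a) | set(b))
        if a.get(flag) != b.get(flag)
    }


# The command line as posted in the thread.
posted = parse_flags(
    "-sbs 384 -instances_per_device 2 -period_iterations_num 40 "
    "-spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 "
    "-oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 "
    "-oclfft_tune_bn 16 -oclfft_tune_cw 16 -hp -no_cpu_lock"
)

# Hypothetical second configuration, for illustration only.
other = parse_flags("-sbs 256 -period_iterations_num 40 -use_sleep")

for flag, (mine, theirs) in diff_flags(posted, other).items():
    print(flag, mine, theirs)
```

A flag that is missing on one side shows up as `None`, and a switch without values (such as -hp) shows up as an empty list.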
ID: 1765129 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1765130 - Posted: 15 Feb 2016, 7:29:53 UTC - in response to Message 1765129.  
Last modified: 15 Feb 2016, 7:30:11 UTC

I was going stepwise: first see how it handled only GPU work, then increase the instances per card to what I normally run, before adding any CPU work.

Tomorrow if you like, I can get the machine to copy what I normally do with Cuda and CPU work at the same time but I want to be able to watch it progress when I do that just in case it locks up.
ID: 1765130 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1765139 - Posted: 15 Feb 2016, 8:30:24 UTC - in response to Message 1765130.  

The idea behind this is to estimate how CPU load really affects app performance.
With the SoG build I expect (at least on higher ARs) much less influence from CPU apps.

My AMD APU experience says that all that CPU usage is just a busy-wait loop most of the time.
On an APU with an idle CPU, the app takes 100% CPU (a single core). But on the same PC under load, CPU time drops considerably (elapsed time increases, of course, but to a much lesser degree).
It seems AMD's busy loop executes at a low enough priority to allow BOINC's CPU app to take CPU from it.
On the other hand, nVidia's busy loop seems to have a higher priority than BOINC's CPU app,
so it consumes CPU even on a loaded PC.
That's why it would be interesting to see what happens if the CPU apps are given increased priority.
It can be done with ProcessLasso, for example, or (maybe) by BOINC's own means.
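
To illustrate what a busy-wait loop costs in general, here is a small Python sketch. It is not the driver's or the app's actual code; the "device" is just a timer, and the names `wait_for` and `run_trial` are made up. It compares polling a completion flag in a tight loop with polling it and sleeping between checks.

```python
import threading
import time


def wait_for(done, sleep_s=0.0):
    """Poll `done` until it is set; return (polls, cpu_seconds_used)."""
    polls = 0
    cpu_start = time.process_time()
    while not done.is_set():
        polls += 1
        if sleep_s > 0:
            time.sleep(sleep_s)  # give the core away between polls
    return polls, time.process_time() - cpu_start


def run_trial(sleep_s, work_s=0.2):
    """Simulate a device that signals completion after `work_s` seconds."""
    done = threading.Event()
    timer = threading.Timer(work_s, done.set)
    timer.start()
    result = wait_for(done, sleep_s)
    timer.join()
    return result


busy_polls, busy_cpu = run_trial(sleep_s=0.0)
sleepy_polls, sleepy_cpu = run_trial(sleep_s=0.01)
print(f"busy-wait:  {busy_polls} polls, {busy_cpu:.3f} s CPU")
print(f"sleep-wait: {sleepy_polls} polls, {sleepy_cpu:.3f} s CPU")
```

Both trials wait roughly the same wall-clock time, but the tight loop spends that time burning a core while the sleeping loop leaves it mostly free, at the price of noticing completion a little later. Whether a higher-priority CPU app can take the core away from the busy loop is up to the OS scheduler, which is the question raised above.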
ID: 1765139 · Report as offensive
Cruncher-American (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1765143 - Posted: 15 Feb 2016, 9:21:20 UTC - in response to Message 1765121.  
Last modified: 15 Feb 2016, 9:33:09 UTC


Not on mine!

One of my machines is a 4790K (with HT OFF) with 2 x GTX980, and currently is doing 43K RAC stock apps vs your 23K RAC with 4790K with HT ON and 1 x GTX 980 running SoG. Seems to me that SoG has essentially NO advantage over the stock v8, then, to a first approximation. (I am running 3 WUs/GPU, but it has almost the same RAC as when I was running 2/GPU, judging by the slope of the STATS tab line in BOINC before and after that change).

Do the math: 23*2 = 46 > 43.


I did. He is HT, I am not, so he has more cores doing v8 than I do, and before GPU version, I was getting 1-2k RAC per core. So, roughly, that takes a few K away from his 23, so...my guesstimate of approximate equality.

And I do get some APs, so that blurs it a bit more, I grant.
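
A rough Python sketch of the arithmetic being argued here. The 43K and 23K totals and the GPU counts are the figures quoted in the thread. The CPU task counts and the 1.5K RAC per CPU task used in the second comparison are assumptions picked from the "1-2k RAC per core" range, purely to show how sensitive the result is. `per_gpu_rac` is a made-up helper name.

```python
def per_gpu_rac(total_rac, gpus, cpu_tasks=0, rac_per_cpu_task=0.0):
    """Estimate RAC per GPU after subtracting an assumed CPU contribution."""
    gpu_rac = total_rac - cpu_tasks * rac_per_cpu_task
    return gpu_rac / gpus


# Straight normalization, as in "23*2 = 46 > 43".
stock = per_gpu_rac(43_000, gpus=2)  # 2 x GTX 980, stock apps
sog = per_gpu_rac(23_000, gpus=1)    # 1 x GTX 980, SoG
print(f"no CPU adjustment: stock {stock:.0f}, SoG {sog:.0f} per GPU")

# Hypothetical CPU adjustment: the task counts and the per-task RAC
# below are assumptions, not numbers reported by either host.
stock_adj = per_gpu_rac(43_000, gpus=2, cpu_tasks=2, rac_per_cpu_task=1_500)
sog_adj = per_gpu_rac(23_000, gpus=1, cpu_tasks=4, rac_per_cpu_task=1_500)
print(f"with assumed CPU share: stock {stock_adj:.0f}, SoG {sog_adj:.0f} per GPU")
```

With no adjustment SoG comes out ahead per GPU; with the assumed CPU shares it comes out behind. The ranking depends entirely on how many CPU tasks each host really runs and what they earn, so neither reading is settled by the totals alone.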
ID: 1765143 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1765160 - Posted: 15 Feb 2016, 12:03:56 UTC - in response to Message 1765093.  
Last modified: 15 Feb 2016, 12:04:42 UTC

Definitely interesting to see the pros & cons of the different approaches. Been wrestling with similar things in Cuda development, and come to the conclusion that one-size-fits-all isn't going to work without considerable work on architecture and options, with tools [development and user] to support that. Maintaining 5 builds on Windows only was manageable, but incorporation of performance code supplied by Petri, on top of other planned improvements and other platforms, is mandating a move to a plugin architecture and a reduced build count.

It would be nice to see the Mac nVidia situation solved sometime soon. Here's a typical example of what happens every few minutes: http://setiathome.berkeley.edu/workunit.php?wuid=2061391551
The current OpenCL app is anywhere from 2 to 4 times slower than the CUDA app and also gives the wrong results. There has been a solution for weeks.
OpenCL GeForce GTX 780M
Run time: 2 hours 25 min 15 sec
CPU time: 4 min 20 sec
Spike count: 29
Autocorr count: 0
Pulse count: 0
Triplet count: 0
Gaussian count: 1

CUDA GeForce GT 650M
Run time: 35 min 1 sec
CPU time: 6 min 13 sec
Spike count: 25
Autocorr count: 0
Pulse count: 0
Triplet count: 0
Gaussian count: 1

While people concern themselves over a few seconds of run time, some are taking up to 4 times as long as they should and producing incorrect results in the process.
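
The two result summaries above can be compared with a few lines of Python. This is only a sketch over the numbers as posted: it checks run time and signal counts, whereas the project's validator works on the signals themselves, and the two tasks ran on different GPUs. The helper name `to_seconds` and the dictionary keys are made up.

```python
import re


def to_seconds(text):
    """Convert strings like '2 hours 25 min 15 sec' to seconds."""
    units = {"hour": 3600, "min": 60, "sec": 1}
    total = 0
    for amount, unit in re.findall(r"(\d+)\s*(hour|min|sec)", text):
        total += int(amount) * units[unit]
    return total


# Values copied from the post above.
opencl = {"run": to_seconds("2 hours 25 min 15 sec"),
          "spikes": 29, "autocorr": 0, "pulses": 0,
          "triplets": 0, "gaussians": 1}
cuda = {"run": to_seconds("35 min 1 sec"),
        "spikes": 25, "autocorr": 0, "pulses": 0,
        "triplets": 0, "gaussians": 1}

ratio = opencl["run"] / cuda["run"]
mismatched = [key for key in opencl
              if key != "run" and opencl[key] != cuda[key]]

print(f"OpenCL run time is {ratio:.2f}x the CUDA run time")
print("signal counts that differ:", mismatched)
```

For this pair, the run-time ratio lands a little above 4 and the only count that differs is the spike count.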
ID: 1765160 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1765191 - Posted: 15 Feb 2016, 15:53:38 UTC - in response to Message 1765139.  

The idea behind this is to estimate how CPU load really affects app performance.
With the SoG build I expect (at least on higher ARs) much less influence from CPU apps.

My AMD APU experience says that all that CPU usage is just a busy-wait loop most of the time.
On an APU with an idle CPU, the app takes 100% CPU (a single core). But on the same PC under load, CPU time drops considerably (elapsed time increases, of course, but to a much lesser degree).
It seems AMD's busy loop executes at a low enough priority to allow BOINC's CPU app to take CPU from it.
On the other hand, nVidia's busy loop seems to have a higher priority than BOINC's CPU app,
so it consumes CPU even on a loaded PC.
That's why it would be interesting to see what happens if the CPU apps are given increased priority.
It can be done with ProcessLasso, for example, or (maybe) by BOINC's own means.



I run Process Lasso on all my machines.
ID: 1765191 · Report as offensive
OTS
Volunteer tester

Joined: 6 Jan 08
Posts: 369
Credit: 20,533,537
RAC: 0
United States
Message 1765196 - Posted: 15 Feb 2016, 16:21:27 UTC - in response to Message 1765023.  

Any idea when there will be a Linux/Nvidia SoG version on beta to test?


?????
ID: 1765196 · Report as offensive
Chris Adamek
Volunteer tester

Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1765203 - Posted: 15 Feb 2016, 17:16:07 UTC - in response to Message 1765197.  

Don't know how accurate it is (in my case it's a pretty accurate indicator over a large sample), but free-dc.org's stats for your computer show you have just about maxed out the RAC, assuming you haven't changed much in your configuration. Interestingly, it shows you were 5-10k higher at the end of January with whatever mix of apps you were using then. Could be an aberration in the data though...

Chris

RAC of 24,031.32 now, running 4 SoG's at a time on the GPU, and only 2 MB's on the CPU. No AP's whatsoever.
Still climbing :-)
https://setiathome.berkeley.edu/results.php?hostid=7585453&offset=0&show_names=0&state=4&appid=29

ID: 1765203 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.