OpenCL NV MultiBeam v8 SoG edition for Windows

Grumpy Swede (I stand with Ukraine)
Volunteer tester
Joined: 1 Nov 08
Posts: 8927
Credit: 49,849,242
RAC: 65
Sweden
Message 1762678 - Posted: 5 Feb 2016, 23:14:22 UTC

SoG up and running with the <plan_class>opencl_nvidia_SoG</plan_class>

Let's burn a GTX980 :-)
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14474
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1762679 - Posted: 5 Feb 2016, 23:17:33 UTC - in response to Message 1762671.  

Tut, are you getting work for that OpenCL SoG?

Maybe you need to change the plan class to opencl_nvidia_sah until Eric has released a plan class for the SoG version.

Plan Class names used under Anonymous Platform don't have to match the plan classes used for stock distributions - I've made up plan classes including my initials and the word 'test' before now, and they worked just fine.

But they should include the keyword for the type of scheduling anticipated - OpenCL in this case (for BOINC versions >= 7.0.40). All mine did, so I can't speak for what happens if you leave it out. It'll be in a (debug) log if you fall foul of something, and need to look it up.

This is the other way round, but error messages might look something like this:

11/15/2012 8:53:52 AM | | App version needs opencl but GPU doesn't support it
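To make the plan-class point concrete, here is a minimal Python sketch that builds an anonymous-platform `<app_version>` entry for app_info.xml with a self-invented plan class, and refuses names that lack the `opencl` keyword. The element names (`app_name`, `version_num`, `plan_class`, `cmdline`, `coproc`, `file_ref`, `main_program`) follow the usual app_info.xml layout as far as I know; the app name `setiathome_v8`, the version number and the executable name are assumed placeholders, and the single-keyword check is a simplification of whatever the BOINC client really does.

```python
import xml.etree.ElementTree as ET


def make_app_version(app_name, version_num, plan_class, exe_name,
                     cmdline="", gpu_count=1.0):
    """Build an anonymous-platform <app_version> element (illustrative sketch).

    Simplification (assumption): only checks that the plan class name
    contains "opencl", the scheduling keyword mentioned in the thread.
    """
    if "opencl" not in plan_class.lower():
        raise ValueError(f"plan class {plan_class!r} lacks the 'opencl' keyword")
    av = ET.Element("app_version")
    ET.SubElement(av, "app_name").text = app_name
    ET.SubElement(av, "version_num").text = str(version_num)
    ET.SubElement(av, "plan_class").text = plan_class
    if cmdline:
        # Extra switches passed to the app, e.g. "-use_sleep_ex 5".
        ET.SubElement(av, "cmdline").text = cmdline
    coproc = ET.SubElement(av, "coproc")
    ET.SubElement(coproc, "type").text = "NVIDIA"
    # A fractional count (e.g. 0.33) lets several tasks share one GPU.
    ET.SubElement(coproc, "count").text = str(gpu_count)
    ref = ET.SubElement(av, "file_ref")
    ET.SubElement(ref, "file_name").text = exe_name
    ET.SubElement(ref, "main_program")  # empty flag element
    return av


if __name__ == "__main__":
    av = make_app_version("setiathome_v8",
                          800,                       # hypothetical version number
                          "opencl_nvidia_SoG_test",  # made-up plan class
                          "example_sog.exe",         # hypothetical file name
                          cmdline="-use_sleep_ex 5", gpu_count=0.33)
    print(ET.tostring(av, encoding="unicode"))
```

The printed element would sit inside the `<app_info>` root next to matching `<app>` and `<file_info>` entries; if the client rejects it, the (debug) log is the place to look.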
Grumpy Swede (I stand with Ukraine)
Volunteer tester
Joined: 1 Nov 08
Posts: 8927
Credit: 49,849,242
RAC: 65
Sweden
Message 1762681 - Posted: 5 Feb 2016, 23:27:49 UTC
Last modified: 5 Feb 2016, 23:34:12 UTC

Very high CPU usage for WUs other than high-AR ones: almost a full core for ARs other than VHARs, where CPU usage is only 8-10%.

Since the WUs I tested this with on Beta were all above 2.something in AR, the low CPU usage was what surprised me most. Here on main, however, with mostly lower ARs, the high CPU usage really shows.

Thanks Dog that VLARs only go to the CPU here, or even this GTX980 would come to a screeching halt :-)

EDIT: But SoG is fast, scarily fast. So I can live with the high CPU usage and just drop a CPU core from CPU crunching.

Geeze....
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1762683 - Posted: 5 Feb 2016, 23:35:37 UTC - in response to Message 1762681.  

The ATi OpenCL build handles VLARs quite easily. It's worth trying with OpenCL NV as well.
That's the disadvantage of Beta: a subset of ARs, a subset of devices...

Pulses and Triplets are still processed the old way, and syncing uses a lot of CPU as before (again, NV-specific).
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 33270
Credit: 79,922,639
RAC: 80
Germany
Message 1762684 - Posted: 5 Feb 2016, 23:37:28 UTC - in response to Message 1762681.  

You can try -use_sleep or -use_sleep_ex 5 to reduce CPU usage.
But I suggest using this only when running multiple instances.
With each crime and every kindness we birth our future.
Grumpy Swede (I stand with Ukraine)
Volunteer tester
Joined: 1 Nov 08
Posts: 8927
Credit: 49,849,242
RAC: 65
Sweden
Message 1762685 - Posted: 5 Feb 2016, 23:46:10 UTC - in response to Message 1762684.  
Last modified: 5 Feb 2016, 23:46:28 UTC

Well, running 3 at a time is indeed multiple instances. However, I'll wait and see if I can live with this, because by using -use_sleep, this app will not be any faster than CUDA50.
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1762686 - Posted: 5 Feb 2016, 23:53:05 UTC - in response to Message 1762685.  

It would be interesting to check whether -use_sleep really makes it no faster than CUDA50, BTW.
Sleep() is implemented mostly in the PulseFind area. VHAR has only a small amount of PulseFind, so the impact of -use_sleep there would be quite small, while the CPU savings at midrange ARs could be substantial.
On the other hand, balancing overall host performance depends on the GPU vs CPU work share. For fast GPUs, most of the host's RAC should come from the GPU part, and the CPU part could be negligible.
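The Sleep() point is the classic busy-wait versus sleep-wait trade-off. The Python sketch below is only a conceptual model, not the app's code: a `threading.Timer` stands in for the GPU finishing a kernel, and `sleep_ms` is a hypothetical knob loosely analogous to the N in `-use_sleep_ex N` (how that switch is really implemented isn't stated here). Spinning burns CPU for the whole wait; sleeping between polls costs almost no CPU but can overshoot the finish time by up to one sleep interval, so the net effect depends on how much of a run is spent waiting.

```python
import threading
import time
from typing import Optional, Tuple


def wait_for(done: threading.Event, sleep_ms: Optional[float]) -> Tuple[float, float]:
    """Poll `done` until it is set.

    sleep_ms=None spins without pausing (busy-wait); otherwise the thread
    sleeps sleep_ms milliseconds between polls. Returns (cpu_s, wall_s)
    measured for the waiting thread only.
    """
    cpu0, wall0 = time.thread_time(), time.perf_counter()
    while not done.is_set():
        if sleep_ms is not None:
            time.sleep(sleep_ms / 1000.0)
    return time.thread_time() - cpu0, time.perf_counter() - wall0


def simulate(sleep_ms: Optional[float], gpu_work_s: float = 0.3) -> Tuple[float, float]:
    """A simulated 'GPU' finishes after gpu_work_s seconds; measure the wait."""
    done = threading.Event()
    gpu = threading.Timer(gpu_work_s, done.set)
    gpu.start()
    result = wait_for(done, sleep_ms)
    gpu.join()
    return result


if __name__ == "__main__":
    for label, ms in (("spin", None), ("sleep 5 ms", 5)):
        cpu, wall = simulate(ms)
        print(f"{label:>10}: cpu {cpu:.3f} s, wall {wall:.3f} s")
```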
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 33270
Credit: 79,922,639
RAC: 80
Germany
Message 1762687 - Posted: 5 Feb 2016, 23:54:27 UTC - in response to Message 1762685.  
Last modified: 5 Feb 2016, 23:55:28 UTC

That's why I suggested -use_sleep_ex 5.
It shouldn't be much slower running 3 instances, but it reduces CPU usage at least a little.

Running benches atm.
Grumpy Swede (I stand with Ukraine)
Volunteer tester
Joined: 1 Nov 08
Posts: 8927
Credit: 49,849,242
RAC: 65
Sweden
Message 1762688 - Posted: 5 Feb 2016, 23:57:05 UTC - in response to Message 1762687.  
Last modified: 5 Feb 2016, 23:57:36 UTC


Thanks for the suggestions Mike, always appreciated. I'll keep it in mind, if I get really bothered about the CPU usage at some AR's.
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1762691 - Posted: 6 Feb 2016, 0:01:43 UTC - in response to Message 1762688.  

It would also be interesting to check how it responds to -cpu_lock.
OpenCL NV is quite uncharted territory, and what we know from the ATi side is not always directly applicable here.
Grumpy Swede (I stand with Ukraine)
Volunteer tester
Joined: 1 Nov 08
Posts: 8927
Credit: 49,849,242
RAC: 65
Sweden
Message 1762693 - Posted: 6 Feb 2016, 0:06:12 UTC - in response to Message 1762691.  

You want me to add -cpu_lock to the command line?
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1762695 - Posted: 6 Feb 2016, 0:13:10 UTC - in response to Message 1762693.  

Just as part of exploring the app's parameter space, later, once you have a baseline impression of how it behaves on different ARs. A baseline is needed to have something to compare against. Then things like -use_sleep and/or -cpu_lock, and -sbs N variations, can be tested.
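One way to organise that comparison, sketched in Python under my own assumptions: record each finished task as (AR bucket, run time) for the baseline and for each variant (say, with -use_sleep or a different -sbs N), then compare mean run times per bucket. The bucket labels and the use of run time as the metric are placeholders; CPU time could be compared the same way.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

Run = Tuple[str, float]  # (AR bucket label, run time in seconds)


def mean_by_bucket(runs: Iterable[Run]) -> Dict[str, float]:
    """Average run time per AR bucket."""
    groups = defaultdict(list)
    for bucket, seconds in runs:
        groups[bucket].append(seconds)
    return {bucket: mean(times) for bucket, times in groups.items()}


def relative_to_baseline(baseline: Iterable[Run], variant: Iterable[Run]) -> Dict[str, float]:
    """Ratio variant_mean / baseline_mean for buckets seen in both sets.

    A value below 1.0 means the variant was faster in that bucket; buckets
    missing from either set are skipped rather than guessed.
    """
    base = mean_by_bucket(baseline)
    var = mean_by_bucket(variant)
    return {b: var[b] / base[b] for b in sorted(base.keys() & var.keys())}
```

Keeping the buckets separate matters here because, as the thread shows, SoG behaves very differently at VHAR and at midrange ARs; a single overall average would blur those effects together.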
Grumpy Swede (I stand with Ukraine)
Volunteer tester
Joined: 1 Nov 08
Posts: 8927
Credit: 49,849,242
RAC: 65
Sweden
Message 1762696 - Posted: 6 Feb 2016, 0:15:12 UTC - in response to Message 1762695.  

OK, I'll keep it running as it is now.
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1762729 - Posted: 6 Feb 2016, 2:18:01 UTC - in response to Message 1762696.  

Definitely using a lot more CPU than on Beta, and it also seems to take longer to process.
Grumpy Swede (I stand with Ukraine)
Volunteer tester
Joined: 1 Nov 08
Posts: 8927
Credit: 49,849,242
RAC: 65
Sweden
Message 1762732 - Posted: 6 Feb 2016, 2:36:07 UTC - in response to Message 1762729.  
Last modified: 6 Feb 2016, 2:37:49 UTC

Yeah, but the WUs we got on Beta were consistently over 2 in AR. Not one had a "normal" AR.

Here, we see all kinds of ARs. Just now I'm crunching a bunch with an extreme AR of over 51, yes 51 :-) Those behave strangely with this app: no high CPU usage, but the progress reporting is iffy to say the least (in the BOINC Manager the progress % doesn't move at all until it suddenly jumps to 100% for these extreme WUs, although the progress indicator in BoincTasks works for these too). Still, they're done in 9-10 minutes, running 3 at a time. Too high to be fast, or something.

Example of one of those "crazy" WUs:
WU true angle range is : 51.249186
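Since so much in this thread depends on AR, it helps to pull it out of each task's stderr automatically. A minimal Python sketch, assuming the line appears exactly in the form quoted above; the VLAR/VHAR cut-offs are my own assumptions for illustration, not official project values.

```python
import re
from typing import Optional

# Matches e.g. "WU true angle range is : 51.249186"
AR_LINE = re.compile(r"WU true angle range is\s*:\s*([0-9]*\.?[0-9]+)")

# Assumed bucket limits (illustrative only, not official values).
VLAR_MAX = 0.12
VHAR_MIN = 1.127


def angle_range(stderr_text: str) -> Optional[float]:
    """Return the AR reported in a task's stderr, or None if absent."""
    match = AR_LINE.search(stderr_text)
    return float(match.group(1)) if match else None


def ar_bucket(ar: float) -> str:
    """Classify an AR as VLAR, mid or VHAR using the assumed limits."""
    if ar < VLAR_MAX:
        return "VLAR"
    if ar > VHAR_MIN:
        return "VHAR"
    return "mid"


if __name__ == "__main__":
    ar = angle_range("WU true angle range is : 51.249186")
    print(ar, ar_bucket(ar))  # prints: 51.249186 VHAR
```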
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1762742 - Posted: 6 Feb 2016, 3:45:25 UTC - in response to Message 1762732.  

Well, I had a chance to look at some of these after processing. They are slower than CUDA here on main. I'm also seeing unusually high kernel usage: within the last 20% of the analysis, kernel activity spikes and all CPUs go to 100%. I had been using a command line but removed it when it appeared to be actually hampering the work, so now it's just running stock, 3 at a time.
Grumpy Swede (I stand with Ukraine)
Volunteer tester
Joined: 1 Nov 08
Posts: 8927
Credit: 49,849,242
RAC: 65
Sweden
Message 1762747 - Posted: 6 Feb 2016, 4:03:08 UTC - in response to Message 1762742.  

Well, YMMV of course. I cannot say that this app is slower than CUDA; on the contrary, for me it is much faster than CUDA50.

But we'll see. I'll let it run as it is for now. I know what my daily production rate on CUDA looked like, so I will quickly be able to compare it with this app.
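Comparing daily production is simple arithmetic if run times are roughly stable: tasks per day = concurrent tasks × 86,400 s ÷ run time per task. A quick Python check using the figures from earlier in the thread (3 at a time, 9-10 minutes each for those AR-51 WUs); note that this ignores the AR mix, which clearly matters a lot here, and counts tasks, not credit.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def tasks_per_day(concurrent: int, runtime_s: float) -> float:
    """Steady-state tasks completed per day when `concurrent` tasks run side by
    side on one GPU and each takes `runtime_s` seconds of wall-clock time."""
    if runtime_s <= 0:
        raise ValueError("runtime must be positive")
    return concurrent * SECONDS_PER_DAY / runtime_s


if __name__ == "__main__":
    for minutes in (9, 10):
        print(minutes, "min:", tasks_per_day(3, minutes * 60))
    # 9 min: 480.0
    # 10 min: 432.0
```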
Grumpy Swede (I stand with Ukraine)
Volunteer tester
Joined: 1 Nov 08
Posts: 8927
Credit: 49,849,242
RAC: 65
Sweden
Message 1762778 - Posted: 6 Feb 2016, 6:33:39 UTC

Further comments:

It's quite clear now that the ARs of the WUs we use to test new apps on Beta are not representative of the mix of ARs we will meet here on main. There's a need for a better mix of ARs on Beta, that's for sure.

The results we get on Beta for the new apps are in no way the results we will get for those apps here on main. That much is totally clear by now.

Anyhow, I'll continue with SoG until I can say whether it crunches V8 MBs faster or slower than CUDA50 on my GTX980 Strix in the long run. So far it's unclear, mostly because SoG doesn't handle "normal" and lower ARs as well as CUDA50 does. ARs from 2.xx up to some as yet unknown upper limit are where SoG really shines.

And that ends the comment from the Swedish jury, for now. :-)
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1762793 - Posted: 6 Feb 2016, 8:08:05 UTC - in response to Message 1762742.  

Zalster, could you provide links to comparison pairs, please?
tullio
Volunteer tester
Joined: 9 Apr 04
Posts: 8634
Credit: 2,930,782
RAC: 1
Italy
Message 1762873 - Posted: 6 Feb 2016, 15:02:52 UTC

I have installed a GeForce GTX 750 in my Windows 10 PC and reinstalled the Lunatics package, and it is now crunching SETI@home tasks. In stderr.txt I see that the nVidia driver is 353.54. Is this OK? I did nothing to install drivers; Windows 10 did all the work.
Tullio


 
©2022 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.