AstroPulse v7 v7.10 (opencl_nvidia_100)

Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1673004 - Posted: 2 May 2015, 11:44:56 UTC

That's different, Uli.
Streaming causes heavy load on the GPU, so I would just use exclude app in this case.
I assume you don't stream 24/7.


With each crime and every kindness we birth our future.
ID: 1673004
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1673005 - Posted: 2 May 2015, 11:57:13 UTC - in response to Message 1673004.  
Last modified: 2 May 2015, 11:57:36 UTC

That's different, Uli.
Streaming causes heavy load on the GPU, so I would just use exclude app in this case.
I assume you don't stream 24/7.

Maybe, but why does it work the way I want with driver version 337.88?
There definitely *is* a change in the driver after that version which makes things worse, at least for me,
and maybe for others who don't have masses of (virtual) CPU cores idling around.
Aloha, Uli

ID: 1673005
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1673240 - Posted: 3 May 2015, 1:52:12 UTC

Since nobody cares to answer: OK, what I stated is correct, so how could anyone argue against it... ;)

Here you can see the CPU usage with driver 337.88:


With newer drivers the usage rises to 1 core per AP GPU unit, no matter what, and all other CPU WUs starve to death...
Aloha, Uli

ID: 1673240
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1673248 - Posted: 3 May 2015, 2:24:18 UTC - in response to Message 1673240.  
Last modified: 3 May 2015, 2:37:47 UTC

Since nobody cares to answer: OK, what I stated is correct, so how could anyone argue against it... ;)

Here you can see the CPU usage with driver 337.88:


With newer drivers the usage rises to 1 core per AP GPU unit, no matter what, and all other CPU WUs starve to death...


At that time your VGA cards were crunching a Milkyway task and two APv7 tasks.
You have two cards in this PC, a GT640 and a GT430.

Which apps/tasks are running on which card?

AFAIK, back when I crunched Milkyway on my GTX285 with WinXP x86 it was the same (1 Milkyway task on the GPU): the MW app used one whole thread (50% of a Duo CPU). That was also the case with the v263.06 NV driver, which does not have the OpenCL bug.

BTW, I can't see the end of the MW app name (the column is too small). Is it a CPU or a GPU app?
ID: 1673248
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1673253 - Posted: 3 May 2015, 2:39:27 UTC - in response to Message 1673248.  
Last modified: 3 May 2015, 2:49:59 UTC

(...)
Which apps/tasks are running on which card?
(...)
I thought it was obvious: the two AP tasks are running on the GPUs, the GT 640 and the GT 430, sharing one CPU core for feeding, and the Milkyway task runs on the remaining CPU core. This scenario is independent of the projects running on the particular units. The behavior is the same regardless of the projects involved...

[edit]
To be absolutely exact:
The first AP7 task runs on the GT 430.
The second AP7 task runs on the GT 640.
Please note that they both share one CPU core for feeding!
The third WU (Milkyway 1.00; 1.02 is the GPU version) is a pure CPU WU running on the remaining core.
Aloha, Uli

ID: 1673253
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1673255 - Posted: 3 May 2015, 2:51:41 UTC - in response to Message 1673253.  
Last modified: 3 May 2015, 2:57:02 UTC

After checking http://milkyway.cs.rpi.edu/milkyway/apps.php I saw that MW v1.00 is for CPU. ;-)

If I read your message correctly, one AP v7.10 OpenCL app uses -use_sleep and the other AP v7.10 OpenCL app doesn't use -use_sleep? Correct?
But at the moment I can't imagine how to set this up ... ;-)
ID: 1673255
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1673257 - Posted: 3 May 2015, 2:59:33 UTC - in response to Message 1673255.  

If I read your messages correctly, one AP v7.10 OpenCL app uses -use_sleep and the other AP v7.10 OpenCL app doesn't use -use_sleep? Correct?

Unfortunately not, both are running without -use_sleep. If I managed to run only the GT 430 with -use_sleep, the GT 640 would instantly grab the "remaining part" of the core for its own use, taking 100% of the core. I haven't figured out how to add "-use_sleep" to the "AstroPulse_NV_config.xml" file so that this parameter is set only for the secondary GPU. :?

Even if this could be managed, there would still be the issue that the GT 640 would grab a whole CPU core for its own feeding, with 337.88 and also with the newer drivers. This really sucks big time! :/
Aloha, Uli

ID: 1673257
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1673295 - Posted: 3 May 2015, 6:42:52 UTC
Last modified: 3 May 2015, 6:49:54 UTC

So it's better to use ATI/AMD VGA cards for OpenCL apps. ;-)


@ all
As I understand it, some messages in this thread are trying to find the 'guilty' person who made this bug (NV VGA card, OpenCL app, one whole CPU thread used for support), correct?
OK, here is just the opinion/contemplation of a regular SETI user.

For Windows there are 3 OpenCL apps available: for ATI/AMD VGA cards, for NVidia VGA cards, and for Intel's iGPU.

AFAIK, the basics of these 3 OpenCL apps are the same, with just 'small' adjustments for the 3 different devices.

AFAIK, ATI/AMD VGA cards and Intel's iGPU don't use one whole CPU thread for support when OpenCL apps are running (no need to use -use_sleep).
On NVidia VGA cards, an OpenCL app uses one whole CPU thread for support (-use_sleep is needed to free the CPU thread).

BTW, with the v263.06 NVidia driver on WinXP x86, I did not see a whole CPU thread being used for support when running OpenCL on my GTX285.


So who is the 'guilty' person? My guess (IMHO) is NVidia. With their (newer) drivers they sabotage the correct (and fast) execution of OpenCL apps in order to say 'hey look, CUDA is the 'better' way'.
ID: 1673295
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1673307 - Posted: 3 May 2015, 7:57:30 UTC - in response to Message 1673257.  

If I read your messages correctly, one AP v7.10 OpenCL app uses -use_sleep and the other AP v7.10 OpenCL app doesn't use -use_sleep? Correct?

Unfortunately not, both are running without -use_sleep. If I managed to run only the GT 430 with -use_sleep, the GT 640 would instantly grab the "remaining part" of the core for its own use, taking 100% of the core. I haven't figured out how to add "-use_sleep" to the "AstroPulse_NV_config.xml" file so that this parameter is set only for the secondary GPU. :?

Even if this could be managed, there would still be the issue that the GT 640 would grab a whole CPU core for its own feeding, with 337.88 and also with the newer drivers. This really sucks big time! :/


And did you try reading the ReadMe?

-cpu_lock_fixed_cpu N : Will enable CPUlock too but will bind all app instances to the same N-th CPU (N=0,1,.., number of CPUs-1).


At least worth a try in your case.
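
To make "bind all app instances to the same N-th CPU" a bit more concrete: operating systems commonly express such a binding as an affinity bitmask in which bit N stands for logical CPU N. The sketch below only illustrates that idea. The helper names `fixed_cpu_mask` and `allowed_cpus` are made up for this example, and it is an assumption (not something the ReadMe states) that the app builds its mask this way.

```python
from typing import List


def fixed_cpu_mask(n: int, num_cpus: int) -> int:
    """Return an affinity bitmask that allows only logical CPU n.

    Mirrors the documented range of the option: N = 0, 1, ..., num_cpus - 1.
    """
    if not 0 <= n < num_cpus:
        raise ValueError(f"N must be in 0..{num_cpus - 1}, got {n}")
    return 1 << n


def allowed_cpus(mask: int) -> List[int]:
    """List the logical CPU indices whose bit is set in an affinity mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


if __name__ == "__main__":
    # Hypothetical dual-core host: both app instances pinned to CPU 0.
    mask = fixed_cpu_mask(0, num_cpus=2)
    print(bin(mask), allowed_cpus(mask))
```

With every instance given the same single-bit mask, the instances compete for one core and the other core stays free for CPU work units, which is the effect being asked for here.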
ID: 1673307
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1673330 - Posted: 3 May 2015, 12:13:25 UTC - in response to Message 1673295.  
Last modified: 3 May 2015, 12:23:20 UTC

So who is the 'guilty' person?


Actually, Microsoft would be the one controlling the driver specifications on Windows, and the Khronos Group the OpenCL interface to that (I must contact my friend there again sometime and ask [again] for his opinion on driver GPU synchronisation methods).

If there is a single culprit, I would point the finger at 'change', which is never easy, and rarely pleases everyone.

[Edit:] If it's any consolation, Google decided to push their own 'Renderscript' API instead of CUDA or OpenCL, in the name of preventing the Android market from fragmenting (perhaps unwittingly further fragmenting the GPGPU community by spreading its limited resources even thinner).

Well, at least chaos can breed opportunity.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1673330
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1673340 - Posted: 3 May 2015, 13:18:29 UTC - in response to Message 1673307.  

I don't want to "accuse" anybody or find someone "guilty", I simply want the system to run as well as it used to - that's all! :)

And did you try reading the ReadMe?

-cpu_lock_fixed_cpu N : Will enable CPUlock too but will bind all app instances to the same N-th CPU (N=0,1,.., number of CPUs-1).


At least worth a try in your case.

Thanks again for this suggestion. I tried exactly that the first time the drivers changed, without success.
I have now changed my AstroPulse_NV_config.xml to the following content:

<device0>
	<cpu_lock_fixed_cpu>0</cpu_lock_fixed_cpu>
</device0>
<device1>
	<cpu_lock_fixed_cpu>0</cpu_lock_fixed_cpu>
	<unroll>4</unroll>
	<ffa_block>2048</ffa_block>
	<ffa_block_fetch>1024</ffa_block_fetch>
	<tune>
		<tune_kernel_index>1</tune_kernel_index>
		<tune_workgroup_size_x>128</tune_workgroup_size_x>
		<tune_workgroup_size_y>8</tune_workgroup_size_y>
		<tune_workgroup_size_z>1</tune_workgroup_size_z>
	</tune>
	<tune>
		<tune_kernel_index>2</tune_kernel_index>
		<tune_workgroup_size_x>128</tune_workgroup_size_x>
		<tune_workgroup_size_y>8</tune_workgroup_size_y>
		<tune_workgroup_size_z>1</tune_workgroup_size_z>
	</tune>
</device1>

Now I will go to the NVidia site, download the latest drivers for my GT 430/640, and install them.
Let's see if this works now. Keep your fingers crossed! ;)
Aloha, Uli

ID: 1673340
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1673342 - Posted: 3 May 2015, 13:26:28 UTC - in response to Message 1673340.  

Always check stderr.txt to be sure the app responded properly to your requests, i.e. to make sure the option you wanted was indeed applied.
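
A minimal sketch of automating that check, assuming the app prints one confirmation line per option it accepted. The function name `missing_confirmations` is made up for this example; the marker text is the confirmation line that shows up in the stderr output quoted later in this thread, and the sample text is abbreviated for illustration.

```python
from typing import List


def missing_confirmations(stderr_text: str, expected: List[str]) -> List[str]:
    """Return the expected marker strings that appear in no line of stderr_text."""
    lines = stderr_text.splitlines()
    return [m for m in expected if not any(m in line for line in lines)]


if __name__ == "__main__":
    # Abbreviated sample; in practice read the task's stderr.txt instead.
    sample = (
        "Running on device number: 1\n"
        "Priority of worker thread raised successfully\n"
    )
    # Prints the markers that were NOT confirmed by the app.
    print(missing_confirmations(sample, ["CPU affinity adjustment enabled"]))
```

An empty result means every requested option was acknowledged; anything returned is an option the app silently ignored.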
ID: 1673342
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1673343 - Posted: 3 May 2015, 13:30:13 UTC - in response to Message 1673340.  

I don't want to "accuse" anybody or find someone "guilty", I simply want the system to run as well as it used to - that's all! :)
(...)


I didn't mean you. ;-)

It was meant for everyone, because of the much earlier messages in this thread.
I just wanted to add my two cents as well.

:-)
ID: 1673343
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1673346 - Posted: 3 May 2015, 13:36:27 UTC - in response to Message 1673342.  

Always check stderr.txt to be sure the app responded properly to your requests, i.e. to make sure the option you wanted was indeed applied.


Double-Thanks again!

The -cpu_lock_fixed_cpu 0 parameter is only recognized in the command line txt file, *not* in the config.xml file:

With the option in the config.xml file:
Running on device number: 1
Device-specific DATA_CHUNK_UNROLL set to:8
Device-specific FFA thread block override value:2048
Device-specific FFA thread fetchblock override value:1024
Device-specific TUNE: kernel 1 now has workgroup size of (128,8,1)
Device-specific TUNE: kernel 2 now has workgroup size of (128,8,1)
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: NVIDIA Corporation


With the option in the cmdline txt file:
Running on device number: 1
CPU affinity adjustment enabled, fixed CPU 0 will be used
Device-specific DATA_CHUNK_UNROLL set to:8
Device-specific FFA thread block override value:2048
Device-specific FFA thread fetchblock override value:1024
Device-specific TUNE: kernel 1 now has workgroup size of (128,8,1)
Device-specific TUNE: kernel 2 now has workgroup size of (128,8,1)
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: NVIDIA Corporation
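
The difference between the two outputs above is easy to miss by eye. As a rough sketch, a few lines of Python can list what appears in one log but not the other. The function name `lines_only_in` is made up for this example, and the two strings are shortened excerpts of the logs quoted above.

```python
from typing import List


def lines_only_in(second: str, first: str) -> List[str]:
    """Return lines of `second` that do not occur in `first`, in original order."""
    seen = set(first.splitlines())
    return [line for line in second.splitlines() if line not in seen]


# Shortened excerpts of the two stderr outputs quoted above.
config_xml_log = (
    "Running on device number: 1\n"
    "Device-specific DATA_CHUNK_UNROLL set to:8\n"
    "OpenCL platform detected: NVIDIA Corporation\n"
)
cmdline_log = (
    "Running on device number: 1\n"
    "CPU affinity adjustment enabled, fixed CPU 0 will be used\n"
    "Device-specific DATA_CHUNK_UNROLL set to:8\n"
    "OpenCL platform detected: NVIDIA Corporation\n"
)

if __name__ == "__main__":
    for line in lines_only_in(cmdline_log, config_xml_log):
        print(line)
```

On these excerpts the only extra line is the CPU affinity confirmation, which matches the observation that the option took effect only via the command line file.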


In the meantime driver 350.12 has finished downloading; I will install it now.
Hope to be back in a few minutes... ;)
Aloha, Uli

ID: 1673346
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1673353 - Posted: 3 May 2015, 14:01:46 UTC

OK, the new driver 350.12 is installed. The CPU lock feature is working; both instances share one CPU core.

But have a look at the GPU load, it used to be 98-99%:



This is simply *not* satisfactory. I will revert to 337.88 until Nvidia finally manages to write proper drivers again...
Aloha, Uli

ID: 1673353
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1673356 - Posted: 3 May 2015, 14:17:41 UTC
Last modified: 3 May 2015, 14:29:58 UTC

For completeness, this is what happens when I turn off the cpu_lock feature:



Both cores are consumed by the driver's spin loop, leaving all other apps without a chance to get a CPU share. The CPU load for Milkyway is 0, nada, niente, null, nothing...

This is with -use_sleep:



Lots of cycles wasted.
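
A toy model of the trade-off seen in these runs, assuming that -use_sleep replaces the busy-wait with periodic sleeping while the host waits for a GPU kernel. Everything here is illustrative: the function names, the microsecond figures, and the per-poll cost are made-up assumptions, not measurements or the app's actual logic.

```python
from typing import Dict


def spin_wait(kernel_us: int) -> Dict[str, int]:
    """Busy-wait: the CPU is occupied for the whole kernel run,
    but completion is noticed immediately, so the GPU never idles."""
    return {"cpu_busy_us": kernel_us, "gpu_idle_us": 0}


def sleep_wait(kernel_us: int, sleep_us: int, poll_cost_us: int = 1) -> Dict[str, int]:
    """Sleep-poll: wake every sleep_us to check for completion.
    CPU cost is tiny, but completion is noticed only at the next wake-up."""
    polls = (kernel_us + sleep_us - 1) // sleep_us  # integer ceiling division
    detected_at_us = polls * sleep_us
    return {
        "cpu_busy_us": polls * poll_cost_us,
        "gpu_idle_us": detected_at_us - kernel_us,
    }


if __name__ == "__main__":
    # Hypothetical 2500 us kernel with a 1000 us sleep granularity.
    print(spin_wait(2500))
    print(sleep_wait(2500, sleep_us=1000))
```

In this model the spin loop burns CPU time equal to the kernel's run time, while sleeping costs almost no CPU but leaves the GPU idle until the next wake-up, which would fit the lower GPU load seen with -use_sleep.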
Aloha, Uli

ID: 1673356
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1673374 - Posted: 3 May 2015, 15:04:22 UTC

After the usual problems with rolling back a driver version, which Windows constantly refuses to do without gentle force, I'm back on version 337.88 and "magically" everything runs smoothly again:



Both GPU apps peacefully share one core and the second core can be utilized to do something useful.

Nvidia, please wake up and get your act together!
Aloha, Uli

ID: 1673374
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1673389 - Posted: 3 May 2015, 15:30:15 UTC - in response to Message 1673374.  

Interesting.
Looks like you are observing the same change that others saw with the transition from 263.xx to higher drivers... but with much later drivers.
Perhaps some change for your GPU type was postponed until those 337.xx ...
ID: 1673389
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1673400 - Posted: 3 May 2015, 15:43:05 UTC - in response to Message 1673389.  

Interesting.
Looks like you are observing the same change that others saw with the transition from 263.xx to higher drivers... but with much later drivers.
Perhaps some change for your GPU type was postponed until those 337.xx ...


My understanding of this problem is that in driver versions up to 337.88 the spin loop was a single thread for all devices.
In the newer drivers this spin loop is probably implemented in a multi-threaded manner and therefore uses its own thread per device?
Aloha, Uli

ID: 1673400
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1673432 - Posted: 3 May 2015, 16:29:00 UTC - in response to Message 1673400.  

Quite possible...
ID: 1673432
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.