Weird: AP6_win_x86_SSE2_OpenCL_NV_r2058.exe


Message boards : Number crunching : Weird: AP6_win_x86_SSE2_OpenCL_NV_r2058.exe

Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 969
Credit: 8,177,083
RAC: 9,755
Germany
Message 1467970 - Posted: 24 Jan 2014, 2:23:13 UTC
Last modified: 24 Jan 2014, 2:24:07 UTC

Hi everyone,

As the title says, this is weird: the "AP6_win_x86_SSE2_OpenCL_NV_r2058.exe" executable takes next to nothing (I mean really nearly *nothing*) of the CPU on my Windows XP Pro 32-bit system, whereas the very same executable takes a *full* core on my Windows 8.1 Pro 64-bit machine. Both systems run the very same Nvidia driver version 332.21, and I have no clue what's going on.
Is Windows XP still the Windows version to choose for really effective crunching?
I thought Windows 8.1 was mostly written in C/C++ again and therefore as fast as it could be?! :? :?

Any help/hints very much appreciated!
____________
Aloha, Uli

Profile arkayn
Volunteer tester
Joined: 14 May 99
Posts: 3597
Credit: 47,401,844
RAC: 5,878
United States
Message 1467990 - Posted: 24 Jan 2014, 3:29:40 UTC - in response to Message 1467970.

From the XP machine...

Running on device number: 0
DATA_CHUNK_UNROLL set to:4
FFA thread block override value:2048
FFA thread fetchblock override value:1024
Maximum single buffer size set to:256MB
Sleep() & wait for event loops will be used in some places
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used


From the 8.1 machine...

Running on device number: 0
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used


Looks like you forgot to set the parameters on the 8.1 machine.
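
The comparison above can be reproduced mechanically. The following is only a minimal sketch: it takes the two stderr headers exactly as quoted in this post and lists the lines that appear in one but not the other. The helper name `lines_only_in` is made up for illustration and is not part of BOINC or the AstroPulse application.

```python
# stderr headers copied verbatim from the two machines quoted above
XP_LOG = """\
Running on device number: 0
DATA_CHUNK_UNROLL set to:4
FFA thread block override value:2048
FFA thread fetchblock override value:1024
Maximum single buffer size set to:256MB
Sleep() & wait for event loops will be used in some places
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
"""

WIN81_LOG = """\
Running on device number: 0
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
"""


def lines_only_in(first, second):
    """Return the non-empty lines of `first` that are absent from `second`, in order."""
    seen = {line.strip() for line in second.splitlines()}
    return [
        line.strip()
        for line in first.splitlines()
        if line.strip() and line.strip() not in seen
    ]


extra_on_xp = lines_only_in(XP_LOG, WIN81_LOG)
extra_on_81 = lines_only_in(WIN81_LOG, XP_LOG)

for line in extra_on_xp:
    print("only on XP: ", line)
for line in extra_on_81:
    print("only on 8.1:", line)
```

Every line that shows up only on the XP side corresponds to a setting that was never applied on the 8.1 machine, including the one announcing that Sleep() will be used.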
____________

Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 969
Credit: 8,177,083
RAC: 9,755
Germany
Message 1468001 - Posted: 24 Jan 2014, 4:10:13 UTC - in response to Message 1467990.
Last modified: 24 Jan 2014, 4:40:57 UTC

(...)
Looks like you forgot to set the parameters on the 8.1 machine.

Do you really think it depends on those parameters?
I only set these parameters on the XP machine because I want to watch DVB-C television on that machine from time to time, and if I omit the parameters the video stream stutters awfully. :? But OK, thanks, I'll give it a try on the 8.1 machine, let's see...

[edit]
You are right, the "-use_sleep" parameter is the culprit.
If I use this parameter on Windows 8.1, the processor usage drops from a full core to nearly zero! :O
The GPU usage also drops from ~95% to 50-60%, but the overall performance seems to be the same! :O²
On Windows XP the GPU usage stays at ~95%, regardless of this parameter! :O³
I'm puzzled, but I'll let it run for some time to see how it develops...
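
Why a "sleep" option can take CPU usage from a full core to nearly zero can be shown with a small analogy. This is only a sketch and not the application's actual code: a timer thread stands in for the GPU finishing its work, and the two hypothetical functions `spin_wait` and `sleep_wait` represent polling continuously versus sleeping between checks.

```python
import threading
import time


def cpu_seconds_while_waiting(wait_fn, duration=0.3):
    """Measure the process CPU time consumed while waiting `duration` seconds for an event."""
    done = threading.Event()
    # The timer plays the role of the GPU signalling that its work has finished.
    timer = threading.Timer(duration, done.set)
    start = time.process_time()
    timer.start()
    wait_fn(done)
    timer.join()
    return time.process_time() - start


def spin_wait(done):
    """Poll the event in a tight loop; this keeps one CPU core busy the whole time."""
    while not done.is_set():
        pass


def sleep_wait(done, interval=0.001):
    """Poll the event but sleep between checks, handing the CPU back to the OS."""
    while not done.is_set():
        time.sleep(interval)


busy_cpu = cpu_seconds_while_waiting(spin_wait)
sleepy_cpu = cpu_seconds_while_waiting(sleep_wait)

print(f"CPU time while spinning: {busy_cpu:.3f} s")
print(f"CPU time while sleeping: {sleepy_cpu:.3f} s")
```

Both variants wait for about the same wall-clock time, but the sleeping one burns far less CPU. The price is that a sleeping waiter notices completion a little late, which is one plausible reason the GPU can sit idle between work items and show lower usage.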
____________
Aloha, Uli

juan BFB
Volunteer tester
Joined: 16 Mar 07
Posts: 4947
Credit: 270,864,122
RAC: 396,515
Brazil
Message 1468042 - Posted: 24 Jan 2014, 6:26:44 UTC
Last modified: 24 Jan 2014, 6:30:24 UTC

With r2058 you could easily run 2 AP WUs at a time on your Fermi/Kepler GPUs with no pain; that will raise your GPU usage again. Of course YMMV, so you need to test whether crunching 2 at a time is better than 1 (it usually is).
____________

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4048
Credit: 32,693,420
RAC: 508
United Kingdom
Message 1468085 - Posted: 24 Jan 2014, 8:26:20 UTC - in response to Message 1468001.

With the "-use_sleep" parameter you've got to up the unroll and FFA thread block parameters to claw back GPU usage.

Claggy

Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 969
Credit: 8,177,083
RAC: 9,755
Germany
Message 1468089 - Posted: 24 Jan 2014, 8:34:45 UTC - in response to Message 1468085.


With the "-use_sleep" parameter you've got to up the unroll and FFA thread block parameters to claw back GPU usage.


Thank you, and yes, I came to the same conclusion. Here are the new parameters:
DATA_CHUNK_UNROLL set to:8
FFA thread block override value:6144
FFA thread fetchblock override value:1536
Maximum single buffer size set to:256MB
Sleep() & wait for event loops will be used in some places
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used


from here: http://setiathome.berkeley.edu/result.php?resultid=3347672063
You have to scroll down to the last restart of the executable.
____________
Aloha, Uli

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3369
Credit: 46,113,295
RAC: 19,486
Russia
Message 1468099 - Posted: 24 Jan 2014, 9:10:15 UTC
Last modified: 24 Jan 2014, 9:10:58 UTC

-use_sleep was introduced long before r2058, but because of a nasty NV OpenCL runtime bug it was almost non-functional.
In the latest builds that bug was discovered and a workaround implemented.
The bug was filed with NV; they said "will look", and there has been no response for more than 2 months already.
____________

Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 969
Credit: 8,177,083
RAC: 9,755
Germany
Message 1468102 - Posted: 24 Jan 2014, 9:25:32 UTC

BTW:
On Mike's site (http://mikesworldnet.de/produkte.html) there are r2083 executables for AMD/ATI and for the CPU, but for Nvidia there is only version r2058. Did I miss something?
____________
Aloha, Uli

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3369
Credit: 46,113,295
RAC: 19,486
Russia
Message 1468180 - Posted: 24 Jan 2014, 13:12:36 UTC - in response to Message 1468102.

Perhaps not. Work in progress. Rebuilding the whole app set, especially with the new BOINC APIs, takes time, so sometimes it is better to skip a few iterations for the less essential flavours.
____________

Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4920
Credit: 72,647,312
RAC: 3,965
Australia
Message 1468185 - Posted: 24 Jan 2014, 13:26:07 UTC - in response to Message 1468180.
Last modified: 24 Jan 2014, 13:26:53 UTC

Perhaps not. Work in progress. Rebuilding the whole app set, especially with the new BOINC APIs, takes time, so sometimes it is better to skip a few iterations for the less essential flavours.


If you're currently rebuilding applications, you might want to consider linking to commode.obj (as documented and supplied amongst the MS stuff) just to work around the stderr truncation issue (and possibly others). It's not a fix for BOINC's dated practices, but for now it seems to work around at least the stderr truncation issue.
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3369
Credit: 46,113,295
RAC: 19,486
Russia
Message 1468212 - Posted: 24 Jan 2014, 14:40:00 UTC - in response to Message 1468185.

Thanks, Jason, will try on the next build.
____________


Copyright © 2014 University of California