Weird: AP6_win_x86_SSE2_OpenCL_NV_r2058.exe

Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1467970 - Posted: 24 Jan 2014, 2:23:13 UTC
Last modified: 24 Jan 2014, 2:24:07 UTC

Hi everyone,

as the title says, it's weird: the "AP6_win_x86_SSE2_OpenCL_NV_r2058.exe" executable takes next to nothing (I mean really next to *nothing*) of the CPU on my Windows XP Pro 32-bit system, whereas the very same executable takes a *full* core on my Windows 8.1 Pro 64-bit machine. Both systems run the very same Nvidia driver, version 332.21, and I don't have a clue what's going on.
Is Windows XP still the Windows version to choose for really effective crunching?
I thought Windows 8.1 was mostly written in C/C++ again and therefore as fast as it could be?! :? :?

Any help/hints very much appreciated!
Aloha, Uli

arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1467990 - Posted: 24 Jan 2014, 3:29:40 UTC - in response to Message 1467970.  

From the XP machine...
Running on device number: 0
DATA_CHUNK_UNROLL set to:4
FFA thread block override value:2048
FFA thread fetchblock override value:1024
Maximum single buffer size set to:256MB
Sleep() & wait for event loops will be used in some places
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used


From the 8.1 machine...
Running on device number: 0
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used


Looks like you forgot to set the parameters on the 8.1 machine.
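
For anyone who wants to compare two stderr outputs without eyeballing them, here is a minimal sketch. The two log excerpts are the ones quoted above; the function names and the line-matching pattern are my own assumptions about the format, not anything the app provides.

```python
import re

# stderr excerpts as quoted in this thread
XP_LOG = """Running on device number: 0
DATA_CHUNK_UNROLL set to:4
FFA thread block override value:2048
FFA thread fetchblock override value:1024
Maximum single buffer size set to:256MB
Sleep() & wait for event loops will be used in some places
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
"""

WIN81_LOG = """Running on device number: 0
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
"""

# Assumed line shapes: "<name> set to:<value>" or "<name> override value:<value>"
SETTING = re.compile(r"^(?P<name>.+?) (?:set to|override value):(?P<value>\S+)$")


def tuning_settings(stderr_text):
    """Collect the tuning lines of one stderr excerpt into a dict."""
    settings = {}
    for line in stderr_text.splitlines():
        line = line.strip()
        match = SETTING.match(line)
        if match:
            settings[match["name"]] = match["value"]
        elif line.startswith("Sleep()"):
            # The sleep line carries no value, so just record that it is present.
            settings["sleep loops"] = "on"
    return settings


def missing_settings(reference, other):
    """Settings that appear in `reference` but not in `other`."""
    return {name: value for name, value in reference.items() if name not in other}


xp = tuning_settings(XP_LOG)
win81 = tuning_settings(WIN81_LOG)
for name, value in missing_settings(xp, win81).items():
    print(f"only on XP: {name} = {value}")
```

Every tuning line from the XP excerpt shows up as "only on XP", which is the whole point: the 8.1 run started without any of the parameters.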

Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1468001 - Posted: 24 Jan 2014, 4:10:13 UTC - in response to Message 1467990.  
Last modified: 24 Jan 2014, 4:40:57 UTC

(...)
Looks like you forgot to set the parameters on the 8.1 machine.

Do you really think it depends on those parameters?
I only set these parameters on the XP machine because I want to watch DVB-C television on that machine from time to time, and if I omit the parameters the video stream stutters awfully. :? But OK, thanks, I'll give it a try on the 8.1 machine, let's see...

[edit]
You are right, the "-use_sleep" parameter is the culprit.
If I use this parameter on Windows 8.1, the processor usage drops from a full core to nearly zero! :O
The GPU usage also drops from ~95% to 50-60%, but the overall performance seems to be the same! :O²
On Windows XP the GPU usage stays at ~95%, regardless of this parameter! :O³
I'm puzzled, but I'll let it run for some time to see how it develops...
Aloha, Uli
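
Why a sleep option can take CPU usage from a full core to nearly zero can be shown with a small sketch. It assumes, going only by the stderr line "Sleep() & wait for event loops will be used in some places", that -use_sleep swaps a spinning wait for a sleeping one. This is not the app's code; the timer merely stands in for a GPU kernel, and all names are hypothetical.

```python
import threading
import time


def wait_for(event, use_sleep, poll_interval=0.001):
    """Wait until `event` is set; return the CPU seconds the process burned meanwhile."""
    cpu_start = time.process_time()
    while not event.is_set():
        if use_sleep:
            # Hand the core back to the OS between checks.
            time.sleep(poll_interval)
        # Without use_sleep the loop spins and checks again immediately.
    return time.process_time() - cpu_start


def measure(use_sleep, work_seconds=0.2):
    """Simulate one 'kernel' of work_seconds and report the CPU cost of waiting for it."""
    done = threading.Event()
    timer = threading.Timer(work_seconds, done.set)  # stand-in for the GPU finishing
    timer.start()
    cpu_seconds = wait_for(done, use_sleep)
    timer.join()
    return cpu_seconds


spin_cpu = measure(use_sleep=False)
sleep_cpu = measure(use_sleep=True)
print(f"busy-wait : {spin_cpu:.3f} s CPU")
print(f"sleep-poll: {sleep_cpu:.3f} s CPU")
```

The spinning variant burns CPU for roughly as long as the simulated work takes, while the sleeping variant costs next to nothing. The price is that a sleeping thread notices completion a little late, which may be related to the lower GPU usage reported above, though nothing in this sketch proves that.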

juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1468042 - Posted: 24 Jan 2014, 6:26:44 UTC
Last modified: 24 Jan 2014, 6:30:24 UTC

With r2058 you could easily run 2 AP WUs at a time on your Fermi/Kepler GPUs with no pain; that will raise your GPU usage again. Of course YMMV, so you need to test whether crunching 2 at a time is better than 1 (it usually is).
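
The "test if 2 at a time is better" part boils down to comparing throughput, not time per task. A minimal sketch of the arithmetic follows; the timings are made up purely for illustration and are NOT measurements from this thread.

```python
def tasks_per_hour(tasks_at_once, minutes_per_task):
    """Throughput when `tasks_at_once` WUs run side by side, each taking `minutes_per_task`."""
    return tasks_at_once * 60.0 / minutes_per_task


# Hypothetical timings: one WU alone takes 30 min, two side by side take 45 min each.
one_at_a_time = tasks_per_hour(1, minutes_per_task=30.0)
two_at_a_time = tasks_per_hour(2, minutes_per_task=45.0)

print(f"1 at a time: {one_at_a_time:.2f} WU/h")
print(f"2 at a time: {two_at_a_time:.2f} WU/h")
print("2 at a time wins" if two_at_a_time > one_at_a_time else "1 at a time wins")
```

Running 2 at a time pays off as long as each task takes less than twice as long as it does alone, so plug in your own averaged run times before deciding.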
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1468085 - Posted: 24 Jan 2014, 8:26:20 UTC - in response to Message 1468001.  

With the "-use_sleep" parameter you've got to raise the unroll and FFA thread block parameters to claw back GPU usage.

Claggy
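
A toy model of why bigger work chunks can help. It rests on an assumption this thread does not confirm: that with -use_sleep the GPU sits idle for a short while after each kernel until the sleeping CPU thread wakes up and queues the next one. All numbers are hypothetical.

```python
def gpu_busy_fraction(kernel_ms, wake_delay_ms):
    """Share of time the GPU is busy if every kernel is followed by an idle gap."""
    return kernel_ms / (kernel_ms + wake_delay_ms)


WAKE_DELAY_MS = 1.0  # hypothetical idle gap after each kernel

# Larger unroll / thread block values are assumed here to mean longer kernels.
for kernel_ms in (1.0, 2.0, 4.0, 8.0):
    busy = gpu_busy_fraction(kernel_ms, WAKE_DELAY_MS)
    print(f"kernel {kernel_ms:4.1f} ms -> GPU busy {busy:.0%}")
```

In this model the idle gap stays the same while the work per kernel grows, so the busy share climbs towards 100%. Whether the real app behaves this way on a given card still has to be checked by trying the parameters.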
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1468089 - Posted: 24 Jan 2014, 8:34:45 UTC - in response to Message 1468085.  


With the "-use_sleep" parameter you've got to raise the unroll and FFA thread block parameters to claw back GPU usage.


Thank you, and yes, I came to the same conclusion. Here are the new parameters:
DATA_CHUNK_UNROLL set to:8
FFA thread block override value:6144
FFA thread fetchblock override value:1536
Maximum single buffer size set to:256MB
Sleep() & wait for event loops will be used in some places
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used


from here: http://setiathome.berkeley.edu/result.php?resultid=3347672063
You have to scroll down to the last restart of the executable.
Aloha, Uli

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1468099 - Posted: 24 Jan 2014, 9:10:15 UTC
Last modified: 24 Jan 2014, 9:10:58 UTC

-use_sleep was introduced long before r2058, but because of a nasty NV OpenCL runtime bug it was almost non-functional.
In the latest builds that bug was tracked down and a workaround implemented.
The bug was filed with NV; they said they "will look", and there has been no response for more than 2 months already.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1468102 - Posted: 24 Jan 2014, 9:25:32 UTC

BTW:
On Mike's site (http://mikesworldnet.de/produkte.html) there are r2083 executables for AMD/ATI and for the CPU, but for Nvidia there is only version r2058. Did I miss something?
Aloha, Uli

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1468180 - Posted: 24 Jan 2014, 13:12:36 UTC - in response to Message 1468102.  

Perhaps not. Work in progress. Rebuilding the whole app set, especially with new BOINC APIs, takes time, so sometimes it's better to skip a few iterations for less essential flavours.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1468185 - Posted: 24 Jan 2014, 13:26:07 UTC - in response to Message 1468180.  
Last modified: 24 Jan 2014, 13:26:53 UTC

Perhaps not. Work in progress. Rebuilding the whole app set, especially with new BOINC APIs, takes time, so sometimes it's better to skip a few iterations for less essential flavours.


If you're currently rebuilding applications, you might want to consider linking to commode.obj (as documented and supplied amongst the MS stuff) just to work around the stderr truncation issues (and possibly others). It's not a fix for BOINC's dated practices, but for now it seems to work around at least the stderr truncation issue.
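
As far as the Microsoft documentation describes it, linking commode.obj puts the C runtime into commit mode, so that a flush writes file buffers through to disk instead of stopping at the OS cache. Below is a rough analogy of that idea in Python. It is not the MSVC mechanism itself and not BOINC code; the file name and helper are hypothetical.

```python
import os
import tempfile


def log_line(log_file, text):
    """Write one line and push it all the way to disk before returning."""
    log_file.write(text + "\n")
    log_file.flush()             # program buffer -> operating system
    os.fsync(log_file.fileno())  # operating system cache -> disk


# Hypothetical stand-in for an application's stderr file.
path = os.path.join(tempfile.mkdtemp(), "stderr_demo.txt")
with open(path, "w") as log:
    log_line(log, "DATA_CHUNK_UNROLL set to:8")
    # The file is still open for writing, yet the line is already readable on disk.
    with open(path) as check:
        on_disk = check.read()

print(on_disk, end="")
```

Without the flush and sync steps, a line written shortly before the process is killed may never reach the file, which is one way a log can end up truncated.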
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1468212 - Posted: 24 Jan 2014, 14:40:00 UTC - in response to Message 1468185.  

Thanks, Jason, will try it on the next build.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
