New HP Z400 - Lunatics in question

far_raf

Joined: 26 Apr 00
Posts: 120
Credit: 47,977,058
RAC: 19
Canada
Message 1521517 - Posted: 27 May 2014, 6:56:18 UTC - in response to Message 1520930.  

Hi Tbar, I have gone through the Lunatics install twice; both times I told it to use the cuda 5 or the kepler hardware option - I forget the exact question it asks.

What you say makes total sense to me regarding the cuda 3.2 / cuda 5.0 issue - my card can do cuda 6.

So I remain confused as to why my cuda 6 capable pc would do cuda 3.2 apps.
ID: 1521517
far_raf

Joined: 26 Apr 00
Posts: 120
Credit: 47,977,058
RAC: 19
Canada
Message 1521518 - Posted: 27 May 2014, 7:13:32 UTC - in response to Message 1520975.  

I agree with Mike as well, Juan. I have to find out why the cpu ap tasks are taking so long. Juan, you know I went through the install selecting the correct options after the file fail. I remain a bit confused.

I find there are other issues that I do not understand -

1. Why does boinc not always read "mbcuda.cfg" when starting a task?
2. Why do I see reports of tasks starting with cuda 3.2 even now?
3. Why do tasks start with a report on start up of using the "fermi path" (on this one I am really puzzled, I do not get it, the cards are cuda 6)?

I remain your friend
Regards
Robert
ID: 1521518
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1521555 - Posted: 27 May 2014, 11:54:28 UTC - in response to Message 1521518.  
Last modified: 27 May 2014, 12:09:49 UTC

1. Why does boinc not always read "mbcuda.cfg" when starting a task?

You need to ask Jason, who created the build, but I believe the answer is that the configuration only changes if you change the GPU or something similar.

2. Why do I see reports of tasks starting with cuda 3.2 even now?

Because, for some unknown reason, your host crunches the WU with the wrong build.

3. Why do tasks start with a report on start up of using the "fermi path" (on this one I am really puzzled, I do not get it, the cards are cuda 6)?

See answer #2. Your card is cuda 6 capable, but that does not mean it will not work with previous cuda flavors.

I remain your friend

I'm sure about that.

About the Lunatics installation: when you ran it, did you remember to clear the default cuda32 option and allow cuda50 only? Maybe this is your problem. By default cuda32 is enabled, IIRC. If your installation is OK, your processed WUs must show cuda50 only, even if BOINC Manager shows something different. Another way to check is to look at the WU in Task Manager while it is crunching: it must show Lunatics_x41zc_win32_cuda50.exe in the process list.
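
For the first check, here is a minimal sketch that pulls the build and CUDA version out of a task's stderr text. It assumes the banner line looks like the one in the task output posted later in this thread (`setiathome enhanced x41zc, Cuda 5.00`); the function name and the regular expression are illustrative only and are not part of BOINC or Lunatics.

```python
import re

# Banner format as seen in stderr output posted in this thread, e.g.
# "setiathome enhanced x41zc, Cuda 5.00". Other builds may print it differently.
BANNER = re.compile(r"setiathome enhanced (\S+), Cuda (\d+\.\d+)")


def cuda_build(stderr_text):
    """Return (build, cuda_version) found in a task's stderr, or None if absent."""
    match = BANNER.search(stderr_text)
    return match.groups() if match else None


sample = """Priority of worker thread set successfully

setiathome enhanced x41zc, Cuda 5.00
"""
print(cuda_build(sample))  # ('x41zc', '5.00')
```

A task that ran with the wrong build should report a different CUDA version in the second field, assuming that build prints the same style of banner.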

About your long times, Mike is right (as always): your times are extremely long, especially for your powerful CPU, so I have some questions:

Do you run any other CPU-intensive program simultaneously? How many CPU tasks are you running in parallel, and how many GPU WUs at the same time on how many GPUs? It's hard to believe, but you could be suffering from core "starvation".

Did you monitor the GPU clock? Does it stay stable, or does it drop sometimes?

Temperature could have some impact on your host, but you always say the GPUs are running cool, so I believe that is not a factor in your case.

And one last "stupid" question: do you use any screen saver or power management? Remember that the default Windows power management slows everything down after some time to save power. If that's your case, you need to change to the performance mode, which doesn't do that (in your Control Panel power options).

But keep crunching.
ID: 1521555
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1521598 - Posted: 27 May 2014, 15:35:40 UTC - in response to Message 1521555.  
Last modified: 27 May 2014, 15:48:46 UTC

1. Why does boinc not always read "mbcuda.cfg" when starting a task?

You need to ask Jason, who created the build, but I believe the answer is that the configuration only changes if you change the GPU or something similar.


*scratching head*. That's not a Boinc file, but an x-branch one. My Windows builds of x41zc, stock and installer, ALL read the mbcuda.cfg file at every task startup. You'll need to show me what you're looking at, if you think it is not doing so.

[Edit:] one idea, if you are using stock then there are different mbcuda.cfg txt files per version in the project folder. That's an unfortunate limitation of Boinc I can do nothing about. You should put the same settings in each variant.

Running stock could also explain why builds swap around. Boinc doesn't do time estimates all that well (part of the reason it screws up credit too). That means sometimes it can get confused and stuck on the wrong build.

The current (custom) Cuda 6 build I'm running isn't any faster for Kepler GPUs than the Cuda 5 variant, so it's not worth releasing for the sake of a number. That's mostly because I'm bogged down with Boinc problems and making some special tools to use all the new Cuda 6 features. Probably once some issues are resolved application development will be much faster (fingers crossed).
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1521598
far_raf

Joined: 26 Apr 00
Posts: 120
Credit: 47,977,058
RAC: 19
Canada
Message 1521826 - Posted: 28 May 2014, 7:01:48 UTC - in response to Message 1521555.  

1. Why does boinc not always read "mbcuda.cfg" when starting a task?

You need to ask Jason, who created the build, but I believe the answer is that the configuration only changes if you change the GPU or something similar.

--> Will do Juan, Jason posted after you did.

2. Why do I see reports of tasks starting with cuda 3.2 even now?

Because, for some unknown reason, your host crunches the WU with the wrong build.

--> I am starting to think I need to start from a clean install.

3. Why do tasks start with a report on start up of using the "fermi path" (on this one I am really puzzled, I do not get it, the cards are cuda 6)?

See answer #2. Your card is cuda 6 capable, but that does not mean it will not work with previous cuda flavors.

--> I agree with this.

I remain your friend

I'm sure about that.

--> So am I Juan.

About the Lunatics installation: when you ran it, did you remember to clear the default cuda32 option and allow cuda50 only? Maybe this is your problem. By default cuda32 is enabled, IIRC. If your installation is OK, your processed WUs must show cuda50 only, even if BOINC Manager shows something different. Another way to check is to look at the WU in Task Manager while it is crunching: it must show Lunatics_x41zc_win32_cuda50.exe in the process list.

--> All current tasks are mb 7.00 or v7 cuda 50.

About your long times, Mike is right (as always): your times are extremely long, especially for your powerful CPU, so I have some questions:

Do you run any other CPU-intensive program simultaneously? How many CPU tasks are you running in parallel, and how many GPU WUs at the same time on how many GPUs? It's hard to believe, but you could be suffering from core "starvation".

--> There are no other applications running on the Z400; this unit is currently dedicated to Seti. I am running all 8 cores and 2 tasks per 750.

Did you monitor the GPU clock? Does it stay stable, or does it drop sometimes?

--> I have monitored the gpu clocks (core and mem), they have been stable.

Temperature could have some impact on your host, but you always say the GPUs are running cool, so I believe that is not a factor in your case.

--> As we have discussed before the temps on the cpu & gpu's are well within spec.

And one last "stupid" question: do you use any screen saver or power management? Remember that the default Windows power management slows everything down after some time to save power. If that's your case, you need to change to the performance mode, which doesn't do that (in your Control Panel power options).

--> LMBO, no screen saver and I defeated the power management as one of the first things I do on a pc. Sue me, I am not energy star compliant.

But keep crunching.


--> I will.

Regards
Robert
ID: 1521826
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1521850 - Posted: 28 May 2014, 7:46:06 UTC - in response to Message 1521826.  
Last modified: 28 May 2014, 8:08:26 UTC

It's 4:20 AM here, so please forgive my even worse English.

2. Why do I see reports of tasks starting with cuda 3.2 even now?

Because, for some unknown reason, your host crunches the WU with the wrong build.

--> I am starting to think I need to start from a clean install.

Could be a good idea; something could be corrupted, and it's hard to tell from a distance.

--> All current tasks are mb 7.00 or v7 cuda 50.

That's what we expect.

Do you run any other CPU-intensive program simultaneously? How many CPU tasks are you running in parallel, and how many GPU WUs at the same time on how many GPUs? It's hard to believe, but you could be suffering from core "starvation".

--> There are no other applications running on the Z400; this unit is currently dedicated to Seti. I am running all 8 cores and 2 tasks per 750.

That could be one of the reasons your crunching times are so high. You actually can't run all cores and 2 tasks on each GPU at the same time; you need to keep at least one core free to feed the GPUs, plus 1 additional core for each AP GPU WU you run.

So your actual limit (with the builds you are using) is 7 CPU WUs, less 1 per GPU AP running. For example, if your host is crunching 2 GPU AP WUs, your limit will be 5 CPU tasks. Using all 8 cores + 4 GPU WUs produces what I call "core starvation": your CPU doesn't have any core available to constantly feed your hungry GPUs, so your entire host slows down. To fix that, release one core to feed the GPUs, and use the app_config.xml file (cpu_usage) to release one additional core whenever a GPU AP is crunching; your times should then drop to normal crunching times.
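
The rule of thumb above can be written as a small calculation. This is only a sketch of the counting described in this post (one feeder core, plus one core per GPU AP task); the function name and the `feeder_cores` parameter are made up for illustration, and the best numbers for a given host still need testing.

```python
def max_cpu_tasks(cores, gpu_ap_tasks, feeder_cores=1):
    """CPU task limit: all cores, minus the feeder core, minus one per GPU AP task."""
    return max(cores - feeder_cores - gpu_ap_tasks, 0)


print(max_cpu_tasks(8, 0))  # no GPU AP running -> 7
print(max_cpu_tasks(8, 2))  # 2 GPU AP WUs running -> 5
```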

This is not a Lunatics or stock issue; it's the normal way the older AP crunching code works: it needs one core to feed each GPU task, plus one. Some would say one core per GPU AP is enough and you don't need to leave the additional core for everything. I don't run that version anymore, but when I did, at least on my hosts, I needed to leave the additional core. Then again, my i5s are a lot less capable than your powerful Xeon CPU, and my 690s are hungry beasts, so you need to test on your particular host to see what works best, as always. A more technical explanation is beyond my knowledge, but I'm sure others could explain it if you ask. There are some new builds that work around this limitation, but that is for another time; first let us make your host work as expected.

Maybe others like Mike, Zalster, etc. could give us some other clues or configuration ideas.
ID: 1521850
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1521913 - Posted: 28 May 2014, 14:01:03 UTC - in response to Message 1521850.  
Last modified: 28 May 2014, 14:05:16 UTC

Hey Robert and Juan,

Sorry, I've been busy since coming back from the long weekend. OK, my guess with the cuda 32s is that he had an allotment of them from the testing Seti does to determine which cuda was going to work best with his GPU, prior to installing Lunatics. After he installed Lunatics, the computer had to finish those before it started on the cuda 50s (again, my best guess).

If we are going for a clean install, then set the project to No new tasks in BOINC Manager and finish out what he has. Also, after the new install, make sure to suspend the project or set No new tasks before you install Lunatics. Otherwise, Seti is going to download a whole bunch of work before he can even install Lunatics. Maybe a better way would be to set the default on your account to no work, so that nothing is downloaded at first when you do the new install. After you do all the installations, you could change the location of the computer to a profile that allows it to download work. (It seems like a lot to do, but it would ensure that only the cudas you want are downloaded; otherwise you have to work through the others.) That is, if you plan on going for the clean install.

Otherwise, since it now appears you have the correct versions of cuda (50s) and MB, we turn our attention to how many are crunching.

Only 1 of my computers is an Intel. It's a quad core with a single GPU. For that system I figured out that I could run 2 APs with 2 CPU cores supporting them on that single GPU (750 Ti). That left 2 cores to crunch on their own. For MB, I figured 2 MBs with 0.6 core support (or 1 MB with 0.3 core). This configuration is overkill for MB. When we originally talked, you installed a config file that specified 0.04 core per MB. My number is just easier for me; I'll explain below. Either way, it doesn't change the fact that 3 cores are left unused. Seti will use those 3 unused cores to crunch, and even the 4th core, with only 0.4 available, will crunch. Sounds counterintuitive, doesn't it? But I think of it this way: the 2 (0.3) core shares float between all 4 cores, and at some point each core does some computing for the GPU. Now, this doesn't happen with the APs, since they require a complete core, so Seti actually gives them their own core to crunch with, rather than sharing time on other cores.

so in short AP= 0.5 Gpu + 1 CPU and 2 APs per GPU
MB= 0.5 GPU + 0.3 CPU and 2 MB per GPU


Why 0.3 CPU for the MBs? Because of my other systems, which use AMD 8-core chips. After experimenting with different settings, I finally found that for mine, 0.3 CPU gave the best result for MB crunching. So that 0.3 carried over to the Intel, as I could easily remember what the ratio was if I ever needed to start from scratch.

For my AMD systems I only do 1 AP per each GPU. 1 AP = 1 GPU + 1 CPU

For all these different ratios I use app_config.xml.
Have you modified yours since we last talked? What does your config say?
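
As an illustration only, the Intel ratios above (MB = 0.5 GPU + 0.3 CPU, AP = 0.5 GPU + 1 CPU) could be turned into an app_config.xml with a few lines of Python. The app names `setiathome_v7` and `astropulse_v6` are the ones that appear in the app_config.xml posted later in this thread; the helper name and the `RATIOS` table are made up for this sketch, and the right values for any particular host still need testing.

```python
import xml.etree.ElementTree as ET

# (app name, gpu_usage, cpu_usage). App names as used in the app_config.xml
# posted in this thread; ratios are the ones described in the post above.
RATIOS = [
    ("setiathome_v7", 0.5, 0.3),  # MB: 2 per GPU, 0.3 CPU each
    ("astropulse_v6", 0.5, 1.0),  # AP: 2 per GPU, one full core each
]


def build_app_config(ratios):
    """Build an <app_config> element with one <app> entry per ratio row."""
    root = ET.Element("app_config")
    for name, gpu, cpu in ratios:
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        versions = ET.SubElement(app, "gpu_versions")
        ET.SubElement(versions, "gpu_usage").text = str(gpu)
        ET.SubElement(versions, "cpu_usage").text = str(cpu)
    return root


xml_text = ET.tostring(build_app_config(RATIOS), encoding="unicode")
print(xml_text)
```

The output is a single unindented line of XML; it would still need to be saved as app_config.xml in the project folder by hand.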


For your system, you have an 8 core with 2 GPUs correct?
How many APs are crunching on the Intel chip and how many are crunching on each card?
How many MBs are crunched on the chip and how many on the GPUs?


Going by the above configuration, you should crunch at least 4 APs on the 8 cores and 2 APs on each GPU (4 APs), for a total of 8 at a time: 4 + (2x2) = 8 APs.

For the MBs, 2 per GPU is 4 MBs, and 4 x 0.04 = 0.16, so you should have 8 cores (technically 7.84) free to crunch MBs, giving 12 MBs at the same time: (2x2) + 8 = 12 MBs.
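
The arithmetic in the last two paragraphs can be sketched as below. It assumes that only whole reserved cores take away a CPU slot, which is how the figures in this post work out; the real BOINC scheduler may count differently. The function name is hypothetical.

```python
import math


def concurrent_tasks(cores, gpus, gpu_usage, cpu_usage):
    """Return (gpu_tasks, cpu_tasks, total) under this post's simplified counting."""
    gpu_tasks = gpus * round(1 / gpu_usage)  # e.g. 0.5 GPU each -> 2 per GPU
    reserved = gpu_tasks * cpu_usage         # CPU set aside for the GPU tasks
    # Assumption: only whole reserved cores remove a CPU slot, so 0.16 of a
    # core still leaves all 8 cores crunching (the "technically 7.84" case).
    cpu_tasks = max(cores - math.floor(reserved), 0)
    return gpu_tasks, cpu_tasks, gpu_tasks + cpu_tasks


print(concurrent_tasks(8, 2, 0.5, 1.0))   # AP: (4, 4, 8)
print(concurrent_tasks(8, 2, 0.5, 0.04))  # MB: (4, 8, 12)
```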

Is this close? After that we need to look at how long it takes to complete each work unit. APs on my Intel take 1 hour 20 minutes to complete on the GPU. MBs take 23 minutes on the GPU.

What are your times looking like? If they are taking longer than that we might need to look at giving more of the core to the GPU to crunch. (I played with the ratios for a bit before deciding on what worked best for my system)

Sorry for the long reply.

Zalster
ID: 1521913
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1521935 - Posted: 28 May 2014, 14:55:46 UTC
Last modified: 28 May 2014, 15:01:42 UTC

That's exactly what I think is happening: 8 CPU + 4 GPU WUs are too much even for his Xeon CPU (I wish I had one here, LOL). He needs to free some cores and test to find his host's best configuration. My guess: 4 CPU + 4 GPU for AP, up to 6 CPU + 4 GPU for MB.
ID: 1521935
far_raf

Joined: 26 Apr 00
Posts: 120
Credit: 47,977,058
RAC: 19
Canada
Message 1522300 - Posted: 29 May 2014, 6:53:27 UTC - in response to Message 1521598.  

This post is information for Jason, Juan and Zalster. This is going to be a super long post, sorry.


1. Why does boinc not always read "mbcuda.cfg" when starting a task?

You need to ask Jason, who created the build, but I believe the answer is that the configuration only changes if you change the GPU or something similar.


*scratching head*. That's not a Boinc file, but an x-branch one. My Windows builds of x41zc, stock and installer, ALL read the mbcuda.cfg file at every task startup. You'll need to show me what you're looking at, if you think it is not doing so.

--> see the info following the quote.

[Edit:] one idea, if you are using stock then there are different mbcuda.cfg txt files per version in the project folder. That's an unfortunate limitation of Boinc I can do nothing about. You should put the same settings in each variant.

--> I figured that out a while ago Jason.

Running stock could also explain why builds swap around. Boinc doesn't do time estimates all that well (part of the reason it screws up credit too). That means sometimes it can get confused and stuck on the wrong build.

The current (custom) Cuda 6 build I'm running isn't any faster for Kepler GPUs than the Cuda 5 variant, so it's not worth releasing for the sake of a number. That's mostly because I'm bogged down with Boinc problems and making some special tools to use all the new Cuda 6 features. Probably once some issues are resolved application development will be much faster (fingers crossed).



Hi Jason, what follows is a large collection of information from my pending/inconclusive lists on the boinc website.

I would like to point out that the astropulse info contains a reference to using the "fermi path". Does the current astropulse build not use Kepler?

In any event here is the information you all asked for.


CONFIGURATION INFORMATION
-------------------------

STOCK MBCUDA.CFG

;;; This configuration file is for optional control of Cuda Multibeam x41zc
;;; Currently, the available options are for
;;; application process priority control (without external tools), and
;;; per gpu priority control (useful for multiple Cuda GPU systems)
;[mbcuda]
;;;;; Global applications settings, to apply to all Cuda devices
;;; You can uncomment the processpriority line below, by removing the ';', to engage machine global priority control of x41x
;;; possible options are 'belownormal' (which is the default), 'normal', 'abovenormal', or 'high'
;;; For dedicated crunching machines, 'abovenormal' is recommended
;;; raising global application priorities above the default
;;; may have system dependant usability effects, and can have positive or negative effects on overall throughput
;processpriority = high
;;; Pulsefinding: Advanced options for long pulsefinds (affect display usability & long kernel runs)
;;; defaults are conservative.
;;; WARNING: Excessive values may induce display lag, driver timeout & recovery, or errors.
;;; pulsefinding blocks per multiprocessor (1-16), default is 1 for Pre-Fermi, 4 for Fermi or newer GPUs
;pfblockspersm = 8
;;; pulsefinding maximum periods per kernel launch (1-1000), default is 100, as per 6.09
;pfperiodsperlaunch = 200

;[bus1slot0]
;;; Optional GPU specifc control (requires Cuda 3.2 or newer app), example
;processpriority = abovenormal
;pfblockspersm = 8
;pfperiodsperlaunch = 200

---------------------------

MY MODIFIED MBCUDA.CFG

;;; This configuration file is for optional control of Cuda Multibeam x41zc
;;; Currently, the available options are for
;;; application process priority control (without external tools), and
;;; per gpu priority control (useful for multiple Cuda GPU systems)
[mbcuda]
;;;;; Global applications settings, to apply to all Cuda devices
;;; You can uncomment the processpriority line below, by removing the ';', to engage machine global priority control of x41x
;;; possible options are 'belownormal' (which is the default), 'normal', 'abovenormal', or 'high'
;;; For dedicated crunching machines, 'abovenormal' is recommended
;;; raising global application priorities above the default
;;; may have system dependant usability effects, and can have positive or negative effects on overall throughput
processpriority = high
;;; Pulsefinding: Advanced options for long pulsefinds (affect display usability & long kernel runs)
;;; defaults are conservative.
;;; WARNING: Excessive values may induce display lag, driver timeout & recovery, or errors.
;;; pulsefinding blocks per multiprocessor (1-16), default is 1 for Pre-Fermi, 4 for Fermi or newer GPUs
pfblockspersm = 8
;;; pulsefinding maximum periods per kernel launch (1-1000), default is 100, as per 6.09
pfperiodsperlaunch = 300

[bus15slot0]
;;; Optional GPU specifc control (requires Cuda 3.2 or newer app), example
processpriority = abovenormal
pfblockspersm = 8
pfperiodsperlaunch = 300

[bus40slot0]
;;; Optional GPU specifc control (requires Cuda 3.2 or newer app), example
processpriority = abovenormal
pfblockspersm = 8
pfperiodsperlaunch = 300

---------------------------
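
For anyone trying to reason about which value a given card ends up with, here is a rough sketch of a per-device lookup with a global fallback. The real x41zc parser is not shown in this thread; the lookup order below is only inferred from the stderr messages ("matching pci device ... key detected" versus "Global ... key being used for this device"), and the function name is made up. The sample text is a shortened version of the file above with just one device section.

```python
import configparser

# Shortened sample in the same layout as mbcuda.cfg; lines starting with ';'
# are treated as comments by configparser's default settings.
SAMPLE = """\
[mbcuda]
;;; global settings
processpriority = high
pfblockspersm = 8
pfperiodsperlaunch = 300

[bus40slot0]
processpriority = abovenormal
"""


def setting_for_device(cfg, key, bus_id, slot_id, default=None):
    """Prefer the per-device section, then [mbcuda], then the given default."""
    for section in (f"bus{bus_id}slot{slot_id}", "mbcuda"):
        if cfg.has_option(section, key):
            return cfg.get(section, key)
    return default


cfg = configparser.ConfigParser()
cfg.read_string(SAMPLE)
print(setting_for_device(cfg, "processpriority", 40, 0))     # abovenormal
print(setting_for_device(cfg, "pfperiodsperlaunch", 40, 0))  # 300 (global)
```

The section name is built from the `pciBusID` and `pciSlotID` values that the app prints at task start.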

MY MODIFIED MBCUDA-7.00-CUDA50.CFG

;;; This configuration file is for optional control of Cuda Multibeam x41zc
;;; Currently, the available options are for
;;; application process priority control (without external tools), and
;;; per gpu priority control (useful for multiple Cuda GPU systems)
[mbcuda]
;;;;; Global applications settings, to apply to all Cuda devices
;;; You can uncomment the processpriority line below, by removing the ';', to engage machine global priority control of x41x
;;; possible options are 'belownormal' (which is the default), 'normal', 'abovenormal', or 'high'
;;; For dedicated crunching machines, 'abovenormal' is recommended
;;; raising global application priorities above the default
;;; may have system dependant usability effects, and can have positive or negative effects on overall throughput
processpriority = abovenormal
;;; Pulsefinding: Advanced options for long pulsefinds (affect display usability & long kernel runs)
;;; defaults are conservative.
;;; WARNING: Excessive values may induce display lag, driver timeout & recovery, or errors.
;;; pulsefinding blocks per multiprocessor (1-16), default is 1 for Pre-Fermi, 4 for Fermi or newer GPUs
pfblockspersm = 8
;;; pulsefinding maximum periods per kernel launch (1-1000), default is 100, as per 6.09
pfperiodsperlaunch = 300

[bus15slot0]
;;; Optional GPU specifc control (requires Cuda 3.2 or newer app), example
processpriority = abovenormal
pfblockspersm = 8
pfperiodsperlaunch = 300

[bus40slot0]
;;; Optional GPU specifc control (requires Cuda 3.2 or newer app), example
processpriority = abovenormal
pfblockspersm = 8
pfperiodsperlaunch = 300

----------------------------

APP_CONFIG.XML

<app_config>
<app>
<name>setiathome_v7</name>
<gpu_versions>
<gpu_usage>0.50</gpu_usage>
<cpu_usage>0.04</cpu_usage>
</gpu_versions>
</app>
<app>
<name>astropulse_v6</name>
<gpu_versions>
<gpu_usage>0.50</gpu_usage>
<cpu_usage>1</cpu_usage>
</gpu_versions>
</app>
</app_config>

-------------------------------

TASK INFORMATION
-----------------
ASTROPULSE
-----------------

CURRENT - AstroPulse v6 v6.04 (opencl_nvidia_100)

http://setiathome.berkeley.edu/workunit.php?wuid=1509814447 - FROM 05/29/2014


<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
Running on device number: 1
DATA_CHUNK_UNROLL at default:2
DATA_CHUNK_UNROLL at default:2
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: NVIDIA Corporation
BOINC assigns device 1
Info: BOINC provided device ID used
Used GPU device parameters are:
Number of compute units: 5
Single buffer allocation size: 256MB
max WG size: 1024
FERMI path used: yes

Build features: Non-graphics OpenCL USE_OPENCL_NV COMBINED_DECHIRP_KERNEL FFTW USE_INCREASED_PRECISION USE_SSE2 x86
CPUID: Intel(R) Xeon(R) CPU W3565 @ 3.20GHz

Cache: L1=64K L2=256K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3
AstroPulse v.6
Non-graphics FFTW USE_CONVERSION_OPT
Windows x86 rev 1316, V6 match, by Raistmer with support of Lunatics.kwsn.net team. SSE2

OpenCL version by Raistmer

ffa threshold mods by Joe Segur
SSE3 dechirping by JDWhale
Combined dechirp kernel by Frizz
Number of OpenCL platforms: 1


OpenCL Platform Name: NVIDIA CUDA
Number of devices: 2
Max compute units: 5
Max work group size: 1024
Max clock frequency: 1137Mhz
Max memory allocation: 536870912
Cache type: Read/Write
Cache line size: 128
Cache size: 81920
Global memory size: 2147483648
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Scratchpad
Local memory size: 49152
Queue properties:
Out-of-Order: Yes
Name: GeForce GTX 750 Ti
Vendor: NVIDIA Corporation
Driver version: 337.88
Version: OpenCL 1.1 CUDA
Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing

cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics

cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
Max compute units: 5
Max work group size: 1024
Max clock frequency: 1137Mhz
Max memory allocation: 536870912
Cache type: Read/Write
Cache line size: 128
Cache size: 81920
Global memory size: 2147483648
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Scratchpad
Local memory size: 49152
Queue properties:
Out-of-Order: Yes
Name: GeForce GTX 750 Ti
Vendor: NVIDIA Corporation
Driver version: 337.88
Version: OpenCL 1.1 CUDA
Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing

cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics

cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64

---------------------------------------------------

LUNATICS - AstroPulse v6 Anonymous platform (NVIDIA GPU)

http://setiathome.berkeley.edu/workunit.php?wuid=1503741667 - FROM 05/21/2014

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
Running on device number: 1
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: NVIDIA Corporation
BOINC assigns device 1
Info: BOINC provided device ID used
Used GPU device parameters are:
Number of compute units: 5
Single buffer allocation size: 256MB
max WG size: 1024
FERMI path used: yes

Build features: Non-graphics OpenCL USE_OPENCL_NV OCL_ZERO_COPY COMBINED_DECHIRP_KERNEL FFTW USE_INCREASED_PRECISION USE_SSE2 x86
CPUID: Intel(R) Xeon(R) CPU W3565 @ 3.20GHz

Cache: L1=64K L2=256K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2
AstroPulse v6
Non-graphics FFTW USE_CONVERSION_OPT
Windows x86 rev 1843, V6 match, by Raistmer with support of Lunatics.kwsn.net team. SSE2

OpenCL version by Raistmer

ffa threshold mods by Joe Segur
SSE3 dechirping by JDWhale
Combined dechirp kernel by Frizz
Number of OpenCL platforms: 2


OpenCL Platform Name: NVIDIA CUDA
Number of devices: 2
Max compute units: 5
Max work group size: 1024
Max clock frequency: 1137Mhz
Max memory allocation: 536870912
Cache type: Read/Write
Cache line size: 128
Cache size: 81920
Global memory size: 2147483648
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Scratchpad
Local memory size: 49152
Queue properties:
Out-of-Order: Yes
Name: GeForce GTX 750 Ti
Vendor: NVIDIA Corporation
Driver version: 332.17
Version: OpenCL 1.1 CUDA
Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing

cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics

cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
Max compute units: 5
Max work group size: 1024
Max clock frequency: 1137Mhz
Max memory allocation: 536870912
Cache type: Read/Write
Cache line size: 128
Cache size: 81920
Global memory size: 2147483648
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Scratchpad
Local memory size: 49152
Queue properties:
Out-of-Order: Yes
Name: GeForce GTX 750 Ti
Vendor: NVIDIA Corporation
Driver version: 332.17
Version: OpenCL 1.1 CUDA
Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing

cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics

cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64


OpenCL Platform Name: NVIDIA CUDA
Number of devices: 2
Max compute units: 5
Max work group size: 1024
Max clock frequency: 1137Mhz
Max memory allocation: 536870912
Cache type: Read/Write
Cache line size: 128
Cache size: 81920
Global memory size: 2147483648
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Scratchpad
Local memory size: 49152
Queue properties:
Out-of-Order: Yes
Name: GeForce GTX 750 Ti
Vendor: NVIDIA Corporation
Driver version: 332.17
Version: OpenCL 1.1 CUDA
Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing

cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics

cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
Max compute units: 5
Max work group size: 1024
Max clock frequency: 1137Mhz
Max memory allocation: 536870912
Cache type: Read/Write
Cache line size: 128
Cache size: 81920
Global memory size: 2147483648
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Scratchpad
Local memory size: 49152
Queue properties:
Out-of-Order: Yes
Name: GeForce GTX 750 Ti
Vendor: NVIDIA Corporation
Driver version: 332.17
Version: OpenCL 1.1 CUDA
Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing

cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics

cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64

-------------------------------------------

TASK INFORMATION
-----------------
MULTIBEAM
-----------------

START OF STDERR OUTPUT FOR TASK - http://setiathome.berkeley.edu/workunit.php?wuid=1509186520 - FROM 05/27/2014 A CURRENT TASK

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
Device 1: GeForce GTX 750 Ti, 2048 MiB, regsPerBlock 65536
computeCap 5.0, multiProcs 5
pciBusID = 40, pciSlotID = 0
Device 2: GeForce GTX 750 Ti, 2048 MiB, regsPerBlock 65536
computeCap 5.0, multiProcs 5
pciBusID = 15, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 750 Ti is okay
SETI@home using CUDA accelerated device GeForce GTX 750 Ti
mbcuda.cfg, matching pci device processpriority key detected
mbcuda.cfg, matching pci device pfblockspersm key detected
pulsefind: blocks per SM 8
mbcuda.cfg, matching pci device pfperiodsperlaunch key detected
pulsefind: periods per launch 300
Priority of process set to ABOVE_NORMAL successfully
Priority of worker thread set successfully

setiathome enhanced x41zc, Cuda 5.00

-----------------------------------------

START OF STDERR OUTPUT FOR TASK - http://setiathome.berkeley.edu/workunit.php?wuid=1506569983 TASK FROM 05/24/2014

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
Device 1: GeForce GTX 750 Ti, 2048 MiB, regsPerBlock 65536
computeCap 5.0, multiProcs 5
pciBusID = 40, pciSlotID = 0
Device 2: GeForce GTX 750 Ti, 2048 MiB, regsPerBlock 65536
computeCap 5.0, multiProcs 5
pciBusID = 15, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 750 Ti is okay
SETI@home using CUDA accelerated device GeForce GTX 750 Ti
mbcuda.cfg, processpriority key detected
mbcuda.cfg, Global pfblockspersm key being used for this device
pulsefind: blocks per SM 8
mbcuda.cfg, Global pfperiodsperlaunch key being used for this device
pulsefind: periods per launch 200
Priority of process set to ABOVE_NORMAL successfully
Priority of worker thread set successfully

setiathome enhanced x41zc, Cuda 5.00

-------------------------------------------

START OF STDERR OUTPUT FOR TASK - http://setiathome.berkeley.edu/workunit.php?wuid=1502654505 - LUNATICS TASK FROM 05/20/2014

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
Device 1: GeForce GTX 750 Ti, 2048 MiB, regsPerBlock 65536
computeCap 5.0, multiProcs 5
pciBusID = 40, pciSlotID = 0
Device 2: GeForce GTX 750 Ti, 2048 MiB, regsPerBlock 65536
computeCap 5.0, multiProcs 5
pciBusID = 15, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 750 Ti is okay
SETI@home using CUDA accelerated device GeForce GTX 750 Ti
mbcuda.cfg, processpriority key detected
mbcuda.cfg, Global pfblockspersm key being used for this device
pulsefind: blocks per SM 6
mbcuda.cfg, Global pfperiodsperlaunch key being used for this device
pulsefind: periods per launch 200
Priority of process set to ABOVE_NORMAL successfully
Priority of worker thread set successfully

setiathome enhanced x41zc, Cuda 3.20

-------------------------------------------

START OF STDERR OUTPUT FOR TASK - http://setiathome.berkeley.edu/workunit.php?wuid=1495574194 - LUNATICS TASK FROM 05/09/2014

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
Device 1: GeForce GTX 750 Ti, 2048 MiB, regsPerBlock 65536
computeCap 5.0, multiProcs 5
pciBusID = 40, pciSlotID = 0
Device 2: GeForce GTX 750 Ti, 2048 MiB, regsPerBlock 65536
computeCap 5.0, multiProcs 5
pciBusID = 15, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 2
setiathome_CUDA: CUDA Device 2 specified, checking...
Device 2: GeForce GTX 750 Ti is okay
SETI@home using CUDA accelerated device GeForce GTX 750 Ti
pulsefind: blocks per SM 4 (Fermi or newer default)
pulsefind: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully
Priority of worker thread set successfully

setiathome enhanced x41zc, Cuda 3.20

-------------------------------------------
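
A quick way to compare excerpts like the four above is to pull the same few fields out of each one. Below is a minimal Python sketch, not part of any SETI@home or Lunatics tool. The function name `summarize_stderr` and the `SAMPLE` text are made up for illustration, and the patterns only assume the line wording seen in these excerpts.

```python
import re

# Patterns based on the stderr wording quoted in this thread (assumption:
# other builds may word these lines differently).
CUDA_RE = re.compile(r"setiathome enhanced (\S+), Cuda (\d+\.\d+)")
DEVICE_RE = re.compile(r"Boinc passed DevPref (\d+)")
BLOCKS_RE = re.compile(r"pulsefind: blocks per SM (\d+)")
PERIODS_RE = re.compile(r"pulsefind: periods per launch (\d+)")


def summarize_stderr(text):
    """Extract build, CUDA version, device and pulsefind settings from one stderr."""
    summary = {
        "build": None,
        "cuda": None,
        "device": None,
        "blocks_per_sm": None,
        "periods_per_launch": None,
        # True if any "mbcuda.cfg, ..." line shows up, i.e. the file was picked up.
        "cfg_read": "mbcuda.cfg," in text,
    }
    match = CUDA_RE.search(text)
    if match:
        summary["build"], summary["cuda"] = match.group(1), match.group(2)
    match = DEVICE_RE.search(text)
    if match:
        summary["device"] = int(match.group(1))
    match = BLOCKS_RE.search(text)
    if match:
        summary["blocks_per_sm"] = int(match.group(1))
    match = PERIODS_RE.search(text)
    if match:
        summary["periods_per_launch"] = int(match.group(1))
    return summary


# Hypothetical sample, shortened from the 05/20/2014 excerpt.
SAMPLE = """\
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
mbcuda.cfg, Global pfblockspersm key being used for this device
pulsefind: blocks per SM 6
mbcuda.cfg, Global pfperiodsperlaunch key being used for this device
pulsefind: periods per launch 200
setiathome enhanced x41zc, Cuda 3.20
"""

print(summarize_stderr(SAMPLE))
```

Running this over each saved stderr should make it easy to see which tasks ran the Cuda 3.20 build and which ones read mbcuda.cfg.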
ID: 1522300
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1522311 - Posted: 29 May 2014, 7:24:42 UTC - in response to Message 1522300.  

...I would like to point out in the astropulse info is a reference to using the "fermi path". Does the current astropulse build not use Kepler?


I think most of the NV OpenCL code is fairly generic, though not being involved with its development myself I can't say for certain. Fermi+ cards do have special needs, which is probably why there is a special path with that name.

As for your mbcuda.cfg settings in the stderrs, it looks like they were all picked up, except in the oldest stderr you posted. I'll just point out, in case you didn't realise it, that if you want both devices to use the same settings, you only need the global ones, so (trimmed):

[mbcuda]
processpriority = abovenormal
pfblockspersm = 8
pfperiodsperlaunch = 300

[bus15slot0]
processpriority = abovenormal
pfblockspersm = 8
pfperiodsperlaunch = 300

[bus40slot0]
processpriority = abovenormal
pfblockspersm = 8
pfperiodsperlaunch = 300


can become simply:

[mbcuda]
processpriority = abovenormal
pfblockspersm = 8
pfperiodsperlaunch = 300
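
To illustrate the global versus per-device idea, here is a minimal Python sketch of the lookup order that the stderr lines suggest ("matching pci device ... key detected" versus "Global ... key being used for this device"). It is an assumption about the behaviour, not the actual x41zc code. The function name `settings_for_device` is hypothetical, and the `pfblockspersm = 6` override is invented just to show a device section winning over the global one.

```python
import configparser

# Hypothetical mbcuda.cfg contents: one global section plus a single
# per-device override (the value 6 is made up for this example).
CFG_TEXT = """
[mbcuda]
processpriority = abovenormal
pfblockspersm = 8
pfperiodsperlaunch = 300

[bus15slot0]
pfblockspersm = 6
"""


def settings_for_device(cfg_text, bus, slot):
    """Return global [mbcuda] settings, overridden by [bus<N>slot<M>] if present."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)

    resolved = {}
    if parser.has_section("mbcuda"):
        resolved.update(parser["mbcuda"])      # global defaults first
    section = f"bus{bus}slot{slot}"
    if parser.has_section(section):
        resolved.update(parser[section])       # device-specific keys win
    return resolved


print(settings_for_device(CFG_TEXT, 15, 0))  # uses the override for pfblockspersm
print(settings_for_device(CFG_TEXT, 40, 0))  # no device section, globals only
```

Under that assumed lookup order, per-device sections that repeat the global values change nothing, which is why they can be dropped.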

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1522311
far_raf

Joined: 26 Apr 00
Posts: 120
Credit: 47,977,058
RAC: 19
Canada
Message 1522323 - Posted: 29 May 2014, 7:47:57 UTC - in response to Message 1522311.  
Last modified: 29 May 2014, 7:51:03 UTC

Thanks Jason, as I read through the post after the fact I saw that the 1st settings were "global" rendering the specific slot settings redundant. Good eye on your part.

edit - I was applying the shotgun approach to the issue, lmbo.

Regards
Robert
ID: 1522323
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1522567 - Posted: 30 May 2014, 1:29:23 UTC - in response to Message 1522323.  

Just reread this thread. Mike is right, for some reason the chip is processing much slower than I would expect. Do you think installing the individual optimized MB and AP apps for the CPU would make a difference? His times for the GPUs are good.


Zalster
ID: 1522567
far_raf

Joined: 26 Apr 00
Posts: 120
Credit: 47,977,058
RAC: 19
Canada
Message 1522916 - Posted: 31 May 2014, 6:14:10 UTC

Just a test of whatever the hell is going on, I adjusted my app_config.xml to:

APP_CONFIG.XML

<app_config>
  <app>
    <name>setiathome_v7</name>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage> <!-- old: 0.50 -->
      <cpu_usage>0.1</cpu_usage> <!-- old: 0.04 -->
    </gpu_versions>
  </app>
  <app>
    <name>astropulse_v6</name>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage> <!-- old: 0.50 -->
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

-------------------------------

So the GPU is running 3 AP tasks now. I would like all to please note that I had set 1 CPU for the AP tasks in both the old and new versions. I did see 7 CPU tasks while there were 4 AP tasks running - this means a CPU core was missing. I assume it was feeding the GPU.
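
For anyone following along, the arithmetic behind those numbers can be sketched in Python. This is only a rule-of-thumb illustration, assuming that BOINC fits roughly floor(1 / gpu_usage) tasks on each GPU; the real scheduler makes the final call. The function names `tasks_per_gpu` and `summarize_app_config` are made up for this example.

```python
import math
import xml.etree.ElementTree as ET

# The new app_config.xml values from the post above (old values omitted).
APP_CONFIG = """
<app_config>
  <app>
    <name>setiathome_v7</name>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>0.1</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>astropulse_v6</name>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
"""


def tasks_per_gpu(gpu_usage):
    """Rule of thumb (assumption): how many tasks share one GPU."""
    # The tiny epsilon guards against float rounding just below a whole number.
    return math.floor(1.0 / gpu_usage + 1e-9)


def summarize_app_config(xml_text):
    """Map each app name to tasks per GPU and the CPU share those tasks request."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for app in root.findall("app"):
        name = app.findtext("name")
        # Both elements are assumed to be present, as in the file above.
        gpu_usage = float(app.findtext("gpu_versions/gpu_usage"))
        cpu_usage = float(app.findtext("gpu_versions/cpu_usage"))
        per_gpu = tasks_per_gpu(gpu_usage)
        result[name] = {
            "tasks_per_gpu": per_gpu,
            "cpu_per_gpu": per_gpu * cpu_usage,
        }
    return result


for name, info in summarize_app_config(APP_CONFIG).items():
    print(name, info)
```

With 0.33 this gives 3 tasks per GPU for both apps, and the old value of 0.50 gives 2. The CPU figure is only what the config requests, not a measurement of what the tasks actually use.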

The information just keeps flowing.

Regards
Robert
ID: 1522916
far_raf

Joined: 26 Apr 00
Posts: 120
Credit: 47,977,058
RAC: 19
Canada
Message 1522919 - Posted: 31 May 2014, 6:17:15 UTC - in response to Message 1522311.  

Hi Jason, my mbcuda.cfg has been edited down to:

"can become simply:

[mbcuda]
processpriority = abovenormal
pfblockspersm = 8
pfperiodsperlaunch = 300


____________"

Per your observation.

TYVM

Regards
Robert
ID: 1522919
far_raf

Joined: 26 Apr 00
Posts: 120
Credit: 47,977,058
RAC: 19
Canada
Message 1522930 - Posted: 31 May 2014, 6:33:13 UTC
Last modified: 31 May 2014, 6:53:20 UTC

This is just a sidebar - of the over 3.6k people who have viewed this thread - I am a little surprised at the level of no comment. Really, you folks, did you have nothing to say? You read the thread, did you get any useful info from it?

Massive amounts of setup info - still no comment?

Huge amounts of info regarding Lunatics - Still no comment?

Do tell please, just asking for your input.

And I would like you Folks to please note, some very very big names have been on thread - not going to drop names. But if they had comments what is your problem?

Total Respect for All.

Robert

edit: for total number of people who have viewed thread.

edit: Title change
ID: 1522930
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1522982 - Posted: 31 May 2014, 12:27:41 UTC - in response to Message 1522930.  
Last modified: 31 May 2014, 12:29:35 UTC

Hey Robert,

Thanks for that info. I don't think the GPU is the problem. Your times to complete weren't bad for either the MB or the APs on the GPU. I can't do 3 WUs on my GPU b/c I'm running with AMD chips. For me it was a case of diminishing returns after 2 work units.

What I was commenting on, and what I think Mike was commenting on, is that the CPU's run times on work units seem long given your chip. It almost seems as if it is not being utilized to its full advantage. 42 hours for an AP on your chip seems really long. 6.7 hours for an MB also seems excessive. My AMD chip takes 3.5 hours for its longest MB, and 14 hours for an AP. My chip isn't as good as yours, so why is it crunching faster? I would think your chip should do an AP in 5.5 hours. See what I mean? So I was wondering whether installing the optimized MB and AP apps that work strictly on the CPU would improve your times. If it's something you are willing to test, then I think we can give you links.


http://mikesworldnet.de/produkte.html

You could look at MB7_SSE2_r171_no_IPP and AP6_CPU_r2083 and see if that speeds up your CPU crunching. I believe these are the correct ones. If I'm wrong, please someone let me know.


On the side note, I think a lot of people do get useful info from these threads. But sometimes they worry about putting anything down here for fear of telling someone the wrong thing that could cause issues with that person's computer. Then there is the feedback that sometimes occurs on here. It can be pretty rough, so rather than have to face that, it's safer not to say anything.

Let me know how your test goes and if you decide on testing the CPU modifications. Hope all goes well.

Happy Crunching..

Zalster
ID: 1522982
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1522997 - Posted: 31 May 2014, 14:22:55 UTC
Last modified: 31 May 2014, 14:41:54 UTC

You are linking to my old site.
Some of the apps are already deprecated and have been removed.

Try
http://mikesworldnet.de/home

Even so, he has a 64-bit OS, so it's better to try the r_2202 64-bit SSE 4.1 version.
It's certainly faster on the W3565.


With each crime and every kindness we birth our future.
ID: 1522997
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1523002 - Posted: 31 May 2014, 14:36:43 UTC - in response to Message 1522997.  

sorry about that..
ID: 1523002
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1523004 - Posted: 31 May 2014, 14:47:07 UTC

Hi guys

I still believe part of his problem is core starvation; he says he is running 8 CPU + 4 GPU WUs at the same time. That is too much for an 8-core processor, especially if the 4 GPU WUs were APs.
ID: 1523004
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1523009 - Posted: 31 May 2014, 15:06:27 UTC - in response to Message 1523004.  
Last modified: 31 May 2014, 15:08:41 UTC

Hi guys

I still believe part of his problem is core starvation; he says he is running 8 CPU + 4 GPU WUs at the same time. That is too much for an 8-core processor, especially if the 4 GPU WUs were APs.


I agree with you Juan, but that's his choice.

I would also suggest freeing 2 CPU cores.
But upgrading to the 64-bit apps doesn't hurt either.

It would definitely help if people would read the readme files that come with the installer.
It took me quite a while to modify them.
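
The core starvation point can be put into rough numbers. The Python sketch below is only a back-of-the-envelope model: the function names are hypothetical, and the CPU share per GPU task (1.0 here, taken from the cpu_usage value in the app_config.xml posted earlier) is a requested figure, not a measurement.

```python
import math


def cpu_demand(cpu_tasks, gpu_tasks, cpu_per_gpu_task):
    """Total cores wanted by CPU tasks plus the CPU share of GPU tasks."""
    return cpu_tasks + gpu_tasks * cpu_per_gpu_task


def oversubscription(logical_cores, cpu_tasks, gpu_tasks, cpu_per_gpu_task):
    """How many cores' worth of work exceed what the machine has (0 if none)."""
    demand = cpu_demand(cpu_tasks, gpu_tasks, cpu_per_gpu_task)
    return max(0.0, demand - logical_cores)


def cpu_tasks_that_fit(logical_cores, gpu_tasks, cpu_per_gpu_task, spare_cores=0):
    """CPU tasks that fit once GPU feeders and any spare cores are set aside."""
    free = logical_cores - gpu_tasks * cpu_per_gpu_task - spare_cores
    return max(0, math.floor(free))


# Figures from the thread: 8 CPU tasks plus 4 GPU AP tasks on 8 logical cores.
print(cpu_demand(8, 4, 1.0))                # cores requested in total
print(oversubscription(8, 8, 4, 1.0))       # excess if each GPU task needs a full core
print(oversubscription(8, 8, 4, 0.5))       # excess under a hypothetical 0.5 core each
print(cpu_tasks_that_fit(8, 4, 1.0))        # CPU tasks left if feeders get full cores
```

How many cores to actually free depends on how much CPU the GPU tasks really use, so treat the output as a starting point for experiments rather than a recommendation.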


With each crime and every kindness we birth our future.
ID: 1523009


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.