New HP Z400 - Lunatics in question
far_raf Joined: 26 Apr 00 Posts: 120 Credit: 47,977,058 RAC: 19 |
Hi Tbar, I had gone through the Lunatics install twice; both times I told it to use cuda 5 or the Kepler hardware (I forget the exact question it asks). What you say makes total sense to me regarding the cuda 3.2 / cuda 5.0 issue; my card can do cuda 6. So I remain confused as to why my cuda 6 capable PC would run cuda 3.2 apps. |
far_raf Joined: 26 Apr 00 Posts: 120 Credit: 47,977,058 RAC: 19 |
I agree with Mike as well, Juan; I have to find out why the cpu ap tasks are taking so long. Juan, you know I went through the install selecting the correct options after the file failure. I remain a bit confused, and I find there are other issues that I do not understand:
1. Why does boinc not always read "mbcuda.cfg" when starting a task?
2. Why do I see reports of tasks starting with cuda 3.2 even now?
3. Why do tasks report on startup that they are using the "fermi path"? (On this one I am really puzzled; I do not get it, the cards are cuda 6.)
I remain your friend. Regards, Robert |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
1. Why does boinc not always read "mbcuda.cfg" when starting a task?

That you need to ask Jason, who created the build, but I believe the answer is that the configuration only changes if you change the GPU or something similar.

2. Why do I see reports of tasks starting with cuda 3.2 even now?

Because, for some unknown reason, your host crunched those WUs with the wrong build.

3. Why do tasks start with a report on start up of using the "fermi path" (on this one I am really puzzled, I do not get it, the cards are cuda 6)?

See the answer to #2. Your card is cuda 6 capable, but that doesn't mean it won't work with previous cuda flavors.

I remain your friend

I'm sure about that.

About the Lunatics installation: when you ran it, did you remember to clear the cuda32 default option and allow cuda50 only? Maybe this is your problem; by default, cuda32 is enabled, IIRC. If your installation is OK, your processed WUs must show cuda50 only, even if your boinc manager shows something different. Another way to check is to look at the WU while it's crunching with Task Manager; it must show Lunatics_x41zc_win32_cuda50.exe in the process list.

About your long run times, Mike is right (as always): your times are extremely long, especially for your powerful CPU, so I have some questions:
- Do you run any other CPU-intensive program at the same time?
- How many CPU tasks are you running in parallel with how many GPU WUs, on how many GPUs? It's hard to believe, but you could be suffering from core "starvation".
- Did you watch the GPU clock? Does it stay stable, or does it drop sometimes? Temperature could have some impact on your host, but you always say the GPUs are running cool, so I believe that is not a player in your case.
- And one last "stupid" question: do you use any screen saver or power management? Remember that the Windows default power management slows everything down after some time to save power. If that's your case, you need to change to the performance mode, which doesn't do that (in your Control Panel power options).

But keep crunching. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
1. Why does boinc not always read "mbcuda.cfg" when starting a task?

*scratching head*. That's not a Boinc file, but an x-branch one. My Windows builds of x41zc, stock and installer, ALL read the mbcuda.cfg file at every task startup. You'll need to show me what you're looking at if you think it is not doing so.

[Edit:] One idea: if you are using stock, then there are different mbcuda.cfg text files per version in the project folder. That's an unfortunate limitation of Boinc I can do nothing about; you should put the same settings in each variant. Running stock could also explain why builds swap around. Boinc doesn't do time estimates all that well (part of the reason it screws up credit too), which means it can sometimes get confused and stuck on the wrong build.

The current (custom) Cuda 6 build I'm running isn't any faster for Kepler GPUs than the Cuda 5 variant, so it's not worth releasing for the sake of a number. That's mostly because I'm bogged down with Boinc problems and with making some special tools to use all the new Cuda 6 features. Once some of those issues are resolved, application development should be much faster (fingers crossed).

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
far_raf Joined: 26 Apr 00 Posts: 120 Credit: 47,977,058 RAC: 19 |
1. Why does boinc not always read "mbcuda.cfg" when starting a task? --> I will. Regards Robert |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
It's 4:20 AM here, so please forgive my English, which is even worse than usual.

2. Why do I see reports of tasks starting with cuda 3.2 even now?

It could be a good idea; something could be corrupted, and it's hard to tell at a distance.

--> All current tasks are mb 7.00 or v7 cuda 50.

That's what we expect.

Do you run any other CPU-intensive program at the same time? How many CPU tasks are you running in parallel with how many GPU WUs, on how many GPUs? It's hard to believe, but you could be suffering from core "starvation".

That could be one of the answers for why your crunching times are so high: you actually can't run all cores and 2 tasks on each GPU at the same time. You need to keep at least one core free to feed the GPUs, plus one additional core for each AP GPU WU you run. So your actual limit (with the builds you are using) is 7 CPU WUs, less 1 per running GPU AP; for example, if your host is crunching 2 GPU AP WUs, your limit will be 5 CPU tasks.

Using all 8 cores plus 4 GPU WUs produces what I call "core starvation": your CPU doesn't have any core available to constantly feed your hungry GPUs, so your entire host slows down. To fix that, release one core to feed the GPUs, and use an app_config.xml file to release one additional core (via cpu_usage) whenever a GPU AP is crunching; your times should then come back down to normal. This is not a Lunatics or stock issue; it's the normal way the older AP crunching code works: it needs one core to feed each GPU AP task, plus one.

Some would say one core per GPU AP is enough and you don't need to leave the additional core on top. I don't run that version anymore, but when I did, at least on my hosts, I needed to leave the additional core. My i5s are a lot less capable than your powerful Xeon, though, and my 690s are hungry beasts, so as always you need to test on your particular host to see what works best. A more technical explanation is above my knowledge, but I'm sure others could explain it if you ask.

There are some new builds that work around and fix this limitation, but that's for another time; first, let's make your host work as expected. Maybe others like Mike, Zalster, etc. could give us some other clues or configuration ideas. |
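Juan's core-reservation advice can be written down as an app_config.xml. This is only a sketch: the element names match the app_config.xml far_raf posts later in the thread, but the usage values here are illustrative and should be tuned per host. With two MB tasks per GPU at 0.5 cpu_usage each, BOINC counts one whole core against the CPU budget per GPU, and each GPU AstroPulse task reserves a full core of its own, so BOINC starts correspondingly fewer CPU tasks:

```xml
<app_config>
  <!-- Multibeam on GPU: two tasks per GPU; together they account for one core -->
  <app>
    <name>setiathome_v7</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
  <!-- AstroPulse on GPU: each task reserves a whole core for itself -->
  <app>
    <name>astropulse_v6</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

Because BOINC only subtracts reserved fractions from the number of CPU tasks it schedules, this is how "freeing a core" is actually expressed: raise cpu_usage until the scheduler leaves the cores you want idle for GPU feeding.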
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Hey Robert and Juan,

Sorry, I've been busy since coming back from the long weekend.

OK, my guess about the cuda 32s is that he had an allotment of them from Seti's testing to determine which cuda was going to work best with his GPU prior to the installation of the Lunatics. After he installed the Lunatics, the computer had to finish those before it started on the cuda 50s. (Again, my best guess.)

If we are going for a clean install, then set Boinc manager to No New Tasks for the project and finish out what he has. Also, before the new install, make sure to suspend the project or set No New Tasks before you run the Lunatics installer; otherwise, Seti is going to download a whole bunch of work before he can even do the installation. Maybe a better way would be to set the default on your account to no new work, so that when you do the new install nothing is downloaded at first. After you finish all the installations, you could change the location (venue) of the computer to a profile that allows it to download work. It seems like a lot to do, but it would ensure that only the cudas you want are downloaded; otherwise you have to work through the others first. That's if you plan on going for the clean install.

Otherwise, since it now appears you have the correct versions of the cudas (50s) and MB, we turn our attention to how many are crunching. Only one of my computers is an Intel. It's a quad core with a single GPU. For that system I figured out that I could run 2 APs with 2 CPU cores supporting them on that single GPU (750ti). That left 2 cores to crunch on their own. For MB, I figured 2 MB with 0.6 core support (or 1 MB with 0.3 core). This configuration is overkill for the MB. When we originally talked, you installed a config file that specified 0.04 core per MB; my number is just easier for me, and I'll explain below. Either way it doesn't change the fact that 3 cores are left unused. Seti will use those 3 unused cores to crunch, and even the 4th core, with only 0.4 of it available, will crunch.

Sounds counterintuitive, doesn't it? But I think of it this way: the two 0.3-core reservations are floated among all 4 cores, and at some point each core does some computing for the GPU. This doesn't happen with the APs, since they require a complete core, so Seti actually gives them their own core to crunch with rather than sharing time on the other cores. So, in short:

AP = 0.5 GPU + 1 CPU, 2 APs per GPU
MB = 0.5 GPU + 0.3 CPU, 2 MBs per GPU

Why 0.3 CPU for the MBs? My other systems use AMD 8-core chips. After experimenting with different settings, I found that, for mine, 0.3 CPU gave the best result for MB crunching. So that 0.3 carried over to the Intel, as I could easily remember the ratio if I ever needed to start from scratch. For my AMD systems I only do 1 AP per GPU: 1 AP = 1 GPU + 1 CPU.

For all these different ratios I use app_config.xml. Have you modified yours since we last talked? What does your config say?

For your system, you have an 8 core with 2 GPUs, correct? How many APs are crunching on the Intel chip, and how many on each card? How many MBs are crunched on the chip, and how many on the GPUs? Going by the above configuration, you should crunch at least 4 APs on the 8 cores and 2 APs on each GPU for 4 more, a total of 8 at a time: 4 + (2x2) = 8 APs. For the MBs, 2 per GPU is 4 MBs; 4 x 0.04 = 0.16, so you should have 8 cores (technically 7.84) free to crunch MBs, giving 12 MBs at the same time: (2x2) + 8 = 12 MBs. Is this close?

After that, we need to look at how long it takes to complete each work unit. APs on my Intel take 1 hour 20 minutes to complete on the GPU; MBs take 23 minutes on the GPU. What are your times looking like? If they are taking longer than that, we might need to look at giving more of a core to the GPU. (I played with the ratios for a bit before deciding on what worked best for my system.)

Sorry for the long reply.

Zalster |
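For reference, the task-count arithmetic in the post above can be checked mechanically. A small sketch; the only inputs are the counts and the 0.04 cpu_usage value quoted in the thread, nothing else:

```python
# Check Zalster's suggested task mix for an 8-core host with two GPUs.
cores = 8
gpus = 2

# AstroPulse: 4 on the CPU, plus 2 per GPU.
ap_cpu = 4
ap_gpu = 2 * gpus
total_ap = ap_cpu + ap_gpu              # 4 + (2 x 2) = 8 APs in flight

# Multibeam: 2 per GPU at 0.04 CPU each; the rest of the cores run MB directly.
mb_gpu = 2 * gpus
mb_gpu_cpu_cost = mb_gpu * 0.04         # 4 x 0.04 = 0.16 of a core reserved
free_cores = cores - mb_gpu_cpu_cost    # "technically 7.84" cores left
mb_cpu = cores                          # BOINC still schedules 8 CPU MB tasks
total_mb = mb_gpu + mb_cpu              # (2 x 2) + 8 = 12 MBs

print(total_ap, round(free_cores, 2), total_mb)
```

The rounding only matters for display; BOINC itself tracks the fractional reservations and rounds down when deciding how many whole CPU tasks to start.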
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
That's exactly what I think is happening: 8 CPU + 4 GPU WUs are too much even for his Xeon CPU (I wish I had some here - LOL). He needs to free some cores and test what his host's best configuration is. My guess: somewhere between 4 CPU + 4 GPU for AP and 6 CPU + 4 GPU for MB. |
far_raf Joined: 26 Apr 00 Posts: 120 Credit: 47,977,058 RAC: 19 |
This post is information for Jason, Juan and Zalster. This is going to be a super long post, sorry.

1. Why does boinc not always read "mbcuda.cfg" when starting a task?

Hi Jason, what follows is a large collection of information from my pending/inconclusive lists on the boinc website. I would like to point out that the astropulse info contains a reference to using the "fermi path". Does the current astropulse build not use Kepler? In any event, here is the information you all asked for.

CONFIGURATION INFORMATION
-------------------------

STOCK MBCUDA.CFG

;;; This configuration file is for optional control of Cuda Multibeam x41zc
;;; Currently, the available options are for
;;; application process priority control (without external tools), and
;;; per gpu priority control (useful for multiple Cuda GPU systems)
;[mbcuda]
;;;;; Global applications settings, to apply to all Cuda devices
;;; You can uncomment the processpriority line below, by removing the ';', to engage machine global priority control of x41x
;;; possible options are 'belownormal' (which is the default), 'normal', 'abovenormal', or 'high'
;;; For dedicated crunching machines, 'abovenormal' is recommended
;;; raising global application priorities above the default
;;; may have system dependant usability effects, and can have positive or negative effects on overall throughput
;processpriority = high
;;; Pulsefinding: Advanced options for long pulsefinds (affect display usability & long kernel runs)
;;; defaults are conservative.
;;; WARNING: Excessive values may induce display lag, driver timeout & recovery, or errors.
;;; pulsefinding blocks per multiprocessor (1-16), default is 1 for Pre-Fermi, 4 for Fermi or newer GPUs
;pfblockspersm = 8
;;; pulsefinding maximum periods per kernel launch (1-1000), default is 100, as per 6.09
;pfperiodsperlaunch = 200
;[bus1slot0]
;;; Optional GPU specific control (requires Cuda 3.2 or newer app), example
;processpriority = abovenormal
;pfblockspersm = 8
;pfperiodsperlaunch = 200

---------------------------

MY MODIFIED MBCUDA.CFG

;;; This configuration file is for optional control of Cuda Multibeam x41zc
;;; Currently, the available options are for
;;; application process priority control (without external tools), and
;;; per gpu priority control (useful for multiple Cuda GPU systems)
[mbcuda]
;;;;; Global applications settings, to apply to all Cuda devices
;;; You can uncomment the processpriority line below, by removing the ';', to engage machine global priority control of x41x
;;; possible options are 'belownormal' (which is the default), 'normal', 'abovenormal', or 'high'
;;; For dedicated crunching machines, 'abovenormal' is recommended
;;; raising global application priorities above the default
;;; may have system dependant usability effects, and can have positive or negative effects on overall throughput
processpriority = high
;;; Pulsefinding: Advanced options for long pulsefinds (affect display usability & long kernel runs)
;;; defaults are conservative.
;;; WARNING: Excessive values may induce display lag, driver timeout & recovery, or errors.
;;; pulsefinding blocks per multiprocessor (1-16), default is 1 for Pre-Fermi, 4 for Fermi or newer GPUs
pfblockspersm = 8
;;; pulsefinding maximum periods per kernel launch (1-1000), default is 100, as per 6.09
pfperiodsperlaunch = 300
[bus15slot0]
;;; Optional GPU specific control (requires Cuda 3.2 or newer app), example
processpriority = abovenormal
pfblockspersm = 8
pfperiodsperlaunch = 300
[bus40slot0]
;;; Optional GPU specific control (requires Cuda 3.2 or newer app), example
processpriority = abovenormal
pfblockspersm = 8
pfperiodsperlaunch = 300

---------------------------

MY MODIFIED MBCUDA-7.00-CUDA50.CFG

;;; This configuration file is for optional control of Cuda Multibeam x41zc
;;; Currently, the available options are for
;;; application process priority control (without external tools), and
;;; per gpu priority control (useful for multiple Cuda GPU systems)
[mbcuda]
;;;;; Global applications settings, to apply to all Cuda devices
;;; You can uncomment the processpriority line below, by removing the ';', to engage machine global priority control of x41x
;;; possible options are 'belownormal' (which is the default), 'normal', 'abovenormal', or 'high'
;;; For dedicated crunching machines, 'abovenormal' is recommended
;;; raising global application priorities above the default
;;; may have system dependant usability effects, and can have positive or negative effects on overall throughput
processpriority = abovenormal
;;; Pulsefinding: Advanced options for long pulsefinds (affect display usability & long kernel runs)
;;; defaults are conservative.
;;; WARNING: Excessive values may induce display lag, driver timeout & recovery, or errors.
;;; pulsefinding blocks per multiprocessor (1-16), default is 1 for Pre-Fermi, 4 for Fermi or newer GPUs
pfblockspersm = 8
;;; pulsefinding maximum periods per kernel launch (1-1000), default is 100, as per 6.09
pfperiodsperlaunch = 300
[bus15slot0]
;;; Optional GPU specific control (requires Cuda 3.2 or newer app), example
processpriority = abovenormal
pfblockspersm = 8
pfperiodsperlaunch = 300
[bus40slot0]
;;; Optional GPU specific control (requires Cuda 3.2 or newer app), example
processpriority = abovenormal
pfblockspersm = 8
pfperiodsperlaunch = 300

----------------------------

APP_CONFIG.XML

<app_config>
    <app>
        <name>setiathome_v7</name>
        <gpu_versions>
            <gpu_usage>0.50</gpu_usage>
            <cpu_usage>0.04</cpu_usage>
        </gpu_versions>
    </app>
    <app>
        <name>astropulse_v6</name>
        <gpu_versions>
            <gpu_usage>0.50</gpu_usage>
            <cpu_usage>1</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

-------------------------------

TASK INFORMATION
-----------------
ASTROPULSE
-----------------

CURRENT - AstroPulse v6 v6.04 (opencl_nvidia_100)
http://setiathome.berkeley.edu/workunit.php?wuid=1509814447 - FROM 05/29/2014

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
Running on device number: 1
DATA_CHUNK_UNROLL at default:2
DATA_CHUNK_UNROLL at default:2
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: NVIDIA Corporation
BOINC assigns device 1
Info: BOINC provided device ID used
Used GPU device parameters are:
Number of compute units: 5
Single buffer allocation size: 256MB
max WG size: 1024
FERMI path used: yes
Build features: Non-graphics OpenCL USE_OPENCL_NV COMBINED_DECHIRP_KERNEL FFTW USE_INCREASED_PRECISION USE_SSE2 x86
CPUID: Intel(R) Xeon(R) CPU W3565 @ 3.20GHz
Cache: L1=64K L2=256K
CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3
AstroPulse v.6 Non-graphics FFTW USE_CONVERSION_OPT Windows x86 rev 1316, V6 match, by Raistmer with support of Lunatics.kwsn.net team.
SSE2 OpenCL version by Raistmer
ffa threshold mods by Joe Segur
SSE3 dechirping by JDWhale
Combined dechirp kernel by Frizz
Number of OpenCL platforms: 1
OpenCL Platform Name: NVIDIA CUDA
Number of devices: 2
Max compute units: 5 | Max work group size: 1024 | Max clock frequency: 1137Mhz
Max memory allocation: 536870912 | Global memory size: 2147483648
Cache type: Read/Write | Cache line size: 128 | Cache size: 81920
Constant buffer size: 65536 | Max number of constant args: 9
Local memory type: Scratchpad | Local memory size: 49152
Queue properties: Out-of-Order: Yes
Name: GeForce GTX 750 Ti | Vendor: NVIDIA Corporation
Driver version: 337.88 | Version: OpenCL 1.1 CUDA
Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
[an identical properties block is reported for the second GeForce GTX 750 Ti]

---------------------------------------------------

LUNATICS - AstroPulse v6 Anonymous platform (NVIDIA GPU)
http://setiathome.berkeley.edu/workunit.php?wuid=1503741667 - FROM 05/21/2014

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
Running on device number: 1
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: NVIDIA Corporation
BOINC assigns device 1
Info: BOINC provided device ID used
Used GPU device parameters are:
Number of compute units: 5
Single buffer allocation size: 256MB
max WG size: 1024
FERMI path used: yes
Build features: Non-graphics OpenCL USE_OPENCL_NV OCL_ZERO_COPY COMBINED_DECHIRP_KERNEL FFTW USE_INCREASED_PRECISION USE_SSE2 x86
CPUID: Intel(R) Xeon(R) CPU W3565 @ 3.20GHz
Cache: L1=64K L2=256K
CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2
AstroPulse v6 Non-graphics FFTW USE_CONVERSION_OPT Windows x86 rev 1843, V6 match, by Raistmer with support of Lunatics.kwsn.net team.
SSE2 OpenCL version by Raistmer
ffa threshold mods by Joe Segur
SSE3 dechirping by JDWhale
Combined dechirp kernel by Frizz
Number of OpenCL platforms: 2
OpenCL Platform Name: NVIDIA CUDA
Number of devices: 2
Max compute units: 5 | Max work group size: 1024 | Max clock frequency: 1137Mhz
Max memory allocation: 536870912 | Global memory size: 2147483648
Cache type: Read/Write | Cache line size: 128 | Cache size: 81920
Constant buffer size: 65536 | Max number of constant args: 9
Local memory type: Scratchpad | Local memory size: 49152
Queue properties: Out-of-Order: Yes
Name: GeForce GTX 750 Ti | Vendor: NVIDIA Corporation
Driver version: 332.17 | Version: OpenCL 1.1 CUDA
Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
[an identical properties block is reported for the second GeForce GTX 750 Ti, and the whole "OpenCL Platform Name: NVIDIA CUDA" section with both devices is then repeated for the second platform]

-------------------------------------------

TASK INFORMATION
-----------------
MULTIBEAM
-----------------

START OF STDERR OUTPUT FOR TASK - http://setiathome.berkeley.edu/workunit.php?wuid=1509186520 - FROM 05/27/2014 - A CURRENT TASK

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
Device 1: GeForce GTX 750 Ti, 2048 MiB, regsPerBlock 65536, computeCap 5.0, multiProcs 5, pciBusID = 40, pciSlotID = 0
Device 2: GeForce GTX 750 Ti, 2048 MiB, regsPerBlock 65536, computeCap 5.0, multiProcs 5, pciBusID = 15, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 750 Ti is okay
SETI@home using CUDA accelerated device GeForce GTX 750 Ti
mbcuda.cfg, matching pci device processpriority key detected
mbcuda.cfg, matching pci device pfblockspersm key detected
pulsefind: blocks per SM 8
mbcuda.cfg, matching pci device pfperiodsperlaunch key detected
pulsefind: periods per launch 300
Priority of process set to ABOVE_NORMAL successfully
Priority of worker thread set successfully
setiathome enhanced x41zc, Cuda 5.00

-----------------------------------------

START OF STDERR OUTPUT FOR TASK - http://setiathome.berkeley.edu/workunit.php?wuid=1506569983 - TASK FROM 05/24/2014

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
Device 1: GeForce GTX 750 Ti, 2048 MiB, regsPerBlock 65536, computeCap 5.0, multiProcs 5, pciBusID = 40, pciSlotID = 0
Device 2: GeForce GTX 750 Ti, 2048 MiB, regsPerBlock 65536, computeCap 5.0, multiProcs 5, pciBusID = 15, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 750 Ti is okay
SETI@home using CUDA accelerated device GeForce GTX 750 Ti
mbcuda.cfg, processpriority key detected
mbcuda.cfg, Global pfblockspersm key being used for this device
pulsefind: blocks per SM 8
mbcuda.cfg, Global pfperiodsperlaunch key being used for this device
pulsefind: periods per launch 200
Priority of process set to ABOVE_NORMAL successfully
Priority of worker thread set successfully
setiathome enhanced x41zc, Cuda 5.00

-------------------------------------------

START OF STDERR OUTPUT FOR TASK - http://setiathome.berkeley.edu/workunit.php?wuid=1502654505 - LUNATICS TASK FROM 05/20/2014

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
Device 1: GeForce GTX 750 Ti, 2048 MiB, regsPerBlock 65536, computeCap 5.0, multiProcs 5, pciBusID = 40, pciSlotID = 0
Device 2: GeForce GTX 750 Ti, 2048 MiB, regsPerBlock 65536, computeCap 5.0, multiProcs 5, pciBusID = 15, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 750 Ti is okay
SETI@home using CUDA accelerated device GeForce GTX 750 Ti
mbcuda.cfg, processpriority key detected
mbcuda.cfg, Global pfblockspersm key being used for this device
pulsefind: blocks per SM 6
mbcuda.cfg, Global pfperiodsperlaunch key being used for this device
pulsefind: periods per launch 200
Priority of process set to ABOVE_NORMAL successfully
Priority of worker thread set successfully
setiathome enhanced x41zc, Cuda 3.20

-------------------------------------------

START OF STDERR OUTPUT FOR TASK - http://setiathome.berkeley.edu/workunit.php?wuid=1495574194 - LUNATICS TASK FROM 05/09/2014

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
Device 1: GeForce GTX 750 Ti, 2048 MiB, regsPerBlock 65536, computeCap 5.0, multiProcs 5, pciBusID = 40, pciSlotID = 0
Device 2: GeForce GTX 750 Ti, 2048 MiB, regsPerBlock 65536, computeCap 5.0, multiProcs 5, pciBusID = 15, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 2
setiathome_CUDA: CUDA Device 2 specified, checking...
Device 2: GeForce GTX 750 Ti is okay
SETI@home using CUDA accelerated device GeForce GTX 750 Ti
pulsefind: blocks per SM 4 (Fermi or newer default)
pulsefind: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully
Priority of worker thread set successfully
setiathome enhanced x41zc, Cuda 3.20

------------------------------------------- |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
...I would like to point out in the astropulse info is a reference to using the "fermi path". Does the current astropulse build not use Kepler?

I think most of the NV OpenCL code is fairly generic, though not being involved with the development of that myself I can't say for certain. Fermi and newer GPUs do have special needs, which is probably why there is a special path named that.

For your mbcuda.cfg's in the stderrs, it looks like they all picked up except for the oldest stderr you posted. I'll just point out, only in case you didn't realise it, that if you want both devices to use the same settings, then you only need the global ones, so (trimmed):

[mbcuda]
processpriority = abovenormal
pfblockspersm = 8
pfperiodsperlaunch = 300
[bus15slot0]
processpriority = abovenormal
pfblockspersm = 8
pfperiodsperlaunch = 300
[bus40slot0]
processpriority = abovenormal
pfblockspersm = 8
pfperiodsperlaunch = 300

can become simply:

[mbcuda]
processpriority = abovenormal
pfblockspersm = 8
pfperiodsperlaunch = 300

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
far_raf Joined: 26 Apr 00 Posts: 120 Credit: 47,977,058 RAC: 19 |
Thanks Jason. As I read through the post after the fact, I saw that the first settings were "global", rendering the specific slot settings redundant. Good eye on your part. edit - I was applying the shotgun approach to the issue, lmbo. Regards Robert |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Just reread this thread. Mike is right: for some reason the chip is processing much slower than I would expect. Would installing the individual optimized apps for MB and AP on the chip make a difference, do you think? His times for the GPUs are good. Zalster |
far_raf Joined: 26 Apr 00 Posts: 120 Credit: 47,977,058 RAC: 19 |
Just as a test of whatever the hell is going on, I adjusted my app_config.xml to:

APP_CONFIG.XML

<app_config>
    <app>
        <name>setiathome_v7</name>
        <gpu_versions>
            <gpu_usage>0.33</gpu_usage> <!-- old 0.50 -->
            <cpu_usage>0.1</cpu_usage> <!-- old 0.04 -->
        </gpu_versions>
    </app>
    <app>
        <name>astropulse_v6</name>
        <gpu_versions>
            <gpu_usage>0.33</gpu_usage> <!-- old 0.50 -->
            <cpu_usage>1</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

-------------------------------

So the gpu is running 3 ap tasks now. I would like all to please note that I had called out 1 cpu core for the ap tasks in both the old and new versions. I did see 7 cpu tasks while there were 4 ap tasks running; this means a cpu core was missing, and I assume it was feeding a gpu. The information just keeps flowing. Regards Robert |
far_raf Joined: 26 Apr 00 Posts: 120 Credit: 47,977,058 RAC: 19 |
Hi Jason, per your observation, my mbcuda.cfg has been edited down to:

[mbcuda]
processpriority = abovenormal
pfblockspersm = 8
pfperiodsperlaunch = 300

TYVM. Regards Robert |
far_raf Send message Joined: 26 Apr 00 Posts: 120 Credit: 47,977,058 RAC: 19 |
This is just a sidebar: of the over 3.6k people who have viewed this thread, I am a little surprised at the lack of comment. Really, you folks, did you have nothing to say? You read the thread - did you get any useful info from it? Massive amounts of setup info - still no comment? Huge amounts of info regarding Lunatics - still no comment? Do tell please, just asking for your input. And please note, some very, very big names have been on this thread - not going to drop names. But if they had comments, what is your problem? Total Respect for All. Robert edit: for total number of people who have viewed thread. edit: Title change |
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Hey Robert, Thanks for that info. I don't think the GPU is the problem. Your times to complete weren't bad for either the MB or AP tasks on the GPU. I can't do 3 WUs on my GPU because I'm running with AMD chips; for me it was a case of diminishing returns after 2 work units. What I was commenting on, and what I think Mike was commenting on, is that the work done by the CPU itself on work units seems long given your chip. It almost seems as if it's not being utilized to its full advantage. 42 hours for an AP on your chip seems really long. The MB at 6.7 hours also seems excessive. My AMD chip will take at most 3.5 hours for an MB; for APs it's 14 hours. My chip isn't as good as yours, so why is it crunching faster? I would think your chip should do an AP in 5.5 hours. See what I mean? So I was wondering whether installing the optimized MB and AP apps that run strictly on the CPU might improve your times. If it's something you are willing to test, then I think we can give you links. http://mikesworldnet.de/produkte.html You could look at MB7_SSE2_r171_no_IPP and AP6_CPU_r2083 and see if that speeds up your CPU crunching. I believe these are the correct ones. If I'm wrong, please someone let me know. On the sidenote part, I think a lot of people do get useful info from these threads. But sometimes they worry about putting anything down here for fear of telling someone the wrong thing that could cause issues with that person's computer. Then there is the feedback that sometimes occurs on here. It can be pretty rough, so rather than have to face that, it's safer not to say anything. Let me know how your test goes and if you decide on testing the CPU modifications. Hope all goes well. Happy Crunching.. Zalster |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
You are linking to my old site. Some of those apps are already deprecated and have been removed. Try http://mikesworldnet.de/home Even so, he has a 64-bit OS, so it's better to try the r_2202 64-bit SSE 4.1 version. It's certainly faster on the W3565. With each crime and every kindness we birth our future. |
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
sorry about that.. |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Hi guys, I still believe part of his problem is core starvation; he says he is running 8 CPU + 4 GPU WUs at the same time. Too much for an 8-core processor, especially when the 4 GPU WUs are APs. |
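The starvation arithmetic juan is pointing at can be sketched in a few lines of Python (numbers assumed from the thread: an 8-core box, 8 CPU tasks, 4 GPU AP tasks, and OpenCL AP tasks typically wanting a full core each to feed the GPU):

```python
def cores_short(n_cores, n_cpu_tasks, n_gpu_tasks, support_per_gpu_task):
    """How many cores the workload demands beyond what the box has.

    A positive result means CPU tasks and GPU feeder threads are
    fighting for cores, which slows everything down.
    """
    demand = n_cpu_tasks + n_gpu_tasks * support_per_gpu_task
    return demand - n_cores

# 8 CPU WUs + 4 GPU AP WUs each needing ~1 support core on 8 cores:
print(cores_short(8, 8, 4, 1.0))  # 4 cores oversubscribed
# Freeing 2 CPU cores (Mike's suggestion) narrows the gap:
print(cores_short(8, 6, 4, 1.0))  # 2 cores oversubscribed
```

Reserving enough cores that the result is zero or below is the usual cure.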
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Hi guys, I agree with you Juan, but that's his choice. I also would suggest freeing 2 CPU cores. And upgrading to the 64-bit apps doesn't hurt either. It would definitely help if people would read the readme files that come with the installer. It took me quite a while to modify them. With each crime and every kindness we birth our future. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.