Problem with multi GPUs
Author | Message |
---|---|
Cliff Harding Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
Re-imaged the i7/950 machine today, but there seems to be a problem using both GPUs (2 x EVGA GTX660SC @ 2Gb). Even though cc_config specifies to use all co-processors and the app_info is set with a count of .5, only 1 GPU is being used. Need help -- any ideas? 08/23/2013 19:54:02 | | Starting BOINC client version 7.0.64 for windows_x86_64 08/23/2013 19:54:02 | | log flags: file_xfer, sched_ops, task, coproc_debug, cpu_sched, dcf_debug 08/23/2013 19:54:02 | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6 08/23/2013 19:54:02 | | Data directory: D:\BOINC 08/23/2013 19:54:02 | | Running under account Cliff Harding 08/23/2013 19:54:02 | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz [Family 6 Model 26 Stepping 5] 08/23/2013 19:54:02 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt syscall nx lm vmx tm2 pbe 08/23/2013 19:54:02 | | OS: Microsoft Windows 7: Ultimate x64 Edition, Service Pack 1, (06.01.7601.00) 08/23/2013 19:54:02 | | Memory: 5.99 GB physical, 6.00 GB virtual 08/23/2013 19:54:02 | | Disk: 298.09 GB total, 278.51 GB free 08/23/2013 19:54:02 | | Local time is UTC -4 hours 08/23/2013 19:54:02 | | CUDA: NVIDIA GPU 0: GeForce GTX 660 (driver version 326.80, CUDA version 5.50, compute capability 3.0, 2048MB, 1964MB available, 2132 GFLOPS peak) 08/23/2013 19:54:02 | | CUDA: NVIDIA GPU 1: GeForce GTX 660 (driver version 326.80, CUDA version 5.50, compute capability 3.0, 2048MB, 1897MB available, 2132 GFLOPS peak) 08/23/2013 19:54:02 | | OpenCL: NVIDIA GPU 0: GeForce GTX 660 (driver version 326.80, device version OpenCL 1.1 CUDA, 2048MB, 1964MB available, 2132 GFLOPS peak) 08/23/2013 19:54:02 | | OpenCL: NVIDIA GPU 1: GeForce GTX 660 (driver version 326.80, device version OpenCL 1.1 CUDA, 2048MB, 1897MB available, 2132 GFLOPS peak) 08/23/2013 19:54:02 | | NVIDIA library reports 2 GPUs 08/23/2013 19:54:02 | | No ATI 
library found. 08/23/2013 19:54:02 | Milkyway@Home | Found app_info.xml; using anonymous platform 08/23/2013 19:54:02 | SETI@home | Found app_info.xml; using anonymous platform 08/23/2013 19:54:02 | | Config: report completed tasks immediately 08/23/2013 19:54:02 | | Config: use all coprocessors 08/23/2013 19:54:02 | | Config: GUI RPCs allowed from: 08/23/2013 19:54:02 | Milkyway@Home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 491386; resource share 0 08/23/2013 19:54:02 | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 5501972; resource share 100 08/23/2013 19:54:02 | SETI@home | General prefs: from SETI@home (last modified 18-Aug-2013 12:00:17) 08/23/2013 19:54:02 | SETI@home | Host location: none 08/23/2013 19:54:02 | SETI@home | General prefs: using your defaults 08/23/2013 19:54:02 | | Reading preferences override file 08/23/2013 19:54:02 | | Preferences: 08/23/2013 19:54:02 | | max memory usage when active: 5520.42MB 08/23/2013 19:54:02 | | max memory usage when idle: 6133.80MB 08/23/2013 19:54:02 | | max disk usage: 200.00GB 08/23/2013 19:54:02 | | max CPUs used: 6 08/23/2013 19:54:02 | | (to change preferences, visit a project web site or select Preferences in the Manager) 08/23/2013 19:54:02 | | Not using a proxy 08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 0 to ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0 08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 0 to ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2 08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 1 to ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0 08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 1 to ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1 08/23/2013 19:54:03 | SETI@home | Restarting task ap_22no08ac_B6_P1_00288_20130807_02611.wu_1 using astropulse_v6 version 601 in slot 5 08/23/2013 19:54:03 | SETI@home | Restarting task 
ap_24mr08ab_B0_P0_00054_20130807_27141.wu_0 using astropulse_v6 version 601 in slot 4 08/23/2013 19:54:03 | SETI@home | Restarting task ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0 using astropulse_v6 version 604 (cuda_opencl_100) in slot 2 08/23/2013 19:54:03 | SETI@home | Restarting task ap_19fe09af_B5_P0_00377_20130713_29783.wu_2 using astropulse_v6 version 601 in slot 3 08/23/2013 19:54:03 | SETI@home | Restarting task ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2 using astropulse_v6 version 604 (cuda_opencl_100) in slot 1 08/23/2013 19:54:03 | SETI@home | Restarting task ap_19se08ac_B0_P1_00061_20130807_29017.wu_0 using astropulse_v6 version 601 in slot 7 08/23/2013 19:54:03 | SETI@home | Restarting task ap_18au08af_B6_P1_00336_20130807_28850.wu_0 using astropulse_v6 version 601 in slot 8 08/23/2013 19:54:03 | SETI@home | Restarting task ap_19au08aa_B1_P1_00180_20130807_07789.wu_0 using astropulse_v6 version 601 in slot 9 08/23/2013 19:55:04 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0 08/23/2013 19:55:04 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2 08/23/2013 19:55:04 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 1 to ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0 08/23/2013 19:55:04 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 1 to ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1 08/23/2013 19:56:03 | SETI@home | Computation for task ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2 finished 08/23/2013 19:56:03 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0 08/23/2013 19:56:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 0 to ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0 08/23/2013 19:56:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 1 to ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1 08/23/2013 19:56:03 | SETI@home | [coproc] Assigning 
0.500000 of NVIDIA instance 1 to ap_08oc08aa_B3_P1_00296_20130818_23672.wu_3 08/23/2013 19:56:03 | SETI@home | Restarting task ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0 using astropulse_v6 version 604 (cuda_opencl_100) in slot 6 08/23/2013 19:56:06 | SETI@home | Started upload of ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2_0 08/23/2013 19:56:10 | SETI@home | Finished upload of ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2_0 08/23/2013 19:56:10 | SETI@home | Sending scheduler request: To report completed tasks. 08/23/2013 19:56:10 | SETI@home | Reporting 1 completed tasks 08/23/2013 19:56:10 | SETI@home | Requesting new tasks for CPU and NVIDIA 08/23/2013 19:56:14 | SETI@home | Scheduler request completed: got 0 new tasks 08/23/2013 19:56:14 | SETI@home | No tasks sent 08/23/2013 19:56:14 | SETI@home | No tasks are available for AstroPulse v6 08/23/2013 19:56:14 | SETI@home | No tasks are available for the applications you have selected. 08/23/2013 19:56:14 | SETI@home | Tasks for AMD/ATI GPU are available, but your preferences are set to not accept them 08/23/2013 19:57:04 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0 08/23/2013 19:57:04 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0 08/23/2013 19:57:04 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 1 to ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1 08/23/2013 19:57:04 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 1 to ap_08oc08aa_B3_P1_00296_20130818_23672.wu_3 08/23/2013 19:58:04 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0 08/23/2013 19:58:04 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0 08/23/2013 19:58:04 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 1 to ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1 08/23/2013 19:58:04 | SETI@home | [coproc] 
Assigning 0.500000 of NVIDIA instance 1 to ap_08oc08aa_B3_P1_00296_20130818_23672.wu_3 08/23/2013 19:59:04 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0 08/23/2013 19:59:04 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0 08/23/2013 19:59:04 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 1 to ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1 08/23/2013 19:59:04 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 1 to ap_08oc08aa_B3_P1_00296_20130818_23672.wu_3 I don't buy computers, I build them!! |
spitfire_mk_2 Joined: 14 Apr 00 Posts: 563 Credit: 27,306,885 RAC: 0 |
in case your cc_config is messed up: http://www.overclock.net/t/827904/how-to-multi-gpus-on-boinc |
Cliff Harding Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
This is the cc_config that I have been using for several years. The only thing that has changed since the system drive died is the log flags, which I set and reset as needed. This problem has only occurred since today's re-image of the system. BOINC and Lunatics are clean installs, with the new app_info overwritten by the old one from before the system problem. <cc_config> <log_flags> <task>1</task> <task_debug>0</task_debug> <file_xfer>1</file_xfer> <sched_ops>1</sched_ops> <coproc_debug>0</coproc_debug> <cpu_sched>1</cpu_sched> <cpu_sched_debug>0</cpu_sched_debug> <dcf_debug>1</dcf_debug> <sched_op_debug>0</sched_op_debug> <state_debug>0</state_debug> <http_debug>0</http_debug> <http_xfer_debug>0</http_xfer_debug> <work_fetch_debug>0</work_fetch_debug> <rr_simulation>0</rr_simulation> </log_flags> <options> <use_all_gpus>1</use_all_gpus> <max_tasks_reported>250</max_tasks_reported> <report_results_immediately>1</report_results_immediately> <http_transfer_timeout>3500</http_transfer_timeout> <max_file_xfers_per_project>10</max_file_xfers_per_project> </options> </cc_config> I don't buy computers, I build them!! |
spitfire_mk_2 Joined: 14 Apr 00 Posts: 563 Credit: 27,306,885 RAC: 0 |
I would suggest making a simple cc_config to see if it works. If the simple one works, then I would guess that your complicated one is the problem. |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Try this simple one: <cc_config> <options> <use_all_gpus>1</use_all_gpus> </options> </cc_config> See if it works, and if not, post the first 20 lines of the startup log with it. Remember to exit BOINC completely before trying (not just boincmgr), to be sure the new file takes effect. |
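Before restarting the client, it can help to confirm that the file is well-formed XML and that the option is really set. The following is a minimal sketch, not part of BOINC: the helper name `use_all_gpus_enabled` is hypothetical, and the config text is embedded as a string only for illustration.

```python
import xml.etree.ElementTree as ET

# The minimal cc_config.xml from the post above, embedded for illustration.
MINIMAL_CC_CONFIG = """<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>"""


def use_all_gpus_enabled(xml_text: str) -> bool:
    """Return True if <options><use_all_gpus> is present and set to 1.

    Hypothetical helper: raises xml.etree.ElementTree.ParseError
    if the text is not well-formed XML.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "cc_config":
        return False
    value = root.findtext("options/use_all_gpus", default="0")
    return value.strip() == "1"


if __name__ == "__main__":
    print(use_all_gpus_enabled(MINIMAL_CC_CONFIG))
```

A passing check only shows the file is readable; the startup log line "Config: use all coprocessors" is what confirms the client picked it up.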
Cliff Harding Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
Try this simple one: I created a cc_config.xml using your example, but still no joy. Thus I think my original cc_config.xml is still valid. I even re-installed Lunatics to see if that helped. 08/23/2013 23:48:21 | | Starting BOINC client version 7.0.64 for windows_x86_64 08/23/2013 23:48:21 | | log flags: file_xfer, sched_ops, task 08/23/2013 23:48:21 | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6 08/23/2013 23:48:21 | | Data directory: D:\BOINC 08/23/2013 23:48:21 | | Running under account Cliff Harding 08/23/2013 23:48:21 | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz [Family 6 Model 26 Stepping 5] 08/23/2013 23:48:21 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt syscall nx lm vmx tm2 pbe 08/23/2013 23:48:21 | | OS: Microsoft Windows 7: Ultimate x64 Edition, Service Pack 1, (06.01.7601.00) 08/23/2013 23:48:21 | | Memory: 5.99 GB physical, 6.00 GB virtual 08/23/2013 23:48:21 | | Disk: 298.09 GB total, 278.37 GB free 08/23/2013 23:48:21 | | Local time is UTC -4 hours 08/23/2013 23:48:21 | | CUDA: NVIDIA GPU 0: GeForce GTX 660 (driver version 326.80, CUDA version 5.50, compute capability 3.0, 2048MB, 1964MB available, 2132 GFLOPS peak) 08/23/2013 23:48:21 | | CUDA: NVIDIA GPU 1: GeForce GTX 660 (driver version 326.80, CUDA version 5.50, compute capability 3.0, 2048MB, 1873MB available, 2132 GFLOPS peak) 08/23/2013 23:48:21 | | OpenCL: NVIDIA GPU 0: GeForce GTX 660 (driver version 326.80, device version OpenCL 1.1 CUDA, 2048MB, 1964MB available, 2132 GFLOPS peak) 08/23/2013 23:48:21 | | OpenCL: NVIDIA GPU 1: GeForce GTX 660 (driver version 326.80, device version OpenCL 1.1 CUDA, 2048MB, 1873MB available, 2132 GFLOPS peak) 08/23/2013 23:48:21 | Milkyway@Home | Found app_info.xml; using anonymous platform 08/23/2013 23:48:21 | SETI@home | Found app_info.xml; using anonymous platform 08/23/2013 23:48:21 | | Config: use all
coprocessors 08/23/2013 23:48:21 | | Config: GUI RPCs allowed from: 08/23/2013 23:48:21 | Milkyway@Home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 491386; resource share 0 08/23/2013 23:48:21 | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 5501972; resource share 100 08/23/2013 23:48:21 | SETI@home | General prefs: from SETI@home (last modified 18-Aug-2013 12:00:17) 08/23/2013 23:48:21 | SETI@home | Host location: none 08/23/2013 23:48:21 | SETI@home | General prefs: using your defaults 08/23/2013 23:48:21 | | Reading preferences override file 08/23/2013 23:48:21 | | Preferences: 08/23/2013 23:48:21 | | max memory usage when active: 5520.42MB 08/23/2013 23:48:21 | | max memory usage when idle: 6133.80MB 08/23/2013 23:48:21 | | max disk usage: 200.00GB 08/23/2013 23:48:21 | | max CPUs used: 6 08/23/2013 23:48:21 | | (to change preferences, visit a project web site or select Preferences in the Manager) 08/23/2013 23:48:21 | | Not using a proxy 08/23/2013 23:48:21 | SETI@home | Restarting task ap_22no08ac_B6_P1_00288_20130807_02611.wu_1 using astropulse_v6 version 601 in slot 5 08/23/2013 23:48:21 | SETI@home | Restarting task ap_24mr08ab_B0_P0_00054_20130807_27141.wu_0 using astropulse_v6 version 601 in slot 4 08/23/2013 23:48:21 | SETI@home | Restarting task ap_19fe09af_B5_P0_00377_20130713_29783.wu_2 using astropulse_v6 version 601 in slot 3 08/23/2013 23:48:21 | SETI@home | Restarting task ap_19se08ac_B0_P1_00061_20130807_29017.wu_0 using astropulse_v6 version 601 in slot 7 08/23/2013 23:48:21 | SETI@home | Restarting task ap_18au08af_B6_P1_00336_20130807_28850.wu_0 using astropulse_v6 version 601 in slot 8 08/23/2013 23:48:21 | SETI@home | Restarting task ap_19au08aa_B1_P1_00180_20130807_07789.wu_0 using astropulse_v6 version 601 in slot 9 08/23/2013 23:48:21 | SETI@home | Restarting task ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1 using astropulse_v6 version 604 (cuda_opencl_100) in slot 0 08/23/2013 23:48:21 | SETI@home | Restarting 
task ap_08oc08aa_B3_P1_00296_20130818_23672.wu_3 using astropulse_v6 version 604 (cuda_opencl_100) in slot 1 I don't buy computers, I build them!! |
tbret Joined: 28 May 99 Posts: 3380 Credit: 296,162,071 RAC: 40 |
When you go into Device Manager, does the computer show two video cards correctly? I have had instances when I had to reinstall the driver from the NVIDIA website to make two work. The other thing might be your app_config.xml, if you have an old one still in the subdirectory (unlikely, but there are entries that could prevent two from running). I'm betting your machine isn't "seeing" the second card. You might have to shut down and cold boot, but check the Device Manager first. Edit: No, that doesn't seem to be it. I'm lost. EDIT EDIT: It says you have eight instances running. What makes you think you aren't using both GPUs? If you really aren't, try freeing up a core or two. |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
I'm with tbret: your log says you have 6 CPU WUs running and 2 APs, 1 AP in slot 0 (the first GPU) and 1 AP in slot 2 (the second GPU), so why do you say you are not using both GPUs? 08/23/2013 23:48:21 | SETI@home | Restarting task ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1 using astropulse_v6 version 604 (cuda_opencl_100) in slot 0 08/23/2013 23:48:21 | SETI@home | Restarting task ap_08oc08aa_B3_P1_00296_20130818_23672.wu_3 using astropulse_v6 version 604 (cuda_opencl_100) in slot 1 You are running only 6 CPU WUs because of this: max CPUs used: 6 My idea in sending a very small cc_config.xml was to rule out any other possible cause; the file I posted is the minimum needed to make multiple GPUs (including different GPUs) work. I'm not saying yours doesn't work, but we are looking for something strange. Config: use all coprocessors Is everything fixed now? If what you are looking for is to run more than one WU on each of the GPUs, you could read: http://setiathome.berkeley.edu/forum_thread.php?id=72507&postid=1400585 But remember that since 7.0.64 the file name has changed; it is now app_config.xml. And just a tip, YMMV: very few top-of-the-class MB/GPUs can run more than one AP at a time on the GPU, and each running AP WU needs a free CPU core. As I see from your log, you run almost only AP, so for better performance you should experiment. At least on my 690/670 2xGPU host I run only 1 AP and 1 MB, or 2 MB, on the GPU at a time (exactly with the file shown in that thread); I tried 2 APs and everything slowed down. |
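The app_config.xml from the linked thread is not reproduced here, so the following is only a sketch of what such a file might look like. It assumes the `<app>`, `<name>`, `<gpu_versions>`, `<gpu_usage>` and `<cpu_usage>` elements described in the BOINC client configuration documentation; the helper name `build_app_config` and the usage values are hypothetical, and the app name `astropulse_v6` is taken from the logs above.

```python
import xml.etree.ElementTree as ET


def build_app_config(apps):
    """Build app_config.xml text from {app_name: (gpu_usage, cpu_usage)}.

    Hypothetical helper; the element names are assumed from the BOINC
    client configuration documentation, not from the linked thread.
    """
    root = ET.Element("app_config")
    for name, (gpu_usage, cpu_usage) in apps.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        gpu_versions = ET.SubElement(app, "gpu_versions")
        ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
        ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative values only: half a GPU and one CPU core per AstroPulse task.
    print(build_app_config({"astropulse_v6": (0.5, 1.0)}))
```

Whether app_config.xml is honored alongside an anonymous-platform app_info.xml on a given client version should be checked against that client's own documentation.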
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
I'm with tbret: your log says you have 6 CPU WUs running and 2 APs, 1 AP in slot 0 (the first GPU) and 1 AP in slot 2 (the second GPU), so why do you say you are not using both GPUs? Misunderstanding there. The 'slots' referred to in the event log are the folders on the hard disk where the temporary working files for the task are stored. It is perfectly possible to run two tasks on a single GPU: their files will still be stored in different slot directories. To identify which GPU is being used, you need to look for a 'device' or 'instance' number: 24/08/2013 10:32:48 | SETI@home | [coproc] NVIDIA instance 1: confirming for 13au09ac.2704.7020.14.12.180_2 24/08/2013 10:32:48 | SETI@home | [coproc] NVIDIA instance 0: confirming for 07mr08ab.27198.12342.16.12.107_2 24/08/2013 10:32:48 | SETI@home | [coproc] NVIDIA instance 0: confirming for 31mr08aa.7403.25021.9.12.193_2 24/08/2013 10:32:48 | SETI@home | [coproc] Assigning 0.480000 of NVIDIA instance 1 to 06fe09ad.11950.10120.13.12.253_2 24/08/2013 10:32:48 | SETI@home | Starting task 06fe09ad.11950.10120.13.12.253_2 using setiathome_v7 version 700 (cuda50) in slot 8 So, that WU started as the second task on the second GPU - but with files stored in slot 8. |
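The instance-versus-slot distinction above can be checked mechanically. Here is a minimal sketch that groups task names by GPU instance, assuming only the two `[coproc]` message shapes quoted in this thread; the function name `tasks_by_instance` is hypothetical, and other client versions may word these messages differently.

```python
import re
from collections import defaultdict

# The two [coproc] message shapes seen in this thread:
#   "[coproc] NVIDIA instance 1: confirming for <task>"
#   "[coproc] Assigning 0.480000 of NVIDIA [free ]instance 1 to <task>"
CONFIRM_RE = re.compile(r"\[coproc\] NVIDIA instance (\d+): confirming for (\S+)")
ASSIGN_RE = re.compile(
    r"\[coproc\] Assigning [\d.]+ of NVIDIA (?:free )?instance (\d+) to (\S+)"
)

# Event log lines quoted in the post above.
SAMPLE_LOG = [
    "24/08/2013 10:32:48 | SETI@home | [coproc] NVIDIA instance 1: confirming for 13au09ac.2704.7020.14.12.180_2",
    "24/08/2013 10:32:48 | SETI@home | [coproc] NVIDIA instance 0: confirming for 07mr08ab.27198.12342.16.12.107_2",
    "24/08/2013 10:32:48 | SETI@home | [coproc] NVIDIA instance 0: confirming for 31mr08aa.7403.25021.9.12.193_2",
    "24/08/2013 10:32:48 | SETI@home | [coproc] Assigning 0.480000 of NVIDIA instance 1 to 06fe09ad.11950.10120.13.12.253_2",
    "24/08/2013 10:32:48 | SETI@home | Starting task 06fe09ad.11950.10120.13.12.253_2 using setiathome_v7 version 700 (cuda50) in slot 8",
]


def tasks_by_instance(log_lines):
    """Map each GPU instance number to the set of task names tied to it."""
    result = defaultdict(set)
    for line in log_lines:
        match = CONFIRM_RE.search(line) or ASSIGN_RE.search(line)
        if match:
            result[int(match.group(1))].add(match.group(2))
    return dict(result)


if __name__ == "__main__":
    for instance, tasks in sorted(tasks_by_instance(SAMPLE_LOG).items()):
        print(instance, sorted(tasks))
```

The "in slot 8" part of the last line is deliberately ignored: the slot number says where the working files live, not which GPU runs the task.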
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Yes, but his first log already says he is using both GPUs, and my config file can't change that. 08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 0 to ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0 08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 0 to ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2 08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 1 to ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0 08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 1 to ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1 Config: use all coprocessors He is running few CPU WUs; everything points to him really running 1 AP on each of the GPUs, since BOINC first starts all the GPU tasks it can before starting the CPU WUs. That's why I believe tbret asked him to post his manager screen. But I believe the question is not whether he is running on both GPUs; what he really wants to do is run more than 1 WU on each GPU, which is why I pointed him to the other thread. Or am I wrong? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Yes, but his first log already says he is using both GPUs, and my config file can't change that. That's fine - from that log, you can tell that 'instance 0' and 'instance 1' are both in use - with two tasks each, same as mine. But you cannot draw the same conclusion from the 'Restarting task ... in slot n' log you posted. Your wording "slot 0 (the first GPU) ... slot 2 (the second GPU)" taken out of context (as I did) might mislead future readers. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Oh dear. Go back to Cliff's original log, before the useful information was removed from it by the simplified cc_config.xml file. It started OK: 08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 0 to ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0 08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 0 to ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2 08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 1 to ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0 08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 1 to ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1 Four different tasks, two GPUs, allocated two per GPU. Two of the GPU tasks started OK: 08/23/2013 19:54:03 | SETI@home | Restarting task ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0 using astropulse_v6 version 604 (cuda_opencl_100) in slot 2 08/23/2013 19:54:03 | SETI@home | Restarting task ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2 using astropulse_v6 version 604 (cuda_opencl_100) in slot 1 Comparing the WU names carefully, the two which started are the two assigned to instance 0: _00259_ _00151_ which is confirmed by the later attempts to re-assign _00153_ and _00154_ to instance 1: 08/23/2013 19:55:04 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0 08/23/2013 19:55:04 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2 08/23/2013 19:55:04 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 1 to ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0 08/23/2013 19:55:04 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 1 to ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1 In fact, none of the task assignments to instance 1 succeed in the log we can see, although BOINC knows all about it and is ready, willing, and able to use it. 
Looking at the file 'ReadMe_AstroPulse_OpenCL_NV.txt' supplied by the Lunatics installer, I see: -instances_per_device N :Sets allowed number of simultaneously executed GPU app instances per GPU device (shared with MultiBeam app instances). I'm wondering whether Raistmer has built that application with a default value of N=1, thus limiting a 2-GPU machine to two tasks at once. But it looks like a bug, if both tasks run on instance 0 and no tasks run on instance 1. |
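The comparison of WU names above (assigned versus actually started) can be sketched as a small script. This is an illustration only: the function name `assigned_but_not_started` is hypothetical, and the patterns assume the message wording quoted in this thread. Tasks that were already running before the log excerpt begins have no start line and would be reported too, so the excerpt should begin at client startup.

```python
import re

ASSIGN_RE = re.compile(
    r"\[coproc\] Assigning [\d.]+ of NVIDIA (?:free )?instance (\d+) to (\S+)"
)
START_RE = re.compile(r"(?:Restarting|Starting) task (\S+) using")

# Lines from Cliff's original log, as quoted in the post above.
SAMPLE_LOG = [
    "08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 0 to ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0",
    "08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 0 to ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2",
    "08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 1 to ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0",
    "08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 1 to ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1",
    "08/23/2013 19:54:03 | SETI@home | Restarting task ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0 using astropulse_v6 version 604 (cuda_opencl_100) in slot 2",
    "08/23/2013 19:54:03 | SETI@home | Restarting task ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2 using astropulse_v6 version 604 (cuda_opencl_100) in slot 1",
]


def assigned_but_not_started(log_lines):
    """Return {task: instance} for tasks with an assignment but no start line."""
    assigned = {}
    started = set()
    for line in log_lines:
        match = ASSIGN_RE.search(line)
        if match:
            # Keep the most recent instance proposed for this task.
            assigned[match.group(2)] = int(match.group(1))
            continue
        match = START_RE.search(line)
        if match:
            started.add(match.group(1))
    return {task: inst for task, inst in assigned.items() if task not in started}


if __name__ == "__main__":
    for task, instance in assigned_but_not_started(SAMPLE_LOG).items():
        print(f"instance {instance}: {task} assigned but never started")
```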
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Looking at the file 'ReadMe_AstroPulse_OpenCL_NV.txt' supplied by the Lunatics installer, I see: I'm not totally sure, but I believe I ran 2 APs at a time on my multi-GPU hosts and it worked; when I tried it, though, 2 APs ran slower than 1 (I believe because my i5 has too few free cores to drive the GPUs). I have no more AP work available in any of my caches to test that on my multi-GPU hosts. Maybe someone else could do the test and share the results with us. But if Cliff wants to check whether both GPUs are really running, and help us verify your bug theory, he could start BOINC with the app_config.xml I supplied in the other thread. It will start 1 (only one) AP + 1 MB, or 2 MB, on each of his GPUs, which answers the first question: whether both GPUs are already working. After that, he can change the 0.51 GPU usage to 0.50 for AP and re-run. If 2 APs start, the problem is somewhere else in one of his app files; if not, everything points to the bug really existing. Do you think it is worth trying? |
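The reason 0.51 versus 0.50 matters is simple arithmetic: two tasks that each claim 0.51 of a GPU add up to more than one whole device, while two at 0.50 fit exactly. A minimal sketch of that calculation follows. It is a simplification, since the real client scheduler weighs more than this, and the function name `max_same_tasks_per_gpu` is hypothetical. `Fraction` is used so the decimal values are exact rather than rounded floats.

```python
from fractions import Fraction


def max_same_tasks_per_gpu(gpu_usage: str) -> int:
    """How many tasks with this per-task GPU usage fit on one GPU.

    Simplified model: tasks fit as long as their usages sum to at most 1.
    """
    return int(1 // Fraction(gpu_usage))


if __name__ == "__main__":
    for usage in ("0.51", "0.50"):
        print(usage, max_same_tasks_per_gpu(usage))
```

Under this model, 0.51 allows only one AstroPulse task per GPU, and 0.50 allows two, which is the switch the proposed test relies on.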
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Good plan. I'd recommend that any tester helping us out sets that <coproc_debug>1</coproc_debug> log flag so they can see exactly which task is running on which device (instance). I'd question whether that "(shared with MultiBeam app instances)" applies on NVidia cards. My understanding is that the "-instances_per_device N" switch is for OpenCL applications only, because of their high demand for CPU support: it is neither needed nor supported for CUDA applications. |
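For testers who prefer not to hand-edit the file, the log flag mentioned above can be switched on with a short script. This is only a sketch: the helper name `enable_log_flag` is hypothetical, and it operates on the XML text rather than on any particular file path. The client must still re-read its config file or be restarted before the flag takes effect.

```python
import xml.etree.ElementTree as ET


def enable_log_flag(xml_text: str, flag: str) -> str:
    """Return cc_config XML text with <log_flags><flag> set to 1.

    Hypothetical helper: creates <log_flags> or the flag element if missing.
    """
    root = ET.fromstring(xml_text)
    log_flags = root.find("log_flags")
    if log_flags is None:
        log_flags = ET.SubElement(root, "log_flags")
    element = log_flags.find(flag)
    if element is None:
        element = ET.SubElement(log_flags, flag)
    element.text = "1"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    minimal = "<cc_config><options><use_all_gpus>1</use_all_gpus></options></cc_config>"
    print(enable_log_flag(minimal, "coproc_debug"))
```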
Cruncher-American Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
Maybe use GPU-Z to check if both cards are actually in use (independent of BOINC)? |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
BOINC actually says both cards are OK and ready to use, and it even assigns tasks to both. The question is why the work starts on the first GPU (instance) and not on the second, if both GPUs are enabled and ready to use, as the logs apparently show. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
I've found the stderr_txt for the tasks which failed to start (both eventually completed and validated). It's too long to post in full here, but there seem to be a lot of state.fold_buf_size_short=65536; state.fold_buf_size_long=262144 Running on device number: 1 DATA_CHUNK_UNROLL set to:12 FFA thread block override value:8192 FFA thread fetchblock override value:4096 Priority of worker thread raised successfully Priority of process adjusted successfully, high priority class used OpenCL platform detected: NVIDIA Corporation BOINC assigns device 1 Info: BOINC provided device ID used Used GPU device parameters are: Number of compute units: 5 Single buffer allocation size: 256MB max WG size: 1024 FERMI path used: yes Build features: Non-graphics OpenCL USE_OPENCL_NV OCL_ZERO_COPY COMBINED_DECHIRP_KERNEL FFTW USE_INCREASED_PRECISION USE_SSE2 x86 CPUID: Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz Cache: L1=64K L2=256K CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2 ### Restart at 31.53 percent. state.fold_buf_size_short=65536; state.fold_buf_size_long=262144 GPU device synched Termination request detected or computations are finished. GPU device synched, exiting... Unfortunately, not really enough detail given in that final 'termination or finished - exiting' line to confirm or disprove my theory. I'll pass the full things to Raistmer. |
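Since the full stderr is too long to read by eye, the two facts of interest here (which device the app ran on, and where it restarted) can be pulled out with a short script. A minimal sketch, assuming only the "Running on device number:" and "Restart at ... percent" wording shown in the excerpt above; the function name `summarize_stderr` is hypothetical.

```python
import re

DEVICE_RE = re.compile(r"Running on device number:\s*(\d+)")
RESTART_RE = re.compile(r"Restart at ([\d.]+) percent")

# Shortened excerpt of the stderr text quoted in the post above.
SAMPLE_STDERR = (
    "state.fold_buf_size_short=65536; state.fold_buf_size_long=262144 "
    "Running on device number: 1 DATA_CHUNK_UNROLL set to:12 "
    "OpenCL platform detected: NVIDIA Corporation BOINC assigns device 1 "
    "Info: BOINC provided device ID used "
    "### Restart at 31.53 percent. "
    "GPU device synched, exiting..."
)


def summarize_stderr(text):
    """Return (device_numbers, restart_percentages) found in stderr text."""
    devices = [int(n) for n in DEVICE_RE.findall(text)]
    restarts = [float(p) for p in RESTART_RE.findall(text)]
    return devices, restarts


if __name__ == "__main__":
    print(summarize_stderr(SAMPLE_STDERR))
```

Many restart entries against the same device number, with little progress between them, would fit the pattern of a task being launched on instance 1 and exiting again.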
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
I'm curious what he will report from his test. But he must be sleeping now if he lives in the US. |
spitfire_mk_2 Joined: 14 Apr 00 Posts: 563 Credit: 27,306,885 RAC: 0 |
Looking at this part, it tells me that both cards are being used. Here is an example where only one card is used: 8/21/2013 12:27:54 AM | | CUDA: NVIDIA GPU 0: GeForce GTX 460 (driver version 320.18, CUDA version 5.50, compute capability 2.1, 2048MB, 2002MB available, 874 GFLOPS peak) 8/21/2013 12:27:54 AM | | CUDA: NVIDIA GPU 1 (not used): GeForce 8400 (driver version 320.18, CUDA version 5.50, compute capability 1.1, 128MB, 102MB available, 31 GFLOPS peak) 8/21/2013 12:27:54 AM | | OpenCL: NVIDIA GPU 0: GeForce GTX 460 (driver version 320.18, device version OpenCL 1.1 CUDA, 2048MB, 2002MB available, 874 GFLOPS peak) 8/21/2013 12:27:54 AM | | OpenCL: NVIDIA GPU 1 (not used): GeForce 8400 (driver version 320.18, device version OpenCL 1.0 CUDA, 128MB, 102MB available, 31 GFLOPS peak) Notice that I have a message saying that one of the two cards is not used. This message is provided by the client. Cliff does not have this message. |
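The "(not used)" marker described above is easy to scan for in a startup log. A minimal sketch, assuming the CUDA detection lines are worded as in the example in this post; the function name `gpu_usage_flags` is hypothetical.

```python
import re

# Matches e.g. "CUDA: NVIDIA GPU 1 (not used): GeForce 8400 (driver version ..."
GPU_RE = re.compile(r"CUDA: NVIDIA GPU (\d+)( \(not used\))?: ([^(]+?) \(driver")

# The two CUDA lines from the example above.
SAMPLE_LOG = [
    "8/21/2013 12:27:54 AM | | CUDA: NVIDIA GPU 0: GeForce GTX 460 (driver version 320.18, CUDA version 5.50, compute capability 2.1, 2048MB, 2002MB available, 874 GFLOPS peak)",
    "8/21/2013 12:27:54 AM | | CUDA: NVIDIA GPU 1 (not used): GeForce 8400 (driver version 320.18, CUDA version 5.50, compute capability 1.1, 128MB, 102MB available, 31 GFLOPS peak)",
]


def gpu_usage_flags(log_lines):
    """Return {gpu_index: (model_name, is_used)} from CUDA detection lines."""
    gpus = {}
    for line in log_lines:
        match = GPU_RE.search(line)
        if match:
            index = int(match.group(1))
            is_used = match.group(2) is None  # no "(not used)" marker present
            gpus[index] = (match.group(3), is_used)
    return gpus


if __name__ == "__main__":
    for index, (name, is_used) in sorted(gpu_usage_flags(SAMPLE_LOG).items()):
        print(index, name, "used" if is_used else "not used")
```

Run against Cliff's startup log, both GTX 660 entries would come back as used, which matches the observation that the client itself is not excluding the second card.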
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Cliff does not have this message. That's exactly why I asked him to test whether the same thing happens with 1 AP + 1 MB (or 2 MB) on each GPU, so we can verify Richard's bug theory or find out whether the problem is somewhere in the app files. |