Problem with multi GPUs


Message boards : Number crunching : Problem with multi GPUs

Cliff Harding (Project donor)
Volunteer tester
Joined: 18 Aug 99
Posts: 1026
Credit: 53,858,828
RAC: 16,358
United States
Message 1407122 - Posted: 24 Aug 2013, 0:00:47 UTC

I re-imaged the i7/950 machine today, but there seems to be a problem using both GPUs (2 x EVGA GTX660SC @ 2 GB). Even though cc_config.xml specifies the use of all coprocessors and app_info.xml is set with a count of 0.5, only one GPU is being used. Need help -- any ideas?

08/23/2013 19:54:02 | | Starting BOINC client version 7.0.64 for windows_x86_64
08/23/2013 19:54:02 | | log flags: file_xfer, sched_ops, task, coproc_debug, cpu_sched, dcf_debug
08/23/2013 19:54:02 | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6
08/23/2013 19:54:02 | | Data directory: D:\BOINC
08/23/2013 19:54:02 | | Running under account Cliff Harding
08/23/2013 19:54:02 | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz [Family 6 Model 26 Stepping 5]
08/23/2013 19:54:02 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt syscall nx lm vmx tm2 pbe
08/23/2013 19:54:02 | | OS: Microsoft Windows 7: Ultimate x64 Edition, Service Pack 1, (06.01.7601.00)
08/23/2013 19:54:02 | | Memory: 5.99 GB physical, 6.00 GB virtual
08/23/2013 19:54:02 | | Disk: 298.09 GB total, 278.51 GB free
08/23/2013 19:54:02 | | Local time is UTC -4 hours
08/23/2013 19:54:02 | | CUDA: NVIDIA GPU 0: GeForce GTX 660 (driver version 326.80, CUDA version 5.50, compute capability 3.0, 2048MB, 1964MB available, 2132 GFLOPS peak)
08/23/2013 19:54:02 | | CUDA: NVIDIA GPU 1: GeForce GTX 660 (driver version 326.80, CUDA version 5.50, compute capability 3.0, 2048MB, 1897MB available, 2132 GFLOPS peak)
08/23/2013 19:54:02 | | OpenCL: NVIDIA GPU 0: GeForce GTX 660 (driver version 326.80, device version OpenCL 1.1 CUDA, 2048MB, 1964MB available, 2132 GFLOPS peak)
08/23/2013 19:54:02 | | OpenCL: NVIDIA GPU 1: GeForce GTX 660 (driver version 326.80, device version OpenCL 1.1 CUDA, 2048MB, 1897MB available, 2132 GFLOPS peak)
08/23/2013 19:54:02 | | NVIDIA library reports 2 GPUs
08/23/2013 19:54:02 | | No ATI library found.
08/23/2013 19:54:02 | Milkyway@Home | Found app_info.xml; using anonymous platform
08/23/2013 19:54:02 | SETI@home | Found app_info.xml; using anonymous platform
08/23/2013 19:54:02 | | Config: report completed tasks immediately
08/23/2013 19:54:02 | | Config: use all coprocessors
08/23/2013 19:54:02 | | Config: GUI RPCs allowed from:
08/23/2013 19:54:02 | Milkyway@Home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 491386; resource share 0
08/23/2013 19:54:02 | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 5501972; resource share 100
08/23/2013 19:54:02 | SETI@home | General prefs: from SETI@home (last modified 18-Aug-2013 12:00:17)
08/23/2013 19:54:02 | SETI@home | Host location: none
08/23/2013 19:54:02 | SETI@home | General prefs: using your defaults
08/23/2013 19:54:02 | | Reading preferences override file
08/23/2013 19:54:02 | | Preferences:
08/23/2013 19:54:02 | | max memory usage when active: 5520.42MB
08/23/2013 19:54:02 | | max memory usage when idle: 6133.80MB
08/23/2013 19:54:02 | | max disk usage: 200.00GB
08/23/2013 19:54:02 | | max CPUs used: 6
08/23/2013 19:54:02 | | (to change preferences, visit a project web site or select Preferences in the Manager)
08/23/2013 19:54:02 | | Not using a proxy
08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 0 to ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0
08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 0 to ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2
08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 1 to ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0
08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 1 to ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1
08/23/2013 19:54:03 | SETI@home | Restarting task ap_22no08ac_B6_P1_00288_20130807_02611.wu_1 using astropulse_v6 version 601 in slot 5
08/23/2013 19:54:03 | SETI@home | Restarting task ap_24mr08ab_B0_P0_00054_20130807_27141.wu_0 using astropulse_v6 version 601 in slot 4
08/23/2013 19:54:03 | SETI@home | Restarting task ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0 using astropulse_v6 version 604 (cuda_opencl_100) in slot 2
08/23/2013 19:54:03 | SETI@home | Restarting task ap_19fe09af_B5_P0_00377_20130713_29783.wu_2 using astropulse_v6 version 601 in slot 3
08/23/2013 19:54:03 | SETI@home | Restarting task ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2 using astropulse_v6 version 604 (cuda_opencl_100) in slot 1
08/23/2013 19:54:03 | SETI@home | Restarting task ap_19se08ac_B0_P1_00061_20130807_29017.wu_0 using astropulse_v6 version 601 in slot 7
08/23/2013 19:54:03 | SETI@home | Restarting task ap_18au08af_B6_P1_00336_20130807_28850.wu_0 using astropulse_v6 version 601 in slot 8
08/23/2013 19:54:03 | SETI@home | Restarting task ap_19au08aa_B1_P1_00180_20130807_07789.wu_0 using astropulse_v6 version 601 in slot 9
08/23/2013 19:55:04 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0
08/23/2013 19:55:04 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2
08/23/2013 19:55:04 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 1 to ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0
08/23/2013 19:55:04 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 1 to ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1
08/23/2013 19:56:03 | SETI@home | Computation for task ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2 finished
08/23/2013 19:56:03 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0
08/23/2013 19:56:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 0 to ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0
08/23/2013 19:56:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 1 to ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1
08/23/2013 19:56:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 1 to ap_08oc08aa_B3_P1_00296_20130818_23672.wu_3
08/23/2013 19:56:03 | SETI@home | Restarting task ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0 using astropulse_v6 version 604 (cuda_opencl_100) in slot 6
08/23/2013 19:56:06 | SETI@home | Started upload of ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2_0
08/23/2013 19:56:10 | SETI@home | Finished upload of ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2_0
08/23/2013 19:56:10 | SETI@home | Sending scheduler request: To report completed tasks.
08/23/2013 19:56:10 | SETI@home | Reporting 1 completed tasks
08/23/2013 19:56:10 | SETI@home | Requesting new tasks for CPU and NVIDIA
08/23/2013 19:56:14 | SETI@home | Scheduler request completed: got 0 new tasks
08/23/2013 19:56:14 | SETI@home | No tasks sent
08/23/2013 19:56:14 | SETI@home | No tasks are available for AstroPulse v6
08/23/2013 19:56:14 | SETI@home | No tasks are available for the applications you have selected.
08/23/2013 19:56:14 | SETI@home | Tasks for AMD/ATI GPU are available, but your preferences are set to not accept them
08/23/2013 19:57:04 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0
08/23/2013 19:57:04 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0
08/23/2013 19:57:04 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 1 to ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1
08/23/2013 19:57:04 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 1 to ap_08oc08aa_B3_P1_00296_20130818_23672.wu_3
08/23/2013 19:58:04 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0
08/23/2013 19:58:04 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0
08/23/2013 19:58:04 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 1 to ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1
08/23/2013 19:58:04 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 1 to ap_08oc08aa_B3_P1_00296_20130818_23672.wu_3
08/23/2013 19:59:04 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0
08/23/2013 19:59:04 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0
08/23/2013 19:59:04 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 1 to ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1
08/23/2013 19:59:04 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 1 to ap_08oc08aa_B3_P1_00296_20130818_23672.wu_3

____________


I don't buy computers, I build them!!

spitfire_mk_2
Joined: 14 Apr 00
Posts: 461
Credit: 13,114,693
RAC: 6,289
United States
Message 1407127 - Posted: 24 Aug 2013, 0:22:26 UTC

In case your cc_config is messed up: http://www.overclock.net/t/827904/how-to-multi-gpus-on-boinc
____________

Cliff Harding (Project donor)
Volunteer tester
Joined: 18 Aug 99
Posts: 1026
Credit: 53,858,828
RAC: 16,358
United States
Message 1407129 - Posted: 24 Aug 2013, 0:34:01 UTC

This is the cc_config that I have been using for several years. The only thing that has changed since the system drive died is when I set/reset the log flags. This problem has only occurred since today's re-image of the system. BOINC and Lunatics are clean installs, with the new app_info overlaid by the old one from before the system problem.

<cc_config>
  <log_flags>
    <task>1</task>
    <task_debug>0</task_debug>
    <file_xfer>1</file_xfer>
    <sched_ops>1</sched_ops>
    <coproc_debug>0</coproc_debug>
    <cpu_sched>1</cpu_sched>
    <cpu_sched_debug>0</cpu_sched_debug>
    <dcf_debug>1</dcf_debug>
    <sched_op_debug>0</sched_op_debug>
    <state_debug>0</state_debug>
    <http_debug>0</http_debug>
    <http_xfer_debug>0</http_xfer_debug>
    <work_fetch_debug>0</work_fetch_debug>
    <rr_simulation>0</rr_simulation>
  </log_flags>
  <options>
    <use_all_gpus>1</use_all_gpus>
    <max_tasks_reported>250</max_tasks_reported>
    <report_results_immediately>1</report_results_immediately>
    <http_transfer_timeout>3500</http_transfer_timeout>
    <max_file_xfers_per_project>10</max_file_xfers_per_project>
  </options>
</cc_config>
____________


I don't buy computers, I build them!!

spitfire_mk_2
Joined: 14 Apr 00
Posts: 461
Credit: 13,114,693
RAC: 6,289
United States
Message 1407132 - Posted: 24 Aug 2013, 0:49:44 UTC - in response to Message 1407129.

I would suggest making a simple cc_config to see if it works. If the simple one works, then I would guess that your complicated one is the problem.
____________

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5471
Credit: 313,429,155
RAC: 130,246
Brazil
Message 1407136 - Posted: 24 Aug 2013, 1:12:14 UTC
Last modified: 24 Aug 2013, 1:17:37 UTC

Try this simple one:

<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>

See if it works, and if not, post the first 20 lines of the startup log with it.

Remember to exit BOINC completely before trying (not just boincmgr), to be sure it's working properly.
____________

Cliff Harding (Project donor)
Volunteer tester
Joined: 18 Aug 99
Posts: 1026
Credit: 53,858,828
RAC: 16,358
United States
Message 1407164 - Posted: 24 Aug 2013, 3:55:23 UTC - in response to Message 1407136.

Try this simple one:

<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>

See if it works, and if not, post the first 20 lines of the startup log with it.

Remember to exit BOINC completely before trying (not just boincmgr), to be sure it's working properly.


I created a cc_config.xml using your example, still no joy. Thus I think my original cc_config.xml is still valid. I even re-installed Lunatics to see if that helped.

08/23/2013 23:48:21 | | Starting BOINC client version 7.0.64 for windows_x86_64
08/23/2013 23:48:21 | | log flags: file_xfer, sched_ops, task
08/23/2013 23:48:21 | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6
08/23/2013 23:48:21 | | Data directory: D:\BOINC
08/23/2013 23:48:21 | | Running under account Cliff Harding
08/23/2013 23:48:21 | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz [Family 6 Model 26 Stepping 5]
08/23/2013 23:48:21 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt syscall nx lm vmx tm2 pbe
08/23/2013 23:48:21 | | OS: Microsoft Windows 7: Ultimate x64 Edition, Service Pack 1, (06.01.7601.00)
08/23/2013 23:48:21 | | Memory: 5.99 GB physical, 6.00 GB virtual
08/23/2013 23:48:21 | | Disk: 298.09 GB total, 278.37 GB free
08/23/2013 23:48:21 | | Local time is UTC -4 hours
08/23/2013 23:48:21 | | CUDA: NVIDIA GPU 0: GeForce GTX 660 (driver version 326.80, CUDA version 5.50, compute capability 3.0, 2048MB, 1964MB available, 2132 GFLOPS peak)
08/23/2013 23:48:21 | | CUDA: NVIDIA GPU 1: GeForce GTX 660 (driver version 326.80, CUDA version 5.50, compute capability 3.0, 2048MB, 1873MB available, 2132 GFLOPS peak)
08/23/2013 23:48:21 | | OpenCL: NVIDIA GPU 0: GeForce GTX 660 (driver version 326.80, device version OpenCL 1.1 CUDA, 2048MB, 1964MB available, 2132 GFLOPS peak)
08/23/2013 23:48:21 | | OpenCL: NVIDIA GPU 1: GeForce GTX 660 (driver version 326.80, device version OpenCL 1.1 CUDA, 2048MB, 1873MB available, 2132 GFLOPS peak)
08/23/2013 23:48:21 | Milkyway@Home | Found app_info.xml; using anonymous platform
08/23/2013 23:48:21 | SETI@home | Found app_info.xml; using anonymous platform
08/23/2013 23:48:21 | | Config: use all coprocessors
08/23/2013 23:48:21 | | Config: GUI RPCs allowed from:
08/23/2013 23:48:21 | Milkyway@Home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 491386; resource share 0
08/23/2013 23:48:21 | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 5501972; resource share 100
08/23/2013 23:48:21 | SETI@home | General prefs: from SETI@home (last modified 18-Aug-2013 12:00:17)
08/23/2013 23:48:21 | SETI@home | Host location: none
08/23/2013 23:48:21 | SETI@home | General prefs: using your defaults
08/23/2013 23:48:21 | | Reading preferences override file
08/23/2013 23:48:21 | | Preferences:
08/23/2013 23:48:21 | | max memory usage when active: 5520.42MB
08/23/2013 23:48:21 | | max memory usage when idle: 6133.80MB
08/23/2013 23:48:21 | | max disk usage: 200.00GB
08/23/2013 23:48:21 | | max CPUs used: 6
08/23/2013 23:48:21 | | (to change preferences, visit a project web site or select Preferences in the Manager)
08/23/2013 23:48:21 | | Not using a proxy
08/23/2013 23:48:21 | SETI@home | Restarting task ap_22no08ac_B6_P1_00288_20130807_02611.wu_1 using astropulse_v6 version 601 in slot 5
08/23/2013 23:48:21 | SETI@home | Restarting task ap_24mr08ab_B0_P0_00054_20130807_27141.wu_0 using astropulse_v6 version 601 in slot 4
08/23/2013 23:48:21 | SETI@home | Restarting task ap_19fe09af_B5_P0_00377_20130713_29783.wu_2 using astropulse_v6 version 601 in slot 3
08/23/2013 23:48:21 | SETI@home | Restarting task ap_19se08ac_B0_P1_00061_20130807_29017.wu_0 using astropulse_v6 version 601 in slot 7
08/23/2013 23:48:21 | SETI@home | Restarting task ap_18au08af_B6_P1_00336_20130807_28850.wu_0 using astropulse_v6 version 601 in slot 8
08/23/2013 23:48:21 | SETI@home | Restarting task ap_19au08aa_B1_P1_00180_20130807_07789.wu_0 using astropulse_v6 version 601 in slot 9
08/23/2013 23:48:21 | SETI@home | Restarting task ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1 using astropulse_v6 version 604 (cuda_opencl_100) in slot 0
08/23/2013 23:48:21 | SETI@home | Restarting task ap_08oc08aa_B3_P1_00296_20130818_23672.wu_3 using astropulse_v6 version 604 (cuda_opencl_100) in slot 1

____________


I don't buy computers, I build them!!

tbret (Project donor)
Volunteer tester
Joined: 28 May 99
Posts: 2897
Credit: 218,381,374
RAC: 62,793
United States
Message 1407172 - Posted: 24 Aug 2013, 5:10:29 UTC - in response to Message 1407164.
Last modified: 24 Aug 2013, 5:15:59 UTC

When you go into Device Manager does the computer show two video cards correctly?

I have had instances when I had to reinstall the driver from the NVIDIA website to make two work.

The other thing might be your app_config.xml if you have an old one still in the subdirectory. (unlikely, but there are entries that could prevent two from running)

I'm betting your machine isn't "seeing" the second card. You might have to shut down and cold boot, but check the Device Manager first.

Edit: No, that doesn't seem to be it. I'm lost.

EDIT EDIT: It says you have eight instances running. What makes you think you aren't using both GPUs?

If you really aren't, try freeing up a core or two.

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5471
Credit: 313,429,155
RAC: 130,246
Brazil
Message 1407230 - Posted: 24 Aug 2013, 9:33:18 UTC
Last modified: 24 Aug 2013, 9:44:25 UTC

I'm with tbret: your log says you have 6 CPU WUs running and 2 AP, 1 AP in slot 0 (the first GPU) and 1 AP in slot 2 (the second GPU). So why do you say you are not using both GPUs?

08/23/2013 23:48:21 | SETI@home | Restarting task ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1 using astropulse_v6 version 604 (cuda_opencl_100) in slot 0
08/23/2013 23:48:21 | SETI@home | Restarting task ap_08oc08aa_B3_P1_00296_20130818_23672.wu_3 using astropulse_v6 version 604 (cuda_opencl_100) in slot 1

You are running only 6 CPU WUs because of this: max CPUs used: 6

My idea in sending a very small cc_config.xml was to rule out any other possible cause; the file I posted is the minimum needed to make multiple GPUs (even different GPUs) work. I'm not saying yours doesn't work, but we are looking for something strange.

Config: use all coprocessors

Is everything fixed now?

If what you are looking for is to run more than one WU on each of the GPUs, you could read: http://setiathome.berkeley.edu/forum_thread.php?id=72507&postid=1400585

But remember that after 7.0.64 the file name changes; it is now app_config.xml

And just a tip (YMMV): very few top-of-the-class MB/GPUs can run more than one AP at a time on the GPUs, and each running AP WU needs a free core. From your log I see you run almost only AP, so for better performance you will have to experiment. At least on my 690/670 2xGPU host I run only 1 AP and 1 MB or 2 MB on the GPU at a time (exactly with the file shown in that thread); I tried 2 AP and everything slowed down.
____________

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8760
Credit: 52,711,310
RAC: 23,645
United Kingdom
Message 1407232 - Posted: 24 Aug 2013, 9:43:45 UTC - in response to Message 1407230.

I'm with tbret: your log says you have 6 CPU WUs running and 2 AP, 1 AP in slot 0 (the first GPU) and 1 AP in slot 2 (the second GPU). So why do you say you are not using both GPUs?

08/23/2013 23:48:21 | SETI@home | Restarting task ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1 using astropulse_v6 version 604 (cuda_opencl_100) in slot 0
08/23/2013 23:48:21 | SETI@home | Restarting task ap_08oc08aa_B3_P1_00296_20130818_23672.wu_3 using astropulse_v6 version 604 (cuda_opencl_100) in slot 1

Misunderstanding there.

The 'slots' referred to in the event log are the folders on the hard disk where the temporary working files for the task are stored. It is perfectly possible to run two tasks on a single GPU: their files will still be stored in different slot directories.

To identify which GPU is being used, you need to look for a 'device' or 'instance' number:

24/08/2013 10:32:48 | SETI@home | [coproc] NVIDIA instance 1: confirming for 13au09ac.2704.7020.14.12.180_2
24/08/2013 10:32:48 | SETI@home | [coproc] NVIDIA instance 0: confirming for 07mr08ab.27198.12342.16.12.107_2
24/08/2013 10:32:48 | SETI@home | [coproc] NVIDIA instance 0: confirming for 31mr08aa.7403.25021.9.12.193_2
24/08/2013 10:32:48 | SETI@home | [coproc] Assigning 0.480000 of NVIDIA instance 1 to 06fe09ad.11950.10120.13.12.253_2
24/08/2013 10:32:48 | SETI@home | Starting task 06fe09ad.11950.10120.13.12.253_2 using setiathome_v7 version 700 (cuda50) in slot 8

So, that WU started as the second task on the second GPU - but with files stored in slot 8.
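
As a rough illustration of that rule, the Python sketch below groups task names by the instance number found in [coproc] lines and ignores slot numbers entirely. The regular expressions are assumptions based only on the message formats quoted in this thread, and tasks_per_instance is a hypothetical helper, not part of BOINC.

```python
import re
from collections import defaultdict

# Patterns inferred from the [coproc] messages quoted in this thread (assumption).
ASSIGN = re.compile(r"\[coproc\] Assigning [\d.]+ of NVIDIA (?:free )?instance (\d+) to (\S+)")
CONFIRM = re.compile(r"\[coproc\] NVIDIA instance (\d+): confirming for (\S+)")

def tasks_per_instance(log_lines):
    """Map GPU instance number -> set of task names seen in [coproc] lines."""
    by_instance = defaultdict(set)
    for line in log_lines:
        match = ASSIGN.search(line) or CONFIRM.search(line)
        if match:
            by_instance[int(match.group(1))].add(match.group(2))
    return dict(by_instance)

sample = [
    "24/08/2013 10:32:48 | SETI@home | [coproc] NVIDIA instance 1: confirming for 13au09ac.2704.7020.14.12.180_2",
    "24/08/2013 10:32:48 | SETI@home | [coproc] NVIDIA instance 0: confirming for 07mr08ab.27198.12342.16.12.107_2",
    "24/08/2013 10:32:48 | SETI@home | [coproc] NVIDIA instance 0: confirming for 31mr08aa.7403.25021.9.12.193_2",
    "24/08/2013 10:32:48 | SETI@home | [coproc] Assigning 0.480000 of NVIDIA instance 1 to 06fe09ad.11950.10120.13.12.253_2",
    # The slot number on this line says nothing about which GPU is used.
    "24/08/2013 10:32:48 | SETI@home | Starting task 06fe09ad.11950.10120.13.12.253_2 using setiathome_v7 version 700 (cuda50) in slot 8",
]

for instance, tasks in sorted(tasks_per_instance(sample).items()):
    print(instance, sorted(tasks))
```

Run against the five lines above, it should list two tasks under each instance, which matches the reading that the new WU is the second task on the second GPU.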

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5471
Credit: 313,429,155
RAC: 130,246
Brazil
Message 1407235 - Posted: 24 Aug 2013, 9:53:21 UTC
Last modified: 24 Aug 2013, 9:59:07 UTC

Yes, but his first log already says he is using both GPUs, and my config file can't change that.

08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 0 to ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0
08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 0 to ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2
08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 1 to ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0
08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 1 to ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1

Config: use all coprocessors

He is running few CPU WUs; everything points to him really running 1 AP on each of the GPUs, since BOINC first starts all the GPU tasks it can before starting the CPU WUs. That's why I believe tbret asked him to post his manager screen.

But I believe the problem is not whether he is running on both GPUs; what he really wants to do is run more than 1 WU on each GPU. That's why I pointed him to the other thread.

Or am I wrong?
____________

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8760
Credit: 52,711,310
RAC: 23,645
United Kingdom
Message 1407237 - Posted: 24 Aug 2013, 10:04:47 UTC - in response to Message 1407235.

Yes, but his first log already says he is using both GPUs, and my config file can't change that.

08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 0 to ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0
08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 0 to ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2
08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 1 to ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0
08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 1 to ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1

Config: use all coprocessors

He is running few CPU WUs; everything points to him really running 1 AP on each of the GPUs, since BOINC first starts all the GPU tasks it can before starting the CPU WUs. That's why I believe tbret asked him to post his manager screen.

Or am I wrong?

That's fine - from that log, you can tell that 'instance 0' and 'instance 1' are both in use - with two tasks each, same as mine.

But you cannot draw the same conclusion from the 'Restarting task ... in slot n' log you posted. Your wording "slot 0 (the first GPU) ... slot 2 (the second GPU)" taken out of context (as I did) might mislead future readers.

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8760
Credit: 52,711,310
RAC: 23,645
United Kingdom
Message 1407250 - Posted: 24 Aug 2013, 10:38:51 UTC

Oh dear. Go back to Cliff's original log, before the useful information was removed from it by the simplified cc_config.xml file.

It started OK:

08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 0 to ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0
08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 0 to ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2
08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 1 to ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0
08/23/2013 19:54:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 1 to ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1

Four different tasks, two GPUs, allocated two per GPU.

Two of the GPU tasks started OK:

08/23/2013 19:54:03 | SETI@home | Restarting task ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0 using astropulse_v6 version 604 (cuda_opencl_100) in slot 2
08/23/2013 19:54:03 | SETI@home | Restarting task ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2 using astropulse_v6 version 604 (cuda_opencl_100) in slot 1

Comparing the WU names carefully, the two which started are the two assigned to instance 0:

_00259_
_00151_

which is confirmed by the later attempts to re-assign _00153_ and _00154_ to instance 1:

08/23/2013 19:55:04 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0
08/23/2013 19:55:04 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_08oc08aa_B2_P0_00151_20130818_04562.wu_2
08/23/2013 19:55:04 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 1 to ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0
08/23/2013 19:55:04 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 1 to ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1

In fact, none of the task assignments to instance 1 succeed in the log we can see, although BOINC knows all about it and is ready, willing, and able to use it.
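
The same check can be sketched in Python: collect every task that appears in an "Assigning" line and drop those that later show up in a "confirming" line. This assumes, following the reading above, that "confirming" marks a task that really holds its instance while a repeated "Assigning" marks one still waiting. never_confirmed is a hypothetical helper and the patterns come only from the lines in Cliff's log.

```python
import re

# Patterns inferred from Cliff's event log (assumption, not a BOINC API).
ASSIGN = re.compile(r"\[coproc\] Assigning [\d.]+ of NVIDIA (?:free )?instance (\d+) to (\S+)")
CONFIRM = re.compile(r"\[coproc\] NVIDIA instance (\d+): confirming for (\S+)")

def never_confirmed(log_lines):
    """Return {task: last assigned instance} for tasks never seen in a 'confirming' line."""
    last_assigned = {}
    confirmed = set()
    for line in log_lines:
        match = ASSIGN.search(line)
        if match:
            last_assigned[match.group(2)] = int(match.group(1))
            continue
        match = CONFIRM.search(line)
        if match:
            confirmed.add(match.group(2))
    return {task: inst for task, inst in last_assigned.items() if task not in confirmed}

cliff_log = [
    "08/23/2013 19:56:03 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0",
    "08/23/2013 19:56:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 0 to ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0",
    "08/23/2013 19:56:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 1 to ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1",
    "08/23/2013 19:56:03 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 1 to ap_08oc08aa_B3_P1_00296_20130818_23672.wu_3",
    "08/23/2013 19:57:04 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_08oc08aa_B2_P0_00259_20130818_04562.wu_0",
    "08/23/2013 19:57:04 | SETI@home | [coproc] NVIDIA instance 0: confirming for ap_14mr08ad_B6_P0_00153_20130818_16116.wu_0",
    "08/23/2013 19:57:04 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA free instance 1 to ap_14mr08ad_B5_P1_00154_20130818_14973.wu_1",
    "08/23/2013 19:57:04 | SETI@home | [coproc] Assigning 0.500000 of NVIDIA instance 1 to ap_08oc08aa_B3_P1_00296_20130818_23672.wu_3",
]

for task, instance in never_confirmed(cliff_log).items():
    print(f"instance {instance}: {task} assigned but never confirmed")
```

On these lines from 19:56 and 19:57, only the two tasks aimed at instance 1 should remain in the result.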

Looking at the file 'ReadMe_AstroPulse_OpenCL_NV.txt' supplied by the Lunatics installer, I see:

-instances_per_device N :Sets allowed number of simultaneously executed GPU app instances per GPU device (shared with MultiBeam app instances).
N - integer number of allowed instances.

I'm wondering whether Raistmer has built that application with a default value of N=1, thus limiting a 2-GPU machine to two tasks at once. But it looks like a bug, if both tasks run on instance 0 and no tasks run on instance 1.

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5471
Credit: 313,429,155
RAC: 130,246
Brazil
Message 1407252 - Posted: 24 Aug 2013, 10:52:37 UTC - in response to Message 1407250.
Last modified: 24 Aug 2013, 11:08:30 UTC

Looking at the file 'ReadMe_AstroPulse_OpenCL_NV.txt' supplied by the Lunatics installer, I see:

-instances_per_device N :Sets allowed number of simultaneously executed GPU app instances per GPU device (shared with MultiBeam app instances).
N - integer number of allowed instances.

I'm wondering whether Raistmer has built that application with a default value of N=1, thus limiting a 2-GPU machine to two tasks at once. But it looks like a bug, if both tasks run on instance 0 and no tasks run on instance 1.

I'm not totally sure, but I believe I have run 2 AP at a time on my multiple-GPU hosts and it works; however, when I tried it, 2 AP ran slower than 1 (I believe because my i5 has too few free cores to drive the GPUs). I have no more AP work available in any of my caches to test that on my multiple-GPU hosts. Maybe someone else could do the test and share the info with us.

But if Cliff wants to check whether both GPUs are really running, and help us verify your bug theory, he could start BOINC with the app_config.xml I supplied in the other thread. It will start 1 (only one) AP + 1 MB or 2 MB on each of his GPUs, which answers the first question: whether both GPUs are already working. After that, he could change the AP's GPU usage from 0.51 to 0.50 and re-run the program. If 2 AP start, the problem is in some other part of one of his app files; if not, everything points to the bug really existing.

Do you think it is worth trying?
____________

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8760
Credit: 52,711,310
RAC: 23,645
United Kingdom
Message 1407255 - Posted: 24 Aug 2013, 11:03:59 UTC - in response to Message 1407252.

Good plan. I'd recommend that any tester helping us out sets that

<coproc_debug>1</coproc_debug>

log flag so they can see exactly which task is running on which device (instance).
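
For testers who would rather script the change than edit the file by hand, here is a minimal Python sketch that switches a log flag on in a cc_config.xml document. It assumes the file layout shown earlier in this thread (a cc_config root with an optional log_flags section); enable_log_flag is a hypothetical helper, and BOINC still has to re-read the file or be restarted afterwards.

```python
import xml.etree.ElementTree as ET

def enable_log_flag(cc_config_xml, flag="coproc_debug"):
    """Return the cc_config XML text with <flag>1</flag> set under <log_flags>."""
    root = ET.fromstring(cc_config_xml)
    log_flags = root.find("log_flags")
    if log_flags is None:
        # The minimal config posted in this thread has no <log_flags> section.
        log_flags = ET.SubElement(root, "log_flags")
    element = log_flags.find(flag)
    if element is None:
        element = ET.SubElement(log_flags, flag)
    element.text = "1"
    return ET.tostring(root, encoding="unicode")

minimal = "<cc_config><options><use_all_gpus>1</use_all_gpus></options></cc_config>"
print(enable_log_flag(minimal))
```

The function works on text rather than on a path so it can be tried without touching the real data directory; writing the result back to cc_config.xml is left to the tester.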

I'd question whether that "(shared with MultiBeam app instances)" applies on NVidia cards. My understanding is that the "-instances_per_device N" switch is for OpenCL applications only, because of their high demand for CPU support: it is neither needed nor supported for CUDA applications.

jravin
Joined: 25 Mar 02
Posts: 966
Credit: 104,801,879
RAC: 50,982
United States
Message 1407265 - Posted: 24 Aug 2013, 12:15:11 UTC

Maybe use GPU-Z to check if both cards are actually in use (independent of BOINC)?
____________

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5471
Credit: 313,429,155
RAC: 130,246
Brazil
Message 1407282 - Posted: 24 Aug 2013, 12:54:15 UTC
Last modified: 24 Aug 2013, 12:55:16 UTC

BOINC actually says both cards are OK and ready to use, and it even assigns tasks to both. The question is why the work starts on the first GPU (instance) and not on the second, if both GPUs are enabled and ready to use, as the logs apparently show.
____________

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8760
Credit: 52,711,310
RAC: 23,645
United Kingdom
Message 1407286 - Posted: 24 Aug 2013, 13:13:37 UTC

I've found the stderr_txt for the tasks which failed to start (both eventually completed and validated).

It's too long to post in full here, but there seem to be a lot of

state.fold_buf_size_short=65536; state.fold_buf_size_long=262144
Running on device number: 1
DATA_CHUNK_UNROLL set to:12
FFA thread block override value:8192
FFA thread fetchblock override value:4096
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: NVIDIA Corporation
BOINC assigns device 1
Info: BOINC provided device ID used
Used GPU device parameters are:
Number of compute units: 5
Single buffer allocation size: 256MB
max WG size: 1024
FERMI path used: yes
Build features: Non-graphics OpenCL USE_OPENCL_NV OCL_ZERO_COPY COMBINED_DECHIRP_KERNEL FFTW USE_INCREASED_PRECISION USE_SSE2 x86
CPUID: Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
Cache: L1=64K L2=256K
CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2
### Restart at 31.53 percent.
state.fold_buf_size_short=65536; state.fold_buf_size_long=262144
GPU device synched
Termination request detected or computations are finished.
GPU device synched, exiting...

Unfortunately, not really enough detail given in that final 'termination or finished - exiting' line to confirm or disprove my theory. I'll pass the full things to Raistmer.
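
For anyone comparing many such stderr files, a small Python sketch can pull out which device each restart reported. The two message strings are taken from the excerpt above; devices_in_stderr is a hypothetical helper, and the exact wording may differ in other builds of the application.

```python
import re

def devices_in_stderr(stderr_text):
    """Collect the device numbers reported by the app and by BOINC in a stderr dump."""
    running = [int(n) for n in re.findall(r"Running on device number:\s*(\d+)", stderr_text)]
    assigned = [int(n) for n in re.findall(r"BOINC assigns device\s*(\d+)", stderr_text)]
    return {"running_on": running, "boinc_assigned": assigned}

excerpt = """\
state.fold_buf_size_short=65536; state.fold_buf_size_long=262144
Running on device number: 1
OpenCL platform detected: NVIDIA Corporation
BOINC assigns device 1
Info: BOINC provided device ID used
### Restart at 31.53 percent.
GPU device synched, exiting...
"""

print(devices_in_stderr(excerpt))
```

A file with many restarts would produce one entry per restart in each list, which makes it easy to see whether the app ever reported a device other than the one BOINC assigned.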

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5471
Credit: 313,429,155
RAC: 130,246
Brazil
Message 1407293 - Posted: 24 Aug 2013, 13:57:07 UTC
Last modified: 24 Aug 2013, 13:57:32 UTC

I'm curious what his test will show. But he must be sleeping now if he lives in the US.
____________

spitfire_mk_2
Joined: 14 Apr 00
Posts: 461
Credit: 13,114,693
RAC: 6,289
United States
Message 1407306 - Posted: 24 Aug 2013, 14:46:07 UTC - in response to Message 1407164.


08/23/2013 23:48:21 | | CUDA: NVIDIA GPU 0: GeForce GTX 660 (driver version 326.80, CUDA version 5.50, compute capability 3.0, 2048MB, 1964MB available, 2132 GFLOPS peak)
08/23/2013 23:48:21 | | CUDA: NVIDIA GPU 1: GeForce GTX 660 (driver version 326.80, CUDA version 5.50, compute capability 3.0, 2048MB, 1873MB available, 2132 GFLOPS peak)
08/23/2013 23:48:21 | | OpenCL: NVIDIA GPU 0: GeForce GTX 660 (driver version 326.80, device version OpenCL 1.1 CUDA, 2048MB, 1964MB available, 2132 GFLOPS peak)
08/23/2013 23:48:21 | | OpenCL: NVIDIA GPU 1: GeForce GTX 660 (driver version 326.80, device version OpenCL 1.1 CUDA, 2048MB, 1873MB available, 2132 GFLOPS peak)

Looking at this part, it tells me that both cards are being used.

Here is an example where only one card is used:
8/21/2013 12:27:54 AM | | CUDA: NVIDIA GPU 0: GeForce GTX 460 (driver version 320.18, CUDA version 5.50, compute capability 2.1, 2048MB, 2002MB available, 874 GFLOPS peak)
8/21/2013 12:27:54 AM | | CUDA: NVIDIA GPU 1 (not used): GeForce 8400 (driver version 320.18, CUDA version 5.50, compute capability 1.1, 128MB, 102MB available, 31 GFLOPS peak)
8/21/2013 12:27:54 AM | | OpenCL: NVIDIA GPU 0: GeForce GTX 460 (driver version 320.18, device version OpenCL 1.1 CUDA, 2048MB, 2002MB available, 874 GFLOPS peak)
8/21/2013 12:27:54 AM | | OpenCL: NVIDIA GPU 1 (not used): GeForce 8400 (driver version 320.18, device version OpenCL 1.0 CUDA, 128MB, 102MB available, 31 GFLOPS peak)

Notice that my log has a message saying that one of the two cards is not used. This message is provided by the client. Cliff does not have this message.
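
A minimal Python sketch of that check, assuming the startup-line format shown in the two logs above: it reports, for each detected GPU, whether the client tagged it "(not used)". gpu_usage is a hypothetical helper and the regular expression only covers NVIDIA lines like the ones quoted here.

```python
import re

# Matches startup lines such as
# "CUDA: NVIDIA GPU 1 (not used): GeForce 8400 (driver version ..." (format assumed from this thread).
GPU_LINE = re.compile(r"(CUDA|OpenCL): NVIDIA GPU (\d+)( \(not used\))?: ([^(]+?) \(driver")

def gpu_usage(startup_lines):
    """Return a list of (api, index, model, used) tuples from client startup lines."""
    result = []
    for line in startup_lines:
        match = GPU_LINE.search(line)
        if match:
            api, index, not_used, model = match.groups()
            result.append((api, int(index), model, not_used is None))
    return result

sample = [
    "8/21/2013 12:27:54 AM | | CUDA: NVIDIA GPU 0: GeForce GTX 460 (driver version 320.18, CUDA version 5.50, compute capability 2.1, 2048MB, 2002MB available, 874 GFLOPS peak)",
    "8/21/2013 12:27:54 AM | | CUDA: NVIDIA GPU 1 (not used): GeForce 8400 (driver version 320.18, CUDA version 5.50, compute capability 1.1, 128MB, 102MB available, 31 GFLOPS peak)",
]

for api, index, model, used in gpu_usage(sample):
    print(api, index, model, "used" if used else "not used")
```

Note that this only shows which GPUs the client is willing to schedule on; it does not show whether an application actually started on each of them.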
____________

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5471
Credit: 313,429,155
RAC: 130,246
Brazil
Message 1407312 - Posted: 24 Aug 2013, 14:56:51 UTC - in response to Message 1407306.

Cliff does not have this message.

That's exactly why I asked him to test whether the same thing happens with 1 AP + 1 MB (or 2 MB) on each GPU, so we can verify Richard's bug theory or find out whether the problem is somewhere in the app files.

____________

Copyright © 2014 University of California