Suddenly Missing GPU (nVidia driver update issue?)

Woodgie
Joined: 6 Dec 99
Posts: 134
Credit: 89,630,417
RAC: 55
United Kingdom
Message 1638132 - Posted: 6 Feb 2015, 11:06:33 UTC - in response to Message 1638116.  
Last modified: 6 Feb 2015, 11:18:20 UTC

It is worth noting, however, that once the cc_config.xml file has been created, any change to Advanced>Event Log Diagnostic Flags... will overwrite the file and therefore nix any hand edits made to it.

It makes sense but just a heads up to the unwary.

Again, not quite true. Using the log flags GUI will re-write the file in its entirety, true - but it should write the current working state of the file, including any hand edits previously read in and acted on at startup. Mine still has an edited option I first set years ago, which has stayed in place through many, many BOINC version upgrades and (more recently) use of the log flag dialog.

Your cc_config looks OK, including the placement of <use_all_gpus>1</use_all_gpus> - but I may not have had enough coffee yet. You are aware, I presume, that <use_all_gpus> is one of the few flags which requires a full client restart to become active - most of the other options (and all the log flags) can be re-read and become active while BOINC is running.

I was going to suggest making a small benign change to cc_config manually, and verifying that it is reflected properly in the startup log: <sched_op_debug>1</sched_op_debug> logging is a handy one, and can be turned off again if you don't need it.


Interesting, maybe it's indicative of something else that when I make changes to Advanced>etc... it does indeed nix the file in favour of a new one.

I will do as you say to work the problem the other way around. Thanks.

Edit to add:
I changed <sched_op_debug> and indeed the change was picked up in the logs and the GUI. Turning the option off (unchecking the now checked box in the GUI) definitely overwrote the <use_all_gpus> key.
~W

ID: 1638132
Woodgie
Joined: 6 Dec 99
Posts: 134
Credit: 89,630,417
RAC: 55
United Kingdom
Message 1638133 - Posted: 6 Feb 2015, 11:07:41 UTC - in response to Message 1638132.  
Last modified: 6 Feb 2015, 11:07:52 UTC

(And I did re-boot the host after adding <use_all_gpus>)
~W

ID: 1638133
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1638137 - Posted: 6 Feb 2015, 11:17:14 UTC

Just in case - what are you using to make changes to cc_config.xml? Although 'typed' as an XML file, it should be treated as a plain ASCII text file and edited as such. Don't use a full-feature XML or Unicode editor - notepad will do.
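The plain-ASCII advice above can be checked mechanically. The following is a minimal Python sketch, not part of BOINC: the function name `check_plain_ascii` is made up for illustration, and the path is an assumption based on the "Data directory" line that BOINC prints in its startup log on Windows.

```python
from pathlib import Path


def check_plain_ascii(raw: bytes) -> list[str]:
    """Return a list of signs that the file was not saved as plain ASCII."""
    problems = []
    # A byte order mark is the usual trace left by a Unicode-aware editor.
    if raw.startswith(b"\xef\xbb\xbf"):
        problems.append("UTF-8 byte order mark at start of file")
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        problems.append("UTF-16 byte order mark at start of file")
    # Any byte above 0x7F is outside the ASCII range.
    non_ascii = [i for i, b in enumerate(raw) if b > 0x7F]
    if non_ascii:
        problems.append(
            f"{len(non_ascii)} non-ASCII byte(s), first at offset {non_ascii[0]}"
        )
    return problems


if __name__ == "__main__":
    # Assumed location; use the data directory shown in your own startup log.
    path = Path(r"C:\ProgramData\BOINC\cc_config.xml")
    if path.exists():
        for problem in check_plain_ascii(path.read_bytes()) or ["looks like plain ASCII"]:
            print(problem)
```

An empty result only says that the bytes are ASCII; it does not validate the XML itself.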
ID: 1638137
Woodgie
Joined: 6 Dec 99
Posts: 134
Credit: 89,630,417
RAC: 55
United Kingdom
Message 1638139 - Posted: 6 Feb 2015, 11:19:24 UTC - in response to Message 1638137.  

Just in case - what are you using to make changes to cc_config.xml? Although 'typed' as an XML file, it should be treated as a plain ASCII text file and edited as such. Don't use a full-feature XML or Unicode editor - notepad will do.


Notepad is indeed what I'm using to edit the file. Though that is useful info, thanks!
~W

ID: 1638139
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1638147 - Posted: 6 Feb 2015, 11:49:50 UTC - in response to Message 1638107.  
Last modified: 6 Feb 2015, 11:54:27 UTC

Well, now I just need to get cc_config.xml to honour <use_all_gpus> and I can use the TITAN and the 750ti together. It's not quite as good as 2 TITANS but not too shoddy :) But here's the thing: it doesn't seem to be honouring it, and with the 750ti installed BOINC uses that in favour of the TITAN.

So the obvious next step is: What's going on here? I made the cc_config.xml file by using Advanced>Event Log Diagnostic Flags... to add an element. This created cc_config.xml in the right place and with the right format. Then I hand edited the file to add the <use_all_gpus> key.

And here's the startup log:
06/02/2015 09:11:22 |  | CUDA: NVIDIA GPU 0 (not used): GeForce GTX TITAN (driver version 347.25, CUDA version 7.0, compute capability 3.5, 4096MB, 4096MB available, 4707 GFLOPS peak)
06/02/2015 09:11:22 |  | CUDA: NVIDIA GPU 1: GeForce GTX 750 Ti (driver version 347.25, CUDA version 7.0, compute capability 5.0, 2048MB, 1947MB available, 1388 GFLOPS peak)
06/02/2015 09:11:22 |  | OpenCL: NVIDIA GPU 0 (not used): GeForce GTX TITAN (driver version 347.25, device version OpenCL 1.1 CUDA, 6144MB, 4096MB available, 4707 GFLOPS peak)
06/02/2015 09:11:22 |  | OpenCL: NVIDIA GPU 1: GeForce GTX 750 Ti (driver version 347.25, device version OpenCL 1.1 CUDA, 2048MB, 1947MB available, 1388 GFLOPS peak)
06/02/2015 09:11:22 | SETI@home | Found app_info.xml; using anonymous platform
06/02/2015 09:11:22 |  | Host name: outlander
06/02/2015 09:11:22 |  | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz [Family 6 Model 60 Stepping 3]
06/02/2015 09:11:22 |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx tm2 pbe fsgsbase bmi1 smep bmi2
06/02/2015 09:11:22 |  | OS: Microsoft Windows 7: Ultimate x64 Edition, Service Pack 1, (06.01.7601.00)


As we can see, the TITAN in slot 0 isn't being used even though cc_config.xml is set to use all.


The Config: use all coprocessors line isn't showing up. I tried setting it on my i5-3210M/GT650M/Intel_Graphics_HD4000 host, and it isn't displayed either,
and using the Diagnostic Flags option deletes the <use_all_gpus>1</use_all_gpus> line from the cc_config.xml:

06/02/2015 11:41:44 | | Starting BOINC client version 7.4.36 for windows_x86_64
06/02/2015 11:41:44 | | log flags: file_xfer, sched_ops, task, coproc_debug, cpu_sched, sched_op_debug
06/02/2015 11:41:44 | | log flags: unparsed_xml
06/02/2015 11:41:44 | | Libraries: libcurl/7.39.0 OpenSSL/1.0.1j zlib/1.2.8
06/02/2015 11:41:44 | | Data directory: C:\ProgramData\BOINC
06/02/2015 11:41:44 | | Running under account Stephen
06/02/2015 11:41:44 | | [coproc] launching child process at C:\Program Files\BOINC\boinc.exe
06/02/2015 11:41:44 | | [coproc] relative to directory C:\ProgramData\BOINC
06/02/2015 11:41:44 | | [coproc] with data directory "C:\ProgramData\BOINC"
06/02/2015 11:41:44 | | CUDA: NVIDIA GPU 0: GeForce GT 650M (driver version 347.25, CUDA version 7.0, compute capability 3.0, 2048MB, 1977MB available, 730 GFLOPS peak)
06/02/2015 11:41:44 | | OpenCL: NVIDIA GPU 0: GeForce GT 650M (driver version 347.25, device version OpenCL 1.1 CUDA, 2048MB, 1977MB available, 730 GFLOPS peak)
06/02/2015 11:41:44 | | OpenCL: Intel GPU 0: Intel(R) HD Graphics 4000 (driver version 10.18.10.3621, device version OpenCL 1.2, 1298MB, 1298MB available, 141 GFLOPS peak)
06/02/2015 11:41:44 | | OpenCL CPU: Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz (OpenCL driver vendor: Intel(R) Corporation, driver version 3.0.1.10878, device version OpenCL 1.2 (Build 76413))
06/02/2015 11:41:44 | | NVIDIA library reports 1 GPU
06/02/2015 11:41:44 | | No ATI library found.
06/02/2015 11:41:44 | SETI@home | Found app_info.xml; using anonymous platform
06/02/2015 11:41:44 | | Host name: Samsung-NP550P5
06/02/2015 11:41:44 | | Processor: 4 GenuineIntel Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz [Family 6 Model 58 Stepping 9]
06/02/2015 11:41:44 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes f16c rdrandsyscall nx lm avx vmx tm2 pbe fsgsbase smep
06/02/2015 11:41:44 | | OS: Microsoft Windows 7: Home Premium x64 Edition, Service Pack 1, (06.01.7601.00)
06/02/2015 11:41:44 | | Memory: 5.90 GB physical, 11.79 GB virtual
06/02/2015 11:41:44 | | Disk: 906.67 GB total, 629.34 GB free
06/02/2015 11:41:44 | | Local time is UTC +0 hours
06/02/2015 11:41:44 | Albert@Home | Found app_config.xml
06/02/2015 11:41:44 | Einstein@Home | Found app_config.xml
06/02/2015 11:41:44 | SETI@home | Found app_config.xml
06/02/2015 11:41:44 | SETI@home Beta Test | Found app_config.xml
06/02/2015 11:41:44 | Albert@Home | URL http://albert.phys.uwm.edu/; Computer ID 9008; resource share 100
06/02/2015 11:41:44 | climateprediction.net | URL http://climateprediction.net/; Computer ID 1346443; resource share 20
06/02/2015 11:41:44 | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 8941572; resource share 100
06/02/2015 11:41:44 | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 7054027; resource share 100
06/02/2015 11:41:44 | SETI@home Beta Test | URL http://setiweb.ssl.berkeley.edu/beta/; Computer ID 64245; resource share 100
06/02/2015 11:41:44 | PrimeGrid | URL http://www.primegrid.com/; Computer ID 414801; resource share 30
06/02/2015 11:41:44 | SETI@home | General prefs: from SETI@home (last modified 02-Feb-2015 14:45:48)
06/02/2015 11:41:44 | SETI@home | Computer location: home
06/02/2015 11:41:44 | | General prefs: using separate prefs for home
06/02/2015 11:41:44 | | Reading preferences override file
06/02/2015 11:41:44 | | Preferences:
06/02/2015 11:41:44 | | max memory usage when active: 4529.91MB
06/02/2015 11:41:44 | | max memory usage when idle: 5435.89MB
06/02/2015 11:41:44 | | max disk usage: 100.00GB
06/02/2015 11:41:44 | | max CPUs used: 2
06/02/2015 11:41:44 | | (to change preferences, visit a project web site or select Preferences in the Manager)
06/02/2015 11:41:44 | | Not using a proxy


This is what it should look like:

http://setiathome.berkeley.edu/forum_thread.php?id=75931&postid=1599863
11-Nov-2014 05:31:12 [---] CUDA: NVIDIA GPU 0: GeForce GTX 780 (driver version 344.60, CUDA version 6.50, compute capability 3.5, 3072MB, 2937MB available, 4636 GFLOPS peak)
11-Nov-2014 05:31:12 [---] CUDA: NVIDIA GPU 1: GeForce GTX 780 (driver version 344.60, CUDA version 6.50, compute capability 3.5, 4096MB, 4096MB available, 4698 GFLOPS peak)
11-Nov-2014 05:31:12 [---] OpenCL: NVIDIA GPU 0: GeForce GTX 780 (driver version 344.60, device version OpenCL 1.1 CUDA, 3072MB, 2937MB available, 4636 GFLOPS peak)
11-Nov-2014 05:31:12 [---] OpenCL: NVIDIA GPU 1: GeForce GTX 780 (driver version 344.60, device version OpenCL 1.1 CUDA, 6144MB, 4096MB available, 4698 GFLOPS peak)
11-Nov-2014 05:31:12 [SETI@home] Found app_info.xml; using anonymous platform
11-Nov-2014 05:31:12 [---] Config: use all coprocessors


Claggy
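The two things to look for in a startup log, per the comparison above, are any GPU marked "(not used)" and the presence of the "Config: use all coprocessors" line. A rough Python sketch of that check follows; it is not a BOINC tool, the function name and regular expression are illustrative assumptions, and the sample lines are copied from the log quoted in this post.

```python
import re

# Matches detection lines such as
#   "CUDA: NVIDIA GPU 0 (not used): GeForce GTX TITAN (driver version ..."
GPU_LINE = re.compile(
    r"(CUDA|OpenCL): (\w+) GPU (\d+)( \(not used\))?: (.+?) \(driver version"
)


def summarise_gpu_lines(log_text: str) -> dict:
    """Split detected GPUs into used and unused, and note the config line."""
    used, unused = [], []
    for line in log_text.splitlines():
        m = GPU_LINE.search(line)
        if m:
            api, _vendor, index, not_used, name = m.groups()
            entry = (api, int(index), name)
            (unused if not_used else used).append(entry)
    return {
        "used": used,
        "unused": unused,
        "use_all_seen": "Config: use all coprocessors" in log_text,
    }


SAMPLE_LOG = (
    "06/02/2015 09:11:22 |  | CUDA: NVIDIA GPU 0 (not used): GeForce GTX TITAN "
    "(driver version 347.25, CUDA version 7.0, compute capability 3.5, 4096MB, "
    "4096MB available, 4707 GFLOPS peak)\n"
    "06/02/2015 09:11:22 |  | CUDA: NVIDIA GPU 1: GeForce GTX 750 Ti "
    "(driver version 347.25, CUDA version 7.0, compute capability 5.0, 2048MB, "
    "1947MB available, 1388 GFLOPS peak)\n"
)

if __name__ == "__main__":
    print(summarise_gpu_lines(SAMPLE_LOG))
```

A non-empty "unused" list together with "use_all_seen" being false is the pattern seen on the affected host.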
ID: 1638147
Woodgie
Joined: 6 Dec 99
Posts: 134
Credit: 89,630,417
RAC: 55
United Kingdom
Message 1638152 - Posted: 6 Feb 2015, 12:30:12 UTC - in response to Message 1638147.  


The Config: use all coprocessors line isn't showing up. I tried setting it on my i5-3210M/GT650M/Intel_Graphics_HD4000 host, and it isn't displayed either,
and using the Diagnostic Flags option deletes the <use_all_gpus>1</use_all_gpus> line from the cc_config.xml:

<snip>

This is what it should look like:

http://setiathome.berkeley.edu/forum_thread.php?id=75931&postid=1599863
11-Nov-2014 05:31:12 [---] CUDA: NVIDIA GPU 0: GeForce GTX 780 (driver version 344.60, CUDA version 6.50, compute capability 3.5, 3072MB, 2937MB available, 4636 GFLOPS peak)
11-Nov-2014 05:31:12 [---] CUDA: NVIDIA GPU 1: GeForce GTX 780 (driver version 344.60, CUDA version 6.50, compute capability 3.5, 4096MB, 4096MB available, 4698 GFLOPS peak)
11-Nov-2014 05:31:12 [---] OpenCL: NVIDIA GPU 0: GeForce GTX 780 (driver version 344.60, device version OpenCL 1.1 CUDA, 3072MB, 2937MB available, 4636 GFLOPS peak)
11-Nov-2014 05:31:12 [---] OpenCL: NVIDIA GPU 1: GeForce GTX 780 (driver version 344.60, device version OpenCL 1.1 CUDA, 6144MB, 4096MB available, 4698 GFLOPS peak)
11-Nov-2014 05:31:12 [SETI@home] Found app_info.xml; using anonymous platform
11-Nov-2014 05:31:12 [---] Config: use all coprocessors


Claggy


Nice to know I'm not going completely mad!

It's truly fantastic that you lot are willing to actually get 'down and dirty' and test on your own machines to help troubleshoot my problems, humbling is what it is.

Thanks.
~W

ID: 1638152
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1638156 - Posted: 6 Feb 2015, 12:36:33 UTC - in response to Message 1638152.  

I've reported these bugs to the boinc_alpha list; expect them to be fixed in BOINC 7.4.42 or later.

Claggy
ID: 1638156
Woodgie
Joined: 6 Dec 99
Posts: 134
Credit: 89,630,417
RAC: 55
United Kingdom
Message 1638159 - Posted: 6 Feb 2015, 12:47:59 UTC - in response to Message 1638156.  
Last modified: 6 Feb 2015, 12:48:11 UTC

I've reported these bugs to the boinc_alpha list; expect them to be fixed in BOINC 7.4.42 or later.

Claggy


Thank you very much. Do you want any more system details? I'm always more than happy to help.
~W

ID: 1638159
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1638161 - Posted: 6 Feb 2015, 12:52:09 UTC - in response to Message 1638156.  

It's a strange one: the tag still seems to be present in lib/cc_config.cpp, and I don't see anything in the history to suggest a deliberate change. We'll see what the dev response is, and I'll read some more code after lunch.

Do we know when <use_all_gpus> was last active? I'm surprised this hasn't come up on the boards before. People like Jacob Klein usually run all releases, and have multi-GPU hosts.
ID: 1638161
Paul Bowyer
Volunteer tester
Joined: 15 Aug 99
Posts: 11
Credit: 137,603,890
RAC: 0
United States
Message 1638175 - Posted: 6 Feb 2015, 13:29:25 UTC

<use_all_gpus>1</use_all_gpus>
<no_info_fetch>0</no_info_fetch>
<no_priority_change>0</no_priority_change>
<os_random_only>0</os_random_only>
<proxy_info>
<socks_server_name></socks_server_name>
<socks_server_port>80</socks_server_port>
<http_server_name></http_server_name>
<http_server_port>80</http_server_port>
<socks5_user_name></socks5_user_name>
<socks5_user_passwd></socks5_user_passwd>
<http_user_name></http_user_name>
<http_user_passwd></http_user_passwd>
<no_proxy></no_proxy>
</proxy_info>
<rec_half_life_days>10.000000</rec_half_life_days>
<report_results_immediately>0</report_results_immediately>
<run_apps_manually>0</run_apps_manually>
<save_stats_days>30</save_stats_days>
<skip_cpu_benchmarks>0</skip_cpu_benchmarks>
<simple_gui_only>0</simple_gui_only>
<start_delay>0.000000</start_delay>
<stderr_head>0</stderr_head>
<suppress_net_info>0</suppress_net_info>
<unsigned_apps_ok>0</unsigned_apps_ok>
<use_all_gpus>0</use_all_gpus>
<use_certs>0</use_certs>
<use_certs_only>0</use_certs_only>
<vbox_window>0</vbox_window>
</options>
</cc_config>


In your cc_config file, it looks like there is an additional <use_all_gpus> entry about 6 lines from the bottom. This one resets to 0.
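A duplicate entry like this is easy to miss by eye but simple to find with a script. The sketch below is a hedged Python illustration, not a BOINC utility: `duplicate_option_tags` is a made-up name, and the regular expression only handles simple one-line options of the form shown above. It scans the text with a regular expression rather than an XML parser so that it also works on pasted fragments.

```python
import re
from collections import Counter


def duplicate_option_tags(config_text: str) -> dict[str, list[str]]:
    """Map each tag that appears more than once to its values, in file order."""
    # Matches one-line options such as <use_all_gpus>1</use_all_gpus>.
    pairs = re.findall(r"<(\w+)>([^<]*)</\1>", config_text)
    counts = Counter(tag for tag, _ in pairs)
    return {
        tag: [value for t, value in pairs if t == tag]
        for tag, n in counts.items()
        if n > 1
    }


# A shortened fragment built from lines quoted in this thread.
FRAGMENT = """<use_all_gpus>1</use_all_gpus>
<no_info_fetch>0</no_info_fetch>
<unsigned_apps_ok>0</unsigned_apps_ok>
<use_all_gpus>0</use_all_gpus>
<use_certs>0</use_certs>
</options>
</cc_config>"""

if __name__ == "__main__":
    print(duplicate_option_tags(FRAGMENT))
```

For the fragment above, the only tag reported is `use_all_gpus`, with the hand-added value first and the later value second.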
ID: 1638175
Woodgie
Joined: 6 Dec 99
Posts: 134
Credit: 89,630,417
RAC: 55
United Kingdom
Message 1638176 - Posted: 6 Feb 2015, 13:51:54 UTC - in response to Message 1638175.  
Last modified: 6 Feb 2015, 13:52:26 UTC

Well I never, you're right! Let's check the current config on the host...

Yep, for some reason there's a second <use_all_gpus> set to 0!

Changing
...
Rebooting
...

OK, that's now all working properly. I'm embarrassed and should learn to read. I honestly didn't see that there!

Paul, you're a star, thank you very very much. I owe you a
<beverage_of_choice>Beer|Wine|Other</beverage_of_choice>

should you ever be in London and at a loose end.

Now, where DID I put my reading glasses?
~W

ID: 1638176
Woodgie
Joined: 6 Dec 99
Posts: 134
Credit: 89,630,417
RAC: 55
United Kingdom
Message 1638204 - Posted: 6 Feb 2015, 15:09:55 UTC

In case anyone is searching the forums, I think it's worth a mention as to what I think has been happening with my 'disappearing' settings in the cc_config.xml file.

I'm pretty certain that only the <use_all_gpus> line which I added by hand was being removed when using the GUI to update settings.

I'd guess this is because BOINC decided (correctly) that line was superfluous due to the fact there was a <use_all_gpus> line already in the file in the 'proper' place, set to 0, which to me (being blind and not noticing it) made it look as if the key was being removed completely.

Here's the tip, kids: READ and read it again. And if it still doesn't make sense ask the rubber duck (or in my case, the coconut)

I hope this thread helps others as it's helped me. Thanks to everyone who contributed.
~W

ID: 1638204
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1638215 - Posted: 6 Feb 2015, 16:00:16 UTC - in response to Message 1638204.  

In case anyone is searching the forums, I think it's worth a mention as to what I think has been happening with my 'disappearing' settings in the cc_config.xml file.

I'm pretty certain that only the <use_all_gpus> line which I added by hand was being removed when using the GUI to update settings.

I'd guess this is because BOINC decided (correctly) that line was superfluous due to the fact there was a <use_all_gpus> line already in the file in the 'proper' place, set to 0, which to me (being blind and not noticing it) made it look as if the key was being removed completely.

I'm also going blind: I had a second <use_all_gpus> entry right at the bottom.
I was taking the <proxy_info> area as being the end of the <options> section, as it wasn't aligned with the rest of the cc_config.xml
(laptop with a small screen and a lowish resolution, so not everything is on screen at once).

Claggy
ID: 1638215
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1638219 - Posted: 6 Feb 2015, 16:05:56 UTC
Last modified: 6 Feb 2015, 16:12:55 UTC

As I've just written to the developer bug-list:

The diagnostic flags dialog doesn't remove a line from, or otherwise edit, the config file: like other BOINC operations, it simply blasts out a completely fresh instance of the file, with the 'current' (i.e. default, or most recently encountered) value in the predetermined sequence.

For Windows, I still regret the passing of the old [Get|Set]PrivateProfileString OS API call, which would change arbitrary values in an INI file without disturbing the sequence or other key values. But that's progress, I suppose.


[Edit: WritePrivateProfileString, that would be - Get is correct. I was too lazy to look it up in Dan Appleman's trusty "Programmer's Guide to the Win32 API" - best reference book I've ever bought.]
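The behaviour described in this post (read every value, let the most recently encountered one win, then write a fresh file in a fixed order) can be modelled in a few lines. This is a toy Python model under stated assumptions, not BOINC's actual code: the option subset, the default values, and the function names are illustrative only.

```python
import re

# Illustrative subset of options, in the order they appear in the file quoted
# earlier in this thread. The default values are assumptions for this sketch.
DEFAULTS = {
    "report_results_immediately": "0",
    "save_stats_days": "30",
    "use_all_gpus": "0",
    "use_certs": "0",
}


def read_options(config_text: str) -> dict[str, str]:
    """Start from the defaults and apply each known tag in file order."""
    state = dict(DEFAULTS)
    for tag, value in re.findall(r"<(\w+)>([^<]*)</\1>", config_text):
        if tag in state:
            state[tag] = value  # a later duplicate overwrites an earlier one
    return state


def write_options(state: dict[str, str]) -> str:
    """Emit a completely fresh file: every option once, in the fixed order."""
    lines = ["<cc_config>", "<options>"]
    lines += [f"<{tag}>{state[tag]}</{tag}>" for tag in DEFAULTS]
    lines += ["</options>", "</cc_config>"]
    return "\n".join(lines)


HAND_EDITED = """<cc_config>
<options>
<use_all_gpus>1</use_all_gpus>
<save_stats_days>30</save_stats_days>
<use_all_gpus>0</use_all_gpus>
</options>
</cc_config>"""

if __name__ == "__main__":
    print(write_options(read_options(HAND_EDITED)))
```

In this model the hand-added line at the top seems to vanish after a rewrite, while the surviving entry carries the later value of 0, which matches what was observed in this thread.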
ID: 1638219