Ubuntu Server 12.04 - Nvidia GeForce 610 inst. OK, BUT lm-sensors affected! No temp for GPU anymore


DanHansen@Denmark
Volunteer tester
Joined: 14 Nov 12
Posts: 182
Credit: 4,090,545
RAC: 13,947
Denmark
Message 1475211 - Posted: 10 Feb 2014, 14:27:03 UTC

Hi,

The Rack mounted Ubuntu Linux BOINC Server
Ubuntu Server 12.04.4 64Bit
Intel i5 / 4 GB RAM / Asus motherboard
MSI GeForce GT 610 2 GB
LM-Sensors
BOINC

Link to history of the issue: http://setiathome.berkeley.edu/forum_thread.php?id=73032&postid=1475153#1475153

Well, we succeeded in installing the GeForce GT 610 on Ubuntu Server 12.04 in a non-graphical environment! (I'm currently writing up a how-to.) But before we started making changes, lm-sensors showed a third result.
Now it shows only two. The third result was the GPU. Why is it gone? Presumably because of the Nvidia driver installation, but what can we do to get it back? This is how it looks now. Unfortunately I didn't save a screen dump of the sensors output beforehand, but there was a third result! I seem to remember it was isa-bus something... not much to go on, I know.

acpitz-virtual-0
Adapter: Virtual device
temp1:          +27.8°C  (crit = +106.0°C)
temp2:          +29.8°C  (crit = +106.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +52.0°C  (high = +85.0°C, crit = +105.0°C)
Core 0:         +47.0°C  (high = +85.0°C, crit = +105.0°C)
Core 1:         +49.0°C  (high = +85.0°C, crit = +105.0°C)
Core 2:         +52.0°C  (high = +85.0°C, crit = +105.0°C)
Core 3:         +47.0°C  (high = +85.0°C, crit = +105.0°C)
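
For scripting against output like the above, a minimal sketch of how the hottest core could be picked out of the sensors text. It assumes the usual one-reading-per-line layout with "Core N:" labels; the function name max_core_temp is made up for illustration and is not part of lm-sensors.

```shell
# Sketch: print the highest "Core N" temperature (in degrees C) found in
# `sensors` output read from standard input.
# max_core_temp is a hypothetical helper name, not an lm-sensors command.
max_core_temp() {
    awk '
        /^Core [0-9]+:/ {
            t = $3                   # third field, e.g. "+47.0°C"
            gsub(/[^0-9.]/, "", t)   # keep only digits and the decimal point
            if (t + 0 > max) max = t + 0
            found = 1
        }
        # Exit non-zero when no core line was seen, so callers can detect it.
        END { if (found) printf "%.1f\n", max; else exit 1 }
    '
}

# Typical use on a machine with lm-sensors installed:
#   sensors | max_core_temp
```

The sign is stripped along with the unit, so this sketch is only meant for the positive temperatures a running CPU reports.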


The reason I need this is that I'm redoing a script that will keep the heat of multiple CPUs, GPUs and hard drives under control.

Any ideas?
____________
Project Headless Linux Multiple GPU Boinc Servers
Ubuntu Server 14.04.1 64bit
Kernel 3.13.0-32-generic
CPU's i5-4690K
GPU's Asus GT640
Nvidia v.340.29
BOINC v.7.2.42

Bil
Joined: 27 Jan 01
Posts: 75
Credit: 621,055
RAC: 917
Latvia
Message 1475214 - Posted: 10 Feb 2014, 14:45:40 UTC - in response to Message 1475211.
Last modified: 10 Feb 2014, 14:45:53 UTC

Use:
nvidia-smi -a | grep Gpu

DanHansen@Denmark
Message 1475483 - Posted: 11 Feb 2014, 1:57:03 UTC - in response to Message 1475214.

Hi Bil,


Well, thank you for that ;)

It works pretty nicely, but I would still like my lm-sensors result to come back, because of the script.

# nvidia-smi -a |grep Gpu
Gpu : N/A
Gpu : 36 C

Well, it can be done with this, no question about it. But do you have an idea why the reading disappeared after the driver was installed? I found a site that seems to discuss almost the same issue, and they think it's caused by the missing X server. (I removed nouveau; is that part of the X server?) Do you know?
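
For use in a script, a minimal sketch of how the number could be pulled out of that output. It assumes the "Gpu : 36 C" line format shown above, which may differ between driver versions; gpu_temp is a hypothetical helper name, not an nvidia-smi feature.

```shell
# Sketch: extract the GPU temperature (degrees C) from `nvidia-smi -a`
# text read on standard input.
# Assumes a line of the form "Gpu : 36 C" as shown in this thread.
# gpu_temp is a hypothetical helper name.
gpu_temp() {
    awk -F: '
        # Label must be exactly "Gpu" and the value a number followed by C,
        # which skips the "Gpu : N/A" line.
        $1 ~ /^[ \t]*Gpu[ \t]*$/ && $2 ~ /^[ \t]*[0-9]+[ \t]*C[ \t]*$/ {
            gsub(/[^0-9]/, "", $2)
            print $2
            found = 1
            exit
        }
        # Exit non-zero when no temperature line was found.
        END { if (!found) exit 1 }
    '
}

# Typical use on a machine with the Nvidia driver installed:
#   nvidia-smi -a | gpu_temp
```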

Bil
Message 1475567 - Posted: 11 Feb 2014, 6:58:12 UTC - in response to Message 1475483.

I think the free Nvidia driver (nouveau) has some integration with lm-sensors, and the proprietary driver does not.
I encountered the same situation on Slackware64 14.0 AMD with an ATI 4350 card: with the free driver that comes with the kernel, "sensors" also shows the GPU temperature, but when I switched to the closed driver (Catalyst 13.1), sensors stopped showing the GPU temperature.
I use aticonfig --odgt instead.

DanHansen@Denmark
Message 1475722 - Posted: 11 Feb 2014, 18:24:06 UTC - in response to Message 1475567.

Hi Bil,


You wrote:
I encountered the same situation on Slackware64 14.0 AMD with an ATI 4350 card: with the free driver that comes with the kernel, "sensors" also shows the GPU temperature, but when I switched to the closed driver (Catalyst 13.1), sensors stopped showing the GPU temperature.
I use aticonfig --odgt instead.

So that is like my command "nvidia-smi -a | grep Gpu". OK, I see.

In another post, Guy writes to me:
"In my step by step, I blacklisted nouveau and then deleted it before installing the nvidia driver, so I was not asked if I wanted to disable it. From what I've read, allowing nvidia to disable nouveau causes other problems."

Do you think it would help if the driver were installed the right way? Or would the problem remain, as in your case? I have to ask, because I'm not that experienced yet. Sorry for the newbie questions o)
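
For reference, blacklisting nouveau is commonly done with a modprobe configuration file. The sketch below shows that common approach, not necessarily the exact steps Guy used; the function name and the file name blacklist-nouveau.conf are illustrative choices.

```shell
# Sketch: write a modprobe blacklist file for the nouveau driver.
# write_nouveau_blacklist and the file name are hypothetical choices;
# the default directory is the standard modprobe.d location.
write_nouveau_blacklist() {
    dir=${1:-/etc/modprobe.d}
    printf '%s\n' \
        'blacklist nouveau' \
        'options nouveau modeset=0' \
        > "$dir/blacklist-nouveau.conf"
}

# On Ubuntu, run as root, then rebuild the initramfs and reboot:
#   write_nouveau_blacklist && update-initramfs -u
```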

DanHansen@Denmark
Message 1475853 - Posted: 12 Feb 2014, 0:18:14 UTC - in response to Message 1475722.
Last modified: 12 Feb 2014, 0:34:04 UTC

Hello again,


Regarding the GPU/graphics card as a cruncher: first of all, I'm changing to an Asus GeForce GT 640, because it performs a lot better and because it can endure a lot of heat. This I learned from you guys, and thank you for that ;)

BilBg showed me that the GPU is indeed doing some work, but not much. Here's a sample of the problem:

name: ps_140207_12281_43_1
WU name: ps_140207_12281_43
project URL: http://asteroidsathome.net/boinc/
report deadline: Fri Feb 21 01:04:53 2014
ready to report: no
got server ack: no
final CPU time: 0.000000
state: downloaded
scheduler state: scheduled
exit_status: 0
signal: 0
suspended via GUI: no
active_task_state: EXECUTING
app version num: 10111
checkpoint CPU time: 3.783987
current CPU time: 79419.500000
fraction done: 0.044290
swap size: 17325015040.000000
working set size: 24055808.000000
estimated CPU time remaining: 3446.603843


30 seconds later I checked again, and now the estimated CPU time remaining is about 6 seconds more!?

name: ps_140207_12281_43_1
WU name: ps_140207_12281_43
project URL: http://asteroidsathome.net/boinc/
report deadline: Fri Feb 21 01:04:53 2014
ready to report: no
got server ack: no
final CPU time: 0.000000
state: downloaded
scheduler state: scheduled
exit_status: 0
signal: 0
suspended via GUI: no
active_task_state: EXECUTING
app version num: 10111
checkpoint CPU time: 3.783987
current CPU time: 79551.560000
fraction done: 0.044290
swap size: 17325015040.000000
working set size: 24055808.000000
estimated CPU time remaining: 3452.179189
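
Comparing two snapshots like these by eye is error-prone, so here is a minimal sketch of how the change in "fraction done" could be computed. It assumes each snapshot was saved to a file containing a single task in the "name: value" layout shown above (for example from boinccmd --get_tasks); task_field and progress_delta are hypothetical helper names.

```shell
# Sketch: read one "name: value" field from task text on standard input.
# task_field is a hypothetical helper name; only the first match is printed.
task_field() {
    awk -F': ' -v key="$1" '
        { name = $1; sub(/^[ \t]+/, "", name) }   # ignore leading indentation
        name == key { print $2; exit }
    '
}

# progress_delta BEFORE_FILE AFTER_FILE
# Prints how much "fraction done" changed between two saved snapshots.
progress_delta() {
    a=$(task_field 'fraction done' < "$1")
    b=$(task_field 'fraction done' < "$2")
    awk -v a="$a" -v b="$b" 'BEGIN { printf "%.6f\n", b - a }'
}

# Typical use (file names are examples):
#   boinccmd --get_tasks > before.txt; sleep 30; boinccmd --get_tasks > after.txt
#   progress_delta before.txt after.txt
```

For the two samples above, "fraction done" is 0.044290 both times, so the difference is zero: the task made no measurable progress in between.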


I think it's probably not doing anything after all! I can't find a single finished CUDA55 task. Take a look, and let me know what you guys think: http://asteroidsathome.net/boinc/results.php?hostid=74516&offset=0&show_names=0&state=1&appid=

Also, I did change the setup from 20% use of the GPU to 100%. No change in heat or anything...


DanHansen@Denmark
Message 1475934 - Posted: 12 Feb 2014, 4:10:12 UTC

Hi,


OK, my new how-to seems to have done the trick. I'm not sure, though: the current CPU time looks funny.

The estimated CPU time remaining looks OK; it gets smaller now. But this is not what I expected of a GPU! My i5 processor is 100 times faster. Or am I mistaken?

1) -----------
name: ps_140207_12281_43_1
WU name: ps_140207_12281_43
project URL: http://asteroidsathome.net/boinc/
report deadline: Fri Feb 21 01:04:53 2014
ready to report: no
got server ack: no
final CPU time: 0.000000
state: downloaded
scheduler state: scheduled
exit_status: 0
signal: 0
suspended via GUI: no
active_task_state: EXECUTING
app version num: 10111
checkpoint CPU time: 5.328888
current CPU time: 5.385871
fraction done: 0.063443
swap size: 17325019136.000000
working set size: 24006655.926331
estimated CPU time remaining: 25146.133314


About a minute later:
1) -----------
name: ps_140207_12281_43_1
WU name: ps_140207_12281_43
project URL: http://asteroidsathome.net/boinc/
report deadline: Fri Feb 21 01:04:53 2014
ready to report: no
got server ack: no
final CPU time: 0.000000
state: downloaded
scheduler state: scheduled
exit_status: 0
signal: 0
suspended via GUI: no
active_task_state: EXECUTING
app version num: 10111
checkpoint CPU time: 5.702561
current CPU time: 5.741095
fraction done: 0.069188
swap size: 17325019136.000000
working set size: 24006656.000000
estimated CPU time remaining: 24931.730650


Bil
Message 1476035 - Posted: 12 Feb 2014, 8:17:49 UTC - in response to Message 1475722.

No, I think the problem lies with the driver itself. With the free driver you get the temperature via sensors; with the closed (proprietary) driver you do not, and the way you install the proprietary driver does not change that.
Maybe I am wrong, but that is what I think.

Btw, I would be happy if you shared your temperature monitoring script.

BilBg
Volunteer tester
Joined: 27 May 07
Posts: 2894
Credit: 6,641,219
RAC: 7,903
Bulgaria
Message 1476068 - Posted: 12 Feb 2014, 9:52:54 UTC - in response to Message 1475934.

You have one Windows computer:
http://setiathome.berkeley.edu/show_host_detail.php?hostid=7104439

If it is on the same LAN as your Linux computers, please make your life easier by installing (on Windows):
BoincTasks
http://www.efmer.eu/boinc/boinc_tasks/

"The program should run on Windows 2003 / Windows XP / Windows Vista / Windows 7 / Windows 8 as well as on Linux and Mac, with Wine."

Then use BoincTasks to monitor your other Linux computers
(it gives much more info than boinccmd)
____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

DanHansen@Denmark
Message 1477442 - Posted: 15 Feb 2014, 3:20:30 UTC - in response to Message 1476068.
Last modified: 15 Feb 2014, 3:36:31 UTC

Hi BilBg,

You wrote:
If it is on the same LAN as your Linux computers, please make your life easier by installing (on Windows):
BoincTasks
http://www.efmer.eu/boinc/boinc_tasks/

I use BoincView on that one Windows computer, but I will try the one you suggest! I didn't know which one to choose. Thanks for making my life easier ;)


Hi Bil,

You wrote:
No, I think the problem lies with the driver itself. With the free driver you get the temperature via sensors; with the closed (proprietary) driver you do not, and the way you install the proprietary driver does not change that.
Maybe I am wrong, but that is what I think.

Btw, I would be happy if you shared your temperature monitoring script.


Thanks for that one ;)

Regarding the script, I'll be more than happy to share it with you ;) With help from other scripts like it, I've made one script that runs with a single processor and one that runs with several processors. I'm working on it so that it works fine with both single and multiple CPUs. I also have a script for the HDD. Both scripts work with lm-sensors. BUT, because of my driver problems, I may need to make a script for the GPU too. I wanted to put it all in one script, but because of the driver problems lm-sensors can't "see" the GPU any more. Maybe if we get it to work properly, the GPU result will reappear!

But the script is yours, of course! And I can make you a list of all the sites I used to get this far, if you need it. It's a shell script (".sh"), by the way. I only use Ubuntu Servers for BOINC, as you know. I configure the script to be run by cron every minute. If one CPU gets hotter than, e.g., 40 degrees Celsius, it warns by email and adds an entry to a log file. If one CPU gets hotter than, e.g., 50 degrees Celsius, it alerts by email, adds an entry to a log file and shuts the computer down! But please let me finish it properly ;) I just need this Nvidia stuff and BOINC to work together ;)
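
The script itself is not posted in this thread, so the following is only a minimal sketch of the two-threshold logic described above, not the actual script. The names WARN_AT, SHUTDOWN_AT, classify_temp and act_on_temp, the example path in the cron line, and the log format are all assumptions made for illustration.

```shell
# Sketch of the two-threshold logic described in the post.
# All names below are hypothetical; the values are the examples from the post.
WARN_AT=40       # degrees C: warn by email and log
SHUTDOWN_AT=50   # degrees C: alert, log and shut down

# classify_temp TEMP prints "ok", "warn" or "shutdown".
# awk does the comparison so that decimal readings such as 40.5 work.
classify_temp() {
    awk -v t="$1" -v warn="$WARN_AT" -v crit="$SHUTDOWN_AT" 'BEGIN {
        if (t + 0 > crit + 0)      print "shutdown"
        else if (t + 0 > warn + 0) print "warn"
        else                       print "ok"
    }'
}

# act_on_temp TEMP LOGFILE appends a log line for anything above WARN_AT.
act_on_temp() {
    level=$(classify_temp "$1")
    case $level in
        ok) return 0 ;;
    esac
    printf '%s %s %s C\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$level" "$1" >> "$2"
    # A real script would send the email here and, when $level is
    # "shutdown", run: shutdown -h now
}

# Example crontab entry to run a script every minute (path is an assumption):
#   * * * * * /usr/local/bin/tempcheck.sh
```

Keeping the classification separate from the actions makes it possible to feed the same function a CPU, GPU or hard drive reading.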


DanHansen@Denmark
Message 1478354 - Posted: 17 Feb 2014, 16:43:16 UTC
Last modified: 17 Feb 2014, 16:43:54 UTC

Closing thread...

Almost success. Wrong OS edition, right result!

If we succeed, this will be followed up by a how-to in one of these two threads:

http://setiathome.berkeley.edu/forum_thread.php?id=74108
http://setiathome.berkeley.edu/forum_thread.php?id=73032

Copyright © 2014 University of California