Ubuntu Server 12.04 - Nvidia GeForce 610 inst. OK, BUT lm-sensors affected! No temp for GPU anymore

DanHansen@Denmark
Volunteer tester
Joined: 14 Nov 12
Posts: 194
Credit: 5,881,465
RAC: 0
Denmark
Message 1475211 - Posted: 10 Feb 2014, 14:27:03 UTC

Hi,

The Rack mounted Ubuntu Linux BOINC Server
Ubuntu Server 12.04.4 64Bit
Intel i5/4Gb RAM/Asus MB.
MSI GeForce GT 610 2 Gb
LM-Sensors
BOINC

Link to history of the issue: http://setiathome.berkeley.edu/forum_thread.php?id=73032&postid=1475153#1475153

Well, we succeeded in installing the GeForce GT 610 on Ubuntu Server 12.04 in a non-graphical environment! (I'm currently writing up a todo.) But before we started making changes, lm-sensors showed a third result.
Now it only shows two results. The third result was the GPU. Why is it gone? Yes, because of the installation of the Nvidia driver, but what can we do to get it back? This is how it looks now. Unfortunately I didn't save a screen dump of the sensors output before, but there was a third result! I seem to remember it was isa-bus something... not much, I know.

acpitz-virtual-0
Adapter: Virtual device
temp1:        +27.8°C  (crit = +106.0°C)
temp2:        +29.8°C  (crit = +106.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +52.0°C  (high = +85.0°C, crit = +105.0°C)
Core 0:         +47.0°C  (high = +85.0°C, crit = +105.0°C)
Core 1:         +49.0°C  (high = +85.0°C, crit = +105.0°C)
Core 2:         +52.0°C  (high = +85.0°C, crit = +105.0°C)
Core 3:         +47.0°C  (high = +85.0°C, crit = +105.0°C)


The reason I need this is that I'm redoing a script which will monitor and control the heat of multiple CPUs, GPUs and hard drives.
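
For a script like that, the per-core values have to be pulled out of the `sensors` text first. The following is only a minimal sketch, assuming the output format shown above (lines starting with "Core N:" followed by a value such as "+47.0°C"); `parse_core_temps` is a made-up helper name, not part of lm-sensors.

```shell
#!/bin/sh
# Hypothetical helper: read `sensors` output on stdin and print one
# "coreN <whole degrees>" line per CPU core.
# Assumes lines shaped like: "Core 0:         +47.0°C  (high = ...)"
parse_core_temps() {
    awk '
        /^Core [0-9]+:/ {
            n = $2
            sub(/:$/, "", n)          # "0:" -> "0"
            temp = $3                 # e.g. "+47.0°C"
            sub(/^\+/, "", temp)      # drop the leading plus sign
            sub(/\..*$/, "", temp)    # keep only the whole degrees
            print "core" n, temp
        }'
}

# Usage on a machine with lm-sensors installed (not run here):
#   sensors | parse_core_temps
```

Working with whole degrees keeps the later comparisons simple, since plain `sh` arithmetic and `[ -gt ]` only handle integers.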

Any Ideas?
Project Headless CLI Linux Multiple GPU Boinc Servers
Ubuntu Server 14.04.1 64bit
Kernel 3.13.0-32-generic
CPU's i5-4690K
GPU's GT640/GTX750TI
Nvidia v.340.29
BOINC v.7.2.42

Bil
Joined: 27 Jan 01
Posts: 76
Credit: 1,887,795
RAC: 0
Latvia
Message 1475214 - Posted: 10 Feb 2014, 14:45:40 UTC - in response to Message 1475211.  
Last modified: 10 Feb 2014, 14:45:53 UTC

use
nvidia-smi -a |grep Gpu
DanHansen@Denmark
Message 1475483 - Posted: 11 Feb 2014, 1:57:03 UTC - in response to Message 1475214.  

Hi Bil,


Well thank you for that ;)

It works, pretty nice, but I would like my lm-sensor result to return. Because of the script.

# nvidia-smi -a |grep Gpu
Gpu : N/A
Gpu : 36 C
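
To use that in a script, the number has to be separated from the rest of the line. A minimal sketch, assuming the output keeps the "Gpu : 36 C" shape shown above and that a GPU without a reading shows "N/A"; `parse_gpu_temp` is a hypothetical helper name, not an nvidia-smi feature.

```shell
#!/bin/sh
# Hypothetical helper: read `nvidia-smi -a | grep Gpu` style lines on stdin
# and print one integer temperature per line that carries a value.
# Lines such as "Gpu : N/A" have no temperature and are skipped.
parse_gpu_temp() {
    awk -F':' '
        {
            value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim surrounding whitespace
            if (value ~ /^[0-9]+ C$/) {
                sub(/ C$/, "", value)            # drop the unit
                print value
            }
        }'
}

# Usage on a machine with the proprietary driver (not run here):
#   nvidia-smi -a | grep Gpu | parse_gpu_temp
```

With several GPUs this prints one number per card, in the order nvidia-smi lists them.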

Well, it can be done with this, no question about it. But do you have an idea why it disappeared after the installation of the driver? I found a site where, I guess, they are talking about almost the same issue. They think it's because of the missing X server (removal of nouveau; is this a part of the X server?). Do you know?

Bil
Message 1475567 - Posted: 11 Feb 2014, 6:58:12 UTC - in response to Message 1475483.  

I think the free Nvidia driver (nouveau) has some integration with lm-sensors, and the proprietary driver does not.
I encountered the same situation on Slackware64 14.0 AMD with an ATI 4350 card: with the free driver that comes with the kernel, "sensors" also shows the GPU temperature, but when I switch to the closed driver (Catalyst 13.1), sensors stops showing the GPU temperature.
I use aticonfig --odgt
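
One way to check which drivers currently offer readings to lm-sensors is to list the names the kernel publishes under its hwmon directory. This is only a sketch, assuming a kernel that exposes them as /sys/class/hwmon/hwmon*/name (on some older kernels the file may sit one level deeper, under device/); `list_hwmon_names` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every hwmon device found under
# the given directory (default: /sys/class/hwmon).
# Assumption: each device has a readable "name" file directly inside it.
list_hwmon_names() {
    dir=${1:-/sys/class/hwmon}
    for f in "$dir"/hwmon*/name; do
        # An unmatched glob stays literal and is not readable, so it is skipped.
        [ -r "$f" ] && cat "$f"
    done
    return 0
}

# Usage (not run here):
#   list_hwmon_names
```

If a GPU driver name is missing from that list, `sensors` has nothing to read for the card, whichever way the driver was installed.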
DanHansen@Denmark
Message 1475722 - Posted: 11 Feb 2014, 18:24:06 UTC - in response to Message 1475567.  

Hi Bil,


I encountered the same situation on Slackware64 14.0 AMD with an ATI 4350 card: with the free driver that comes with the kernel, "sensors" also shows the GPU temperature, but when I switch to the closed driver (Catalyst 13.1), sensors stops showing the GPU temperature.
I use aticonfig --odgt

Like my command "# nvidia-smi -a |grep Gpu".. OK, I see.

In another post, Guy writes to me:
In my step by step, I blacklisted nouveau and then deleted it before installing the nvidia driver, so I was not asked if I wanted to disable it. From what I've read, allowing nvidia to disable nouveau causes other problems.

Do you think it will help, if the driver gets installed right? Or is it still a problem like it is in your case? I have to ask, because I'm not that good yet. Sorry for the newbie questions o)

DanHansen@Denmark
Message 1475853 - Posted: 12 Feb 2014, 0:18:14 UTC - in response to Message 1475722.  
Last modified: 12 Feb 2014, 0:34:04 UTC

Hello again,


Regarding the GPU/Graphic card as a cruncher. First of all, I'm changing to Asus GeForce 640! Because it has a lot better performance and because it can endure a lot of heat. This I learned from you guys. And thank you for that ;)

Now I saw, because BilBg showed me, that the GPU is indeed doing some work, but not much! Look here:

Here's a sample of the problem:

name: ps_140207_12281_43_1
WU name: ps_140207_12281_43
project URL: http://asteroidsathome.net/boinc/
report deadline: Fri Feb 21 01:04:53 2014
ready to report: no
got server ack: no
final CPU time: 0.000000
state: downloaded
scheduler state: scheduled
exit_status: 0
signal: 0
suspended via GUI: no
active_task_state: EXECUTING
app version num: 10111
checkpoint CPU time: 3.783987
current CPU time: 79419.500000
fraction done: 0.044290
swap size: 17325015040.000000
working set size: 24055808.000000
estimated CPU time remaining: 3446.603843


30 sec. later I checked again, and now the estimated CPU time remaining is about 6 sec. higher!?!?

name: ps_140207_12281_43_1
WU name: ps_140207_12281_43
project URL: http://asteroidsathome.net/boinc/
report deadline: Fri Feb 21 01:04:53 2014
ready to report: no
got server ack: no
final CPU time: 0.000000
state: downloaded
scheduler state: scheduled
exit_status: 0
signal: 0
suspended via GUI: no
active_task_state: EXECUTING
app version num: 10111
checkpoint CPU time: 3.783987
current CPU time: 79551.560000
fraction done: 0.044290
swap size: 17325015040.000000
working set size: 24055808.000000
estimated CPU time remaining: 3452.179189


I think it's probably not doing anything after all! I can't find a single finished CUDA55 task! Take a look, and let me know what you guys think: http://asteroidsathome.net/boinc/results.php?hostid=74516&offset=0&show_names=0&state=1&appid=

And, I did change the setup from 20% use of GPU to 100%. No change in heat or anything..


DanHansen@Denmark
Message 1475934 - Posted: 12 Feb 2014, 4:10:12 UTC

Hi,


OK, my new todo seems to have done the trick. I'm not sure though. The current CPU time looks funny.

The estimated CPU time looks OK; it gets smaller now. But this is not what I expected of a GPU! My i5 processor is 100 times faster. Or am I mistaken?
1) -----------
   name: ps_140207_12281_43_1
   WU name: ps_140207_12281_43
   project URL: http://asteroidsathome.net/boinc/
   report deadline: Fri Feb 21 01:04:53 2014
   ready to report: no
   got server ack: no
   final CPU time: 0.000000
   state: downloaded
   scheduler state: scheduled
   exit_status: 0
   signal: 0
   suspended via GUI: no
   active_task_state: EXECUTING
   app version num: 10111
   checkpoint CPU time: 5.328888
   current CPU time: 5.385871
   fraction done: 0.063443
   swap size: 17325019136.000000
   working set size: 24006655.926331
   estimated CPU time remaining: 25146.133314


About a minute later:
1) -----------
   name: ps_140207_12281_43_1
   WU name: ps_140207_12281_43
   project URL: http://asteroidsathome.net/boinc/
   report deadline: Fri Feb 21 01:04:53 2014
   ready to report: no
   got server ack: no
   final CPU time: 0.000000
   state: downloaded
   scheduler state: scheduled
   exit_status: 0
   signal: 0
   suspended via GUI: no
   active_task_state: EXECUTING
   app version num: 10111
   checkpoint CPU time: 5.702561
   current CPU time: 5.741095
   fraction done: 0.069188
   swap size: 17325019136.000000
   working set size: 24006656.000000
   estimated CPU time remaining: 24931.730650


Bil
Message 1476035 - Posted: 12 Feb 2014, 8:17:49 UTC - in response to Message 1475722.  

No, I think the problem lies in the driver itself. With the free drivers you get the temperature via sensors; with the closed (proprietary) ones you do not. And the way you install the proprietary driver does not change that.
Maybe I am wrong, but that's what I think.

Btw, I'd be happy if you shared your temperature monitoring script.
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1476068 - Posted: 12 Feb 2014, 9:52:54 UTC - in response to Message 1475934.  

You have one Windows computer:
http://setiathome.berkeley.edu/show_host_detail.php?hostid=7104439

If it is on the same LAN as your Linux computers, please make your life easier by installing (on Windows):
BoincTasks
http://www.efmer.eu/boinc/boinc_tasks/

"The program should run on Windows 2003 / Windows XP / Windows Vista / Windows 7 / Windows 8 as well as on Linux and Mac, with Wine."

Then use BoincTasks to monitor your other Linux computers
(it gives much more info than boinccmd)
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
DanHansen@Denmark
Message 1477442 - Posted: 15 Feb 2014, 3:20:30 UTC - in response to Message 1476068.  
Last modified: 15 Feb 2014, 3:36:31 UTC

Hi BilBg,
If it is on the same LAN as your Linux computers, please make your life easier by installing (on Windows):
BoincTasks
http://www.efmer.eu/boinc/boinc_tasks/

I use BoincView on that 1 windows computer. But I will try the one you suggest! I didn't know which one to choose! Thanks for making my life easier ;)


Hi Bil,
No, I think the problem lies in the driver itself. With the free drivers you get the temperature via sensors; with the closed (proprietary) ones you do not. And the way you install the proprietary driver does not change that.
Maybe I am wrong, but that's what I think.

Btw, I'd be happy if you shared your temperature monitoring script.


Thanks for that one ;)

Regarding the script, I'll be more than happy to share it with you ;) I've made a script, with help from other scripts like it, which runs with 1 processor, and a script which runs with several processors. I'm working on it so that it works fine with both types, single CPUs and multiple CPUs. I also have a script for the HDD. Both scripts work with lm-sensors. BUT, because of my driver problems, I may need to make a script for the GPU too. I wanted to make it all in one script, but because of the driver problems lm-sensors can't "see" the GPU any more. Maybe if we get it to work properly, the GPU result will reappear!

But the script is yours, of course! And I can make you a list of all the sites I used to get this far, if you need it. It's a shell script, by the way (".sh"). I only use Ubuntu Servers for BOINC, as you know. I configure the script to be run by cron every minute, and then if one CPU gets hotter than, e.g., 40 degrees Celsius, it warns by email and adds an entry to a log file. If one CPU gets hotter than, e.g., 50 degrees Celsius, it alerts by email, adds an entry to a log file and shuts the computer down! But please let me finish it properly ;) I just need this Nvidia stuff and BOINC to work together ;)
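
The warn/shutdown logic described above can be sketched roughly like this. It is not the actual script: the function names, the "label temperature" input format and the example thresholds of 40 and 50 are assumptions for illustration, and the email and shutdown steps are left as comments so the sketch is harmless to run.

```shell
#!/bin/sh
# Assumed thresholds in whole degrees Celsius (the examples from the post).
WARN_AT=40
CRIT_AT=50

# Hypothetical helper: classify one integer reading as ok, warn or critical.
classify_temp() {
    if [ "$1" -gt "$CRIT_AT" ]; then
        echo critical
    elif [ "$1" -gt "$WARN_AT" ]; then
        echo warn
    else
        echo ok
    fi
}

# Hypothetical helper: read "label temperature" pairs on stdin and print
# "level label temperature" for every reading that is above a threshold.
check_readings() {
    while read -r label temp; do
        level=$(classify_temp "$temp")
        [ "$level" = ok ] || echo "$level $label $temp"
        # A real script would append this line to a log file, send an email,
        # and on "critical" shut the machine down (e.g. shutdown -h now).
    done
}

# Usage (not run here):
#   printf 'core0 47\ngpu0 36\n' | check_readings
```

Keeping the classification separate from the actions means the same check can be reused for CPU, GPU and hard drive readings, as long as each source is turned into "label temperature" lines first.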


DanHansen@Denmark
Message 1478354 - Posted: 17 Feb 2014, 16:43:16 UTC
Last modified: 17 Feb 2014, 16:43:54 UTC

Closing thread...

Almost success. Wrong OS edition, right result!

Will be followed up by a todo, if we succeed, in one of these two threads:

http://setiathome.berkeley.edu/forum_thread.php?id=74108
http://setiathome.berkeley.edu/forum_thread.php?id=73032


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.