Setting up Linux to crunch CUDA90 and above for Windows users
Author | Message |
---|---|
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
OK, you definitely have another sensor besides the coretemp one. . . OK, the news is not good. stephen@Mi-Burrito:~$ sudo service module-init-tools start module-init-tools: unrecognized service stephen@Mi-Burrito:~$ service module-init-tools start module-init-tools: unrecognized service stephen@Mi-Burrito:~$ sensors coretemp-isa-0000 Adapter: ISA adapter Core 0: +32.0°C (high = +76.0°C, crit = +100.0°C) Core 1: +32.0°C (high = +76.0°C, crit = +100.0°C) stephen@Mi-Burrito:~$ sudo modprobe coretemp stephen@Mi-Burrito:~$ ls /etc/modprobe.d alsa-base.conf dkms.conf blacklist-ath_pci.conf fbdev-blacklist.conf blacklist.conf iwlwifi.conf blacklist-firewire.conf mlx4.conf blacklist-framebuffer.conf nvidia-384_hybrid.conf blacklist-modem.conf nvidia-graphics-drivers.conf blacklist-oss.conf nvidia-installer-disable-nouveau.conf blacklist-rare-network.conf vmwgfx-fbdev.conf blacklist-watchdog.conf . . It seems nothing that you would expect to be there is actually there .... ?? Stephen ? |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Ok, it looks like the sensors-detect script DIDN'T add the driver to your modules file and directories like it said it would. It's got to be on your system though. Back to basics. Try this and post back your results: "locate coretemp". If it isn't on your system we need to fix that first with an apt-get install. You should get at least this, or something similar for whatever kernel you are using. keith@Darksider:~$ locate coretemp /lib/modules/4.10.0-40-generic/kernel/drivers/hwmon/coretemp.ko /lib/modules/4.10.0-42-generic/kernel/drivers/hwmon/coretemp.ko /usr/src/linux-headers-4.10.0-40-generic/include/config/sensors/coretemp.h /usr/src/linux-headers-4.10.0-42-generic/include/config/sensors/coretemp.h keith@Darksider:~$ Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Ok, it looks like the sensors-detect script DIDN'T add the driver to your modules file and directories like it said it would. It's got to be on your system though. . . OK here it is .. stephen@Mi-Burrito:~$ locate coretemp /home/stephen/.gconf/apps/psensor/sensors/lmsensor@32@coretemp-isa-0000@32@Core@32@0 /home/stephen/.gconf/apps/psensor/sensors/lmsensor@32@coretemp-isa-0000@32@Core@32@1 /home/stephen/.gconf/apps/psensor/sensors/lmsensor@32@coretemp-isa-0000@32@Core@32@0/%gconf.xml /home/stephen/.gconf/apps/psensor/sensors/lmsensor@32@coretemp-isa-0000@32@Core@32@0/alarm /home/stephen/.gconf/apps/psensor/sensors/lmsensor@32@coretemp-isa-0000@32@Core@32@0/alarm/%gconf.xml /home/stephen/.gconf/apps/psensor/sensors/lmsensor@32@coretemp-isa-0000@32@Core@32@1/%gconf.xml /home/stephen/.gconf/apps/psensor/sensors/lmsensor@32@coretemp-isa-0000@32@Core@32@1/alarm /home/stephen/.gconf/apps/psensor/sensors/lmsensor@32@coretemp-isa-0000@32@Core@32@1/alarm/%gconf.xml /lib/modules/4.4.0-104-generic/kernel/drivers/hwmon/coretemp.ko /lib/modules/4.4.0-66-generic/kernel/drivers/hwmon/coretemp.ko /lib/modules/4.4.0-96-generic/kernel/drivers/hwmon/coretemp.ko /usr/src/linux-headers-4.4.0-104-generic/include/config/sensors/coretemp.h /usr/src/linux-headers-4.4.0-66-generic/include/config/sensors/coretemp.h /usr/src/linux-headers-4.4.0-70-generic/include/config/sensors/coretemp.h /usr/src/linux-headers-4.4.0-72-generic/include/config/sensors/coretemp.h /usr/src/linux-headers-4.4.0-75-generic/include/config/sensors/coretemp.h /usr/src/linux-headers-4.4.0-78-generic/include/config/sensors/coretemp.h /usr/src/linux-headers-4.4.0-79-generic/include/config/sensors/coretemp.h /usr/src/linux-headers-4.4.0-81-generic/include/config/sensors/coretemp.h /usr/src/linux-headers-4.4.0-83-generic/include/config/sensors/coretemp.h /usr/src/linux-headers-4.4.0-87-generic/include/config/sensors/coretemp.h /usr/src/linux-headers-4.4.0-89-generic/include/config/sensors/coretemp.h 
/usr/src/linux-headers-4.4.0-92-generic/include/config/sensors/coretemp.h /usr/src/linux-headers-4.4.0-93-generic/include/config/sensors/coretemp.h /usr/src/linux-headers-4.4.0-96-generic/include/config/sensors/coretemp.h stephen@Mi-Burrito:~$ . Stephen . |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
OK, so that explains how you got any sensor output BEFORE you installed lm-sensors. Looks like psensor ships with coretemp by default. You have the coretemp.ko driver in the normal and expected places: /lib/modules/4.4.0-104-generic/kernel/drivers/hwmon/coretemp.ko /lib/modules/4.4.0-66-generic/kernel/drivers/hwmon/coretemp.ko /lib/modules/4.4.0-96-generic/kernel/drivers/hwmon/coretemp.ko So back to basics again. Type this: "sudo modprobe coretemp". Then provide the output from "sensors". Next type this: "gksudo gedit /etc/modules". In the gedit window add these two lines and SAVE: "# Chip drivers" and "coretemp". That should insert the coretemp module into the kernel you booted with. I see you have both a Core2 Duo and a Core i5. Are you having issues with BOTH systems? I still want the name of the Super I/O monitoring chip on your motherboard so we can add it into the modules. You can download this tool, which will probe and identify the SIO chips on the motherboard. It will also tell me at what address to force the chip with modprobe later. This is the link to the Superiotool: superiotool.8gz Unpack the tool in your download directory and run "superiotool" from the command line. Post back the resulting output. That will tell me which ITE and SMSC chips sensors-detect detected. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
OK, so that explains how you got any sensor output BEFORE you installed lm-sensors. Looks like psensor ships with coretemp by default. Burrito:~$ sensors coretemp-isa-0000 Adapter: ISA adapter Core 0: +32.0°C (high = +76.0°C, crit = +100.0°C) Core 1: +32.0°C (high = +76.0°C, crit = +100.0°C)
. . I didn't have to add it, it was already there, added by sensors-detect as it said it would.
. . That is a long story, but briefly if I update the i5 to any release later than 96 it screws up the boot loader and it will not boot from the USB drive. It goes into some kind of shell which has a lot of commands but I don't know what to do with it so I go back to release 96, that works AOK. But for the purpose of this exercise it appears to be in the same basic state as this, and I have added lm-sensors to it as well. I think the unidentified temp sensors may in fact be in the SSD and/or flashdrives as I believe they incorporate such sensors.
. . This is all it provided. Burrito:~/Downloads$ sudo ./superiotool.8 ./superiotool.8: 1: ./superiotool.8: .TH: not found ./superiotool.8: 2: ./superiotool.8: .SH: not found ./superiotool.8: 3: ./superiotool.8: superiotool: not found ./superiotool.8: 4: ./superiotool.8: .SH: not found ./superiotool.8: 5: ./superiotool.8: .B: not found ./superiotool.8: 6: ./superiotool.8: .SH: not found ./superiotool.8: 7: ./superiotool.8: .B: not found ./superiotool.8: 12: ./superiotool.8: Syntax error: "(" unexpected . . Apparently my setup doesn't agree with many things :( Stephen :( |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Hi Stephen, I'm out of suggestions short of sitting at your computer to figure things out. sensors-detect found other sensors that would likely provide much more sensor output than just the Core2 Duo core temps. But I am unable to figure out what those SIO chips are, so I am at a standstill. Sorry I couldn't help and made you do so much work for naught. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Hi Stephen, I'm out of suggestions short of sitting at your computer to figure things out. sensors-detect found other sensors that would likely provide much more sensor output than just the Core2 Duo core temps. But I am unable to figure out what those SIO chips are, so I am at a standstill. . . No, thanks for the help, it has been interesting. The problem with that superio tool is that it seems to be a shell file (or batch file as I know them) but the syntax seems to be wrong for this flavour of Linux. For what it is worth psensor detects the GPU temp AOK, except for when I originally changed to release 104. But I am not convinced that was simply a sensor reading problem and not an actual GPU overheating issue. After installing lm-sensors the temps are now reading normal running 104, but Linux can so often be contrary. The updated Firefox has turned into a real CPU hog when accessing the forums. The worst part is that it has replaced the common Firefox files and is the same when I go back to release 96. But I wish I knew Linux better. I have that script open in the Sublime Text editor and all the lines that are intended to display when run start with " . " then something else before the text. That is what is shown in that printout. So what is .B supposed to do anyway? Or .PP for that matter? If they wanted to display text on the screen why not just use "echo"? . . Another problem since changing to 104 is that using the tab key to complete a file or command name no longer works. I think I will go back to 96 and delete this 104 image ... :( Stephen ?? |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Yeah, that superiotool seems to have a lot of dependencies that aren't readily available. I was just grabbing at straws and trying a brute force approach. I had my own sensor troubles this morning because of the Meltdown security patch and new kernel release thrown at us. My IT87 sensor module no longer loaded and I had nothing but the Nvidia GPU sensor outputs available. I fought it for an hour before I threw in the towel, downloaded the latest IT87 git clone, recompiled the driver and reinstalled it. What was frustrating was that the driver was where it was supposed to be, all the scripts were in place and configured correctly, and it still wouldn't load. I think it must have had something to do with the security patch and new kernel, as there was a lot of firmware and new drivers loaded by the kernel. Something was incompatible with the driver source code from November. Finally got it all back working like it should. That wasn't the only damage inflicted either. The patch trashed all my GPU work for both Einstein and Seti. Juan got bit too and lost all his GPU cache. And I thought you only had to worry about Windows downloading new Nvidia drivers that prevent your GPU from working. Ha Ha! |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Yeah, that superiotool seems to have a lot of dependencies that aren't readily available. I was just grabbing at straws and trying a brute force approach. . . Yes, I am finding the later Linux upgrades problematical. I guess their standards are slipping ... . . Touch wood, has not trashed BOINC ... yet! Stephen :( |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Worth a look! https://setiathome.berkeley.edu/forum_thread.php?id=80636&postid=1914059 Stephen :) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Hello to anyone still reading this thread. I am running CUDA80 special using TBar's complete install package on an i5-6600 with 2 x GTX970s. This rig has never received many AP tasks but those it has received have run AOK. Now, with the problems at SETI HQ in sourcing new tapes, it has received 5 APs, the first in quite a long time, and they are not playing ball. . . They try to run but immediately halt and the status changes to "waiting to run". I have checked app_info.xml and there is a section for the AP app which has the right app files in the folder. There was also a second section referring to an older version of AP which does not have the appropriate apps in the folder. I didn't think this was the problem as it had worked before and I have never before edited/added anything to this file re APs, but I commented out the superfluous section anyway; the APs still will not crunch. . . Anyone have any suggestions about what I might be missing? Stephen ?? |
Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
A couple of things I can think of since I know you have moved cards around in the past. - Do you have an app_config to run more than 1 AP task? If so, try just 1 at a time. - The cmd_line file might be for a different card with too many -unroll's or mem setup. Could just rename it and let it try stock settings. - The CL files that the AP app builds could be wrong or corrupt. It builds one for each card type, so don't be surprised if you see GTX1050 in there; they can be removed and the app will rebuild them when needed. You could delete all the CL files that are card specific and let them rebuild --- BUT DON'T delete the ones that are not ... they are needed, and part of the app !!!! |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
The AP gpu app is OpenCL based. Did you install the Nvidia OpenCL driver component? Check your Seti projects directory for both the AP cpu and gpu apps. The gpu app needs the CL file too, along with the r2751 app. |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I think you meant you can remove the old clFFT files that are generated for each new task type. Don't delete the AP r2751 .CL file. You won't run any AP tasks if that file is missing, and the stderr.txt will most assuredly raise an alarm if it is missing. |
Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Yes, only the card specific ones, i.e. AP_clFFTplan_GeForceGTX1080 AstroPulse_Kernels_2751_GeForceGTX1080 ... are the ones compiled by the app. The AstroPulse_Kernels_2751.cl (or whatever version) has to remain and be defined in the app_info. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
. . Anyone have any suggestions about what I might be missing? The first thing to do is to look into the Slot directory for that AP task and see what is printed in the stderr.txt. It usually prints the error there. Another thing to do would be to compare the AP files with those in the original Download. I think you will find there are 2 AP Apps there, one for the GPU and one for the CPU. Both are listed in the downloaded app_info.xml. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Because of the transient nature of stderr files (being overwritten by next task) I never think to look there for things like this. But I was surprised to find a slot for each of these tasks with a massive stderr.txt file in each listing all the attempts to run going right back to the start. It is the same issue all the time. Running on device number: 1 . . Does this relate to the .cl files created for each device type? Or should I be looking elsewhere? I am running only one AP task at a time. From the description is that perhaps the part where it compiles the device specific .cl file? I will check for a file GPU_lock.cpp. [edit] That file is not in the Seti Project folder and there is no /src folder under it. I guess I need more path info to find it. I went right back to / {root} and file manager cannot find that file on this drive. . . Thanks for all the input guys. Stephen <frown> |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
You won't find any .cpp file in the BOINC directories. That is the source file that gets compiled into the AP .CL file. That is just where the compile crapped out in the code. Normally, when you have .CL wisdom file generation failure, it is because of an overzealous AV program preventing the file from being compiled. Do you have any AV program running? I always put the main BOINC directory in Program Files and the hidden /ProgramData directory in my Exclude list for my AV programs. |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Just remembered you are talking about a Linux installation, so ignore my last. So in Linux, are the directory permissions OK? Are you running the repository version of BOINC or TBar's version? |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
You won't find any .cpp file in the BOINC directories. That is the source file that gets compiled into the AP .CL file. That is just where the compile crapped out in the code. . . Hmm nope, unless the linux install included one. What else might cause that failure? Stephen ? ? |
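
For anyone following along, two of the checks suggested in the posts above (reading the stderr.txt in each slot directory, and confirming the directory permissions on Linux) can be sketched in shell roughly as follows. This is only a sketch under assumptions: the function names show_slot_errors and check_writable are made up for illustration, the BOINC directory has to be supplied by you (a TBar install runs from wherever it was unpacked), and it assumes the app writes its card-specific compiled files into the project directory.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): print the last N lines of
# every stderr.txt found under <boinc_dir>/slots/*/ so that repeated
# failures across slots are easy to compare.
show_slot_errors() {
    boinc_dir=$1
    lines=${2:-20}                      # default to the last 20 lines
    for f in "$boinc_dir"/slots/*/stderr.txt; do
        [ -f "$f" ] || continue         # skip when the glob matched nothing
        echo "== $f"
        tail -n "$lines" "$f"
    done
}

# Hypothetical helper (name is an assumption): report whether the current
# user can write to a directory. Assumption: the app needs write access
# there to save the files it compiles for each card type.
check_writable() {
    if [ -d "$1" ] && [ -w "$1" ]; then
        echo "writable: $1"
    else
        echo "NOT writable: $1"
        return 1
    fi
}
```

To use it, source the file and call show_slot_errors with your own BOINC directory and a line count, then check_writable on the Seti project directory inside it. Run the permission check as the user the BOINC client runs under; as root the -w test nearly always succeeds, so it would hide the problem.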
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.