Setting up Linux to crunch CUDA90 and above for Windows users
Author | Message |
---|---|
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
good to know. here's a new link to the package of builds: https://drive.google.com/open?id=1ZXl8naZRdfTfozWUzZWAnS21keu5CYCH I fixed the MP file. since I don't have any Maxwell cards, that was the one I didn't test. but you don't necessarily have to re-test for missed pulse, you've already shown it. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
A very long shot... Could this "mistake" lead us to the source of the problem reported by Tbar? I don't know which lines were removed, but the output file seems very similar to those generated when the problem appears. It may be worth looking more closely at these lines. Or could that be just an incredible coincidence? Who knows? |
elec999 Send message Joined: 24 Nov 02 Posts: 375 Credit: 416,969,548 RAC: 141 |
Problem with the card or the slot on the motherboard it is plugged into. Try moving to a different slot. Check PCIe power connectors on the card for burned pins. Change PCIe power cables. Try a different power supply. The card always comes back after a reboot. I will try to schedule some sort of automatic reboot. Ubuntu has been driving me crazy. Sometimes it fails to boot, sits at a black screen, and then I need to power the system off completely and try again. This happens on multiple systems. I wish I could get an AMD board with IPMI so I could do all the work remotely without needing to be there physically. Is there a lighter or better distro I can try? |
elec999 Send message Joined: 24 Nov 02 Posts: 375 Credit: 416,969,548 RAC: 141 |
good to know. Can you remind me what the difference is between the three versions? For my 2060, 1070 and 1080 cards, which one should I try? |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
The PT version, which stands for Pascal-Turing. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
good to know. what Keith said. your 2060 is Turing; your 1070 and 1080 are Pascal. Maxwell cards are the cards in the GTX 900 series, and the GTX 750ti. so the PT or MPT files will work, but PT might be a little faster in some cases. see here for more info about the mutex enabled builds: https://setiathome.berkeley.edu/forum_thread.php?id=84933 Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Phud Redux Send message Joined: 20 Apr 16 Posts: 270 Credit: 2,976,272 RAC: 1 |
so could someone check my work? |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
so could someone check my work? Since you are running Linux, you could get a lot more production out of your Nvidia cards by running the special app that is provided by the AIO installer. http://www.arkayn.us/lunatics/BOINC.7z Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
so could someone check my work? . . As Keith said, you would do best to install the AIO on the two machines with a) - the 2 x GTX1060s and b) - with the RTX2060. If not then at the very least scroll back several messages and find the link to adding the extra repository so you can add OpenCL functionality to your video drivers and then get your new work as SoG or SaH tasks, much faster than as Cuda60. (See the results on your GTX760) Stephen . . |
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
For Tom, What PCIe errors are you seeing? Please post the error text itself, as well as what log exactly. syslog? kern.log? Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Tom M Send message Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
For Tom, Ian, I lost the actual log when I tried to figure out what was going on. I then spent two days trying to get the Launchpad-based Nvidia drivers to fully install. A little while ago I had a brainstorm and re-burned the flash drive with my newest copy of Ubuntu 18.04, and I finally got Nvidia 440 to install. What I haven't tested is whether it will install AFTER I have run all the security updates. I haven't done that at all this time. Anyway, I have dropped back to a single GTX 1060 3GB so that the GTX 1660 Supers won't get in the way. And I am getting every error I got previously in the Log except the PCIe error. I left the GTX 1060 3GB plugged in and plugged everything else back in (10 cards). The mouse/keyboard is extremely laggy but I got this A proud member of the OFA (Old Farts Association). |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
The Gnome Log utility only saves the current logs from the start of the latest reboot. You need to look at the system logs from when you had the errors. You can look at them in /var/log/syslog or /var/log/syslog.1 Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
i get the feeling he may have overwritten them. he said he reinstalled ubuntu in the other thread. can't really help without more info. but judging from his recent errored tasks, it looks pretty clear that his system and/or driver crashed. several "finish file present too long" errors too (which were fixed in 7.16, maybe use that instead) Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Tom M Send message Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
i get the feeling he may have overwritten them. he said he reinstalled ubuntu in the other thread. I may or may not have the very latest release of the AIO. Is 7.16 the latest? After I turned off the PSU to the "last 2" gpus, suddenly the screen is not laggy anymore. I edited the previous message and added the 2nd screen shot. I have been getting the PPM failure message and I THINK (but am not sure) I have also been getting the gpu time out error. I can't for the life of me remember how I converted the above messages into "PCIe" error messages, unless I made that connection when I was googling around. It is still running 9 gpus with everything moved over one row of short slots, since the gpu that is sitting in the long slot is covering 3 short slots. The other question I have is whether it helps, hinders, or makes no difference to run the video out of the iGPU port. I got a strange message about the video drivers installed (something about manually installed) and may have jumped to the conclusion that part of the problem was running that iGPU off the intel cpu. Tom A proud member of the OFA (Old Farts Association). |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
The AIO only has the 7.14.2 client in it. You need to get the later 7.16 clients from our team website. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
https://askubuntu.com/questions/1155263/new-install-desktop-ubuntu-19-04-shows-error-message-ucsi-ccg-0-0008-failed-to I had the same entries in my log. also my computer had a 50 second hang after resuming from suspend during which desktop is black. worth looking into. I never run off the iGPU (only 1 of my systems supports that anyway). I thought i read somewhere that fan control and overclocking of the gpu's wouldn't work if you didn't have X server running on the nvidia cards. maybe this has changed from when I last heard that. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I thought i read somewhere that fan control and overclocking of the gpu's wouldn't work if you didn't have X server running on the nvidia cards. maybe this has changed from when I last heard that. As far as I know, that is still the case. You need X-server to overclock and control fans. I'm running the newer 5.3 kernels and I have never had any issue with the Type C port or controller on my RTX cards. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Tom M Send message Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
https://askubuntu.com/questions/1155263/new-install-desktop-ubuntu-19-04-shows-error-message-ucsi-ccg-0-0008-failed-to It took two tries to get this working. It works a LOT better when the module name is spelled with a ccg instead of a cfg (my flying fingers). I was having trouble with the laggy screen/mouse/keyboard and with the Nano editor. But that error isn't showing up in the Log anymore. Tom A proud member of the OFA (Old Farts Association). |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I'm wondering why you had to mess with it in the first place. I never had to modprobe that module into the kernel. It seems to be handled by the Nvidia drivers by itself. lspci shows a Type-C USB controller under the Nvidia controller. Command: lspci | grep -i "usb type-c" Output: 08:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1); 0a:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1) Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Tom M Send message Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
I'm wondering why you had to mess with it in the first place. I never had to modprobe that module into the kernel. It seems to be handled by the Nvidia drivers by itself. It was a "target of opportunity" after the Boinc Manager quit/crashed/stopped running. I suppose I could take it out for a test, but I want to try to get the MSI B360-F Pro w/i9 cpu to run for 2-6 weeks without interruption. That controller was the only "important" error I could find, along with a PCIe gpu complaint. It might have been correlation, not causation. If it runs without interruption then I will be tempted to disable the blacklist and see if it will still run "without interruption". Tom A proud member of the OFA (Old Farts Association). |
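
For anyone following along, here is a minimal sketch of the two checks discussed in this thread: searching the current and rotated syslog (the /var/log/syslog and /var/log/syslog.1 files Keith mentions) for the Type-C controller and PCIe messages, and checking whether the controller module is blacklisted. It rests on some assumptions: the module name `ucsi_ccg` is inferred from the linked askubuntu thread, `/etc/modprobe.d/*.conf` is assumed to be where the blacklist entry lives, and the function names `count_log_hits` and `is_blacklisted` are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, and "ucsi_ccg" is the module
# name assumed from the linked askubuntu thread.

# Print how many lines across the given log files mention a pattern
# (case-insensitive). Missing or unreadable files are skipped.
count_log_hits() {
    local pattern="$1"
    shift
    local total=0 n f
    for f in "$@"; do
        [ -r "$f" ] || continue
        # grep -c prints 0 and exits non-zero when nothing matches
        n=$(grep -c -i -- "$pattern" "$f") || true
        total=$((total + ${n:-0}))
    done
    echo "$total"
}

# Succeed if any of the given modprobe config files blacklists the module.
is_blacklisted() {
    local module="$1"
    shift
    [ "$#" -gt 0 ] || return 1
    grep -q -s -E "^[[:space:]]*blacklist[[:space:]]+${module}[[:space:]]*$" "$@"
}

echo "ucsi_ccg lines: $(count_log_hits 'ucsi_ccg' /var/log/syslog /var/log/syslog.1)"
echo "pcie lines:     $(count_log_hits 'pcie' /var/log/syslog /var/log/syslog.1)"

if is_blacklisted ucsi_ccg /etc/modprobe.d/*.conf; then
    echo "ucsi_ccg is blacklisted"
else
    echo "ucsi_ccg is not blacklisted"
fi
```

If the syslog files are not readable by your user, the counts will show 0, so run it with sudo in that case. Comparing the counts before and after changing the blacklist is one rough way to tell whether the controller messages and the crashes actually go together.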