Setting up Linux to crunch CUDA90 and above for Windows users
Author | Message |
---|---|
Sleepy Send message Joined: 21 May 99 Posts: 219 Credit: 98,947,784 RAC: 28,360 |
Just in case someone else experienced the same problem. Yesterday was a bit nightmarish, since every few minutes the BOINC client on one of my two machines would stop and I needed to restart it manually (eventually, I set up a software watchdog to automate the process, though it has not worked tonight :-( ). It usually happened in the middle of a GPU WU. It was definitely related to GPU work, since snoozing GPU work eliminated the problem. This morning everything seems to work as well as usual. What had I changed before the problem appeared? Nothing, apart from standard system updates. What have I done to solve the problem? Basically nothing, apart from a couple of shutdowns/reboots with no immediate improvement. I do not know if this was caused by a temporary batch of Arecibo WUs that made my system hiccup. I am not overclocking. Also, it is winter here, so it is not particularly warm. Good crunching to everybody. |
Joe Januzzi Send message Joined: 13 Apr 03 Posts: 54 Credit: 307,134,110 RAC: 492 |
Just in case someone else experienced the same problem. Sleepy, I had the same problem too. Like you, I tried suspending GPU work, but it still happened after about 5 minutes on my system. When I suspended network activity, I was able to finish all my WUs without a problem. I wonder if it had something to do with the stuck uploads? BoincTasks started acting up at that time and still is, so I stopped using it. Real Join Date: Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC Try to learn something new everyday. |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
Just in case someone else experienced the same problem. I am getting the error message "Boinc Manager exited 3 times in [5 minutes/15 minutes], do you want to restart?". Is that the same error message you two are getting? Tom A proud member of the OFA (Old Farts Association). |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
I have just taken 2 of my 5 gpus offline. Will see if the problem continues. Tom A proud member of the OFA (Old Farts Association). |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
If you see that kind of message, the compute primitives in ComputeCache have been corrupted. The computer is segfaulting on the application and offers to restart the app. If you delete the contents of ComputeCache and restart BOINC, it should clear up. But then investigate why the compute primitives got corrupted. Too much overclocking on the card is likely the reason. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
If you see that kind of message, the compute primitives in ComputeCache have been corrupted. The computer is segfaulting on the application and offers to restart the app. If you delete the contents of ComputeCache and restart BOINC, it should clear up. But then investigate why the compute primitives got corrupted. Too much overclocking on the card is likely the reason. Thank you for the diagnosis. Since I am a bit confused about the terminology, let me ask: where exactly is the "ComputeCache"? Are you talking about the folder where all the downloaded tasks from Seti are? If the answer is yes, which is better: "reset the project", or shut down Boinc Manager and delete all the data files? Thank you. Tom A proud member of the OFA (Old Farts Association). |
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
he's talking about things more specific to how the GPU is interacting with the OS. you can clear it with some commands in the Terminal, or the easier way: just reboot the computer. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
he's talking about things more specific to how the GPU is interacting with the OS. Ah, as you know I am usually only up to "simple" solutions. So reboot it is. Since I have dropped two of my slower GPUs, I will see if that might be "the issue". I had managed to get it to boot/run with 5 GPUs after turning on the "upper memory" for PCIe option in the BIOS. It is a shame I can't find that in the AMD BIOS. Tom Latest URL for the system under discussion is: https://setiathome.berkeley.edu/show_host_detail.php?hostid=8661108 A proud member of the OFA (Old Farts Association). |
Sleepy Send message Joined: 21 May 99 Posts: 219 Credit: 98,947,784 RAC: 28,360 |
After another short hiccup (while uploading was slow, but it may be a coincidence), now everything is working fine. I was not receiving any error message, I just saw boinc-client go down. For the record, crisis after crisis, yesterday there was also an update to libcurl which caused my version of Boinc (the 7.4.4 by TBar) to go down as well. So I switched to the repository client. Today I received an update from the special repository with the libcurl34 package and this got sorted out as well. After months with everything going smoothly by itself and getting into trouble only when experimenting too "hard", these days were a bit shaky without me doing anything to cause it... Good crunching! |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
If you see that kind of message, the compute primitives in ComputeCache have been corrupted. The computer is segfaulting on the application and offers to restart the app. If you delete the contents of ComputeCache and restart BOINC, it should clear up. But then investigate why the compute primitives got corrupted. Too much overclocking on the card is likely the reason. No, the ComputeCache is the folder in Linux and Windows where the compute kernels or primitives are generated for OpenCL and CUDA tasks. Ever notice the messages in stderr.txt on the first task computed with a new driver or new card? Something along the lines of "can't find so and so file, . . . .generating". That is the application generating the compute primitives for the API platform. It only has to do it once for each driver or card, unless they get buggered up, in which case any task referencing the corrupted files will fail. The folder or directory is in a different place for each OS. For Windows it is located in C:\Users\[User_Name]\AppData\Roaming\NVIDIA\ComputeCache For Linux it is located in the hidden folder /home/[login_user_name]/.nv/ComputeCache Just delete all the folders and the index file in the directory. The primitives get regenerated the first time a GPU task is started after restarting BOINC. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
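A minimal sketch of the Linux cleanup described in the post above, assuming the default cache location /home/[login_user_name]/.nv/ComputeCache given there. The function name clear_compute_cache is hypothetical (it is not part of BOINC or the NVIDIA driver); it only empties the directory and leaves the directory itself in place. BOINC should be stopped before running it and restarted afterwards.

```shell
#!/bin/sh
# Sketch only: empty the NVIDIA ComputeCache so the compute primitives
# are regenerated by the next GPU task.
# clear_compute_cache is a hypothetical helper name, not a BOINC or NVIDIA tool.
clear_compute_cache() {
    # Optional first argument overrides the default Linux location.
    cache_dir="${1:-$HOME/.nv/ComputeCache}"

    if [ ! -d "$cache_dir" ]; then
        echo "no cache directory at $cache_dir"
        return 0
    fi

    # Delete everything inside the directory (the subfolders and the
    # index file) but keep the directory itself.
    find "$cache_dir" -mindepth 1 -delete
    echo "cleared $cache_dir"
}

# Usage, after stopping BOINC:
#   clear_compute_cache
```

Passing a path as the first argument makes it possible to try the function on a scratch directory before pointing it at the real cache.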
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
After another short hiccup (while uploading was slow, but it may be a coincidence), now everything is working fine. You have to be aware of the client's dependence on libcurl. TBar's version was compiled on older distros and uses the libcurl3 library. But the latest distros past 18.10 deprecated libcurl3 and removed it from the sources. 18.04 straddles both camps. It ships with libcurl4 stock but still has the older libcurl3 library in its software sources for downloading and substituting. Any new package installation may remove libcurl3 and install the stock libcurl4, so you have to watch what a package intends to install and what it is going to remove. One way to get around this issue, as you discovered, is to use the curl34 ppa package, which ships a libcurl4 library that has both libcurl3 and libcurl4 in the same library. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
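One way to "watch what a package intends to install and what it is going to remove" is to simulate the installation first. The sketch below assumes the usual apt-get -s (simulate) output, where each planned action is a line starting with Inst, Conf or Remv. The helper name would_remove and the package name some-package are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch only: list the packages a simulated apt-get run would remove.
# would_remove is a hypothetical helper name. It reads apt-get -s output
# on stdin and prints the second field of every "Remv" line.
would_remove() {
    awk '$1 == "Remv" { print $2 }'
}

# Usage (nothing is installed or removed, -s only simulates):
#   apt-get -s install some-package | would_remove
```

If libcurl3 shows up in that list, the installation would pull the library out from under a client that still depends on it.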
J. Mileski Send message Joined: 9 Jun 02 Posts: 632 Credit: 172,116,532 RAC: 572 |
After another short hiccup (while uploading was slow, but it may be a coincidence), now everything is working fine. To make it easy for others: sudo add-apt-repository ppa:xapienz/curl34 && sudo apt-get update |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
I did an update on an Ubuntu 14 computer earlier today and noticed it loaded a new libcurl3 with the LibreOffice update. That could be what is breaking things on UB 18. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
No, the stock libcurl library is libcurl4 in Ubuntu 18.04 and every release since then. That is why the very first installation instruction in TBar's BOINC versions says you have to install the older libcurl3 library to satisfy the dependency of his client. His clients were static linked when compiled on Ubuntu 16, I believe, where the libcurl3 library was the default library. That is also why the manager needs the libwebkitgtk-1.0 library: the manager is static compiled with wxWidgets. If the LibreOffice update had updated the libcurl library to libcurl4 on Ubuntu 14.04, then you would have run into the same issue. At least with that older distro, you can put back the libcurl3 library with no problems. Or use the curl34 ppa library. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
Starting over again. I have a clean install of Lubuntu without allowing any of the "updates since the image was created" to be applied. I am going to see if that clears all the mysterious "system errors" I have been getting, as well as all the crap that seems to be determined to "rain on my parade" :) Just think, without TBar's efforts after petri's creative programming I would probably still be running Windows 10 and the stock apps on ALL my computers instead of one or two :) Tom A proud member of the OFA (Old Farts Association). |
Joe Januzzi Send message Joined: 13 Apr 03 Posts: 54 Credit: 307,134,110 RAC: 492 |
I have a question. I switched Boinc from 7.8.3 to 7.4.44. Everything is working fine, except that I can't get it to increase my WU limit. I tried changing the minimum work and the max work buffer, but with no luck. My settings for those are 10.00 and 0.10. I would appreciate any help. Thanks Real Join Date: Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC Try to learn something new everyday. |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
We are all still limited to 100 tasks per device. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I have a question. Think you might be confused. We are still limited by the servers to 100 tasks per device. What the 7.4.44 client allows is to increase the max tasks allowed per host up to 3000 tasks from the standard 1000 tasks that 7.8.3 allows. To get more tasks onto the host requires rescheduling. Visit the GUPPI rescheduler thread to read how to reschedule. https://setiathome.berkeley.edu/forum_thread.php?id=79954&sort_style=5&start=675 Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.