Setting up Linux to crunch CUDA90 and above for Windows users
Author | Message |
---|---|
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
So are you going to stick with 7.8.3 for the meantime or eventually move to 7.4.44? Just slight differences in menu layouts. But the 7.4.44 is more conducive to large rescheduling numbers. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Joe Januzzi Send message Joined: 13 Apr 03 Posts: 54 Credit: 307,134,110 RAC: 492 |
So are you going to stick with 7.8.3 for the meantime or eventually move to 7.4.44? Just slight differences in menu layouts. But the 7.4.44 is more conducive to large rescheduling numbers. LOL I just switched back to 7.8.3 but I will be going back to 7.4.44 Real Join Date: Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC Try to learn something new everyday. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Just one caveat about 7.4.44. Some projects won't work with a client that old. MilkyWay is one of them. It has a lower limit of version 7.6.31, or the GPU tasks fail. 3000 tasks on a 3-card host should last through one of our Grand Mal outages at 12 hours. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
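The version floor mentioned above can be checked before attaching a project. This is a minimal sketch, assuming GNU `sort` with `-V` (version sort) is available. `version_ge` is a hypothetical helper name, and the 7.6.31 floor for MilkyWay is taken from the post above, not from project documentation.

```shell
#!/bin/sh
# version_ge A B: succeed when version A is at least version B.
# Relies on GNU sort: -V compares version strings, -C checks order silently.
version_ge() {
    printf '%s\n' "$2" "$1" | sort -V -C
}

milkyway_min="7.6.31"   # floor quoted in the post above, not verified here

for client in 7.4.44 7.8.3; do
    if version_ge "$client" "$milkyway_min"; then
        echo "$client: meets the $milkyway_min floor"
    else
        echo "$client: below the $milkyway_min floor"
    fi
done
```

Version sort is used instead of a plain string comparison because a string comparison would rank a two-digit component such as 7.14 below 7.6.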
EdwardPF Send message Joined: 26 Jul 99 Posts: 389 Credit: 236,772,605 RAC: 374 |
OK ... here goes; first attempt to do a reschedule. I AssUMe this is the proper protocol: 1) stop BOINC 2) run resched 3) move everything to CPU's 4) restart BOINC (& kill resched - 1 step) 5) wait 3 - 5 minutes (for BOINC to time down to "0 sec's") 6) download to fill queue (in my case 2 GPUs so download about 200 WUs - or less) 7) If BOINC has <1000 WUs go to 1 Yes?? If not, what is the correct (better) protocol? Ed F Edit: Thanks, again, Keith! |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Yes, this is correct. Remember to tally up both the CPU and GPU caches so the total stays under 1000 tasks onboard. So once you get to 800 tasks on the CPU, stop, because the next request for work will refill your GPU cache with 200. 800 + 200 = 1000. If you go over 1000, BOINC will stop requesting work until you get back below 1000, which means retiring enough out of the CPU cache to get back below 800. You could also shift to the 7.4.44 client included in your package. That would allow you up to 3000 tasks rescheduled. The 7.4.44 client is in the zip file in the /docs sub-directory in the BOINC folder. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
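The arithmetic behind the stopping point can be written out as a small sketch. The function names are hypothetical, and the 1000 cap and 200-task GPU refill are simply the figures from the posts above (two GPUs in Edward's case). Treating exactly 1000 as the point where requests stop is an assumption.

```shell
#!/bin/sh
# cpu_stop_point CAP GPU_REFILL: largest CPU-side cache that still leaves
# room for one more GPU refill under the total task cap.
cpu_stop_point() {
    echo $(( $1 - $2 ))
}

# should_request_more CPU_TASKS GPU_TASKS CAP: succeed while the combined
# cache is still below the cap, so another reschedule pass is possible.
should_request_more() {
    [ $(( $1 + $2 )) -lt "$3" ]
}

cap=1000         # limit described in the post above
gpu_refill=200   # example figure for one GPU refill on a 2-GPU host

echo "stop moving tasks to the CPU at: $(cpu_stop_point "$cap" "$gpu_refill")"
```

If the 3000 figure for the 7.4.44 client behaves as a cap in the same way, `cpu_stop_point 3000 200` gives 2800.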
Joe Januzzi Send message Joined: 13 Apr 03 Posts: 54 Credit: 307,134,110 RAC: 492 |
|
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Good to hear you have survived the mega outage Joe. Starting to get a few tasks back in but they are having a hard time downloading. I set the max requested back to default 2 to keep them from going into backoff if a download stalls out. I have been out of work on 4 of 5 hosts since this morning. The slowest host still has some work. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Good to hear you have survived the mega outage Joe. Starting to get a few tasks back in but they are having a hard time downloading. I set the max requested back to default 2 to keep them from going into backoff if a download stalls out. . . I've been out of work (OOW) on 3 hosts since yesterday morning local time ... The early crash start to the outage caught me on the hop .... Stephen :( |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Just to add to your list Edward (if you like): I first run my CPU tasks for 10-15 sec, then suspend them. This 'locks' them as CPU tasks, so when you move tasks back to the GPU later, only GPU-downloaded tasks are moved. It just keeps server-assigned CPU/GPU tasks running as they were intended. |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
Has anyone been bothered by cpu resources being used by invisible to the task manager tasks? I found myself turning up the Boinc Manager to 85% for non-boinc tasks before the cpu would stop suspending. Unfortunately my task manager pegged 100% and the cpu tasks are taking 6+ hours. Yes, a full cold boot fixed the issue. Do I basically need to do a daily cold boot? Tom A proud member of the OFA (Old Farts Association). |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Has anyone been bothered by cpu resources being used by invisible to the task manager tasks? . . Personally I would have done a reboot of boinc manager/client first, since I would suspect that a task had gotten itself into a loop of some kind. Rebooting Windows would have come later. Stephen ? ? |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
Has anyone been bothered by cpu resources being used by invisible to the task manager tasks? Re-booted Linux :) Good point, I hadn't re-cycled the Boinc Manager. Just poked around trying to find something in the Task Manager, and changed the "suspend BOINC if non-BOINC tasks get above XX%" setting. I don't think I tried shutting the manager down and restarting it. Tom A proud member of the OFA (Old Farts Association). |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
Has anyone been bothered by cpu resources being used by invisible to the task manager tasks? Well, it did it again. Only this time I shut down the Boinc Manager, and that did not shut down the cpu tasks. The Boinc Manager hasn't been able to re-connect to them either. Let me look around for the command line shutdown. --edit-- To install the command line tool so I can try to shut it down, it wants to install "libcurl4", which as far as I know breaks the Tbar all-in-one distro? --edit-- I have been limiting the updates to my 18.04.1 Lubuntu to "security" updates. I have just applied the last set that showed up (they seem to be mostly Nvidia related, for my 410 driver). Even though it didn't ask for a re-boot after the install, I did one anyway. Now we will see if the problem shows up on Tuesday or later today. Tom A proud member of the OFA (Old Farts Association). |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
From earlier: After another short hiccup (while uploads were slow, but it may be a coincidence), now everything is working fine. To make it easy for others: sudo add-apt-repository ppa:xapienz/curl34, then sudo apt-get update |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . You need to shut down and restart the client as well. Have you checked that shutting down manager actually shuts down the client too? Stephen ? ? |
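One way to check whether the client survived closing the manager is to look for its process. This is only a sketch: it assumes `pgrep` is installed and that the client's process is named `boinc`, which may not match every install. `client_running` is a hypothetical helper name.

```shell
#!/bin/sh
# client_running [NAME]: succeed if a process with exactly this name exists.
# "boinc" as the client's process name is an assumption about this install.
client_running() {
    pgrep -x "${1:-boinc}" >/dev/null 2>&1
}

if client_running; then
    echo "BOINC client is still running"
else
    echo "no BOINC client process found"
fi
```

If the client is still running and a `boinccmd` binary is available, `boinccmd --quit` asks the client to exit.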
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Hi ppl, . . With SETI down last weekend I decided to have another shot at getting Linux to work on the new rig. With some help from Keith I got the libcurl34 package installed and that seemed to fix things. Boinc fired up properly and I was able to join a project. With SETI down I joined World Community Grid and the machine was happily crunching their tasks on the CPU cores. I monitored the system for about 20 mins and everything seemed AOK so I went away to do other things. When I came back BOINC had crashed with the explanation that there had been too many exits, 3 in under 2 mins. So I tried to restart it but immediately got the same result. I then noticed that the network icon on the control bar was gone. I checked the ethernet port and the LEDs were off. Firefox reported the same problems so the system had lost the network connection completely. Thinking the port might have died I rebooted into Windows and that worked AOK and the LEDs were back on. . . So does anyone have any suggestions how I might restore the ethernet port in Linux when I have no ethernet port ... :( . . I am wondering if there are any 'recovery' tools on the Linux Live disk ... Stephen ? |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
. . Hi ppl, I have almost no expertise with Linux, but I had the same problem on my host about a month ago, and it crashed my host. What I was able to find is that something in the libcurl34 package crashes the network when you install it and uninstall the old libcurl3. I was unable to find exactly what the missing part is; apparently the network uses one of the files removed in the uninstall process. After I changed to 7.15.0, which uses libcurl4, I never encountered the problem again. But I never tried uninstalling libcurl4 to see if the same thing happens. |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
That was the latest issue. The cpu clients refused to shutdown. That is why I installed the latest security update and started it up again. Tom A proud member of the OFA (Old Farts Association). |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
This is interesting. I noticed Stephen didn't name the version of Linux he was working with. I decided to try my Ubuntu 18.04.2 system just to see if anything had changed. I installed this system a while back and managed to install the downloaded nVidia driver after a few hours of trying normal methods. I was discouraged by how many files Autoremove had removed, but it seemed everything still worked. It was already at 18.04.2 LTS this time, so I checked which version of libcurl was installed using Synaptic. It said I still had libcurl3, and only libcurl3. I have not manually changed anything since the first system and driver install. I then ran the Updates and now have 4.15.0-46-generic. I again checked Synaptic and again it said I only have libcurl3. Naturally, BOINC 7.8.3 works without any trouble on this system. I checked the computer lists and see I'm not the only one; there are a few people with 4.15.0-46-generic running 7.8.3. I'm not sure what to tell you. I suppose I could try curl34 a little later, but I apparently don't need it at this point. . . Hi ppl, You can find out how to reinstall networking here: https://askubuntu.com/questions/422928/how-to-reinstall-network-manager-without-internet-access This sounds reasonable: sudo apt-get remove --purge network-manager The above command will purge all the packages that were related to the service network-manager. You can download all packages as .deb files using an Ubuntu Live disk and then install them to your original OS. First boot from an Ubuntu Live disk. Once you are there, open a terminal and run the command below: sudo apt-get download network-manager* This will download all the network-manager packages to the home directory. Now copy all the .deb packages to a folder on that pen drive or on another partition of your HDD and then reboot into your system. 
Once you are there, open a terminal and do the following: cd /path/to/the/directory/where/.deb/files/are/located then sudo dpkg -i *.deb The above command will install all the .deb files. Now restart your network-manager by running: sudo service network-manager restart Now you have the package network-manager-gnome running again. Let's hope so anyway. |
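The quoted steps can be laid out as a script that only prints its commands unless told to run a step. This is a sketch of the procedure as quoted, not a tested recovery tool. The `run`, `fetch_debs` and `install_debs` names and the `/path/to/debs` placeholder are hypothetical, and the commands themselves are the ones from the quoted answer.

```shell
#!/bin/sh
# Prints the commands by default; pass "fetch" or "install DIR" to run a step.
execute=0

# run CMD...: execute the command, or just print it when execute is 0.
run() {
    if [ "$execute" = "1" ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

# Step 1, in the live session (which still has networking): fetch the .deb
# files. The pattern is quoted so the shell passes it to apt-get unexpanded.
fetch_debs() {
    run sudo apt-get download 'network-manager*'
}

# Step 2, on the installed system: install the copied .deb files from the
# directory given as $1, then restart the service.
install_debs() {
    run sudo dpkg -i "$1"/*.deb
    run sudo service network-manager restart
}

case "${1:-plan}" in
    fetch)   execute=1; fetch_debs ;;
    install) execute=1; install_debs "${2:?directory with the .deb files}" ;;
    *)       fetch_debs; install_debs "/path/to/debs" ;;
esac
```

Run with no arguments it prints the three commands for review; `fetch` is meant for the live session and `install DIR` for the installed system after the files have been copied across.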
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
If I am understanding you right, this is recommended? Tom A proud member of the OFA (Old Farts Association). |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.