Setting up Linux to crunch CUDA90 and above for Windows users
Author | Message |
---|---|
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Although I can install the repository nvidia 390 drivers, whenever I try to use the recovery method for 396 drivers the system goes off into limbo when I try to enable the network. :( . . Thanks Tom, yes TBar has made that quite clear and it is very good to know. But the problem using recovery mode is one of the issues plaguing me and I am trying to get past them. I am seeking a solution rather than a compromise, but I may have to settle for the latter, if I can resolve the other issues. Stephen <shrug> |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Since the Ubuntu main repository has the 390 drivers and you have been told that that is CUDA91, why are you even messing with recovery mode? Install Ubuntu. Choose the 390 Nvidia drivers and install them with the Software Updater. Reboot and unpack the TBar CUDA91 package. Satisfy the dependencies by installing libwebkitgtk-1.0 and maybe libcurl3. Start BOINC and crunch. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
A little update on the Missing Pulses. Since moving to the Older version of BOINC the problem seems much less common although I have found at least one task with missed pulses since changing, Best pulse: peak=0. I also found a similar task on the Linux machine I sometimes use for browsing. I decided to see if the same problem existed on the other two machines, and yes, in fact you don't even have to actively browse, just open FireFox and leave it minimized for a couple of hours. That will do it; https://setiathome.berkeley.edu/results.php?hostid=6796479&state=5 https://setiathome.berkeley.edu/results.php?hostid=8097309&state=5 Sometimes the only way to stop the problem after it begins is to Reboot the machine. Those two machines have been changed to the Older version of BOINC since those Errors. The Postman runs on Sunday? Apparently so in some locations. The New GPU arrived Today, a day early. So, we'll see how well a 1070 does against the Web Browser problem. It's definitely a step up from the 1050Ti. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"A little update on the Missing Pulses. Since moving to the Older version of BOINC the problem seems much less common although I have found at least one task with missed pulses since changing, Best pulse: peak=0. I also found a similar task on the Linux machine I sometimes use for browsing. I decided to see if the same problem existed on the other two machines, and yes, in fact you don't even have to actively browse, just open FireFox and leave it minimized for a couple of hours. That will do it;" . . OK, I am just throwing this out there, it may mean nothing. I generally have a browser open (Firefox) but have yet to experience a task with zero pulses found. I say that because I have yet to have an invalid task and I am sure such a result would be invalid. But I often find Firefox has shut itself down. I am wondering if that is due to some conflict that was resolved by the shutdown, thus preventing a task from hitting the zero pulse problem. So maybe it is an issue with some particular versions of browsers? Just speculating ... Stephen ? ? |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I'm fairly certain this is one of the strangest problems I've ever encountered with computers. I'm not sure what's going on, other than it took about 20 minutes for a rather new 1070 to suffer the same fate as the other cards. Along with being a pain to get it to stop once it started missing pulses. As I've told Petri a couple times, a small problem in Linux usually becomes a Big problem in OSX. So for now, I've sacrificed a slot and a 750Ti to just run the browser, and told BOINC to ignore that card. So far it seems to be working that way. |
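For anyone wanting to try the same workaround, BOINC can be told to skip an NVIDIA card through the `<ignore_nvidia_dev>` option in `cc_config.xml`. The sketch below only builds the text of such a file. The helper name `build_cc_config` is made up for illustration, and device index 1 is an assumption; check the GPU numbering BOINC prints in its startup messages before picking one.

```python
def build_cc_config(ignored_nvidia_devices):
    """Return cc_config.xml text asking BOINC to skip the given NVIDIA devices."""
    lines = ["<cc_config>", "  <options>"]
    for dev in ignored_nvidia_devices:
        # One <ignore_nvidia_dev> element per device index BOINC should leave alone.
        lines.append(f"    <ignore_nvidia_dev>{int(dev)}</ignore_nvidia_dev>")
    lines += ["  </options>", "</cc_config>"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical choice: device 1 is the card reserved for the browser.
    print(build_cc_config([1]))
```

If a `cc_config.xml` already exists in the BOINC data directory, merge the element into its `<options>` section rather than overwriting the file, then restart BOINC so the GPU list is rebuilt.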
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
could possibly be an issue with the motherboard in that one system. you said it only showed up on one of your machines. and looking at your hosts, you're running very old hardware. could be signs of early hardware issues from a worn out old board. maybe a problem with the MCH since the host in question is using a board that controls PCIe via the northbridge. maybe it's overheating? try putting a fan on it. or maybe it's just going bad from old age/use. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Hmmm, that post is almost as strange as the problem first mentioned by Juan back here, "As you could see the Cuda 9.2 shows 0 Pulses and the other two show 17.." Back around that post I also gave a link to another machine running the CUDA 9.2 App with a similar Invalid missing All pulses. That's 2 machines there. If you look up just 2 posts from yours you will see where I mentioned FOUR machines of mine having the Same problem. One Linux and Three Macs. A little above that you will see where I mention a couple of times the Problem Completely Disappears if I just go back to zi3v CUDA90. How you arrive at a problem on just one old motherboard is itself quite a mystery. The symptoms indicate a Small problem on Linux, which as many times before, has turned into a Big problem in OSX. Also, the problem doesn't exist in zi3v. The next step would be for some people running Linux to launch FireFox in the background and see who has the problem. I might try that myself a little later. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
2 or 3 machines out of the hundreds or more running the app? it's certainly possible for more than one machine to have a slight motherboard issue that could cause a problem with the PCIe signals in some way. i did check the other system you linked, but it appears that its invalids have dropped off so they can no longer be viewed. i think it's safe to say it's not the GPU in question since it tests fine in other systems, and the problem stays with that system no matter the GPUs that are in it, but there are lots of components handling the data between the GPU and the software. and to be fair, ALL of your systems have very old hardware. if it's not faulty, it could simply be an idiosyncrasy of your boards? you could swap the hard drives from one machine to another, and see where the problem stays, with the software or with the hardware. i'll look through my invalids, but it might take a while to scrub them all manually. edit, i'll also try the firefox test on my 2x linux systems with v97b2. i'll even set up a tab reloader and have it refresh every 5 mins to give it some use Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
My 2 Linux systems don't have many invalids, but i scrubbed through the 80 or so inconclusives. I don't know if this fits the exact criteria, but i just looked for tasks that showed no pulses. I don't know if it has to find only triplets instead, or if the lack of pulses is sufficient. Computer 1: Supermicro X9DRi-LN4F+, 2x E5-2690(v1), 2x GTX 1060 3GB, Ubuntu 17.10, nvidia drivers 396.45, v0.97b2. tasks: https://setiathome.berkeley.edu/result.php?resultid=6967019939 https://setiathome.berkeley.edu/result.php?resultid=6962566026 Computer 2: Supermicro X7DA8+, 2x E5440, 2x GTX 1050ti 4GB, Ubuntu 17.10, nvidia drivers 396.45, v0.97b2. tasks: https://setiathome.berkeley.edu/result.php?resultid=6965614049 https://setiathome.berkeley.edu/result.php?resultid=6965932003 https://setiathome.berkeley.edu/result.php?resultid=6964472476 https://setiathome.berkeley.edu/result.php?resultid=6912299711 is this the same problem you are describing? i can't correlate these to any specific FF usage at this time. they don't normally have FF open, and don't normally do anything other than crunch SETI, but i'll open FF occasionally if i'm browsing the forums from the machine, or googling something or whatever. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
"...and the problem stays with that system no matter the GPUs that are in it...." Why do you keep saying the problem is just with One system? I've said a FEW times now the problem exists with FOUR of my Five systems, and probably the only reason it's not 5 out of 5 is that I haven't bothered to check the 5th system. Strange. I've also said the problem Disappears when using zi3v a few times as well. When checking for the Problem you need to look at the Best Pulse, not the reported Pulses. All it means if a task is missing reported Pulses is that there weren't any reportable Pulses found. If the Best Pulse = 0, that means NO Pulses were found at all in the task. Except for the Quick Overflows, Most tasks have at least a Best Pulse. So, look for a task where the Best Pulse = 0. Concurrent with my recent falling out with the Mac BOINC Versions that keep insisting my Maxwells & Pascals are Teslas and can't be used, I've decided to go back and recompile the Mac App with an earlier version of boinc-master, somewhere around 7.4.0, and see if that makes a difference. The zi3v App was compiled with 7.5.0, and as stated, it doesn't have the problem. |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
"Hmmm, that post is almost as strange as the problem first mentioned by Juan back here, 'As you could see the Cuda 9.2 shows 0 Pulses and the other two show 17..' Back around that post I also gave a link to another machine running the CUDA 9.2 App with a similar Invalid missing All pulses. That's 2 machines there. If you look up just 2 posts from yours you will see where I mentioned FOUR machines of mine having the Same problem. One Linux and Three Macs. A little above that you will see where I mention a couple of times the Problem Completely Disappears if I just go back to zi3v CUDA90." Tbar, I know the problem is well beyond my knowledge, but just as a reminder, I never use Firefox on this host, so at least in my case the problem was not Firefox related. Another interesting fact is that after that I did not have any new invalids. Not a single one! Even though I am still using CUDA92 V0.97b2 and have been running 24/7 for almost a week since. I expected some, because my host crunches more than 5K WU/day and, as Keith posted, invalids run in the range of a few %. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Yes Juan, and I've mentioned I had the same problem with Safari. So, I'm very aware it happens with other Browsers. The fact is, Most people running at least Ubuntu are using FireFox because it comes Preinstalled, and in my tests the problem exists with FireFox, so, I'm telling people to use what most of them have....FireFox. Now if they want to try it with a different Browser, go for it. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
"Why do you keep saying the problem is just with One system?" probably because you said this in a recent post (top of the page): "Oh, I've done everything possible, including swapping out cards with other machines. The problem was it was always just the one machine having the problem." i'll rescrub for no best pulse. that's all 0's right? Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
"Why do you keep saying the problem is just with One system?" Try looking Here; "Posted: 10 Sep 2018, 2:12:47 UTC A little update on the Missing Pulses. Since moving to the Older version of BOINC the problem seems much less common although I have found at least one task with missed pulses since changing, Best pulse: peak=0. I also found a similar task on the Linux machine I sometimes use for browsing. I decided to see if the same problem existed on the other two machines, and yes, in fact you don't even have to actively browse, just open FireFox and leave it minimized for a couple of hours. That will do it;" It's much more recent, and only One or Two posts above one of Your posts. This is what you will see if the App Doesn't find Any Pulses; Best pulse: peak=0, time=-2.124e+11, period=0, d_freq=0, score=0, chirp=0, fft_len=0 Some quick overflows end before finding any Pulses, most of the other tasks will at least have a Best Pulse. |
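Scrubbing results by hand for that line gets tedious, so here is a rough Python sketch of the check described above, assuming you have already copied a task's stderr text from its result page into a string. The function names are made up for illustration, and the check cannot tell a genuine missed-pulse task from a quick overflow that ended before any Pulse was found, so those still need a manual look.

```python
import re

# Matches the "peak=" value on the "Best pulse:" line of a task's stderr text.
BEST_PULSE_PEAK = re.compile(
    r"Best pulse:\s*peak=([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def best_pulse_peak(stderr_text):
    """Return the Best pulse peak as a float, or None if the line is absent."""
    match = BEST_PULSE_PEAK.search(stderr_text)
    return float(match.group(1)) if match else None


def has_zero_best_pulse(stderr_text):
    """True only when a Best pulse line exists and its peak is exactly 0."""
    return best_pulse_peak(stderr_text) == 0.0


if __name__ == "__main__":
    sample = ("Best pulse: peak=0, time=-2.124e+11, period=0, d_freq=0, "
              "score=0, chirp=0, fft_len=0")
    print(has_zero_best_pulse(sample))
```

A task flagged this way is only a candidate; compare it against the wingmen's results before calling it a missed-pulse case.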
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
My preference has always been Chrome over Firefox. But I did fire up Firefox for a few hours a while back when the Firefox suspicion arose. I never caught a Best Pulse=0 during that period. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
looks like this is the only one lacking a best pulse. but it's also a task that used pfl 64. so maybe that's why. it was a quick one. 6 seconds. https://setiathome.berkeley.edu/result.php?resultid=6915440517 so it looks like i'm not seeing this issue on either of my systems. i'll run FF and have it auto-reload the seti forums over and over every 5 mins. if browser activity truly triggers it, it should pop up. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
So.... I compiled a new Mac App using boinc_client 7.4.23. That is one version before the suspect change here, "client: If CUDA driver 6.5 or later is installed, prevent use of NVIDIA GPUs with Compute Capability < 2.0 and show explanation in Event Log and Notices." The App seems to work as usual, so I placed it on this machine and launched FireFox well over an Hour ago, https://setiathome.berkeley.edu/results.php?hostid=8097309&offset=300 So far Device One, which is running the monitor, is Still finding Pulses. Interesting. I'm not holding my breath, the thing could fail at any time, but it does seem to be an improvement, and I will stop FireFox before going to bed. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"So.... I compiled a new Mac App using boinc_client 7.4.23. That is one version before the suspect change here, client: If CUDA driver 6.5 or later is installed, prevent use of NVIDIA GPUs with Compute Capability < 2.0 and show explanation in Event Log and Notices." . . Fingers crossed ... Stephen :) |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
my 2 systems have been running over 12 hours with Firefox open and refreshing the SETI main page every 5 mins with a tab reloader. These ones: https://setiathome.berkeley.edu/show_host_detail.php?hostid=8390155 https://setiathome.berkeley.edu/show_host_detail.php?hostid=8432395 feel free to scrub through the recent tasks, or check back on them. so far nothing. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
Seti won't start up on my dual e5-2670. Apparently it won't even clear the Seti log (if I found the right file in BOINC). A while ago I accidentally shut the machine down without stopping BOINC Manager and the tasks. The machine boots and the browser works, but Seti says "nothing". As a Linux newbie, is there a "log" file I can be pointed to that might tell me something more? Thanks, Tom A proud member of the OFA (Old Farts Association). |