Setting up Linux to crunch CUDA90 and above for Windows users

Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1954489 - Posted: 9 Sep 2018, 5:44:19 UTC - in response to Message 1954474.  

Although I can install the repository nvidia 390 drivers, whenever I try to use the recovery method for 396 drivers the system goes off into limbo when I try to enable the network. :(

Stephen,
It is my understanding that you don't need anything beyond the nvidia 390 drivers to run CUDA91 (I would swear TBar said so), so I would simply stop trying to push that boulder up the hill and get on with the rest of it.
The worst that could happen with having 390 installed is "computation errors" rather than melting down the computer.
HTH,
Tom


. . Thanks Tom, yes TBar has made that quite clear and it is very good to know. But the problem using recovery mode is one of the issues plaguing me and I am trying to get past them. I am seeking a solution rather than a compromise, but I may have to settle for the latter, if I can resolve the other issues.

Stephen

<shrug>
ID: 1954489 · Report as offensive     Reply Quote
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1954491 - Posted: 9 Sep 2018, 6:05:09 UTC - in response to Message 1954489.  

Since the Ubuntu main repository has the 390 drivers and you have been told that 390 is all CUDA91 needs, why are you even messing with recovery mode? Install Ubuntu. Choose the 390 Nvidia drivers and install them with the Software Updater. Reboot and unpack the TBar CUDA91 package. Satisfy the dependencies by installing libwebkitgtk-1.0 and maybe libcurl3. Start BOINC and crunch.
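
The driver choice described here can be sanity-checked before unpacking the package. The following is only a minimal sketch, not part of TBar's package: it compares a driver version string against the driver major versions mentioned in this thread (390 for CUDA91, and 396 as the series people here run with the CUDA92 build). Those thresholds are assumptions taken from the posts, not from NVIDIA's release notes, and the function and table names are hypothetical.

```python
# Minimum driver major versions as discussed in this thread (assumptions,
# not taken from NVIDIA's release notes).
THREAD_MIN_DRIVER_MAJOR = {"CUDA91": 390, "CUDA92": 396}


def driver_major(version: str) -> int:
    """Return the major part of a driver version string such as '390.48'."""
    return int(version.strip().split(".")[0])


def driver_ok_for(app: str, version: str) -> bool:
    """True if the installed driver meets the thread's minimum for the app."""
    return driver_major(version) >= THREAD_MIN_DRIVER_MAJOR[app]


if __name__ == "__main__":
    # On a real machine the version string would come from the driver tools;
    # here it is hard-coded so the sketch runs anywhere.
    for app in ("CUDA91", "CUDA92"):
        print(app, driver_ok_for(app, "390.48"))
```

With a 390-series driver the check passes for CUDA91 only, which matches the advice to stay on the repository drivers and skip recovery mode.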
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1954491 · Report as offensive     Reply Quote
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1954585 - Posted: 10 Sep 2018, 2:12:47 UTC

A little update on the Missing Pulses. Since moving to the Older version of BOINC the problem seems much less common although I have found at least one task with missed pulses since changing, Best pulse: peak=0. I also found a similar task on the Linux machine I sometimes use for browsing. I decided to see if the same problem existed on the other two machines, and yes, in fact you don't even have to actively browse, just open FireFox and leave it minimized for a couple of hours. That will do it;
https://setiathome.berkeley.edu/results.php?hostid=6796479&state=5
https://setiathome.berkeley.edu/results.php?hostid=8097309&state=5
Sometimes the only way to stop the problem after it begins is to Reboot the machine.
Those two machines have been changed to the Older version of BOINC since those Errors.

The Postman runs on Sunday? Apparently so in some locations. The New GPU arrived Today, a day early.
So, we'll see how well a 1070 does against the Web Browser problem. It's definitely a step up from the 1050Ti.
ID: 1954585 · Report as offensive     Reply Quote
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1954613 - Posted: 10 Sep 2018, 7:25:48 UTC - in response to Message 1954585.  

. . OK, I am just throwing this out there, it may mean nothing. I generally have a browser open (Firefox) but have yet to experience a task with zero pulses found. I say that because I have yet to have an invalid task and I am sure such a result would be invalid. But I often find Firefox has shut itself down. I am wondering if that is due to some conflict that has been resolved by the shut down action thus preventing a task finding the zero pulse problem. So maybe it is an issue with some particular versions of browsers? Just speculating ...

Stephen

? ?
ID: 1954613 · Report as offensive     Reply Quote
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1954635 - Posted: 10 Sep 2018, 11:56:47 UTC - in response to Message 1954613.  

I'm fairly certain this is one of the strangest problems I've ever encountered with computers. I'm not sure what's going on, other than it took about 20 minutes for a rather new 1070 to suffer the same fate as the other cards, and it was a pain to get it to stop once it started missing pulses. As I've told Petri a couple of times, a small problem in Linux usually becomes a Big problem in OSX. So for now, I've sacrificed a slot and a 750Ti to just run the browser, and told BOINC to ignore that card. So far it seems to be working that way.
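
One way to tell BOINC to ignore a card is the `<ignore_nvidia_dev>` option in `cc_config.xml`. The post does not say which mechanism was actually used, so this is only a sketch of one possibility. The helper name is hypothetical and the device number 1 is a placeholder; the resulting file would go in the BOINC data directory, and the client needs a restart to pick it up.

```python
import xml.etree.ElementTree as ET


def build_cc_config(ignored_devices):
    """Build cc_config.xml text that hides the given NVIDIA device numbers."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for dev in ignored_devices:
        # One <ignore_nvidia_dev> element per device BOINC should not use.
        ET.SubElement(options, "ignore_nvidia_dev").text = str(dev)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Device 1 is a placeholder; use the number BOINC reports at startup.
    print(build_cc_config([1]))
```

The card that drives the monitor and browser is then left out of crunching entirely, at the cost of one slot.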
ID: 1954635 · Report as offensive     Reply Quote
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1954674 - Posted: 10 Sep 2018, 14:57:47 UTC

this could possibly be an issue with the motherboard in that one system. you said it only showed up on one of your machines, and looking at your hosts, you're running very old hardware. it could be a sign of early hardware issues from a worn-out old board. maybe a problem with the MCH, since the host in question is using a board that controls PCIe via the northbridge. maybe it's overheating? try putting a fan on it. or maybe it's just going bad from old age/use.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1954674 · Report as offensive     Reply Quote
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1954687 - Posted: 10 Sep 2018, 17:41:43 UTC - in response to Message 1954674.  

Hmmm, that post is almost as strange as the problem first mentioned by Juan back here: "As you could see the Cuda 9.2 shows 0 Pulses and the others two shows 17.." Back around that post I also gave a link to another machine running the CUDA 9.2 App with a similar Invalid missing All pulses. That's 2 machines there. If you look up just 2 posts from yours you will see where I mentioned FOUR machines of mine having the Same problem: One Linux and Three Macs. A little above that you will see where I mention a couple of times that the Problem Completely Disappears if I just go back to zi3v CUDA90.

How you arrive at a problem on just one old motherboard is itself quite a mystery.

The symptoms indicate a Small problem on Linux, which, as many times before, has turned into a Big problem in OSX. Also, the problem doesn't exist in zi3v.
The next step would be for some people running Linux to launch FireFox in the background and see who has the problem. I might try that myself a little later.
ID: 1954687 · Report as offensive     Reply Quote
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1954693 - Posted: 10 Sep 2018, 18:42:40 UTC
Last modified: 10 Sep 2018, 18:46:44 UTC

2 or 3 machines out of the hundreds or more running the app? it's certainly possible for more than one machine to have a slight motherboard issue that could cause a problem with the PCIe signals in some way. i did check the other system you linked, but it appears that its invalids have dropped off, so they can no longer be viewed.

i think it's safe to say it's not the GPU in question since it tests fine in other systems, and the problem stays with that system no matter the GPUs that are in it, but there are lots of components handling the data between the GPU and the software. and to be fair, ALL of your systems have very old hardware. if it's not faulty, it could simply be an idiosyncrasy of your boards?

you could swap the hard drives from one machine to another, and see where the problem stays, with the software or with the hardware.

i'll look through my invalids, but it might take a while to scrub them all manually.

edit: i'll also try the firefox test on my 2x linux systems with v97b2. i'll even set up a tab reloader and have it refresh every 5 mins to give it some use
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1954693 · Report as offensive     Reply Quote
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1954694 - Posted: 10 Sep 2018, 19:11:15 UTC
Last modified: 10 Sep 2018, 19:31:58 UTC

My 2 Linux systems don't have many invalids, but i scrubbed through the 80 or so inconclusives. I don't know if this fits the exact criteria, but i just looked for tasks that showed no pulses. I don't know if it has to find only triplets instead, or if the lack of pulses is sufficient.

Computer 1:
Supermicro X9DRi-LN4F+
2x E5-2690(v1)
2x GTX 1060 3GB
Ubuntu 17.10
nvidia drivers 396.45
v0.97b2

tasks:
https://setiathome.berkeley.edu/result.php?resultid=6967019939
https://setiathome.berkeley.edu/result.php?resultid=6967261390 edit: looks like this one validated
https://setiathome.berkeley.edu/result.php?resultid=6962566026

Computer 2:
Supermicro X7DA8+
2x E5440
2x GTX 1050ti 4GB
Ubuntu 17.10
nvidia drivers 396.45
v0.97b2

tasks:
https://setiathome.berkeley.edu/result.php?resultid=6965614049
https://setiathome.berkeley.edu/result.php?resultid=6965932003
https://setiathome.berkeley.edu/result.php?resultid=6964472476
https://setiathome.berkeley.edu/result.php?resultid=6912299711

is this the same problem you are describing? i can't correlate these to any specific FF usage at this time. these machines don't normally have FF open, and don't normally do anything other than crunch SETI, but i'll open FF occasionally if i'm browsing the forums from the machine, or googling something or whatever.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1954694 · Report as offensive     Reply Quote
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1954698 - Posted: 10 Sep 2018, 20:05:23 UTC - in response to Message 1954693.  
Last modified: 10 Sep 2018, 20:12:10 UTC

...and the problem stays with that system no matter the GPUs that are in it....

Why do you keep saying the problem is just with One system?
I've said a FEW times now the problem exists with FOUR of my Five systems, and probably the only reason it's not 5 out of 5 is that I haven't bothered to check the 5th system.
Strange.
I've also said the problem Disappears when using zi3v a few times as well.

When checking for the Problem you need to look at the Best Pulse, not the reported Pulses. If a task is missing reported Pulses, all it means is that there weren't any reportable Pulses found. If the Best Pulse = 0, that means NO Pulses were found at all in the task. Except for the Quick Overflows, Most tasks have at least a Best Pulse.
So, look for a task where the Best Pulse = 0.

Concurrent with my recent falling out with the Mac BOINC Versions that keep insisting my Maxwells & Pascals are Teslas and can't be used, I've decided to go back and recompile the Mac App with an earlier version of boinc-master, somewhere around 7.4.0, and see if that makes a difference. The zi3v App was compiled with 7.5.0, and as stated, it doesn't have the problem.
ID: 1954698 · Report as offensive     Reply Quote
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1954699 - Posted: 10 Sep 2018, 20:11:20 UTC - in response to Message 1954687.  
Last modified: 10 Sep 2018, 20:13:26 UTC

TBar,

I know the problem is well beyond my knowledge, but just as a reminder, I never use Firefox on this host, so at least in my case the problem was not Firefox related.

Another interesting fact is that after that, I did not have any new invalids. Not a single one! Even though I am still using CUDA92 V0.97b2 and have been running 24/7 for almost a week since then.

I would expect some, because my host crunches more than 5K WU/day and, as Keith posted, invalids run in the range of a few %.
ID: 1954699 · Report as offensive     Reply Quote
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1954703 - Posted: 10 Sep 2018, 20:19:58 UTC - in response to Message 1954699.  

Yes Juan, and I've mentioned I had the same problem with Safari. So, I'm very aware it happens with other Browsers. The fact is, Most people running at least Ubuntu are using FireFox because it comes Preinstalled, and in my tests the problem exists with FireFox, so, I'm telling people to use what most of them have....FireFox. Now if they want to try it with a different Browser, go for it.
ID: 1954703 · Report as offensive     Reply Quote
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1954709 - Posted: 10 Sep 2018, 20:30:34 UTC - in response to Message 1954698.  
Last modified: 10 Sep 2018, 20:30:55 UTC

Why do you keep saying the problem is just with One system?

probably because you said this in a recent post (top of the page):

Oh, I've done everything possible, including swapping out cards with other machines. The problem was it was always just the one machine having the problem.



i'll rescrub for no best pulse. that's all 0's right?
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1954709 · Report as offensive     Reply Quote
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1954712 - Posted: 10 Sep 2018, 20:40:33 UTC - in response to Message 1954709.  
Last modified: 10 Sep 2018, 20:45:28 UTC

Try looking Here;
Posted: 10 Sep 2018, 2:12:47 UTC
A little update on the Missing Pulses. Since moving to the Older version of BOINC the problem seems much less common although I have found at least one task with missed pulses since changing, Best pulse: peak=0. I also found a similar task on the Linux machine I sometimes use for browsing. I decided to see if the same problem existed on the other two machines, and yes, in fact you don't even have to actively browse, just open FireFox and leave it minimized for a couple of hours. That will do it;
https://setiathome.berkeley.edu/results.php?hostid=6796479&state=5
https://setiathome.berkeley.edu/results.php?hostid=8097309&state=5
Sometimes the only way to stop the problem after it begins is to Reboot the machine.
Those two machines have been changed to the Older version of BOINC since those Errors.

The Postman runs on Sunday? Apparently so in some locations. The New GPU arrived Today, a day early.
So, we'll see how well a 1070 does against the Web Browser problem. It's definitely a step up from the 1050Ti.

It's much more recent, and only One or Two posts above one of Your posts.

This is what you will see if the App Doesn't find Any Pulses;
Best pulse: peak=0, time=-2.124e+11, period=0, d_freq=0, score=0, chirp=0, fft_len=0
Some quick overflows end before finding any Pulses, most of the other tasks will at least have a Best Pulse.
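
The check described here can be automated when scrubbing through task output by hand gets tedious. Below is a minimal sketch, assuming the "Best pulse" line has the `key=value` layout of the example quoted above; the function names are hypothetical and this is not part of the SETI@home application. A zero peak on a quick overflow is expected, so such tasks still need to be ruled out separately.

```python
import re

# Matches key=value pairs such as "peak=0" or "time=-2.124e+11".
_FIELD = re.compile(r"(\w+)=([-+0-9.eE]+)")


def parse_best_pulse(line: str) -> dict:
    """Parse a 'Best pulse: key=value, ...' line into a dict of floats."""
    if not line.strip().startswith("Best pulse:"):
        raise ValueError("not a Best pulse line")
    return {key: float(value) for key, value in _FIELD.findall(line)}


def found_no_pulses(line: str) -> bool:
    """True when the task found no pulses at all (peak is exactly 0)."""
    return parse_best_pulse(line)["peak"] == 0.0


if __name__ == "__main__":
    sample = ("Best pulse: peak=0, time=-2.124e+11, period=0, d_freq=0, "
              "score=0, chirp=0, fft_len=0")
    print(found_no_pulses(sample))
```

Running `found_no_pulses` over the "Best pulse" line of each suspect task separates the tasks that found nothing at all from those that merely had no reportable pulses.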
ID: 1954712 · Report as offensive     Reply Quote
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1954713 - Posted: 10 Sep 2018, 20:41:13 UTC

My preference has always been Chrome over Firefox. But I did fire up Firefox for a few hours a while back when the Firefox suspicion arose. I never caught a Best Pulse=0 during that period.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1954713 · Report as offensive     Reply Quote
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1954714 - Posted: 10 Sep 2018, 20:46:58 UTC
Last modified: 10 Sep 2018, 20:51:52 UTC

looks like this is the only one lacking a best pulse. but it's also a task that used pfl 64, so maybe that's why. it was a quick one: 6 seconds.

https://setiathome.berkeley.edu/result.php?resultid=6915440517

since it looks like i'm not seeing this issue on either of my systems, i'll run FF and have it auto-reload the seti forums every 5 mins. if browser activity truly triggers it, it should pop up.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1954714 · Report as offensive     Reply Quote
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1954769 - Posted: 11 Sep 2018, 5:07:21 UTC

So.... I compiled a new Mac App using boinc_client 7.4.23. That is one version before the suspect change here: "client: If CUDA driver 6.5 or later is installed, prevent use of NVIDIA GPUs with Compute Capability < 2.0 and show explanation in Event Log and Notices."
The App seems to work as usual, so I placed it on this machine and launched FireFox well over an Hour ago: https://setiathome.berkeley.edu/results.php?hostid=8097309&offset=300 So far Device One, which is running the monitor, is Still finding Pulses. Interesting. I'm not holding my breath, the thing could fail at any time, but it does seem to be an improvement, and I will stop FireFox before going to bed.
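
The changelog entry quoted in this post amounts to a simple predicate. The sketch below restates it in Python purely as an illustration of the rule as worded. It is not BOINC's code, and the function name and the tuple representation of versions are assumptions. It also shows why detection matters: a card reported with a Tesla-era compute capability of 1.x is blocked, while a card reported as Maxwell (5.x) or Pascal (6.x) is not.

```python
def gpu_allowed(driver_version, compute_capability):
    """Apply the rule as worded: with CUDA driver 6.5 or later,
    GPUs with compute capability below 2.0 are not used.

    Both arguments are (major, minor) tuples, e.g. (9, 2) and (6, 1).
    """
    return not (driver_version >= (6, 5) and compute_capability < (2, 0))


if __name__ == "__main__":
    print(gpu_allowed((9, 2), (6, 1)))  # card reported as Pascal
    print(gpu_allowed((9, 2), (1, 3)))  # card reported as compute capability 1.x
```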
ID: 1954769 · Report as offensive     Reply Quote
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1954778 - Posted: 11 Sep 2018, 6:42:01 UTC - in response to Message 1954769.  

. . Fingers crossed ...

Stephen

:)
ID: 1954778 · Report as offensive     Reply Quote
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1954832 - Posted: 11 Sep 2018, 14:43:43 UTC

my 2 systems have been running over 12 hours with Firefox open and refreshing the SETI main page every 5 mins with a tab reloader.

These ones:
https://setiathome.berkeley.edu/show_host_detail.php?hostid=8390155
https://setiathome.berkeley.edu/show_host_detail.php?hostid=8432395

feel free to scrub through the recent tasks, or check back on them.

so far nothing.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1954832 · Report as offensive     Reply Quote
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1955066 - Posted: 12 Sep 2018, 22:07:51 UTC
Last modified: 12 Sep 2018, 22:08:19 UTC

My dual e5-2670 Seti won't start up. Apparently it won't even clear the Seti log (if I found the right file in BOINC). A while ago I accidentally shut the machine down without stopping BOINC Manager and the tasks. The machine boots and the browser works, but Seti says "nothing".

As a Linux newbie, is there a "log" file I can be pointed to that might tell me something more?

Thanks,
Tom
A proud member of the OFA (Old Farts Association).
ID: 1955066 · Report as offensive     Reply Quote
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.