Setting up Linux to crunch CUDA90 and above for Windows users

Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1953988 - Posted: 6 Sep 2018, 3:20:48 UTC - in response to Message 1953963.  


. . I may have missed a detail but I believe Tom was running zi3v not 0.97b1. Anyway, the deed is done ... :)
Stephen

I think I was running CUDA90. Not sure if that is zi3v or not.
But it was completely painless as an upgrade. My favorite! :)
Tom


. . OK, I just checked your older results and you were definitely running zi3v, so TBar must have included an updated app section to point that to 0.97b1. Well done! That makes it a lot safer for the hoi polloi to use :)

Stephen

:)
ID: 1953988 · Report as offensive     Reply Quote
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1954004 - Posted: 6 Sep 2018, 7:22:23 UTC - in response to Message 1953988.  
Last modified: 6 Sep 2018, 7:23:26 UTC

Stephen, I guess you are still really confused about the structure of an app_info. No, no section is included to handle zi3v; it is not necessary. There is just the usual app_version statement for the new 0.97b2 application. Why don't you download the archive and have a look at the app_info for yourself? It might clear things up for you. As long as you run the new app with the new app_info that is in the archive, you will not dump tasks or make ghosts. If you try editing it on your own for some silly reason, all bets are off.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1954004 · Report as offensive     Reply Quote
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Send message
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1954007 - Posted: 6 Sep 2018, 8:22:13 UTC

The important thing to avoid changing in app_info, so you don't make ghosts, is the <plan_class>: whatever it was originally has to match your tasks in progress, which are listed in the client_state file. If the <plan_class> changes, all your tasks will disappear.

That is why it is recommended to empty your cache first. As long as that doesn't change, your tasks are safe.

My #1 computer is still on
<plan_class>cuda80</plan_class>
Two computers have cuda60. The number doesn't matter, just as long as they match.
ID: 1954007 · Report as offensive     Reply Quote
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1954033 - Posted: 6 Sep 2018, 12:40:40 UTC
Last modified: 6 Sep 2018, 12:55:01 UTC

Actually, what causes Ghosts is a difference between the client_state and app_info in these lines:
<platform></platform>
<version_num></version_num>
<plan_class></plan_class>
Look in your client_state.xml for task assignments, down near the bottom, framed by <result></result>.
As long as the result sections agree with the app_info, all is well. If there is a difference, you must change either the client_state or app_info so they match.
<platform></platform> differences are between 64 & 32 bit; most Linux & Mac Apps are 64 bit, so this field usually doesn't change.
<version_num></version_num> these numbers MUST be the same in client_state & app_info.
<plan_class></plan_class> these values also MUST match.

If you just download the last three CUDA packages at C.A. and Look, you will see the app_info sections ALL MATCH:
<platform>x86_64-pc-linux-gnu</platform>
<version_num>801</version_num>
<plan_class>cuda90</plan_class>
This means you can change between any of the Three CUDA 9.x packages Without creating Ghosts.

Now, if you want to Ghost all your assigned tasks, just change the app_info so it doesn't match your <result></result> sections: guaranteed Ghosts.
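
For anyone who would rather check than eyeball two files, here is a minimal Python sketch of the comparison described above. It assumes each <app_version> block in app_info.xml and each <result> block in client_state.xml carries the three tags as direct children. The function names and the sample XML are made up for illustration. A real client_state.xml also holds results from other apps and projects, so expect to filter those out.

```python
import xml.etree.ElementTree as ET

# The three tags that must agree between app_info and client_state.
FIELDS = ("platform", "version_num", "plan_class")

def key_of(elem):
    """Return the (platform, version_num, plan_class) triple for one XML element."""
    return tuple((elem.findtext(f) or "").strip() for f in FIELDS)

def find_ghost_risks(app_info_xml, client_state_xml):
    """Return (task name, triple) for every result that no app_version matches."""
    offered = {key_of(v) for v in ET.fromstring(app_info_xml).iter("app_version")}
    risks = []
    for result in ET.fromstring(client_state_xml).iter("result"):
        triple = key_of(result)
        if triple not in offered:
            risks.append((result.findtext("name"), triple))
    return risks

# Made-up sample data; only the three tag values come from the post above.
APP_INFO = """<app_info>
  <app_version>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>801</version_num>
    <plan_class>cuda90</plan_class>
  </app_version>
</app_info>"""

CLIENT_STATE = """<client_state>
  <result>
    <name>task_a</name>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>801</version_num>
    <plan_class>cuda90</plan_class>
  </result>
  <result>
    <name>task_b</name>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>800</version_num>
    <plan_class>cuda80</plan_class>
  </result>
</client_state>"""

for name, triple in find_ghost_risks(APP_INFO, CLIENT_STATE):
    print("no matching app_version for", name, triple)
```

To try it on real files, read app_info.xml and client_state.xml into strings first. It is safest to stop BOINC and make a backup before editing either file.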

Also, you DO NOT need driver 396.xx to run the CUDA 91 App. The CUDA 91 App will work just fine with a CUDA 91 Driver. CUDA 91 drivers are the ones listed as 390.xx, as with this machine, https://setiathome.berkeley.edu/show_host_detail.php?hostid=6813106
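
As a rough way to sanity-check a driver before upgrading, the sketch below compares an installed driver version against the minimum Linux driver for each CUDA 9.x toolkit. The minimum versions in the table are recalled from NVIDIA's CUDA release notes and should be treated as assumptions to verify there. The function names are hypothetical.

```python
# ASSUMPTION: minimum Linux driver versions per CUDA toolkit, as recalled
# from NVIDIA's CUDA release notes. Verify against the notes before relying on them.
MIN_LINUX_DRIVER = {
    "9.0": (384, 81),
    "9.1": (387, 26),
    "9.2": (396, 26),
}

def parse_driver(version):
    """Turn a driver string such as '390.48' into (390, 48) for comparison."""
    return tuple(int(part) for part in version.split("."))

def driver_supports(driver_version, cuda_version):
    """True if the driver is at least the minimum listed for that CUDA version."""
    return parse_driver(driver_version)[:2] >= MIN_LINUX_DRIVER[cuda_version]

# A 390.xx driver (the 390.48 here is just an example value).
print(driver_supports("390.48", "9.1"))
print(driver_supports("390.48", "9.2"))
```

Under those assumed minimums, a 390.xx driver clears the CUDA 9.1 bar but not the CUDA 9.2 one, which is consistent with the point that 396.xx is not needed for the CUDA 91 App.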
ID: 1954033 · Report as offensive     Reply Quote
Profile Tom M
Volunteer tester

Send message
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1954144 - Posted: 7 Sep 2018, 0:22:11 UTC - in response to Message 1954033.  

Also, you DO NOT need driver 396.xx to run the CUDA 91 App. The CUDA 91 App will work just fine with a CUDA 91 Driver. CUDA 91 drivers are the ones listed as 390.xx, as with this machine, https://setiathome.berkeley.edu/show_host_detail.php?hostid=6813106


So the baseline "Nvidia-390" driver will work with the CUDA91? That will be a relief. It was a bit of a pain getting my setup to "find" the nvidia-396.

Ok, everyone.

On your mark.

Get set.

Upgrade!!!!

ROFLing.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1954144 · Report as offensive     Reply Quote
Profile Tom M
Volunteer tester

Send message
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1954152 - Posted: 7 Sep 2018, 0:43:15 UTC

Does this change in "the code" affect the code we are running?

https://setiathome.berkeley.edu/forum_thread.php?id=83285#1952776

Tom
A proud member of the OFA (Old Farts Association).
ID: 1954152 · Report as offensive     Reply Quote
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1954161 - Posted: 7 Sep 2018, 1:13:56 UTC - in response to Message 1954152.  

No, everything done in Nebula is "after the fact". We just crunch the tasks. Nebula takes our results and works with them. Nebula code has nothing to do with our science app code.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1954161 · Report as offensive     Reply Quote
Profile Tom M
Volunteer tester

Send message
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1954163 - Posted: 7 Sep 2018, 1:30:17 UTC - in response to Message 1954161.  

No, everything done in Nebula is "after the fact". We just crunch the tasks. Nebula takes our results and works with them. Nebula code has nothing to do with our science app code.


Sorry, it wasn't clear that it was "in Nebula".

Tom
A proud member of the OFA (Old Farts Association).
ID: 1954163 · Report as offensive     Reply Quote
Profile Wiggo
Avatar

Send message
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1954166 - Posted: 7 Sep 2018, 1:34:45 UTC - in response to Message 1954163.  

No, everything done in Nebula is "after the fact". We just crunch the tasks. Nebula takes our results and works with them. Nebula code has nothing to do with our science app code.


Sorry, it wasn't clear that it was "in Nebula".

Tom
That came from the Nebula section of these forums so it should've been evident. ;-)

Cheers.
ID: 1954166 · Report as offensive     Reply Quote
Profile Tom M
Volunteer tester

Send message
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1954179 - Posted: 7 Sep 2018, 2:15:06 UTC - in response to Message 1954166.  

No, everything done in Nebula is "after the fact". We just crunch the tasks. Nebula takes our results and works with them. Nebula code has nothing to do with our science app code.


Sorry, it wasn't clear that it was "in Nebula".

Tom
That came from the Nebula section of these forums so it should've been evident. ;-)

Cheers.


Mental fart. Apparently I thought I was in "Seti Science" when that question occurred :(
A proud member of the OFA (Old Farts Association).
ID: 1954179 · Report as offensive     Reply Quote
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1954273 - Posted: 7 Sep 2018, 18:14:54 UTC - in response to Message 1953421.  
Last modified: 7 Sep 2018, 18:26:27 UTC

As for the Pulses completely missing, as well as the Best Pulse missing, I tracked a similar problem down to a GPU not fitting squarely in the Slot. It was tilted slightly upwards. See if you keep getting the same problem with the same GPU.
Will check that, of course, but it's hard to be that because I use a cube case; the GPUs are in the vertical position and firmly fixed. The host crunches 1000s of WUs every day, and only a few (3 yesterday) WUs are marked as invalid, and all from different GPUs.
Well, it looks like that problem is back, and it's on the card that was having the most problems even though it's in a different slot. So, I don't think it's the slot, it was the Top slot before, now the card is in the bottom slot and having the same problem. Fortunately it's only happening on One machine of mine, but I did see the same problem on someone else's machine, so, that makes three of us now having the problem. I wonder why it disappeared for a couple of days, I had been seeing a few a day. I have seen it on all cards, but mostly it was on the card in the Top Slot.
No Pulses Recorded, even the Best Pulse is Blank meaning it didn't find any non-reportable pulses either, https://setiathome.berkeley.edu/results.php?hostid=6796475&&state=5
Weird at least.
After the one I posted (now cleared from the list) I did not notice any other with a similar problem.
It's above my pay grade, but is it possible to run the WU in the test program to see if that repeats?
Oh, I've done everything possible, including swapping out cards with other machines. The problem was it was always just the one machine having the problem. If you ask Petri how many times I've mentioned this, his answer would be LOTS. Now that other people are having the same problem, perhaps he will look at it again. Another one invalid... Don't know why it disappeared, but it's back with a vengeance. I've found a few more from last night, seems something happened between 6-7PM EDT, nothing since around 7 last night. That's interesting, seems the problems stopped just after making this post, https://setiathome.berkeley.edu/forum_thread.php?id=83306. Before that I was looking for an Overflow filled with Triplets. However, I have seen the missing Pulses on the two cards Not running the monitor, just not many times.

After a few more days of testing I'm convinced this problem where the GPU misses All the Pulses is caused by using a Web Browser. It seems to only happen on a few machines and in My case it doesn't matter which Browser or which browser settings are used. It's unfortunate seeing as how I never had this problem with zi3v, it appears to have just started with the v0.9x versions. I have switched to using the machine with the GTX 1060s for Web browsing and so far haven't seen the problem with the 1060. It seems to be a rather severe problem with the 1050 Ti on the other machine though. The only solution I see is to just use a different card for Web Browsing, so, I've ordered a 'New' Card from eBay. It seems there are some interesting deals now that the new GPUs have been announced. The pricing on eBay is much better than a few months ago, just don't order any GPU from China and make sure the seller has an established history.
ID: 1954273 · Report as offensive     Reply Quote
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1954277 - Posted: 7 Sep 2018, 18:55:19 UTC - in response to Message 1954273.  

When you say you tried changing settings, I assume you tried turning off "hardware acceleration" in the browser?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1954277 · Report as offensive     Reply Quote
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1954280 - Posted: 7 Sep 2018, 19:11:23 UTC - in response to Message 1954277.  

I tried different combinations of all Three settings listed under Performance in FireFox without seeing any change, "hardware acceleration" is just one of the settings. I also tried Safari with similar results. About the only thing left is to go back to zi3v for a while and confirm the problem doesn't exist there. I had FireFox open continuously with zi3v, never had a problem with the 1050 Ti back then.
ID: 1954280 · Report as offensive     Reply Quote
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1954288 - Posted: 7 Sep 2018, 20:15:53 UTC - in response to Message 1954280.  
Last modified: 7 Sep 2018, 20:16:34 UTC

I tried different combinations of all Three settings listed under Performance in FireFox without seeing any change, "hardware acceleration" is just one of the settings. I also tried Safari with similar results. About the only thing left is to go back to zi3v for a while and confirm the problem doesn't exist there. I had FireFox open continuously with zi3v, never had a problem with the 1050 Ti back then.

Please forgive me if I'm posting in the wrong thread, but FYI:
I never use Firefox on this host, only Chrome, and I had at least one WU with that problem.
So it is not related to Firefox only.
The browser is set to use hardware acceleration (on), and the GPU connected to the monitor is an EVGA 1070FTW hybrid.
I can't say whether the WU with the problem came before or after I installed the new app.
ID: 1954288 · Report as offensive     Reply Quote
Profile Tom M
Volunteer tester

Send message
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1954289 - Posted: 7 Sep 2018, 20:46:39 UTC - in response to Message 1954273.  

The only solution I see is to just use a different card for Web Browsing, so, I've ordered a 'New' Card from eBay. It seems there are some interesting deals now that the new GPUs have been announced. The pricing on eBay is much better than a few months ago, just don't order any GPU from China and make sure the seller has an established history.


Make sure anything you order has more of a brand name than just Nvidia (e.g. MSI, ASUS, EVGA, etc.) and doesn't say "generic Nvidia", and you should be all right.

I will admit that on the two machines I am still running Linux/CUDA91 on, I have been using "other" cards to drive the monitor.

I didn't have a problem with a gtx 750Ti on one box but the dual gtx 750Ti's were a little cranky on the other box. (One box retired, the other moved back to Windows since I had trouble with the browser and my bank).

Tom
A proud member of the OFA (Old Farts Association).
ID: 1954289 · Report as offensive     Reply Quote
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1954308 - Posted: 7 Sep 2018, 22:54:48 UTC
Last modified: 7 Sep 2018, 23:03:13 UTC

Well, I've loaded zi3v, have FireFox surfing as much as possible, and don't see any missed Pulses. So, I'm going to have to suggest there is some change in V0.9.x which makes it not work correctly on some Systems/Cards when running a Web Browser. The trick is to look at the Best Pulse result: if that number is Zero, it means the App has completely missed All pulses. I'll run it for a while and surf some more, but it looks good right now, https://setiathome.berkeley.edu/results.php?hostid=6796475&offset=260
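
The Best Pulse check can be scripted if you save a task's stderr text to a file. The sketch below is only an illustration: it assumes the stderr has a line starting with "Best pulse:" followed by comma-separated key=value pairs, and it assumes the number to watch is the peak field. Check a real result page and adjust both if yours differ. The function names and sample lines are made up.

```python
import re

# ASSUMPTION: the task's stderr contains a line shaped like
#   Best pulse: peak=..., time=..., score=...
# Adjust the pattern and the field name if your output looks different.
BEST_PULSE = re.compile(r"Best pulse:[ \t]*(.*)")

def best_pulse_fields(stderr_text):
    """Return the key=value pairs on the 'Best pulse' line, or None if absent."""
    match = BEST_PULSE.search(stderr_text)
    if match is None:
        return None
    fields = {}
    for part in match.group(1).split(","):
        key, _, value = part.strip().partition("=")
        fields[key] = value
    return fields

def missed_all_pulses(stderr_text):
    """True when a 'Best pulse' line exists and its peak value is zero."""
    fields = best_pulse_fields(stderr_text)
    if fields is None or "peak" not in fields:
        return False
    try:
        return float(fields["peak"]) == 0.0
    except ValueError:
        return False

# Made-up sample lines, not copied from any real task.
MISSED = "Best pulse: peak=0, time=0, period=0, score=0"
FOUND = "Best pulse: peak=3.2, time=55.1, period=1.4, score=0.95"

print(missed_all_pulses(MISSED))
print(missed_all_pulses(FOUND))
```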
ID: 1954308 · Report as offensive     Reply Quote
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1954349 - Posted: 8 Sep 2018, 3:00:17 UTC

Twilight Zone.
The machine ran for a couple of hours with zi3v while surfing with FireFox. No Missed Pulses.
I quit BOINC, switched back to V0.97b2, Stopped FireFox, and reset the active tasks.
First task back with V0.97b2 Missed All the Pulses, it was running on the 960.
The 960 isn't even connected to a monitor...
https://setiathome.berkeley.edu/result.php?resultid=6961236359
The 1050 Ti is running the monitor, and it found the pulses...
https://setiathome.berkeley.edu/result.php?resultid=6961236733
Something strange going on. The 960 missed the next task too.
ID: 1954349 · Report as offensive     Reply Quote
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1954359 - Posted: 8 Sep 2018, 4:12:30 UTC

That really is Twilight Zone territory. Hard to troubleshoot an intermittent problem.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1954359 · Report as offensive     Reply Quote
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1954363 - Posted: 8 Sep 2018, 4:56:59 UTC - in response to Message 1954359.  
Last modified: 8 Sep 2018, 5:47:25 UTC

It's worse than that. I decided to swap the cards back the way they were. Now the Problem has been moved to the 1050 in the Middle slot. Right now the 1050Ti in the Top slot and the 960 in the bottom slot are working fine. It's the 1050's turn to Miss all Pulses. Don't know, maybe it's just BOINC being BOINC. This certainly doesn't look encouraging, but it looks the same with zi3v, and zi3v doesn't have the problem with missing pulses:
08-Sep-2018 00:10:57 [---] Starting BOINC client version 7.10.3 for x86_64-apple-darwin
08-Sep-2018 00:10:57 [---] NVIDIA GPU 1: GeForce GTX 960 cannot be used for CUDA or OpenCL computation with CUDA driver 6.5 or later
08-Sep-2018 00:10:57 [---] NVIDIA GPU 2: GeForce GTX 1050 cannot be used for CUDA or OpenCL computation with CUDA driver 6.5 or later
08-Sep-2018 00:10:57 [---] CUDA: NVIDIA GPU 0: GeForce GTX 1050 Ti (driver version 9.1.178, CUDA version 9.1, compute capability 6.1, 4096MB, 3513MB available, 2255 GFLOPS peak)
08-Sep-2018 00:10:57 [---] CUDA: NVIDIA GPU 1: GeForce GTX 1050 (driver version 9.1.178, CUDA version 9.1, compute capability 6.1, 2048MB, 1993MB available, 1960 GFLOPS peak)
08-Sep-2018 00:10:57 [---] CUDA: NVIDIA GPU 2: GeForce GTX 960 (driver version 9.1.178, CUDA version 9.1, compute capability 5.2, 2048MB, 1991MB available, 2748 GFLOPS peak)
08-Sep-2018 00:10:57 [---] OpenCL: NVIDIA GPU 0: GeForce GTX 1050 Ti (driver version 10.18.5 378.05.05.25f10, device version OpenCL 1.2, 4096MB, 3513MB available, 2255 GFLOPS peak)
08-Sep-2018 00:10:57 [---] OpenCL: NVIDIA GPU 1: GeForce GTX 960 (driver version 10.18.5 378.05.05.25f10, device version OpenCL 1.2, 2048MB, 2048MB available, 829 GFLOPS peak)
08-Sep-2018 00:10:57 [---] OpenCL: NVIDIA GPU 2: GeForce GTX 1050 (driver version 10.18.5 378.05.05.25f10, device version OpenCL 1.2, 2048MB, 2048MB available, 607 GFLOPS peak)
08-Sep-2018 00:10:57 [---] OpenCL CPU: Intel(R) Xeon(R) CPU E5472 @ 3.00GHz (OpenCL driver vendor: Apple, driver version 1.1, device version OpenCL 1.2)
08-Sep-2018 00:11:03 [---] OS: Mac OS X 10.12.6 (Darwin 16.7.0)
I dunno, maybe if I swapped the 1050 & 960...

Well, that didn't work either. The 1050 is in the bottom slot now and Still missing All the Pulses.
Strange it was working before I switched over to zi3v for a couple of hours. Since then nothing seems to get it back to the way it was before then. Only other choice would be to swap the 960 for another 1050. But then I wouldn't be able to Boot to El Capitan to compile Petri's code without swapping in another Maxwell card. El Capitan doesn't do Pascal, but it will Ignore the Pascals and use the Maxwell long enough to compile an App.
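
When cards keep moving between slots, it helps to know which device number belongs to which card. The sketch below pulls the device numbering out of the BOINC startup lines quoted above and lists any index where the CUDA and OpenCL sections name different cards. The pattern is fitted to those quoted lines only, so treat it as an assumption for other BOINC versions; the function names are made up. It says nothing about why pulses go missing, it only makes the numbering easier to read.

```python
import re

# Matches the "CUDA: NVIDIA GPU n: <name> (driver version ..." and
# "OpenCL: NVIDIA GPU n: <name> (driver version ..." lines from the log above.
GPU_LINE = re.compile(
    r"(?P<api>CUDA|OpenCL): NVIDIA GPU (?P<index>\d+): (?P<name>.+?) \(driver version"
)

def gpu_numbering(log_text):
    """Return {api: {device index: card name}} from a BOINC startup log."""
    numbering = {}
    for m in GPU_LINE.finditer(log_text):
        numbering.setdefault(m["api"], {})[int(m["index"])] = m["name"]
    return numbering

def numbering_mismatches(numbering):
    """Return {index: (CUDA name, OpenCL name)} where the two sections disagree."""
    cuda = numbering.get("CUDA", {})
    opencl = numbering.get("OpenCL", {})
    shared = sorted(cuda.keys() & opencl.keys())
    return {i: (cuda[i], opencl[i]) for i in shared if cuda[i] != opencl[i]}

# Lines copied from the startup log in the post, with the tails shortened.
LOG = """\
08-Sep-2018 00:10:57 [---] CUDA: NVIDIA GPU 0: GeForce GTX 1050 Ti (driver version 9.1.178, CUDA version 9.1)
08-Sep-2018 00:10:57 [---] CUDA: NVIDIA GPU 1: GeForce GTX 1050 (driver version 9.1.178, CUDA version 9.1)
08-Sep-2018 00:10:57 [---] CUDA: NVIDIA GPU 2: GeForce GTX 960 (driver version 9.1.178, CUDA version 9.1)
08-Sep-2018 00:10:57 [---] OpenCL: NVIDIA GPU 0: GeForce GTX 1050 Ti (driver version 10.18.5 378.05.05.25f10)
08-Sep-2018 00:10:57 [---] OpenCL: NVIDIA GPU 1: GeForce GTX 960 (driver version 10.18.5 378.05.05.25f10)
08-Sep-2018 00:10:57 [---] OpenCL: NVIDIA GPU 2: GeForce GTX 1050 (driver version 10.18.5 378.05.05.25f10)
"""

for index, (cuda_name, opencl_name) in numbering_mismatches(gpu_numbering(LOG)).items():
    print(f"GPU {index}: CUDA says {cuda_name}, OpenCL says {opencl_name}")
```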
ID: 1954363 · Report as offensive     Reply Quote
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1954374 - Posted: 8 Sep 2018, 6:19:42 UTC

I would have suggested that your symptoms were caused by a PCIe slot going bad. But you have moved cards around to other slots and the problem tracked with the card and not the slot. So no help there.

I have had two different brand motherboards take a dump and lose a slot where the slot became non-functional and if ANY card was installed in the slot, the PC wouldn't even POST.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1954374 · Report as offensive     Reply Quote


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.