Setting up Linux to crunch CUDA90 and above for Windows users


Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2031462 - Posted: 8 Feb 2020, 19:13:09 UTC - in response to Message 2031441.  

Wasn't it determined that those mining cards are only running at x1, and that is probably why the system got laggy?
Never mind. I see I got the swap direction reversed.
OK, really confused. Did you mean you turned off CPU hyperthreading, or that you turned off VT or virtualization? Probably interfering with the Above4G decoding.


I guess he means VT/virtualization, since he’s still got HT enabled (reporting 16 CPUs)

I’m not sure why virtualization would matter for the problem he was experiencing, unless it’s some weird bug in the BIOS. But he mentioned in another thread that he changed another setting at the same time (turned mining mode off) so when you change 2 variables at once and observe a change it’s hard to say which one really caused the change.


I will turn the Mining mode back on and see if it enables the CPU virtualization or not. I accidentally turned off the Mining mode when I toggled at least one of the PCIe settings in the BIOS to "Auto".

Will poke around and make report(s) on a fully loaded set of GPUs with limited CPU threads. I should also see what happens with "-nobs" engaged even though I can't use that when I am running more than 16 GPUs....

Oh, yes. I am limiting the Seti@Home CPU tasks being crunched because my dinky little LGA 1151 CPU cooler wasn't really sized for this CPU. I can't fit the larger LGA 1151 cooler in there without moving the MB to the back of the rack. And I am not sure (yet) if I have holes to do that. Here flashlight.

Tom
A proud member of the OFA (Old Farts Association).
ID: 2031462

Buckeye4LF Project Donor
Joined: 19 Jun 00
Posts: 173
Credit: 54,916,209
RAC: 833
United States
Message 2031498 - Posted: 8 Feb 2020, 23:00:54 UTC

So everything seemed to be working okay, but now the host is telling me nothing is computing and BOINC says I have 122 things in progress. I tried resetting and updating via boinccmd. Is this a symptom of purging my install and starting over? I made sure nothing was computing before deleting the old directory, but now it seems there might have been some ghosts created.

Do I just have to wait until the scheduler pulls new work?

ID: 2031498

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2031504 - Posted: 8 Feb 2020, 23:11:49 UTC - in response to Message 2031498.  

If you didn't set NNT (No New Tasks) and finish all your work, and just purged the distro folder as you said you did, then you have "ghosted" all those work units. The scheduler still thinks you have them, which is why they show as "In Progress". Plus, you have abandoned all your existing work, so you are in the "Penalty Box".

Until you return some work, the scheduler will only send you one task per day. Once you start returning valid work, the scheduler will start bumping up the amount of work it sends you.

You can try the ghost recovery procedure and see if you can recover those 122 tasks.
https://setiathome.berkeley.edu/forum_thread.php?id=84176
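
For anyone scripting the "set NNT first" step described above, here is a minimal sketch in Python. It assumes boinccmd is on the PATH and talks to a running local client, and that the project URL is https://setiathome.berkeley.edu/ (taken from the links in this thread). The helper names boinccmd_args and run are made up for illustration, and only the nomorework, allowmorework, update and reset project operations are used.

```python
import subprocess

# Assumption: the SETI@home project URL as it appears in this thread's links.
PROJECT_URL = "https://setiathome.berkeley.edu/"

# Project operations used here; boinccmd accepts others as well.
ALLOWED_OPS = {"nomorework", "allowmorework", "update", "reset"}


def boinccmd_args(operation, project_url=PROJECT_URL):
    """Build the argument list for a boinccmd project operation (hypothetical helper)."""
    if operation not in ALLOWED_OPS:
        raise ValueError(f"unsupported operation: {operation}")
    return ["boinccmd", "--project", project_url, operation]


def run(operation):
    """Run the operation; needs a running BOINC client on this host."""
    return subprocess.run(boinccmd_args(operation), check=True)


# Show the commands without executing them:
# 1. set No New Tasks, 2. once the cache is finished, contact the scheduler to report.
for op in ("nomorework", "update"):
    print(" ".join(boinccmd_args(op)))
```

Only after the cached tasks have finished and been reported would it be safe to delete the old BOINC folder; calling run("nomorework") does the first step for real.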
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2031504

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13778
Credit: 208,696,464
RAC: 304
Australia
Message 2031507 - Posted: 8 Feb 2020, 23:17:55 UTC - in response to Message 2031498.  

Do I just have to wait till scheduler pulls new work?
I would probably start from scratch.
If this is a crunching only system, just wipe everything & reinstall the OS.
Once that is done, update the video driver from Nouveau to the Nvidia supplied driver. If you plan to process AstroPulse WUs you'll most likely need to manually install OpenCL support.

Once all of that is done, copy the All-In-One folder to the new system, and extract all the files & folders into a BOINC folder in your Home directory.
Follow the instructions in the readme file to setup the applications you wish to use (if different from the defaults).
Double click on boincmgr. Attach to Seti.
Crunch away.
Grant
Darwin NT
ID: 2031507

Buckeye4LF Project Donor
Joined: 19 Jun 00
Posts: 173
Credit: 54,916,209
RAC: 833
United States
Message 2031509 - Posted: 8 Feb 2020, 23:25:10 UTC - in response to Message 2031504.  

That does make sense though; anything reported as "anonymous platform" is running out of the new install directory. My heat map shows 0% CPU at the moment though. I have no idea what those 122 tasks are. I will try recovery as mentioned above. It is irritating that my dedicated machine is idle while my Windows machine chugs along....

ID: 2031509

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2031513 - Posted: 8 Feb 2020, 23:32:57 UTC - in response to Message 2031509.  

For a newbie Linux user, the first startup can be a bit bumpy. Once you get some work turned in, the rig will start cranking out the work and should soon eclipse the Windows rig. It might take a day is all, so you need to have patience. It is frustrating, for sure, with the capital outlay for new hardware and nothing to show for it initially.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2031513

Buckeye4LF Project Donor
Joined: 19 Jun 00
Posts: 173
Credit: 54,916,209
RAC: 833
United States
Message 2031524 - Posted: 9 Feb 2020, 0:57:03 UTC

No luck on ghost recovery.... Guess I will only be running SETI at 1 WU a day for the next month until everything expires. Just connected the dedicated machine to Collatz and got 100+ tasks, so at least I can crunch something....

ID: 2031524

TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2031525 - Posted: 9 Feb 2020, 1:04:53 UTC - in response to Message 2031524.  

If I'm looking at the right machine, it shows you haven't contacted the server since 8 Feb 2020, 8:00:16 UTC. Is 8894243 the right machine? https://setiathome.berkeley.edu/hosts_user.php?userid=74809
If true, that's a good reason for not reporting/receiving work....
ID: 2031525

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13778
Credit: 208,696,464
RAC: 304
Australia
Message 2031533 - Posted: 9 Feb 2020, 2:01:24 UTC - in response to Message 2031524.  

no luck on ghost recovery.... guess i will only be running seti 1 wu day for the next month until everything expires.
For every WU that Validates, you get 2 more to process.

As TBar posted, that system hasn't even contacted the server lately, so you could do as I suggested: start from scratch with a clean slate, and a new system ID with no outstanding errors. Get the system set up & ready to run the Special Application, with no previous installations to deal with. Copy over the All-In-One, select your chosen applications to use, attach to Seti and off you go.
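
To put rough numbers on the "Penalty Box", here is a toy model in Python that uses only the two rules stated in this thread: a host starts at one task per day, and every validated WU earns two more. The function name and the purely additive arithmetic are assumptions made for illustration; the real scheduler logic is not shown anywhere in this thread and may well differ.

```python
def daily_quota(validated_wus, start=1, gain_per_valid=2):
    """Toy model of the quota after a number of validated work units.

    Assumes a starting quota of one task per day and a fixed gain of two
    tasks per validated WU, as described in the posts above.
    """
    if validated_wus < 0:
        raise ValueError("validated_wus must be non-negative")
    return start + gain_per_valid * validated_wus


# How the allowance would grow under this toy model.
for n in (0, 1, 5, 10):
    print(f"{n:2d} validated -> {daily_quota(n)} tasks allowed")
```

Under these assumptions the quota recovers quickly once valid results start coming back, which matches the advice to be patient for a day or so.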
Grant
Darwin NT
ID: 2031533

TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2031538 - Posted: 9 Feb 2020, 3:47:28 UTC

Well, a little good news for the upcoming Ubuntu 20.04. I installed today's daily build and, aside from a little video driver weirdness, everything CUDA seems to work. My recent version of BOINC 7.16.2 works without any dependencies, and just one dependency is needed for the Manager. So far they seem to have left out a needed gtk2.0-x11 library, which can be solved by installing the libgtk2.0-dev package. Hopefully the release will include the library later on, but running one command line to install the package isn't bad.

The main install includes nVidia driver 440.59 out of the box, and it worked fine until I installed the cool-bits command. After that it was stuck in the login loop until you went back to the Nouveau driver. Nothing seemed to fix it after that, but if you boot to the recovery mode and then hit Resume, it will boot normally with a working driver, even though they say the driver needs a full boot. Hey, if it works...and saves doing a reinstall, I'll use it for now.

Seems to work, Operating System: Linux Ubuntu
Ubuntu Focal Fossa (development branch) [5.4.0-12-generic|libc 2.30 (Ubuntu GLIBC 2.30-0ubuntu3)]
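
A quick way to check for the missing Manager dependency before launching boincmgr is sketched below in Python. It assumes the "gtk2.0-x11 library" mentioned above corresponds to the shared library that the loader knows as gtk-x11-2.0; that mapping is a guess, not something confirmed in this thread. The function name is made up for illustration.

```python
from ctypes.util import find_library


def gtk2_x11_library():
    """Return the GTK 2 X11 library name if it can be located, else None.

    Assumption: the library the BOINC Manager needs is the one found
    under the name "gtk-x11-2.0".
    """
    return find_library("gtk-x11-2.0")


lib = gtk2_x11_library()
if lib is None:
    # The fix reported in this thread is to install the libgtk2.0-dev package.
    print("GTK 2 X11 library not found; try installing libgtk2.0-dev")
else:
    print("found", lib)
```

If the check reports the library as missing, the single package install described above should be all that is needed for the Manager.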
ID: 2031538

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2031539 - Posted: 9 Feb 2020, 4:06:25 UTC

Good to know, TBar. April is approaching fast and I hope that Ubuntu 20.04 LTS is fully baked. I doubt I will try the in-place distro upgrade from 18.04 to 20.04. My attempt to move from 19.04 to 19.10 via distro-upgrade did not go well. I will probably just zip up the BOINC folder, blow the 18.04 install away and install from scratch. There is likely too much deadwood in the existing build.

If there is just one missing dependency for the Manager, that is not that bad and easily overcome.

Thanks for being the early pioneer with arrows in your back.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2031539

Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2031542 - Posted: 9 Feb 2020, 6:11:25 UTC - in response to Message 2031425.  

Great suggestion Richard, need to mention it to the GPUUG developers to hunt out the checkpointing code in the special app and disable it. Then there would be no need to set a checkpoint interval longer than the normal crunching time for gpu tasks.


. . It has an appeal and would solve issues with special sauce on GPUs, but what if the host is also crunching on CPUs? Would it not mean that a task that was restarted after running for an hour or so would have to go back to the beginning?

Stephen

ID: 2031542

Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2031543 - Posted: 9 Feb 2020, 6:21:21 UTC - in response to Message 2031533.  

no luck on ghost recovery.... guess i will only be running seti 1 wu day for the next month until everything expires.
For every WU that Validates, you get 2 more to process.

As Tbar posted, that system hasn't even contacted the server lately, so you could do as i suggested- start from scratch with a clean slate, and a new system ID with no outstanding errors. Get the system set up & ready to run the Special Application, with no previous installations to deal with. Copy over the All in one, select your chosen Applications to use, attach to Seti and off you go.


. . Once that is done you can use the Juan method to restore it to the original Host identity if you wish. It would be the cleanest way to get past the current issues.

Stephen
ID: 2031543

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13778
Credit: 208,696,464
RAC: 304
Australia
Message 2031544 - Posted: 9 Feb 2020, 7:01:01 UTC - in response to Message 2031542.  

Great suggestion Richard, need to mention it to the GPUUG developers to hunt out the checkpointing code in the special app and disable it. Then there would be no need to set a checkpoint interval longer than the normal crunching time for gpu tasks.
. . It has an appeal and would solve issues with special sauce on GPUs, but what if the host is also crunching on CPUs? Would it not mean that a task that was restarted after running for an hour or so would have to go back to the beginning?
Nope.
The change would just be to the Special Application; if it doesn't actually write a checkpoint file to disk when it's asked to, then when the client restarts, the WU starts from scratch.
Grant
Darwin NT
ID: 2031544

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2031547 - Posted: 9 Feb 2020, 8:38:23 UTC - in response to Message 2031542.  

Great suggestion Richard, need to mention it to the GPUUG developers to hunt out the checkpointing code in the special app and disable it. Then there would be no need to set a checkpoint interval longer than the normal crunching time for gpu tasks.


. . It has an appeal and would solve issues with special sauce on GPUs, but what if the host is also crunching on CPUs? Would it not mean that a task that was restarted after running for an hour or so would have to go back to the beginning?

Stephen

No, Richard requested that the special app have the checkpoint removed, not the client. CPU tasks would continue to checkpoint normally, as set by the account settings. Same for the stock SoG GPU app or any other app. Just the special app would have the change.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2031547

Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2031562 - Posted: 9 Feb 2020, 11:29:46 UTC - in response to Message 2031547.  


No Richard requested the special app have the checkpoint removed. Not the client. The cpu would continue to checkpoint normally set by the account settings. Same for the stock SoG gpu or any other app. Just the special app would have the change.


And that would allow us to go back to the checkpoint every 60 seconds so instead of losing 4 minutes on a shutdown, we would lose a minute.
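
The arithmetic behind that can be sketched in a few lines of Python. This is an idealized model that assumes a checkpoint is written at every exact multiple of the interval, which real applications do not guarantee. The function name is made up for illustration.

```python
def work_lost_on_shutdown(elapsed_seconds, checkpoint_interval):
    """Seconds of progress lost if the task is stopped right now.

    Idealized assumption: a checkpoint lands at every exact multiple of
    checkpoint_interval, so the loss is the time since the last multiple.
    """
    if checkpoint_interval <= 0:
        raise ValueError("checkpoint_interval must be positive")
    return elapsed_seconds % checkpoint_interval


# A task stopped just short of one hour (3599 s), close to the worst case.
for interval in (240, 60):
    lost = work_lost_on_shutdown(3599, interval)
    print(f"interval {interval:3d} s -> {lost} s lost")
```

In this model the loss is always less than one interval, so dropping from a 4-minute to a 60-second interval cuts the worst case from just under 4 minutes to just under 1 minute.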

Tom
A proud member of the OFA (Old Farts Association).
ID: 2031562

Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2031572 - Posted: 9 Feb 2020, 13:29:06 UTC - in response to Message 2031547.  

No Richard requested the special app have the checkpoint removed. Not the client. The cpu would continue to checkpoint normally set by the account settings. Same for the stock SoG gpu or any other app. Just the special app would have the change.


. . OK, that would be good, I had the impression that checkpointing was under the control of the BOINC client.

Stephen

ID: 2031572

Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2031573 - Posted: 9 Feb 2020, 13:29:57 UTC - in response to Message 2031562.  


No Richard requested the special app have the checkpoint removed. Not the client. The cpu would continue to checkpoint normally set by the account settings. Same for the stock SoG gpu or any other app. Just the special app would have the change.


And that would allow us to go back to the checkpoint every 60 seconds so instead of losing 4 minutes on a shutdown, we would lose a minute.

Tom

. . True.

Stephen
ID: 2031573

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14658
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2031577 - Posted: 9 Feb 2020, 13:40:07 UTC - in response to Message 2031572.  

. . OK, that would be good, I had the impression that checkpointing was under the control of the BOINC client.
BOINC permits checkpointing at the stated interval, but doesn't control it. Only the science application knows when the task data is in a sufficiently coherent state to make a consistent checkpoint file, usable for a restart.
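
The split between "permits" and "controls" can be illustrated with a small Python simulation. Nothing here is the real BOINC API: ToyClient and run_task are invented stand-ins, loosely modeled on the idea that the client says when a checkpoint is allowed and the application decides whether its data is coherent enough to write one. The write_checkpoints flag imitates the proposed change to the special app.

```python
class ToyClient:
    """Stand-in for the client: it only says when a checkpoint is permitted."""

    def __init__(self, interval):
        self.interval = interval
        self.last_checkpoint = 0

    def time_to_checkpoint(self, now):
        return now - self.last_checkpoint >= self.interval

    def checkpoint_completed(self, now):
        self.last_checkpoint = now


def run_task(steps, client, coherent_every, write_checkpoints=True):
    """Run a toy task and return the steps at which checkpoints were written."""
    written = []
    for step in range(1, steps + 1):
        # Only the application knows when its data is in a consistent state.
        coherent = step % coherent_every == 0
        if write_checkpoints and coherent and client.time_to_checkpoint(step):
            written.append(step)
            client.checkpoint_completed(step)
    return written


# Normal app: checkpoints only when permitted AND coherent.
print(run_task(20, ToyClient(interval=6), coherent_every=4))   # [8, 16]
# App with checkpointing disabled: nothing is written, so a restart begins at step 1.
print(run_task(20, ToyClient(interval=6), coherent_every=4, write_checkpoints=False))  # []
```

Note that the first run never checkpoints at step 6, even though the client allows it, because the toy application's data is not coherent until step 8.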
ID: 2031577

rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22293
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2031578 - Posted: 9 Feb 2020, 13:40:28 UTC - in response to Message 2031572.  

While checkpointing is under the control of BOINC, it is up to the developers of each application whether to include it, and exactly what is written to disk during a checkpoint operation.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2031578
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.