Posts by David Anderson (not *that* DA)

1) Message boards : Number crunching : Linux and nvidia 390.77: ubuntu update leads to errors (Message 1962503)
Posted 30 Oct 2018 by Profile David Anderson (not *that* DA) Project Donor
Post:
Thanks Keith, I have been working toward that, and it seems to have succeeded.
No more worrying messages in dmesg output.
The sequence:
In additional drivers, select nouveau driver. Apply.
sudo apt-get purge 'nvidia*'
sudo reboot
sudo apt purge 'libnvidia*'
(a similar sudo apt purge '*nvidia*' did not seem to remove libnvidia)
sudo apt autoremove (that removed a fair amount of old stuff!)
dpkg -l |grep -i nvidia (to be sure all nvidia gone before next step)
In additional drivers, selected 390, applied
sudo apt install ocl-icd-opencl-dev
reboot
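
The "be sure all nvidia gone" check in the sequence above can be sketched as a small shell function. This is only an illustration, not part of the original procedure: the name count_nvidia_packages is made up here, and it assumes the usual dpkg -l layout with the package name in the second column. It counts rows in any status, so packages left in the "rc" (removed, config remaining) state are included.

```shell
# Count rows of `dpkg -l` output (read from stdin) whose package name
# contains "nvidia", whatever their status (ii, rc, ...).
count_nvidia_packages() {
    awk 'tolower($2) ~ /nvidia/ { n++ } END { print n + 0 }'
}

# Typical use (not run here):
#   dpkg -l | count_nvidia_packages
```

A result of 0 suggests nothing with nvidia in its name is left before re-selecting the 390 driver.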

Whew. We'll see if that's sufficient. Looks promising so far:
I have new GPU tasks, one running now. I'll watch the error count.
2) Message boards : Number crunching : Linux and nvidia 390.77: ubuntu update leads to errors (Message 1962480)
Posted 29 Oct 2018 by Profile David Anderson (not *that* DA) Project Donor
Post:
Ubuntu 18.04 has made some mistake.
Did a normal update and now dmesg reports:


[515551.148450] NVRM: API mismatch: the client has the version 390.77, but
NVRM: this kernel module has the version 390.48. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
[515931.257453] NVRM: API mismatch: the client has the version 390.77, but
NVRM: this kernel module has the version 390.48. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
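
To pull the two version numbers out of messages like the ones above, a sketch along these lines may help. The function name nvrm_mismatch_versions is hypothetical, and it assumes the message wording stays exactly as quoted (the version is the token right after the word "version").

```shell
# Print "client=<ver> module=<ver>" for each NVRM API-mismatch report
# found in dmesg text read from stdin.
nvrm_mismatch_versions() {
    awk '
        # Return the token that follows the word "version",
        # minus a trailing "," or ".".
        function version_after(   i, v) {
            for (i = 1; i < NF; i++)
                if ($i == "version") {
                    v = $(i + 1)
                    sub(/[,.]$/, "", v)
                    return v
                }
            return ""
        }
        /NVRM: API mismatch: the client has the version/ {
            client = version_after()
        }
        /NVRM: this kernel module has the version/ {
            if (client != "")
                print "client=" client " module=" version_after()
            client = ""
        }
    '
}

# Typical use (not run here):
#   dmesg | nvrm_mismatch_versions | sort -u
```

For the log above this reports client 390.77 against kernel module 390.48, meaning the user-space libraries were updated while the older kernel module was still loaded.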

400+ gpu tasks errored off. Well, drat.

For now there is simply no GPU work on the machine. I hope that uninstalling
nvidia, running nouveau briefly, and reinstalling the nvidia driver (and the
opencl part) will make this annoyance go away. Nobody appears to have reported
this anywhere on askubuntu.
3) Message boards : Number crunching : Ubuntu 18.04 nvidia 390 and GPU Missing [solved] (Message 1950013)
Posted 16 Aug 2018 by Profile David Anderson (not *that* DA) Project Donor
Post:
The name and description of the ocl-icd* package
is essentially no help whatever.
A complete disconnect from the problem report
from boinc. AFAICT.

Yet some folks figured it out. Thank you to
those folks.
4) Message boards : Number crunching : Ubuntu 18.04 nvidia 390 and GPU Missing [solved] (Message 1949721)
Posted 14 Aug 2018 by Profile David Anderson (not *that* DA) Project Donor
Post:
Upgraded Ubuntu 16.04 to 18.04 today and boinc could not find the GPU,
the gpu tasks said 'GPU Missing'.

The new driver for nvidia is 390.
I found a comment on a boinc thread suggesting:

sudo apt-get install ocl-icd-libopencl1
(yes, it's current according to synaptic)
sudo service boinc-client restart

And now the NVIDIA 750 is known to boinc and Seti is using it.
This problem was reported to the ubuntu folks before 18.04 was released, but
it is still present in the distribution, making the above
install necessary.
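
A quick way to confirm that the OpenCL loader from that package is visible to the dynamic linker is sketched below. The function name has_opencl_loader is made up for illustration, and it assumes the loader library is named libOpenCL.so.1. The /etc/OpenCL/vendors directory mentioned in the comment is the conventional location for vendor .icd files; treat it as an assumption to verify on your system.

```shell
# Succeed if the `ldconfig -p` listing on stdin mentions the OpenCL ICD loader.
has_opencl_loader() {
    grep -q 'libOpenCL\.so\.1'
}

# Typical use (not run here):
#   ldconfig -p | has_opencl_loader && echo "OpenCL loader present"
#   ls /etc/OpenCL/vendors/     # vendor .icd files, if any
```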
5) Message boards : Number crunching : linux ps and project tasks (Message 1943941)
Posted 12 Jul 2018 by Profile David Anderson (not *that* DA) Project Donor
Post:
Thanks for that reference. I found that checkbox and unchecked it!
DavidA
6) Message boards : Number crunching : linux ps and project tasks (Message 1943888)
Posted 12 Jul 2018 by Profile David Anderson (not *that* DA) Project Donor
Post:
I'm not used to seeing boinc processes (setiathome) suspended
and hanging around in memory.

My largest machine 12*2 cores (the *2 as hyperthreading is on)
currently shows 35 processes, 19 seti, 16
einstein. Boinc computer-options of boincmgr
is set to use 82% of the cores.

Other machines (only seti at present) only show what
is actually running.

htop is very useful, but I have not even scratched the
surface of what one can do with it...

Where will I find the options about keeping/not keeping
suspended processes in memory?
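
A hedged pointer on where such a setting may live: BOINC has a computing preference, shown in the Manager as a checkbox along the lines of "Leave non-GPU tasks in memory while suspended". The sketch below assumes that local overrides are stored in a file named global_prefs_override.xml in the BOINC data directory and that the tag is called leave_apps_in_memory. Both are assumptions to check against the installed version. The function name is made up.

```shell
# Print the value of <leave_apps_in_memory> from preferences XML on stdin.
# Prints nothing if the tag is absent.
leave_in_memory_setting() {
    sed -n 's:.*<leave_apps_in_memory>\([^<]*\)</leave_apps_in_memory>.*:\1:p'
}

# Typical use (not run here; the path is an assumption):
#   leave_in_memory_setting < /var/lib/boinc-client/global_prefs_override.xml
```

A value of 1 would mean suspended tasks stay in memory, which would match seeing more processes than are actually running.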
7) Message boards : Number crunching : linux ps and project tasks (Message 1943634)
Posted 10 Jul 2018 by Profile David Anderson (not *that* DA) Project Donor
Post:
A) I've noticed that two computers with boincmgr 7.6.31 show tasks still running via ps -eaf when
boincmgr says they are suspended. (dseti3,q2) q2 has an old GPU and is running the GNU driver so
boinc sees no known graphics card. dseti3 has two nvidia 760's.

B) Two others with boincmgr 7.6.31 show no tasks running with ps -eaf when boincmgr
says they are suspended.

C) Two others with boincmgr 7.9.3 show no tasks running with ps -eaf when boincmgr
says they are suspended

The command in all cases is ps -eaf | grep projects
I don't understand why case A) exists at all.
A typical output of the ps with case A machines, all boinc projects suspended
using :
Dseti3 1999: ps -eaf |grep projects
boinc 5765 5536 86 15:54 ? 01:59:25 ../../projects/setiathome.berkeley.edu/setiathome_8.00_x86_64-pc-linux-gnu

What started this was the desire to know how many project tasks were running.
For case A) machines I've not been able to see how to always count them correctly.
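
One possible way to count, sketched under assumptions: a task that is computing normally shows process state R in ps, while a task that is suspended but left in memory is usually sleeping. A single snapshot can misclassify a task that happens to be waiting on I/O, so this is a heuristic only. The function name count_project_tasks is made up.

```shell
# Summarise BOINC project processes from `ps -eo stat=,args=` output on stdin.
# Prints "running=<n> other=<m>": state R versus every other state.
# Lines whose command is grep or awk are skipped so the filter never counts itself.
count_project_tasks() {
    awk '
        index($0, "/projects/") && $2 !~ /^(grep|awk)$/ {
            if (substr($1, 1, 1) == "R") running++
            else other++
        }
        END { print "running=" (running + 0) " other=" (other + 0) }
    '
}

# Typical use (not run here):
#   ps -eo stat=,args= | count_project_tasks
```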

Suspended via
boinccmd --set_gpu_mode never
boinccmd --set_run_mode never

Interpretations? Suggestions?
Thanks in advance.
8) Message boards : Number crunching : Boinc 7.12.0 problem with headless computer on Linux [Solved] (Message 1941223)
Posted 26 Jun 2018 by Profile David Anderson (not *that* DA) Project Donor
Post:
That VGA dummy plug thing was no help to me running boinc
on a headless odroid-xu4.

Is there a good reason to expect Linux would see that
in a similar way to windows (the original poster
linked-to made it for Windows)?
9) Message boards : Number crunching : Ubuntu 18.04 (Message 1938099)
Posted 3 Jun 2018 by Profile David Anderson (not *that* DA) Project Donor
Post:
I had a similar problem when Ubuntu 16.04 installed 384.130
a couple days ago.

I fixed it with the series of actions I document in
https://setiathome.berkeley.edu/forum_thread.php?id=82997

which was advice I think I got from Tbar in March 2017.
Worked then and worked today.
10) Message boards : Number crunching : Opencl Ubuntu 16.04 nvidia 384.130 fails (Message 1938097)
Posted 3 Jun 2018 by Profile David Anderson (not *that* DA) Project Donor
Post:
Looks as if GPU tasks completing ok now.
I'll keep watching.
11) Message boards : Number crunching : Opencl Ubuntu 16.04 nvidia 384.130 fails (Message 1938088)
Posted 3 Jun 2018 by Profile David Anderson (not *that* DA) Project Donor
Post:
I switched to nouveau, the non-nvidia driver.
Rebooted.
dpkg -l |grep -i nvidia
sudo apt purge (the stuff listed by the above)
sudo bash
cd /var/lib/boinc*/projects/seti*
and rm *wisdom* *bin*
exit (back to being me)
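
Before running an rm with wildcards as root, it can be reassuring to list what would match. The sketch below is a dry run only and deletes nothing. The function name list_opencl_cache_files is hypothetical. Note that the *bin* pattern matches any file with "bin" anywhere in its name, not only compiled-kernel caches.

```shell
# List (without deleting) the regular files in directory $1 that the
# patterns *wisdom* and *bin* would match.
list_opencl_cache_files() {
    dir=$1
    find "$dir" -maxdepth 1 -type f \( -name '*wisdom*' -o -name '*bin*' \) | sort
}

# Typical use (not run here):
#   list_opencl_cache_files /var/lib/boinc-client/projects/setiathome.berkeley.edu
```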


Used 'additional drivers' panel to request 384.130

Rebooted
Restarted seti.
One gpu task seems to be running toward completion.
There are a few more. Turned off getting new tasks
while I wait to see how the opencl_nvidia_SoG
tasks go.
12) Message boards : Number crunching : Opencl Ubuntu 16.04 nvidia 384.130 fails (Message 1937961)
Posted 1 Jun 2018 by Profile David Anderson (not *that* DA) Project Donor
Post:
WARNING: boinc_get_opencl_ids failed with code -1
Error: Getting Platforms. (clGetPlatformsIDs)
BOINC assigns slot on device #0.
WARNING: BOINC failed to provide OpenCL device, using own enumeration abilities
ERROR: OpenCL kernel/call 'clGetDeviceIDs (second call)' call failed (-32) in file ../../src/GPU_lock.cpp near line 1311.
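
For reading the numeric code in the last line, here is a small lookup sketch. The values are the standard ones from the Khronos cl.h header, but only a handful are included, and the function names are made up. The -1 in the first WARNING line may be BOINC's own return value rather than an OpenCL status, so it is not decoded here.

```shell
# Map a few OpenCL status codes to their symbolic names.
opencl_error_name() {
    case "$1" in
        0)     echo CL_SUCCESS ;;
        -1)    echo CL_DEVICE_NOT_FOUND ;;
        -5)    echo CL_OUT_OF_RESOURCES ;;
        -32)   echo CL_INVALID_PLATFORM ;;
        -33)   echo CL_INVALID_DEVICE ;;
        -1001) echo CL_PLATFORM_NOT_FOUND_KHR ;;
        *)     echo "unknown ($1)" ;;
    esac
}

# Extract the parenthesised negative code from a "call failed (-NN)" line on stdin.
opencl_code_from_line() {
    sed -n 's/.*call failed (\(-[0-9][0-9]*\)).*/\1/p'
}
```

Applied to the ERROR line above, the extracted code is -32, which maps to CL_INVALID_PLATFORM. That fits the earlier failure to enumerate platforms.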

computer: 7748035
This Ubuntu 16.04 machine got a new nvidia
driver (not that I requested such) replacing one that worked, and the result
is the above. The good one, 340.104, worked fine for a long time.

In about a day the bad one accumulated 248 error tasks.
Erroring very quickly indeed. There are reports of opencl issues
with the latest driver on backports (which means on 16.04 ).

Well, ugh. Took me a whole day to notice.
I have switched back to the older driver and hope it still works.
I won't restart seti on this machine until I am where
I can watch progress and
stop it if nvidia/opencl is still broken.
13) Message boards : Number crunching : No protocol specified - in boincerr.log (Message 1934006)
Posted 6 May 2018 by Profile David Anderson (not *that* DA) Project Donor
Post:
A linux machine running boinc-client has no monitor
plugged in and after a couple minutes up time
boinc-client starts writing "No protocol specified"
to /var/log/boincerr.log once every second. Forever.

boinc-client runs fine. seti tasks run fine.
boincmgr (ver 7.6.31) , run via ssh from
another machine, runs fine.
The message on boincerr.log seems completely pointless
and is a waste of flash-disc update cycles.
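
To put a rough number on that, here is a sketch that counts the messages in a log and estimates the daily volume. It assumes each entry is just the 21-character text plus a newline (22 bytes) with no timestamp prefix, which may not hold for every setup. The function names are made up.

```shell
# Count "No protocol specified" lines in log text read from stdin.
count_no_protocol() {
    grep -c 'No protocol specified' || true   # grep exits 1 on zero matches
}

# Estimate bytes written per day: $1 = lines per second, $2 = bytes per line.
log_bytes_per_day() {
    echo $(( $1 * 86400 * $2 ))
}

# Typical use (not run here):
#   count_no_protocol < /var/log/boincerr.log
#   log_bytes_per_day 1 22      # prints 1900800, roughly 1.9 MB per day
```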

Why doesn't the message repeat-time back-off so the rate goes asymptotically to zero?
Or something.

Is there a way to suppress this message?
Thanks for any suggestions.
14) Message boards : Number crunching : SETI/BOINC Milestones [ v2.0 ] - XXIX (Message 1929790)
Posted 13 Apr 2018 by Profile David Anderson (not *that* DA) Project Donor
Post:
I finally arrived at 50,000,000 cobblestones!
Took quite a while and I managed to 'enlist'
a couple more computers and keep them
focused on Seti...

The most interesting of the lot is a tiny
single-board machine with 8 arm cores each
at nearly 2GHz. It's still hard to believe
that such a thing exists. Under US$100.

What to set as the next goal?
15) Message boards : Number crunching : boinc-client and X and monitor (Message 1927535)
Posted 31 Mar 2018 by Profile David Anderson (not *that* DA) Project Donor
Post:
I tried that and it did not make a difference.
It does not surprise me that Linux and X don't use the
same monitor-sense approach as past Windows.

For now on the system I'm just letting boinc-client log
in boincerr.log every second :-(

Thanks for the tip, though. Worth a try.
DavidA.
16) Message boards : Number crunching : GPU Not Detected - Linux (Message 1926777)
Posted 26 Mar 2018 by Profile David Anderson (not *that* DA) Project Donor
Post:
In the menu system (Assuming Ubuntu or equiv)
Go to Settings->Additional Drivers and in the Additional Drivers
area of that page click on the driver you want (radio buttons
there...).

Simply installing a driver does not (necessarily?)
make it the current GPU driver.

That page will install the driver named if it's not installed
already.
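
To see which driver is actually in use after making a choice on that page, the loaded kernel modules can be checked. A minimal sketch, assuming the proprietary driver's main module is named nvidia and the open one nouveau; the function name active_gpu_module is made up.

```shell
# Report which GPU kernel module appears in `lsmod` output read from stdin:
# prints "nvidia", "nouveau", or "none".
active_gpu_module() {
    awk '
        $1 == "nvidia"                 { found = "nvidia" }
        $1 == "nouveau" && found == "" { found = "nouveau" }
        END { print (found == "" ? "none" : found) }
    '
}

# Typical use (not run here):
#   lsmod | active_gpu_module
```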
========
17) Message boards : Number crunching : boinc-client and X and monitor (Message 1923138)
Posted 7 Mar 2018 by Profile David Anderson (not *that* DA) Project Donor
Post:
Was unable to find anything (meaningful to me)
about running boinc-client with no monitor.
What little I found was old(er) boinc-client and
the logging has, according to some messages
I found, changed over time.

Reducing the clock rate slightly on the
single board computer has let it run
boinc for two days (with monitor attached).
It seems possible the boinc-client log
messages (when headless) were not a problem at all,
just an annoyance.

If anyone has knowledge of current boinc-client run headless
and controlling boinc-client logging in that environment
I'd appreciate hearing about such.
But otherwise I'll just take this as a learning experience :-)
18) Message boards : Number crunching : boinc-client and X and monitor (Message 1922239)
Posted 3 Mar 2018 by Profile David Anderson (not *that* DA) Project Donor
Post:
Focusing on confusing issues with odroid-xu4 itself right now.
It's run 24 hours with 4 seti tasks at once, no errors.

For now I am just letting it run with OS configuration changes
on that machine. In a couple of days I will review
again unless something changes sooner.
19) Message boards : Number crunching : boinc-client and X and monitor (Message 1921752)
Posted 28 Feb 2018 by Profile David Anderson (not *that* DA) Project Donor
Post:
Thanks Keith. I will look for such threads.
I was out of town a couple of days (now back) and
notice that plugging in a monitor (while the system is down,
then rebooting) does not remove the messages from boinc,
even when logged in via ssh.
But logging in at the local keyboard/monitor stops the
"No protocol specified" boinc-client messages.

I have yet to have the system in question run two days
without stopping...sigh.
20) Message boards : Number crunching : boinc-client and X and monitor (Message 1921106)
Posted 25 Feb 2018 by Profile David Anderson (not *that* DA) Project Donor
Post:
My concern is that a side effect of the 'error' is to pile up
some set of resources leading to eventual kernel
hangup. Not that I have any evidence of that
other than the system just seeming to stop.

So far, with monitor in use, I see no boincerr.log
records being added and no hangs.
But odroid-xu4 needs to keep going for much longer
to mean much.

