High performance Linux clients at SETI

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 13196
Credit: 154,810,305
RAC: 200,321
United Kingdom
Message 1984567 - Posted: 11 Mar 2019, 16:26:08 UTC

OK, since I'm the one who started this hare running, I'd better try and answer some of the points raised overnight.

Tom M wrote:
I dusted off a Linux/Cuda91 HD and Seti didn't want to play. I removed Seti, re-installed Tbars-all-in-One.
That's where I came in. There had been an http 500 internal server error. On a Sunday night, that didn't seem like the normal Tuesday outage recovery problem, and I wondered whether something else might be causing it. Given that the only thing the servers ever see from our computers is that sched_request file, I asked (with a question mark) whether there might be something unusual about it? More for future reference than anything else, since Tom had already re-installed and got past the problem by that stage.

TBar wrote:
As the person that compiled 7.8.3 ...
That's the first time the code version number was quoted, and to be honest it makes me even more suspicious. In my opinion, v7.8.3 is probably one of the poorest code versions to base a special version on.

I've nothing against home builds - I build and run them myself - but I try to keep awareness of what is good and bad about each one, and I usually build them to explore an ongoing problem, or to test a potential fix.

Some background: BOINC development was funded by the US Government's NSF through a succession of renewed grants. But sometime in late 2014 or early 2015 the grant was not renewed, and in Summer 2015 BOINC started losing core staff. A version 7.6 was released around that time (and several patches/fixes were added later), but development work basically stopped. The BOINC project management was nominally handed over to "The Community", but with no preparatory community development work and no visible structure. A management committee existed, but on paper only - I'm pretty certain it never met.

Then, in Summer 2017, representatives from some of the major projects - most notably Kevin Reed from World Community Grid - called together a working party to get the show back on the road. Jord van der Elst (who you'll know as 'Ageless') and I were invited to sit on that working party to provide perspective from the user/volunteer viewpoint. And round about that time, BOINC v7.8 was released with whatever sporadic improvements had trickled in by then. But it was never thoroughly tested, and many significant improvements which were made almost as soon as development restarted were never incorporated into 7.8.

Since then, I've spent many hours in teleconference meetings with BOINC staff, project staff, and volunteer developers, and I've got to know and understand them much better than before. I've even managed to code some user-requested improvements myself. And those improvements are in the C++ codebase which is common across all platforms - I personally test them in Windows, but they will appear on Mac and Linux as well.

I was personally asked to act as Release Manager for the 7.10 version - and it was a big eye-opener as to how much work goes into a BOINC release. The codebase is essentially complete before we start, but packaging, documentation, included files and so on all have to be checked. My personal mailing list for that release included Laurence Field of CERN (BOINC lead for Linux), Gianfranco Costamagna (LocutusOfBorg PPA), and Germano Massullo (Fedora package maintainer). So I can assure you that Linux was not ignored in the release process.

One of the things that became clear during this process is that many key BOINC people are very knowledgeable about Linux indeed - in fact, I think there's no recognised way of running BOINC *servers* except on Linux. But these people don't run the BOINC client in the same way that we home volunteers would. So when people complain about Linux users being ignored by the developers, that really should be read in the context of 'Linux CLIENT users'. And I think much of the feeling of isolation comes, not so much from the BOINC tools, but from the way they're packaged and distributed.

Windows users have always had an Installer which presents them with a choice of Service mode or User mode installations. In recent years, User mode installations have become more popular because of the restrictions which Microsoft have placed on GPU drivers. For Linux, GPU drivers are integrated with the kernel (something which keeps catching people out), and so Service mode continues to be viable. That's the way the package maintainers like to work, too, and it's the only reason why the lonely v7.2.42 user-mode script still appears on the download page. The binary files would be the same whichever way they were installed, so in principle that old script could be updated to deliver modern binaries - any takers?

Another problem which I think I've identified from the outside is the nature of the growing home Linux user base. Linux users in academic and scientific circles seem to be happy (happiest?) working at the command line, but refugees from Windows 8 and 10 prefer working with a GUI. And there are a lot of them to choose from now... So, perhaps we should think in terms of a new, GUI-based installation interface which allowed a choice of User or Service mode installations on as wide a range of Linux distros as possible. Does that sound achievable, and if so - how?

I think most of the other misunderstandings have been - most helpfully - covered by Vyper and Juan's description of how the buffered high-performance clients were designed to be used. But if there are any other remaining issues which require additional work, I think I have the contacts now and I'll be happy to feed them in where I can.
ID: 1984567
Gone with the wind Crowdfunding Project Donor*Special Project $75 donor
Volunteer tester

Joined: 19 Nov 00
Posts: 41577
Credit: 41,999,167
RAC: 464
Message 1984572 - Posted: 11 Mar 2019, 17:07:09 UTC

At the bottom of every seti page we see this

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.


Are we now saying that should be amended to "was"?
ID: 1984572
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 21021
Credit: 2,925,692
RAC: 599
Ireland
Message 1984574 - Posted: 11 Mar 2019, 17:12:56 UTC - in response to Message 1984567.  
Last modified: 11 Mar 2019, 17:14:26 UTC

Great post.
Another problem which I think I've identified from the outside is the nature of the growing home Linux user base. Linux users in academic and scientific circles seem to be happy (happiest?) working at the command line, but refugees from Windows 8 and 10 prefer working with a GUI. And there are a lot of them to choose from now... So, perhaps we should think in terms of a new, GUI-based installation interface which allowed a choice of User or Service mode installations on as wide a range of Linux distros as possible. Does that sound achievable, and if so - how?
I've tried Linux (Mint) & liked it. However, the main issue is the one you've highlighted, & I have brought it up in the past on these boards.

Blame MS for the GUI. During the DOS days, one had to use the CLI. The number of times I messed up Autoexec.bat & Config.sys files - had to learn to check files before saving & replacing current files. The GUI made things so easy. :-)

After so many years of that, it won't be easy to return to using CLI. :-(

Edited for spelling.
ID: 1984574
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 13196
Credit: 154,810,305
RAC: 200,321
United Kingdom
Message 1984578 - Posted: 11 Mar 2019, 17:27:59 UTC - in response to Message 1984572.  

At the bottom of every seti page we see this

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.
Are we now saying that should be amended to "was"?
You'll have to ask Eric.

https://www.nsf.gov/awardsearch/showAward?AWD_ID=0307956&HistoricalAwards=false still says 'Continuing grant'. But it doesn't say anything about high performance Linux clients. SETI != BOINC.
ID: 1984578
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 17808
Credit: 407,290,929
RAC: 114,145
United Kingdom
Message 1984579 - Posted: 11 Mar 2019, 17:28:16 UTC

These days with Linux there are some pretty good GUIs around which make the command line all but redundant.
But it is there, and can make some things easier than using a GUI - especially the way many of the GUIs remember the last (big number) of commands used, even after a shutdown in many cases.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1984579
Gone with the wind Crowdfunding Project Donor*Special Project $75 donor
Volunteer tester

Joined: 19 Nov 00
Posts: 41577
Credit: 41,999,167
RAC: 464
Message 1984582 - Posted: 11 Mar 2019, 17:50:05 UTC

I was happy using WFWG 3.11 and MS-DOS 6.22; you could tweak files to customise them using a CLI. Then along came Win 95 and changed everything. Win 10 is the latest nightmare.

Back in the day you could open the bonnet of a car, clean & reset the plugs, points, carburettor, even decoke the damn engine. These days it's all electronic ignition, fuel injection and exhaust emission gadgets.

The manufacturers decreed that it would be a cartel/monopoly, and home users were excluded from doing anything and forced to pay garage prices. Software went the same way: you have to use experts to service or mend anything.

RIP the CLI !!
ID: 1984582
Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 3575
Credit: 213,111,673
RAC: 505,371
United States
Message 1984583 - Posted: 11 Mar 2019, 17:57:37 UTC - in response to Message 1984536.  

As suggested, starting a thread to separate this discussion from the "panic mode thread"

Thanks to Richard for suggesting it. I had the same thought while out for my morning walk!


Thank you!
A proud member of the OFA (Old Farts Association)
"Over the hill? WHAT Hill? I don't REMEMBER any hill...." (from a bumper sticker I bought at a truck stop).
"If its Tourist Season why can't we shoot them?" (another bumper sticker)
ID: 1984583
tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 7741
Credit: 2,843,973
RAC: 6,581
Italy
Message 1984585 - Posted: 11 Mar 2019, 18:04:34 UTC

On my SuSE Linux boxen I use KDE 5.15.2, a GUI I prefer to Gnome, although I have the option of choosing Gnome when I install the OS. I am using SuSE Leap 15.0 on an HP laptop and Tumbleweed, a development version, on a Virtual Machine hosted on a Windows 10 PC.
On my AT&T Olivetti UNIX PC, dated 1986 and still working, I have a primitive GUI on which I can see the LOGO turtle. Happy days!
Tullio
ID: 1984585
Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 3575
Credit: 213,111,673
RAC: 505,371
United States
Message 1984586 - Posted: 11 Mar 2019, 18:04:42 UTC - in response to Message 1984548.  
Last modified: 11 Mar 2019, 18:08:28 UTC

We even have a guide that is set to only report 100 results at a time just to prevent these timeouts, because those are only unnecessary requests.


Where do I apply that limit?

--edit----Think I found it:
<cc_config>
 <log_flags>
   <sched_op_debug>1</sched_op_debug>
 </log_flags>
 <options>
   <use_all_gpus>1</use_all_gpus>
   <save_stats_days>365</save_stats_days>
   <max_tasks_reported>100</max_tasks_reported>
 </options>
</cc_config>



<max_tasks_reported>100</max_tasks_reported>
---edit----

Right?
A proud member of the OFA (Old Farts Association)
"Over the hill? WHAT Hill? I don't REMEMBER any hill...." (from a bumper sticker I bought at a truck stop).
"If its Tourist Season why can't we shoot them?" (another bumper sticker)
ID: 1984586
TBar
Volunteer tester

Joined: 22 May 99
Posts: 4869
Credit: 594,830,285
RAC: 1,392,645
United States
Message 1984587 - Posted: 11 Mar 2019, 18:08:14 UTC - in response to Message 1984567.  
Last modified: 11 Mar 2019, 18:13:55 UTC

OK, since I'm the one who started this hare running, I'd better try and answer some of the points raised overnight.

Tom M wrote:
I dusted off a Linux/Cuda91 HD and Seti didn't want to play. I removed Seti, re-installed Tbars-all-in-One.
That's where I came in. There had been an http 500 internal server error. On a Sunday night, that didn't seem like the normal Tuesday outage recovery problem, and I wondered whether something else might be causing it. Given that the only thing the servers ever see from our computers is that sched_request file, I asked (with a question mark) whether there might be something unusual about it? More for future reference than anything else, since Tom had already re-installed and got past the problem by that stage.

TBar wrote:
As the person that compiled 7.8.3 ...
That's the first time the code version number was quoted, and to be honest it makes me even more suspicious. In my opinion....
In my opinion...
That's the point that matters. The fact is, since the 7.8.3 version was released almost TWO Years ago, every time You hear it mentioned you Jump at the chance to find something wrong with it. The Fact is, Not a single person has had any trouble with it in almost Two Years, and it cured such ills as the jumping Tasks/Transfers page and a non-working Simple View. Again, there is absolutely NOTHING Special about it, it is 100% stock. I suggest you look at the computer list and note which versions are being used. There are a number of people using 7.8.3 without any trouble. The only trouble mentioned in almost Two years is one user who has been known to have troubles trying to get as many GPUs as he can to work with a Non-Ubuntu system. No other Users have reported any trouble, but You saw 'Tbars-all-in-One' and again, not knowing any facts, or even what versions are being used, again decided to Jump in and try to find trouble where there is none. Until at least One other person reports the same behavior, I'd suggest you lay off 7.8.3 and admit that after almost Two Years it works Very nicely for those that actually Use it. It might help if you actually tried running Linux before forming any opinions on something you know nothing about. BTW, Raistmer has a machine running Linux, guess which version he is using, https://setiathome.berkeley.edu/show_host_detail.php?hostid=8647915
ID: 1984587
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 13196
Credit: 154,810,305
RAC: 200,321
United Kingdom
Message 1984589 - Posted: 11 Mar 2019, 18:13:30 UTC - in response to Message 1984586.  

Yes, that's what I think Vyper was referring to - it needs to be inside the Options setting, not dropping off the bottom.

Note that it's a maximum, not a 'wait until...' setting, so it probably won't help during normal running (BOINC will report completed tasks when the oldest is 1 hour old anyway). But it certainly helps clear the backlog after maintenance.
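A minimal sketch of how that placement could be checked, assuming Python's standard xml.etree.ElementTree module; the helper name max_tasks_reported_in_options is hypothetical and not part of BOINC. It parses a cc_config.xml string and returns the limit only when the element sits inside the options section.

```python
import xml.etree.ElementTree as ET

# Sample config mirroring the one posted earlier in the thread.
CC_CONFIG = """<cc_config>
 <log_flags>
   <sched_op_debug>1</sched_op_debug>
 </log_flags>
 <options>
   <use_all_gpus>1</use_all_gpus>
   <save_stats_days>365</save_stats_days>
   <max_tasks_reported>100</max_tasks_reported>
 </options>
</cc_config>"""


def max_tasks_reported_in_options(xml_text):
    """Return the limit as an int if it is set inside <options>, else None.

    Hypothetical helper for illustration only; not part of BOINC.
    """
    root = ET.fromstring(xml_text)
    # Only match the element when it is a direct child of <options>.
    node = root.find("./options/max_tasks_reported")
    if node is None or node.text is None:
        return None
    return int(node.text.strip())


if __name__ == "__main__":
    print(max_tasks_reported_in_options(CC_CONFIG))
```

If the element is placed elsewhere under cc_config, the helper returns None. A line left after the closing cc_config tag makes the text malformed XML, so ET.fromstring would raise a ParseError rather than return a value.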
ID: 1984589
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 13196
Credit: 154,810,305
RAC: 200,321
United Kingdom
Message 1984590 - Posted: 11 Mar 2019, 18:20:26 UTC - in response to Message 1984587.  

If there's nothing wrong with v7.8.3, why did I find a list of over 20 fixed bugs which had been omitted from the release code? GitHub #2065

But as you say, that was two years ago, and the patches were included in v7.10 and later. Can we both agree to let bygones be bygones, please?
ID: 1984590
TBar
Volunteer tester

Joined: 22 May 99
Posts: 4869
Credit: 594,830,285
RAC: 1,392,645
United States
Message 1984592 - Posted: 11 Mar 2019, 18:25:23 UTC - in response to Message 1984590.  

I think most of those were Your Windows Errors, if I remember correctly. None actually affected Linux. The Fact is, No One is reporting any trouble with 7.8.3 in Ubuntu. That is Not an opinion, it's fact.
ID: 1984592
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 13196
Credit: 154,810,305
RAC: 200,321
United Kingdom
Message 1984594 - Posted: 11 Mar 2019, 18:34:10 UTC - in response to Message 1984592.  

As my comment says,

This list does NOT include commits targeted on Mac, Linux, or release 7.10
because I knew I wasn't qualified to judge whether those were ready for release. The purpose of my list was to draw other developers' (and users') attention to how much had been left behind. If they had been taken up, we'd have gone through the rest of the list in much more detail. But the consensus was to leave them dangling for another release cycle, and leave users to muddle through as best they could. The policy was "fix showstoppers only", so I agree that the ones left behind were minor and/or cosmetic. But it was still poor quality control, IMO.
ID: 1984594
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 8139
Credit: 498,395,082
RAC: 390,684
Panama
Message 1984597 - Posted: 11 Mar 2019, 19:10:44 UTC

More info about the spoofed builds:

Actually I run the 7.8.3 BOINC Manager with the modified BOINC 7.15.0 client. I know it is a mix, but it works perfectly. I encountered a problem when I tried to run the 7.15.0 BOINC Manager on my host, something related to the way the latest CCX compiler works. Keith knows about it and has a fix for that (nothing related to BOINC itself), but since my mix works I never tried it; he could explain better.
ID: 1984597
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9904
Credit: 935,722,677
RAC: 1,501,963
United States
Message 1984701 - Posted: 12 Mar 2019, 8:22:13 UTC

Late getting to this thread as I was out and about today. I set NNT (No New Tasks) before or during the outage since you aren't going to get work anyway with the schedulers unavailable. The only way I know to consistently make a connection to the scheduler once they come back online after the outage is to only report finished work. Still NNT set and not asking for work. With the spoofed client I can go a day crunching from my cache. I have my max reported tasks set to 100 and that is a reasonable size that works pretty much all the time, even when the schedulers are busiest directly after they come back and are deluged with requests from normal empty hosts. It takes 6-8 hours to report my finished tasks on the normal 305 second connection schedule. I try and limit my impact to the servers by just reporting a modest amount of finished tasks at each connection.
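As a rough worked example of those figures, here is a back-of-envelope sketch in Python. It assumes every 305-second connection succeeds and reports the full 100 tasks, which the post does not claim, so the result is an upper bound; the function name tasks_reported is made up for illustration.

```python
# Figures quoted in the post above.
TASKS_PER_REPORT = 100      # max_tasks_reported setting
SECONDS_PER_CONTACT = 305   # normal connection schedule


def tasks_reported(hours):
    """Upper bound on tasks reported in `hours`.

    Assumes every contact succeeds and carries the full
    TASKS_PER_REPORT, which is an assumption for illustration.
    """
    contacts = int(hours * 3600 // SECONDS_PER_CONTACT)
    return contacts * TASKS_PER_REPORT


for hours in (6, 8):
    print(hours, "hours ->", tasks_reported(hours), "tasks at most")
```

Under those assumptions, 6 to 8 hours of reporting corresponds to a backlog of at most about 7,000 to 9,400 finished tasks.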

I too run the 7.8.3 Manager on most of my hosts. I do have a couple of hosts running the 7.15.0 Manager. No difference in performance or stability. The menus are just slightly different. I had to satisfy a few more dependencies to compile the 7.15.0 Manager and install a half dozen extra libraries. Not a big deal once I saw what was missing. But since the 7.8.3 Manager that comes in the All-in-One is perfectly functional, I decided there was no reason to go through the extra work for the 7.15.0 Manager.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1984701
Gone with the wind Crowdfunding Project Donor*Special Project $75 donor
Volunteer tester

Joined: 19 Nov 00
Posts: 41577
Credit: 41,999,167
RAC: 464
Message 1984703 - Posted: 12 Mar 2019, 8:36:06 UTC - in response to Message 1984701.  

Whatever it is you are doing it obviously works with an RAC of 1 1/2 million!!!

The rest of us oiks will trundle along with Mr Gates' offering :-)
ID: 1984703
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9863
Credit: 85,222,120
RAC: 67,169
United Kingdom
Message 1984704 - Posted: 12 Mar 2019, 8:58:12 UTC

I try and limit my impact to the servers by just reporting a modest amount of finished tasks at each connection.


Thank you Keith, all of the info I am seeing suggests that you and your fellow "extended GPU" users are aware of the impact you could have and are mitigating the consequences.

I was unaware of this and it is nice to know, as seeing the total of 595 apparent GPUs in the top 20 machines caused me some concern.
ID: 1984704
-= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1584
Credit: 865,225,942
RAC: 1,116,986
Sweden
Message 1984709 - Posted: 12 Mar 2019, 9:43:12 UTC

This is slightly offtopic!
But I think that if everyone starts to "delete" their old systems in their own account, the database will shrink when they do maintenance and remove obsolete computer IDs.
I've cleared out my old computers, because why should they need to be there when no workunits have been assigned to that ID for ages?

I think this is one thing all of us users can do to reduce RAM usage to some degree for SETI@home: get rid of obsolete systems that take up rows in the database.

https://setiathome.berkeley.edu/hosts_user.php?sort=rpc_time&rev=0&show_all=1&userid=

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
ID: 1984709
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9863
Credit: 85,222,120
RAC: 67,169
United Kingdom
Message 1984730 - Posted: 12 Mar 2019, 12:24:13 UTC
Last modified: 12 Mar 2019, 12:28:40 UTC

For those who don't realise the link in the last post:

https://setiathome.berkeley.edu/hosts_user.php?sort=rpc_time&rev=0&show_all=1&userid=

It shows you all your computers, even ones that you may no longer own; each person sees their own computers.
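For anyone who wants to see how that link is put together, here is a small sketch using Python's standard urllib.parse.urlencode. The function name hosts_url is hypothetical; the parameter names are copied from the link as posted, and passing an explicit userid value is an assumption, since the posted link leaves it empty.

```python
from urllib.parse import urlencode

BASE = "https://setiathome.berkeley.edu/hosts_user.php"


def hosts_url(userid=""):
    """Build the 'all computers' link from the post above.

    Hypothetical helper; the posted link leaves userid empty, so
    supplying a value here is an assumption for illustration.
    """
    params = {
        "sort": "rpc_time",
        "rev": 0,
        "show_all": 1,
        "userid": userid,
    }
    return BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(hosts_url())
```

Called with no argument, this reproduces the link exactly as posted.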
ID: 1984730
 
©2019 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.