High performance Linux clients at SETI

Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 3391
Credit: 179,006,185
RAC: 818,851
United States
Message 1984435 - Posted: 10 Mar 2019, 20:24:00 UTC

I just started getting http internal server errors. Is it just me?

Tom
I will stop procrastinating tomorrow.
\\// Live Long & Prosper (starting tomorrow ;)
ID: 1984435
Profile Wiggo "Democratic Socialist"
Joined: 24 Jan 00
Posts: 16585
Credit: 221,799,587
RAC: 175,495
Australia
Message 1984440 - Posted: 10 Mar 2019, 20:56:12 UTC

It may be just you, Tom, as it's all working fine and quick here.

Cheers.
ID: 1984440
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 3391
Credit: 179,006,185
RAC: 818,851
United States
Message 1984444 - Posted: 10 Mar 2019, 21:13:13 UTC - in response to Message 1984440.  

It may be just you, Tom, as it's all working fine and quick here.

Cheers.


Yes, it was "just me". I dusted off a Linux/Cuda91 HD and Seti didn't want to play. I removed Seti and re-installed TBar's All-In-One. I started the BOINC Manager up and am now running under a new computer ID. :)

Tom
I will stop procrastinating tomorrow.
\\// Live Long & Prosper (starting tomorrow ;)
ID: 1984444
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 13035
Credit: 143,796,689
RAC: 197,931
United Kingdom
Message 1984449 - Posted: 10 Mar 2019, 21:25:49 UTC - in response to Message 1984435.  

I just started getting http internal server errors. Is it just me?

Tom
You really ought to look at what you send to the server to see what makes it fall over. It's the contents of the file

sched_request_setiathome.berkeley.edu.xml

which remains unaltered in your top-level BOINC data directory until overwritten by the next request five minutes later.

Other people - notably Keith - also complain about http 500 Internal Server Errors. Could something indigestible be being generated by that non-standard BOINC version?
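
A minimal sketch of the check suggested above, for anyone who wants to look at the request file before the next request overwrites it. The file name comes from this post; the data directory location, and the idea that a strict XML parse and a per-tag count are a useful first look, are assumptions. BOINC may read the file more leniently than a strict parser does, so a parse failure here is only a hint, not proof of what upset the server.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path

# File name taken from the post; the data directory varies per install,
# so looking in the current directory is an assumption.
REQUEST_FILE = "sched_request_setiathome.berkeley.edu.xml"


def summarize_request(xml_text):
    """Return size, strict well-formedness and per-tag element counts."""
    summary = {
        "size_bytes": len(xml_text.encode("utf-8")),
        "well_formed": True,
        "error": None,
        "tag_counts": Counter(),
    }
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Strict parsing failed; keep the parser's message for inspection.
        summary["well_formed"] = False
        summary["error"] = str(exc)
        return summary
    summary["tag_counts"] = Counter(elem.tag for elem in root.iter())
    return summary


if __name__ == "__main__":
    path = Path(REQUEST_FILE)
    if path.exists():
        text = path.read_text(encoding="utf-8", errors="replace")
        info = summarize_request(text)
        print(info["size_bytes"], "bytes, well formed:", info["well_formed"])
        for tag, count in info["tag_counts"].most_common(10):
            print(f"{count:6d}  <{tag}>")
    else:
        print("No request file found in", Path.cwd())
```

Run it from the BOINC data directory; an unusually large file or a parse error would be the first things to compare against a host that does not get the 500 errors.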
ID: 1984449
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9354
Credit: 839,256,037
RAC: 1,989,028
United States
Message 1984475 - Posted: 11 Mar 2019, 0:30:19 UTC - in response to Message 1984449.  

I only get those on congested Tuesdays as can be expected.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1984475
TBar
Volunteer tester

Joined: 22 May 99
Posts: 4762
Credit: 517,689,697
RAC: 1,285,988
United States
Message 1984476 - Posted: 11 Mar 2019, 0:43:24 UTC - in response to Message 1984449.  

Could something indigestible be being generated by that non-standard BOINC version?
Just curious as to your definition of 'non-standard'. As the person who compiled 7.8.3, I can declare it is probably more 'standard' than any other version of BOINC for Ubuntu you will find. The only change from the client release is the omission of a failed Manager 'bug fix' that caused more problems than it allegedly fixed. The BOINC part is untouched from the release. So, without that bad Manager 'bug fix', it is more standard than any compile that includes it, since the alleged fix is itself a bug. I think I've had one person say they had a display problem on a non-Ubuntu system, and no complaints from those running Ubuntu. There is only one part of the Berkeley download page for Linux that is still current, and that would be the part that claims:
Linux x64
Tested on the current Ubuntu distribution; may work on others.

It's a shame Linux is the only platform for which BOINC will not provide an app. The 7.8.3 version would have been a welcome addition to that page. As it is, those Linux apps at Berkeley should probably be removed, as they haven't worked on any newer versions of Linux in years.

BTW, Keith hasn't used 7.8.3 in a very long time. At present he isn't using any Version of BOINC from the All-In-One package.
The only time I see 'http 500 Internal Server Errors' in Linux is when My other Non-Linux Machines are giving the same Error after an Outage.
ID: 1984476
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9354
Credit: 839,256,037
RAC: 1,989,028
United States
Message 1984479 - Posted: 11 Mar 2019, 1:25:40 UTC
Last modified: 11 Mar 2019, 1:27:48 UTC

Anybody who reads the forums regularly has probably developed the impression, as I have, that many members hold an anti-Linux bias. We even have one member whose username professes his bias. So it's pretty obvious.

And Richard, you yourself have told me you know nothing about Linux and that I was left to my own devices to compile the BOINC platform once I uncovered a bug that had been present for years. So is it any wonder, since we have no official Linux support for apps or recent BOINC releases, that we are left to our own devices to support ourselves?

And TBar's BOINC 7.8.3 fixed the jumping task-list sorting bug, which had been present for years and was never attended to by the BOINC code maintainers. Did he get any recognition of that fact? No. Just ignored as usual, with Linux being the red-headed stepchild.

I agree that the old 7.2.42 BOINC repository version should be removed. It causes more issues than it solves. If a new user needs to install a Linux version of BOINC, they should get it from their distro repository, as it will at least be current within a couple of releases of the current code branch. Or they can approach one of the current Linux users for help and support, as we can actually offer some practical support and knowledge, for which the Windows BOINC support mechanism is useless for now.

We certainly are not getting any of the direct support from normal BOINC resources that all Windows and Mac users regularly receive. So the Linux user community is left to support itself, as we traditionally have.

But then we see negative posts and comments alluding to our use of Linux. I just try to ignore the obvious bias and soldier on as best I can. But it is disheartening to regularly see the anti-Linux bias, since I want to assume all members who contribute their processing power would be welcomed with open arms and that it shouldn't matter what platform they run their computing devices on.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1984479
Profile Gone with the wind Crowdfunding Project Donor*Special Project $75 donor
Volunteer tester

Joined: 19 Nov 00
Posts: 41575
Credit: 41,972,921
RAC: 258
Message 1984514 - Posted: 11 Mar 2019, 7:41:34 UTC
Last modified: 11 Mar 2019, 7:46:32 UTC

I wouldn't say that there is an active anti-Linux bias here. If anything, it is recognised that a properly set up Linux box will outperform a Windows 7 or 10 rig many times over. I don't know the statistics here, but I would guess that Linux users are likely to be about 10% of all users, although I could be completely wrong. This is mainly because setting up and managing a Linux system requires a higher degree of technical knowledge than Windows does, and Linux is generally seen as a niche system.

People find it easier to run a Windows box because there is generally more support for them. BOINC developers necessarily have to prioritise their meagre resources on the majority of users. That is not an active bias against Linux, just being practical.

I know nothing at all about Linux and I haven't got the interest to want to learn, but good luck to those that do.
ID: 1984514
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9354
Credit: 839,256,037
RAC: 1,989,028
United States
Message 1984524 - Posted: 11 Mar 2019, 8:46:44 UTC

Last stats said Seti had 92K active users. There are nowhere near 10%, or 9K, Linux users. I would hazard a guess at maybe a couple hundred active Linux users. Definitely a minority.

But what I was commenting on are the digs buried in comments that the Linux users are "cheating the system", "causing undue pressures to the database" or "Linux users are not using the system as designed", vis-a-vis that they aren't using Windows as the majority of users are. That is the type of bias I read in the posts.

The Seti servers run on Scientific Linux, so there must be someone at Seti who knows Linux.

I don't have any animosity towards Windows. I started BOINC on Windows, but before BOINC, on Classic, I was an OS/2 Warp user, an even smaller minority. It never got any love either. I never had many issues with Windows 7, but the one machine I converted over to Windows 10 was a large mistake. Lots of issues, and finally I decided I had had enough of Windows and decided to revisit Linux. It just so happened that the Linux special app appeared at the same time and soon proved itself to be the most efficient at processing Seti MB tasks.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1984524
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 13035
Credit: 143,796,689
RAC: 197,931
United Kingdom
Message 1984526 - Posted: 11 Mar 2019, 8:57:47 UTC

Sorry - I seem to have touched a raw nerve here. I haven't got time to answer all those points before some of you leave us for a night's sleep, but I hope to have a full reply posted before you wake up again. Apologies if you felt I disrespected Linux users - that wasn't my intention at all - but it does go to the heart of how we analyse the causes of problems, and what we do with that information once we have it. Later.
ID: 1984526
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9832
Credit: 80,426,533
RAC: 284,791
United Kingdom
Message 1984528 - Posted: 11 Mar 2019, 9:18:25 UTC
Last modified: 11 Mar 2019, 9:26:53 UTC

I think the problem is twofold.

1) The Linux app is so fast that each machine with a modern GPU cannot last through a normal outage, let alone the sort of problems we had last week.
This has caused problem 2.

2) Some Linux special app users have recompiled BOINC so that it reports large numbers of GPUs, typically 64 or 48, in order to be able to stockpile work for the outage.

What this does is this.

If you look at just the top 20 machines, and assume each one has an average of 4 GPUs, after an outage or longer shutdown they would be trying to return 20x4x100=8,000 tasks

Of course they will then request a further 8,000 tasks. However, there are not 80 GPUs in the top 20 machines but 595, so they will be returning and asking for up to 595x100=59,500 tasks

The top twenty machines are asking for 7 times the amount of work you might reasonably expect.
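
The arithmetic above can be re-run as a short sketch. All figures are the ones quoted in this post (20 machines, an assumed 4 GPUs each, 100 tasks per GPU, 595 reported GPUs); the variable names are only illustrative.

```python
# Figures quoted in the post; names are illustrative only.
TASKS_PER_GPU = 100            # per-GPU figure used in the post
TOP_MACHINES = 20
ASSUMED_GPUS_PER_MACHINE = 4   # the "reasonable" average assumed in the post
REPORTED_GPUS = 595            # GPUs reported across the top 20 hosts

expected_tasks = TOP_MACHINES * ASSUMED_GPUS_PER_MACHINE * TASKS_PER_GPU
reported_tasks = REPORTED_GPUS * TASKS_PER_GPU
ratio = reported_tasks / expected_tasks

print(expected_tasks)      # 8000
print(reported_tasks)      # 59500
print(round(ratio, 2))     # 7.44
```

So "7 times" is a slight rounding down of roughly 7.4.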

Now to me that would appear to contribute to the slowness of the recovery after an outage, and I think it might just annoy or upset others.

Please note I have nothing against Linux; I even started a new Linux cruncher yesterday. However, I can see how it may make others question things.
ID: 1984528
Profile -= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1559
Credit: 806,395,181
RAC: 1,087,461
Sweden
Message 1984532 - Posted: 11 Mar 2019, 10:08:47 UTC - in response to Message 1984528.  
Last modified: 11 Mar 2019, 10:09:49 UTC


Now to me that would appear to contribute to the slowness of the recovery after an outage, and I think it might just annoy or upset others.

Please note I have nothing against Linux; I even started a new Linux cruncher yesterday. However, I can see how it may make others question things.



This is part of our agenda, which we have provided as extra information for those using "buffered versions".

".....Again to relieve the pressure on the servers to give everybody else what they need until it's time to start to fill up the buffers again.
Why? To use the advantage of the GPU-spoofed BOINC executable to actually benefit the servers instead of waging a DDoS war against all other users. Use the benefits of a larger cache to ease the pressure instead, as a polite favour."

To put it in another perspective, my hosts don't even try to download once the servers come back up. Approximately 12 hours later, give or take a few hours, the first host is allowed to make contact and download workunits; about an hour after that the next one gets its allowance, and so on.
We don't want to be part of the flood of users who want to get their share of WUs because their cache is empty or near empty.

Some of us have so many fast GPUs that they need to try to get something amongst the other 140K+ hosts or so, but for me there's no need to take part and elbow in to get work.
So this special version has allowed us to AID instead of hampering the system.
That's a huge difference, isn't it?
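
The staggered return described above (first host roughly 12 hours after the servers come back, each further host about an hour later) can be sketched as a small schedule calculation. The host names, the outage end time, and the function itself are hypothetical; this only illustrates the timing, not how any particular host is actually configured.

```python
from datetime import datetime, timedelta, timezone


def staggered_contact_times(outage_end, hosts, initial_delay_hours=12, step_hours=1):
    """Map each host to the time it may first contact the server again."""
    first = outage_end + timedelta(hours=initial_delay_hours)
    return {
        host: first + timedelta(hours=index * step_hours)
        for index, host in enumerate(hosts)
    }


# Hypothetical outage end and host names, for illustration only.
outage_end = datetime(2019, 3, 12, 20, 0, tzinfo=timezone.utc)
schedule = staggered_contact_times(outage_end, ["host-a", "host-b", "host-c"])
for host, when in schedule.items():
    print(host, when.isoformat())
```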

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
ID: 1984532
Profile Gone with the wind Crowdfunding Project Donor*Special Project $75 donor
Volunteer tester

Joined: 19 Nov 00
Posts: 41575
Credit: 41,972,921
RAC: 258
Message 1984533 - Posted: 11 Mar 2019, 10:18:17 UTC

I did say that 10% was a guess, it seems to be more like less than 0.25%.

Linux users are "cheating the system", "causing undue pressures to the database" or "Linux users are not using the system as designed"

To be fair, back in 1999 Seti@Home was not set up to cater for Linux users, because the system developers saw that the vast majority of users would be Windows based. But the admins constantly say that they need more computing power from the user base, which Linux produces. OK, there may be 200 users (.05%) running Linux, but the interesting stat would be what percentage of the total results are produced by Linux users. If it could be shown, say, that those 200 people produce 5% of the throughput, then I'm sure that more official support would be forthcoming. I remember back in the 90's we ran a mail server on OS/2, an IBM OS. It never took off as a desktop system. The truth is that we live in a Wintel world, and Nvidia cards perform better on Seti than ATI ones do.

I have two Win 10 boxes that came delivered with Win 10. I found I couldn't happily use them without installing classic-shell first, so they looked and behaved like Win 7. But I have to say that under the bonnet Win 10 seems faster and smoother, but harder to use, than Win 7. And updating Win 7 to Win 10 is a no-no. Clean installs only.

I would suggest that the average Setizen just wants a simple set & forget rig, and Wintel & Nvidia give that out of the box. But if you are of a mind to go tweak tweak fiddle fiddle to get the maximum output, then by all means beat your own way along the Linux path. But minority users have to expect minority support.
ID: 1984533
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9832
Credit: 80,426,533
RAC: 284,791
United Kingdom
Message 1984536 - Posted: 11 Mar 2019, 11:17:30 UTC
Last modified: 11 Mar 2019, 11:28:39 UTC

As suggested, starting a thread to separate this discussion from the "panic mode thread"

Thanks to Richard for suggesting it. I had the same thought while out for my morning walk!
ID: 1984536
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9832
Credit: 80,426,533
RAC: 284,791
United Kingdom
Message 1984538 - Posted: 11 Mar 2019, 11:35:48 UTC

".....Again to relieve the pressure on the servers to give everybody else what they need until it's time to start to fill up the buffers again.
Why? To use the advantage of the GPU-spoofed BOINC executable to actually benefit the servers instead of waging a DDoS war against all other users. Use the benefits of a larger cache to ease the pressure instead, as a polite favour."

To put it in another perspective, my hosts don't even try to download once the servers come back up. Approximately 12 hours later, give or take a few hours, the first host is allowed to make contact and download workunits; about an hour after that the next one gets its allowance, and so on.
We don't want to be part of the flood of users who want to get their share of WUs because their cache is empty or near empty.


So you are saying that after any outage, all the users with "spoofed" GPUs wait until the traffic dies down before reporting or asking for new work?

If so, then I applaud your community spirit, and it may help to let this fact be better known.
ID: 1984538
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 13035
Credit: 143,796,689
RAC: 197,931
United Kingdom
Message 1984540 - Posted: 11 Mar 2019, 11:49:39 UTC - in response to Message 1984538.  

I applaud your community spirit, and it may help to let this fact be better known.
Hear, hear. I agree on both counts.
ID: 1984540
Profile Gone with the wind Crowdfunding Project Donor*Special Project $75 donor
Volunteer tester

Joined: 19 Nov 00
Posts: 41575
Credit: 41,972,921
RAC: 258
Message 1984541 - Posted: 11 Mar 2019, 12:00:37 UTC

Hear hear from me too. The vast majority of Setizens run rigs that don't need hundreds of tasks downloaded at a time like the top performers do. It makes sense to satisfy the needs of the majority of users first, before letting the heavy gang chime in for work.
ID: 1984541
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 7925
Credit: 477,274,218
RAC: 518,708
Panama
Message 1984543 - Posted: 11 Mar 2019, 12:40:01 UTC
Last modified: 11 Mar 2019, 13:18:24 UTC

Adding more info to the thread, not gasoline; please understand that.

Only a few of us use these spoofed builds. All are heavy users with high-performance crunchers and, besides me (who knows almost nothing), the others are well trained in the "BOINC mysteries". We all care about the SETI project and the integrity of the DB, and try to do all we can to preserve it. That is one of the main reasons why we keep these builds in a closed circle.

The claims about them make little sense. We all used rescheduling before these builds; the only thing that changes is that we no longer need the rescheduler. Nothing changes about the number of WUs stored in our computers' caches, and the builds give no crunching speed or performance gain.

I know fear of the unknown is natural, but this is not something to be afraid of. We only changed the program to avoid the use of the rescheduler, not the way BOINC works, so I'm sure it has nothing to do with the HTTP error posted in this thread or with last week's DB crash, as was suggested in another thread.

Adding to what Vyper posted, from our team's closed forum:
------------------------------------------------------------
Also pulling 6000 extra tasks from the server right after maintenance/outage is just stupid management in my eyes. Be kind to others, and the servers!
I see Vyper shows a scheduled network disable during that time to prevent that -- Good going there!

Sorry. My mistake. I incorrectly imagined that anyone who uses these builds is an advanced user who knows about bunkering and how to control requests for new work after outages, as Vyper explained. The main idea of these builds is exactly to help get through the outages without pain, for us and the servers.
In my case the host automatically shuts down requests for new work an hour before the outages and only returns to asking for new work after 12 hrs: 7 AM to 7 PM in my time zone (UTC-5, similar to EST in the US, just without DST).

   <day_prefs>
      <day_of_week>2</day_of_week>
      <net_start_hour>19.00</net_start_hour>
      <net_end_hour>7.00</net_end_hour>
   </day_prefs>


<edit> If the outage is unscheduled, I manage the start/stop of requests for new work manually. But the large cache can keep my host crunching for about 1.5 days even during these unexpected outages.
------------------------------------------------------------
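
A minimal sketch of how the day_prefs fragment quoted above could be read, assuming the network window wraps past midnight when the start hour is later than the end hour. That wrap-around rule, the treatment of the boundary hours, and the helper names are assumptions; the thread does not state how the client evaluates the window or which weekday the value 2 maps to.

```python
import xml.etree.ElementTree as ET

# The fragment exactly as quoted in the post.
DAY_PREFS = """
<day_prefs>
   <day_of_week>2</day_of_week>
   <net_start_hour>19.00</net_start_hour>
   <net_end_hour>7.00</net_end_hour>
</day_prefs>
"""


def load_window(xml_text):
    """Return (day_of_week, start_hour, end_hour) from a day_prefs fragment."""
    node = ET.fromstring(xml_text)
    return (
        int(node.findtext("day_of_week")),
        float(node.findtext("net_start_hour")),
        float(node.findtext("net_end_hour")),
    )


def network_allowed(hour, start, end):
    """Assumed rule: allowed in [start, end), wrapping past midnight if start > end."""
    if start == end:
        return True  # assumption: equal bounds mean no restriction
    if start < end:
        return start <= hour < end
    return hour >= start or hour < end


day, start, end = load_window(DAY_PREFS)
for hour in (6.0, 7.0, 12.0, 18.5, 19.0, 23.0):
    state = "network on" if network_allowed(hour, start, end) else "network off"
    print(f"{hour:5.2f} -> {state}")
```

Under that assumption the network is off from 07:00 to 19:00 on the configured day, which matches the "7 AM to 7 PM" window described in the post.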

So you can see we are trying to do our best to avoid pushing large amounts of WUs when outages happen.

If anyone wishes to blame us for anything, feel free to post, but please not out of fear of the unknown. We just found a different path to follow instead of rescheduling.

I know a lot of people use the reschedulers (there are a lot of them that work fine), and nobody says anything about it, yet the impact on the WU cache is exactly the same.
ID: 1984543
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9832
Credit: 80,426,533
RAC: 284,791
United Kingdom
Message 1984545 - Posted: 11 Mar 2019, 13:45:41 UTC

Only a few of us use these spoofed builds. All are heavy users with high-performance crunchers and, besides me (who knows almost nothing), the others are well trained in the "BOINC mysteries". We all care about the SETI project and the integrity of the DB, and try to do all we can to preserve it. That is one of the main reasons why we keep these builds in a closed circle.


Thank you for explaining.

This is the very first I, and I expect others, have heard about the "shutdown" during and after outages. This is to be commended, but I think that when anyone who cares to look can see these 64-GPU machines in the top 20, it might have been a good "PR" exercise to let people know that you weren't "swamping the server".


I know a lot of people use the reschedulers (there are a lot of them that work fine), and nobody says anything about it, yet the impact on the WU cache is exactly the same.


Yes, I should probably have mentioned them in my earlier post; however, they are not as easy to see. After what you have said, I hope they are doing the same thing, displaying the same community spirit and not swamping the servers with thousands of tasks after an outage.
ID: 1984545
Profile -= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1559
Credit: 806,395,181
RAC: 1,087,461
Sweden
Message 1984548 - Posted: 11 Mar 2019, 14:11:24 UTC - in response to Message 1984545.  

I can't say that "everyone" in the team does it 100% of the time. In the guide on how to use the BOINC executable, I've written that it's a recommendation to do so: when we all already have tasks to crunch, why should our hosts take part and "steal" bandwidth from 140,000 other hosts trying to report/download stuff over a crowded connection?
If a host does take part in that, we only add a whole bunch of TCP timeouts to no avail anyway, because SETI is totally flooded with work and report requests. Our guide is even set up to report only 100 results at a time, just to prevent these timeouts, because they only cause unnecessary requests.
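
The idea of reporting at most 100 results per request can be illustrated with a generic batching sketch. This is not how BOINC or the team's guide implements the cap (the actual setting is not shown in this thread); the function and the task names are hypothetical and only show the effect of splitting a large backlog into capped batches.

```python
def batches(items, size=100):
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical backlog of 250 finished tasks waiting to be reported.
finished = [f"task_{n}" for n in range(250)]
for number, batch in enumerate(batches(finished), start=1):
    print(f"request {number}: reporting {len(batch)} results")
```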

https://setistats.haveland.com/sah_v8_creation.html If we look at this chart (weekly), we can see that after an extended outage it took almost two whole days for the systems to get back to where they were before. And that is for a real downtime of only about 12 to 16 hours.

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
ID: 1984548
©2019 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.