High performance Linux clients at SETI

-= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1652
Credit: 1,065,191,981
RAC: 2,537
Sweden
Message 1984808 - Posted: 12 Mar 2019, 21:04:59 UTC - in response to Message 1984804.  

I use BOINC Manager 7.9.3 and got that bug too!

The BOINC executable is 7.15.0.

Installed with: apt-get install boinc-client from the Ubuntu / Debian repository, not downloaded elsewhere.

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
ID: 1984808 · Report as offensive
Oddbjornik (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1984817 - Posted: 12 Mar 2019, 21:20:57 UTC

Now I'm up and running with a boincmgr without the offending (not-)fix. No list-jumps in the first five minutes :-)

As this bug is somewhat sporadic, it may take a while before I'm certain that it's gone.

Let's see what TBar says.
ID: 1984817 · Report as offensive
Oddbjornik (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1984824 - Posted: 12 Mar 2019, 21:49:59 UTC
Last modified: 12 Mar 2019, 21:55:19 UTC

While I'm at it:

The reason I decided to build my own boincmgr in the first place was that I noticed that, when the task list tab was selected, the process would use a lot of CPU even when the window was minimized to the task bar. The problem was not very visible on a client with 400 tasks, but on my main cruncher, with 1600 tasks (admittedly not completely rightfully obtained), the boincmgr process would use 20% CPU when minimized.

And I found out why: Lines 1052 to 1056 in the current version of MainDocument.cpp contain the following:
    // Don't do periodic RPC calls when hidden / minimized
    if (!pFrame->IsShown()) return;
#ifdef __WXMAC__
    if (!wxGetApp().IsApplicationVisible()) return;
#endif

But when the window is minimized, it is apparently both Shown and Visible. So I added a call to IsIconized:
    // Don't do periodic RPC calls when hidden / minimized
    if (!pFrame->IsShown() || pFrame->IsIconized()) return;
#ifdef __WXMAC__
    if (!wxGetApp().IsApplicationVisible()) return;
#endif

And with that change, CPU usage practically disappears once the window is minimized.

[edit: not sure about this line at all] The new call probably belongs inside the #ifdef block, but I haven't tested this under Windows, so I'm not sure.
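
For anyone who wants to see the difference between the two guards in isolation, here is a minimal stand-alone sketch. It does not use wxWidgets: FrameState, shouldSkipRpcOriginal and shouldSkipRpcPatched are made-up names for illustration, and the two booleans only stand in for what IsShown() and IsIconized() would report. It assumes, as described above, that a minimized window still counts as shown.

```cpp
// Hypothetical stand-in for the frame queries used in the snippets above.
// In the real Manager, pFrame is a wxWidgets frame; this struct is not.
struct FrameState {
    bool shown;     // what IsShown() would report
    bool iconized;  // what IsIconized() would report
};

// Original guard: periodic RPCs are skipped only when the frame is not shown.
bool shouldSkipRpcOriginal(const FrameState& f) {
    return !f.shown;
}

// Patched guard: RPCs are also skipped while the frame is minimized.
bool shouldSkipRpcPatched(const FrameState& f) {
    return !f.shown || f.iconized;
}
```

With a frame that is shown but iconized, the original guard lets the periodic RPCs continue, while the patched guard skips them. That matches the observed drop in CPU usage, though it is only a model of the condition, not of the Manager itself.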

Could anyone in here forward this modification as a suggestion to the right people? Richard?
ID: 1984824 · Report as offensive
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1984825 - Posted: 12 Mar 2019, 21:51:06 UTC - in response to Message 1984804.  

Well, it was pointed out back here, Posted: 23 Sep 2017, 13:32:50 UTC
The commit you linked in that post is headed

Work around an apparent bug in wxWidgets 3.0 on Linux
Can you confirm that removing that (I have to assume well-intentioned) new line "m_pListPane->EnsureVisible(iDocCount - 1);" was the only change you made to eliminate the jumping bug?

Christian Beer has just started work on making the necessary compatibility changes to enable us to develop with wxWidgets 3.1 and thus (I sincerely hope) remove the apparent bug we were apparently working round. I'll ask him tomorrow to include your report in his testing.

I did actually pass your previous report upstream as #2147, so there's a convenient reference to use, but as you say nobody appears to have actioned it yet.
Hmmm, this sounds similar to a question the FBI would ask when trying to accuse you of lying to the FBI. I cannot recall... Actually, the disk with that system on it died long ago. However, I vaguely remember something about using that section from 7.4.42, which doesn't have the Bug, and changing 7.4.44 & 7.8.3 to match that section of 7.4.42. Anyway, it seems to have worked, and I love my 7.4.44. But, it is getting old...
ID: 1984825 · Report as offensive
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1984828 - Posted: 12 Mar 2019, 22:06:43 UTC - in response to Message 1984704.  

I try and limit my impact to the servers by just reporting a modest amount of finished tasks at each connection.


Thank you Keith, all of the info I am seeing suggests that you and your fellow "extended GPU" users are aware of the impact you could have and are mitigating the consequences.

I was unaware of this and it is nice to know, as seeing the total of 595 apparent GPU's in the top 20 machines caused me some concern.

Yes, I only have 17 physical gpus in my five hosts. The other top hosts that show a spoofed gpu client have the same kind of low physical count. There are some members with actual counts of up to about 12, which I think is the most we've seen, on a repurposed crypto mining motherboard. The members who have more than 3 or 4 gpus in a single host can be counted on one hand. It is very difficult to get more than 4 gpus to cooperate together on typical motherboard hardware. I applaud their efforts because it shows great skills and perseverance.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1984828 · Report as offensive
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1984844 - Posted: 12 Mar 2019, 23:17:46 UTC - in response to Message 1984545.  
Last modified: 12 Mar 2019, 23:38:46 UTC

This is the very first I, and I expect others have heard about the "shutdown" during and after outages, this is to be commended, but I think that when anyone who cares to look can see these 64 GPU machines in the top 20, it might have been a good "PR" exercise to let people know that you weren't "swamping the server"

A little more explanation about how the "shutdown" works, for example in today's outage:

SETI returned from the outage a few hours ago, and my host still has not asked for or reported a single WU (I have about 2K WUs ready to upload and report). It will not do so for the next 4 hours, and when it does, it will report only 100 WUs every 5 minutes.

Now multiply that by 10 or more hungry hosts that do the same, some with a lot more WUs due to their top GPUs.

This is why we said that, with this approach, we are actually helping both ourselves and the servers get through the outages with a little less pain.
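
As a rough illustration of what that throttling means in numbers, here is a small sketch of the arithmetic. The function names are made up for this example, and "about 2K" is taken as exactly 2000 tasks, which is an assumption.

```cpp
// Number of scheduler contacts needed to report `ready` tasks
// when at most `perReport` tasks go out per contact (perReport > 0).
int reportsNeeded(int ready, int perReport) {
    return (ready + perReport - 1) / perReport;  // ceiling division
}

// Minutes between the first and the last report when contacts
// are spaced `intervalMin` minutes apart.
int minutesToDrain(int ready, int perReport, int intervalMin) {
    int n = reportsNeeded(ready, perReport);
    return n > 0 ? (n - 1) * intervalMin : 0;
}
```

For 2000 tasks at 100 per contact, reportsNeeded gives 20 contacts, and minutesToDrain with a 5-minute interval gives 95 minutes from the first report to the last, instead of one large burst right after the outage.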
ID: 1984844 · Report as offensive
-= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1652
Credit: 1,065,191,981
RAC: 2,537
Sweden
Message 1984847 - Posted: 12 Mar 2019, 23:30:05 UTC
Last modified: 12 Mar 2019, 23:30:33 UTC

Yes, same for me. Seti was back online very early, but all times are due to be adjusted in the future. My top hosts are now "disabled", as you'll see by looking here: https://setiathome.berkeley.edu/hosts_user.php?userid=1635. They're not allowed to "talk" to the network. That setting unfortunately affects all BOINC projects, but I don't care because I only do S@h.

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
ID: 1984847 · Report as offensive
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1984849 - Posted: 12 Mar 2019, 23:34:45 UTC - in response to Message 1984847.  
Last modified: 12 Mar 2019, 23:36:36 UTC

Yes, same for me. Seti was back online very early, but all times are due to be adjusted in the future. My top hosts are now "disabled", as you'll see by looking here: https://setiathome.berkeley.edu/hosts_user.php?userid=1635. They're not allowed to "talk" to the network. That setting unfortunately affects all BOINC projects, but I don't care because I only do S@h.

Yes, this is why I changed the setting today to:

<day_prefs>
    <day_of_week>2</day_of_week>
    <net_start_hour>22.00</net_start_hour>
    <net_end_hour>8.00</net_end_hour>
</day_prefs>

This stops a little later, closer to the outage start time, and wakes 3 hours later, for a 14-hour shutdown period instead of 12.
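
To check the 14-hour figure, here is a small sketch of the window arithmetic. It assumes that a window whose start hour is later than its end hour wraps around midnight, and that equal start and end hours mean no restriction. Both are assumptions about how the client reads these values, and the function names are invented for this example.

```cpp
// True if network activity is allowed at `hour` (0 <= hour < 24).
// Assumption: start > end means the allowed window wraps past midnight.
bool netAllowed(double hour, double start, double end) {
    if (start == end) return true;                        // assumed: no restriction
    if (start < end) return hour >= start && hour < end;  // same-day window
    return hour >= start || hour < end;                   // wraps past midnight
}

// Hours per day during which the network is blocked.
double blockedHours(double start, double end) {
    if (start == end) return 0.0;
    double allowed = (start < end) ? (end - start) : (24.0 - start + end);
    return 24.0 - allowed;
}
```

With start 22.00 and end 8.00, the allowed window is 10 hours (22:00 to 08:00), so blockedHours returns 14, and netAllowed is false at, say, 12:00 on that day.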
ID: 1984849 · Report as offensive
-= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1652
Credit: 1,065,191,981
RAC: 2,537
Sweden
Message 1984850 - Posted: 12 Mar 2019, 23:36:55 UTC - in response to Message 1984849.  

My first one starts to send in about 12 hours from now.

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
ID: 1984850 · Report as offensive
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1984851 - Posted: 12 Mar 2019, 23:45:32 UTC - in response to Message 1984825.  

Hmmm, this sounds similar to a question the FBI would ask when trying to accuse you of lying to the FBI. I cannot recall... Actually, the disk with that system on it died long ago. However, I vaguely remember something about using that section from 7.4.42, which doesn't have the Bug, and changing 7.4.44 & 7.8.3 to match that section of 7.4.42. Anyway, it seems to have worked, and I love my 7.4.44. But, it is getting old...
Well, I assure you that I'm not connected with the FBI - nor with its British equivalent (in this context) Special Branch.

But this is the way that bugs get fixed. It is incredibly helpful to have clear, unemotional, reports of

  • what you were doing at the time
  • what you saw on the screen
  • what you did to work round it


I once wrote a program (for commercial release) that had a bug. People kept ringing up about it, every few weeks. The same bug, every time. Neither I nor the office staff could make it happen. We were stumped.

After about 18 months, a guy rang up. He'd hit the bug. I spoke with him, and the conversation went something like...

... I was entering the data
... and I'd reached ...
... then the phone rang.
And I forgot what I'd been doing.
So I went back to ...
... and then it crashed.

That was enough. Nobody else had said "So I went back to ...". With that clue, I ran the program, reproduced the problem in a couple of minutes, and fixed it in another five minutes or so.

That was over 20 years ago, and it's stuck in my mind ever since (partly because I feel embarrassed about having written the bug in the first place).

We need those clear, detailed, reports to find what needs fixing.

I can pass it on. And I will, in the morning. But please give me the information I need, or it will never be fixed.

ID: 1984851 · Report as offensive
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1984867 - Posted: 13 Mar 2019, 1:05:44 UTC - in response to Message 1984851.  

Perhaps if you go back to where I was compiling numerous versions of BOINC and running each one until I found where the BUG started?
You can look around here, and below, https://setiathome.berkeley.edu/forum_thread.php?id=81916&postid=1891240#1891240

I probably compiled and tested a couple dozen versions of BOINC before narrowing it down to 7.4.43. From there it was a simple case of testing what was new in 7.4.43 from 7.4.42. That didn't take long, after tracking it down.
A couple of victims have already posted about it in this thread;
I ask because I've just compiled the 7.14.2 manager straight from the source repository, and today, after the outage, each time I get new tasks, the list jumps to the end (as if to show off the new tasks)!
I use BOINC Manager 7.9.3 and got that bug too!
Everyone running Linux is a victim here, they will all say the same thing, "each time I get new tasks, the list jumps to the end"
All you have to do is run the BOINC Manager in the Tasks Tab, scrolled to the Top. Soon it will start, then just about every 5 minutes, the page will be jumping to the bottom.
Unless, you are running one of My versions of BOINC, My BOINCs don't do that.
ID: 1984867 · Report as offensive
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1984870 - Posted: 13 Mar 2019, 1:47:10 UTC - in response to Message 1984867.  

If the branches you wanted to compare were newer, then you could use the compare function of GitHub at the commit level. But I can't figure out how to do a branch commit compare between 7.4.42 and 7.4.43, since the picklist only gives you the major 7.4 branch to choose from and not the sub-branches.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1984870 · Report as offensive
Oddbjornik (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1984901 - Posted: 13 Mar 2019, 5:29:47 UTC - in response to Message 1984867.  

Everyone running Linux is a victim here, they will all say the same thing, "each time I get new tasks, the list jumps to the end"
All you have to do is run the BOINC Manager in the Tasks Tab, scrolled to the Top. Soon it will start, then just about every 5 minutes, the page will be jumping to the bottom.
Unless, you are running one of My versions of BOINC, My BOINCs don't do that.
Also, the offending call is inside some logic where GetDocCount is compared to GetCacheCount. I think this code means that the bug only shows itself when the number of tasks in the list changes, specifically when it is reduced, as when, after the outage, I report 50 tasks and typically only get 7 new tasks back.
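
A toy model of that reading, for illustration only: this is not the Manager's code, and ListModel and synchronize are invented names. It assumes the scroll call sits inside a "document count differs from cache count" branch, and it triggers on any change in count, since the exact condition is not shown in this thread.

```cpp
// Hypothetical model of a list view that caches the number of rows.
struct ListModel {
    int cacheCount;   // rows currently held by the view
    int scrollCalls;  // how often the view was forced to scroll
    int lastVisible;  // row index last forced into view, -1 if none
};

// Assumed shape of the logic: only when the counts differ is the cache
// resized, and only then is the last row forced into view.
void synchronize(ListModel& v, int docCount) {
    if (docCount != v.cacheCount) {
        v.cacheCount = docCount;
        if (docCount > 0) {
            v.lastVisible = docCount - 1;  // stands in for EnsureVisible(iDocCount - 1)
            ++v.scrollCalls;
        }
    }
}
```

Starting from 100 cached rows, a refresh that still has 100 tasks leaves the view alone. Reporting 50 and receiving 7 gives 57 tasks, so the counts differ and the model jumps to row 56, the end of the list.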
ID: 1984901 · Report as offensive
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1984917 - Posted: 13 Mar 2019, 11:03:08 UTC

OK, I hear what you say. I've updated #2147 to describe the problem in more formal terms (please check I've got that right, because I can't see it myself), and I've asked Christian Beer to investigate while he's working on #3050. He was online and active at the same time I was, so hopefully he will see the report quickly.
ID: 1984917 · Report as offensive
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1984940 - Posted: 13 Mar 2019, 14:59:20 UTC - in response to Message 1984828.  


The members who have more than 3 or 4 gpus in a single host can be counted on one hand. It is very difficult to get more than 4 gpus to cooperate together on typical motherboard hardware. I applaud their efforts because it shows great skills and perseverance.


I don't feel like I have great skill. :) I won't deny the "keep trying" though :)

Tom
A proud member of the OFA (Old Farts Association).
ID: 1984940 · Report as offensive
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1984956 - Posted: 13 Mar 2019, 16:26:47 UTC - in response to Message 1984917.  
Last modified: 13 Mar 2019, 16:55:06 UTC

OK, I hear what you say. I've updated #2147 to describe the problem in more formal terms (please check I've got that right, because I can't see it myself), and I've asked Christian Beer to investigate while he's working on #3050. He was online and active at the same time I was, so hopefully he will see the report quickly.

It's better than it was. However, My choice of Title would have been different. "bug/workround no longer needed?" doesn't imply it's actually causing display problems to the extent it is. Also, to the best of My knowledge, this Bug affects All Linux users, especially those returning completed tasks every 5 minutes. The title would imply there isn't any problem, just that something isn't needed anymore. Quite misleading in My Opinion. For someone who keeps the Manager open to Tasks, trying to display the Completed/Active Tasks, this Bug is a Showstopper. I certainly won't use a version of BOINC with this Bug, which is why I built a version without this Bug.

Try running the Manager with this view, it won't happen with any of the BOINC versions released in the past FOUR years. Every 5 minutes the page will jump to the bottom displaying only recently downloaded tasks.

ID: 1984956 · Report as offensive
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1985294 - Posted: 15 Mar 2019, 12:35:57 UTC

Christian Beer has tested and reported back:

I've tested the described behavior with a local build using wxWidgets 3.1. The workaround is still required to handle the initial problem, but it seems the side-effects described by Richard are gone now. So for the next Client Release we should update wxWidgets to also be in sync with the Mac Client.
So, assuming my description was accurate and complete, there is the prospect of a resolution on the horizon.

Is anyone currently experiencing the problem in a position to build and test a Manager using the development code from the referenced pull request, and confirm that the result is as you would like it?

Note that this solution is due to a change in wxWidgets code, not BOINC code.
ID: 1985294 · Report as offensive
Oddbjornik (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1985311 - Posted: 15 Mar 2019, 15:22:22 UTC - in response to Message 1985294.  

Christian Beer has tested and reported back:

I've tested the described behavior with a local build using wxWidgets 3.1. The workaround is still required to handle the initial problem, but it seems the side-effects described by Richard are gone now. So for the next Client Release we should update wxWidgets to also be in sync with the Mac Client.
So, assuming my description was accurate and complete, there is the prospect of a resolution on the horizon.

Is anyone currently experiencing the problem in a position to build and test a Manager using the development code from the referenced pull request, and confirm that the result is as you would like it?

Note that this solution is due to a change in wxWidgets code, not BOINC code.

I suppose I can do a build this weekend and see how it behaves.

But I could use a slightly less convoluted description, just so I'll know that I do exactly what you want me to do and not something else.

Regular version 3.1.2 of wxWidgets?
Is there a more precise definition of "development code from the referenced pull request"?
ID: 1985311 · Report as offensive
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1985315 - Posted: 15 Mar 2019, 15:40:36 UTC - in response to Message 1985311.  

Is there a more precise definition of "development code from the referenced pull request"?
I would start with a clean clone from master.

Pull Request #3050 comprises commits:

Manager: remove deprecated wxWidgets flags
bbceb19b967f77132a78da8c2e94ca7c1df6cf0e

Build: update wxWidgets macro from wxWidgets 3.1.2
fbd15aff54030ffacc629d9139f1ec8bca76fa0f

Build: prepare m4 macros for wxWidgets 3.1
af1e1cb3607e4e2d46ed99489f74f9b261167a1f

You probably need to visit the individual pages to read the more detailed notes on the changes.
ID: 1985315 · Report as offensive
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1985316 - Posted: 15 Mar 2019, 15:43:23 UTC
Last modified: 15 Mar 2019, 15:45:38 UTC

Well, if you compile 7.14.2 with wxWidgets 3.0.3 all goes well. Just as with BOINC 7.8.3. Trying it with wxWidgets 3.1.0, which is what My Mac is using, ends with an Error;

/home/tbar/wxWidgets-3.1.0/include/wx/vector.h:44:23: note: previous declaration of ‘void wxQsort(void*, size_t, size_t, wxSortCallback, const void*)’
 WXDLLIMPEXP_BASE void wxQsort(void* pbase, size_t total_elems,
                       ^
In file included from BOINCListCtrl.h:59:0,
                 from ViewProjects.cpp:29:
ViewProjects.cpp: In constructor ‘CViewProjects::CViewProjects(wxNotebook*)’:
BOINCBaseView.h:26:58: error: ‘wxADJUST_MINSIZE’ was not declared in this scope
 #define DEFAULT_TASK_FLAGS             wxTAB_TRAVERSAL | wxADJUST_MINSIZE | wxFULL_REPAIN
                                                          ^
ViewProjects.cpp:186:53: note: in expansion of macro ‘DEFAULT_TASK_FLAGS’
     CBOINCBaseView(pNotebook, ID_TASK_PROJECTSVIEW, DEFAULT_TASK_FLAGS, ID_LIST_PROJECTSV
                                                     ^
Makefile:1685: recipe for target 'boincmgr-ViewProjects.o' failed
make[2]: *** [boincmgr-ViewProjects.o] Error 1
make[2]: Leaving directory '/home/tbar/boinc/clientgui'
So, that's going to need to be fixed to use 3.1.0.
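
One possible local workaround, sketched under assumptions: if wxADJUST_MINSIZE is simply no longer declared by the newer headers, a compatibility shim can define it as 0 so the existing flag expression still compiles. My understanding is that the flag had already been a deprecated no-op for some time, but treat that as an assumption and check your wxWidgets headers. The sketch below does not include wxWidgets. The value given to wxTAB_TRAVERSAL is a placeholder, not the real constant, and EXAMPLE_TASK_FLAGS is a made-up macro, not the DEFAULT_TASK_FLAGS definition from BOINCBaseView.h.

```cpp
// Placeholder so the sketch compiles without wxWidgets headers.
// This number is NOT the real wxWidgets value.
#define wxTAB_TRAVERSAL 0x0008

// Compatibility shim: if the toolkit no longer declares the flag,
// define it as a no-op so older flag expressions still compile.
#ifndef wxADJUST_MINSIZE
#define wxADJUST_MINSIZE 0
#endif

// Hypothetical flag expression in the style of the failing macro.
#define EXAMPLE_TASK_FLAGS (wxTAB_TRAVERSAL | wxADJUST_MINSIZE)
```

With the shim in place, EXAMPLE_TASK_FLAGS evaluates to the same value as wxTAB_TRAVERSAL alone. The cleaner route is presumably the one suggested by the commit title "Manager: remove deprecated wxWidgets flags" listed earlier in the thread, which would drop the flag from the source rather than shim it.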
ID: 1985316 · Report as offensive

 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.