setiathome and mfold running at the same time ?!

Goner
Joined: 4 Jun 99
Posts: 10
Credit: 34,310
RAC: 0
Netherlands
Message 95551 - Posted: 6 Apr 2005, 7:06:19 UTC

hi,

Using an optimized BOINC 4.25 (http://boinc.us.tt/), the GUI says that Proteinpredictor is paused and Seti@Home is running,
but with 'top' (I'm running Linux) I see this:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18659 rob 20 19 43120 42m 600 R 47.5 8.4 0:59.91 mfoldB125_4.27_
18664 rob 19 19 15108 14m 1252 R 47.5 2.9 0:40.58 setiathome_4.02

So both clients are running and sharing the CPU!? Is this normal?
When I stop the GUI and the client and start them again, only one client runs...


Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 95586 - Posted: 6 Apr 2005, 11:38:54 UTC - in response to Message 95551.  

> hi,
>
> using an optimized BOINC 4.25 (http://boinc.us.tt/) ;
> the GUI says that Proteinpredictor is paused and Seti@Home is running, but
> with 'top' (running Linux) i see this :
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 18659 rob 20 19 43120 42m 600 R 47.5 8.4 0:59.91 mfoldB125_4.27_
> 18664 rob 19 19 15108 14m 1252 R 47.5 2.9 0:40.58 setiathome_4.02
>
> so both clients are running and sharing cpu !? is this normal ?
> when i stop the gui and client and start them again, only 1 client runs ...

There is a problem with Unix-type machines where BOINC sometimes keeps running more science-application processes than there are CPUs. For example, on my dual-processor Mac I have seen as many as 4 BOINC science applications running. More commonly it is 3, and most of the time it "fixes" itself after a short time.
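A minimal sketch of the check described here: count how many BOINC science applications `top` reports in the running (`R`) state and compare that with the number of CPUs. The column layout and the program-name prefixes (`mfold`, `setiathome`) are taken from the `top` output quoted at the start of the thread; treating every process with those prefixes as a science app is an assumption, as is `ncpus=1` in the example (the two 47.5% figures suggest, but do not prove, a single CPU).

```python
import os

# Column layout of the `top` process lines quoted in this thread:
# PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
STATE_COL = 7
COMMAND_COL = 11

# Hypothetical list of science-app name prefixes, taken from the output above.
SCIENCE_APP_PREFIXES = ("mfold", "setiathome")


def running_science_apps(top_lines):
    """Return the COMMAND names of science apps that top reports as running (state R)."""
    running = []
    for line in top_lines:
        fields = line.split()
        if len(fields) <= COMMAND_COL or not fields[0].isdigit():
            continue  # skip the header row and anything that is not a process line
        command = fields[COMMAND_COL]
        if command.startswith(SCIENCE_APP_PREFIXES) and fields[STATE_COL] == "R":
            running.append(command)
    return running


def oversubscribed(top_lines, ncpus=None):
    """True if more science apps are running than there are CPUs."""
    if ncpus is None:
        ncpus = os.cpu_count() or 1
    return len(running_science_apps(top_lines)) > ncpus


sample = [
    "PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND",
    "18659 rob 20 19 43120 42m 600 R 47.5 8.4 0:59.91 mfoldB125_4.27_",
    "18664 rob 19 19 15108 14m 1252 R 47.5 2.9 0:40.58 setiathome_4.02",
]
print(running_science_apps(sample))
print(oversubscribed(sample, ncpus=1))
```

The column indices only match the layout quoted in this thread; a `top` configured with different fields would need them adjusted.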

Pooh Bear 27
Volunteer tester
Joined: 14 Jul 03
Posts: 3224
Credit: 4,603,826
RAC: 0
United States
Message 95607 - Posted: 6 Apr 2005, 13:45:03 UTC

I currently do not run a *nix, but on the Wind machines (we all know that wind blows)... the projects stay loaded in memory, even when they are not running. Could that be the case here? Or is it really crunching multiple units simultaneously?

Just an observation, and maybe other *nix gurus could better explain this.

Some day I will get my *nix machines back and running.



My movie https://vimeo.com/manage/videos/502242
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 95656 - Posted: 6 Apr 2005, 16:07:22 UTC - in response to Message 95607.  
Last modified: 6 Apr 2005, 16:08:07 UTC

> I currently do not run a *nix, but on the Wind machines (we all know that wind
> blows)... The projects stay running in memory, even when not running. Could
> the be the same case? Or is it really crunching multiple units
> simutaneously?

No, what I use is the "Activity Monitor", which lets you see the running processes. If you look at this you will see a "normal" picture: 2 Predictor@Home models actually running, and 1 Seti@Home and 2 Einstein@Home models suspended in memory.

You can tell that by the fact that the CPU is allocated to the two, while the other 5 have 0 CPU.

If you look at this, you will see a case where I had 3 models running at the same time. This is a reported bug that the bug database lists as "Unconfirmed"; why it is unconfirmed when you have examples like this is beyond me...
N/A
Volunteer tester
Joined: 18 May 01
Posts: 3718
Credit: 93,649
RAC: 0
Message 95675 - Posted: 6 Apr 2005, 18:08:02 UTC - in response to Message 95656.  

Or you could run top in the Terminal.
FalconFly
Joined: 5 Oct 99
Posts: 394
Credit: 18,053,892
RAC: 0
Germany
Message 95687 - Posted: 6 Apr 2005, 19:13:29 UTC - in response to Message 95675.  
Last modified: 6 Apr 2005, 19:14:52 UTC

I'm also seeing that all the time on my Linux machines (optimized BOINC V4.19)...

This has been in BOINC for a long time, and I have no idea when they're going to fix it permanently.
In the worst case, one of the affected projects (either the one supposed to be running or the one supposed to be paused) is slowed to an absolute crawl.

Suggested Workaround :
Temporarily switch affected BOINC RunMode to "Suspend", then back to Normal (Based on Preferences or Run always).

In 99.9% of all cases, this has fixed the problem for me.
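The suspend-then-resume workaround can be scripted. This is a sketch assuming a modern BOINC client that ships the `boinccmd` command-line tool (older releases called it `boinc_cmd`), whose `--set_run_mode` accepts `always`, `auto` or `never`; `never` corresponds to the GUI's "Suspend" and `auto` to "Based on preferences". The function name and the 5-second pause are hypothetical choices, not part of the original advice.

```python
import subprocess
import time

# Assumption: the client's command-line tool is `boinccmd` and is on PATH.
BOINCCMD = "boinccmd"


def bounce_run_mode(restore_mode="auto", pause_seconds=5.0, run=subprocess.run, sleep=time.sleep):
    """Switch BOINC to "Suspend" (run mode `never`), wait, then restore `restore_mode`.

    `run` and `sleep` are injectable so the sequence can be dry-run without a client.
    """
    if restore_mode not in ("auto", "always"):
        raise ValueError("restore_mode must be 'auto' or 'always'")
    run([BOINCCMD, "--set_run_mode", "never"], check=True)
    sleep(pause_seconds)  # give the client time to suspend every science app
    run([BOINCCMD, "--set_run_mode", restore_mode], check=True)


if __name__ == "__main__":
    # Dry run: record the commands instead of executing them.
    issued = []
    bounce_run_mode(run=lambda cmd, check: issued.append(cmd), sleep=lambda s: None)
    for cmd in issued:
        print(" ".join(cmd))
```

Pass `restore_mode="always"` if the machine was set to "Run always" before the bounce.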
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 95691 - Posted: 6 Apr 2005, 19:25:30 UTC - in response to Message 95687.  

> Suggested Workaround :
> Temporarily switch affected BOINC RunMode to "Suspend", then back to Normal
> (Based on Preferences or Run always).
>
> In 99.9% of all Cases, this has fixed the Problems for me.

I am going to have to try that the next time I see that problem.

The other "Unix" bug that gets me is the one where BOINC leaves a process running after it quits. Normal "kill" won't stop it, and you can't start BOINC again ... Force Quit kills it, but in the case of CPDN it also trashes the work ...



N/A
Volunteer tester
Joined: 18 May 01
Posts: 3718
Credit: 93,649
RAC: 0
Message 95703 - Posted: 6 Apr 2005, 20:28:15 UTC - in response to Message 95691.  
Last modified: 6 Apr 2005, 20:28:35 UTC

How about sudo kill -9 PID? The WU doesn't get trashed, but you do lose up to the last n minutes of work (depending on how often you write to disk).
EclipseHA
Joined: 28 Jul 99
Posts: 1018
Credit: 530,719
RAC: 0
United States
Message 95722 - Posted: 6 Apr 2005, 21:32:53 UTC
Last modified: 6 Apr 2005, 21:33:28 UTC

I see the "multiple things keep running" problem quite often on Red Hat with a 2.4 kernel. It seems to be most common when a WU for one project completes: BOINC then appears to fire up the next project along with a new WU for the one just completed (after upload).

This has happened with CC 4.19 and 4.27. (I moved to 4.27 purely to see if this problem had been resolved.)

One thing I've found that seems to make it a bit better is to set the "connect to network" days large enough that there's always at least one "ready to run" WU for each project. With this, I've been able to run a few days without the problem when running CP and SETI. If I throw Einstein into the soup, the problem reappears.

The box I'm running Windows on has never shown this problem. A project pauses and another starts up consistently (4.19 and 4.26).
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 95794 - Posted: 7 Apr 2005, 2:31:59 UTC - in response to Message 95703.  

> How about sudo kill -9 PID? The WU doesn't get trashed, but you
> do lose up to the last n minutes of work (depending on how often you write to
> disk).

I have been trying the "quit" in Activity Monitor... which should be kill -9...

It never works... so I "Force Quit" the process. That's the only way to stop it... I could reboot, but this happens often enough that it gets to be too much sometimes.

N/A
Volunteer tester
Joined: 18 May 01
Posts: 3718
Credit: 93,649
RAC: 0
Message 95798 - Posted: 7 Apr 2005, 2:45:28 UTC - in response to Message 95794.  
Last modified: 7 Apr 2005, 2:48:04 UTC

"Force Quit" only kills OS X graphical apps. Example: Try to kill BOINC Menubar. You can't.

Activity Monitor only asks the thread to die; it won't annihilate zombies and sleepers effectively.

Trust me, a kill -9 outweighs a ⌘⌥⎋ any day.
EclipseHA
Joined: 28 Jul 99
Posts: 1018
Credit: 530,719
RAC: 0
United States
Message 95822 - Posted: 7 Apr 2005, 4:20:14 UTC - in response to Message 95794.  
Last modified: 7 Apr 2005, 4:29:45 UTC

> I have been trying the "quit" in activity monitor ... which should be Kill -9
> ...
>
> Never works ... so I "Force Quit" the process. Only way to stop it ... I
> could reboot, but this happens often enough that it get too difficult
> sometimes.
>
>

Paul... I'm very glad that someone tied to BOINC (as the documentation man) has acknowledged this problem. That means the developers might acknowledge that there is a problem and start looking into it!

BTW, I've had a few times where kill -9 doesn't do it (on boinc and all the crunchers, even after a couple of minutes). Einstein seems to result in a state where no crunchers run and kill -9 doesn't fix things... A shared semaphore that doesn't get freed correctly?
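One way to see why `kill -9` can appear to do nothing on Linux is to look at the process state the kernel reports. A process in uninterruptible sleep (`D`) generally does not act on SIGKILL until it leaves that state, and a zombie (`Z`) has already exited and disappears only when its parent reaps it. Whether either explains the Einstein case is speculation; this sketch (Linux only, function names hypothetical) just reads the state from `/proc/<pid>/stat`.

```python
import os

# Meanings of the common state letters in /proc/<pid>/stat (see proc(5)).
STATES = {
    "R": "running",
    "S": "sleeping (interruptible)",
    "D": "uninterruptible sleep: SIGKILL typically takes effect only after it wakes",
    "Z": "zombie: already dead, waiting for its parent to reap it",
    "T": "stopped",
}


def parse_stat(stat_line):
    """Return (pid, comm, state, ppid) from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain spaces
    or ')', so split on the *last* ')' rather than on whitespace alone.
    """
    left, _, right = stat_line.rpartition(")")
    pid_text, _, comm = left.partition(" (")
    fields = right.split()
    return int(pid_text), comm, fields[0], int(fields[1])


def process_state(pid):
    """Read and describe the current state of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        _, comm, state, ppid = parse_stat(f.read())
    return comm, state, STATES.get(state, "other"), ppid


if __name__ == "__main__" and os.path.exists("/proc/self/stat"):
    print(process_state(os.getpid()))
```

A science app that keeps showing `D` or `Z` after `kill -9` points at the kernel or the parent process rather than at the signal.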

I might notice it more often than some, in that I wrote a text-based monitor that prints the project, percent done, total credits and active WUs every 15 minutes, so it's very easy to notice if the percentages aren't changing over time...

(Here's just an example from three machines, all working well apart from the first, which is offline.)

Wed 19:45 | ? | C 2.39% 114523 2 | C 10.22% 114618 4 |
Wed 20:00 | ? | C 2.43% 114523 2 | C 10.24% 114618 4 |
Wed 20:15 | ? | C 2.48% 114523 2 | C 10.28% 114618 4 |
Wed 20:30 | ? | S 48.68% 114523 2 | S 85.73% 114618 4 |
Wed 20:45 | ? | S 54.34% 114523 2 | S 88.06% 114618 4 |
Wed 21:00 | ? | S 60.00% 114523 2 | S 90.44% 114618 4 |
Wed 21:15 | ? | C 2.52% 114523 3 | S 92.77% 114618 4 |

The data is obtained by reading client_state.xml. The projects displayed are SETI (S) and CP (C).
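A monitor like this can be sketched with Python's standard XML parser. The element names used here (`<project>` with `<master_url>`, `<project_name>`, `<user_total_credit>`, and `<active_task>` with `<project_master_url>`, `<fraction_done>`) are assumptions about the client_state.xml layout; check them against your own file. The `stalled` helper flags a task whose percentage has not moved across several polls, which is the symptom described above. The XML sample is synthetic.

```python
import xml.etree.ElementTree as ET

# Assumed element names; check them against your own client_state.xml:
#   <project>      with <master_url>, <project_name>, <user_total_credit>
#   <active_task>  with <project_master_url>, <fraction_done>


def summarize(xml_text):
    """Map project name -> (percent-done values of its active tasks, total credit)."""
    root = ET.fromstring(xml_text)
    names, credits = {}, {}
    for project in root.iter("project"):
        url = project.findtext("master_url", "").strip()
        names[url] = project.findtext("project_name", url).strip()
        credits[url] = float(project.findtext("user_total_credit", "0"))
    summary = {names[url]: ([], credits[url]) for url in names}
    for task in root.iter("active_task"):
        url = task.findtext("project_master_url", "").strip()
        if url in names:  # ignore tasks for projects not listed above
            percent = 100.0 * float(task.findtext("fraction_done", "0"))
            summary[names[url]][0].append(round(percent, 2))
    return summary


def stalled(percent_samples, repeats=3):
    """True if the last `repeats` polls of one task all show the same percentage,
    i.e. the task made no visible progress between those polls."""
    recent = percent_samples[-repeats:]
    return len(recent) == repeats and len(set(recent)) == 1


# Synthetic example with made-up values, not taken from the thread.
SAMPLE = """<client_state>
  <project>
    <master_url>http://setiathome.berkeley.edu/</master_url>
    <project_name>SETI@home</project_name>
    <user_total_credit>1234.5</user_total_credit>
  </project>
  <active_task_set>
    <active_task>
      <project_master_url>http://setiathome.berkeley.edu/</project_master_url>
      <fraction_done>0.25</fraction_done>
    </active_task>
  </active_task_set>
</client_state>"""

if __name__ == "__main__":
    for name, (percents, credit) in summarize(SAMPLE).items():
        print(name, percents, credit)
    print(stalled([2.39, 2.43, 2.48]), stalled([60.0, 60.0, 60.0]))
```

Polling `summarize` every 15 minutes and feeding each task's history to `stalled` would turn "watch the percentages" into an automatic check.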


A few months back I posted the symptoms on the forums of every project I'm crunching, and this is the closest thing to an acknowledgement of the problem that I've seen!

Maybe soon I can stop babysitting every Linux box I have where I try to run multiple projects!
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 95969 - Posted: 7 Apr 2005, 16:13:58 UTC - in response to Message 95822.  

> Paul.. I'm very glad that someone tied to Boinc (as the documentation man)
> has acknowled this problem. That means that the developers might acknowledge
> that there is a problem and start looking into it!

Um, well, no ...

I am just another participant like you ...

I just have lots of time on my hands as I am disabled and bored out of my socks (most days I don't even get them out of the drawer) ...

In fact, *MY* impression is that UCB listens to me less than anyone ... then again, just because I am paranoid does not mean that *THEY* are not out to get me ...

You can try posting your data to the bug database... I am sure I saw an open/unconfirmed report already there... I have posted mine to the developers' mailing list... if I don't forget and can find the time, I can try to add mine there (bug base) too...
EclipseHA
Joined: 28 Jul 99
Posts: 1018
Credit: 530,719
RAC: 0
United States
Message 96224 - Posted: 8 Apr 2005, 5:42:54 UTC - in response to Message 95969.  


> YOu can try to post your data to the bug database ... I am sure I saw an
> open/unconfirmed report already there ... I have posted mine to the
> developer's mailing list ...

It's been a subject in the "questions and problems" section of many project websites for months.

Does no one from development pay any attention to the "problems" type issues posted in the "problems and questions" sections of the forums?

"we have no problems unless it shows up in the bug database, even though we don't tell people how to post there, but give them a "problem and questions" section in the forum instead, which we ignore"

I can tell you that tens of people have reported this problem in the "problems and questions" sections of various projects...

Maybe they need a "report a bug" link on the website which will allow reporting a bug to the bug database. Maybe then, they'll actually start fixing things that have been around for months....
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 96242 - Posted: 8 Apr 2005, 7:35:14 UTC - in response to Message 96224.  


> I can tell you that 10's of people have reported this problem in the the
> "problems and question" sections on various projects....

10's of people, out of what, 150,000? Yep, that's important.

... and we know Rom reads your comments, Woody, just for the entertainment value.
EclipseHA
Joined: 28 Jul 99
Posts: 1018
Credit: 530,719
RAC: 0
United States
Message 96751 - Posted: 10 Apr 2005, 2:06:47 UTC - in response to Message 96242.  

>
> > I can tell you that 10's of people have reported this problem in the the
> > "problems and question" sections on various projects....
>
> 10's of people, out of what, 150,000? Yep, that's important.
>
> ... and we know Rom reads your comments, Woody, just for the entertainment
> value.
>

OK, so let's say there were 100 reports (but there were more) by different users on all different projects. That seems to be more posts than when the screen saver has a hiccup, yet the screen saver is a Windows thing, and it seems to get fixed. For each person who posted about this problem, there could be 10 who saw it and said "too flaky for me... I won't run it!" Out of the 150k users, how many are actually active (let's say more than 100 credits), and how many are non-Windows? Maybe 20k on a good day! (A guess, as I don't see any way to extract this from the database at the user level...) So the 100 posts could represent 10% of the active user base for that platform! Just look at this thread and you'll see that people are providing info, yet the "bug database" has one "unconfirmed" report???

If you really knew what Rom did, you'd also understand that he has minimal involvement with the Linux code (he's a Windows guy).

I also find it quite childish of you to claim this is "entertainment value", as I see you speaking with only two Windows boxes attached and no first-hand knowledge of BOINC in a Linux environment, yet there seem to be a few more people than just me who have reported the problem and are looking for some form of feedback...

Remember, I did not start this thread; others are experiencing this problem too. Why, even the guy who has done more for BOINC than any 10 of you "want-a-bees" (Paul Buck) reports it! Does Rom consider him "entertainment value" too?
EclipseHA
Joined: 28 Jul 99
Posts: 1018
Credit: 530,719
RAC: 0
United States
Message 96754 - Posted: 10 Apr 2005, 2:12:50 UTC - in response to Message 96751.  


Remember - you shouldn't attack the messenger!

Pascal, K G
Volunteer tester
Joined: 3 Apr 99
Posts: 2343
Credit: 150,491
RAC: 0
United States
Message 96758 - Posted: 10 Apr 2005, 2:27:01 UTC - in response to Message 96754.  
Last modified: 10 Apr 2005, 2:27:35 UTC




That is "us" wannabees, as it includes you also, AZ, hi, runs back to his hidey-hole to contemplate his navel orange.......
Semper Eadem
So long Paul, it has been a hell of a ride.

Park your egos, fire up the computers, Science YES, Credits No.
EclipseHA
Joined: 28 Jul 99
Posts: 1018
Credit: 530,719
RAC: 0
United States
Message 96766 - Posted: 10 Apr 2005, 2:51:18 UTC - in response to Message 96758.  

Pascal... Time to consult your doc on the med dosage again, it seems....
>
> That is "us" wannabees, as it includes you also, AZ, hi, runs back to his hidy
> hole to contemplate his naval orange.......
>
N/A
Volunteer tester
Joined: 18 May 01
Posts: 3718
Credit: 93,649
RAC: 0
Message 96800 - Posted: 10 Apr 2005, 3:53:28 UTC - in response to Message 96766.  

His alter ego is Pillz-E...
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.