Apps stall with core client 4.19

Questions and Answers : Unix/Linux : Apps stall with core client 4.19
Trane Francks
Joined: 18 Jun 99
Posts: 221
Credit: 122,319
RAC: 0
Japan
Message 75791 - Posted: 1 Feb 2005, 15:07:03 UTC

I'm seeing apps stalling with BOINC 4.19 on a semi-regular basis. System load is at 1.0, but no progress is indicated at all. Killing the client and restarting puts things right again. I've seen the problem with S@H, P@H and E@H. Surprisingly, CPDN has been unaffected.
ID: 75791
Trane Francks
Joined: 18 Jun 99
Posts: 221
Credit: 122,319
RAC: 0
Japan
Message 80238 - Posted: 17 Feb 2005, 12:56:15 UTC

I'd like to bring this back to the top if for no other reason than this is a very frustrating issue. This stalling occurs during startup of new work units. P@H and E@H see this regularly. S@H is only occasional. I haven't received an L@H WU since the core-client upgrade, but I assume that'll be an issue, too. Because of the way CPDN checkpoints (infrequently), I have BOINC set to keep stuff in memory when pausing.

AMD Athlon XP 2500+, 768 MB RAM. If you need more info, just ask.

ID: 80238
parkut
Volunteer tester
Joined: 9 Aug 99
Posts: 69
Credit: 9,779,243
RAC: 0
United States
Message 80802 - Posted: 19 Feb 2005, 4:31:14 UTC
Last modified: 19 Feb 2005, 4:32:15 UTC

I see this problem across nearly all of my Linux clients: Red Hat 7.2 and 9.0, FC1, FC2 and FC3; single-processor and dual-processor machines; HT P4s, Celerons, P-2s, P-3s, XPs and AMD64. It doesn't seem to make much difference. Yes, I have BOINC set to keep applications loaded in memory.

The workaround for me was to make a script that checks processor utilization:

if uptime shows a load of 0.00 on single-processor machines or 1.00 on dualies,
killall boinc, sleep 10 and restart boinc.

I have to watch because sometimes more than one project will run, but that is easy to spot, again by looking at the utilization.

for example:
beta 11:00pm up 213 days, 7:12, 0 users, load average: 1.00, 1.00, 1.00  
tukus 23:00:01 up 3 days, 2:10, 0 users, load average: 1.00, 1.00, 1.00  
asrok 23:00:00 up 3 days, 1:44, 0 users, load average: 2.00, 2.00, 2.00  
p2266 23:00:00 up 19 days, 10:06, 0 users, load average: 2.00, 2.00, 2.00  
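
A minimal sketch of the load check described above, written in Python because the original script was not posted. It assumes the host is dedicated to BOINC and should run one task per processor, so a load one below the processor count suggests a stalled app and a load one above suggests an extra project running. The function names, the `tolerance` value and the `start_cmd` argument are hypothetical; only `killall boinc` and the 10-second sleep come from the post.

```python
import re
import subprocess
import time

# Matches the three figures after "load average:" in a line of uptime output.
_LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_load_averages(uptime_line):
    """Return the (1, 5, 15)-minute load averages from one line of uptime output."""
    match = _LOAD_RE.search(uptime_line)
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return tuple(float(group) for group in match.groups())


def classify_load(load, cpus, tolerance=0.25):
    """Classify a load average for a host expected to run one task per CPU.

    "stalled": about one task short (0.00 on a single, 1.00 on a dual).
    "extra":   about one task too many (more than one project running).
    "ok":      load roughly equal to the number of CPUs.
    The tolerance is an assumed value, not something from the thread.
    """
    if load <= cpus - 1 + tolerance:
        return "stalled"
    if load >= cpus + 1 - tolerance:
        return "extra"
    return "ok"


def restart_boinc(start_cmd, pause_seconds=10):
    """Kill the client, wait, then start it again.

    start_cmd is whatever command launches the client on the host
    (a hypothetical parameter; the thread does not give it).
    """
    subprocess.run(["killall", "boinc"], check=False)
    time.sleep(pause_seconds)
    subprocess.Popen(start_cmd)
```

A cron job could feed the output of `uptime` to `parse_load_averages`, pass the result to `classify_load` with the machine's processor count, and call `restart_boinc` whenever the result is not "ok".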


ID: 80802
Trane Francks
Joined: 18 Jun 99
Posts: 221
Credit: 122,319
RAC: 0
Japan
Message 81009 - Posted: 19 Feb 2005, 23:11:38 UTC - in response to Message 80802.  

> The work around for me was to make a script to check processor utilization

In my case, system load remains at 1.00, but the WU doesn't progress at all. I run the jobs in cron and pipe output to log files. I tail -f the log file and watch progress with BOINCprog. cron starts BOINC every 10 min., which makes a great timestamp in the log. It's easy to see, then, when BOINC stalls because the usual hourly swaps no longer happen.
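
Since the load stays at 1.00 in this case, a load-based check would not catch the stall; a progress-based check would. The following is only a sketch under assumptions: it takes a list of (time, progress) samples, for example one per 10-minute cron run, and reports a stall when the most recent progress value has not changed for a given period. How the progress figure is obtained is left to the caller, and the function name and sample format are hypothetical.

```python
def progress_stalled(samples, min_seconds):
    """Return True if progress has been unchanged for at least min_seconds.

    samples is a list of (unix_time, progress) pairs, oldest first.
    """
    if not samples:
        return False
    last_time, last_progress = samples[-1]
    # Walk backwards to find the earliest sample in the current unchanged run.
    unchanged_since = last_time
    for sample_time, progress in reversed(samples):
        if progress != last_progress:
            break
        unchanged_since = sample_time
    return last_time - unchanged_since >= min_seconds
```

With samples taken every 600 seconds, a call such as `progress_stalled(samples, 1800)` would flag a work unit whose progress has not moved across four consecutive samples.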

> if uptime proc shows 0.00 on single processor machines or 1.00 on dualies
> killall boinc, sleep 10 and restart boinc.

Oddly, I've never seen the 0-load situation on this box.

> I have to watch, sometimes more than one project will run, but that is easy to
> spot again by looking at the utilization.

Yeah. That one's important to kill, too, because it causes the exit with no file and the WU needs to be crunched again.

Cheers.
ID: 81009
