Apps stall with core client 4.19
Author | Message |
---|---|
Trane Francks Joined: 18 Jun 99 Posts: 221 Credit: 122,319 RAC: 0 |
I'm seeing apps stalling with BOINC 4.19 on a semi-regular basis. System load is at 1.0, but no progress is indicated at all. Killing the client and restarting puts things right again. I've seen the problem with S@H, P@H and E@H. Surprisingly, CPDN has been unaffected. |
Trane Francks Joined: 18 Jun 99 Posts: 221 Credit: 122,319 RAC: 0 |
I'd like to bring this back to the top if for no other reason than this is a very frustrating issue. This stalling occurs during startup of new work units. P@H and E@H see this regularly. S@H is only occasional. I haven't received an L@H WU since the core-client upgrade, but I assume that'll be an issue, too. Because of the way CPDN checkpoints (infrequently), I have BOINC set to keep stuff in memory when pausing. AMD Athlon XP 2500+, 768 MB RAM. If you need more info, just ask. |
parkut Joined: 9 Aug 99 Posts: 69 Credit: 9,779,243 RAC: 0 |
I see this problem across nearly all of my Linux clients: Red Hat 7.2 and 9.0, FC1, FC2 and FC3; single proc, dual proc, HT P4s, Celerons, P-2s, P-3s, XPs and AMD64. It doesn't seem to make much difference. Yes, I have BOINC set to keep applications loaded in memory.

The workaround for me was to make a script to check processor utilization: if uptime shows a load of 0.00 on single-processor machines or 1.00 on dualies, killall boinc, sleep 10 and restart boinc. I still have to watch, because sometimes more than one project will run, but that is easy to spot, again by looking at the utilization. For example:

beta 11:00pm up 213 days, 7:12, 0 users, load average: 1.00, 1.00, 1.00
tukus 23:00:01 up 3 days, 2:10, 0 users, load average: 1.00, 1.00, 1.00
asrok 23:00:00 up 3 days, 1:44, 0 users, load average: 2.00, 2.00, 2.00
p2266 23:00:00 up 19 days, 10:06, 0 users, load average: 2.00, 2.00, 2.00 |
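The script itself was not posted, so the following is only a minimal Python sketch of the check described above, not parkut's actual workaround. It reads the 1-minute load average from a line of `uptime` output and flags a stall when the load is at or below the processor count minus one. The function names, the small `tolerance` value, and the `start_command` argument are all assumptions made for illustration.

```python
import re
import subprocess
import time

# Matches the trailing "load average: 1.00, 1.00, 1.00" part of an uptime line.
_LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_load_averages(uptime_line):
    """Return the (1, 5, 15)-minute load averages from one line of uptime output."""
    match = _LOAD_RE.search(uptime_line)
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())


def looks_stalled(uptime_line, cpu_count, tolerance=0.05):
    """True when at least one processor appears idle, judged by the 1-minute load.

    Follows the rule from the post: 0.00 on a single-processor machine or
    1.00 on a dual means a cruncher has stopped. The tolerance is an
    assumption, added so that a reading such as 0.01 still counts as idle.
    """
    one_minute = parse_load_averages(uptime_line)[0]
    return one_minute <= (cpu_count - 1) + tolerance


def restart_boinc(start_command):
    """Kill the client, wait 10 seconds, then start it again.

    start_command is hypothetical: pass whatever argument list launches
    BOINC on your machine, for example the command your cron job uses.
    """
    subprocess.call(["killall", "boinc"])
    time.sleep(10)
    subprocess.Popen(start_command)
```

A cron job could pass the output of `uptime` to `looks_stalled` and call `restart_boinc` only when it returns true. Note that this check cannot catch a stall where the load stays at 1.00 and the work unit simply stops progressing.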
Trane Francks Joined: 18 Jun 99 Posts: 221 Credit: 122,319 RAC: 0 |
> The work around for me was to make a script to check processor utilization

In my case, system load remains at 1.00, but the WU doesn't progress at all. I run the jobs in cron and pipe output to log files. I tail -f the log file and watch progress with BOINCprog. cron starts BOINC every 10 min., which makes a great timestamp in the log. It's easy to see, then, when BOINC stalls because the usual hourly swaps no longer happen.

> if uptime proc shows 0.00 on single processor machines or 1.00 on dualies
> killall boinc, sleep 10 and restart boinc.

Oddly, I've never seen the 0-load situation on this box.

> I have to watch, sometimes more than one project will run, but that is easy to
> spot again by looking at the utilization.

Yeah. That one's important to kill, too, because it causes the exit with no file and the WU needs to be crunched again.

Cheers. |
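Because the load stays at 1.00 in this kind of stall, a load check will not catch it. One alternative, sketched below in Python, is to compare progress readings over time. This is a swapped-in technique, not something either poster describes using: the sample format (a list of timestamp and progress pairs), the function name, and the one-hour window are assumptions, and how the readings are collected is left open.

```python
def progress_stalled(samples, window_seconds=3600):
    """Report whether progress has been flat for the whole recent window.

    samples is a hypothetical list of (unix_time, progress) pairs, oldest
    first, for example one reading taken on each cron run. The one-hour
    default window is an assumption, not a value taken from BOINC.
    """
    if len(samples) < 2:
        return False
    latest_time, latest_progress = samples[-1]
    # Without a full window of history there is not enough evidence yet.
    if latest_time - samples[0][0] < window_seconds:
        return False
    recent = [progress for sample_time, progress in samples
              if latest_time - sample_time <= window_seconds]
    return all(progress == latest_progress for progress in recent)
```

A project that reports progress slowly may look flat for an hour while still running normally, so the window would need to be tuned per project.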
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.