All CPU tasks not running. Now all are: - "Waiting to run"
Richard Haselgrove | Joined: 4 Jul 99 | Posts: 14653 | Credit: 200,643,578 | RAC: 874

> OK. I am really confused. From David's reply it seems that <project_max_concurrent> is designed to only allow the project to process the N number of tasks and then quit?

No, what I observed on my test was that it ran until the last task had finished, and then fetched. It was idle for six or seven seconds, but of course that wouldn't work for SETI on Tuesdays...

I've just come off the conference call: David was there, and that was one of the points I made in the hearing of the other contributors (it'll be on the recording too, when Keith (Uplinger) publishes it). The process will involve finalising this pull request (I found a small bug using the event log you uploaded last night - thanks; that'll need fixing), and then work fetch needs to be tackled as a separate issue. We got independent support from around the table to ensure that both scheduling and work fetch are properly concluded before the next client release, which included a strong request that work starts on Phase 2 as soon as possible. I've been given the job of writing up what needs to be covered in the work fetch phase, and I'll be including Jacob Klein's comments from the alpha mailing list this morning.
Richard Haselgrove | Joined: 4 Jul 99 | Posts: 14653 | Credit: 200,643,578 | RAC: 874

> I also wanted to point out I run BOTH MB and AP apps. So based on your previous comments that max_concurrent cannot be used on apps within the same project, the only way to limit the total number of running tasks for Seti is to use the project_max_concurrent.

Don't just rely on my comments - confirm with the documentation. AstroPulse v7 and SETI@home v8 are separate applications in the terminology used in the documentation, so you can set separate max_concurrent levels for each - as well as a project_max_concurrent as an overall limit, if you wish. It's only at the lower app_version level that the limit isn't defined.
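The combination described above - per-application limits plus an overall project cap - would live in an app_config.xml file in the project's data directory. A minimal sketch; the application names (setiathome_v8, astropulse_v7) and counts are assumptions for illustration, and the exact names should be confirmed from the client's event log or the project's applications page:

```xml
<!-- app_config.xml in the SETI@home project directory (a sketch;
     app names and limits shown here are illustrative assumptions). -->
<app_config>
    <app>
        <name>setiathome_v8</name>
        <max_concurrent>4</max_concurrent>
    </app>
    <app>
        <name>astropulse_v7</name>
        <max_concurrent>2</max_concurrent>
    </app>
    <!-- overall cap across both applications -->
    <project_max_concurrent>5</project_max_concurrent>
</app_config>
```

The client re-reads app_config.xml on a "Read config files" request, so the limits can be adjusted without restarting.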
Keith Myers | Joined: 29 Apr 01 | Posts: 13164 | Credit: 1,160,866,277 | RAC: 1,873

> No, what I observed on my test was that it ran until the last task had finished, and then fetched. It was idle for six or seven seconds, but of course that wouldn't work for SETI on Tuesdays...

So what you are saying is that David has designed the client not to fetch work until the last of my 500 tasks has finished and been reported? And the reporting of finished work will not happen until the last task is finished, leading to the common issue of the schedulers not accepting a connection when you attempt to report work and ask for replacements in the same transaction. I always set NNT on Tuesdays so I can at least report my finished work. Only after I have reported all finished work do I unset NNT and ask for work. Doing anything else just produces "no connection" messages or "no work available". That is even with setting max_reported to a very low 50 tasks per setting.

So what is the point of establishing a cache of 100 tasks per CPU and GPU, or of the "store so many days of work" setting in computing preferences? This client effectively removes the cache from the host until the current cache is completely finished and reported, then attempts to refill the cache back to the server limits.

Seti@Home classic workunits: 20,676 | CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
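The "max_reported" setting mentioned above corresponds, assuming standard BOINC client configuration, to the <max_tasks_reported> option in cc_config.xml, which caps how many completed tasks are reported in a single scheduler RPC. A minimal sketch:

```xml
<!-- cc_config.xml in the BOINC data directory (a sketch; the option
     name <max_tasks_reported> is the cc_config spelling of the
     "max_reported" setting referred to above). -->
<cc_config>
    <options>
        <!-- report at most 50 completed tasks per scheduler contact -->
        <max_tasks_reported>50</max_tasks_reported>
    </options>
</cc_config>
```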
Keith Myers | Joined: 29 Apr 01 | Posts: 13164 | Credit: 1,160,866,277 | RAC: 1,873

Richard, I have been discussing this with Jacob Klein over at GPUGrid. He stated he was the OP who asked for the max_concurrent parameters for the client in the first place. He made this comment, and I have a little better understanding of what DA is doing.

He made the statement that:

I had no idea that some projects have tasks that take 300 days to complete.
Keith Myers | Joined: 29 Apr 01 | Posts: 13164 | Credit: 1,160,866,277 | RAC: 1,873

I misspoke in my previous post. Jacob Klein was the person who asked for the <gpu_exclude> parameters.
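For reference, the GPU-exclusion feature referred to above is spelled <exclude_gpu> in cc_config.xml. A minimal sketch; the project URL and device number here are illustrative assumptions, not taken from the thread:

```xml
<!-- cc_config.xml in the BOINC data directory (a sketch; the URL and
     device number are illustrative). Excludes GPU device 1 from
     running tasks for the named project. -->
<cc_config>
    <options>
        <exclude_gpu>
            <url>https://www.gpugrid.net/</url>
            <device_num>1</device_num>
            <type>NVIDIA</type>
        </exclude_gpu>
    </options>
</cc_config>
```

Omitting <device_num> excludes all GPUs of the given type for that project.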
Richard Haselgrove | Joined: 4 Jul 99 | Posts: 14653 | Credit: 200,643,578 | RAC: 874

I had a brief conversation with Jacob yesterday on the boinc_alpha mailing list. I'm on GPUGrid too, so I'll go over there and see what you had to say to each other. I think the general consensus is that David has called this one wrong, and should start on a 'proportional fetch' solution as soon as possible. A single 300-day task with a max_concurrent of 1 should get a simple 'cache full', without an artificial block on work fetch.
Keith Myers | Joined: 29 Apr 01 | Posts: 13164 | Credit: 1,160,866,277 | RAC: 1,873

Just one final comment: it basically boils down to "the battle was won, but the war was lost". I see the #PR2918 commit was merged yesterday, and while the original CPU scheduling bug was fixed, I won't be using it because of the detrimental change to a host's work cache with any max_concurrent statement. Hope that DA starts work on the work fetch module to fix this new flaw.
Richard Haselgrove | Joined: 4 Jul 99 | Posts: 14653 | Credit: 200,643,578 | RAC: 874

I'll be making that exact point tonight. There's a case to be made, perhaps, for an emergency BOINC release because of the Ryzen GPU driver problems: and I will state that it will have to be a v7.14 hotfix, because the current state of master (between two halves of the same patch) means it's not fit for release.
Richard Haselgrove | Joined: 4 Jul 99 | Posts: 14653 | Credit: 200,643,578 | RAC: 874

We held another of the developer conference calls today. David Anderson was not present (this one was at 13:00 UTC, convenient for Europe but very poorly timed for California), but sent in a written report. He said he's ready to start work on a new client release, but knows that work fetch issues have to be addressed first. I was asked to liaise with David on the matters which need attention. I've opened a new issue to consolidate the state of play: #3065. Please add any relevant comments, either here or in the issue.
Richard Haselgrove | Joined: 4 Jul 99 | Posts: 14653 | Credit: 200,643,578 | RAC: 874

Potential fix available at #3076. Windows binaries are available as an Appveyor artifact; self-builders can pull branch 'dpa_work_fetch_mc'.
Keith Myers | Joined: 29 Apr 01 | Posts: 13164 | Credit: 1,160,866,277 | RAC: 1,873

Richard, I've been running the boinc-dpa_work_fetch_mc branch for a good portion of the day. Haven't seen anything out of the ordinary jump out at me. It seems to be obeying my max_concurrent settings, getting work from all projects, keeping all CPUs busy per my settings, and playing nice among the projects when it is their turn to get some time to crunch. I have uploaded scenario #174 to the emulator.
Richard Haselgrove | Joined: 4 Jul 99 | Posts: 14653 | Credit: 200,643,578 | RAC: 874

I've loaded it too, and with two projects dry, it fetched correctly when I resumed work fetch. If it works for both of us throughout Sunday, I think we can sign this off.
Keith Myers | Joined: 29 Apr 01 | Posts: 13164 | Credit: 1,160,866,277 | RAC: 1,873

Yes, I let projects go completely dry a few times too, and work fetch resumed and pulled my cache back in based on my 0.5/0.01 day settings. I think this version is a winner too. The scenarios should help promote it, I would think.
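The "0.5/0.01 day" cache settings mentioned above correspond, assuming the standard BOINC preference names, to the two work-buffer values that can be set in the manager's computing preferences or in a global_prefs_override.xml file. A minimal sketch:

```xml
<!-- global_prefs_override.xml in the BOINC data directory (a sketch;
     these two elements are assumed to be the XML form of the
     "store at least / store up to an additional N days of work"
     preferences). -->
<global_preferences>
    <!-- store at least 0.5 days of work -->
    <work_buf_min_days>0.5</work_buf_min_days>
    <!-- plus up to an additional 0.01 days -->
    <work_buf_additional_days>0.01</work_buf_additional_days>
</global_preferences>
```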
Richard Haselgrove | Joined: 4 Jul 99 | Posts: 14653 | Credit: 200,643,578 | RAC: 874

I've noticed a few changes in work fetch while we've been testing these, but I think they are a proper result of the changes we set out to make. My testbed is looking a little under-fetched at the moment, but the figures say it's OK. Assuming everything is still OK for both of us when I'm ready to start heading for bed (say in about 3 hours), I'll give the formal go-ahead for the code checkers to look it over. We won't have any problems getting it into the field - David is itching to get the whole next version into testing.
Keith Myers | Joined: 29 Apr 01 | Posts: 13164 | Credit: 1,160,866,277 | RAC: 1,873

Hi Richard, my sim finished and I couldn't see anything wrong, or anything that jumped out at me. But I am no expert in interpreting the results from the sims.