Fast task switching
Author | Message |
---|---|
Grenadier Joined: 15 May 99 Posts: 63 Credit: 5,445,784 RAC: 0 |
Since installing the new BOINC and receiving CUDA work, I've noticed that the other 3 CPUs are rapidly cycling through tasks, as if the Manager cannot decide which tasks should be run on them. So my log looks like this:
12/21/2008 9:05:04 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557
12/21/2008 9:05:05 PM|Spinhenge@home|Restarting task 9_Fe30_map_237_753_1 using metropolis version 312
12/21/2008 9:05:06 PM|QMC@HOME|Restarting task two_prism_w6-ecp2-QZ-b3lyp.7924_0 using Amolqc-preRC1 version 501
12/21/2008 9:05:08 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557
12/21/2008 9:05:09 PM|Spinhenge@home|Restarting task 9_Fe30_map_237_753_1 using metropolis version 312
12/21/2008 9:05:10 PM|QMC@HOME|Restarting task two_prism_w6-ecp2-QZ-b3lyp.7924_0 using Amolqc-preRC1 version 501
12/21/2008 9:05:11 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557
12/21/2008 9:05:12 PM|Spinhenge@home|Restarting task 9_Fe30_map_237_753_1 using metropolis version 312
12/21/2008 9:05:13 PM|QMC@HOME|Restarting task two_prism_w6-ecp2-QZ-b3lyp.7924_0 using Amolqc-preRC1 version 501
12/21/2008 9:05:14 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557
12/21/2008 9:05:16 PM|Spinhenge@home|Restarting task 9_Fe30_map_237_753_1 using metropolis version 312
12/21/2008 9:05:17 PM|QMC@HOME|Restarting task two_prism_w6-ecp2-QZ-b3lyp.7924_0 using Amolqc-preRC1 version 501
12/21/2008 9:05:18 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557
12/21/2008 9:05:19 PM|Spinhenge@home|Restarting task 9_Fe30_map_237_753_1 using metropolis version 312
12/21/2008 9:05:20 PM|QMC@HOME|Restarting task two_prism_w6-ecp2-QZ-b3lyp.7924_0 using Amolqc-preRC1 version 501
The 4th CPU seems to be feeding the GPU properly, and runs continuously.
What can be done to stop all this wasteful task switching? |
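For anyone who wants to measure how bad the cycling is, here is a minimal sketch that counts "Restarting task" entries per task in a log excerpt like the one above. It assumes the pipe-separated `timestamp|project|message` layout shown in the post; the names `count_restarts` and `span_seconds` are made up for this example and are not part of BOINC.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines such as:
# 12/21/2008 9:05:04 PM|malariacontrol.net|Restarting task <name> using <app> version <n>
LINE_RE = re.compile(
    r"^(?P<ts>[^|]+)\|(?P<project>[^|]+)\|Restarting task (?P<task>\S+) using "
)

# A few lines copied from the log in the post above.
SAMPLE = """\
12/21/2008 9:05:04 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557
12/21/2008 9:05:05 PM|Spinhenge@home|Restarting task 9_Fe30_map_237_753_1 using metropolis version 312
12/21/2008 9:05:06 PM|QMC@HOME|Restarting task two_prism_w6-ecp2-QZ-b3lyp.7924_0 using Amolqc-preRC1 version 501
12/21/2008 9:05:08 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557
"""


def count_restarts(lines):
    """Count restart messages per (project, task) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match.group("project"), match.group("task"))] += 1
    return counts


def span_seconds(lines):
    """Seconds between the first and last restart message (0.0 if fewer than two)."""
    stamps = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            # Timestamp layout assumed from the post: month/day/year, 12-hour clock.
            stamps.append(datetime.strptime(match.group("ts"), "%m/%d/%Y %I:%M:%S %p"))
    if len(stamps) < 2:
        return 0.0
    return (max(stamps) - min(stamps)).total_seconds()


if __name__ == "__main__":
    lines = SAMPLE.splitlines()
    for (project, task), n in count_restarts(lines).items():
        print(f"{project}: {task} restarted {n} time(s)")
    print(f"window: {span_seconds(lines)} s")
```

A task that shows up many times within a few seconds, as in the excerpt, is being preempted and resumed rather than making progress.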
Grenadier Joined: 15 May 99 Posts: 63 Credit: 5,445,784 RAC: 0 |
Looks like this may be tied to the problem someone else mentioned about the units running at high priority. Basically, the Manager decides that the CUDA WUs are going to miss their deadline, even though they aren't, and goes into EDF (earliest deadline first) mode. Then it tries to switch tasks to allow the CUDA jobs to run. But since there isn't another GPU available, it gets stuck in an infinite loop of task switching between the non-CUDA jobs. Suggested fix: correct the run-time estimates or expiry dates on the CUDA jobs, and/or make the Manager recognize this situation and avoid the loop in the first place. Basically, the scheduler now needs two separate queues: one for CUDA jobs and one for the others. |
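To make the "separate queues" suggestion concrete, here is a rough sketch of earliest-deadline-first selection done per resource type, so that an urgent GPU job only competes for GPU slots and never displaces CPU jobs. This is an illustration of the idea only, not BOINC's scheduler; `Job`, `pick_jobs`, and the sample job names and numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    resource: str    # "cpu" or "gpu" (hypothetical labels)
    deadline: float  # seconds until the job's deadline


def pick_jobs(jobs, slots):
    """Choose jobs to run, applying EDF independently within each resource type.

    `slots` maps a resource name to the number of jobs it can run at once.
    """
    chosen = {}
    for resource, count in slots.items():
        # Only jobs that need this resource compete for its slots.
        queue = sorted(
            (job for job in jobs if job.resource == resource),
            key=lambda job: job.deadline,
        )
        chosen[resource] = queue[:count]
    return chosen


# Made-up workload: two GPU jobs with tight deadlines, four CPU jobs.
JOBS = [
    Job("cuda_a", "gpu", 600.0),
    Job("cuda_b", "gpu", 900.0),
    Job("cpu_a", "cpu", 86400.0),
    Job("cpu_b", "cpu", 172800.0),
    Job("cpu_c", "cpu", 259200.0),
    Job("cpu_d", "cpu", 345600.0),
]
SLOTS = {"cpu": 3, "gpu": 1}

if __name__ == "__main__":
    for resource, running in pick_jobs(JOBS, SLOTS).items():
        print(resource, [job.name for job in running])
```

With one GPU slot, the second CUDA job simply waits in the GPU queue; the CPU selection is unaffected by it and stays the same from one scheduling pass to the next.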
Byron S Goodgame Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
It's also possible to take it out of high priority by reducing the DCF, and this might stop the task switching as well. For me, though, when I run more than one project now, I have to suspend one project's tasks for a while so the other project can use the CPU, and then swap them. Otherwise, because my system is a single core with an edited cc_config file, the CPU tries to run tasks from two projects at once. |
Eric Korpela Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60 |
I've reported this problem to the BOINC development team. Hopefully they will come up with a solution. @SETIEric@qoto.org (Mastodon) |
Grenadier Joined: 15 May 99 Posts: 63 Credit: 5,445,784 RAC: 0 |
Thanks, Eric. For now, I end up managing my own scheduling. I suppose I could turn off SETI, but that would defeat the purpose of the new CUDA functions. ;-) |
Eric Korpela Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60 |
It may be that eventually your "duration correction factor" will change to accommodate the shorter run times. (But the BOINC fix shouldn't rely on that, if I have anything to say about it.) @SETIEric@qoto.org (Mastodon) |
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
Perhaps there should be a DCF for each project one joins? Boinc....Boinc....Boinc....Boinc.... |
Eric Korpela Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60 |
"Perhaps there should be a DCF for each project one joins?" There is. It takes some time for it to adjust, though. Eric @SETIEric@qoto.org (Mastodon) |
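As a rough picture of why a per-project DCF "takes some time" to adjust, here is a toy model in which the factor jumps up at once when a task runs longer than predicted but only drifts down slowly when tasks finish early. The update rule and the weights (`up_weight`, `down_weight`) are assumptions chosen for illustration and are not taken from the BOINC source; `update_dcf` and `simulate` are hypothetical names.

```python
def update_dcf(dcf, estimated_raw, actual, up_weight=1.0, down_weight=0.1):
    """Move the correction factor toward the observed actual/estimate ratio.

    estimated_raw is the uncorrected run-time estimate; the corrected
    estimate would be dcf * estimated_raw. Weights are illustrative only.
    """
    raw_ratio = actual / estimated_raw
    # Underestimates are corrected immediately; overestimates only gradually.
    weight = up_weight if raw_ratio > dcf else down_weight
    return dcf + weight * (raw_ratio - dcf)


def simulate(dcf, estimated_raw, actual, tasks):
    """Apply the update once per completed task and return the history."""
    history = [dcf]
    for _ in range(tasks):
        dcf = update_dcf(dcf, estimated_raw, actual)
        history.append(dcf)
    return history


if __name__ == "__main__":
    # Made-up numbers: tasks estimated at 10000 s that actually take 1000 s.
    for n, value in enumerate(simulate(1.0, 10000.0, 1000.0, 10)):
        print(f"after {n:2d} tasks: dcf = {value:.3f}")
```

Under these assumed weights the factor is still well above the true ratio of 0.1 after ten completed tasks, so the client would keep overestimating run times (and could keep believing deadlines are at risk) for some time.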
Grenadier Joined: 15 May 99 Posts: 63 Credit: 5,445,784 RAC: 0 |
Yes, I realize that the DCF might fix things eventually. But the Scheduler really needs to be fixed to recognize this situation in the first place. It should not be getting into an infinite loop of task switching just because of EDF mode. |
Grenadier Joined: 15 May 99 Posts: 63 Credit: 5,445,784 RAC: 0 |
I turned off CUDA due to this and the other problems seen and posted about here (video driver crashes, BSOD, 'sparkly' screen). The odd task switching remains, even with non-CUDA projects as the only active ones. Something seems to be broken with the scheduler with this new Manager release, regardless of CUDA. |
Grenadier Joined: 15 May 99 Posts: 63 Credit: 5,445,784 RAC: 0 |
Any news on this? I'm still seeing it even with the most recent dev releases. I am no longer even running CUDA, due to suspicions that it fried a video card on me. Could the CUDA app have left a bad setting in client_state.xml that could be causing this? |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.