Fast task switching

Grenadier Volunteer tester Send message Joined: 15 May 99 Posts: 63 Credit: 5,445,784 RAC: 0	Message 843374 - Posted: 22 Dec 2008, 2:06:17 UTC

Since installing the new BOINC and receiving CUDA work, I've noticed that the other 3 CPUs are rapidly cycling through tasks, as if the Manager cannot decide which tasks should be run on them. So my log looks like this:

12/21/2008 9:05:04 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557
12/21/2008 9:05:05 PM|Spinhenge@home|Restarting task 9_Fe30_map_237_753_1 using metropolis version 312
12/21/2008 9:05:06 PM|QMC@HOME|Restarting task two_prism_w6-ecp2-QZ-b3lyp.7924_0 using Amolqc-preRC1 version 501
12/21/2008 9:05:08 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557
12/21/2008 9:05:09 PM|Spinhenge@home|Restarting task 9_Fe30_map_237_753_1 using metropolis version 312
12/21/2008 9:05:10 PM|QMC@HOME|Restarting task two_prism_w6-ecp2-QZ-b3lyp.7924_0 using Amolqc-preRC1 version 501
12/21/2008 9:05:11 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557
12/21/2008 9:05:12 PM|Spinhenge@home|Restarting task 9_Fe30_map_237_753_1 using metropolis version 312
12/21/2008 9:05:13 PM|QMC@HOME|Restarting task two_prism_w6-ecp2-QZ-b3lyp.7924_0 using Amolqc-preRC1 version 501
12/21/2008 9:05:14 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557
12/21/2008 9:05:16 PM|Spinhenge@home|Restarting task 9_Fe30_map_237_753_1 using metropolis version 312
12/21/2008 9:05:17 PM|QMC@HOME|Restarting task two_prism_w6-ecp2-QZ-b3lyp.7924_0 using Amolqc-preRC1 version 501
12/21/2008 9:05:18 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557
12/21/2008 9:05:19 PM|Spinhenge@home|Restarting task 9_Fe30_map_237_753_1 using metropolis version 312
12/21/2008 9:05:20 PM|QMC@HOME|Restarting task two_prism_w6-ecp2-QZ-b3lyp.7924_0 using Amolqc-preRC1 version 501

The 4th CPU seems to be feeding the GPU properly, and runs continuously. What can be done to stop all this wasteful task switching? ID: 843374 ·
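
For anyone who wants to measure how bad the cycling is, here is a minimal sketch that counts "Restarting task" entries per task in a copied log. It assumes only the line shape quoted in the post above (timestamp, project, message, separated by `|`); the function name `count_restarts` and the `sample` text are made up for illustration and are not part of BOINC.

```python
import re
from collections import Counter

# Matches entries shaped like the ones quoted in the post. The separator may
# show up as "|" or as an escaped "\|" depending on how the log was copied.
LINE_RE = re.compile(
    r"(?P<stamp>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)\\?\|"
    r"(?P<project>[^|\\]+)\\?\|"
    r"Restarting task (?P<task>\S+) using"
)


def count_restarts(log_text):
    """Return a Counter mapping (project, task) to its number of restarts."""
    counts = Counter()
    for match in LINE_RE.finditer(log_text):
        counts[(match.group("project"), match.group("task"))] += 1
    return counts


# Hypothetical excerpt built from the first few entries of the quoted log.
sample = (
    "12/21/2008 9:05:04 PM|malariacontrol.net|Restarting task "
    "wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557\n"
    "12/21/2008 9:05:05 PM|Spinhenge@home|Restarting task "
    "9_Fe30_map_237_753_1 using metropolis version 312\n"
    "12/21/2008 9:05:06 PM|QMC@HOME|Restarting task "
    "two_prism_w6-ecp2-QZ-b3lyp.7924_0 using Amolqc-preRC1 version 501\n"
    "12/21/2008 9:05:08 PM|malariacontrol.net|Restarting task "
    "wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557\n"
)

if __name__ == "__main__":
    for (project, task), n in count_restarts(sample).most_common():
        print(f"{n:3d}  {project}  {task}")
```

A task that shows several restarts within a few seconds, as all three do in the log above, is being switched rather than run.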

Grenadier Volunteer tester Send message Joined: 15 May 99 Posts: 63 Credit: 5,445,784 RAC: 0	Message 843702 - Posted: 22 Dec 2008, 16:57:11 UTC Looks like this may be tied to the problem someone else mentioned about the units running at high priority. Basically, the Manager decides that the CUDA WUs are going to miss their deadlines, even though they aren't, and goes into EDF (earliest deadline first) mode. Then it tries to switch tasks to allow the CUDA jobs to run. But since there isn't another GPU available, it gets stuck in an infinite loop of task switching between the non-CUDA jobs. Suggested fix: fix the run times or expiry dates on the CUDA jobs, and/or make the Manager recognize this situation and avoid the loop in the first place. Basically, the scheduler needs two separate queues now, one for CUDA jobs and one for the others. ID: 843702 ·
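
The "separate queues" suggestion in the post above can be sketched roughly as follows. This is an illustration of the idea only, not how the BOINC client is implemented: the `Task` class, the `"cpu"`/`"gpu"` labels, the `pick_tasks` function and all deadlines are hypothetical. Each resource type gets its own earliest-deadline-first queue, so a GPU job under deadline pressure cannot displace CPU-only jobs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    resource: str    # "cpu" or "gpu" (hypothetical labels)
    deadline: float  # seconds until the deadline


def pick_tasks(tasks, capacity):
    """Choose tasks to run, earliest deadline first, per resource type.

    capacity maps a resource name to how many tasks it can run at once.
    Every resource is scheduled from its own queue, so an urgent GPU task
    never competes with CPU-only tasks for a slot.
    """
    chosen = {}
    for resource, slots in capacity.items():
        queue = sorted(
            (t for t in tasks if t.resource == resource),
            # The name breaks deadline ties so the choice is deterministic.
            key=lambda t: (t.deadline, t.name),
        )
        chosen[resource] = queue[:slots]
    return chosen


if __name__ == "__main__":
    tasks = [
        Task("cuda_a", "gpu", 100.0),
        Task("cuda_b", "gpu", 200.0),
        Task("malaria", "cpu", 5000.0),
        Task("spinhenge", "cpu", 6000.0),
        Task("qmc", "cpu", 7000.0),
    ]
    plan = pick_tasks(tasks, {"cpu": 3, "gpu": 1})
    for resource, running in plan.items():
        print(resource, [t.name for t in running])
```

Because the selection depends only on the task list and the capacities, calling it again with the same inputs yields the same plan, which is the property the looping behaviour described above lacks.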

Byron S Goodgame Volunteer tester Send message Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0	Message 843723 - Posted: 22 Dec 2008, 17:43:52 UTC - in response to Message 843702. Last modified: 22 Dec 2008, 17:51:35 UTC It's also possible to take the tasks out of high priority by reducing the DCF, and this might stop the task switching as well. For me, though, when I run more than one project now, I have to suspend one project's tasks for a while so the other can use the CPU, then resume them and suspend the other. Otherwise, because my system is a single core with an edited cc_config file, it tries to run two projects' tasks on the one CPU. ID: 843723 ·

Eric Korpela Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60	Message 843728 - Posted: 22 Dec 2008, 17:55:46 UTC - in response to Message 843723. I've reported this problem to the BOINC development team. Hopefully they will come up with a solution. @SETIEric@qoto.org (Mastodon) ID: 843728 ·

Grenadier Volunteer tester Send message Joined: 15 May 99 Posts: 63 Credit: 5,445,784 RAC: 0	Message 843743 - Posted: 22 Dec 2008, 18:32:29 UTC Thanks, Eric. For now, I end up managing my own scheduling. I suppose I could turn off SETI, but that would defeat the purpose of the new CUDA functions. ;-) ID: 843743 ·

Eric Korpela Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60	Message 843746 - Posted: 22 Dec 2008, 18:35:33 UTC - in response to Message 843743. It may be that eventually your "duration correction factor" will change to accommodate the shorter run times. (But the BOINC fix shouldn't rely on that, if I have anything to say about it.) @SETIEric@qoto.org (Mastodon) ID: 843746 ·

Geek@Play Volunteer tester Send message Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0	Message 843754 - Posted: 22 Dec 2008, 18:58:05 UTC Perhaps there should be a DCF for each project one joins? Boinc....Boinc....Boinc....Boinc.... ID: 843754 ·

Eric Korpela Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60	Message 843769 - Posted: 22 Dec 2008, 19:18:02 UTC - in response to Message 843754. "Perhaps there should be a DCF for each project one joins?" There is. It takes some time for it to adjust, though. Eric @SETIEric@qoto.org (Mastodon) ID: 843769 ·
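
To make "it takes some time for it to adjust" concrete, here is a small sketch of a per-project duration correction factor (DCF) used as a multiplier on the project's runtime estimate. The update rule is an assumption chosen for illustration (raise at once when a task runs longer than predicted, lower by a fraction otherwise); it is not taken from the BOINC source, and `corrected_estimate`, `update_dcf` and the `decay` value are hypothetical.

```python
def corrected_estimate(raw_estimate_s, dcf):
    """Scale the project's raw runtime estimate by the current DCF."""
    return raw_estimate_s * dcf


def update_dcf(dcf, raw_estimate_s, actual_s, decay=0.1):
    """Return a new DCF after one task finishes (illustrative rule only).

    If the task ran longer than the DCF predicted, jump straight to the
    observed ratio. If it ran shorter, move only a fraction (decay) of the
    way toward the observed ratio, so the factor shrinks slowly.
    """
    ratio = actual_s / raw_estimate_s
    if ratio > dcf:
        return ratio
    return dcf + decay * (ratio - dcf)


if __name__ == "__main__":
    # Hypothetical numbers: tasks estimated at 1000 s that really take 100 s,
    # as might happen when GPU work finishes far faster than predicted.
    dcf = 1.0
    for finished in range(1, 11):
        dcf = update_dcf(dcf, raw_estimate_s=1000.0, actual_s=100.0)
        print(finished, round(dcf, 3), round(corrected_estimate(1000.0, dcf), 1))
```

Under this assumed rule the corrected estimate is still several times too large after ten finished tasks, which is one way a client could keep predicting missed deadlines for a while.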

Grenadier Volunteer tester Send message Joined: 15 May 99 Posts: 63 Credit: 5,445,784 RAC: 0	Message 843792 - Posted: 22 Dec 2008, 20:20:13 UTC Yes, I realize that the DCF might fix things eventually. But the Scheduler really needs to be fixed to recognize this situation in the first place. It should not be getting into an infinite loop of task switching just because of EDF mode. ID: 843792 ·

Grenadier Volunteer tester Send message Joined: 15 May 99 Posts: 63 Credit: 5,445,784 RAC: 0	Message 844738 - Posted: 24 Dec 2008, 20:59:37 UTC I turned off CUDA due to this and the other problems seen and posted about here (video driver crashes, BSOD, 'sparkly' screen). The odd task switching remains, even with non-CUDA projects as the only active ones. Something seems to be broken with the scheduler with this new Manager release, regardless of CUDA. ID: 844738 ·

Grenadier Volunteer tester Send message Joined: 15 May 99 Posts: 63 Credit: 5,445,784 RAC: 0	Message 867131 - Posted: 19 Feb 2009, 19:34:04 UTC Any news on this? I'm still seeing it even with the most recent dev releases. I am no longer even running CUDA, due to suspicions that it fried a video card on me. Could the CUDA app have left a bad setting in client_state.xml that could be causing this? ID: 867131 ·
