Fast task switching


Questions and Answers : GPU applications : Fast task switching

Grenadier
Volunteer tester
Joined: 15 May 99
Posts: 63
Credit: 5,297,930
RAC: 1,291
United States
Message 843374 - Posted: 22 Dec 2008, 2:06:17 UTC

Since installing the new BOINC release and receiving CUDA work, I've noticed that the other 3 CPUs rapidly cycle through tasks, restarting each one every few seconds, as if the Manager cannot decide which tasks to run on them. My log looks like this:

12/21/2008 9:05:04 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557
12/21/2008 9:05:05 PM|Spinhenge@home|Restarting task 9_Fe30_map_237_753_1 using metropolis version 312
12/21/2008 9:05:06 PM|QMC@HOME|Restarting task two_prism_w6-ecp2-QZ-b3lyp.7924_0 using Amolqc-preRC1 version 501
12/21/2008 9:05:08 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557
12/21/2008 9:05:09 PM|Spinhenge@home|Restarting task 9_Fe30_map_237_753_1 using metropolis version 312
12/21/2008 9:05:10 PM|QMC@HOME|Restarting task two_prism_w6-ecp2-QZ-b3lyp.7924_0 using Amolqc-preRC1 version 501
12/21/2008 9:05:11 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557
12/21/2008 9:05:12 PM|Spinhenge@home|Restarting task 9_Fe30_map_237_753_1 using metropolis version 312
12/21/2008 9:05:13 PM|QMC@HOME|Restarting task two_prism_w6-ecp2-QZ-b3lyp.7924_0 using Amolqc-preRC1 version 501
12/21/2008 9:05:14 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557
12/21/2008 9:05:16 PM|Spinhenge@home|Restarting task 9_Fe30_map_237_753_1 using metropolis version 312
12/21/2008 9:05:17 PM|QMC@HOME|Restarting task two_prism_w6-ecp2-QZ-b3lyp.7924_0 using Amolqc-preRC1 version 501
12/21/2008 9:05:18 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557
12/21/2008 9:05:19 PM|Spinhenge@home|Restarting task 9_Fe30_map_237_753_1 using metropolis version 312
12/21/2008 9:05:20 PM|QMC@HOME|Restarting task two_prism_w6-ecp2-QZ-b3lyp.7924_0 using Amolqc-preRC1 version 501

The 4th CPU seems to be feeding the GPU properly and runs continuously. What can be done to stop all this wasteful task switching?
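A quick way to confirm this kind of thrashing in your own message log is to count how often each task is restarted within a short window. This is a hypothetical helper (`find_thrashing_tasks` and its thresholds are my own assumptions, not part of BOINC); it assumes the `date time AM/PM|project|message` line layout shown in the excerpt above.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Timestamp layout used in the message log excerpt, e.g. "12/21/2008 9:05:04 PM".
TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"
PREFIX = "Restarting task "


def find_thrashing_tasks(lines, min_restarts=3, window=timedelta(seconds=60)):
    """Return {(project, task): count} for tasks restarted at least
    `min_restarts` times within any `window`-long stretch of the log."""
    restarts = defaultdict(list)
    for line in lines:
        parts = line.strip().split("|", 2)
        if len(parts) != 3 or not parts[2].startswith(PREFIX):
            continue  # not a "Restarting task" message
        stamp, project, message = parts
        task = message[len(PREFIX):].split(" using ")[0]
        restarts[(project, task)].append(datetime.strptime(stamp, TIME_FORMAT))

    flagged = {}
    for key, times in restarts.items():
        times.sort()
        start = best = 0
        # Sliding window: largest number of restarts that fit inside `window`.
        for end in range(len(times)):
            while times[end] - times[start] > window:
                start += 1
            best = max(best, end - start + 1)
        if best >= min_restarts:
            flagged[key] = best
    return flagged
```

Fed the fifteen log lines above, it should flag all three CPU tasks, each restarted five times within 15 seconds.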
____________

Grenadier
Volunteer tester
Joined: 15 May 99
Posts: 63
Credit: 5,297,930
RAC: 1,291
United States
Message 843702 - Posted: 22 Dec 2008, 16:57:11 UTC

This looks like it may be tied to the problem someone else mentioned about units running at high priority. Basically, the Manager decides that the CUDA WUs are going to miss their deadlines, even though they aren't, and goes into EDF (earliest deadline first) mode. It then tries to switch tasks to let the CUDA jobs run, but since there isn't another GPU available, it gets stuck in an endless loop of switching between the non-CUDA jobs.
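The failure mode described here can be illustrated with a much-simplified deadline check. This is not BOINC's actual scheduling code: the `Task` fields, the FIFO simulation, and the numbers are assumptions chosen only to show how an inflated runtime estimate (raw estimate times DCF) can make a queue look late when it is not.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    estimated_seconds: float  # raw runtime estimate (hypothetical value)
    deadline_seconds: float   # seconds from now until the report deadline
    fraction_done: float = 0.0


def predicted_remaining(task, dcf):
    # Simplified model: scale the raw estimate by the project's
    # duration correction factor (DCF) and by the work left to do.
    return task.estimated_seconds * dcf * (1.0 - task.fraction_done)


def needs_edf(queue, dcf, instances=1):
    """Return True if running `queue` in FIFO order on `instances`
    devices is predicted to miss any deadline."""
    finish = [0.0] * instances
    for task in queue:
        i = finish.index(min(finish))  # device that frees up first
        finish[i] += predicted_remaining(task, dcf)
        if finish[i] > task.deadline_seconds:
            return True
    return False
```

For example, 20 CUDA tasks carrying a CPU-sized estimate of 3 hours each and a 48-hour deadline look late on one GPU (60 hours of predicted work), while the same queue with a DCF of 0.1 fits easily.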

Suggested fix: correct the run-time estimates or deadlines on the CUDA jobs, and/or make the Manager recognize this situation and avoid the loop in the first place. Basically, the scheduler now needs separate queues: one for CUDA jobs and one for everything else.
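A sketch of the separate-queue idea, assuming each task is tagged with the resource it needs. The `schedule` function and its inputs are hypothetical; the point is that earliest-deadline-first ordering is applied only inside the pool that is at risk, so a GPU deadline scare never evicts CPU tasks.

```python
def schedule(tasks, free_slots, at_risk):
    """Choose which tasks to run, keeping one queue per resource type.

    tasks:      list of (name, resource, deadline) tuples; resource is "cpu" or "gpu"
    free_slots: runnable slots per resource, e.g. {"cpu": 3, "gpu": 1}
    at_risk:    set of resource types whose own queue is predicted to miss deadlines
    """
    chosen = []
    for resource, slots in free_slots.items():
        pool = [t for t in tasks if t[1] == resource]
        if resource in at_risk:
            # Earliest deadline first, but only within this resource's queue.
            pool.sort(key=lambda t: t[2])
        # Otherwise keep the original (FIFO) order.
        chosen.extend(name for name, _, _ in pool[:slots])
    return chosen
```

With three CPU tasks, two GPU tasks, and only the GPU queue at risk, the three CPU tasks keep running in their usual order and only the GPU slot switches to the earliest-deadline CUDA task.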
____________

Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1151
Credit: 3,936,993
RAC: 0
United States
Message 843723 - Posted: 22 Dec 2008, 17:43:52 UTC - in response to Message 843702.
Last modified: 22 Dec 2008, 17:51:35 UTC

It's also possible to take it out of high priority by reducing the DCF (duration correction factor), and this might stop the task switching as well. For me, though, when I run more than one project now, I have to suspend one project's tasks for a while so the other project can use the CPU, then resume them and suspend the other project. Otherwise, because my system is a single core with an edited cc_config file, the client tries to run two projects' tasks on the one CPU.
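The manual workaround of alternating projects can be scripted with `boinccmd`. This sketch assumes `boinccmd` is on the PATH and accepts `--project URL suspend` and `--project URL resume` (check `boinccmd --help` for your client version); the project URLs, the one-hour period, and the `alternate_projects` helper are placeholders. It defaults to a dry run that only returns the commands it would issue.

```python
import subprocess
import time


def alternate_projects(url_a, url_b, period_seconds=3600, cycles=1, dry_run=True):
    """Let one project run at a time by suspending the other via boinccmd.

    Returns the list of commands issued (or, in dry-run mode, planned).
    """
    issued = []
    running, paused = url_a, url_b
    for _ in range(cycles):
        for cmd in (["boinccmd", "--project", paused, "suspend"],
                    ["boinccmd", "--project", running, "resume"]):
            issued.append(cmd)
            if not dry_run:
                subprocess.run(cmd, check=True)
        if not dry_run:
            time.sleep(period_seconds)  # let the active project use the CPU
        running, paused = paused, running  # swap roles for the next cycle
    return issued
```

Suspending at the project level pauses all of that project's tasks, which mirrors the by-hand approach described above.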
____________

Eric Korpela
Project donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1088
Credit: 8,777,767
RAC: 11,804
United States
Message 843728 - Posted: 22 Dec 2008, 17:55:46 UTC - in response to Message 843723.

I've reported this problem to the BOINC development team. Hopefully they will come up with a solution.


____________

Grenadier
Volunteer tester
Joined: 15 May 99
Posts: 63
Credit: 5,297,930
RAC: 1,291
United States
Message 843743 - Posted: 22 Dec 2008, 18:32:29 UTC

Thanks, Eric. For now, I end up managing my own scheduling. I suppose I could turn off SETI, but that would defeat the purpose of the new CUDA functions. ;-)
____________

Eric Korpela
Project donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1088
Credit: 8,777,767
RAC: 11,804
United States
Message 843746 - Posted: 22 Dec 2008, 18:35:33 UTC - in response to Message 843743.

It may be that eventually your "duration correction factor" will change to accommodate the shorter run times. (But the BOINC fix shouldn't rely on that, if I have anything to say about it.)
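The DCF adjusts slowly largely because its update is asymmetric: the factor rises quickly when tasks run longer than predicted but falls only gradually when they finish early. The sketch below assumes update constants modeled on BOINC clients of this era (jump straight up when a task runs more than 10% over, otherwise move 10% of the way toward the observed ratio, or only 1% when a task finishes in under a tenth of its corrected estimate); treat the constants and the `update_dcf` name as illustrative, not authoritative.

```python
def update_dcf(dcf, estimated_seconds, actual_seconds):
    """One DCF update after a task completes (assumed asymmetric rule)."""
    raw_ratio = actual_seconds / estimated_seconds          # vs. uncorrected estimate
    adj_ratio = actual_seconds / (estimated_seconds * dcf)  # vs. corrected estimate
    if adj_ratio > 1.1:
        dcf = raw_ratio                      # ran long: raise the factor at once
    elif adj_ratio < 0.1:
        dcf = 0.99 * dcf + 0.01 * raw_ratio  # far too short: barely trust it
    else:
        dcf = 0.9 * dcf + 0.1 * raw_ratio    # moderately short: lower it slowly
    return min(max(dcf, 0.01), 100.0)        # keep the factor in a sane range
```

If CUDA tasks finish in 1/12 of a 3-hour CPU-based estimate, roughly the first 20 completions each move the DCF only 1% of the way toward the true ratio, which is why it takes a while to adapt.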

____________

Geek@Play
Project donor
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,095,957
RAC: 24,095
United States
Message 843754 - Posted: 22 Dec 2008, 18:58:05 UTC

Perhaps there should be a DCF for each project one joins?

____________
Boinc....Boinc....Boinc....Boinc....

Eric Korpela
Project donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1088
Credit: 8,777,767
RAC: 11,804
United States
Message 843769 - Posted: 22 Dec 2008, 19:18:02 UTC - in response to Message 843754.

"Perhaps there should be a DCF for each project one joins?"


There is. It takes some time for it to adjust, though.

Eric
____________

Grenadier
Volunteer tester
Joined: 15 May 99
Posts: 63
Credit: 5,297,930
RAC: 1,291
United States
Message 843792 - Posted: 22 Dec 2008, 20:20:13 UTC

Yes, I realize that the DCF might fix things eventually. But the Scheduler really needs to be fixed to recognize this situation in the first place. It should not be getting into an infinite loop of task switching just because of EDF mode.
____________

Grenadier
Volunteer tester
Joined: 15 May 99
Posts: 63
Credit: 5,297,930
RAC: 1,291
United States
Message 844738 - Posted: 24 Dec 2008, 20:59:37 UTC

I turned off CUDA due to this and the other problems seen and posted about here (video driver crashes, BSOD, 'sparkly' screen). The odd task switching remains, even with non-CUDA projects as the only active ones. Something seems to be broken with the scheduler with this new Manager release, regardless of CUDA.
____________

Grenadier
Volunteer tester
Joined: 15 May 99
Posts: 63
Credit: 5,297,930
RAC: 1,291
United States
Message 867131 - Posted: 19 Feb 2009, 19:34:04 UTC

Any news on this? I'm still seeing it even with the most recent dev releases. I am no longer even running CUDA, due to suspicions that it fried a video card on me. Could the CUDA app have left a bad setting in client_state.xml that could be causing this?
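One way to check whether a stale value is lingering is to read the per-project DCF stored in client_state.xml. This sketch assumes the file's `<project>` entries carry `<master_url>` and `<duration_correction_factor>` elements (verify against your own file); the `project_dcfs` helper is hypothetical. It only reads the file; any hand edits should be made with the BOINC client shut down.

```python
import xml.etree.ElementTree as ET


def project_dcfs(root):
    """Map each project's master URL to the DCF stored under it.

    `root` is the parsed client_state.xml root element.
    """
    result = {}
    for project in root.iter("project"):
        url = (project.findtext("master_url") or "").strip()
        dcf = project.findtext("duration_correction_factor")
        if url and dcf is not None:
            result[url] = float(dcf)
    return result
```

Run from the BOINC data directory as `project_dcfs(ET.parse("client_state.xml").getroot())`; an unusually small or large value for one project would be the first thing to look at.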
____________

Copyright © 2014 University of California