Fast task switching

Questions and Answers : GPU applications : Fast task switching
Grenadier
Volunteer tester
Joined: 15 May 99
Posts: 63
Credit: 5,445,784
RAC: 0
United States
Message 843374 - Posted: 22 Dec 2008, 2:06:17 UTC

Since installing the new BOINC version and receiving CUDA work, I've noticed that the other 3 CPUs are rapidly cycling through tasks, as if the Manager cannot decide which tasks to run on them. My log looks like this:

12/21/2008 9:05:04 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557
12/21/2008 9:05:05 PM|Spinhenge@home|Restarting task 9_Fe30_map_237_753_1 using metropolis version 312
12/21/2008 9:05:06 PM|QMC@HOME|Restarting task two_prism_w6-ecp2-QZ-b3lyp.7924_0 using Amolqc-preRC1 version 501
12/21/2008 9:05:08 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557
12/21/2008 9:05:09 PM|Spinhenge@home|Restarting task 9_Fe30_map_237_753_1 using metropolis version 312
12/21/2008 9:05:10 PM|QMC@HOME|Restarting task two_prism_w6-ecp2-QZ-b3lyp.7924_0 using Amolqc-preRC1 version 501
12/21/2008 9:05:11 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557
12/21/2008 9:05:12 PM|Spinhenge@home|Restarting task 9_Fe30_map_237_753_1 using metropolis version 312
12/21/2008 9:05:13 PM|QMC@HOME|Restarting task two_prism_w6-ecp2-QZ-b3lyp.7924_0 using Amolqc-preRC1 version 501
12/21/2008 9:05:14 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557
12/21/2008 9:05:16 PM|Spinhenge@home|Restarting task 9_Fe30_map_237_753_1 using metropolis version 312
12/21/2008 9:05:17 PM|QMC@HOME|Restarting task two_prism_w6-ecp2-QZ-b3lyp.7924_0 using Amolqc-preRC1 version 501
12/21/2008 9:05:18 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557
12/21/2008 9:05:19 PM|Spinhenge@home|Restarting task 9_Fe30_map_237_753_1 using metropolis version 312
12/21/2008 9:05:20 PM|QMC@HOME|Restarting task two_prism_w6-ecp2-QZ-b3lyp.7924_0 using Amolqc-preRC1 version 501

The 4th CPU seems to be feeding the GPU properly, and runs continuously. What can be done to stop all this wasteful task switching?
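One way to confirm this kind of thrashing from a log like the one above is to count how often each task is restarted within a short window. The helper below is a hypothetical Python sketch, not part of BOINC. It assumes the `date time AM/PM|project|message` line layout shown in the excerpt, and the `window_s` and `min_restarts` thresholds are arbitrary choices.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumes the "M/D/YYYY H:MM:SS AM|project|message" layout of the log excerpt.
LINE_RE = re.compile(
    r"^(?P<ts>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)\|[^|]+\|"
    r"Restarting task (?P<task>\S+)"
)


def find_thrashing(lines, window_s=60, min_restarts=3):
    """Return {task_name: max_restarts_in_window} for every task that was
    restarted at least min_restarts times within some window_s-second span."""
    restarts = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%m/%d/%Y %I:%M:%S %p")
            restarts[m.group("task")].append(ts)

    flagged = {}
    for task, times in restarts.items():
        times.sort()
        start = best = 0
        # Sliding window over the sorted restart times.
        for end in range(len(times)):
            while (times[end] - times[start]).total_seconds() > window_s:
                start += 1
            best = max(best, end - start + 1)
        if best >= min_restarts:
            flagged[task] = best
    return flagged


if __name__ == "__main__":
    sample = [
        "12/21/2008 9:05:04 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557",
        "12/21/2008 9:05:05 PM|Spinhenge@home|Restarting task 9_Fe30_map_237_753_1 using metropolis version 312",
        "12/21/2008 9:05:08 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557",
        "12/21/2008 9:05:11 PM|malariacontrol.net|Restarting task wu_409_47454749_1_2_1229710930_1 using malariacontrol version 557",
    ]
    # The malariacontrol task is restarted 3 times in 7 seconds and gets flagged.
    print(find_thrashing(sample))
```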
Grenadier
Volunteer tester
Joined: 15 May 99
Posts: 63
Credit: 5,445,784
RAC: 0
United States
Message 843702 - Posted: 22 Dec 2008, 16:57:11 UTC

Looks like this may be tied to the problem someone else mentioned about work units running at high priority. Basically, the Manager decides that the CUDA WUs are going to miss their deadlines, even though they aren't, and goes into EDF (earliest deadline first) mode. It then tries to switch tasks so the CUDA jobs can run. But since there isn't another GPU available, it gets stuck in an infinite loop of task switching between the non-CUDA jobs.
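To make the suspected mechanism concrete: if the runtime estimate attached to the CUDA work is far too high and the DCF has not caught up yet, a simple earliest-deadline-first simulation will predict missed deadlines that never actually happen. The toy Python model below illustrates that check. It is not BOINC's actual scheduler code; the `Task` fields, the 3-hour estimate, and the 24-hour deadline are illustrative assumptions.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    est_seconds: float          # runtime estimate supplied with the work unit
    fraction_done: float        # 0.0 .. 1.0
    seconds_to_deadline: float  # time left until the report deadline


def predicted_remaining(task, dcf):
    # The raw estimate is scaled by the project's duration correction factor.
    return task.est_seconds * dcf * (1.0 - task.fraction_done)


def misses_deadline(tasks, dcf, n_devices=1):
    """Simulate running tasks earliest-deadline-first on n_devices identical
    devices; return True if any task is predicted to finish late."""
    free_at = [0.0] * n_devices  # time at which each device becomes free
    for task in sorted(tasks, key=lambda t: t.seconds_to_deadline):
        start = heapq.heappop(free_at)
        finish = start + predicted_remaining(task, dcf)
        if finish > task.seconds_to_deadline:
            return True
        heapq.heappush(free_at, finish)
    return False


if __name__ == "__main__":
    # Ten GPU tasks, each estimated at 3 hours, all due in 24 hours, one GPU.
    queue = [Task(f"cuda_{i}", 3 * 3600, 0.0, 24 * 3600) for i in range(10)]
    print(misses_deadline(queue, dcf=1.0))  # inflated estimate: True
    print(misses_deadline(queue, dcf=0.1))  # corrected estimate: False
```

With only one GPU, the ninth task is predicted to finish at 27 hours, so the whole queue is flagged as urgent even if each task really takes a fraction of its estimate.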

Suggested fix: correct the run-time estimates or expiry dates on the CUDA jobs, and/or make the Manager recognize this situation and avoid the loop in the first place. Basically, the scheduler now needs separate queues: one for CUDA jobs and one for everything else.
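The separate-queue idea could look roughly like the Python sketch below. It is hypothetical and not how the BOINC client is actually written; `Job`, `pick_running`, and the slot counts are illustrative names and values. Each resource type is filled only from its own queue, so deadline pressure on GPU work can reorder GPU jobs but cannot bump CPU jobs.

```python
from collections import defaultdict
from typing import NamedTuple


class Job(NamedTuple):
    name: str
    resource: str        # "cpu" or "gpu"
    deadline: float      # seconds from now
    high_priority: bool  # flagged as in danger of missing its deadline


def pick_running(jobs, slots):
    """Choose which jobs run, filling each resource from its own queue.

    Urgent jobs go first, then earliest deadline. Because queues are
    per resource, an urgent GPU job can only displace other GPU jobs."""
    queues = defaultdict(list)
    for job in jobs:
        queues[job.resource].append(job)
    running = {}
    for resource, n in slots.items():
        ordered = sorted(queues[resource], key=lambda j: (not j.high_priority, j.deadline))
        running[resource] = [j.name for j in ordered[:n]]
    return running


if __name__ == "__main__":
    jobs = [
        Job("cuda_a", "gpu", 10, True),
        Job("cuda_b", "gpu", 5, True),
        Job("malaria", "cpu", 100, False),
        Job("spinhenge", "cpu", 200, False),
        Job("qmc", "cpu", 300, False),
    ]
    # Three CPU slots for CPU work (the fourth core feeds the GPU) and one GPU.
    print(pick_running(jobs, {"cpu": 3, "gpu": 1}))
```

Even with every GPU job marked high priority, the three CPU jobs keep their slots, which is the behaviour the log above is missing.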
Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 843723 - Posted: 22 Dec 2008, 17:43:52 UTC - in response to Message 843702.  
Last modified: 22 Dec 2008, 17:51:35 UTC

It's also possible to take it out of high priority by reducing the DCF (duration correction factor), and this might stop the task switching too. For me, though, when I run more than one project now, I have to suspend one project's tasks for a while so the other project can use the CPU, and then swap: resume them and suspend the other. Otherwise, because my system is a single core with an edited cc_config file, it tries to run two projects' tasks on the one CPU.
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 843728 - Posted: 22 Dec 2008, 17:55:46 UTC - in response to Message 843723.  

I've reported this problem to the BOINC development team. Hopefully they will come up with a solution.


@SETIEric@qoto.org (Mastodon)

Grenadier
Volunteer tester
Joined: 15 May 99
Posts: 63
Credit: 5,445,784
RAC: 0
United States
Message 843743 - Posted: 22 Dec 2008, 18:32:29 UTC

Thanks, Eric. For now, I end up managing my own scheduling. I suppose I could turn off SETI, but that would defeat the purpose of the new CUDA functions. ;-)
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 843746 - Posted: 22 Dec 2008, 18:35:33 UTC - in response to Message 843743.  

It may be that your "duration correction factor" will eventually adjust to accommodate the shorter run times. (But the BOINC fix shouldn't rely on that, if I have anything to say about it.)

@SETIEric@qoto.org (Mastodon)

Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 843754 - Posted: 22 Dec 2008, 18:58:05 UTC

Perhaps there should be a DCF for each project one joins?

Boinc....Boinc....Boinc....Boinc....
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 843769 - Posted: 22 Dec 2008, 19:18:02 UTC - in response to Message 843754.  

Perhaps there should be a DCF for each project one joins?


There is. It takes some time for it to adjust, though.

Eric
@SETIEric@qoto.org (Mastodon)
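Why a per-project DCF "takes some time to adjust" can be seen with a simple smoothing rule. The update rule in this Python sketch is an assumption for illustration (raise the DCF immediately when a task overruns, otherwise close 10% of the gap per completed result); BOINC's exact formula may differ, and `update_dcf` and `results_until` are hypothetical names.

```python
def update_dcf(dcf, actual_s, estimated_s, alpha=0.1):
    """Update a per-project duration correction factor after one result.

    Assumed rule: if the task ran longer than the corrected estimate, raise
    the DCF at once (underestimates risk missed deadlines); otherwise move
    it only a fraction alpha of the way toward the observed ratio."""
    ratio = actual_s / estimated_s
    if ratio > dcf:
        return ratio
    return (1.0 - alpha) * dcf + alpha * ratio


def results_until(dcf, ratio, target, alpha=0.1):
    """Count how many results with actual/estimate == ratio it takes
    for the DCF to fall to target or below."""
    if dcf > target and ratio >= target:
        raise ValueError("DCF cannot fall to target when ratio >= target")
    n = 0
    while dcf > target:
        dcf = update_dcf(dcf, ratio, 1.0, alpha)
        n += 1
    return n


if __name__ == "__main__":
    # Tasks that finish in a tenth of their estimated time.
    print(results_until(1.0, 0.1, 0.2))  # -> 21
```

Under this assumed rule, a project whose tasks finish in a tenth of their estimated time needs 21 completed results before its DCF drops from 1.0 to 0.2, and until then the client keeps overestimating the queue.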

Grenadier
Volunteer tester
Joined: 15 May 99
Posts: 63
Credit: 5,445,784
RAC: 0
United States
Message 843792 - Posted: 22 Dec 2008, 20:20:13 UTC

Yes, I realize that the DCF might fix things eventually. But the Scheduler really needs to be fixed to recognize this situation in the first place. It should not be getting into an infinite loop of task switching just because of EDF mode.
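A scheduler can avoid this kind of loop even while in EDF mode by refusing preemptions that cannot help and by giving each task a minimum time slice. The Python sketch below shows both guards; it is a hypothetical illustration, not BOINC code, and `RunningTask`, `may_preempt`, and the 3600-second slice are arbitrary assumptions.

```python
from typing import NamedTuple


class RunningTask(NamedTuple):
    name: str
    resource: str      # "cpu" or "gpu"
    started_at: float  # seconds since some reference time


def may_preempt(victim, needed_resource, now, min_slice_s=3600.0):
    """Decide whether a running task may be preempted for an urgent one.

    Two guards against thrashing:
    1. Never preempt a task on a different resource type: stopping a
       CPU task cannot free a GPU.
    2. Let a task keep its slot for at least min_slice_s seconds."""
    if victim.resource != needed_resource:
        return False
    return now - victim.started_at >= min_slice_s


if __name__ == "__main__":
    malaria = RunningTask("malaria", "cpu", started_at=0.0)
    print(may_preempt(malaria, "gpu", now=1.0))     # False: wrong resource
    print(may_preempt(malaria, "cpu", now=1.0))     # False: ran only 1 s
    print(may_preempt(malaria, "cpu", now=4000.0))  # True
```

Either guard alone would have stopped the one-second restart cycle in the original log, since every restarted task there was a CPU task that had only just started.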
Grenadier
Volunteer tester
Joined: 15 May 99
Posts: 63
Credit: 5,445,784
RAC: 0
United States
Message 844738 - Posted: 24 Dec 2008, 20:59:37 UTC

I turned off CUDA because of this and the other problems seen and posted about here (video driver crashes, BSODs, a 'sparkly' screen). The odd task switching remains, even with only non-CUDA projects active. Something seems to be broken in the scheduler in this new Manager release, regardless of CUDA.
Grenadier
Volunteer tester
Joined: 15 May 99
Posts: 63
Credit: 5,445,784
RAC: 0
United States
Message 867131 - Posted: 19 Feb 2009, 19:34:04 UTC

Any news on this? I'm still seeing it even with the most recent dev releases. I am no longer even running CUDA, due to suspicions that it fried a video card on me. Could the CUDA app have left a bad setting in client_state.xml that could be causing this?

 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.