CUDA apps continue to run on CPU in suspend mode



Questions and Answers : GPU applications : CUDA apps continue to run on CPU in suspend mode

Scott S Leach
Joined: 5 Apr 12
Posts: 4
Credit: 87,163
RAC: 134
United States
Message 1225758 - Posted: 1 May 2012, 14:41:04 UTC

My CUDA apps will not stop running. They normally run on the GPU until I or a setting puts BOINC into suspend; then the CUDA app just runs on the CPU and ignores (violates) my CPU % usage rules as well. This also happens if Windows puts the monitor to sleep.

I noticed this the other night when I heard the CPU fan screaming all the way across the house. My CPUs are set to 20% max, but when the monitor went to sleep the CUDA app started running on both CPUs at once, at a constant 60% to 100%. To alleviate this I just set Windows not to put the monitor to sleep.

Then today I noticed that while an exclusive app was running, the system was bogging down. I looked, and again, although BOINC was supposedly suspended, the CUDA app was running on both CPUs, but this time only at about 30%.

Is anyone else having these issues? Is there a fix for this?

BilBg
Volunteer tester
Joined: 27 May 07
Posts: 2894
Credit: 6,630,690
RAC: 8,071
Bulgaria
Message 1225898 - Posted: 1 May 2012, 23:01:47 UTC - in response to Message 1225758.

This also happens if Windows puts the monitor to sleep.

Why "also"? It should happen only when "Windows puts the monitor to sleep", due to a bug in NVIDIA driver 296.10:

http://setiathome.berkeley.edu/forum_thread.php?id=67844&nowrap=true#1223973


____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

Scott S Leach
Joined: 5 Apr 12
Posts: 4
Credit: 87,163
RAC: 134
United States
Message 1225906 - Posted: 1 May 2012, 23:26:12 UTC - in response to Message 1225898.

Because that is not the only time it happens.

The bigger concern is when it happens under suspension and then proceeds to ignore other settings and cook my CPU.

I have worked around the sleep bug, but have found no workaround for what I will call the persistence bug, except to stop running CUDA for the time being and see if another project plays nicer with the GPU.

BilBg
Volunteer tester
Joined: 27 May 07
Posts: 2894
Credit: 6,630,690
RAC: 8,071
Bulgaria
Message 1225942 - Posted: 2 May 2012, 0:34:03 UTC - in response to Message 1225906.


What do you mean by "happens under suspension"?
(And where do you look to see what happens?)

When you say "cook my CPU" - is this a laptop? What is the CPU temp?

OK, try "another project" (one that runs on CUDA/GPU); I will be surprised if "this" depends on the project.

The CUDA app (of any project) will either:
- error out/crash the task (when CUDA disappears)
- stop computing until CUDA is available again
- "fall back on host CPU processing", as the SETI app does
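To make the three outcomes above concrete, here is a minimal Python sketch of the decision a CUDA app faces when its device disappears. The names `NoDeviceAction` and `choose_compute_path` are hypothetical and do not come from any project's source code; the point is only that the third policy keeps the task running, now on the CPU.

```python
from enum import Enum


class NoDeviceAction(Enum):
    """Hypothetical policies for a GPU app whose CUDA device is unavailable."""
    ERROR_OUT = "error out / crash the task"
    WAIT = "stop computing until CUDA is available again"
    CPU_FALLBACK = "fall back on host CPU processing"


def choose_compute_path(cuda_device_usable: bool, policy: NoDeviceAction) -> str:
    """Return where the task computes next under the given (hypothetical) policy."""
    if cuda_device_usable:
        return "gpu"
    if policy is NoDeviceAction.ERROR_OUT:
        # The task ends with an error instead of computing anywhere
        raise RuntimeError("No usable CUDA device; task errors out")
    if policy is NoDeviceAction.WAIT:
        # Nothing is computed until the device comes back
        return "waiting"
    # CPU fallback: the task keeps running, but now entirely on the CPU
    return "cpu"
```

With the fallback policy, a task that lost its GPU still looks "busy" to the outside world, which is what shows up as high CPU usage.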



Scott S Leach
Joined: 5 Apr 12
Posts: 4
Credit: 87,163
RAC: 134
United States
Message 1226211 - Posted: 2 May 2012, 15:29:44 UTC - in response to Message 1225942.

Thank you for your response, but you have overlooked the real problem.

1) That is, that the CUDA applications fail to stop running (entirely) when BOINC is in "suspend" mode (whether I put BOINC into suspend or a rule puts it into suspend, it does not matter). Instead of stopping completely, the CUDA app quits running on the GPU and begins running on the CPU.

2) When CUDA applications are running on the CPU during times that they are not supposed to be running at all, they ignore the CPU settings of "max 20% CPU", and they use up to 100% on all cores.

So far there has been no damage because I have good cooling and I was nearby and I heard the fans running at full speed and I intervened (I do not know what the temp was). Other may not have the same luck.

I then tried several troubleshooting steps and the condition continues to repeat itself.

I do suspect that it is an issue with CUDA, which is why I posted it here, but you could be right and it could be an issue with GPU tasks in general, which is why I am now trying to see whether another project does the same thing. I aborted all SETI CUDA tasks and am now waiting for Einstein to assign a GPU task. So far no new GPU tasks are in the queue, so I am on hold.

BilBg
Volunteer tester
Joined: 27 May 07
Posts: 2894
Credit: 6,630,690
RAC: 8,071
Bulgaria
Message 1226271 - Posted: 2 May 2012, 18:10:29 UTC - in response to Message 1226211.
Last modified: 2 May 2012, 18:39:28 UTC

when BOINC is in "suspend" mode (whether I put BOINC into suspend or a rule puts it into suspend, it does not matter).

So you either:
- from icon menu choose Snooze or Snooze GPU
http://boinc.berkeley.edu/wiki/The_BOINC_Manager#The_BOINC_Manager_icon_and_menu

- from Activity Menu choose Suspend or Suspend GPU
http://boinc.berkeley.edu/wiki/Advanced_view#BOINC_Manager_Menus

- use cc_config.xml with <exclusive_gpu_app>important.exe</exclusive_gpu_app>
http://boinc.berkeley.edu/wiki/Client_configuration
http://www.boinc-wiki.info/Cc_config.xml
http://boincfaq.mundayweb.com/index.php?language=1&view=91

- have the preference "Suspend GPU work while computer is in use" (or a similar one) set
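For the cc_config.xml route, a minimal file only needs the `<exclusive_gpu_app>` element nested inside `<options>` within `<cc_config>` (check the Client configuration page linked above for your BOINC version). The Python sketch below just writes that structure; `important.exe` is the placeholder name from the list above, and the helper `build_cc_config` is hypothetical. The resulting file goes in the BOINC data directory.

```python
import xml.etree.ElementTree as ET


def build_cc_config(exclusive_gpu_apps: list[str]) -> str:
    """Build a minimal cc_config.xml that suspends GPU work while any listed program runs."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for exe in exclusive_gpu_apps:
        # One <exclusive_gpu_app> element per program name
        ET.SubElement(options, "exclusive_gpu_app").text = exe
    return ET.tostring(root, encoding="unicode")


print(build_cc_config(["important.exe"]))
# <cc_config><options><exclusive_gpu_app>important.exe</exclusive_gpu_app></options></cc_config>
```

Listing several programs just repeats the element, one per .exe.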


Instead of stopping completely, the CUDA app quits running on the GPU and begins running on the CPU.

Strange behavior.
So you see in Windows Task Manager or Process Explorer that the CUDA app .exe starts using 100% of a core (~50% CPU in your case)?

If this happens again, look in stderr.txt (in the slots directory) to see whether there are messages similar to these:
setiathome_CUDA: No CUDA devices found
setiathome_CUDA: Found 0 CUDA device(s):
setiathome_CUDA: CUDA Device 1 specified, checking...
Device cannot be used
SETI@home NOT using CUDA, falling back on host CPU processing
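Rather than opening each stderr.txt by hand, a small Python sketch like the one below can search for the fallback messages quoted above (plus the "emulation device" message that comes up later in this thread). The function names and the assumption that each slot folder holds its own stderr.txt are for illustration only; the marker strings are the only part taken from actual logs.

```python
from pathlib import Path

# Marker strings taken from the stderr.txt messages quoted in this thread
FALLBACK_MARKERS = (
    "No CUDA devices found",
    "Device cannot be used",
    "falling back on host CPU processing",
    "emulation device",
)


def find_fallback_lines(stderr_text: str) -> list[str]:
    """Return the lines of a task's stderr output that suggest a CPU fallback."""
    return [
        line.strip()
        for line in stderr_text.splitlines()
        if any(marker in line for marker in FALLBACK_MARKERS)
    ]


def scan_slot(slot_dir: Path) -> list[str]:
    """Check stderr.txt in one slot directory (hypothetical layout); [] if absent."""
    stderr_path = slot_dir / "stderr.txt"
    if not stderr_path.is_file():
        return []
    return find_fallback_lines(stderr_path.read_text(errors="replace"))
```

For example, calling `scan_slot` on each subfolder of the BOINC slots directory lists any slot whose task has given up on CUDA.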


2) When CUDA applications are running on the CPU during times that they are not supposed to be running at all, they ignore the CPU settings of "max 20% CPU", and they use up to 100% on all cores.

Yes, that is expected, as BOINC applies the "max 20% CPU" setting only to CPU tasks/apps.

We may consider this a "missing feature" or a bug in BOINC, but I don't know whether it is possible for BOINC to detect this.
BOINC sees that the CUDA app is running and doesn't know that it has trouble using CUDA and is "falling back on host CPU processing".

In fact, there are projects with not very efficient GPU apps that "normally" use a good deal of CPU along with the computing on the GPU (so the "GPU app" has high CPU usage at all "normal" times).
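The reasoning above can be modelled with a short Python sketch, assuming "max 20% CPU" means a CPU-time limit enforced by running and suspending tasks in a duty cycle. The 10-second period, the `Task` type and both function names are assumptions for illustration, not BOINC's actual code or values. The key point is that a GPU task which has fallen back to the CPU is still tagged as a GPU task, so it is never selected for throttling.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    uses_gpu: bool  # set when the task starts; a CPU fallback does not change it


def throttled_tasks(tasks: list[Task]) -> list[str]:
    """Names of tasks the CPU-time limit applies to, per the description above."""
    return [t.name for t in tasks if not t.uses_gpu]


def duty_cycle(cpu_time_pct: float, period_s: float = 10.0) -> tuple[float, float]:
    """Split an assumed throttle period into (run, suspend) seconds."""
    if not 0 < cpu_time_pct <= 100:
        raise ValueError("cpu_time_pct must be in (0, 100]")
    run = period_s * cpu_time_pct / 100
    return run, period_s - run
```

Under these assumptions `duty_cycle(20)` gives 2 s of running and 8 s suspended for CPU tasks, while a "GPU" task computing on the CPU runs flat out.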


So far there has been no damage because I have good cooling and I was nearby and I heard the fans running at full speed and I intervened (I do not know what the temp was).

To "know what the temp is" use one or several of these Temperature Monitoring Programs:
http://setiathome.berkeley.edu/forum_thread.php?id=59292

Have you cleaned the dust/fur out of the computer/fan/heatsink lately?


Other may not have the same luck.

By "Other" do you mean "other people"?
Most people use 100% CPU all the time (for years), especially if they "have good cooling".

If this is not a laptop and the CPU can't run at 100% for a long time, you may in fact have a cooling problem (dust, dried thermal compound, fan lubrication, ...)
if the CPU temperature gets close to TJMax (e.g. within 5-10°C of TJMax).
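The "close to TJMax" rule of thumb is simple arithmetic: subtract the current temperature from the CPU's TJMax and compare the headroom with a margin. A minimal Python sketch follows; the function names and the example numbers are hypothetical, and the real TJMax must be looked up for your specific CPU (many of the temperature monitoring programs linked above show it).

```python
def tjmax_headroom(cpu_temp_c: float, tjmax_c: float) -> float:
    """Degrees Celsius left before the CPU reaches its TJMax."""
    return tjmax_c - cpu_temp_c


def near_tjmax(cpu_temp_c: float, tjmax_c: float, margin_c: float = 10.0) -> bool:
    """True if the temperature is within margin_c of TJMax (the post suggests 5-10 °C)."""
    return tjmax_headroom(cpu_temp_c, tjmax_c) <= margin_c
```

For example, with a hypothetical TJMax of 100°C, a reading of 92°C leaves 8°C of headroom and would count as a cooling problem under a 10°C margin.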



Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1226444 - Posted: 2 May 2012, 23:54:17 UTC

His first 3 CUDA units all fell back to the CPU; stderr reported "emulation device, unusable". I've seen that one before, and IIRC, it's the sleep bug in 296.10. Interestingly, the last two CUDA tasks he reported both completed normally, though the 9400GT took a while and restarted several times (his suspension settings, no doubt). Then he aborted the rest, so there's no way to know how they might have run.
____________

BilBg
Volunteer tester
Joined: 27 May 07
Posts: 2894
Credit: 6,630,690
RAC: 8,071
Bulgaria
Message 1226460 - Posted: 3 May 2012, 0:26:08 UTC - in response to Message 1226444.


One of the tasks that started with "device 1 is emulation device and should not be used" but then continued (Restarted at 21.09 percent) normally (i.e. "using CUDA accelerated device GeForce 9400 GT"):
http://setiathome.berkeley.edu/result.php?resultid=2418530649

An advanced search for "emulation device" over the last 6 months shows only 3 results (not counting the one in this thread), 2 of which are yours:
http://setiathome.berkeley.edu/forum_thread.php?id=67194&nowrap=true#1204021
http://setiathome.berkeley.edu/forum_thread.php?id=67225&nowrap=true#1207308
http://setiathome.berkeley.edu/forum_thread.php?id=65352&nowrap=true#1217254

I don't know why people go for the "latest" driver if the previous one worked OK.
I also don't think NVIDIA will add new features in new drivers for older hardware (such as the GeForce 9400 GT) that is no longer in production.

266.58 was the driver recommended for a long time by many users here.
Which is the new "best" version? (meaning - no bugs noticed by anyone here)



Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1226462 - Posted: 3 May 2012, 0:29:41 UTC - in response to Message 1226460.



I don't know why people go for the "latest" driver if the previous one worked OK.
I also don't think NVIDIA will add new features in new drivers for older hardware (such as the GeForce 9400 GT) that is no longer in production.

266.58 was the driver recommended for a long time by many users here.
Which is the new "best" version? (meaning - no bugs noticed by anyone here)



From what I hear, 301.24 is reasonably bug-free, at least for Fermi-class cards, plus, it's the default for Kepler cards.
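The driver reports collected in this thread can be kept as a small lookup table, sketched below in Python. The notes are forum hearsay from the posts above, not NVIDIA documentation, and the names `DRIVER_NOTES`, `parse_version` and `driver_note` are hypothetical. Versions are compared numerically by splitting on the dot.

```python
# Driver notes as reported in this thread (forum hearsay, not NVIDIA documentation)
DRIVER_NOTES = {
    "266.58": "recommended for a long time by many users here",
    "296.10": "sleep bug: CUDA device lost when Windows puts the monitor to sleep",
    "301.24": "reportedly reasonably bug-free, at least for Fermi-class cards",
}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a driver version such as '296.10' into (296, 10) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def driver_note(version: str) -> str:
    """Return what this thread says about a driver version, if anything."""
    return DRIVER_NOTES.get(version, "no reports in this thread")
```

Looking up a version nobody has commented on, such as 285.62, simply returns "no reports in this thread".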

Scott S Leach
Joined: 5 Apr 12
Posts: 4
Credit: 87,163
RAC: 134
United States
Message 1227004 - Posted: 4 May 2012, 4:07:04 UTC - in response to Message 1226462.

I kept the 285.62 driver in case I had issues with 296.10, which, if I am reading you guys correctly, it now seems I am having.

Is there any comment on the 285.62 driver?


Copyright © 2014 University of California