SETI@home v8.12 Windows GPU applications support thread
Author | Message |
---|---|
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Your GPU is on the low-performance path by default to reduce possible GUI lags. From the task output: "Low-performance GPU detected, default period_iterations_num set to 500" and "For low-performance GPU path use_sleep enabled with 5ms per iteration". If you see no lags but want to speed things up, disable sleep and increase performance by passing this tune string to the app: -no_defaults_scaling -period_iterations_num 200 -sbs 256 Look in the app ReadMe file in the project directory for the meaning of the options, and add the recommended tune line from the ReadMe to this one. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Harri Liljeroos Send message Joined: 29 May 99 Posts: 3987 Credit: 85,281,665 RAC: 126 |
I am using the 3472 build of SoG, which should help with application hangups when using the -use_sleep parameter. I downloaded it from Raistmer's site. Unfortunately the hangups still happen on my host, which has a GTX970 and a GTX650 Ti. Here is a link to the latest WU with the problem: http://setiathome.berkeley.edu/result.php?resultid=4999412070 The WU was started on the 650 Ti and finished on the 970 after I restarted BOINC. The Nvidia driver is 361.91 and the command line was -sbs 256 -period_iterations_num 30 -use_sleep_ex 4 -hp. I am running one WU at a time on both GPUs. This is what I have noticed about these hangups: - They happen when the driver stopped responding and was restarted (but not at every restart). - They happen only on my 650 Ti when doing Arecibo WUs (but not on every WU). - The SoG application does not terminate if you stop BOINC; you have to kill the process manually to recover the WU, otherwise it stays locked and cannot be accessed when you restart BOINC. - If the hangup goes on long enough, the WU errors out with TIME_LIMIT_EXCEEDED. I don't know whether the SoG application is terminated in this situation. I have now reduced -period_iterations_num to 35 to see whether the driver restarts go away. This probably didn't provide any new info, but anyway that's what I am seeing. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
The driver restart is the reason for the hangup. Depending on the runtime, the app may never receive an error code for the broken OpenCL context. In that case it will never return.
The parameter should be increased, not reduced. The default is 50; the task you list has 30, which can be the reason for the driver restart. Try setting it to, say, 100. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Was the BOINC restart done after a driver restart or under normal conditions? |
Harri Liljeroos Send message Joined: 29 May 99 Posts: 3987 Credit: 85,281,665 RAC: 126 |
The BOINC restart was done an hour and a half after the driver restart, when I came to the computer this morning and noticed the situation. Shutting down BOINC resulted in a message (from BoincTasks) that BOINC could not be stopped. But BOINC had actually stopped when I looked at Task Manager, though one SoG application was still running. So I killed the SoG process and restarted BOINC. Then the stuck WU was resumed and run to the end on the other GPU (970). Edit: While the WU was stuck the elapsed time kept running; it was showing about 1.5 hours when BOINC was stopped. After the restart the elapsed time was showing about 10 minutes, so the time that was ticking while the application was stuck got lost. I have witnessed the same before, but then I didn't notice the orphaned SoG process when I restarted BOINC. This left the stuck WU locked, with BOINC trying to run it again. After about 30 seconds BOINC switched to a new WU and the message line said something like "Acquiring lock" & "task postponed for 600 seconds". Finally I aborted that WU manually. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
So try to configure the app to avoid driver restarts. A driver restart is an abnormal situation that is not properly handled by the NV runtime. http://setiathome.berkeley.edu/forum_thread.php?id=79760&postid=1795582 https://msdn.microsoft.com/en-us/library/windows/hardware/ff569918%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396 |
Harri Liljeroos Send message Joined: 29 May 99 Posts: 3987 Credit: 85,281,665 RAC: 126 |
That's the plan. I will reduce the -period_iterations_num value until the driver restarts stop. For the TDR registry, should I specify only the TdrDelay value or should I create all of those mentioned on the Microsoft page you linked? Currently I cannot find any of those in my registry. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
Harri wrote: "I will reduce the -period_iterations_num value until the driver restarts stop." Double-check that. Raistmer wrote: "Param should be increased not reduced. Default is 50. Task you list has 30. It can be the reason of driver restart. Try to set it let say to 100." |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Harri wrote: "For the TDR registry, should I specify only the TdrDelay value or should I create all of those mentioned on the Microsoft page you linked? Currently I cannot find any of those in my registry." You can experiment with them. From memory, disabling TDR completely via the first key didn't work as it should. Better to try tuning the app first. It worked on a GT720, so it should be doable. |
Harri Liljeroos Send message Joined: 29 May 99 Posts: 3987 Credit: 85,281,665 RAC: 126 |
Harri wrote: "I will reduce the -period_iterations_num value until the driver restarts stop." Ah, a logical typo; of course I meant increase. So far with value 35 I've had no restarts. I'm off to celebrate midsummer, so the next comments from me will come on Sunday or Monday. I'll leave the computer crunching. |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Hi, I seem to be getting some recurring errors; the rate is still low but it seems to be increasing. Over the last couple of days I have had 5 of them, which is about 1 in 50 or higher. I am wondering if I should reduce the sleep states and increase the period iterations. Currently using: -use_sleep_ex 6 -sbs 384 -period_iterations_num 3 Considering: -use_sleep_ex 1 -sbs 256 -period_iterations_num 10 The errors are: http://setiathome.berkeley.edu/result.php?resultid=4955880255 http://setiathome.berkeley.edu/result.php?resultid=5002653139 http://setiathome.berkeley.edu/result.php?resultid=5006094844 http://setiathome.berkeley.edu/result.php?resultid=5006094840 http://setiathome.berkeley.edu/result.php?resultid=5006094889 I am about to shut down and restart the machine. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
This one does imply a reboot: http://setiathome.berkeley.edu/result.php?resultid=4955880255 The others are due to the data currently being processed. This sanity check will be removed in the next version. |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Raistmer wrote: "This one implies reboot indeed." I don't think I will ever get on top of these apps. I have added a pair of GTX 970 cards to my third rig, and it is running like a dead cat. I am getting much better numbers out of this GTX950 than they are producing. Currently running SoG with -use_sleep_ex 1 -sbs 384 -period_iterations_num 5 and the runtimes are terrible: on the 950, running 3 at a time, Guppies take 54 to 57 mins. On the 970s, running three at a time, they are taking 150 mins. Considering that the 970s are more than twice as powerful as the 950, I would have expected the runtimes to be cut in half, not to take three times as long :(. Can you offer any suggestions? |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
The currently available tunings are listed in the ReadMe file. Due to massive changes in the PulseFind area, the behavior of new builds will differ, so new tunings may be needed for sleeping and PulseFind (in particular, -use_sleep is expected to be more powerful and to have less impact on performance while saving CPU cycles for the NV app flavour). |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
There is a new set of RC builds available here: https://cloud.mail.ru/public/8RM1/LMYTwvGYp If you experience any issues with v8.12, please try the new RC build instead. |
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Raistmer, I'm starting to see these errors on some of the SoG work units: "ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance" Are these the same errors we saw on Beta? http://setiathome.berkeley.edu/result.php?resultid=5010738737 http://setiathome.berkeley.edu/result.php?resultid=5010028516 Zalster |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
OK, I will have a read. But the strange thing is that since that post I have tweaked the command line and changed -sbs 384 to -sbs 512. I doubt that can take all the credit, but runtimes have changed significantly. Now nonVLARs run in about 20 mins (3 at a time) and Guppies are running in about 37 mins (also 3 at a time). Not as efficient as I would like, but much, much closer. Thanks |
Harri Liljeroos Send message Joined: 29 May 99 Posts: 3987 Credit: 85,281,665 RAC: 126 |
Raistmer wrote: "There is new set of RC builds available here: https://cloud.mail.ru/public/8RM1/LMYTwvGYp" Here's a WU done with the new app: http://setiathome.berkeley.edu/result.php?resultid=5010274940 The output on this task looks a lot different than before; I can't even see which GPU was used. I see quite a lot of these. On the other hand, here's another task whose output looks normal: http://setiathome.berkeley.edu/result.php?resultid=5010142578 |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Thanks for the report. It looks like a leftover from the increased-verbosity build. I'll do a rebuild. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
As Richard said, this should go into this thread: http://setiathome.berkeley.edu/forum_thread.php?id=79760 Yes, it's a false positive from the autocorr sanity check that we saw on beta. New builds (which will be r3480 and up) have this particular sanity check disabled. |
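
Several posts above compare tune strings by eye (for example Stephen's "currently using" versus "considering" lines). The following is a minimal sketch of how such strings could be split into options and compared. The helper names `parse_tune_string` and `diff_tunes` are hypothetical and not part of the SETI@home apps or BOINC. The sketch assumes every token starting with `-` is an option name, so negative numeric values are not handled; the ReadMe remains the reference for what each option means.

```python
def parse_tune_string(tune: str) -> dict:
    """Parse a tune string such as '-sbs 256 -hp' into a dict.

    Hypothetical helper for illustration only. A token starting with '-'
    is an option name. If the next token does not start with '-', it is
    taken as that option's value; otherwise the option is a bare switch
    and is stored as True.
    """
    tokens = tune.split()
    options = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if not token.startswith("-"):
            raise ValueError(f"unexpected value without an option: {token!r}")
        name = token[1:]
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
            value = tokens[i + 1]
            # Keep numbers as int so they compare numerically.
            options[name] = int(value) if value.isdigit() else value
            i += 2
        else:
            options[name] = True
            i += 1
    return options


def diff_tunes(old: str, new: str) -> dict:
    """Return {option: (old_value, new_value)} for options that differ."""
    a, b = parse_tune_string(old), parse_tune_string(new)
    names = sorted(a.keys() | b.keys())
    return {k: (a.get(k), b.get(k)) for k in names if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Tune strings quoted from the thread above.
    current = "-use_sleep_ex 6 -sbs 384 -period_iterations_num 3"
    considered = "-use_sleep_ex 1 -sbs 256 -period_iterations_num 10"
    for name, (before, after) in diff_tunes(current, considered).items():
        print(f"{name}: {before} -> {after}")
```

An option that is missing from one of the two strings shows up with `None` on that side, which makes it easy to spot a switch such as -hp that was dropped between two attempts.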
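
On Harri's question about not finding the TDR values in the registry: the Microsoft page linked in the thread describes them as values under `HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers`, and they are normally absent until someone creates them, in which case Windows uses its built-in defaults (Microsoft documents 2 seconds for TdrDelay). The sketch below only reads the values and never writes them. The function name `read_tdr_settings` and the choice of three value names are assumptions made for illustration, and on a non-Windows machine the function simply reports everything as not set.

```python
import sys

# Registry location of the TDR values described on the Microsoft page
# linked in the thread (path relative to HKEY_LOCAL_MACHINE).
TDR_KEY_PATH = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"

# A subset of the documented value names; chosen here for illustration.
TDR_VALUE_NAMES = ("TdrLevel", "TdrDelay", "TdrDdiDelay")


def read_tdr_settings() -> dict:
    """Return {value_name: data or None}; None means the value is not set.

    Hypothetical helper, read-only. It does not create or change anything.
    """
    settings = {name: None for name in TDR_VALUE_NAMES}
    if sys.platform != "win32":
        return settings  # nothing to read on non-Windows hosts

    import winreg  # standard library, available on Windows only

    try:
        key = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY_PATH)
    except OSError:
        return settings  # key missing or not readable
    with key:
        for name in TDR_VALUE_NAMES:
            try:
                data, _value_type = winreg.QueryValueEx(key, name)
                settings[name] = data
            except FileNotFoundError:
                pass  # value was never created; Windows uses its default
    return settings


if __name__ == "__main__":
    for name, data in read_tdr_settings().items():
        shown = data if data is not None else "not set (Windows default applies)"
        print(f"{name}: {shown}")
```

Reading first keeps the experiment reversible: it shows whether any value already exists before one is created, and Raistmer's advice above is still to tune the app before touching TDR at all.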