SETI@home v8.12 Windows GPU applications support thread

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1796691 - Posted: 16 Jun 2016, 21:06:45 UTC - in response to Message 1796685.  

Your GPU is on the low-performance path by default, to reduce possible GUI lag:
Low-performance GPU detected, default period_iterations_num set to 500
For low-performance GPU path use_sleep enabled with 5ms per iteration


If you see no lag and want to speed things up, disable sleep and increase performance by passing this tuning string to the app:

-no_defaults_scaling -period_iterations_num 200 -sbs 256

See the app's ReadMe file in the project directory for the meaning of each option, and look for the recommended tuning line in the ReadMe to add to this one.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1796691
Harri Liljeroos
Joined: 29 May 99
Posts: 3987
Credit: 85,281,665
RAC: 126
Finland
Message 1798142 - Posted: 23 Jun 2016, 10:05:17 UTC

I am using build 3472 of SoG, which should help with the application hangup when using the -use_sleep parameter. I downloaded it from Raistmer's site.

Unfortunately the hangups still happen on my host, which has a GTX970 and a GTX650 Ti. Here is a link to the latest WU with the problem: http://setiathome.berkeley.edu/result.php?resultid=4999412070 The WU was started on the 650 Ti and finished on the 970 after I restarted Boinc. The Nvidia driver is 361.91 and the command line was -sbs 256 -period_iterations_num 30 -use_sleep_ex 4 -hp. I am running one WU at a time on both GPUs.

This is what I have noticed about these hangups:
- They happen when the driver stopped responding and was restarted (but not at every restart)
- They happen only on my 650 Ti when doing Arecibo WUs (but not on every WU)
- The SoG application does not terminate if you stop Boinc; you have to kill the process manually to recover the WU, otherwise it stays locked and cannot be accessed when you restart Boinc
- If the hangup goes on long enough, the WU errors out with TIME_LIMIT_EXCEEDED. I don't know whether the SoG application is terminated in this situation.

I have now reduced -period_iterations_num to 35 to see whether the driver restarts go away.

This probably didn't provide any new info, but anyway that's what I am seeing.
ID: 1798142
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1798144 - Posted: 23 Jun 2016, 10:15:17 UTC - in response to Message 1798142.  
Last modified: 23 Jun 2016, 10:15:26 UTC



This is what I have noticed about these hangups:
- They happen when the driver stopped responding and was restarted (but not at every restart)
- [...] I don't know whether the SoG application is terminated in this situation.

That is the reason for the hangup. Depending on the runtime, the app never receives an error code for the broken OpenCL context. In that case it will never return.



I have now reduced -period_iterations_num to 35 to see whether the driver restarts go away.

The parameter should be increased, not reduced. The default is 50; the task you listed used 30, which may be the reason for the driver restart. Try setting it to, say, 100.
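To make the check above concrete, here is a minimal Python sketch that parses a tuning string and flags a -period_iterations_num value below the default of 50 quoted in this post. It assumes the string is a flat run of space-separated options, where a token that does not start with "-" is the value of the option before it. The helper names parse_tune_string and period_below_default are hypothetical and are not part of the app.

```python
DEFAULT_PERIOD_ITERATIONS = 50  # default value quoted in this thread


def parse_tune_string(tune: str) -> dict:
    """Split a tuning string into {option: value}; bare flags map to None."""
    tokens = tune.split()
    options = {}
    i = 0
    while i < len(tokens):
        name = tokens[i]
        # A following token that does not start with '-' is taken as the value.
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
            options[name] = tokens[i + 1]
            i += 2
        else:
            options[name] = None
            i += 1
    return options


def period_below_default(tune: str) -> bool:
    """True if -period_iterations_num is present and below the quoted default."""
    value = parse_tune_string(tune).get("-period_iterations_num")
    return value is not None and int(value) < DEFAULT_PERIOD_ITERATIONS


# Command line reported for the task that hung.
harri = "-sbs 256 -period_iterations_num 30 -use_sleep_ex 4 -hp"
print(parse_tune_string(harri))
print(period_below_default(harri))
```

The sketch only compares against the quoted default; it does not know which value is safe for a given GPU, so that still has to be found by experiment.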
ID: 1798144
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1798145 - Posted: 23 Jun 2016, 10:17:31 UTC - in response to Message 1798142.  


- The SoG application does not terminate if you stop Boinc, you have to kill the process manually to recover the WU, otherwise it's locked up and cannot be accessed when you restart Boinc

Was the BOINC restart done after a driver restart, or under normal conditions?
ID: 1798145
Harri Liljeroos
Joined: 29 May 99
Posts: 3987
Credit: 85,281,665
RAC: 126
Finland
Message 1798149 - Posted: 23 Jun 2016, 12:17:45 UTC - in response to Message 1798145.  
Last modified: 23 Jun 2016, 12:22:15 UTC


- The SoG application does not terminate if you stop Boinc, you have to kill the process manually to recover the WU, otherwise it's locked up and cannot be accessed when you restart Boinc

Was the BOINC restart done after a driver restart, or under normal conditions?


The Boinc restart was done an hour and a half after the driver restart, when I came to the computer this morning and noticed the situation. Shutting down Boinc resulted in a message (from BoincTasks) that Boinc could not be stopped. When I looked at Task Manager, Boinc had actually stopped, but one SoG application was still running. So I killed the SoG process and restarted Boinc. The stuck WU was then resumed and ran to the end on the other GPU (970).

Edit: While the WU was stuck the elapsed time kept running; it was showing about 1.5 hours when Boinc was stopped. After the restart the elapsed time was showing about 10 minutes, so the time that ticked by while the application was stuck got lost.

I have witnessed the same before but then I didn't notice the orphaned SoG process when I restarted Boinc. This left the stuck WU locked and Boinc trying to run it again. After about 30 seconds Boinc switched to a new WU and message line said something like "Acquiring lock" & "task postponed for 600 seconds". Finally I aborted that WU manually.
ID: 1798149
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1798173 - Posted: 23 Jun 2016, 14:57:48 UTC - in response to Message 1798149.  
Last modified: 23 Jun 2016, 15:03:47 UTC

So try to configure the app to avoid driver restarts. A driver restart is an abnormal situation that isn't handled properly by the NV runtime.

http://setiathome.berkeley.edu/forum_thread.php?id=79760&postid=1795582
https://msdn.microsoft.com/en-us/library/windows/hardware/ff569918%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396
ID: 1798173
Harri Liljeroos
Joined: 29 May 99
Posts: 3987
Credit: 85,281,665
RAC: 126
Finland
Message 1798189 - Posted: 23 Jun 2016, 17:43:18 UTC - in response to Message 1798173.  

That's the plan. I will reduce the -period_iterations_num value until the driver restarts stop.

For the TDR registry, should I specify only the TdrDelay value or should I create all of those mentioned on the Microsoft page you linked? Currently I cannot find any of those in my registry.
ID: 1798189
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1798201 - Posted: 23 Jun 2016, 18:40:14 UTC - in response to Message 1798144.  

Harri wrote:
I will reduce the -period_iterations_num value until the driver restarts stop.

Double-check that.

Raistmer wrote:
The parameter should be increased, not reduced. The default is 50; the task you listed used 30, which may be the reason for the driver restart. Try setting it to, say, 100.
ID: 1798201
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1798211 - Posted: 23 Jun 2016, 20:15:10 UTC - in response to Message 1798189.  

For the TDR registry, should I specify only the TdrDelay value or should I create all of those mentioned on the Microsoft page you linked? Currently I cannot find any of those in my registry.

You can experiment with them. From memory, disabling TDR completely via the first key didn't work as it should.
Better to try tuning the app first. It worked on a GT720, so it should be doable.
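For anyone who does want to inspect the TDR settings, the following Python sketch shows one way to read them. It rests on assumptions that should be checked against the Microsoft page linked earlier: the values live under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers, and when a value is absent (the situation described in the question) Windows falls back to the defaults listed here, which are given as recalled from that documentation. The function names are hypothetical.

```python
# Registry location and defaults as recalled from Microsoft's TDR documentation;
# treat both as assumptions and verify them before changing anything.
TDR_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
TDR_DEFAULTS = {"TdrLevel": 3, "TdrDelay": 2, "TdrDdiDelay": 5}


def effective_tdr(found: dict) -> dict:
    """Overlay values found in the registry on the assumed defaults."""
    merged = dict(TDR_DEFAULTS)
    merged.update(found)
    return merged


def read_tdr_values() -> dict:
    """Read whichever TDR values exist under HKLM (Windows only)."""
    import winreg  # imported here so the rest of the sketch runs anywhere

    found = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY) as key:
        for name in TDR_DEFAULTS:
            try:
                found[name], _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                pass  # value absent, so the default applies
    return found


# No values present in the registry: the assumed defaults are in effect.
print(effective_tdr({}))
# A hypothetical override that lengthens the timeout to 8 seconds.
print(effective_tdr({"TdrDelay": 8}))
```

On a Windows host, effective_tdr(read_tdr_values()) would show the settings currently in force. The sketch only reads the registry; writing values is left out on purpose, since tuning the app is the recommended first step.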
ID: 1798211
Harri Liljeroos
Joined: 29 May 99
Posts: 3987
Credit: 85,281,665
RAC: 126
Finland
Message 1798223 - Posted: 23 Jun 2016, 21:29:05 UTC - in response to Message 1798201.  

Harri wrote:
I will reduce the -period_iterations_num value until the driver restarts stop.

Double-check that.

Raistmer wrote:
The parameter should be increased, not reduced. The default is 50; the task you listed used 30, which may be the reason for the driver restart. Try setting it to, say, 100.


Ah, a logical typo; of course I meant increase. So far with the value at 35 I've had no restarts.

I'm off to celebrate midsummer, so the next comments from me will come on Sunday or Monday. I'll leave the computer crunching.
ID: 1798223
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1798808 - Posted: 26 Jun 2016, 11:10:05 UTC

. . Hi,

. . I seem to be getting some recurring errors, the rate is still low but it seems to be increasing. Over the last couple of days I have had 5 of them which is about 1 in 50 or higher.

. . I am wondering if I should reduce the sleep states and increase the period iterations.

. . Currently using ...

-use_sleep_ex 6 -sbs 384 -period_iterations_num 3

. . Considering ...

-use_sleep_ex 1 -sbs 256 -period_iterations_num 10

. . Errors are:

http://setiathome.berkeley.edu/result.php?resultid=4955880255

http://setiathome.berkeley.edu/result.php?resultid=5002653139

http://setiathome.berkeley.edu/result.php?resultid=5006094844

http://setiathome.berkeley.edu/result.php?resultid=5006094840

http://setiathome.berkeley.edu/result.php?resultid=5006094889

. . I am about to shut down and restart the machine
ID: 1798808
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1798886 - Posted: 26 Jun 2016, 19:59:32 UTC - in response to Message 1798808.  

This one does indeed imply that a reboot is needed:
http://setiathome.berkeley.edu/result.php?resultid=4955880255
The others are due to the data currently being processed. This sanity check will be removed in the next version.
ID: 1798886
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1799240 - Posted: 28 Jun 2016, 15:19:40 UTC - in response to Message 1798886.  

This one does indeed imply that a reboot is needed:
http://setiathome.berkeley.edu/result.php?resultid=4955880255
The others are due to the data currently being processed. This sanity check will be removed in the next version.


. . I don't think I will ever get on top of these apps.

. . I have added a pair of GTX 970 cards to my third rig. And it is running like a dead cat.

. . I am getting much better numbers out of this GTX950 than they are producing.

. . Currently running SoG with -use_sleep_ex 1 -sbs 384 -period_iterations_num 5 and the runtimes are terrible, on the 950 running 3 at a time Guppies take 54 to 57 mins. On the 970s running three at a time they are taking 150 mins.

. . Considering that the 970s are more than twice as powerful as the 950 I would have expected the runtimes to be cut in half, not take three times as long :(.
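The runtimes above can be turned into throughput with a little arithmetic. This Python sketch assumes "three at a time" means three concurrent tasks per card, and it uses 55.5 minutes as the midpoint of the 54 to 57 minute range quoted for the 950; tasks_per_hour is a hypothetical helper name.

```python
def tasks_per_hour(concurrent: int, minutes_per_task: float) -> float:
    """Completed tasks per hour for one card running `concurrent` tasks at once."""
    return concurrent * 60.0 / minutes_per_task


gtx950 = tasks_per_hour(3, 55.5)   # midpoint of the 54-57 min range (assumption)
gtx970 = tasks_per_hour(3, 150.0)  # 150 min per Guppie, three at a time

print(f"GTX 950: {gtx950:.2f} tasks/hour")
print(f"GTX 970: {gtx970:.2f} tasks/hour")
print(f"970 relative to 950: {gtx970 / gtx950:.2f}x")
```

Comparing tasks per hour rather than raw runtimes keeps the comparison valid if the number of concurrent tasks is changed later.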

. . Can you offer any suggestions ?
ID: 1799240
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1799244 - Posted: 28 Jun 2016, 16:01:34 UTC - in response to Message 1799240.  
Last modified: 28 Jun 2016, 16:01:47 UTC


. . Can you offer any suggestions ?

The currently available tunings are listed in the ReadMe file.
Due to massive changes in the PulseFind area, the behavior of new builds will differ, so new tunings may be needed in the sleep and PulseFind areas (in particular, -use_sleep is expected to be more powerful and to have less impact on performance while saving CPU cycles for the NV app flavour).
ID: 1799244
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1799246 - Posted: 28 Jun 2016, 16:05:13 UTC

There is a new set of RC builds available here: https://cloud.mail.ru/public/8RM1/LMYTwvGYp
If you experience any issues with v8.12, please try the new RC build instead.
ID: 1799246
Zalster
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1799279 - Posted: 29 Jun 2016, 2:18:35 UTC - in response to Message 1797801.  

Raistmer,

I'm starting to see these errors on some of the SoG work units

ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance
GPU device sync requested... ...GPU device synched
09:07:29 (7932): called boinc_finish(-1)


Are these the same errors we saw on Beta?
http://setiathome.berkeley.edu/result.php?resultid=5010738737
http://setiathome.berkeley.edu/result.php?resultid=5010028516

Zalster
ID: 1799279
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1799308 - Posted: 29 Jun 2016, 5:25:06 UTC - in response to Message 1799244.  


. . Can you offer any suggestions ?

The currently available tunings are listed in the ReadMe file.
Due to massive changes in the PulseFind area, the behavior of new builds will differ, so new tunings may be needed in the sleep and PulseFind areas (in particular, -use_sleep is expected to be more powerful and to have less impact on performance while saving CPU cycles for the NV app flavour).



. . OK, I will have a read. The strange thing is that since that post I have tweaked the command line and changed -sbs 384 to -sbs 512. I doubt that can take all the credit, but runtimes have changed significantly. Now non-VLARs run in about 20 mins (3 at a time) and Guppies run in about 37 mins (also 3 at a time). Not as efficient as I would like, but much, much closer.

. . Thanks
ID: 1799308
Harri Liljeroos
Joined: 29 May 99
Posts: 3987
Credit: 85,281,665
RAC: 126
Finland
Message 1799316 - Posted: 29 Jun 2016, 6:32:10 UTC - in response to Message 1799246.  

There is a new set of RC builds available here: https://cloud.mail.ru/public/8RM1/LMYTwvGYp
If you experience any issues with v8.12, please try the new RC build instead.


Here's a WU done with the new app. http://setiathome.berkeley.edu/result.php?resultid=5010274940

The output on this task looks a lot different than before, can't even see which GPU was used. I see quite a lot of these.

On the other hand, here's another task where the output looks normal: http://setiathome.berkeley.edu/result.php?resultid=5010142578
ID: 1799316
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1799322 - Posted: 29 Jun 2016, 7:01:18 UTC - in response to Message 1799316.  

There is a new set of RC builds available here: https://cloud.mail.ru/public/8RM1/LMYTwvGYp
If you experience any issues with v8.12, please try the new RC build instead.


Here's a WU done with the new app. http://setiathome.berkeley.edu/result.php?resultid=5010274940

The output on this task looks a lot different than before, can't even see which GPU was used. I see quite a lot of these.

On the other hand, here's another task where the output looks normal: http://setiathome.berkeley.edu/result.php?resultid=5010142578

Thanks for the report.
It looks like a leftover from the increased-verbosity build. I'll do a rebuild.
ID: 1799322
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1799324 - Posted: 29 Jun 2016, 7:17:55 UTC - in response to Message 1799279.  

Raistmer,

I'm starting to see these errors on some of the SoG work units

ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance
GPU device sync requested... ...GPU device synched
09:07:29 (7932): called boinc_finish(-1)


Are these the same errors we saw on Beta?
http://setiathome.berkeley.edu/result.php?resultid=5010738737
http://setiathome.berkeley.edu/result.php?resultid=5010028516

Zalster

As Richard said, this should go into this thread: http://setiathome.berkeley.edu/forum_thread.php?id=79760

Yes, it's the false positive from the autocorr sanity check that we saw on beta.
New builds (r3480 and up) will have this particular sanity check disabled.
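To pick such results out of a batch of task logs, a small Python sketch like the one below can flag stderr text that contains the message quoted above. The marker string is copied from the quoted log; has_sanity_check_error is a hypothetical helper, and the sketch cannot tell a false positive from a case where the host really needs a reboot, so flagged results still need a manual look.

```python
# Marker text copied from the stderr excerpt quoted in this thread.
SANITY_MARKER = "ERROR: Possible wrong computation state on GPU"


def has_sanity_check_error(stderr_text: str) -> bool:
    """True if any line of the task's stderr contains the sanity-check message."""
    return any(SANITY_MARKER in line for line in stderr_text.splitlines())


sample = (
    "ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance\n"
    "GPU device sync requested... ...GPU device synched\n"
    "09:07:29 (7932): called boinc_finish(-1)\n"
)

print(has_sanity_check_error(sample))
print(has_sanity_check_error("GPU device sync requested... ...GPU device synched\n"))
```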
ID: 1799324