Forced to abandon GPU processing of SETI work

Message boards : Number crunching : Forced to abandon GPU processing of SETI work

Graham Thomas
Joined: 19 May 99
Posts: 32
Credit: 3,880,875
RAC: 4
United Kingdom
Message 1759022 - Posted: 24 Jan 2016, 15:13:52 UTC

When SETI@home v8 units were first released I hoped that the problems I'd had with v7 units and my ATI GPU might go away.

With longer work units (those estimated to take around 2 hours on my GPU, as opposed to the shorter 45-minute units, which were fine), the unit would complete a certain percentage of its work, often around 20%, then cause the display driver to stop working. The display driver would recover, but processing on the unit would then stop (i.e. the percentage of work done would remain the same while both the elapsed time and the remaining time would increase).

When I discussed this on the SETI message board a while back, several people made helpful suggestions, typically involving tweaking my config file. Thanks to everyone who made these suggestions, but alas nothing worked.

So, I hoped things would change with v8, but they haven't. Longer units still stop at around 20% completed. I haven't received any shorter units to compare yet. I suspect they'd be OK, and Astropulse units - on the rare occasions when I get any - are also fine.

However, in the absence of an option to have only Astropulse and/or shorter SETI@home v8 units processed on the GPU, I've decided to allow no SETI units to run on the GPU rather than having to check which ones to abort several times a day. It means I'll run a lot fewer SETI units than before, but there are plenty of other machines out there so it's OK.

I don't expect anyone to try to fix the problem, because I doubt whether enough people have the same GPU to make it worthwhile.

Here are the details of my setup, in case anyone else has the same problem:

23/01/2016 18:56:21 | | CAL: ATI GPU 0: AMD Radeon HD 6520G/6530D/6550D/6620G (SuperSumo) (CAL version 1.4.1848, 512MB, 479MB available, 960 GFLOPS peak)
23/01/2016 18:56:21 | | OpenCL: AMD/ATI GPU 0: AMD Radeon HD 6520G/6530D/6550D/6620G (SuperSumo) (driver version 1800.11 (VM), device version OpenCL 1.2 AMD-APP (1800.11), 512MB, 479MB available, 960 GFLOPS peak)
23/01/2016 18:56:21 | | OpenCL CPU: AMD A8-3850 APU with Radeon(tm) HD Graphics (OpenCL driver vendor: Advanced Micro Devices, Inc., driver version 1800.11 (sse2), device version OpenCL 1.2 AMD-APP (1800.11))
Graham Thomas
ID: 1759022 · Report as offensive
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1759027 - Posted: 24 Jan 2016, 15:32:02 UTC

First, the AstroPulse application is still at version 7; its version is independent of that of SETI@home, so you can always choose to run only AstroPulse tasks, though you will have periods of drought. This is an option on your "Account" page (http://setiathome.berkeley.edu/prefs.php?subset=project): make sure that only "AstroPulse v7" is set to "yes", and that "If no work for selected applications is available, accept work from other applications?" is set to "no".

Second, check that the drivers you are using are supplied by AMD/ATI, and not from Microsoft - the latter are famed for being "somewhat weak".
There are several sets of instructions around describing how to make sure that you have a clean set of drivers installed, and which drivers to use.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1759027 · Report as offensive
Phil Burden
Joined: 26 Oct 00
Posts: 264
Credit: 22,303,899
RAC: 0
United Kingdom
Message 1759029 - Posted: 24 Jan 2016, 15:33:01 UTC - in response to Message 1759022.  

If it's any consolation, I stopped crunching multibeam on my 6950 last year. The display driver only occasionally crashed, but I was forever getting glitching and jerky mouse movements. APs run fine and cause no issues, but are few and far between, so now my GPU crunches Einstein instead.
ID: 1759029 · Report as offensive
AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 1759096 - Posted: 24 Jan 2016, 20:41:58 UTC - in response to Message 1759022.  
Last modified: 24 Jan 2016, 21:03:28 UTC

Graham,

I had the same problem on my APU, and with a little help from the people here it was fixed. Maybe yours can be fixed as mine was.
In your mb_cmdline_win_x86_SSE2_OpenCL_ATi_HD5.txt file, located in your SETI directory, add the lines below, save, then quit and restart BOINC. It is a text file and you can open it with Notepad.

-sbs 256 -cpu_lock
-period_iterations_num 450

You may be able to reduce the 450 number, but on mine it didn't really take until I hit 450. I started with 256 as I recall and it was better but still erred.

I wish you success!
Allen
ID: 1759096 · Report as offensive
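
For anyone who would rather script the edit AllenIN describes, here is a minimal Python sketch. It assumes the command-line file is plain whitespace-separated flags, as in the post above. The helper names `merge_flags` and `update_cmdline_file` are made up for illustration and are not part of BOINC or the SETI@home applications. Check the exact file name in your own project directory, since it differs between application versions. The sketch writes all flags on one line; the thread does not say whether line breaks matter.

```python
from pathlib import Path


def merge_flags(existing, updates):
    """Return command-line text with the flags in `updates` applied.

    `updates` maps a flag name to its value, or to None for a bare switch.
    Flags already present keep their position but take the new value;
    new flags are appended. Stray tokens that are neither a flag nor a
    flag's value are dropped (a simplification of this sketch).
    """
    tokens = existing.split()
    merged = {}  # flag -> value, in order of first appearance
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("-"):
            # A following token that is not itself a flag is this flag's value.
            if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
                merged[tok] = tokens[i + 1]
                i += 2
                continue
            merged[tok] = None
        i += 1
    merged.update(updates)
    return " ".join(f if v is None else f"{f} {v}" for f, v in merged.items())


def update_cmdline_file(path, updates):
    """Rewrite the file at `path` (hypothetical location) with merged flags."""
    p = Path(path)
    existing = p.read_text() if p.exists() else ""
    p.write_text(merge_flags(existing, updates) + "\n")


# Values taken from AllenIN's post.
ALLEN_SETTINGS = {"-sbs": 256, "-cpu_lock": None, "-period_iterations_num": 450}
```

Calling `update_cmdline_file("mb_cmdline_win_x86_SSE2_OpenCL_ATi_HD5.txt", ALLEN_SETTINGS)` from inside the project directory would apply the settings. BOINC still has to be quit and restarted afterwards, as described above.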
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1759176 - Posted: 25 Jan 2016, 16:15:34 UTC

On the HD6950, MB operation depends heavily on the installed driver version.
I currently run exactly that card, an HD6950, with 2 tasks per GPU and no issues under Vista x86.
Under Windows 10 IP x64 I get driver restarts on exactly the same hardware.

Regarding low-end GPUs: -period_iterations_num N can indeed help.
I plan to add some additional tuning options in future versions, though (this requires an embedded profiling ability). Advanced users could try to profile the current build on such hardware and post info about the longest kernel calls (the ones that cause Windows driver watchdog restarts and "sluggishness").
ID: 1759176 · Report as offensive
Graham Thomas
Joined: 19 May 99
Posts: 32
Credit: 3,880,875
RAC: 4
United Kingdom
Message 1759423 - Posted: 26 Jan 2016, 11:27:56 UTC - in response to Message 1759176.  

Thanks for all your help.

Bob Smith: I'm pretty sure I'm using the ATI drivers, as I update them using the AMD Catalyst Control Center. Your suggestion for choosing only Astropulse would mean that I couldn't run SETI@home v8 tasks on my CPUs, and I'd still like to be able to do that: on balance I'd prefer to turn off GPU processing if necessary.

AllenIN: I've now implemented your suggested change and I'll see how that goes. I had something similar (though with a lower number of iterations) in the file for a previous version and it didn't work, but I now see I had implemented the earlier change in the wrong file (for v7.03 rather than v7.07). Anyway, I've now made the change in the file for v8.00. I'll post again to let you know if it works.
Graham Thomas
ID: 1759423 · Report as offensive
Graham Thomas
Joined: 19 May 99
Posts: 32
Credit: 3,880,875
RAC: 4
United Kingdom
Message 1759630 - Posted: 27 Jan 2016, 10:09:20 UTC - in response to Message 1759423.  

Fantastic!

The tweaks to the config file appear to have done the trick. All SETI@home v8 units that ran on my GPU yesterday completed without problems.

Thanks once again, everyone, for your help.
Graham Thomas
ID: 1759630 · Report as offensive
Graham Thomas
Joined: 19 May 99
Posts: 32
Credit: 3,880,875
RAC: 4
United Kingdom
Message 1761005 - Posted: 31 Jan 2016, 12:27:16 UTC - in response to Message 1759630.  

OK - so not 100% fantastic. I've had two SETI@Home v8 GPU units stall in the last four days. I'm not sure what they have in common, but I suspect they're the ones that are scheduled to take the longest times to run.

However, I can live with that.

Would increasing the number of iterations in AllenIN's tweak (above) be likely to help - and if so, why?
Graham Thomas
ID: 1761005 · Report as offensive
Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1761014 - Posted: 31 Jan 2016, 13:54:48 UTC

Yes, it should.

-period_iterations_num N: splits a single PulseFind kernel call.
It defines the chunks being processed at each kernel call in the PulseFind kernel.

Also, increasing the single buffer size should help.

-sbs 192 -period_iterations_num 450


With each crime and every kindness we birth our future.
ID: 1761014 · Report as offensive
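
As a rough illustration of why a larger N can help: Raistmer mentions above that overly long kernel calls trigger the Windows driver watchdog. If one assumes (a simplification, not something the thread states) that the flag divides a long PulseFind call into N roughly equal calls, each call takes about 1/N of the time. The 2-second limit below is the Windows default GPU timeout (TDR delay); the 5-second unsplit duration is a made-up figure, and both function names are hypothetical.

```python
import math


def per_call_seconds(unsplit_seconds, n):
    """Approximate duration of each call when one call is split into n equal parts."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return unsplit_seconds / n


def min_splits(unsplit_seconds, limit_seconds):
    """Smallest n that keeps each call at or under limit_seconds."""
    return max(1, math.ceil(unsplit_seconds / limit_seconds))


WATCHDOG_LIMIT = 2.0  # Windows default TDR delay, in seconds
UNSPLIT = 5.0         # hypothetical duration of the longest unsplit call

needed = min_splits(UNSPLIT, WATCHDOG_LIMIT)
print(f"at least {needed} calls to stay within {WATCHDOG_LIMIT} s per call")
print(f"with N=450 each call takes about {per_call_seconds(UNSPLIT, 450):.4f} s")
```

Real kernel timings depend on the GPU, the driver and the task, so treat this only as the reasoning behind "higher N, shorter calls", not as a way to compute the right setting.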
AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 1761171 - Posted: 1 Feb 2016, 0:56:00 UTC - in response to Message 1761005.  
Last modified: 1 Feb 2016, 1:02:21 UTC

Graham,

I haven't had any errors at all since setting my APUs, both A8-3870Ks, to these settings:

-sbs 256 -cpu_lock
-period_iterations_num 450

If you have problems at 450, then I would try something higher.

Good luck!
ID: 1761171 · Report as offensive
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1761270 - Posted: 1 Feb 2016, 8:39:24 UTC - in response to Message 1761171.  

-cpu_lock is important when other apps are using the CPU fully. It can prevent stalls. In recent builds it should be ON by default, though, so just check that -no_cpu_lock is not present, or free some CPU cores.
ID: 1761270 · Report as offensive
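
A quick way to run the check Raistmer suggests, sketched in Python. It assumes the command-line file holds whitespace-separated flags; the function name `cpu_lock_disabled` is hypothetical and not part of any SETI@home tool.

```python
def cpu_lock_disabled(cmdline_text):
    """Return True if the -no_cpu_lock switch appears in the command-line text."""
    return "-no_cpu_lock" in cmdline_text.split()


# Example with the settings quoted earlier in the thread.
print(cpu_lock_disabled("-sbs 256 -cpu_lock -period_iterations_num 450"))
```

Matching whole tokens rather than substrings is deliberate, since the text `-no_cpu_lock` contains `cpu_lock` and a plain substring search for the latter would match both switches.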
Graham Thomas
Joined: 19 May 99
Posts: 32
Credit: 3,880,875
RAC: 4
United Kingdom
Message 1761380 - Posted: 1 Feb 2016, 19:37:08 UTC - in response to Message 1761270.  

Thanks again, everyone. I'd already changed my settings to those given in AllenIN's last message, including -cpu_lock. I've now upped -period_iterations_num to 550. If that doesn't work I'll increase -sbs too. Should I increase -sbs by a number such as 64 or 128, or doesn't it matter?
Graham Thomas
ID: 1761380 · Report as offensive
The_Matrix
Volunteer tester
Joined: 17 Nov 03
Posts: 414
Credit: 5,827,850
RAC: 0
Germany
Message 1761381 - Posted: 1 Feb 2016, 19:39:27 UTC
Last modified: 1 Feb 2016, 20:04:46 UTC

----- no more necessary ----
ID: 1761381 · Report as offensive
Philip Stehno
Joined: 17 May 99
Posts: 7
Credit: 10,241,552
RAC: 0
United States
Message 1761720 - Posted: 2 Feb 2016, 23:29:18 UTC

I've had a different problem with the release of V8. The computer will not download two required .dll files. I've had to say no to SETI GPU V8 because of it. V7 did not have this problem.
ID: 1761720 · Report as offensive
Keith White
Joined: 29 May 99
Posts: 392
Credit: 13,035,233
RAC: 22
United States
Message 1761735 - Posted: 3 Feb 2016, 1:13:17 UTC

I gave up GPU processing when I upgraded to Win 10, AMD user as well. It's sad and as of mid November I'm starting to move away from my peak position.

Oh well, my ego isn't exactly tied up in this.
"Life is just nature's way of keeping meat fresh." - The Doctor
ID: 1761735 · Report as offensive
