AMD Rx Vega 56 issues

Questions and Answers : GPU applications : AMD Rx Vega 56 issues

Profile Sean
Volunteer tester

Joined: 10 Aug 00
Posts: 23
Credit: 59,729,876
RAC: 86,311
United States
Message 1891401 - Posted: 22 Sep 2017, 18:54:30 UTC

I've recently added a Vega 56 to my main SETI crunching box:
https://setiathome.berkeley.edu/show_host_detail.php?hostid=6993114

Despite what the host details say, the system has a Vega 56 and an R9 280x running together (not 2 Vega cards). I initially ran just the Vega card to work out any problems, and after updating to the newest AMD HD5 SoG app and setting the Vega to run at its base GPU clock speed, tasks were (mostly) validating. At that point I added the 280x back in, and now both cards are running tasks.

The main issue is that the Vega is still producing quite a few invalids and inconclusives (which I suspect will turn invalid). It also seems that when the Vega and 280x run tasks of similar point value, the 280x consistently finishes them slightly faster (and with no errors).

Some potentially important details:
I'm using a 1000 watt platinum power supply (I can mine 24/7 on both cards with no problems)
I am running the AMD HD5 app with settings tuned for the 280x. Maybe Vega doesn't work well with those settings? I'll try reverting to the default settings this evening.
I wiped the old drivers with 'Display Driver Uninstaller' and did a clean install of Crimson 17.9.2.
I run 2 tasks on both cards simultaneously and keep 1 CPU core free for each GPU task.
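
The "2 tasks per card, 1 CPU core per GPU task" setup in the last item is the kind of thing BOINC normally reads from an app_config.xml file in the project directory. Below is a minimal Python sketch that builds such a file's contents. Two things are assumptions on my part and not confirmed anywhere in this thread: the app name setiathome_v8, and reading "2 tasks on both cards" as 2 tasks per GPU. Check the app name against your own client files before using anything like this.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int, cpus_per_task: float) -> str:
    """Return app_config.xml text that runs `tasks_per_gpu` tasks on each GPU."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    # Assumption: app_name must match the app name your BOINC client reports.
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    # gpu_usage is the fraction of one GPU a task claims: 0.5 means 2 tasks per GPU.
    ET.SubElement(gpu, "gpu_usage").text = str(1.0 / tasks_per_gpu)
    # cpu_usage is the number of CPU cores budgeted for each GPU task.
    ET.SubElement(gpu, "cpu_usage").text = str(cpus_per_task)
    return ET.tostring(root, encoding="unicode")


def reserved_cpu_cores(gpus: int, tasks_per_gpu: int, cpus_per_task: float) -> float:
    """CPU cores set aside for GPU tasks across all cards."""
    return gpus * tasks_per_gpu * cpus_per_task


if __name__ == "__main__":
    # "setiathome_v8" is an assumed app name, used here only for illustration.
    print(build_app_config("setiathome_v8", tasks_per_gpu=2, cpus_per_task=1.0))
    print(reserved_cpu_cores(gpus=2, tasks_per_gpu=2, cpus_per_task=1.0))
```

Under those assumptions, two cards at 2 tasks each keep 4 CPU cores reserved for GPU work, which matters when judging how much is left for CPU tasks.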

I would really appreciate feedback from anyone else running a Vega.

Thanks!
ID: 1891401
Profile Ageless
Volunteer tester

Joined: 9 Jun 99
Posts: 14240
Credit: 3,527,344
RAC: 828
Netherlands
Message 1891411 - Posted: 22 Sep 2017, 20:26:36 UTC - in response to Message 1891401.  

The one error was an "(Unknown error) - exit code -6 (0xfffffffa)", which can happen when there are zero-byte task files around.
I had reports on the BOINC forums from people running RX480s that their 'applications disconnected from BOINC and ran on their own'. They were also running the 17.9.2 drivers, which makes me believe these drivers are bad. I run an RX470 with the 17.5.2 drivers and haven't had the slightest problem.
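
A side note on the exit code quoted above: -6 and 0xfffffffa are the same 32-bit value, shown once as a signed integer and once as unsigned hex. A small Python sketch of the conversion, with helper names that are mine and purely illustrative:

```python
def as_unsigned32(code: int) -> int:
    """Reinterpret a signed exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed (two's complement) integer."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the value is negative in two's complement.
    return value - 0x100000000 if value & 0x80000000 else value


if __name__ == "__main__":
    print(f"{as_unsigned32(-6):#010x}")
    print(as_signed32(0xFFFFFFFA))
```

This is handy when a log shows only one of the two forms and you want to search for the other.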

So perhaps 17.9.2 isn't good enough or stable enough to run OpenCL work with. You could shop around and see whether later (beta) or earlier drivers fix this problem. A good clean-out of the old drivers may also help.
Jord

Ancient Astronaut Theorists suggest that in many ways, you can be considered an alien conspiracy!
ID: 1891411
Profile Sean
Volunteer tester

Joined: 10 Aug 00
Posts: 23
Credit: 59,729,876
RAC: 86,311
United States
Message 1891420 - Posted: 22 Sep 2017, 21:59:19 UTC - in response to Message 1891411.  

The one error was a (Unknown error) - exit code -6 (0xfffffffa) error

I guess I should have mentioned that the error(s) occurred when I crashed the system while tinkering, not during normal operation.
I will take a look at different drivers though!
ID: 1891420
Profile Darrell
Volunteer tester

Joined: 14 Mar 03
Posts: 267
Credit: 1,371,306
RAC: 413
United States
Message 1893376 - Posted: 5 Oct 2017, 16:16:55 UTC
Last modified: 5 Oct 2017, 16:17:33 UTC

Scanning through the first page of your results under the inconclusive column, the R9 280x is throwing more than the Vega 56.
... and still I fear, and still I dare not laugh at the Mad Man!

Queen - The Prophet's Song
ID: 1893376
Profile Sean
Volunteer tester

Joined: 10 Aug 00
Posts: 23
Credit: 59,729,876
RAC: 86,311
United States
Message 1893506 - Posted: 6 Oct 2017, 0:28:24 UTC - in response to Message 1893376.  

Scanning through the first page of your results under the inconclusive column, the R9 280x is throwing more than the Vega 56.

I hate to admit it, but that's because I've had the Vega solely on mining duty for the past couple of weeks.
I've kept the 280x and CPU dedicated to SETI tasks. The 280x seems particularly well suited for SETI: I can't remember the last time it gave an invalid result, and I'm quite happy with 30k RAC from the machine. I almost wish I had just bought another 280x instead of the Vega!
ID: 1893506



 
©2017 University of California
 