ROCm 1.8
Author | Message |
---|---|
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
I finally have my 1950X/RX-Vega64x4 machine back to running Linux. After I swapped the Triple ProDuo GPUs for a quad block of Vega64s, the latest drivers from the AMD site did not support Vega on Threadripper, as they were using ROCm 1.6. I installed ROCm 1.8 as described on the ROCm GitHub site and it is working fine. The only issue is that the cards are identified as "[4] AMD Device 687f (8176MB) OpenCL: 1.2" for this computer, Eos. That doesn't seem to be causing any problems, though. GitHub: Ricks-Lab Instagram: ricks_labs |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Looks nice, Rick. Maybe you should upgrade the kernel. I'm running 4.15.8 on my Mint 18 partition. With each crime and every kindness we birth our future. |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Mike wrote: "Looks nice Rick." I considered loading Ubuntu 18.04, but it looks like ROCm 1.8 was built for 16.04. Maybe that's not an issue, but I wanted to keep this attempt simple since my last attempt failed. I hope the next version is built for 18.04. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
My older Linux machines running Ubuntu 16.04 are using the latest 4.15.0-20 kernels, which have all the Ryzen/TR fixes and patches baked in. That is also the kernel that got installed on my newest Ubuntu 18.04 LTS machine. Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
I woke this morning to 2 invalids and 1 computation error. I may have to drop back to Win10... I will let it run until the outage to see if it continues. |
Wiggo Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489 |
That's a lot of inconclusives you have there as well as a couple more invalids. Cheers. |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Wiggo wrote: "That's a lot of inconclusives you have there as well as a couple more invalids." I agree. Here is the task page for the same machine running Windows: https://setiathome.berkeley.edu/results.php?hostid=8507353 I also noticed a difference in command-line arguments and will align them with what is used in Windows when I get back to the machine this evening. |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
You should remove -pref_wg_num_per_cu from the command line of your Linux install. That's the biggest difference between your Windows and Linux tasks. |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Mike wrote: "You should remove -pref_wg_num_per_cu from command line from your Linux install." I just updated the Linux command line to be nearly identical: everything matches except -tt 500 vs -tt 600. Looks like I am still getting inconclusives... |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
You should also change -tune 1 from 2.1.18 to -tune 1 4 4 16. |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Mike wrote: "You should also change -tune 1 from 2.1.18 to -tune 1 4 4 16." Yes, I have changed that. Everything except -tt is aligned with the Windows install for this machine. So far I have no invalids with the aligned command line, but I do have a few inconclusives, so it's hard to tell if the problem persists. I have it set to not download new tasks in case I need to boot into Windows. |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Looks like it still has issues, so I booted the machine into Windows. I will do some more research... |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
I am hoping that an application expert can comment on whether the differences I am seeing in the OpenCL platform information between ROCm 1.8 and the latest AMD Windows drivers matter. Here are the ROCm details, with items that differ from Windows colored in red and additional items in yellow: Max compute units: 64 Following is the platform information from the latest AMD Windows driver: Max compute units: 64 The most significant difference seems to be that ROCm 1.8 on Linux shows the actual 8GB of memory on the GPU, whereas Windows shows something much smaller, ~3GB, probably because of a 32-bit limit. Is this what is causing the invalid results? Any recommendations on how to deal with it? |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
First of all, the Linux app is 64-bit whilst the Windows app is 32-bit. Of course the Linux driver can access the full memory range of the GPU. I'm almost sure this is not related to your invalids. I would suggest running a few hours without any app args to see if the invalids stop. But it could be the driver itself, because it doesn't identify your GPUs correctly, IMHO. |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Mike wrote: "First of all Linux app is 64 bit whilst windows app is 32 bit." Thanks for the insight on the 32/64-bit differences. As for the name, I think maybe BOINC is picking up the new parameter "Board Name:" instead of "Name:". Both Windows and the new ROCm 1.8 have the same value for Name. |
Rob Joined: 7 Apr 12 Posts: 9 Credit: 951,019 RAC: 0 |
I don't really know if this problem can be tweaked away, but I have opened a ticket at ROCm; maybe there is a bug in their OpenCL implementation? https://github.com/RadeonOpenCompute/ROCm/issues/423 It might be worth keeping an eye on that in case they need some more diagnostic info. |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Rob wrote: "Don't really have an idea if this problem can be tweaked away - I have opened a ticket at ROCm though, maybe there is a bug in their OpenCL implementation?" This is possible, of course. |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Rob wrote: "Don't really have an idea if this problem can be tweaked away - I have opened a ticket at ROCm though, maybe there is a bug in their OpenCL implementation?" Thanks for posting. I will keep an eye on the thread. I was thinking of giving the non-SoG version of the app a try. |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
The ROCm 1.8 package includes a tool to monitor and configure the GPUs: rocmsmi.py. Here is a summary report of the current state of my GPUs from the ROCm System Management Interface (columns: GPU, Temp, AvgPwr, SCLK, MCLK, Fan, Perf, SCLK OD): GPU 3: 39.0c, 175.0W, 1630Mhz, 945Mhz, 0.0%, auto, 0%; GPU 1: 43.0c, 144.0W, 1630Mhz, 945Mhz, 0.0%, auto, 0%; GPU 2: 37.0c, 190.0W, 1630Mhz, 945Mhz, 0.0%, auto, 0%; GPU 0: 44.0c, 178.0W, 1630Mhz, 945Mhz, 0.0%, auto, 0%. Definitely useful! |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Rob wrote: "Don't really have an idea if this problem can be tweaked away - I have opened a ticket at ROCm though, maybe there is a bug in their OpenCL implementation?" Can someone point me to the latest Linux non-SoG AMD MB app? Thanks! |
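Keith notes that the 4.15.0-20 kernels include the Ryzen/Threadripper fixes, and Mike suggests upgrading the kernel. Below is a minimal Python sketch for checking whether the running kernel is at least 4.15. The 4.15 threshold is taken from those posts, not from any official ROCm requirement, and the helper names are hypothetical.

```python
import platform
import re

# Threshold taken from the thread (4.15.x kernels); not an official ROCm requirement.
MIN_KERNEL = (4, 15)

def kernel_version(release: str) -> tuple:
    """Extract (major, minor) from a release string such as '4.15.0-20-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))

def meets_minimum(release: str, minimum: tuple = MIN_KERNEL) -> bool:
    """True if (major, minor) is at least `minimum`, using tuple comparison."""
    return kernel_version(release) >= minimum

if __name__ == "__main__":
    release = platform.release()  # on Linux this matches `uname -r`
    try:
        ok = meets_minimum(release)
    except ValueError:
        print(f"could not parse kernel release {release!r}")
    else:
        print(release, "OK" if ok else "older than 4.15")
```

Comparing (major, minor) tuples avoids the string-comparison trap where "4.4" would sort after "4.15".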
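Much of the troubleshooting in this thread consisted of aligning the Linux and Windows app command lines by hand. The Python sketch below splits an option string into flags and the values that follow them (so -tune 1 4 4 16 stays one entry) and lists the flags that differ. The option strings are hypothetical: only -tt 500/-tt 600 and -tune 1 4 4 16 come from the thread, and which value belongs to which OS is assumed.

```python
import shlex

def _is_number(token: str) -> bool:
    """True for tokens such as '500' or '-1' that should be read as values, not flags."""
    try:
        float(token)
        return True
    except ValueError:
        return False

def parse_options(cmdline: str) -> dict:
    """Map each '-flag' to the list of values that follow it.

    Multi-value options such as '-tune 1 4 4 16' stay together as one entry.
    Tokens that appear before the first flag are ignored.
    """
    options = {}
    current = None
    for token in shlex.split(cmdline):
        if token.startswith("-") and not _is_number(token):
            current = token
            options.setdefault(current, [])
        elif current is not None:
            options[current].append(token)
    return options

def diff_options(a: dict, b: dict) -> dict:
    """Return {flag: (values_in_a, values_in_b)} for every flag that differs; None means absent."""
    differences = {}
    for flag in sorted(set(a) | set(b)):
        if a.get(flag) != b.get(flag):
            differences[flag] = (a.get(flag), b.get(flag))
    return differences

if __name__ == "__main__":
    # Hypothetical command lines built only from flags named in the thread.
    linux = parse_options("-tt 500 -tune 1 4 4 16")
    windows = parse_options("-tt 600 -tune 1 4 4 16")
    for flag, (lin, win) in diff_options(linux, windows).items():
        print(f"{flag}: linux={lin} windows={win}")
```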
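RueiKe suspects that BOINC builds the device label from the newer "Board Name:" field rather than "Name:". The following Python sketch illustrates that idea on clinfo-style "Key: value" text. It parses the fields case-insensitively and prefers the board name when one exists. It does not reproduce BOINC's actual detection code. The sample dump is hypothetical apart from "Device 687f" and "Max compute units: 64", which appear in the thread.

```python
# Hypothetical clinfo-style excerpt; the real ROCm 1.8 platform dump from the thread
# did not survive, so these values are illustrative only.
SAMPLE_ROCM = """\
  Name:                                          gfx900
  Board name:                                    Device 687f
  Max compute units:                             64
"""

def parse_device_fields(text: str) -> dict:
    """Parse 'Key: value' lines into a dict with lower-cased keys.

    Only the first occurrence of each key is kept, so if several devices are
    listed this returns the fields of the first one.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields.setdefault(key.strip().lower(), value.strip())
    return fields

def display_name(fields: dict) -> str:
    """Prefer 'board name' when present (the hypothesis in the thread), else fall back to 'name'."""
    return fields.get("board name") or fields.get("name", "unknown")

if __name__ == "__main__":
    fields = parse_device_fields(SAMPLE_ROCM)
    print("Name:", fields.get("name"))
    print("Board name:", fields.get("board name"))
    print("Label chosen:", display_name(fields))
```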
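To track the rocmsmi.py summary over time, the rows RueiKe posted can be parsed into per-GPU records. This Python sketch assumes the column layout shown in his post (GPU, Temp, AvgPwr, SCLK, MCLK, Fan, Perf, SCLK OD). Other rocm-smi versions may print a different layout, so the regular expression is an assumption, not a documented format.

```python
import re

# Rows copied from the rocmsmi.py summary in the thread.
# Assumed column order: GPU, Temp, AvgPwr, SCLK, MCLK, Fan, Perf, SCLK OD
SMI_ROWS = """\
3 39.0c 175.0W 1630Mhz 945Mhz 0.0% auto 0%
1 43.0c 144.0W 1630Mhz 945Mhz 0.0% auto 0%
2 37.0c 190.0W 1630Mhz 945Mhz 0.0% auto 0%
0 44.0c 178.0W 1630Mhz 945Mhz 0.0% auto 0%
"""

ROW = re.compile(
    r"^(?P<gpu>\d+)\s+(?P<temp>[\d.]+)c\s+(?P<power>[\d.]+)W\s+"
    r"(?P<sclk>\d+)Mhz\s+(?P<mclk>\d+)Mhz\s+(?P<fan>[\d.]+)%\s+"
    r"(?P<perf>\S+)\s+(?P<sclk_od>\d+)%$"
)

def parse_smi(text: str) -> list:
    """Return one dict per GPU row, sorted by GPU index; banner and header lines are skipped."""
    gpus = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            gpus.append({
                "gpu": int(m["gpu"]),
                "temp_c": float(m["temp"]),
                "power_w": float(m["power"]),
                "sclk_mhz": int(m["sclk"]),
                "mclk_mhz": int(m["mclk"]),
                "fan_pct": float(m["fan"]),
                "perf": m["perf"],
                "sclk_od_pct": int(m["sclk_od"]),
            })
    return sorted(gpus, key=lambda g: g["gpu"])

if __name__ == "__main__":
    gpus = parse_smi(SMI_ROWS)
    total = sum(g["power_w"] for g in gpus)
    hottest = max(gpus, key=lambda g: g["temp_c"])
    print(f"{len(gpus)} GPUs, total average power {total:.1f} W, "
          f"hottest GPU {hottest['gpu']} at {hottest['temp_c']}c")
```

On the four rows from the post, this reports a combined average power of 687.0 W, with GPU 0 the hottest at 44.0c.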
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.