Astropulse error with AMD/ATI GPU?

Bill (Special Project $75 donor)
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1973120 - Posted: 3 Jan 2019, 2:15:16 UTC

I noticed today that four of my Astropulse GPU WUs ended in errors. This is on my new Ryzen computer with an integrated GPU (Ryzen 3 2200G, with Radeon Vega 8 integrated graphics). I am running Radeon 18.12.2 drivers (not the optional 18.13.2). Is anyone else having errors with these WUs? Some do come back completed on the GPU, so not every WU has a problem. For what it's worth, I have not received errors for any S@H v8 GPU WUs, or any errors for CPU WUs (Astropulse or otherwise). I did notice that my app_config did not reserve a dedicated CPU for the GPU for Astropulse (only for S@H v8). I have since added it, so we'll see if the errors continue.
Seti@home classic: 1,456 results, 1.613 years CPU time

lunkerlander
Joined: 23 Jul 18
Posts: 82
Credit: 1,353,232
RAC: 4
United States
Message 1973188 - Posted: 3 Jan 2019, 12:20:06 UTC - in response to Message 1973120.  

Yes, I have a 50% error rate with my AMD Astropulse tasks. I finally stopped running Astropulse tasks so as not to ruin the results data in case my wingman also had the same issue.

Bill (Special Project $75 donor)
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1973718 - Posted: 6 Jan 2019, 0:55:12 UTC - in response to Message 1973188.  

Yes, I have a 50% error rate with my AMD astropulse tasks. I finally stopped doing astropulse tasks to not ruin the results data in case my wingman also had the same issue.

I have had a few more errors on Astropulse tasks running on AMD GPU, even with a dedicated CPU. So, I agree, it is probably best to stop crunching GPU tasks for Astropulse.
Seti@home classic: 1,456 results, 1.613 years CPU time

Kissagogo27 (Special Project $75 donor)
Joined: 6 Nov 99
Posts: 715
Credit: 8,032,827
RAC: 62
France
Message 1973783 - Posted: 6 Jan 2019, 10:38:28 UTC

No errors with my AP tasks on my HD7750 ^^

Perhaps a new application is needed for the AMD RX GPU series.

Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1973785 - Posted: 6 Jan 2019, 10:43:13 UTC

I can see both of you are running on a 4 core CPU.

How many instances are running on the CPU?
You have to keep at least one CPU core free.


With each crime and every kindness we birth our future.

lunkerlander
Joined: 23 Jul 18
Posts: 82
Credit: 1,353,232
RAC: 4
United States
Message 1973799 - Posted: 6 Jan 2019, 13:24:18 UTC - in response to Message 1973785.  

I ran 3 CPU tasks, leaving 1 core free.

rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1973803 - Posted: 6 Jan 2019, 14:02:45 UTC

The argument about "real" vs. "hyper" cores has raged for a very long time - in reality the difference in performance is not as big as one might expect. When hyper-threading is active there are two processes active on each core (well, more or less) and so the effective processing rate on each "real" or "virtual" core is about half that of a core on the same processor when non-hyper-threaded. Setting aside a CPU core (real or hyper-threaded) is required, while setting aside two cores on a hyper-threaded CPU does not mean that you end up with the equivalent performance of a single non-hyper-threaded core because SETI is not coded to use multiple cores (hyper or otherwise).
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1973817 - Posted: 6 Jan 2019, 15:34:28 UTC

Because no volunteer has stepped forward to totally rewrite the CPU application to use multiple CPUs. It is quite probable that the actual performance improvement for a multi-threaded (CPU) application would not be linear, as a large amount of time would need to be spent re-syncing the data at the end.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Bill (Special Project $75 donor)
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1973824 - Posted: 6 Jan 2019, 16:48:33 UTC - in response to Message 1973801.  

Mike is quite correct that a GPU task requires support from 1 CPU core to work properly. So on a four core machine 1 x GPU and 3 x CPU is required. But as I understand it, that means a real core not a virtual core. I.e. a four core CPU with HT will give 8 threads, therefore you need to reserve 2 threads for the GPU and only run 6 CPU tasks.
I see what you are saying, but my computer has no hyper threading. 4 cores, 4 threads total. I have been running S@H with 3 CPU WUs and (1 CPU + 1 GPU) WU being crunched at the same time. WFIW, I'm running at 100% with no cooling issues whatsoever. S@H v8 has had no errors running like this, only a few instances of Astropulse v7 GPU WUs.
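
Editor's note: the core-reservation arithmetic discussed above can be sketched as follows. This is a minimal illustration, assuming the quoted rule that each GPU task needs one whole physical core; the function name `cpu_task_slots` and its parameters are hypothetical and are not BOINC settings. Later posts in this thread dispute whether a physical core is really required for AMD GPUs.

```python
def cpu_task_slots(physical_cores: int, threads_per_core: int, gpu_tasks: int) -> int:
    """Return how many CPU tasks can run after reserving cores for GPU tasks.

    Assumption (taken from the quoted rule, not from BOINC itself): every GPU
    task needs one whole physical core, i.e. `threads_per_core` hardware threads.
    """
    total_threads = physical_cores * threads_per_core
    reserved_threads = gpu_tasks * threads_per_core
    # Never report a negative number of CPU task slots.
    return max(total_threads - reserved_threads, 0)


# 4 cores without hyper-threading, 1 GPU task (Bill's machine)
print(cpu_task_slots(physical_cores=4, threads_per_core=1, gpu_tasks=1))
# 4 cores with hyper-threading (8 threads), 1 GPU task (the quoted example)
print(cpu_task_slots(physical_cores=4, threads_per_core=2, gpu_tasks=1))
```

Under that assumption the first call gives 3 CPU tasks and the second gives 6, matching the numbers in the posts above.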

I also don't know if there is a difference that my GPU is integrated, compared to Lunker's discrete Radeon, but it does seem like we are both having similar problems.

If this is a bug in the way the WUs are processed then I guess there is nothing I can do; I wanted to check and see if anyone else was having the same problem and if they found a way to fix it.

Setting aside a CPU core (real or hyper-threaded) is required, while setting aside two cores on a hyper-threaded CPU does not mean that you end up with the equivalent performance of a single non-hyper-threaded core because SETI is not coded to use multiple cores (hyper or otherwise).
Forgive me, because it's been a long time since I took a basic computer architecture class, but even if you set aside two threads (for GPU WUs, or just to have free for other processes), is the code written so that those two threads would come from the same physical core?


I guess the wholesale move to GPU crunching is the main reason why no development work was done on CPU matters.
I would be curious to know what percentage of WUs crunched are CPU WUs vs. GPU WUs. On one hand I could see GPU WUs being a higher percentage because they run faster, but on the other hand I could see that more "casual" crunchers are doing CPU work only. Couple that with the fact that CPUs keep gaining cores, and I would think that keeping the code up to date for newer CPUs would be important. Of course, that's just me talking from the armchair here.
Seti@home classic: 1,456 results, 1.613 years CPU time

Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1973829 - Posted: 6 Jan 2019, 17:11:36 UTC
Last modified: 6 Jan 2019, 17:12:50 UTC

We are talking about AMD GPUs, not Nvidia.
It doesn't matter whether you keep a virtual or a physical core free for such GPUs.
AMD drivers only need a little bit of resources from the CPU.
That's the reason I only said one core.


Also, the fact is that MBs seem to run O.K. on both computers.

So add the following line to the ap_cmdline**.txt file.

Usually it's empty.
On the stock app it's named differently.

-oclFFT_plan 256 16 256 -tune 1 64 4 1 -tune 2 64 4 1 -cpu_lock_fixed_cpu 3 -hp

Save as txt.

Beware you need one core free.


With each crime and every kindness we birth our future.

Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 1973830 - Posted: 6 Jan 2019, 17:13:55 UTC - in response to Message 1973824.  

Greetings,

What is "WFIW"? I Googled it and all I found was an AM radio station in Illinois.

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath

Bill (Special Project $75 donor)
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1973852 - Posted: 6 Jan 2019, 18:54:02 UTC - in response to Message 1973830.  

What is "WFIW"? I Googled it and all I found was an AM radio station in Illinois.
That is me not proofreading; I meant "for what it's worth", FWIW.

MB = motherboard? I'm confused.
For the addition to the ap_cmdline, I am assuming it should be my ap_cmdline_7.09_windows_intelx86__opencl_ati_100.txt file? The other two in the directory have "CPU" in the text file, so I assume that is not related to GPU tasks.
And what does the addition to the command line do exactly? I'm asking as someone that knows enough to be dangerous.
Seti@home classic: 1,456 results, 1.613 years CPU time

Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 1973858 - Posted: 6 Jan 2019, 19:22:05 UTC - in response to Message 1973852.  

Greetings Bill,

Ah, now I know what you meant. Thanks! :)

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath

Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1973902 - Posted: 6 Jan 2019, 23:10:19 UTC - in response to Message 1973852.  


By MB I mean seti_v8 tasks.

Yes, the file you mentioned is the right one.

I think your last question is a little more complex.
The first part of the command line is just a little structuring of some kernels.
The second part adjusts the CPU affinity of the GPU app.
Usually cores 0 and 1 are used by the GPU app; this command will pin the GPU task to the fourth core (core 3, counting from 0) in your case.
This should help the GPU task get the CPU resources it needs.
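
Editor's note: the pinning described above can be illustrated with a small sketch. It assumes that the argument of `-cpu_lock_fixed_cpu` is a zero-based logical CPU index (consistent with the mention of cores 0 and 1), so that index 3 would be the fourth logical CPU on a 4-thread machine. How the application applies the affinity internally is not shown in this thread; the helper `affinity_mask` below is hypothetical and only demonstrates the usual bitmask convention, where bit n stands for logical CPU n.

```python
def affinity_mask(cpu_index: int, total_cpus: int) -> int:
    """Build a bitmask that selects a single logical CPU.

    Assumption: CPUs are numbered from 0, and bit n of the mask stands for
    logical CPU n. This mirrors common OS affinity masks; it is not taken
    from the Astropulse application source.
    """
    if not 0 <= cpu_index < total_cpus:
        raise ValueError(f"CPU index {cpu_index} is outside 0..{total_cpus - 1}")
    return 1 << cpu_index


# Index 3 on a machine with 4 logical CPUs (Ryzen 3 2200G: 4 cores, 4 threads)
mask = affinity_mask(3, 4)
print(f"{mask:04b}")  # only the highest of the four bits is set
```

An index of 4 would be rejected on this machine, since only indices 0 through 3 exist.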


With each crime and every kindness we birth our future.

Bill (Special Project $75 donor)
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1973904 - Posted: 6 Jan 2019, 23:24:39 UTC - in response to Message 1973902.  

Thank you, I added this to the command line text. I'll run this for a while and report back what happens.
Seti@home classic: 1,456 results, 1.613 years CPU time

Bill (Special Project $75 donor)
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1974950 - Posted: 12 Jan 2019, 19:33:37 UTC - in response to Message 1973904.  

Yup, I'm still getting errors for AP7 GPU WUs. I do notice that any that are queued up in BOINC have an estimated runtime of 0:00. I have only encountered a few more, but that is because AP tasks are few and far between. Basically, every AP7 GPU WU I receive ends in an error.

Any other ideas? Otherwise I think I will need to stop running them.
Seti@home classic: 1,456 results, 1.613 years CPU time

rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1974959 - Posted: 12 Jan 2019, 21:01:52 UTC

With a zero estimated run-time it is hardly surprising that all the APs fail on your AMD APU. There is a trap to detect tasks that are taking far too long (I think it's ten times the estimate, but given a zero estimate a few seconds will be "far too long").
I can think of two ways that might allow APs to run on that computer:
1. Detach and then re-attach the computer, leaving a few hours between the two steps - a bit of a "nuclear option".
2. Reset the project on that computer.
Both of these should clear out any wrongly set performance values; personally I would try the "reset" first.
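
Editor's note: the run-time trap described above can be sketched as a simple comparison. The factor of ten is the poster's recollection rather than a confirmed value, and the function name `exceeds_time_limit` is hypothetical; the real client logic may differ. The point is only that a zero estimate makes any elapsed time look too long.

```python
def exceeds_time_limit(elapsed_s: float, estimate_s: float, factor: float = 10.0) -> bool:
    """Return True when a task has run longer than `factor` times its estimate.

    Assumption: the limit is a plain multiple of the estimated run-time.
    The default factor of 10 comes from the post above ("I think it's ten
    times the estimate"), not from BOINC documentation.
    """
    return elapsed_s > factor * estimate_s


# With a zero estimate the limit is zero, so a few seconds already trips the trap.
print(exceeds_time_limit(elapsed_s=5, estimate_s=0))
# With a one-hour estimate the same task is nowhere near the limit.
print(exceeds_time_limit(elapsed_s=5, estimate_s=3600))
```

This is why clearing the wrongly set performance values, so that a non-zero estimate is produced again, is a plausible fix.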
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1974980 - Posted: 12 Jan 2019, 22:44:17 UTC - in response to Message 1973120.  

I am running the optional video driver but I am not getting APs for the GPU. On my CPU they run "fine" (just a bit slow though). You might check the motherboard (MB) manufacturer's website for chipset drivers.

Tom
A proud member of the OFA (Old Farts Association).

Bill (Special Project $75 donor)
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1974985 - Posted: 12 Jan 2019, 23:13:30 UTC - in response to Message 1974959.  

I don't think I had noticed the reset project option in BOINC before (I never had to use it). I'll set No New Tasks (NNT) first and then give it a shot.

I'm not sure I understand what you mean by detaching and reattaching the computer. I'm assuming you mean by software means in the project, and not physically?

I am running the optional video driver but I am not getting AP's for the gpu. On my cpu it runs "fine" (just a bit slow though). Might check the MB website for chipset drivers.
I am on the latest BIOS from ASRock. I think I have the latest chipset drivers... but when I ran the installer for them again I realized it is the Radeon driver update. I'll look into this again.
Seti@home classic: 1,456 results, 1.613 years CPU time

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13722
Credit: 208,696,464
RAC: 304
Australia
Message 1974990 - Posted: 12 Jan 2019, 23:27:06 UTC - in response to Message 1974985.  

I'm not sure I understand what you mean by detaching and reattaching the computer. I'm assuming you mean by software means in the project, and not physically?

Yep, software.
Use the Remove option in the BOINC Manager, then Tools, Add project to re-join/re-attach the computer to the project.
Grant
Darwin NT