Astropulse error with AMD/ATI GPU?

Message boards : Number crunching : Astropulse error with AMD/ATI GPU?

AuthorMessage
Profile Tom M
Volunteer tester

Send message
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1975073 - Posted: 13 Jan 2019, 17:21:05 UTC - in response to Message 1974980.  
Last modified: 13 Jan 2019, 17:24:42 UTC

I am running the optional video driver but I am not getting AP's for the GPU. On my CPU it runs "fine" (just a bit slow though). Might check the MB website for chipset drivers.

Tom


I just reviewed my "all tasks" and I have 4 AP tasks aimed at my GPU. All have a "zero" time remaining. So I expect they will behave just like you have described.

It's odd.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1975073 · Report as offensive
Profile Bill Special Project $75 donor
Volunteer tester
Avatar

Send message
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1975438 - Posted: 16 Jan 2019, 16:55:39 UTC - in response to Message 1974959.  

Reset the project on that computer.
Both of these should clear out any wrongly set performance values, personally I would try the "reset" first.
I did a reset, and I have one AMD AP7 task in the queue with a 0:00 estimated completion time, so I'm assuming that will error out. I will try the remove option in a few days.
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1975438 · Report as offensive
Profile Jord
Volunteer tester
Avatar

Send message
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1975507 - Posted: 17 Jan 2019, 0:15:17 UTC

APs ran fine on my old HD7870, but never did on the RX470 I replaced it with. Something in the RX's is different that Astropulse really doesn't like, so they err within a minute.
ID: 1975507 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1975518 - Posted: 17 Jan 2019, 1:10:34 UTC - in response to Message 1975507.  

Did the AP task failures on RX cards ever get brought to Raistmer's attention? Just curious. Never have messed with ATI/AMD cards before.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1975518 · Report as offensive
Profile Bill Special Project $75 donor
Volunteer tester
Avatar

Send message
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1975522 - Posted: 17 Jan 2019, 1:34:14 UTC - in response to Message 1975518.  
Last modified: 17 Jan 2019, 1:37:13 UTC

Did the AP task failures on RX cards ever get brought to Raistmer's attention? Just curious. Never have messed with ATI/AMD cards before.
I guess I have never "filed a complaint" with anyone about why they are not working. I could see how the new Ryzen APUs have kinks that need to be worked out. Is Raistmer the guy for this? For the little bit of attention I pay to these boards I know he does a lot of development for the project (at least with special builds), but I never assumed he is the one to fix it.

Edit: If it matters, when I was waiting for NNT to run its course, I had MilkyWay@Home running as my backup project, and all GPU tasks for that project ended in error. I could see this being an error with my setup, but since I haven't tweaked much of anything OOB I'm not sure what needs fixing.
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1975522 · Report as offensive
Profile Jord
Volunteer tester
Avatar

Send message
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1975646 - Posted: 17 Jan 2019, 21:42:02 UTC - in response to Message 1975507.  

I rechecked that. APs on my RX470 tend to run from start to finish; they'll just never validate, as they'll always have zero spikes and a large percentage of radar blanking. I did report that here.
ID: 1975646 · Report as offensive
Phil Burden

Send message
Joined: 26 Oct 00
Posts: 264
Credit: 22,303,899
RAC: 0
United Kingdom
Message 1975719 - Posted: 18 Jan 2019, 9:04:05 UTC - in response to Message 1975507.  

APs ran fine on my old HD7870, but never did on the RX470 I replaced it with. Something in the RX's is different that Astropulse really doesn't like, so they err within a minute.


Although APs also ran fine on my HD6950, the cursor didn't. It used to glitch (stick) something awful, so if I needed to use the PC for anything serious, I had to suspend GPU work. The RX480 that replaced it kept giving occasional errors, so I stopped processing APs on it.

P.
ID: 1975719 · Report as offensive
Profile Bill Special Project $75 donor
Volunteer tester
Avatar

Send message
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1976146 - Posted: 21 Jan 2019, 0:11:52 UTC - in response to Message 1975719.  

So I have been looking into this a little bit more, and I discovered a few things. Task 7340502156 has the exit status of: 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED

The stderr output for this task is:
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
exceeded elapsed time limit 2.06 (18210521.14G/8839389.45G)</message>
<stderr_txt>
Running on device number: 0
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns device 0
Info: BOINC provided OpenCL device ID used
Used GPU device parameters are:
Number of compute units: 8
Single buffer allocation size: 256MB
Total device global memory: 3072MB
max WG size: 256
local mem type: Real
-unroll default value used: 8
-ffa_block default value used: 2048
-ffa_block_fetch default value used: 1024

Build features: Non-graphics BLANKIT OpenCL TWIN_FFA OCL_ZERO_COPY COMBINED_DECHIRP_KERNEL FFTW USE_INCREASED_PRECISION USE_SSE2 x86
CPUID: AMD Ryzen 3 2200G with Radeon Vega Graphics

Cache: L1=64K L2=512K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 FMA3 SSE4.1 SSE4.2 AVX SSE4A
AstroPulse v7 Windows x86 rev 2742, V7 match, by Raistmer with support of Lunatics.kwsn.net team. SSE2

OpenCL version by Raistmer

oclFFT fix for ATI GPUs by Urs Echternacht
ffa threshold mods by Joe Segur
SSE3 dechirping by JDWhale
Combined dechirp kernel by Frizz
Number of OpenCL platforms: 1


OpenCL Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Max compute units: 8
Max work group size: 256
Max clock frequency: 42949672Mhz
Max memory allocation: 3221225472
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 3221225472
Constant buffer size: 3221225472
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Queue properties:
Out-of-Order: No
Name: gfx902
Vendor: Advanced Micro Devices, Inc.
Driver version: 2766.5 (PAL,HSAIL)
Version: OpenCL 1.2 AMD-APP (2766.5)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event cl_amd_liquid_flash


state.fold_buf_size_short=65536; state.fold_buf_size_long=262144
GPU device sync requested... ...GPU device synched
Termination request detected or computations are finished. GPU device synched, exiting...

</stderr_txt>
In looking at the stderr for the MilkyWay@home GPU tasks that ended in error, they had a similar output of exceeding the time limit. I also noticed for MW@H that I have three warnings listed in the task. I'm not trying to get my MW@H problems solved here, but I can't help but think these are related somehow.

So I don't know how this works, but what I suspect is happening is that I download a task, and for whatever reason, it is given an estimated completion time of 0:00. Because the time is 0 (or presumably really really small), the task cannot be completed within this time and errors out. So, why is the task assigned such a small estimated completion time in the first place? I don't know if that is something happening in the BOINC software, parameters my computer is set at for my hardware, or something in between.
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1976146 · Report as offensive
Ben

Send message
Joined: 15 Jun 99
Posts: 54
Credit: 60,003,756
RAC: 150
United States
Message 1976172 - Posted: 21 Jan 2019, 3:42:58 UTC - in response to Message 1975719.  

APs ran fine on my old HD7870, but never did on the RX470 I replaced it with. Something in the RX's is different that Astropulse really doesn't like, so they err within a minute.


Although APs also ran fine on my HD6950, the cursor didn't. It used to glitch (stick) something awful, so if I needed to use the PC for anything serious, I had to suspend GPU work. The RX480 that replaced it kept giving occasional errors, so I stopped processing APs on it.

P.

Same here with an rx 570.
ID: 1976172 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22190
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1976185 - Posted: 21 Jan 2019, 6:15:35 UTC

I just found a clue - in the first part of the output file for your failed AP task I found this line:
Device peak FLOPS 	14,513,557.69 GFLOPS


The comparable line for the GTX1080 on one of my computers is:
Device peak FLOPS 	8,875.52 GFLOPS

Since this value is used to estimate the run time, and yours is about 1,600 times larger than the value for a faster processor (it should be smaller), I think you can see where the problem lies.

I think I've found a solution - you need to edit the "coproc_info.xml" file, look for the section that has info about your GPU, then edit the line that starts with "<peak_flops>" and reduce the very big number that follows that by a factor of say 2000.
Save the file, and restart BOINC.
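
For scale: 14,513,557.69 / 8,875.52 is about 1,635, which is where the "1600 times" comes from. The edit itself only needs a text editor, but the arithmetic can be sketched as below. This assumes the file stores the value in FLOPS rather than GFLOPS, so the figure above would appear as roughly 14513557690000000. The helper name is hypothetical.

```cpp
#include <cstdio>
#include <string>

// Sketch only: rewrites a single "<peak_flops>value</peak_flops>" line,
// dividing the value by `factor`. The line is returned unchanged if the
// tags are not found. std::stod throws if the text between the tags is
// not a number.
std::string scale_peak_flops_line(const std::string& line, double factor) {
    const std::string open_tag = "<peak_flops>";
    const std::string close_tag = "</peak_flops>";
    std::size_t start = line.find(open_tag);
    if (start == std::string::npos) return line;
    start += open_tag.size();
    std::size_t end = line.find(close_tag, start);
    if (end == std::string::npos) return line;
    double value = std::stod(line.substr(start, end - start));
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%f", value / factor);
    // Keep everything before the number and from the closing tag onwards.
    return line.substr(0, start) + buf + line.substr(end);
}
```

Dividing the assumed value by 2000 gives 7256778845000, i.e. about 7,257 GFLOPS, which is at least in the same range as the GTX1080 figure.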
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1976185 · Report as offensive
Phil Burden

Send message
Joined: 26 Oct 00
Posts: 264
Credit: 22,303,899
RAC: 0
United Kingdom
Message 1976202 - Posted: 21 Jan 2019, 9:42:42 UTC - in response to Message 1976185.  

I just found a clue - in the first part of the output file for your failed AP task I found this line:
Device peak FLOPS 	14,513,557.69 GFLOPS


The comparable line for the GTX1080 on one of my computers is:
Device peak FLOPS 	8,875.52 GFLOPS

Since this value is used to estimate the run time, and yours is about 1,600 times larger than the value for a faster processor (it should be smaller), I think you can see where the problem lies.

I think I've found a solution - you need to edit the "coproc_info.xml" file, look for the section that has info about your GPU, then edit the line that starts with "<peak_flops>" and reduce the very big number that follows that by a factor of say 2000.
Save the file, and restart BOINC.


Not sure that would work. Unless I'm reading my data wrong, my RX470's coproc file shows peak flops as "<peak_flops>5990400000000.000000</peak_flops>", which seems less than your 1080.

P.
ID: 1976202 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1976205 - Posted: 21 Jan 2019, 10:07:14 UTC - in response to Message 1975522.  

I guess I have never "filed a complaint" with anyone about why they are not working. I could see how the new Ryzen APUs have kinks that need to be worked out. Is Raistmer the guy for this? For the little bit of attention I pay to these boards I know he does a lot of development for the project (at least with special builds), but I never assumed he is the one to fix it.

Yes, Raistmer is the prime developer of all the OpenCL apps. That includes the AP app too. The other place that a complaint should be lodged is on the Questions and Answers, GPU Applications forum so that it is noticed by the developers. I see Jord already commented so he also could lodge a complaint with the developers.

This eventually needs to be logged in the BOINC/SETI GitHub repository as a bug for the application. From my irregular perusal of the logged issues, I have seen similar issues already logged about the failure of the platform to properly identify the processing power of GPUs. I believe Richard Haselgrove just logged something very similar.
https://github.com/BOINC/boinc/issues/2949
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1976205 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22190
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1976214 - Posted: 21 Jan 2019, 11:37:20 UTC - in response to Message 1976202.  

Phil,
Looking at the output from one of your valid results you have:
Device peak FLOPS 	5,990.40 GFLOPS

compared with one from one of my GTX1080:
Device peak FLOPS 	8,875.52 GFLOPS



An RX470 is a slower GPU (for SETI) than a GTX1080, so I would expect the peak-flops (effectively the processing rate) to be lower for the RX470.

Bill's figure is "stupidly high". He might solve the zero expected runtime by just knocking three zeros off the value (dividing by a thousand). Doing so would at least show whether this is the right place to look.

There is of course a question - Why did the peak-flops value go so high in the first place?
I have a few ideas, some are "easy" to rule out, others are going to need some digging.
A few of the simplest are:
- did this happen very soon after installing the new GPU?
- are you heavily re-scheduling work between processors?
- have you just stopped using the I-GPU?
After that we are heading into the murky depths of BOINC, and that could well be a very messy can of worms.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1976214 · Report as offensive
Profile Jord
Volunteer tester
Avatar

Send message
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1976305 - Posted: 21 Jan 2019, 18:38:45 UTC - in response to Message 1976285.  

So, it's perfectly clear that the GFLOP calculation both from Boinc, and from the apps are totally and completely F'ed up, and nothing to count on at all. :-)
BOINC doesn't calculate the flops value, it reads it from the CUDA and OpenCL files provided by the drivers.
ID: 1976305 · Report as offensive
Profile Bill Special Project $75 donor
Volunteer tester
Avatar

Send message
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1976307 - Posted: 21 Jan 2019, 19:15:39 UTC - in response to Message 1976214.  

- did this happen very soon after installing the new GPU?
Technically, yes. The computer was built from scratch and has been running for about a month. I didn't get any AP tasks until a few weeks ago.
- are you heavily re-scheduling work between processors?
No, I am just letting it run its course.
- have you just stopped using the I-GPU?
No, I am using it for s@h v8 all the time with no problems, both CPU and GPU tasks. I have either been suspending any AP v7 GPU tasks or updating my app config to not allow AP v7 GPU tasks to download until I have a better idea of what can be done to fix this.
After that we are heading into the murky depths of BOINC, and that could well be a very messy can of worms.
And that is above my level of expertise. I'd love to help, but I don't know that I have the time or experience to try to solve this myself in a reasonable amount of time.
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1976307 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22190
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1976311 - Posted: 21 Jan 2019, 19:34:53 UTC

...actually BOINC does calculate the value for peak flops, based on data collected from the driver(s).
This triggers a couple of thoughts. First, have AMD in their wisdom changed the API (RPC) for getting such data? Second, is the data coming from the driver "correct"? If the API has changed, or the data is wrong, then it is a "bit of a problem" that may cause a lot of head scratching.

Here's an extract from coproc.cpp (for release 7.7.8) where the peak flops is calculated. I can see at least one place where an apparently innocent change could blow things out in the manner that Bill and others are seeing.
void COPROC_ATI::set_peak_flops() {
    double x = 0;
    if (attribs.numberOfSIMD) {
        x = attribs.numberOfSIMD * attribs.wavefrontSize * 5 * attribs.engineClock * 1.e6;
        // clock is in MHz
    } else if (opencl_prop.amd_simd_per_compute_unit) {

        // OpenCL w/ cl_amd_device_attribute_query extension
        // Per: https://www.khronos.org/registry/cl/extensions/amd/cl_amd_device_attribute_query.txt
        //
        // Single precision performance is calculated as two times the number of shaders multiplied by the base core clock speed.
        // Per: https://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units
        //
        // clock is in MHz
        x = opencl_prop.max_compute_units * 
            opencl_prop.amd_simd_per_compute_unit * 
            opencl_prop.amd_simd_width *
            opencl_prop.amd_simd_instruction_width *
            2 *
            (opencl_prop.max_clock_frequency * 1.e6);

    } else if (opencl_prop.max_compute_units) {
        // OpenCL gives us only:
        // - max_compute_units
        //   (which I'll assume is the same as attribs.numberOfSIMD)
        // - max_clock_frequency (which I'll assume is the same as engineClock)
        // It doesn't give wavefrontSize, which can be 16/32/64.
        // So let's be conservative and use 16
        //
        x = opencl_prop.max_compute_units * 16 * 5 * opencl_prop.max_clock_frequency * 1e6;
    }
    peak_flops = (x>0)?x:5e10;
}

void COPROC_ATI::fake(double ram, double avail_ram, int n) {
    safe_strcpy(type, proc_type_name_xml(PROC_TYPE_AMD_GPU));
    safe_strcpy(version, "1.4.3");
    safe_strcpy(name, "foobar");
    count = n;
    available_ram = avail_ram;
    have_cal = true;
    memset(&attribs, 0, sizeof(attribs));
    memset(&info, 0, sizeof(info));
    attribs.localRAM = (int)(ram/MEGA);
    attribs.numberOfSIMD = 32;
    attribs.wavefrontSize = 32;
    attribs.engineClock = 50;
    for (int i=0; i<count; i++) {
        device_nums[i] = i;
    }
    set_peak_flops();
}

...and a few places that may well have been affected by later drivers.....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1976311 · Report as offensive
Profile Jord
Volunteer tester
Avatar

Send message
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1976326 - Posted: 21 Jan 2019, 21:08:43 UTC - in response to Message 1976311.  

I'd been looking through the source code for something like that and couldn't find it, just that it read and set peak_flops. I do remember that the science app calculates the flops separately on its own.
ID: 1976326 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22190
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1976330 - Posted: 21 Jan 2019, 21:24:41 UTC

I too have been plodding around the code, trying to find out if there is anywhere else those values are used. Nothing yet, and a glass of amber nectar beckons me.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1976330 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1976342 - Posted: 21 Jan 2019, 22:33:36 UTC

Did ATI make any kind of change between those families with respect to how they calculate the number of cores per SM on the card? Richard submitted a change in the code to accommodate the new Turing cards from Nvidia, which changed the number of cores per SM. Pascal seems to use cores_per_proc = 128 but the new Turing cards use cores_per_proc = 64. From the bug report:

The BOINC client decodes the 'Peak Flops' value for NVidia GPUs according to the architecture used in each succeeding card generation.

Did something similar happen with ATI, with nobody noticing the change in how the drivers report the number of cores, so that BOINC now decodes the peak FLOPS incorrectly for certain cards?
https://github.com/BOINC/boinc/issues/2706
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1976342 · Report as offensive
andivb

Send message
Joined: 9 Aug 09
Posts: 7
Credit: 14,510,909
RAC: 2
United States
Message 1989820 - Posted: 12 Apr 2019, 20:21:12 UTC
Last modified: 12 Apr 2019, 20:22:45 UTC

I recently installed an AMD RX 580 GPU and just noticed that Astropulse results are invalid. 2 bad results are ID 7584720987 and 7584330887. Was hoping for an answer/fix in this thread, but I'm not really sure I understand what the resolution to the problem is. Any suggestions?
http://setiathome.berkeley.edu/result.php?resultid=7584720987
http://setiathome.berkeley.edu/result.php?resultid=7584330887
ID: 1989820 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.