Astropulse error with AMD/ATI GPU?

Tom M Volunteer tester Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462	Message 1975073 - Posted: 13 Jan 2019, 17:21:05 UTC - in response to Message 1974980. Last modified: 13 Jan 2019, 17:24:42 UTC I am running the optional video driver but I am not getting AP's for the GPU. On my CPU it runs "fine" (just a bit slow though). Might check the MB website for chipset drivers. Tom I just reviewed my "all tasks" and I have 4 AP tasks aimed at my GPU. All have a "zero" time remaining. So I expect they will behave just like you have described. It's odd. Tom A proud member of the OFA (Old Farts Association). ID: 1975073 ·

Bill Volunteer tester Send message Joined: 30 Nov 05 Posts: 282 Credit: 6,916,194 RAC: 60	Message 1975438 - Posted: 16 Jan 2019, 16:55:39 UTC - in response to Message 1974959. Reset the project on that computer. Both of these should clear out any wrongly set performance values, personally I would try the "reset" first. I did a reset, and I have one AMD AP7 task in the queue with a 0:00 estimated completion time, so I'm assuming that will error out. I will try the remove option in a few days. Seti@home classic: 1,456 results, 1.613 years CPU time ID: 1975438 ·

Jord Volunteer tester Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3	Message 1975507 - Posted: 17 Jan 2019, 0:15:17 UTC APs ran fine on my old HD7870, but never did on the RX470 I replaced it with. Something in the RX's is different that Astropulse really doesn't like, so they err within a minute. ID: 1975507 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1975518 - Posted: 17 Jan 2019, 1:10:34 UTC - in response to Message 1975507. Did the AP task failures on RX cards ever get brought to Raistmer's attention? Just curious. Never have messed with ATI/AMD cards before. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1975518 ·

Bill Volunteer tester Send message Joined: 30 Nov 05 Posts: 282 Credit: 6,916,194 RAC: 60	Message 1975522 - Posted: 17 Jan 2019, 1:34:14 UTC - in response to Message 1975518. Last modified: 17 Jan 2019, 1:37:13 UTC Did the AP task failures on RX cards ever get brought to Raistmer's attention? Just curious. Never have messed with ATI/AMD cards before. I guess I have never "filed a complaint" with anyone with why they are not working. I could see how the new Ryzen APUs have kinks that need to be worked out. Is Raistmer the guy for this? For the little bit of attention I pay to these boards I know he does a lot of development for the project (at least with special builds), but I never assumed he is the one to fix it. Edit: If it matters, when I was waiting for NNT to run its course, I had MilkyWay@Home running as my backup project, and all GPU tasks for that project ended in error. I could see this being an error with my setup, but since I haven't tweaked much of anything OOB I'm not sure what needs fixing. Seti@home classic: 1,456 results, 1.613 years CPU time ID: 1975522 ·

Jord Volunteer tester Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3	Message 1975646 - Posted: 17 Jan 2019, 21:42:02 UTC - in response to Message 1975507. I rechecked that. APs on my RX470 tend to run from start to finish, they'll just never validate as they'll always have zero spikes and a large percentage radar blanking. I did report that here. ID: 1975646 ·

Phil Burden Send message Joined: 26 Oct 00 Posts: 264 Credit: 22,303,899 RAC: 0	Message 1975719 - Posted: 18 Jan 2019, 9:04:05 UTC - in response to Message 1975507. APs ran fine on my old HD7870, but never did on the RX470 I replaced it with. Something in the RX's is different that Astropulse really doesn't like, so they err within a minute. Although AP's also ran fine on my HD6950, the cursor didn't, it used to glitch (stick) something awful, if I needed to use the pc for anything serious, I had to suspend GPU work. The RX480 that replaced it kept giving occasional errors, so I stopped processing AP's on it. P. ID: 1975719 ·

Bill Volunteer tester Send message Joined: 30 Nov 05 Posts: 282 Credit: 6,916,194 RAC: 60	Message 1976146 - Posted: 21 Jan 2019, 0:11:52 UTC - in response to Message 1975719.
So I have been looking into this a little bit more, and I discovered a few things. Task 7340502156 has the exit status of:
197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED
The stderr output for this task is:
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
exceeded elapsed time limit 2.06 (18210521.14G/8839389.45G)</message>
<stderr_txt>
Running on device number: 0
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns device 0
Info: BOINC provided OpenCL device ID used
Used GPU device parameters are:
    Number of compute units: 8
    Single buffer allocation size: 256MB
    Total device global memory: 3072MB
    max WG size: 256
    local mem type: Real
    -unroll default value used: 8
    -ffa_block default value used: 2048
    -ffa_block_fetch default value used: 1024
Build features: Non-graphics BLANKIT OpenCL TWIN_FFA OCL_ZERO_COPY COMBINED_DECHIRP_KERNEL FFTW USE_INCREASED_PRECISION USE_SSE2 x86
    CPUID: AMD Ryzen 3 2200G with Radeon Vega Graphics
    Cache: L1=64K L2=512K
    CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 FMA3 SSE4.1 SSE4.2 AVX SSE4A
AstroPulse v7 Windows x86 rev 2742, V7 match, by Raistmer with support of Lunatics.kwsn.net team.
SSE2 OpenCL version by Raistmer
oclFFT fix for ATI GPUs by Urs Echternacht
ffa threshold mods by Joe Segur
SSE3 dechirping by JDWhale
Combined dechirp kernel by Frizz
Number of OpenCL platforms: 1
OpenCL Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
    Max compute units: 8
    Max work group size: 256
    Max clock frequency: 42949672Mhz
    Max memory allocation: 3221225472
    Cache type: Read/Write
    Cache line size: 64
    Cache size: 16384
    Global memory size: 3221225472
    Constant buffer size: 3221225472
    Max number of constant args: 8
    Local memory type: Scratchpad
    Local memory size: 32768
    Queue properties:
        Out-of-Order: No
    Name: gfx902
    Vendor: Advanced Micro Devices, Inc.
    Driver version: 2766.5 (PAL,HSAIL)
    Version: OpenCL 1.2 AMD-APP (2766.5)
    Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event cl_amd_liquid_flash
state.fold_buf_size_short=65536; state.fold_buf_size_long=262144
GPU device sync requested... ...GPU device synched
Termination request detected or computations are finished. GPU device synched, exiting...
</stderr_txt>
In looking at the stderr for the MilkyWay@home GPU tasks that ended in error, they had a similar output of exceeding the time limit. I also noticed for MW@H that I have three warnings listed in the task. I'm not trying to get my MW@H problems solved here, but I can't help but think these are related somehow.
So I don't know how this works, but what I suspect is happening is that I download a task, and for whatever reason, it is given an estimated completion time of 0:00. Because the time is 0 (or presumably really really small), the task cannot be completed within this time and errors out. So, why is the task assigned such a small estimated completion time in the first place? I don't know if that is something happening in the BOINC software, parameters my computer is set at for my hardware, or something in between. Seti@home classic: 1,456 results, 1.613 years CPU time ID: 1976146 ·
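
The figures in the "exceeded elapsed time limit" message can be checked by hand. Below is a minimal sketch, assuming (as the layout of the message suggests) that the limit is the task's operation bound divided by the speed BOINC believes the device has. The function names are hypothetical and are not part of BOINC, and the 5,000 GFLOPS comparison figure is only an illustrative value.

```cpp
#include <cstdio>

// Hypothetical helper, not BOINC code. Assumption: the elapsed-time limit is
// (operation bound) / (estimated device speed), with both values given in
// the "G" units that appear in the stderr message.
double elapsed_limit_seconds(double fpops_bound_g, double device_gflops) {
    return fpops_bound_g / device_gflops;
}

// Prints the limit for the two numbers quoted in the failed task, and for the
// same task on a device with a more believable (illustrative) speed estimate.
void print_limits() {
    const double bound = 18210521.14;  // first number in the stderr message
    std::printf("reported speed:       %.2f s\n",
                elapsed_limit_seconds(bound, 8839389.45));
    std::printf("5,000 GFLOPS (assumed): %.2f s\n",
                elapsed_limit_seconds(bound, 5000.0));
}
```

With the two numbers from the message the quotient is about 2.06 seconds, which matches the limit printed in the log, so under this assumption an inflated speed estimate is enough to explain both the 0:00 estimate and the immediate time-limit error.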

Ben Send message Joined: 15 Jun 99 Posts: 54 Credit: 60,003,756 RAC: 150	Message 1976172 - Posted: 21 Jan 2019, 3:42:58 UTC - in response to Message 1975719. APs ran fine on my old HD7870, but never did on the RX470 I replaced it with. Something in the RX's is different that Astropulse really doesn't like, so they err within a minute. Although AP's also ran fine on my HD6950, the cursor didn't, it used to glitch (stick) something awful, if I needed to use the pc for anything serious, I had to suspend GPU work. The RX480 that replaced it kept giving occasional errors, so I stopped processing AP's on it. P. Same here with an rx 570. ID: 1976172 ·

rob smith Volunteer moderator Volunteer tester Send message Joined: 7 Mar 03 Posts: 22190 Credit: 416,307,556 RAC: 380	Message 1976185 - Posted: 21 Jan 2019, 6:15:35 UTC I just found a clue - in the first part of the output file for your failed AP task I found this line: Device peak FLOPS 14,513,557.69 GFLOPS The comparable line for the GTX1080 on one of my computers is: Device peak FLOPS 8,875.52 GFLOPS Since this value is used to guess at the estimated run time and yours is 1600 times larger than that for a faster processor (it should be smaller) I think you can see where the problem lies. I think I've found a solution - you need to edit the "coproc_info.xml" file, look for the section that has info about your GPU, then edit the line that starts with "<peak_flops>" and reduce the very big number that follows that by a factor of say 2000. Save the file, and restart BOINC. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? ID: 1976185 ·
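
For anyone who would rather script the suggested edit than do it by hand, here is a rough sketch. It assumes the value is stored as plain decimal text between "<peak_flops>" and "</peak_flops>"; only the tag name comes from the post, the rest of the file layout is not assumed, and the helper name is hypothetical. It works on a string, so reading and writing coproc_info.xml (and stopping BOINC first) is left to the caller.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not part of BOINC. Divides the first <peak_flops>
// value found in `xml` by `divisor` and returns the edited text. The text is
// returned unchanged if the tag pair is not found. std::stod throws if the
// text between the tags is not a number.
std::string scale_peak_flops(const std::string& xml, double divisor) {
    const std::string open = "<peak_flops>";
    const std::string close = "</peak_flops>";

    const std::size_t start = xml.find(open);
    if (start == std::string::npos) return xml;
    const std::size_t value_begin = start + open.size();
    const std::size_t value_end = xml.find(close, value_begin);
    if (value_end == std::string::npos) return xml;

    const double old_value =
        std::stod(xml.substr(value_begin, value_end - value_begin));

    // Write the scaled value back with six decimals.
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.6f", old_value / divisor);
    return xml.substr(0, value_begin) + buf + xml.substr(value_end);
}
```

Only the first match is changed, so on a host with more than one GPU section the text would need to be narrowed to the right section before calling this. For scale, 14,513,557.69 divided by 8,875.52 is roughly 1,635, which is where the "1600 times" figure comes from.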

Phil Burden Send message Joined: 26 Oct 00 Posts: 264 Credit: 22,303,899 RAC: 0	Message 1976202 - Posted: 21 Jan 2019, 9:42:42 UTC - in response to Message 1976185. I just found a clue - in the first part of the output file for your failed AP task I found this line: Device peak FLOPS 14,513,557.69 GFLOPS The comparable line for the GTX1080 on one of my computers is: Device peak FLOPS 8,875.52 GFLOPS Since this value is used to guess at the estimated run time and yours is 1600 times larger than that for a faster processor (it should be smaller) I think you can see where the problem lies. I think I've found a solution - you need to edit the "coproc_info.xml" file, look for the section that has info about your GPU, then edit the line that starts with "<peak_flops>" and reduce the very big number that follows that by a factor of say 2000. Save the file, and restart BOINC. Not sure that would work. Unless I'm reading my data wrong, my RX470's coproc file shows peak flops as "<peak_flops>5990400000000.000000</peak_flops>", which seems less than your 1080. P. ID: 1976202 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1976205 - Posted: 21 Jan 2019, 10:07:14 UTC - in response to Message 1975522. I guess I have never "filed a complaint" with anyone with why they are not working. I could see how the new Ryzen APUs have kinks that need to be worked out. Is Raistmer the guy for this? For the little bit of attention I pay to these boards I know he does a lot of development for the project (at least with special builds), but I never assumed he is the one to fix it. Yes, Raistmer is the prime developer of all the OpenCL apps. That includes the AP app too. The other place that a complaint should be lodged is on the Questions and Answers, GPU Applications forum so that it is noticed by the developers. I see Jord already commented so he also could lodge a complaint with the developers. This eventually needs to be logged in the BOINC/SETI GitHub repository as a bug for the application. From my irregular perusal of all the logged issues, I have seen similar issues logged already about the failure of the platform to properly identify the processing power of GPUs. I believe Richard Haselgrove just logged something very similar about the issue. https://github.com/BOINC/boinc/issues/2949 Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1976205 ·

rob smith Volunteer moderator Volunteer tester Send message Joined: 7 Mar 03 Posts: 22190 Credit: 416,307,556 RAC: 380	Message 1976214 - Posted: 21 Jan 2019, 11:37:20 UTC - in response to Message 1976202.
Phil,
Looking at the output from one of your valid results you have:
Device peak FLOPS 5,990.40 GFLOPS
compared with one from one of my GTX1080:
Device peak FLOPS 8,875.52 GFLOPS
An RX470 is a slower GPU (for SETI) than a GTX1080, so I would expect the peak-flops (effectively the processing rate) to be lower for the RX470.
Bill's figure is "stupidly high", he might solve the zero expected runtime by just knocking three zeros off the value (divide by a thousand). Doing so would at least be a guide as to this being the right place to look.
There is of course a question - Why did the peak-flops value go so high in the first place? I have a few ideas, some are "easy" to rule out, others are going to need some digging. A few of the simplest are:
- did this happen very soon after installing the new GPU?
- are you heavily re-scheduling work between processors?
- have you just stopped using the I-GPU?
After that we are heading into the murky depths of BOINC, and that could well be a very messy can of worms.
Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe?
ID: 1976214 ·

Jord Volunteer tester Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3	Message 1976305 - Posted: 21 Jan 2019, 18:38:45 UTC - in response to Message 1976285. So, it's perfectly clear that the GFLOP calculation both from Boinc, and from the apps are totally and completely F'ed up, and nothing to count on at all. :-) BOINC doesn't calculate the flops value, it reads it from the CUDA and OpenCL files provided by the drivers. ID: 1976305 ·

Bill Volunteer tester Send message Joined: 30 Nov 05 Posts: 282 Credit: 6,916,194 RAC: 60	Message 1976307 - Posted: 21 Jan 2019, 19:15:39 UTC - in response to Message 1976214.
"- did this happen very soon after installing the new GPU?"
Technically, yes. The computer was built from scratch and has been running for about a month. I didn't get any AP tasks until a few weeks ago.
"- are you heavily re-scheduling work between processors?"
No, I am just letting it run its course.
"- have you just stopped using the I-GPU?"
No, I am using it for s@h v8 all the time with no problems, both CPU and GPU tasks. I have either been suspending any AP v7 GPU tasks or updating my app config to not allow AP v7 GPU tasks to download until I have a better idea of what can be done to fix this.
"After that we are heading into the murky depths of BOINC, and that could well be a very messy can of worms."
And that is above my level of expertise. I'd love to help, but I don't know that I have the time or experience to try to solve this myself in a reasonable amount of time.
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1976307 ·

rob smith Volunteer moderator Volunteer tester Send message Joined: 7 Mar 03 Posts: 22190 Credit: 416,307,556 RAC: 380	Message 1976311 - Posted: 21 Jan 2019, 19:34:53 UTC
...actually BOINC does calculate the value for peak flops, based on data collected from the driver(s). This triggers a couple of thoughts, first have AMD in their wisdom changed the API (RPC) for getting such data? and second, is the data coming from the driver "correct"? If either is true then it is a "bit of a problem" that may cause a lot of head scratching.
Here's an extract from coproc.cpp (for release 7.7.8) where the peak flops is calculated. I can see at least one place where an apparently innocent change could blow things out in the manner that Bill and others are seeing.
void COPROC_ATI::set_peak_flops() {
    double x = 0;
    if (attribs.numberOfSIMD) {
        x = attribs.numberOfSIMD * attribs.wavefrontSize * 5 * attribs.engineClock * 1.e6;
        // clock is in MHz
    } else if (opencl_prop.amd_simd_per_compute_unit) {
        // OpenCL w/ cl_amd_device_attribute_query extension
        // Per: https://www.khronos.org/registry/cl/extensions/amd/cl_amd_device_attribute_query.txt
        //
        // Single precision performance is calculated as two times the number of shaders multiplied by the base core clock speed.
        // Per: https://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units
        //
        // clock is in MHz
        x = opencl_prop.max_compute_units * opencl_prop.amd_simd_per_compute_unit * opencl_prop.amd_simd_width * opencl_prop.amd_simd_instruction_width * 2 * (opencl_prop.max_clock_frequency * 1.e6);
    } else if (opencl_prop.max_compute_units) {
        // OpenCL gives us only:
        // - max_compute_units
        //   (which I'll assume is the same as attribs.numberOfSIMD)
        // - max_clock_frequency (which I'll assume is the same as engineClock)
        // It doesn't give wavefrontSize, which can be 16/32/64.
        // So let's be conservative and use 16
        //
        x = opencl_prop.max_compute_units * 16 * 5 * opencl_prop.max_clock_frequency * 1e6;
    }
    peak_flops = (x>0)?x:5e10;
}
void COPROC_ATI::fake(double ram, double avail_ram, int n) {
    safe_strcpy(type, proc_type_name_xml(PROC_TYPE_AMD_GPU));
    safe_strcpy(version, "1.4.3");
    safe_strcpy(name, "foobar");
    count = n;
    available_ram = avail_ram;
    have_cal = true;
    memset(&attribs, 0, sizeof(attribs));
    memset(&info, 0, sizeof(info));
    attribs.localRAM = (int)(ram/MEGA);
    attribs.numberOfSIMD = 32;
    attribs.wavefrontSize = 32;
    attribs.engineClock = 50;
    for (int i=0; i<count; i++) {
        device_nums[i] = i;
    }
    set_peak_flops();
}
...and a few places that may well have been affected by later drivers.....
Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe?
ID: 1976311 ·
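
To see how sensitive that calculation is to one bad input, here is a small sketch of the cl_amd_device_attribute_query branch from the extract, pulled out into a free function. The 8 compute units and the 42949672 MHz clock are the values printed in the stderr log earlier in this thread; the SIMD count, SIMD width, instruction width and the 1100 MHz "sane" clock are assumptions chosen only for illustration, and the function and parameter names are not BOINC's.

```cpp
#include <cstdio>

// Same arithmetic as the second branch of the quoted set_peak_flops(),
// with the driver-reported properties passed in explicitly.
// Names are illustrative only.
double amd_opencl_peak_flops(double compute_units, double simd_per_cu,
                             double simd_width, double instruction_width,
                             double clock_mhz) {
    return compute_units * simd_per_cu * simd_width * instruction_width
           * 2 * (clock_mhz * 1.e6);  // clock is in MHz
}

// Compares an assumed plausible clock with the clock printed in the log.
void print_comparison() {
    // 4 SIMDs per unit, width 16 and instruction width 1 are assumptions.
    const double sane = amd_opencl_peak_flops(8, 4, 16, 1, 1100);
    const double bogus = amd_opencl_peak_flops(8, 4, 16, 1, 42949672);
    std::printf("sane clock:  %.2f GFLOPS\n", sane / 1e9);
    std::printf("bogus clock: %.2f GFLOPS\n", bogus / 1e9);
}
```

The result is linear in the clock, so a clock that is tens of thousands of times too high gives a peak figure that is too high by the same factor. The exact figure depends on the assumed SIMD values, so this is not expected to reproduce the 14,513,557.69 GFLOPS quoted above, only the order of magnitude of the problem. It may also be worth noting that 42949672 is what you get from 2^32 divided by 100 with the fraction dropped, which could hint at an overflowed or sentinel value coming from the driver, though that is a guess.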

Jord Volunteer tester Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3	Message 1976326 - Posted: 21 Jan 2019, 21:08:43 UTC - in response to Message 1976311. I'd been looking through the source code for something like that and couldn't find it, just that it read and set peak_flops. I do remember that the science app calculates the flops separately on its own. ID: 1976326 ·

rob smith Volunteer moderator Volunteer tester Send message Joined: 7 Mar 03 Posts: 22190 Credit: 416,307,556 RAC: 380	Message 1976330 - Posted: 21 Jan 2019, 21:24:41 UTC I too have been plodding around the code, trying to find out if there is anywhere else those values are used. Nothing yet, and a glass of amber nectar beckons me. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? ID: 1976330 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1976342 - Posted: 21 Jan 2019, 22:33:36 UTC Did ATI make any kind of change between those families with respect to how they calculate the number of cores per SM on the card? Richard submitted a change in the code to accommodate the new Turing cards from Nvidia which changed the number of cores per SM. Pascal seems to use cores_per_proc = 128 but the new Turing cards use cores_per_proc = 64. From the bug report: The BOINC client decodes the 'Peak Flops' value for NVidia GPUs according to the architecture used in each succeeding card generation. Did something similar happen with ATI and nobody noticed the change in how the drivers report the number of cores? And now BOINC incorrectly decodes the peak FLOPS for certain cards. https://github.com/BOINC/boinc/issues/2706 Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1976342 ·
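
As a rough illustration of why the cores_per_proc value matters, here is a sketch that assumes peak FLOPS is estimated as SM count x cores per SM x 2 operations per clock x clock rate. That formula is an assumption for the example and is not copied from the BOINC source; the two cores_per_proc figures are the ones mentioned in the post, and the 36 SMs and 1500 MHz used in the usage note are made-up numbers.

```cpp
#include <string>

// Cores per SM for the two architectures named in the post. Any other name
// returns 0 because the thread gives no figure for it.
int cores_per_proc(const std::string& architecture) {
    if (architecture == "Pascal") return 128;
    if (architecture == "Turing") return 64;
    return 0;
}

// Assumed formula, for illustration only:
// SMs x cores per SM x 2 FLOPs per core per clock x clock rate in Hz.
double nvidia_peak_flops(int sm_count, int cores_per_sm, double clock_mhz) {
    return static_cast<double>(sm_count) * cores_per_sm * 2.0
           * (clock_mhz * 1.e6);
}
```

Under that assumption, calling nvidia_peak_flops(36, cores_per_proc("Pascal"), 1500.0) for a card that is really Turing gives exactly twice the value of nvidia_peak_flops(36, cores_per_proc("Turing"), 1500.0). A wrong per-generation constant on the AMD side would skew the estimate in the same proportional way.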

andivb Send message Joined: 9 Aug 09 Posts: 7 Credit: 14,510,909 RAC: 2	Message 1989820 - Posted: 12 Apr 2019, 20:21:12 UTC Last modified: 12 Apr 2019, 20:22:45 UTC I recently installed an AMD RX 580 GPU and just noticed that Astropulse results are invalid. 2 bad results are ID 7584720987 and 7584330887. Was hoping for an answer/fix in this thread, but I'm not really sure I understand what the resolution to the problem is. Any suggestions? http://setiathome.berkeley.edu/result.php?resultid=7584720987 http://setiathome.berkeley.edu/result.php?resultid=7584330887 ID: 1989820 ·


SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.