I am getting a lot of gpu tasks with zero (0) expected processing times.

Profile Bill Special Project $75 donor
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1976851 - Posted: 25 Jan 2019, 3:11:54 UTC - in response to Message 1976842.  

You are not muddying the waters any more than they already are, Bill. The code is a definite patchwork by DA over many years. It is very fragmented code and very hard to follow. Everyone who looks at the code says the same thing.

Have you tried reverting to a much older driver for the APU? I'm fairly sure you haven't, since you probably need the latest to cover both the APU and the Vega. So whatever change in the ATI/AMD driver API is fouling up the BOINC client code is probably going to be present. I think the solution has to come from new client code and nothing is wrong with the drivers.

This really needs to be brought up as a bug for the BOINC developers.
I haven't tried an older driver, but I don't think that is possible now. I needed to update to 18.10.20 in order to use the current BIOS version for my motherboard. I did that before installing BOINC, so I didn't get to see how it performed prior to that.

Regardless, MB works fine, so I would agree that it is a code issue and not a driver issue.
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1976851 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22202
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1976868 - Posted: 25 Jan 2019, 5:53:57 UTC
Last modified: 25 Jan 2019, 5:57:10 UTC

I assume you've suspended the AP.
A quick look shows that the peak_flops is still way too high. Since that is written very early in the life of the task, there might be some more digging needed to find a) where it is written and b) exactly where the data used in its calculation comes from.

What is even more peculiar is that Tom had the same problem on MBs and did the same sort of patch (or at least I think he did) and it worked for him (I must check back and see if it was a temporary fix or if it has stuck).

Checked on Tom's computer and it's still dropping loads and has a crazy peak_flops.

Since Tom and you are showing different values, that rules out one thought I had but didn't mention earlier; it would have really confused everyone.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1976868 · Report as offensive
Profile Bill Special Project $75 donor
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1976910 - Posted: 25 Jan 2019, 13:13:32 UTC - in response to Message 1976868.  

I assume you've suspended the AP.
A quick look shows that the peak_flops is still way too high. Since that is written very early in the life of the task, there might be some more digging needed to find a) where it is written and b) exactly where the data used in its calculation comes from.

What is even more peculiar is that Tom had the same problem on MBs and did the same sort of patch (or at least I think he did) and it worked for him (I must check back and see if it was a temporary fix or if it has stuck).

Checked on Tom's computer and it's still dropping loads and has a crazy peak_flops.

Since Tom and you are showing different values, that rules out one thought I had but didn't mention earlier; it would have really confused everyone.
Just to be sure, when you say "doing the same sort of patch", what do you mean exactly? I have not updated anything, I just restarted BOINC. I think I assumed from your previous posts that anything you changed would have been handled external to my computer.
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1976910 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22202
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1976912 - Posted: 25 Jan 2019, 13:17:19 UTC
Last modified: 25 Jan 2019, 13:17:31 UTC

Setting peak_flops to something a bit more sensible.....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1976912 · Report as offensive
Profile Bill Special Project $75 donor
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1976927 - Posted: 25 Jan 2019, 14:22:56 UTC - in response to Message 1976912.  

Setting peak_flops to something a bit more sensible.....
Okay, now I finally see what you're saying. I'll edit peak_flops in the coproc_info.xml file tonight when I get a chance and see what happens. I really appreciate the help, Rob. Sorry for being slow with picking up what you're dropping. I think as I get more familiar with BOINC I'll be less of a pain.
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1976927 · Report as offensive
Profile Kissagogo27 Special Project $75 donor
Joined: 6 Nov 99
Posts: 716
Credit: 8,032,827
RAC: 62
France
Message 1977089 - Posted: 26 Jan 2019, 10:41:56 UTC

When I look at my application details, I see an average GFLOPS figure per app. Can we use that to estimate the FLOPS the GPU actually delivers?

In theory the HD7750 has ~1500 GFLOPS, and BOINC reports ~2048 GFLOPS, but the autotuned AP app gives ~500 GFLOPS ^^
ID: 1977089 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1977129 - Posted: 26 Jan 2019, 16:12:22 UTC - in response to Message 1976927.  

Setting peak_flops to something a bit more sensible.....
Okay, now I finally see what you're saying. I'll edit peak_flops in the coproc_info.xml file tonight when I get a chance and see what happens. I really appreciate the help, Rob. Sorry for being slow with picking up what you're dropping. I think as I get more familiar with BOINC I'll be less of a pain.


I did edit that and it doesn't appear to have made a difference :( I still get a lot of 0 expected processing time gpu tasks.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1977129 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1977135 - Posted: 26 Jan 2019, 16:28:22 UTC - in response to Message 1977129.  
Last modified: 26 Jan 2019, 17:08:41 UTC

If the change to flops in coproc_info.xml isn't being picked up, then you will have to make the change in the client_state.xml file, in the app_version entry for the Astropulse GPU app.

[Edit] On reflection, a more permanent place to put the flops statement would be the GPU section of the app_info.xml file. That file won't get changed by the project at every connection and will just be read.
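
For anyone who wants to see roughly what that edit amounts to, here is a minimal Python sketch. It assumes the app_info.xml layout described on the BOINC wiki's Anonymous platform page (an `<app_version>` holding `<app_name>`, an optional `<flops>`, and a `<coproc>` with a `<type>`). The app names, version numbers, and flops figure in the sample are made-up placeholders, and `set_gpu_flops` is a hypothetical helper, not something that ships with BOINC.

```python
import xml.etree.ElementTree as ET

# Made-up app_info.xml fragment; app names and version numbers are placeholders.
SAMPLE = """<app_info>
  <app_version>
    <app_name>example_cpu_app</app_name>
    <version_num>100</version_num>
  </app_version>
  <app_version>
    <app_name>example_gpu_app</app_name>
    <version_num>100</version_num>
    <coproc>
      <type>ATI</type>
      <count>1</count>
    </coproc>
  </app_version>
</app_info>"""


def set_gpu_flops(xml_text, coproc_type, flops):
    """Set <flops> in every <app_version> that uses the given coprocessor type."""
    root = ET.fromstring(xml_text)
    for version in root.iter("app_version"):
        if version.findtext("coproc/type") != coproc_type:
            continue  # leave CPU and other-vendor versions untouched
        node = version.find("flops")
        if node is None:
            node = ET.SubElement(version, "flops")
        node.text = f"{float(flops):e}"
    return ET.tostring(root, encoding="unicode")


# 5.0e11 is an arbitrary illustrative value, not a recommendation.
patched = set_gpu_flops(SAMPLE, "ATI", 5.0e11)
print(patched)
```

If you try this against a real file, stop BOINC first and keep a backup copy; the sketch only shows where the `<flops>` element would sit.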
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1977135 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22202
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1977136 - Posted: 26 Jan 2019, 16:28:48 UTC

Thanks for the feedback Tom, I had a feeling that would be the case :-(
(Looking at the code again I can see that these values are re-set on restart, and are over-written regularly during operation PROVIDED tasks complete in a "timely" manner and without error. Keith's description of how convoluted the code is is pretty accurate: there are so many patches and work-arounds that it is very difficult to actually see the flow through. I do know that there are a lot of "required" features that are there for very good reasons.)
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1977136 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1977188 - Posted: 26 Jan 2019, 20:03:57 UTC - in response to Message 1977135.  
Last modified: 26 Jan 2019, 20:04:33 UTC

If the change to flops in coproc_info.xml isn't being picked up, then you will have to make the change in the client_state.xml file, in the app_version entry for the Astropulse GPU app.

[Edit] On reflection, a more permanent place to put the flops statement would be the GPU section of the app_info.xml file. That file won't get changed by the project at every connection and will just be read.


As you know, I am fumble-fingered on stuff I don't use regularly. Could you show me an example for the app_config.xml file?

Thanks,
Tom
A proud member of the OFA (Old Farts Association).
ID: 1977188 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1977207 - Posted: 26 Jan 2019, 21:09:58 UTC - in response to Message 1977188.  

Not the app_config file but the app_info file. You need to put your halved flops value into the flops field of the app_version section for the ATI GPU app.

Since I don't know the structure of the app_info for ATI cards, I don't know the specifics. This is the relevant document explaining where the flops statement goes.

https://boinc.berkeley.edu/wiki/Anonymous_platform
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1977207 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1977234 - Posted: 26 Jan 2019, 23:30:51 UTC

I would just alter the flops value for the meantime. You will have to wait until DA fixes the issue but at least it has his attention now. He has opened a bug on the client.

Incorrect peak FLOPS for AMD GPU

It can take quite a while for the code to be fixed and then a new BOINC released with the fix. I would not wait; just modify the flops for now.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1977234 · Report as offensive
Profile Bill Special Project $75 donor
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1977258 - Posted: 27 Jan 2019, 1:54:02 UTC - in response to Message 1977207.  

Not the app_config file but the app_info file. You need to put your halved flops value into the flops field of the app_version section for the ATI GPU app.

Since I don't know the structure of the app_info for ATI cards, I don't know the specifics. This is the relevant document explaining where the flops statement goes.

https://boinc.berkeley.edu/wiki/Anonymous_platform
I don't even see that I have an app_info file. I'm okay with that for now, though. I feel like I'm getting past my skis by editing flops values in files I am not familiar with in the least. I'm just going to exclude using the GPU for all AP tasks for now. This has inspired me to start understanding the files for how BOINC works, and to maybe get back into programming! Now if I could only find the time to dedicate to it...

Thanks again to everyone who has helped with this problem!
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1977258 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1977263 - Posted: 27 Jan 2019, 2:17:25 UTC - in response to Message 1977258.  

Sorry, I just assumed you were running anonymous platform. I have for so long, and all my contacts and friends do also, that I sometimes forget that most people just run stock from the project.

You are out of luck till BOINC is fixed, I guess. Let's hope that DA considers that bug urgent and provides new code and a new release in short order.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1977263 · Report as offensive
Profile Bill Special Project $75 donor
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1977276 - Posted: 27 Jan 2019, 3:48:00 UTC - in response to Message 1977263.  

No worries, I might run anonymous applications eventually once I get a better handle. I can be patient for now.
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1977276 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1977280 - Posted: 27 Jan 2019, 4:10:23 UTC - in response to Message 1977276.  

Yes, a good first step on that journey would be the optimized Lunatics Installer and applications.

http://mikesworld.eu/download.html

That will get you an anonymous platform and the app_info you would need for the temp fix.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1977280 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1978423 - Posted: 3 Feb 2019, 11:19:35 UTC - in response to Message 1977234.  

I would just alter the flops value for the meantime. You will have to wait until DA fixes the issue but at least it has his attention now. He has opened a bug on the client.

Incorrect peak FLOPS for AMD GPU

It can take quite a while for the code to be fixed and then a new BOINC released with the fix. I would not wait; just modify the flops for now.
David has been working with a SU user, and this morning put up a Windows test version accessible from

https://ci.appveyor.com/project/BOINC/boinc/builds/22082918/artifacts
(details in pull request #3001)

Could somebody observing this problem please try out the fix? You simply need to download the win-client archive and replace boinc.exe in your BOINC Programs directory.

Possible extra tools needed, if you don't have them already:
A 7z archive program, like https://www.7-zip.org/
The Microsoft Visual C++ Redistributable Packages for Visual Studio 2013, from https://www.microsoft.com/en-US/download/details.aspx?id=40784 (or your local Microsoft site)
ID: 1978423 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1978425 - Posted: 3 Feb 2019, 11:41:24 UTC

@ Rob Smith,

It is certainly the case that zero processing time estimates are caused by faulty GFlops Peak calculations in boinc.exe. In the case which David has been investigating, the device (an APU) is reported as having 1127 GFLOPS peak when running at normal speeds, but 43980464 GFLOPS peak when DOWN-clocked. Huh?

David has interpreted this as an arithmetic overflow in https://github.com/BOINC/boinc/blob/master/client/gpu_opencl.cpp#L609: I'm suspicious. It would be interesting to know whether the same fault appears for CAL flops, or whether it's just OpenCL.

My own initial expectation in #2988 was that we would need to be looking at https://github.com/BOINC/boinc/blob/master/lib/coproc.cpp#L840: that's where the driver API is queried, triggering in turn a driver query of the underlying hardware.

Thoughts?
ID: 1978425 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1978434 - Posted: 3 Feb 2019, 12:45:08 UTC

OK, we've got a reply back from the SU user:

03/02/2019 13:21:40 | | <![CDATA[cc_config.xml not found - using defaults]]>
03/02/2019 13:21:40 | | <![CDATA[Starting BOINC client version 7.15.0 for windows_x86_64]]>
03/02/2019 13:21:40 | | <![CDATA[This a development version of BOINC and may not function properly]]>
03/02/2019 13:21:40 | | <![CDATA[log flags: file_xfer, sched_ops, task]]>
03/02/2019 13:21:40 | | <![CDATA[Libraries: libcurl/7.39.0 OpenSSL/1.0.1j zlib/1.2.8]]>
03/02/2019 13:21:40 | | <![CDATA[Data directory: D:\BOINC]]>
03/02/2019 13:21:40 | | <![CDATA[OpenCL: AMD/ATI GPU 0: AMD Radeon(TM) Vega 8 Graphics (driver version 2766.5 (PAL,HSAIL), device version OpenCL 2.0 AMD-APP (2766.5), 6567MB, 6567MB available, 43980464 GFLOPS peak)]]>
03/02/2019 13:21:40 | | <![CDATA[Processor: 4 AuthenticAMD AMD Ryzen 3 2200G with Radeon Vega Graphics [Family 23 Model 17 Stepping 0]]]>
03/02/2019 13:21:40 | | <![CDATA[Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 svm sse4a osvw skinit wdt tce topx page1gb rdtscp fsgsbase bmi1 smep]]>
03/02/2019 13:21:40 | | <![CDATA[OS: Microsoft Windows 10: Professional x64 Edition, (10.00.17134.00)]]>
So, David's fix didn't fix anything - don't bother testing any further. Since it was in a

            } else {
                //////////// OTHER GPU OR ACCELERATOR //////////////
                // Put each coprocessor instance into a separate other_opencls element
section, I'm not wildly surprised.

But if anyone else can work their way through the spaghetti code, please pitch in.
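
For anyone checking their own startup log for the same symptom, a rough Python sketch along these lines may help. It pulls the "GFLOPS peak" figure out of the OpenCL startup line quoted above and flags it when it looks absurd. The ceiling `MAX_PLAUSIBLE_GFLOPS` is an arbitrary number picked for illustration, and `peak_gflops` is a hypothetical helper, not part of BOINC.

```python
import re

# Startup line copied from the log above (CDATA wrapper dropped).
LOG_LINE = (
    "OpenCL: AMD/ATI GPU 0: AMD Radeon(TM) Vega 8 Graphics "
    "(driver version 2766.5 (PAL,HSAIL), device version OpenCL 2.0 AMD-APP (2766.5), "
    "6567MB, 6567MB available, 43980464 GFLOPS peak)"
)

# Arbitrary ceiling chosen for illustration only.
MAX_PLAUSIBLE_GFLOPS = 200_000


def peak_gflops(line):
    """Return the 'N GFLOPS peak' figure from a startup log line, or None."""
    match = re.search(r"(\d+) GFLOPS peak", line)
    return int(match.group(1)) if match else None


value = peak_gflops(LOG_LINE)
if value is not None and value > MAX_PLAUSIBLE_GFLOPS:
    print(f"suspicious peak: {value} GFLOPS")
```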
ID: 1978434 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1978440 - Posted: 3 Feb 2019, 13:36:24 UTC

User has supplied their coproc_info:

   <ati_opencl>
      <name>AMD Radeon(TM) Vega 8 Graphics</name>
      <vendor>Advanced Micro Devices, Inc.</vendor>
      <vendor_id>4098</vendor_id>
      <available>1</available>
      <half_fp_config>0</half_fp_config>
      <single_fp_config>190</single_fp_config>
      <double_fp_config>63</double_fp_config>
      <endian_little>1</endian_little>
      <execution_capabilities>1</execution_capabilities>
      <extensions>cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_liquid_flash cl_amd_planar_yuv</extensions>
      <global_mem_size>6885523456</global_mem_size>
      <local_mem_size>32768</local_mem_size>
      <max_clock_frequency>42949672</max_clock_frequency>
      <max_compute_units>8</max_compute_units>
      <nv_compute_capability_major>0</nv_compute_capability_major>
      <nv_compute_capability_minor>0</nv_compute_capability_minor>
      <amd_simd_per_compute_unit>4</amd_simd_per_compute_unit>
      <amd_simd_width>16</amd_simd_width>
      <amd_simd_instruction_width>1</amd_simd_instruction_width>
      <opencl_platform_version>OpenCL 2.1 AMD-APP (2766.5)</opencl_platform_version>
      <opencl_device_version>OpenCL 2.0 AMD-APP (2766.5)</opencl_device_version>
      <opencl_driver_version>2766.5 (PAL,HSAIL)</opencl_driver_version>
      <device_num>0</device_num>
      <peak_flops>43980464128000000.000000</peak_flops>
      <opencl_available_ram>6885523456.000000</opencl_available_ram>
      <opencl_device_index>0</opencl_device_index>
      <warn_bad_cuda>0</warn_bad_cuda>
   </ati_opencl>
It shows the same fault in <max_clock_frequency>, which may be helpful.
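
A quick arithmetic check on those numbers, as a Python sketch. It assumes peak FLOPS is estimated as compute units x SIMDs per unit x SIMD width x instruction width x 2 x clock in Hz; that formula is my reading of the figures in the dump, not something confirmed against the client source. The last comparison is only an observation about the clock value, and any conclusion about where it comes from would be a guess.

```python
# Values copied from the <ati_opencl> block above.
max_clock_frequency = 42949672          # nominally MHz
max_compute_units = 8
amd_simd_per_compute_unit = 4
amd_simd_width = 16
amd_simd_instruction_width = 1
reported_peak_flops = 43980464128000000  # <peak_flops>, fractional zeros dropped

# Assumed formula: SIMD lanes * 2 flops per lane per cycle * clock in Hz.
lanes = (max_compute_units * amd_simd_per_compute_unit
         * amd_simd_width * amd_simd_instruction_width)
estimated = lanes * 2 * max_clock_frequency * 1_000_000

print(lanes)
print(estimated == reported_peak_flops)
print(estimated // 10**9)  # whole GFLOPS, for comparison with the startup log

# Observation: the clock value equals the largest unsigned 32-bit
# integer divided by 100 (integer division).
print(max_clock_frequency == (2**32 - 1) // 100)
```

Under that assumed formula the product reproduces the `<peak_flops>` value exactly, so the bad clock alone accounts for the GFLOPS figure. For comparison, an illustrative clock of 1100 MHz (not taken from the dump) would give about 1126 GFLOPS from the same product.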
ID: 1978440 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.