I am getting a lot of gpu tasks with zero (0) expected processing times.
Author | Message |
---|---|
Richard Haselgrove Joined: 4 Jul 99 Posts: 14656 Credit: 200,643,578 RAC: 874 |
OK, I'll try that on some of my pendings. It might be helpful if we both make a note of any affected HostID numbers we find. |
rob smith Joined: 7 Mar 03 Posts: 22273 Credit: 416,307,556 RAC: 380 |
Will do. Off topic warning: While looking I spotted something "rather strange" on one of my tasks - could you take a quick look at https://setiathome.berkeley.edu/result.php?resultid=7403938376 - the peak_flops just doesn't make sense. Bob Smith, Member of Seti PIPPS (Pluto is a Planet Protest Society). Somewhere in the (un)known Universe? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14656 Credit: 200,643,578 RAC: 874 |
"Off topic warning:" It seems to be consistent on all CPU tasks - including my Intel: see Valid tasks for computer 5828732. Is 'Peak Flops' even defined for CPUs? I think I'd put that one down to David's sloppy web site designing. Let me look at the web code. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14656 Credit: 200,643,578 RAC: 874 |
"Will do" Back on topic. I have 1,100-and-some pendings, and I must have spot-checked maybe 20% of them. A few validation drop-outs on 5 Feb, an HD5 app sent to an HD4 device on Mac (twice - may be the same host), and a couple of other random crap-outs - but the vast majority, at least 95%, are simply waiting for a reply. Any reply. That's pretty dispiriting. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14656 Credit: 200,643,578 RAC: 874 |
"Let me look at the web code." https://github.com/BOINC/boinc/blob/master/html/inc/result.inc#L707 https://github.com/BOINC/boinc/blob/master/checkin_notes_2010#L4395 - getting interesting. https://github.com/BOINC/boinc/blob/master/sched/credit.cpp#L636 - it all comes down to CreditNew in the end. Lunchtime. |
rob smith Joined: 7 Mar 03 Posts: 22273 Credit: 416,307,556 RAC: 380 |
I had a look at the first 100 pendings on one of my computers - and only found 4 ATI/AMD GPU tasks, and all were OK - as you say, pretty dispiriting. (Painful thought, have all the ATI/AMD GPUs just stopped doing SETI due to a high error rate, because I'm sure there must be more out there in the wild than we are seeing in our random samples) |
Bill Joined: 30 Nov 05 Posts: 282 Credit: 6,916,194 RAC: 60 |
"Note the '1000 GFLOPS peak' - that's my patch. Maybe it only comes into play when you download new work with the patch in place." I have to run and don't have time to play with this much, but I did notice the 1000 GFLOPS peak in the event log when I restarted last night. Seti@home classic: 1,456 results, 1.613 years CPU time |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14656 Credit: 200,643,578 RAC: 874 |
"I have to run and don't have time to play with this much, but I did notice the 1000 GFLOPS peak in the event log when I restarted last night." OK, thanks. Let us know if you find any AP work downloaded while you're out, please. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14656 Credit: 200,643,578 RAC: 874 |
"(Painful thought, have all the ATI/AMD GPUs just stopped doing SETI due to a high error rate, because I'm sure there must be more out there in the wild than we are seeing in our random samples)" The MB opencl_ati5_SoG_nocal application is still showing healthy flops - and that's the one which Bill is still using successfully. This problem will only affect new users (probably joiners in the last month) who didn't get 11 completions under their belts before the driver broke. It'll build slowly from there. We would only get a mass drop-out if a new application was released - say, for the Parkes data... |
Bill Joined: 30 Nov 05 Posts: 282 Credit: 6,916,194 RAC: 60 |
"OK, thanks. Let us know if you find any AP work downloaded while you're out, please." Nope, none downloaded overnight. "This problem will only affect new users (probably joiners in the last month) who didn't get 11 completions under their belts before the driver broke. It'll build slowly from there." I built my Ryzen computer in late December. It was crunching SETI right away, but I don't think I received AP tasks until a few weeks or a month later. AP7 never worked for the GPU. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14656 Credit: 200,643,578 RAC: 874 |
"AP7 never worked for the GPU." Yes, I saw the 23 December attach date. It's odd that you say AP never worked, because of those two 'completed' tasks for "AstroPulse v7 7.09 windows_intelx86 (opencl_ati_100)" showing in your application details. They must have slipped through while you weren't looking ;-) If you have a moment to spare sometime, could you please look in your BOINC data folder (root level, not a subfolder) for a file called job_log_setiathome.berkeley.edu.txt - as the name suggests, it's plain text and compresses nicely with ZIP or 7Z. Could you email that to me, please, at initial dot surname at btinternet dot com? It's a little hard to decipher, but it should give us a clue as to when those two tasks were processed, and hence a time for when the driver was last working. |
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
I have another 2400G on the way. A proud member of the OFA (Old Farts Association). |
rob smith Joined: 7 Mar 03 Posts: 22273 Credit: 416,307,556 RAC: 380 |
Richard - one for your collection (and the first openCL 100 task I found): https://setiathome.berkeley.edu/result.php?resultid=7388995126 Name: ap_31ja19aa_B6_P1_00388_20190201_26381.wu_1; Workunit: 3335412043; Created: 1 Feb 2019, 14:49:15 UTC; Sent: 1 Feb 2019, 14:49:19 UTC; Report deadline: 26 Feb 2019, 14:49:19 UTC; Received: ---; Server state: In progress; Outcome: ---; Client state: New; Exit status: 0 (0x00000000); Computer ID: 8561994; Run time: (blank); CPU time: (blank); Validate state: Initial; Credit: 0.00; Device peak FLOPS: 3,551.53 GFLOPS; Application version: AstroPulse v7 v7.09 (opencl_ati_100) windows_intelx86. That computer has a load of opencl_ati5_cat132 tasks, but they appear to have "sensible" peak_flops values. Edit - I've just gone through ~560 tasks, and only found 1 "openCL100" task, the one above. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14656 Credit: 200,643,578 RAC: 874 |
"Richard - one for your collection (and the first openCL 100 task I found)" Preserving Host 8561994 for the collection, but for the time being that speed looks OK. Let's see if he updates his driver... |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14656 Credit: 200,643,578 RAC: 874 |
OK, conference call over: David was there and heard about the potential confusion and problematic outcomes. He's committed to going away and writing a more comprehensive patch, taking into account some extra comments from Juha. It would be a huge help if the people in this thread could be on standby to repeat their testing with the next patch. |
Juha Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
The link to AMD's bug report form is here: Report Issues With the Latest Driver Release. The problem we are interested in is BOINC reporting nutty GFLOPS, but reporting it in those terms isn't useful to AMD: they'd need to go through BOINC's code to see where it gets the nutty value from. Better to report that clinfo (the one supplied with AMD's drivers, if possible) reports nutty numbers, listing which of the numbers are nutty. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14656 Credit: 200,643,578 RAC: 874 |
I got a clinfo output from the Science United user (clinfo from the BOINC /dl directory), and this was the faulty line: "Max clock frequency: 42949672Mhz". I'll look at the bug report form in the morning. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14656 Credit: 200,643,578 RAC: 874 |
David has been busy already. It might be worth checking out the win-client from https://ci.appveyor.com/project/BOINC/boinc/builds/22209707/artifacts |
Bill Joined: 30 Nov 05 Posts: 282 Credit: 6,916,194 RAC: 60 |
"I got a clinfo output from the Science United user (clinfo from the BOINC /dl directory), and this was the faulty line:" I knew that number looked familiar. This is from the stderr from this task: https://setiathome.berkeley.edu/result.php?resultid=7403089694 "Max clock frequency: 42949672Mhz". I had seen that frequency before, and I don't know why, but I think I just assumed it was a high value set as an upper limit. Are you saying this should be more in line with the processor's actual frequency? PS - check your email for the log. Edit: I have this frequency listed for successful MB GPU tasks as well, so now I'm really confused. I do appreciate the effort going into looking at this. I'm sure we won't have any problems once people start using the new Vega VII cards ;) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14656 Credit: 200,643,578 RAC: 874 |
"PS - check your email for the log." Email received - many thanks. It's shown me exactly what I wanted to see. You've processed two AP tasks which had initial runtime estimates of 7807 seconds - and they actually ran for 7883 seconds and 8012 seconds. They were the very first AP tasks you processed on that new machine, so the estimate was pretty damn good. I'm interpreting those two as the two completed tasks run on the GPU component of your Ryzen. Processing completed on Saturday, 29 December 2018 19:40:43 and Saturday, 29 December 2018 22:40:36 respectively (times in UTC). Most of your other tasks have initial estimates between 13024 seconds and 280706 seconds (!). I think all these will be tasks assigned to run on the CPU, as the scheduler tests out the various application versions to see which works best. We don't need to worry about those for this investigation. But the flies in the ointment are the three tasks with an initial estimate of 0.2 seconds, each of which ran for 2 seconds. (The log doesn't record the tasks you aborted.) The first two of these short tasks ran in quick succession, one after the other, on Thursday, 3 January 2019 at 03:15:36 (UTC again). I'm guessing that gives us a much closer timeframe for the release of that faulty driver - which is exactly what I was looking for. Perfect! |
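
The two checks this thread leans on (an implausible clock figure in clinfo output, and near-zero initial estimates in the client's job log) can be sketched in Python. This is only a rough sketch under stated assumptions: the job-log field layout (`ue` for the initial estimate, `et` for elapsed time, `nm` for the task name) is assumed here rather than confirmed by anything in the thread, the 10,000 MHz ceiling and the 1-second threshold are arbitrary choices, and the function names and sample task names are hypothetical. The sample numbers simply echo the figures quoted above. As an aside, 42949672 equals `2**32 // 100`, which may hint at a wrapped 32-bit value somewhere, though nothing in the thread confirms that.

```python
import re

# Assumed job-log line layout (an assumption, not confirmed in the thread):
#   <unix time> ue <estimate s> ct <cpu s> fe <fpops est> nm <task> et <elapsed s> es <status>
JOB_LINE = re.compile(
    r"^(?P<when>\d+) ue (?P<ue>\S+) ct (?P<ct>\S+) fe (?P<fe>\S+) "
    r"nm (?P<nm>\S+) et (?P<et>\S+)"
)

# Matches the clinfo line quoted in the thread, e.g. "Max clock frequency: 42949672Mhz".
CLOCK_LINE = re.compile(r"Max clock frequency:\s*(\d+)\s*MHz", re.IGNORECASE)

MAX_PLAUSIBLE_MHZ = 10_000  # hypothetical ceiling, far above any real GPU clock


def suspicious_clocks(clinfo_text, limit_mhz=MAX_PLAUSIBLE_MHZ):
    """Return every reported clock frequency (in MHz) above limit_mhz."""
    clocks = [int(value) for value in CLOCK_LINE.findall(clinfo_text)]
    return [mhz for mhz in clocks if mhz > limit_mhz]


def near_zero_estimates(job_log_text, threshold_s=1.0):
    """Return (unix time, task name, estimate, elapsed) for tiny initial estimates."""
    hits = []
    for line in job_log_text.splitlines():
        match = JOB_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        estimate = float(match["ue"])
        if estimate < threshold_s:
            hits.append((int(match["when"]), match["nm"], estimate, float(match["et"])))
    return hits


# Illustrative data only: the task names and the ct/fe values are made up.
SAMPLE_JOB_LOG = """\
1546112443 ue 7807.000000 ct 95.000000 fe 1000000000000 nm ap_example_task_0 et 7883.000000 es 0
1546485336 ue 0.200000 ct 1.000000 fe 1000000000000 nm ap_example_task_1 et 2.000000 es 0
"""

if __name__ == "__main__":
    print(suspicious_clocks("Max clock frequency: 42949672Mhz"))
    for hit in near_zero_estimates(SAMPLE_JOB_LOG):
        print(hit)
```

Run against a real job log, the second function would list the tasks whose initial estimate collapsed, and their timestamps would bracket when the driver started reporting the bad clock, which is the reasoning used in the post above.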
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.