I am getting a lot of gpu tasks with zero (0) expected processing times.

Message boards : Number crunching : I am getting a lot of gpu tasks with zero (0) expected processing times.

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1979078 - Posted: 7 Feb 2019, 11:22:07 UTC - in response to Message 1979077.  

OK, I'll try that on some of my pendings. It might be helpful if we both make a note of any affected HostID numbers we find.
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22273
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1979080 - Posted: 7 Feb 2019, 11:41:20 UTC

Will do


Off topic warning:
While looking I spotted something "rather strange" on one of my tasks - could you take a quick look at https://setiathome.berkeley.edu/result.php?resultid=7403938376
The peak_flops just doesn't make sense
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1979082 - Posted: 7 Feb 2019, 11:55:01 UTC - in response to Message 1979080.  

Off topic warning:
While looking I spotted something "rather strange" on one of my tasks - could you take a quick look at https://setiathome.berkeley.edu/result.php?resultid=7403938376
The peak_flops just doesn't make sense
It seems to be consistent on all CPU tasks - including my Intel Valid tasks for computer 5828732

Is 'Peak Flops' even defined for CPUs? I think I'd put that one down to David's sloppy web site designing. Let me look at the web code.
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1979084 - Posted: 7 Feb 2019, 11:59:54 UTC - in response to Message 1979080.  

Will do
Back on topic. I have 1,100-and-some pendings, and I must have spot-checked maybe 20% of them.

A few validation drop-outs on 5 Feb, an HD5 app sent to an HD4 device on Mac (twice - may be the same host), and a couple of other random crap-outs - but the vast majority, at least 95%, are simply waiting for a reply. Any reply.

That's pretty dispiriting.
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1979085 - Posted: 7 Feb 2019, 12:04:35 UTC - in response to Message 1979082.  
Last modified: 7 Feb 2019, 12:17:42 UTC

rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22273
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1979086 - Posted: 7 Feb 2019, 12:06:09 UTC

I had a look at the first 100 pendings on one of my computers - and only found 4 ATI/AMD GPU tasks, and all were OK - as you say, pretty dispiriting.

(Painful thought: have all the ATI/AMD GPUs just stopped doing SETI due to a high error rate? I'm sure there must be more of them out there in the wild than we are seeing in our random samples.)
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Profile Bill Special Project $75 donor
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1979087 - Posted: 7 Feb 2019, 12:12:16 UTC - in response to Message 1979072.  
Last modified: 7 Feb 2019, 12:12:30 UTC

Note the "1000 GFLOPS peak" - that's my patch. Maybe it only comes into play when you download new work with the patch in place.

I have to run and don't have time to play with this much, but I did notice the 1000 GFLOPS peak in the event log when I restarted last night.
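For anyone wondering how a bad clock value turns into a silly peak FLOPS figure, here is a rough sketch. It assumes a naive estimate of the form compute units × shaders per unit × FLOPs per cycle × clock; the constants, the function names and the 10,000 MHz plausibility limit are my own illustrative choices, not BOINC's actual code. The fallback only imitates the effect of the 1000 GFLOPS figure seen in the event log; I don't know how the real patch decides when to apply it.

```python
# Illustrative sketch only: NOT BOINC's real peak-FLOPS code.
# ASSUMPTION: a naive estimate built from OpenCL-style device properties.

SHADERS_PER_COMPUTE_UNIT = 64      # hypothetical constant
FLOPS_PER_SHADER_PER_CYCLE = 2     # hypothetical (one multiply-add per cycle)
PLAUSIBLE_MAX_CLOCK_MHZ = 10_000   # hypothetical sanity limit
FALLBACK_PEAK_FLOPS = 1000e9       # 1000 GFLOPS, the figure seen in the event log


def naive_peak_flops(compute_units: int, clock_mhz: int) -> float:
    """Peak FLOPS = units x shaders/unit x FLOPs/cycle x clock in Hz."""
    return (compute_units * SHADERS_PER_COMPUTE_UNIT
            * FLOPS_PER_SHADER_PER_CYCLE * clock_mhz * 1e6)


def guarded_peak_flops(compute_units: int, clock_mhz: int) -> float:
    """Fall back to a fixed figure when the reported clock is implausible."""
    if clock_mhz <= 0 or clock_mhz > PLAUSIBLE_MAX_CLOCK_MHZ:
        return FALLBACK_PEAK_FLOPS
    return naive_peak_flops(compute_units, clock_mhz)


if __name__ == "__main__":
    # 11 compute units is just an example; 42949672 MHz is the clock
    # value that clinfo reported later in this thread.
    for clock in (1250, 42949672):
        print(f"{clock:>9} MHz: naive {naive_peak_flops(11, clock) / 1e9:,.0f} GFLOPS, "
              f"guarded {guarded_peak_flops(11, clock) / 1e9:,.0f} GFLOPS")
```

The point is only that the estimate is linear in the clock, so a clock that is tens of thousands of times too high gives a peak FLOPS figure that is tens of thousands of times too high.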
Seti@home classic: 1,456 results, 1.613 years CPU time
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1979088 - Posted: 7 Feb 2019, 12:19:13 UTC - in response to Message 1979087.  

I have to run and don't have time to play with this much, but I did notice the 1000 GFLOPS peak in the event log when I restarted last night.
OK, thanks. Let us know if you find any AP work downloaded while you're out, please.
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1979089 - Posted: 7 Feb 2019, 12:35:11 UTC - in response to Message 1979086.  

(Painful thought, have all the ATI/AMD GPUs just stopped doing SETI due to a high error rate, because I'm sure there must be more out there in the wild than we are seeing in our random samples)
The MB opencl_ati5_SoG_nocal application is still showing healthy flops - and that's the one which Bill is still using successfully.

This problem will only affect new users (probably joiners in the last month) who didn't get 11 completions under their belts before the driver broke. It'll build slowly from there.

We would only get a mass drop-out if a new application was released - say, for the Parkes data...
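A simplified model of why only recent joiners are hit. Assumption: until a host has 11 completions for an application version, its runtime estimates are driven by the device's peak FLOPS; after that, by the speed actually measured. The function names and the plain averaging are hypothetical, not the project's scheduler code.

```python
from statistics import mean
from typing import Sequence

# ASSUMPTION: threshold taken from the "11 completions" above; the rule is simplified.
COMPLETIONS_NEEDED = 11


def projected_flops(peak_flops: float, measured_flops: Sequence[float]) -> float:
    """Use measured speed once there are enough completions, else peak FLOPS."""
    if len(measured_flops) >= COMPLETIONS_NEEDED:
        return mean(measured_flops)
    return peak_flops


def estimated_runtime_s(fpops_est: float, flops: float) -> float:
    """Runtime estimate = estimated floating-point operations / projected speed."""
    return fpops_est / flops


if __name__ == "__main__":
    fpops_est = 1e15          # hypothetical task size
    bad_peak = 6e16           # hypothetical peak FLOPS from a broken clock value
    veteran = [1e11] * 20     # hypothetical established host
    newcomer = [1e11] * 3     # hypothetical new host

    for label, history in (("veteran", veteran), ("newcomer", newcomer)):
        flops = projected_flops(bad_peak, history)
        print(label, round(estimated_runtime_s(fpops_est, flops), 3), "s")
```

In this model the established host never looks at the broken peak figure, while the newcomer gets a near-zero estimate, which matches the pattern described above.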
Profile Bill Special Project $75 donor
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1979091 - Posted: 7 Feb 2019, 13:15:51 UTC - in response to Message 1979088.  

OK, thanks. Let us know if you find any AP work downloaded while you're out, please.
Nope, none downloaded overnight.
This problem will only affect new users (probably joiners in the last month) who didn't get 11 completions under their belts before the driver broke. It'll build slowly from there.
I built my Ryzen computer in late December. It was crunching SETI right away, but I don't think I received AP tasks until a few weeks or a month later. AP7 never worked for the GPU.
Seti@home classic: 1,456 results, 1.613 years CPU time
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1979098 - Posted: 7 Feb 2019, 15:05:03 UTC - in response to Message 1979091.  

AP7 never worked for the GPU.
Yes, I saw the 23 December attach date.

It's odd that you say AP never worked, because of those two 'completed' tasks for "AstroPulse v7 7.09 windows_intelx86 (opencl_ati_100)" showing in your application details. They must have slipped through while you weren't looking ;-)

If you have a moment to spare sometime, could you please look in your BOINC data folder (root level, not a subfolder) for a file called

job_log_setiathome.berkeley.edu.txt

As the name suggests, it's plain text, and it compresses nicely with ZIP or 7Z. If you could email that to me, please, at initial dot surname at btinternet dot com - it's a little hard to decipher, but it should give us a clue as to when those two tasks were processed, and hence when the driver was last working.
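For anyone else who wants to decipher their own job log, here is a small parser. It assumes each line is a Unix timestamp followed by key/value pairs, which is how I understand the BOINC client writes them: ue (initial runtime estimate, seconds), ct (CPU time), fe (estimated floating-point operations), nm (task name), et (elapsed time) and es (exit status). Please check that against your own file; the sample line below is made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobLogEntry:
    finished_unix: int          # when the task finished (Unix time, UTC)
    est_runtime_s: float        # "ue": initial runtime estimate
    cpu_time_s: float           # "ct": CPU time used
    fpops_est: float            # "fe": estimated floating-point operations
    name: str                   # "nm": task name
    elapsed_s: float            # "et": elapsed (wall-clock) time
    exit_status: Optional[int]  # "es": exit status, if present


def parse_job_log_line(line: str) -> JobLogEntry:
    """Parse '<unix time> key value key value ...' (format assumed, see above)."""
    tokens = line.split()
    fields = dict(zip(tokens[1::2], tokens[2::2]))  # pair each key with the next token
    return JobLogEntry(
        finished_unix=int(float(tokens[0])),
        est_runtime_s=float(fields["ue"]),
        cpu_time_s=float(fields["ct"]),
        fpops_est=float(fields["fe"]),
        name=fields["nm"],
        elapsed_s=float(fields["et"]),
        exit_status=int(fields["es"]) if "es" in fields else None,
    )


def read_job_log(path: str) -> List[JobLogEntry]:
    """Parse every non-blank line of a job log file."""
    with open(path, encoding="utf-8") as f:
        return [parse_job_log_line(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Made-up sample line, for illustration only.
    sample = "1546000000 ue 7807.2 ct 301.5 fe 1000000000000 nm example_task_0 et 7883.1 es 0"
    print(parse_job_log_line(sample))
```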
Profile Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1979116 - Posted: 7 Feb 2019, 16:32:38 UTC

I have another 2400G on the way.
A proud member of the OFA (Old Farts Association).
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22273
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1979147 - Posted: 7 Feb 2019, 18:17:01 UTC
Last modified: 7 Feb 2019, 18:51:31 UTC

Richard - one for your collection (and the first openCL 100 task I found)
https://setiathome.berkeley.edu/result.php?resultid=7388995126


Name 	ap_31ja19aa_B6_P1_00388_20190201_26381.wu_1
Workunit 	3335412043
Created 	1 Feb 2019, 14:49:15 UTC
Sent 	1 Feb 2019, 14:49:19 UTC
Report deadline 	26 Feb 2019, 14:49:19 UTC
Received 	---
Server state 	In progress
Outcome 	---
Client state 	New
Exit status 	0 (0x00000000)
Computer ID 	8561994
Run time 	
CPU time 	
Validate state 	Initial
Credit 	0.00
Device peak FLOPS 	3,551.53 GFLOPS
Application version 	AstroPulse v7 v7.09 (opencl_ati_100) windows_intelx86

That computer has a load of opencl_ati5_cat132 tasks, but they appear to have "sensible" peak_flops values.

Edit - I've just gone through ~560 tasks, and only found 1 "openCL100" task, the one above
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1979152 - Posted: 7 Feb 2019, 18:27:30 UTC - in response to Message 1979147.  

Richard - one for your collection (and the first openCL 100 task I found)
Preserving Host 8561994 for the collection, but for the time being that speed looks OK. Let's see if he updates his driver...
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1979171 - Posted: 7 Feb 2019, 20:20:47 UTC

OK, conference call over: David was there and heard about the potential confusion and problematic outcomes. He's committed to going away and writing a more comprehensive patch, taking into account some extra comments from Juha.

It would be a huge help if the people in this thread could be on standby to repeat their testing with the next patch.
Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1979173 - Posted: 7 Feb 2019, 20:25:33 UTC

The link to AMD's bug report form is here: Report Issues With the Latest Driver Release.

While the problem we are interested in is BOINC reporting nutty GFLOPS, that's not useful to AMD: they'd need to go through BOINC's code to see where it gets the nutty value from. Better to report that clinfo (the one supplied with AMD's drivers, if possible) reports nutty numbers, listing which of the numbers are nutty.
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1979193 - Posted: 7 Feb 2019, 22:46:51 UTC - in response to Message 1979173.  

I got a clinfo output from the Science United user (clinfo from the BOINC /dl directory), and this was the faulty line:

  Max clock frequency:				 42949672Mhz
I'll look at the bug report form in the morning.
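If it helps with the AMD report, here is a sketch for picking that line out of clinfo output and flagging it. The regular expression assumes the "Max clock frequency: NNNMhz" layout shown above, and the 5,000 MHz plausibility limit is my own arbitrary choice. As a side note, 42949672 is exactly 0xFFFFFFFF divided by 100 (integer division), which might hint at an all-ones 32-bit value being scaled somewhere, but that is speculation.

```python
import re
from typing import Optional

PLAUSIBLE_MAX_MHZ = 5000  # hypothetical sanity limit for a GPU clock

# Matches e.g. "Max clock frequency:    42949672Mhz" (case-insensitive "MHz").
CLOCK_RE = re.compile(r"Max clock frequency:\s*(\d+)\s*MHz", re.IGNORECASE)


def max_clock_mhz(clinfo_text: str) -> Optional[int]:
    """Return the first 'Max clock frequency' value found (first device only)."""
    match = CLOCK_RE.search(clinfo_text)
    return int(match.group(1)) if match else None


def clock_is_bogus(mhz: int) -> bool:
    """True if the reported clock is zero or beyond the plausibility limit."""
    return mhz <= 0 or mhz > PLAUSIBLE_MAX_MHZ


if __name__ == "__main__":
    line = "  Max clock frequency:\t\t\t\t 42949672Mhz"
    mhz = max_clock_mhz(line)
    print(mhz, "bogus" if clock_is_bogus(mhz) else "plausible")
    print("0xFFFFFFFF // 100 =", 0xFFFFFFFF // 100)
```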
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1979197 - Posted: 7 Feb 2019, 23:00:43 UTC

David has been busy already. It might be worth checking out the win-client from

https://ci.appveyor.com/project/BOINC/boinc/builds/22209707/artifacts
Profile Bill Special Project $75 donor
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1979219 - Posted: 8 Feb 2019, 2:48:47 UTC - in response to Message 1979193.  
Last modified: 8 Feb 2019, 2:55:40 UTC

I got a clinfo output from the Science United user (clinfo from the BOINC /dl directory), and this was the faulty line:

  Max clock frequency:				 42949672Mhz
I'll look at the bug report form in the morning.

I knew that number looked familiar. This is from the stderr from this task: https://setiathome.berkeley.edu/result.php?resultid=7403089694

Max clock frequency: 42949672Mhz

I had seen that frequency before and, I don't know why, but I think I just assumed it was deliberately set high as an upper limit. Are you saying it should be more in line with the processor's actual frequency?

PS - check your email for the log.

Edit: I have this frequency listed for successful MB GPU tasks as well, so now I'm really confused. I do appreciate the effort you're putting into looking at this. I'm sure we won't have any problems once people start using the new Vega VII cards ;)
Seti@home classic: 1,456 results, 1.613 years CPU time
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1979301 - Posted: 8 Feb 2019, 12:31:38 UTC - in response to Message 1979219.  

PS - check your email for the log.
Email received - many thanks. It's shown me exactly what I wanted to see.

You've processed two AP tasks which had initial runtime estimates of 7807 seconds - and they actually ran for 7883 seconds and 8012 seconds. They were the very first AP tasks you processed on that new machine, so the estimate was pretty damn good.

I'm interpreting those two as the two completed tasks run on the GPU component of your Ryzen. Processing completed on Saturday, 29 December 2018 19:40:43 and Saturday, 29 December 2018 22:40:36 respectively (times in UTC).

Most of your other tasks have initial estimates between 13024 seconds and 280706 seconds (!). I think all these will be tasks assigned to run on the CPU, as the scheduler tests out the various application versions to see which works best. We don't need to worry about those for this investigation.

But the flies in the ointment are the three tasks with an initial estimate of 0.2 seconds, each of which ran for 2 seconds. (The log doesn't record the tasks you aborted).
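Rough arithmetic on those numbers, assuming the runtime estimate scales inversely with the device's projected speed and that the 0.2-second tasks were the same kind of AP work as the 7807-second ones (neither is confirmed here). Since 0.2 seconds is a rounded figure, the result is only good to a couple of significant figures.

```python
NORMAL_ESTIMATE_S = 7807.0     # first AP tasks on the new machine
FAULTY_ESTIMATE_S = 0.2        # the three short tasks (rounded)
REPORTED_CLOCK_MHZ = 42949672  # the clock value clinfo reported

# ASSUMPTION: estimate ~ 1 / projected speed, so this is how much faster
# the host "looked" to the scheduler when the short tasks were sent.
inflation = NORMAL_ESTIMATE_S / FAULTY_ESTIMATE_S

# ASSUMPTION: if the clock were the only input that changed, the real
# clock implied by that inflation factor would be:
implied_real_clock_mhz = REPORTED_CLOCK_MHZ / inflation

print(f"apparent speed-up: about {inflation:,.0f}x")
print(f"implied real clock: about {implied_real_clock_mhz:,.0f} MHz")
```

A real clock of roughly 1,100 MHz would account for the whole discrepancy. That is a believable figure for an integrated GPU, so it is at least consistent with the bad clock value being the culprit, though on its own it proves nothing.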

The first two of these short tasks ran in quick succession, one after the other, at Thursday, 3 January 2019 03:15:36 (UTC again). I'm guessing that gives us a much closer timeframe for the release of that faulty driver - which is exactly what I was looking for. Perfect!
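Finding that moment can be automated: sort the job-log records by finish time and take the first one whose initial estimate is implausibly short. The 1-second threshold and the sample records below are made up for illustration, and the records are reduced to simple (Unix time, estimate) pairs rather than full job-log lines.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional, Tuple

SUSPECT_ESTIMATE_S = 1.0  # hypothetical: anything estimated under 1 s is suspect


def first_suspect(records: Iterable[Tuple[int, float]]) -> Optional[datetime]:
    """Return the UTC finish time of the earliest record with a tiny estimate."""
    for finished_unix, est_runtime_s in sorted(records):  # sorts by finish time
        if est_runtime_s < SUSPECT_ESTIMATE_S:
            return datetime.fromtimestamp(finished_unix, tz=timezone.utc)
    return None


if __name__ == "__main__":
    # Made-up (finish time, initial estimate) pairs, for illustration only.
    made_up = [(1546500000, 0.2), (1546100000, 7807.0), (1546600000, 0.2)]
    print(first_suspect(made_up))
```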


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.