I am getting a lot of gpu tasks with zero (0) expected processing times.

Profile Bill Special Project $75 donor
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1980395 - Posted: 14 Feb 2019, 23:19:39 UTC - in response to Message 1980389.  

Well, it does not appear to be an upgrade for my APU. The Radeon system tray does not show the upgrade, nor does it appear when I look up my APU on AMD's website. It looks like it is available only for discrete graphics cards. Regardless, when I was driving home I realized you said this needs to be fixed on the server end, so AMD updating their drivers might be irrelevant? Or did I misunderstand you?

Richard, as I was scrolling through this thread, I realized in this message you asked for the first few lines from the event log and I don't think I ever gave those to you. Do you want that still or is it moot at this point?
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1980395 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1980483 - Posted: 15 Feb 2019, 12:03:49 UTC - in response to Message 1980395.  

Thanks, but we've actually moved on from that stage. The patched client seems to be showing what we need now - no further testing needed on that.

Which is more than can be said for the servers - it became apparent on Wednesday that work is needed there too. I'm going to suggest to Eric that our 'SETI Beta' project might be a possible venue for testing, but I don't know if it's in the right condition to accept the preparatory code which has been written so far.

Don't do anything yet, but if we do get a test started, would you be willing to attach your Ryzen to Beta briefly to see if it works?
ID: 1980483 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1980486 - Posted: 15 Feb 2019, 12:33:12 UTC - in response to Message 1980483.  

Thanks, but we've actually moved on from that stage. The patched client seems to be showing what we need now - no further testing needed on that.

Which is more than can be said for the servers - it became apparent on Wednesday that work is needed there too. I'm going to suggest to Eric that our 'SETI Beta' project might be a possible venue for testing, but I don't know if it's in the right condition to accept the preparatory code which has been written so far.

Don't do anything yet, but if we do get a test started, would you be willing to attach your Ryzen to Beta briefly to see if it works?


My 2400G is back on the air. And I will make the requested client change. I can add Beta back if you want.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1980486 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1980487 - Posted: 15 Feb 2019, 12:37:50 UTC - in response to Message 1980486.  

Thanks. Please let us know whether - with the standard client - you see the crazy speed reported at startup.

I've emailed Eric with the Beta suggestion, but he won't have received it yet because of the time zone difference. Hold off attaching to Beta until we know whether he can deploy something to test.
ID: 1980487 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1980489 - Posted: 15 Feb 2019, 12:59:24 UTC - in response to Message 1980487.  
Last modified: 15 Feb 2019, 13:05:35 UTC

Thanks. Please let us know whether - with the standard client - you see the crazy speed reported at startup.

I've emailed Eric with the Beta suggestion, but he won't have received it yet because of the time zone difference. Hold off attaching to Beta until we know whether he can deploy something to test.


I had to re-install Windows, then install the drivers from the CD-ROM, and then re-install Seti. I am not seeing the really large GFLOPS number (yet). So either I am missing something, or all my changes have "fixed" something.

2/15/2019 6:55:25 AM |  | cc_config.xml not found - using defaults
2/15/2019 6:55:25 AM |  | Starting BOINC client version 7.14.2 for windows_x86_64
2/15/2019 6:55:25 AM |  | log flags: file_xfer, sched_ops, task
2/15/2019 6:55:25 AM |  | Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
2/15/2019 6:55:25 AM |  | Data directory: C:\ProgramData\BOINC
2/15/2019 6:55:25 AM |  | Running under account tlgal
2/15/2019 6:55:25 AM |  | OpenCL: AMD/ATI GPU 0: AMD Radeon(TM) RX Vega 11 Graphics (driver version 2482.7 (PAL,HSAIL), device version OpenCL 2.0 AMD-APP (2482.7), 6566MB, 6566MB available, 1761 GFLOPS peak)
2/15/2019 6:55:25 AM |  | Host name: LYNNES-GIFT
2/15/2019 6:55:25 AM |  | Processor: 8 AuthenticAMD AMD Ryzen 5 2400G with Radeon Vega Graphics [Family 23 Model 17 Stepping 0]
2/15/2019 6:55:25 AM |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 svm sse4a osvw skinit wdt tce topx page1gb rdtscp fsgsbase bmi1 smep
2/15/2019 6:55:25 AM |  | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.17134.00)
2/15/2019 6:55:25 AM |  | Memory: 14.93 GB physical, 17.68 GB virtual
2/15/2019 6:55:25 AM |  | Disk: 148.41 GB total, 117.02 GB free
2/15/2019 6:55:25 AM |  | Local time is UTC -6 hours
2/15/2019 6:55:25 AM |  | No WSL found.
2/15/2019 6:55:25 AM | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 8671627; resource share 1000
2/15/2019 6:55:25 AM | SETI@home | General prefs: from SETI@home (last modified 13-Feb-2019 12:33:43)
2/15/2019 6:55:25 AM | SETI@home | Computer location: school
2/15/2019 6:55:25 AM |  | General prefs: using separate prefs for school
2/15/2019 6:55:25 AM |  | Reading preferences override file
2/15/2019 6:55:25 AM |  | Preferences:
2/15/2019 6:55:25 AM |  | max memory usage when active: 12996.41 MB
2/15/2019 6:55:25 AM |  | max memory usage when idle: 14525.40 MB
2/15/2019 6:55:25 AM |  | max disk usage: 100.00 GB
2/15/2019 6:55:25 AM |  | max CPUs used: 7
2/15/2019 6:55:25 AM |  | suspend work if non-BOINC CPU load exceeds 25%
2/15/2019 6:55:25 AM |  | (to change preferences, visit a project web site or select Preferences in the Manager)
2/15/2019 6:55:25 AM |  | Setting up project and slot directories
2/15/2019 6:55:25 AM |  | Checking active tasks
2/15/2019 6:55:25 AM |  | Setting up GUI RPC socket
2/15/2019 6:55:25 AM |  | Checking presence of 281 project files

A proud member of the OFA (Old Farts Association).
ID: 1980489 · Report as offensive
Profile Bill Special Project $75 donor
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1980492 - Posted: 15 Feb 2019, 13:18:19 UTC - in response to Message 1980487.  

I'd be happy to run the beta, but I'm not sure how to do that. Could you point me in the right direction when the time is right?
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1980492 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1980504 - Posted: 15 Feb 2019, 14:11:40 UTC - in response to Message 1980489.  

I had to re-install Windows, then install the drivers on the CD-Rom ...
We think the problems arose in a more recent driver, downloaded from the internet. That would be a useful test: Bill was working fine when he first attached his Ryzen to SETI on 23 December, but he had his first failed task on 4 January. That sounds like driver 19.1 to me, so that's what I put in my bug report to AMD.

If you could install something like that from the AMD website, that would be awesome.
ID: 1980504 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1980505 - Posted: 15 Feb 2019, 14:13:26 UTC - in response to Message 1980492.  

Could point me in the right direction when the time is right?
Certainly - it only takes seconds. You just have to paste a url into the box provided.

I'll tell you which url, and which box, nearer the time...
ID: 1980505 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1980541 - Posted: 15 Feb 2019, 17:27:55 UTC - in response to Message 1980504.  

I had to re-install Windows, then install the drivers on the CD-Rom ...
We think the problems arose in a more recent driver, downloaded from the internet. That would be a useful test: Bill was working fine when he first attached his Ryzen to SETI on 23 December, but he had his first failed task on 4 January. That sounds like driver 19.1 to me, so that's what I put in my bug report to AMD.

If you could install something like that from the AMD website, that would be awesome.


Let me "update" to the latest. Since I have the cd-rom I can always "back date". :)

Tom
A proud member of the OFA (Old Farts Association).
ID: 1980541 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1980546 - Posted: 15 Feb 2019, 18:15:49 UTC

If you do try to revert to the CD drivers after the test with the latest drivers, I would suggest using DDU to completely remove any vestiges of the troubled new drivers before reinstalling the good CD drivers.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1980546 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1980547 - Posted: 15 Feb 2019, 18:22:24 UTC - in response to Message 1980541.  

I had to re-install Windows, then install the drivers on the CD-Rom ...
We think the problems arose in a more recent driver, downloaded from the internet. That would be a useful test: Bill was working fine when he first attached his Ryzen to SETI on 23 December, but he had his first failed task on 4 January. That sounds like driver 19.1 to me, so that's what I put in my bug report to AMD.

If you could install something like that from the AMD website, that would be awesome.


Let me "update" to the latest. Since I have the cd-rom I can always "back date". :)

Tom



I did a custom/clean install of the latest driver set for the all in one.

2/15/2019 12:16:54 PM |  | cc_config.xml not found - using defaults
2/15/2019 12:17:02 PM |  | Starting BOINC client version 7.14.2 for windows_x86_64
2/15/2019 12:17:02 PM |  | log flags: file_xfer, sched_ops, task
2/15/2019 12:17:02 PM |  | Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
2/15/2019 12:17:02 PM |  | Data directory: C:\ProgramData\BOINC
2/15/2019 12:17:02 PM |  | Running under account tlgal
2/15/2019 12:17:18 PM |  | OpenCL: AMD/ATI GPU 0: AMD Radeon(TM) RX Vega 11 Graphics (driver version 2766.5 (PAL,HSAIL), device version OpenCL 2.0 AMD-APP (2766.5), 6566MB, 6566MB available, 1761 GFLOPS peak)
2/15/2019 12:17:18 PM |  | Host name: LYNNES-GIFT
2/15/2019 12:17:18 PM |  | Processor: 8 AuthenticAMD AMD Ryzen 5 2400G with Radeon Vega Graphics [Family 23 Model 17 Stepping 0]
2/15/2019 12:17:18 PM |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 svm sse4a osvw skinit wdt tce topx page1gb rdtscp fsgsbase bmi1 smep
2/15/2019 12:17:18 PM |  | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.17134.00)
2/15/2019 12:17:18 PM |  | Memory: 14.93 GB physical, 17.68 GB virtual
2/15/2019 12:17:18 PM |  | Disk: 148.41 GB total, 115.38 GB free
2/15/2019 12:17:18 PM |  | Local time is UTC -6 hours
2/15/2019 12:17:18 PM |  | No WSL found.
2/15/2019 12:17:19 PM | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 8671627; resource share 1000
2/15/2019 12:17:19 PM | SETI@home | General prefs: from SETI@home (last modified 13-Feb-2019 12:33:43)
2/15/2019 12:17:19 PM | SETI@home | Computer location: school
2/15/2019 12:17:19 PM |  | General prefs: using separate prefs for school
2/15/2019 12:17:19 PM |  | Reading preferences override file
2/15/2019 12:17:19 PM |  | Preferences:
2/15/2019 12:17:19 PM |  | max memory usage when active: 12996.41 MB
2/15/2019 12:17:19 PM |  | max memory usage when idle: 14525.40 MB
2/15/2019 12:17:20 PM |  | max disk usage: 100.00 GB
2/15/2019 12:17:20 PM |  | max CPUs used: 7
2/15/2019 12:17:20 PM |  | suspend work if non-BOINC CPU load exceeds 25%
2/15/2019 12:17:20 PM |  | (to change preferences, visit a project web site or select Preferences in the Manager)
2/15/2019 12:17:20 PM |  | Setting up project and slot directories
2/15/2019 12:17:20 PM |  | Checking active tasks
2/15/2019 12:17:20 PM |  | Setting up GUI RPC socket
2/15/2019 12:17:20 PM |  | Checking presence of 281 project files
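
For anyone comparing startup logs the way the two posted in this thread invite, here is a minimal Python sketch that pulls the driver version and peak GFLOPS out of the OpenCL detection line. The regular expression is fitted only to the two lines quoted in this thread, so other BOINC versions or GPU vendors may well format the line differently, and `parse_opencl_line` is a hypothetical helper, not part of BOINC.

```python
import re

# Pattern fitted to the "OpenCL:" line exactly as it appears in the two
# startup logs quoted in this thread; other setups may differ.
OPENCL_RE = re.compile(
    r"OpenCL: .+? GPU \d+: (?P<name>.+?) "
    r"\(driver version (?P<driver>[\d.]+).*?"
    r"(?P<gflops>\d+) GFLOPS peak\)"
)


def parse_opencl_line(line):
    """Return GPU name, driver version and peak GFLOPS, or None if no match."""
    match = OPENCL_RE.search(line)
    if match is None:
        return None
    return {
        "name": match.group("name"),
        "driver": match.group("driver"),
        "gflops_peak": int(match.group("gflops")),
    }


# The two detection lines from the logs in this thread (timestamps dropped).
CD_DRIVER_LINE = (
    "OpenCL: AMD/ATI GPU 0: AMD Radeon(TM) RX Vega 11 Graphics "
    "(driver version 2482.7 (PAL,HSAIL), device version OpenCL 2.0 "
    "AMD-APP (2482.7), 6566MB, 6566MB available, 1761 GFLOPS peak)"
)
LATEST_DRIVER_LINE = (
    "OpenCL: AMD/ATI GPU 0: AMD Radeon(TM) RX Vega 11 Graphics "
    "(driver version 2766.5 (PAL,HSAIL), device version OpenCL 2.0 "
    "AMD-APP (2766.5), 6566MB, 6566MB available, 1761 GFLOPS peak)"
)

for label, line in (("CD driver", CD_DRIVER_LINE), ("latest driver", LATEST_DRIVER_LINE)):
    info = parse_opencl_line(line)
    print(label, info["driver"], info["gflops_peak"])
```

Both startups report the same 1761 GFLOPS peak; only the driver version (2482.7 versus 2766.5) differs between them.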

A proud member of the OFA (Old Farts Association).
ID: 1980547 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1980550 - Posted: 15 Feb 2019, 18:27:43 UTC - in response to Message 1980546.  

Agreed. That would be the Display Driver Uninstaller from https://www.wagnardsoft.com/, if you haven't come across it before: also distributed by Guru3D.
ID: 1980550 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1980551 - Posted: 15 Feb 2019, 18:29:10 UTC - in response to Message 1980546.  

If you do try to revert to the CD drivers after the test with the latest drivers, I would suggest using DDU to completely remove any vestiges of the troubled new drivers before reinstalling the good CD drivers.


I have been using the custom/clean install. If you have mentioned "DDU" before I failed to note it down.

Found it, I think: https://www.guru3d.com/files-details/display-driver-uninstaller-download.html
A proud member of the OFA (Old Farts Association).
ID: 1980551 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1980552 - Posted: 15 Feb 2019, 19:00:23 UTC - in response to Message 1980547.  

OK, it looks as if your particular Ryzen doesn't suffer from this bug. One of the other testers reported that it only showed up when he used a non-standard CPU speed (he downclocked it). Eric found host 8669611 while checking to see how widespread the problem is: the user has been a SETI member since 2005, but has never posted on the message board. He attached a new Ryzen last Sunday (10 February), and had trashed 450 tasks by Wednesday. His CPU cores are working, so he probably hasn't even noticed.

But the project - all projects - could do without that 'help', which is why I'm taking this so seriously.
ID: 1980552 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1980554 - Posted: 15 Feb 2019, 19:07:58 UTC - in response to Message 1980552.  
Last modified: 15 Feb 2019, 19:14:34 UTC

OK, it looks as if your particular Ryzen doesn't suffer from this bug. One of the other testers reported that it only showed up when he used a non-standard CPU speed (he downclocked it). Eric found host 8669611 while checking to see how widespread the problem is: the user has been a SETI member since 2005, but has never posted on the message board. He attached a new Ryzen last Sunday (10 February), and had trashed 450 tasks by Wednesday. His CPU cores are working, so he probably hasn't even noticed.

But the project - all projects - could do without that 'help', which is why I'm taking this so seriously.


I will admit to turning off the "cool 'n quiet", the C states and setting the iGPU to forced so I can try to run a gtx 750Ti beside the iGPU like I did on the previous MB. But I haven't touched the cpu speed.

This last re-boot I did set the iGPU to 1500 (it was stable on the last iGPU) and the iGPU voltage to 1.2.

If it starts trashing tasks I will switch back to auto.
--edit---
I just looked at the error listing on the website and I appear to have been throwing errors since yesterday. That was with the cpu/gpu on auto and the older drivers.

The error is "exit time limit exceeded".

Now what do I do? Shut down gpu processing?

I want to leave it running for a little while because I have different drivers installed as well as the standard high power plan.
Yesterday I was still running the "AMD balanced power plan", which was pausing the system even with "never stop" set.
--edit--
Tom
A proud member of the OFA (Old Farts Association).
ID: 1980554 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1980560 - Posted: 15 Feb 2019, 19:23:05 UTC - in response to Message 1980554.  

-edit---
I just looked at the error listing on the website and I appear to have been throwing errors since yesterday. That was with the cpu/gpu on auto and the older drivers.

The error is "exit time limit exceeded".
I'll go and look at them to see if it's the same bug we're trying to catch - that's certainly the same error message.

I actually came here to report that Eric has just emailed me:

"I was planning on building updated server components today anyway... I'll see if I can get them deployed on beta before COB."
His close of business is likely to be after my bedtime, but I'll keep an eye on it as long as I can.
ID: 1980560 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1980565 - Posted: 15 Feb 2019, 19:35:51 UTC

@ Tom M

I see host 8671627 - Ryzen 5 2400G created 14 Feb 2019, 19:51:26 UTC. That sounds like the one.

I also see 113 error tasks. The first one I spot-checked contains:

Exit status		197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED
Run time		11 sec
Device peak FLOPS	1,814,198.76 GFLOPS
Max clock frequency: 	42949672Mhz
That's a full house of the problem symptoms we're tracking. So, please take a note of what settings you were running 24 hours ago, so that we can replicate them for Beta testing.

BTW, that task was completed at 14 Feb 2019, 18:30:34 UTC, before the host was created. Trying to change too many things at once to get a host running quickly can cause problems like that.
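
To put numbers on those symptoms, here is a rough Python sketch using only the figures quoted in this thread. It rests on two assumptions that are not confirmed here: that the reported clock (which happens to equal 2**32 // 100) comes from some 32-bit value going wrong, and that a time figure is derived as work divided by speed. The helper names and the 10 GHz ceiling are made up for illustration; this is not BOINC scheduler code.

```python
# Figures copied from the task report and the client startup log in this thread.
REPORTED_CLOCK_MHZ = 42_949_672       # "Max clock frequency" on the failed task
REPORTED_PEAK_GFLOPS = 1_814_198.76   # "Device peak FLOPS" on the failed task
CLIENT_PEAK_GFLOPS = 1761             # "GFLOPS peak" in the client startup log
EXIT_TIME_LIMIT_EXCEEDED = 197        # exit status shown on the failed task


def clock_is_plausible(mhz, ceiling_mhz=10_000):
    """Crude sanity check; the 10 GHz ceiling is an arbitrary illustrative bound."""
    return 0 < mhz <= ceiling_mhz


def scaled_seconds(seconds_at_true_speed, true_gflops, reported_gflops):
    """Time figure obtained when the same work is divided by an inflated speed."""
    return seconds_at_true_speed * true_gflops / reported_gflops


inflation = REPORTED_PEAK_GFLOPS / CLIENT_PEAK_GFLOPS

print(clock_is_plausible(REPORTED_CLOCK_MHZ))   # False
print(REPORTED_CLOCK_MHZ == 2**32 // 100)       # True (possibly a coincidence)
print(hex(EXIT_TIME_LIMIT_EXCEEDED))            # 0xc5
print(round(inflation))                         # 1030
# A hypothetical one-hour figure shrinks to a few seconds under this model.
print(scaled_seconds(3600, CLIENT_PEAK_GFLOPS, REPORTED_PEAK_GFLOPS))
```

Under that simplified model, a speed overstated by roughly a factor of 1030 would shrink any runtime estimate or limit by the same factor, which would be consistent with near-zero expected processing times and tasks being stopped after seconds.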
ID: 1980565 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1980600 - Posted: 15 Feb 2019, 23:07:49 UTC

OK, another reply from Eric: Beta has been updated, and we're ready to start testing.

First, please reduce your cache size/work request. We are looking for a problem which should only last for 10 tasks. If the solution works, you're cured for good, and that machine won't be useful for testing this type of problem ever again. Only if it fails can you help test the next attempted fix....

So, with a reduced cache setting, go to the Tools menu in BOINC Manager and click the first item, 'Add project...'

Ignore all the information on existing projects - you won't find Beta there. Go straight to the last box, labelled 'Project URL:'. Paste in

http://setiweb.ssl.berkeley.edu/beta/

Click 'Next >', and fill in your details. Please use the same email address and handle as you use elsewhere - it makes it easier to find you. The rest is self-explanatory.

You'll get one task immediately for each of your devices. And then it'll ask for more work seven seconds later - you did set a small cache first, didn't you? Beta is different - no five minute pause to think about things.

We're interested in the AMD GPU tasks only. Please note the initial runtime estimates, the actual runtimes, and anything else that catches your eye. Post it here, and we can pick over the entrails in the morning.

Set 'No New Tasks' as soon as you want, and go back to your normal projects / cache settings. Thanks in advance.
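
For machines without BOINC Manager, the same two steps (attach, then 'No New Tasks') can probably be done with the `boinccmd` command-line tool. The Python sketch below only builds and prints the commands; it does not run them. Assumptions: `boinccmd` is installed and on the path, and you supply your own account key in place of the `YOUR_ACCOUNT_KEY` placeholder, since `boinccmd` attaches by key rather than by the email address and handle the wizard asks for. This is not something suggested in the post above, and it does not cover reducing the cache size.

```python
import shlex

# Project URL taken from the instructions above.
BETA_URL = "http://setiweb.ssl.berkeley.edu/beta/"


def attach_command(account_key, url=BETA_URL):
    """Argument list for attaching the local client to a project."""
    return ["boinccmd", "--project_attach", url, account_key]


def no_new_tasks_command(url=BETA_URL):
    """Argument list for the command-line counterpart of 'No New Tasks'."""
    return ["boinccmd", "--project", url, "nomorework"]


# Placeholder key: replace with your own before running anything.
print(shlex.join(attach_command("YOUR_ACCOUNT_KEY")))
print(shlex.join(no_new_tasks_command()))
```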
ID: 1980600 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1980613 - Posted: 16 Feb 2019, 0:03:14 UTC

Midnight passed, and my bed calls. Happy testing, and I'll look into any results you're able to gather in the morning. G'night.
ID: 1980613 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1980624 - Posted: 16 Feb 2019, 0:36:36 UTC - in response to Message 1980565.  

@ Tom M

I see host 8671627 - Ryzen 5 2400G created 14 Feb 2019, 19:51:26 UTC. That sounds like the one.

I also see 113 error tasks. The first one I spot-checked contains:

Exit status		197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED
Run time		11 sec
Device peak FLOPS	1,814,198.76 GFLOPS
Max clock frequency: 	42949672Mhz
That's a full house of the problem symptoms we're tracking. So, please take a note of what settings you were running 24 hours ago, so that we can replicate them for Beta testing.

BTW, that task was completed at 14 Feb 2019, 18:30:34 UTC, before the host was created. Trying to change too many things at once to get a host running quickly can cause problems like that.


It now occurs to me that the host creation date falling after the task's completion time has to be a result of the merge I did on the host IDs.
A proud member of the OFA (Old Farts Association).
ID: 1980624 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.