I am getting a lot of gpu tasks with zero (0) expected processing times.

Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1980625 - Posted: 16 Feb 2019, 0:40:37 UTC - in response to Message 1980600.  
Last modified: 16 Feb 2019, 0:45:08 UTC

OK, another reply from Eric: Beta has been updated, and we're ready to start testing.

First, please reduce your cache size/work request. We are looking for a problem which should only last for 10 tasks. If the solution works, you're cured for good, and that machine won't be useful for testing this type of problem ever again. Only if it fails can you help test the next attempted fix....

So, with a reduced cache setting, go to the Tools menu in BOINC Manager, and click the first item, 'Add project...'

Ignore all the information on existing projects - you won't find Beta there. Go straight to the last box, labelled 'Project URL:'. Paste in

http://setiweb.ssl.berkeley.edu/beta/

Click 'Next >', and fill in your details. Please use the same email address and handle as you use elsewhere - it makes it easier to find you. The rest is self-explanatory.

You'll get one task immediately for each of your devices. And then it'll ask for more work seven seconds later - you did set a small cache first, didn't you? Beta is different - no five minute pause to think about things.

We're interested in the AMD GPU tasks only. Please note the initial runtime estimates, the actual runtimes, and anything else that catches your eye. Post it here, and we can pick over the entrails in the morning.

Set 'No New Tasks' as soon as you want, and go back to your normal projects / cache settings. Thanks in advance.



Ok, squeezed the tasks down to 0.01 day. Suspended Seti so it wouldn't get in the way. Got my "work" location set up for ATI GPUs only. Have aborted all the other beta tasks. Crunching now.

Also, finally got both the internal GPU and a discrete card (Nvidia) set up, but the Nvidia is not crunching.
Have at least 10 tasks downloaded.
Will report more as I notice it.

-edit-
Since I only have the iGPU task running at 50% it took 10 minutes. Estimated time left 10 minutes. This is the fastest processing I have seen on the iGPU. So running cpu processing even at 80% of the cpus isn't low enough to allow the iGPU to process at full speed.
More testing. :)
-edit-

Tom
A proud member of the OFA (Old Farts Association).
ID: 1980625 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1980629 - Posted: 16 Feb 2019, 1:10:03 UTC - in response to Message 1980625.  


edit-
Since I only have the iGPU task running at 50% it took 10 minutes. Estimated time left 10 minutes. This is the fastest processing I have seen on the iGPU. So running cpu processing even at 80% of the cpus isn't low enough to allow the iGPU to process at full speed.
More testing. :)
-edit-

Tom


Ok, I have finished re-jiggering the Seti/Beta/Work location so it is asking for 50% of the iGPU. So everybody should be crunching. I have been registered under Beta for a long time so I don't mind running the ATI tasks until someone says we are done.

I understand we are probably already tested, but if Eric has to do another change and test, it's already processing on my end.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1980629 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1980675 - Posted: 16 Feb 2019, 12:43:15 UTC - in response to Message 1980629.  

Found you - thanks for the hint.

I see host 87174, with the Ryzen humming along happily.

But unfortunately, I also see

Device peak FLOPS	53.00 GFLOPS
Max clock frequency:	1251Mhz
It looks as if you're running the driver that reports sane values, rather than the insane one. You have 10 tasks validated as I type: fortunately the scheduler is testing out two different app versions, so we haven't quite switched to APR estimates yet, but it's getting close.

So the test result is inconclusive: we can't tell whether it's working because of Eric's fix or your driver, but at the moment my money is on your driver. Thanks for trying, anyway.
ID: 1980675 · Report as offensive
Profile Bill Special Project $75 donor
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1980676 - Posted: 16 Feb 2019, 13:14:22 UTC - in response to Message 1980600.  

Ok, beta is up and running! Well, I had it running since last night, but I guess I was tired and didn't realize I didn't hit send on this post. So here's what I know.

I have not processed a single GPU task. I did reduce my workload prior to starting beta to 0.5 (minimum days) and 0.01 (additional days). I also reduced the amount of storage space BOINC could use, but realized I had to boost it up because I had already downloaded plenty of tasks with normal SETI.
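For anyone who would rather set these two cache values outside BOINC Manager, here is a minimal sketch that writes them as a local preferences override. It assumes the client reads `work_buf_min_days` and `work_buf_additional_days` from a `global_prefs_override.xml` file in its data directory. The helper name `build_cache_override` is made up for illustration, and the numbers are simply the ones quoted in this post.

```python
import xml.etree.ElementTree as ET


def build_cache_override(min_days, additional_days):
    """Return the text of a minimal local-preferences override.

    Assumption: the BOINC client picks these two elements up from
    global_prefs_override.xml in its data directory.
    """
    root = ET.Element("global_preferences")
    ET.SubElement(root, "work_buf_min_days").text = f"{min_days:g}"
    ET.SubElement(root, "work_buf_additional_days").text = f"{additional_days:g}"
    return ET.tostring(root, encoding="unicode")


# Values taken from the post: 0.5 minimum days, 0.01 additional days.
override_xml = build_cache_override(0.5, 0.01)
print(override_xml)
```

The text would still have to be saved into the BOINC data directory and the client told to re-read its local preferences before it takes effect.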

I also updated my cc_config and deleted the exclude_gpu reference for AP7 (although now that I think of it, that should only apply to the normal app, not the beta version). I am still running BOINC 7.15 build...is that okay or should I have returned back to 7.14?

Everything appears to be crunching correctly. No AP tasks at all, only MB. Now that I've "slept on it", I realized I didn't set up my S@H beta preferences, so I set those now. I also had suspended using normal S@H, but prior to doing so S@H downloaded a ton of GPU tasks, so I'm wondering if for some reason that maxed out my limit.

So, I'm running Beta with NNT, and regular S@H concurrently to eat away some of these units. I'll check back in a few hours and see what's going on.
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1980676 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1980677 - Posted: 16 Feb 2019, 13:17:49 UTC - in response to Message 1980676.  

Thanks - that's a long list, and I was just planning to go out for lunch...

I'll go through it all when I get back, and by then we should have some results to look at.
ID: 1980677 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1980708 - Posted: 16 Feb 2019, 17:30:59 UTC

Right, back again. I see

Computer 87175
Application details for host 87175

That looks like you, and they match exactly what you've described. So far you've done nearly 100 tasks, with no sign of trouble at all. But they've all been done on the CPU component of your Ryzen.

We need to switch you to the GPU part. Please go to your SETI@home Beta preferences, and ensure that 'Use ATI GPU' is checked in the top part. I suggest you uncheck 'Use CPU' (we've completed that test). You could also turn off Astropulse tasks lower down the same page.

Go back to your BOINC Manager, and 'update' the Beta project once to collect the new settings. Then allow new work, and wait until it's ready to fetch some.

Don't bother changing the storage space allowed for BOINC - that's shared across all projects anyway. My guess is that you've probably got too much work from other projects, so BOINC doesn't feel the need to download from Beta. The best way to hurry it along is to set 'No New Tasks' for EVERY project except Beta, and then nudge your cache size up a bit until it asks for some. Remember to look in the Event Log if things don't seem to be happening as you expect.

At this point, it doesn't matter which BOINC client you're using - we're concentrating on the server. Not worth changing the client.
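The 'No New Tasks for every project except Beta' step can also be scripted. The sketch below only builds the command lines and does not run them. It assumes the `boinccmd` tool with its `--project <URL> nomorework` and `allowmorework` operations. The Beta URL is the one given earlier in this thread; the other two URLs are illustrative and must match the master URLs your own client shows.

```python
BETA_URL = "http://setiweb.ssl.berkeley.edu/beta/"

# Illustrative list only: replace with the projects actually attached.
ATTACHED_PROJECTS = [
    "http://setiathome.berkeley.edu/",
    "http://milkyway.cs.rpi.edu/milkyway/",
    BETA_URL,
]


def no_new_tasks_except(keep_url, attached_urls):
    """Build one boinccmd argument list per attached project.

    Every project is set to 'nomorework' except keep_url, which is
    explicitly allowed to fetch work again.
    """
    commands = []
    for url in attached_urls:
        operation = "allowmorework" if url == keep_url else "nomorework"
        commands.append(["boinccmd", "--project", url, operation])
    return commands


commands = no_new_tasks_except(BETA_URL, ATTACHED_PROJECTS)
for argv in commands:
    print(" ".join(argv))
```

Each list could be passed to `subprocess.run` on a machine where `boinccmd` is on the path. The Event Log remains the place to check whether the client then asks Beta for work.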
ID: 1980708 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1980723 - Posted: 16 Feb 2019, 18:51:02 UTC - in response to Message 1980675.  

Found you - thanks for the hint.

I see host 87174, with the Ryzen humming along happily.

But unfortunately, I also see

Device peak FLOPS	53.00 GFLOPS
Max clock frequency:	1251Mhz
It looks as if you're running the driver that reports sane values, rather than the insane one. You have 10 tasks validated as I type: fortunately the scheduler is testing out two different app versions, so we haven't quite switched to APR estimates yet, but it's getting close.

So the test result is inconclusive: we can't tell whether it's working because of Eric's fix or your driver, but at the moment my money is on your driver. Thanks for trying, anyway.


Ok, so it sounds like I should continue to let it run. Until you want me to switch to a "bad" driver. As far as I can tell I am now running the latest All in One driver.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1980723 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1980727 - Posted: 16 Feb 2019, 19:04:02 UTC - in response to Message 1980723.  
Last modified: 16 Feb 2019, 19:06:45 UTC

Ok, so it sounds like I should continue to let it run. Until you want me to switch to a "bad" driver. As far as I can tell I am now running the latest All in One driver.

Tom
Actually, it would be most helpful to try the INsane driver, to see whether the new server code that Eric loaded specially actually cures the insanity problem. But before you change anything, could you please look up the exact version number of that "latest All in One driver" - we need to know for the record which are good and which are bad.

And I'll go check whether you've passed the point of no return as far as completed tasks are concerned.

Edit - It's OK, you've got plenty of headroom left on opencl_ati5_SoG_nocal - though you can't pick and choose which app_version the server will choose for you next.
ID: 1980727 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1980747 - Posted: 16 Feb 2019, 21:25:02 UTC - in response to Message 1980625.  

-edit-
Since I only have the iGPU task running at 50% it took 10 minutes. Estimated time left 10 minutes. This is the fastest processing I have seen on the iGPU. So running cpu processing even at 80% of the cpus isn't low enough to allow the iGPU to process at full speed.
More testing. :)
-edit-

Tom
I skipped that part of your answer this morning.

I've long had a theory that it doesn't just matter whether you're using the CPU: it matters what you're doing with it, too. Since we're testing, I thought I'd try to demonstrate that. My host 8121358 has an i5-6500 CPU @ 3.20GHz - a couple of generations old now. I plugged it into a Killa-watt meter when I first got it, and never got round to unplugging it again. Today's figures are:

Idle - BOINC not running:		22 watts
Running NumberFields on 4 cores:	55 watts
Running SETI x64 AVX on 4 cores:	69 watts
ditto at VHAR:				71 watts
So, there's a significant difference between NumberFields@Home (primarily integer arithmetic) and the heavy use of the specialist floating point hardware by SETI. I've listed VHAR separately, because last time I tested this (about 10 years ago), I could see that VHAR put an extra load on the memory controller, too.
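To make the comparison easier to read, the figures can be restated as extra watts over idle. This is only arithmetic on the four readings quoted above; the dictionary labels are shortened versions of the rows in the list.

```python
# Wall-socket readings in watts, copied from the list above.
readings_w = {
    "Idle - BOINC not running": 22,
    "NumberFields on 4 cores": 55,
    "SETI x64 AVX on 4 cores": 69,
    "SETI x64 AVX at VHAR": 71,
}

idle_w = readings_w["Idle - BOINC not running"]

# Extra power drawn by each workload on top of the idle baseline.
extra_w = {
    name: watts - idle_w
    for name, watts in readings_w.items()
    if watts != idle_w
}

for name, watts in extra_w.items():
    print(f"{name}: +{watts} W over idle")

# How much more the SETI load adds compared with NumberFields.
ratio = extra_w["SETI x64 AVX on 4 cores"] / extra_w["NumberFields on 4 cores"]
print(f"SETI adds about {ratio:.2f}x the extra power of NumberFields")
```

On these numbers the SETI load adds 47 W against 33 W for NumberFields, and VHAR adds a further 2 W.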

(I kept both GPUs idle while I did that test)
ID: 1980747 · Report as offensive
Profile Bill Special Project $75 donor
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1980755 - Posted: 16 Feb 2019, 22:41:51 UTC - in response to Message 1980708.  

Right, back again. I see

Computer 87175
Application details for host 87175

That looks like you, and they match exactly what you've described. So far you've done nearly 100 tasks, with no sign of trouble at all. But they've all been done on the CPU component of your Ryzen.

We need to switch you to the GPU part. Please go to your SETI@home Beta preferences, and ensure that 'Use ATI GPU' is checked in the top part. I suggest you uncheck 'Use CPU' (we've completed that test). You could also turn off Astropulse tasks lower down the same page.

Go back to your BOINC Manager, and 'update' the Beta project once to collect the new settings. Then allow new work, and wait until it's ready to fetch some.

Don't bother changing the storage space allowed for BOINC - that's shared across all projects anyway. My guess is that you've probably got too much work from other projects, so BOINC doesn't feel the need to download from Beta. The best way to hurry it along is to set 'No New Tasks' for EVERY project except Beta, and then nudge your cache size up a bit until it asks for some. Remember to look in the Event Log if things don't seem to be happening as you expect.

At this point, it doesn't matter which BOINC client you're using - we're concentrating on the server. Not worth changing the client.
Yup, that's me. Sorry, I've been a bit busy today. ATI GPU is on, CPU tasks are off, and I updated the project. No AP tasks yet, but we'll see. Chances are I won't have time to play with this more until Sunday (for you).
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1980755 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1980758 - Posted: 16 Feb 2019, 23:02:16 UTC - in response to Message 1980755.  
Last modified: 16 Feb 2019, 23:02:54 UTC

At this point, it doesn't matter which BOINC client you're using - we're concentrating on the server. Not worth changing the client.
Yup, that's me. Sorry, I've been a bit busy today. ATI GPU is on, CPU tasks are off, and I updated the project. No AP tasks yet, but we'll see. Chances are I won't have time to play with this more until Sunday (for you).
Please don't wait for AP tasks - I don't think they make many of them for Beta. MB tasks there will demonstrate what we're testing just as well.
ID: 1980758 · Report as offensive
Profile Bill Special Project $75 donor
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1980763 - Posted: 16 Feb 2019, 23:44:55 UTC - in response to Message 1980758.  

At this point, it doesn't matter which BOINC client you're using - we're concentrating on the server. Not worth changing the client.
Yup, that's me. Sorry, I've been a bit busy today. ATI GPU is on, CPU tasks are off, and I updated the project. No AP tasks yet, but we'll see. Chances are I won't have time to play with this more until Sunday (for you).
Please don't wait for AP tasks - I don't think they make many of them for Beta. MB tasks there will demonstrate what we're testing just as well.
I finally got a GPU task! I had to suspend regular S@H (after working through the throng of GPU tasks I got). This task has 5 HOURS of estimated run time. I forgot to adjust for 1 CPU per GPU for beta, but...it seems to be working.
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1980763 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1980815 - Posted: 17 Feb 2019, 9:02:54 UTC - in response to Message 1980763.  

I finally got a GPU task! I had to suspend regular S@H (after working through the throng of GPU tasks I got). This task has 5 HOURS of estimated run time. I forgot to adjust for 1 CPU per GPU for beta, but...it seems to be working.
In fact, you got two of them, and they both ran to completion. We can see that you're still running the faulty driver, but the bad effects have gone away.

In this context, a 5 hour estimate - although inaccurate - is far better than a zero estimate. It's failsafe. If you choose to continue running Beta tasks, you should start seeing normal estimates after a day or two: you might see more long estimates as the server tests out other versions of the application, but it would all settle down over time. That's entirely up to you: our testing here is completed (the patch is universal - we don't need to test AP separately).
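A rough sketch of why an over-reported device speed produces a zero estimate. It assumes the estimate is simply the task's estimated floating-point operation count divided by the speed the device is believed to have. The task size and both speeds below are hypothetical round numbers, not values taken from any real workunit.

```python
def estimated_runtime_s(fpops_est, flops):
    """Estimated runtime in seconds: work to do divided by assumed speed."""
    return fpops_est / flops


def hms(seconds):
    """Format a duration the way a task list would show it (h:mm:ss)."""
    s = int(round(seconds))
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"


TASK_FPOPS_EST = 3.0e13   # hypothetical task size in floating-point operations
SANE_FLOPS = 1.0e11       # hypothetical plausible speed: 100 GFLOPS
INSANE_FLOPS = 4.4e16     # hypothetical implausible speed from a faulty driver

sane_estimate = hms(estimated_runtime_s(TASK_FPOPS_EST, SANE_FLOPS))
insane_estimate = hms(estimated_runtime_s(TASK_FPOPS_EST, INSANE_FLOPS))

print("sane driver:  ", sane_estimate)
print("faulty driver:", insane_estimate)
```

With the implausible speed the quotient is a fraction of a millisecond, so it displays as zero. An estimate that is too long wastes nothing, whereas a zero estimate invites the client to fetch far more work than it can finish.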

This testing has been watched by several project administrators and senior developers. I've emailed them to pass on the good news, and it's in their hands now. I imagine Eric will want to apply it to the SETI Main servers on Tuesday: I can't say when it'll reach MilkyWay or the other affected projects, but it's on its way.

Very many thanks indeed.
ID: 1980815 · Report as offensive
Profile Bill Special Project $75 donor
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1980832 - Posted: 17 Feb 2019, 13:28:59 UTC - in response to Message 1980815.  

That's entirely up to you: our testing here is completed (the patch is universal - we don't need to test AP separately).

This testing has been watched by several project administrators and senior developers. I've emailed them to pass on the good news, and it's in their hands now. I imagine Eric will want to apply it to the SETI Main servers on Tuesday: I can't say when it'll reach MilkyWay or the other affected projects, but it's on its way.

Very many thanks indeed.
Glad I could help! I'm out of beta tasks and am switching it to NNT. I'm also disabling AP7 GPU tasks again for regular Seti. If you need anything else tested later let me know.
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1980832 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1980834 - Posted: 17 Feb 2019, 13:48:04 UTC - in response to Message 1980832.  

No, there's nothing else on the horizon now. I'll post here when I get word from Eric that the Main servers have been updated: it should be safe to enable AP again then.
ID: 1980834 · Report as offensive
Juha
Volunteer tester

Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1980914 - Posted: 17 Feb 2019, 20:28:29 UTC - in response to Message 1980832.  

You could take #3020 (files) for a quick spin. That changes the client so that it no longer believes a task's maximum allowed runtime if it is less than 2 minutes; right now it allows such a task to run for up to 12 hours.

To test that you need buggy drivers, a project server that isn't patched yet and science apps that haven't completed the 10 tasks on your host yet. I think Milkyway should work for this. You should see an "Elapsed time limit x < 120; setting to 43200" message in the Event Log when a task with a maximum runtime of less than 2 minutes is started.
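The rule described here can be sketched in a few lines. This is an illustration of the behaviour as stated in the post, not the client's actual code: the function and constant names are invented, and only the 120 and 43200 second figures come from the message text above.

```python
MIN_BELIEVABLE_LIMIT_S = 120.0   # limits shorter than 2 minutes are not trusted
FALLBACK_LIMIT_S = 43200.0       # 12 hours, per the quoted Event Log message


def effective_elapsed_limit(limit_s):
    """Return the elapsed-time limit the client would actually enforce.

    A limit below the 2 minute threshold is treated as the product of a
    bad speed figure and replaced by the 12 hour fallback. Any other
    limit is passed through unchanged.
    """
    if limit_s < MIN_BELIEVABLE_LIMIT_S:
        return FALLBACK_LIMIT_S
    return limit_s


# Hypothetical inputs: one implausibly short limit, one ordinary limit.
for limit in (30.0, 7200.0):
    print(f"server limit {limit:.0f} s -> enforced {effective_elapsed_limit(limit):.0f} s")
```

The comparison is strict, matching the "x < 120" wording, so a limit of exactly 120 seconds would be left alone in this sketch.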
ID: 1980914 · Report as offensive
Profile Bill Special Project $75 donor
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1981105 - Posted: 19 Feb 2019, 2:56:24 UTC - in response to Message 1980914.  

You could take #3020 (files) for a quick spin. That changes the client so that it no longer believes a task's maximum allowed runtime if it is less than 2 minutes; right now it allows such a task to run for up to 12 hours.

To test that you need buggy drivers, a project server that isn't patched yet and science apps that haven't completed the 10 tasks on your host yet. I think Milkyway should work for this. You should see an "Elapsed time limit x < 120; setting to 43200" message in the Event Log when a task with a maximum runtime of less than 2 minutes is started.
I downloaded and ran this version of the client on MW and it appears to have worked! The estimated time started at 0:00, but once the task started to run the elapsed time started around 9 minutes, with the first GPU task completing around 7 minutes. It validated too!!

So, some lines from my event log you may find interesting:

2/18/2019 8:39:18 PM |  | OpenCL: AMD/ATI GPU 0: AMD Radeon(TM) Vega 8 Graphics (driver version 2766.5 (PAL,HSAIL), device version OpenCL 2.0 AMD-APP (2766.5), 7206MB, 7206MB available, 100 GFLOPS peak)
2/18/2019 8:39:18 PM |  | [coproc] AMD OpenCL reported bad GPU peak FLOPS 43980464128000000.000000; using 100000000000.000000
2/18/2019 8:39:48 PM | Milkyway@Home | Elapsed time limit 18.920640 < 120.000000; setting to 43200.000000
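For anyone checking their own Event Log for the same two failsafe messages, here is a small sketch that pulls the numbers out of lines like the ones above. The regular expressions are written against these two messages only and are an assumption about their general shape; other client versions may word them differently.

```python
import re

# The two failsafe lines quoted above, verbatim.
LOG = """\
2/18/2019 8:39:18 PM |  | [coproc] AMD OpenCL reported bad GPU peak FLOPS 43980464128000000.000000; using 100000000000.000000
2/18/2019 8:39:48 PM | Milkyway@Home | Elapsed time limit 18.920640 < 120.000000; setting to 43200.000000
"""

FLOPS_RE = re.compile(r"reported bad GPU peak FLOPS ([\d.]+); using ([\d.]+)")
LIMIT_RE = re.compile(r"Elapsed time limit ([\d.]+) < ([\d.]+); setting to ([\d.]+)")


def parse_failsafes(log_text):
    """Collect the numeric fields from both failsafe messages, if present."""
    found = {}
    for line in log_text.splitlines():
        m = FLOPS_RE.search(line)
        if m:
            found["reported_flops"], found["fallback_flops"] = (
                float(g) for g in m.groups()
            )
        m = LIMIT_RE.search(line)
        if m:
            found["limit_s"], found["min_limit_s"], found["new_limit_s"] = (
                float(g) for g in m.groups()
            )
    return found


info = parse_failsafes(LOG)
overstatement = info["reported_flops"] / info["fallback_flops"]
print(f"peak FLOPS overstated by a factor of about {overstatement:,.0f}")
print(f"time limit raised from {info['limit_s']:.1f} s to {info['new_limit_s'] / 3600:.0f} h")
```

The substituted 100000000000 FLOPS figure matches the "100 GFLOPS peak" shown on the OpenCL detection line, and 43200 seconds is the 12 hour allowance described for the test client.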

So what does this mean now? I know this isn't the MW forum, but to be honest, there is a bit more support here than in the other projects.
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1981105 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1981109 - Posted: 19 Feb 2019, 3:15:32 UTC - in response to Message 1981105.  

This is just some of the "failsafe" code that DA put in to get reasonable values reported for projects so the ATI cards can at least process something with the bad reported flops value.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1981109 · Report as offensive
Profile Bill Special Project $75 donor
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1981111 - Posted: 19 Feb 2019, 3:35:15 UTC - in response to Message 1981109.  

This is just some of the "failsafe" code that DA put in to get reasonable values reported for projects so the ATI cards can at least process something with the bad reported flops value.
I figured it was just a workaround for the time being. Still, it's working.
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1981111 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1985268 - Posted: 15 Mar 2019, 10:11:37 UTC

The server-side patches to mitigate this problem have now been released as a recommended update (server release v1.0.4). Project administrators have been notified via the normal email lists, but they may not be fully aware of the implications if they haven't been following the previous conversations.

If you come across this problem on another project, please notify their administration team that a patch is available and required.
ID: 1985268 · Report as offensive