I am getting a lot of gpu tasks with zero (0) expected processing times.
Author | Message |
---|---|
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
OK, another reply from Eric: Beta has been updated, and we're ready to start testing. Ok, squeezed the tasks down to 0.01 day. Suspended Seti so it wouldn't get in the way. Got my "work" location set up for ATI GPUs only. Have aborted all the other beta tasks. Crunching now. Also, finally got both the internal GPU and a discrete card (Nvidia) set up, but the Nvidia is not crunching. Have at least 10 tasks downloaded. Will report more as I notice it. -edit- Since I only have the iGPU task running at 50%, it took 10 minutes. Estimated time left 10 minutes. This is the fastest processing I have seen on the iGPU. So running CPU processing even at 80% of the CPUs isn't low enough to allow the iGPU to process at full speed. More testing. :) -edit- Tom A proud member of the OFA (Old Farts Association). |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
Ok, I have finished re-jiggering the Seti/Beta/Work location so it is asking for 50% of the iGPU. So everybody should be crunching. I have been registered under Beta for a long time so I don't mind running the ATI tasks until someone says we are done. I understand we are probably already tested, but if Eric has to do another change and test, it's already processing on my end. Tom A proud member of the OFA (Old Farts Association). |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
Found you - thanks for the hint. I see host 87174, with the Ryzen humming along happily. But unfortunately, I also see "Device peak FLOPS 53.00 GFLOPS" and "Max clock frequency: 1251Mhz". It looks as if you're running the driver that reports sane values, rather than the insane one. You have 10 tasks validated as I type: fortunately the scheduler is testing out two different app versions, so we haven't quite switched to APR estimates yet, but it's getting close. So the test result is inconclusive: we can't tell whether it's working because of Eric's fix or your driver, but at the moment my money is on your driver. Thanks for trying, anyway. |
Bill Send message Joined: 30 Nov 05 Posts: 282 Credit: 6,916,194 RAC: 60 |
Ok, beta is up and running! Well, I had it running since last night, but I guess I was tired and didn't realize I didn't hit send on this post. So here's what I know. I have not processed a single GPU task. I did reduce my workload prior to starting beta to 0.5 (minimum days) and 0.01 (additional days). I also reduced the amount of storage space BOINC could use, but realized I had to boost it up because I had already downloaded plenty of tasks with normal SETI. I also updated my cc_config and deleted the exclude_gpu reference for AP7 (although now that I think of it, that should only apply to the normal app, not the beta version). I am still running BOINC 7.15 build... is that okay, or should I have gone back to 7.14? Everything appears to be crunching correctly. No AP tasks at all, only MB. Now that I've "slept on it", I realized I didn't set up my S@H beta preferences, so I set those now. I also had suspended using normal S@H, but prior to doing so S@H downloaded a ton of GPU tasks, so I'm wondering if for some reason that maxed out my limit. So, I'm running Beta with NNT, and regular S@H concurrently to eat away some of these units. I'll check back in a few hours and see what's going on. Seti@home classic: 1,456 results, 1.613 years CPU time |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
Thanks - that's a long list, and I was just planning to go out for lunch... I'll go through it all when I get back, and by then we should have some results to look at. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
Right, back again. I see "Computer 87175" and "Application details for host 87175". That looks like you, and they match exactly what you've described. So far you've done nearly 100 tasks, with no sign of trouble at all. But they've all been done on the CPU component of your Ryzen. We need to switch you to the GPU part. Please go to your SETI@home Beta preferences, and ensure that 'Use ATI GPU' is checked in the top part. I suggest you uncheck 'Use CPU' (we've completed that test). You could also turn off Astropulse tasks lower down the same page. Go back to your BOINC Manager, and 'update' the Beta project once to collect the new settings. Then allow new work, and wait until it's ready to fetch some. Don't bother changing the storage space allowed for BOINC - that's shared across all projects anyway. My guess is that you've probably got too much work from other projects, so BOINC doesn't feel the need to download from Beta. The best way to hurry it along is to set 'No New Tasks' for EVERY project except Beta, and then nudge your cache size up a bit until it asks for some. Remember to look in the Event Log if things don't seem to be happening as you expect. At this point, it doesn't matter which BOINC client you're using - we're concentrating on the server. Not worth changing the client. |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
"Found you - thanks for the hint." Ok, so it sounds like I should continue to let it run. Until you want me to switch to a "bad" driver. As far as I can tell I am now running the latest All in One driver. Tom A proud member of the OFA (Old Farts Association). |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
"Ok, so it sounds like I should continue to let it run. Until you want me to switch to a 'bad' driver. As far as I can tell I am now running the latest All in One driver." Actually, it would be most helpful to try the INsane driver, to see whether the new server code that Eric loaded specially actually cures the insanity problem. But before you change anything, could you please look up the exact version number of that "latest All in One driver" - we need to know for the record which are good and which are bad. And I'll go check whether you've passed the point of no return as far as completed tasks are concerned. Edit - It's OK, you've got plenty of headroom left on opencl_ati5_SoG_nocal - though you can't pick and choose which app_version the server will choose for you next. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
"-edit-" I skipped that part of your answer this morning. I've long had a theory that it doesn't just matter whether you're using the CPU: it matters what you're doing with it, too. Since we're testing, I thought I'd try to demonstrate that. My host 8121358 has an i5-6500 CPU @ 3.20GHz - a couple of generations old now. I plugged it into a Kill A Watt meter when I first got it, and never got round to unplugging it again. Today's figures are: Idle - BOINC not running: 22 watts; Running NumberFields on 4 cores: 55 watts; Running SETI x64 AVX on 4 cores: 69 watts; ditto at VHAR: 71 watts. So, there's a significant difference between NumberFields@Home (primarily integer arithmetic) and the heavy use of the specialist floating point hardware by SETI. I've listed VHAR separately, because last time I tested this (about 10 years ago), I could see that VHAR put an extra load on the memory controller, too. (I kept both GPUs idle while I did that test) |
Bill Send message Joined: 30 Nov 05 Posts: 282 Credit: 6,916,194 RAC: 60 |
"Right, back again. I see" Yup, that's me. Sorry, I've been a bit busy today. ATI GPU is on, CPU tasks are off, and I updated the project. No AP tasks yet, but we'll see. Chances are I won't have time to play with this more until Sunday (for you). Seti@home classic: 1,456 results, 1.613 years CPU time |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
Please don't wait for AP tasks - I don't think they make many of them for Beta. MB tasks there will demonstrate what we're testing just as well. |
Bill Send message Joined: 30 Nov 05 Posts: 282 Credit: 6,916,194 RAC: 60 |
I finally got a GPU task! I had to suspend regular S@H (after working through the throng of GPU tasks I got). This task has 5 HOURS of estimated run time. I forgot to adjust for 1 CPU per GPU for beta, but... it seems to be working. Seti@home classic: 1,456 results, 1.613 years CPU time |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
"I finally got a GPU task! I had to suspend regular S@H (after working through the throng of GPU tasks I got). This task has 5 HOURS of estimated run time. I forgot to adjust for 1 CPU per GPU for beta, but...it seems to be working." In fact, you got two of them, and they both ran to completion. We can see that you're still running the faulty driver, but the bad effects have gone away. In this context, a 5 hour estimate - although inaccurate - is far better than a zero estimate. It's failsafe. If you choose to continue running Beta tasks, you should start seeing normal estimates after a day or two: you might see more long estimates as the server tests out other versions of the application, but it would all settle down over time. That's entirely up to you: our testing here is completed (the patch is universal - we don't need to test AP separately). This testing has been watched by several project administrators and senior developers. I've emailed them to pass on the good news, and it's in their hands now. I imagine Eric will want to apply it to the SETI Main servers on Tuesday: I can't say when it'll reach MilkyWay or the other affected projects, but it's on its way. Very many thanks indeed. |
Bill Send message Joined: 30 Nov 05 Posts: 282 Credit: 6,916,194 RAC: 60 |
"That's entirely up to you: our testing here is completed (the patch is universal - we don't need to test AP separately)." Glad I could help! I'm out of beta tasks and am switching it to NNT. I'm also disabling AP7 GPU tasks again for regular Seti. If you need anything else tested later let me know. Seti@home classic: 1,456 results, 1.613 years CPU time |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
No, there's nothing else on the horizon now. I'll post here when I get word from Eric that the Main servers have been updated: it should be safe to enable AP again then. |
Juha Send message Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
You could take #3020 (files) for a quick spin. That changes the client to not believe if a task's maximum allowed runtime is less than 2 minutes and right now allows such a task to run for up to 12 hours. To test that you need buggy drivers, a project server that isn't patched yet and science apps that haven't completed the 10 tasks on your host yet. I think Milkyway should work for this. You should see an "Elapsed time limit x < 120; setting to 43200" message in Event Log when a task with maximum runtime of less than 2 minutes is started. |
Bill Send message Joined: 30 Nov 05 Posts: 282 Credit: 6,916,194 RAC: 60 |
"You could take #3020 (files) for a quick spin. That changes the client to not believe if a task's maximum allowed runtime is less than 2 minutes and right now allows such task to run for up to 12 hours." I downloaded and ran this version of the client on MW and it appears to have worked! The estimated time started at 0:00, but once the task started to run the elapsed time started around 9 minutes, with the first GPU task completing around 7 minutes. It validated too!! So, some lines from my event log you may find interesting: (1) 2/18/2019 8:39:18 PM \| \| OpenCL: AMD/ATI GPU 0: AMD Radeon(TM) Vega 8 Graphics (driver version 2766.5 (PAL,HSAIL), device version OpenCL 2.0 AMD-APP (2766.5), 7206MB, 7206MB available, 100 GFLOPS peak) (2) 2/18/2019 8:39:18 PM \| \| [coproc] AMD OpenCL reported bad GPU peak FLOPS 43980464128000000.000000; using 100000000000.000000 (3) 2/18/2019 8:39:48 PM \| Milkyway@Home \| Elapsed time limit 18.920640 < 120.000000; setting to 43200.000000. So what does this mean now? I know this isn't the MW forum, but to be honest, there is a bit more support here than in the other projects. Seti@home classic: 1,456 results, 1.613 years CPU time |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
This is just some of the "failsafe" code that DA put in to get reasonable values reported for projects so the ATI cards can at least process something with the bad reported flops value. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Bill Send message Joined: 30 Nov 05 Posts: 282 Credit: 6,916,194 RAC: 60 |
"This is just some of the "failsafe" code that DA put in to get reasonable values reported for projects so the ATI cards can at least process something with the bad reported flops value." I figured it was just a workaround for the time being. Still, it's working. Seti@home classic: 1,456 results, 1.613 years CPU time |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
The server-side patches to mitigate this problem have now been released as a recommended update (server release v1.0.4). Project administrators have been notified via the normal email lists, but they may not be fully aware of the implications if they haven't been following the previous conversations. If you come across this problem on another project, please notify their administration team that a patch is available and required. |
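
For anyone wondering what the two client messages in Bill's event log amount to, here is a minimal sketch in Python. It assumes a runtime estimate is roughly the task's floating-point operation count divided by the device speed, which would explain why an absurd peak FLOPS figure produces a near-zero estimate. The function names, the plausibility ceiling and the task size are made up for illustration; only the 100 GFLOPS fallback, the 120 second floor and the 43200 second replacement come from the log lines quoted in the thread. This is not the actual BOINC client code.

```python
# Illustrative sketch only: the constants are read off the event-log lines
# quoted in the thread. The function names and the plausibility ceiling are
# hypothetical and are NOT taken from the BOINC source.

FALLBACK_GPU_FLOPS = 100e9          # log: "using 100000000000.000000"
MAX_PLAUSIBLE_GPU_FLOPS = 1e15      # assumption: cut-off for "bad" driver values
MIN_ELAPSED_LIMIT_S = 120.0         # log: "< 120.000000" (2 minutes)
FALLBACK_ELAPSED_LIMIT_S = 43200.0  # log: "setting to 43200.000000" (12 hours)


def sane_peak_flops(reported: float) -> float:
    """Replace an implausible driver-reported peak FLOPS with a fixed fallback."""
    if reported <= 0 or reported > MAX_PLAUSIBLE_GPU_FLOPS:
        return FALLBACK_GPU_FLOPS
    return reported


def elapsed_time_limit(limit_s: float) -> float:
    """Distrust a maximum runtime shorter than 2 minutes; allow 12 hours instead."""
    if limit_s < MIN_ELAPSED_LIMIT_S:
        return FALLBACK_ELAPSED_LIMIT_S
    return limit_s


def estimated_runtime(fpops_est: float, flops: float) -> float:
    """Assumed estimate: operations to perform divided by operations per second."""
    return fpops_est / flops


if __name__ == "__main__":
    bad_flops = 43980464128000000.0  # value from the [coproc] log line
    task_fpops = 1e13                # hypothetical task size, for illustration
    print("estimate with bad FLOPS (s):", estimated_runtime(task_fpops, bad_flops))
    print("estimate with fallback FLOPS (s):",
          estimated_runtime(task_fpops, sane_peak_flops(bad_flops)))
    print("elapsed limit used (s):", elapsed_time_limit(18.920640))
```

The point of both checks is the same as the 5 hour estimate seen on Beta: a value that is too generous is failsafe, while a value near zero gets the task killed before it can finish.
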
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.