Unable to complete GPU WU that is over 2 hours to complete.

Message boards : Number crunching : Unable to complete GPU WU that is over 2 hours to complete.
AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 1693720 - Posted: 20 Jun 2015, 0:42:54 UTC

Hi,

I have two systems running with an AMD A8 3870K APU. One does fine; the other, not so much, as it can't finish a GPU WU that takes over 2 hours to complete. They both have the same graphics driver and both are running Windows 7 Ultimate.

The only difference is that the one that runs well is running an older version of BOINC and the one that fails is running a relatively new version.

Memory is the same. This wasn't always a problem: the system having the trouble now was performing great a year ago.

Both systems have all of the Windows updates as well.
One thing I should mention is that when it fails, I get an error saying that the driver failed and recovered, but the WU doesn't complete.

Any ideas??

Thanks, Allen
ID: 1693720
TimeLord04
Volunteer tester
Joined: 9 Mar 06
Posts: 21140
Credit: 33,933,039
RAC: 23
United States
Message 1693729 - Posted: 20 Jun 2015, 2:19:23 UTC
Last modified: 20 Jun 2015, 2:37:14 UTC

There's an ongoing discussion in another thread about running Lunatics' Anonymous Platform with the latest video drivers that provide OpenCL 1.2. Lunatics only recognizes OpenCL 1.1 on the older drivers.

Why BOINC 6.10.58 on your one system allows the new driver to work is beyond me, though... However, when I view your computers, the one with 6.10.58 does NOT show OpenCL 1.2, whereas the one with 7.4.42 clearly DOES state OpenCL 1.2. So something odd about 6.10.58 may be letting Lunatics work on that computer when it shouldn't... but again, that's beyond me.

Someone with more knowledge will likely chime in soon. However, I would venture to guess that if you updated 6.10.58 to 7.4.42 and it detected OpenCL 1.2, you would have problems on that machine as well.

If I'm right, you should downgrade your video drivers to the latest version that still uses OpenCL 1.1. Then the problems should go away, and your system(s) will finish crunching WUs.


TL

[EDIT]

Likewise, your system with the NVIDIA card on Driver 352.86 should be on Driver 347.88 if you stay on Lunatics and change BOINC to 7.4.42. Otherwise, you WILL have trouble with that system, too.


TL
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join Calm Chaos
ID: 1693729
Darth Beaver · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1693852 - Posted: 20 Jun 2015, 10:40:09 UTC - in response to Message 1693720.  

AllenIN, as Doc says, on the machine running the 352 driver you may well get problems, so roll it back to 347.

As for your other problem: when I have had drivers having to recover, it meant either

1. I was trying to run too much on the machine, or

2. when I did a driver update on the GPU, something went wrong with the install.

So you can either roll back BOINC to an earlier version like the other machine has, or redo the GPU driver install, as a clean install if possible.

A 3rd option is to get the latest GPU driver. I'm not using ATI, so I don't know whether the one you have is the latest.

I would also check whether it's getting too hot; possibly dust bunnies in the fan?

A 4th option is to load BOINC version 7.0.28 or 7.0.64.

There was a problem with older NVIDIA cards (GTX 220 or lower) not working with the 7.4.42 BOINC client and Lunatics, so they released an update, Lunatics 0.43a.

So maybe there is a similar problem between your ATI GPU and the latest version of BOINC; roll back to 7.0.64 or 7.0.28 and see if that helps. Those versions are for 64-bit.
ID: 1693852
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1693868 - Posted: 20 Jun 2015, 12:34:56 UTC - in response to Message 1693720.  
Last modified: 20 Jun 2015, 12:36:32 UTC

Despite your thinking you have "the same" video driver installed on both machines, your results reflect otherwise.

Host 6335328 displays OpenCL 1.2 AMD-APP (1268.1) in its recent results, which is from Cat 13.9.
Host 6755144 displays OpenCL 1.2 AMD-APP (1642.5) in its recent results, which is from Cat 14.12.

If you install a new driver over an old one, you can get a mismatch between the video driver and the OpenCL runtime, which may be the case on host 6335328. The odd thing is that this looks like the one that is running OK.
Since the driver is crashing on 6755144, you may want to install an older version such as Cat 14.9 or 14.4 on that host to see how it responds.
It is often suggested to use a tool such as Display Driver Uninstaller to completely remove the old driver when upgrading or downgrading drivers. That prevents driver/runtime mismatch issues.
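HAL9000's check (reading the AMD-APP build out of the reported OpenCL string and matching it to a Catalyst release) can be sketched in Python. This is a minimal sketch: the lookup table holds only the two build/Catalyst pairs quoted above, `parse_opencl_string` and `runtime_matches_driver` are hypothetical helpers (not part of BOINC or AMD's tools), and the installed Catalyst version is an input you would have to supply yourself.

```python
import re

# AMD-APP OpenCL runtime builds mapped to the Catalyst release they came from.
# Only the two pairs quoted in this thread; this is NOT a complete table.
APP_BUILD_TO_CATALYST = {
    "1268.1": "13.9",
    "1642.5": "14.12",
}

# Matches strings such as "OpenCL 1.2 AMD-APP (1268.1)" as shown in a host's results.
OPENCL_RE = re.compile(r"OpenCL (\d+\.\d+) AMD-APP \(([\d.]+)\)")


def parse_opencl_string(text):
    """Return (opencl_version, app_build) from a reported OpenCL string."""
    match = OPENCL_RE.search(text)
    if match is None:
        raise ValueError(f"unrecognised OpenCL string: {text!r}")
    return match.group(1), match.group(2)


def runtime_matches_driver(text, installed_catalyst):
    """True/False when the runtime build is in the table, None when unknown."""
    _, build = parse_opencl_string(text)
    shipped_with = APP_BUILD_TO_CATALYST.get(build)
    if shipped_with is None:
        return None  # unknown build: cannot tell either way
    return shipped_with == installed_catalyst


if __name__ == "__main__":
    # Hypothetical case: Cat 14.12 installed, but the runtime still reports the
    # Cat 13.9 build, i.e. the driver/runtime mismatch described above.
    print(runtime_matches_driver("OpenCL 1.2 AMD-APP (1268.1)", "14.12"))  # False
    print(runtime_matches_driver("OpenCL 1.2 AMD-APP (1642.5)", "14.12"))  # True
```

Returning `None` for builds outside the table is deliberate: with only two known pairs, guessing would be worse than saying "unknown".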
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
ID: 1693868
Zalster · Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1693952 - Posted: 20 Jun 2015, 15:46:24 UTC - in response to Message 1693868.  

Allen,


Listen to Hal: the problems with OpenCL have to do with NVIDIA drivers, not AMD. So follow Hal's advice.


Zalster
ID: 1693952
Darth Beaver · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1693960 - Posted: 20 Jun 2015, 15:54:28 UTC - in response to Message 1693952.  

Zalster, he has a machine with an NVIDIA card, but he's also having trouble with the AMDs,

so rolling back the 352 driver to 347 is still good advice.

And Allen, yes, do what HAL9000 says; he does know a lot about ATI and AMD and their drivers.
ID: 1693960
Zalster · Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1693969 - Posted: 20 Jun 2015, 16:05:04 UTC - in response to Message 1693960.  

Hi Glenn,

He said the machines were identical. The GTX 650 Ti is in a dual-core machine. He has two AMD quad-core machines that have AMD GPUs.

Allen, the NV driver will only come into play with Astropulse work units, not Multibeam. True, he should downgrade the NV driver if he wants to run Astropulse.

Allen, since I don't know what version of Vista you are running, here is the link to the NVIDIA drivers. See if they have 347.88 for your machine with the GTX 650.
That is the last version with OpenCL 1.1.

http://www.nvidia.com/Download/Find.aspx?lang=en-us
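The driver rule described here (347.88 is the last OpenCL 1.1 driver, and only Astropulse is affected) can be expressed as a small check. A sketch under those assumptions, which come from this thread rather than from NVIDIA documentation; `should_downgrade` is a hypothetical helper name, and the extra condition of staying on Lunatics while moving to BOINC 7.4.42 is not modeled.

```python
# Per this thread: 347.88 is the last NVIDIA driver exposing OpenCL 1.1, and the
# newer OpenCL 1.2 drivers only cause trouble for Astropulse, not Multibeam.
LAST_OPENCL_1_1_DRIVER = (347, 88)


def parse_nv_driver(version):
    """Turn '352.86' into (352, 86) so versions compare numerically."""
    major, minor = version.split(".")
    return int(major), int(minor)


def should_downgrade(driver_version, app):
    """True if the driver is past the OpenCL 1.1 cutoff and the app is affected."""
    if app != "astropulse":
        return False  # Multibeam work is reported as unaffected
    return parse_nv_driver(driver_version) > LAST_OPENCL_1_1_DRIVER


if __name__ == "__main__":
    print(should_downgrade("352.86", "astropulse"))  # True
    print(should_downgrade("352.86", "multibeam"))   # False
    print(should_downgrade("347.88", "astropulse"))  # False
```

Versions are compared as integer tuples rather than strings, because a plain string comparison would rank a hypothetical "99.00" above "347.88".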


Zalster
ID: 1693969
TimeLord04
Volunteer tester
Joined: 9 Mar 06
Posts: 21140
Credit: 33,933,039
RAC: 23
United States
Message 1693970 - Posted: 20 Jun 2015, 16:07:57 UTC - in response to Message 1693969.  

Yes, but Zalster, he's running Lunatics (or some other Anonymous Platform) on the NVIDIA system with the 350.xx driver. If he upgrades his BOINC version, he may encounter problems with that machine. That's why I stated that he should move back to 347.88 on the NVIDIA machine.


TL
ID: 1693970
rob smith · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22190
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1694014 - Posted: 20 Jun 2015, 17:17:27 UTC

Is the errant machine this one:
ID: 6335328
AMD A8-3870 APU with Radeon(tm) HD Graphics [Family 18 Model 1 Stepping 0]
(4 processors) AMD ATI unknown (512MB) driver: 1.4.1848 Microsoft Windows 7 Ultimate x64 Edition, Service Pack 1, (06.01.7601.00)


And is the one that's behaving OK this one:
ID: 6755144
AuthenticAMD
AMD A8-3870 APU with Radeon(tm) HD Graphics [Family 18 Model 1 Stepping 0]
(4 processors) AMD AMD Radeon HD 6520G/6530D/6550D/6620G (SuperSumo) (512MB) driver: 1.4.1848 OpenCL: 1.2 Microsoft Windows 7 Ultimate x64 Edition, Service Pack 1, (06.01.7601.00)


OK, neither has an Nvidia card, so comments about Nvidia drivers don't help much.


So it's down to the detection of the AMD coprocessor... On the "good" machine this is detected as an AMD Radeon HD 6520G... coprocessor, and on the other machine it is detected as an "unknown" AMD coprocessor. That suggests there is something strange either in the way the MoBo is set up or in the way the drivers have been installed.

Now, I'm no expert on the AMD Ax family of processors, but when I've had problems with other AMD processors not being correctly identified, it has been down to a BIOS setting being out of step with what it should be (8-core FX processors detecting as 3-core Phenoms being my least favourite). Since you have two machines, step through the BIOS settings on both and make sure they are the same.

Once you've done that, it's time to think about drivers. Make sure you have EXACTLY the same drivers on both machines, that the driver on the errant machine installed correctly, and that there is no residue from other drivers lurking in the background. Do a Google search for "Cleansweep", a driver uninstaller that is reported to work very well on ATI/AMD drivers, which reportedly can leave sticky bits around after a normal uninstall.
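The "make the two machines match" step can be partly mechanized for the fields BOINC reports. A minimal sketch, assuming the values are copied by hand from the host listings quoted above (BOINC versions as given elsewhere in the thread); BIOS settings are not in those listings and still have to be compared manually, and `diff_hosts` is a hypothetical helper.

```python
def diff_hosts(a, b):
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    fields = sorted(a.keys() | b.keys())
    return {f: (a.get(f), b.get(f)) for f in fields if a.get(f) != b.get(f)}


# Fields copied by hand from the host listings quoted in this thread.
host_6335328 = {
    "cpu": "AMD A8-3870 APU with Radeon(tm) HD Graphics",
    "gpu": "AMD ATI unknown (512MB)",
    "driver": "1.4.1848",
    "opencl": None,  # not shown in this host's listing
    "os": "Microsoft Windows 7 Ultimate x64 Edition, Service Pack 1",
    "boinc": "6.10.58",
}
host_6755144 = {
    "cpu": "AMD A8-3870 APU with Radeon(tm) HD Graphics",
    "gpu": "AMD Radeon HD 6520G/6530D/6550D/6620G (SuperSumo) (512MB)",
    "driver": "1.4.1848",
    "opencl": "1.2",
    "os": "Microsoft Windows 7 Ultimate x64 Edition, Service Pack 1",
    "boinc": "7.4.42",
}

if __name__ == "__main__":
    for field, (a, b) in diff_hosts(host_6335328, host_6755144).items():
        print(f"{field}: {a!r} vs {b!r}")
```

For these two hosts the differing fields are `boinc`, `gpu` and `opencl`. Matching `driver` values do not prove the Catalyst releases match, since that field only reflects what BOINC reports as the driver version.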
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1694014
TimeLord04
Volunteer tester
Joined: 9 Mar 06
Posts: 21140
Credit: 33,933,039
RAC: 23
United States
Message 1694023 - Posted: 20 Jun 2015, 17:49:46 UTC - in response to Message 1694014.  
Last modified: 20 Jun 2015, 18:10:58 UTC

Is the errant machine this one:
ID: 6335328
BOINC 6.10.58 - This machine is running fine. (TL)

And the one that's behaving OK this one:
ID: 6755144
BOINC 7.4.42 - This is the errant machine. (TL)

OK, neither has an Nvidia card, so comments about Nvidia drivers don't help much.

Rob,

He has a LARGE computer list... First on the list is a machine with BOINC 6.10.56 and an NVIDIA card with a 350.xx driver. My concern for the NVIDIA machine is that if he upgrades BOINC to 7.4.42, he will encounter problems on that machine. This is because he is on Lunatics (or some other Anonymous Platform), and the latest drivers DO NOT work with Anonymous Platform. He WILL need to downgrade the driver to 347.88 for OpenCL 1.1 support and functionality.


TL
ID: 1694023
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1694044 - Posted: 20 Jun 2015, 19:16:15 UTC

Computer 6335328 running BOINC 6.10.58 is doing fine, but of course that old version of BOINC doesn't have the code to specifically identify the GPU portion of the AMD A8-3870 APU.

Computer 6755144 running BOINC 7.4.42 is the one with problems, even though that version of BOINC does identify the type of the GPU.

Both systems are running Lunatics v0.41 installations, rev 1843 builds of the MB7 ATi GPU app. For the problem host, all the errored tasks show "Aborted by user", and the only one that shows any run time at all ran for well over one day. That squares with Allen's description: Windows shows driver restarts, and the app fails to make any progress thereafter.

I suggest updating the problem host with Lunatics installer v0.43a. That might possibly fix the issue. Even if it doesn't, having the same rev 2489 build as most other anonymous platform users do would make further troubleshooting easier.
Joe
ID: 1694044
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1694050 - Posted: 20 Jun 2015, 19:29:16 UTC - in response to Message 1694014.  

For ATI GPUs, the BOINC devs have to hard-code the names because getting the GPU name "is difficult", apparently. Any ATI GPU that BOINC doesn't have a name for is listed as "unknown". That isn't a problem: plenty of machines are running just fine with GPUs newer than the BOINC version they use, which is why BOINC can't give out their names.
The driver version displayed is really the CAL version, which until Cat 13.12 always changed with the driver, so it was "good enough". AMD has deprecated CAL support in new GPUs, so they do not provide any "driver information". Eric had to add a special plan class so these GPUs could get work. Currently, all drivers that the new GPUs support work with the current apps. However, this will become a problem if the BOINC devs don't figure out a new driver detection scheme: if new apps were built with SDK 2.9 or higher, there would currently be no way to tell whether the GPU has a valid driver.
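The point that the GPU name is just a table lookup with an "unknown" fallback can be illustrated in a few lines. This is a hypothetical illustration, not BOINC's code: the chip key "SuperSumo" and both tables are assumptions; the only facts taken from the thread are that the older 6.10.58 client reports "ATI unknown" for this APU while 7.4.42 reports the full name.

```python
# Hypothetical illustration only: BOINC's real lookup is keyed differently.
# The name comes from a table compiled into each client release, so an older
# client simply has fewer entries and falls back to "unknown".
NAMES_KNOWN_TO_OLDER_CLIENT = {}  # stands in for a 6.10.x-era table
NAMES_KNOWN_TO_NEWER_CLIENT = {
    "SuperSumo": "AMD Radeon HD 6520G/6530D/6550D/6620G (SuperSumo)",
}


def gpu_display_name(chip, known_names):
    """Look the chip up in the client's table; fall back to 'ATI unknown'."""
    return known_names.get(chip, "ATI unknown")


if __name__ == "__main__":
    print(gpu_display_name("SuperSumo", NAMES_KNOWN_TO_OLDER_CLIENT))  # ATI unknown
    print(gpu_display_name("SuperSumo", NAMES_KNOWN_TO_NEWER_CLIENT))
```

The fallback only changes the displayed name, which is why the "unknown" host can still crunch normally.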

Both of the OP's matched machines have been returning good GPU work for some time. The OP mentioned:
The only difference is that the one that runs well is running an older version of Boinc and the one that fails is running a relatively new version.

Host 6755144 has many aborted tasks, including one with a listed run time of 1 days 5 hours 43 min 22 sec, which fits the issue they describe.
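The symptom described here (a task sitting for over a day after a driver restart instead of finishing in roughly 2 hours) can be spotted mechanically from the listed run time. A minimal sketch: `EXPECTED_SECONDS` and `STALL_FACTOR` are assumed values for illustration, and `parse_run_time` only understands the unit words used in the run-time string above.

```python
import re

# Allen's failures are on GPU tasks that take over 2 hours; both numbers below
# are assumptions chosen for illustration, not BOINC or Lunatics settings.
EXPECTED_SECONDS = 2 * 3600
STALL_FACTOR = 4

# Unit words as they appear in the run-time string quoted above.
UNIT_SECONDS = {"day": 86400, "days": 86400, "hour": 3600, "hours": 3600,
                "min": 60, "sec": 1}


def parse_run_time(text):
    """Convert e.g. '1 days 5 hours 43 min 22 sec' into seconds."""
    total = 0
    for amount, unit in re.findall(r"(\d+)\s+([a-z]+)", text):
        total += int(amount) * UNIT_SECONDS[unit]  # KeyError on unknown units
    return total


def looks_stalled(text, expected=EXPECTED_SECONDS, factor=STALL_FACTOR):
    """Flag a task whose run time is far beyond what it should need."""
    return parse_run_time(text) > expected * factor


if __name__ == "__main__":
    run_time = "1 days 5 hours 43 min 22 sec"
    print(parse_run_time(run_time))  # 107002
    print(looks_stalled(run_time))   # True
```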
ID: 1694050
Darth Beaver · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1694215 - Posted: 21 Jun 2015, 7:31:30 UTC

Lunatics versions for BOINC 7.0.28 and above clients:

0.41
0.42
0.43
0.43a

So loading the latest version might just help; otherwise, do what HAL9000 says.
ID: 1694215
AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 1694468 - Posted: 22 Jun 2015, 1:21:25 UTC - in response to Message 1693729.  

Thanks a bunch TimeLord04 !! I'm thinking you're right, since it used to work flawlessly. Will give it a try!
Thanks again.
Allen
ID: 1694468
AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 1694470 - Posted: 22 Jun 2015, 1:24:51 UTC - in response to Message 1693852.  

Thanks for the info, Glenn. I have tried much of what you suggested, but I haven't tried rolling back BOINC yet. I too thought it might be a heat-related problem, but temps are not out of range, and the failure only happens on GPU WUs over 2 hours long, so temps should not be a problem.
Thanks,

Allen
ID: 1694470
AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 1694471 - Posted: 22 Jun 2015, 1:27:25 UTC - in response to Message 1693969.  

Zalster,

You were right....lots of help here.

"Allen, the NV driver will only come into play with the Astropulse work units, not the Multibeams. True, he should downgrade the NV driver if he wants to run Astropulses."

Thanks for that info. If I upgrade, I will be sure to make the adjustment. I am running the latest NV updates.

Thanks!

Allen
ID: 1694471
AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 1694472 - Posted: 22 Jun 2015, 1:31:25 UTC - in response to Message 1693868.  

Hal,

Sounds like you know my machines better than I do......grin.

I updated them both at the same time, but apparently the driver upgrade didn't take on one of them. I will try loading an older driver on the one that I am having trouble with and see if that makes a difference. One step at a time.

Thanks,

Allen
ID: 1694472
AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 1694473 - Posted: 22 Jun 2015, 1:32:43 UTC - in response to Message 1693952.  

Sometimes I wonder why I bother to update.......

Allen
ID: 1694473
AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 1694474 - Posted: 22 Jun 2015, 1:37:07 UTC - in response to Message 1694014.  

Rob,

You have the two machines backward but you still make sense to me.
The one that gives a full reporting of what GPU it has is the errant one and the other that is unknown is the good one.

Thanks,

Allen
ID: 1694474
AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 1694475 - Posted: 22 Jun 2015, 1:40:54 UTC - in response to Message 1694044.  

Hi Joe,

Seems like I've got a slew of things to check out. Thanks much for your post. I will take all into consideration and try them all.

Thanks again,

Allen
ID: 1694475
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.