whole serie of data blocks failing with SoG

Message boards : Number crunching : whole serie of data blocks failing with SoG
Sixkid

Joined: 10 Jan 12
Posts: 17
Credit: 8,248,305
RAC: 21
Netherlands
Message 2009480 - Posted: 27 Aug 2019, 6:57:27 UTC
Last modified: 27 Aug 2019, 6:59:49 UTC

My client is busy with 22ap11ae.21036.6611.6.33.88_0 (opencl_nvidia_SoG) and it's starting to fail.
Remaining time says 2 minutes 9 seconds, but after that time has passed nothing has changed, and the remaining time starts counting up until the deadline is reached and the block is aborted. This has been going on for the past few tasks I've been watching in this series of blocks.
ID: 2009480
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13732
Credit: 208,696,464
RAC: 304
Australia
Message 2009481 - Posted: 27 Aug 2019, 7:07:53 UTC

My systems processed all those WUs using SoG with no issues.
Grant
Darwin NT
ID: 2009481
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2009483 - Posted: 27 Aug 2019, 7:26:58 UTC - in response to Message 2009480.  

My client is busy with 22ap11ae.21036.6611.6.33.88_0 (opencl_nvidia_SoG) and it's starting to fail.
Remaining time says 2 minutes 9 seconds, but after that time has passed nothing has changed, and the remaining time starts counting up until the deadline is reached and the block is aborted. This has been going on for the past few tasks I've been watching in this series of blocks.

Timminator2 first reported this problem, then Robert Miles reported the same problem on a different series of tasks. May I ask whether you are running Windows 10 with the latest Nvidia 436 drivers? That is the common factor in these failures. All the problem tasks are Arecibo VHARs with an angle range of 2.7. They all error out because they exceed the compute time limit. The host can process other types of task with no problems.

https://setiathome.berkeley.edu/forum_thread.php?id=80859&postid=2009430

I have suggested rolling back to earlier drivers as a fix.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2009483
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 2009486 - Posted: 27 Aug 2019, 9:22:48 UTC
Last modified: 27 Aug 2019, 9:33:25 UTC

Such symptoms usually mean a video driver restart. The app doesn't get informed about such an event; the last OpenCL runtime call just never returns. Hence the abort when the deadline is reached.
Changing the driver is good advice, provided the app worked OK with another driver version. To narrow down where the issue is, I would recommend (temporarily) adding the -v 2 option.
There was also a special debug build that reports each OpenCL call to stderr; that would be ideal for seeing exactly where the problem occurs.
On the OS side, check the system log to see whether any driver restart events took place.
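Editor's illustration of the point above: if a blocking call never returns, the caller only finds out through some outer time limit. The sketch below is not how the SETI apps or BOINC are implemented; `run_with_watchdog`, `stalled_call`, and `healthy_call` are hypothetical names, and a short sleep stands in for a runtime call that has stopped responding.

```python
import threading
import time


def run_with_watchdog(fn, timeout_s):
    """Run fn() in a worker thread and stop waiting after timeout_s seconds.

    Returns ("ok", result) if fn finished in time, ("error", exc) if it
    raised, or ("timeout", None) if it was still blocked at the limit.
    The worker is a daemon thread, so a call that never returns does not
    keep the process alive.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn()
        except Exception as exc:  # report failures instead of losing them
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        # The call is still blocked; only the outer limit reveals the hang.
        return ("timeout", None)
    if "error" in outcome:
        return ("error", outcome["error"])
    return ("ok", outcome["value"])


def stalled_call():
    # Hypothetical stand-in for a runtime call that stops responding.
    time.sleep(0.5)
    return "finished too late"


def healthy_call():
    # Hypothetical stand-in for a call that completes normally.
    return 42
```

With a limit of 0.05 seconds, `run_with_watchdog(stalled_call, 0.05)` gives up on the stalled call, while `run_with_watchdog(healthy_call, 1.0)` returns the result normally. In the situation described here the task deadline plays the role of that outer limit.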
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2009486
Michael Donikowski
Volunteer tester
Joined: 29 May 99
Posts: 8
Credit: 22,914,826
RAC: 38
United States
Message 2009551 - Posted: 28 Aug 2019, 2:28:12 UTC - in response to Message 2009483.  

Same problem here with Arecibo GPU tasks only.

Windows 10 Pro, GTX 1660 Ti, stock apps.

The problem occurs with Nvidia drivers released in August 2019 (436 series).
All Arecibo CPU tasks and all Green Bank tasks run fine.

All tasks run fine with Nvidia drivers released in July 2019 (431.36).
ID: 2009551
Michael Donikowski
Volunteer tester
Joined: 29 May 99
Posts: 8
Credit: 22,914,826
RAC: 38
United States
Message 2009552 - Posted: 28 Aug 2019, 2:31:26 UTC - in response to Message 2009551.  

Same problem here with Arecibo GPU tasks only.

Windows 10 Pro, GTX 1660 Ti, stock apps.

The problem occurs with Nvidia drivers released in August 2019 (436 series).
All Arecibo CPU tasks and all Green Bank tasks run fine.

All tasks run fine with Nvidia drivers released in July 2019 (431.36).

Forgot to mention that it did not matter whether it was a SoG or CUDA application.
ID: 2009552
Stargate (SA)
Volunteer tester
Joined: 4 Mar 10
Posts: 1854
Credit: 2,258,721
RAC: 0
Australia
Message 2009557 - Posted: 28 Aug 2019, 5:20:59 UTC
Last modified: 28 Aug 2019, 5:23:28 UTC

I found myself going back to the previous driver, 430.86, as the new driver, 436.15, kept stopping and restarting the same Arecibo GPU task (nothing was aborted, it just kept restarting from scratch). Everything is running fine again.
ID: 2009557
Sixkid
Joined: 10 Jan 12
Posts: 17
Credit: 8,248,305
RAC: 21
Netherlands
Message 2009821 - Posted: 29 Aug 2019, 15:37:33 UTC
Last modified: 29 Aug 2019, 16:13:29 UTC

Update:
436.15 driver has trouble with SoG
436.02 driver has trouble with SoG
431.60 driver is OK for my system (Win10 Pro, GTX 1080 Ti)

If you have trouble getting 431.60 installed because the system check fails during driver installation,
use DDU (Display Driver Uninstaller; google it) to uninstall the GPU drivers without restarting (also turn off the internet). When that has finished, install the 431.60 driver and it will succeed. Turn the internet back on and happy crunching times are back.
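Editor's sketch: the driver reports collected in this thread so far can be written down as a small lookup. This is a summary of posters' reports only, not an official compatibility list. `classify_driver` and `installed_driver_version` are hypothetical helper names, and the second one assumes the `nvidia-smi` tool that ships with the Nvidia driver is on the PATH, which may not be the case on every Windows install.

```python
import subprocess

# Driver versions as reported by posters in this thread (Windows 10, Nvidia GPUs).
REPORTED_BAD = {"436.02", "436.15"}
REPORTED_OK = {"430.86", "431.36", "431.60"}


def classify_driver(version):
    """Classify a driver version string using only the reports in this thread."""
    version = version.strip()
    if version in REPORTED_BAD:
        return "reported bad"
    if version in REPORTED_OK:
        return "reported ok"
    return "unknown"


def installed_driver_version():
    """Return the installed driver version via nvidia-smi, or None if unavailable."""
    try:
        completed = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    lines = completed.stdout.strip().splitlines()
    # With several GPUs there is one line per GPU; they share one driver.
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    found = installed_driver_version()
    if found is None:
        print("Could not query the driver version (is nvidia-smi on the PATH?)")
    else:
        print(f"Driver {found}: {classify_driver(found)}")
```

Any version nobody has reported on comes back as "unknown" rather than being guessed at.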
ID: 2009821
IntenseGuy
Joined: 25 Sep 00
Posts: 190
Credit: 23,498,825
RAC: 9
United States
Message 2009832 - Posted: 29 Aug 2019, 16:13:28 UTC - in response to Message 2009821.  

I'm having the exact same issues. Hoping the next NVIDIA driver release fixes things.
SETI@home classic workunits 103,576
SETI@home classic CPU time 655,753 hours
ID: 2009832
Sixkid
Joined: 10 Jan 12
Posts: 17
Credit: 8,248,305
RAC: 21
Netherlands
Message 2009836 - Posted: 29 Aug 2019, 16:20:30 UTC
Last modified: 29 Aug 2019, 16:32:52 UTC

4 units of the 22ap11ae series done, so it's working again.

Btw Keith, thanks for the tip :D

Happy crunching.
ID: 2009836
Jeff
Joined: 8 May 99
Posts: 5
Credit: 98,361,983
RAC: 150
United States
Message 2010081 - Posted: 31 Aug 2019, 2:32:23 UTC

Going back to an earlier version driver (431.60) appears to have solved the issue.
This is the first time that I remember an Nvidia update causing a problem.
Thanks to the suggestions here I was able to install the older driver successfully.
ID: 2010081
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2010086 - Posted: 31 Aug 2019, 3:10:57 UTC

Deductive reasoning indicated the driver was likely the cause of the issues. Simple fix to prove the case. Pretty sure it is the new integer scaling function in the driver. Nvidia likely didn't test the feature on anything other than the old pixel-mapped games. They probably didn't even consider that people use their cards for numerical crunching as well as gaming.

Anybody report this problem to Nvidia yet?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2010086
Sixkid
Joined: 10 Jan 12
Posts: 17
Credit: 8,248,305
RAC: 21
Netherlands
Message 2010137 - Posted: 31 Aug 2019, 12:43:25 UTC - in response to Message 2010086.  

Deductive reasoning indicated the driver was likely the cause of the issues. Simple fix to prove the case. Pretty sure it is the new integer scaling function in the driver. Nvidia likely didn't test the feature on anything other than the old pixel-mapped games. They probably didn't even consider that people use their cards for numerical crunching as well as gaming.

Anybody report this problem to Nvidia yet?

Yes, I did :) I told them to contact the people from SETI@home and/or BOINC.
ID: 2010137
Jeff
Joined: 8 May 99
Posts: 5
Credit: 98,361,983
RAC: 150
United States
Message 2010224 - Posted: 31 Aug 2019, 21:24:39 UTC - in response to Message 2010086.  

Thanks for the reminder!
Nvidia will not do anything if they are unaware of the problem.
I did report the issue originally but will follow up with reports on the new driver.
I think you are correct that no one at Nvidia thought that anyone would be using their cards for anything other than video games and cryptomining.
It is important that everyone who encounters this issue report it to Nvidia through their feedback page. Hopefully, it will be resolved soon.
(But after the second flawed release I am not holding my breath.)
ID: 2010224
IntenseGuy
Joined: 25 Sep 00
Posts: 190
Credit: 23,498,825
RAC: 9
United States
Message 2011501 - Posted: 10 Sep 2019, 20:18:02 UTC

Version 436.30 is out today. I wonder if nVidia fixed things....
ID: 2011501
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2011506 - Posted: 10 Sep 2019, 20:36:17 UTC - in response to Message 2011501.  

Version 436.30 is out today. I wonder if nVidia fixed things....

Be the guinea pig and report back.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2011506
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13732
Credit: 208,696,464
RAC: 304
Australia
Message 2011554 - Posted: 11 Sep 2019, 4:42:16 UTC - in response to Message 2011506.  

Version 436.30 is out today. I wonder if nVidia fixed things....
Be the guinea pig and report back.
No mention of it being fixed in the release notes, but there's also no mention of it in the existing issues notes either.
Grant
Darwin NT
ID: 2011554
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2011555 - Posted: 11 Sep 2019, 5:03:48 UTC - in response to Message 2011554.  

I doubt seriously if they even consider any use other than graphics. Unless there is a specific note addressing the failure of the driver with compute loads, I would avoid the drivers.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2011555
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 2011556 - Posted: 11 Sep 2019, 5:27:01 UTC - in response to Message 2011555.  

I doubt seriously if they even consider any use other than graphics. Unless there is a specific note addressing the failure of the driver with compute loads, I would avoid the drivers.


Remind me, weren't they trying to do real time analysis of data coming in? Hopefully no one updates the drivers....
ID: 2011556
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2011561 - Posted: 11 Sep 2019, 7:09:16 UTC - in response to Message 2011556.  

I doubt seriously if they even consider any use other than graphics. Unless there is a specific note addressing the failure of the driver with compute loads, I would avoid the drivers.


Remind me, weren't they trying to do real time analysis of data coming in? Hopefully no one updates the drivers....

Not sure I understand the question. I believe the problem with the 436 drivers is the experimental integer scaling that is implemented. That feature is to make the bit mapped arcade games of the past look good on 4K monitors. It has nothing to do with compute. Except I think it is messing up compute somehow.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2011561


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.