Suggestions On Recent Increase In Invalids & Validation Inconclusives?

Lee Gresham
Joined: 12 Aug 03
Posts: 159
Credit: 130,116,228
RAC: 0
United States
Message 1642510 - Posted: 15 Feb 2015, 19:40:47 UTC

The question is: can this be fixed? I've seen an increase in Invalid and Inconclusive results on all 3 of my hosts, but Host 7258025 has gone into orbit. It was at around 200 Inconclusives last night and is now up to 335 today. Its Invalids are up to 111. Almost all of the Invalids and Inconclusives are for GPU tasks.
The hardware/software details follow:
DP35DP motherboard with a Core 2 Quad @ 2.4 GHz and 4 GB RAM. Windows 7 Pro x64.
Nvidia GTX 560 Ti (1024 MB) running driver 320.49.
BOINC version 7.0.64 and Lunatics x41zc_win32_cuda42.
No overvoltage or overclocking currently in use. MSI Afterburner fan controller is being used and GPU temp stays around 60 F. Host is running 2 work units at a time.

Other hosts are 7459253 and 6896459.

Is this system salvageable? Please advise.
Delta-V
ID: 1642510
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1642562 - Posted: 15 Feb 2015, 21:34:38 UTC - in response to Message 1642510.  
Last modified: 15 Feb 2015, 21:38:58 UTC

Something definitely going on with the 560ti there. I would have thought it could be as simple as needing a good cleanout, check temperatures under load etc, though you mention cool running, so other options to explore for sure.

With the 560 Ti factory superclock versions, and maybe even some at reference frequencies, there were some issues, at least in the beginning, with board partners applying a given clock but not applying a sufficient core voltage for artefact-free operation over an extended period.

If that's the case here, and components do degrade over time too, then scanning for artefacts using any of a number of tools (I use OCCT) would reveal the situation.

If artefacts are evident, then a small ~2-notch core voltage increase on the GPU should reduce them. For number crunching you really want no artefacts ever, though a few hours of artefact scanning at high load with no artefacts is good enough IMO.

Last possibility that comes to mind (excepting failure outright of the card) would be the age and headroom of the PSU in the machine.

At one stage, with one user's machine in the GPU User's group, there was an oddball Solid state disk drive firmware fault ( IIRC Crucial brand) that made it look like the cards were malfunctioning (Two of them). Once an SSD firmware update was applied all issues went away.

That anecdote is meant more as an example of how, in a modern OS environment, the underlying layers of complexity in the OS, drivers, firmware, and hardware, and the way they interlink, mean that one flaw can upset the whole apple cart in weird ways.

That suggests that while you are doing basic checks, cleanout, testing, reseating components etc., going through every driver and system firmware update could also be a good move, complete with checking every system driver date against the current ones (e.g. sometimes Intel chipset drivers need to be forced). Also check for DPC latency issues, which would point to driver (or lower-level) problems.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1642562
Lee Gresham
Joined: 12 Aug 03
Posts: 159
Credit: 130,116,228
RAC: 0
United States
Message 1642578 - Posted: 15 Feb 2015, 22:12:50 UTC - in response to Message 1642562.  

The power supply is a Xion 850 watt and about 2 years old. The 560 Ti ran in another system with an Intel D975 MB for about 3 years with an LGA version P4 960 processor at 3.6 GHz (w/Hyper-Threading) running XP Pro x64. The 560 Ti was always reliable. The current system was built in the last year and performed well for several months. In the last 2 or so months, invalids and inconclusives slowly increased in number until the last couple of weeks. I don't have another 560 Ti to swap test but do have a power-hungry GTX 470. I will try OCCT first. The 560 Ti has always run overclocked and with some overvoltage. It may have been a superclock model but I don't know now for sure. The base GPU clock is 900 MHz and the base memory clock is 1050 MHz. Factory voltage is 1.012 V. I usually ran at 950 MHz, 1100 MHz and 1.250 V respectively. Thanks very much for the help.
Delta-V
ID: 1642578
Lee Gresham
Joined: 12 Aug 03
Posts: 159
Credit: 130,116,228
RAC: 0
United States
Message 1642585 - Posted: 15 Feb 2015, 22:40:13 UTC - in response to Message 1642578.  

Got the temperature wrong. It's 60 C, not 60 F.
Delta-V
ID: 1642585
JakeTheDog
Joined: 3 Nov 13
Posts: 153
Credit: 2,585,912
RAC: 0
United States
Message 1642662 - Posted: 16 Feb 2015, 1:34:30 UTC

Have you tried testing your RAM? When RAM fails for me usually it results in total system failure, can't boot, or crashes when reaching desktop screen. But sometimes it starts off with smaller problems, like files not saving when I press save or programs randomly freezing, before it gets to total failure. The Windows memory test doesn't really detect small problems for me. I use Memtest. Usually detects problems within the first pass.
ID: 1642662
Lee Gresham
Joined: 12 Aug 03
Posts: 159
Credit: 130,116,228
RAC: 0
United States
Message 1642739 - Posted: 16 Feb 2015, 6:36:28 UTC - in response to Message 1642662.  

I downloaded Memtest and will try it tomorrow.

Thanks for helping
Delta-V
ID: 1642739
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1642755 - Posted: 16 Feb 2015, 6:59:42 UTC - in response to Message 1642739.  

Windows from Vista onward has its own memory test routines as well.
Memory Diagnostics Tool (Win Vista) & Windows Memory Diagnostic (Win7)
It shuts down the system, reboots & runs itself, then shuts down, reboots into Windows & gives you the results.
You can also watch it while it runs, but if the faulty memory is high up in the address numbers, you'll get bored well before you see it report any issues.
Grant
Darwin NT
ID: 1642755
Lee Gresham
Joined: 12 Aug 03
Posts: 159
Credit: 130,116,228
RAC: 0
United States
Message 1643158 - Posted: 17 Feb 2015, 6:19:58 UTC - in response to Message 1642662.  

I ran memtest86 v6.0 all afternoon and found no memory errors. Oddly, looking thru the work unit details column of inconclusive results for computer 7258025, I found that at least 1 more of the other clients sharing each inconclusive result was also inconclusive. Sometimes all 3 clients were inconclusive. I went back thru several pages and there were always 2 inconclusives. Is there a SETI or BOINC problem here as well? In the meantime I tried some configuration changes. I set mbcuda.cfg back to default and tried setting app_info.xml back to 1 instance of SETI at a time. The invalids stopped at 13:04 UTC. Don't know why. There were no changes made near that time. If time permits tomorrow I'll do a teardown and reseat all components. If that doesn't work I'll try the GTX 470.
Delta-V
ID: 1643158
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1643160 - Posted: 17 Feb 2015, 6:23:22 UTC - in response to Message 1643158.  

If the invalids stopped already, you likely found something in your configs.
ID: 1643160
Eric
Joined: 21 May 99
Posts: 5
Credit: 35,990,041
RAC: 15
United States
Message 1643533 - Posted: 18 Feb 2015, 2:05:32 UTC - in response to Message 1643160.  
Last modified: 18 Feb 2015, 2:06:56 UTC

Interesting.. I just upgraded from an OC'd 460, which never gives artifacts or errors (*never* meaning nowhere near often enough to care), to a 560 Ti I got from a buddy.. I had the 460 running Lunatics and 2 WUs at a time.. The 560 is bone stock, no Lunatics, 1 WU at a time. The 560 is always throwing errors/inconclusives, and after a day or two of running at around 65C, the card has a hiccup and downclocks to 405 MHz until I reboot the PC.. I can test for artifacts and never find any within a 1-2 hour period.. Maybe I will have to try and let it run for a day or two. *edited for spelling*
ID: 1643533
Lee Gresham
Joined: 12 Aug 03
Posts: 159
Credit: 130,116,228
RAC: 0
United States
Message 1643602 - Posted: 18 Feb 2015, 5:47:15 UTC - in response to Message 1643160.  

2 new invalids and 10 validation inconclusive today. That's far fewer than recent days but problem not found yet. I've downloaded the latest OCCT and will run it tomorrow.
Delta-V
ID: 1643602
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1643744 - Posted: 18 Feb 2015, 13:57:26 UTC - in response to Message 1643602.  

And remember to set complexity in OCCT's GPU test to maximum. That should get it nice and warm.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1643744
Lee Gresham
Joined: 12 Aug 03
Posts: 159
Credit: 130,116,228
RAC: 0
United States
Message 1644363 - Posted: 20 Feb 2015, 5:53:19 UTC - in response to Message 1643744.  

I ran OCCT for minutes with no overclocking and no problems, and then for an hour with overvoltage and overclocking. Still no errors. I didn't run complexity at max though. I will try that tomorrow. If there are still no errors I will change to the most recent Nvidia driver and try work again. After that, if errors persist, I will go to the most recent BOINC version. If that works I will try the latest anonymous. If that doesn't work I'll install the GTX 470 and try again. Among the later CUDA drivers, is there a preferred version? I know some in the past have been problematic.
Delta-V
ID: 1644363
Lee Gresham
Joined: 12 Aug 03
Posts: 159
Credit: 130,116,228
RAC: 0
United States
Message 1644662 - Posted: 20 Feb 2015, 21:36:12 UTC - in response to Message 1643744.  

I set complexity to max and ran OCCT several times today with both over clocking and stock clocking and never any errors.
Delta-V
ID: 1644662
Lee Gresham
Joined: 12 Aug 03
Posts: 159
Credit: 130,116,228
RAC: 0
United States
Message 1644813 - Posted: 21 Feb 2015, 6:25:11 UTC

About 21:00 UTC I shut seti down and uninstalled Boinc 7.0.64 and wiped all files and registry entries on the ailing computer. I did a clean install of the latest Nvidia drivers for the 560Ti and a clean install of Boinc 7.4.36. After testing again with OCCT I started Seti running the stock application with nominal over clocking and over voltage about 20:45 UTC. So far there have been no Invalids or Inconclusives. If all is well tomorrow I will begin running the latest Lunatics version. Got my fingers crossed.

Thanks for all the help!
Delta-V
ID: 1644813
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1644822 - Posted: 21 Feb 2015, 6:49:33 UTC - in response to Message 1644813.  
Last modified: 21 Feb 2015, 6:49:57 UTC

mmm, very interesting. Well one less freaky behaving GPU :)

Great to see the card coming up roses with OCCT too. That's a pretty aggressive test on max complexity, so I'd call the hardware solid.

I wonder what breaks with these drivers though. For sanity's sake I may have to chalk that one up to eddie's again.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1644822
Lee Gresham
Joined: 12 Aug 03
Posts: 159
Credit: 130,116,228
RAC: 0
United States
Message 1645049 - Posted: 21 Feb 2015, 19:52:21 UTC - in response to Message 1644822.  

A little side info: this GTX 560 Ti card is an Asus. I usually prefer Nvidia or EVGA graphics cards. I noticed yesterday that 1 of the 2 fans on the card was turning slightly back and forth instead of spinning, even when the card was under a heavy load. Spinning it by hand started it right up and it has kept spinning since then. I bet it won't start right after the next power down. Also, the cover on the card is a semi-open design and most of the card's heat simply flows back into the computer case. Only a little actually flows thru the back card vent. I prefer the covers that are fully enclosed inside the computer case, where most heat is pumped out the back.
I added some overclocking before starting data crunching yesterday on the latest BOINC. The default GPU clock is 900 MHz and the default memory clock is 1050 MHz. I set the GPU at 920 and memory at 1070. Default voltage was 1.012 and I set it at 1.025.
No problems with these settings. After about 11 pm central I moved the GPU to 930 and the memory to 1080. Voltage was unchanged. This morning there were quite a few inconclusives. It is back to the default now except for the voltage. I may go ahead and try Lunatics after a day or 2, and of course I'll have to do something about that fan.
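
For anyone following along, here is a rough Python sketch of how small those two overclock steps are relative to stock. The clock and voltage numbers are the ones quoted in this post; the dictionary keys and the `percent_over` helper are just illustrative names made up for this example, not anything from BOINC or Afterburner.

```python
# Stock settings as quoted in this post: GPU clock (MHz), memory clock (MHz), core voltage (V).
STOCK = {"gpu_mhz": 900, "mem_mhz": 1050, "core_v": 1.012}

# The two overclock steps described in this post (labels are illustrative only).
STEPS = {
    "first step (no problems reported)": {"gpu_mhz": 920, "mem_mhz": 1070, "core_v": 1.025},
    "second step (inconclusives returned)": {"gpu_mhz": 930, "mem_mhz": 1080, "core_v": 1.025},
}


def percent_over(stock, setting):
    """Return how far each setting sits above stock, as a percentage rounded to 2 places."""
    return {
        key: round(100.0 * (setting[key] - stock[key]) / stock[key], 2)
        for key in stock
    }


if __name__ == "__main__":
    for label, setting in STEPS.items():
        print(label, percent_over(STOCK, setting))
```

If the arithmetic is right, the step that brought the inconclusives back was only about one percentage point more on the GPU and memory clocks than the step that ran clean, which fits a card with very little headroom left.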

Thanks
Delta-V
ID: 1645049
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1645172 - Posted: 22 Feb 2015, 3:53:29 UTC - in response to Message 1645049.  
Last modified: 22 Feb 2015, 3:54:36 UTC

I've seen fan starting troubles like those happen with larger Noctua case fans when the connection to power isn't great, for example through a Molex adapter that was a little grungy or some such. That's in a situation where the fan is fine, runs quietly once started etc.

Assuming the fans draw their power from the PCIe via some connector inside the card, I'd definitely be reseating the PSU connectors, and checking out under the shroud for a little connector to wiggle.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1645172
Lee Gresham
Joined: 12 Aug 03
Posts: 159
Credit: 130,116,228
RAC: 0
United States
Message 1645190 - Posted: 22 Feb 2015, 5:20:22 UTC - in response to Message 1645172.  

Probably time to disassemble the fan and apply a little oil too. I like the air tool oil. It's thin and really soaks into the bronze bearings. There's a lot of miles on the fans in that 560 Ti.
Delta-V
ID: 1645190
Lee Gresham
Joined: 12 Aug 03
Posts: 159
Credit: 130,116,228
RAC: 0
United States
Message 1646863 - Posted: 26 Feb 2015, 13:53:40 UTC

After a short period where previous efforts to correct the Invalid and Inconclusive flood seemed to have worked, they returned with a vengeance earlier this week. In the end, changing out the video card seems to have been the only real fix. Thanks for the help! Now back to crunching.............
Delta-V
ID: 1646863