Phantom Triplets

Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1575671 - Posted: 21 Sep 2014, 18:17:35 UTC

This is probably one of those questions for which there really isn't any answer, but I thought I'd toss it into the arena in case someone else has run into this specific situation.

Over the last couple of months, the GTX 550 Ti on my HP Compaq dc7700 has had a tiny number of tasks declared Invalid because it identified a large number of apparently "phantom" Triplets. By tiny, I mean just 3 out of 2,653 tasks in August, and 2 out of 1,342 this month (so far).
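To put those counts in perspective, here is a minimal sketch that turns them into rates. It uses only the figures quoted above; the dictionary and function names are made up for illustration.

```python
# Invalid counts quoted in the post: (invalid tasks, total tasks) per period.
periods = {
    "August": (3, 2653),
    "September (to date)": (2, 1342),
}


def invalid_rate(invalid, total):
    """Return the fraction of tasks that ended up Invalid."""
    return invalid / total


for name, (invalid, total) in periods.items():
    rate = invalid_rate(invalid, total)
    print(f"{name}: {rate:.4%} (about 1 in {round(1 / rate)})")

# Combine both periods into a single overall rate.
total_invalid = sum(i for i, _ in periods.values())
total_tasks = sum(t for _, t in periods.values())
combined = invalid_rate(total_invalid, total_tasks)
print(f"Combined: {combined:.4%} (about 1 in {round(1 / combined)})")
```

Taken together that is 5 Invalids in 3,995 tasks, or roughly 1 in 800.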

The most recent one occurred yesterday, where my host found 19 Triplets (before overflowing), while the wingmen found 0. The other counts, 10 Spikes, 1 Autocorr, and 1 Gaussian, were the same for all three hosts. The other four WUs where this has occurred were similar, with no hosts but mine finding Triplets, although on one occasion, the wingmen did find 9 while I got 24 before overflowing.

I've tried to find some commonality among the 5 tasks that went haywire and have thus far come up empty. They were all split from different files, and they all have different Angle Ranges (though they are similar, ranging from 0.379459 to 0.445254). The times of day at which they ran range from the middle of the afternoon to the middle of the night, so it doesn't seem to be an ambient temperature issue or some sort of interruption by an OS task that runs at a particular time. I occasionally use that host for streaming video (it's in the living room, hooked up to my 50-inch plasma), but certainly not in the middle of the night and not yesterday afternoon, so I can't blame it on that. That host isn't used for anything else.

The tasks with the phantom Triplets are widely separated, with anywhere from 2 days to 35 days between occurrences (the most recent interval being about 4 days). Also, since I run 2 tasks at a time on that GPU, each of the Invalid tasks would have overlapped with at least 2 other tasks which didn't encounter a Triplets glitch and which validated just fine.

I checked the temperature on that GPU and found that it's currently running at 67C (with fan at 64%), and the maximum temperature reached since the last reboot Friday evening (before the most recent Invalid on Saturday) is just 70C (with fan at 70%). That doesn't seem excessive to me for that card and, given that these Invalids aren't occurring in a long run, or to tasks running concurrently, or just at a time of day when ambient temperature peaks, it doesn't seem like a temperature issue to me.

It's possible that there's a dust speck somewhere that I missed during the last clean-out, about 6 weeks ago, but if that's the case, why this odd, very rare and inconsistent manifestation of a problem that only shows up with phantom Triplets and not other signal types?

Has anybody else run into a similar problem, or can anyone think of something else I might look at? At this point, I don't think it warrants a great deal of investigative effort, but I do find it mildly annoying and if it continues, or the frequency increases, I may have to try to dig deeper. I suppose I'm old-fashioned, but I actually expect a computer to be consistent! ;^)
ID: 1575671
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1575686 - Posted: 21 Sep 2014, 19:13:53 UTC

Is the 550 Ti one of the Ti cards prone to issues with power?
If it is, you could log the voltages and see if there are any drops on the 12 V supply that coincide with the running time window of any future invalid tasks.
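A rough sketch of how such a check might look once a voltage log exists. Everything here is hypothetical: the log format (timestamp, volts), the sample readings, the task window, and the 5% tolerance band are assumptions for illustration, not output from any particular monitoring tool.

```python
from datetime import datetime, timedelta


def dips_in_window(samples, start, end, nominal=12.0, tolerance=0.05):
    """Return samples inside [start, end] that fall below nominal * (1 - tolerance).

    samples: iterable of (datetime, volts) pairs from whatever logger is used.
    """
    floor = nominal * (1.0 - tolerance)
    return [(t, v) for t, v in samples if start <= t <= end and v < floor]


# Hypothetical log: one reading per minute, with a single invented sag.
t0 = datetime(2014, 9, 20, 15, 0)
log = [(t0 + timedelta(minutes=m), 12.1) for m in range(60)]
log[20] = (log[20][0], 11.2)  # made-up dip, for illustration only

# Hypothetical run window of an Invalid task (start and finish times).
task_start = t0 + timedelta(minutes=10)
task_end = t0 + timedelta(minutes=40)

for when, volts in dips_in_window(log, task_start, task_end):
    print(f"{when:%Y-%m-%d %H:%M}  {volts:.2f} V")
```

The task start and finish times would have to come from the BOINC logs for the host; a dip that shows up only inside Invalid windows would be the interesting result.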
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1575686
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1575693 - Posted: 21 Sep 2014, 19:41:36 UTC - in response to Message 1575686.  
Last modified: 21 Sep 2014, 19:42:57 UTC

Hmmmm... It seems to me I recall something about the 650 Ti having such issues (or perhaps it was the 560 Ti), but I don't remember anything regarding the 550 Ti. Seems like a voltage drop/spike would affect both tasks running on the GPU at the time and not in such a consistent way, i.e., just generating phantom Triplets. I'll keep it in mind, though, since anything's a possibility at this point. By the way, I've been running that 550 Ti in that host for over 13 months and the problem didn't start to appear until last month.

I'm beginning to wonder if it could be a GPU memory issue, one little bit starting to fail out of the 1023MB on that GPU. Since the S@h app isn't using all of that memory, even with 2 tasks running concurrently, perhaps the failing bit only comes into play occasionally. Does anybody know if the MB Cuda apps (I'm running Lunatics cuda42 on this box) always allocate memory the same way, such as bottom up or top down, or is it more random? Also, does anybody know of a GPU memory test app that will run under Windows? I see there's a "CUDA GPU memtest" program on Sourceforge, but that seems to be written for Linux. I suppose I could add it to an Ubuntu boot disk and run it from there if I had to, but I'd rather not. ;^)
ID: 1575693
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1575698 - Posted: 21 Sep 2014, 20:01:14 UTC - in response to Message 1575671.  

...Has anybody else run into a similar problem, or can anyone think of something else I might look at? At this point, I don't think it warrants a great deal of investigative effort, but I do find it mildly annoying and if it continues, or the frequency increases, I may have to try to dig deeper. I suppose I'm old-fashioned, but I actually expect a computer to be consistent! ;^)


Yeah it does get quite difficult when you want to isolate variation that rare, for the following list of possibilities:
- possible 'genuine' core or memory voltage issue on the card or anywhere else in the system. That includes some cards needing a small voltage bump from the factory, especially 560 Ti factory overclock models. Background relates to how card manufacturers bin parts & set base clocks, usually accepting some number of visual artefacts per time period, for desktop gaming cards.
- The inherent susceptibility of 'consumer grade' hardware to soft error (http://en.wikipedia.org/wiki/Soft_error#Causes_of_soft_errors), especially: "IBM estimated in 1996 that one error per month per 256 MiB of ram was expected for a desktop computer." Hence enterprise ECC RAM and the ECC features on professional Tesla model cards.
- known vagaries in floating point arithmetic, driven by cross platform or cross application version validation.
- possible bugs at any software, firmware or hardware level

and probably others that don't come immediately to mind. As it happens the example you gave 'feels' most like the first case, as opposed to the second, third or fourth possibilities listed, though without direct experimentation for the purposes of isolation, it'd be mostly guessing.
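For a sense of scale, here is a back-of-the-envelope sketch that applies the 1996 IBM figure quoted above to the 1023 MB of memory mentioned earlier in the thread. Treating that desktop-RAM estimate as valid for a 2014 GPU's memory is purely an assumption, as is modelling the errors as a Poisson process; most flipped bits would also land in unused memory or harmless data.

```python
import math

# The 1996 IBM figure quoted above: one error per month per 256 MiB.
ERRORS_PER_MONTH_PER_256_MIB = 1.0


def expected_soft_errors(mem_mib, months=1.0):
    """Expected soft-error count, scaling the quoted rate linearly with memory size."""
    return ERRORS_PER_MONTH_PER_256_MIB * (mem_mib / 256.0) * months


def prob_at_least_one(expected):
    """Poisson probability of one or more events given an expected count."""
    return 1.0 - math.exp(-expected)


gpu_mem_mib = 1023  # card size as reported earlier in the thread
lam = expected_soft_errors(gpu_mem_mib)
print(f"Expected soft errors per month: {lam:.2f}")
print(f"Chance of at least one in a month: {prob_at_least_one(lam):.1%}")
```

Under those assumptions the estimate works out to roughly four bit flips a month across the whole card, which is a ceiling on what the application could ever notice, not a prediction of Invalid results.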

One way to test the scenario might be to actually drop the GPU core voltage a notch or two, and see if the invalid rate climbs. If so, then going the other way or reducing clocks should eliminate that particular cause, leaving 0 or more other sources of variation. The other causes would still remain possible. Yeah, you're crossing the boundaries between certainty and some scary door here ;)
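To judge whether the invalid rate really has climbed after such a change, one option is a simple binomial check against the baseline. This is a sketch only: the baseline of 5 Invalids in 3,995 tasks comes from the opening post, while the trial size of 500 tasks and the assumption that tasks fail independently are made up for illustration.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), via the complement of the lower tail."""
    lower_tail = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - lower_tail


baseline = 5 / 3995  # invalid rate from the opening post
n_trial = 500        # hypothetical number of tasks run at the lowered voltage

# How surprising would k Invalids be if nothing had actually changed?
for k in range(1, 5):
    print(f"P(at least {k} invalid in {n_trial} tasks) = "
          f"{prob_at_least(k, n_trial, baseline):.4f}")
```

A small probability for the observed count suggests the change made things worse; a count the baseline explains comfortably says nothing either way, which is why a rate this low needs a long trial.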
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1575698
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1575731 - Posted: 21 Sep 2014, 21:24:48 UTC - in response to Message 1575698.  

Thanks for the insights, Jason. Thus far, I haven't tinkered with the voltage or clocks on that card at all. It's just running at all the defaults, with no overclocking. Probably, unless the frequency of this little anomaly increases significantly, I'll try to avoid altering any of those settings. The idea of actually trying to increase the Invalid rate in order to possibly get a clue to diagnose the existing low failure rate doesn't really appeal to me at the moment! ;^)

- The inherent susceptibility of 'consumer grade' hardware to soft error
http://en.wikipedia.org/wiki/Soft_error#Causes_of_soft_errors, especially,

IBM estimated in 1996 that one error per month per 256 MiB of ram was expected for a desktop computer


That's some really interesting stuff...definitely a scary door! I had no idea that "soft" errors were so prevalent and could be caused so easily. Alpha particles and cosmic rays and thermal neutrons, oh my! I can't wait for my bank to blame the next data breach on a thermal neutron.

Seriously, though, could a soft error possibly be consistent enough to cause the sort of rare, yet consistent, hiccup that I'm seeing where only Triplets (and lots of them) are being incorrectly identified where none apparently exist?
ID: 1575731
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1575738 - Posted: 21 Sep 2014, 21:41:41 UTC - in response to Message 1575731.  

Thanks for the insights, Jason. Thus far, I haven't tinkered with the voltage or clocks on that card at all. It's just running at all the defaults, with no overclocking. Probably, unless the frequency of this little anomaly increases significantly, I'll try to avoid altering any of those settings. The idea of actually trying to increase the Invalid rate in order to possibly get a clue to diagnose the existing low failure rate doesn't really appeal to me at the moment! ;^)

- The inherent susceptibility of 'consumer grade' hardware to soft error
http://en.wikipedia.org/wiki/Soft_error#Causes_of_soft_errors, especially,

IBM estimated in 1996 that one error per month per 256 MiB of ram was expected for a desktop computer


That's some really interesting stuff...definitely a scary door! I had no idea that "soft" errors were so prevalent and could be caused so easily. Alpha particles and cosmic rays and thermal neutrons, oh my! I can't wait for my bank to blame the next data breach on a thermal neutron.

Seriously, though, could a soft error possibly be consistent enough to cause the sort of rare, yet consistent, hiccup that I'm seeing where only Triplets (and lots of them) are being incorrectly identified where none apparently exist?

About every 4-6 weeks the 8500 GT I'm using throws a fit and starts trashing work. Despite it being in a lab with temperature, humidity, & power regulation. I tried a few things like shutting down the system weekly to see if it would help, but nothing I tried has so far.
I decided it was rare enough I didn't want to spend any more time looking into it.
ID: 1575738
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1575752 - Posted: 21 Sep 2014, 22:22:04 UTC - in response to Message 1575738.  

About every 4-6 weeks the 8500 GT I'm using throws a fit and starts trashing work. Despite it being in a lab with temperature, humidity, & power regulation. I tried a few things like shutting down the system weekly to see if it would help, but nothing I tried has so far.
I decided it was rare enough I didn't want to spend any more time looking into it.

Actually, I think if my 550Ti actually went off the rails like that, continuing to throw Invalids for an extended period after it got the first one, I'd understand it better, or at least be more understanding of it, than the way it just pharts once and then resumes as if there's nothing wrong. :^) At least then, if cleaning it or reseating it or increasing the fan speed, etc., didn't make the problem go away, I wouldn't feel hesitant to replace it. Under the present circumstances, though, I certainly wouldn't want to do that.

Speaking of replacing a card, a couple months ago I replaced an 8600 GT in my old IBM Thinkcentre with one of the (relatively) new ASUS GT 630 1GB cards. It's passively cooled and only has a maximum draw of 25w versus the 47w max for the 8600 GT. The actual power draw seems to be running only about 13w. The card only cost me $33.00 USD delivered (new, on eBay) and I figure it saves me over $4/month in electricity, so should pay for itself in about 8 months. Oh, and it provides about an 18% boost in production (at least as measured by Credits). Might be worth considering for your cranky 8500 GT.
ID: 1575752
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1575765 - Posted: 21 Sep 2014, 23:23:56 UTC - in response to Message 1575752.  

About every 4-6 weeks the 8500 GT I'm using throws a fit and starts trashing work. Despite it being in a lab with temperature, humidity, & power regulation. I tried a few things like shutting down the system weekly to see if it would help, but nothing I tried has so far.
I decided it was rare enough I didn't want to spend any more time looking into it.

Actually, I think if my 550Ti actually went off the rails like that, continuing to throw Invalids for an extended period after it got the first one, I'd understand it better, or at least be more understanding of it, than the way it just pharts once and then resumes as if there's nothing wrong. :^) At least then, if cleaning it or reseating it or increasing the fan speed, etc., didn't make the problem go away, I wouldn't feel hesitant to replace it. Under the present circumstances, though, I certainly wouldn't want to do that.

Speaking of replacing a card, a couple months ago I replaced an 8600 GT in my old IBM Thinkcentre with one of the (relatively) new ASUS GT 630 1GB cards. It's passively cooled and only has a maximum draw of 25w versus the 47w max for the 8600 GT. The actual power draw seems to be running only about 13w. The card only cost me $33.00 USD delivered (new, on eBay) and I figure it saves me over $4/month in electricity, so should pay for itself in about 8 months. Oh, and it provides about an 18% boost in production (at least as measured by Credits). Might be worth considering for your cranky 8500 GT.

Having something that consistently fails is indeed much nicer. Which is why Jason mentioned trying to force an error. With an error rate of 1 in 800 that would put it at 1 about every 16 days. You said most of the errors happen late at night, but have they happened on the same day(s)? It makes me think there could be some kind of large industrial plant powering up or down every 2 weeks, causing just enough of a fluctuation in the line voltage to make your GPU go a little nuts.

In my case the 8500 GT is in a work machine I use for testing. Hardware is kept the same in order to have consistent test platforms or for instances when something needs to be regressed. I did try a few years ago to get some GT 430s for several of the systems so that they could correctly support Windows Aero, but that never happened. I had also made a proposal to replace all of the CRT monitors in the lab with LCDs. I even included the amount of time it would take to recoup their cost in electricity savings. Sometimes our bean counters are not the brightest.
ID: 1575765
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1575772 - Posted: 21 Sep 2014, 23:57:12 UTC - in response to Message 1575765.  

Having something that consistently fails is indeed much nicer. Which is why Jason mentioned trying to force an error. With an error rate of 1 in 800 that would put it at 1 about every 16 days. You said most of the errors happen late at night, but have they happened on the same day(s)? It makes me think there could be some kind of large industrial plant powering up or down every 2 weeks, causing just enough of a fluctuation in the line voltage to make your GPU go a little nuts.

Ah, if only it was that consistent! Although it may average out to once every 16 days, the intervals have actually ranged from 2 days up to 35 days, the most recent interval being just 4 days. Not happening the same day or time, either, I'm afraid. Let's see, I have (in order) a Monday at ~1:35 AM, a Wednesday at ~4:40 AM, a Tuesday at ~6:05 PM, a Tuesday at ~10:50 PM, and a Saturday at ~3:20 PM. No industrial plants nearby, either. I live in a fairly rural area. That's not to say PG&E's electric service is entirely predictable, though. Sometimes they make California feel like a third world country, with random outages that seem to have no external cause (i.e., perfectly bright, sunny day or calm, clear night, no car meeting power pole, but suddenly, no juice). However, that's been true for as long as I've lived here (28+ years) and this thing with this one GPU is quite new. I'm keeping the suggestions on voltages, both yours and Jason's, in reserve for now, pending any worsening of the situation.

I'd still like to try running a GPU memory test first, if I can find one similar to memtest86, just to test the memory, not stress test the whole GPU.
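For reference, the basic idea behind a memtest-style check is simple: write known bit patterns, read them back, and report any mismatch. The sketch below only illustrates that idea on an ordinary host-side buffer; it does not touch GPU memory, and the `StuckBitBuffer` class is a made-up stand-in for a faulty cell. A real GPU test would have to do the same thing in device memory through CUDA or OpenCL.

```python
def pattern_test(buf, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Fill buf with each pattern, read it back, and collect mismatches.

    Returns a list of (offset, expected, actual) tuples.
    """
    errors = []
    for p in patterns:
        for i in range(len(buf)):
            buf[i] = p
        for i in range(len(buf)):
            if buf[i] != p:
                errors.append((i, p, buf[i]))
    return errors


class StuckBitBuffer(bytearray):
    """Hypothetical faulty memory: one bit at one offset always reads as 0."""

    BAD_OFFSET = 123
    BAD_MASK = 0x10

    def __getitem__(self, i):
        value = super().__getitem__(i)
        if i == self.BAD_OFFSET:
            value &= ~self.BAD_MASK & 0xFF  # force the bad bit low on read
        return value


print(pattern_test(bytearray(1024)))       # healthy buffer
print(pattern_test(StuckBitBuffer(1024)))  # buffer with the simulated stuck bit
```

Using complementary patterns (0xAA and 0x55, 0x00 and 0xFF) matters: a bit stuck at 0 is only visible under patterns that try to set it to 1.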
ID: 1575772
shizaru
Volunteer tester
Joined: 14 Jun 04
Posts: 1130
Credit: 1,967,904
RAC: 0
Greece
Message 1575788 - Posted: 22 Sep 2014, 1:25:39 UTC - in response to Message 1575772.  

That's not to say PG&E's electric service is entirely predictable, though. Sometimes they make California feel like a third world country, with random outages that seem to have no external cause (i.e., perfectly bright, sunny day or calm, clear night, no car meeting power pole, but suddenly, no juice).


Prepare to be thoroughly depressed at the answer to that :)

Enron: The Smartest Guys in the Room (2005)
ID: 1575788
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1575798 - Posted: 22 Sep 2014, 1:52:13 UTC - in response to Message 1575772.  

Having something that consistently fails is indeed much nicer. Which is why Jason mentioned trying to force an error. With an error rate of 1 in 800 that would put it at 1 about every 16 days. You said most of the errors happen late at night, but have they happened on the same day(s)? It makes me think there could be some kind of large industrial plant powering up or down every 2 weeks, causing just enough of a fluctuation in the line voltage to make your GPU go a little nuts.

Ah, if only it was that consistent! Although it may average out to once every 16 days, the intervals have actually ranged from 2 days up to 35 days, the most recent interval being just 4 days. Not happening the same day or time, either, I'm afraid. Let's see, I have (in order) a Monday at ~1:35 AM, a Wednesday at ~4:40 AM, a Tuesday at ~6:05 PM, a Tuesday at ~10:50 PM, and a Saturday at ~3:20 PM. No industrial plants nearby, either. I live in a fairly rural area. That's not to say PG&E's electric service is entirely predictable, though. Sometimes they make California feel like a third world country, with random outages that seem to have no external cause (i.e., perfectly bright, sunny day or calm, clear night, no car meeting power pole, but suddenly, no juice). However, that's been true for as long as I've lived here (28+ years) and this thing with this one GPU is quite new. I'm keeping the suggestions on voltages, both yours and Jason's, in reserve for now, pending any worsening of the situation.

I'd still like to try running a GPU memory test first, if I can find one similar to memtest86, just to test the memory, not stress test the whole GPU.

This seems to be one of the few tools out there to test video memory.
http://mikelab.kiev.ua/index_en.php?page=PROGRAMS/vmt_en
ID: 1575798
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1575803 - Posted: 22 Sep 2014, 2:17:49 UTC - in response to Message 1575731.  
Last modified: 22 Sep 2014, 2:18:37 UTC

Yeah, the hardware validation process, rates of failure due to radioactivity in the device encapsulation, and external factors, are statistical processes, and similar processes like leakage govern whether a given chip will end up in a budget card or an enterprise level Tesla.

The cynical part of me says to probably throw in that the cards are made to last (stay in spec) through the warranty period with the duty cycle expected of a gamer or desktop user, as opposed to a 24x7 HPC usage scenario.

What it adds up to for me is that when someone buys a several-thousand-dollar Tesla, they pay for extra testing, headroom, and cherry-picked parts, so they do actually get *something* for their money.

My challenge continues to be to figure out how to make it work for the rest of us. I think for the most part the BOINC backend does a good job of protecting the science using the validation approach (with notable past exceptions). Detecting and correcting for this kind of spurious event completely at the end-user side is going to take a few more leaps, with some cost-benefit and feasibility constraints.
ID: 1575803
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1575810 - Posted: 22 Sep 2014, 2:29:42 UTC - in response to Message 1575772.  

I'd still like to try running a GPU memory test first, if I can find one similar to memtest86, just to test the memory, not stress test the whole GPU.


Yeah, won't hurt to start eliminating the easiest possible causes. In the not-too-distant future, I'm hoping to convert some of my test pieces into a user-friendly stress/bench test format. Until then, general memory and artefact scanning tools can turn up something.
ID: 1575810
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1575812 - Posted: 22 Sep 2014, 2:34:46 UTC - in response to Message 1575788.  

That's not to say PG&E's electric service is entirely predictable, though. Sometimes they make California feel like a third world country, with random outages that seem to have no external cause (i.e., perfectly bright, sunny day or calm, clear night, no car meeting power pole, but suddenly, no juice).


Prepare to be thoroughly depressed at the answer to that :)

Enron: The Smartest Guys in the Room (2005)

Well, although the Enron debacle has cost us ratepayers a whole lot of money (thanks to our brain-dead politicians), the (un)reliability issue is something that plagued us here long before those "smart guys" came along! :^)
ID: 1575812
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1575815 - Posted: 22 Sep 2014, 2:44:20 UTC - in response to Message 1575798.  

This seems to be one of the few tools out there to test video memory.
http://mikelab.kiev.ua/index_en.php?page=PROGRAMS/vmt_en

I just downloaded that and tried slogging through the Readme. On first reading, it's hard to tell whether it'll even work on newer cards. (It doesn't look like it's been updated since 2008, and it only talks about supporting nVidia GeForce 8xxx and 9xxx series cards.) I'll try to take another look at it with fresh eyes tomorrow. It may still be easier for me to try to use that Linux CUDA GPU memtest from an Ubuntu boot disk (unless I have to compile it myself, in which case I'd be over my head, again). I'll have to weigh my options.
ID: 1575815
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1575822 - Posted: 22 Sep 2014, 3:03:45 UTC - in response to Message 1575803.  
Last modified: 22 Sep 2014, 3:23:57 UTC

I think for the most part the BOINC backend does a good job of protecting the science using the validation approach (with notable past exceptions).

In this case, that's certainly true. As far as I know, my card's occasional phantom Triplets are all failing validation. Although, now that I think about it, I wouldn't really know if they do sometimes get validated, since Valid tasks don't tend to catch my eye like Invalid ones do. :^) (Way back in January, I reported on a situation in the thread "Two wrongs make a right" where runaway ATI rigs were able to validate against each other with bogus Autocorr counts of 30, polluting the science database. I haven't looked lately to see whether that's still ongoing.)

Detecting and correcting for this kind of spurious event completely at the end-user side is going to take a few more leaps, with some cost-benefit and feasibility constraints.

Yes, I kind of got that from the "soft error" Wiki you referenced. A lot of trade-offs involved, perhaps for nebulous benefits.

Edit: Well, I think I just found the answer to the question I raised in "Two wrongs make a right". It's still an ongoing problem (8 months later) as evidenced by WU #1596114528, where two runaway ATI rigs with Autocorr=30 got validated while the non-runaway rig with a Pulse count of 6 and a Triplet count of 1 got consigned to the wastebasket as Invalid.
ID: 1575822
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1575823 - Posted: 22 Sep 2014, 3:08:01 UTC
Last modified: 22 Sep 2014, 3:10:23 UTC

Not sure if this is what you're looking for, but if you're trying to find out whether your GPU is reliable, did you try the FurMark test?

http://www.ozone3d.net/benchmarks/fur/

It heavily stresses your GPU, even more than crunching, almost to the top of its capacity, so any trouble with memory or other components normally shows up with it.
ID: 1575823
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1575825 - Posted: 22 Sep 2014, 3:17:21 UTC - in response to Message 1575823.  

Not sure if this is what you're looking for, but if you're trying to find out whether your GPU is reliable, did you try the FurMark test?

http://www.ozone3d.net/benchmarks/fur/

It heavily stresses your GPU, even more than crunching, almost to the top of its capacity, so any trouble with memory or other components normally shows up with it.

Thanks, Juan. I really was hoping to find just a straightforward memory test, similar to the Memtest86+ that I've run to test main board memory. Unless the problem gets worse, I think I'd rather postpone a full-blown GPU stress test.
ID: 1575825
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1575826 - Posted: 22 Sep 2014, 3:19:35 UTC - in response to Message 1575825.  
Last modified: 22 Sep 2014, 3:20:04 UTC

In the case of the GPU, you need to test a little more than the memory itself. That's why we use FurMark: to be sure the entire GPU is OK, not just the memory.
ID: 1575826
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1575829 - Posted: 22 Sep 2014, 3:26:16 UTC - in response to Message 1575826.  

In the case of the GPU, you need to test a little more than the memory itself. That's why we use FurMark: to be sure the entire GPU is OK, not just the memory.

Okay, perhaps I'll take a look into that tomorrow, as well. Thanks!
ID: 1575829