Posts by Jeff Buck


1) Message boards : Number crunching : nVidia GTX 750 question. (Message 1579769)
Posted 1 day ago by Profile Jeff Buck
Part Number: 02G-P4-3753-KR
1176MHz Base Clock
1255MHz Boost Clock
2048MB GDDR5 Memory
5400MHz Memory Clock
86.4GB/s Memory Bandwidth
Total Power Draw : 60 Watts

Yeah, that's the one I'm running. I just checked GPU-Z and the sensor reading it's currently giving for the Core Clock is 1320.2MHz. Don't know what that really means. It's also showing it only running at about 75-80% of TDP at 1.1680V. I haven't tweaked anything, just installed it and let it run!

That 60w power draw estimate is what I expected, versus the GT 640's rating of 65w. So, since my power consumption actually went up 4w (minimum), perhaps the GT 640 was drawing considerably less than the TDP.
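
Just to put rough numbers on that guess, here's a quick back-of-the-envelope Python sketch (the two TDPs and the 4w delta are from the posts above; it assumes the 750 Ti really pulls its full 60w, which the GPU-Z reading suggests it doesn't quite do):

# Back-of-the-envelope check of the power numbers quoted above.
tdp_gt640 = 65          # watts, GT 640 rated TDP
tdp_750ti = 60          # watts, factory-OC 750 Ti rated TDP
observed_delta = 4      # watts, measured increase at the wall after the swap

expected_delta = tdp_750ti - tdp_gt640   # -5 W if both cards ran at their TDP
print(f"Expected change at TDP: {expected_delta} W")

# If the 750 Ti really draws its full 60 W, the GT 640 must have been
# drawing roughly this much before the swap:
implied_gt640_draw = tdp_750ti - observed_delta
print(f"Implied GT 640 draw: ~{implied_gt640_draw} W "
      f"({tdp_gt640 - implied_gt640_draw} W under its rated TDP)")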
2) Message boards : Number crunching : nVidia GTX 750 question. (Message 1579761)
Posted 1 day ago by Profile Jeff Buck
I just installed a used EVGA 750Ti last Thursday, 02G-P4-3753-KR, that I snagged on eBay for $100 (delivered). It's a single-fan model without an extra power connector. It replaced a GT 640 in my xw9400. Theoretically, a non-OC 750Ti should have been using about 5w less power than the GT 640. However, I found that this one was actually drawing a minimum of 4w more, perhaps because it's factory overclocked to 1176 MHz. It only gets to about 62C with the fan running in default mode. (It actually resides outside the case on a riser cable, since both primary x16 slots inside the case are currently occupied by GTX 660s.)
3) Message boards : Number crunching : Pending fairwell to a faitful one... (Message 1579164)
Posted 2 days ago by Profile Jeff Buck
Now what to replace that GPU with? I want to stay "all Nvidia", so it's a choice between a GTX 760, or waiting a short while and going for a GTX 960 (I don't want to step up to a GTX 780 or GTX 980 on this cruncher as the price step is just a bit too much due to a trip to the USA grabbing my "spare" cash...)

If you're considering the GTX 760, you might also want to look at the GTX 670. When I did that last December, I ultimately decided that the 670 had a slight edge in both performance and price, and that's what I ended up choosing (a used one from eBay, admittedly, but I've been quite pleased with it). Of course, prices may have changed since December, so my comparison may not hold.
4) Message boards : Number crunching : @Pre-FERMI nVidia GPU users: Important warning (Message 1579135)
Posted 2 days ago by Profile Jeff Buck
If the task doesn't have any single pulses, it will validate. I've even seen cases where the WingPerson found 1 single pulse and the affected card that didn't find the single pulse still validated.

I was just looking at some of the Valid AP tasks for host 7339909, which you referenced in Message 1579054. I do see several where his single pulse count of 0 validated against other non-zero single pulse counts. In one case, the other hosts actually found 9. Fortunately, what I've also seen is that the canonical result always seems to go to one of the hosts with the non-zero count, even when host 7339909's was the _0 task. So, even if the offending host does get credit, its results aren't actually getting into the science database. On the other hand, I would expect that there are cases where two of the old-card, new-driver hosts validate against each other (much like those ATI hosts with the 30 Autocorr overflows that irritate me); their result will then end up in the science database without even an opportunity for another host to crunch the WU and possibly report a non-zero result.
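
For anyone wondering how two wrong results can beat one right one, below is a deliberately oversimplified quorum-of-2 sketch in Python. It is not the actual BOINC validator; the field names and the exact-match rule are made up purely to illustrate the failure mode:

# Toy illustration of quorum-2 validation: results that agree with each
# other get accepted, regardless of whether they agree with reality.
def results_agree(a, b):
    """In this toy model, two results 'agree' if every count matches exactly."""
    return a == b

def validate(results):
    """Return the first pair of agreeing results, or None if there's no quorum."""
    for i in range(len(results)):
        for j in range(i + 1, len(results)):
            if results_agree(results[i]["counts"], results[j]["counts"]):
                return results[i], results[j]
    return None

tasks = [
    {"host": "old-card/new-driver A", "counts": {"single_pulse": 0}},
    {"host": "old-card/new-driver B", "counts": {"single_pulse": 0}},
    {"host": "healthy host",          "counts": {"single_pulse": 9}},
]

quorum = validate(tasks)
if quorum:
    # The two zero-pulse results validate each other; the host that actually
    # found 9 single pulses ends up marked invalid.
    print("Canonical pair:", [r["host"] for r in quorum])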
5) Message boards : Number crunching : Perhaps my 7th wingman will be the charm! (Message 1578617)
Posted 4 days ago by Profile Jeff Buck
I was sent one of those, http://setiathome.berkeley.edu/workunit.php?wuid=1594867559

And that looks like another WU with no pulses to be found, despite all the back and forth it'll end up with.

I seem to recall that in June of last year, there was a major fiasco with one of the AP apps for ATI that was causing Computation errors just as fast as the scheduler could send them out. A lot of WUs were failing with too many errors after 6 wingmen crapped out. I think that's the most wingmen I've ever run across, until now. It was also the only time I ever got a "Completed, can't validate" status for one of my tasks.
6) Message boards : Number crunching : Perhaps my 7th wingman will be the charm! (Message 1578599)
Posted 4 days ago by Profile Jeff Buck
I'm posting this just for fun. WU #1561509069 seems to have been a real hot potato for about 7 weeks now, with a whole bunch of different reasons for getting dropped. My host is the first one, patiently waiting for another reliable host to come along. Even the one that finally "finished", to trigger the inconclusive, is a runaway machine that got a 30/30 overflow! The real irony is that, after all is said and done, since my host found 0 single pulses and 0 repetitive pulses, all this churning will be for naught, anyway. ;^)

7) Message boards : Number crunching : Panic Mode On (89) Server Problems? (Message 1577157)
Posted 6 days ago by Profile Jeff Buck
Fair point. There are only 22 'Workunits waiting for validation' shown on the server status page (down from 63 a few minutes ago), so it sounds like that counting routine is broken too, even though the page itself seems to be working despite the error messages.

Yes, I'd say that counting routine is broken, seeing as how it appears that out of those supposed 22 'Workunits waiting for validation', I have 88 of them. As of 23 Sep 2014, 23:58 UTC, the next time my host 7057115 reported, the problem seemed to have cleared. Unfortunately, that window between 23:41 and 23:51 was the first time that host was able to report its backlog of completed tasks, following HTTP errors all night, then the outage, and then more HTTP errors. It had something like 260 tasks waiting to report by then!
8) Message boards : Number crunching : Panic Mode On (89) Server Problems? (Message 1577144)
Posted 6 days ago by Profile Jeff Buck
Here's another twist. For WU 1598612308 mine was the third host tie-breaker. My task is "Completed, waiting for validation", while the other 2 are still "Completed, validation inconclusive".

And APs are included, too:
http://setiathome.berkeley.edu/workunit.php?wuid=1599954929
9) Message boards : Number crunching : Panic Mode On (89) Server Problems? (Message 1577127)
Posted 7 days ago by Profile Jeff Buck
Richard,

Looks like I was editing my post while you were typing, so I'll copy it here.

Edit: Oh yeah, I've got lots more! It looks like any tasks I reported between about 23:41 and 23:51 UTC yesterday, where I was the second wingman to report (which should have triggered the validators) are hung up in Neverland!

Perhaps some other high-volume users could take a look in that window and see if there's a pattern.

Fair point. There are only 22 'Workunits waiting for validation' shown on the server status page (down from 63 a few minutes ago), so it sounds like that counting routine is broken too, even though the page itself seems to be working despite the error messages.

Here's another twist. For WU 1598612308 mine was the third host tie-breaker. My task is "Completed, waiting for validation", while the other 2 are still "Completed, validation inconclusive".
10) Message boards : Number crunching : Panic Mode On (89) Server Problems? (Message 1577120)
Posted 7 days ago by Profile Jeff Buck
Richard,

Looks like I was editing my post while you were typing, so I'll copy it here.

Edit: Oh yeah, I've got lots more! It looks like any tasks I reported between about 23:41 and 23:51 UTC yesterday, where I was the second wingman to report (which should have triggered the validators) are hung up in Neverland!


Perhaps some other high-volume users could take a look in that window and see if there's a pattern.
11) Message boards : Number crunching : Panic Mode On (89) Server Problems? (Message 1577113)
Posted 7 days ago by Profile Jeff Buck
Has anyone else noticed a problem with the validators running way behind? I know of at least 5 of my WUs that have been reported by both me and my wingman and are still sitting in a Validation Pending state, with "Completed, waiting for validation", for over 15 hours.

http://setiathome.berkeley.edu/workunit.php?wuid=1599799024
http://setiathome.berkeley.edu/workunit.php?wuid=1599798865
http://setiathome.berkeley.edu/workunit.php?wuid=1599792725
http://setiathome.berkeley.edu/workunit.php?wuid=1599792731
http://setiathome.berkeley.edu/workunit.php?wuid=1599799042

There are probably more, but I stopped checking after 5. I noticed these before I went to bed last night, but decided to wait until this morning to see if they had validated. They don't appear to have done so. (I even cleared my browser cache, just to make sure I wasn't getting old news.)

These were all reported after yesterday's outage.

Edit: Oh yeah, I've got lots more! It looks like any tasks I reported between about 23:41 and 23:51 UTC yesterday, where I was the second wingman to report (which should have triggered the validators) are hung up in Neverland!
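
In case anyone else wants to keep an eye on a batch of these, here's a rough Python sketch for polling the five workunit pages listed above. The status string it searches for is just my assumption about the page's wording, so treat it as illustrative only:

# Rough sketch: fetch each workunit page and flag any that still appear
# to be pending validation. The status text searched for is an assumption
# about the page wording, not a documented API.
import urllib.request

WU_URLS = [
    "http://setiathome.berkeley.edu/workunit.php?wuid=1599799024",
    "http://setiathome.berkeley.edu/workunit.php?wuid=1599798865",
    "http://setiathome.berkeley.edu/workunit.php?wuid=1599792725",
    "http://setiathome.berkeley.edu/workunit.php?wuid=1599792731",
    "http://setiathome.berkeley.edu/workunit.php?wuid=1599799042",
]

for url in WU_URLS:
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            page = resp.read().decode("utf-8", errors="replace")
    except OSError as err:
        print(f"{url}: fetch failed ({err})")
        continue
    if "Completed, waiting for validation" in page:
        print(f"{url}: still waiting for validation")
    else:
        print(f"{url}: no longer shows a pending-validation task")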
12) Message boards : Number crunching : Phantom Triplets (Message 1576279)
Posted 8 days ago by Profile Jeff Buck
I ran some GPU memory test programs today, first one called OCCT and then a couple from the Folding@Home Utilities page, MemtestG80 and MemtestCL. I wasn't terribly impressed by the OCCT program, but the other two seem to be reasonable facsimiles of Memtest86, adapted for the GPU. MemtestG80 is just for NVIDIA CUDA-enabled GPUs, while MemtestCL can run on both NVIDIA and ATI OpenCL cards.

None of them detected any errors on the GTX 550 Ti, even after many iterations and several hours. However, the programs seem to have some limitations in regard to the maximum amount of memory they can test. The max I could get MemtestG80 to look at was 680MB out of the 1024MB on the GPU, even though GPU-Z only reported 81MB being in use prior to running the test. MemtestCL, however, was able to test 924MB under the same conditions. The advantage of MemtestG80 is that it runs about 8-10 times faster than MemtestCL (which took about 2.5 hours to test the 924MB for 50 iterations of its 13 different test schemes).
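
To put the speed difference in perspective, here's the rough arithmetic as a Python sketch, using the figures from the MemtestCL run above (the 8-10x factor for MemtestG80 is just my observed ballpark, not a published number):

# Effective throughput of the MemtestCL run described above, and what the
# same workload would take on MemtestG80 at the observed 8-10x speedup.
mb_tested     = 924      # MB that MemtestCL was able to cover
iterations    = 50
schemes       = 13       # test patterns per iteration
runtime_hours = 2.5

total_mb_touched = mb_tested * iterations * schemes
cl_rate = total_mb_touched / (runtime_hours * 3600)   # effective MB/s
print(f"MemtestCL effective rate: ~{cl_rate:.0f} MB/s")

for speedup in (8, 10):
    # MemtestG80 covers less memory (680 MB here) but runs much faster.
    g80_seconds = (680 * iterations * schemes) / (cl_rate * speedup)
    print(f"MemtestG80 at {speedup}x: a 50x13 pass over 680 MB "
          f"in ~{g80_seconds / 3600:.2f} h")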

Of course, the absence of errors doesn't really prove that there isn't a weak bit lurking somewhere in there but, for now, I think I'll just let it ride. I've got BoincLogX running to capture Result files, so if the phantom Triplets show up again, perhaps there will be some more evidence available to help pin it down.
13) Message boards : Number crunching : Well it had to happen... (Message 1576101)
Posted 8 days ago by Profile Jeff Buck
Sooner or later one of my crunchers was bound to be drawn against a pair of "King Corrupters" - those crunchers that return a very high proportion of invalid results:
http://setiathome.berkeley.edu/workunit.php?wuid=1596624815
The two conspirators have about 1000 invalids between them, and I bet they are pleased with their high rate of turn-around......

It's funny, I was just commenting yesterday in Message 1575822 as to whether or not this problem had ever been addressed. A bit of research and I found it hadn't. I've actually got a list of 11 of these hosts w/ ATI GPUs that are corrupting the science database every time they match up against one another. They are 6062303, 5744165, 6228988, 6772486, 5440804, 6901854, 6836000, 6755483, 6929369, 6936833, and 6156050. There are probably more, but I quit tracking them back in February when it became obvious that the project admins didn't care about the corruption.

About as bad as those NV card users that return nothing but -9 overflows.

Actually, this is much worse because they're validating bad results against each other, causing good results to be thrown out.
14) Message boards : Number crunching : Phantom Triplets (Message 1575841)
Posted 9 days ago by Profile Jeff Buck
In the spirit of "You never know what you'll find when you start pulling on a loose thread", I just ran across an example of another problem that's been going on for a long time. I decided to start preemptively watching my "Inconclusive" tasks on that machine, to try to catch these phantom Triplets while the result file might still be available to somebody. I just checked the 4 new ones that appeared today and, while I didn't catch any phantom Triplets, I did find one WU where my task got marked Inconclusive while my first wingman's got immediately marked Invalid. Initially, I assumed that this was an example of the ongoing "-9 overflow with truncated Stderr" problem for which the fix still hasn't been implemented. However, on closer inspection, this wingman turned out to be one of those who is still trying to run v7 tasks through a v6 app, specifically:

setiathome_enhanced 6.11 $Revision: 850 $ g++ (Ubuntu/Linaro 4.5.1-8ubuntu2~ppa1) 4.5.1

Of course, since the host has an Anonymous user, under the present setup the only people who could contact him to set him straight are the project admins. In the meantime, that host shows: Invalid (643) · Error (52)

Not good!
15) Message boards : Number crunching : Phantom Triplets (Message 1575829)
Posted 9 days ago by Profile Jeff Buck
In the case of the GPU, you need to test a little more than the memory itself; that's why we use FurMark, to be sure the entire GPU is OK, not just the memory.

Okay, perhaps I'll take a look into that tomorrow, as well. Thanks!
16) Message boards : Number crunching : Phantom Triplets (Message 1575825)
Posted 9 days ago by Profile Jeff Buck
Not sure if this is what you're looking for but, if you're trying to find out whether your GPU is reliable, did you try the FurMark test?

http://www.ozone3d.net/benchmarks/fur/

It heavily stresses your GPU, even more than crunching, almost to the top of its capacity, so any trouble with memory or other components normally shows up during it.

Thanks, Juan. I really was hoping to find just a straightforward memory test, similar to the Memtest86+ that I've run to test main board memory. Unless the problem gets worse, I think I'd rather postpone a full-blown GPU stress test.
17) Message boards : Number crunching : Phantom Triplets (Message 1575822)
Posted 9 days ago by Profile Jeff Buck
I think for the most part the BOINC backend does a good job of protecting the science using the validation approach (with notable past exceptions).

In this case, that's certainly true. As far as I know, my card's occasional phantom Triplets are all failing validation. Although, now that I think about it, I wouldn't really know if they do sometimes get validated, since Valid tasks don't tend to catch my eye like Invalid ones do. :^) (Way back in January, I reported on a situation in Two wrongs make a right where runaway ATI rigs were able to validate against each other with bogus Autocorr counts of 30, polluting the science database. I haven't looked lately to see whether that's still ongoing.)

Detecting, and correcting for, this kind of spurious event completely at the end-user side is going to take a few more leaps, with some cost-benefit and feasibility constraints.

Yes, I kind of got that from the "soft error" Wiki you referenced. A lot of trade-offs involved, perhaps for nebulous benefits.

Edit: Well, I think I just found the answer to the question I raised in Two wrongs make a right. It's still an ongoing problem (8 months later) as evidenced by WU #1596114528, where two runaway ATI rigs with Autocorr=30 got validated while the non-runaway rig with a Pulse count of 6 and a Triplet count of 1 got consigned to the wastebasket as Invalid.
18) Message boards : Number crunching : Phantom Triplets (Message 1575815)
Posted 9 days ago by Profile Jeff Buck
This seems to be one of few tools out there to test video memory.
http://mikelab.kiev.ua/index_en.php?page=PROGRAMS/vmt_en

I just downloaded that and tried slogging through the Readme. On first reading, it's hard to tell whether it'll even work on newer cards. (It doesn't look like it's been updated since 2008, and it only talks about supporting nVidia GeForce 8xxx and 9xxx series cards.) I'll try to take another look at it with fresh eyes tomorrow. It may still be easier for me to try to use that Linux CUDA GPU memtest from a Ubuntu boot disk (unless I have to compile it myself, in which case I'd be in over my head again). I'll have to weigh my options.
19) Message boards : Number crunching : Phantom Triplets (Message 1575812)
Posted 9 days ago by Profile Jeff Buck
That's not to say PG&E's electric service is entirely predictable, though. Sometimes they make California feel like a third world country, with random outages that seem to have no external cause (e.g., a perfectly bright, sunny day or a calm, clear night, no car meeting power pole, but suddenly, no juice).


Prepare to be thoroughly depressed at the answer to that :)

Enron: The Smartest Guys in the Room (2005)

Well, although the Enron debacle has cost us ratepayers a whole lot of money (thanks to our brain-dead politicians), the (un)reliability issue is something that plagued us here long before those "smart guys" came along! :^)
20) Message boards : Number crunching : Phantom Triplets (Message 1575772)
Posted 9 days ago by Profile Jeff Buck
Having something that consistently fails is indeed much nicer, which is why Jason mentioned trying to force an error. With an error rate of 1 in 800, that would put it at about 1 every 16 days. You said most of the errors happen late at night, but have they happened on the same day(s)? It makes me think there could be some kind of large industrial plant powering up or down every 2 weeks, causing just enough of a fluctuation in the line voltage to make your GPU go a little nuts.

Ah, if only it were that consistent! Although it may average out to once every 16 days, the intervals have actually ranged from 2 days up to 35 days, the most recent interval being just 4 days. Not happening on the same day or time, either, I'm afraid. Let's see, I have (in order) a Monday at ~1:35 AM, a Wednesday at ~4:40 AM, a Tuesday at ~6:05 PM, a Tuesday at ~10:50 PM, and a Saturday at ~3:20 PM. No industrial plants nearby, either. I live in a fairly rural area. That's not to say PG&E's electric service is entirely predictable, though. Sometimes they make California feel like a third world country, with random outages that seem to have no external cause (e.g., a perfectly bright, sunny day or a calm, clear night, no car meeting power pole, but suddenly, no juice). However, that's been true for as long as I've lived here (28+ years), and this thing with this one GPU is quite new. I'm keeping the suggestions on voltages, both yours and Jason's, in reserve for now, pending any worsening of the situation.

I'd still like to try running a GPU memory test first, if I can find one similar to memtest86, just to test the memory, not stress test the whole GPU.
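
For what it's worth, the arithmetic behind that "1 in 800, about every 16 days" estimate works out like this (a quick Python sketch; the rate and the 16-day figure come from the quoted post, and the interval list is just the extremes plus the most recent gap I mentioned):

# Sanity check of the "1 in 800, roughly every 16 days" estimate, plus the
# spread of the intervals actually observed between phantom-Triplet events.
error_rate    = 1 / 800   # roughly one bad result per 800 tasks
expected_days = 16        # expected interval between errors, per the quote

implied_tasks_per_day = 1 / (error_rate * expected_days)
print(f"Implied throughput: ~{implied_tasks_per_day:.0f} tasks/day")

# Observed gaps in days: just the extremes and the most recent one mentioned.
observed = [2, 35, 4]
print(f"Observed interval range: {min(observed)}-{max(observed)} days, "
      f"versus the ~{expected_days}-day average the rate predicts")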


