Validation inconclusive with V0.38g installer
Author | Message |
---|---|
tbret Joined: 28 May 99 Posts: 3380 Credit: 296,162,071 RAC: 40 |
"Hmmm, I just noticed an inconclusive workunit where BOTH results were crunched using x38g:" I noticed one the other day that found the same spikes and Gaussians (no spikes or triplets) running two different versions of the CUDA (38g and 23) that would not validate. Weird. |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
"Hmmm, I just noticed an inconclusive workunit where BOTH results were crunched using x38g:" Or, multiple WUs ran on NVIDIA's 400/500 series? I've yet to see errors coming from this. GPUgrid WUs error out if I set my 480 higher than 900MHz core frequency, and so do some MB WUs on SETI, but that starts at >950MHz, which could also be temperature related, IMO (found triplets in a row). I used to see quite a lot on the 9800GTX+ card, which isn't in use anymore at the moment, and only once on the GTX480 (also heat). With local outside temperatures going above 25C, and above 30C tomorrow, I'll switch all rigs off for one or two days! I'll be off too... |
Tazz Joined: 5 Oct 99 Posts: 137 Credit: 34,342,390 RAC: 0 |
Not sure if it's related to x39c (or x38g) or not; in the last 12 hours I've had three BSODs. A lot of things have happened too: the power dimmed, clicked off for a few seconds and then came back on (lightning storm), I installed the x39c client, and I updated the Nvidia drivers from 267.59 to 275.33. Not sure what's gone wrong, but I'm trying to get to the bottom of it. I have a hdd image from before I touched anything to fall back on. CPU temp = 58-61C; GPU temp = 50C; 2 WUs at a time; GPU load = 90-95%. I'll post if/when I find out what's going on. </Tazz> |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
Ho-hum, another day of running the 39c and nothing to report. Running two at a time seems to have cured my problems. No slowdowns, no bluescreens, just my RAC climbing slowly up towards 11k. GPU temp 67C, fan speed 70%, slightly OCed to 900/1800/1804. Memory usage 81-90% / 728-740MB. My little GTS 450 is happy now. :-) PROUD MEMBER OF Team Starfire World BOINC |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"I noticed one the other day that found the same spikes and Gaussians (no spikes or triplets) running two different versions of the CUDA (38g and 23) that would not validate." Those kinds are worth looking at: compare the conditions on the local host and the wingman in detail (e.g. whether the wingman has invalids/errors), and watch the task go through to either validate or be marked invalid. Certainly there are known instances where the older apps can gang up on the newer ones (fortunately relatively rare so far), but we are also seeing cases crop up where even CPU wingmen just report flakey results (detected by rerunning the task on CPU 6.03 under bench conditions and getting results that match the newer GPU app). That's not the way around I'm used to, as I'm more inclined to expect flakey GPU results from either direction, so it's slightly surprising that the picture isn't always immediately clear. Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
Something like this one where I found a pulse the other two didn't but we all three still got credit? http://setiathome.berkeley.edu/workunit.php?wuid=769599295 |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Something like this one where I found a pulse the other two didn't but we all three still got credit? http://setiathome.berkeley.edu/workunit.php?wuid=769599295" I reckon they missed that one pulse due to the inaccurate chirp in 6.10, which fits my expected patterns. The hilarious part about this 'circus' is that if there had been a CPU wingman thrown in there, then it likely would have agreed with you on that pulse, since the CPU chirp is highly accurate. At this stage I think that the noticeable slight variations in results will continue until the V7 release. I have considered introducing random error back into the results to more closely match legacy results, but then the question becomes "Which inaccurate build do you try to match? Legacy CPU apps with inaccurate spikes? Or legacy GPU apps with inaccurate chirp?" ... so I've decided against putting the error back in for now, in the hope that the validator will choose wisely. |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
Yeah, the next one I found had two pulses the others had missed. Go figure. :-) We still all three got validated. I'm getting a lot of work validating without going to inconclusive so things are looking up I guess. Very few invalids, only two or three but they've cleared already. |
tbret Joined: 28 May 99 Posts: 3380 Credit: 296,162,071 RAC: 40 |
Someone might want to take a look at this one: http://setiathome.berkeley.edu/workunit.php?wuid=769169215 What I find interesting about it is: the GT 240 (V6) ended with a -9 on 30 spikes and nothing else; the GTX 570 (V12) ended with -9 after finding 31 pulses, but no spikes. That seems really wrong. EDIT: Same thing exactly: http://setiathome.berkeley.edu/result.php?resultid=1965990051 |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"GTX 570 ended with -9 after finding 31 pulses, but no spikes. V12" And indeed it is. No-one, absolutely no-one, should be running V12, especially on a Fermi (GTX 570). |
tbret Joined: 28 May 99 Posts: 3380 Credit: 296,162,071 RAC: 40 |
Oh, so it's THAT version that's causing trouble on Fermi cards. You know, you read these things and they don't pertain to you directly so you don't remember them as specifically as you should... I'm sorry. That's an old issue and I should have recognized it. I wish there was a way to make that combination stop asking for tasks. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"I wish there was a way to make that combination stop asking for tasks." The good news is that there is :) . That is, the project will move to V7 once it is satisfied the kinks are ironed out. For that there will be a minimum Cuda driver specified, likely Cuda 3.2 capability at this point, for which stock & Opt code will be refined & universally applicable. There will be howls for sure, but fewer inconclusives & no more work issued to legacy applications. |
rob smith Joined: 7 Mar 03 Posts: 22189 Credit: 416,307,556 RAC: 380 |
Some will mourn, others will rejoice, and some won't notice.... I've often pondered the possibility of some form of "auto update", but I have my doubts about the practicalities involved. It might be easier to have an alert when a new STABLE version is out (along the lines of the messages presented by the likes of Adobe at boot/program start), but how do you get this to the fit-and-forget users who are running really old versions of either BOINC or S@H? It would probably need a change to the API, among other things.... Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Tazz Joined: 5 Oct 99 Posts: 137 Credit: 34,342,390 RAC: 0 |
"Not sure if it's related to x39c (or x38g) or not; in the last 12 hours I've had three BSODs." Well, Memtest said my RAM was OK after four passes. SpinRite said that every part of my hdd could be written to and read from with no errors. Heat isn't an issue. I don't think corrupted files are to blame because I dropped back to x32f and driver 267.59 two days ago and all is running fine now. After dropping back, the OS was still hanging for a couple of seconds every now and then. It was worse when I was on a Flash-heavy webpage. I downloaded and reinstalled Flash and the hang-ups went away. I just stepped up to x38g but kept the 267.59 drivers; I'll let this run for a week or so, then step up to the next newer drivers (not the newest) and see how that goes. |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
Is it possible to show a banner on the BOINC Manager screens if an NVIDIA Fermi card, as reported by BOINC, is running rev. x32f instead of rev. x38g (and to download this file automatically to (ab)users of 400 & 500 series and other Fermi cards)? Probably too complicated for maybe a few thousand users, maybe fewer. They don't represent the typical set-and-forget crowd, otherwise they probably would not have known those apps existed or where to find them. But whether it is or isn't a real problem, once the 7th version is in use, those who run stock can notice the update, or not notice it at all. End of story, for those? I still have to update 1 rig, an HP Pavilion, C2QUAD+GTS250; it has an old driver, and the driver has to support CUDA 2.3 (minimum). My X9650 @ 3.51 400MHz DDR2 (FSB=1600MHz)+GTX480 is running x38f. The I7-2600(HT)+2x EAH5870s is running x38f, doing a lot of Astropulse on CPU but more on the ATI GPUs (2x2). (Still one rig down :( ) (Just got back from Nijmegen, which was hit by a heavy thunderstorm, with hail the size of an egg or bigger; the car I rented looked like it had been 'stoned'. Never seen so much lightning in 2 hours, >350.) Also had to wait a few hours because of a 3x 380 kV power line and 5 big trees; no trains, diesel-electric included, because a strike was going on too... Ehh, sorry, drifting off topic. |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
"Not sure if it's related to x39c (or x38g) or not; in the last 12 hours I've had three BSODs." In the past I updated FlashPlayer and the Antivir tool, and after that the machine froze about every 12 hours. Only switching it off via the button on the PC case gave me back control of the machine, so I needed to reboot every 8 hours to prevent the freeze. The next update of FlashPlayer and the Antivir tool solved the problem and the machine never froze again. - Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. - |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
"Is it possible to Show A Banner on the BOINC Manager screens, if a" This did not happen after we saw the GTX4xx+ and CUDA V12 combinations (only errors). There was also no other way to inform users or solve the problem. I still write PMs when I see wingmen with this combination. |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
As Sutaru said, the problem right now isn't with the x32f, it still runs okay; the problem is with people that found the older V12 app from Raistmer and then upgraded their equipment to the new Fermi cards without changing to the new apps that can run on them. I too have sent many PMs trying to get these guys' attention, but most don't have PMs turned on or are just ignoring them. I think I've only had one person actually reply. |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
"As Sutaru said, the problem right now isn't with the x32f, it still runs okay; the problem is with people that found the older V12 app from Raistmer and then upgraded their equipment to the new Fermi cards without changing to the new apps that can run on them. I too have sent many PMs trying to get these guys' attention, but most don't have PMs turned on or are just ignoring them. I think I've only had one person actually reply." Quite a lot of people 'run' SETI@home or other projects and never* take a look at any of the forums where BOINC is involved. *Lurking, maybe ;-) Quite a lot are anonymous and have their computers hidden, at home, at work or at school. (Hope they've asked for permission...) Good to see so many people making sure no faulty results end up in the scientific data! |
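
For the workunit tbret flagged (wuid 769169215), one rough way to see how far apart two results are is to compare the number of signals reported per type. The Python sketch below only illustrates that idea: the function name `count_diff` and the dictionary layout are invented for this example, and it is not the SETI@home validator, whose actual comparison rules are not described in this thread. The counts are the ones quoted in the thread.

```python
# Illustration only: compare per-type signal counts from two results.
# `count_diff` and the dict layout are hypothetical, not project code.

SIGNAL_TYPES = ("spike", "gaussian", "pulse", "triplet")


def count_diff(a, b):
    """Return {signal_type: (count_a, count_b)} for each type whose counts differ."""
    return {
        t: (a.get(t, 0), b.get(t, 0))
        for t in SIGNAL_TYPES
        if a.get(t, 0) != b.get(t, 0)
    }


# Counts as reported in the thread for workunit 769169215.
gt240_v6 = {"spike": 30}     # -9 overflow on 30 spikes, nothing else
gtx570_v12 = {"pulse": 31}   # -9 overflow on 31 pulses, no spikes

diff = count_diff(gt240_v6, gtx570_v12)
for signal_type, (mine, wingman) in diff.items():
    print(f"{signal_type}: {mine} vs {wingman}")
```

Two results that disagree on every reported signal type, as these do, are the kind of pair worth checking by hand against the wingman's application version.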