Validation inconclusive with V0.38g installer

tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1121674 - Posted: 26 Jun 2011, 9:53:00 UTC - in response to Message 1121663.  

Hmmm, I just noticed an inconclusive workunit where BOTH results were crunched using x38g:



I noticed one the other day that found the same spikes and Gaussians (no spikes or triplets) running two different versions of the CUDA app (38g and 23) that would not validate.

Weird.
ID: 1121674 · Report as offensive
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1121719 - Posted: 26 Jun 2011, 14:25:42 UTC - in response to Message 1121663.  
Last modified: 26 Jun 2011, 14:30:16 UTC

Hmmm, I just noticed an inconclusive workunit where BOTH results were crunched using x38g:

http://setiathome.berkeley.edu/workunit.php?wuid=766872407

Flopcounter for both results is identical, but the spike count is different (4,1,0,0) vs (3,1,0,0).

Can this be caused by the different GPUs being used (GTX 580 vs GT 240)?

Tom




Or, multiple WUs ran on NVIDIA's 400/500 series?
I've yet to see errors coming from this. GPUgrid WUs error out
if I set my 480 higher than 900MHz core freq.

And some MB WUs on SETI also, but that starts at >950MHz, which also
could be temp. related, IMO. (Found triplets in a row.)
Used to see quite a lot on the 9800GTX+ card, which isn't used anymore, atm.; only once on the GTX480. (Also heat.)

With local temps (outside) going >25C and >30C for tomorrow, I'll switch all rigs
off for one or two days!
I'll be off too.........
ID: 1121719 · Report as offensive
Profile Tazz
Volunteer tester
Joined: 5 Oct 99
Posts: 137
Credit: 34,342,390
RAC: 0
Canada
Message 1121971 - Posted: 27 Jun 2011, 13:31:27 UTC - in response to Message 1121719.  
Last modified: 27 Jun 2011, 13:36:39 UTC

Not sure if it's related to x39c (or x38g) or not; in the last 12 hours I've had three BSODs.

There's a lot of things that have happened too. Power dimming then clicking off for a few seconds then back on (lightning storm), I installed the x39c client, I updated the Nvidia drivers from 267.59 to 275.33. Not sure what's gone wrong, but I'm trying to get to the bottom of it. I have a hdd image from before I touched anything to fall back on.

CPU temp = 58-61C, GPU temp = 50C, 2 WUs at a time, GPU load = 90-95%.

I'll post if/when I find out what's going on.
</Tazz>
ID: 1121971 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1122038 - Posted: 27 Jun 2011, 16:29:13 UTC

Ho-hum, another day of running the 39c and nothing to report. Running two at a time seems to have cured my problems. No slowdowns, no bluescreens, just my RAC climbing slowly up towards 11k.

GPU temp 67c, fan speed 70%, slightly OCed to 900/1800/1804. Memory usage 81-90%/ 728-740MB. My little GTS 450 is happy now. :-)


PROUD MEMBER OF Team Starfire World BOINC
ID: 1122038 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1122196 - Posted: 27 Jun 2011, 22:34:33 UTC - in response to Message 1121674.  
Last modified: 27 Jun 2011, 22:34:49 UTC

I noticed one the other day that found the same spikes and Gaussians (no spikes or triplets) running two different versions of the CUDA app (38g and 23) that would not validate.

Weird.


Those kinds are worth looking at, comparing the conditions on the local host and the wingman in detail (e.g. whether the wingman has invalids/errors), and watching the task go through to either validate or be marked invalid.

Certainly there are known instances where the older apps can gang up on the newer ones (fortunately relatively rare so far), but we are also seeing cases crop up where even CPU wingmen just report flaky results (detected by rerunning the task on CPU 6.03 under bench conditions and getting results matching the newer GPU app). That's not the way around I'm used to, as I'm more inclined to expect flaky GPU results from either direction, so it's slightly surprising that the picture isn't always immediately clear.
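
For anyone comparing two results by hand, here is a minimal Python sketch of that first step, using the (4,1,0,0) vs (3,1,0,0) counts quoted earlier in the thread. The `SignalCounts` type and `count_differences` helper are hypothetical names, and the field order after spikes (pulses, triplets, Gaussians) is an assumption. It only compares counts and is not meant to reproduce what the project validator does.

```python
from typing import Dict, NamedTuple, Tuple


class SignalCounts(NamedTuple):
    """Reported signal counts for one result (field order after spikes is assumed)."""
    spikes: int
    pulses: int
    triplets: int
    gaussians: int


def count_differences(a: SignalCounts, b: SignalCounts) -> Dict[str, Tuple[int, int]]:
    """Return the signal types whose counts differ, with both values."""
    return {
        name: (x, y)
        for name, x, y in zip(SignalCounts._fields, a, b)
        if x != y
    }


# Counts quoted in the thread for the two x38g results.
result_a = SignalCounts(4, 1, 0, 0)
result_b = SignalCounts(3, 1, 0, 0)
print(count_differences(result_a, result_b))  # {'spikes': (4, 3)}
```

An empty dict means the counts agree, which still says nothing about whether the individual signals match.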

Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1122196 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1122209 - Posted: 27 Jun 2011, 23:07:34 UTC - in response to Message 1122196.  

Something like this one where I found a pulse the other two didn't but we all three still got credit? http://setiathome.berkeley.edu/workunit.php?wuid=769599295


PROUD MEMBER OF Team Starfire World BOINC
ID: 1122209 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1122214 - Posted: 27 Jun 2011, 23:15:35 UTC - in response to Message 1122209.  
Last modified: 27 Jun 2011, 23:18:58 UTC

Something like this one where I found a pulse the other two didn't but we all three still got credit? http://setiathome.berkeley.edu/workunit.php?wuid=769599295


I reckon they missed that one pulse due to the inaccurate chirp in 6.10, so it fits my expected patterns. The hilarious part about this 'circus' is that if there had been a CPU wingman thrown in there, then it likely would have agreed with you on that pulse, since the CPU chirp is highly accurate. At this stage I think that the noticeable slight variations in results will continue until V7 release.

I have considered introducing random error back into the results to more closely match legacy results, but then the question becomes "Which inaccurate build do you try to match? Legacy CPU apps with inaccurate spikes? Or legacy GPU apps with inaccurate chirp?" ... so I've decided against putting the error back in for now, in the hope that the validator will choose wisely.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1122214 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1122216 - Posted: 27 Jun 2011, 23:19:58 UTC - in response to Message 1122214.  
Last modified: 27 Jun 2011, 23:26:54 UTC

Yeah, the next one I found had two pulses the others had missed. Go figure. :-) We still all three got validated.

I'm getting a lot of work validating without going to inconclusive so things are looking up I guess. Very few invalids, only two or three but they've cleared already.


PROUD MEMBER OF Team Starfire World BOINC
ID: 1122216 · Report as offensive
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1122297 - Posted: 28 Jun 2011, 3:51:50 UTC
Last modified: 28 Jun 2011, 3:53:37 UTC

Someone might want to take a look at this one:

http://setiathome.berkeley.edu/workunit.php?wuid=769169215

What I find interesting about it is:

GT 240 ended with a -9 on 30 spikes and nothing else. V6

GTX 570 ended with -9 after finding 31 pulses, but no spikes. V12

That seems really wrong.



EDIT: Same thing exactly

http://setiathome.berkeley.edu/result.php?resultid=1965990051
ID: 1122297 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1122298 - Posted: 28 Jun 2011, 3:53:07 UTC - in response to Message 1122297.  

GTX 570 ended with -9 after finding 31 pulses, but no spikes. V12

That seems really wrong.


And indeed it is. No-one, absolutely no-one should be running V12, especially on a Fermi (GTX 570).

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1122298 · Report as offensive
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1122301 - Posted: 28 Jun 2011, 3:57:58 UTC - in response to Message 1122298.  


And indeed it is. No-one, absolutely no-one should be running V12, especially on a Fermi (GTX 570).


Oh, so it's THAT version that's causing trouble on Fermi cards. You know, you read these things and they don't pertain to you directly so you don't remember them as specifically as you should...

I'm sorry. That's an old issue and I should have recognized it.

I wish there was a way to make that combination stop asking for tasks.
ID: 1122301 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1122303 - Posted: 28 Jun 2011, 4:03:44 UTC - in response to Message 1122301.  

I wish there was a way to make that combination stop asking for tasks.


The good news is that there is :). That is, the project will move to V7 once it's satisfied the kinks are ironed out. For that there will be a minimum Cuda driver specified, likely Cuda 3.2 capability at this point, for which stock & Opt code will be refined & universally applicable. There will be howls for sure, but fewer inconclusives & no more work issued to legacy applications.

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1122303 · Report as offensive
rob smith
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1122408 - Posted: 28 Jun 2011, 9:48:58 UTC

Some will mourn, others will rejoice, and some won't notice....
I've often pondered the possibility of some form of "auto update", but I have my doubts about the practicalities involved. It might be easier to have an alert for a new STABLE version being out (along the lines of the messages presented by the likes of Adobe at boot/program start), but how do you get this to the fit-and-forget crowd who are running really old versions of either BOINC or the S@H app? It would probably need a change to the API, among other things....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1122408 · Report as offensive
Profile Tazz
Volunteer tester
Joined: 5 Oct 99
Posts: 137
Credit: 34,342,390
RAC: 0
Canada
Message 1122676 - Posted: 29 Jun 2011, 13:01:15 UTC - in response to Message 1121971.  

Not sure if it's related to x39c (or x38g) or not; in the last 12 hours I've had three BSODs.

There's a lot of things that have happened too. Power dimming then clicking off for a few seconds then back on (lightning storm), I installed the x39c client, I updated the Nvidia drivers from 267.59 to 275.33. Not sure what's gone wrong, but I'm trying to get to the bottom of it. I have a hdd image from before I touched anything to fall back on.

CPU temp = 58-61C, GPU temp = 50C, 2 WUs at a time, GPU load = 90-95%.

I'll post if/when I find out what's going on.


Well, Memtest said my RAM was OK after four passes. SpinRite said that every part of my hdd could be written to and read from with no errors. Heat isn't an issue. I don't think corrupted files are to blame because I dropped back to x32f and driver 267.59 two days ago and all is running fine now. After dropping back the OS was still hanging for a couple of seconds every now and then. It was worse when I was on a heavy Flash webpage. I downloaded and reinstalled Flash and the hang-ups went away.

I just stepped up to x38g but kept the 267.59 drivers, I'll let this run for a week or so then step up to the next newer drivers (not the newest) and see how that goes.

</Tazz>
ID: 1122676 · Report as offensive
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1122710 - Posted: 29 Jun 2011, 14:41:34 UTC - in response to Message 1122676.  
Last modified: 29 Jun 2011, 14:45:57 UTC

Is it possible to show a banner on the BOINC Manager screens if an
NVIDIA Fermi card, as reported by BOINC, is running rev. x32f (instead of
rev. x38g)?
(And download this file automatically to (ab)users of 400 & 500 series and
other Fermi cards.)


Probably too complicated for maybe a few 1000s of users? Maybe less.
They don't represent the typical Set and Forget crowd, otherwise
they probably would not be aware those apps existed or where to find them.

But if it is(n't a real) problem and the 7th version is used, those who run stock can
notice the update, or not notice it at all. End of story for those?

I still have to update 1 rig, an HP Pavilion, C2QUAD+GTS250; it has an old driver,
and the driver has to support CUDA 2.3 (min.).
My X9650 @ 3.51 400MHz DDR2 (FSB=1600MHz)+GTX480 is running x38f.
The I7-2600(HT)+2x EAH5870s is running x38f.
Doing a lot of Astropulse on CPU, but more on ATI GPUs (2x2).
(Still one rig down :( )

(Just got back from Nijmegen, which was hit by a heavy thunderstorm, with hail
the size of an egg or bigger; the car I rented looked like it was 'stoned'. Never seen so much lightning in 2 hours, >350.) Also had to wait a few hours because of a 3x 380
kV power line and 5 big trees; no trains, diesel-electric included, because a strike
was going on, too...........................

Ehh, sorry, drifting off TOPIC.
ID: 1122710 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1122716 - Posted: 29 Jun 2011, 14:58:42 UTC - in response to Message 1122676.  
Last modified: 29 Jun 2011, 15:17:43 UTC

Not sure if it's related to x39c (or x38g) or not; in the last 12 hours I've had three BSODs.

There's a lot of things that have happened too. Power dimming then clicking off for a few seconds then back on (lightning storm), I installed the x39c client, I updated the Nvidia drivers from 267.59 to 275.33. Not sure what's gone wrong, but I'm trying to get to the bottom of it. I have a hdd image from before I touched anything to fall back on.

CPU temp = 58-61C, GPU temp = 50C, 2 WUs at a time, GPU load = 90-95%.

I'll post if/when I find out what's going on.


Well, Memtest said my RAM was OK after four passes. SpinRite said that every part of my hdd could be written to and read from with no errors. Heat isn't an issue. I don't think corrupted files are to blame because I dropped back to x32f and driver 267.59 two days ago and all is running fine now. After dropping back the OS was still hanging for a couple of seconds every now and then. It was worse when I was on a heavy Flash webpage. I downloaded and reinstalled Flash and the hang-ups went away.

I just stepped up to x38g but kept the 267.59 drivers, I'll let this run for a week or so then step up to the next newer drivers (not the newest) and see how that goes.


In the past I upgraded/updated FlashPlayer and the Antivir tool, and the machine froze about every 12 hours.
Only switching it off via the button on the PC case gave me back control over the machine.
So I needed to reboot every 8 hours to prevent the freeze.

The next upgrade/update of FlashPlayer and the Antivir tool solved the prob and the machine never froze again.


- Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. -
ID: 1122716 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1122722 - Posted: 29 Jun 2011, 15:17:16 UTC - in response to Message 1122710.  

Is it possible to show a banner on the BOINC Manager screens if an
NVIDIA Fermi card, as reported by BOINC, is running rev. x32f (instead of
rev. x38g)?
(And download this file automatically to (ab)users of 400 & 500 series and
other Fermi cards.)


Probably too complicated for maybe a few 1000s of users? Maybe less.
They don't represent the typical Set and Forget crowd, otherwise
they probably would not be aware those apps existed or where to find them.

But if it is(n't a real) problem and the 7th version is used, those who run stock can
notice the update, or not notice it at all. End of story for those?
(...)


This didn't happen after we saw the GTX4xx+ and CUDA V12 combos -> only errors.

There's also no other way to inform people or solve the prob.

I still write PMs if I see wingmen with this combo.


- Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. -
ID: 1122722 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1122732 - Posted: 29 Jun 2011, 15:37:46 UTC - in response to Message 1122722.  

As Sutaru said, the problem right now isn't with the x32f, it still runs okay. The problem is with people who found the older V12 app from Raistmer and then upgraded their equipment to the new Fermi cards without changing to the new apps that can run them. I too have sent many PMs trying to get these guys' attention, but most don't have PMs turned on or are just ignoring them. I think I've only had one person actually reply.
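
If you want a quick way to spot that combination when going through wingmen, something like this rough Python sketch would do. The function name `needs_app_update` is hypothetical, and treating every GeForce GT/GTS/GTX 4xx or 5xx model name as Fermi is only an approximation made for illustration, not an official list.

```python
import re

# Assumption: any GT/GTS/GTX 4xx or 5xx model name counts as a Fermi card.
_FERMI_NAME = re.compile(r"\bGT[SX]?\s*[45]\d{2}\b", re.IGNORECASE)


def needs_app_update(gpu_model: str, app_version: str) -> bool:
    """Flag the V12-on-Fermi combination discussed in this thread."""
    is_v12 = app_version.strip().upper() == "V12"
    return is_v12 and bool(_FERMI_NAME.search(gpu_model))


print(needs_app_update("GeForce GTX 570", "V12"))   # True
print(needs_app_update("GeForce GT 240", "V12"))    # False
print(needs_app_update("GeForce GTX 480", "x38g"))  # False
```

It only tells you who to PM, of course. It does nothing about hosts that keep asking for work.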


PROUD MEMBER OF Team Starfire World BOINC
ID: 1122732 · Report as offensive
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1122736 - Posted: 29 Jun 2011, 16:00:32 UTC - in response to Message 1122732.  

As Sutaru said, the problem right now isn't with the x32f, it still runs okay. The problem is with people who found the older V12 app from Raistmer and then upgraded their equipment to the new Fermi cards without changing to the new apps that can run them. I too have sent many PMs trying to get these guys' attention, but most don't have PMs turned on or are just ignoring them. I think I've only had one person actually reply.


Quite a lot of people 'run' SETI@home or other projects and never* take a look
at all the forums BOINC is involved in.
*Lurking, maybe ;-)

Quite a lot are anonymous and have their computers, not visible,
at home, at work or at school. (Hope they've asked for permission...)

Good to see so many people making sure no faulty results end up
in the scientific data!




ID: 1122736 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.