Lots of invalids!!

Message boards : Number crunching : Lots of invalids!!


Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1401871 - Posted: 11 Aug 2013, 16:36:06 UTC

I seem to have a lot of invalids on my i7/930 machine, some of which seem legitimate because of the number of spikes, etc. The ones that worry me are the -9 overflows, where I see this:

SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected equals the storage space allocated.


What can I do to eliminate this kind of error, if possible? The device is an EVGA GTX460SE with 1 GB, running 2 cuda_42 tasks at a time. Machine temps are in the low 60s °C, and GPU temps stay around 55 °C. It's only using a max of approx. 558 MB VRAM (56%).


I don't buy computers, I build them!!
ID: 1401871
Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1401880 - Posted: 11 Aug 2013, 16:56:01 UTC

-9 isn't an error in particular.
Just too many signals found (more than 30).
If there's no issue on your host, those get validated as well.
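A minimal sketch of what Mike describes, assuming the cap of 30 reported signals he mentions and that the overflow triggers once the count reaches that cap (the stderr note says the number detected "equals the storage space allocated"). `MAX_SIGNALS`, `SignalCounts` and `classify_result` are hypothetical names, not taken from the SETI@home source. The point is that a -9 is an early, clean stop rather than a crash.

```python
from dataclasses import dataclass

# Hypothetical cap, taken from the "more than 30" figure quoted above.
MAX_SIGNALS = 30

@dataclass
class SignalCounts:
    spikes: int = 0
    autocorrs: int = 0
    pulses: int = 0
    triplets: int = 0
    gaussians: int = 0

    def total(self) -> int:
        return self.spikes + self.autocorrs + self.pulses + self.triplets + self.gaussians

def classify_result(counts: SignalCounts) -> str:
    """Label a finished task the way the stderr message reads (illustrative only)."""
    if counts.total() >= MAX_SIGNALS:
        # Signal storage is full: the task stops early and reports
        # -9 result_overflow, but the result is still returned for validation.
        return "-9 result_overflow"
    return "normal"

print(classify_result(SignalCounts(spikes=30)))
print(classify_result(SignalCounts(pulses=3, triplets=5, gaussians=4)))
```

Whether such a result validates then depends on the wingman finding the same signals, which is why a host that overflows when its wingmen don't is worth a closer look.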



With each crime and every kindness we birth our future.
ID: 1401880
Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1401882 - Posted: 11 Aug 2013, 17:02:29 UTC - in response to Message 1401880.  

-9 isn't an error in particular.
Just too many signals found (more than 30).
If there's no issue on your host, those get validated as well.


Thanks Mike. It's just that I hate errors of any kind and try to avoid them if possible.


I don't buy computers, I build them!!
ID: 1401882
Juha
Volunteer tester

Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1401934 - Posted: 11 Aug 2013, 20:42:34 UTC
Last modified: 11 Aug 2013, 20:47:49 UTC

Of the 20 first invalids listed, in workunits

1297202646, 1297164790, 1297153620, 1294196873
your results went straight to invalid. Possibly a server issue. There was another thread earlier this week about a similar incident.


1294190796, 1294122191, 1294055636, 1294012161, 1293916702, 1293889682, 1293841533, 1293837004, 1293832729, 1293830488, 1293966600, 1293955942
you returned a -9 result whereas your wingmen didn't.


1294181079, 1294007554, 1293877285
you returned a result that had more autocorrelation signals than your wingmen found.


1293881736
For this one I can't tell anymore. Might have been a server issue.

I'm no expert, but out of 648 total tasks you have 122 validation inconclusives and 45 invalids. I'd say that's a bit much. Can't help you fix it, though, sorry.
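Juha's "a bit much" is easier to judge as percentages. This is only arithmetic on the three counts quoted above (648 tasks, 122 inconclusive, 45 invalid); `share` is a throwaway helper, not anything from BOINC.

```python
def share(part: int, total: int) -> float:
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100.0 * part / total, 1)

TOTAL_TASKS = 648
inconclusive = share(122, TOTAL_TASKS)  # validation inconclusive
invalid = share(45, TOTAL_TASKS)        # marked invalid
print(f"inconclusive: {inconclusive}%  invalid: {invalid}%")
```

Roughly one task in five inconclusive and about one in fourteen invalid is what makes the host stand out.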
ID: 1401934
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1402949 - Posted: 14 Aug 2013, 7:36:38 UTC - in response to Message 1401934.  

Of the 20 first invalids listed, in workunits

1297202646, 1297164790, 1297153620, 1294196873
your results went straight to invalid. Possibly a server issue. There was another thread earlier this week about a similar incident.

I just got one similar to those - 1297694454. Our stderr results tables are different, but _0 was marked "Invalid" while my _1 was marked "Inconclusive". Both are CPU jobs.
Donald
Infernal Optimist / Submariner, retired
ID: 1402949
Juha
Volunteer tester

Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1403119 - Posted: 14 Aug 2013, 17:04:34 UTC - in response to Message 1402949.  

Of the 20 first invalids listed, in workunits

1297202646, 1297164790, 1297153620, 1294196873
your results went straight to invalid. Possibly a server issue. There was another thread earlier this week about a similar incident.

I just got one similar to those - 1297694454. Our stderr results tables are different, but _0 was marked "Invalid" while my _1 was marked "Inconclusive". Both are CPU jobs.


From your wingman's host:

Task 3111831279 (11se08aa.21408.8661.13.12.247_0) reported 11 Aug 2013, 19:31:05 UTC.

<core_client_version>6.6.31</core_client_version>
<![CDATA[
<stderr_txt>
Restarted at 100.00 percent.

Flopcounter: 67276640950494.930000

Spike count: 0
Autocorr count: 0
Pulse count: 3
Triplet count: 5
Gaussian count: 4
14:26:31 (2968): called boinc_finish

</stderr_txt>
]]>

And Task 3100532045 (27mr08ad.21861.51545.16.12.204.vlar_0) reported 3 Aug 2013, 4:04:38 UTC.

<core_client_version>6.6.31</core_client_version>
<![CDATA[
<stderr_txt>
Restarted at 100.00 percent.

Flopcounter: 67276640950494.930000

Spike count: 0
Autocorr count: 0
Pulse count: 3
Triplet count: 5
Gaussian count: 4
23:00:17 (23452): called boinc_finish

</stderr_txt>
]]>
Spot the differences! There are more of those on his task list. I'd guess either some of the slot directories are somehow broken or the filesystem is corrupted.
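Juha's side-by-side check can be automated. A minimal sketch, assuming you have copied each task's stderr text from the website into a dictionary by hand; `fingerprint` and `find_repeats` are hypothetical helpers and no BOINC API is involved. Two different workunits reporting an identical Flopcounter and identical signal counts, as in the pair above, strongly suggests the output was not produced fresh for each task.

```python
import re
from collections import defaultdict

# Summary lines that should almost never repeat exactly across different workunits.
FIELDS = ("Flopcounter", "Spike count", "Autocorr count",
          "Pulse count", "Triplet count", "Gaussian count")

def fingerprint(stderr: str) -> tuple:
    """Extract the summary values (as strings) from a task's stderr text."""
    values = []
    for field in FIELDS:
        match = re.search(rf"^{field}:\s*(\S+)", stderr, re.MULTILINE)
        values.append(match.group(1) if match else None)
    return tuple(values)

def find_repeats(tasks: dict) -> list:
    """Return groups of task names whose stderr summaries are identical."""
    groups = defaultdict(list)
    for name, stderr in tasks.items():
        fp = fingerprint(stderr)
        if all(v is None for v in fp):
            continue  # no summary found (truncated or missing stderr): skip it
        groups[fp].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]

# The summary shared by both tasks quoted above.
STDERR = """Restarted at 100.00 percent.

Flopcounter: 67276640950494.930000

Spike count: 0
Autocorr count: 0
Pulse count: 3
Triplet count: 5
Gaussian count: 4
"""
tasks = {
    "11se08aa.21408.8661.13.12.247_0": STDERR,
    "27mr08ad.21861.51545.16.12.204.vlar_0": STDERR,
}
print(find_repeats(tasks))
```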
ID: 1403119
_
Joined: 15 Nov 12
Posts: 299
Credit: 9,037,618
RAC: 0
United States
Message 1403298 - Posted: 15 Aug 2013, 3:25:46 UTC

Hey guys,

This is a new one. On one of my laptops, the CPU crunched an invalid MB task. Here. Don't think that has ever happened before.

Nothing too alarming I suppose, since it is only one, but thought I would add some personal evidence to the thread.
ID: 1403298
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1403336 - Posted: 15 Aug 2013, 6:55:04 UTC - in response to Message 1403298.  

Hey guys,

This is a new one. On one of my laptops, the CPU crunched an invalid MB task. Here. Don't think that has ever happened before.

Nothing too alarming I suppose, since it is only one, but thought I would add some personal evidence to the thread.

That laptop only sent back half of the required stderr output, which is why it went straight to being invalid.

You can get them once in a while, but if this is happening a lot then you may want to check the system out (overheating, a bad background program, maybe a program that you started at the time, system memory, ...).

Cheers.
ID: 1403336
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1403370 - Posted: 15 Aug 2013, 9:04:30 UTC - in response to Message 1403336.  

Hey guys,

This is a new one. On one of my laptops, the CPU crunched an invalid MB task. Here. Don't think that has ever happened before.

Nothing too alarming I suppose, since it is only one, but thought I would add some personal evidence to the thread.

That laptop only sent back half of the required stderr output, which is why it went straight to being invalid.

You can get them once in a while, but if this is happening a lot then you may want to check the system out (overheating, a bad background program, maybe a program that you started at the time, system memory, ...).

Cheers.

It doesn't matter whether stderr is truncated or even present at all; that's for human eyes.
The validator also checks the integrity of the result file, and if that is corrupted, the result goes straight to invalid.

If it's a one-off disregard.
If it happens more frequently, time to check the integrity of the HD or check for other reasons the result files are being mangled.
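A simplified model of the behaviour William describes, assuming a quorum of two matching results per workunit. This is an illustration, not BOINC's actual validator code: `Result`, `validate` and the `signature` field are hypothetical. A result whose file fails the integrity check is rejected outright; if the remaining results cannot form a quorum, they are left inconclusive and another copy of the workunit has to be sent.

```python
from dataclasses import dataclass
from typing import Optional

MIN_QUORUM = 2  # assumed: two agreeing results needed to validate a workunit

@dataclass
class Result:
    task_id: int
    file_ok: bool               # did the result file pass the integrity check?
    signature: Optional[tuple]  # hypothetical summary used to compare results
    state: str = "pending"

def validate(results: list) -> bool:
    """Simplified quorum check. Returns True if another task must be sent."""
    usable = []
    for r in results:
        if not r.file_ok:
            r.state = "invalid"  # unreadable output is rejected immediately
        else:
            usable.append(r)
    # Look for MIN_QUORUM usable results that agree with each other.
    for r in usable:
        matches = [o for o in usable if o.signature == r.signature]
        if len(matches) >= MIN_QUORUM:
            for o in usable:
                o.state = "valid" if o.signature == r.signature else "invalid"
            return False
    # No consensus yet: keep the survivors and wait for another wingman.
    for r in usable:
        r.state = "inconclusive"
    return True

corrupt = Result(1, file_ok=False, signature=None)
good = Result(2, file_ok=True, signature=(0, 0, 3, 5, 4))
print(validate([corrupt, good]), corrupt.state, good.state)
```

In this model a one-off mangled file costs only that task; the wingman's result simply waits for a third copy.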
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1403370
Len
Joined: 15 Mar 10
Posts: 52
Credit: 11,725,173
RAC: 86
United Kingdom
Message 1403638 - Posted: 15 Aug 2013, 20:47:56 UTC - in response to Message 1403370.  

I am suddenly getting lots (700+) of invalid results. I have never gotten invalid results before. Not even many errors on this particular host before. (http://setiathome.berkeley.edu/show_host_detail.php?hostid=6647938)

Do any of you superior minds have any clue what has gone wrong?

Len
I think I am. Therefore I am. I think.
ID: 1403638
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1403680 - Posted: 15 Aug 2013, 22:24:15 UTC - in response to Message 1403638.  
Last modified: 15 Aug 2013, 22:25:19 UTC

I am suddenly getting lots (700+) of invalid results. I have never gotten invalid results before. Not even many errors on this particular host before. (http://setiathome.berkeley.edu/show_host_detail.php?hostid=6647938)

Do any of you superior minds have any clue what has gone wrong?

Len

Check your cards for overheating as they are just throwing -9 overflows.

Cheers.
ID: 1403680
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1403770 - Posted: 16 Aug 2013, 5:49:36 UTC - in response to Message 1403638.  

I am suddenly getting lots (700+) of invalid results. I have never gotten invalid results before. Not even many errors on this particular host before. (http://setiathome.berkeley.edu/show_host_detail.php?hostid=6647938)

Do any of you superior minds have any clue what has gone wrong?

Len


No superior mind, here.

Last time I had that happen that wasn't a driver crash or heat or power supply issue, I found I had to turn the computer off (power down) and start it from "off". It never happened again. Restarting warm didn't help.
ID: 1403770
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1404064 - Posted: 16 Aug 2013, 23:38:16 UTC

How about this one?
http://setiathome.berkeley.edu/workunit.php?wuid=1300239737 shows at the time of writing this post:

Task	Computer	Sent	Time reported or deadline	Status	Run time (sec)	CPU time (sec)	Credit	Application
3117169326	7040524	15 Aug 2013, 21:52:22 UTC	16 Aug 2013, 7:27:12 UTC	Completed, marked as invalid	2,538.31	256.86	0.00	SETI@home v7 v7.00 (cuda32)
3117169327	6065655	15 Aug 2013, 21:52:23 UTC	16 Aug 2013, 22:55:12 UTC	Completed, validation inconclusive	2,626.30	2,321.33	pending	SETI@home v7 Anonymous platform (CPU)
3118906781	---	---	---	Unsent	---	---	---	

Here's the conundrum: His shows as invalid, mine shows as inconclusive. Shouldn't mine be set back to pending validation, awaiting the third one to be sent and returned? :-)
ID: 1404064
Len
Joined: 15 Mar 10
Posts: 52
Credit: 11,725,173
RAC: 86
United Kingdom
Message 1404273 - Posted: 17 Aug 2013, 11:36:29 UTC

I think I cracked it with your clues to help.

I have two cards, because the old one still works and I have a spare slot, so I left it in. Another project, POEM, couldn't handle the fact that there were two cards, so I set the config to disallow one for POEM, which fixed their problem.

The upshot of that setting was that SETI preferentially used the other one, which it has used for years. That card runs at a higher temperature, being the older, less efficient one. It looks like SETI was producing errors with that card only. (All the invalids I checked were done on the older card.) I have now told POEM to use only the old card, and their WUs don't seem to mind the slightly higher temperature.

My invalid list is now falling, albeit a little less rapidly than it rose. If it doesn't drop away fully, I will stop SETI from using the old card even when POEM has no work. It's pointless using it if so many WUs return invalid. It may be a 'feature' of having two cards with different specs running at once on a project.

With the new card warming the back end of the old card it probably runs slightly hotter than when it was running solo. It now only gets used for GPU crunching anyway. All my own graphic needs are fulfilled with the newer card. Neither card is even approaching the manufacturer's recommended high end of the operating temperature. The case has very good cooling and the CPU is water-cooled.
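Len's check (every invalid he looked at ran on the older card) amounts to grouping outcomes by device. A minimal sketch, assuming you can tell from each task's stderr which card it ran on and note it by hand alongside the task's status; `invalid_rate_by_device` is a hypothetical helper, not a BOINC tool.

```python
from collections import Counter

def invalid_rate_by_device(tasks):
    """tasks: iterable of (device_name, outcome) pairs, where outcome is a
    status string such as "valid" or "invalid". Returns {device: percent invalid}."""
    totals = Counter()
    invalids = Counter()
    for device, outcome in tasks:
        totals[device] += 1
        if outcome == "invalid":
            invalids[device] += 1
    return {dev: round(100.0 * invalids[dev] / totals[dev], 1) for dev in totals}
```

Feed it the pairs for a few dozen recent tasks; if one card's rate is far above the other's, that card (or its cooling) is the one to exclude or investigate.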


I think I am. Therefore I am. I think.
ID: 1404273



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.