Lots of invalids!!


Message boards : Number crunching : Lots of invalids!!

Cliff Harding (Project donor)
Volunteer tester
Joined: 18 Aug 99
Posts: 1005
Credit: 52,909,890
RAC: 42,442
United States
Message 1401871 - Posted: 11 Aug 2013, 16:36:06 UTC

I seem to have a lot of invalids on my i7/930 machine. Some seem to be valid because of number spikes, etc. The ones that worry me are the -9 overflows, where I see this:

SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected equals the storage space allocated.


What can I do to eliminate this kind of error, if possible? The device is an EVGA GTX460SE with 1 GB running 2 cuda_42 tasks at a time. Machine temps are in the low 60s °C, and the GPU temp stays around 55 °C. It's only using a max of approx. 558 MB of VRAM (56%).
____________


I don't buy computers, I build them!!

Mike (Project donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 24484
Credit: 33,819,706
RAC: 24,478
Germany
Message 1401880 - Posted: 11 Aug 2013, 16:56:01 UTC

-9 isn't an error in particular.
It just means too many signals were found (more than 30).
If there is no issue on your host, those get validated as well.
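
As a rough illustration of the point above, the sketch below adds up the per-type signal counts from a task's stderr output and flags a likely overflow. The limit of 30 comes from the remark above; the function names, and the assumption that stderr lists counts as `Name count: N` lines (as in the excerpts later in this thread), are illustrative only and not the project's actual logic.

```python
import re

# Assumed limit, taken from the "more than 30" remark in the post above.
MAX_SIGNALS = 30

def total_signals(stderr_text):
    """Sum every 'Xxx count: N' line found in a stderr excerpt."""
    counts = re.findall(r"^\s*\w+ count:\s*(\d+)\s*$", stderr_text, re.MULTILINE)
    return sum(int(n) for n in counts)

def looks_like_overflow(stderr_text, limit=MAX_SIGNALS):
    """True when the reported signals reach the assumed storage limit."""
    return total_signals(stderr_text) >= limit

sample = """Spike count: 0
Autocorr count: 0
Pulse count: 3
Triplet count: 5
Gaussian count: 4
"""
print(total_signals(sample), looks_like_overflow(sample))
```

A task that reports only a dozen signals, like the sample, is nowhere near the limit. A result that hits the limit while the wingmen's results do not is the pattern worth investigating.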

____________

Cliff Harding (Project donor)
Volunteer tester
Joined: 18 Aug 99
Posts: 1005
Credit: 52,909,890
RAC: 42,442
United States
Message 1401882 - Posted: 11 Aug 2013, 17:02:29 UTC - in response to Message 1401880.

Thanks Mike, it's just that I hate errors of any kind and try to avoid them if possible.
____________


I don't buy computers, I build them!!

Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 184
Credit: 143,295
RAC: 42
Finland
Message 1401934 - Posted: 11 Aug 2013, 20:42:34 UTC
Last modified: 11 Aug 2013, 20:47:49 UTC

Of the first 20 invalids listed, in workunits

1297202646, 1297164790, 1297153620, 1294196873
your results went straight to invalid. Possibly a server issue. There was another thread earlier this week about a similar incident.


1294190796, 1294122191, 1294055636, 1294012161, 1293916702, 1293889682, 1293841533, 1293837004, 1293832729, 1293830488, 1293966600, 1293955942
you returned a -9 result whereas your wingmen didn't.


1294181079, 1294007554, 1293877285
you returned a result that had more autocorrelation signals than your wingmen found.


1293881736
and for this one I can't tell anymore. Might have been a server issue.

I'm no expert, but out of 648 total tasks you have 122 validation inconclusives and 45 invalids. I'd say that's a bit much. I can't help you fix it though, sorry.
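
To put those numbers in proportion, here is a small sketch that turns the counts quoted above into percentages. The helper name `rate` is just for illustration.

```python
def rate(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100.0 * part / total, 1)

total_tasks = 648
inconclusive = 122
invalid = 45

print(rate(inconclusive, total_tasks))            # 18.8
print(rate(invalid, total_tasks))                 # 6.9
print(rate(inconclusive + invalid, total_tasks))  # 25.8
```

So roughly one task in four on that host is either inconclusive or invalid.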

Donald L. Johnson (Project donor)
Joined: 5 Aug 02
Posts: 6257
Credit: 736,260
RAC: 1,167
United States
Message 1402949 - Posted: 14 Aug 2013, 7:36:38 UTC - in response to Message 1401934.

Of the first 20 invalids listed, in workunits

1297202646, 1297164790, 1297153620, 1294196873
your results went straight to invalid. Possibly a server issue. There was another thread earlier this week about a similar incident.

I just got one similar to those - 1297694454. Our stderr results tables are different, but _0 was marked "Invalid" while my _1 was marked "Inconclusive". Both are CPU jobs.
____________
Donald
Infernal Optimist / Submariner, retired

Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 184
Credit: 143,295
RAC: 42
Finland
Message 1403119 - Posted: 14 Aug 2013, 17:04:34 UTC - in response to Message 1402949.

From your wingman's host:

Task 3111831279 (11se08aa.21408.8661.13.12.247_0) reported 11 Aug 2013, 19:31:05 UTC.

<core_client_version>6.6.31</core_client_version>
<![CDATA[
<stderr_txt>
Restarted at 100.00 percent.

Flopcounter: 67276640950494.930000

Spike count: 0
Autocorr count: 0
Pulse count: 3
Triplet count: 5
Gaussian count: 4
14:26:31 (2968): called boinc_finish

</stderr_txt>
]]>

And Task 3100532045 (27mr08ad.21861.51545.16.12.204.vlar_0) reported 3 Aug 2013, 4:04:38 UTC.

<core_client_version>6.6.31</core_client_version>
<![CDATA[
<stderr_txt>
Restarted at 100.00 percent.

Flopcounter: 67276640950494.930000

Spike count: 0
Autocorr count: 0
Pulse count: 3
Triplet count: 5
Gaussian count: 4
23:00:17 (23452): called boinc_finish

</stderr_txt>
]]>
Spot the differences! There are more of those on his task list. I'd guess either some of the slot directories are somehow broken or the filesystem is corrupted.
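
The comparison above can also be done mechanically. The following is a minimal sketch, assuming the stderr excerpts are available as plain strings; it masks the time and PID prefix on the `boinc_finish` line and then diffs what is left. The function names are hypothetical, and this is not part of any BOINC tool.

```python
import difflib
import re

# Matches the leading "HH:MM:SS (pid): " part of the boinc_finish line.
TIMESTAMP_PID = re.compile(r"^\d{2}:\d{2}:\d{2} \(\d+\): ")

def normalize(stderr_text):
    """Strip whitespace, drop blank lines, and mask the time/PID prefix."""
    lines = []
    for line in stderr_text.splitlines():
        line = line.strip()
        if line:
            lines.append(TIMESTAMP_PID.sub("<time> (<pid>): ", line))
    return lines

def stderr_diff(a, b):
    """Return unified-diff lines between two normalized stderr excerpts."""
    return list(difflib.unified_diff(normalize(a), normalize(b), lineterm=""))

task_a = """Restarted at 100.00 percent.
Flopcounter: 67276640950494.930000
Spike count: 0
Autocorr count: 0
Pulse count: 3
Triplet count: 5
Gaussian count: 4
14:26:31 (2968): called boinc_finish
"""
task_b = task_a.replace("14:26:31 (2968)", "23:00:17 (23452)")

print(stderr_diff(task_a, task_b))  # an empty list means the outputs match
```

Two unrelated tasks producing an empty diff, down to the same Flopcounter value, is what makes the output suspicious.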

mherr170
Joined: 15 Nov 12
Posts: 273
Credit: 6,670,086
RAC: 10,308
United States
Message 1403298 - Posted: 15 Aug 2013, 3:25:46 UTC

Hey guys,

This is a new one. On one of my laptops, the CPU crunched an invalid MB. Here. I don't think that has ever happened before.

Nothing too alarming I suppose, since it is only one, but thought I would add some personal evidence to the thread.

Wiggo
Joined: 24 Jan 00
Posts: 7323
Credit: 96,733,440
RAC: 68,231
Australia
Message 1403336 - Posted: 15 Aug 2013, 6:55:04 UTC - in response to Message 1403298.

That laptop only sent half of the required Stderr output, which is why it went straight to being invalid.

You can get them once in a while, but if this is happening a lot then you may want to check the system out (overheating, a bad background program, maybe a program that you started at the time, system memory, ...).

Cheers.

William (Project donor)
Volunteer tester
Joined: 14 Feb 13
Posts: 1610
Credit: 9,469,907
RAC: 44
Message 1403370 - Posted: 15 Aug 2013, 9:04:30 UTC - in response to Message 1403336.

It doesn't matter whether stderr is truncated or present at all; that's for human eyes.
The validator checks the integrity of the result file too, and if that is corrupted, the result will also go straight to invalid.

If it's a one-off, disregard it.
If it happens more frequently, it's time to check the integrity of the HD or look for other reasons the result files are being mangled.
____________
A person who won't read has no advantage over one who can't read. (Mark Twain)

Len
Joined: 15 Mar 10
Posts: 53
Credit: 1,980,454
RAC: 1,257
United Kingdom
Message 1403638 - Posted: 15 Aug 2013, 20:47:56 UTC - in response to Message 1403370.

I am suddenly getting lots (700+) of invalid results. I have never gotten invalid results before. Not even many errors on this particular host before. (http://setiathome.berkeley.edu/show_host_detail.php?hostid=6647938)

Do any of you superior minds have any clue what has gone wrong?

Len
____________
I think I am. Therefore I am. I think.

Wiggo
Joined: 24 Jan 00
Posts: 7323
Credit: 96,733,440
RAC: 68,231
Australia
Message 1403680 - Posted: 15 Aug 2013, 22:24:15 UTC - in response to Message 1403638.
Last modified: 15 Aug 2013, 22:25:19 UTC

Check your cards for overheating as they are just throwing -9 overflows.

Cheers.

tbret (Project donor)
Volunteer tester
Joined: 28 May 99
Posts: 2860
Credit: 215,033,536
RAC: 176,340
United States
Message 1403770 - Posted: 16 Aug 2013, 5:49:36 UTC - in response to Message 1403638.

No superior mind, here.

Last time I had that happen that wasn't a driver crash or heat or power supply issue, I found I had to turn the computer off (power down) and start it from "off". It never happened again. Restarting warm didn't help.

Ageless
Joined: 9 Jun 99
Posts: 12324
Credit: 2,627,879
RAC: 986
Netherlands
Message 1404064 - Posted: 16 Aug 2013, 23:38:16 UTC

How about this one?
http://setiathome.berkeley.edu/workunit.php?wuid=1300239737 shows at the time of writing this post:

3117169326 | 7040524 | 15 Aug 2013, 21:52:22 UTC | 16 Aug 2013, 7:27:12 UTC | Completed, marked as invalid | 2,538.31 | 256.86 | 0.00 | SETI@home v7 v7.00 (cuda32)
3117169327 | 6065655 | 15 Aug 2013, 21:52:23 UTC | 16 Aug 2013, 22:55:12 UTC | Completed, validation inconclusive | 2,626.30 | 2,321.33 | pending | SETI@home v7 Anonymous platform (CPU)
3118906781 | --- | --- | --- | Unsent | --- | --- | ---

Here's the conundrum: His shows as invalid, mine shows as inconclusive. Shouldn't mine be set back to pending validation, awaiting the third one to be sent and returned? :-)
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

Len
Joined: 15 Mar 10
Posts: 53
Credit: 1,980,454
RAC: 1,257
United Kingdom
Message 1404273 - Posted: 17 Aug 2013, 11:36:29 UTC

I think I cracked it, with your clues to help.

I have two cards, because the old one still works and I have a spare slot, so I left it in. Another project, POEM, couldn't handle the fact that there were two cards, so I set the config to disallow one for POEM, which fixed their problem.

The upshot of that setting was that SETI preferentially used the other one, which it has used for years. That card runs at a higher temperature, being the older, less efficient one. It looks like SETI was finding errors with that card only. (All the invalids I checked were done on the older card.) I have now told POEM to use the old card only, and their WUs don't seem to mind the slightly higher temp.

My invalid list is falling now, albeit a little less rapidly than it rose. If it doesn't drop away fully, I will stop SETI from using the old card even when POEM has no work. It's pointless using it if so many WUs return invalid. It may be a 'feature' of having two different spec cards running at once on a project.

With the new card warming the back end of the old card, it probably runs slightly hotter than when it was running solo. It now only gets used for GPU crunching anyway. All my own graphics needs are fulfilled by the newer card. Neither card is even approaching the high end of the manufacturer's recommended operating temperature. The case has very good cooling and the CPU is water-cooled.


____________
I think I am. Therefore I am. I think.

Copyright © 2014 University of California