Lots of invalids!!
Author | Message |
---|---|
Cliff Harding Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
I seem to have a lot of invalids on my i7/930 machine. Some seem to be valid because of number spikes, etc. The ones that worry me are the -9 overflows, where I see this: SETI@Home Informational message -9 result_overflow NOTE: The number of results detected equals the storage space allocated. What can I do to eliminate this kind of error, if possible? The device is an EVGA GTX460SE with 1 GB running 2 cuda_42 tasks at a time. Machine temps are in the low 60s C, GPU temps stay around 55 C. It's only using a max of approx. 558 MB of VRAM (56%). I don't buy computers, I build them!! |
Mike Joined: 17 Feb 01 Posts: 34253 Credit: 79,922,639 RAC: 80 |
-9 isn't an error in particular. It just means too many signals were found (more than 30). If there's no issue on your host, those get validated as well. With each crime and every kindness we birth our future. |
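Mike's description of the -9 overflow can be sketched as a small check on the signal counts a task prints to stderr. This is only an illustration: the limit of 30 is taken from the post above, the names `count_signals` and `is_overflow` are made up here, and the line format is assumed to match the stderr excerpts quoted later in this thread. Whether the real application trips at exactly 30 or only above it is not confirmed by the thread.

```python
import re

# Assumed limit, taken from the "more than 30" remark in the post above.
SIGNAL_LIMIT = 30

# Matches entries such as "Pulse count: 3" in a task's stderr text.
COUNT_RE = re.compile(r"(Spike|Autocorr|Pulse|Triplet|Gaussian) count:\s*(\d+)")

def count_signals(stderr_text):
    """Return a dict mapping each signal type to its reported count."""
    return {name: int(value) for name, value in COUNT_RE.findall(stderr_text)}

def is_overflow(stderr_text, limit=SIGNAL_LIMIT):
    """True when the total number of reported signals reaches the assumed limit."""
    return sum(count_signals(stderr_text).values()) >= limit

sample = ("Spike count: 0 Autocorr count: 0 Pulse count: 3 "
          "Triplet count: 5 Gaussian count: 4")
print(count_signals(sample))
print(is_overflow(sample))
```

A task whose counts add up to the limit would be reported as an overflow by this check; it says nothing about whether the overflow is genuine or caused by faulty hardware.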
Cliff Harding Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
"-9 isn't an error in particular." Thanks Mike, it's just that I hate errors of any kind and try to avoid them if possible. I don't buy computers, I build them!! |
Juha Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
Of the 20 first invalids listed, in workunits 1297202646, 1297164790, 1297153620, 1294196873 your results went straight to invalid. Possibly a server issue. There was another thread earlier this week about a similar incident. In 1294190796, 1294122191, 1294055636, 1294012161, 1293916702, 1293889682, 1293841533, 1293837004, 1293832729, 1293830488, 1293966600, 1293955942 you returned a -9 result whereas your wingmen didn't. In 1294181079, 1294007554, 1293877285 you returned a result that had more autocorrelation signals than what your wingmen found. And for 1293881736 I can't tell anymore. Might have been a server issue. I'm no expert, but out of 648 total tasks you have 122 validation inconclusives and 45 invalids. I'd say that's a bit much. Can't help you fix it though, sorry. |
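To put Juha's "a bit much" into numbers, the figures from the post can be turned into percentages. A minimal sketch; the helper name `share` is invented for this example.

```python
# Figures quoted in the post above.
total_tasks = 648
inconclusive = 122
invalid = 45

def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

print(f"inconclusive: {share(inconclusive, total_tasks)}%")
print(f"invalid: {share(invalid, total_tasks)}%")
```

That works out to roughly one task in five being inconclusive and about one in fourteen being invalid.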
Donald L. Johnson Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20 |
Of the 20 first invalids listed, in workunits I just got one similar to those - 1297694454. Our stderr results tables are different, but _0 was marked "Invalid" while my _1 was marked "Inconclusive". Both are CPU jobs. Donald Infernal Optimist / Submariner, retired |
Juha Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
Of the 20 first invalids listed, in workunits From your wingman's host: Task 3111831279 (11se08aa.21408.8661.13.12.247_0) reported 11 Aug 2013, 19:31:05 UTC. <core_client_version>6.6.31</core_client_version> <![CDATA[ <stderr_txt> Restarted at 100.00 percent. Flopcounter: 67276640950494.930000 Spike count: 0 Autocorr count: 0 Pulse count: 3 Triplet count: 5 Gaussian count: 4 14:26:31 (2968): called boinc_finish </stderr_txt> ]]> And Task 3100532045 (27mr08ad.21861.51545.16.12.204.vlar_0) reported 3 Aug 2013, 4:04:38 UTC. <core_client_version>6.6.31</core_client_version> <![CDATA[ <stderr_txt> Restarted at 100.00 percent. Flopcounter: 67276640950494.930000 Spike count: 0 Autocorr count: 0 Pulse count: 3 Triplet count: 5 Gaussian count: 4 23:00:17 (23452): called boinc_finish </stderr_txt> ]]> Spot the differences! There's more of those on his task list. I'd guess either some of the slot directories are somehow broken or the filesystem is corrupted. |
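The point of "Spot the differences!" is that two unrelated tasks report identical figures, down to the flop counter, and differ only in the timestamp and process ID on the last line. A minimal sketch of that comparison, using the two stderr excerpts quoted above; the helper name `normalize` is invented for illustration.

```python
import re

# Drops the "HH:MM:SS (pid): " prefix, the only part that differs between the two runs.
TIMESTAMP_RE = re.compile(r"\d{2}:\d{2}:\d{2} \(\d+\): ")

def normalize(stderr_text):
    """Strip timestamps and PIDs, then collapse runs of whitespace."""
    return " ".join(TIMESTAMP_RE.sub("", stderr_text).split())

task_a = ("Restarted at 100.00 percent. Flopcounter: 67276640950494.930000 "
          "Spike count: 0 Autocorr count: 0 Pulse count: 3 Triplet count: 5 "
          "Gaussian count: 4 14:26:31 (2968): called boinc_finish")
task_b = ("Restarted at 100.00 percent. Flopcounter: 67276640950494.930000 "
          "Spike count: 0 Autocorr count: 0 Pulse count: 3 Triplet count: 5 "
          "Gaussian count: 4 23:00:17 (23452): called boinc_finish")

# Identical normalized output from different workunits is the suspicious part.
print(normalize(task_a) == normalize(task_b))
```

Two different workunits producing byte-for-byte identical results after normalization is what prompts the guess about broken slot directories or a corrupted filesystem.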
_ Joined: 15 Nov 12 Posts: 299 Credit: 9,037,618 RAC: 0 |
Hey guys, this is a new one. On one of my laptops, the CPU crunched an invalid MB task. Here. I don't think that has ever happened before. Nothing too alarming I suppose, since it is only one, but I thought I would add some personal evidence to the thread. |
Wiggo Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489 |
Hey guys, That laptop only sent half of the Stderr output result required, which is why it went straight to being invalid. You can get them once in a while, but if this is happening a lot then you may want to check the system out (overheating, a bad background program, maybe a program that you started at the time, system memory, ...). Cheers. |
William Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
Hey guys, It doesn't matter whether stderr is truncated or present at all - that's for human eyes. The validator checks the integrity of the result file too, and if that is corrupted, it will also go straight to invalid. If it's a one-off, disregard it. If it happens more frequently, it's time to check the integrity of the HD or look for other reasons the result files are being mangled. A person who won't read has no advantage over one who can't read. (Mark Twain) |
Len Joined: 15 Mar 10 Posts: 52 Credit: 11,725,173 RAC: 86 |
I am suddenly getting lots (700+) of invalid results. I have never gotten invalid results before. Not even many errors on this particular host before. (http://setiathome.berkeley.edu/show_host_detail.php?hostid=6647938) Do any of you superior minds have any clue what has gone wrong? Len I think I am. Therefore I am. I think. |
Wiggo Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489 |
I am suddenly getting lots (700+) of invalid results. I have never gotten invalid results before. Not even many errors on this particular host before. (http://setiathome.berkeley.edu/show_host_detail.php?hostid=6647938) Check your cards for overheating as they are just throwing -9 overflows. Cheers. |
tbret Joined: 28 May 99 Posts: 3380 Credit: 296,162,071 RAC: 40 |
I am suddenly getting lots (700+) of invalid results. I have never gotten invalid results before. Not even many errors on this particular host before. (http://setiathome.berkeley.edu/show_host_detail.php?hostid=6647938) No superior mind, here. Last time I had that happen that wasn't a driver crash or heat or power supply issue, I found I had to turn the computer off (power down) and start it from "off". It never happened again. Restarting warm didn't help. |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
How about this one? http://setiathome.berkeley.edu/workunit.php?wuid=1300239737 shows at the time of writing this post: 3117169326 7040524 15 Aug 2013, 21:52:22 UTC 16 Aug 2013, 7:27:12 UTC Completed, marked as invalid 2,538.31 256.86 0.00 SETI@home v7 v7.00 (cuda32) 3117169327 6065655 15 Aug 2013, 21:52:23 UTC 16 Aug 2013, 22:55:12 UTC Completed, validation inconclusive 2,626.30 2,321.33 pending SETI@home v7 Anonymous platform (CPU) 3118906781 --- --- --- Unsent --- --- --- Here's the conundrum: His shows as invalid, mine shows as inconclusive. Shouldn't mine be set back to pending validation, awaiting the third one to be sent and returned? :-) |
Len Joined: 15 Mar 10 Posts: 52 Credit: 11,725,173 RAC: 86 |
I think I cracked it with your clues to help. I have two cards, because the old one still works and I have a spare slot, so I left it in. Another project, POEM, couldn't handle the fact that there were two cards, so I set the config to disallow one for POEM, which fixed their problem. The upshot of that setting was that SETI preferentially used the other one, which it has used for years. That card runs at a higher temperature, being the older, less efficient one. It looks like SETI was finding errors with that card only. (All the invalids I checked were done on the older card.) I have now told POEM to use the old card only, and their WUs don't seem to mind the slightly higher temp. My invalid list is falling now, albeit a little less rapidly than it rose. If it doesn't drop away fully, I will stop SETI from using the old card even when POEM has no work. It's pointless using it if so many WUs return invalid. It may be a 'feature' of having two different spec cards running at once on a project. With the new card warming the back end of the old card, it probably runs slightly hotter than when it was running solo. It now only gets used for GPU crunching anyway. All my own graphics needs are fulfilled by the newer card. Neither card is even approaching the manufacturer's recommended high end of the operating temperature. The case has very good cooling and the CPU is water-cooled. I think I am. Therefore I am. I think. |
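Len doesn't say which setting was used to keep a project off one card. One possibility is the BOINC client's `exclude_gpu` option in `cc_config.xml`; that this is the setting in question is an assumption. The sketch below only generates such a fragment. The function name `build_exclude_gpu` is invented, and device number 1 is a hypothetical stand-in for the older card.

```python
import xml.etree.ElementTree as ET

def build_exclude_gpu(project_url, device_num):
    """Build a cc_config tree that excludes one GPU for one project."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    exclude = ET.SubElement(options, "exclude_gpu")
    ET.SubElement(exclude, "url").text = project_url
    ET.SubElement(exclude, "device_num").text = str(device_num)
    return root

# Device number 1 is hypothetical; the real number depends on how the client enumerates the cards.
config = build_exclude_gpu("http://setiathome.berkeley.edu/", 1)
print(ET.tostring(config, encoding="unicode"))
```

The printed XML would still need to be merged by hand into any existing `cc_config.xml`, and the client has to re-read its config files or be restarted before such a change takes effect.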