BOINC produces about 70% of INVALID results
Author | Message |
---|---|
v010dya Joined: 25 Jun 11 Posts: 9 Credit: 1,569,505 RAC: 0 |
Please help. I tried to report this to Ubuntu, where I was told to report it to BOINC. After I sent it to the BOINC alpha list, I was told that the issue is likely to be with SETI and that I should report it here. The issue is the following: I have gotten myself a new machine (Asus N76V), and now quite a few SETI bundles get "completed" about 10 times faster than the reported time (the reported time is from 40 to 90 minutes, and the real time spent is usually from 5 to 8 minutes, but sometimes as little as 30 seconds). When I check my results at the SETI project, I see that I now have a huge number of invalid results. The behaviour is unpredictable: sometimes things run correctly for a day or so, and sometimes almost all the tasks are trashed. System load and other similar factors do not seem to play any role in what is happening. Previous bug reports: https://bugs.launchpad.net/ubuntu/+source/boinc/+bug/1266361 http://lists.ssl.berkeley.edu/mailman/private/boinc_alpha/2014-January/019225.html |
v010dya Joined: 25 Jun 11 Posts: 9 Credit: 1,569,505 RAC: 0 |
Note: I did not overclock the machine. I agree there is a possibility that it is clocked higher than it should be, but then I guess I'll be complaining to Asus. Are there other people here with a similar machine? Is anybody having these issues? |
Wiggo Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489 |
It may be heat related. Are you checking running temps? Is the laptop lifted up off the surface it's sitting on to allow more airflow underneath it? You may also want to take a look at TThrottle to keep temps under control. Cheers. |
v010dya Joined: 25 Jun 11 Posts: 9 Credit: 1,569,505 RAC: 0 |
Nothing is blocking the airflow, but I will research this possibility a bit further. |
v010dya Joined: 25 Jun 11 Posts: 9 Credit: 1,569,505 RAC: 0 |
TThrottle appears to be a Windows-only application q;-(= So it's of little use to me. I'll look for something similar to it. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
After normal filesystem checks, I would start by checking the RAM voltage setting against the BIOS Vtt setting, e.g. normal DDR3-type memory at ~1.5V with ~1.1V Vtt, or high-end Corsair memory at ~1.5V-1.6V with 1.2V Vtt, and then running memtest86+. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
v010dya Joined: 25 Jun 11 Posts: 9 Credit: 1,569,505 RAC: 0 |
OK, i7z does the trick. The average temperature seems to be about 80 degrees at the moment. I will play around with the load and see if that helps. |
Roy Collins Joined: 12 Aug 99 Posts: 73 Credit: 53,671,192 RAC: 71 |
Sounds superficially similar to the problem I've been having. http://setiathome.berkeley.edu/forum_thread.php?id=73514 Do you see any other symptom when this happens? Roy |
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Your errored tasks show you are using rev 1146. I wonder if there is a more recent version available. setiathome_v7 7.00 $Revision: 1146 $ g++ (Ubuntu/Linaro 4.8.1-8ubuntu1) 4.8.1 To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Roy Collins Joined: 12 Aug 99 Posts: 73 Credit: 53,671,192 RAC: 71 |
Too late to edit my post, unfortunately. Please ignore my post above; I had misread your statement and thought you were getting errors, not the invalids that you actually mentioned. Roy |
v010dya Joined: 25 Jun 11 Posts: 9 Credit: 1,569,505 RAC: 0 |
Ran and reran memtest, no errors found. It's still happening even after I've lessened the load on the machine, but now less frequently. If it is due to the temperature, then it's way too sensitive, and no other software has any glitches, only BOINC, which is strange. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
"Ran and reran memtest, no errors found. It's still happening even after i've lessened the load on the machine, but now less frequently. If it is due to the temperature, then it's way too sensitive, and no other software has any glitches, only BOINC, which is strange." Projects that run under BOINC will often push a system much, much harder than most other software. It will show up weaknesses in a system, and SETI will cause problems on a system that has insufficient cooling, poor voltage regulation, borderline overclocking etc. Grant Darwin NT |
ralph Joined: 19 Feb 12 Posts: 19 Credit: 31,993,767 RAC: 9 |
Hi Volodya, I am also having the same problems. Mine started in mid December but on a smaller scale. By the time I realized what was happening, I also had a growing stack of invalids. At this point I think it will be important to add that I am also using Ubuntu, mine being 13.10 KED 3.5.0-34, with BOINC/seti@home/Ubuntu being v 7.2.33. New Year's Day brought me a dead motherboard, and I said to myself that I should have seen the signs building. Small things were happening, nothing big except the dead motherboard, and this was a system that was less than a year old. The lights on the motherboard didn't even flicker. Hindsight says that it could have been caused by other things too, but I ordered a new motherboard, CPU and memory sticks, which arrived this past Thursday. Friday morning I installed the new parts and it started right up. Crunching seti@home at twice the old rate! Later that afternoon I went in to see how it all was going, only to find it was vomiting invalids with great speed. I cut back the CPU rate to 50% to reduce the vomit rate, and over this past weekend I have also gone back to an earlier update version of 13.10, which had been 3.5.0-34, ending up with 3.11.0-15. I created no invalids yesterday and things seemed to be working, so I spent some time digging a bit deeper. ~~~ I was able to isolate the invalid work units to a group of files: 17oc13XXXXXXXXXXXXX 18oc13xxxxxxxxxxxxx 19oc13xXXXXXXXxxxxx 21dc08xxxxxxxxxxxxx 25mr08xxxxxxxxxxxxx There most likely are more groups, but these were the groups that I received. I did stop the invalids with those actions (mentioned above). I'm sure there will be more as the 'in process' work gets sorted out. I had hoped to continue with this today, but my left click mouse problem kicked in and now my same (as in new mobo, CPU and memory) computer doesn't want to boot ... again! Really! The new board, by the way, is an ASUS motherboard. 
At this point I think the root problem is between the above listed files and some quirk in them that has reacted with the way Ubuntu interfaces with the project. They are all very small files ... my larger files are all running fine. I have a Win 8.1 desktop that is merrily crunching away on its BOINC projects and my laptop that wants NOTHING to do with BOINC or seti@home. Regards, Rocky |
ralph Joined: 19 Feb 12 Posts: 19 Credit: 31,993,767 RAC: 9 |
"Ran and reran memtest, no errors found. It's still happening even after i've lessened the load on the machine, but now less frequently. If it is due to the temperature, then it's way too sensitive, and no other software has any glitches, only BOINC, which is strange." Thanks for the input, Grant... I guess I shouldn't be surprised. I have been crunching numbers for over a year now and hadn't been aware of any stress on my system... until now! But then, it is for a good project, right? |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
"I had hoped to continue with this today but my left click mouse problem kicked in and now my same (as in new Mobo,CPU and memory) computer doesn't want to boot ... again! Really! The new board, by the way, is a ASUS motherboard." Have you tried a different PSU? I've just switched from a Corsair AX850 to an AX1200i on my i7-2600K/GTX460/HD7770. Before, I got restarts/BSODs if I crunched more than four CPU tasks at once; now I'm up to six without problems so far. Claggy |
ralph Joined: 19 Feb 12 Posts: 19 Credit: 31,993,767 RAC: 9 |
The idea never crossed through my brain...will do that in the AM. Thanks |
v010dya Joined: 25 Jun 11 Posts: 9 Credit: 1,569,505 RAC: 0 |
@ralph I think we are having, if not the same, then a related issue. I am currently on the new machine (also Asus), and when I first got it (in December) there were only a couple of invalids per bunch, which has happened to me previously, but I've always attributed it to some sort of network glitch or a file corruption issue. I have now lowered BOINC's CPU use to less than 50% (45% to be exact) and set it to use only 1 out of 8 virtual cores (so I now have only one process running at a time, using less than a full core, rather than the 3 processes each using 95% of their core with which I started). Unfortunately that does not resolve the problem completely; I still have some of the calculations coming back as invalid. Here is what really strikes me as odd: all the invalid results happen when the calculation takes about 5 minutes (sometimes as little as 30 seconds, sometimes as much as 8 minutes, but no longer). I have not noticed any invalid results that happen after a couple of hours of calculations... and that really makes me question the 'hardware-only hypothesis'. While I accept that hardware can take part of the blame, if it were the only thing at fault here, I would expect calculations to be messed up at different times for different tasks. Unless, of course, there are some sort of "different" calculations that take place at the beginning of the task, and it is only with these "different" calculations that my cores have difficulty. It could be some issue with Ubuntu, but I don't have (and will never have) Windows on this machine. I have been thinking about trying out Slackware again, but I've been thinking about that for a couple of months already and haven't had the time so far. If I find the time/energy I will report the results here. |
ralph Joined: 19 Feb 12 Posts: 19 Credit: 31,993,767 RAC: 9 |
I was doing a final check this morning before hooking up a different power unit, but this time I included the KEYBOARD! OK, color me stupid -- I connected a spare...punched delete and the screen jumped into BIOS. Oh well, I really wasted yesterday morning! |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
... The system is "finding" Autocorrelation signals which other hosts don't, and usually enough to reach the 30 signal limit and overflow quickly. That's typical of hosts which produce excessive invalid results, though it may be another signal type. I've never seen a case where a lot of invalid results were caused by a host finding fewer signals than it should have. Joe |
William Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
With a laptop, I'd consider using an external cooling stand whenever possible. The 'use only x% of CPU time' function is very badly implemented: basically BOINC keeps stopping and starting the apps all the time. In other words, with a 50% setting they run at full power but only half the time... If Linux has any tools (nice level maybe? I've forgotten how nice exactly works) to throttle apps, you might be better off applying that externally. Empirically, getting too many signals is often a sign of an overheated CPU/GPU. A person who won't read has no advantage over one who can't read. (Mark Twain) |
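
William's point about the 'use only x% of CPU time' setting can be illustrated with a small duty-cycle model. This is a minimal sketch and not BOINC's actual code: the function names and the one-second period are assumptions made up for the example. It shows that a 50% setting halves the average load while each burst still runs at full power. `os.nice` is the standard Unix call for lowering a process's priority; note that a niced process only yields to other work, so on an otherwise idle machine it would still use the whole core and would not by itself reduce heat.

```python
import os


def duty_cycle(cpu_percent, period_s=1.0):
    """Split one period into (run, pause) seconds for a CPU-time limit.

    Hypothetical model of a 'use at most X% of CPU time' setting: the task
    runs flat out for `run` seconds, then is suspended for `pause` seconds.
    The one-second period is an assumption, not a documented BOINC value.
    """
    if not 0 < cpu_percent <= 100:
        raise ValueError("cpu_percent must be in (0, 100]")
    run = period_s * cpu_percent / 100.0
    return run, period_s - run


def average_load(cpu_percent):
    """Average load over one period, in percent.

    The load during each run burst is still 100%; only the average drops.
    """
    run, pause = duty_cycle(cpu_percent)
    return 100.0 * run / (run + pause)


def lower_priority(increment=19):
    """Raise this process's nice value (Unix only) and return the new value.

    This changes scheduling priority only. It does not cap CPU use, so a
    niced task on an idle machine still runs at full speed.
    """
    return os.nice(increment)


if __name__ == "__main__":
    print(duty_cycle(50))    # run and pause seconds for a 50% setting
    print(average_load(45))  # average load for the 45% setting from the thread
```

With v010dya's 45% setting, `duty_cycle(45)` gives roughly 0.45 s of full-power work followed by roughly 0.55 s of pause in each one-second period, which is why the setting lowers average temperature but does not change how hard the core is driven during each burst.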