BOINC produces about 70% of INVALID results

v010dya

Joined: 25 Jun 11
Posts: 9
Credit: 1,569,505
RAC: 0
Message 1461104 - Posted: 6 Jan 2014, 11:48:23 UTC

Please help.

I tried to report this to Ubuntu, where I was told to report it to BOINC. After sending it to the BOINC alpha list, I was told that the issue is likely to be with SETI and to report it here.

The issue is the following:

I have gotten myself a new machine (Asus N76V), and now quite a few SETI bundles get "completed" about 10 times faster than the reported time (the reported time is from 40 to 90 minutes, while the real time spent is usually from 5 to 8 minutes, and sometimes as little as 30 seconds).

When I check my results at the SETI project, I see that I now have a huge number of invalid results.

The behaviour is unpredictable: sometimes things run correctly for a day or so, and sometimes almost all the tasks are trashed. System load and other similar factors do not seem to play any role in what is happening.

Previous bug reports:
https://bugs.launchpad.net/ubuntu/+source/boinc/+bug/1266361
http://lists.ssl.berkeley.edu/mailman/private/boinc_alpha/2014-January/019225.html
ID: 1461104 · Report as offensive
v010dya

Joined: 25 Jun 11
Posts: 9
Credit: 1,569,505
RAC: 0
Message 1461106 - Posted: 6 Jan 2014, 11:53:43 UTC - in response to Message 1461104.  

Note: I did not overclock the machine. I agree that there is a possibility that it is clocked higher than it should be, but in that case I guess I'll be complaining to Asus.

Are there other people on here with a similar machine? Is anybody having these issues?
ID: 1461106 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1461116 - Posted: 6 Jan 2014, 12:33:54 UTC
Last modified: 6 Jan 2014, 12:35:26 UTC

It may be heat related. Are you checking running temps?

Is the laptop lifted up off the surface it's sitting on to allow more airflow underneath it?

You may also want to take a look at TThrottle to keep temps under control.

Cheers.
ID: 1461116 · Report as offensive
v010dya

Joined: 25 Jun 11
Posts: 9
Credit: 1,569,505
RAC: 0
Message 1461120 - Posted: 6 Jan 2014, 12:49:41 UTC

Nothing is blocking the airflow, but I will research this possibility a bit further.
ID: 1461120 · Report as offensive
v010dya

Joined: 25 Jun 11
Posts: 9
Credit: 1,569,505
RAC: 0
Message 1461121 - Posted: 6 Jan 2014, 12:51:42 UTC - in response to Message 1461116.  

TThrottle appears to be a Windows only application q;-(= So it's of little use to me. I'll look for something similar to it.
ID: 1461121 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1461124 - Posted: 6 Jan 2014, 12:59:46 UTC
Last modified: 6 Jan 2014, 13:00:47 UTC

After normal filesystem checks, I would start by checking the RAM voltage setting against the BIOS Vtt setting, e.g. normal DDR3 type memory at ~1.5V with ~1.1V Vtt, or high end Corsair memory at ~1.5V-1.6V with 1.2V Vtt, and then run memtest86+.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1461124 · Report as offensive
v010dya

Joined: 25 Jun 11
Posts: 9
Credit: 1,569,505
RAC: 0
Message 1461125 - Posted: 6 Jan 2014, 13:00:09 UTC - in response to Message 1461121.  

Ok, i7z does the trick. The average temperature seems to be about 80 degrees at this moment. I will play around with the load and see if it will help.
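
For anyone who wants to log temperatures alongside i7z, here is a minimal sketch. It assumes a Linux kernel that exposes sensors as `/sys/class/thermal/thermal_zone*/temp` in millidegrees Celsius; which zones exist, and which of them is the CPU, varies by machine. The function names are made up for illustration, and the 80-degree warning mark is simply the figure mentioned above, not a recommended limit.

```python
import glob


def parse_millidegrees(raw):
    """Convert a sysfs reading such as '80000\n' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_thermal_zones(pattern="/sys/class/thermal/thermal_zone*/temp"):
    """Return {path: degrees_celsius} for every readable thermal zone."""
    readings = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as handle:
                readings[path] = parse_millidegrees(handle.read())
        except (OSError, ValueError):
            # Some zones cannot be read or report nothing useful; skip them.
            continue
    return readings


if __name__ == "__main__":
    WARN_AT_C = 80.0  # assumption: the temperature quoted in this post
    for path, temp_c in read_thermal_zones().items():
        flag = "  <-- hot" if temp_c >= WARN_AT_C else ""
        print(f"{path}: {temp_c:.1f} C{flag}")
```

Running it in a loop while BOINC is crunching would show whether the invalid results line up with temperature peaks.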
ID: 1461125 · Report as offensive
Roy Collins

Joined: 12 Aug 99
Posts: 73
Credit: 53,671,192
RAC: 71
United States
Message 1461139 - Posted: 6 Jan 2014, 14:34:44 UTC

Sounds superficially similar to the problem I've been having.

http://setiathome.berkeley.edu/forum_thread.php?id=73514

Do you see any other symptom when this happens?

Roy
ID: 1461139 · Report as offensive
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1461159 - Posted: 6 Jan 2014, 15:59:55 UTC - in response to Message 1461104.  

Your errored tasks show you are using rev 1146. I wonder if there is a more recent version available.

setiathome_v7 7.00 $Revision: 1146 $ g++ (Ubuntu/Linaro 4.8.1-8ubuntu1) 4.8.1

To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1461159 · Report as offensive
Roy Collins

Joined: 12 Aug 99
Posts: 73
Credit: 53,671,192
RAC: 71
United States
Message 1461273 - Posted: 6 Jan 2014, 20:49:11 UTC - in response to Message 1461139.  

Too late to edit my post, unfortunately.
Please ignore my post above. I had misread your statement and thought you were getting errors, not the invalids that you actually mentioned.

Roy
ID: 1461273 · Report as offensive
v010dya

Joined: 25 Jun 11
Posts: 9
Credit: 1,569,505
RAC: 0
Message 1463904 - Posted: 13 Jan 2014, 12:33:06 UTC

Ran and reran memtest, no errors found. It's still happening even after I've lessened the load on the machine, but now less frequently. If it is due to the temperature, then it's way too sensitive, and no other software has any glitches, only BOINC, which is strange.
ID: 1463904 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1464061 - Posted: 13 Jan 2014, 20:37:26 UTC - in response to Message 1463904.  
Last modified: 13 Jan 2014, 20:39:59 UTC

Ran and reran memtest, no errors found. It's still happening even after I've lessened the load on the machine, but now less frequently. If it is due to the temperature, then it's way too sensitive, and no other software has any glitches, only BOINC, which is strange.

Projects that run under BOINC will often push a system much, much harder than most other software. It will show up weaknesses in a system, and Seti will cause problems on a system that has insufficient cooling, poor voltage regulation, borderline overclocking etc.
Grant
Darwin NT
ID: 1464061 · Report as offensive
ralph

Joined: 19 Feb 12
Posts: 19
Credit: 31,993,767
RAC: 9
United States
Message 1464072 - Posted: 13 Jan 2014, 21:16:21 UTC - in response to Message 1461104.  

Hi Volodya, I am also having the same problems. Mine started in mid December but on a smaller scale. By the time I realized what was happening, I also had a growing stack of invalids.

At this point I think it is important to add that I am also using Ubuntu, mine being 13.10 KED 3.5.0-34 with Boinc/seti@home/ubuntu being v 7.2.33.

New Year's Day brought me a dead motherboard, and I said to myself that I should have seen the signs building. Small things were happening, nothing big except the dead motherboard, and this was a system that was less than a year old. The lights on the motherboard didn't even flicker. Hindsight says that it could have been caused by other things too, but I ordered a new motherboard, CPU and memory sticks, which arrived this past Thursday.

Friday morning I installed the new parts and it started right up. Crunching seti@home at twice the old rate! Later that afternoon I went in to see how it all was going -- only to find it was vomiting invalids with great speed.

I cut the CPU rate back to 50% to reduce the vomit rate, and over this past weekend I also went back to an earlier update version of 13.10, which had been 3.5.0-34, ending up with 3.11.0-15. I created no invalids yesterday and things seemed to be working, so I spent some time digging a bit deeper.

I was able to isolate the invalid work units to a group of files:

17oc13XXXXXXXXXXXXX
18oc13xxxxxxxxxxxxx
19oc13xXXXXXXXxxxxx
21dc08xxxxxxxxxxxxx
25mr08xxxxxxxxxxxxx

There most likely are more groups, but these were the groups that I received. I did stop the invalids with those actions (mentioned above). I'm sure there will be more as the 'in process' work gets sorted out.

I had hoped to continue with this today but my left click mouse problem kicked in and now my same (as in new Mobo, CPU and memory) computer doesn't want to boot ... again! Really! The new board, by the way, is an ASUS motherboard.


At this point I think the root problem is between the above listed files and some quirk in them that has reacted with the way Ubuntu interfaces with the project. They are all very small files ... my larger files are all running fine.

I have a Win 8.1 desktop that is merrily crunching away on its Boinc projects and my laptop that wants NOTHING to do with Boinc or seti@home.

Regards,
Rocky
ID: 1464072 · Report as offensive
ralph

Joined: 19 Feb 12
Posts: 19
Credit: 31,993,767
RAC: 9
United States
Message 1464079 - Posted: 13 Jan 2014, 21:50:00 UTC - in response to Message 1464061.  

Ran and reran memtest, no errors found. It's still happening even after I've lessened the load on the machine, but now less frequently. If it is due to the temperature, then it's way too sensitive, and no other software has any glitches, only BOINC, which is strange.

Projects that run under BOINC will often push a system much, much harder than most other software. It will show up weaknesses in a system, and Seti will cause problems on a system that has insufficient cooling, poor voltage regulation, borderline overclocking etc.


Thanks for the input Grant...I guess I shouldn't be surprised. I have been crunching numbers for over a year now and hadn't been aware of any stress on my system...until now! But then, it is for a good project--right?
ID: 1464079 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1464082 - Posted: 13 Jan 2014, 22:21:26 UTC - in response to Message 1464072.  

I had hoped to continue with this today but my left click mouse problem kicked in and now my same (as in new Mobo, CPU and memory) computer doesn't want to boot ... again! Really! The new board, by the way, is an ASUS motherboard.

Have you tried a different PSU?

I've just switched from a Corsair AX850 to an AX1200i on my i7-2600K/GTX460/HD7770. Before, I got restarts/BSODs if I crunched more than four CPU tasks at once; now I'm up to six without problems so far.

Claggy
ID: 1464082 · Report as offensive
ralph

Joined: 19 Feb 12
Posts: 19
Credit: 31,993,767
RAC: 9
United States
Message 1464137 - Posted: 14 Jan 2014, 3:17:38 UTC - in response to Message 1464082.  

The idea never crossed through my brain...will do that in the AM. Thanks
ID: 1464137 · Report as offensive
v010dya

Joined: 25 Jun 11
Posts: 9
Credit: 1,569,505
RAC: 0
Message 1464191 - Posted: 14 Jan 2014, 6:28:24 UTC - in response to Message 1464072.  

@ralph

I think we are having, if not the same, then a related issue. I am currently on the new machine (also Asus), and when I first got it (in December) there were only a couple of invalids per bunch, which had happened to me previously, but I've always attributed that to some sort of network glitch or a file corruption issue.

I have now lowered the use of the CPU by BOINC to less than 50% (45% to be exact) and set it to use only 1 out of 8 virtual cores (so I now have only one process running at a time using a non-full core, rather than the 3 using 95% of their cores with which I started). Unfortunately that does not resolve the problem completely; I still have some of the calculations coming back as invalid.

Here is what really strikes me as odd: all the invalid results happen when the calculation takes about 5 minutes (sometimes as little as 30 seconds, sometimes as much as 8 minutes, but no longer). I have not noticed any invalid results that happen after a couple of hours of calculations... and that really makes me question the 'hardware-only hypothesis'. While I accept that hardware can take part of the blame, if it were the only thing at fault here, I would expect calculations to be messed up at different times for different tasks. Unless of course there is some sort of "different" calculation that takes place at the beginning of the task, and it is only with this "different" calculation that my cores have difficulty.

It could be some issue with Ubuntu, but I don't have (and will never have) Windows on this machine. I have been thinking about trying out Slackware again, but I've been thinking about that for a couple of months already and haven't had the time so far. If I find the time/energy I will report the results here.
ID: 1464191 · Report as offensive
ralph

Joined: 19 Feb 12
Posts: 19
Credit: 31,993,767
RAC: 9
United States
Message 1464360 - Posted: 14 Jan 2014, 16:24:30 UTC - in response to Message 1464082.  

I was doing a final check this morning before hooking up a different power unit, but this time I included the KEYBOARD! OK, color me stupid -- I connected a spare...punched delete and the screen jumped into BIOS. Oh well, I really wasted yesterday morning!
ID: 1464360 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1464440 - Posted: 15 Jan 2014, 0:17:38 UTC - in response to Message 1464191.  

...
Here is what really strikes me as odd: all the invalid results happen when the calculation takes about 5 minutes (sometimes as little as 30 seconds, sometimes as much as 8 minutes, but no longer). I have not noticed any invalid results that happen after a couple of hours of calculations... and that really makes me question the 'hardware-only hypothesis'. While I accept that hardware can take part of the blame, if it were the only thing at fault here, I would expect calculations to be messed up at different times for different tasks. Unless of course there is some sort of "different" calculation that takes place at the beginning of the task, and it is only with this "different" calculation that my cores have difficulty.
...

The system is "finding" Autocorrelation signals which other hosts don't, and usually enough to reach the 30 signal limit and overflow quickly. That's typical of hosts which produce excessive invalid results, though it may be another signal type. I've never seen a case where a lot of invalid results were caused by a host finding fewer signals than it should have.
                                                                   Joe
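
To make the overflow point concrete, here is a small sketch of how one might flag tasks that hit the signal limit. It is an illustration only: the task names, the dictionary layout and the signal-type keys are hypothetical, it does not parse real SETI@home output, and treating the limit of 30 as a total across all signal types is an assumption.

```python
SIGNAL_LIMIT = 30  # the "30 signal limit" mentioned in the post above


def total_signals(counts):
    """Sum the reported signals over all signal types for one task."""
    return sum(counts.values())


def is_overflow(counts, limit=SIGNAL_LIMIT):
    """True if a task reported enough signals to reach the limit."""
    return total_signals(counts) >= limit


def overflow_tasks(tasks, limit=SIGNAL_LIMIT):
    """Return the names of all tasks that reached the signal limit."""
    return [name for name, counts in tasks.items() if is_overflow(counts, limit)]


if __name__ == "__main__":
    # Hypothetical per-task signal counts, keyed by signal type.
    example = {
        "task_a": {"spike": 2, "autocorr": 28},
        "task_b": {"spike": 1, "gaussian": 0, "autocorr": 0},
    }
    print(overflow_tasks(example))
```

A task that stops as soon as it reaches the limit never does the rest of its work, which would fit the observation that the invalid results are always the ones finishing within a few minutes.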
ID: 1464440 · Report as offensive
Profile William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1464635 - Posted: 15 Jan 2014, 16:39:55 UTC

With a laptop, I'd consider using an external cooling stand whenever possible.

The 'use only x% of CPU time' function is very badly implemented - basically boinc keeps stopping and starting the apps all the time. IOW with a 50% setting they run at full power but only half the time...

If linux has any tools (nice level maybe? I've forgotten how nice exactly works) to throttle apps you might be better off applying that externally.
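
For reference, a rough sketch of the nice idea. It assumes a POSIX system with the standard `nice` utility on the PATH; the helper names are made up for illustration. Note that nice only lowers scheduling priority: on an otherwise idle machine a niced process can still use a full core, so on its own it may not reduce heat.

```python
import subprocess


def niced_command(command, niceness=19):
    """Prefix a command with the POSIX 'nice' utility.

    niceness is the adjustment passed to 'nice -n'; 19 is the lowest priority.
    """
    if not 0 <= niceness <= 19:
        raise ValueError("niceness must be between 0 and 19")
    return ["nice", "-n", str(niceness)] + list(command)


def run_niced(command, niceness=19):
    """Run the command at lowered priority and return its exit code."""
    return subprocess.run(niced_command(command, niceness)).returncode
```

For example, `run_niced(["sleep", "1"], niceness=10)` would start `sleep 1` with its niceness raised by 10.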

Empirically, getting too many signals is often a sign of an overheated CPU/GPU.
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1464635 · Report as offensive


 