Strange result, how is this possible?
Author | Message |
---|---|
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
See: wuid=669862967, and you'll find two super trashing machines validate each other's -9's, making the likely right result invalid, and enter false data into the database and science. Specifically, Computer 5472266 and Computer 5293938, both i7 CPUs with Fermi cards, and both running Raistmer's incompatible v12 autokill application. This is the sort of thing which, unfortunately, gets optimisation a bad name. To recap: the early NVidia stock application was incompatible with Fermis, too - and nobody, not even NVidia (who wrote the app) knew that incompatible hardware would be developed 18 months later. So I have no criticism of Raistmer or his application, which was fit, proper and useful in its time. The problem is the inappropriate use of optimised applications in general. When the Fermi incompatibility was discovered, a new stock application could be (and was) rolled out, so that all 'set and forget' hosts got the new application automatically and stopped trashing WUs. But all third-party applications, whether optimised or not, bypass the automatic updating process - so these unattended machines will go on trashing work until forcibly stopped, either by their owners or by the project blocking them. That is why all contributors to this message board have a heavy responsibility when suggesting the use of third-party applications. You must, must, must stress the importance of users monitoring both the health of their machines, and software developments which may require a manual upgrade, as it does in this case. The two users here have, in the first case, never posted on these message boards, and in the second case, not posted for almost 4 years. What on earth are people like that doing running optimised applications? Who told them the applications existed, or where to get them?
I sincerely hope nobody here would do such a thing (without including the caution on updating), and I also hope there aren't any hardware or team forums out there giving bad advice, as well. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Or how about the SETI project tries to contact these host owners to correct the problem, and if that doesn't help for whatever reason, block these hosts? Can they block hosts if they want? No way does the project (or specifically, its - overworked, underpaid, and desperately outnumbered - staff) have time to do that. In other projects, the volunteer moderators - not having much forum moderating to do - have taken on this sort of role. But I don't see that happening here. And what happens if people have email notification turned off, or have stopped using the email registered when they joined the project? No, the simplest and most robust solution will be for the project to block the use of third-party applications entirely. Maybe that would bring some people to their senses. |
JohnDK Send message Joined: 28 May 00 Posts: 1222 Credit: 451,243,443 RAC: 1,127 |
Or how about the SETI project tries to contact these host owners to correct the problem, and if that doesn't help for whatever reason, block these hosts? Can they block hosts if they want? I didn't necessarily mean that underpaid staff should do it themselves, but volunteers of some sort. They could try to contact the host owners in question, and if that doesn't help, for example if e-mail doesn't work, they could put the host on a server block list. Blocking opt apps is a bad solution, I think. |
tullio Send message Joined: 9 Apr 04 Posts: 8797 Credit: 2,930,782 RAC: 1 |
I am using a BOINC client 6.2.19 by Dotsch and a 6.03 SETI app on my virtual machine running Solaris Express on a Linux host and, although slower than the Lunatics app I am using on my Linux host, they never crash. No GPU. Tullio |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Or how about the SETI project tries to contact these host owners to correct the problem, and if that doesn't help for whatever reason, block these hosts? Can they block hosts if they want? I think in reality the project would want to avoid blocking anonymous applications except as the very, very last resort - but if the science is damaged by these false validations, and the reputation of the Lunatics applications is reduced as a consequence, then they might be forced to consider it - even if perfectly blameless participants like Tullio suffer collateral damage. But I don't see how volunteers could solve the problem. We don't have direct access to the users email, only indirectly via PMs: and we certainly don't have (and will never be given) the power to block individual hosts. The most we could ask is that a member of staff could develop a 'block and notify' script (such as was used under similar circumstances at CPDN), and agree to run it against hosts identified and notified by responsible volunteers. |
ReiAyanami Send message Joined: 6 Dec 05 Posts: 116 Credit: 222,900,202 RAC: 174 |
Hi, I just added a new cruncher three weeks ago, i7-950 with two EVGA GTX-470s running boinc apps and NO 3rd party apps on windows 7 pro. with the newest nVidia drivers. I occasionally just stare at boinc tasks screen (as probably all of you do sometimes) and noticed that once in a while cuda calculations finish in a couple of minutes despite the projected time of over 10 min while all other calculations of the same WU series take over 10 min. Is this what you are talking about? If so, what can be the cause of this in case of my PC? All stock apps and newest drivers, it's not even overclocked. |
James Sotherden Send message Joined: 16 May 99 Posts: 10436 Credit: 110,373,059 RAC: 54 |
Or how about the SETI project tries to contact these host owners to correct the problem, and if that doesn't help for whatever reason, block these hosts? Can they block hosts if they want? I think you have an idea there with a script to block. And I really hope that the guys in the lab read this thread. Old James |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Hi, I just added a new cruncher three weeks ago, i7-950 with two EVGA GTX-470s running boinc apps and NO 3rd party apps on windows 7 pro. with the newest nVidia drivers. If it only happens 'once in a while', that's normal, and nothing to worry about. What we're talking about are computers which do that all day, every day, for months on end - and whose owners obviously don't look at them for at least as long. Yours is running fine. PS - well, not quite fine: Error tasks for computer 5636753. Maybe running a bit hot? But nothing like the problems in this thread. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Or how about the SETI project tries to contact these host owners to correct the problem, and if that doesn't help for whatever reason, block these hosts? Can they block hosts if they want? Here's the CPDN reference in case they do: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=6482 |
ReiAyanami Send message Joined: 6 Dec 05 Posts: 116 Credit: 222,900,202 RAC: 174 |
All your Cuda machines seems to work as they should, even the one you mentioned above. There's a few errors but nothing out of the ordinary. As far as I can judge you've done a good job in setting up your systems, and can relax, sit back and enjoy the ride. Thank you, but the following is what I found in my WU. http://setiathome.berkeley.edu/workunit.php?wuid=670260959 This work HAS -9 result_overflow. I went back and looked at some pending results and found a few of these. |
James Sotherden Send message Joined: 16 May 99 Posts: 10436 Credit: 110,373,059 RAC: 54 |
All your Cuda machines seems to work as they should, even the one you mentioned above. There's a few errors but nothing out of the ordinary. As far as I can judge you've done a good job in setting up your systems, and can relax, sit back and enjoy the ride. And welcome to the boards. Hopefully you have a program to monitor temps. Also it helps to clean the dust bunnies out every so often. And as I use this i7 as my daily driver I reboot about every 3 or 4 days. Old James |
Dave Stegner Send message Joined: 20 Oct 04 Posts: 540 Credit: 65,583,328 RAC: 27 |
Previously I mentioned "old" installations should be candidates for cut off also. Look at this one: http://setiathome.berkeley.edu/show_host_detail.php?hostid=3198906 How (and/or why would you want to) can a machine with a RAC of 100 be issued 1100+ work units?? Dave |
Dave Stegner Send message Joined: 20 Oct 04 Posts: 540 Credit: 65,583,328 RAC: 27 |
He is running 4.19 Boinc. Did I read in another thread that the 4 versions do not calculate credit correctly any longer? I am paired with him on an AP WU. http://setiathome.berkeley.edu/workunit.php?wuid=674824439 I guess my AP instead of ~700 credits will receive nothing or close to ?? Dave |
tullio Send message Joined: 9 Apr 04 Posts: 8797 Credit: 2,930,782 RAC: 1 |
I have a cache of 0.25 days on 6 BOINC projects and I get a new WU to crunch on my Linux box only when the preceding one has been uploaded. And I am never out of work. Tullio |
James Sotherden Send message Joined: 16 May 99 Posts: 10436 Credit: 110,373,059 RAC: 54 |
He is running 4.19 Boinc. Einstein@Home has a minimum BOINC version that you need to run. I can't fathom why Seti@Home doesn't do the same. Seems that would whack some of the garbage being returned. Old James |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Einstein@Home has a minimum BOINC version that you need to run. I can't fathom why Seti@Home doesn't do the same. Seems that would whack some of the garbage being returned. The trouble is that the replies from BOINC v4.19 aren't garbage. The primary output, the result file, aka 'science', is OK. It's only the secondary output, the credit, which is distorted. |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
He is running 4.19 Boinc. You may have read that, but it isn't an accurate statement. Any BOINC version prior to 5.2.7 doesn't report the fpops_cumulative we were using for work-based credit, but that no longer matters for the new credit system. The 4.x versions on Mac systems definitely have a problem establishing shared memory communication, and if that fails the application does the work in standalone mode and no CPU time is reported, leading to zero credit claims even with the new credit system. On other platforms the 4.x versions may also have a weakness in that same area, but the pending tasks for that host have reasonable CPU times. Not having elapsed time reporting (which started someplace in the 6.6 series) also doesn't make much difference. Joe |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Sten-Arne wrote: But number one though, is to find an automated way to find and identify those hosts that produce false results into the database. There's no way in h*** that it's going to be possible for volunteers to manually search through the system to find those hosts. It has to be done in some automatic database search fashion. 4202271 5257703 5293938 5354400 5393593 5396192 5445713 5467571 5472266 5478875 Joe |
Bernie Vine Send message Joined: 26 May 99 Posts: 9954 Credit: 103,452,613 RAC: 328 |
Looking at that list, I don't understand why hosts with "Consecutive valid tasks = 0" show "Number of tasks today = several hundred". Is this correct, and if so, why? Bernie. |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Sten-Arne wrote: But number one though, is to find an automated way to find and identify those hosts that produce false results into the database. There's no way in h*** that it's going to be possible for volunteers to manually search through the system to find those hosts. It has to be done in some automatic database search fashion. hostid=5393593 has reverted back to Stock since the list was published. Claggy |
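
The automated search asked for in the thread could be sketched roughly as follows. This is only an illustration under stated assumptions: the `HostStats` record, its field names, the `suspect_hosts` function and both thresholds are made up for the example, and none of them reflects the real BOINC database schema or any script the project runs. The idea is simply to flag hosts whose returned tasks are almost all -9 result_overflow, since an occasional overflow is normal but a host that overflows all day, every day, is not.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class HostStats:
    """Hypothetical per-host summary; not a real BOINC table."""
    host_id: int
    tasks_returned: int   # tasks reported in the period examined
    overflow_tasks: int   # of those, how many ended with -9 result_overflow


def suspect_hosts(hosts: Iterable[HostStats],
                  min_tasks: int = 50,
                  overflow_ratio: float = 0.9) -> List[int]:
    """Return sorted IDs of hosts whose results are almost all overflows.

    min_tasks and overflow_ratio are illustrative guesses, not tuned values.
    """
    flagged = []
    for h in hosts:
        # Too little data to judge; a few overflows in a handful of tasks is normal.
        if h.tasks_returned < min_tasks:
            continue
        if h.overflow_tasks / h.tasks_returned >= overflow_ratio:
            flagged.append(h.host_id)
    return sorted(flagged)


if __name__ == "__main__":
    # Made-up host IDs and counts, purely for demonstration.
    sample = [
        HostStats(host_id=1, tasks_returned=200, overflow_tasks=195),
        HostStats(host_id=2, tasks_returned=200, overflow_tasks=6),
        HostStats(host_id=3, tasks_returned=10, overflow_tasks=10),
    ]
    print(suspect_hosts(sample))
```

A list produced this way would still need a human check before any 'block and notify' step, because a healthy host can also see a run of genuinely noisy work units.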