Discussion of Invalid Host Messaging
Author | Message |
---|---|
Link Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0 |
I got a rather unusual reply from 1 person via PM today, rob99999_2 ID 129322, the owner of Computer 6797265, which produces nothing but errors from its GT 650M but here's his reply. No, he is not. He is the owner/user of his computer and only he is responsible for what this computer is doing. Just as he has to watch that it's not sending out spam mails or participating in DDoS attacks, he has to watch what it is doing with the SETI WUs that it gets assigned. It's just like a car driver who is responsible for his car: listen to it and watch carefully how it behaves, and if you suspect that something might be wrong, stop and call for help if you can't fix it yourself. |
Bernie Vine Joined: 26 May 99 Posts: 9954 Credit: 103,452,613 RAC: 328 |
I got a rather unusual reply from 1 person via PM today, rob99999_2 ID 129322, the owner of Computer 6797265, which produces nothing but errors from its GT 650M but here's his reply. Nothing is wrong with his car, nothing is wrong with his engine, and he has it serviced correctly; however, the manufacturer has failed to tell him that there is a fault that means his engine is about to break down. Can the manufacturer fix it? No, you have to do it yourself. Or stop using the car! |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
A car is a deadly weapon so there is a definite moral obligation to keep it well maintained, backed most places by legal requirements. A system crunching SaH data without producing correct results is turning electrical energy into heat energy without advancing the scientific search we're trying to do. Because the user presumably intended to help the cause, sending a heads up message when there's good evidence the user hasn't noticed the problem makes sense. The additional load on the servers caused by systems gone bad can't be separated from cases where the user has decided to stop crunching, etc. But there's an easy way to see how much overall waste there is. If there were no waste, the ratio of "Results waiting for db purging" to "Workunits waiting for db purging" would be exactly 2. In practice the MB ratio is usually between 2.1 and 2.2 indicating waste of 5 to 10 percent. Joe |
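Joe's arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name `waste_fraction` and the sample numbers are made up here, and it assumes (as the post describes) that a workunit needs exactly 2 results when nothing is wasted, with waste measured as the extra results relative to that ideal count.

```python
def waste_fraction(results_waiting, workunits_waiting, ideal_ratio=2.0):
    """Extra results per workunit, as a fraction of the ideal count.

    ideal_ratio=2.0 is the "no waste" ratio from the post; the counts
    passed in below are invented examples, not real server status values.
    """
    if workunits_waiting <= 0:
        raise ValueError("workunits_waiting must be positive")
    ratio = results_waiting / workunits_waiting
    return (ratio - ideal_ratio) / ideal_ratio


# A ratio of 2.1 corresponds to 5% waste, 2.2 to 10% waste.
print(f"{waste_fraction(2100, 1000):.1%}")  # 5.0%
print(f"{waste_fraction(2200, 1000):.1%}")  # 10.0%
```

Plugging in the two "waiting for db purging" numbers from the server status page would give the current estimate the same way.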
Link Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0 |
I got a rather unusual reply from 1 person via PM today, rob99999_2 ID 129322, the owner of Computer 6797265, which produces nothing but errors from its GT 650M but here's his reply. I know that car-computer comparisons are crap, but sometimes I don't have a better one. Point is: his computer fails, he should be the first who notices it and see if he can fix it or ask for help. I have a similar situation with Milkyway right now: my old ATI HD3850 can only run the older (not really supported anymore) CAL application, so I have to watch if new batches of WUs are still compatible with it, and if not I have to stop crunching. It was already once the case, I had to stop crunching for about a month, then it worked again. Whether old or new hardware, you have to watch it, something might always not work as expected. Especially after any changes on the system, for example if you buy a new card or install new drivers, you have to first see that it actually works before you let it do its work without too much attention from your side. And I'm pretty sure that most of the owners of those 560Ti cards have skipped that part. Something like "set and forget" does not exist with computers anyway, even if many think so. In the best case it's "set, see that it works and hope it lasts for a while". |
Link Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0 |
A car is a deadly weapon so there is a definite moral obligation to keep it well maintained, backed most places by legal requirements. Not all issues with a car make it more dangerous; if it's leaking a drop of oil every now and then, it's still safe to drive but bad for the environment. And so are such hosts for the SETI environment: they waste bandwidth and eventually (if two such hosts validate against each other like Fermi cards did before) even compromise the science. |
Bernie Vine Joined: 26 May 99 Posts: 9954 Credit: 103,452,613 RAC: 328 |
Everyone is correct, however it means SETI@Home is not set and forget. It should have instructions posted that explain the problems with GPU crunching and the need to check on a regular basis to see if your results are valid. Also to warn people that if they are not prepared to do this they could return invalid results and it is best they don't crunch using a GPU. They especially will need to check when updating the GPU drivers as this has introduced several bugs in the past. Also before buying a new modern graphics card please check on the forums to see if it will work with SETI@Home and/or the latest drivers! If you are unsure of any of this please do not attach your computer to SETI@Home. In real terms that is what anyone crunching needs to be aware of. Of course no one wants to post that on the front page but something like that is needed. I am aware that updates are due, but who is to say that in 3 or 6 months' time a new GPU or driver won't start this whole thing off again? |
Link Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0 |
GPU computing disabled by default and a red "READ THIS FIRST" link to a page with a short info like the one you posted would be IMO a good solution. CPU-only crunching might be "set and forget", GPU crunching is not. An alternative would be a better quota system, one which counts invalids as errors and which expects something better than 1 valid result out of 50. A 98% failure rate can't be "OK"; even 50% would be IMHO way too much, but should be good enough to start with. I mean it's not just SETI, I crunch also for Milkyway and Collatz, and issues like that, i.e. with new hardware or drivers, occur on those projects as well every time nVidia or ATI comes up with something "revolutionary". Hence I don't see it as a fault of the project staff if their apps don't run properly on new hardware. |
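A rough sketch of the kind of quota check suggested above. It is purely hypothetical: `failure_rate` and `host_is_acceptable` are made-up names, not anything in BOINC or the SETI@home scheduler, and the 50% limit is just the starting value mentioned in the post. The one idea it shows is that invalid results count against the host the same way outright errors do.

```python
def failure_rate(valid, invalid, errors):
    """Share of a host's results that were not valid.

    Invalid results are counted as failures alongside errors,
    which is the change proposed in the post.
    """
    total = valid + invalid + errors
    if total == 0:
        return 0.0  # no results yet, nothing to judge
    return (invalid + errors) / total


def host_is_acceptable(valid, invalid, errors, max_failure_rate=0.5):
    """True if the host stays at or under the allowed failure rate.

    max_failure_rate=0.5 is an assumed starting threshold.
    """
    return failure_rate(valid, invalid, errors) <= max_failure_rate


# 1 valid result out of 50 is a 98% failure rate.
print(failure_rate(1, 49, 0))        # 0.98
print(host_is_acceptable(1, 49, 0))  # False
```

Whether the limit should be "at or under" or "strictly under", and over what time window it is measured, would be up to the project.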
Bernie Vine Joined: 26 May 99 Posts: 9954 Credit: 103,452,613 RAC: 328 |
Hence I don't see it as a fault of the project staff if their apps don't run properly on new hardware. No, possibly not, but if GPU crunching is not "set and forget" both current and prospective users need to know, otherwise as you say we could end up with errors validating against each other, corrupting the science! Users like rob99999_2 need to know what they are getting into. |
Murat Adas Joined: 9 Aug 10 Posts: 1 Credit: 1,585,782 RAC: 0 |
After reading NVidia driver problems which cause computation errors by Richard Haselgrove I've changed my advanced power settings. Below are the steps I used to accomplish this (Windows 7): I right-clicked on the desktop and selected Personalize, then selected Screen Saver and made sure I have None selected for the screen saver. Then I clicked on Change power settings. Next I clicked Change plan settings and made sure Turn off the display and Put the computer to sleep are set to Never. Finally I clicked on Change advanced power settings, and under Sleep - Allow hybrid sleep I turned the setting to "Off" (default was On). I hope this helps; if not, please let me know whether to roll back to a previous driver? Thanks |
Tron Joined: 16 Aug 09 Posts: 180 Credit: 2,250,468 RAC: 0 |
After reading NVidia driver problems which cause computation errors by Richard Haselgrove Thank You for posting. It will take a few days for the dust to settle on the invalids before you can see for sure whether the changes you made helped. You can keep an eye on your finished tasks in the meantime; watch for short run times, those tend to be the -9 error you are experiencing. |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
After reading NVidia driver problems which cause computation errors by Richard Haselgrove Read again: 1) Sleeping Monitor Bug. Drivers affected: 295.51 (BETA), 295.73 and 296.10. You use driver 306.97, so what you did was not needed. You also don't have a 'Kepler' GPU (GT 6xx and GTX 6xx), so the other (CUDA_GRID_SIZE_COMPAT) fix does not apply to you. GTX 560 Ti problems are 'famous' and not related to the 'Sleeping Monitor Bug' nor 'Kepler'. Read 'a few' threads about GTX 560 Ti problems: http://www.google.com/#hl=en&q=560+Ti+problems+site:setiathome.berkeley.edu - ALF - "Find out what you don't do well ..... then don't do it!" :) |
Thndr Joined: 4 Mar 02 Posts: 18 Credit: 3,477,289 RAC: 1 |
I have told it not to use the GPU and set power settings to never turn off the monitor. Let me know. |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
I have told it not to use the GPU and set power settings to never turn off the monitor. Let me know. Setting power settings to never turn off your monitor won't help, you're not running 295.xx or 296.xx drivers. Looking at your inconclusive/errored tasks, they are a mixture of CPU and GPU tasks; all the ones I looked at say 'Restarted at 100.00 percent.' which is strange. Then looking at the stderr.txt results, multiple tasks have the same result, either: <core_client_version>7.0.28</core_client_version> <![CDATA[ <stderr_txt> Restarted at 100.00 percent. Flopcounter: 48049228806222.281000 Spike count: 1 Pulse count: 1 Triplet count: 8 Gaussian count: 0 called boinc_finish </stderr_txt> ]]> http://setiathome.berkeley.edu/result.php?resultid=2686739603 http://setiathome.berkeley.edu/result.php?resultid=2686762450 http://setiathome.berkeley.edu/result.php?resultid=2686762474 http://setiathome.berkeley.edu/result.php?resultid=2688750025 Or: Spike count: 10 Pulse count: 0 Triplet count: 0 Gaussian count: 3 called boinc_finish http://setiathome.berkeley.edu/result.php?resultid=2687146759 http://setiathome.berkeley.edu/result.php?resultid=2687140765 Or: Spike count: 14 Pulse count: 5 Triplet count: 12 Gaussian count: 0 http://setiathome.berkeley.edu/result.php?resultid=2687140761 http://setiathome.berkeley.edu/result.php?resultid=2686771628 Looks like your slot directories aren't getting cleared for some reason. Please post your Boinc startup messages from the Event Log, the first 30 lines will do. Claggy |
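The check Claggy did by eye (spotting tasks whose stderr shows the same signal counts and the 'Restarted at 100.00 percent.' line) could be sketched roughly like this. It is only an illustration: `summarize` and `group_identical` are made-up helper names, the script does not fetch anything from the server, and the sample text is copied from the stderr excerpts in the post above. Identical counts on different tasks are a hint that something is being reused, not proof.

```python
import re
from collections import defaultdict

# Matches lines such as "Spike count: 1" in a task's stderr output.
COUNT_RE = re.compile(r"(Spike|Pulse|Triplet|Gaussian) count:\s*(\d+)")


def summarize(stderr_text):
    """Return (restarted_at_100, signal_counts) for one task's stderr."""
    counts = {name: int(value) for name, value in COUNT_RE.findall(stderr_text)}
    restarted = "Restarted at 100.00 percent." in stderr_text
    return restarted, tuple(sorted(counts.items()))


def group_identical(results):
    """Group result IDs whose stderr summaries are identical.

    results: dict mapping result ID -> stderr text (pasted in by hand).
    Only groups with more than one member are returned.
    """
    groups = defaultdict(list)
    for result_id, text in results.items():
        groups[summarize(text)].append(result_id)
    return [ids for ids in groups.values() if len(ids) > 1]


first = """Restarted at 100.00 percent.
Flopcounter: 48049228806222.281000
Spike count: 1
Pulse count: 1
Triplet count: 8
Gaussian count: 0
called boinc_finish"""

second = """Spike count: 10
Pulse count: 0
Triplet count: 0
Gaussian count: 3
called boinc_finish"""

results = {"2686739603": first, "2686762450": first, "2687146759": second}
print(group_identical(results))  # [['2686739603', '2686762450']]
```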
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Looks as if Thndr has fixed his problems with his slot directories, he's now fully completing 6.03, 6.10 (cuda_fermi), AstroPulse v6 v6.01 and AstroPulse v6 v6.04 (cuda_opencl_100) tasks, although it would have been good if he had responded and told us what he found and did. All tasks for computer 6432659 Claggy |
Thndr Joined: 4 Mar 02 Posts: 18 Credit: 3,477,289 RAC: 1 |
Looks as if Thndr has fixed his problems with his slot directories, he's now fully completing 6.03, 6.10 (cuda_fermi), AstroPulse v6 v6.01 and AstroPulse v6 v6.04 (cuda_opencl_100) tasks, Well.... to make a long story short, I scrapped the boinc software and started over... that and I changed power settings and reset the project and environment but!! I'm back to 6.10 errors again! I checked everything and gpu usage was turned back on?? how?? Clearly this is NOT just a driver problem. https://www.facebook.com/LAKEVILLEUNITYGARDENS/ |
Thndr Joined: 4 Mar 02 Posts: 18 Credit: 3,477,289 RAC: 1 |
I have removed the boinc manager from my machine again. I will watch this thread for further developments. I can not see wasting my efforts and messing up data packets until there is a fix. https://www.facebook.com/LAKEVILLEUNITYGARDENS/ |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Looks as if Thndr has fixed his problems with his slot directories, he's now fully completing 6.03, 6.10 (cuda_fermi), AstroPulse v6 v6.01 and AstroPulse v6 v6.04 (cuda_opencl_100) tasks, What environmental setting? If it's the one in the 'NVidia driver problems which cause computation errors' thread, please note that it is for 6** Kepler GPUs only and is not required on your GTS 450, and you also don't need to change power settings as you're not running 295.xx or 296.xx drivers. Uninstalling and reinstalling the Boinc software didn't help as that only installs the program; the Boinc Data directory is left intact, and that is where your problem is. Looking at your errored tasks still shows 'Restarted at 100.00 percent.' Did you go and empty all the slot directories, did you delete them, or did you not touch them? Please post your Boinc startup messages from the Event Log, the first 20 to 30 lines will do (I've already asked you for it once before). <core_client_version>7.0.28</core_client_version> Claggy |
Thndr Joined: 4 Mar 02 Posts: 18 Credit: 3,477,289 RAC: 1 |
didn't delete anything https://www.facebook.com/LAKEVILLEUNITYGARDENS/ |
Thndr Joined: 4 Mar 02 Posts: 18 Credit: 3,477,289 RAC: 1 |
didn't delete anything Program is completely uninstalled. https://www.facebook.com/LAKEVILLEUNITYGARDENS/ |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
didn't delete anything Then reinstall. The Boinc Data directory is never removed when uninstalling Boinc. Post the startup messages, then I'll know what directory you've installed the Data directory to, and should know whether permissions will be correct or not; then we can get the slot directories cleaned up, and after that things should just work. Claggy |