Major problem with a user
Author | Message |
---|---|
Werecow Send message Joined: 13 Mar 05 Posts: 56 Credit: 4,917,657 RAC: 3 |
I doubt that Glenn did, or indeed ever will. I had an (unpleasant) exchange with him some time ago. His GTX560 is monstrously overclocked, and "it works fine for games, so why not for S@H?" (only with a pile of four letter words). If ever there was a computer and user that deserved to be cut off at its roots then this combination would be my nomination. One of my antique CPU-only machines would give him a guaranteed 50% RAC boost. I wonder if I could convince him to trade.. ;-) |
Donald L. Johnson Send message Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20 |
I doubt that Glenn did, or indeed ever will. I had an (unpleasant) exchange with him some time ago. His GTX560 is monstrously overclocked, and "it works fine for games, so why not for S@H?" (only with a pile of four letter words). If ever there was a computer and user that deserved to be cut off at its roots then this combination would be my nomination. Yeah, I've got an old P4HT/XP box that RACs 415 right now. If he's a gamer, that's what he really cares about, and he's not likely to change anything to improve his S@H performance. Would be nice if there was a mechanism to boot folks such as him off the project....... Donald Infernal Optimist / Submariner, retired |
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
A one in a million. No real harm is done. Your computers are running just fine. Keep tuning. When the time is done. Yours will still be running, tuned and humming. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Rasputin42 Send message Joined: 25 Jul 08 Posts: 412 Credit: 5,834,661 RAC: 0 |
This guy's tasks are all erroring out. http://setiathome.berkeley.edu/show_host_detail.php?hostid=5295453 What to do? |
Sutaru Tsureku Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
I have a lot of wingmen who just destroy all tasks/the science ... (which makes me very sad!) Just two wingmen of one task ... http://setiathome.berkeley.edu/workunit.php?wuid=1801074356 The 'AMD Radeon HD 4670 (256MB) OpenCL: 1.0' destroys all tasks (errors) ... http://setiathome.berkeley.edu/show_host_detail.php?hostid=7426251 The 'AMD AMD Radeon HD 6200/6300/7200/7300 series (Wrestler) (384MB) driver: 1.4.1589 OpenCL: 1.1' produces only '-9 overflows' ... http://setiathome.berkeley.edu/show_host_detail.php?hostid=7504170 Wrong driver? If so, why do they get all these fresh tasks just to destroy the science? |
Sutaru Tsureku Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
26no12ac.15576.20108.438086664204.12.102 http://setiathome.berkeley.edu/workunit.php?wuid=1802092176 Two results with '-9 overflow' (two AMD VGA cards, stock SETI@home v7 v7.03 (opencl_ati_sah)): http://setiathome.berkeley.edu/show_host_detail.php?hostid=6889787 http://setiathome.berkeley.edu/show_host_detail.php?hostid=7504170 And a correct result (stock CPU app): http://setiathome.berkeley.edu/show_host_detail.php?hostid=7522258 Spike count: 7 Autocorr count: 2 Pulse count: 3 Triplet count: 3 Gaussian count: 1 Which is now marked as 'invalid', and the science is destroyed, because the '-9 overflow' is now in the database. |
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
26no12ac.15576.20108.438086664204.12.102 The science is not destroyed. We might be looking in the wrong direction, at the wrong wavelength, trying to find a completely wrong kind of signal (it takes a lot of power to transmit at one frequency - easy to detect - tdma - fdma.) ... and those now missed signals can be recalculated in a day with the HW available ten years from now. (We're reprocessing some right now, I guess) To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Rasputin42 Send message Joined: 25 Jul 08 Posts: 412 Credit: 5,834,661 RAC: 0 |
http://setiathome.berkeley.edu/show_host_detail.php?hostid=7044454 is another one.... What is going on? |
betreger Send message Joined: 29 Jun 99 Posts: 11361 Credit: 29,581,041 RAC: 66 |
http://setiathome.berkeley.edu/show_host_detail.php?hostid=7044454 This looks like it is another 560ti gone astray. |
Sutaru Tsureku Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
If you click on this kind of 'bad' host, and then on the overview of that host's 'valid' tasks, you will see wrong '-9 overflow' results which are marked as valid, and wingmen with good results that are marked as invalid (in these WUs). Examples: 22my12ab.20524.6202.438086664200.12.217 http://setiathome.berkeley.edu/workunit.php?wuid=1795776824 Bad hosts which are wingmen: http://setiathome.berkeley.edu/show_host_detail.php?hostid=6889787 http://setiathome.berkeley.edu/show_host_detail.php?hostid=6780580 And a good result from: http://setiathome.berkeley.edu/show_host_detail.php?hostid=7460773 Spike count: 4 Autocorr count: 0 Pulse count: 2 Triplet count: 0 Gaussian count: 0 24fe13ab.31742.13564.438086664207.12.86 http://setiathome.berkeley.edu/workunit.php?wuid=1801693399 Bad hosts which are wingmen: http://setiathome.berkeley.edu/show_host_detail.php?hostid=6656511 http://setiathome.berkeley.edu/show_host_detail.php?hostid=6889787 And a good result from: http://setiathome.berkeley.edu/show_host_detail.php?hostid=7481053 Spike count: 2 Autocorr count: 2 Pulse count: 0 Triplet count: 2 Gaussian count: 0 26no12ac.15576.20108.438086664204.12.96 http://setiathome.berkeley.edu/workunit.php?wuid=1802092170 Bad hosts which are wingmen: http://setiathome.berkeley.edu/show_host_detail.php?hostid=6889787 http://setiathome.berkeley.edu/show_host_detail.php?hostid=7504170 And a good result from: http://setiathome.berkeley.edu/show_host_detail.php?hostid=6505277 Spike count: 12 Autocorr count: 0 Pulse count: 0 Triplet count: 1 Gaussian count: 0 25oc12ad.27789.18472.438086664204.12.109 http://setiathome.berkeley.edu/workunit.php?wuid=1802156502 Bad hosts which are wingmen: http://setiathome.berkeley.edu/show_host_detail.php?hostid=6889787 http://setiathome.berkeley.edu/show_host_detail.php?hostid=7504170 And a good result from: http://setiathome.berkeley.edu/show_host_detail.php?hostid=5967851 Spike count: 4 Autocorr count: 2 Pulse count: 1 Triplet count: 0 Gaussian count: 1 And the '-9 overflow' always ends up in the database, and the good results are destroyed. Maybe it would be better if, when a VGA card sends back a '-9 overflow' result, the task were calculated again on a CPU to check whether there really is a '-9 overflow'. If it's true, it takes just a few seconds. If it's not true, the science is rescued. |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Well you know what they say "A broken clock is right twice a day" SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [BP6/VP6 User Group](http://tinyurl.com/8y46zvu) |
Wiggo Send message Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489 |
These are just the 1's that have annoyed me in the last 24hrs. http://setiathome.berkeley.edu/show_host_detail.php?hostid=7546980 http://setiathome.berkeley.edu/show_host_detail.php?hostid=6741097 http://setiathome.berkeley.edu/show_host_detail.php?hostid=5068008 http://setiathome.berkeley.edu/show_host_detail.php?hostid=7202354 http://setiathome.berkeley.edu/show_host_detail.php?hostid=6518832 http://setiathome.berkeley.edu/show_host_detail.php?hostid=6741097 http://setiathome.berkeley.edu/show_host_detail.php?hostid=6072583 http://setiathome.berkeley.edu/show_host_detail.php?hostid=7374866 http://setiathome.berkeley.edu/show_host_detail.php?hostid=7585154 http://setiathome.berkeley.edu/show_host_detail.php?hostid=6989786 http://setiathome.berkeley.edu/show_host_detail.php?hostid=5857368 http://setiathome.berkeley.edu/show_host_detail.php?hostid=7440946 http://setiathome.berkeley.edu/show_host_detail.php?hostid=7556063 http://setiathome.berkeley.edu/show_host_detail.php?hostid=6880918 http://setiathome.berkeley.edu/show_host_detail.php?hostid=7459017 http://setiathome.berkeley.edu/show_host_detail.php?hostid=7200981 Cheers. |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
26no12ac.15576.20108.438086664204.12.102 Both hosts running the stock (opencl_ati_sah) are using Catalyst 11.10 drivers (AMD OpenCL SDK 2.5), known to cause such problems. That stock build is rev. 1831; we need to get newer OpenCL_ATi builds deployed here which will refuse to run with anything older than Catalyst 11.12 (AMD OpenCL SDK 2.6). Such apps are under test at SETI BETA, of course, and I hope those who care about quality will devote some of their crunching resources there. In addition to the check of drivers, recent MB7 OpenCL apps also have a sanity check designed to error out for such Autocorr overflows. Getting that tuned so it does error out reliably on bad cases but is unlikely to do so on a good result has been challenging; the apps at Beta are currently too sensitive in that respect, but there's a refinement in the pipeline. Joe |
Blurf Send message Joined: 2 Sep 06 Posts: 8962 Credit: 12,678,685 RAC: 0 |
Maybe send a list of the "bad" users to the mods and have them send it up to Eric? Just a thought.... |
Werecow Send message Joined: 13 Mar 05 Posts: 56 Credit: 4,917,657 RAC: 3 |
Rogue hosts like these were being tracked in this thread, with PMs sent to users. A few got sorted that way. Unfortunately, the thread doesn't seem to be monitored much now, if at all. |
Dimly Lit Lightbulb 😀 Send message Joined: 30 Aug 08 Posts: 15399 Credit: 7,423,413 RAC: 1 |
Maybe send a list of the "bad" users to the mods and have them send it up to Eric? Just a thought.... That's admin stuff, we already have enough paperwork to do around here. Member of the People Encouraging Niceness In Society club. |
Cavalary Send message Joined: 15 Jul 99 Posts: 104 Credit: 7,507,548 RAC: 38 |
*skimmed thread* This should be automated. Besides the already-suggested mandatory CPU validation whenever 2 GPUs agree on an overflow (or at least on an overflow of spikes and/or autocorrs, since if a task gets past those with a full match it's likely right... I think), hosts with many invalids should receive notices, possibly be required to do something to confirm they saw them and intend to fix the issue, and, if the situation doesn't improve, be stopped from getting more WUs, at least for the type of processor that causes the invalids. And this is a matter to be tackled at the BOINC level, not the project level. Quite shocked it hasn't been, really, especially since I think it wouldn't be hard to implement. |
rob smith Send message Joined: 7 Mar 03 Posts: 22199 Credit: 416,307,556 RAC: 380 |
Doing what you suggest would not work given the ratio of tasks that are returned by GPU:CPU. A far better solution would be to restrict the number of tasks allowed in the cache of an errant processor when it exceeds a given number (or ratio) of invalid tasks, along with the number of tasks permitted per day: reduce both by a factor of ten (ten in the cache per GPU or CPU, whichever is "offending"). Continued lack of remedial action would result in another factor of ten... If that processor started to return valid tasks, then slowly ramp the cache and daily limits back to normal levels, taking days/weeks to get there. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Darth Beaver Send message Joined: 20 Aug 99 Posts: 6728 Credit: 21,443,075 RAC: 3 |
No, Donald, still no contact. Sorry, I have been busy and this is the first chance I've had to check things out. |
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
The quota system used to be pretty good at putting a stop to these runaway hosts... but then someone decided the absolute bottom value must be 33, so basically all that does is just beg for large amounts of problems. It used to (I think..?) start for a new host at something like.. 10. For every good task they returned, it would +5, for every bad result, it would divide by 2. You could go all the way up to 100, and go as low as 1. If you had a runaway machine, you would be limited to 1 task/day until you fixed the problem. Okay, so it took a while to be able to get a reasonable cache by only gaining +5 for every good task, but it meant that those machines that had made it to 100 were reliable and dependable. Now though... I think it is something like it doubles for every one task that is good up to 100, then +1 for every one beyond that, and halves for every bad one, but cannot go below 33. This is made worse by the fact that there is no upper-limit on number of tasks/day, so if a good machine has been going for a while and is theoretically allowed thousands of tasks/day and then something goes wrong and it starts trashing WUs, it takes a while to get down to 33 from 10,000+ by dividing by 2 each time. What's more is.. when these -9 overflows validate against other -9s, that counts as "good" and raises your daily limit again. Long story short.. I don't particularly think this is specifically an issue with not letting two overflow results go without a CPU's opinion (because all that will end up happening is the CPU's result won't match the majority, and the CPU's result will be discarded as invalid--which is what already happens quite frequently anyway). The best solution to this problem is a simple one: have a very strict, draconian, unforgiving quota system. 
Something like: - Start every machine at a daily limit of 10 - For every valid task, +2 to the quota - For every bad task, /2 to the quota if below 200, /5 if above 200 - Allow quota to go all the way down to 1, no upper limit. This would feel agonizing for everyone at first, but the good, reliable machines would quickly build their quota up to a point where the server-side cache limits become the limiting factor, just like it is now. This would drastically limit these runaway hosts down to such a small amount of work that they can't really do much of any damage at all. If the owners of these machines want to complain about how they can't get any work, the response should be "then fix whatever problem your machine has, and then you'll get more work." That's the way the quota system is supposed to be: rewarded for good work, punished for bad work. The way it is right now, there is no punishment, because you can still get tons of tasks to trash and there's not really anything that can stop you from doing so. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
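
For anyone who wants to play with the numbers, here is a minimal Python sketch of the quota rules proposed in the last post, next to the current rule as that post recalls it (it was hedged there with "I think"). This is a toy model, not the BOINC scheduler's actual code. The function names are made up for illustration, and three details are assumptions because the post does not pin them down: integer division is used, a quota of exactly 200 counts as "below 200", and the recalled doubling is capped at 100.

```python
"""Toy model of two daily-quota rules discussed in the thread (illustrative only)."""
from typing import Callable, Iterable


def proposed_step(quota: int, valid: bool) -> int:
    """Proposed rule: +2 per valid task; a bad task divides by 2, or by 5 above 200."""
    if valid:
        return quota + 2  # no upper limit
    # Assumption: a quota of exactly 200 is treated as "below 200".
    divisor = 5 if quota > 200 else 2
    # Assumption: integer division; the quota never drops below 1.
    return max(1, quota // divisor)


def recalled_current_step(quota: int, valid: bool) -> int:
    """Current rule as the post recalls it, not a confirmed description."""
    if valid:
        # Assumption: doubling is capped at 100, then growth is +1 per valid task.
        return min(quota * 2, 100) if quota < 100 else quota + 1
    return max(33, quota // 2)  # halves on a bad task, floor of 33


def run(step: Callable[[int, bool], int], quota: int, outcomes: Iterable[bool]) -> int:
    """Apply a rule to a sequence of task outcomes (True = valid, False = bad)."""
    for valid in outcomes:
        quota = step(quota, valid)
    return quota


if __name__ == "__main__":
    runaway = [False] * 20  # a host that trashes 20 tasks in a row
    print(run(proposed_step, 10_000, runaway))          # 1
    print(run(recalled_current_step, 10_000, runaway))  # 33
```

Under these assumptions, a host that starts at 10,000 tasks/day and returns nothing but bad results bottoms out after nine results under either rule; the difference is where it lands, at 1 task/day instead of 33. The sketch does not model '-9 overflows' validating against each other, which the post points out would count as "good" and push the quota back up.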