Message boards :
Number crunching :
Cancelled by project question
Author | Message |
---|---|
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
I have noticed that, on average, if a system doesn't return a WU within a 24-ish hour window, it will always be the "last man in" (i.e. the third & redundant result). In the interest of useful science and conserving our environment, does this mean that most slower computers are essentially redundant now? Does this mean there's no useful reason to run them anymore (other than to increase one's RAC)? |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Yep, as long as trailers are sent by default, that's what it means. ;-) Alinator |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
So if they change the default so that a third WU is only sent out if the first two don't match (etc.), then slower computers will be doing useful science again? |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Correct. |
Philadelphia Send message Joined: 12 Feb 07 Posts: 1590 Credit: 399,688 RAC: 0 |
Correct. That almost sounds like a "Hey, we'll call you when we need you" situation. That's not very nice :-( |
Henk Haneveld Send message Joined: 16 May 99 Posts: 154 Credit: 1,577,293 RAC: 1 |
Wrong. All the WUs with 2 returned results are aborted from your cache. What is left is useful work. I run a 10-day cache and still have pending results, meaning that my result is the first returned. |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Except that before 5.10 clients were available, even a fast host running a 10 day cache was wasting a major chunk of its runtime doing useless trailers. My data showed that my T2400 running a 3 day cache was wasting fully a third of its SAH runtime on trailers. That's why I dropped back to a CI just big enough to cover short outages, and if it runs out of SAH, so be it; it has other projects to work on. Not wasting cash on 'junk' is more important, IMHO. Alinator |
James Nelson Send message Joined: 23 Mar 02 Posts: 381 Credit: 4,806,382 RAC: 0 |
I have noticed that, on average, if a system doesn't return a WU within a 24-ish hour window, it will always be the "last man in" (i.e. the third & redundant result). I have two old, slow computers, and they are often the second result, if not the first. I'll keep them crunching as long as there is work to do. |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Yep, not too much wrong with those PIII's. ;-) Alinator |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
Wrong. Only WUs that haven't started crunching yet will get cancelled. What if I have an AMD K6-2 500MHz system that takes 100 hours to complete? Chances are, two systems will return theirs well before I return mine, thus making my K6-2 an always-redundant machine. |
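The cancellation rule described in this post can be written down as a minimal sketch. The function name `should_cancel` and the default quorum of 2 are assumptions made for illustration; this is not the project's actual server code.

```python
def should_cancel(returned_count: int, started: bool, quorum: int = 2) -> bool:
    """Hypothetical rule: cancel a task only if the quorum is already met
    and the host has not started crunching it yet."""
    return returned_count >= quorum and not started


# A task still waiting in the cache is cancelled once two results are in.
print(should_cancel(returned_count=2, started=False))
# A task that is already running is left alone, so it becomes a trailer.
print(should_cancel(returned_count=2, started=True))
```

Under this rule, a host that only ever holds one task at a time has always started it, so nothing is ever cancelled for that host.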
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
I have noticed that, on average, if a system doesn't return a WU within a 24-ish hour window, it will always be the "last man in" (i.e. the third & redundant result). How old and slow? Mine is a K6-2 500MHz and it is always the third man in. Always redundant, even with the MMX optimized app. I would not consider a PIII 866MHz machine "old and slow". PII and AMD K6 would be old and slow. |
James Nelson Send message Joined: 23 Mar 02 Posts: 381 Credit: 4,806,382 RAC: 0 |
I have noticed that, on average, if a system doesn't return a WU within a 24-ish hour window, it will always be the "last man in" (i.e. the third & redundant result). One is a PII 500, the other is a PIII 700. My fast computers are AMD XP 2100 and 2800. I know, not so fast next to a quad core, but it's the best I have. |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
one is a PII 500 the other is a PIII 700 my fast computers are amd XP 2100 and 2800, I know not so fast next to a quad core but It's the best I have. PIIs only went up to 450MHz. PIIIs started at 450MHz (Katmai) and went up to 600MHz before getting a revision (Coppermine). Your PII 500MHz is either a PIII or it's a PII-based Celeron. Still, those chips are relatively fast. My K6-2 500MHz is about as fast as a PII 266MHz or 300MHz. Or even my P233MMX. It was always third man in on every task returned. Older Pentiums, Pentium IIs, AMD K5s and AMD K6 series processors all seem to be wasting electricity by contributing to my RAC without producing useful science. |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
@ James: Hmmmm.... I was looking over your host list. When I took a look at some of the Result summaries for the 500 MHz Celeron I noticed the Coop app was reporting it as a Coppermine Celeron and capable of SSE. So you should be able to squeeze a little more out of it by running the SSE Coop app on it. Of course this assumes it wasn't being misidentified by the app. Alinator <edit> @ Ozz: Yeah the 500MHz PII part is what got me to take a closer look at the host list. ;-) Also, if I wasn't slugging it out with Dr. Watson for Class supremacy right now, my K6's would be crunching elsewhere. ;-) |
James Nelson Send message Joined: 23 Mar 02 Posts: 381 Credit: 4,806,382 RAC: 0 |
@ James: Yes, it does say SSE, but SSE doesn't work. It runs for a while and errors out; it won't run the whole way through, so it must not be fully implemented. |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
I have noticed that, on average, if a system doesn't return a WU within a 24-ish hour window, it will always be the "last man in" (i.e. the third & redundant result). Oh, I don't know.... If you are running the current 5.10.x client, and you have a short "connect every 'x' days" (like maybe 0.1) and you have a large cache using the "extra days" functionality, you'll be returning work reasonably fast. Probably before a fast machine with "connect every '4' days" or somesuch. That isn't saying there may be a better use of the electricity, but I think the new client actually makes your old, slow machines more likely to be effective (more likely to return one of the first two results). |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
I have noticed that, on average, if a system doesn't return a WU within a 24-ish hour window, it will always be the "last man in" (i.e. the third & redundant result). My Connect To is at 0 and my Extra Cache is 2.75 days. It does not matter what the cache is because on these slow machines, it will download a single workunit that will fill up an entire 3 day cache. It will start processing it, making the servers unable to cancel the workunit if two have already returned by faster machines, and it will take 110 hours to complete, making it the last man in. It will then return the result (third one of course), download a new one, lather, rinse, repeat. That isn't saying there may be a better use of the electricity, but I think the new client actually makes your old, slow machines more likely to be effective (more likely to return one of the first two results). Not true in practice. Perhaps the theory wasn't worked out too well on this idea. Of course I like the idea of doing more useful science, but now all my slower machines have become redundant. |
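The "last man in" effect described here can be illustrated with a minimal sketch. It assumes all three hosts download the same workunit at the same moment and that the first two results to report form the quorum. The host names and the 6 and 20 hour figures are made up for illustration; only the 110 hour figure comes from the post.

```python
from dataclasses import dataclass


@dataclass
class Result:
    host: str                # hypothetical host label
    turnaround_hours: float  # time from download to report


def split_quorum(results, quorum=2):
    """Return (counted, trailers): the first `quorum` results to report,
    followed by everything that arrives afterwards."""
    ordered = sorted(results, key=lambda r: r.turnaround_hours)
    return ordered[:quorum], ordered[quorum:]


results = [
    Result("fast-host", 6.0),     # assumed value
    Result("mid-host", 20.0),     # assumed value
    Result("k6-2-500", 110.0),    # figure quoted in the post
]
counted, trailers = split_quorum(results)
print("counted: ", [r.host for r in counted])
print("trailers:", [r.host for r in trailers])
```

With these numbers the slow host is always the trailer, whatever its cache settings, because its single task takes longer than the other two hosts' full turnaround.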
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Hmmm... That's interesting. Maybe Simon or someone else from the Coop will see this and provide some insight. I agree though, a compute error is a definite showstopper. ;-) Alinator |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
I can give a little background information, at least. All the apps, including the MMX one, include our hand-vectorized code subroutines as well as standard versions. The app tests which can be used and decides which is fastest. On James' system it is probable that some SSE routines are being used even though he's using the MMX app. OTOH, we're using IPP for FFTs and the MMX build wouldn't get a vectorized version. In addition, the Intel compiler may autovectorize some other areas for the SSE build. James' system isn't the only SSE capable system on which the SSE build fails, though it's fairly rare. There's no obvious cause, and I'm not sure I could figure it out even if I decided to concentrate a lot of effort there. Joe |
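The "test which can be used and decide which is fastest" idea can be sketched roughly as follows. This is an illustration only, not the Coop app's code: the function names, the `supported` flag standing in for a CPU feature check, and the timing method are all assumptions.

```python
import time


def sum_squares_loop(xs):
    """Plain reference version."""
    total = 0.0
    for x in xs:
        total += x * x
    return total


def sum_squares_builtin(xs):
    """Alternative version, standing in for a hand-tuned routine."""
    return sum(x * x for x in xs)


def pick_fastest(candidates, sample, repeats=5):
    """candidates: list of (name, func, supported) tuples.
    Times each supported function on `sample` and returns (name, func)
    for the one with the lowest total time."""
    best = None
    for name, func, supported in candidates:
        if not supported:
            continue  # e.g. the CPU lacks the required instruction set
        start = time.perf_counter()
        for _ in range(repeats):
            func(sample)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best[0]:
            best = (elapsed, name, func)
    if best is None:
        raise RuntimeError("no supported implementation")
    return best[1], best[2]


candidates = [
    ("loop", sum_squares_loop, True),
    ("builtin", sum_squares_builtin, True),
    ("unsupported", None, False),  # skipped, never called
]
name, func = pick_fastest(candidates, [1.0, 2.0, 3.0])
print("selected:", name)
```

Which routine wins depends on the machine, so the printed name can vary from run to run.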
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Keep in mind that "connect every '0' days" may be slightly dangerous: it is possible to report before the scheduler knows about the upload. That's why I suggested a small, but non-zero value. That isn't saying there may be a better use of the electricity, but I think the new client actually makes your old, slow machines more likely to be effective (more likely to return one of the first two results). Before 5.10.x, your very slow machines were likely to be the last to report, and while no work was aborted, you were usually last (and perhaps, redundant). On 5.10.x, if you have a work unit that is already redundant, it has a good chance of being aborted before you crunch it. ... and if there is just one work unit at a time, then the behaviour is effectively the same. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.