Cancelled by project question

Author	Message
OzzFan Volunteer tester Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28	Message 614301 - Posted: 3 Aug 2007, 20:32:58 UTC I have noticed that, on average, if a system doesn't return a WU within a 24-ish hour window, it will always be the "last man in" (i.e. the third & redundant result). In the interest of useful science and conserving our environment, does this mean that most slower computers are essentially redundant now? Does this mean there's no useful reason to run them anymore (other than to increase one's RAC)? ID: 614301 ·

Alinator Volunteer tester Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0	Message 614308 - Posted: 3 Aug 2007, 20:42:20 UTC Yep, as long as trailers are sent by default, that's what it means. ;-) Alinator ID: 614308 ·

OzzFan Volunteer tester Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28	Message 614312 - Posted: 3 Aug 2007, 20:45:10 UTC Last modified: 3 Aug 2007, 20:45:32 UTC So if they change the default so that a third WU is only sent out if the first two don't match (etc.), then slower computers will be doing useful science again? ID: 614312 ·

Alinator Volunteer tester Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0	Message 614313 - Posted: 3 Aug 2007, 20:46:00 UTC Correct. ID: 614313 ·

Philadelphia Volunteer tester Send message Joined: 12 Feb 07 Posts: 1590 Credit: 399,688 RAC: 0	Message 614317 - Posted: 3 Aug 2007, 20:48:20 UTC - in response to Message 614313. Correct. That almost sounds like a "Hey, we'll call you when we need you" situation. That's not very nice :-( ID: 614317 ·

Henk Haneveld Volunteer tester Send message Joined: 16 May 99 Posts: 154 Credit: 1,577,293 RAC: 1	Message 614322 - Posted: 3 Aug 2007, 20:54:25 UTC Wrong. All the WUs with 2 returned results are aborted from your cache. What is left is useful work. I run a 10-day cache and still have pending results, meaning that my result is the first returned. ID: 614322 ·

Alinator Volunteer tester Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0	Message 614328 - Posted: 3 Aug 2007, 21:02:52 UTC Last modified: 3 Aug 2007, 21:03:52 UTC Except that before 5.10 clients were available, even a fast host running a 10-day cache was wasting a major chunk of its runtime doing useless trailers. My data showed that my T2400 running a 3-day cache was wasting fully a third of its SAH runtime on trailers. That's why I dropped back to a CI just big enough to cover short outages, and if it runs out of SAH, so be it; it has other projects to work on. Not wasting cash on 'junk' is more important, IMHO. Alinator ID: 614328 ·

James Nelson Volunteer tester Send message Joined: 23 Mar 02 Posts: 381 Credit: 4,806,382 RAC: 0	Message 614330 - Posted: 3 Aug 2007, 21:05:50 UTC - in response to Message 614301. I have noticed that, on average, if a system doesn't return a WU within a 24-ish hour window, it will always be the "last man in" (i.e. the third & redundant result). In the interest of useful science and conserving our environment, does this mean that most slower computers are essentially redundant now? Does this mean there's no useful reason to run them anymore (other than to increase one's RAC)? I have two old slow computers and they are often the second result, if not the first. I'll keep them crunching as long as there is work to do. ID: 614330 ·

Alinator Volunteer tester Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0	Message 614333 - Posted: 3 Aug 2007, 21:10:42 UTC Yep, not too much wrong with those PIII's. ;-) Alinator ID: 614333 ·

OzzFan Volunteer tester Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28	Message 614350 - Posted: 3 Aug 2007, 21:40:02 UTC - in response to Message 614322. Wrong. All the WUs with 2 returned results are aborted from your cache. What is left is useful work. I run a 10-day cache and still have pending results, meaning that my result is the first returned. Only WUs that haven't already started crunching will get cancelled. What if I have an AMD K6-2 500MHz system that takes 100 hours to complete a WU? Chances are, two systems will return theirs well before I return mine, thus making my K6-2 an always-redundant machine. ID: 614350 ·
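
A minimal sketch of the scenario in this post, assuming three copies of a workunit are sent out at the same moment, a quorum of two matching results is enough, and the first two results agree. The host names and the two faster turnaround times are hypothetical; only the 100-hour figure comes from the thread.

```python
def split_by_quorum(turnaround_hours, quorum=2):
    """Split hosts into (useful, redundant) by the order their results arrive.

    turnaround_hours maps a host name to the hours between receiving the
    workunit and returning its result. Assumes every copy is issued at the
    same time and that the first `quorum` results validate against each other.
    """
    ordered = sorted(turnaround_hours, key=turnaround_hours.get)
    return ordered[:quorum], ordered[quorum:]


# Hypothetical hosts: two faster machines and the 100-hour K6-2.
hosts = {"fast_host_a": 20.0, "fast_host_b": 30.0, "k6_2_500": 100.0}
useful, redundant = split_by_quorum(hosts)
print(useful)     # ['fast_host_a', 'fast_host_b']
print(redundant)  # ['k6_2_500']
```

Under these assumptions the slowest turnaround is always the redundant one, regardless of how reliable the slow host is.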

OzzFan Volunteer tester Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28	Message 614352 - Posted: 3 Aug 2007, 21:42:07 UTC - in response to Message 614330. Last modified: 3 Aug 2007, 21:43:31 UTC I have noticed that, on average, if a system doesn't return a WU within a 24-ish hour window, it will always be the "last man in" (i.e. the third & redundant result). In the interest of useful science and conserving our environment, does this mean that most slower computers are essentially redundant now? Does this mean there's no useful reason to run them anymore (other than to increase one's RAC)? I have two old slow computers and they are often the second result, if not the first. I'll keep them crunching as long as there is work to do. How old and slow? Mine is a K6-2 500MHz and it is always the third man in. Always redundant, even with the MMX optimized app. I would not consider a PIII 866MHz machine "old and slow". PII and AMD K6 would be old and slow. ID: 614352 ·

James Nelson Volunteer tester Send message Joined: 23 Mar 02 Posts: 381 Credit: 4,806,382 RAC: 0	Message 614374 - Posted: 3 Aug 2007, 22:08:00 UTC - in response to Message 614352. I have noticed that, on average, if a system doesn't return a WU within a 24-ish hour window, it will always be the "last man in" (i.e. the third & redundant result). In the interest of useful science and conserving our environment, does this mean that most slower computers are essentially redundant now? Does this mean there's no useful reason to run them anymore (other than to increase one's RAC)? I have two old slow computers and they are often the second result, if not the first. I'll keep them crunching as long as there is work to do. How old and slow? Mine is a K6-2 500MHz and it is always the third man in. Always redundant, even with the MMX optimized app. I would not consider a PIII 866MHz machine "old and slow". PII and AMD K6 would be old and slow. One is a PII 500, the other is a PIII 700. My fast computers are AMD XP 2100 and 2800. I know, not so fast next to a quad core, but it's the best I have. ID: 614374 ·

OzzFan Volunteer tester Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28	Message 614380 - Posted: 3 Aug 2007, 22:19:41 UTC - in response to Message 614374. Last modified: 3 Aug 2007, 22:20:02 UTC One is a PII 500, the other is a PIII 700. My fast computers are AMD XP 2100 and 2800. I know, not so fast next to a quad core, but it's the best I have. PIIs only went up to 450MHz. PIIIs started at 450MHz (Katmai) and went up to 600MHz before getting a revision (Coppermine). Your PII 500MHz is either a PIII or it's a PII-based Celeron. Still, those chips are relatively fast. My K6-2 500MHz is about as fast as a PII 266MHz or 300MHz. Or even my P233MMX. It was always third man in on every task returned. Older Pentiums, Pentium IIs, AMD K5s and AMD K6 series processors all seem to be wasting electricity by contributing to my RAC without producing useful science. ID: 614380 ·

Alinator Volunteer tester Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0	Message 614381 - Posted: 3 Aug 2007, 22:22:33 UTC Last modified: 3 Aug 2007, 22:28:16 UTC @ James: Hmmmm.... I was looking over your host list. When I took a look at some of the Result summaries for the 500 MHz Celeron I noticed the Coop app was reporting it as a Coppermine Celeron and capable of SSE. So you should be able to squeeze a little more out of it by running the SSE Coop app on it. Of course this assumes it wasn't being misidentified by the app. Alinator <edit> @ Ozz: Yeah, the 500MHz PII part is what got me to take a closer look at the host list. ;-) Also, if I wasn't slugging it out with Dr. Watson for Class supremacy right now, my K6's would be crunching elsewhere. ;-) ID: 614381 ·

James Nelson Volunteer tester Send message Joined: 23 Mar 02 Posts: 381 Credit: 4,806,382 RAC: 0	Message 614435 - Posted: 3 Aug 2007, 23:47:46 UTC - in response to Message 614381. @ James: Hmmmm.... I was looking over your host list. When I took a look at some of the Result summaries for the 500 MHz Celeron I noticed the Coop app was reporting it as a Coppermine Celeron and capable of SSE. So you should be able to squeeze a little more out of it by running the SSE Coop app on it. Of course this assumes it wasn't being misidentified by the app. Alinator <edit> @ Ozz: Yeah, the 500MHz PII part is what got me to take a closer look at the host list. ;-) Also, if I wasn't slugging it out with Dr. Watson for Class supremacy right now, my K6's would be crunching elsewhere. ;-) Yes, it does say SSE, but SSE doesn't work. It runs for a while and errors out; it won't run the whole way through, so it must not be fully implemented. ID: 614435 ·

1mp0£173 Volunteer tester Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 614438 - Posted: 3 Aug 2007, 23:57:55 UTC - in response to Message 614301. I have noticed that, on average, if a system doesn't return a WU within a 24-ish hour window, it will always be the "last man in" (i.e. the third & redundant result). In the interest of useful science and conserving our environment, does this mean that most slower computers are essentially redundant now? Does this mean there's no useful reason to run them anymore (other than to increase one's RAC)? Oh, I don't know.... If you are running the current 5.10.x client, and you have a short "connect every 'x' days" (like maybe 0.1) and you have a large cache using the "extra days" functionality, you'll be returning work reasonably fast. Probably before a fast machine with "connect every '4' days" or somesuch. That isn't saying there may be a better use of the electricity, but I think the new client actually makes your old, slow machines more likely to be effective (more likely to return one of the first two results). ID: 614438 ·

OzzFan Volunteer tester Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28	Message 614445 - Posted: 4 Aug 2007, 0:02:57 UTC - in response to Message 614438. I have noticed that, on average, if a system doesn't return a WU within a 24-ish hour window, it will always be the "last man in" (i.e. the third & redundant result). In the interest of useful science and conserving our environment, does this mean that most slower computers are essentially redundant now? Does this mean there's no useful reason to run them anymore (other than to increase one's RAC)? Oh, I don't know.... If you are running the current 5.10.x client, and you have a short "connect every 'x' days" (like maybe 0.1) and you have a large cache using the "extra days" functionality, you'll be returning work reasonably fast. Probably before a fast machine with "connect every '4' days" or somesuch. My Connect To is at 0 and my Extra Cache is 2.75 days. It does not matter what the cache is because on these slow machines, it will download a single workunit that will fill up an entire 3 day cache. It will start processing it, making the servers unable to cancel the workunit if two have already returned by faster machines, and it will take 110 hours to complete, making it the last man in. It will then return the result (third one of course), download a new one, lather, rinse, repeat. That isn't saying there may be a better use of the electricity, but I think the new client actually makes your old, slow machines more likely to be effective (more likely to return one of the first two results). Not true in practice. Perhaps the theory wasn't worked out too well on this idea. Of course I like the idea of doing more useful science, but now all my slower machines have become redundant. ID: 614445 ·
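
The cache arithmetic in this post can be sketched as follows. This is a simplified model, not the real BOINC work-fetch logic: it assumes a single-CPU host that keeps fetching tasks until their estimated runtime covers the cache, runs one task at a time, and can only have unstarted tasks cancelled by the server. The function name and the 3-hour faster host are hypothetical; the 2.75-day cache and 110-hour runtime are the figures quoted above.

```python
import math


def queue_after_fetch(cache_days, task_hours):
    """Return (tasks_fetched, unstarted_tasks) for a single-CPU host.

    Assumes the client fetches whole tasks until their combined estimated
    runtime covers the cache, and always holds at least one task.
    """
    cache_hours = cache_days * 24
    tasks = max(1, math.ceil(cache_hours / task_hours))
    unstarted = tasks - 1  # the first task starts crunching right away
    return tasks, unstarted


print(queue_after_fetch(2.75, 110))  # slow host from the post: (1, 0)
print(queue_after_fetch(2.75, 3))    # hypothetical faster host: (22, 21)
```

In this model the slow host never holds an unstarted task, so server-side cancellation has nothing to act on, while the faster host has 21 queued tasks that could still be cancelled if they become redundant.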

Alinator Volunteer tester Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0	Message 614479 - Posted: 4 Aug 2007, 0:51:16 UTC - in response to Message 614435. Yes, it does say SSE, but SSE doesn't work. It runs for a while and errors out; it won't run the whole way through, so it must not be fully implemented. Hmmm... That's interesting. Maybe Simon or someone else from the Coop will see this and provide some insight. I agree though, a compute error is a definite showstopper. ;-) Alinator ID: 614479 ·

Josef W. Segur Volunteer developer Volunteer tester Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 614590 - Posted: 4 Aug 2007, 3:33:28 UTC - in response to Message 614479. Yes, it does say SSE, but SSE doesn't work. It runs for a while and errors out; it won't run the whole way through, so it must not be fully implemented. Hmmm... That's interesting. Maybe Simon or someone else from the Coop will see this and provide some insight. I agree though, a compute error is a definite showstopper. ;-) Alinator I can give a little background information, at least. All the apps, including the MMX one, include our hand-vectorized code subroutines as well as standard versions. The app tests which can be used and decides which is fastest. On James' system it is probable that some SSE routines are being used even though he's using the MMX app. OTOH, we're using IPP for FFTs and the MMX build wouldn't get a vectorized version. In addition, the Intel compiler may autovectorize some other areas for the SSE build. James' system isn't the only SSE-capable system on which the SSE build fails, though it's fairly rare. There's no obvious cause; I'm not sure I could figure it out even if I decided to concentrate a lot of effort there. Joe ID: 614590 ·
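
The "test which can be used and decide which is fastest" idea can be illustrated with a small Python sketch. This is not the optimized app's code: the routines below are hypothetical stand-ins for the vectorized and standard subroutines, and a raised exception stands in for an instruction set the host cannot run.

```python
import time


def sum_squares_plain(values):
    """Stand-in for a standard (non-vectorized) routine."""
    total = 0
    for v in values:
        total += v * v
    return total


def sum_squares_builtin(values):
    """Stand-in for an alternative, possibly faster routine."""
    return sum(v * v for v in values)


def unsupported(values):
    """Stand-in for a routine this host cannot run."""
    raise RuntimeError("instruction set not available")


def pick_fastest(candidates, sample, repeats=5):
    """Time each usable candidate on sample data and return the quickest."""
    best_fn, best_time = None, float("inf")
    for fn in candidates:
        try:
            start = time.perf_counter()
            for _ in range(repeats):
                fn(sample)
            elapsed = time.perf_counter() - start
        except Exception:
            continue  # a failing routine is treated as unusable on this host
        if elapsed < best_time:
            best_fn, best_time = fn, elapsed
    return best_fn


sample = list(range(1000))
chosen = pick_fastest([unsupported, sum_squares_plain, sum_squares_builtin], sample)
print(chosen.__name__)
```

Which routine wins depends on the machine, which is the point of testing at run time instead of fixing the choice at build time.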

1mp0£173 Volunteer tester Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 614596 - Posted: 4 Aug 2007, 3:52:28 UTC - in response to Message 614445. My Connect To is at 0 and my Extra Cache is 2.75 days. It does not matter what the cache is because on these slow machines, it will download a single workunit that will fill up an entire 3 day cache. It will start processing it, making the servers unable to cancel the workunit if two have already returned by faster machines, and it will take 110 hours to complete, making it the last man in. Keep in mind that "connect every '0' days" may be slightly dangerous: it is possible to report before the scheduler knows about the upload. That's why I suggested a small, but non-zero value. That isn't saying there may be a better use of the electricity, but I think the new client actually makes your old, slow machines more likely to be effective (more likely to return one of the first two results). Not true in practice. Perhaps the theory wasn't worked out too well on this idea. Of course I like the idea of doing more useful science, but now all my slower machines have become redundant. Before 5.10.x, your very slow machines were likely to be the last to report, and while no work was aborted, you were usually last (and perhaps, redundant). On 5.10.x, if you have a work unit that is already redundant, it has a good chance of being aborted before you crunch it. ... and if there is just one work unit at a time, then the behaviour is effectively the same. ID: 614596 ·
