Cancelled by project question

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 614301 - Posted: 3 Aug 2007, 20:32:58 UTC

I have noticed that, on average, if a system doesn't return a WU within a 24-ish hour window, it will always be the "last man in" (i.e. the third & redundant result).

In the interest of useful science and conserving our environment, does this mean that most slower computers are essentially redundant now? Does this mean there's no useful reason to run them anymore (other than to increase one's RAC)?
ID: 614301
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 614308 - Posted: 3 Aug 2007, 20:42:20 UTC

Yep, as long as trailers are sent by default, that's what it means. ;-)

Alinator
ID: 614308
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 614312 - Posted: 3 Aug 2007, 20:45:10 UTC
Last modified: 3 Aug 2007, 20:45:32 UTC

So if they change the default so that a third WU is only sent out if the first two don't match (etc.), then slower computers will be doing useful science again?
ID: 614312
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 614313 - Posted: 3 Aug 2007, 20:46:00 UTC

Correct.
ID: 614313
Philadelphia
Volunteer tester
Joined: 12 Feb 07
Posts: 1590
Credit: 399,688
RAC: 0
United States
Message 614317 - Posted: 3 Aug 2007, 20:48:20 UTC - in response to Message 614313.  

Correct.


That almost sounds like a "Hey, we'll call you when we need you" situation.

That's not very nice :-(
ID: 614317
Henk Haneveld
Volunteer tester
Joined: 16 May 99
Posts: 154
Credit: 1,577,293
RAC: 1
Netherlands
Message 614322 - Posted: 3 Aug 2007, 20:54:25 UTC

Wrong.

All the WUs with 2 returned results are aborted from your cache. What is left is useful work.

I run a 10-day cache and still have pending results, meaning that my result is the first returned.
ID: 614322
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 614328 - Posted: 3 Aug 2007, 21:02:52 UTC
Last modified: 3 Aug 2007, 21:03:52 UTC

Except that before 5.10 clients were available, even a fast host running a 10 day cache was wasting a major chunk of its runtime doing useless trailers.

My data showed that my T2400 running a 3 day cache was wasting fully a third of its SAH runtime on trailers. That's why I dropped back to a CI just big enough to cover short outages, and if it runs out of SAH, so be it; it has other projects to work on. Not wasting cash on 'junk' is more important, IMHO.

Alinator
ID: 614328
James Nelson
Volunteer tester
Joined: 23 Mar 02
Posts: 381
Credit: 4,806,382
RAC: 0
United States
Message 614330 - Posted: 3 Aug 2007, 21:05:50 UTC - in response to Message 614301.  

I have noticed that, on average, if a system doesn't return a WU within a 24-ish hour window, it will always be the "last man in" (i.e. the third & redundant result).

In the interest of useful science and conserving our environment, does this mean that most slower computers are essentially redundant now? Does this mean there's no useful reason to run them anymore (other than to increase one's RAC)?


I have two old slow computers and they are often the second result, if not the first. I'll keep them crunching as long as there is work to do.
ID: 614330
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 614333 - Posted: 3 Aug 2007, 21:10:42 UTC

Yep, not too much wrong with those PIII's. ;-)

Alinator
ID: 614333
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 614350 - Posted: 3 Aug 2007, 21:40:02 UTC - in response to Message 614322.  

Wrong.

All the WUs with 2 returned results are aborted from your cache. What is left is useful work.

I run a 10-day cache and still have pending results, meaning that my result is the first returned.


Only WUs that haven't already been started crunching will get cancelled. What if I have an AMD K6-2 500MHz system that takes 100 hours to complete? Chances are, two systems will return theirs much before I return mine, thus making my K6-2 a redundant-always machine.
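The rule described here can be written down as a small sketch. This is only an illustration of the behaviour as reported in this thread, not the actual BOINC server logic; the `Task` class, its field names, and the quorum of two are assumptions made for the example.

```python
from dataclasses import dataclass

# Assumption taken from the thread: once two results are in,
# any further copy of the same WU is redundant.
QUORUM = 2


@dataclass
class Task:
    started: bool          # has this host begun crunching its copy?
    results_returned: int  # results other hosts have already returned for this WU


def is_redundant(task: Task) -> bool:
    """True if finishing this copy would only add a third-or-later result."""
    return task.results_returned >= QUORUM


def server_can_cancel(task: Task) -> bool:
    """True only for a redundant copy that has not been started yet."""
    return is_redundant(task) and not task.started


# A slow host that has already started crunching keeps its redundant copy.
print(server_can_cancel(Task(started=True, results_returned=2)))
# An unstarted copy sitting in a cache can be cancelled.
print(server_can_cancel(Task(started=False, results_returned=2)))
```

Under this reading, a very slow host that starts each WU as soon as it is downloaded never has an unstarted copy for the server to cancel.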
ID: 614350
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 614352 - Posted: 3 Aug 2007, 21:42:07 UTC - in response to Message 614330.  
Last modified: 3 Aug 2007, 21:43:31 UTC

I have noticed that, on average, if a system doesn't return a WU within a 24-ish hour window, it will always be the "last man in" (i.e. the third & redundant result).

In the interest of useful science and conserving our environment, does this mean that most slower computers are essentially redundant now? Does this mean there's no useful reason to run them anymore (other than to increase one's RAC)?


I have two old slow computers and they are often the second result, if not the first. I'll keep them crunching as long as there is work to do.


How old and slow? Mine is a K6-2 500MHz and it is always the third man in. Always redundant, even with the MMX optimized app.

I would not consider a PIII 866MHz machine "old and slow". PII and AMD K6 would be old and slow.
ID: 614352
James Nelson
Volunteer tester
Joined: 23 Mar 02
Posts: 381
Credit: 4,806,382
RAC: 0
United States
Message 614374 - Posted: 3 Aug 2007, 22:08:00 UTC - in response to Message 614352.  

I have noticed that, on average, if a system doesn't return a WU within a 24-ish hour window, it will always be the "last man in" (i.e. the third & redundant result).

In the interest of useful science and conserving our environment, does this mean that most slower computers are essentially redundant now? Does this mean there's no useful reason to run them anymore (other than to increase one's RAC)?


I have two old slow computers and they are often the second result, if not the first. I'll keep them crunching as long as there is work to do.


How old and slow? Mine is a K6-2 500MHz and it is always the third man in. Always redundant, even with the MMX optimized app.

I would not consider a PIII 866MHz machine "old and slow". PII and AMD K6 would be old and slow.


One is a PII 500, the other is a PIII 700. My fast computers are AMD XP 2100 and 2800. I know that's not so fast next to a quad core, but it's the best I have.

ID: 614374
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 614380 - Posted: 3 Aug 2007, 22:19:41 UTC - in response to Message 614374.  
Last modified: 3 Aug 2007, 22:20:02 UTC

One is a PII 500, the other is a PIII 700. My fast computers are AMD XP 2100 and 2800. I know that's not so fast next to a quad core, but it's the best I have.


PIIs only went up to 450MHz. PIIIs started at 450MHz (Katmai) and went up to 600MHz before getting a revision (Coppermine). Your PII 500MHz is either a PIII or it's a PII-based Celeron.


Still, those chips are relatively fast. My K6-2 500MHz is about as fast as a PII 266MHz or 300MHz. Or even my P233MMX. It was always third man in on every task returned.

Older Pentiums, Pentium IIs, AMD K5s and AMD K6 series processors all seem to be wasting electricity by contributing to my RAC without producing useful science.
ID: 614380
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 614381 - Posted: 3 Aug 2007, 22:22:33 UTC
Last modified: 3 Aug 2007, 22:28:16 UTC

@ James:

Hmmmm....

I was looking over your host list. When I took a look at some of the Result summaries for the 500 MHz Celeron I noticed the Coop app was reporting it as a Coppermine Celeron and capable of SSE.

So you should be able to squeeze a little more out of it by running the SSE Coop app on it. Of course this assumes it wasn't being misidentified by the app.

Alinator

<edit> @ Ozz: Yeah the 500MHz PII part is what got me to take a closer look at the host list. ;-)

Also, if I wasn't slugging it out with Dr. Watson for Class supremacy right now, my K6's would be crunching elsewhere. ;-)
ID: 614381
James Nelson
Volunteer tester
Joined: 23 Mar 02
Posts: 381
Credit: 4,806,382
RAC: 0
United States
Message 614435 - Posted: 3 Aug 2007, 23:47:46 UTC - in response to Message 614381.  

@ James:

Hmmmm....

I was looking over your host list. When I took a look at some of the Result summaries for the 500 MHz Celeron I noticed the Coop app was reporting it as a Coppermine Celeron and capable of SSE.

So you should be able to squeeze a little more out of it by running the SSE Coop app on it. Of course this assumes it wasn't being misidentified by the app.

Alinator

<edit> @ Ozz: Yeah the 500MHz PII part is what got me to take a closer look at the host list. ;-)

Also, if I wasn't slugging it out with Dr. Watson for Class supremacy right now, my K6's would be crunching elsewhere. ;-)


Yes, it does say SSE, but SSE doesn't work: it runs for a while and errors out. It won't run the whole way through, so it must not be fully implemented.
ID: 614435
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 614438 - Posted: 3 Aug 2007, 23:57:55 UTC - in response to Message 614301.  

I have noticed that, on average, if a system doesn't return a WU within a 24-ish hour window, it will always be the "last man in" (i.e. the third & redundant result).

In the interest of useful science and conserving our environment, does this mean that most slower computers are essentially redundant now? Does this mean there's no useful reason to run them anymore (other than to increase one's RAC)?

Oh, I don't know....

If you are running the current 5.10.x client, and you have a short "connect every 'x' days" (like maybe 0.1) and you have a large cache using the "extra days" functionality, you'll be returning work reasonably fast.

Probably before a fast machine with "connect every '4' days" or somesuch.

That isn't saying there may be a better use of the electricity, but I think the new client actually makes your old, slow machines more likely to be effective (more likely to return one of the first two results).
ID: 614438
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 614445 - Posted: 4 Aug 2007, 0:02:57 UTC - in response to Message 614438.  

I have noticed that, on average, if a system doesn't return a WU within a 24-ish hour window, it will always be the "last man in" (i.e. the third & redundant result).

In the interest of useful science and conserving our environment, does this mean that most slower computers are essentially redundant now? Does this mean there's no useful reason to run them anymore (other than to increase one's RAC)?

Oh, I don't know....

If you are running the current 5.10.x client, and you have a short "connect every 'x' days" (like maybe 0.1) and you have a large cache using the "extra days" functionality, you'll be returning work reasonably fast.

Probably before a fast machine with "connect every '4' days" or somesuch.


My Connect To is at 0 and my Extra Cache is 2.75 days. It does not matter what the cache is because on these slow machines, it will download a single workunit that will fill up an entire 3 day cache. It will start processing it, making the servers unable to cancel the workunit if two have already been returned by faster machines, and it will take 110 hours to complete, making it the last man in.

It will then return the result (third one of course), download a new one, lather, rinse, repeat.

That isn't saying there may be a better use of the electricity, but I think the new client actually makes your old, slow machines more likely to be effective (more likely to return one of the first two results).


Not true in practice. Perhaps the theory wasn't worked out too well on this idea. Of course I like the idea of doing more useful science, but now all my slower machines have become redundant.
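To make the "last man in" argument concrete, here is a rough sketch that ranks one host's result by turnaround time. The 110 hour figure comes from this post; the turnaround times for the other hosts are made-up numbers for illustration, and `return_rank` is a hypothetical helper, not anything in BOINC.

```python
from typing import Sequence


def return_rank(my_turnaround_h: float, other_turnarounds_h: Sequence[float]) -> int:
    """1-based position of this host's result among all copies of a WU.

    Turnaround is the total time from download to report, so it includes
    any time the WU spends waiting in a cache, not just crunch time.
    """
    faster = sum(1 for t in other_turnarounds_h if t < my_turnaround_h)
    return faster + 1


# Slow host (110 h) against two hosts that report within hours or a day.
print(return_rank(110, [6, 30]))   # 3

# Same slow host, but one wingman holds the WU in a 10 day cache (240 h).
print(return_rank(110, [6, 240]))  # 2
```

The second call shows the case raised earlier in the thread: a slow host can still land in the first two when a faster wingman sits on the work in a long queue.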
ID: 614445
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 614479 - Posted: 4 Aug 2007, 0:51:16 UTC - in response to Message 614435.  


Yes, it does say SSE, but SSE doesn't work: it runs for a while and errors out. It won't run the whole way through, so it must not be fully implemented.


Hmmm... That's interesting. Maybe Simon or someone else from the Coop will see this and provide some insight.

I agree though, a compute error is a definite showstopper. ;-)

Alinator
ID: 614479
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 614590 - Posted: 4 Aug 2007, 3:33:28 UTC - in response to Message 614479.  


Yes, it does say SSE, but SSE doesn't work: it runs for a while and errors out. It won't run the whole way through, so it must not be fully implemented.


Hmmm... That's interesting. Maybe Simon or someone else from the Coop will see this and provide some insight.

I agree though, a compute error is a definite showstopper. ;-)

Alinator

I can give a little background information, at least.

All the apps, including the MMX one, include our hand vectorized code subroutines as well as standard versions. The app tests which can be used and decides which is fastest. On James' system it is probable that some SSE routines are being used even though he's using the MMX app. OTOH, we're using IPP for FFTs and the MMX build wouldn't get a vectorized version. In addition, the Intel compiler may autovectorize some other areas for the SSE build.

James' system isn't the only SSE capable system on which the SSE build fails, though it's fairly rare. There's no obvious cause, and I'm not sure I could figure it out even if I decided to concentrate a lot of effort there.
                                                              Joe
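For readers unfamiliar with the idea of an app testing which routines it can use and picking the fastest, here is a loose analogy in Python. It is not the optimized app's code (which is compiled and works with CPU instruction sets); `pick_fastest` and the three sample routines are hypothetical names invented for this sketch. A routine that raises an exception stands in for one the CPU cannot run.

```python
import time
from typing import Callable, Dict, Optional, Sequence


def pick_fastest(candidates: Dict[str, Callable[[Sequence[float]], float]],
                 sample: Sequence[float],
                 repeats: int = 5) -> Optional[str]:
    """Time each usable routine on sample data and return the quickest one's name."""
    best_name, best_time = None, float("inf")
    for name, routine in candidates.items():
        try:
            start = time.perf_counter()
            for _ in range(repeats):
                routine(sample)
            elapsed = time.perf_counter() - start
        except Exception:
            continue  # routine is unusable on this machine, so skip it
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


def plain_sum(xs: Sequence[float]) -> float:
    total = 0.0
    for x in xs:
        total += x
    return total


def builtin_sum(xs: Sequence[float]) -> float:
    return sum(xs)


def unsupported_sum(xs: Sequence[float]) -> float:
    raise RuntimeError("stand-in for an instruction set this CPU lacks")


routines = {"plain": plain_sum, "builtin": builtin_sum, "unsupported": unsupported_sum}
print(pick_fastest(routines, [float(i) for i in range(10000)]))
```

The winner depends on the machine, which is the point: the choice is made at run time rather than fixed when the app is built.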
ID: 614590
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 614596 - Posted: 4 Aug 2007, 3:52:28 UTC - in response to Message 614445.  


My Connect To is at 0 and my Extra Cache is 2.75 days. It does not matter what the cache is because on these slow machines, it will download a single workunit that will fill up an entire 3 day cache. It will start processing it, making the servers unable to cancel the workunit if two have already been returned by faster machines, and it will take 110 hours to complete, making it the last man in.

Keep in mind that "connect every '0' days" may be slightly dangerous: it is possible to report before the scheduler knows about the upload. That's why I suggested a small, but non-zero value.
That isn't saying there may be a better use of the electricity, but I think the new client actually makes your old, slow machines more likely to be effective (more likely to return one of the first two results).


Not true in practice. Perhaps the theory wasn't worked out too well on this idea. Of course I like the idea of doing more useful science, but now all my slower machines have become redundant.

Before 5.10.x, your very slow machines were likely to be the last to report, and while no work was aborted, you were usually last (and perhaps, redundant).

On 5.10.x, if you have a work unit that is already redundant, it has a good chance of being aborted before you crunch it.

... and if there is just one work unit at a time, then the behaviour is effectively the same.
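The comparison between the older clients and 5.10.x can be sketched as a simple tally of wasted crunch time. This is a simplified model of what is described in this thread, with hypothetical names (`QueuedWU`, `wasted_hours`) and made-up queue contents; only the 110 hour figure comes from the discussion.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class QueuedWU:
    crunch_hours: float
    redundant_before_start: bool  # two results were in before this host began
    redundant_at_finish: bool     # two results were in before this host finished


def wasted_hours(queue: Sequence[QueuedWU], aborts_enabled: bool) -> float:
    """Hours spent crunching results that ended up redundant."""
    wasted = 0.0
    for wu in queue:
        if aborts_enabled and wu.redundant_before_start:
            continue  # cancelled while still unstarted, so nothing is crunched
        if wu.redundant_at_finish:
            wasted += wu.crunch_hours
    return wasted


# Fast host with several WUs waiting in its cache (illustrative values).
fast_host = [
    QueuedWU(3.0, True, True),
    QueuedWU(3.0, False, False),
    QueuedWU(3.0, True, True),
]
print(wasted_hours(fast_host, aborts_enabled=False))  # 6.0
print(wasted_hours(fast_host, aborts_enabled=True))   # 0.0

# Very slow host holding one WU at a time, started immediately.
slow_host = [QueuedWU(110.0, False, True)]
print(wasted_hours(slow_host, aborts_enabled=False))  # 110.0
print(wasted_hours(slow_host, aborts_enabled=True))   # 110.0
```

In this model the abort feature helps hosts with work waiting in a queue, while a host that only ever holds one WU sees no change.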
ID: 614596
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.