New Twist For "Aborted by project" Issue ?

Message boards : Number crunching : New Twist For "Aborted by project" Issue ?

Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 586927 - Posted: 14 Jun 2007, 19:27:38 UTC - in response to Message 586922.  
Last modified: 14 Jun 2007, 19:33:11 UTC


Yep.
But you can only increase the resolution of the search in the data so much without making any results meaningless.


OK, my lack of sleep is really kicking in now... Due to noise enhancement, or due to having sufficient samples so that any more samples just add redundant data that could be extrapolated (à la integral calculus), or something else entirely?



Overly simplifying, many signal processing algorithms, like Fast Fourier Transforms for example, achieve what they do by sorting into 'buckets' and are 'windowed'. Both the number of buckets and the size and shape of the 'windows' determine the resolution of the final data [a lot is thrown out at this stage: think 4-band equaliser versus 32-band equaliser]. Those are choices, made in the design of the algorithms and program, which are influenced by practical computational complexity and the required precision.
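The bucket/window trade-off described above can be sketched in a few lines of Python. This is an illustrative toy, not SETI@home's actual processing; the sample rate, tone frequencies and bin counts are made-up numbers for the demo:

```python
import numpy as np

# Illustrative sketch, not SETI@home code: how the number of FFT
# 'buckets' sets frequency resolution.  All figures are invented.
fs = 1024                                   # sample rate, Hz
t = np.arange(fs) / fs                      # one second of samples
signal = np.sin(2*np.pi*100*t) + np.sin(2*np.pi*103*t)  # tones 3 Hz apart

def analyse(n_bins):
    """Strongest bucket and bucket width for an n_bins-point windowed FFT."""
    window = np.hanning(n_bins)             # window size/shape controls leakage
    spectrum = np.abs(np.fft.rfft(signal[:n_bins] * window))
    return int(np.argmax(spectrum)), fs / n_bins   # (bucket index, Hz/bucket)

# 64 buckets of 16 Hz each: both tones fall into the same bucket and
# cannot be told apart.  1024 buckets of 1 Hz each: the tones occupy
# separate buckets -- the 4-band vs 32-band equaliser analogy.
print(analyse(64))
print(analyse(1024))
```

More buckets resolve finer detail but cost more computation, which is exactly the design trade-off mentioned above.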
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 586927
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 586930 - Posted: 14 Jun 2007, 19:32:48 UTC - in response to Message 586923.  

Brian Silvers:
...multibeam will increase processing time, reducing even further the number of results that can be processed in any "average" day. This will mean that 400/day will still be able to fill up a cache quicker.
...

Actually, Multibeam will decrease processing time. Because of the difference in beam width, ar=0.441 Multibeam WUs like those we're running in SETI Beta now are processed like ar=0.732 Line feed WUs.

In addition, one of the techniques used with the ALFA receivers is a basket weave scan which Kevin Douglas described as "nodding on the meridian". The nodding rate is such that those observations will produce VHAR WUs in abundance.
                                                                  Joe


Joe,

MajorKong already pointed that out, and I thank both of you for clearing up my misunderstanding in regards to MB. AP though will likely still take longer. I guess what needs to be understood is if fast hosts will be given a higher share of AP to work on, or is it just "potluck"...?

Brian
ID: 586930
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 586932 - Posted: 14 Jun 2007, 19:41:58 UTC - in response to Message 586927.  


Yep.
But you can only increase the resolution of the search in the data so much without making any results meaningless.


OK, my lack of sleep is really kicking in now... Due to noise enhancement, or due to having sufficient samples so that any more samples just add redundant data that could be extrapolated (à la integral calculus), or something else entirely?



Overly simplifying, many signal processing algorithms, like Fast Fourier Transforms for example, achieve what they do by sorting into 'buckets' and are 'windowed'. Both the number of buckets and the size and shape of the 'windows' determine the resolution of the final data [a lot is thrown out at this stage: think 4-band equaliser versus 32-band equaliser]. Those are choices, made in the design of the algorithms and program, which are influenced by practical computational complexity and the required precision.



Eh, the life I could've had (US Navy nuclear operator -> civilian physicist / engineer) would've likely had me easily understanding all that and then some without the equalizer comment. Instead, I decided that being lazy and not going to class on "minor subjects" such as Organic Chemistry in favor of playing Spades...was somehow a "good idea"... Duhhhhhhhhhhhhhhhh.....

Funny but true story: My high school Physics instructor who wanted me to go into Physics (if not the Navy) cuts the grass for the woman that lives next door... It's a small world after all...
ID: 586932
Profile LTDInvestments
Volunteer tester
Joined: 2 Aug 99
Posts: 14
Credit: 4,592,515
RAC: 42
United States
Message 586948 - Posted: 14 Jun 2007, 20:36:39 UTC - in response to Message 586693.  

Here's a thought:

If a DC project cannot keep up with the machines of their contributors, then perhaps they shouldn't be DCs. At that point, their projects just don't require the crunching power offered by DC. Just go buy a few machines and do the work in-house.


It's not the DC but your cache/machine combination that is failing to keep up. Your WUs are aborting because the quota has already been reached, other machines having finished their WUs. The only way for the system to avoid the abort issue is to send out only as many WUs as the quota allows and wait for the timeout to pass before issuing another WU. Even then, if you have cut off network activity, you yourself are defeating the purpose of the DC.
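The quota-plus-timeout scheme described above can be sketched roughly as follows. This is a hypothetical illustration, not actual BOINC scheduler code:

```python
# Hypothetical sketch of the scheme described above, not actual BOINC
# code: issue at most `quota` replicas of a workunit at once, and
# re-issue one only after a replica's deadline passes with no result.
class Workunit:
    def __init__(self, quota, deadline_s):
        self.quota = quota            # max replicas in flight at once
        self.deadline_s = deadline_s  # per-replica timeout, seconds
        self.in_flight = []           # timestamps of issued replicas

    def try_issue(self, now):
        # drop replicas whose deadline has passed (they timed out)
        self.in_flight = [t for t in self.in_flight
                          if now - t < self.deadline_s]
        if len(self.in_flight) < self.quota:
            self.in_flight.append(now)
            return True               # replica sent to the asking host
        return False                  # quota reached: "no work from project"

wu = Workunit(quota=2, deadline_s=14 * 86400)     # e.g. a two-week deadline
print(wu.try_issue(0), wu.try_issue(60), wu.try_issue(120))  # True True False
```

Under this scheme a third host asking for the same WU is simply refused until a deadline lapses, so nothing ever needs to be aborted after the fact.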
ID: 586948
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 586961 - Posted: 14 Jun 2007, 21:15:38 UTC - in response to Message 586930.  
Last modified: 14 Jun 2007, 21:16:34 UTC

Actually, Multibeam will decrease processing time. Because of the difference in beam width, ar=0.441 Multibeam WUs like those we're running in SETI Beta now are processed like ar=0.732 Line feed WUs.

In addition, one of the techniques used with the ALFA receivers is a basket weave scan which Kevin Douglas described as "nodding on the meridian". The nodding rate is such that those observations will produce VHAR WUs in abundance.
                                                                  Joe


Joe,

MajorKong already pointed that out, and I thank both of you for clearing up my misunderstanding in regards to MB. AP though will likely still take longer. I guess what needs to be understood is if fast hosts will be given a higher share of AP to work on, or is it just "potluck"...?

Brian

I had read MajorKong's post, but wanted to point out that there were technical reasons that would boost S@H work throughput even using identical apps. My guesstimate is that those who have been using the stock 5.15 here and will be automatically upgraded to the 5.2x Multibeam app will experience nearly a doubling of WUs/day. That combines the above factors with the optimizations Eric has added to the Multibeam app.

In the test project, AP work is distributed on a "potluck" basis, except that it requires the host's memory size as reported by BOINC to be at least 256 MiB. The AP WUs contain 8 times as much data as S@H WUs, but processing is different enough that I doubt the eventual crunch time will be directly proportional. The AP WUs contain the full 2.5 MHz bandwidth rather than 1/256 subbands like S@H, but only represent 13.42 seconds of telescope time. Assuming they have the same ~20% time overlaps as S@H WUs, there can be 8 AP WUs for each 256 S@H WUs.
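The 8-per-256 ratio can be checked with rough numbers. Note the ~107 s telescope time per S@H WU is an assumption on my part, not stated in this thread:

```python
# Back-of-the-envelope check of the WU counts above.  The ~107 s figure
# for an S@H WU's telescope time is an assumption, not from this post.
subbands = 256            # S@H splits the 2.5 MHz band into 256 subbands
sah_wu_seconds = 107.37   # approx. telescope time covered by one S@H WU
ap_wu_seconds = 13.42     # telescope time covered by one AP WU (from above)

# A full-band recording of ~107 s yields 256 S@H WUs (one per subband)
# but only ~8 AP WUs carved along the time axis:
ap_per_256_sah = round(sah_wu_seconds / ap_wu_seconds)
print(ap_per_256_sah)     # -> 8 AP WUs per 256 S@H WUs
```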
                                                                 Joe
ID: 586961
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 586972 - Posted: 14 Jun 2007, 21:40:14 UTC - in response to Message 586961.  


In the test project, AP work is distributed on a "potluck" basis, except that it requires the host's memory size as reported by BOINC to be at least 256 MiB. The AP WUs contain 8 times as much data as S@H WUs, but processing is different enough that I doubt the eventual crunch time will be directly proportional. The AP WUs contain the full 2.5 MHz bandwidth rather than 1/256 subbands like S@H, but only represent 13.42 seconds of telescope time. Assuming they have the same ~20% time overlaps as S@H WUs, there can be 8 AP WUs for each 256 S@H WUs.


OK, then it is important to figure out if host selection for AP is going to have a bump in requirements once it enters "real world". Also it is important to note performance of MB vs. current optimized apps to see if top-end machines are really going to be able to process 400/day on an average day if they got all MB. If the answers are "no" and "yes", then I'm willing to tentatively endorse the idea of going to 200/core (4 max). The consequence of all that means that the splitting better be fast and the back-end better be able to handle the load. I personally doubt it could at this point, although since Sidious was swapped, things have seemed much snappier...

Also, I know the "no guarantee" verbiage, but the reality is that a lot of "no work from project" messages will irk a lot of people...

ID: 586972
Profile KWSN - MajorKong
Volunteer tester
Joined: 5 Jan 00
Posts: 2892
Credit: 1,499,890
RAC: 0
United States
Message 586983 - Posted: 14 Jun 2007, 21:58:24 UTC - in response to Message 586961.  


The AP WUs contain the full 2.5 MHz bandwidth rather than 1/256 subbands like S@H, but only represent 13.42 seconds of telescope time. Assuming they have the same ~20% time overlaps as S@H WUs, there can be 8 AP WUs for each 256 S@H WUs.
                                                                 Joe


8 per 256 instead of 1 per 256. Thanks Joe for correcting my flawed understanding!
ID: 586983
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19059
Credit: 40,757,560
RAC: 67
United Kingdom
Message 587105 - Posted: 15 Jun 2007, 3:25:44 UTC

In the short term, before AP gets here, the 'no work from project' is probably going to be fairly common.
As Joe has said, the MB units at the same AR take approx. 70% as long as present enhanced units, because of the reduced beamwidth. But the standard app now also incorporates most of the generic CPU optimisation and is therefore another 20% faster.

So this all means that the majority of users, who don't know about and therefore don't use optimised apps, will see the crunch time per unit decrease to under 60% of present enhanced times.
There will still be optimised apps, but their gains will come from using SSEx etc., and will not be as large as before.
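The "under 60%" figure follows directly from multiplying the two factors Andy cites; a quick check:

```python
# Arithmetic behind "under 60%": MB geometry cuts per-unit crunch time
# to ~70% of enhanced, and the new stock app's built-in optimisations
# shave off roughly another 20% on top of that.
mb_factor = 0.70            # reduced-beamwidth effect
opt_factor = 1 - 0.20       # generic CPU optimisation effect
print(round(mb_factor * opt_factor, 2))   # -> 0.56, i.e. ~56% of present times
```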

Going back to the original thread subject: the only way to stop 'aborted by project' is to initially replicate only as many results as required to meet quorum, i.e. 2, unless the BOINC code can be modified to delay issue of the extra results for a limited period. If the issuing of the extra results could be delayed for 2 days here on SETI, then from my observations over 80% of WUs would never need them issued.
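The delayed-issue idea can be sketched as follows. This is a hypothetical illustration of the suggestion, not BOINC's actual behaviour:

```python
# Hypothetical sketch of the delayed-issue idea, not BOINC code:
# create only `QUORUM` replicas up front, and issue spares only for
# results still missing when the holding period expires.
QUORUM = 2
HOLD_DAYS = 2               # holding period before spares may be issued

def spares_to_issue(results_returned_within_hold):
    """Extra replicas needed once the 2-day hold expires."""
    return max(QUORUM - results_returned_within_hold, 0)

print(spares_to_issue(2))   # both results back within 2 days: no spare
print(spares_to_issue(1))   # one result missing: issue a single late spare
```

On the observation above that over 80% of WUs have both results back within 2 days, the first case would be the common one and most spare replicas would never be created at all.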

AstroPulse: assuming we can use most of the old data, there should be plenty of work for AP when it arrives here. But don't hold your breath; AP is in the alpha phase, and it took enhanced about 9 months to go through the beta phase.

Andy
ID: 587105
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 587110 - Posted: 15 Jun 2007, 3:39:33 UTC - in response to Message 587105.  


So this all means that the majority of users who don't know about and therefore don't use optimised apps will see the crunch time per unit decrease to under 60% of present enhanced times.
There will still be optimised apps, but these gains will be for using SSEx etc. And the gains will not be as much as before.


How is MB in regards to noisy units? Are there large batches of early-ending results?
ID: 587110
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19059
Credit: 40,757,560
RAC: 67
United Kingdom
Message 587118 - Posted: 15 Jun 2007, 3:55:40 UTC - in response to Message 587110.  


So this all means that the majority of users who don't know about and therefore don't use optimised apps will see the crunch time per unit decrease to under 60% of present enhanced times.
There will still be optimised apps, but these gains will be for using SSEx etc. And the gains will not be as much as before.


How is MB in regards to noisy units? Are there large batches of early-ending results?

To be honest, I don't think we know. So far on my computers all units have been close to AR=0.4nnnn. Experience indicates most noisy units are at VHAR, or had radar interference, which has now been cleared.

But I have just asked over there, about an hour ago, msg 25184.


ID: 587118
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 587125 - Posted: 15 Jun 2007, 4:13:47 UTC - in response to Message 587118.  


So this all means that the majority of users who don't know about and therefore don't use optimised apps will see the crunch time per unit decrease to under 60% of present enhanced times.
There will still be optimised apps, but these gains will be for using SSEx etc. And the gains will not be as much as before.


How is MB in regards to noisy units? Are there large batches of early-ending results?

To be honest, I don't think we know. So far on my computers all units have been close to AR=0.4nnnn. Experience indicates most noisy units are at VHAR, or had radar interference, now cleared.

But I have just asked over there, about an hour ago, msg 25184.



Well, that's a critical component.

If what you are saying comes to fruition, then that would push "Octo" (happy now????) machines up to the point where they could do 400/day on a consistent basis; my earlier estimate for Zombie was in the 200-250 range. Since we know he'll use a further optimized app, you have to assume 2X the capability.

The quandary becomes this: are people like Zombie kept happy, or are more users kept happy? Can both be kept happy (faster splitting)? Can the back-end handle this? I feel slightly better about that, seeing how the responsiveness of the forums and result pages has improved since Jocelyn was made the master db... It will be interesting to see how the next outage goes. Will it take less time to complete? Will it take less time to recover?

If the backend can't efficiently deal with the higher load again, then if I were making the decision, the most I'd do is bump to 150/core (4 max), but I'd be more inclined to leave it alone and try to augment the equipment to handle the extra load before doing that...
ID: 587125
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19059
Credit: 40,757,560
RAC: 67
United Kingdom
Message 587149 - Posted: 15 Jun 2007, 6:05:26 UTC - in response to Message 587125.  
Last modified: 15 Jun 2007, 6:06:10 UTC


So this all means that the majority of users who don't know about and therefore don't use optimised apps will see the crunch time per unit decrease to under 60% of present enhanced times.
There will still be optimised apps, but these gains will be for using SSEx etc. And the gains will not be as much as before.


How is MB in regards to noisy units? Are there large batches of early-ending results?

To be honest, I don't think we know. So far on my computers all units have been close to AR=0.4nnnn. Experience indicates most noisy units are at VHAR, or had radar interference, now cleared.

But I have just asked over there, about an hour ago, msg 25184.



Well, that's a critical component.

If what you are saying comes to fruition, then that would push "Octo" (happy now????) machines up to the point where they could do 400/day on a consistent basis, as my estimate for Zombie before was in the 200-250 range. Since we know he'll use a further optimized app, then you have to assume 2X the capability.

The quandary becomes thus: Are people like Zombie kept happy, or are more users kept happy? Can both be kept happy (faster splitting)? Can the back-end handle this? I feel slightly better about that seeing how the responsiveness of the forums and viewing result pages has improved since Jocelyn was made the master db... It will be interesting how the next outage goes. Will it take less time to complete? Will it take less time to recover?

If the backend can't efficiently deal with the higher load again, then if I were making the decision, the most I'd do is bump to 150/core (4 max), but I'd be more inclined to leave it alone and try to augment the equipment to handle the extra load before doing that...

As the MB splitter is new and the MB data comes from Arecibo on HDD, I have no idea of its max splitting rate compared to the enhanced tape splitter. But with 7 beams and H and V polarization there could be more data, though that is governed by the periods the MB receiver will be on once Arecibo finishes its maintenance/repaint.

My personal view is that I don't believe the 'Management' will increase the daily allowance unless they can be sure the backend can handle it reliably and the new splitter can sustain the increased rate required. I think they will try to ensure all hosts get some work, rather than a few fast multicores getting the lion's share.
If you think about it, more hosts, each with a bit of work, is probably more reliable, due to redundancy, than a few fast multicores when relying on 'unknown' computers. Plus, if the users that only do a few units/day at present start noticing they are only getting work now and then, they will soon question whether SETI and/or BOINC really needs them and start detaching. Then when they are needed (for AP) they won't be here. If you look at the top 20 computers, the only ones at the moment that stand a chance of crunching more than 400/day, what would be the effect on the project if we switched them all off? Almost nothing (~1%): it's only 8,000/day out of 800,000/day. Sorry Zombie, but your octo is a very small part (<0.05%) of a large machine.
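Andy's percentages check out on his own estimates:

```python
# Quick check of the percentages above (daily totals are Andy's estimates).
top20_daily = 20 * 400            # top-20 hosts at ~400 results/day each
project_daily = 800_000           # estimated results/day across the project
print(f"{top20_daily / project_daily:.0%}")   # top 20 combined -> 1%
print(f"{400 / project_daily:.2%}")           # one 'octo' host -> 0.05%
```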

Andy
ID: 587149
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19059
Credit: 40,757,560
RAC: 67
United Kingdom
Message 587302 - Posted: 15 Jun 2007, 16:25:56 UTC

I knew a rogue host would pop up sooner or later. This host is an example of why the download limits are imposed, and why they probably should not be increased until the Berkeley backend can handle a larger load reliably.

Andy
ID: 587302
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 587329 - Posted: 15 Jun 2007, 17:49:03 UTC

Hmmm....

I rooted around in this host's summary pages a bit. Turns out it's running 4.19.

I couldn't find any sign that it started malfunctioning per se. IOW, no compute errors, just timeouts. It had a few successful returns at the very end of the listings, back toward the end of May.

Weren't there some real problems with the CC scheduler and big CIs back with the early versions of 4.x? (It's been so long I don't remember for sure.)

In any event, I don't see why the project doesn't use a 4.45 (or whatever is appropriate) cutoff. That would allow the folks who need to use 4.x to participate, yet eliminate the problems which arise from the really old versions (which obviously had problems, or there wouldn't have been the need to release newer ones).

Alinator
ID: 587329
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 592370 - Posted: 25 Jun 2007, 11:33:27 UTC

bumped from 2nd page for Modesto
ID: 592370


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.