To S@H Crew -Please shut down S@H ( for week, or two)
Author | Message |
---|---|
Crun-chi Send message Joined: 3 Apr 99 Posts: 174 Credit: 3,037,232 RAC: 0 |
Why? In that case you will have time to fix all the things that bother you, examine the servers, and do all the things you do every Tuesday. After that I hope that most of your problems will be fixed, and all parts of the system will work normally. Thanks. P.S. Please do not flame me, it is just my view of how to solve this. I am cruncher :) I LOVE SETI BOINC :) |
MarkJ Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
Why? Their current issue is the 100mbit network connection to the rest of the world can't cope with the 100,000+ users all trying to upload and download. It's not a server issue. Shutting the project down for a week isn't going to solve this. BOINC blog |
Crun-chi Send message Joined: 3 Apr 99 Posts: 174 Credit: 3,037,232 RAC: 0 |
Why? Please tell me the approximate number of active hosts a month ago. Wasn't the 100mbit network connection a problem then? I am cruncher :) I LOVE SETI BOINC :) |
Bernie Vine Send message Joined: 26 May 99 Posts: 9958 Credit: 103,452,613 RAC: 328 |
Why? After every outage the 100mbit network has always been a problem. Plus there seem to be a lot of "shorties", as my fastest cruncher is constantly requesting new work. Shouldn't we be looking at the number of processors rather than the number of users? I am one user with 4 crunchers and 12 processors (CPU and GPU). Plus, for a long shutdown to work you would have to stop generating new work and receive all completed work back. Then just imagine what would happen when the project restarted: thousands of processors all requesting new work, some of them massively fast GPUs asking for up to a 10 day cache. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Please tell me the approximate number of active hosts a month ago. Here you go: Read the graph scale carefully - the variation is only 5,000 out of 250,000: not significant. No, it isn't a change in the user base: it isn't a change in the servers at Berkeley (although we never really got the chance to recover fully from the storage unit breakdown a couple of weeks ago, which doesn't help). The only thing which has changed is the pattern of data recordings received from Arecibo: and to remind readers, SETI@home has no control over that. The radio astronomers with funds to hire telescope time at Arecibo decide where the telescope is pointing at any one time. SETI just gets a spare copy of whatever the astronomers are listening to. If the astronomers are studying point objects intensively, they keep the telescope pointing at the same place in the sky for extended durations. Those recordings have small changes in 'Angle Range' in the resulting SETI tasks - the dreaded VLARs beloved of CUDA users. Other astronomers use the telescope for survey work, and sweep the sky rapidly looking for new objects. These recordings have large Angle Ranges (the telescope aiming point changes quickly), and the resulting SETI tasks are VHARs, or 'shorties'. Seti can't get as much useful information out of them, so doesn't spend as long looking at them. It appears that in early April 2001 (when most of the current tasks originate from), it was survey astronomers who were allocated the lion's share of the recording schedule at Arecibo. Shutting down SETI for a week isn't going to change that: the data would still be there, waiting to be crunched, when they switch it back on. |
Crun-chi Send message Joined: 3 Apr 99 Posts: 174 Credit: 3,037,232 RAC: 0 |
Thanks for the explanation. But in that case, what can the SETI crew do? Nothing. Crunchers will be unhappy, the system will be "minutes" away from total collapse, and the whole project will be very, very slow. :( I think it is not a bright future for SETI@home :( I am cruncher :) I LOVE SETI BOINC :) |
rob smith Send message Joined: 7 Mar 03 Posts: 22535 Credit: 416,307,556 RAC: 380 |
If network traffic is the bottleneck, and it would appear to be so, then there is a slightly different bold solution. If my PCs are typical of the rest of the S@H community then a fair amount of network traffic is due to re-tries due to time-outs of one sort or another within "the BOINC environment". This is bad news as each time-out requires several messages to be passed between the two communicating systems = network traffic. Each time-out requires a proportion of the data to be retransmitted = network traffic (it appears that sometimes a WU download/upload restarts and other times it is a complete retransmission - retransmission is a lot of network traffic if the WU is an AP download that was 90% complete, whereas the completion of a results upload is a relatively small network traffic load). So how about increasing the time allowed before a re-try is triggered at the client or server end? I know this would impact on the size of buffer required, but it would reduce the amount of network traffic required to trigger and manage retries. I doubt that it would take a substantial change to have a significant impact, I'd guess increasing the time-out by 10% would reduce the number of retries by between 20 and 50% - now that's some saving! Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
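One way to sanity-check the "10% longer time-out" guess above is a toy model. The sketch below is not how BOINC behaves: it assumes transfer times are exponentially distributed, that every attempt is independent, and that a failed attempt starts again from scratch. All function names and parameters are made up for illustration.

```python
import math


def timeout_probability(timeout_s: float, mean_transfer_s: float) -> float:
    """Chance that a single attempt outlasts the time-out.

    Assumption: transfer durations follow an exponential distribution
    with the given mean. Real transfers will not be this tidy.
    """
    return math.exp(-timeout_s / mean_transfer_s)


def expected_retries(timeout_s: float, mean_transfer_s: float) -> float:
    """Expected retries per file when attempts are independent."""
    p = timeout_probability(timeout_s, mean_transfer_s)
    return p / (1.0 - p)


def retry_reduction(timeout_s: float, mean_transfer_s: float,
                    increase: float = 0.10) -> float:
    """Fractional drop in retries after lengthening the time-out."""
    before = expected_retries(timeout_s, mean_transfer_s)
    after = expected_retries(timeout_s * (1.0 + increase), mean_transfer_s)
    return 1.0 - after / before


if __name__ == "__main__":
    mean_s = 300.0  # hypothetical typical transfer time, in seconds
    for multiple in (1, 3, 5):
        saving = retry_reduction(mean_s * multiple, mean_s)
        print(f"time-out = {multiple} x mean: {saving:.1%} fewer retries")
```

Under these assumptions the saving depends on how long the time-out already is compared with a typical transfer: the longer it is, the more a 10% increase helps. Whether it lands in the 20 to 50% range would need real transfer timings from the project.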
Crun-chi Send message Joined: 3 Apr 99 Posts: 174 Credit: 3,037,232 RAC: 0 |
In that case many crunchers, especially fast ones, can freely stop doing S@H. One can have additional work for, let's say, 5 days, crunch that in 1.5 days, and spend the rest of the time waiting to get new work. And if you increase the re-try time it is obvious that new work will come at a very slow rate... I am cruncher :) I LOVE SETI BOINC :) |
SciManStev Send message Joined: 20 Jun 99 Posts: 6658 Credit: 121,090,076 RAC: 0 |
If the astronomers are studying point objects intensively, they keep the telescope pointing at the same place in the sky for extended durations. Those recordings have small changes in 'Angle Range' in the resulting SETI tasks - the dreaded VLARs beloved of CUDA users. Other astronomers use the telescope for survey work, and sweep the sky rapidly looking for new objects. These recordings have large Angle Ranges (the telescope aiming point changes quickly), and the resulting SETI tasks are VHARs, or 'shorties'. Seti can't get as much useful information out of them, so doesn't spend as long looking at them. That's the first time I have seen that explained, and I really appreciated it. Thank you Richard! Steve Warning, addicted to SETI crunching! Crunching as a member of GPU Users Group. GPUUG Website |
James Sotherden Send message Joined: 16 May 99 Posts: 10436 Credit: 110,373,059 RAC: 54 |
I doubt very much you will ever see a mass exodus from Seti@Home from the long-term dedicated users. You have to admit the problems right now are tiny compared to last year, before we had the 3 new servers. When we got the new servers I went from a 3 day cache to the one day I have now. If I run out of work it's no big deal, as I run back-up projects. Now I'm not saying that it isn't frustrating to run out of work or have up/downloads jam up, because it is. It's frustrating in the fact that I'm still trying to find out what my max RAC is for my i7 :) In any case, no matter what happens, I will crunch till the guys in the lab turn off the lights and pull all the plugs. Old James |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Anyone interested in doing a bit of amateur 'storm forecasting' can look at the Arecibo Observatory Telescope Schedule (http://www.naic.edu/vscience/schedule/scedfra2.htm). The projects to look for are those which mention 'ALFA' on the schedule pages (those are the ones which SETI@home can eavesdrop on). It looks as if the early-April recordings which are causing most of the current problems originated with A2130 - The G-ALFA Continuum Transit Survey (details, PDF format). The good news? They took a break after 3 May. The bad news? They're due back on Monday 6 June..... Apart from throwing hardware at the problem, there are some suggestions we could make to alleviate it in the short term (before Moore's Law catches up with us again). 1) Hire a real, top-notch, world-class specialist networking consultant to get the most out of the existing pipe. As Rob Smith says, all the back-off / retry traffic adds to the overall load. More sophisticated traffic-shaping and demand management might actually improve the effective, useful, throughput. Although our much-loved project staff have a wealth of practical experience with the SETI hardware, they are generalists, and I don't think it's unfair to suggest that a specialist could help out. 2) Look at the 'tapes' and determine the work mix, before loading them into the splitters. If tapes with a high proportion of VHARs could be rationed, and only split one or two at a time alongside tapes with other recording patterns, the average download volume could be smoothed out, rather than bunching into these 'storm peaks' (which we've seen before, and will no doubt see again). |
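Suggestion 2 above can be sketched as a small batching routine. This is only an illustration of the rationing idea, not a description of how the SETI@home splitters are actually fed: the `vhar_fraction` figure per tape, the threshold, the slot count and the tape names are all hypothetical, and getting that fraction would mean inspecting each recording first.

```python
from collections import deque


def ration_tapes(tapes, vhar_threshold=0.5, max_vhar_active=1, slots=4):
    """Group tapes into splitter batches, holding back VHAR-heavy ones.

    tapes: list of (name, vhar_fraction) pairs (hypothetical metadata).
    Each batch holds at most `slots` tapes, of which at most
    `max_vhar_active` are at or above `vhar_threshold`.
    """
    if slots < 1 or max_vhar_active < 1:
        raise ValueError("slots and max_vhar_active must be at least 1")

    heavy = deque(t for t in tapes if t[1] >= vhar_threshold)
    light = deque(t for t in tapes if t[1] < vhar_threshold)
    heavy_limit = min(max_vhar_active, slots)

    batches = []
    while heavy or light:
        batch = []
        # Admit only a rationed number of shorty-heavy tapes.
        while heavy and len(batch) < heavy_limit:
            batch.append(heavy.popleft())
        # Fill the remaining slots with tapes of other recording patterns.
        while light and len(batch) < slots:
            batch.append(light.popleft())
        batches.append([name for name, _ in batch])
    return batches


if __name__ == "__main__":
    queue = [("tape_A", 0.9), ("tape_B", 0.8), ("tape_C", 0.1),
             ("tape_D", 0.2), ("tape_E", 0.1)]
    for batch in ration_tapes(queue, slots=3):
        print(batch)
```

Once the ordinary tapes run out, the remaining VHAR-heavy tapes still go through one at a time, so the storm is spread out rather than avoided.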
PhilippeD Send message Joined: 14 May 11 Posts: 2 Credit: 56,468 RAC: 0 |
If I understand correctly, each work unit is processed by two different clients, say, me and my wingman. If I guess correctly, this means that the same work unit data is sent twice by the Seti servers, say once to me and once to my wingman. When I look at the work unit details, this seems to happen most often at the same time (within the same second), or just within a few seconds' interval. So am I wrong in thinking that it might be possible to improve the situation in most cases by putting the data only ONCE on the pipe by the Seti servers, and having both clients listen to the same data? I leave the details to a network specialist :-), but there would probably exist several possible solutions. If my assumptions are correct and this could be possible for all the work units, it would then be possible to save up to 50% of the bandwidth? Or, in other words, to nearly double the download throughput? Apologies if this seems a bit naive; I would be glad to learn from my mistakes. Thanks and respect to all the people who spend their time helping naive users like me. |
Link Send message Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0 |
retransmission is a lot of network traffic if the WU is an AP download that was 90% complete, whereas the completion of a results upload is a relatively small network traffic load Downloads are continued from the point where they got HTTP error, maybe few bytes are lost when I look carefully, but that's all. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
1) Hire a real, top-notch, world-class specialist networking consultant to get the most out of the existing pipe. As Rob Smith says, all the back-off / retry traffic adds to the overall load. More sophisticated traffic-shaping and demand management might actually improve the effective, useful, throughput. Although our much-loved project staff have a wealth of practical experience with the SETI hardware, they are generalists, and I don't think it's unfair to suggest that a specialist could help out. Don't worry about asking naive questions - sometimes they're the best of all, because they prompt people into lateral thinking :-) [The problems start when people start relying on naive answers.....] As it happens, there's a possible way that could be achieved. SETI@home actually rents a gigabit internet connection, from a company called 'Hurricane Electric'. Unfortunately, the nearest Hurricane Electric gets to the Berkeley laboratories is Palo Alto, across the Bay. From there, the data has to use a complicated route over slower cables to reach, first, the Berkeley university campus, and from there up the infamous 'hill' to the labs. It's a bit like the slow bus ride to the airport which adds so much to trans-continental flight times..... If there could be a fast, large, reliable (and hence expensive) transparent data cache in Palo Alto, directly connected to the fast trans-continental lines, only one copy of each datafile would need to take that slow suburban bus-ride off campus. But: I stress reliable. We can't ask SETI staff to tramp out to Palo Alto to replace a failed disk drive every few weeks. If any donor was prepared to stump up for such a cache, it would be useless without a proper enterprise-grade on-site maintenance contract to go with it. |
kittyman Send message Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
I did not find the post yet, but I thought I remembered Matt saying that getting a 1Gb link to Seti was indeed a possibility, depending on some investment in hardware, but mostly working through politics with Berkeley IT. The possibility was also suggested at the time to keep the existing 100Mb link, and just add some bandwidth from the campus link. Several folks had offered to donate for the hardware to make this happen. Wonder...has any progress been made on that front? "Time is simply the mechanism that keeps everything from happening all at once." |
PhilippeD Send message Joined: 14 May 11 Posts: 2 Credit: 56,468 RAC: 0 |
If there could be a fast, large, reliable (and hence expensive) transparent data cache in Palo Alto, directly connected to the fast trans-continental lines, only one copy of each datafile would need to take that slow suburban bus-ride off campus. But: I stress reliable. We can't ask SETI staff to tramp out to Palo Alto to replace a failed disk drive every few weeks. If any donor was prepared to stump up for such a cache, it would be useless without a proper enterprise-grade on-site maintenance contract to go with it. Thanks for your response! From what I have read in the forums, it seems that, to get some work, my BOINC Manager first contacts the scheduler, and then issues an HTTP request to the download server indicated by the scheduler. Therefore, maybe a reasonably priced HTTP proxy server in Palo Alto would do the job? With disk space for just a few seconds of cache, this proxy server could then serve the request from the wingman (that is, the second request for the same data) directly without fetching it from Berkeley... If the scheduler just manages some simple keep-alive handshaking with the proxy server, it could easily detect that the server is down and avoid directing clients to that server, just reverting to the current situation. Some kind of graceful degradation. Of course, reliability is a must, as you stressed. But such a simple server should not be that expensive. |
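The "few seconds of cache" idea above can be illustrated with a short-lived cache in front of a slow origin. This is a minimal in-memory sketch under stated assumptions, not a real HTTP proxy and not anything SETI@home runs: the class name, the `fetch_from_origin` callback and the 30 second lifetime are invented for the example.

```python
import time


class ShortLivedCache:
    """Keep each file for a few seconds so a second request is served locally."""

    def __init__(self, fetch_from_origin, ttl_seconds=30.0, clock=time.monotonic):
        self._fetch = fetch_from_origin   # stands in for the slow link to Berkeley
        self._ttl = ttl_seconds
        self._clock = clock               # injectable so the logic can be tested
        self._store = {}                  # file name -> (expiry time, payload)
        self.origin_fetches = 0           # how often the slow link was used

    def get(self, name):
        now = self._clock()
        # Drop expired entries so the cache stays small.
        self._store = {k: v for k, v in self._store.items() if v[0] > now}
        entry = self._store.get(name)
        if entry is not None:
            return entry[1]               # wingman's request: served from cache
        payload = self._fetch(name)       # first request: goes to the origin
        self.origin_fetches += 1
        self._store[name] = (now + self._ttl, payload)
        return payload


if __name__ == "__main__":
    cache = ShortLivedCache(lambda name: b"payload for " + name.encode())
    cache.get("workunit_1")   # first client
    cache.get("workunit_1")   # wingman, moments later
    print("origin fetches:", cache.origin_fetches)
```

If both copies of a work unit are requested within the cache lifetime, the slow link carries the file once instead of twice, which is where the "up to 50%" figure comes from. Requests that arrive further apart get no benefit.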
Sirius B Send message Joined: 26 Dec 00 Posts: 24912 Credit: 3,081,182 RAC: 7 |
Very interesting thread. Thanks Richard for a very detailed explanation. Possible (long term) solution? Is it possible for the Project Admin in conjunction with all our dedicated crunchers, to allocate specifically, part of donated funds (if stated by donator of course) to build up a dedicated fibre optic fund? This of course being ongoing until target is met. I should think that Berkeley Admin would not stand in the way as they would benefit from it as well.... just a thought........ I'm pretty sure that there are many out there who do not wish to see this project in trouble. |
perryjay Send message Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
Yes, thank you Richard. I was under the impression the GB line from Hurricane was up to the campus IT and it was they that was causing the holdup as they didn't have the funds allocated to go up the hill with it.(No pun intended!) I seem to remember seeing a cost of either $38,000 or $48,000 to run it for the lab. Either way that's a lot of fund raising and there is a lot of other stuff that needs replaced that could use that kind of money and is more crucial than the line. PROUD MEMBER OF Team Starfire World BOINC |
Sirius B Send message Joined: 26 Dec 00 Posts: 24912 Credit: 3,081,182 RAC: 7 |
I seem to remember seeing a cost of either $38,000 or $48,000 to run it for the lab. Either way that's a lot of fund raising and there is a lot of other stuff that needs replaced that could use that kind of money and is more crucial than the line. True, but I did say long term. Seti has been going 10 years already, just think if something like that had been implemented, what would the total fund, including interest, be today? I was not asking/stating for an immediate relief but a long term goal - maybe over 2/3 years? (THAT's just an example, not a statement) I'm pretty certain it can be achieved - the question is....do people want to support this project on a short term & moan when no work is available? OR really help the project & their rigs on a long term basis.... |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Perry, Sirius: Read Mark Sattler's post. Yes, we understand that there is already a Gigabit link in position between Campus and SSL - Matt wrote that, but I don't have the reference to hand either. Maybe within the edit window ;-) There are two problems. 1) The link is for the whole SSL to share. The politics need to be sorted out before SETI can borrow some, but not all, of it. It's unpopular when the SETI cuckoo outgrows the SSL nest. 2) They need some additional (or upgraded) hardware to hook up the SETI connection, and break it out again at the other end of the new link. No specification, or price, given yet, but I doubt it's as high as tens of thousands of dollars. No doubt we can suggest a fund-raiser once political approval is given. Edit - references message 1093673, message 1093952 (same thread). |