To S@H Crew - Please shut down S@H (for a week, or two)

Message boards : Number crunching : To S@H Crew - Please shut down S@H (for a week, or two)

Crun-chi
Volunteer tester
Joined: 3 Apr 99
Posts: 174
Credit: 3,037,232
RAC: 0
Croatia
Message 1112931 - Posted: 4 Jun 2011, 9:26:07 UTC

Why?
In that case you will have time to fix all the things that bother you, examine the servers, and do everything you do every Tuesday.
After that I hope that most of your problems will be fixed, and all parts of the system will work normally.
Thanks.

P.S. Please don't flame me; it is just my view of how to solve this.
I am a cruncher :)
I LOVE SETI BOINC :)
MarkJ (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 1112937 - Posted: 4 Jun 2011, 10:20:20 UTC - in response to Message 1112931.  
Last modified: 4 Jun 2011, 10:20:50 UTC

Why?
In that case you will have time to fix all the things that bother you, examine the servers, and do everything you do every Tuesday.
After that I hope that most of your problems will be fixed, and all parts of the system will work normally.
Thanks.

P.S. Please don't flame me; it is just my view of how to solve this.


Their current issue is that the 100 Mbit network connection to the rest of the world can't cope with 100,000+ users all trying to upload and download. It's not a server issue. Shutting the project down for a week isn't going to solve this.
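A back-of-the-envelope check of why the pipe struggles, as a Python sketch: it assumes every active host downloads at the same moment and ignores protocol overhead, and the task size is an illustrative assumption rather than a project figure.

```python
# Back-of-the-envelope share of one shared link (ignores TCP/HTTP overhead).

def per_host_share_bps(link_bps: float, active_hosts: int) -> float:
    """Bits per second each host would get if all hosts downloaded at once."""
    return link_bps / active_hosts


def max_downloads_per_second(link_bps: float, bytes_per_task: int) -> float:
    """Upper bound on task downloads per second through the link."""
    return (link_bps / 8) / bytes_per_task


LINK_BPS = 100e6       # the 100 Mbit connection mentioned above
HOSTS = 100_000        # "100,000+ users"
TASK_BYTES = 375_000   # ASSUMPTION: illustrative size of one task file

print(f"{per_host_share_bps(LINK_BPS, HOSTS):.0f} bit/s per host")
print(f"{max_downloads_per_second(LINK_BPS, TASK_BYTES):.1f} tasks/s at most")
```

Under those assumptions each host gets roughly 1 kbit/s, which gives a feel for how thin the pipe is when everyone asks at once.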
BOINC blog
Crun-chi
Volunteer tester
Joined: 3 Apr 99
Posts: 174
Credit: 3,037,232
RAC: 0
Croatia
Message 1112938 - Posted: 4 Jun 2011, 10:24:58 UTC - in response to Message 1112937.  

Their current issue is that the 100 Mbit network connection to the rest of the world can't cope with 100,000+ users all trying to upload and download. It's not a server issue. Shutting the project down for a week isn't going to solve this.

Can you tell me the approximate number of active hosts a month ago?
Wasn't the 100 Mbit network connection a problem then?
I am a cruncher :)
I LOVE SETI BOINC :)
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9958
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1112939 - Posted: 4 Jun 2011, 10:44:30 UTC - in response to Message 1112938.  
Last modified: 4 Jun 2011, 10:48:12 UTC

Can you tell me the approximate number of active hosts a month ago?
Wasn't the 100 Mbit network connection a problem then?


After every outage the 100 Mbit network has always been a problem. Plus there seem to be a lot of "shorties", as my fastest cruncher is constantly requesting new work.

Shouldn't we be looking at the number of processors rather than the number of users? I am one user with 4 crunchers and 12 processors (CPU and GPU).

Plus, for a long shutdown to work you would have to stop generating new work and receive all completed work back. Then just imagine what would happen when the project restarted: thousands of processors all requesting new work, some of them massively fast GPUs with up to a 10-day cache.
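The size of that restart rush can be estimated with simple arithmetic. This Python sketch assumes every host asks for its full cache at once, with no overhead or retries; the host count, tasks per host and task size are made-up illustrative numbers.

```python
# Rough estimate of the download burst after a long outage, when every host
# tries to refill its work cache at once. All inputs are illustrative.

def refill_seconds(hosts: int, tasks_per_host: int,
                   bytes_per_task: int, link_bps: float) -> float:
    """Seconds needed to push every host's refill through one link."""
    total_bits = hosts * tasks_per_host * bytes_per_task * 8
    return total_bits / link_bps


# ASSUMPTIONS: 100,000 hosts each asking for 50 tasks of 375 kB
# over the 100 Mbit link, with zero overhead and no retries.
secs = refill_seconds(100_000, 50, 375_000, 100e6)
print(f"about {secs / 3600:.1f} hours of a saturated link")
```

With those made-up inputs the refill alone would keep the 100 Mbit link saturated for well over a day (about 41.7 hours).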
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1112943 - Posted: 4 Jun 2011, 11:05:43 UTC - in response to Message 1112938.  

Can you tell me the approximate number of active hosts a month ago?
Wasn't the 100 Mbit network connection a problem then?

Here you go:

[graph not preserved]

Read the graph scale carefully - the variation is only 5,000 out of 250,000 (2%): not significant.

No, it isn't a change in the user base, and it isn't a change in the servers at Berkeley (although we never really got the chance to recover fully from the storage unit breakdown a couple of weeks ago, which doesn't help).

The only thing which has changed is the pattern of data recordings received from Arecibo: and to remind readers, SETI@home has no control over that. The radio astronomers with funds to hire telescope time at Arecibo decide where the telescope is pointing at any one time. SETI just gets a spare copy of whatever the astonomers are listening to.

If the astronomers are studying point objects intensively, they keep the telescope pointing at the same place in the sky for extended durations. Those recordings have small changes in 'Angle Range' in the resulting SETI tasks - the dreaded VLARs beloved of CUDA users. Other astronomers use the telescope for survey work, and sweep the sky rapidly looking for new objects. These recordings have large Angle Ranges (the telescope aiming point changes quickly), and the resulting SETI tasks are VHARs, or 'shorties'. SETI can't get as much useful information out of them, so it doesn't spend as long looking at them.
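Richard's explanation amounts to a simple classification by angle range. The Python sketch below shows the idea; the VLAR and VHAR cut-off values are assumptions for illustration, not figures taken from the SETI@home splitter code.

```python
# Sort multibeam tasks by telescope angle range (AR).
# ASSUMPTION: the cut-offs below are illustrative values, not official ones.

VLAR_MAX = 0.12    # below this: telescope nearly still -> "VLAR"
VHAR_MIN = 1.127   # above this: fast sweep -> "VHAR" / "shorty"


def classify_angle_range(ar: float) -> str:
    """Label a task by how quickly the telescope moved while recording."""
    if ar < 0:
        raise ValueError("angle range cannot be negative")
    if ar < VLAR_MAX:
        return "VLAR"
    if ar > VHAR_MIN:
        return "VHAR"
    return "mid-range"


for ar in (0.01, 0.42, 2.5):
    print(ar, classify_angle_range(ar))
```

A survey-heavy recording period would push most tasks into the last branch, which is the "shorty storm" described here.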

It appears that in early April 2011 (the period most of the current tasks originate from), survey astronomers were allocated the lion's share of the recording schedule at Arecibo. Shutting down SETI for a week isn't going to change that: the data would still be there, waiting to be crunched, when they switch it back on.
Crun-chi
Volunteer tester
Joined: 3 Apr 99
Posts: 174
Credit: 3,037,232
RAC: 0
Croatia
Message 1112944 - Posted: 4 Jun 2011, 11:19:43 UTC - in response to Message 1112943.  

Thanks for the explanation.
But in that case, what can the SETI crew do?
Nothing.
Crunchers will be unhappy, the system will be "minutes" away from total collapse, and the whole project will be very, very slow. :(
I don't think that is a bright future for SETI@home :(
I am a cruncher :)
I LOVE SETI BOINC :)
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22528
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1112948 - Posted: 4 Jun 2011, 11:27:12 UTC

If network traffic is the bottleneck, and it would appear to be so, then there is a slightly different bold solution.

If my PCs are typical of the rest of the S@H community, then a fair amount of network traffic is due to retries after time-outs of one sort or another within "the BOINC environment". This is bad news, as each time-out requires several messages to be passed between the two communicating systems = network traffic. Each time-out also requires a proportion of the data to be retransmitted = network traffic. (It appears that sometimes a WU download/upload resumes and other times it is a complete retransmission; retransmission is a lot of network traffic if the WU is an AP download that was 90% complete, whereas completing a results upload is a relatively small network load.)

So how about increasing the time allowed before a client- or server-triggered retry? I know this would affect the size of buffer required, but it would reduce the amount of network traffic needed to trigger and manage retries. I doubt that it would take a substantial change to have a significant impact: I'd guess increasing the time-out by 10% would reduce the number of retries by between 20 and 50% - now that's some saving!
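Rob's guess can be explored with a toy model. This Python sketch assumes exponentially distributed transfer times, independent attempts, and a full restart after every time-out (Link points out that real downloads resume, so this overstates the cost); the 60 s time-out and 30 s mean transfer time are invented numbers.

```python
import math

# Toy model: transfer times are exponential with mean `mean_s`, an attempt
# still running at `timeout_s` is abandoned and restarted from scratch, and
# attempts are independent. An illustration only, not BOINC's behaviour.

def timeout_probability(timeout_s: float, mean_s: float) -> float:
    """P(one attempt is still running when the time-out fires)."""
    return math.exp(-timeout_s / mean_s)


def expected_attempts(timeout_s: float, mean_s: float) -> float:
    """Mean number of attempts until one finishes (geometric distribution)."""
    return 1.0 / (1.0 - timeout_probability(timeout_s, mean_s))


BASE_TIMEOUT_S = 60.0   # ASSUMPTION: current time-out
MEAN_TRANSFER_S = 30.0  # ASSUMPTION: average transfer time
for timeout in (BASE_TIMEOUT_S, BASE_TIMEOUT_S * 1.1):
    retries = expected_attempts(timeout, MEAN_TRANSFER_S) - 1
    print(f"time-out {timeout:.0f} s -> {retries:.3f} retries per transfer")
```

With those particular numbers a 10% longer time-out cuts expected retries by roughly a fifth; a different transfer-time distribution could give a larger or smaller effect.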
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Crun-chi
Volunteer tester
Joined: 3 Apr 99
Posts: 174
Credit: 3,037,232
RAC: 0
Croatia
Message 1112950 - Posted: 4 Jun 2011, 11:34:28 UTC - in response to Message 1112948.  

In that case many crunchers, especially fast ones, could simply stop doing S@H.
A cruncher can hold extra work for, let's say, 5 days, crunch it in 1.5 days, and spend the rest of the time waiting to get new work. And if you increase the retry time, new work will obviously arrive at a very slow rate...
I am a cruncher :)
I LOVE SETI BOINC :)
SciManStev (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Jun 99
Posts: 6658
Credit: 121,090,076
RAC: 0
United States
Message 1112957 - Posted: 4 Jun 2011, 11:58:10 UTC

If the astronomers are studying point objects intensively, they keep the telescope pointing at the same place in the sky for extended durations. Those recordings have small changes in 'Angle Range' in the resulting SETI tasks - the dreaded VLARs beloved of CUDA users. Other astronomers use the telescope for survey work, and sweep the sky rapidly looking for new objects. These recordings have large Angle Ranges (the telescope aiming point changes quickly), and the resulting SETI tasks are VHARs, or 'shorties'. SETI can't get as much useful information out of them, so it doesn't spend as long looking at them.


That's the first time I have seen that explained, and I really appreciated it. Thank you Richard!

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 1112959 - Posted: 4 Jun 2011, 12:01:10 UTC
Last modified: 4 Jun 2011, 12:02:01 UTC

I doubt very much you will ever see a mass exodus from Seti@Home by the long-term dedicated users. You have to admit the problems right now are tiny compared to last year, before we had the 3 new servers.

When we got the new servers I went from a 3-day cache to the one day I have now. If I run out of work it's no big deal, as I run backup projects.

Now, I'm not saying that it isn't frustrating to run out of work or have uploads/downloads jam up, because it is. It's frustrating in that I'm still trying to find out what my max RAC is for my i7 :)

In any case no matter what happens, I will crunch till the guys in the lab turn off the lights and pull all the plugs.

Old James
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1112964 - Posted: 4 Jun 2011, 12:31:46 UTC

Anyone interested in doing a bit of amateur 'storm forecasting' can look at the Arecibo Observatory Telescope Schedule (http://www.naic.edu/vscience/schedule/scedfra2.htm). The projects to look for are those which mention 'ALFA' on the schedule pages (those are the ones which SETI@home can eavesdrop on).

It looks as if the early-April recordings which are causing most of the current problems originated with A2130 - The G-ALFA Continuum Transit Survey (details, PDF format). The good news? They took a break after 3 May. The bad news? They're due back on Monday 6 June.....

Apart from throwing hardware at the problem, there are some suggestions we could make to alleviate it in the short term (before Moore's Law catches up with us again).

1) Hire a real, top-notch, world-class specialist networking consultant to get the most out of the existing pipe. As Rob Smith says, all the back-off / retry traffic adds to the overall load. More sophisticated traffic-shaping and demand management might actually improve the effective, useful, throughput. Although our much-loved project staff have a wealth of practical experience with the SETI hardware, they are generalists, and I don't think it's unfair to suggest that a specialist could help out.

2) Look at the 'tapes' and determine the work mix, before loading them into the splitters. If tapes with a high proportion of VHARs could be rationed, and only split one or two at a time alongside tapes with other recording patterns, the average download volume could be smoothed out, rather than bunching into these 'storm peaks' (which we've seen before, and will no doubt see again).
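Suggestion 2 could look something like the following Python sketch, which fills the splitter slots from a queue while letting at most one VHAR-heavy tape through at a time. The `Tape` record, its `vhar_fraction` field and the 0.5 threshold are all hypothetical; nothing here reflects how the real splitters are scheduled.

```python
from dataclasses import dataclass

# Hypothetical sketch: choose which tapes the splitters work on so that no
# more than `max_heavy` VHAR-heavy tapes are split at the same time.


@dataclass
class Tape:
    name: str
    vhar_fraction: float  # share of the tape's tasks expected to be VHARs


def pick_tapes(queue: list[Tape], slots: int, max_heavy: int = 1,
               heavy_threshold: float = 0.5) -> list[Tape]:
    """Fill `slots` splitter slots, rationing VHAR-heavy tapes."""
    chosen: list[Tape] = []
    heavy = 0
    for tape in queue:
        if len(chosen) == slots:
            break
        is_heavy = tape.vhar_fraction >= heavy_threshold
        if is_heavy and heavy >= max_heavy:
            continue  # leave it queued for a later round
        chosen.append(tape)
        if is_heavy:
            heavy += 1
    return chosen


queue = [Tape("a", 0.9), Tape("b", 0.8), Tape("c", 0.1), Tape("d", 0.2)]
print([t.name for t in pick_tapes(queue, slots=3)])
```

Tape "b" simply waits for a later round, so the shorties it contains are spread out over time instead of arriving together with those from tape "a".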
PhilippeD
Joined: 14 May 11
Posts: 2
Credit: 56,468
RAC: 0
Belgium
Message 1112969 - Posted: 4 Jun 2011, 12:55:45 UTC - in response to Message 1112964.  


1) Hire a real, top-notch, world-class specialist networking consultant to get the most out of the existing pipe. As Rob Smith says, all the back-off / retry traffic adds to the overall load. More sophisticated traffic-shaping and demand management might actually improve the effective, useful, throughput. Although our much-loved project staff have a wealth of practical experience with the SETI hardware, they are generalists, and I don't think it's unfair to suggest that a specialist could help out.


If I understand correctly, each work unit is processed by two different clients, say, me and my wingman. If I guess correctly, this means that the same work unit data is sent twice by the Seti servers, say once to me and once to my wingman. When I look at the work unit details, this seems to happen most often at the same time (within the same second), or just within a few seconds' interval.

So am I wrong in thinking that it might be possible to improve the situation in most cases by having the Seti servers put the data on the pipe only ONCE, and having both clients listen to the same data? I'll leave the details to a network specialist :-), but there would probably be several possible solutions.

If my assumptions are correct and this were possible for all work units, would it then be possible to save up to 50% of the bandwidth? Or, in other words, to nearly double the download throughput?
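The 50% figure follows directly from the replication count. A small Python sketch of the arithmetic, assuming two copies per workunit and that both requests arrive close enough together to share one transmission (the workunit counts and sizes are illustrative):

```python
# Bytes that must leave the servers for a batch of workunits, with and
# without sending each workunit's data only once. Replication of 2 follows
# the "me and my wingman" description; sizes are illustrative.

def bytes_sent(workunits: int, size_bytes: int, copies: int,
               send_once: bool) -> int:
    per_unit = size_bytes if send_once else size_bytes * copies
    return workunits * per_unit


def saving_fraction(copies: int) -> float:
    """Share of download traffic saved by sending each workunit once."""
    return 1 - 1 / copies


print(saving_fraction(2))  # 0.5 for two copies per workunit
```

The "up to" matters: the saving only applies to copies that can actually share one transmission.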

Apologies if this seems a bit naive; I would be glad to learn from my mistakes. Thanks and respect to all the people who spend their time helping naive users like me.
Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1112971 - Posted: 4 Jun 2011, 13:01:08 UTC - in response to Message 1112948.  

retransmission is a lot of network traffic if the WU is an AP download that was 90% complete, whereas the completion of a results upload is a relatively small network traffic load

Downloads resume from the point where they hit the HTTP error; looking carefully, maybe a few bytes are lost, but that's all.
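Resuming like this is what HTTP Range requests are for. The Python sketch below illustrates the mechanism; it is not BOINC's code (the client is C++ and uses libcurl), the URL and file path you pass in are hypothetical, and it assumes the download server supports byte ranges.

```python
import os
import urllib.request

# Sketch of resuming an interrupted download with an HTTP Range request.


def range_header(already_have: int) -> dict[str, str]:
    """Header asking the server for everything after the bytes we hold."""
    return {"Range": f"bytes={already_have}-"} if already_have > 0 else {}


def resume_download(url: str, path: str, chunk: int = 64 * 1024) -> None:
    have = os.path.getsize(path) if os.path.exists(path) else 0
    req = urllib.request.Request(url, headers=range_header(have))
    with urllib.request.urlopen(req) as resp:
        # 206 means the server honoured the range; 200 means it sent the
        # whole file, so the partial copy is overwritten instead of appended.
        mode = "ab" if resp.status == 206 else "wb"
        with open(path, mode) as out:
            while data := resp.read(chunk):
                out.write(data)
```

A file that is already complete would get a 416 response, which `urlopen` raises as `HTTPError`.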
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1112980 - Posted: 4 Jun 2011, 13:20:51 UTC - in response to Message 1112969.  

If I understand correctly, each work unit is processed by two different clients, say, me and my wingman. If I guess correctly, this means that the same work unit data is sent twice by the Seti servers, say once to me and once to my wingman. When I look at the work unit details, this seems to happen most often at the same time (within the same second), or just within a few seconds' interval.

So am I wrong in thinking that it might be possible to improve the situation in most cases by having the Seti servers put the data on the pipe only ONCE, and having both clients listen to the same data? I'll leave the details to a network specialist :-), but there would probably be several possible solutions.

If my assumptions are correct and this were possible for all work units, would it then be possible to save up to 50% of the bandwidth? Or, in other words, to nearly double the download throughput?

Apologies if this seems a bit naive; I would be glad to learn from my mistakes. Thanks and respect to all the people who spend their time helping naive users like me.

Don't worry about asking naive questions - sometimes they're the best of all, because they prompt people into lateral thinking :-) [The problems start when people start relying on naive answers.....]

As it happens, there's a possible way that could be achieved. SETI@home actually rents a gigabit internet connection, from a company called 'Hurricane Electric'. Unfortunately, the nearest Hurricane Electric gets to the Berkeley laboratories is Palo Alto, across the Bay. From there, the data has to use a complicated route over slower cables to reach, first, the Berkeley university campus, and from there up the infamous 'hill' to the labs. It's a bit like the slow bus ride to the airport which adds so much to trans-continental flight times.....

If there could be a fast, large, reliable (and hence expensive) transparent data cache in Palo Alto, directly connected to the fast trans-continental lines, only one copy of each datafile would need to take that slow suburban bus-ride off campus. But: I stress reliable. We can't ask SETI staff to tramp out to Palo Alto to replace a failed disk drive every few weeks. If any donor was prepared to stump up for such a cache, it would be useless without a proper enterprise-grade on-site maintenance contract to go with it.
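The effect of such a cache on the slow link can be shown with a toy model in Python. The class name, the 60-second TTL and the in-memory store are assumptions for illustration; a real transparent cache would be a dedicated proxy product, not a few lines of script.

```python
import time
from typing import Callable

# Toy model of the Palo Alto cache idea: the first request for a file goes
# over the slow campus path; a repeat request within `ttl_s` seconds (the
# wingman's copy) is served locally. Names and the TTL are hypothetical.


class ShortLivedCache:
    def __init__(self, fetch_from_origin: Callable[[str], bytes],
                 ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl_s
        self._clock = clock
        self._store: dict[str, tuple[float, bytes]] = {}
        self.origin_bytes = 0  # traffic that crossed the slow path

    def get(self, name: str) -> bytes:
        now = self._clock()
        hit = self._store.get(name)
        if hit and now - hit[0] <= self._ttl:
            return hit[1]
        # Miss or expired entry: fetch again over the slow path.
        # (Expired entries are overwritten, never evicted; a real cache would.)
        data = self._fetch(name)
        self.origin_bytes += len(data)
        self._store[name] = (now, data)
        return data


cache = ShortLivedCache(lambda name: b"x" * 1000)
cache.get("wu_1")  # first host: fetched from the lab
cache.get("wu_1")  # wingman: served from the cache
print(cache.origin_bytes)
```

Only the first copy of "wu_1" crosses the slow path, which is the saving PhilippeD was after.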
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1112992 - Posted: 4 Jun 2011, 13:40:25 UTC
Last modified: 4 Jun 2011, 13:40:46 UTC

I did not find the post yet, but I thought I remembered Matt saying that getting a 1Gb link to Seti was indeed a possibility, depending on some investment in hardware, but mostly working through politics with Berkeley IT.

The possibility was also suggested at the time of keeping the existing 100Mb link and just adding some bandwidth from the campus link.

Several folks had offered to donate for the hardware to make this happen.

I wonder... has any progress been made on that front?
"Time is simply the mechanism that keeps everything from happening all at once."

PhilippeD
Joined: 14 May 11
Posts: 2
Credit: 56,468
RAC: 0
Belgium
Message 1112995 - Posted: 4 Jun 2011, 13:43:50 UTC - in response to Message 1112980.  

If there could be a fast, large, reliable (and hence expensive) transparent data cache in Palo Alto, directly connected to the fast trans-continental lines, only one copy of each datafile would need to take that slow suburban bus-ride off campus. But: I stress reliable. We can't ask SETI staff to tramp out to Palo Alto to replace a failed disk drive every few weeks. If any donor was prepared to stump up for such a cache, it would be useless without a proper enterprise-grade on-site maintenance contract to go with it.


Thanks for your response!

From what I have read in the forums, it seems that, to get some work, my BOINC Manager first contacts the scheduler, and then issues an HTTP request to the download server indicated by the scheduler. So maybe a reasonably priced HTTP proxy server in Palo Alto would do the job? With disk space for just a few seconds' worth of cache, this proxy server could then serve the wingman's request (that is, the second request for the same data) directly, without fetching it from Berkeley... If the scheduler just manages some simple keep-alive handshaking with the proxy server, it could easily detect when that server is down and stop directing clients to it, simply reverting to the current situation. Some kind of graceful degradation.
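The fallback described here could be sketched like this in Python: the scheduler hands out the proxy's address only while the proxy has answered a keep-alive recently. The URLs, the 30-second window and the class name are hypothetical.

```python
import time

# Sketch of graceful degradation: use the proxy only while it is answering
# keep-alive probes, otherwise send clients to the normal download server.

PROXY_URL = "http://proxy.example.org/download"    # hypothetical
ORIGIN_URL = "http://origin.example.org/download"  # hypothetical
HEARTBEAT_TIMEOUT_S = 30.0


class ProxyHealth:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_seen = None  # time of the last keep-alive reply

    def heartbeat(self) -> None:
        """Call whenever the proxy answers a keep-alive probe."""
        self._last_seen = self._clock()

    def is_up(self) -> bool:
        return (self._last_seen is not None and
                self._clock() - self._last_seen <= HEARTBEAT_TIMEOUT_S)


def download_base_url(health: ProxyHealth) -> str:
    return PROXY_URL if health.is_up() else ORIGIN_URL


clock_now = [0.0]
health = ProxyHealth(clock=lambda: clock_now[0])
health.heartbeat()
print(download_base_url(health))  # proxy answered just now
clock_now[0] = 45.0               # proxy silent for 45 s
print(download_base_url(health))  # falls back to the origin
```

If the proxy dies, clients simply go back to the current path, so the worst case is today's situation.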

Of course, reliability is a must, as you stressed. But such a simple server should not be that expensive.
Sirius B (Project Donor)
Volunteer tester
Joined: 26 Dec 00
Posts: 24911
Credit: 3,081,182
RAC: 7
Ireland
Message 1113011 - Posted: 4 Jun 2011, 14:19:37 UTC

Very interesting thread. Thanks Richard for a very detailed explanation.

Possible (long term) solution?

Is it possible for the Project Admin, in conjunction with all our dedicated crunchers, to set aside part of donated funds (if the donor specifies it, of course) specifically to build up a dedicated fibre-optic fund?

This would of course be ongoing until the target is met. I should think that Berkeley Admin would not stand in the way, as they would benefit from it as well....

just a thought........

I'm pretty sure that there are many out there who do not wish to see this project in trouble.
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1113024 - Posted: 4 Jun 2011, 14:53:05 UTC

Yes, thank you Richard. I was under the impression that the GB line from Hurricane was up to campus IT, and that it was they who were causing the holdup, as they didn't have the funds allocated to go up the hill with it. (No pun intended!) I seem to remember seeing a cost of either $38,000 or $48,000 to run it for the lab. Either way, that's a lot of fund-raising, and there is a lot of other stuff that needs replacing that could use that kind of money and is more crucial than the line.


PROUD MEMBER OF Team Starfire World BOINC
Sirius B (Project Donor)
Volunteer tester
Joined: 26 Dec 00
Posts: 24911
Credit: 3,081,182
RAC: 7
Ireland
Message 1113034 - Posted: 4 Jun 2011, 15:00:24 UTC - in response to Message 1113024.  
Last modified: 4 Jun 2011, 15:00:48 UTC

I seem to remember seeing a cost of either $38,000 or $48,000 to run it for the lab. Either way, that's a lot of fund-raising, and there is a lot of other stuff that needs replacing that could use that kind of money and is more crucial than the line.


True, but I did say long term. Seti has been going for 10 years already; just think, if something like that had been implemented, what would the total fund, including interest, be today?

I was not asking for immediate relief but suggesting a long-term goal - maybe over 2-3 years? (THAT's just an example, not a statement)
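For a sense of the time scale, here is a Python sketch of how many months a dedicated fund would need to reach perryjay's remembered $38,000 to $48,000 figures. It ignores interest, and the $1,500 a month is a purely hypothetical contribution rate.

```python
import math

# Months needed for a fund to reach a target, with no interest.
# The monthly contribution is a hypothetical figure for illustration.


def months_to_target(target: float, monthly: float) -> int:
    if monthly <= 0:
        raise ValueError("monthly contribution must be positive")
    return math.ceil(target / monthly)


for target in (38_000, 48_000):
    print(target, months_to_target(target, monthly=1_500))
```

At that hypothetical rate the targets take 26 and 32 months, which happens to fall inside the 2-3 year window.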

I'm pretty certain it can be achieved - the question is... do people want to support this project in the short term and moan when no work is available? OR really help the project and their rigs on a long-term basis....
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1113048 - Posted: 4 Jun 2011, 15:13:06 UTC
Last modified: 4 Jun 2011, 15:17:33 UTC

Perry, Sirius: Read Mark Sattler's post.

Yes, we understand that there is already a Gigabit link in position between Campus and SSL - Matt wrote that, but I don't have the reference to hand either. Maybe within the edit window ;-)

There are two problems.

1) The link is for the whole SSL to share. The politics need to be sorted out before SETI can borrow some, but not all, of it. It's unpopular when the SETI cuckoo outgrows the SSL nest.

2) They need some additional (or upgraded) hardware to hook up the SETI connection, and break it out again at the other end of the new link. No specification, or price, given yet, but I doubt it's as high as tens of thousands of dollars. No doubt we can suggest a fund-raiser once political approval is given.

Edit - references message 1093673, message 1093952 (same thread).


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.