To S@H Crew -Please shut down S@H ( for week, or two)



Message boards : Number crunching : To S@H Crew -Please shut down S@H ( for week, or two)

Crun-chi
Volunteer tester
Joined: 3 Apr 99
Posts: 174
Credit: 3,037,232
RAC: 0
Croatia
Message 1112931 - Posted: 4 Jun 2011, 9:26:07 UTC

Why?
In that case you would have time to fix everything that bothers you, examine the servers, and do all the things you normally do every Tuesday.
After that, I hope most of your problems will be fixed and all parts of the system will work normally.
Thanks.

P.S. Please do not flame me; this is just my view of how to solve this.
____________
I am cruncher :)
I LOVE SETI BOINC :)

MarkJ
Volunteer tester
Joined: 17 Feb 08
Posts: 936
Credit: 19,744,113
RAC: 27,121
Australia
Message 1112937 - Posted: 4 Jun 2011, 10:20:20 UTC - in response to Message 1112931.
Last modified: 4 Jun 2011, 10:20:50 UTC

Why?
In that case you would have time to fix everything that bothers you, examine the servers, and do all the things you normally do every Tuesday.
After that, I hope most of your problems will be fixed and all parts of the system will work normally.
Thanks.

P.S. Please do not flame me; this is just my view of how to solve this.


Their current issue is the 100mbit network connection to the rest of the world can't cope with the 100,000+ users all trying to upload and download. It's not a server issue. Shutting the project down for a week isn't going to solve this.
____________
BOINC blog

Crun-chi
Volunteer tester
Joined: 3 Apr 99
Posts: 174
Credit: 3,037,232
RAC: 0
Croatia
Message 1112938 - Posted: 4 Jun 2011, 10:24:58 UTC - in response to Message 1112937.

Why?
In that case you would have time to fix everything that bothers you, examine the servers, and do all the things you normally do every Tuesday.
After that, I hope most of your problems will be fixed and all parts of the system will work normally.
Thanks.

P.S. Please do not flame me; this is just my view of how to solve this.


Their current issue is the 100mbit network connection to the rest of the world can't cope with the 100,000+ users all trying to upload and download. It's not a server issue. Shutting the project down for a week isn't going to solve this.

Please tell me the approximate number of active hosts a month ago.
Was the 100mbit network connection not a problem then?
____________
I am cruncher :)
I LOVE SETI BOINC :)

Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 6824
Credit: 24,648,349
RAC: 26,797
United Kingdom
Message 1112939 - Posted: 4 Jun 2011, 10:44:30 UTC - in response to Message 1112938.
Last modified: 4 Jun 2011, 10:48:12 UTC

Why?
In that case you would have time to fix everything that bothers you, examine the servers, and do all the things you normally do every Tuesday.
After that, I hope most of your problems will be fixed and all parts of the system will work normally.
Thanks.

P.S. Please do not flame me; this is just my view of how to solve this.


Their current issue is the 100mbit network connection to the rest of the world can't cope with the 100,000+ users all trying to upload and download. It's not a server issue. Shutting the project down for a week isn't going to solve this.

Please tell me the approximate number of active hosts a month ago.
Was the 100mbit network connection not a problem then?


After every outage the 100mbit network has always been a problem. Plus, there seem to be a lot of "shorties", as my fastest cruncher is constantly requesting new work.

Shouldn't we be looking at the number of processors rather than the number of users? I am one user with 4 crunchers and 12 processors (CPU and GPU).

Plus, for a long shutdown to work, you would have to stop generating new work and receive all completed work back. Then just imagine what would happen when the project restarted: thousands of processors all requesting new work, some with massively fast GPUs filling up to a 10-day cache.
____________


Today is life, the only life we're sure of. Make the most of today.

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8376
Credit: 46,789,652
RAC: 23,557
United Kingdom
Message 1112943 - Posted: 4 Jun 2011, 11:05:43 UTC - in response to Message 1112938.

Please tell me the approximate number of active hosts a month ago.
Was the 100mbit network connection not a problem then?

Here you go:

[Graph: number of active hosts over the past month]

Read the graph scale carefully - the variation is only 5,000 out of 250,000: not significant.

No, it isn't a change in the user base, and it isn't a change in the servers at Berkeley (although we never really got the chance to recover fully from the storage unit breakdown a couple of weeks ago, which doesn't help).

The only thing which has changed is the pattern of data recordings received from Arecibo - and to remind readers, SETI@home has no control over that. The radio astronomers with funds to hire telescope time at Arecibo decide where the telescope is pointing at any one time. SETI just gets a spare copy of whatever the astronomers are listening to.

If the astronomers are studying point objects intensively, they keep the telescope pointing at the same place in the sky for extended durations. Those recordings have small changes in 'Angle Range' in the resulting SETI tasks - the VLARs dreaded by CUDA users. Other astronomers use the telescope for survey work, and sweep the sky rapidly looking for new objects. These recordings have large Angle Ranges (the telescope aiming point changes quickly), and the resulting SETI tasks are VHARs, or 'shorties'. SETI can't get as much useful information out of them, so doesn't spend as long looking at them.

It appears that in early April 2011 (when most of the current tasks originate from), it was survey astronomers who were allocated the lion's share of the recording schedule at Arecibo. Shutting down SETI for a week isn't going to change that: the data would still be there, waiting to be crunched, when they switch it back on.
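Richard's Angle Range distinction can be sketched in code. This is purely illustrative - the cutoff values below are assumptions for the example, not actual SETI@home constants.

```python
# Illustrative sketch of the VLAR / VHAR distinction described above.
# The threshold values are assumed for the example, not real project constants.

def classify_task(angle_range):
    """Bucket a task by how fast the telescope was sweeping the sky."""
    if angle_range < 0.13:    # telescope nearly stationary: point-source study
        return "VLAR"         # Very Low Angle Range
    if angle_range > 1.13:    # telescope sweeping rapidly: survey work
        return "VHAR"         # Very High Angle Range, a 'shorty'
    return "mid-range"
```

A VHAR 'shorty' gets less crunching time per task, so a tape full of them means far more download requests per hour for the same volume of recorded data - which is exactly the network pressure described above.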

Crun-chi
Volunteer tester
Joined: 3 Apr 99
Posts: 174
Credit: 3,037,232
RAC: 0
Croatia
Message 1112944 - Posted: 4 Jun 2011, 11:19:43 UTC - in response to Message 1112943.

Thanks for the explanation.
But in that case, what can the SETI crew do?
Nothing.
Crunchers will be displeased, the system will be "minutes" away from total collapse, and the whole project will be very, very slow. :(
I don't think that is a bright future for SETI@home :(
____________
I am cruncher :)
I LOVE SETI BOINC :)

rob smith
Volunteer tester
Joined: 7 Mar 03
Posts: 8149
Credit: 52,861,782
RAC: 76,007
United Kingdom
Message 1112948 - Posted: 4 Jun 2011, 11:27:12 UTC

If network traffic is the bottleneck, and it would appear to be so, then there is a slightly different, bolder solution.

If my PCs are typical of the rest of the S@H community, then a fair amount of network traffic is due to retries caused by time-outs of one sort or another within "the BOINC environment". This is bad news, as each time-out requires several messages to be passed between the two communicating systems (= network traffic), and a proportion of the data to be retransmitted (= more network traffic). It appears that sometimes a WU download or upload resumes where it left off, and other times it restarts completely - a complete retransmission is a lot of traffic if the WU is an AP download that was 90% complete, whereas completing a result upload is a relatively small load.

So how about increasing the time allowed before a client- or server-side retry is triggered? I know this would impact the size of buffer required, but it would reduce the amount of network traffic needed to trigger and manage retries. I doubt it would take a substantial change to have a significant impact - I'd guess increasing the time-out by 10% would reduce the number of retries by between 20 and 50%. Now that's some saving!
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
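Rob's suggestion amounts to tuning a classic truncated-exponential-backoff scheme. A minimal sketch, with made-up base and cap values - BOINC's real retry logic is more involved than this:

```python
import random

def backoff_delay(attempt, base=60.0, cap=3600.0):
    """Seconds to wait before retry number `attempt` (0-based).

    The delay doubles each attempt, up to `cap`; random jitter spreads
    clients out so they don't all hammer the server in the same second.
    """
    delay = min(cap, base * (2 ** attempt))
    return delay * random.uniform(0.5, 1.0)
```

Raising `base` (Rob's "increase the time-out") means fewer retry round-trips per hour, at the cost of completed work sitting in client buffers slightly longer before it is reported.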

Crun-chi
Volunteer tester
Joined: 3 Apr 99
Posts: 174
Credit: 3,037,232
RAC: 0
Croatia
Message 1112950 - Posted: 4 Jun 2011, 11:34:28 UTC - in response to Message 1112948.

In that case, many crunchers - especially fast ones - could simply stop doing S@H.
A host could have, let's say, 5 days of additional work, crunch it in 1.5 days, and spend the rest of the time waiting for new work. And if you increase the retry time, it is obvious that new work will arrive at a very slow rate...
____________
I am cruncher :)
I LOVE SETI BOINC :)

SciManStev
Volunteer tester
Joined: 20 Jun 99
Posts: 4792
Credit: 79,814,471
RAC: 36,341
United States
Message 1112957 - Posted: 4 Jun 2011, 11:58:10 UTC

If the astronomers are studying point objects intensively, they keep the telescope pointing at the same place in the sky for extended durations. Those recordings have small changes in 'Angle Range' in the resulting SETI tasks - the VLARs dreaded by CUDA users. Other astronomers use the telescope for survey work, and sweep the sky rapidly looking for new objects. These recordings have large Angle Ranges (the telescope aiming point changes quickly), and the resulting SETI tasks are VHARs, or 'shorties'. SETI can't get as much useful information out of them, so doesn't spend as long looking at them.


That's the first time I have seen that explained, and I really appreciated it. Thank you Richard!

Steve
____________
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website

James Sotherden
Joined: 16 May 99
Posts: 8555
Credit: 31,489,002
RAC: 57,434
United States
Message 1112959 - Posted: 4 Jun 2011, 12:01:10 UTC
Last modified: 4 Jun 2011, 12:02:01 UTC

I doubt very much you will ever see a mass exodus from SETI@home by the long-term dedicated users. You have to admit the problems right now are tiny compared to last year, before we had the 3 new servers.

When we got the new servers I went from a 3-day cache to the one day I have now. If I run out of work it's no big deal, as I run backup projects.

Now I'm not saying that it isn't frustrating when work runs out or uploads/downloads jam up, because it is. It's frustrating in that I'm still trying to find out what my max RAC is for my i7 :)

In any case, no matter what happens, I will crunch till the guys in the lab turn off the lights and pull all the plugs.
____________

Old James

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8376
Credit: 46,789,652
RAC: 23,557
United Kingdom
Message 1112964 - Posted: 4 Jun 2011, 12:31:46 UTC

Anyone interested in doing a bit of amateur 'storm forecasting' can look at the Arecibo Observatory Telescope Schedule (http://www.naic.edu/vscience/schedule/scedfra2.htm). The projects to look for are those which mention 'ALFA' on the schedule pages (those are the ones which SETI@home can eavesdrop on).

It looks as if the early-April recordings which are causing most of the current problems originated with A2130 - The G-ALFA Continuum Transit Survey (details, PDF format). The good news? They took a break after 3 May. The bad news? They're due back on Monday 6 June.....

Apart from throwing hardware at the problem, there are some suggestions we could make to alleviate it in the short term (before Moore's Law catches up with us again).

1) Hire a real, top-notch, world-class specialist networking consultant to get the most out of the existing pipe. As Rob Smith says, all the back-off / retry traffic adds to the overall load. More sophisticated traffic-shaping and demand management might actually improve the effective, useful, throughput. Although our much-loved project staff have a wealth of practical experience with the SETI hardware, they are generalists, and I don't think it's unfair to suggest that a specialist could help out.

2) Look at the 'tapes' and determine the work mix, before loading them into the splitters. If tapes with a high proportion of VHARs could be rationed, and only split one or two at a time alongside tapes with other recording patterns, the average download volume could be smoothed out, rather than bunching into these 'storm peaks' (which we've seen before, and will no doubt see again).
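Suggestion (2) is essentially a rate-limited interleave. A toy sketch, assuming tapes have already been labelled VHAR-heavy or not - the labels and the 1:2 ratio are invented for the example:

```python
def smooth_schedule(vhar_tapes, other_tapes, ratio=2):
    """Order tapes for the splitters so each VHAR-heavy tape is
    followed by `ratio` tapes with other recording patterns."""
    vhar, other = list(vhar_tapes), list(other_tapes)
    schedule = []
    while vhar or other:
        if vhar:
            schedule.append(vhar.pop(0))      # one 'shorty'-heavy tape...
        for _ in range(ratio):
            if other:
                schedule.append(other.pop(0))  # ...diluted by normal tapes
    return schedule
```

Spacing the 'shorty'-heavy tapes out like this would turn one sustained download storm into several smaller, more survivable bumps.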

PhilippeD
Joined: 14 May 11
Posts: 2
Credit: 56,468
RAC: 0
Belgium
Message 1112969 - Posted: 4 Jun 2011, 12:55:45 UTC - in response to Message 1112964.


1) Hire a real, top-notch, world-class specialist networking consultant to get the most out of the existing pipe. As Rob Smith says, all the back-off / retry traffic adds to the overall load. More sophisticated traffic-shaping and demand management might actually improve the effective, useful, throughput. Although our much-loved project staff have a wealth of practical experience with the SETI hardware, they are generalists, and I don't think it's unfair to suggest that a specialist could help out.


If I understand correctly, each work unit is processed by two different clients - say, me and my wingman. If I guess correctly, this means that the same work unit data is sent twice by the SETI servers: once to me and once to my wingman. When I look at the work unit details, this seems to happen most often at the same time (within the same second), or just a few seconds apart.

So am I wrong in thinking that it might be possible to improve the situation in most cases by putting the data only ONCE on the pipe by the SETI servers, and having both clients listen to the same data? I leave the details to a network specialist :-), but there are probably several possible solutions.

If my assumptions are correct and this could be possible for all the work units, would it then be possible to save up to 50% of the bandwidth? Or, in other words, to nearly double the download throughput?

Apologies if this seems a bit naive; I would be glad to learn from my mistakes. Thanks and respect to all the people who spend their time helping naive users like me.

Link
Joined: 18 Sep 03
Posts: 824
Credit: 1,545,480
RAC: 290
Germany
Message 1112971 - Posted: 4 Jun 2011, 13:01:08 UTC - in response to Message 1112948.

retransmission is a lot of network traffic if the WU is an AP download that was 90% complete, whereas the completion of a results upload is a relatively small network traffic load

Downloads are continued from the point where they got the HTTP error; maybe a few bytes are lost when I look carefully, but that's all.
____________
.

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8376
Credit: 46,789,652
RAC: 23,557
United Kingdom
Message 1112980 - Posted: 4 Jun 2011, 13:20:51 UTC - in response to Message 1112969.

1) Hire a real, top-notch, world-class specialist networking consultant to get the most out of the existing pipe. As Rob Smith says, all the back-off / retry traffic adds to the overall load. More sophisticated traffic-shaping and demand management might actually improve the effective, useful, throughput. Although our much-loved project staff have a wealth of practical experience with the SETI hardware, they are generalists, and I don't think it's unfair to suggest that a specialist could help out.

If I understand correctly, each work unit is processed by two different clients - say, me and my wingman. If I guess correctly, this means that the same work unit data is sent twice by the SETI servers: once to me and once to my wingman. When I look at the work unit details, this seems to happen most often at the same time (within the same second), or just a few seconds apart.

So am I wrong in thinking that it might be possible to improve the situation in most cases by putting the data only ONCE on the pipe by the SETI servers, and having both clients listen to the same data? I leave the details to a network specialist :-), but there are probably several possible solutions.

If my assumptions are correct and this could be possible for all the work units, would it then be possible to save up to 50% of the bandwidth? Or, in other words, to nearly double the download throughput?

Apologies if this seems a bit naive; I would be glad to learn from my mistakes. Thanks and respect to all the people who spend their time helping naive users like me.

Don't worry about asking naive questions - sometimes they're the best of all, because they prompt people into lateral thinking :-) [The problems start when people start relying on naive answers.....]

As it happens, there's a possible way that could be achieved. SETI@home actually rents a gigabit internet connection from a company called 'Hurricane Electric'. Unfortunately, the nearest Hurricane Electric gets to the Berkeley laboratories is Palo Alto, across the Bay. From there, the data has to take a complicated route over slower cables to reach, first, the Berkeley university campus, and from there go up the infamous 'hill' to the labs. It's a bit like the slow bus ride to the airport which adds so much to trans-continental flight times.....

If there could be a fast, large, reliable (and hence expensive) transparent data cache in Palo Alto, directly connected to the fast trans-continental lines, only one copy of each datafile would need to take that slow suburban bus-ride off campus. But: I stress reliable. We can't ask SETI staff to tramp out to Palo Alto to replace a failed disk drive every few weeks. If any donor was prepared to stump up for such a cache, it would be useless without a proper enterprise-grade on-site maintenance contract to go with it.

Chris S
Volunteer tester
Joined: 19 Nov 00
Posts: 31147
Credit: 11,352,290
RAC: 21,139
United Kingdom
Message 1112984 - Posted: 4 Jun 2011, 13:29:49 UTC

Hi Richard.

Hire a real, top-notch, world-class specialist networking consultant to get the most out of the existing pipe.

Great idea, anyone know one that would work for free? SETI wouldn't have the budget for that.

Look at the 'tapes' and determine the work mix, before loading them into the splitters

That suggestion has some merit, but don't they do that already? I'm sure that Chief Scientist Dan Werthimer and Project Scientist and Admin Eric Korpela, are constantly looking at ways to better things, and would welcome any suggestions to help.

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 38335
Credit: 561,308,469
RAC: 650,980
United States
Message 1112992 - Posted: 4 Jun 2011, 13:40:25 UTC
Last modified: 4 Jun 2011, 13:40:46 UTC

I have not found the post yet, but I thought I remembered Matt saying that getting a 1Gb link to SETI was indeed a possibility, depending on some investment in hardware, but mostly on working through the politics with Berkeley IT.

The possibility was also suggested at the time of keeping the existing 100Mb link and just adding some bandwidth from the campus link.

Several folks had offered to donate for the hardware to make this happen.

I wonder... has any progress been made on that front?
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

PhilippeD
Joined: 14 May 11
Posts: 2
Credit: 56,468
RAC: 0
Belgium
Message 1112995 - Posted: 4 Jun 2011, 13:43:50 UTC - in response to Message 1112980.

If there could be a fast, large, reliable (and hence expensive) transparent data cache in Palo Alto, directly connected to the fast trans-continental lines, only one copy of each datafile would need to take that slow suburban bus-ride off campus. But: I stress reliable. We can't ask SETI staff to tramp out to Palo Alto to replace a failed disk drive every few weeks. If any donor was prepared to stump up for such a cache, it would be useless without a proper enterprise-grade on-site maintenance contract to go with it.


Thanks for your response!

From what I have read in the forums, it seems that, to get some work, my BOINC Manager first contacts the scheduler, and then issues an HTTP request to the download server indicated by the scheduler. Therefore, maybe a reasonably priced HTTP proxy server in Palo Alto would do the job? With disk space for just a few seconds of cache, this proxy server could then serve the request from the wingman (that is, the second request for the same data) directly, without fetching it from Berkeley... If the scheduler just manages some simple keep-alive handshaking with the proxy server, it could easily detect that the server is down and avoid directing clients to it, simply reverting to the current situation. Some kind of graceful degradation.

Of course, reliability is a must, as you stressed. But such a simple server should not be that expensive.
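The core of the idea - serve the wingman's duplicate request from a cache instead of re-fetching from Berkeley - can be modelled in a few lines. A toy sketch only; a real deployment would use an off-the-shelf caching HTTP proxy (e.g. Squid), not custom code:

```python
class WorkunitCache:
    """Toy model of the proposed Palo Alto proxy: the first request for a
    file is fetched upstream; later requests for the same file hit the cache."""

    def __init__(self, fetch_upstream):
        self._fetch = fetch_upstream   # callable: url -> bytes
        self._cache = {}
        self.upstream_fetches = 0      # traffic over the slow link to Berkeley

    def get(self, url):
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
            self.upstream_fetches += 1
        return self._cache[url]

# With two clients per work unit, upstream traffic is roughly halved:
proxy = WorkunitCache(lambda url: b"workunit-data")
proxy.get("wu_0001.dat")   # first client: fetched over the slow link
proxy.get("wu_0001.dat")   # wingman: served from cache, no upstream traffic
```

If both requests for a work unit land inside the cache window, upstream download traffic drops by up to the "50%" estimated earlier in the thread; the saving shrinks as the gap between the two requests grows beyond the cache's retention time.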

Sirius B
Volunteer tester
Joined: 26 Dec 00
Posts: 10273
Credit: 1,530,306
RAC: 248
United Kingdom
Message 1113011 - Posted: 4 Jun 2011, 14:19:37 UTC

Very interesting thread. Thanks Richard for a very detailed explanation.

Possible (long term) solution?

Is it possible for the Project Admin, in conjunction with all our dedicated crunchers, to allocate a specific part of donated funds (if so designated by the donor, of course) to build up a dedicated fibre-optic fund?

This would of course be ongoing until the target is met. I should think that Berkeley Admin would not stand in the way, as they would benefit from it as well...

Just a thought...

I'm pretty sure that there are many out there who do not wish to see this project in trouble.
____________

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 14,942,305
RAC: 12,284
United States
Message 1113024 - Posted: 4 Jun 2011, 14:53:05 UTC

Yes, thank you Richard. I was under the impression the GB line from Hurricane was up to the campus IT, and that it was they who were causing the holdup, as they didn't have the funds allocated to go up the hill with it (no pun intended!). I seem to remember seeing a cost of either $38,000 or $48,000 to run it for the lab. Either way, that's a lot of fundraising, and there is a lot of other stuff that needs replacing that could use that kind of money and is more crucial than the line.
____________


PROUD MEMBER OF Team Starfire World BOINC

Sirius B
Volunteer tester
Joined: 26 Dec 00
Posts: 10273
Credit: 1,530,306
RAC: 248
United Kingdom
Message 1113034 - Posted: 4 Jun 2011, 15:00:24 UTC - in response to Message 1113024.
Last modified: 4 Jun 2011, 15:00:48 UTC

I seem to remember seeing a cost of either $38,000 or $48,000 to run it for the lab. Either way, that's a lot of fundraising, and there is a lot of other stuff that needs replacing that could use that kind of money and is more crucial than the line.


True, but I did say long term. SETI has been going for over 10 years already; just think, if something like that had been implemented, what would the total fund, including interest, be today?

I was not asking for immediate relief but for a long-term goal - maybe over 2-3 years? (That's just an example, not a statement.)

I'm pretty certain it can be achieved - the question is... do people want to support this project short term and moan when no work is available, or really help the project and their rigs on a long-term basis?
____________


Copyright © 2014 University of California