Busy Bytes (Jul 06 2009)


Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1388
Credit: 74,079
RAC: 0
United States
Message 915003 - Posted: 6 Jul 2009, 23:00:18 UTC

It's still pretty ugly out there - we've maxed out our bandwidth and MySQL resources. We were able to squeeze out a few more cycles from the upload/scheduling servers this morning, but generally it's been quite impossible the past week or so. Clearly this is a result of our growing user base, and the growing percentage of results being processed by CUDA clients.

To solve this problem we have several options. There is non-zero but nevertheless slow progress on both the bandwidth and MySQL fronts, so we're effectively stuck with what we've got for now. We could go to single redundancy and keep the split rate the same. This will immediately cut our outgoing bandwidth in half, but people will, on average, get less work to chew on. We could also increase the resolution of chirp rates that we process, thus lengthening the time it takes to process a workunit. We may do both. From what Eric tells me, compressing workunits only helps multibeam, and only by about 20%. Almost not worth considering, since that will get us 5-10 Mbits back, and we need something like 50.
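
A back-of-the-envelope sketch of how those options compare. The 20%, 5-10 Mbit and 50 Mbit figures come from the paragraph above; the 100 Mbit/s total for outgoing traffic is purely an assumption for illustration, and the function names are made up for this sketch.

```python
# Rough comparison of the bandwidth options discussed above.
# ASSUMPTION: about 100 Mbit/s of outgoing workunit traffic; the post gives no figure.
OUTGOING_MBITS = 100.0
NEEDED_MBITS = 50.0        # "we need something like 50"
COMPRESSION_GAIN = 0.20    # "only by about 20%", and only for multibeam


def single_redundancy_savings(outgoing_mbits):
    """Sending one copy of each workunit instead of two halves outgoing traffic."""
    return outgoing_mbits / 2.0


def compression_savings(multibeam_mbits, gain=COMPRESSION_GAIN):
    """Compression only shrinks the multibeam share of the traffic."""
    return multibeam_mbits * gain


if __name__ == "__main__":
    print("single redundancy saves", single_redundancy_savings(OUTGOING_MBITS), "Mbit/s")
    # A 5-10 Mbit/s gain at 20% implies roughly 25-50 Mbit/s of multibeam traffic.
    for multibeam in (25.0, 50.0):
        saved = compression_savings(multibeam)
        print("compressing", multibeam, "Mbit/s of multibeam saves", saved, "Mbit/s")
```

Under that assumed total, only single redundancy gets near the 50 that is needed; compression alone falls well short.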

The other annoying thing is that on Friday/Saturday our raw data storage server got hung up while we were copying a file up from our archives. This caused splitting to slow down until we ran out of work to send. Not sure why this was the case, as I killed that transfer and everything worked fine after that. Even more mysterious is that, while bringing the same file up again this morning, it choked our server once more. Why this one particular file is having such a random and extreme negative effect is beyond me at this point, but we're doing other tests, etc.

You know, I should point out that while I write these daily missives I tend to disagree with a lot of policies that end up getting enacted around here, which makes it difficult for me to defend one practice or another that might be discussed on these threads. Anyway, don't blame the messenger.

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

zpm
Volunteer tester
Joined: 25 Apr 08
Posts: 284
Credit: 1,460,230
RAC: 3,306
United States
Message 915005 - Posted: 6 Jul 2009, 23:05:16 UTC - in response to Message 915003.

we don't blame the messenger; although, someone should look at the ideas being thrown out by the masses on the forum...


i'm ok with less average work...
____________

I recommend Secunia PSI: http://secunia.com/vulnerability_scanning/personal/
Go Georgia Tech.

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13541
Credit: 29,279,122
RAC: 15,252
United States
Message 915011 - Posted: 6 Jul 2009, 23:11:54 UTC - in response to Message 915003.

That's too bad. I'd really like to hear your thoughts on several of the issues and discussions that happen around here.
____________

Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 12039
Credit: 6,368,124
RAC: 8,653
United States
Message 915015 - Posted: 6 Jul 2009, 23:19:29 UTC - in response to Message 915003.

Thanks for the update.

And we understand: Bosses dictate, grunts implement.

____________

C
Joined: 3 Apr 99
Posts: 240
Credit: 6,602,807
RAC: 782
United States
Message 915030 - Posted: 6 Jul 2009, 23:34:59 UTC - in response to Message 915003.

...You know, I should point out that while I write these daily missives I tend to disagree with a lot of policies that end up getting enacted around here, which makes it difficult for me to defend one practice or another that might be discussed on these threads. Anyway, don't blame the messenger.

- Matt


"Your new idea isn't going to work. I know this because I have a PhD and you don't, and besides that, I didn't think of the idea first."

Yep - hear that every now and then at work...
C
____________

Join Team MacNN

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13541
Credit: 29,279,122
RAC: 15,252
United States
Message 915067 - Posted: 7 Jul 2009, 0:13:12 UTC

What about the possibility of getting full gigabit access to the lab through targeted donations from the users for this specific project?
____________

zpm
Volunteer tester
Joined: 25 Apr 08
Posts: 284
Credit: 1,460,230
RAC: 3,306
United States
Message 915069 - Posted: 7 Jul 2009, 0:15:01 UTC - in response to Message 915067.

about as much as Ozzy coming to Columbus, GA to play.....
____________

I recommend Secunia PSI: http://secunia.com/vulnerability_scanning/personal/
Go Georgia Tech.

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 915070 - Posted: 7 Jul 2009, 0:17:38 UTC - in response to Message 915067.

What about the possibility of getting full gigabit access to the lab through targeted donations from the users for this specific project?

As much as I hate to say it, about as much as getting our fellow SETIzens to kick in $80k.
____________

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24050
Credit: 516,741
RAC: 128
United States
Message 915082 - Posted: 7 Jul 2009, 0:30:57 UTC

A possibility that might help would be to decide whether or not to generate a second task based on the "reputation" of the first computer to get the task. A computer that keeps returning error-free results that validate would keep having its reputation increased. You ought to be able to get around 40% of the bandwidth back.
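
A minimal sketch of that idea, to show where a figure like 40% could come from. Everything here is hypothetical: the `Host` class, the threshold of 10 consecutive valid results, and the assumption that each workunit is currently sent to exactly two hosts. It is not how the BOINC server actually decides this, and it leaves out the occasional spot check a real scheme would need to keep reputations honest.

```python
from dataclasses import dataclass

# ASSUMPTION: a host counts as "trusted" after this many consecutive valid results.
TRUST_THRESHOLD = 10


@dataclass
class Host:
    consecutive_valid: int = 0

    def record_result(self, valid: bool) -> None:
        """Raise the reputation on a valid result; reset it on any error."""
        self.consecutive_valid = self.consecutive_valid + 1 if valid else 0

    def is_trusted(self) -> bool:
        return self.consecutive_valid >= TRUST_THRESHOLD


def tasks_to_send(first_host: Host) -> int:
    """Send one copy to a trusted host, otherwise the usual two."""
    return 1 if first_host.is_trusted() else 2


def fraction_of_bandwidth_saved(trusted_fraction: float) -> float:
    """Average copies per workunit drop from 2 to (2 - trusted_fraction)."""
    return trusted_fraction / 2.0
```

Under these assumptions, getting about 40% of the bandwidth back would require roughly 80% of first tasks to land on trusted hosts, since `fraction_of_bandwidth_saved(0.8)` is about 0.4.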
____________


BOINC WIKI

zpm
Volunteer tester
Joined: 25 Apr 08
Posts: 284
Credit: 1,460,230
RAC: 3,306
United States
Message 915106 - Posted: 7 Jul 2009, 0:46:46 UTC - in response to Message 915082.
Last modified: 7 Jul 2009, 0:49:17 UTC

perhaps there needs to be a downtime period, only longer.....

A 1 month long Seti-Break like spring break...I'll admit that i let my quad talk to me, and half the time it says, "give me a crunch break please". This would allow you guys time to work on the stuff that needs to be worked on...

Einstein is already slowing down....
____________

I recommend Secunia PSI: http://secunia.com/vulnerability_scanning/personal/
Go Georgia Tech.

john gray
Joined: 16 May 99
Posts: 3
Credit: 8,379,909
RAC: 0
United States
Message 915119 - Posted: 7 Jul 2009, 0:59:05 UTC

someone needs to fix what is wrong with boinc or bandwidth before we (I) or others stop running this mess of so-called looking for ET...
____________

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13541
Credit: 29,279,122
RAC: 15,252
United States
Message 915120 - Posted: 7 Jul 2009, 1:00:55 UTC - in response to Message 915119.

someone needs to fix what is wrong with boinc or bandwidth before we (I) or others stop running this mess of so-called looking for ET...


Much discussion has happened as to the "hows" of fixing it. Needless to say, the answers aren't as simple as "fix it!", and all of them require more money than the project has available.
____________

Andy Williams
Volunteer tester
Joined: 11 May 01
Posts: 187
Credit: 112,464,820
RAC: 0
United States
Message 915125 - Posted: 7 Jul 2009, 1:07:02 UTC - in response to Message 915082.

A possibility that might help would be to decide whether or not to generate a second task based on the "reputation" of the first computer to get the task. A computer that keeps returning error-free results that validate would keep having its reputation increased. You ought to be able to get around 40% of the bandwidth back.


The best suggestion I have heard in a long, long, long time.
____________
--
Classic 82353 WU / 400979 h

john gray
Joined: 16 May 99
Posts: 3
Credit: 8,379,909
RAC: 0
United States
Message 915126 - Posted: 7 Jul 2009, 1:07:06 UTC

wrong answer, the problem only started within the last three weeks.... before, not too bad ... something changed but what??????????????????
____________

Andy Williams
Volunteer tester
Joined: 11 May 01
Posts: 187
Credit: 112,464,820
RAC: 0
United States
Message 915127 - Posted: 7 Jul 2009, 1:09:39 UTC - in response to Message 915126.

wrong answer, the problem only started within the last three weeks.... before, not too bad ... something changed but what??????????????????


Please. 2009 as a whole has been very troublesome.
____________
--
Classic 82353 WU / 400979 h

OzzFan
Volunteer tester
Avatar
Send message
Joined: 9 Apr 02
Posts: 13541
Credit: 29,279,122
RAC: 15,252
United States
Message 915129 - Posted: 7 Jul 2009, 1:13:25 UTC - in response to Message 915126.

wrong answer, the problem only started within the last three weeks.... before, not too bad ... something changed but what??????????????????


Like I said: much discussion has been happening over exactly that.

I do believe Matt answered very specifically what has changed.
____________

john gray
Joined: 16 May 99
Posts: 3
Credit: 8,379,909
RAC: 0
United States
Message 915133 - Posted: 7 Jul 2009, 1:23:31 UTC

there is something way wrong with database management or the servers appear way beyond capacity.......since i am a long time seti participant i may have to stop all of my computing for seti ....... in these hard times we all do what we can .. the use of my computers and the power consumption is great...... if things do not improve why should i continue with using my resources for the cause????
____________

OzzFan
Volunteer tester
Avatar
Send message
Joined: 9 Apr 02
Posts: 13541
Credit: 29,279,122
RAC: 15,252
United States
Message 915134 - Posted: 7 Jul 2009, 1:25:45 UTC - in response to Message 915133.

there is something way wrong with database management or the servers appear way beyond capacity.......since i am a long time seti participant i may have to stop all of my computing for seti ....... in these hard times we all do what we can .. the use of my computers and the power consumption is great...... if things do not improve why should i continue with using my resources for the cause????


A question every user must ask themselves. I've cut back immensely myself, but mainly due to power consumption and rising costs in these economically hard times.

But if you're really that dedicated, you'd understand that server outages have been common for SETI for ages, and that patience is ultimately required.
____________

Virtual Boss*
Volunteer tester
Joined: 4 May 08
Posts: 417
Credit: 6,163,425
RAC: 740
Australia
Message 915148 - Posted: 7 Jul 2009, 1:53:56 UTC

Re: Maxed out Bandwidth.

This seems to cause the most anger amongst crunchers because the uploads are blocked, and when that happens some hosts run out of work and can't download any more WUs. Users then increase cache size to try to hold enough work to tide them over, which just makes the problem worse.

Why not restrict the bandwidth of just the download servers to 80-85 Mb/s?

This would obviously lengthen the time spent at max DL bandwidth, but leave enough bandwidth for the uploads to get through. This in turn would:
1). Reduce storage requirements for 'In progress' work.
2). Allow crunchers to get new work because they can upload completed work (reducing frustrations).
3). Reduce the need for very large caches.

Theoretically everybody would get some (enough) work, and large caches would slowly fill over time, faster as demand reduced.

Is there a flaw in my logic?
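
To make the idea concrete, here is a minimal token-bucket sketch of a download cap. It is only an illustration: the `TokenBucket` class and its parameters are made up for this example, and where a cap would really be enforced (router, switch, or server configuration) is not something stated in this thread. The caller passes the current time in seconds so the behaviour is easy to follow.

```python
class TokenBucket:
    """Allow downloads only while they fit under a fixed rate in Mbit/s."""

    def __init__(self, rate_mbits: float, burst_mbits: float) -> None:
        self.rate = rate_mbits        # tokens (Mbit) added per second
        self.capacity = burst_mbits   # most Mbit that can go out in one burst
        self.tokens = burst_mbits     # start with a full bucket
        self.last = 0.0               # time of the previous call, in seconds

    def allow(self, size_mbits: float, now: float) -> bool:
        """Refill for the elapsed time, then admit the transfer if it fits."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if size_mbits <= self.tokens:
            self.tokens -= size_mbits
            return True
        return False


# Cap downloads at 85 Mb/s; whatever the link has left stays free for uploads.
downloads = TokenBucket(rate_mbits=85.0, burst_mbits=85.0)
```

A download that does not fit is simply refused here; a real limiter would more likely queue or slow it, which matches the "large caches would slowly fill over time" part of the proposal.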
____________
Flying high with Team Sicituradastra.

Neil Blaikie
Volunteer tester
Joined: 17 May 99
Posts: 142
Credit: 6,466,200
RAC: 12
Canada
Message 915150 - Posted: 7 Jul 2009, 1:55:28 UTC

Patience is required, and if "you" (as a user) don't have it then go crunch another project. I am an avid online flight simulator pilot, and one of the biggest sites that I and hundreds of other people used to get freeware addons was hacked on May 12th. The massive file library of freeware files is still down and being worked on to this day. I can wait.

The site's forums have been inundated with people asking when it will be back online. The answer: when it is fixed and working how we want it to work. That particular site was able to purchase 4 massively powerful new servers with a mass donation post hack.
Digging up a large chunk of road to lay a fiber cable is very expensive, and, being on campus, bureaucratic processes must be followed. Sure, it would be great, but in the short term it is NOT going to happen. Having users send a large amount of money for this purpose could prove fruitless if those who allow the campus to be dug up say no.

I have a ton of files waiting to upload and usually had "network always available". I am doing my part by leaving it "suspended" most of the day. I try, and if some get through, great; if not, I wait a little longer and try again. My point being that they are getting through, albeit slowly, and sitting being patient has helped.

Enough of my rambling, and thank you Matt for your posts. They are appreciated, and even though you don't have a huge amount of say on matters, your hard work and problem solving is helping 10 times as much as I am sure you would give yourself credit for. Keep up the good work and things will get better, however long it takes!
____________


Copyright © 2014 University of California