Busy Bytes (Jul 06 2009)

Message boards : Technical News : Busy Bytes (Jul 06 2009)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 915003 - Posted: 6 Jul 2009, 23:00:18 UTC

It's still pretty ugly out there - we've maxed out our bandwidth and MySQL resources. We were able to squeeze a few more cycles out of the upload/scheduling servers this morning, but generally it's been quite impossible the past week or so. Clearly this is a result of our growing user base and the growing percentage of results being processed by CUDA clients.

To solve this problem we have several options. There is non-zero but nevertheless slow progress on both the bandwidth and MySQL fronts, so we're effectively stuck with what we've got for now. We could go to single redundancy and keep the split rate the same. This will immediately cut our outgoing bandwidth in half, but people will, on average, get less work to chew on. We could also increase the resolution of chirp rates that we process, thus lengthening the time it takes to process a workunit. We may do both. From what Eric tells me, compressing workunits only helps multibeam, and only by about 20%. Almost not worth considering, since that will get us 5-10 Mbits back, and we need something like 50.
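
A rough back-of-the-envelope sketch of the numbers above, in Python. The 100 Mbit/s outgoing total and the 25-50 Mbit/s multibeam share are assumptions picked only so the arithmetic lines up with the "5-10 Mbits" and "something like 50" figures; they are not measured values from the project.

```python
# Back-of-the-envelope comparison of the two bandwidth options.
# ASSUMPTIONS (not from the project): total outgoing traffic of
# 100 Mbit/s and a multibeam share of 25-50 Mbit/s.

def savings_from_compression(multibeam_mbits, ratio=0.20):
    """Mbit/s saved if multibeam workunits shrink by `ratio`."""
    return multibeam_mbits * ratio

def savings_from_lower_redundancy(outgoing_mbits, copies_before=2, copies_after=1):
    """Mbit/s saved when each workunit is sent out fewer times."""
    return outgoing_mbits * (1 - copies_after / copies_before)

NEEDED_MBITS = 50          # "we need something like 50"
ASSUMED_OUTGOING = 100     # hypothetical total outgoing traffic

low = savings_from_compression(25)    # low end of assumed multibeam share
high = savings_from_compression(50)   # high end of assumed multibeam share
halved = savings_from_lower_redundancy(ASSUMED_OUTGOING)

print(f"compression saves {low:.0f}-{high:.0f} Mbit/s")
print(f"single redundancy saves {halved:.0f} Mbit/s (need {NEEDED_MBITS})")
```

Under those assumptions, compression alone falls well short of the target, while sending each workunit once instead of twice covers it.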

The other annoying thing is that on Friday/Saturday our raw data storage server got hung up while we were copying a file up from our archives. This caused splitting to slow down until we ran out of work to send. Not sure why this was the case, as I killed that transfer and everything worked fine after that. Even more mysterious is that, while bringing the same file up again this morning it choked our server once more. Why this one particular file is having such a random and extreme negative effect is beyond me at this point, but we're doing other tests, etc.

You know, I should point out that while I write these daily missives I tend to disagree with a lot of policies that end up getting enacted around here, which makes it difficult for me to defend one practice or another that might be discussed on these threads. Anyway, don't blame the messenger.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 915003
zpm
Volunteer tester
Joined: 25 Apr 08
Posts: 284
Credit: 1,659,024
RAC: 0
United States
Message 915005 - Posted: 6 Jul 2009, 23:05:16 UTC - in response to Message 915003.  

We don't blame the messenger; although someone should look at the ideas being thrown out by the masses on the forum...

I'm ok with less average work...

I recommend Secunia PSI: http://secunia.com/vulnerability_scanning/personal/
Go Georgia Tech.
ID: 915005
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 915011 - Posted: 6 Jul 2009, 23:11:54 UTC - in response to Message 915003.  

That's too bad. I'd really like to hear your thoughts on several of the issues and discussions that happen around here.
ID: 915011
Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 30591
Credit: 53,134,872
RAC: 32
United States
Message 915015 - Posted: 6 Jul 2009, 23:19:29 UTC - in response to Message 915003.  

Thanks for the update.

And we understand: Bosses dictate, grunts implement.

ID: 915015
C
Joined: 3 Apr 99
Posts: 240
Credit: 7,716,977
RAC: 0
United States
Message 915030 - Posted: 6 Jul 2009, 23:34:59 UTC - in response to Message 915003.  

...You know, I should point out that while I write these daily missives I tend to disagree with a lot of policies that end up getting enacted around here, which makes it difficult for me to defend one practice or another that might be discussed on these threads. Anyway, don't blame the messenger.

- Matt


"Your new idea isn't going to work. I know this because I have a PhD and you don't, and besides that, I didn't think of the idea first."

Yep - hear that every now and then at work...
C

Join Team MacNN
ID: 915030
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 915067 - Posted: 7 Jul 2009, 0:13:12 UTC

What about the possibility of getting full gigabit access to the lab through targeted donations from the users for this specific project?
ID: 915067
zpm
Volunteer tester
Joined: 25 Apr 08
Posts: 284
Credit: 1,659,024
RAC: 0
United States
Message 915069 - Posted: 7 Jul 2009, 0:15:01 UTC - in response to Message 915067.  

about as much as Ozzy coming to Columbus, GA to play.....

I recommend Secunia PSI: http://secunia.com/vulnerability_scanning/personal/
Go Georgia Tech.
ID: 915069
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 915070 - Posted: 7 Jul 2009, 0:17:38 UTC - in response to Message 915067.  

What about the possibility of getting full gigabit access to the lab through targeted donations from the users for this specific project?

As much as I hate to say it, about as much as getting our fellow SETIzens to kick in $80k.
ID: 915070
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 915082 - Posted: 7 Jul 2009, 0:30:57 UTC

A possibility that might help would be to decide whether or not to generate a second task based on the "reputation" of the first computer to get the task. A computer that keeps returning error-free results that validate would keep getting its reputation increased. You ought to be able to get around 40% of the bandwidth back.
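
A minimal sketch of that idea in Python, to make the decision rule concrete. Everything here is hypothetical: the `Host` class, the threshold of 10 consecutive valid results, and the 10% spot-check rate are illustrative choices, not anything the project or the BOINC server code is known to use.

```python
import random
from dataclasses import dataclass

@dataclass
class Host:
    """Tracks a host's reputation as a streak of validated results."""
    consecutive_valid: int = 0

    def record_result(self, valid: bool) -> None:
        # One bad result wipes the streak, so trust has to be re-earned.
        self.consecutive_valid = self.consecutive_valid + 1 if valid else 0

def needs_second_task(host, trust_threshold=10, spot_check_rate=0.1,
                      rng=random.random):
    """Decide whether to send a second copy of the task to another host.

    Untrusted hosts are always double-checked. Trusted hosts are only
    spot-checked at random, which is where the bandwidth saving comes from.
    """
    if host.consecutive_valid < trust_threshold:
        return True
    return rng() < spot_check_rate

# Usage: a new host is always replicated; a trusted one rarely is.
host = Host()
print(needs_second_task(host))            # new host, always True
for _ in range(10):
    host.record_result(valid=True)
print(needs_second_task(host, rng=lambda: 0.5))   # trusted, no spot check
```

How much bandwidth this actually returns depends on what fraction of tasks land on trusted hosts and on the spot-check rate, so the 40% figure should be read as an estimate.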


BOINC WIKI
ID: 915082
zpm
Volunteer tester
Joined: 25 Apr 08
Posts: 284
Credit: 1,659,024
RAC: 0
United States
Message 915106 - Posted: 7 Jul 2009, 0:46:46 UTC - in response to Message 915082.  
Last modified: 7 Jul 2009, 0:49:17 UTC

Perhaps there needs to be a downtime period, only longer.....

A 1 month long Seti-Break like spring break... I'll admit that I let my quad talk to me, and half the time it says, "give me a crunch break please". This would allow you guys time to work on the stuff that needs to be worked on...

Einstein is already slowing down....

I recommend Secunia PSI: http://secunia.com/vulnerability_scanning/personal/
Go Georgia Tech.
ID: 915106
john gray
Joined: 16 May 99
Posts: 3
Credit: 8,379,909
RAC: 0
United States
Message 915119 - Posted: 7 Jul 2009, 0:59:05 UTC

Someone needs to fix what is wrong with BOINC or the bandwidth before we (I) or others stop running this mess of so-called looking for ET...
ID: 915119
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 915120 - Posted: 7 Jul 2009, 1:00:55 UTC - in response to Message 915119.  

Someone needs to fix what is wrong with BOINC or the bandwidth before we (I) or others stop running this mess of so-called looking for ET...


Much discussion has happened as to the "hows" of fixing it. Needless to say, the answers aren't as simple as "fix it!", and all of them require more money than the project has available.
ID: 915120
Andy Williams
Volunteer tester
Joined: 11 May 01
Posts: 187
Credit: 112,464,820
RAC: 0
United States
Message 915125 - Posted: 7 Jul 2009, 1:07:02 UTC - in response to Message 915082.  

A possibility that might help would be to decide whether or not to generate a second task based on the "reputation" of the first computer to get the task. A computer that keeps returning error-free results that validate would keep getting its reputation increased. You ought to be able to get around 40% of the bandwidth back.


The best suggestion I have heard in a long, long, long time.
--
Classic 82353 WU / 400979 h
ID: 915125
john gray
Joined: 16 May 99
Posts: 3
Credit: 8,379,909
RAC: 0
United States
Message 915126 - Posted: 7 Jul 2009, 1:07:06 UTC

Wrong answer. The problem only started within the last three weeks.... before that, not too bad ... something changed, but what??????
ID: 915126
Andy Williams
Volunteer tester
Joined: 11 May 01
Posts: 187
Credit: 112,464,820
RAC: 0
United States
Message 915127 - Posted: 7 Jul 2009, 1:09:39 UTC - in response to Message 915126.  

Wrong answer. The problem only started within the last three weeks.... before that, not too bad ... something changed, but what??????


Please. 2009 as a whole has been very troublesome.
--
Classic 82353 WU / 400979 h
ID: 915127
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 915129 - Posted: 7 Jul 2009, 1:13:25 UTC - in response to Message 915126.  

Wrong answer. The problem only started within the last three weeks.... before that, not too bad ... something changed, but what??????


Like I said: much discussion has been happening over exactly that.

I do believe Matt answered very specifically what has changed.
ID: 915129
john gray
Joined: 16 May 99
Posts: 3
Credit: 8,379,909
RAC: 0
United States
Message 915133 - Posted: 7 Jul 2009, 1:23:31 UTC

There is something way wrong with database management, or the servers appear way beyond capacity....... Since I am a long-time SETI participant, I may have to stop all of my computing for SETI ....... In these hard times we all do what we can .. the use of my computers and the power consumption is great...... If things do not improve, why should I continue using my resources for the cause????
ID: 915133
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 915134 - Posted: 7 Jul 2009, 1:25:45 UTC - in response to Message 915133.  

There is something way wrong with database management, or the servers appear way beyond capacity....... Since I am a long-time SETI participant, I may have to stop all of my computing for SETI ....... In these hard times we all do what we can .. the use of my computers and the power consumption is great...... If things do not improve, why should I continue using my resources for the cause????


A question every user must ask themselves. I've cut back immensely myself, but mainly due to power consumption and rising costs in these economically hard times.

But if you're really that dedicated, you'd understand that server outages have been common for SETI for ages, and that patience is ultimately required.
ID: 915134
Virtual Boss*
Volunteer tester
Joined: 4 May 08
Posts: 417
Credit: 6,440,287
RAC: 0
Australia
Message 915148 - Posted: 7 Jul 2009, 1:53:56 UTC

Re: Maxed out Bandwidth.

This seems to cause the most anger amongst crunchers because the uploads are blocked, and when that happens some hosts run out of work and can't download any more WUs. Users then increase their cache size to try to hold enough work to tide them over, which just makes the problem worse.

Why not restrict the bandwidth of just the download servers to 80-85 Mb/s?

This would obviously lengthen the time the download bandwidth stays maxed out, but it would leave enough bandwidth for the uploads to get through. This in turn would:
1). Reduce storage requirements for 'In progress' work
2). Allow crunchers to get new work because they can upload completed work. (reduce frustrations)
3). Reduce need for very large cache.

Theoretically everybody would get some (enough) work, and large caches would slowly fill over time, faster as demand reduced.

Is there a flaw in my logic?
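
One way to picture the proposed cap is a token bucket, sketched below in Python. This only illustrates the rate-limiting idea: the 85 Mbit/s figure comes from the suggestion above, while the class, its parameter names, and the one-second burst size are hypothetical. A real cap would more likely be applied at the network level than in application code.

```python
class TokenBucket:
    """Allows traffic up to `rate_mbit_s` on average, with a bounded burst."""

    def __init__(self, rate_mbit_s, burst_mbit):
        self.rate = rate_mbit_s      # tokens (Mbit) added per second
        self.capacity = burst_mbit   # most tokens the bucket can hold
        self.tokens = burst_mbit     # start with a full bucket
        self.last = 0.0              # time of the previous check, in seconds

    def allow(self, size_mbit, now):
        """Return True if `size_mbit` may be sent at time `now`."""
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if size_mbit <= self.tokens:
            self.tokens -= size_mbit
            return True
        return False   # over the cap: the caller delays or queues the send

# Usage: cap downloads at 85 Mbit/s with a one-second burst allowance.
downloads = TokenBucket(rate_mbit_s=85, burst_mbit=85)
print(downloads.allow(85, now=0.0))    # burst allowance covers this
print(downloads.allow(1, now=0.0))     # bucket is empty, so refused
print(downloads.allow(42.5, now=0.5))  # half a second refills 42.5 Mbit
```

Whatever the downloads are not allowed to use stays free for uploads, which is the point of the proposal.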
Flying high with Team Sicituradastra.
ID: 915148
Neil Blaikie
Volunteer tester
Joined: 17 May 99
Posts: 143
Credit: 6,652,341
RAC: 0
Canada
Message 915150 - Posted: 7 Jul 2009, 1:55:28 UTC

Patience is required, and if "you" (as a user) don't have it then go crunch another project. I am an avid online flight simulator pilot, and one of the biggest sites that I and hundreds of other people used to get freeware addons was hacked on May 12th. The massive library of freeware files is still down and being worked on to this day. I can wait.

The site's forums have been inundated with people asking when it will be back online. The answer: when it is fixed and working how we want it to work. That particular site was able to purchase 4 massively powerful new servers with a mass donation post-hack.
Digging up a large chunk of road to lay a fiber cable is very expensive, and on campus, bureaucratic processes must be followed. Sure, it would be great, but in the short term it is NOT going to happen. Having users send a large amount of money for this purpose could prove fruitless if those who decide whether the campus can be dug up say no.

I have a ton of files waiting to upload and usually have "network always available" set. I am doing my part by leaving it "suspended" most of the day. I try, and if some get through, great; if not, I wait a little longer and try again. My point is that they are getting through, albeit slowly, and being patient has helped.

Enough of my rambling, and thank you Matt for your posts. They are appreciated, and even though you don't have a huge amount of say on matters, your hard work and problem solving is helping 10 times as much as I am sure you would give yourself credit for. Keep up the good work and things will get better, however long it takes!
ID: 915150


 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.