Busy Bytes (Jul 06 2009)



Message boards : Technical News : Busy Bytes (Jul 06 2009)

Profile Sacaripasa
Joined: 29 Dec 05
Posts: 13
Credit: 836,147
RAC: 0
United States
Message 915168 - Posted: 7 Jul 2009, 2:31:08 UTC - in response to Message 915119.

There is no "prize" at the end of a fictitious line here, so if there is downtime, people should understand that, take a deep breath and wait for it to clear. Being #1 cruncher gets you what, bragging rights on a message board!? Matt is in the most awful position here by having to obey orders, even if he disagrees, or has better ideas that are falling on deaf ears. It would be great to get another 80k, but with today's financial rout, that will be just as hard as having a problem-free SETI@home day. BTW, I'll be in the top 50 of my class tomorrow, first milestone!
____________

Profile Gary Charpentier
Project donor
Volunteer tester
Joined: 25 Dec 00
Posts: 12704
Credit: 7,191,504
RAC: 15,509
United States
Message 915169 - Posted: 7 Jul 2009, 2:31:26 UTC - in response to Message 915126.

Wrong answer: the problem only started within the last three weeks. Before that it was not too bad. Something changed, but what?

CUDA
No Seriously!
A lot of people have gone out and gotten a card to crunch more. It takes just one straw to break the camel's back. Someone bought that card and put it up. Too much of a good thing ...

If everyone took 5% off their SETI percentage and gave it to their backup project(s) it would clear the logjam for now. The long term answer is $$$ to get gigabit up the hill.

I've got an alternative suggestion for those that refuse to run a backup. Turn your computer off for an hour or two a day and send the $$ you don't spend on electricity in as a donation.

____________

PhonAcq
Joined: 14 Apr 01
Posts: 1622
Credit: 22,353,672
RAC: 6,558
United States
Message 915171 - Posted: 7 Jul 2009, 2:36:01 UTC - in response to Message 915148.

Re: Maxed out Bandwidth.

This seems to cause the most anger amongst crunchers because the uploads are blocked, and when that happens some hosts run out of work and can't download any more WU's. Users then increase cache size to try to hold enough work to tide them over, which just makes the problem worse.

Why not restrict bandwidth of the (just the) download servers to 80-85Mb/s?

This would obviously lengthen the time of max DL bandwidth, but leave enough bandwidth for the uploads to get through. This in turn would:-
1). Reduce storage requirements for 'In progress' work
2). Allow crunchers to get new work because they can upload completed work. (reduce frustrations)
3). Reduce need for very large cache.

Theoretically everybody would get some (enough) work, and large caches would slowly fill over time, faster as demand reduced.

Is there a flaw in my logic?
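
The cap proposed above could be sketched as a token bucket. This is only an illustration: the 100 Mbit/s link size, the 85 Mbit/s cap, and the TokenBucket helper are assumptions for the example, not a description of how the SETI@home download servers are actually configured.

```python
class TokenBucket:
    """Hypothetical rate limiter: allows at most `rate_mbit` megabits per second."""

    def __init__(self, rate_mbit, burst_mbit):
        self.rate = rate_mbit        # refill rate, megabits per second
        self.capacity = burst_mbit   # largest burst allowed at once
        self.tokens = burst_mbit     # start with a full bucket
        self.last = 0.0              # time of the previous check, in seconds

    def allow(self, size_mbit, now):
        """Return True if a transfer of `size_mbit` may be sent at time `now`."""
        # Refill for the elapsed time, never beyond the bucket capacity.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if size_mbit <= self.tokens:
            self.tokens -= size_mbit
            return True
        return False


LINK_MBIT = 100.0          # assumed total link capacity
DOWNLOAD_CAP_MBIT = 85.0   # assumed cap, upper end of the 80-85 Mb/s suggestion

downloads = TokenBucket(DOWNLOAD_CAP_MBIT, DOWNLOAD_CAP_MBIT)
upload_headroom = LINK_MBIT - DOWNLOAD_CAP_MBIT  # bandwidth left for uploads
```

With these numbers, downloads can never take more than 85 Mbit/s, so roughly 15 Mbit/s always stays free for uploads, whatever the download demand is.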


I made more or less the same suggestion in another thread. However, I thought that if they could prioritize the uploads, without a static restriction, that would help solve the problem. Nobody seemed to understand my suggestion, or it is a bad one.

I don't think reducing cache size matters too much, but reducing storage requirements by being sure the uploads succeed does make sense, even at the cost of inefficient use of bandwidth.

A while ago, SETI was running on half the bandwidth and pegging it. Something happened and the bandwidth doubled, and overnight it was pegged again. Since the number of users didn't change so quickly, and this was before the baraCUDA, I think a procedure in the back office must have changed to choke the bandwidth chicken.

From a different perspective, like any other 'free' resource, bandwidth will be used until it is used up. The fact that the bandwidth use is pegged so frequently is no surprise. However, the BOINC system (including MySQL) just doesn't seem to be very solid, or at least doesn't seem optimized. I'd rather see some change in procedures or evolution of BOINC before investing in yet more hardware and the like. It's a matter of trust, I suppose.

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 915207 - Posted: 7 Jul 2009, 4:03:47 UTC - in response to Message 915126.

Wrong answer: the problem only started within the last three weeks. Before that it was not too bad. Something changed, but what?

If you think it's never happened before, then you've not noticed, or you've forgotten.

The problem starts when there is some sort of interruption -- a server fails, or some bottleneck slows things down, and it doesn't get caught right away.

Over the next day or so, a backlog builds up, and then the BOINC clients start trying to "push through" all at once.

It's particularly bad when there is a batch of "short" work like we've had lately, because that also increases the load.

The glitch that over-distributed Astropulse adds to that, since AP work units are long.

Faster processors (and CUDA) increase load as well, but that didn't happen in the last three weeks.

In other words, several things have pushed the load just a little beyond "normal" and it doesn't take much.

But it has all happened before, and at least so far it doesn't seem any worse than earlier glitches.
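
A rough way to see why "it doesn't take much" is a toy queue model. The numbers below (95 requests per hour of demand, 100 per hour of capacity, a 6-hour outage) are made up for illustration and are not measurements from the project.

```python
def simulate_backlog(demand, capacity, outage_hours, total_hours):
    """Return the backlog (in requests) at the end of each simulated hour."""
    backlog = 0
    history = []
    for hour in range(total_hours):
        # During the outage nothing is served; afterwards the server runs flat out.
        served = 0 if hour < outage_hours else capacity
        backlog = max(0, backlog + demand - served)
        history.append(backlog)
    return history


# Hypothetical load: the server normally has only 5% spare capacity.
OUTAGE_HOURS = 6
history = simulate_backlog(demand=95, capacity=100,
                           outage_hours=OUTAGE_HOURS, total_hours=150)

peak_backlog = max(history)
# Hours of full-speed service needed after the outage before the queue is empty.
recovery_hours = history.index(0) - OUTAGE_HOURS + 1
```

In this toy model a 6-hour interruption leaves a backlog that takes 114 hours to drain, because only the small margin between capacity and demand is available for catching up. Anything that raises demand a little (short work units, faster hosts) shrinks that margin further.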
____________

Profile Jon Golding
Joined: 20 Apr 00
Posts: 56
Credit: 365,460
RAC: 7
United Kingdom
Message 915249 - Posted: 7 Jul 2009, 8:29:38 UTC

Are there any "spare" servers, down the hill or elsewhere on/off campus, belonging to the Space Sciences Lab that could be used as a download mirror for data distribution to clients, to relieve pressure on the normal servers?
Nothing complicated - this would be only for downloads of a large reservoir of pre-split WUs at busy times. All processed data would be returned by clients to the normal servers up the hill. It could be one of the students' jobs to swap in a new drive of pre-split data every few days.
____________

djmotiska
Joined: 26 Jul 01
Posts: 13
Credit: 3,556,663
RAC: 1,862
Finland
Message 915253 - Posted: 7 Jul 2009, 8:51:18 UTC - in response to Message 915249.

I was just about to suggest using mirrors. Looks like Einstein@Home uses them.

The mirror server(s) could be anywhere; there's 900 Mbit/s of unused capacity in that gigabit line to upload split WUs to the mirror server(s). Of course, the WUs could also be split at the mirror. The current SETI servers could do the other jobs and maybe handle some of the download traffic.
____________

Profile [seti.international] Dirk Sadowski
Project donor
Volunteer tester
Joined: 6 Apr 07
Posts: 7101
Credit: 60,863,176
RAC: 17,255
Germany
Message 915256 - Posted: 7 Jul 2009, 9:24:11 UTC
Last modified: 7 Jul 2009, 9:26:39 UTC


Because of the traffic..

A new cable (down the hill) or whatever is needed..

~ 80,000.- US$ ?


Isn't it possible to make it wireless?


I'm not talking about WLAN; I guess the distance would be too great. (WLAN max. 100 Mbit/s?)

How long is the distance?

But in Germany some cities and villages use wireless DSL to save the cost of digging up the streets and burying cables.
One transmitter on a tall building (maybe a church, town hall, or whatever) and receivers in the houses.
I guess it's like radio.

I don't know what the bandwidth would be; maybe more transmitters and receivers would be needed to reach 1 Gbit/s?
Or maybe 'only' double the current capacity to 200 Mbit/s?

I don't know whether this would be cheaper and more feasible than burying a big cable in the ground.

Also, don't laugh.. ;-) ..what about SAT DSL? It would look very crazy to have one or more satellite dishes (depending on the bandwidth) on campus that don't 'look' at the sky..
..but if it would help..

____________
BR

SETI@home Needs your Help ... $10 & U get a Star!

Team seti.international

Das Deutsche Cafe. The German Cafe.

Profile [seti.international] Dirk Sadowski
Project donor
Volunteer tester
Joined: 6 Apr 07
Posts: 7101
Credit: 60,863,176
RAC: 17,255
Germany
Message 915257 - Posted: 7 Jul 2009, 9:31:09 UTC
Last modified: 7 Jul 2009, 9:31:34 UTC


OTOH.. BTW.. maybe..

Are there 'WLAN amplifiers' available for long distances?

An amplifier every 100 or 200 m, and it would work.

Maybe, if every WLAN connection gives 100 Mbit/s, two side by side would double the traffic.. or use both: the current cable plus one or two WLAN links.

____________
BR


Grant (SSSF)
Joined: 19 Aug 99
Posts: 5861
Credit: 60,421,323
RAC: 49,219
Australia
Message 915259 - Posted: 7 Jul 2009, 9:37:13 UTC - in response to Message 915256.

Isn't it possible to make it wireless?

It is.
____________
Grant
Darwin NT.

Profile [seti.international] Dirk Sadowski
Project donor
Volunteer tester
Joined: 6 Apr 07
Posts: 7101
Credit: 60,863,176
RAC: 17,255
Germany
Message 915270 - Posted: 7 Jul 2009, 10:47:49 UTC - in response to Message 915259.
Last modified: 7 Jul 2009, 10:48:45 UTC

Isn't it possible to make it wireless?

It is.


And.. why don't they do it? ;-)

I guess it would be a lot cheaper than with cable..

Maybe there are people around here who know about wireless equipment (WLAN, SAT, radio, or whatever) and how to set it up?

____________
BR


Profile S@NL - Eesger - www.knoop.nl
Joined: 7 Oct 01
Posts: 384
Credit: 37,140,484
RAC: 14,038
Netherlands
Message 915296 - Posted: 7 Jul 2009, 12:19:51 UTC

Another idea.. (maybe)

I have built the "heart" of the statistics of our team. I realize my "problems" are of a much smaller magnitude compared to Berkeley's, but I do think there are similarities.

One MySQL table, the user history, has been trouble for a while now.. probably due to its size of almost 8 GB.. (like: data corruption => complete restore from backup, SQL queries getting stuck => detection of inactivity, restarting history generation with detection as to "where was I")..

To fix this I first tried another engine (InnoDB instead of MyISAM), no joy (the table got much larger and processing got much slower).

Next we'll try to upgrade to a newer MySQL version.. but I fear that will not (completely) fix the problem.

Now I am thinking of splitting the data into several identical tables containing:
(arg, [ code ] doesn't preserve multiple spaces)

___>=__ ___<___ _id's_ active %
______0 _500000 152071 45589 30
_500000 2500000 132865 25980 20
2500000 8250000 170345 20092 12
8250000 8500000 154294 14652 _9
8500000 8750000 155417 12821 _8
8750000 9000000 164536 20945 13
9000000 9999999 _59176 24888 42

and for the 'read queries' use a MRG_MyISAM structure.. that way I could write to the needed table based upon an id 'greater than .. AND less than ..' and keep the reading simple with the merge table.
This might help in:
- easier backup and restore (less massive backup files, partial restores if needed)
- less fragmentation due to smaller tables => more stability
- faster due to smaller index tables => more stability

The potentially awkward part is that I discovered that altered data in the underlying table isn't pushed to the merge table. In another table structure I used this to fix that:
ALTER TABLE `merge_table` UNION=(`table_1`,`table_2`....);

I haven't done any testing of what this query does on a huge table... (it worked like a charm on a couple of 50k-row tables though :D)
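
As a rough sketch of the write-side routing described above, the id ranges from the post can be turned into a lookup that picks the sub-table for a given id. The table prefix `user_history` and the helper names are hypothetical; only the range boundaries and the ALTER TABLE ... UNION form come from the post.

```python
from bisect import bisect_right

# Lower bounds of the id ranges listed in the post (each range is >= lower, < next).
LOWER_BOUNDS = [0, 500_000, 2_500_000, 8_250_000, 8_500_000, 8_750_000, 9_000_000]
MAX_ID = 9_999_999  # exclusive upper bound of the last range in the post


def table_for_id(user_id, prefix="user_history"):
    """Return the name of the sub-table that should hold rows for `user_id`."""
    if not 0 <= user_id < MAX_ID:
        raise ValueError(f"id {user_id} is outside the configured ranges")
    # bisect_right finds the first lower bound greater than user_id;
    # the range just before it is the one that contains the id.
    index = bisect_right(LOWER_BOUNDS, user_id) - 1
    return f"{prefix}_{index + 1}"


def merge_union_sql(merge_table, prefix="user_history"):
    """Build the ALTER TABLE statement that re-declares the merge table's members."""
    tables = ",".join(f"`{prefix}_{i}`" for i in range(1, len(LOWER_BOUNDS) + 1))
    return f"ALTER TABLE `{merge_table}` UNION=({tables});"
```

Writes would go to `table_for_id(some_id)`, while reads keep using the single merge table. Whether the UNION statement stays cheap on very large tables is exactly the open question raised in the post; this sketch does not answer it.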

Is there merit to this idea and could "table splitting" be an idea for the Berkeley tables as well?
____________
The SETI@Home Gauntlet 2012 april 16 - 30| info / chat | STATS

BMgoau
Joined: 8 Jan 07
Posts: 29
Credit: 1,541,301
RAC: 0
Australia
Message 915298 - Posted: 7 Jul 2009, 12:28:10 UTC

I think a picture really makes clear what effect the recent problems have had on the grid. SETI's growth has effectively stopped.

Wolverine
Joined: 9 Jan 00
Posts: 35
Credit: 7,360,391
RAC: 58
Canada
Message 915305 - Posted: 7 Jul 2009, 13:06:09 UTC

How about sending more of the AP 5.05 and fewer SETI@home 6.03? More time out in the field processing, with fewer contacts to the server for more work. Days' worth of work at a time instead of hours.

Sure, the initial change might be a little rough, but I think it will smooth out once you get enough "larger" workunits out in the field.

Once production of the larger workunits is up to par, re-examine the workunit compression idea. See whether it would now be worth it, given the increased number of AP units.

Is it possible to produce workunits for an offsite mirror while the system is down for the weekly Tuesday maintenance? Stock up while things are cleaned up? This may reduce the startup stress on the servers when it all comes back online. Let the offsite mirror take the load and give the main server time to build up some workunits.


Just some things I wanted to throw out there for people to chew on.

- W
____________

Profile RandyC
Joined: 20 Oct 99
Posts: 714
Credit: 1,704,345
RAC: 0
United States
Message 915312 - Posted: 7 Jul 2009, 13:29:28 UTC - in response to Message 915305.

How about sending more of the AP 5.05 and fewer SETI@home 6.03? More time out in the field processing, with fewer contacts to the server for more work. Days' worth of work at a time instead of hours.

Sure the initial change might be a little rough, but I think it will smooth out once you get enough "larger" workunits out in the field.


This is what many people (including myself) would like. The problem is that the AP and MB WUs are split from the same input datasets. To make a long story short, the AP splitters chew through the input files FASTER than MB WUs are being processed. Two or three weeks ago we had the situation where the AP splitters had processed every single one of about 100 input files while the MB splitters were busy working on the first 10 - 20 input files! The AP splitters were basically shut down for a couple of weeks to let MB catch up on the backlog.

Profile [seti.international] Dirk Sadowski
Project donor
Volunteer tester
Joined: 6 Apr 07
Posts: 7101
Credit: 60,863,176
RAC: 17,255
Germany
Message 915319 - Posted: 7 Jul 2009, 13:53:44 UTC


Now I remember.. TV channels also have trucks for making live reports from special places.
IIRC, these small satellite-dish transmitters use (a kind of) microwaves.
Of course, video and audio need more bandwidth than radio alone.. but maybe this technology works well for internet traffic.


Examples of these trucks:
http://de.wikipedia.org/wiki/%C3%9Cbertragungswagen

http://de.wikipedia.org/wiki/Satellite_News_Gathering
This article mentions 'only' 6 - 10 or 16 - 24 Mbit/s..
I don't know how old this article (or the information in it) is, but maybe the current technology is faster.. ;-)

____________
BR


DJStarfox
Joined: 23 May 01
Posts: 1044
Credit: 559,966
RAC: 582
United States
Message 915323 - Posted: 7 Jul 2009, 14:02:14 UTC - in response to Message 915003.
Last modified: 7 Jul 2009, 14:03:09 UTC

Matt,
If this is really a science/research project, then there are only two solutions to the bandwidth problem.

1. Get more funding
2. Deploy the NTPCkr

I HIGHLY favor these two over introducing some non-productive or artificial latency. From these two goals, other ideas become possible.

I would really like to install a server-side instance of BOINC on my machine at home, but I need to make time and upgrade drive space. It would really help me understand how this all works, and it sounds like something cool to get into.

Profile [seti.international] Dirk Sadowski
Project donor
Volunteer tester
Joined: 6 Apr 07
Posts: 7101
Credit: 60,863,176
RAC: 17,255
Germany
Message 915332 - Posted: 7 Jul 2009, 14:25:54 UTC


How about outsourcing the Berkeley traffic?

If SETI@home has members with big internet connections and big HDDs, maybe the SETI@home scheduler could tell each client, at every work request, which member's server to download new WUs from or upload results to.
If not for free, maybe for some money toward the electricity bill.

And these outsourced servers, hosted by SETI@home members, would report/upload/download to Berkeley at different times.

Or maybe the SETI@home scheduler could name a different outsourced server each time, so that these servers would not be under full load either.

IIRC, Skype also uses the hardware and internet connections of many home users for better voice quality.


..this is the last idea for today.. I don't want to spam the forum.. ;-D

____________
BR


Profile Geek@Play
Project donor
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,143,646
RAC: 1,271
United States
Message 915345 - Posted: 7 Jul 2009, 15:05:28 UTC
Last modified: 7 Jul 2009, 15:06:08 UTC

As a scientific project, the scientists in charge must keep extremely tight control of the data in order to preserve its validity. It's not likely that they would move data to servers outside of their direct control because of this.

Even the data we are given is checked by a wingman before it's loaded into the database. Constant checks and controls have to be maintained to maintain the validity of the science.
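
The wingman check mentioned above can be pictured as a simple agreement test between independently computed results. This is only a sketch under assumptions: the tolerance, the quorum of two, and the function names are invented for illustration and are not the project's actual validator.

```python
import math


def results_agree(a, b, rel_tol=1e-5):
    """True when two result vectors have the same length and match within tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


def validate(results, quorum=2):
    """Return the first result that at least `quorum` hosts agree on, else None."""
    for i, candidate in enumerate(results):
        # Count the candidate itself plus every other result that matches it.
        matches = 1 + sum(
            results_agree(candidate, other)
            for j, other in enumerate(results)
            if j != i
        )
        if matches >= quorum:
            return candidate
    return None
```

A single result, or two results that disagree, yields None here, which corresponds to waiting for another host's result before anything enters the database.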
____________
Boinc....Boinc....Boinc....Boinc....

Profile [seti.international] Dirk Sadowski
Project donor
Volunteer tester
Joined: 6 Apr 07
Posts: 7101
Credit: 60,863,176
RAC: 17,255
Germany
Message 915360 - Posted: 7 Jul 2009, 15:56:28 UTC - in response to Message 915003.

...
We could also increase the resolution of chirp rates that we process, thus lengthening the time it takes to process a workunit.
...


Is this 'useful' for the science?
What does this mean?
Will the data be analyzed more finely/accurately?


BTW.
This isn't an idea.. it's a question.. ;-)

____________
BR


Profile cgregb
Joined: 2 Apr 03
Posts: 515
Credit: 2,284,437
RAC: 453
United States
Message 915369 - Posted: 7 Jul 2009, 16:13:14 UTC - in response to Message 915003.

Having read these forums for a while I appreciate your pain.

I am not a techie, but a money and marketing guy.

A few thoughts:

Define the problem, not just the symptoms (scientific)

Define the best solution (scientific)

Define the resources needed (scientific, political, monetary)

Create a liaison team to assist in the non-scientific issues.

Delegate the tasks to responsible people (of which there are thousands trying to help you)

Implement - Evaluate - Adjust - do it again.

I am sure you realize you are on the bleeding edge of technology. You will eventually write the book (if you are not writing it now) on distributed processing. I would think that several commercial organizations would be more than happy to review your work for a mere donation of money or equipment. You have ten+ years of experience going where no one has gone before (pardon the pun).

You and the guys at Berkeley are making tremendous progress in computer science as well as astronomy.

Like any other not-for-profit (not non-profit) organization, you have grown to the point where you need additional infrastructure in terms of people skills to allow you to focus on your goals.

We are fully behind you! Take the step. Dare to be great, don't settle for good enough.

Go get em, tiger!

Greg
____________
Greg in Phoenix


Copyright © 2014 University of California