Working as Expected (Jul 13 2009)



Message boards : Technical News : Working as Expected (Jul 13 2009)

Author Message
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13615
Credit: 30,387,019
RAC: 21,152
United States
Message 919586 - Posted: 20 Jul 2009, 1:38:54 UTC - in response to Message 919532.

Then there is the sense from some posts that campus Admin (1) places a low priority on the project, that there is a (2) factor contaminating incoming signal with radar pulses, (3) software is being rewritten by a single volunteer who is currently on a well-deserved vacation, and (4) correct me if I'm wrong, simply throwing more servers into the mix won't cure it either. Then, there is also the undertone that crunchers are somehow to blame and they need to go somewhere else to volunteer their multicores.


I feel I need to correct some of this:

1) Campus Admins (not the project Admins) have generally been cooperative with SETI@Home and have allowed many new technologies into the lab despite no one else needing them. The only stipulation is that the campus needs to examine what is needed, send out quotes for prices, and consider the costs of upkeep after the purchase.

2) Correct.

3) Matt isn't a volunteer, he is one of the SETI Admin Staff, but yes, he is on a well deserved vacation. The rest of them should do so at their first chance as well.

4) More powerful servers would help pick up a lot of the dropped TCP connections, but the gigabit internet kind of goes hand-in-hand with this.

5) (Even though you didn't list it as a fifth point) Volunteers are not to blame, but they are definitely encouraged to join backup projects, simply because the entire point of BOINC is to allow distributed computing on a low budget, and SETI@Home is the flagship of that banner. SETI@Home is pushing the limits using old, donated or beta hardware with minimal staff and funding, and it's quite amazing what they can accomplish with what little they have.
____________

clive G1FYE
Volunteer moderator
Joined: 4 Nov 04
Posts: 1300
Credit: 23,054,144
RAC: 0
United Kingdom
Message 919679 - Posted: 20 Jul 2009, 12:40:21 UTC

Quote Ozz - SETI@Home is pushing the limits using old, donated or beta hardware with minimal staff and funding,
and it's quite amazing what they can accomplish with what little they have.

And I second that they do a great job, with whatever they can get, and make it work,
When bits fall off I tend to weld them back on, but software won't stay still in the vice.

And they even get let out of the lab for hols . . :)

Profile Jim H
Joined: 28 Nov 06
Posts: 12
Credit: 2,186,439
RAC: 0
United States
Message 919718 - Posted: 20 Jul 2009, 15:47:53 UTC - in response to Message 919679.

Thanks for all the hard work.
Makes my head hurt when considering all the factors in play to keep SETI and BOINC running.

I've noted the difficulties over the last several weeks and, more importantly, I've noted the efforts the folks are making in order to get it all moving.
THX
____________
Clear Skies to all amateur Astronomers out there...

zpm
Volunteer tester
Joined: 25 Apr 08
Posts: 284
Credit: 1,595,133
RAC: 1,411
United States
Message 919864 - Posted: 20 Jul 2009, 21:45:11 UTC - in response to Message 919718.

ozzy hit the head of the nail on #5.... old equipment.....

Profile Toeman
Joined: 31 Mar 01
Posts: 2
Credit: 498,430
RAC: 0
United States
Message 919884 - Posted: 20 Jul 2009, 22:30:41 UTC

Another frustrating Mon. Have been trying to upload/download work units for three weeks. Very sporadic at best. Managing only one connection per week for two weeks in a row. I wish someone would let us know what's up. I started running Boinc for Seti@home when the classic S@H was shut down and am not really interested in running other "filler" projects. Thanks, and any information would be very welcome.
____________

Profile Borgholio
Joined: 2 Aug 99
Posts: 651
Credit: 12,040,738
RAC: 3,316
United States
Message 919897 - Posted: 20 Jul 2009, 23:24:10 UTC - in response to Message 919884.

Another frustrating Mon. Have been trying to upload/download work units for three weeks. Very sporadic at best. Managing only one connection per week for two weeks in a row. I wish someone would let us know what's up. I started running Boinc for Seti@home when the classic S@H was shut down and am not really interested in running other "filler" projects. Thanks, and any information would be very welcome.


I would hardly call searching for a cure for AIDS or cancer filler...but that's just me.
____________


You will be assimilated...bunghole!

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 919903 - Posted: 20 Jul 2009, 23:48:28 UTC - in response to Message 919897.

Another frustrating Mon. Have been trying to upload/download work units for three weeks. Very sporadic at best. Managing only one connection per week for two weeks in a row. I wish someone would let us know what's up. I started running Boinc for Seti@home when the classic S@H was shut down and am not really interested in running other "filler" projects. Thanks, and any information would be very welcome.


I would hardly call searching for a cure for AIDS or cancer filler...but that's just me.

"filler" is in the eye of the beholder.
____________

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13615
Credit: 30,387,019
RAC: 21,152
United States
Message 919912 - Posted: 21 Jul 2009, 0:04:13 UTC - in response to Message 919884.

Another frustrating Mon. Have been trying to upload/download work units for three weeks. Very sporadic at best. Managing only one connection per week for two weeks in a row. I wish someone would let us know what's up. I started running Boinc for Seti@home when the classic S@H was shut down and am not really interested in running other "filler" projects. Thanks, and any information would be very welcome.


Eric and co. are keeping things going as best they can. Matt is on vacation and will be back soon.
____________

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 919968 - Posted: 21 Jul 2009, 2:26:46 UTC - in response to Message 919912.

Another frustrating Mon. Have been trying to upload/download work units for three weeks. Very sporadic at best. Managing only one connection per week for two weeks in a row. I wish someone would let us know what's up. I started running Boinc for Seti@home when the classic S@H was shut down and am not really interested in running other "filler" projects. Thanks, and any information would be very welcome.


Eric and co. are keeping things going as best they can. Matt is on vacation and will be back soon.

As of approximately 1:30am UTC, it looks like there is a nice fat spike in uploaded (probably actually reported) results.
____________

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13615
Credit: 30,387,019
RAC: 21,152
United States
Message 919995 - Posted: 21 Jul 2009, 4:28:14 UTC - in response to Message 919968.

Another frustrating Mon. Have been trying to upload/download work units for three weeks. Very sporadic at best. Managing only one connection per week for two weeks in a row. I wish someone would let us know what's up. I started running Boinc for Seti@home when the classic S@H was shut down and am not really interested in running other "filler" projects. Thanks, and any information would be very welcome.


Eric and co. are keeping things going as best they can. Matt is on vacation and will be back soon.

As of approximately 1:30am UTC, it looks like there is a nice fat spike in uploaded (probably actually reported) results.


That's just Vistro and TCP Jesus/MC Hammer or whatever he decides to call himself this week pressing the retry button too many times.
____________

Profile ML1
Volunteer tester
Joined: 25 Nov 01
Posts: 8420
Credit: 4,136,541
RAC: 1,474
United Kingdom
Message 920042 - Posted: 21 Jul 2009, 10:55:02 UTC - in response to Message 919995.
Last modified: 21 Jul 2009, 10:55:24 UTC

As of approximately 1:30am UTC, it looks like there is a nice fat spike in uploaded (probably actually reported) results.

That's just Vistro and TCP Jesus/MC Hammer or whatever he decides to call himself this week pressing the retry button too many times.

Hey! Shame on you... Cynicism doesn't become you.

Ya just got to admire their interest and dedication to be sat there clicking away at the button. ... They may even get to learn more of how Boinc works, and why, and how, and also find out something of the exploration in how Boinc is put together.

All very good fun!

Meanwhile, I leave Boinc to its own devices. It usually muddles through.

(I will admit to the occasional prod for the sake of my own experiments in GPU WU selection :-o )

Meanwhile #2, the Cricket graphs form a very good study in TCP effects on a saturated link! It is also a good reminder that for any system, overall 'control' is exerted by the most significant bottleneck (or whatever system resource limit gets hit the hardest).

Happy crunchin',
Martin
____________
See new freedom: Mageia4
Linux Voice See & try out your OS Freedom!
The Future is what We make IT (GPLv3)

Profile ML1
Volunteer tester
Joined: 25 Nov 01
Posts: 8420
Credit: 4,136,541
RAC: 1,474
United Kingdom
Message 920043 - Posted: 21 Jul 2009, 11:01:49 UTC - in response to Message 919968.
Last modified: 21 Jul 2009, 11:42:01 UTC

As of approximately 1:30am UTC, it looks like there is a nice fat spike in uploaded (probably actually reported) results.

That upload spike there shows very nicely how the upload rate can more than double when the downlink is non-saturated.

Note: Green = downlink, blue line = uplink.

There also appears to be a tail-off for a while when the download link becomes saturated once more, until a short while later the uploads settle back to the saturation average. Is that the exponential backoff coming into play but only for individual upload attempts? The backoffs appear rather too quickly to average out to a high background noise level...
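For anyone unfamiliar with the term, here is a rough sketch of what per-transfer exponential backoff looks like. The base delay, the cap and the jitter range are made-up values for illustration only; they are not BOINC's actual constants, and `backoff_delays` is a hypothetical helper, not part of the client.

```python
import random

def backoff_delays(attempts, base_s=60.0, cap_s=4 * 3600.0, seed=None):
    """Return one retry delay per failed attempt.

    The ceiling doubles after every failure until it reaches cap_s.
    Each delay is drawn at random from the upper half of the current
    ceiling so that many clients do not retry in lockstep.
    All numbers here are illustrative assumptions.
    """
    rng = random.Random(seed)
    delays = []
    for n in range(attempts):
        ceiling = min(cap_s, base_s * 2 ** n)
        delays.append(rng.uniform(ceiling / 2, ceiling))
    return delays

# Show how quickly the wait grows for a single stuck upload.
for n, d in enumerate(backoff_delays(6, seed=1), start=1):
    print(f"retry {n}: wait {d:7.0f} s")
```

Because each transfer keeps its own counter, a burst of successes resets many counters at once, which would be one possible reason for a short-lived bump followed by a tail-off.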

Regards,
Martin


Note the download dip and matching upload peak at Monday 19:00 ->



(Snapshot image from Cricket. Don't do this directly to Cricket itself for obvious reasons!)
____________
See new freedom: Mageia4
Linux Voice See & try out your OS Freedom!
The Future is what We make IT (GPLv3)

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5823
Credit: 59,062,279
RAC: 47,966
Australia
Message 920045 - Posted: 21 Jul 2009, 11:12:38 UTC - in response to Message 920043.
Last modified: 21 Jul 2009, 11:20:42 UTC

Thought this might be of interest to some.

Using an SSD for an OLTP log disk
"On a "normal" SLES 10 SP2 we achieved 1400 tr/s on a quad core (an anonymous CPU for now ;-). But Anand's article really got us curious and we replaced our mighty Cheetah disk with the Intel x25-M SSD (80 GB). All of a sudden we achieved 1900 tr/s! No less than 35% more transactions, just by replacing the disk that holds the log with the fastest SSD of the moment. That is pretty amazing if you consider that there is no indication whatsoever that we were bottlenecked by our log disk.

....

So our conclusion so far seems to be that in case of MySQL OLTP, sizing for IO/s seems to be less important than the individual write latency. To put it more bluntly: in many cases even tens of spindles will not be able to beat one SSD as each individual disk spindle has a relatively high latency."


EDIT- and from a RAID review,

"However, placing your database data files on an Intel X25-E is an excellent strategy. One X25-E is 66% faster than eight (!) 15000RPM SAS drives. That means if you don't need capacity, you can replace about 13 SAS disks with one SSD to get the same performance. You can keep the SAS disks as your log drives as they are a relatively cheap way to obtain good logging performance."

If only Intel would donate a few dozen X25-Es to the cause. Might help with some of the database and replica performance issues...
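A quick sanity check of the figures in the two quotes above, using only the numbers as quoted (nothing here was measured on SETI hardware):

```python
# Figures taken directly from the two quoted reviews.
tps_before, tps_after = 1400, 1900   # tr/s with the Cheetah vs. the X25-M as log disk
log_gain = tps_after / tps_before - 1

sas_drives, ssd_speedup = 8, 1.66    # one X25-E quoted as 66% faster than eight SAS drives
sas_equivalent = sas_drives * ssd_speedup

print(f"log-disk swap: +{log_gain:.1%} transactions")
print(f"one SSD ~ {sas_equivalent:.1f} SAS drives")
```

Both quoted claims ("no less than 35%" and "about 13 SAS disks") are consistent with this arithmetic.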
____________
Grant
Darwin NT.

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8507
Credit: 49,999,305
RAC: 50,376
United Kingdom
Message 920046 - Posted: 21 Jul 2009, 11:18:29 UTC - in response to Message 920043.

As of approximately 1:30am UTC, it looks like there is a nice fat spike in uploaded (probably actually reported) results.

That upload spike there shows very nicely how the upload rate can more than double when the downlink is non-saturated.

There also appears to be a tail-off for a while when the download link becomes saturated once more, until a short while later the uploads settle back to the saturation average. Is that the exponential backoff coming into play but only for individual upload attempts? The backoffs appear rather too quickly to average out to a high background noise level...

Regards,
Martin

Because the Cricket graphs record the raw number of bits passing through the router (or packets ditto, if you look at the wrong page :-) ), they won't distinguish between successful uploads and those maddening (and wasteful) ones which get to 100% and then die.

Maybe it's an artefact of the way the link re-saturates after whatever it is that causes the dips (I don't think we've ever got to the bottom of those, have we?). Perhaps there's a phase where a higher number than usual succeed in connecting, and at least partially uploading, before the concrete finally sets again.

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8507
Credit: 49,999,305
RAC: 50,376
United Kingdom
Message 920047 - Posted: 21 Jul 2009, 11:22:33 UTC - in response to Message 920045.


Thought this might be of interest to some.

Using an SSD for an OLTP log disk
"On a "normal" SLES 10 SP2 we achieved 1400 tr/s on a quad core (an anonymous CPU for now ;-). But Anand's article really got us curious and we replaced our mighty Cheetah disk with the Intel x25-M SSD (80 GB). All of a sudden we achieved 1900 tr/s! No less than 35% more transactions, just by replacing the disk that holds the log with the fastest SSD of the moment. That is pretty amazing if you consider that there is no indication whatsoever that we were bottlenecked by our log disk.

....

So our conclusion so far seems to be that in case of MySQL OLTP, sizing for IO/s seems to be less important than the individual write latency. To put it more bluntly: in many cases even tens of spindles will not be able to beat one SSD as each individual disk spindle has a relatively high latency."

I wonder if a manufacturer could be persuaded to "lend" SETI a suitable drive to test that assertion under field conditions. An extended test, to include SSD lifetime cycle limits, of course.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5823
Credit: 59,062,279
RAC: 47,966
Australia
Message 920048 - Posted: 21 Jul 2009, 11:26:18 UTC - in response to Message 920046.
Last modified: 21 Jul 2009, 11:28:28 UTC

If you look at the network traffic graphs & match them up with Scarecrow's graphs it's interesting to see that while the upload data rate might move about a bit (5Mb/s or so), the number of uploads per hour steadily climbs.
I've always attributed that to the gradually reducing number of attempted connections resulting in more successful connections. End result: more results being returned even though the traffic remains (relatively) unchanged.

When you see the large spikes in upload traffic, that's when you see the huge spikes in results returned per hour. And not long after that you see the database transaction rate increase & the replica fall behind as the validators start churning out more work & the assimilators fall behind. Once they catch up the database transaction rate drops & the replica can catch up again.
____________
Grant
Darwin NT.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5823
Credit: 59,062,279
RAC: 47,966
Australia
Message 920050 - Posted: 21 Jul 2009, 11:31:13 UTC - in response to Message 920047.

I wonder if a manufacturer could be persuaded to "lend" SETI a suitable drive to test that assertion under field conditions. An extended test, to include SSD lifetime cycle limits, of course.

It would be good if they could.
I think the Seti servers would benefit greatly from them, and it'd give the manufacturers some solid data to work with in their development.
____________
Grant
Darwin NT.

Profile ML1
Volunteer tester
Joined: 25 Nov 01
Posts: 8420
Credit: 4,136,541
RAC: 1,474
United Kingdom
Message 920052 - Posted: 21 Jul 2009, 11:39:15 UTC - in response to Message 920045.
Last modified: 21 Jul 2009, 11:40:11 UTC

Thought this might be of interest to some.

Using an SSD for an OLTP log disk
"On a "normal" SLES 10 SP2 we achieved 1400 tr/s on a quad core (an anonymous CPU for now ;-). But Anand's article really got us curious and we replaced our mighty Cheetah disk with the Intel x25-M SSD (80 GB). All of a sudden we achieved 1900 tr/s! No less than 35% more transactions, just by replacing the disk that holds the log with the fastest SSD of the moment. That is pretty amazing if you consider that there is no indication whatsoever that we were bottlenecked by our log disk. ...

Good note there.

The critical bits are:

In MySQL each user thread can issue a write when the transaction is committed. More importantly, this write is completely serial: there doesn't seem to be a separate log I/O thread which would allow our user thread to "fire" a disk operation "and forget". As we want to be fully ACID compliant our database is configured with
innodb_flush_log_at_trx_commit = 1

So after each transaction is committed, there is a "pwrite" first, then followed by a flush to the disk. So the actual transaction performance is also influenced by the disk write latency even if the disk is nowhere near its limits.
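To get a feel for the "write, then flush, then next commit" pattern on your own disk, here is a minimal sketch. It is only an analogy: it does not touch MySQL at all, `timed_commits` is a made-up helper, and the record size and commit count are arbitrary. Results will look unrealistically good if your temp directory sits on a RAM disk or behind a write cache that acknowledges flushes early.

```python
import os
import tempfile
import time

def timed_commits(n_commits, record=b"x" * 128):
    """Append n_commits records, forcing each one to disk before the next.

    This imitates a log that is flushed on every commit; it is not
    how MySQL itself is implemented.
    """
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(n_commits):
            os.write(fd, record)   # hand the log record to the OS
            os.fsync(fd)           # block until the OS reports it flushed
        elapsed = time.perf_counter() - start
        size = os.fstat(fd).st_size
    finally:
        os.close(fd)
        os.remove(path)
    return elapsed, size

n = 200
elapsed, size = timed_commits(n)
rate = n / max(elapsed, 1e-9)      # guard against a zero reading on coarse timers
print(f"{rate:.0f} serial commits/s, {size} bytes written")
```

Since every commit waits for its own flush, the rate is set by per-write latency, not by how many MB/s the disk can stream.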


And in the comments:

quote: "* typical average I/O latency is 0.23 ms (90%), with about 10% spikes of 7 to 12 ms

That reassured us that our transaction log disk was not a bottleneck"

No, that shows exactly that your disk latency is the limit: If these numbers are in the right ball park, the average latency is at least (0.9*0.23 + 0.1*7) ~= 0.9 ms, which limits the number of transactions per second to ~1100. Your performance is limited by the 10% of transactions that actually incur a disk related latency.


Very nice when the numbers add up.
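The arithmetic from that comment, spelled out so anyone can plug in other numbers. The function name is my own, and the 12 ms case is simply the upper end of the quoted "7 to 12 ms" spike range:

```python
def serial_ceiling(fast_ms, slow_ms, slow_share):
    """Average commit latency and the resulting cap on serial commits per second."""
    avg_ms = (1.0 - slow_share) * fast_ms + slow_share * slow_ms
    return avg_ms, 1000.0 / avg_ms

# 90% of writes at 0.23 ms, 10% spiking to 7 ms (best case) or 12 ms (worst case).
for spike_ms in (7.0, 12.0):
    avg_ms, max_tps = serial_ceiling(0.23, spike_ms, 0.10)
    print(f"spike {spike_ms:4.1f} ms: average {avg_ms:.3f} ms, ceiling {max_tps:.0f} tr/s")
```

With the 7 ms figure this gives an average of about 0.9 ms and a ceiling of roughly 1100 tr/s, matching the comment; with 12 ms spikes the ceiling is lower still.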


Now... It troubles me that we have a potentially hugely parallel system with Boinc, and yet ALL Boinc server state change must go through just the ONE central database that is itself limited by the rate that ONE serial log can be updated!

So... We can only go as fast as that one log file can be updated.

(Note, the present bottleneck is the saturated download link. If that is cleared, we'll likely hit the MySQL log update bottleneck again.)


A very good find there, yay!

Regards,
Martin
____________
See new freedom: Mageia4
Linux Voice See & try out your OS Freedom!
The Future is what We make IT (GPLv3)

DJStarfox
Joined: 23 May 01
Posts: 1040
Credit: 547,492
RAC: 228
United States
Message 920086 - Posted: 21 Jul 2009, 14:12:28 UTC - in response to Message 920052.

I agree that the MySQL BOINC database is a serious bottleneck. I wonder how much work it would be to update the code to use a different database system, such as an object-oriented database? Or, perhaps as a first step, could we update the code to use a different RDBMS such as Oracle or MS SQL?

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8644
Credit: 24,169,080
RAC: 21,435
United Kingdom
Message 920087 - Posted: 21 Jul 2009, 14:13:13 UTC
Last modified: 21 Jul 2009, 14:13:39 UTC

Just noticed that even though there was a download dip and upload spike in the graph that ML1 displayed, when looking at the packets there is virtually no variation on the uploads.


Copyright © 2014 University of California