High Score (Mar 27 2012)


Message boards : Technical News : High Score (Mar 27 2012)

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 1210767 - Posted: 27 Mar 2012, 22:49:20 UTC

Another outage day (for database backups, maintenance, etc.). Today we also tackled a couple extra things.

First, I did a download test to answer the question: "given our current hardware and software setup, if we had a 1Gbit/sec link available to us (as opposed to currently being choked at 100Mbits/sec), how fast could we actually push bits out?" Well, the answer is: peaking at roughly 450Mbits/sec, at which point the next chokepoint is our workunit file server. Not bad. This datum will help when making arguments to the right people about what we hope to gain from network improvements around here. Of course, we'd still average about 100Mbits/sec (like we do now), but we'd drop far fewer connections, and everything would be faster/happier.
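
To put those numbers in perspective, here is a rough back-of-envelope sketch in Python; the ~100 KB/s per-client download rate is purely an illustrative assumption, not a figure from the test.

# How many concurrent downloads could each link speed sustain,
# assuming a hypothetical ~100 KB/s per client?
LINKS_MBPS = {"current cap": 100, "observed peak": 450, "full gigabit": 1000}
PER_CLIENT_KBPS = 100 * 8  # 100 KB/s per client, in kilobits/s (assumption)

for name, mbps in LINKS_MBPS.items():
    clients = mbps * 1000 / PER_CLIENT_KBPS
    print(f"{name}: {mbps} Mbit/s ~= {clients:.0f} concurrent downloads")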

Second, Jeff and I did some tests on our internal network. It turns out the few switches handling traffic in the server closet are completely overloaded. This may actually be the source of several recent issues. However, we're still finding other mysterious chokepoints. Oy, all the hidden bottlenecks!

We also hoped to get the VGC-sensitive splitter on line (see previous note) but the recent compile got munged somehow so we had to revert to the previous one as I brought the projects back up this afternoon. Oh well. We'll get it on line soon.

We did get beyond all the early drive failures on the new JBOD and now have a full set of 24 working drives on the front of it, all hooked up to georgem, RAIDed up and tested. Below is a picture of them in the rack in the closet (georgem just above the monitor, the JBOD just below). The other new server paddym is still on the lab table pending certain plans and me finding time to get an OS on it.



Oh yeah, I also updated the server list at the bottom of the server status page.

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4068
Credit: 32,909,554
RAC: 7,960
United Kingdom
Message 1210769 - Posted: 27 Mar 2012, 22:54:12 UTC - in response to Message 1210767.

Thanks for the update Matt,

Claggy

Profile Byron Leigh Hatch @ team Carl Sagan
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 3611
Credit: 11,852,059
RAC: 1,120
Canada
Message 1210787 - Posted: 27 Mar 2012, 23:57:12 UTC - in response to Message 1210769.

Thanks for the update Matt,

Claggy

+1

Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 15,331,642
RAC: 11,547
United States
Message 1210806 - Posted: 28 Mar 2012, 0:35:07 UTC
Last modified: 28 Mar 2012, 0:35:27 UTC

That sure looks a lot neater than pics from the past. You keep posting pics like that and people are gonna get the idea you guys actually do know what you are doing!
____________


PROUD MEMBER OF Team Starfire World BOINC

Profile KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 1923
Credit: 9,774,612
RAC: 17,457
United States
Message 1211148 - Posted: 28 Mar 2012, 18:18:08 UTC
Last modified: 28 Mar 2012, 18:18:45 UTC

We also hoped to get the VGC-sensitive splitter on line (see previous note) but the recent compile got munged somehow so we had to revert to the previous one as I brought the projects back up this afternoon. Oh well. We'll get it on line soon.


Why isn't this being tested on Beta? (NTM: Why wasn't AP v6.0 tested on Beta?) I thought testing of this sort was what Beta was for?
____________
.

msattler
Project donor
Volunteer tester
Joined: 9 Jul 00
Posts: 38925
Credit: 579,081,550
RAC: 512,620
United States
Message 1211150 - Posted: 28 Mar 2012, 18:20:19 UTC

Great work on the bottleneck testing....
Now, time for more bandwidth!

Meow!
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Josef W. Segur
Project donor
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4230
Credit: 1,043,161
RAC: 314
United States
Message 1211185 - Posted: 28 Mar 2012, 19:38:20 UTC - in response to Message 1211148.

We also hoped to get the VGC-sensitive splitter on line (see previous note) but the recent compile got munged somehow so we had to revert to the previous one as I brought the projects back up this afternoon. Oh well. We'll get it on line soon.

Why isn't this being tested on Beta? (NTM: Why wasn't AP v6.0 tested on Beta?) I thought testing of this sort was what Beta was for?

Astropulse v6 testing started at Beta early last December, and has been more or less continuous since late January. For all of that testing, new splitter code had to be in place with the sign difference which was the primary reason for a new version of Astropulse. Changes to the Validator to reliably sense tasks with abnormal runtimes took place more recently, and a trivial change of the application was needed to support that, hence the release is version 6.01.

There are usually fewer than 4,000 active hosts working Beta, and many of those with a fairly low resource share. One tape file typically lasts a month, a far different situation than here, where the data flow is much higher. Issues like how many splitters are needed to meet demand cannot be checked at Beta, and using VGC-sensitive splitters there would probably reduce the variations which make beta testing of applications useful. Whether they may have been used to split some channels I don't know; it's certainly possible, though I think it unlikely.
Joe

Profile Michel448a
Volunteer tester
Joined: 27 Oct 00
Posts: 1201
Credit: 2,891,635
RAC: 0
Canada
Message 1211190 - Posted: 28 Mar 2012, 19:53:38 UTC
Last modified: 28 Mar 2012, 19:54:54 UTC

but... but... but... I'm there now :)


just have to feed me up ^^


j/k

EDIT: How many APs does a tape have?
____________

Profile Alaun
Joined: 29 Nov 05
Posts: 16
Credit: 5,196,377
RAC: 0
United States
Message 1211271 - Posted: 28 Mar 2012, 22:19:33 UTC

Glad to hear there's talk of bandwidth, and nice to see the new server in!
Thanks for the update
____________

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2248
Credit: 8,600,416
RAC: 4,277
United States
Message 1211294 - Posted: 28 Mar 2012, 22:52:00 UTC - in response to Message 1211190.
Last modified: 28 Mar 2012, 22:57:17 UTC

EDIT: How many APs does a tape have?

This has been answered a bunch over in Number Crunching over the years, but I did a search and found the information. Josef explains it.

~400 APs per channel per 50.2GB tape × 14 channels = 5600 APs per tape. However, if I remember a post from years ago correctly, not all 14 channels are used. I think it's just the first 12 (B1_P0 through B6_P1, where B = 'beam' and P = 'polarity'), which drops the number down to ~4800, though the real number is closer to 4700.

Now, that is the number of WUs that get generated. Multiply by 2 for tasks for the initial replication, and that means there are nearly 10,000 AP tasks to be handed out to people who are asking for work, per tape. Sometimes there are a dozen or so tapes available, so that's a bit over 100,000 AP tasks.

Each one of those tasks is 8MB of data to be pushed through a 100Mbit pipe.
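
A quick sketch of that arithmetic in Python; the per-channel count, the 12 usable channels, the replication factor, and the ~8 MB task size all come from the posts above, and the transfer time assumes an idealized, fully dedicated pipe with no protocol overhead.

aps_per_channel = 400
channels_used = 12        # B1_P0 through B6_P1
replication = 2           # each WU goes out as two tasks (_0 and _1)
task_mb = 8               # ~8 MB of data per AP task

wus_per_tape = aps_per_channel * channels_used   # ~4800 WUs per tape
tasks_per_tape = wus_per_tape * replication      # ~9600 tasks per tape
total_mbit = tasks_per_tape * task_mb * 8        # megabytes -> megabits

hours = total_mbit / 100 / 3600                  # through a 100 Mbit/s pipe
print(f"{tasks_per_tape} tasks, ~{tasks_per_tape * task_mb / 1024:.0f} GB per tape")
print(f"~{hours:.1f} hours on a fully saturated 100 Mbit/s link")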
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Profile Michel448a
Volunteer tester
Avatar
Send message
Joined: 27 Oct 00
Posts: 1201
Credit: 2,891,635
RAC: 0
Canada
Message 1211317 - Posted: 28 Mar 2012, 23:19:18 UTC

wow! so 4700 tasks

if we consider that 1 task takes a median of 1 hr on a GPU... it would give 4700 hrs of work. Sweet, thank you.
____________

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2248
Credit: 8,600,416
RAC: 4,277
United States
Message 1211339 - Posted: 29 Mar 2012, 0:16:37 UTC - in response to Message 1211317.

wow! so 4700 tasks

if we consider that 1 task takes a median of 1 hr on a GPU... it would give 4700 hrs of work. Sweet, thank you.

Times two. Each of those 4700 WUs gets done by two people, so if you assume everyone uses a GPU to do it, you're looking at ~9400 hrs (~13 months) of crunching per tape.
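
Spelling that estimate out, a minimal sketch assuming a single hypothetical GPU crunching around the clock at the 1 hr/task figure quoted above:

wus_per_tape = 4700
gpu_hours = wus_per_tape * 2   # both replicas of every WU get crunched
days = gpu_hours / 24          # one GPU running 24x7
print(f"~{gpu_hours} GPU-hours ~= {days:.0f} days ~= {days / 30:.0f} months per tape")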
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Profile Michel448a
Volunteer tester
Joined: 27 Oct 00
Posts: 1201
Credit: 2,891,635
RAC: 0
Canada
Message 1211341 - Posted: 29 Mar 2012, 0:21:06 UTC

ya but you can't get the _0 and _1 at the same time on the same PC for the same WU :)
____________

Profile Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1211399 - Posted: 29 Mar 2012, 3:49:38 UTC - in response to Message 1211341.

SETI does have a small cache of 3TB drives used for Astropulse Reob data.
____________


Executive Director GPU Users Group Inc. -
brad@gpuug.org

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2248
Credit: 8,600,416
RAC: 4,277
United States
Message 1211413 - Posted: 29 Mar 2012, 4:19:44 UTC - in response to Message 1211341.

ya but you can't get the _0 and _1 at the same time on the same PC for the same WU :)

Exactly why I said "two people" and "assume everyone uses a GPU." :P
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Profile Michel448a
Volunteer tester
Joined: 27 Oct 00
Posts: 1201
Credit: 2,891,635
RAC: 0
Canada
Message 1211417 - Posted: 29 Mar 2012, 4:27:46 UTC - in response to Message 1211413.

ya but you can't get the _0 and _1 at the same time on the same PC for the same WU :)

Exactly why I said "two people" and "assume everyone uses a GPU." :P


hehe ya /agree on that one :)
it's cause I'm not patient :P anything more than 2:30 hrs isn't welcome ^^ in my case *wink
____________

Profile KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 1923
Credit: 9,774,612
RAC: 17,457
United States
Message 1211450 - Posted: 29 Mar 2012, 7:42:29 UTC - in response to Message 1211185.
Last modified: 29 Mar 2012, 7:45:08 UTC

Astropulse v6 testing started at Beta early last December, and has been more or less continuous since late January. For all of that testing, new splitter code had to be in place with the sign difference which was the primary reason for a new version of Astropulse. Changes to the Validator to reliably sense tasks with abnormal runtimes took place more recently, and a trivial change of the application was needed to support that, hence the release is version 6.01.

There are usually fewer than 4,000 active hosts working Beta, and many of those with a fairly low resource share. One tape file typically lasts a month, a far different situation than here, where the data flow is much higher. Issues like how many splitters are needed to meet demand cannot be checked at Beta, and using VGC-sensitive splitters there would probably reduce the variations which make beta testing of applications useful. Whether they may have been used to split some channels I don't know; it's certainly possible, though I think it unlikely.


Umm, I'm a Beta tester, and I never saw an AP WU come through either of the two computers I have doing Beta... (which is why I asked :-) )
____________
.

Profile KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 1923
Credit: 9,774,612
RAC: 17,457
United States
Message 1211455 - Posted: 29 Mar 2012, 7:55:46 UTC - in response to Message 1211417.

ya but you can't get the _0 and _1 at the same time on the same PC for the same WU :)

Exactly why I said "two people" and "assume everyone uses a GPU." :P


Bad assumption: lots of people (including me...) don't want AP on their GPU(s), as it would take too long (I don't have a GTX 580! [or anything close]), but I do run AP on my CPUs (4 of them, one too slow to do an AP WU in a reasonable amount of time [it's an AMD C50 dual-core laptop; I did try a WU on it, and the WU ran for three days!]).
____________
.

Profile cgoodrich
Joined: 21 Mar 12
Posts: 1
Credit: 2,981,407
RAC: 0
United States
Message 1211764 - Posted: 30 Mar 2012, 4:03:21 UTC - in response to Message 1210767.

I just started this. I have decided to use my failover datacenter to help out with the crunching. I am running it on a grid of 6 vSphere 5 ESX servers on a Dell M1000e blade chassis with M610 blades. I cloned VMs, which I call my seti@home machines, and let them run at 100%, 24x7. After a week of this I have 81,681 credits. One problem I have is that I am not being fed jobs fast enough to keep my servers tapped out 24x7.

Profile Ex
Volunteer moderator
Volunteer tester
Joined: 12 Mar 12
Posts: 2895
Credit: 1,732,803
RAC: 1,239
United States
Message 1211782 - Posted: 30 Mar 2012, 5:09:07 UTC - in response to Message 1211764.
Last modified: 30 Mar 2012, 5:15:13 UTC

I just started this. I have decided to use my failover datacenter to help out with the crunching. I am running it on a grid of 6 vSphere 5 ESX servers on a Dell M1000e blade chassis with M610 blades. I cloned VMs, which I call my seti@home machines, and let them run at 100%, 24x7. After a week of this I have 81,681 credits. One problem I have is that I am not being fed jobs fast enough to keep my servers tapped out 24x7.


/off topic
but
Wow... nice setup. I wish my builds could cross into that territory, ah if I had but the money :-)
-Dave
