High Score (Mar 27 2012)

Message boards : Technical News : High Score (Mar 27 2012)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1210767 - Posted: 27 Mar 2012, 22:49:20 UTC

Another outage day (for database backups, maintenance, etc.). Today we also tackled a couple extra things.

First, I did a download test to answer the question: "Given our current hardware and software setup, if we had a 1 Gbit/sec link available to us (as opposed to currently being choked at 100 Mbit/sec), how fast could we actually push bits out?" Well, the answer is: roughly 450 Mbit/sec at peak, where the next chokepoint is our workunit file server. Not bad. This datum will help when making arguments to the right people about what we hope to gain from network improvements around here. Of course, we'd still average about 100 Mbit/sec (like we do now), but we'd drop far fewer connections, and everything would be faster/happier.
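As a rough illustration of the chokepoint reasoning above: data leaving the lab can only move as fast as the slowest stage it passes through. The Python sketch below assumes just two stages (the external link and the workunit file server) and uses the ~450 Mbit/sec peak from the test as the file server's ceiling; the real path has more stages (switches, download servers, etc.) that it ignores, and the function name is hypothetical.

```python
# Minimal bottleneck model: a serial chain of stages moves data only as fast
# as its slowest stage. Capacities are in Mbit/s.
# Assumption: only two stages are modeled; 450 is the peak observed in the
# 1 Gbit/s download test, treated here as the file server's ceiling.

FILE_SERVER_PEAK = 450.0


def effective_throughput(stages):
    """Return (name, capacity) of the limiting (slowest) stage."""
    name = min(stages, key=stages.get)
    return name, stages[name]


today = {"external link": 100.0, "workunit file server": FILE_SERVER_PEAK}
upgraded = {"external link": 1000.0, "workunit file server": FILE_SERVER_PEAK}

for label, chain in (("current", today), ("1 Gbit/s link", upgraded)):
    stage, rate = effective_throughput(chain)
    print(f"{label}: limited by {stage} at {rate:.0f} Mbit/s")
```

Upgrading the link only raises the ceiling until the next stage, here the file server, becomes the limit, which is the point the test was meant to establish.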

Second, Jeff and I did some tests on our internal network. It turns out the few switches handling traffic in the server closet are being completely overloaded. This may actually be the source of several recent issues. However, we're still finding other mysterious chokepoints. Oy, all the hidden bottlenecks!

We also hoped to get the VGC-sensitive splitter on line (see previous note) but the recent compile got munged somehow so we had to revert to the previous one as I brought the projects back up this afternoon. Oh well. We'll get it on line soon.

We did get beyond all the early drive failures on the new JBOD and now have a full set of 24 working drives on the front of it, all hooked up to georgem, RAIDed up and tested. Below is a picture of them in the rack in the closet (georgem just above the monitor, the JBOD just below). The other new server paddym is still on the lab table pending certain plans and me finding time to get an OS on it.



Oh yeah I also updated the server list at the bottom of the server status page.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1210769 - Posted: 27 Mar 2012, 22:54:12 UTC - in response to Message 1210767.  

Thanks for the update Matt,

Claggy

Byron Leigh Hatch @ team Carl Sagan
Volunteer tester
Joined: 5 Jul 99
Posts: 4548
Credit: 35,667,570
RAC: 4
Canada
Message 1210787 - Posted: 27 Mar 2012, 23:57:12 UTC - in response to Message 1210769.  

Thanks for the update Matt,

Claggy

+1

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1210806 - Posted: 28 Mar 2012, 0:35:07 UTC
Last modified: 28 Mar 2012, 0:35:27 UTC

That sure looks a lot neater than pics from the past. You keep posting pics like that and people are gonna get the idea you guys actually do know what you are doing!


PROUD MEMBER OF Team Starfire World BOINC

KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1211148 - Posted: 28 Mar 2012, 18:18:08 UTC
Last modified: 28 Mar 2012, 18:18:45 UTC

We also hoped to get the VGC-sensitive splitter on line (see previous note) but the recent compile got munged somehow so we had to revert to the previous one as I brought the projects back up this afternoon. Oh well. We'll get it on line soon.


Why isn't this being tested on Beta? (NTM: Why wasn't AP v6.0 tested on Beta?) I thought testing of this sort was what Beta was for?
.

Hello, from Albany, CA!...

kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1211150 - Posted: 28 Mar 2012, 18:20:19 UTC

Great work on the bottleneck testing....
Now, time for more bandwidth!

Meow!
"Time is simply the mechanism that keeps everything from happening all at once."


Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1211185 - Posted: 28 Mar 2012, 19:38:20 UTC - in response to Message 1211148.  

We also hoped to get the VGC-sensitive splitter on line (see previous note) but the recent compile got munged somehow so we had to revert to the previous one as I brought the projects back up this afternoon. Oh well. We'll get it on line soon.

Why isn't this being tested on Beta? (NTM: Why wasn't AP v6.0 tested on Beta?) I thought testing of this sort was what Beta was for?

Astropulse v6 testing started at Beta early last December, and has been more or less continuous since late January. For all of that testing, new splitter code had to be in place with the sign difference which was the primary reason for a new version of Astropulse. Changes to the Validator to reliably sense tasks with abnormal runtimes took place more recently, and a trivial change of the application was needed to support that, hence the release is version 6.01.

There are usually fewer than 4,000 active hosts working Beta, and many of those have a fairly low resource share. One tape file typically lasts a month there, a far different situation than here, where the data flow is much higher. Issues like how many splitters are needed to supply demand cannot be checked at Beta, and using VGC-sensitive splitters there would probably reduce the variations that make beta testing of applications useful. Whether they may have been used to split some channels I don't know; it's certainly possible, though I think unlikely.
                                                                  Joe

Alaun
Joined: 29 Nov 05
Posts: 18
Credit: 9,310,773
RAC: 0
United States
Message 1211271 - Posted: 28 Mar 2012, 22:19:33 UTC

Glad to hear there's talk of bandwidth, and nice to see the new server in!
Thanks for the update

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1211294 - Posted: 28 Mar 2012, 22:52:00 UTC - in response to Message 1211190.  
Last modified: 28 Mar 2012, 22:57:17 UTC

EDIT: How many APs does a tape have?

This has been answered a bunch over in Number Crunching over the years, but I did a search and found the information. Josef explains it.

~400 APs per channel per 50.2 GB tape x 14 channels = 5,600 APs per tape. However, if I remember correctly from a post years ago, not all 14 channels are used. I think it's just the first 12 (B1_P0 through B6_P1; B = 'beam' and P = 'polarization'), so that drops the number down to ~4,800, but I think the real number is closer to 4,700.

Now, that is the number of WUs that get generated. Multiply by 2 for the tasks of the initial replication, and that means there are nearly 10,000 AP tasks per tape to be handed out to people who are asking for work. Sometimes there are a dozen or so tapes available, so that's a bit over 100,000 AP tasks.

Each one of those tasks is 8 MB of data to be pushed through a 100 Mbit/sec pipe.
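The back-of-the-envelope numbers in this post can be checked with a few lines of Python. This is only a sketch of the arithmetic above: ~400 APs per channel, 12 of 14 channels, 2 tasks per WU, a dozen tapes and 8 MB per task are the estimates quoted here, and it optimistically assumes the whole 100 Mbit/sec link is available with no protocol overhead or competing traffic.

```python
# Back-of-the-envelope Astropulse numbers from this post (forum estimates,
# not official project figures).
APS_PER_CHANNEL = 400   # ~400 AP workunits per channel per tape
CHANNELS_USED = 12      # B1_P0 .. B6_P1 (14 exist, 12 assumed used)
REPLICATION = 2         # initial replication: 2 tasks per workunit
TAPES_AVAILABLE = 12    # "a dozen or so" tapes
TASK_SIZE_MB = 8        # each AP task is ~8 MB
LINK_MBIT_S = 100       # current external link, Mbit/s

wus_per_tape = APS_PER_CHANNEL * CHANNELS_USED
tasks_per_tape = wus_per_tape * REPLICATION
tasks_total = tasks_per_tape * TAPES_AVAILABLE

# 1 MB = 8 Mbit (decimal units). Assumption: full link rate, zero overhead.
seconds_per_task = TASK_SIZE_MB * 8 / LINK_MBIT_S
hours_total = tasks_total * seconds_per_task / 3600

print(f"WUs per tape:   {wus_per_tape}")
print(f"tasks per tape: {tasks_per_tape}")
print(f"tasks total:    {tasks_total}")
print(f"per task: {seconds_per_task:.2f} s, all tasks: {hours_total:.1f} h")
```

Under these assumptions this prints 4800 WUs and 9600 tasks per tape, 115200 tasks in total, 0.64 s per task, and about 20.5 hours to send them all. Even at full line rate, a dozen tapes' worth of AP tasks alone would occupy the link for most of a day.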
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1211339 - Posted: 29 Mar 2012, 0:16:37 UTC - in response to Message 1211317.  

Wow! So 4700 tasks.

If we consider that 1 task takes a median of 1 hr with a GPU... it would give 4700 hrs of work. Sweet, thank you.

Times two. Each of those 4,700 WUs gets done by two people, so if you assume everyone uses a GPU to do it, you're looking at ~9,400 hours (~13 months) of crunching per tape.
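The same estimate as a quick Python sketch, assuming (as the post does) ~4,700 WUs per tape, two tasks per WU and a median of 1 GPU-hour per task. The month conversion uses an average calendar month of 365.25/12 days, which is an assumption of this sketch; the result is total single-GPU time, not wall-clock time across all volunteers.

```python
# Single-GPU-equivalent crunch time per tape (forum estimates).
WUS_PER_TAPE = 4700
TASKS_PER_WU = 2         # each WU is done by two different hosts
HOURS_PER_TASK = 1.0     # assumed median GPU runtime per AP task

DAYS_PER_MONTH = 365.25 / 12   # average calendar month (assumption)

crunch_hours = WUS_PER_TAPE * TASKS_PER_WU * HOURS_PER_TASK
crunch_months = crunch_hours / 24 / DAYS_PER_MONTH

print(f"{crunch_hours:.0f} hours ~= {crunch_months:.1f} months per tape")
```

This gives 9400 hours, or about 12.9 months, matching the ~13 months quoted above.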
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)

Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1211399 - Posted: 29 Mar 2012, 3:49:38 UTC - in response to Message 1211341.  

SETI does have a small cache of 3TB drives used for Astropulse Reob data.


Executive Director GPU Users Group Inc. -
brad@gpuug.org

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1211413 - Posted: 29 Mar 2012, 4:19:44 UTC - in response to Message 1211341.  

ya but can't get the _0 and _1 at the same time on the same PC for the same WU :)

Exactly why I said "two people" and "assume everyone uses a GPU." :P
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)

KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1211450 - Posted: 29 Mar 2012, 7:42:29 UTC - in response to Message 1211185.  
Last modified: 29 Mar 2012, 7:45:08 UTC

Astropulse v6 testing started at Beta early last December, and has been more or less continuous since late January. For all of that testing, new splitter code had to be in place with the sign difference which was the primary reason for a new version of Astropulse. Changes to the Validator to reliably sense tasks with abnormal runtimes took place more recently, and a trivial change of the application was needed to support that, hence the release is version 6.01.

There are usually fewer than 4,000 active hosts working Beta, and many of those have a fairly low resource share. One tape file typically lasts a month there, a far different situation than here, where the data flow is much higher. Issues like how many splitters are needed to supply demand cannot be checked at Beta, and using VGC-sensitive splitters there would probably reduce the variations that make beta testing of applications useful. Whether they may have been used to split some channels I don't know; it's certainly possible, though I think unlikely.


Umm, I'm a Beta tester, and never saw an AP WU come through either of the two computers I have doing Beta... (which is why I asked :-) )
.

Hello, from Albany, CA!...

KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1211455 - Posted: 29 Mar 2012, 7:55:46 UTC - in response to Message 1211417.  

ya but can't get the _0 and _1 at the same time on the same PC for the same WU :)

Exactly why I said "two people" and "assume everyone uses a GPU." :P


Bad assumption: lots of people (including me...) don't want AP on their GPU(s), as it would take too long (I don't have a GTX 580! [or anything close]), but I do run AP on my CPUs (4 of them, one too slow to do an AP WU in a reasonable amount of time [it's an AMD C-50 dual-core laptop; I did try a WU on it, and the WU ran for three days!]).
.

Hello, from Albany, CA!...

cgoodrich
Joined: 21 Mar 12
Posts: 1
Credit: 2,981,407
RAC: 0
United States
Message 1211764 - Posted: 30 Mar 2012, 4:03:21 UTC - in response to Message 1210767.  

I just started this. I have decided to use my failover datacenter to help out with the crunching. I am running it on a grid of 6 vSphere 5 ESX servers running on a Dell M1000e Blade Chassis with M610 blades. I cloned VMs (what I call seti@home machines) and let them run at 100% 24x7. I have been doing this for a week and have 81,681 credits. A problem I have is that I am not being fed jobs fast enough to tap my servers out 24x7.

Ex: "Socialist"
Volunteer tester
Joined: 12 Mar 12
Posts: 3433
Credit: 2,616,158
RAC: 2
United States
Message 1211782 - Posted: 30 Mar 2012, 5:09:07 UTC - in response to Message 1211764.  
Last modified: 30 Mar 2012, 5:15:13 UTC

I just started this. I have decided to use my failover datacenter to help out with the crunching. I am running it on a grid of 6 vSphere 5 ESX servers running on a Dell M1000e Blade Chassis with M610 blades. I cloned VMs (what I call seti@home machines) and let them run at 100% 24x7. I have been doing this for a week and have 81,681 credits. A problem I have is that I am not being fed jobs fast enough to tap my servers out 24x7.


/off topic
but
Wow... nice setup. I wish my builds could cross into that territory, ah if I had but the money :-)
-Dave

Ronald R CODNEY
Joined: 19 Nov 11
Posts: 87
Credit: 420,920
RAC: 0
United States
Message 1211822 - Posted: 30 Mar 2012, 7:27:33 UTC - in response to Message 1211782.  

If you bought the winning Mega Millions ticket for tonight's drawing...???

Pascal
Joined: 22 Jan 00
Posts: 26
Credit: 3,624,307
RAC: 0
Netherlands
Message 1211826 - Posted: 30 Mar 2012, 7:32:04 UTC
Last modified: 30 Mar 2012, 7:35:54 UTC

Are there "download problems"?
On my system, all workunits are done, and all new workunits are stuck downloading.
I have to retry manually to get some things done.

Pascal

Sakletare
Joined: 18 May 99
Posts: 132
Credit: 23,423,829
RAC: 0
Sweden
Message 1211832 - Posted: 30 Mar 2012, 7:58:54 UTC - in response to Message 1211826.  

Are there "download problems"?

Yes, there are network problems in both directions. One of those mysterious problems where no one really knows why. Might be the switches mentioned above.

Pascal
Joined: 22 Jan 00
Posts: 26
Credit: 3,624,307
RAC: 0
Netherlands
Message 1211856 - Posted: 30 Mar 2012, 9:20:25 UTC - in response to Message 1211832.  

Are there "download problems"?

Yes, there are network problems in both directions. One of those mysterious problems where no one really knows why. Might be the switches mentioned above.


Okay... thnx!


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.