Message boards : Technical News : High Score (Mar 27 2012)
Matt Lebofsky (Joined: 1 Mar 99, Posts: 1444, Credit: 957,058, RAC: 0)
Another outage day (for database backups, maintenance, etc.). Today we also tackled a couple of extra things.

First, I ran a download test to answer the question: "Given our current hardware and software setup, if we had a 1 Gbit/s link available to us (as opposed to currently being choked at 100 Mbit/s), how fast could we actually push bits out?" The answer: we peak at roughly 450 Mbit/s, at which point the next chokepoint is our workunit file server. Not bad. This datum will help when making arguments to the right people about what we hope to gain from network improvements around here. Of course, we'd still average about 100 Mbit/s (like we do now), but we'd drop far fewer connections, and everything would be faster and happier.

Second, Jeff and I ran some tests on our internal network. It turns out the few switches handling traffic in the server closet are being completely overloaded. This may actually be the source of several recent issues. However, we're still finding other mysterious chokepoints. Oy, all the hidden bottlenecks!

We also hoped to get the VGC-sensitive splitter on line (see previous note), but the recent compile got munged somehow, so we had to revert to the previous one as I brought the projects back up this afternoon. Oh well. We'll get it on line soon.

We did get past all the early drive failures on the new JBOD and now have a full set of 24 working drives on the front of it, all hooked up to georgem, RAIDed up and tested. Below is a picture of them in the rack in the closet (georgem just above the monitor, the JBOD just below). The other new server, paddym, is still on the lab table pending certain plans and me finding time to get an OS on it.

Oh yeah, I also updated the server list at the bottom of the server status page.

- Matt

-- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
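For readers curious what such a saturation test looks like in practice, here is a minimal sketch in Python: it opens many parallel download streams against one file and reports aggregate throughput, roughly imitating a crowd of BOINC clients hitting the download servers at once. The URL, stream count, and chunk size are hypothetical placeholders; Matt doesn't say what tooling the actual test used.

```python
# Minimal throughput-probe sketch (URL and parameters are hypothetical).
# Opens N parallel HTTP downloads and reports aggregate Mbit/s.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

WU_URL = "http://example.berkeley.edu/test_workunit"  # hypothetical file
N_STREAMS = 32         # parallel connections, like many BOINC clients
CHUNK = 64 * 1024      # bytes read per socket recv

def fetch(url: str) -> int:
    """Download one copy of the file; return bytes transferred."""
    total = 0
    with urllib.request.urlopen(url) as resp:
        while True:
            data = resp.read(CHUNK)
            if not data:
                return total
            total += len(data)

start = time.monotonic()
with ThreadPoolExecutor(max_workers=N_STREAMS) as pool:
    byte_counts = list(pool.map(fetch, [WU_URL] * N_STREAMS))
elapsed = time.monotonic() - start

mbits = sum(byte_counts) * 8 / 1e6
print(f"{mbits / elapsed:.0f} Mbit/s aggregate over {N_STREAMS} streams")
```

In Matt's test the aggregate topped out around 450 Mbit/s, which is what points past the link itself to the workunit file server as the next bottleneck.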
Claggy (Joined: 5 Jul 99, Posts: 4654, Credit: 47,537,079, RAC: 4)
Thanks for the update, Matt!

Claggy
Byron Leigh Hatch @ team Carl Sagan (Joined: 5 Jul 99, Posts: 4548, Credit: 35,667,570, RAC: 4)
Claggy wrote: Thanks for the update, Matt!

+1
perryjay (Joined: 20 Aug 02, Posts: 3377, Credit: 20,676,751, RAC: 0)
That sure looks a lot neater than pics from the past. You keep posting pics like that and people are gonna get the idea you guys actually do know what you are doing!

PROUD MEMBER OF Team Starfire World BOINC
KWSN THE Holy Hand Grenade! (Joined: 20 Dec 05, Posts: 3187, Credit: 57,163,290, RAC: 0)
Matt wrote: We also hoped to get the VGC-sensitive splitter on line (see previous note) but the recent compile got munged somehow so we had to revert to the previous one as I brought the projects back up this afternoon. Oh well. We'll get it on line soon.

Why isn't this being tested on Beta? (Not to mention: why wasn't AP v6.0 tested on Beta?) I thought testing of this sort was what Beta was for?

Hello, from Albany, CA!...
kittyman (Joined: 9 Jul 00, Posts: 51478, Credit: 1,018,363,574, RAC: 1,004)
Great work on the bottleneck testing.... Now, time for more bandwidth! Meow!

"Time is simply the mechanism that keeps everything from happening all at once."
Josef W. Segur (Joined: 30 Oct 99, Posts: 4504, Credit: 1,414,761, RAC: 0)
Matt wrote: We also hoped to get the VGC-sensitive splitter on line (see previous note) but the recent compile got munged somehow so we had to revert to the previous one as I brought the projects back up this afternoon. Oh well. We'll get it on line soon.

Astropulse v6 testing started at Beta early last December and has been more or less continuous since late January. For all of that testing, new splitter code had to be in place with the sign difference, which was the primary reason for a new version of Astropulse. Changes to the validator to reliably sense tasks with abnormal runtimes took place more recently, and a trivial change to the application was needed to support that; hence the release is version 6.01.

There are usually fewer than 4,000 active hosts working Beta, and many of those with a fairly low resource share. One tape file typically lasts a month: a far different situation than here, where the data flow is much higher. Issues like how many splitters are needed to supply demand cannot be checked at Beta, and using VGC-sensitive splitters there would probably reduce the variations which make beta testing of applications useful. Whether they may have been used to split some channels I don't know; it's certainly possible, though I think it unlikely.

Joe
Alaun (Joined: 29 Nov 05, Posts: 18, Credit: 9,310,773, RAC: 0)
Glad to hear there's talk of bandwidth, and nice to see the new server in! Thanks for the update.
Cosmic_Ocean (Joined: 23 Dec 00, Posts: 3027, Credit: 13,516,867, RAC: 13)
Quoting: EDIT: How many APs does a tape have?

This has been answered a bunch of times over in Number Crunching over the years, so I did a search and found the information; Josef explains it. ~400 APs per channel per 50.2 GB tape x 14 channels = 5,600 APs per tape. However, if I remember a post from years ago correctly, not all 14 channels are used; I think it's just the first 12 (B1_P0 through B6_P1, where B = 'beam' and P = 'polarity'), which drops the number to ~4,800, though I think the real number is closer to 4,700.

That is the number of WUs that get generated. Multiply by 2 for the tasks in the initial replication, and there are nearly 10,000 AP tasks to be handed out to people asking for work, per tape. Sometimes there are a dozen or so tapes available, so that's a bit over 100,000 AP tasks. Each of those tasks is 8 MB of data to be pushed through a 100 Mbit pipe.

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
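Cosmic_Ocean's arithmetic, spelled out as a quick script. All counts here are his estimates from the post above, not official project figures, and the transfer-time line is an illustration that assumes the 100 Mbit/s pipe is fully saturated by AP traffic alone.

```python
# Back-of-the-envelope AP numbers from the post above (poster's estimates).
wus_per_channel = 400   # ~400 AP workunits per channel
channels_used = 12      # B1_P0 .. B6_P1 (6 beams x 2 polarities)
replication = 2         # initial replication: two tasks per WU
task_mb = 8             # each AP task is ~8 MB of download
tapes = 12              # "sometimes a dozen or so tapes"

wus_per_tape = wus_per_channel * channels_used     # ~4800 WUs
tasks_per_tape = wus_per_tape * replication        # ~9600 tasks
tasks_total = tasks_per_tape * tapes               # ~115,000 tasks

# Time for a fully saturated 100 Mbit/s pipe to ship it all
# (ignores protocol overhead and all non-AP traffic).
bits_total = tasks_total * task_mb * 8e6           # 8e6 bits per MB
hours = bits_total / 100e6 / 3600
print(f"{tasks_total} tasks, ~{hours:.0f} h at a saturated 100 Mbit/s")
```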
Cosmic_Ocean (Joined: 23 Dec 00, Posts: 3027, Credit: 13,516,867, RAC: 13)
Quoting: wow! so 4700 tasks

Times two. Each of those 4,700 WUs gets done by two people, so if you assume everyone uses a GPU to do it, you're looking at ~9,400 hours (~13 months) of crunching per tape.

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
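The same estimate as a couple of lines of arithmetic; note that the ~1 hour of GPU time per AP task is an assumption implied by the post, not a measured figure.

```python
# "~13 months of crunching per tape", spelled out.
wus_per_tape = 4700        # poster's estimate of WUs per tape
replication = 2            # each WU is crunched by two hosts
gpu_hours_per_task = 1.0   # assumed average GPU runtime per task

total_hours = wus_per_tape * replication * gpu_hours_per_task
months = total_hours / (30 * 24)   # 30-day months
print(f"~{total_hours:.0f} GPU-hours ~= {months:.0f} months per tape")
# -> ~9400 GPU-hours, about 13 months of single-GPU crunching
```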
Slavac (Joined: 27 Apr 11, Posts: 1932, Credit: 17,952,639, RAC: 0)
ya but can't get the _0 and _1 at the same time on the same PC for the same WU :)
Cosmic_Ocean (Joined: 23 Dec 00, Posts: 3027, Credit: 13,516,867, RAC: 13)
Quoting: ya but can't get the _0 and _1 at the same time on the same PC for the same WU :)

Exactly why I said "two people" and "assume everyone uses a GPU." :P

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
KWSN THE Holy Hand Grenade! (Joined: 20 Dec 05, Posts: 3187, Credit: 57,163,290, RAC: 0)
Josef wrote: Astropulse v6 testing started at Beta early last December and has been more or less continuous since late January. For all of that testing, new splitter code had to be in place with the sign difference, which was the primary reason for a new version of Astropulse. Changes to the validator to reliably sense tasks with abnormal runtimes took place more recently, and a trivial change to the application was needed to support that; hence the release is version 6.01.

Umm, I'm a Beta tester, and I never saw an AP WU come through either of the two computers I have doing Beta... (which is why I asked :-) )

Hello, from Albany, CA!...
KWSN THE Holy Hand Grenade! (Joined: 20 Dec 05, Posts: 3187, Credit: 57,163,290, RAC: 0)
Quoting: ya but can't get the _0 and _1 at the same time on the same PC for the same WU :)

Bad assumption: lots of people (including me...) don't want AP on their GPU(s), as it would take too long (I don't have a GTX 580 [or anything close]), but I do run AP on my CPUs (4 of them, one too slow to do an AP WU in a reasonable amount of time; it's an AMD C50 dual-core laptop. I did try a WU on it, and it ran for three days!).

Hello, from Albany, CA!...
cgoodrich (Joined: 21 Mar 12, Posts: 1, Credit: 2,981,407, RAC: 0)
I just started this. I have decided to use my failover datacenter to help out with the crunching. I am running it on a grid of 6 vSphere 5 ESX servers on a Dell m1000e blade chassis with M610 blades. I cloned VMs, which I call my seti@home machines, and let them run at 100% 24x7. I have been doing this for a week and have 81,681 credits. A problem I have is that I'm not being fed jobs fast enough to tap my servers out 24x7.
Ex: "Socialist" Send message Joined: 12 Mar 12 Posts: 3433 Credit: 2,616,158 RAC: 2 |
cgoodrich wrote: I just started this. I have decided to use my failover datacenter to help out with the crunching. I am running it on a grid of 6 vSphere 5 ESX servers on a Dell m1000e blade chassis with M610 blades. I cloned VMs, which I call my seti@home machines, and let them run at 100% 24x7. I have been doing this for a week and have 81,681 credits. A problem I have is that I'm not being fed jobs fast enough to tap my servers out 24x7.

Off topic, but wow... nice setup. I wish my builds could cross into that territory. Ah, if I had but the money :-)

-Dave
Ronald R CODNEY (Joined: 19 Nov 11, Posts: 87, Credit: 420,920, RAC: 0)
If you bought the winning MegaMillions ticket for tonight's drawing...???
Pascal (Joined: 22 Jan 00, Posts: 26, Credit: 3,624,307, RAC: 0)
Are there "download problems" ? On my system all workunits done, all new workunits stuck on download. Have manualy retry to get some thinsg done. Pascal |
Sakletare (Joined: 18 May 99, Posts: 132, Credit: 23,423,829, RAC: 0)
Are there "download problems" ? Yes, there's network problems in both directions. One of those mysterious problems where noone really knows why. Might be the switches mentioned above. |
Pascal (Joined: 22 Jan 00, Posts: 26, Credit: 3,624,307, RAC: 0)
Are there "download problems" ? Okay... thnx! |