Message boards : Technical News : Barrel of Bottlenecks (Aug 15 2007)
RichaG | Joined: 20 May 99 | Posts: 1690 | Credit: 19,287,294 | RAC: 36
I just set up a new computer and I can't even get the SETI application downloaded. I have tried a project reset a couple of times, and that doesn't help. There seems to be a good backlog of WUs now, so why can't it download the application?

Red Bull Air Racing
Gas price by zip at Seti
Christoph | Joined: 21 Apr 03 | Posts: 76 | Credit: 355,173 | RAC: 0
> Server-wise - oddly enough we were philosophically debating this yesterday: why not get one big, big, big server instead of screwing around with smaller ones and their apparently unique idiosyncrasies? Obviously we'd benefit from having all the CPUs/RAM/disks at our disposal on one system, but there are two cons off the top of my head...

I had a look at Sun: more than 500k dollars!

Christoph
yank | Joined: 15 Aug 99 | Posts: 522 | Credit: 22,545,639 | RAC: 0
For a 64-processor gorilla, 2.4 GHz SPARC VI dual-core chips, 6 MB of on-chip L2 cache, 128 GB of memory, and 64 x 73 GB SAS drives raise the price tag to $10,100,320. I don't know if this is what the SETI program needs, but if all the volunteers in the SETI program gave $10.00 each, we could be on the way to buying one.

Donate to SETI, and if possible join the US Navy team.
http://boinc.mundayweb.com/teamStats.php?userID=14824
Edward Lee Michau | Joined: 31 Jul 06 | Posts: 138 | Credit: 9,640,846 | RAC: 0
I have had that problem off and on, and have found that as soon as you notice the CPU time going up while the percentage stays the same and the time to completion climbs, you need to shut down the program and turn off the computer. Then restart the computer and the program. Most of the time the WU will start over and run right. If it still won't run right, then abort that workunit. It's not worth running ten times too long and still not getting anywhere. Somebody told me this on the Message Boards.

Ed
buck.r | Joined: 1 Apr 06 | Posts: 3 | Credit: 276,788 | RAC: 0
Ben - couldn't agree with you more... these guys are doing a terrific job and need all of our support in their current situation! I worked in the computer industry from the early '60s (yes, I'm vintage), and the fact that they keep us up to "speed", LOL, through all their debugging is incredible under the circumstances! Back off, all you pretenders - the rest of us are happy just helping to look for the odd alien... and to hell with the number crunching!

Judy (from down under Down Under - almost the edge of the planet)
DJStarfox | Joined: 23 May 01 | Posts: 1066 | Credit: 1,226,053 | RAC: 2
I don't mean this as criticism of your operations; I just wanted to make you think with the following spiel. Despite what the others think, in my professional opinion it's better to have a few dedicated servers than many little ones, and I'll take a slow, reliable server over a fast, flaky one any day.

The other issue that I do not understand is why several of your servers perform multiple roles: e.g., bruno is upload/download, scheduler, feeder, file_deleter1, transitioner, etc. I see you did well with the database servers being dedicated, because I bet they work the hardest. If you lose a box like bruno, it would cripple the project. I'm sure with a couple of weeks and maybe an additional server or two, you could redesign this setup to be more reliable.

Also, if there's a Linux guru there, you can save a lot of money using Linux instead of SunOS; Sun's support contracts are outrageously priced. I think it would be worth your time to not have to come in on the weekends and fix the servers. Have at least two servers for each function (for redundancy), and only have servers take on multiple roles that are low-impact or low-priority (e.g., file deletion, validation, whatever).
Jesse Viviano | Joined: 27 Feb 00 | Posts: 100 | Credit: 3,949,583 | RAC: 0
> I don't mean criticism of your operations here, but I just wanted to make you think with the following spiel.

Actually, postprocessing tasks (e.g. validation, assimilation, transitioning, and deletion) are probably the highest-priority tasks in BOINC, because whenever there is a backlog of postprocessing work, everything else slows down. First, there is intense competition for disk access from the postprocessing tasks. Second, a backlog means there are more files on the hard drive than there normally would be, causing the file system to slow down on each access. Third, the admins have noted that the disks get dangerously full whenever something keeps the deleters from operating at top efficiency for too long (e.g. they are disabled, they are slowed down by a slow file system due to too many files on the drives, or the network is saturated).
DJStarfox | Joined: 23 May 01 | Posts: 1066 | Credit: 1,226,053 | RAC: 2
That's very informative, thank you. In that case, perhaps someone could take some performance measurements of the various components of the system. The trouble with fixing one bottleneck is that it often just creates another one down the chain. It's the same as widening a road only to make traffic merge later; it just moves the bottleneck somewhere else. That is why careful planning of the server resources is important. Using donated hardware/funds makes it really difficult to plan for such things, unfortunately, but not impossible.

The goal would be to figure out how fast the system can deal with finished results. Then you would (ideally) have just enough upload servers (and a fast, smart scheduler) to queue the finished results, so the post-processing servers can pull from the upload servers at their own pace. I would also completely separate the splitters/result-generation servers from the post-processing ones. Bruno seems the ideal server to be this big upload/scheduler disk. It would be nice to have two download/upload/deleter servers, in case bruno hiccups.
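[Editor's note: the moving-bottleneck point above can be sketched with a toy serial-pipeline model. The stage names and per-stage rates below are hypothetical, purely for illustration; they are not measurements of the actual SETI servers.]

```python
# Toy model: a result pipeline is a chain of stages, and sustained
# throughput is capped by the slowest stage. Rates are made-up numbers
# (results per second) for illustration only.

def throughput(stages):
    """Sustained throughput of a serial pipeline = slowest stage's rate."""
    return min(stages.values())

def bottleneck(stages):
    """Name of the stage that currently limits the pipeline."""
    return min(stages, key=stages.get)

# Hypothetical per-stage rates, results/sec.
stages = {"upload": 40, "validate": 25, "assimilate": 30, "delete": 50}

print(bottleneck(stages), throughput(stages))  # the validator is the limit

# "Fix" the validator (say, faster disks) and the bottleneck just moves
# down the chain instead of disappearing:
stages["validate"] = 60
print(bottleneck(stages), throughput(stages))  # now the assimilator limits
```

This is the road-widening analogy in miniature: raising one stage's rate only raises overall throughput up to the next-slowest stage, which is why measuring every component before buying hardware matters.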
Andy Lee Robinson | Joined: 8 Dec 05 | Posts: 630 | Credit: 59,973,836 | RAC: 0
There are a million machines at SETI's disposal, all crunching. The SETI project is centrally based with distributed crunching, but it is still a star network dependent on the core. I have a couple of web servers on backbones that are spending 98% of their power on SETI while not interfering with their primary purpose. They could also be employed splitting and/or handling up/downloads, batching up the results and feeding them to the MSDB a few times a day. Many of us have machines we can donate (virtually) to the project, and many would live topologically close to Berkeley. Perhaps it's time for a fresh rethink of the project architecture, considering maximising the P2P resources available - i.e. a million machines!?

Andy.
OzzFan | Joined: 9 Apr 02 | Posts: 15691 | Credit: 84,761,841 | RAC: 28
> Perhaps it's time for a fresh rethink of the project architecture, considering maximising the P2P resources available - i.e. a million machines!?

I've read somewhere that Rom would love to incorporate P2P-type capabilities into the BOINC framework. I'm not sure what the stopping point is, though I'm sure it's a technical one.
DJStarfox | Joined: 23 May 01 | Posts: 1066 | Credit: 1,226,053 | RAC: 2
I noticed they only budgeted $13,300 for this year's "Upgrade to failure tolerant server configuration". I think it will require more than that. The biggest contributions come in December; do you think next January you'll get some upgrades to be able to do it?

I also priced out the "near time persistancy checker compute server" from this year's goals. I found a Dell PowerEdge 6950 rack server for $10,353: two quad-core AMD Opterons with 32 GB RAM and 500 GB of RAID 5 storage (5 disks). You could add an external array if more space is needed. I'd venture to say you guys could use about 16 TB more storage across your servers, especially now that multi-beam data is in production.
Pappa | Joined: 9 Jan 00 | Posts: 2562 | Credit: 12,301,681 | RAC: 0
I take it many have not researched the last major outage and the major outage prior to that, or read the Donations or Hardware Donations II threads. There was information there about how the infrastructure is put together. A lot can be found here in the Tech News. I will state a few simple things that I know:

* SETI is moving the machines that can run Linux to Linux. Yes, that also has other issues.
* It has been stated that the old splitters used prior to MultiBeam can only run on SunOS. Matt spent, I believe, over a week trying to port them.
* Part of the fly in the ointment is a NetApp Filer (Fibre Channel) with NFS mounts. I have dealt with several NASes, a couple of those among them, and they were cranky way back then (reading the news and some of the problems, it is even crankier now). They were originally built on a stripped-down BSD 3.x kernel. That's without mentioning that they need a few new drives, as there are no spares.
* A second 24-port gigabit switch and a couple of 8-port switches are needed. This comes from the end of Donate Hardware II.

So it becomes easy to offer advice (some of it is actually good). I used to do a lot of second-guessing at one point. I then found out that things were not always what they appeared on the surface. Unless you can see it, touch it, and/or have someone explain it, it is very tough. Since Matt started writing the Tech News, "we" now have more information than we have had in many years past. Thank you, Matt! He does owe us a few pictures after the server closet cleanup... but that will come in time. Please be patient.

Regards

Pappa

Please consider a donation to the SETI project.
DJStarfox | Joined: 23 May 01 | Posts: 1066 | Credit: 1,226,053 | RAC: 2
So, they need a couple of switches to make the servers run smoother? I found a 24-port 1000M switch with 4 GBIC ports on NewEgg for $530:
http://www.newegg.com/Product/Product.aspx?Item=N82E16833122178
I'll dig up what I can on eBay and see if I can donate one to you guys. Enough talk, more action, right? :)

The Linux vs. SunOS thing... yeah, that's a can of worms. Thank God for NFS. I've heard bad experiences regarding NAS units. They are picky pieces of hardware: good once you get them working right, just a pain to do so.

You brought up a good point. No one on the message boards, except the people working in the data center, really knows how things are arranged and configured. Thanks to Matt, we can listen to their logic and at best provide feedback on the actions they take. With my years of experience, I only try to provide constructive ideas and feedback in an effort to help. They are doing the best they can. Yeah, Matt is a posting machine, along with Jeff and the rest working late.
KWSN THE Holy Hand Grenade! | Joined: 20 Dec 05 | Posts: 3187 | Credit: 57,163,290 | RAC: 0
And, as has been pointed out to me in the past, all the post-processing tasks have to be on Bruno now (previously kryten, RIP) because of disk access issues.

Hello, from Albany, CA!...
zombie67 [MM] | Joined: 22 Apr 04 | Posts: 758 | Credit: 27,771,894 | RAC: 0
> The Linux vs. SunOS thing... yeah, that's a can of worms. Thank God for NFS.

NFS... created by Sun in 1985. =;^)

Dublin, California
Team: SETI.USA
1mp0£173 | Joined: 3 Apr 99 | Posts: 8423 | Credit: 356,897 | RAC: 0
> So, they need a couple of switches to make the servers run smoother? I found a 24-port 1000M switch with 4 GBIC ports on NewEgg for $530. http://www.newegg.com/Product/Product.aspx?Item=N82E16833122178 I'll dig up what I can on eBay and see if I can donate one to you guys. Enough talk, more action, right? :)

You are kinda missing the point. It's great that you have "years of experience" and are willing to offer advice. The problem isn't advice (or knowledge), it's money. Most of what SETI has is either donated or just plain hand-me-downs. So, we could launch a fairly effective "distributed denial-of-service attack" with our suggestions (there are enough of us with ideas that we could easily overwhelm them), or we can give hardware, or cash, or both, and really help.
zombie67 [MM] | Joined: 22 Apr 04 | Posts: 758 | Credit: 27,771,894 | RAC: 0
> So, we could launch a fairly effective "distributed denial-of-service attack" with our suggestions (there are enough of us with ideas that we could easily overwhelm them), or we can give hardware, or cash, or both, and really help.

Great! Give us the HW list then. It's been asked for time after time. Still waiting.

Dublin, California
Team: SETI.USA
DJStarfox | Joined: 23 May 01 | Posts: 1066 | Credit: 1,226,053 | RAC: 2
The donation page says we can give a monetary gift or choose a piece of hardware from their list. So why can I not contribute by donating a piece of hardware from said list? I found a good 3Com switch of the kind you quoted as needed. I was about to get it....
DJStarfox | Joined: 23 May 01 | Posts: 1066 | Credit: 1,226,053 | RAC: 2
> Great! Give us the HW list then. It's been asked for time after time. Still waiting.

Pappa posted the list about 5 messages back. The official list on the donation page is out of date, I believe.
http://setiathome.berkeley.edu//forum_thread.php?id=41554&nowrap=true#620623
Bounce | Joined: 3 Apr 99 | Posts: 66 | Credit: 5,604,569 | RAC: 0
> Perhaps it's time for a fresh rethink of the project architecture, considering maximising the P2P resources available - i.e. a million machines!?

Many times the biggest hurdle with farming out core processes is that the data "owners" have to be assured of data security and validity. Formal SLAs have to be hammered out so that the "contractor" is held to a certain level of performance, so that the "contracting entity" (BOINC) can know that its data is handled to its standards and expectations, and that the "farmed out" tasks will be there (day and night) at the same level as if they were running on its own hardware, in its own buildings, managed by its own employees. These SLAs are often as difficult (or troublesome) as the technology side of things. People offering up "donated services" should think twice about not being able to pull back resources when their own demands change.
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.