Gravity (Jun 09 2011)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
So bruno (the upload server) has been having fits. Basically an arbitrary CPU locks up. I'm hoping this is more of a kernel/software issue than hardware, and will clear up on its own. In the meantime, we did get it on a remote power strip so we can kick it from home without having to come to the lab. As for thumper we replaced the correct DIMMs this time around on Tuesday. But then it crashed last night! So there was some cleanup this morning, then re-replacing the DIMMs with the originals, and then coming to terms with the fact that the most likely scenario is that those replacement DIMMs were actually DOA. So we're back to square one on that front, hoping for no uncorrectable memory errors until the next step. In better news we moved some assimilator processes to synergy and were pleasantly surprised how much faster they ran. In fact, we are running the scientific analysis code now which has been causing the assimilators to back up, but they aren't. That's nice. Really nice, actually. [EDIT: I might have spoken too soon on this front - not so nice.] Still trying to hash out the next phase for the NTPCkr and how to present all this to the public. We're doing a bunch of in-house analysis ourselves just to get a feel for the data and clean up junk, and as expected most of the "interesting" stuff is turning out to be RFI. We want to get it to a point where we're presenting people with candidates that contain signals which aren't always obvious RFI. That would be boring and useless. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Dirk Sadowski Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Matt, thanks for the news! - Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. - |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Thanks for the update Matt, Any chance of Seti Beta being brought online? Claggy |
Akio Joined: 18 May 11 Posts: 375 Credit: 32,129,242 RAC: 0 |
Thank you for the update, Matt! Much appreciated. |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Thanks too Matt, Claggy |
Joel Lynn Joined: 8 Sep 05 Posts: 14 Credit: 26,446 RAC: 0 |
Thanks Matt. Just keep the Louisville handy if it all goes south. |
Jack Zhang Joined: 2 Jul 06 Posts: 206 Credit: 6,142,449 RAC: 0 |
Memory incompatibility is rampant these days. Did you check the sticks with Memtest86+ yet? Usually, a BIOS upgrade fixes incompatibility issues, but not all the time. What if Fiction was Fact and Fact was Fiction and vice versa? |
Gary Charpentier Joined: 25 Dec 00 Posts: 30933 Credit: 53,134,872 RAC: 32 |
Thanks for the update Matt. Good luck on the DOA sticks. |
Cherokee150 Joined: 11 Nov 99 Posts: 192 Credit: 58,513,758 RAC: 74 |
Wow, Matt! Am I reading you correctly that some signals you are finding have -not- been attributable to man-made noise? I understand how much must still be done to rule out natural phenomena and spurious anomalies. However, if you are actually getting some signals that are not ours, that alone is the first big step. Still much to do, of course, but this -is- exciting and feeds the imagination. I, for one, look forward with great anticipation to the day you are ready to present any such candidates for further analysis!!! p.s. I wish you at least one week this month without -any- technical difficulties. You -certainly- deserve one...or two! :) |
Kibble (KB7TIB) Joined: 6 Dec 99 Posts: 27 Credit: 10,121,469 RAC: 2 |
Thank you for the info, Matt. However one issue here is that the newest BOINC version has a prominent place for notices which apparently mirrors the home page. It may have been better to post your news there as the last notice was dated May 28th and doesn't cover this sequence of events. Without current notices on the home page the BOINC development just becomes redundant and a mockery. Users still have to access the boards for any info on which they find concerns. Perhaps a brief comment on the current problem with a link to the appropriate board would work. |
justsomeguy Joined: 27 May 99 Posts: 84 Credit: 6,084,595 RAC: 11 |
For the Sunfire X4x00 series of boxes, Micron is what they started with, then moved to Samsung; both had issues. The best memory I put in any of the 60+ 4000 series boxes I was working on was Hynix. Performed well and 0 DOAs in 3 years. Unlike Micron, which was at one point 25% DOA, and Samsung, which had much better stats but still had to burn in for 3 days to verify non-DOA before putting into a production box.
Memory incompatibility wouldn't be the issue here...Sun is quirky this way. Incompatible memory and the box won't even fire. Since it was running for a couple of days, I would more likely suspect DOA depending on the brand. Also, while I personally would recommend memtest be run, with the amount of memory in this box (unsure but would suspect 16+ gig) it'll be down for a couple hours to get a good test. As far as the firmware goes, if Matt would post the BIOS AND firmware levels of Thumper, I can check for the latest and be sure it's up to snuff. BIOS and firmware need to be compatible; you can mix and match but you get flaky results. It was a nightmare to flash and sometimes required up to three retries to get it to go... Kevin "Two things are infinite: The universe and human stupidity; and I'm not sure about the universe." - Albert Einstein |
Slavac Joined: 27 Apr 11 Posts: 1932 Credit: 17,952,639 RAC: 0 |
|
Jeff Mercer Joined: 14 Aug 08 Posts: 90 Credit: 162,139 RAC: 0 |
Hi Matt, and thanks for the update. Sounds like some of your systems are throwing fits, and I sure am sorry to hear that. One of these days, I hope that things settle down and you guys get a MUCH DESERVED break. In the meantime though, I want you to know that all of us appreciate all of your hard work, and your dedication to the project. LOOKING FORWARD TO GREENBANK WORK UNITS !!!! |
CryptokiD Joined: 2 Dec 00 Posts: 150 Credit: 3,216,632 RAC: 0 |
why does it seem there is so many problems with server memory? i'm not just talking about the seti servers, but any servers in general seem very picky and there seems to be a high failure rate for server memory. i can't remember the last time i had a desktop memory issue. and every computer i build gets a 24 hour memtest+ burn in to verify that part of the computer is running ok. is there something inherent with servers or their memory which makes them have troubles? i have very little server experience apart from small to medium offices of 25pc's or less. |
Gary Charpentier Joined: 25 Dec 00 Posts: 30933 Credit: 53,134,872 RAC: 32 |
why does it seem there is so many problems with server memory? i'm not just talking about the seti servers, but any servers in general seem very picky and there seems to be a high failure rate for server memory. Three things come to mind. One, the specs on a server are likely to be correct and not have large margins of error built into them, i.e., their memory bus speed really is what they say and slightly slow chips won't cut it. Two, their memory is much more exercised than a desktop's. This is because they are running so many more jobs that they spend much more time using the main memory and much less the cache memory. Obviously good server design (software) is supposed to minimize that. Three, they likely run hot and in hot places. Heat is the enemy. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
is there something inherent with servers or their memory which makes them have troubles? The amount of memory. Most desktop systems these days have 4GB, some 8GB, very few 12GB. A server system can have up to 2TB of memory. That's 64 memory slots. In order for memory to work, the timing has to be spot on; the margin for error is almost nonexistent. The more memory modules you have in a system, the greater the electrical load, and the greater the effect of capacitance & inductance, and they really screw timing up. While a server system might work with 12 modules of one type of memory, it may not work with 14. In order for a system to be stable it needs memory that meets the design specs exactly. The slightest variance will introduce timing errors, and you end up with memory faults, even though the memory may not actually be faulty. It's just not suitable for that system with that many memory modules populated. Grant Darwin NT |
kittyman Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
is there something inherent with servers or their memory which makes them have troubles? The overhead involved with mapping and keeping track of such vast amounts of RAM must be incredible indeed. I am sure the fault limits allowed by a server must be minuscule.....once you toss something out there into that vast memory bank, you have to trust that it is safe to stay there for a bit...(no pun intended). "Time is simply the mechanism that keeps everything from happening all at once." |
Igogo Joined: 18 Dec 04 Posts: 125 Credit: 65,303,299 RAC: 44 |
Thanks Matt |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
So bruno (the upload server) has been having fits. Basically an arbitrary CPU locks up. I'm hoping this is more of a kernel/software issue than hardware, and will clear up on its own. In the meantime, we did get it on a remote power strip so we can kick it from home without having to come to the lab. Any chance of doing the same with whatever it is that periodically freezes the Server Status Page, and apparently blocks new work production at the same time? |
KWSN THE Holy Hand Grenade! Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
So bruno (the upload server) has been having fits. Basically an arbitrary CPU locks up. I'm hoping this is more of a kernel/software issue than hardware, and will clear up on its own. In the meantime, we did get it on a remote power strip so we can kick it from home without having to come to the lab. Probably whatever does that (the freezes) is dependent on a process in Bruno... (or is a process actually running on Bruno...) . Hello, from Albany, CA!... |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.