Gravity (Jun 09 2011) |
Message boards : Technical News : Gravity (Jun 09 2011)
| Author | Message |
|---|---|
|
So bruno (the upload server) has been having fits. Basically an arbitrary CPU locks up. I'm hoping this is more of a kernel/software issue than hardware, and will clear up on its own. In the meantime, we did get it on a remote power strip so we can kick it from home without having to come to the lab. | |
| ID: 1115236 · | |
|
Matt, thanks for the news! | |
| ID: 1115237 · | |
|
Thanks for the update Matt, | |
| ID: 1115239 · | |
|
Thank you for the update, Matt! Much appreciated. | |
| ID: 1115243 · | |
|
Thanks too Matt, | |
| ID: 1115246 · | |
|
Thanks Matt. Just keep the Louisville handy if it all goes south. | |
| ID: 1115260 · | |
|
Memory incompatibility is rampant these days. Did you check the sticks with Memtest86+ yet? | |
| ID: 1115264 · | |
|
Thanks for the update Matt. Good luck on the DOA sticks. | |
| ID: 1115276 · | |
|
Wow, Matt! | |
| ID: 1115383 · | |
|
Thank you for the info, Matt. However, one issue here is that the newest BOINC version has a prominent place for notices, which apparently mirrors the home page. It may have been better to post your news there, as the last notice was dated May 28th and doesn't cover this sequence of events. Without current notices on the home page, the BOINC development just becomes redundant and a mockery. Users still have to visit the boards for information on anything that concerns them. Perhaps a brief comment on the current problem with a link to the appropriate board would work. | |
| ID: 1115385 · | |
For the Sun Fire X4x00 series of boxes, they started with Micron, then moved to Samsung; both had issues. The best memory I put in any of the 60+ 4000 series boxes I was working on was Hynix. It performed well, with 0 DOAs in 3 years, unlike Micron, which was at one point 25% DOA, and Samsung, which had much better stats but still had to be burned in for 3 days to verify non-DOA before going into a production box.
Memory incompatibility wouldn't be the issue here... Sun is quirky this way. With incompatible memory, the box won't even fire. Since it was running for a couple of days, I would more likely suspect DOA, depending on the brand. Also, while I personally would recommend running memtest, with the amount of memory in this box (unsure, but I would suspect 16+ gig) it'll be down for a couple of hours to get a good test. As far as the firmware goes, if Matt would post the BIOS and firmware levels of Thumper, I can check for the latest and be sure it's up to snuff. BIOS and firmware need to be compatible; you can mix and match, but you get flaky results. It was a nightmare to flash and sometimes required up to three retries to get it to go... Kevin ____________ "Two things are infinite: The universe and human stupidity; and I'm not sure about the universe." - Albert Einstein | |
| ID: 1115415 · | |
|
Thanks for taking the time to keep us updated Matt :) | |
| ID: 1115467 · | |
|
Hi Matt, and thanks for the update. Sounds like some of your systems are throwing fits, and I sure am sorry to hear that. One of these days, I hope that things settle down and you guys get a MUCH DESERVED break. In the meantime though, I want you to know that all of us appreciate all of your hard work, and your dedication to the project. | |
| ID: 1115552 · | |
|
Why does it seem there are so many problems with server memory? I'm not just talking about the SETI servers; servers in general seem very picky, and there seems to be a high failure rate for server memory. | |
| ID: 1115746 · | |
"Why does it seem there are so many problems with server memory? I'm not just talking about the SETI servers; servers in general seem very picky, and there seems to be a high failure rate for server memory."
Three things come to mind. One, the specs on a server are likely to be correct and not have large margins of error built into them, i.e., their memory bus speed really is what they say, and slightly slow chips won't cut it. Two, their memory is much more exercised than a desktop's. This is because they are running so many more jobs that they spend much more time using main memory and much less using cache memory. Obviously, good server design (software) is supposed to minimize that. Three, they likely run hot and in hot places. Heat is the enemy. | |
| ID: 1115781 · | |
"Is there something inherent with servers or their memory which makes them have troubles?"
The amount of memory. Most desktop systems these days have 4GB, some 8GB, very few 12GB. A server system can have up to 2TB of memory. That's 64 memory slots. In order for memory to work, the timing has to be spot on; the margin for error is almost nonexistent. The more memory modules you have in a system, the greater the electrical load, and the greater the effect of capacitance and inductance, and they really screw timing up. While a server system might work with 12 modules of one type of memory, it may not work with 14. In order for a system to be stable, it needs memory that meets the design specs exactly. The slightest variance will introduce timing errors, and you end up with memory faults, even though the memory may not actually be faulty. It's just not suitable for that system with that many memory modules populated. ____________ Grant Darwin NT. | |
| ID: 1115789 · | |
"Is there something inherent with servers or their memory which makes them have troubles?"
The overhead involved with mapping and keeping track of such vast amounts of RAM must be incredible indeed. I am sure the fault limits allowed by a server must be minuscule... once you toss something out there into that vast memory bank, you have to trust that it is safe to stay there for a bit (no pun intended). ____________ ****** "Ask not, what your kitty can do for you. Ask what you can do for your kitty." As it is kitten, so shall it be done. | |
| ID: 1115790 · | |
|
Thanks Matt | |
| ID: 1115801 · | |
"So bruno (the upload server) has been having fits. Basically an arbitrary CPU locks up. I'm hoping this is more of a kernel/software issue than hardware, and will clear up on its own. In the meantime, we did get it on a remote power strip so we can kick it from home without having to come to the lab."
Any chance of doing the same with whatever it is that periodically freezes the Server Status Page, and apparently blocks new work production at the same time? | |
| ID: 1116083 · | |
"So bruno (the upload server) has been having fits. Basically an arbitrary CPU locks up. I'm hoping this is more of a kernel/software issue than hardware, and will clear up on its own. In the meantime, we did get it on a remote power strip so we can kick it from home without having to come to the lab."
Probably whatever does that (the freezes) is dependent on a process in Bruno... (or is a process actually running on Bruno...) | |
| ID: 1116338 · | |
| Copyright © 2013 University of California |