Gravity (Jun 09 2011)


Message boards : Technical News : Gravity (Jun 09 2011)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 1115236 - Posted: 9 Jun 2011, 22:33:54 UTC
Last modified: 9 Jun 2011, 22:35:43 UTC

So bruno (the upload server) has been having fits. Basically an arbitrary CPU locks up. I'm hoping this is more of a kernel/software issue than hardware, and will clear up on its own. In the meantime, we did get it on a remote power strip so we can kick it from home without having to come to the lab.

As for thumper we replaced the correct DIMMs this time around on Tuesday. But then it crashed last night! So there was some cleanup this morning, then re-replacing the DIMMs with the originals, and then coming to terms with the fact that the most likely scenario is that those replacement DIMMs were actually DOA. So we're back to square one on that front, hoping for no uncorrectable memory errors until the next step.

In better news we moved some assimilator processes to synergy and were pleasantly surprised how much faster they ran. In fact, we are running the scientific analysis code now which has been causing the assimilators to back up, but they aren't. That's nice. Really nice, actually. [EDIT: I might have spoken too soon on this front - not so nice.]

Still trying to hash out the next phase for the NTPCkr and how to present all this to the public. We're doing a bunch of in-house analysis ourselves just to get a feel for the data and clean up junk, and as expected most of the "interesting" stuff is turning out to be RFI. We want to get it to a point where we're presenting people with candidates whose signals aren't all obvious RFI. Presenting nothing but obvious RFI would be boring and useless.

- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

[seti.international] Dirk Sadowski
Project donor
Volunteer tester
Joined: 6 Apr 07
Posts: 7069
Credit: 60,285,907
RAC: 17,154
Germany
Message 1115237 - Posted: 9 Jun 2011, 22:39:54 UTC - in response to Message 1115236.

Matt, thanks for the news!


- Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. -
____________
BR

SETI@home Needs your Help ... $10 & U get a Star!

Team seti.international

Das Deutsche Cafe. The German Cafe.

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4087
Credit: 32,996,910
RAC: 5,617
United Kingdom
Message 1115239 - Posted: 9 Jun 2011, 22:47:20 UTC - in response to Message 1115236.
Last modified: 9 Jun 2011, 22:54:54 UTC

Thanks for the update Matt,

Any chance of Seti Beta being brought online?

Claggy

Sliver
Project donor
Joined: 18 May 11
Posts: 281
Credit: 7,093,628
RAC: 3,151
United States
Message 1115243 - Posted: 9 Jun 2011, 23:04:46 UTC - in response to Message 1115239.

Thank you for the update, Matt! Much appreciated.
____________

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4087
Credit: 32,996,910
RAC: 5,617
United Kingdom
Message 1115246 - Posted: 9 Jun 2011, 23:09:01 UTC - in response to Message 1115243.

Thanks too Matt,

Claggy

Joel Lynn
Joined: 8 Sep 05
Posts: 14
Credit: 26,446
RAC: 0
United States
Message 1115260 - Posted: 10 Jun 2011, 0:13:16 UTC

Thanks Matt. Just keep the Louisville handy if it all goes south.
____________

Jack Zhang
Volunteer tester
Joined: 2 Jul 06
Posts: 206
Credit: 6,079,079
RAC: 1,021
Canada
Message 1115264 - Posted: 10 Jun 2011, 0:32:23 UTC

Memory incompatibility is rampant these days. Did you check the sticks with Memtest86+ yet?

Usually, a BIOS upgrade fixes incompatibility issues, but not all the time.
____________
What if Fiction was Fact and Fact was Fiction and vice versa?

Gary Charpentier
Project donor
Volunteer tester
Joined: 25 Dec 00
Posts: 12492
Credit: 6,803,135
RAC: 6,029
United States
Message 1115276 - Posted: 10 Jun 2011, 1:31:27 UTC

Thanks for the update Matt. Good luck on the DOA sticks.

____________

Cherokee150
Joined: 11 Nov 99
Posts: 103
Credit: 24,413,823
RAC: 24,314
United States
Message 1115383 - Posted: 10 Jun 2011, 12:25:58 UTC

Wow, Matt!
Am I reading you correctly that some signals you are finding have -not- been attributable to man-made noise? I understand how much must still be done to rule out natural phenomena and spurious anomalies. However, if you are actually getting some signals that are not ours, that alone is the first big step.

Still much to do, of course, but this -is- exciting and feeds the imagination.

I, for one, look forward with great anticipation to the day you are ready to present any such candidates for further analysis!!!

p.s. I wish you at least one week this month without -any- technical difficulties. You -certainly- deserve one...or two! :)

Kibble (KB7TIB)
Joined: 6 Dec 99
Posts: 21
Credit: 1,493,960
RAC: 4,793
United States
Message 1115385 - Posted: 10 Jun 2011, 12:32:17 UTC

Thank you for the info, Matt. However, one issue here is that the newest BOINC version has a prominent place for notices, which apparently mirrors the home page. It may have been better to post your news there, as the last notice was dated May 28th and doesn't cover this sequence of events. Without current notices on the home page, that BOINC development just becomes redundant and a mockery. Users still have to visit the boards for any info on whatever concerns them. Perhaps a brief comment on the current problem with a link to the appropriate board would work.
____________

Twisted
Joined: 27 May 99
Posts: 81
Credit: 1,878,062
RAC: 228
United States
Message 1115415 - Posted: 10 Jun 2011, 14:29:51 UTC - in response to Message 1115236.


As for thumper we replaced the correct DIMMs this time around on Tuesday. But then it crashed last night! So there was some cleanup this morning, then re-replacing the DIMMs with the originals, and then coming to terms with the fact that the most likely scenario is that those replacement DIMMs were actually DOA. So we're back to square one on that front, hoping for no uncorrectable memory errors until the next step.

- Matt


For the Sunfire X4x00 series of boxes, Micron is what they started with, then they moved to Samsung; both had issues. The best memory I put in any of the 60+ 4000 series boxes I was working on was Hynix. It performed well, with 0 DOAs in 3 years. Compare that with Micron, which was at one point 25% DOA, and Samsung, which had much better stats but still had to be burned in for 3 days to verify non-DOA before going into a production box.


Memory incompatibility is rampant these days. Did you check the sticks with Memtest86+ yet?

Usually, a BIOS upgrade fixes incompatibility issues, but not all the time.


Memory incompatibility wouldn't be the issue here...Sun is quirky this way. Incompatible memory and the box won't even fire. Since it was running for a couple of days, I would more likely suspect DOA depending on the brand. Also, while I personally would recommend memtest be run, with the amount of memory in this box (unsure but would suspect 16+ gig) it'll be down for a couple hours to get a good test.

As far as the firmware goes, if Matt would post the BIOS AND firmware levels of Thumper, I can check for the latest and be sure it's up to snuff.

BIOS and firmware need to be compatible; you can mix and match, but you get flaky results. It was a nightmare to flash and sometimes required up to three retries to get it to go...

Kevin

____________
"Two things are infinite: The universe and human stupidity; and I'm not sure about the universe." - Albert Einstein

Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1115467 - Posted: 10 Jun 2011, 15:50:40 UTC - in response to Message 1115415.

Thanks for taking the time to keep us updated Matt :)
____________


Executive Director GPU Users Group Inc. -
brad@gpuug.org

Jeff Mercer
Joined: 14 Aug 08
Posts: 90
Credit: 162,139
RAC: 0
United States
Message 1115552 - Posted: 10 Jun 2011, 19:43:19 UTC

Hi Matt, and thanks for the update. Sounds like some of your systems are throwing fits, and I sure am sorry to hear that. One of these days, I hope that things settle down and you guys get a MUCH DESERVED break. In the meantime though, I want you to know that all of us appreciate all of your hard work, and your dedication to the project.

LOOKING FORWARD TO GREENBANK WORK UNITS !!!!

CryptokiD
Joined: 2 Dec 00
Posts: 134
Credit: 2,814,936
RAC: 0
United States
Message 1115746 - Posted: 11 Jun 2011, 3:07:39 UTC

why does it seem there are so many problems with server memory? i'm not just talking about the seti servers, but any servers in general seem very picky and there seems to be a high failure rate for server memory.

i can't remember the last time i had a desktop memory issue. and every computer i build gets a 24 hour memtest+ burn in to verify that part of the computer is running ok.

is there something inherent with servers or their memory which makes them have troubles? i have very little server experience apart from small to medium offices of 25 pcs or less.

Gary Charpentier
Project donor
Volunteer tester
Joined: 25 Dec 00
Posts: 12492
Credit: 6,803,135
RAC: 6,029
United States
Message 1115781 - Posted: 11 Jun 2011, 5:03:02 UTC - in response to Message 1115746.

why does it seem there are so many problems with server memory? i'm not just talking about the seti servers, but any servers in general seem very picky and there seems to be a high failure rate for server memory.

i can't remember the last time i had a desktop memory issue. and every computer i build gets a 24 hour memtest+ burn in to verify that part of the computer is running ok.

is there something inherent with servers or their memory which makes them have troubles? i have very little server experience apart from small to medium offices of 25 pcs or less.

Three things come to mind.

One, the specs on a server are likely to be correct and not have large margins of error built into them, i.e. their memory bus speed really is what they say, and slightly slow chips won't cut it.

Two, their memory is much more exercised than a desktop's. This is because they are running so many more jobs that they spend much more time using the main memory and much less the cache memory. Obviously good server design (software) is supposed to minimize that.

Three, they likely run hot and in hot places. Heat is the enemy.

____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5811
Credit: 58,801,489
RAC: 48,528
Australia
Message 1115789 - Posted: 11 Jun 2011, 5:40:21 UTC - in response to Message 1115746.

is there something inherent with servers or their memory which makes them have troubles?

The amount of memory.
Most desktop systems these days have 4GB, some 8GB, very few 12GB. A server system can have up to 2TB of memory. That's 64 memory slots.
In order for memory to work, the timing has to be spot on: the margin for error is almost non-existent. The more memory modules you have in a system, the greater the electrical load, and the greater the effect of capacitance & inductance, and they really screw timing up.
While a server system might work with 12 modules of one type of memory, it may not work with 14. In order for a system to be stable it needs memory that meets the design specs exactly. The slightest variance will introduce timing errors, and you end up with memory faults, even though the memory may not actually be faulty. It's just not suitable for that system with that many memory modules populated.
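To put a number on the 2TB / 64-slot figure above, here is a minimal sketch in Python. It assumes the memory is split evenly across all slots and uses binary units; the helper name and both assumptions are illustrative and not from the post.

```python
# Hypothetical helper: the name, the even split across slots, and the
# binary units (1 TiB = 1024 GiB) are assumptions for illustration only.
GIB = 1024 ** 3
TIB = 1024 ** 4


def module_size_gib(total_bytes: int, slots: int) -> float:
    """Capacity each module must provide if memory is split evenly across slots."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    return total_bytes / slots / GIB


# 2TB spread over 64 slots, the server figure quoted above.
print(module_size_gib(2 * TIB, 64))  # 32.0
```

So under those assumptions a fully populated 2TB box means 64 modules of 32GB each, every one of them adding load to the memory bus.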
____________
Grant
Darwin NT.

Igogo
Project donor
Volunteer tester
Joined: 18 Dec 04
Posts: 100
Credit: 37,668,927
RAC: 30,629
Ukraine
Message 1115801 - Posted: 11 Jun 2011, 7:21:54 UTC - in response to Message 1115790.

Thanks Matt

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8491
Credit: 49,761,244
RAC: 55,125
United Kingdom
Message 1116083 - Posted: 11 Jun 2011, 23:20:23 UTC - in response to Message 1115236.

So bruno (the upload server) has been having fits. Basically an arbitrary CPU locks up. I'm hoping this is more of a kernel/software issue than hardware, and will clear up on its own. In the meantime, we did get it on a remote power strip so we can kick it from home without having to come to the lab.

Any chance of doing the same with whatever it is that periodically freezes the Server Status Page, and apparently blocks new work production at the same time?

KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 1937
Credit: 10,046,869
RAC: 17,997
United States
Message 1116338 - Posted: 12 Jun 2011, 17:19:29 UTC - in response to Message 1116083.

So bruno (the upload server) has been having fits. Basically an arbitrary CPU locks up. I'm hoping this is more of a kernel/software issue than hardware, and will clear up on its own. In the meantime, we did get it on a remote power strip so we can kick it from home without having to come to the lab.

Any chance of doing the same with whatever it is that periodically freezes the Server Status Page, and apparently blocks new work production at the same time?


Probably whatever does that (the freezes) is dependent on a process in Bruno... (or is a process actually running on Bruno...)

____________
.


Copyright © 2014 University of California