mork


Message boards : Technical News : mork

Jeff Cobb
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 110
Credit: 40,367
RAC: 0
United States
Message 1041811 - Posted: 13 Oct 2010, 18:55:52 UTC

I'm starting a thread to let people know what's going on with the mork (our boinc DB server) issue.

As most of you know, mork will sometimes hang, requiring a power cycle to boot. There are no footprints as to what causes this. So we strongly suspect hardware.

Mork has a sister machine (mindy, of course) that never really worked (both are donated, used, HW). So mindy is mork's parts machine. This is a little dicey because we don't know why mindy did not work.

The RAM in these machines is arranged on 4 daughter boards. Last week we swapped all four of mindy's identically populated memory boards into mork. But at least one of the "new" sticks was bad, because mork then showed differing amounts of memory across subsequent boots.
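A symptom like "differing amounts of memory across boots" can be logged automatically. The following is only a minimal sketch, assuming a Linux host that exposes `/proc/meminfo`; the thread does not say what OS mork runs, and the function names, the tolerance parameter, and the sample readings are all made up for illustration.

```python
import re

def parse_mem_total_kib(meminfo_text):
    """Return the MemTotal value (in kB) from /proc/meminfo-style text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no MemTotal line found")
    return int(match.group(1))

def memory_changed(readings_kib, tolerance_kib=0):
    """True if readings from successive boots differ by more than the tolerance."""
    return max(readings_kib) - min(readings_kib) > tolerance_kib

# Hypothetical /proc/meminfo excerpts saved after three boots (numbers invented).
boots = [
    "MemTotal:       67108864 kB\nMemFree:        1024 kB\n",
    "MemTotal:       67108864 kB\n",
    "MemTotal:       58720256 kB\n",
]
readings = [parse_mem_total_kib(text) for text in boots]
print(readings, memory_changed(readings))
```

Saving one reading per boot and comparing them would flag a stick that drops out intermittently, though it says nothing about which stick or board is at fault.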

So we put mork's original memory back and ran the first three memtest tests, which showed no errors. The final several tests are very time consuming and we may or may not run them, as mork's OS is down for these tests.

Today, we swapped mindy's two power supplies into mork. This is not because we strongly suspect the power supplies but because this is an easy exercise.

If mork hangs again, we are likely to replace the entire machine. Further component testing is becoming too cumbersome and time consuming. And after all, we now have the funds to do this because of your very generous donations (thank you!!!).
____________

Ageless
Joined: 9 Jun 99
Posts: 12396
Credit: 2,668,175
RAC: 819
Netherlands
Message 1041833 - Posted: 13 Oct 2010, 19:42:32 UTC

Fun, those gremlins.
Threaten with heavy bodily (hardwarial) harm? ;-)
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

Jack Zhang
Volunteer tester
Joined: 2 Jul 06
Posts: 206
Credit: 6,141,531
RAC: 893
Canada
Message 1041858 - Posted: 13 Oct 2010, 20:32:09 UTC

General Tip:

Memtest passing is not the whole story. Memory timing settings that are too tight can also cause I/O errors that memtest does not detect.

From overclocking experience, sometimes rated timings do not necessarily mean stable.
____________
What if Fiction was Fact and Fact was Fiction and vice versa?

soft^spirit
Joined: 18 May 99
Posts: 6374
Credit: 28,647,395
RAC: 516
United States
Message 1041866 - Posted: 13 Oct 2010, 21:14:00 UTC - in response to Message 1041858.

There could be "reseat to clear" memory issues, and they MIGHT not come back. Also suspect are electronic (non-physical) "disks", as well as many other possibilities. But it sounds like you proved either memory or BIOS issues on mindy. Might be worth taking another look?
____________

Janice

Gary Charpentier
Project donor
Volunteer tester
Joined: 25 Dec 00
Posts: 12998
Credit: 7,665,209
RAC: 6,424
United States
Message 1041868 - Posted: 13 Oct 2010, 21:25:43 UTC

Thanks for the update. Much appreciated.

Sounds like an experience I had a long time ago with a memory test. Ran it and it said every chip was good. Re-ran and every chip was a failure. Knew right then and there they were all 100% good. Problem was elsewhere but still in the memory circuits. Turned out to be a broken trace on the motherboard.

Agree that with cash on hand it's not worth chasing down further, but it might be worth doing after the machine is replaced, to have a box for something else.

____________

zoom314
Project donor
Joined: 30 Nov 03
Posts: 46811
Credit: 37,000,889
RAC: 2,559
United States
Message 1041945 - Posted: 14 Oct 2010, 2:41:56 UTC
Last modified: 14 Oct 2010, 2:43:50 UTC

Daughter boards, yeah, I remember something like that on the Amiga 1000 and its graphics/chip RAM. As boards went they were OK; it was the pins coming up through the daughter board from the motherboard that were the problem. Needless to say, when the computer worked it worked, as it was the contacts between the two boards that could cause problems. Good luck Jeff, and keep up the good work.
____________
My Facebook, War Commander, 2015

KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 1991
Credit: 10,993,145
RAC: 10,361
United States
Message 1042043 - Posted: 14 Oct 2010, 15:02:49 UTC - in response to Message 1041979.

Jeff...
Thanks so much for the update on mork. Best of luck on Friday when you crank back up and he will be heavily stressed again.

Meow meow.


Agreed: but how about cranking up Beta Friday? NTM a stats export? (even a single would help!)

____________
.

Byron Leigh Hatch @ team Carl Sagan
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 3621
Credit: 11,945,878
RAC: 1,123
Canada
Message 1042045 - Posted: 14 Oct 2010, 15:06:45 UTC

thank you Jeff, for the update


Best Wishes
Byron

S@NL - Eesger - www.knoop.nl
Joined: 7 Oct 01
Posts: 384
Credit: 37,572,962
RAC: 6,861
Netherlands
Message 1042092 - Posted: 14 Oct 2010, 17:24:05 UTC - in response to Message 1042043.

As always thanx for the update!

.. NTM a stats export? (even a single would help!)

I'm hoping for a stats-update also. I've made my system so that it can cope with a hiccup in stats exports, even across the change of a month, but not two, so I really hope you guys can give the export script a go this month.

Could you tell me if you can make it this month? If not, I'd really like to know; then I will need to do some thinking & programming to make my stats cope with it.

Thanx very much in advance for your reply.
____________
The SETI@Home Gauntlet 2012 april 16 - 30| info / chat | STATS

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4600
Credit: 121,631,617
RAC: 41,010
United States
Message 1042199 - Posted: 15 Oct 2010, 3:08:54 UTC

I love weird H/W issues. We were using an old Compaq ProLiant 5500 server at work that was handed down from our IS department. One day it started randomly crashing/rebooting. Sometimes it would be up for a minute, sometimes for 6-7 days. We tried reinstalling the OS several times, swapping out both of its redundant PSUs, pulling out each of the 4 CPUs & running the system with only 1 at a time, swapping out memory riser boards & RAM DIMMs, & swapping out SCSI controllers & drives.

Finally after all of that & several months of troubleshooting we gave up. I installed BOINC on the system & let it run to see what would happen. Turns out that it ran 24/7 without crashing/rebooting while BOINC was running the CPUs full tilt. If BOINC was closed the random reboots would start again.

So we left BOINC on and ran it as part of our infrastructure for 15 more months w/o a single issue. Recently I retired it from use as we got some newer more powerful, and MUCH quieter, machines.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

kepan
Joined: 17 Sep 99
Posts: 7
Credit: 27,395,720
RAC: 0
Sweden
Message 1042374 - Posted: 15 Oct 2010, 17:17:31 UTC

Does anyone know why BOINCstats does not update the score for SETI@home?
____________

Ruopp
Joined: 18 May 99
Posts: 2
Credit: 3,313,343
RAC: 0
Switzerland
Message 1042377 - Posted: 15 Oct 2010, 17:30:48 UTC - in response to Message 1042375.

Does anyone know why BOINCstats does not update the score for SETI@home?

It has updated my total credits, but not the graphs.
Not sure if that will still happen today, or sort itself out with tomorrow's update.



Extracted from the BOINCstats FAQ:

How often is BOINCstats updated?

BOINCstats checks for XML updates every two hours, and, when available, downloads them, reads the content into the database and updates the credits and ranks.
The numbers from this update are used to display current credits and ranks for the stats only.
The incremental updates take between 15 minutes and one hour to complete.

At 15:00 GMT each day all new info from the XML files is imported into the BOINCstats database. New users/teams/countries are inserted at this point, and daily/weekly/monthly numbers are calculated. When there is no new XML file for more than a day, the stats will show zero credits for those days.
The numbers from this update are used to display the numbers on the frontpage and the detailed stats pages.
The daily update takes about 2.5 hours to complete.

The same update, but then just for hosts, runs each day at 1:00 GMT, and takes about five hours to complete.

Only users, teams, hosts and countries with at least one (1) total credit are listed!

When an update is running, there is no check for new XML files until the update is finished. This is why the time since last update can be more than one hour.

Until this date, BOINCstats never failed to run its daily update, which means: when new credit is granted and the XML output by the project is OK, you'll get your credit on BOINCstats within 25 hours.
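
To get a feel for the daily schedule quoted above, here is a rough sketch of how long a fresh XML export might wait for the 15:00 GMT import. It only models the one fact stated in the FAQ text (a daily import at 15:00 GMT); the function name and the example timestamp are assumptions, and it ignores the two-hourly incremental checks and the run time of the import itself.

```python
from datetime import datetime, time, timedelta, timezone

DAILY_IMPORT_UTC = time(15, 0)  # "At 15:00 GMT each day", per the FAQ text

def next_daily_import(xml_ready):
    """Return the first 15:00 UTC import at or after xml_ready (must be timezone-aware)."""
    xml_ready = xml_ready.astimezone(timezone.utc)
    candidate = datetime.combine(xml_ready.date(), DAILY_IMPORT_UTC, tzinfo=timezone.utc)
    if candidate < xml_ready:
        candidate += timedelta(days=1)  # today's import has already started
    return candidate

# Hypothetical example: an export that becomes available at 18:30 UTC.
ready = datetime(2010, 10, 15, 18, 30, tzinfo=timezone.utc)
run = next_daily_import(ready)
print(run.isoformat(), "wait:", run - ready)
```

Under this simplified model, an export that lands just after 15:00 waits almost a full day for the next daily import, which is roughly in line with the "within 25 hours" figure.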

____________

S@NL - Eesger - www.knoop.nl
Joined: 7 Oct 01
Posts: 384
Credit: 37,572,962
RAC: 6,861
Netherlands
Message 1042391 - Posted: 15 Oct 2010, 18:29:54 UTC - in response to Message 1042092.

As always thanx for the update!
.. NTM a stats export? (even a single would help!)

I'm hoping for a stats-update also. ...


Yay! the stats-import is running, thanx guys (and girls?) You made me a happy man ;) (and all/most stats-lovers will get their stats-update shortly!)
____________
The SETI@Home Gauntlet 2012 april 16 - 30| info / chat | STATS

Ray_GTI-R
Joined: 17 May 99
Posts: 56
Credit: 276,074
RAC: 0
United Kingdom
Message 1042476 - Posted: 15 Oct 2010, 22:03:03 UTC

it ran 24/7 without crashing/rebooting while BOINC was running the CPUs full tilt. If BOINC was closed the random reboots would start again.

This exact thing happened to me once, running SetiClassic. It turned out to be a CPU that was on the brink of failing.

HTH, Ray
____________
The difference between 0 and 1 is greater than the difference between 1 and 1,000,000

J. Mileski
Volunteer tester
Joined: 9 Jun 02
Posts: 129
Credit: 23,319,557
RAC: 48
United States
Message 1042771 - Posted: 16 Oct 2010, 19:13:08 UTC
Last modified: 16 Oct 2010, 19:17:26 UTC

With the new servers on the way, I was wondering about our 3 day outage. I am under the impression that 2 of the 3 days are for the Near-Time Persistency Checker, because it needs exclusive database access. Could a third copy of the database be created and used as the Near-Time Persistency Checker's database? Log the changes, then on the database maintenance day use the log to apply them to the master database, and resynchronize the third DB with the new results from the week. I hope I explained my idea well enough; I am a truck driver and only dabble in computers. I like to assemble components and see if I can make them work.

On edit: I was wondering if mork is stable enough to take on this role.
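
The idea above (run the checker against a copy, log its changes, replay them on the master later, then resynchronize) might look something like this toy sketch. It is not how the project's database or the Near-Time Persistency Checker actually works; the in-memory dict "tables", the `checked` field, and all function names are hypothetical stand-ins.

```python
import copy

def take_snapshot(master):
    """Copy the master table so the checker can work without locking it."""
    return copy.deepcopy(master)

def run_checker(snapshot):
    """Stand-in for the checker: returns a log of (row_id, field, value) changes."""
    return [(row_id, "checked", True)
            for row_id, row in snapshot.items() if not row["checked"]]

def replay(master, change_log):
    """Apply logged changes to the master, skipping rows deleted since the snapshot."""
    for row_id, field, value in change_log:
        if row_id in master:
            master[row_id][field] = value

master = {1: {"checked": False}, 2: {"checked": False}}
snapshot = take_snapshot(master)
master[3] = {"checked": False}     # a new result arrives while the checker runs
log = run_checker(snapshot)        # checker only ever touches the copy
replay(master, log)                # maintenance day: apply the logged changes
snapshot = take_snapshot(master)   # resynchronize the copy for the next week
print(master)
```

The catch a real system would face is conflicts: if the master changes a row after the snapshot was taken, a blind replay could overwrite newer data, so the log would need timestamps or version checks.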
____________

soft^spirit
Joined: 18 May 99
Posts: 6374
Credit: 28,647,395
RAC: 516
United States
Message 1042806 - Posted: 16 Oct 2010, 21:07:40 UTC

Just curious to hear from the lab.. do you guys think the bubble gum will hold? Or do you need more black tape?
____________

Janice


Copyright © 2014 University of California