mork

Message boards : Technical News : mork
Jeff Cobb Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 122
Credit: 40,367
RAC: 0
United States
Message 1041811 - Posted: 13 Oct 2010, 18:55:52 UTC

I'm starting a thread to let people know what's going on with the mork (our boinc DB server) issue.

As most of you know, mork will sometimes hang, requiring a power cycle to boot. There are no footprints as to what causes this. So we strongly suspect hardware.

Mork has a sister machine (mindy, of course) that never really worked (both are donated, used, HW). So mindy is mork's parts machine. This is a little dicey because we don't know why mindy did not work.

The RAM in these machines is arranged on 4 daughter boards. Last week we swapped all four of mindy's identically populated memory boards into mork. But at least one of the "new" sticks was bad, because mork then showed differing amounts of memory across subsequent boots.

So we returned mork's original memory and ran the first three memtest tests. They showed no errors. The final several tests are very time consuming and we may or may not do them, as mork's OS is down for these tests.

Today, we swapped mindy's two power supplies into mork. This is not because we strongly suspect the power supplies but because this is an easy exercise.

If mork hangs again, we are likely to replace the entire machine. Further component testing is becoming too cumbersome and time consuming. And after all, we now have the funds to do this because of your very generous donations (thank you!!!).
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1041833 - Posted: 13 Oct 2010, 19:42:32 UTC

Fun, those gremlins.
Threaten with heavy bodily (hardwarial) harm? ;-)
Jack Zhang
Volunteer tester
Joined: 2 Jul 06
Posts: 206
Credit: 6,142,449
RAC: 0
Canada
Message 1041858 - Posted: 13 Oct 2010, 20:32:09 UTC

General Tip:

Memtest passing is not the whole story. Memory timings set too tight can also cause I/O errors that memtest doesn't detect.

From overclocking experience, rated timings are not necessarily stable.
What if Fiction was Fact and Fact was Fiction and vice versa?
soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1041866 - Posted: 13 Oct 2010, 21:14:00 UTC - in response to Message 1041858.  

There could be "reseat to clear" memory issues, and they MIGHT not be back. Also suspect are electronic (non-physical) "disks", as well as many other possibilities. But it sounds like you proved either memory or BIOS issues on mindy. Might be worth taking another look at?
Janice
Gary Charpentier, Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 30932
Credit: 53,134,872
RAC: 32
United States
Message 1041868 - Posted: 13 Oct 2010, 21:25:43 UTC

Thanks for the update. Much appreciated.

Sounds like an experience I had a long time ago with a memory test. Ran it and it said every chip was good. Re-ran and every chip was a failure. Knew right then and there they were all 100% good. Problem was elsewhere but still in the memory circuits. Turned out to be a broken trace on the motherboard.

Agree that, with cash on hand, it's not worth chasing it down further, but once it is replaced it might be worth fixing to have a box for something else.

zoom3+1=4
Volunteer tester
Send message
Joined: 30 Nov 03
Posts: 66215
Credit: 55,293,173
RAC: 49
United States
Message 1041945 - Posted: 14 Oct 2010, 2:41:56 UTC
Last modified: 14 Oct 2010, 2:43:50 UTC

Daughter boards, yeah, I remember something like that on the Amiga 1000 and its graphics/chip RAM. As boards went they were OK; it was the pins coming up through the daughter board from the motherboard that were the problem. Needless to say, when the computer worked, it worked, but those contacts between the two boards could cause problems. Good luck Jeff, and keep up the good work.
Savoir-Faire is everywhere!
The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST

kittyman, Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1041979 - Posted: 14 Oct 2010, 6:17:28 UTC

Jeff...
Thanks so much for the update on mork. Best of luck on Friday when you crank back up and he will be heavily stressed again.

Meow meow.
"Time is simply the mechanism that keeps everything from happening all at once."

KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1042043 - Posted: 14 Oct 2010, 15:02:49 UTC - in response to Message 1041979.  

Jeff...
Thanks so much for the update on mork. Best of luck on Friday when you crank back up and he will be heavily stressed again.

Meow meow.


Agreed: but how about cranking up Beta Friday? NTM a stats export? (even a single would help!)


Hello, from Albany, CA!...
Byron Leigh Hatch @ team Carl Sagan
Volunteer tester
Joined: 5 Jul 99
Posts: 4548
Credit: 35,667,570
RAC: 4
Canada
Message 1042045 - Posted: 14 Oct 2010, 15:06:45 UTC

thank you Jeff, for the update


Best Wishes
Byron
S@NL - Eesger - www.knoop.nl
Joined: 7 Oct 01
Posts: 385
Credit: 50,200,038
RAC: 0
Netherlands
Message 1042092 - Posted: 14 Oct 2010, 17:24:05 UTC - in response to Message 1042043.  

As always thanx for the update!
.. NTM a stats export? (even a single would help!)

I'm hoping for a stats update also. I've built my system so that it can cope with a hiccup in stats exports, even across the change of a month, but not two, so I really hope you guys can give the export script a go this month.

Could you tell me if you can make it this month? If not, I'd really like to know, because then I will need to do some thinking and programming to make my stats cope with it.

Thanx very much in advance for your reply.
The SETI@Home Gauntlet 2012 april 16 - 30| info / chat | STATS
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1042199 - Posted: 15 Oct 2010, 3:08:54 UTC

I love weird H/W issues. We were using an old Compaq ProLiant 5500 server at work that was handed down from our IS department. One day it started randomly crashing/rebooting. Sometimes it would stay up for a minute, sometimes for 6-7 days. We tried reinstalling the OS several times, swapping out both of its redundant PSUs, pulling out each of the 4 CPUs and running the system with only 1 at a time, swapping out memory riser boards and RAM DIMMs, and swapping out SCSI controllers and drives.

Finally after all of that & several months of troubleshooting we gave up. I installed BOINC on the system & let it run to see what would happen. Turns out that it ran 24/7 without crashing/rebooting while BOINC was running the CPUs full tilt. If BOINC was closed the random reboots would start again.

So we left BOINC on and ran it as part of our infrastructure for 15 more months without a single issue. Recently I retired it from use, as we got some newer, more powerful, and MUCH quieter machines.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
kepan
Joined: 17 Sep 99
Posts: 7
Credit: 27,442,770
RAC: 0
Sweden
Message 1042374 - Posted: 15 Oct 2010, 17:17:31 UTC

Does anyone know why BOINCstats does not update the score for SETI@home?
kittyman, Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1042375 - Posted: 15 Oct 2010, 17:19:46 UTC - in response to Message 1042374.  

Does anyone know why BOINCstats does not update the score for SETI@home?

It has updated my total credits, but not the graphs.
Not sure if that will happen later today, or sort itself out with tomorrow's update.
"Time is simply the mechanism that keeps everything from happening all at once."

Ruopp
Joined: 18 May 99
Posts: 2
Credit: 3,793,074
RAC: 0
Switzerland
Message 1042377 - Posted: 15 Oct 2010, 17:30:48 UTC - in response to Message 1042375.  

Does anyone know why BOINCstats does not update the score for SETI@home?

It has updated my total credits, but not the graphs.
Not sure if that will happen later today, or sort itself out with tomorrow's update.



Extracted from Boinc stats FAQ:

How often is BOINCstats updated?

BOINCstats checks for XML updates every two hours, and, when available, downloads them, reads the content into the database and updates the credits and ranks.
The numbers from this update are used to display current credits and ranks for the stats only.
The incremental updates take between 15 minutes and one hour to complete.

At 15:00 GMT each day, all new info from the XML files is imported into the BOINCstats database. New users/teams/countries are inserted at this point, and daily/weekly/monthly numbers are calculated. When there is no new XML file for more than a day, the stats will show zero credits for those days.
The numbers from this update are used to display the numbers on the frontpage and the detailed stats pages.
The daily update takes about 2.5 hours to complete.

The same update, but just for hosts, runs each day at 1:00 GMT and takes about five hours to complete.

Only users, teams, hosts and countries with at least one (1) total credit are listed!

When an update is running, there is no check for new XML files until the update is finished. This is why the time since last update can be more than one hour.

To date, BOINCstats has never failed to run its daily update, which means: when new credit is granted and the XML output by the project is OK, you'll get your credit on BOINCstats within 25 hours.
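As a rough illustration of the daily timing in that FAQ, here is a minimal sketch that assumes only what the FAQ states (a full import starting at 15:00 GMT each day). The function names are hypothetical and this is not BOINCstats code; it just shows how long a freshly exported XML file waits before the daily import picks it up.

```python
from datetime import datetime, timedelta, timezone

# Assumption from the FAQ: the full daily import starts at 15:00 GMT (UTC).
DAILY_IMPORT_HOUR_UTC = 15


def next_daily_import(export_time: datetime) -> datetime:
    """Hypothetical helper: start of the first daily import at or after export_time."""
    t = export_time.astimezone(timezone.utc)
    run = t.replace(hour=DAILY_IMPORT_HOUR_UTC, minute=0, second=0, microsecond=0)
    if run < t:
        # Today's import has already started, so wait for tomorrow's.
        run += timedelta(days=1)
    return run


def hours_until_daily_import(export_time: datetime) -> float:
    """Hypothetical helper: hours a new export waits before the daily import starts."""
    return (next_daily_import(export_time) - export_time).total_seconds() / 3600
```

For example, an export landing at 17:30 UTC on 15 Oct 2010 would wait until 15:00 UTC the next day, 21.5 hours later, for the front-page numbers; per the FAQ, the two-hourly incremental updates refresh current credits in the meantime.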

S@NL - Eesger - www.knoop.nl
Joined: 7 Oct 01
Posts: 385
Credit: 50,200,038
RAC: 0
Netherlands
Message 1042391 - Posted: 15 Oct 2010, 18:29:54 UTC - in response to Message 1042092.  

As always thanx for the update!
.. NTM a stats export? (even a single would help!)

I'm hoping for a stats-update also. ...


Yay! the stats-import is running, thanx guys (and girls?) You made me a happy man ;) (and all/most stats-lovers will get their stats-update shortly!)
The SETI@Home Gauntlet 2012 april 16 - 30| info / chat | STATS
Ray_GTI-R
Joined: 17 May 99
Posts: 56
Credit: 276,906
RAC: 0
United Kingdom
Message 1042476 - Posted: 15 Oct 2010, 22:03:03 UTC

it ran 24/7 without crashing/rebooting while BOINC was running the CPUs full tilt. If BOINC was closed the random reboots would start again.

This exact thing happened to me once, running SetiClassic. It turned out to be a CPU that was on the brink of failing.

HTH, Ray
The difference between 0 and 1 is greater than the difference between 1 and 1,000,000
J. Mileski
Volunteer tester
Joined: 9 Jun 02
Posts: 632
Credit: 172,116,532
RAC: 572
United States
Message 1042771 - Posted: 16 Oct 2010, 19:13:08 UTC
Last modified: 16 Oct 2010, 19:17:26 UTC

With the new servers on the way, I was wondering about our 3-day outage. I am under the impression that 2 of the 3 days are for the Near-Time Persistency Checker, because it needs exclusive database access. I was wondering if a third database copy could be created and used as the Near-Time Persistency Checker's database? Log the changes, then on the database maintenance day use the log to apply them to the master database, then resynchronize the 3rd DB with the new results from the week. I hope I explained my idea well enough; I am a truck driver and only dabble in computers. I like to assemble components and see if I can make them work.

On edit: I was wondering if mork is stable enough to take on this role.
kittyman, Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1042775 - Posted: 16 Oct 2010, 19:27:16 UTC - in response to Message 1042771.  

With the new servers on the way, I was wondering about our 3-day outage. I am under the impression that 2 of the 3 days are for the Near-Time Persistency Checker, because it needs exclusive database access. I was wondering if a third database copy could be created and used as the Near-Time Persistency Checker's database? Log the changes, then on the database maintenance day use the log to apply them to the master database, then resynchronize the 3rd DB with the new results from the week. I hope I explained my idea well enough; I am a truck driver and only dabble in computers. I like to assemble components and see if I can make them work.

On edit: I was wondering if mork is stable enough to take on this role.

Dunno.....and I'm not gonna bother Eric with such questions until the new servers are in the closet and producing.

There was conjecture long ago that a certain setup would allow continuous uptime of the project on our side, whilst allowing for proper backups on the fly.

The new science database will certainly have enough horsepower to allow much of the heavy duty science work to be done without much interference to the daily routine of the project.

We shall have to wait until everything is up and humming before those questions can be entertained.
"Time is simply the mechanism that keeps everything from happening all at once."

soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1042806 - Posted: 16 Oct 2010, 21:07:40 UTC

Just curious to hear from the lab.. do you guys think the bubble gum will hold? Or do you need more black tape?
Janice
kittyman, Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1042815 - Posted: 16 Oct 2010, 21:29:13 UTC - in response to Message 1042806.  

Just curious to hear from the lab.. do you guys think the bubble gum will hold? Or do you need more black tape?

Eric told me he was going to try to have a look at why the downloads seem so bottled up and the inbound bandwidth is so high.
Not sure when.
"Time is simply the mechanism that keeps everything from happening all at once."



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.