recent woes



Message boards : Technical News : recent woes

[seti.international] Dirk Sadowski (Project donor)
Volunteer tester
Joined: 6 Apr 07
Posts: 7101
Credit: 60,898,969
RAC: 17,263
Germany
Message 1039266 - Posted: 7 Oct 2010, 23:09:31 UTC - in response to Message 1039257.

Jeff, thanks for the news!

____________
BR

SETI@home Needs your Help ... $10 & U get a Star!

Team seti.international

Das Deutsche Cafe. The German Cafe.

Dudo
Joined: 25 Dec 99
Posts: 2
Credit: 6,648,547
RAC: 0
Croatia
Message 1039269 - Posted: 7 Oct 2010, 23:18:40 UTC

In my experience with overheating on a heavily loaded main DB server, the hardware will probably be totally dead within a year (first the RAM, then the disks, and finally the motherboard). At the same time, the replica server (same hardware, in the same rack) stayed alive, and alive, ... and made a good spare for the main server :-)
____________

Widouxmaker
Joined: 7 May 02
Posts: 12
Credit: 185,930
RAC: 0
United States
Message 1039284 - Posted: 8 Oct 2010, 0:06:54 UTC

I'd like more units but I'm grateful for the few I'm getting. You guys keep up the good work and I'll send another case...heh...Thanks!
____________
You talk'n to me?

Jim_S (Project donor)
Joined: 23 Feb 00
Posts: 4526
Credit: 18,813,736
RAC: 8,676
United States
Message 1039299 - Posted: 8 Oct 2010, 0:28:11 UTC

Thanks for the news Jeff...I'm here for the Science...I'll still be here when needed.
You guys work Hard...I for one appreciate it.
____________

I Desire Peace and Justice, Jim Scott (Mod-Ret.)

rebest (Project donor)
Volunteer tester
Joined: 16 Apr 00
Posts: 1296
Credit: 32,872,183
RAC: 9,783
United States
Message 1039301 - Posted: 8 Oct 2010, 0:30:26 UTC

Thanks for the update!
____________

Join the PACK!

Odysseus
Volunteer tester
Joined: 26 Jul 99
Posts: 1786
Credit: 3,830,311
RAC: 334
Canada
Message 1039303 - Posted: 8 Oct 2010, 0:33:27 UTC - in response to Message 1039162.

I have tasks ready to upload that will expire in 6 days!

I expect the admins will try and clear the backlog of uploading & reporting tasks before turning on the validator(s). As long as a given result is found when the parent WU goes up for validation, having been reported late makes no difference. To put it another way, the “deadline” can be understood, for all practical purposes, as the earliest moment that a task will be liable to validation—rather than automatic rejection. It’s even possible for a result to be accepted after missing a validator pass: if the validation is unsuccessful (whether due to errors or other missing results) or was inconclusive, resulting in a ‘resend’, it effectively gets a deadline extension to match the replacement tasks.

Anyway, the short version is that I wouldn’t give up on any work until I saw that the corresponding WUs had been validated without it.
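The rule described above can be sketched in a few lines of Python. This is only an illustration of the reasoning in this post, not the actual BOINC validator logic; the function name `can_still_count` and its parameters are made up for the example, and `validated_at` is assumed to mean the time of a successful validation (a failed or inconclusive pass counts as "not validated yet").

```python
from datetime import datetime, timedelta
from typing import Optional


def can_still_count(reported_at: Optional[datetime],
                    validated_at: Optional[datetime]) -> bool:
    """Hypothetical model: can a result still be accepted?

    reported_at  -- when the result was reported (None = not reported yet)
    validated_at -- when the parent WU was successfully validated
                    (None = not validated yet, or the pass was inconclusive)
    """
    # WU not validated yet: the result is still in the running,
    # no matter how far past its nominal deadline it is.
    if validated_at is None:
        return True
    # WU already validated: only results that had been reported
    # by then were available to the validator.
    return reported_at is not None and reported_at <= validated_at


deadline = datetime(2010, 10, 14)
late_report = deadline + timedelta(days=2)

# Reported two days late, but the WU has not been validated yet.
print(can_still_count(late_report, None))
# Reported two days late, WU validated the day after the deadline.
print(can_still_count(late_report, deadline + timedelta(days=1)))
```

Note that the deadline itself never appears in the check: in this simplified model only the order of "reported" and "validated" matters.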
____________

ScarabDrowner
Volunteer tester
Joined: 13 Sep 03
Posts: 90
Credit: 456,378
RAC: 0
United States
Message 1039370 - Posted: 8 Oct 2010, 2:44:40 UTC - in response to Message 1039303.

The server status page is mostly green again, yay!
____________

H Elzinga
Volunteer tester
Joined: 20 Aug 99
Posts: 125
Credit: 8,105,524
RAC: 743
Netherlands
Message 1039454 - Posted: 8 Oct 2010, 7:11:10 UTC - in response to Message 1039269.

In my experience with overheating on a heavily loaded main DB server, the hardware will probably be totally dead within a year (first the RAM, then the disks, and finally the motherboard). At the same time, the replica server (same hardware, in the same rack) stayed alive, and alive, ... and made a good spare for the main server :-)


BS

We had a failure of the AC in our main data center over a year ago (May 2009) while everything was in full use during business hours.
The temperature inside the cabinets exceeded 48 degrees (we measure in Celsius) and many servers shut down on their overheat protection.

We discovered that we had only one casualty when we brought everything up again.
In the months following, no other machines failed.
If the event had any noticeable impact on the whole server park, it could be a slight increase in drive failures.
On the other hand, this could just as well be due to the age of the machines, so there is no hard evidence.

Across 300 pieces of hardware there has not been a single RAM or MB failure.
____________

ScarabDrowner
Volunteer tester
Joined: 13 Sep 03
Posts: 90
Credit: 456,378
RAC: 0
United States
Message 1039457 - Posted: 8 Oct 2010, 7:19:51 UTC - in response to Message 1039454.

48C? I *WISH* my laptop would run at 48C... It's currently crunching at 68-70C.
____________

ScarabDrowner
Volunteer tester
Joined: 13 Sep 03
Posts: 90
Credit: 456,378
RAC: 0
United States
Message 1039462 - Posted: 8 Oct 2010, 7:28:34 UTC - in response to Message 1039458.

Laptops are, well, laptops.
They have always been constrained by cooling problems.


I understand that, but to hear 48C described as an overheat experience... That's near chilly for a computer in my experience.
____________

Ianab
Volunteer tester
Joined: 11 Jun 08
Posts: 673
Credit: 12,626,070
RAC: 6,054
New Zealand
Message 1039466 - Posted: 8 Oct 2010, 7:36:01 UTC - in response to Message 1039457.

48C? I *WISH* my laptop would run at 48C... It's currently crunching at 68-70C.


Yeah, but stick your laptop in a cupboard that's at 48C and see what happens. Raise the ambient temp by 20C, and you pretty much raise the component temp by 20C. Hopefully things shut down; otherwise something breaks, often in unpredictable ways. The system board or power supply capacitors are a good example: high temps speed up their ageing, and they don't necessarily die outright. They can just lose capacitance and make the machine hang at random.

It's certainly possible that the heat treatment has prematurely aged a motherboard in one of the servers. Before the cookup it was just within spec, now it's just outside and weird things happen.
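As a rough back-of-the-envelope sketch of the ambient-plus-rise arithmetic above (Python, purely illustrative): it assumes the temperature rise over ambient stays constant at a fixed load, which is only an approximation, and the 25C room temperature is an assumed figure, not something reported in this thread. The function names are made up for the example.

```python
def rise_over_ambient(component_c: float, ambient_c: float) -> float:
    """How far a component runs above the surrounding air, in degrees C."""
    return component_c - ambient_c


def component_temp(ambient_c: float, rise_c: float) -> float:
    """Estimated component temperature, assuming the rise over ambient
    stays the same when the ambient temperature changes."""
    return ambient_c + rise_c


# A laptop crunching at about 69C in a room assumed to be at 25C.
rise = rise_over_ambient(69, 25)

# The same laptop inside a cabinet whose air is at 48C.
print(component_temp(48, rise))
```

So a machine that is comfortable at 69C on a desk would be looking at something in the 90s inside a 48C cabinet, which is where thermal shutdowns start to kick in.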

Ian

RoosStar
Joined: 16 Oct 99
Posts: 48
Credit: 5,995,896
RAC: 8,557
Netherlands
Message 1039482 - Posted: 8 Oct 2010, 9:28:24 UTC - in response to Message 1039454.

The temperature inside the cabinets exceeded 48 degrees (we measure in Celsius) and many servers shut down on their overheat protection.

If I understand him correctly, he did not say that the servers were running at 48 degrees, but that it was the "room temperature" inside the cabinets. :D
____________

Richard Gardner
Joined: 9 Jul 03
Posts: 1
Credit: 736,823
RAC: 0
United Kingdom
Message 1039484 - Posted: 8 Oct 2010, 9:52:48 UTC

I have to say that for those of us who only have computers at work, where there is a shut down of everything over the weekend, the fact that the weekly outage always falls in the middle of the week means a very reduced window to upload/download. I have 3 big Astropulses that my machine hammered in a great time and finished on Monday evening but they may not report until next Monday.

I know this is a bad time but the 3 day outage hits me like this every week.
I appreciate that work can only be done when people are available but I thought I should point out that my long term results (7 years) are slowly dwindling away.

Chris S (Project donor)
Volunteer tester
Joined: 19 Nov 00
Posts: 32064
Credit: 13,751,489
RAC: 26,237
United Kingdom
Message 1039502 - Posted: 8 Oct 2010, 12:13:35 UTC

Okay my 2c worth....

Firstly, thanks Jeff and Eric for taking the time to let us know what is going on. It might not seem so on occasions, but it IS appreciated.

Sometimes shutting down is the worst thing you can do to a system that is supposed to be available 24x7.

Yep, but as Seti isn't meant to be up 24/7 that's not a problem here.


Exactly. Boinc was introduced on the basis of "donating your idle computer time to science projects". Note idle computer time, it was never originally envisaged that power crunchers would want to run it 24/7. However most projects are actually up 24/7 and try to maintain that, but it has never been an agreed part of the offering.

That's appalling
news of what's up over a week apart
come on
That's no way to treat CONTRIBUTORS


That's a bit of a harsh comment there. The guys are doing their level best with minimal funding and old equipment. And they also have their own life and families to be part of as well.

____________
Damsel Rescuer, Uli Devotee, Julie Supporter, ES99 Admirer,
Raccoon Friend, Anniet fan, didn't take pot advice!


Eewec
Joined: 28 Nov 05
Posts: 19
Credit: 190,633
RAC: 0
United Kingdom
Message 1039510 - Posted: 8 Oct 2010, 12:29:28 UTC

.... I have a question. I've just looked at the server status page and in the list is db_purge.x86_64. I've also noticed that during the downtimes, when the upload/download servers are offline, "Workunits waiting for db purging" and "Results waiting for db purging" never seem to zero out, unlike the other stat lines. I'm curious as to why this is. Surely, if the db_purge server is up and no new records are being added to the queue, the purge queue should drain to zero.

Like I said, just curious.
____________

Earendil's Star
Joined: 1 Jun 03
Posts: 13
Credit: 6,493,876
RAC: 0
Zimbabwe
Message 1039512 - Posted: 8 Oct 2010, 12:30:45 UTC

This entry on the Seti home page is making me smile:

The weekly Seti outage allows for a "focus on science processing." Hmmm.

I feel an urge to giggle hysterically when I read, "On Friday, you may experience connectivity issues as the servers catch up with demand."

On the other hand the tasks of Jeff and Matt et al are no doubt arduous and often thankless and I am grateful for their efforts.

There may be merit in what some have suggested in terms of shutting it all down until ways can be found to make Seti run reliably.


____________

Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 738
Credit: 232,725
RAC: 10
United Kingdom
Message 1039538 - Posted: 8 Oct 2010, 13:32:30 UTC - in response to Message 1038958.

I'm sure you already know this, and the equipment you're using is very likely much better than what I've used, but overheating problems followed by "hangs" might be caused by bad motherboard capacitors. Check the tops of the caps to see if they're expanded or open and leaking--if they are the motherboard is a goner. Sorry if this is obvious to you folks, but figured I would throw this in since I've run across it in the past. Good luck.


Hi Richard, welcome to the forums.

I expressed a similar theory in 2 posts in the thread http://setiathome.berkeley.edu/forum_thread.php?id=56160&nowrap=true#959975 in November last year, and again on 1 January this year.

I have personally seen this problem in recent years, on a friend's Athlon XP system. It happened first when plugging or unplugging USB devices caused a system hang, with Windows BSOD. A few weeks later the Blue Screens got more frequent, and an examination of the motherboard revealed about 50% of the capacitors had bulging or brown stained tops.

While I am sure that server grade hardware should be built to higher quality standards, and better tolerances than ordinary consumer equipment, I think some of the kit that SETI@home uses is pre-production or prototype.

Add to that the fact that the ageing of the capacitors may have been accelerated by recent overheating.

I hope Matt, Jeff and Eric may find this post helpful, I would certainly put flaky capacitors near the top of my suspect list.
I have just done a Google search on "bulging capacitors" that produced lots of results, including many images.

Keith.

George E. Lass
Joined: 18 May 99
Posts: 2
Credit: 1,130,349
RAC: 306
United States
Message 1039578 - Posted: 8 Oct 2010, 15:02:02 UTC - in response to Message 1039303.

I have tasks ready to upload that will expire in 6 days!

I expect the admins will try and clear the backlog of uploading & reporting tasks before turning on the validator(s). As long as a given result is found when the parent WU goes up for validation, having been reported late makes no difference. To put it another way, the “deadline” can be understood, for all practical purposes, as the earliest moment that a task will be liable to validation—rather than automatic rejection. It’s even possible for a result to be accepted after missing a validator pass: if the validation is unsuccessful (whether due to errors or other missing results) or was inconclusive, resulting in a ‘resend’, it effectively gets a deadline extension to match the replacement tasks.

Anyway, the short version is that I wouldn’t give up on any work until I saw that the corresponding WUs had been validated without it.


Thanks for that info

George
____________


Copyright © 2014 University of California