recent woes



Message boards : Technical News : recent woes

EnsonDeath
Joined: 1 Aug 10
Posts: 61
Credit: 2,317,715
RAC: 0
United States
Message 1039070 - Posted: 7 Oct 2010, 6:25:35 UTC

This is kind of a cross-thread question about the old equipment failures and the new server: would it be feasible or desirable to integrate a high-speed, real-time off-site backup of the servers via (just as an example) http://www.asperasoft.com/en/products/synchronization_23/aspera_sync_23
and https://aws.amazon.com/ (or another company),

if they would agree to provide services in exchange for the tax deduction?

Just tossing the idea out there while you're implementing new equipment.

[B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1142
Credit: 3,739,748
RAC: 3,945
United Kingdom
Message 1039074 - Posted: 7 Oct 2010, 6:59:36 UTC

Thanks for the info. Any idea when the work I sent up and reported before this latest outage will show up on my account? One task is showing on one computer but not the other; it was reported around 19:00 UTC on the 29th of Sept.
____________

Berserker
Volunteer tester
Joined: 2 Jun 99
Posts: 105
Credit: 5,386,463
RAC: 0
United Kingdom
Message 1039077 - Posted: 7 Oct 2010, 7:15:48 UTC

Thanks for the update Jeff and Eric.

Please don't rush the memtest tests - If it takes leaving Mork offline for a day, then that's what it takes. Rooting out intermittent faults takes time.

You could also consider Prime95/mprime as a general system stability test. It's not as specific in pointing at the cause of a fault, but it might trigger the fault while you're watching, which may help in itself.
____________
Stats site - http://www.teamocuk.co.uk - still alive and (just about) kicking.

Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1715
Credit: 204,912,905
RAC: 24,535
Australia
Message 1039084 - Posted: 7 Oct 2010, 8:55:54 UTC - in response to Message 1038955.

Chasing hardware problems can be a real nightmare.

But as a software "engineer" once said to me...
"Hardware NEVER fails"

Yeah right :-)

T.A.

slash2084
Volunteer tester
Joined: 19 Nov 02
Posts: 1
Credit: 78,962
RAC: 0
Germany
Message 1039088 - Posted: 7 Oct 2010, 10:33:07 UTC

Hi all,
When will uploads for SETI@home/AstroPulse Beta be back online? I have 79 finished WUs waiting for upload.

Thanks for any answer ;)
____________

slepojpju
Volunteer tester
Joined: 19 Sep 10
Posts: 2
Credit: 1,605,223
RAC: 0
Russia
Message 1039091 - Posted: 7 Oct 2010, 11:13:33 UTC
Last modified: 7 Oct 2010, 11:17:50 UTC

Interesting. When will data exchange resume? The service keeps going down. How do you guys expect to catch a signal from space if the server keeps crashing? :( :( :(

Don Austin
Joined: 30 Jul 99
Posts: 1
Credit: 954,939
RAC: 0
Japan
Message 1039110 - Posted: 7 Oct 2010, 12:31:34 UTC - in response to Message 1038933.
Last modified: 7 Oct 2010, 12:35:07 UTC

I have currently around 30 tasks waiting to be uploaded and according to my statistics chart, it hasn't been updated since sept 21st. I have been trying to manually upload for the past week or so and keep getting the "internet access is ok but servers may be temporarily down" message. Is this due to ongoing tech issues on your end or is there something wrong on my end?


I have exactly the same problem with my uploads as well.
____________

rriverstone
Joined: 29 Sep 10
Posts: 22
Credit: 1,854
RAC: 0
United States
Message 1039125 - Posted: 7 Oct 2010, 13:32:54 UTC

I know this doesn't make sense, but it's the only Get Well card I could find with stars on it, and you gots a boo boo.

____________
For small creatures such as we the vastness is bearable only through love.
Carl Sagan

Bishamon
Joined: 18 Jun 03
Posts: 7
Credit: 7,806,675
RAC: 2,459
Canada
Message 1039150 - Posted: 7 Oct 2010, 15:20:51 UTC - in response to Message 1039110.


Same here.
____________

Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 3406
Credit: 20,089,256
RAC: 22,161
Sweden
Message 1039154 - Posted: 7 Oct 2010, 15:35:50 UTC

No hurry in getting the system up. If SETI is down for a week more, or even a month, or a year to solve the hardware problem, the world will not end.

Take the time you need.

____________

soft^spirit
Joined: 18 May 99
Posts: 6374
Credit: 28,631,059
RAC: 6
United States
Message 1039157 - Posted: 7 Oct 2010, 15:43:27 UTC - in response to Message 1039093.

They are hardly children. The data is safely on disk despite some memory and server problems. A very limited budget, aging equipment and a limited staff are taking their toll. Patience please, while they try to fix the problems.

Thank you,

Janice
____________

Janice

George E. Lass
Joined: 18 May 99
Posts: 2
Credit: 1,115,435
RAC: 1,056
United States
Message 1039162 - Posted: 7 Oct 2010, 15:51:21 UTC - in response to Message 1039150.

I have tasks ready to upload that will expire in 6 days!
____________

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 12,148,788
RAC: 4,127
United States
Message 1039201 - Posted: 7 Oct 2010, 17:15:53 UTC - in response to Message 1038841.

Tom, the thing is there are a lot of folks who are 'SETI only' advocates. They have no interest in any 'foreign' distributed processing BOINC project (foreign defined as anything but SETI). For this group, given that SETI is increasingly a part time accessible project (running 'normally' is now defined as running with a planned half-week outage for uploads/downloads/reporting, and then you add the all too frequent outages being coped with these days), a one day cache will essentially shut them down for 60 to 75 percent of any given week.

This population is one which has been quite resistant to the multi-project functionality which has been part and parcel of the BOINC client. Enforcing a one day cache might well be seen as unduly targeting this group of strong SETI advocates -- a number of whom no doubt are regular contributors in terms of money and perhaps hardware. So taking a step which would antagonise this group would probably not be in the project's self interest.

Indeed, the release of work units with average run times (say of three hours or so) but with due dates well over a month out suggests instead a project recognition first of its own part time functioning status, but also of a need to support large per user caches to tide over SETI specific users during not only the regular half week outage cycle, but also the frequent 'unplanned' outages that have increasingly defined SETI.

For those who have become increasingly multi-project oriented (either because of approval of the multi-project concept or in response to the decreasing availability of SETI), outage streaks (such as we have seen here over the past several weeks) are something we get to cope with by redirecting our CPU and GPU usage to one of the many other smaller, lower resourced, and more reliable projects that populate the BOINC environment.

For me, I typically suspend SETI processes during SETI outages since I have plenty of other projects running. It is only with the current extended outage that I've found I need not only to suspend processing but also to kill off SETI work units not yet started -- though the client will do that on its own when due dates actually pass.

The thing is, there remain hundreds if not thousands of users for whom SETI is the only project they will ever connect to. They are loyal to SETI (and for all I know, some might view those who run (and advocate) multi-project configurations as being in some way tainted). Forcing a one day cache on them is not the sort of thing I could ever see happening.
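
The cache arithmetic in this post can be checked with a rough sketch. Everything here is an illustrative assumption rather than a project figure: the function name, the idea that a host enters each outage with a full cache and cannot fetch work until it ends, and the outage lengths passed in.

```python
def idle_fraction(cache_days: float, outage_days: float,
                  period_days: float = 7.0) -> float:
    """Fraction of each period a SETI-only host sits idle.

    Assumes (hypothetically) that the host starts the outage with a full
    cache and gets no new work until the outage is over.
    """
    if cache_days < 0 or outage_days < 0 or period_days <= 0:
        raise ValueError("durations must be non-negative, period positive")
    # An outage cannot last longer than the period being measured.
    outage_days = min(outage_days, period_days)
    # The host only idles once the cached work has run out.
    idle_days = max(0.0, outage_days - cache_days)
    return idle_days / period_days


if __name__ == "__main__":
    for outage in (3.5, 5.5, 6.25):
        share = idle_fraction(cache_days=1.0, outage_days=outage)
        print(f"1 day cache, {outage} day outage: idle {share:.0%} of the week")
```

Under these assumptions the planned half-week outage alone leaves a one day cache idle for about a third of the week; figures in the 60 to 75 percent range correspond to a host being cut off for a little over five to a little over six days out of seven once unplanned outages are counted.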



Could these latest issues be a result of the flood of work coming in after opening the pipe to the outside world?

Just because the SETI machines have gone into maintenance doesn't mean that all those BOINC machines are dozing. I believe that the SETI members have generally increased the amount of local work they hold to cover the normal 3 day outage and are building up a massive backlog that has to look like a tsunami once the gates are reopened. Now that the outage is longer, the backlog is likely even larger, causing even more problems for the SETI systems.

Is there any way that the SETI team can force the SETI preferences to limit work to only one day on local machines until this gets sorted out?

Sometimes shutting down is the worst thing you can do to a system that is supposed to be available 24x7.


____________

Tom95134
Project donor
Joined: 27 Nov 01
Posts: 213
Credit: 3,354,924
RAC: 986
United States
Message 1039239 - Posted: 7 Oct 2010, 18:34:53 UTC - in response to Message 1039201.

Thanks for the information.

I fully understand the one project only approach and I used to be in that group. However, by doing some careful tweaking of preferences I'm now servicing three projects with the major emphasis on SETI.

My comment was based on having been involved with other systems that were designed around the concept of having the central systems always available. This worked well until, for various reasons, the central systems went down and then had to catch up. The stress placed on these systems caused more problems, which resulted in a cascade of downtime, followed by recovery stress, followed by more downtime.

An example would be just how much impact there is on the SETI systems when everybody is out there hammering away to upload and then report hundreds or thousands of tasks.

The extended due dates at least keep one from losing credits.

It was just a thought.
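
One common way to soften that kind of recovery stress is for each client to wait a randomized, exponentially growing delay between retries, so that hosts do not all reconnect in the same instant. The sketch below is only an illustration of that general technique: the function name and the base and cap values are hypothetical, and nothing here is taken from the BOINC client or the SETI servers.

```python
import random


def backoff_delay(attempt: int, base_s: float = 60.0,
                  cap_s: float = 4 * 3600.0, rng=None) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Exponential backoff with "full jitter": the ceiling doubles with each
    failed attempt up to cap_s, and the actual wait is drawn uniformly from
    [0, ceiling] so that many hosts spread out instead of retrying together.
    All parameter values are illustrative assumptions.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    rng = rng if rng is not None else random
    # Clamp the exponent so very long outages cannot overflow the float.
    ceiling = min(cap_s, base_s * (2 ** min(attempt, 32)))
    return rng.uniform(0.0, ceiling)


if __name__ == "__main__":
    rng = random.Random(2010)  # fixed seed only to make the demo repeatable
    for attempt in range(6):
        print(f"retry {attempt}: wait {backoff_delay(attempt, rng=rng):.0f} s")
```

The jitter matters as much as the doubling: without it, every host that failed at the same moment would also come back at the same moment.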

____________

PhonAcq
Joined: 14 Apr 01
Posts: 1622
Credit: 22,185,548
RAC: 3,784
United States
Message 1039245 - Posted: 7 Oct 2010, 19:00:43 UTC

I agree with BarryAZ's comments.

While reading them, however, and I don't know why, I drifted off into the nether world of fantasy and envisioned a marauding gang of power users moving from project to project en masse, grabbing as much work as is available that could be legitimately processed, crunching until that project crashes or otherwise has no work, and then moving on to the next project. They go back to seti between raids but move to the next target project when seti crashes again. The trigger would be any time seti crashes and no work is available. To keep some coherency, they keep a list of target projects that becomes the common round-robin list. Nobody knows when the marauders will pillage the next project because seti is randomly stable. It is rumored that the humans responsible for this gang actually wear rags on their heads, swill copious amounts of Captain Morgan's, and sing filthy songs while they adjust their resource shares and enable and disable the different projects on their rigs. Their culture is impatient and therefore they only employ a 1 day cache.

I can feel the Tropic sea breeze now.

Robert Ribbeck
Joined: 7 Jun 02
Posts: 644
Credit: 5,283,174
RAC: 0
United States
Message 1039252 - Posted: 7 Oct 2010, 19:38:58 UTC
Last modified: 7 Oct 2010, 19:39:57 UTC

The fact still remains that, for whatever reasons, in the last 30 days users have not been able to access or use seti more than 50% of the time.

That's appalling.
News of what's up comes over a week apart.
Come on.
That's no way to treat CONTRIBUTORS.
____________

Jeff Cobb
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 110
Credit: 40,367
RAC: 0
United States
Message 1039257 - Posted: 7 Oct 2010, 22:27:07 UTC

We just ran a series of memory tests on mork, thus the forum outage. We ran a subset of memtest86+ and there were no errors. We will run more tests during next week's outage.
____________

SciManStev
Project donor
Volunteer tester
Joined: 20 Jun 99
Posts: 4861
Credit: 81,833,330
RAC: 39,858
United States
Message 1039258 - Posted: 7 Oct 2010, 22:39:53 UTC - in response to Message 1039257.

Excellent news! You guys are doing an excellent job!

Steve
____________
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 15,571,329
RAC: 11,467
United States
Message 1039259 - Posted: 7 Oct 2010, 22:43:25 UTC

So, any idea when we can report and get new work?
____________


PROUD MEMBER OF Team Starfire World BOINC

soft^spirit
Joined: 18 May 99
Posts: 6374
Credit: 28,631,059
RAC: 6
United States
Message 1039260 - Posted: 7 Oct 2010, 22:49:07 UTC

Keep us posted if there is anything you need please.

I have a hunch we would all like to see things smooth out.
____________

Janice


Copyright © 2014 University of California