Panic Mode On (78) Server Problems?


Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8550
Credit: 50,379,826
RAC: 50,659
United Kingdom
Message 1302086 - Posted: 4 Nov 2012, 14:32:20 UTC - in response to Message 1302085.

I was just checking the same thing. Across 7 machines, the server thinks I've got more than 1500 tasks in progress. But I've actually got 91 - and almost all of those are because two machines have started getting lost tasks resent within the last hour.
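The gap described above (tasks the server counts as in progress but the client never received, the "ghosts" of this thread) is simple arithmetic. A minimal sketch, assuming a hypothetical helper name `ghost_count`; the 1500 figure is only a lower bound, since the post says "more than 1500":

```python
def ghost_count(server_in_progress: int, local_tasks: int) -> int:
    """Tasks the server lists as in progress that the client does not hold."""
    # Clamp at zero in case the client briefly holds more than the server lists.
    return max(server_in_progress - local_tasks, 0)


# Figures from the post above: at least 1500 on the server, 91 on the machines.
print(ghost_count(1500, 91))  # 1409, so at least that many ghosts
```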

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4101
Credit: 33,140,679
RAC: 8,337
United Kingdom
Message 1302088 - Posted: 4 Nov 2012, 14:35:44 UTC - in response to Message 1302073.

Results ready to send is now 0 and 0, and scheduler contacts to Main and Beta are going through without problem,

Claggy

Not necessarily so. Take my host 5828732, which

Sorry, I should have said scheduler contacts are going through more speedily when reporting only; trying to get the resends resent is still hit and miss.

Claggy

Bill G
Project donor
Joined: 1 Jun 01
Posts: 347
Credit: 41,991,571
RAC: 77,969
United States
Message 1302090 - Posted: 4 Nov 2012, 14:40:27 UTC - in response to Message 1302086.

I have one computer that has just over 400 tasks on it, while the server thinks it has just short of 5,200...
It cannot connect to the server, and should I get that many tasks I will have to abort most of them because they will never finish in time. I do keep trying to connect but am unable to.
I guess there is nothing I can do about this right now.
____________

Gary Charpentier
Project donor
Volunteer tester
Joined: 25 Dec 00
Posts: 12577
Credit: 6,881,707
RAC: 6,589
United States
Message 1302096 - Posted: 4 Nov 2012, 14:49:04 UTC

Shortly the server will break from the ghost storm it is creating. It will run out of result storage space. Perhaps then Eric and/or Jeff will take action. With Matt out, I hope they have notes on the last time this happened.

____________

fscheel
Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1302100 - Posted: 4 Nov 2012, 14:58:54 UTC - in response to Message 1302053.

Tasks in progress is now showing 592, with only 1 completed task on the machine. Is there something I need to do? Or do I just sit back and wait for it to correct itself? This machine has only received 2 tasks in the last 18 hours.


Wow... within minutes of posting this, the machine actually got 20 "resent lost tasks". Hoping this is a sign of things to come.



After getting the 20 resent tasks crunched and reported, I have not got anything else. Since it appears the lost tasks are all for my GPU, I set it to only use the GPU. However, instead of getting lost tasks resent, it appears to try to send new tasks, and the number of lost tasks is up to 701.

Filipe
Joined: 12 Aug 00
Posts: 111
Credit: 4,106,573
RAC: 434
Portugal
Message 1302102 - Posted: 4 Nov 2012, 15:03:36 UTC

What if some of us just suspend network contact with the SETI server for maybe 12 hours?

It could help, since the server is being hammered with 1300 queries/second.

Most of which are the same computers desperately trying again and again to report or receive new tasks?
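The idea above, that spacing out retries relieves a server hit by the same hosts over and over, is commonly implemented as exponential backoff with random jitter. A minimal sketch under stated assumptions: this is not the BOINC client's actual retry policy, and `backoff_delay`, the 60-second base and the 12-hour cap are illustrative choices (the cap simply echoes the 12 hours suggested in the post).

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 60.0, cap: float = 12 * 3600.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    The ceiling doubles with every failed attempt up to `cap`; picking a
    random point below it keeps many hosts from retrying at the same instant.
    """
    rng = rng or random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


# Illustrative run: the waits for the first five failed contacts.
rng = random.Random(2012)
for attempt in range(5):
    print(attempt, round(backoff_delay(attempt, rng=rng)), "seconds")
```

With a scheme like this, a host that keeps failing contacts the server less and less often instead of retrying at a constant rate.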
____________

Fred E.
Project donor
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,139,004
RAC: 7
United States
Message 1302106 - Posted: 4 Nov 2012, 15:09:35 UTC

What if some of us just suspend network contact with the SETI server for maybe 12 hours?

It could help, since the server is being hammered with 1300 queries/second.

Most of which are the same computers desperately trying again and again to report or receive new tasks?

I've gone NNT and will stay there for a day or two. I have 2.75 days of CPU work and 5 days of GPU work, so I will wait it out. After my last post, I again received a new task rather than a lost task resend, so that's when I went NNT.

If you suspend network operations, you can't report completions, and some folks crunch a second project (or more).
____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

Mike
Project donor
Volunteer tester
Joined: 17 Feb 01
Posts: 24186
Credit: 33,308,463
RAC: 24,348
Germany
Message 1302119 - Posted: 4 Nov 2012, 15:35:10 UTC - in response to Message 1302102.

What if some of us just suspend network contact with the SETI server for maybe 12 hours?

It could help, since the server is being hammered with 1300 queries/second.

Most of which are the same computers desperately trying again and again to report or receive new tasks?


Maybe a good idea.

No server contact from me for the next 12 hours.

____________

fscheel
Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1302139 - Posted: 4 Nov 2012, 16:36:47 UTC

If I reset the project will it abort the lost tasks and resend them to me or someone else?

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8550
Credit: 50,379,826
RAC: 50,659
United Kingdom
Message 1302141 - Posted: 4 Nov 2012, 16:42:59 UTC - in response to Message 1302139.

If I reset the project will it abort the lost tasks and resend them to me or someone else?

No - reset applies to your local machine only. You would lose all the work you have managed to download (without telling the server anything). Exactly the opposite of what you're trying to achieve.

fscheel
Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1302146 - Posted: 4 Nov 2012, 16:49:27 UTC - in response to Message 1302141.

If I reset the project will it abort the lost tasks and resend them to me or someone else?

No - reset applies to your local machine only. You would lose all the work you have managed to download (without telling the server anything). Exactly the opposite of what you're trying to achieve.


Ok thanks. I didn't know what it did and therefore I asked the question.

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8550
Credit: 50,379,826
RAC: 50,659
United Kingdom
Message 1302148 - Posted: 4 Nov 2012, 16:53:41 UTC - in response to Message 1302146.

I didn't know what it did and therefore I asked the question.

Now that is exactly the right thing to do. So many click first, and ask questions afterwards...

chromespringer
Project donor
Joined: 3 Dec 05
Posts: 269
Credit: 21,778,132
RAC: 47,287
United States
Message 1302153 - Posted: 4 Nov 2012, 16:55:52 UTC

11/4/2012 9:25:57 AM | SETI@home | update requested by user
11/4/2012 9:26:00 AM | SETI@home | Sending scheduler request: Requested by user.
11/4/2012 9:26:00 AM | SETI@home | Reporting 8 completed tasks, requesting new tasks for CPU and ATI
11/4/2012 9:31:10 AM | SETI@home | Scheduler request failed: Timeout was reached
11/4/2012 9:31:13 AM | | Project communication failed: attempting access to reference site
11/4/2012 9:31:14 AM | | Internet access OK - project servers may be temporarily down.
11/4/2012 9:39:25 AM | SETI@home | Sending scheduler request: To fetch work.
11/4/2012 9:39:25 AM | SETI@home | Reporting 8 completed tasks, requesting new tasks for CPU and ATI
11/4/2012 9:43:44 AM | SETI@home | update requested by user
11/4/2012 9:44:32 AM | SETI@home | Scheduler request failed: Timeout was reached
11/4/2012 9:44:36 AM | | Project communication failed: attempting access to reference site
11/4/2012 9:44:37 AM | | Internet access OK - project servers may be temporarily down.


well, that was a somewhat short-lived spurt LOL :)
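
For anyone wanting to measure how long each attempt hangs before timing out, here is a minimal sketch that parses event-log lines in the format pasted above. The function names are hypothetical, the sample is trimmed to four lines from the log, and the timestamp pattern assumes the US-style `month/day/year h:mm:ss AM/PM` layout shown there.

```python
from datetime import datetime

LOG = """\
11/4/2012 9:26:00 AM | SETI@home | Sending scheduler request: Requested by user.
11/4/2012 9:31:10 AM | SETI@home | Scheduler request failed: Timeout was reached
11/4/2012 9:39:25 AM | SETI@home | Sending scheduler request: To fetch work.
11/4/2012 9:44:32 AM | SETI@home | Scheduler request failed: Timeout was reached
"""


def parse_line(line):
    """Split one log line into (timestamp, project, message)."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    return datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p"), project, message


def timeout_durations(log_text):
    """Seconds between each scheduler request and the failure that follows it."""
    durations, sent_at = [], None
    for line in log_text.splitlines():
        if not line.strip():
            continue
        when, _project, message = parse_line(line)
        if message.startswith("Sending scheduler request"):
            sent_at = when
        elif message.startswith("Scheduler request failed") and sent_at is not None:
            durations.append((when - sent_at).total_seconds())
            sent_at = None
    return durations


print(timeout_durations(LOG))  # [310.0, 307.0]
```

Both requests in the sample hung for roughly five minutes before the timeout was reported.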
____________

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8550
Credit: 50,379,826
RAC: 50,659
United Kingdom
Message 1302155 - Posted: 4 Nov 2012, 16:59:13 UTC - in response to Message 1302086.

I was just checking the same thing. Across 7 machines, the server thinks I've got more than 1500 tasks in progress. But I've actually got 91 - and almost all of those are because two machines have started getting lost tasks resent within the last hour.

My notional list of "work in progress" has gone up from 1,500 to 2,100 in the last two hours.

Everything is getting set to NNT and staying there, until I see zero tasks ready to send AND the splitters disabled.

WendyR
Volunteer tester
Joined: 1 Aug 05
Posts: 44
Credit: 1,962,140
RAC: 0
United States
Message 1302162 - Posted: 4 Nov 2012, 17:06:40 UTC - in response to Message 1302155.

Will it get resends when on NNT??

My total is now closer to 300 "missing" tasks.
____________

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4101
Credit: 33,140,679
RAC: 8,337
United Kingdom
Message 1302163 - Posted: 4 Nov 2012, 17:09:51 UTC - in response to Message 1302162.
Last modified: 4 Nov 2012, 17:11:52 UTC

Will it get resends when on NNT??

My total is now closer to 300 "missing" tasks.

No, not at this project. Some projects like Einstein (which uses older server software) do get resends when NNT is set; with newer server software, resends only happen if you ask for work.

Claggy
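
The rule described above can be written as a small predicate. This is a sketch of the behavior as stated in the post, not actual BOINC server code; the function and parameter names are hypothetical.

```python
def should_resend_lost_tasks(client_requested_work: bool, legacy_server: bool) -> bool:
    """Whether a scheduler contact triggers resends, per the description above."""
    if legacy_server:
        # Older server software (Einstein is the example given):
        # resends happen even with No New Tasks set.
        return True
    # Newer server software: resends only when the client asks for work.
    return client_requested_work


# A host on NNT does not request work, so at this project it gets no resends.
print(should_resend_lost_tasks(client_requested_work=False, legacy_server=False))  # False
```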

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8550
Credit: 50,379,826
RAC: 50,659
United Kingdom
Message 1302169 - Posted: 4 Nov 2012, 17:20:03 UTC

Anybody got any idea whether we're being haunted by Astropulse ghosts, to the same degree?

I think it's getting to the point that I need to put out an urgent APB, reluctant though I am to do so on a Sunday - just to get a lid put on the situation, until they can look at it tomorrow or Tuesday.

The question is - is stopping the MB splitters enough, or should I ask them to stop AP as well?

With splitters stopped, and no new work entering the system, we could allow work fetch and fish for resends of the work we've already supposedly got.

Chris S
Project donor
Volunteer tester
Joined: 19 Nov 00
Posts: 31763
Credit: 13,177,510
RAC: 37,569
United Kingdom
Message 1302173 - Posted: 4 Nov 2012, 17:27:59 UTC

There is only so much that can be done remotely on a Sunday morning. But you have enough experience to make a judgement call.

GaryG
Joined: 17 Mar 12
Posts: 8
Credit: 2,593,273
RAC: 0
United States
Message 1302174 - Posted: 4 Nov 2012, 17:28:32 UTC - in response to Message 1302169.

Anybody got any idea whether we're being haunted by Astropulse ghosts, to the same degree?

I think it's getting to the point that I need to put out an urgent APB, reluctant though I am to do so on a Sunday - just to get a lid put on the situation, until they can look at it tomorrow or Tuesday.

The question is - is stopping the MB splitters enough, or should I ask them to stop AP as well?

With splitters stopped, and no new work entering the system, we could allow work fetch and fish for resends of the work we've already supposedly got.


My vote would be to stop both. If most systems are like mine, with upwards of 900 tasks assigned but not received, there is plenty of work to run if we can get it.
____________

chromespringer
Project donor
Joined: 3 Dec 05
Posts: 269
Credit: 21,778,132
RAC: 47,287
United States
Message 1302175 - Posted: 4 Nov 2012, 17:29:31 UTC

I'm showing over 5000 "in progress" with 0 work in BOINC Manager
____________


Copyright © 2014 University of California