Panic Mode On (78) Server Problems?


Message boards : Number crunching : Panic Mode On (78) Server Problems?

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5861
Credit: 60,380,649
RAC: 48,850
Australia
Message 1301981 - Posted: 4 Nov 2012, 7:53:14 UTC - in response to Message 1301975.
Last modified: 4 Nov 2012, 7:53:35 UTC

Even if I could report, it wouldn't do me much good: the splitters appear to have slowed down, and the amount of work ready to send is falling like a stone. Add to that, the database activity remains high, very high.

Hopefully this is all just a result of the Scheduler issues; otherwise it's something entirely new being piled on top of the existing problems.
____________
Grant
Darwin NT.

[seti.international] Dirk Sadowski
Project donor
Volunteer tester
Joined: 6 Apr 07
Posts: 7100
Credit: 60,847,835
RAC: 17,006
Germany
Message 1301987 - Posted: 4 Nov 2012, 8:30:45 UTC - in response to Message 1301878.
Last modified: 4 Nov 2012, 8:48:19 UTC

Grant (SSSF) wrote:
My client_state is 2.4 MB in size, my sched_request_setiathome.berkeley.edu is 450 kB in size. I suspect yours are a lot smaller. You're in the US, I'm a few thousand km away.
End result: you may be able to get work, I'm lucky if I can even report work, even after 30 min of endless Update clicking with No New Tasks set.


BOINC sends the sched_request_setiathome.berkeley.edu.xml file to the scheduler server.

If you set a small value for <max_tasks_reported> (in the cc_config.xml file), the request file stays small.
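
For anyone who wants to try this, here is a minimal sketch in Python that writes out such a cc_config.xml fragment. It assumes <max_tasks_reported> belongs in the <options> section, as in the usual cc_config.xml layout. The value 50 and the helper name build_cc_config are only illustrations, so check the BOINC client configuration documentation before relying on it.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_tasks_reported: int) -> str:
    """Return cc_config.xml text that caps the tasks reported per scheduler request."""
    root = ET.Element("cc_config")
    # Assumption: the option sits inside the <options> section.
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "max_tasks_reported").text = str(max_tasks_reported)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # 50 is a hypothetical limit, chosen only for illustration.
    print(build_cc_config(50))
```

The output would be saved as cc_config.xml in the BOINC data directory, and the client has to re-read its config files (or be restarted) before the limit applies.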

- - - - - - - - - -

I got a response from Dave and sent him my sched_request_setiathome.berkeley.edu.xml file from an unsuccessful contact.

He will look into why it happened.

The file was 259 KB and wasn't accepted.


* Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. *
____________
BR

SETI@home Needs your Help ... $10 & U get a Star!

Team seti.international

Das Deutsche Cafe. The German Cafe.

tbret
Project donor
Volunteer tester
Joined: 28 May 99
Posts: 2858
Credit: 214,684,653
RAC: 177,482
United States
Message 1301988 - Posted: 4 Nov 2012, 8:31:25 UTC - in response to Message 1301914.
Last modified: 4 Nov 2012, 9:01:43 UTC

I can't believe it is Karma. It is just that the big guys are constipating the system. Everybody knows that. Until a politically acceptable and economically doable solution is proposed, this is just venting.


No, actually I don't know that at all.

The top team has a RAC of about 8,000,000 and that's about 2% of the total. I doubt that if the top 100 machines were to explode, or burn, or vanish you would even know it happened.

I doubt Berkeley would know it happened.

There are a LOT of work units coming and going.

But what the STAFF needs to know is that this is NOT a bandwidth problem. Let me repeat that for everyone who keeps saying we have no bandwidth. This is NOT NOT NOT a bandwidth problem.

If it were a bandwidth problem, using a proxy would have no effect, but using a proxy has an effect.

If it were a bandwidth problem you couldn't upload/download/report. Right now I cannot report, but I can upload every completed task at a fair clip and *some* of the downloads happen pretty fast AT THE SAME TIME as others trickle. That's not indicative of a bandwidth problem.

THIS ISN'T A RAW BANDWIDTH PROBLEM.

The longer we all assume it is, the more time we don't spend looking for whatever the problem really is.

[seti.international] Dirk Sadowski
Project donor
Volunteer tester
Joined: 6 Apr 07
Posts: 7100
Credit: 60,847,835
RAC: 17,006
Germany
Message 1301991 - Posted: 4 Nov 2012, 8:39:22 UTC
Last modified: 4 Nov 2012, 8:41:29 UTC

By the way ..

My machine can report uploaded results, but only if there is no simultaneous work request (*no new tasks* set).
Work requests (even on their own) don't work (Scheduler request failed: Timeout was reached).

I use BOINC V7.0.28.


* Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. *

BetelgeuseFive
Volunteer tester
Joined: 6 Jul 99
Posts: 62
Credit: 5,585,480
RAC: 9,180
Netherlands
Message 1301994 - Posted: 4 Nov 2012, 8:47:51 UTC


My computer has finished and reported all tasks and I get "project has no tasks available". But according to the website there are still a number of tasks that were sent yesterday that have not been reported. I would have expected a "resent lost task" message. Has "resent lost tasks" been disabled, or is this just another symptom of server-side problems?

Tom

____________

juan BFB
Project donor
Volunteer tester
Joined: 16 Mar 07
Posts: 5380
Credit: 304,998,311
RAC: 338,059
Brazil
Message 1301998 - Posted: 4 Nov 2012, 8:55:04 UTC - in response to Message 1301991.
Last modified: 4 Nov 2012, 8:56:09 UTC

Dirk Sadowski wrote:
By the way ..

My machine can report uploaded results, but only if there is no simultaneous work request (*no new tasks* set).
Work requests (even on their own) don't work (Scheduler request failed: Timeout was reached).

I use BOINC V7.0.28.


* Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. *


I don't want to start a controversy, but...

I have a few relatively hungry hosts, and I was able to feed all of them at >150 kbps without needing NNT. Some of them reported more than 1000 WUs through a proxy.

And all simultaneously (7 hosts downloading at the same time from 3 different ISPs). Before the proxy, nothing worked.

So the question remains: if bandwidth is the problem, why does everything work fine with a proxy? Does the proxy by any chance use different bandwidth...
____________

rob smith
Project donor
Volunteer tester
Joined: 7 Mar 03
Posts: 8525
Credit: 58,982,161
RAC: 79,529
United Kingdom
Message 1302000 - Posted: 4 Nov 2012, 8:57:38 UTC

Because of the way the distribution system works, it is quite possible to get a "no tasks available" message when there are lots (300,000) of tasks ready for distribution.
The distribution system works something like this: one hundred tasks are loaded into the output cache, the tasks are allocated to the next few users to request tasks, the tasks are then moved to the delivery servers, a bit of internal housekeeping is done, and the next one hundred tasks are loaded. The whole cycle takes about a second to complete, so if your request arrives just after the tasks have been allocated you have to make another attempt in a short while (and it can take a few attempts to arrive at just the right fraction of a second...)
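
The cycle described above can be sketched as a toy simulation. This is only an illustration under stated assumptions, not the real BOINC feeder: the 100-slot cache and the one-second refill come from the description above, while the function name simulate_feeder and the request times and sizes are made up.

```python
def simulate_feeder(requests, slots=100, refill_interval=1.0):
    """Toy model of a fixed-size task cache that is refilled once per cycle.

    requests: list of (arrival_time_seconds, tasks_wanted), sorted by time.
    Returns the number of tasks granted to each request, in order.
    """
    granted = []
    available = slots
    last_cycle = 0
    for arrival, wanted in requests:
        cycle = int(arrival // refill_interval)
        if cycle > last_cycle:
            # A new cycle has started, so the cache is full again.
            available = slots
            last_cycle = cycle
        give = min(wanted, available)
        available -= give
        granted.append(give)  # 0 here corresponds to "no tasks available"
    return granted


# Hypothetical requests: (arrival time in seconds, tasks wanted)
requests = [(0.10, 60), (0.20, 60), (0.90, 20), (1.05, 20)]
print(simulate_feeder(requests))  # prints [60, 40, 0, 20]
```

In this run the third request arrives after the cache has been drained and gets nothing, even though a refill is only a tenth of a second away.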
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

[seti.international] Dirk Sadowski
Project donor
Volunteer tester
Joined: 6 Apr 07
Posts: 7100
Credit: 60,847,835
RAC: 17,006
Germany
Message 1302007 - Posted: 4 Nov 2012, 9:15:31 UTC - in response to Message 1301998.
Last modified: 4 Nov 2012, 9:16:48 UTC

Juan wrote:
I don't want to start a controversy, but...

I have a few relatively hungry hosts, and I was able to feed all of them at >150 kbps without needing NNT. Some of them reported more than 1000 WUs through a proxy.

And all simultaneously (7 hosts downloading at the same time from 3 different ISPs). Before the proxy, nothing worked.

So the question remains: if bandwidth is the problem, why does everything work fine with a proxy? Does the proxy by any chance use different bandwidth...


I don't know why one thing works and another doesn't ..

The S@h crew know their equipment .. - and they have been informed that something is broken.

I can only guess .. - maybe an S@h router needs a reboot again, or something else ..


* Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. *

rob smith
Project donor
Volunteer tester
Joined: 7 Mar 03
Posts: 8525
Credit: 58,982,161
RAC: 79,529
United Kingdom
Message 1302008 - Posted: 4 Nov 2012, 9:21:14 UTC

It certainly looks as if there is a problem: nothing in the "ready to send" queue, plenty of tapes available to split, splitters chugging at idle.
Something in the server closet needs the delicate attention of the tyre kicker in chief. But it's the wee small hours of Sunday morning in CA, so it will be a few hours before he arises from his well-earned slumbers and can administer the required potion.....
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5861
Credit: 60,380,649
RAC: 48,850
Australia
Message 1302023 - Posted: 4 Nov 2012, 9:45:33 UTC - in response to Message 1301991.

Dirk Sadowski wrote:
By the way ..

My machine can report uploaded results, but only if there is no simultaneous work request (*no new tasks* set).

Even with NNT set, most times I still get a Scheduler timeout.
____________
Grant
Darwin NT.

Wiggo
Joined: 24 Jan 00
Posts: 7299
Credit: 96,609,477
RAC: 69,917
Australia
Message 1302026 - Posted: 4 Nov 2012, 9:58:08 UTC - in response to Message 1302023.

All 3 of my rigs are getting work: resent lost tasks, 20 at a time, but I am getting them at good speed.

Cheers.
____________

Fred E.
Project donor
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,139,004
RAC: 2
United States
Message 1302033 - Posted: 4 Nov 2012, 10:42:15 UTC

When I connect on a work request, I'm now getting "no tasks available" on most, even though I had 150 lost tasks out there. Just got the first batch of 20, but it also included the "no tasks available" message:

11/4/2012 4:14:48 AM Scheduler request completed: got 20 new tasks
11/4/2012 4:14:48 AM Resent lost task 09se12ac.20763.9065.140733193388045.10.10_0
(19 more of those, then:)
11/4/2012 4:14:48 AM Project has no tasks available
11/4/2012 4:14:48 AM Project requested delay of 303 seconds
etc.

I have not seen that combination of responses before. The scheduler usually doesn't try to assign new work when resending ghosts.
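
For anyone who wants to check their own logs for the same combination, here is a rough Python sketch. It only looks for the message texts shown in the excerpt above; the function name summarize_reply and the dictionary keys are made up for illustration, and other client versions may word the messages differently.

```python
import re


def summarize_reply(lines):
    """Summarize one scheduler reply from BOINC event-log message lines."""
    summary = {"new_tasks": 0, "resent": 0, "no_tasks": False, "delay_s": None}
    for line in lines:
        got = re.search(r"got (\d+) new tasks", line)
        delay = re.search(r"requested delay of (\d+) seconds", line)
        if got:
            summary["new_tasks"] = int(got.group(1))
        elif "Resent lost task" in line:
            summary["resent"] += 1
        elif "no tasks available" in line:
            summary["no_tasks"] = True
        elif delay:
            summary["delay_s"] = int(delay.group(1))
    return summary


# Only the lines actually shown in the excerpt; the 19 elided resends are left out.
log = [
    "11/4/2012 4:14:48 AM Scheduler request completed: got 20 new tasks",
    "11/4/2012 4:14:48 AM Resent lost task 09se12ac.20763.9065.140733193388045.10.10_0",
    "11/4/2012 4:14:48 AM Project has no tasks available",
    "11/4/2012 4:14:48 AM Project requested delay of 303 seconds",
]
reply = summarize_reply(log)
# Resent ghosts together with "no tasks available" is the odd combination.
print(reply["resent"] > 0 and reply["no_tasks"])  # prints True
```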

Like others, I can report on NNT with a limit of 50.

Rob wrote:

The distribution system works something like this: one hundred tasks are loaded into the output cache, the tasks are allocated to the next few users to request tasks, the tasks are then moved to the delivery servers, a bit of internal housekeeping is done, and the next one hundred tasks are loaded.

Rob, the feeder size was increased above 100 tasks several weeks ago - I did not see an announcement, but there were posts about it in the previous Panic Mode thread. Don't know if the refresh interval was changed or what the new feeder size is. Think 150 is the highest I've seen on one request.
____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

Wiggo
Joined: 24 Jan 00
Posts: 7299
Credit: 96,609,477
RAC: 69,917
Australia
Message 1302037 - Posted: 4 Nov 2012, 10:54:59 UTC - in response to Message 1302033.

Rob wrote:
The distribution system works something like this: one hundred tasks are loaded into the output cache, the tasks are allocated to the next few users to request tasks, the tasks are then moved to the delivery servers, a bit of internal housekeeping is done, and the next one hundred tasks are loaded.

Fred E. wrote:
Rob, the feeder size was increased above 100 tasks several weeks ago - I did not see an announcement, but there were posts about it in the previous Panic Mode thread. Don't know if the refresh interval was changed or what the new feeder size is. Think 150 is the highest I've seen on one request.

Actually the highest lot that I've received was 186 (plus many others well over 150) so the feeder must hold at least 200 though no official word has been given on that yet.

Cheers.
____________

fscheel
Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1302047 - Posted: 4 Nov 2012, 11:59:20 UTC - in response to Message 1301843.

The tasks in progress count is now showing 592 with only 1 completed task on the machine. Is there something I need to do? Or do I just sit back and wait for it to correct itself? This machine has only received 2 tasks in the last 18 hours.

Screaming Eagle
Volunteer tester
Joined: 3 Jun 12
Posts: 1
Credit: 184,114
RAC: 15
Australia
Message 1302051 - Posted: 4 Nov 2012, 12:18:24 UTC - in response to Message 1302047.

fscheel wrote:
The tasks in progress count is now showing 592 with only 1 completed task on the machine. Is there something I need to do? Or do I just sit back and wait for it to correct itself? This machine has only received 2 tasks in the last 18 hours.


I have this as well. I have 146 tasks sent to one of my machines but it has only received about 20 or so.

fscheel
Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1302053 - Posted: 4 Nov 2012, 12:26:40 UTC - in response to Message 1302047.

fscheel wrote:
The tasks in progress count is now showing 592 with only 1 completed task on the machine. Is there something I need to do? Or do I just sit back and wait for it to correct itself? This machine has only received 2 tasks in the last 18 hours.


Wow... within minutes of posting this, the machine actually got 20 "resent lost tasks". Hoping this is a sign of things to come.

Fred E.
Project donor
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,139,004
RAC: 2
United States
Message 1302060 - Posted: 4 Nov 2012, 12:54:54 UTC
Last modified: 4 Nov 2012, 13:25:39 UTC

Scheduler seems to be assigning new work while handling lost tasks, and I have not seen that before. Earlier in the thread I said I was working on 150 lost tasks and had received the first 20. Have had more timeouts since then and just received the 2nd 20. But the total difference between the website totals and BOINCTasks counts has grown to 371, suggesting more tasks were assigned on those requests that timed out.
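
The comparison described above boils down to a set difference between what the website lists and what the client holds. A minimal Python sketch follows, assuming you can get both lists of task names somehow. The function name find_lost_tasks and the task names are hypothetical, and this is not how BOINCTasks or the website computes anything.

```python
def find_lost_tasks(server_in_progress, local_tasks):
    """Return task names the server lists as in progress but the client does not hold."""
    return sorted(set(server_in_progress) - set(local_tasks))


# Hypothetical task names, for illustration only.
server = ["wu_a_0", "wu_b_1", "wu_c_0", "wu_d_1"]
local = ["wu_b_1"]
lost = find_lost_tasks(server, local)
print(len(lost), lost)  # prints 3 ['wu_a_0', 'wu_c_0', 'wu_d_1']
```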

Must be happening to fscheel as well because the lost task count (592) exceeds the feeder amount, and mine also exceeds it.

BOINC will keep asking for work until it actually gets it, so I could easily get more than my cache setting stuck in the lost category. If I reduce my cache settings, I can't get any of the lost tasks. Not sure what to do.

Edit: My lost task count continues to climb, and I just received 1 new task (not a lost task resend).
____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4139
Credit: 33,418,520
RAC: 19,243
United Kingdom
Message 1302067 - Posted: 4 Nov 2012, 13:17:23 UTC
Last modified: 4 Nov 2012, 13:17:58 UTC

Results ready to send is now 0 and 0, and scheduler contacts to Main and Beta are going through without problem.

Claggy

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8629
Credit: 51,378,222
RAC: 50,319
United Kingdom
Message 1302073 - Posted: 4 Nov 2012, 14:05:45 UTC - in response to Message 1302067.

Claggy wrote:
Results ready to send is now 0 and 0, and scheduler contacts to Main and Beta are going through without problem.

Claggy

Not necessarily so. Take my host 5828732, which I'm monitoring for boinc_dev.

Scheduler request at 13:23 - allocated 34 new tasks
Scheduler request at 13:30 - oldest 20 'in progress' retimed
Scheduler request at 13:38 - oldest 20 'in progress' retimed

All three requests ended in timeout, although the latter two should have been 'resent lost tasks'.

This morning only, that host has been allocated:

08:59:01 UTC - 40 tasks
09:36:57 UTC - 19 tasks
09:44:17 UTC - 44 tasks
10:10:39 UTC - 48 tasks
11:46:47 UTC - 1 task
12:04:47 UTC - 49 tasks
12:31:04 UTC - 38 tasks
12:37:54 UTC - 34 tasks
13:23:12 UTC - 34 tasks

It's a moderately powerful laptop, with a 1 day cache specified. For the mid-AR tasks which have been split this morning, 34 tasks is roughly a whole day's work. So my alleged 'work in progress' list probably represents five or six times the amount of work I've actually asked for - but the only one I received was that single-task allocation (somebody else's weird timeout 1108392847, not new work)

And then while I was typing, this came in:

SETI@home 04/11/2012 13:50:37 Sending scheduler request: To fetch work.
SETI@home 04/11/2012 13:52:36 Scheduler request completed: got 20 new tasks

Some resends at last, but it took the scheduler two minutes to assemble the list.
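
The two-minute figure can be checked straight from the two log timestamps above. A small Python sketch, assuming the day/month/year format shown in those lines (the function name request_duration_seconds is made up):

```python
from datetime import datetime


def request_duration_seconds(sent: str, completed: str) -> float:
    """Seconds between two event-log timestamps in DD/MM/YYYY HH:MM:SS form."""
    fmt = "%d/%m/%Y %H:%M:%S"
    delta = datetime.strptime(completed, fmt) - datetime.strptime(sent, fmt)
    return delta.total_seconds()


# Timestamps taken from the two log lines above.
print(request_duration_seconds("04/11/2012 13:50:37", "04/11/2012 13:52:36"))  # prints 119.0
```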

WendyR
Volunteer tester
Joined: 1 Aug 05
Posts: 44
Credit: 1,962,140
RAC: 0
United States
Message 1302085 - Posted: 4 Nov 2012, 14:26:33 UTC - in response to Message 1302073.

I have been able to report tasks using the NNT trick. That responds pretty quickly, and the tasks disappear from my local computer and show up on the web in a few seconds.

It seems that GETTING new tasks is the issue right now. The web page claims I should have 160-some tasks, but the local computer shows 28. Occasionally I actually get something; most of the successful ones seem to be of the "resending 20 tasks" variety.
____________


Copyright © 2014 University of California