Panic Mode On (78) Server Problems?

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 1301981 - Posted: 4 Nov 2012, 7:53:14 UTC - in response to Message 1301975.  
Last modified: 4 Nov 2012, 7:53:35 UTC

Even if I could report, it wouldn't do me much good: the splitters appear to have slowed down, and the amount of work ready to send is falling like a stone. On top of that, database activity remains high, very high.

Hopefully this is all just a result of the Scheduler issues; otherwise it's something entirely new being piled on top of the existing problems.
Grant
Darwin NT
ID: 1301981 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1301987 - Posted: 4 Nov 2012, 8:30:45 UTC - in response to Message 1301878.  
Last modified: 4 Nov 2012, 8:48:19 UTC

Grant (SSSF) wrote:
My client_state is 2.4MB in size, my sched_request_setiathome.berkeley.edu is 450kB in size. I suspect yours are a lot smaller. You're in the US, I'm a few thousand kms away.
End result: you may be able to get work, I'm lucky if I can even report work, even after 30min of endless Update clicking with No New Tasks set.


BOINC sends the sched_request_setiathome.berkeley.edu.xml file to the scheduler server.

If you set a small value for <max_tasks_reported> in the cc_config.xml file, the request file stays small.
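
For anyone who wants to try this, here is a minimal sketch of generating such a cc_config.xml. It assumes the option sits inside the usual <options> section of the file. The value 50 and the helper name build_cc_config are only illustrative, not a recommendation from the project.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_tasks_reported: int) -> str:
    """Return cc_config.xml text that caps how many tasks are reported per request.

    The helper name and the value passed in are illustrative assumptions.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "max_tasks_reported").text = str(max_tasks_reported)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the XML; save it as cc_config.xml in the BOINC data directory,
    # then have the client re-read its config files.
    print(build_cc_config(50))
```

A smaller cap means fewer completed results are listed in each scheduler request, which is why the request file shrinks.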

- - - - - - - - - -

I got a response from Dave and sent him the sched_request_setiathome.berkeley.edu.xml file from an unsuccessful contact.

He will look into why it happened.

The file was 259 KB and wasn't accepted.


* Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. *
ID: 1301987 · Report as offensive
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1301988 - Posted: 4 Nov 2012, 8:31:25 UTC - in response to Message 1301914.  
Last modified: 4 Nov 2012, 9:01:43 UTC

I can't believe it is Karma. It is just that the big guys are constipating the system. Everybody knows that. Until a politically acceptable and economically doable solution is proposed, this is just venting.


No, actually I don't know that at all.

The top team has a RAC of about 8,000,000 and that's about 2% of the total. I doubt that if the top 100 machines were to explode, or burn, or vanish you would even know it happened.

I doubt Berkeley would know it happened.

There are a LOT of work units coming and going.

But what the STAFF needs to know is that this is NOT a bandwidth problem. Let me repeat that for everyone who keeps saying we have no bandwidth. This is NOT NOT NOT a bandwidth problem.

If it were a bandwidth problem, using a proxy would have no effect, but using a proxy has an effect.

If it were a bandwidth problem you couldn't upload/download/report. Right now I cannot report, but I can upload every completed task at a fair clip and *some* of the downloads happen pretty fast AT THE SAME TIME as others trickle. That's not indicative of a bandwidth problem.

THIS ISN'T A RAW BANDWIDTH PROBLEM.

The longer we all assume it is, the more time we spend not looking for whatever the problem really is.
ID: 1301988 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1301991 - Posted: 4 Nov 2012, 8:39:22 UTC
Last modified: 4 Nov 2012, 8:41:29 UTC

By the way ..

My machine can report uploaded results, but only if it doesn't request work at the same time (*no new tasks* set).
Work requests, even on their own, don't work (Scheduler request failed: Timeout was reached).

I use BOINC V7.0.28.


* Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. *
ID: 1301991 · Report as offensive
BetelgeuseFive Project Donor
Volunteer tester

Joined: 6 Jul 99
Posts: 158
Credit: 17,117,787
RAC: 19
Netherlands
Message 1301994 - Posted: 4 Nov 2012, 8:47:51 UTC


My computer has finished and reported all tasks and I get "project has no tasks available". But according to the website there are still a number of tasks that were sent yesterday that have not been reported. I would have expected a "resent lost task" message. Has "resent lost tasks" been disabled, or is this just another symptom of server-side problems?

Tom

ID: 1301994 · Report as offensive
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1301998 - Posted: 4 Nov 2012, 8:55:04 UTC - in response to Message 1301991.  
Last modified: 4 Nov 2012, 8:56:09 UTC

By the way ..

My machine can report uploaded results, but only if it doesn't request work at the same time (*no new tasks* set).
Work requests, even on their own, don't work (Scheduler request failed: Timeout was reached).

I use BOINC V7.0.28.


* Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. *


I don't want to start a controversy, but...

I have a few relatively hungry hosts, and I was able to feed all of them at >150 kbps without needing NNT. Some of them reported more than 1000 WUs through a proxy.

And all simultaneously (7 hosts downloading at the same time through 3 different ISPs). Before the proxy, nothing worked.

So the question remains: if bandwidth is the problem, why does everything work fine with a proxy? Does the proxy by any chance use different bandwidth...
ID: 1301998 · Report as offensive
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22149
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1302000 - Posted: 4 Nov 2012, 8:57:38 UTC

Because of the way the distribution system works, it is quite possible to get a "no tasks available" message when there are lots (300,000) of tasks ready for distribution.
The distribution system works something like this: one hundred tasks are loaded into the output cache, the tasks are allocated to the next few users to request tasks, the tasks are then moved to the delivery servers, a bit of internal housekeeping is done, and the next one hundred tasks are loaded. The whole cycle takes about a second to complete, so if your request arrives just after the tasks have been allocated, you have to make another attempt in a short while (and it can take a few attempts to arrive at just the right fraction of a second...)
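
A rough way to picture that cycle is the toy model below. It is only a sketch under stated assumptions, not the actual server code: the function name simulate_feeder, the number of requests per cycle and the tasks asked for per request are all made up for illustration. Only the 100-task cache and the 300,000 backlog come from the description above.

```python
def simulate_feeder(requests_per_cycle, tasks_per_request,
                    cache_size=100, backlog=300_000, cycles=1):
    """Toy model of a fixed-size output cache that is topped up once per cycle.

    Returns (tasks granted to each request in arrival order, remaining backlog).
    All parameters other than cache_size and backlog are hypothetical.
    """
    cache = 0
    granted_per_request = []
    for _ in range(cycles):
        # Top the cache back up to its fixed size from the "ready to send" backlog.
        refill = min(cache_size - cache, backlog)
        cache += refill
        backlog -= refill
        # Serve the requests that arrive during this cycle, first come first served.
        for _ in range(requests_per_cycle):
            granted = min(tasks_per_request, cache)
            cache -= granted
            granted_per_request.append(granted)  # 0 means "no tasks available"
    return granted_per_request, backlog


if __name__ == "__main__":
    grants, remaining = simulate_feeder(requests_per_cycle=8, tasks_per_request=20)
    print(grants, remaining)
```

With these made-up numbers the first five requests drain the 100-task cache and the other three come back empty, even though almost all of the backlog is still waiting to be sent.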
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1302000 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1302007 - Posted: 4 Nov 2012, 9:15:31 UTC - in response to Message 1301998.  
Last modified: 4 Nov 2012, 9:16:48 UTC

Juan wrote:
I don't want to start a controversy, but...

I have a few relatively hungry hosts, and I was able to feed all of them at >150 kbps without needing NNT. Some of them reported more than 1000 WUs through a proxy.

And all simultaneously (7 hosts downloading at the same time through 3 different ISPs). Before the proxy, nothing worked.

So the question remains: if bandwidth is the problem, why does everything work fine with a proxy? Does the proxy by any chance use different bandwidth...


I don't know why one thing works and another doesn't ..

The S@h crew know their equipment .. - and they have been informed that something is broken.

I can only guess .. - maybe a S@h router needs a reboot again, or something else ..


* Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. *
ID: 1302007 · Report as offensive
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22149
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1302008 - Posted: 4 Nov 2012, 9:21:14 UTC

It certainly looks as if there is a problem: nothing in the "ready to send" queue, plenty of tapes available to split, splitters chugging at idle.
Something in the server closet needs the delicate attention of the tyre kicker in chief. But it's the wee small hours of Sunday morning in CA, so it will be a few hours before he arises from his well-earned slumbers and can administer the required potion.....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1302008 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 1302023 - Posted: 4 Nov 2012, 9:45:33 UTC - in response to Message 1301991.  

By the way ..

My machine can report uploaded results, but only if it doesn't request work at the same time (*no new tasks* set).

Even with NNT set, most times I still get a Scheduler timeout.
Grant
Darwin NT
ID: 1302023 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1302026 - Posted: 4 Nov 2012, 9:58:08 UTC - in response to Message 1302023.  

All 3 of my rigs are getting work, 20 resent lost tasks at a time, but I am getting them at good speed.

Cheers.
ID: 1302026 · Report as offensive
Profile Fred E.
Volunteer tester

Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1302033 - Posted: 4 Nov 2012, 10:42:15 UTC

When I connect on a work request, I'm now getting "no tasks available" on most, even though I had 150 lost tasks out there. Just got the first batch of 20, but it also included the "no tasks available" message.

11/4/2012 4:14:48 AM Scheduler request completed: got 20 new tasks
11/4/2012 4:14:48 AM Resent lost task 09se12ac.20763.9065.140733193388045.10.10_0
(19 more of those, then:)
11/4/2012 4:14:48 AM Project has no tasks available
11/4/2012 4:14:48 AM Project requested delay of 303 seconds
etc.

I have not seen that combination of responses before. Usually doesn't try to assign new work when resending ghosts.
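
If anyone wants to check their own event log for the same mix of responses, something along these lines would do it. This is just a sketch: the summarize helper and its field names are made up, and the sample text is only the lines shown above (the 19 other resend lines are not reproduced).

```python
import re

# Sample taken from the log lines quoted in this post.
LOG = """\
11/4/2012 4:14:48 AM Scheduler request completed: got 20 new tasks
11/4/2012 4:14:48 AM Resent lost task 09se12ac.20763.9065.140733193388045.10.10_0
11/4/2012 4:14:48 AM Project has no tasks available
11/4/2012 4:14:48 AM Project requested delay of 303 seconds
"""

GOT_RE = re.compile(r"got (\d+) new tasks")
DELAY_RE = re.compile(r"requested delay of (\d+) seconds")


def summarize(log_text):
    """Count what one scheduler reply contained (field names are hypothetical)."""
    summary = {"got": 0, "resent": 0, "no_tasks": False, "delay_s": None}
    for line in log_text.splitlines():
        got = GOT_RE.search(line)
        delay = DELAY_RE.search(line)
        if got:
            summary["got"] += int(got.group(1))
        elif "Resent lost task" in line:
            summary["resent"] += 1
        elif "Project has no tasks available" in line:
            summary["no_tasks"] = True
        elif delay:
            summary["delay_s"] = int(delay.group(1))
    return summary


if __name__ == "__main__":
    print(summarize(LOG))
```

A reply where "resent" is above zero and "no_tasks" is also true is the unusual combination described here.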

Like others, I can report on NNT with a limit of 50.

Rob wrote:
The distribution system works something like this: one hundred tasks are loaded into the output cache, the tasks are allocated to the next few users to request tasks, the tasks are then moved to the delivery servers, a bit of internal housekeeping is done, and the next one hundred tasks are loaded.

Rob, the feeder size was increased above 100 tasks several weeks ago. I did not see an announcement, but there were posts about it in the previous Panic Mode thread. Don't know if the refresh interval was changed or what the new feeder size is. Think 150 is the highest I've seen on one request.
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1302033 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1302037 - Posted: 4 Nov 2012, 10:54:59 UTC - in response to Message 1302033.  

Rob wrote:
The distribution system works something like this: one hundred tasks are loaded into the output cache, the tasks are allocated to the next few users to request tasks, the tasks are then moved to the delivery servers, a bit of internal housekeeping is done, and the next one hundred tasks are loaded.

Rob, the feeder size was increased above 100 tasks several weeks ago. I did not see an announcement, but there were posts about it in the previous Panic Mode thread. Don't know if the refresh interval was changed or what the new feeder size is. Think 150 is the highest I've seen on one request.

Actually the highest lot that I've received was 186 (plus many others well over 150) so the feeder must hold at least 200 though no official word has been given on that yet.

Cheers.
ID: 1302037 · Report as offensive
fscheel

Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1302047 - Posted: 4 Nov 2012, 11:59:20 UTC - in response to Message 1301843.  

The tasks in progress count is now showing 592, with only 1 completed task on the machine. Is there something I need to do? Or do I just sit back and wait for it to correct itself? This machine has only received 2 tasks in the last 18 hours.
ID: 1302047 · Report as offensive
Profile Screaming97135Eagle
Volunteer tester

Joined: 3 Jun 12
Posts: 4
Credit: 1,034,996
RAC: 10
Australia
Message 1302051 - Posted: 4 Nov 2012, 12:18:24 UTC - in response to Message 1302047.  

The tasks in progress count is now showing 592, with only 1 completed task on the machine. Is there something I need to do? Or do I just sit back and wait for it to correct itself? This machine has only received 2 tasks in the last 18 hours.


I have this as well. I have 146 tasks sent to one of my machines but it has only received about 20 or so.
ID: 1302051 · Report as offensive
fscheel

Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1302053 - Posted: 4 Nov 2012, 12:26:40 UTC - in response to Message 1302047.  

The tasks in progress count is now showing 592, with only 1 completed task on the machine. Is there something I need to do? Or do I just sit back and wait for it to correct itself? This machine has only received 2 tasks in the last 18 hours.


Wow.. within minutes of posting this, the machine actually got 20 "resent lost tasks". Hoping this is a sign of things to come.
ID: 1302053 · Report as offensive
Profile Fred E.
Volunteer tester

Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1302060 - Posted: 4 Nov 2012, 12:54:54 UTC
Last modified: 4 Nov 2012, 13:25:39 UTC

Scheduler seems to be assigning new work while handling lost tasks, and I have not seen that before. Earlier in the thread I said I was working on 150 lost tasks and had received the first 20. Have had more timeouts since then and just received the 2nd 20. But the total difference between the website totals and BOINCTasks counts has grown to 371, suggesting more tasks were assigned on those requests that timed out.

Must be happening to fscheel as well because the lost task count (592) exceeds the feeder amount, and mine also exceeds it.

BOINC will keep asking for work until it actually gets it, so I could easily get more than my cache setting stuck in the lost category. If I reduce my cache settings, can't get any of the lost tasks. Not sure what to do.

Edit: My lost task count continues to climb, and I just received 1 new task (not a lost task resend).
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1302060 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1302067 - Posted: 4 Nov 2012, 13:17:23 UTC
Last modified: 4 Nov 2012, 13:17:58 UTC

Results ready to send is now 0 and 0, and scheduler contacts to Main and Beta are going through without problem.

Claggy
ID: 1302067 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14645
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1302073 - Posted: 4 Nov 2012, 14:05:45 UTC - in response to Message 1302067.  

Results ready to send is now 0 and 0, and scheduler contacts to Main and Beta are going through without problem.

Claggy

Not necessarily so. Take my host 5828732, which I'm monitoring for boinc_dev.

Scheduler request at 13:23 - allocated 34 new tasks
Scheduler request at 13:30 - oldest 20 'in progress' retimed
Scheduler request at 13:38 - oldest 20 'in progress' retimed

All three requests ended in timeout, although the latter two should have been 'resent lost tasks'.

This morning only, that host has been allocated:

08:59:01 UTC - 40 tasks
09:36:57 UTC - 19 tasks
09:44:17 UTC - 44 tasks
10:10:39 UTC - 48 tasks
11:46:47 UTC - 1 task
12:04:47 UTC - 49 tasks
12:31:04 UTC - 38 tasks
12:37:54 UTC - 34 tasks
13:23:12 UTC - 34 tasks

It's a moderately powerful laptop, with a 1 day cache specified. For the mid-AR tasks which have been split this morning, 34 tasks is roughly a whole day's work. So my alleged 'work in progress' list probably represents five or six times the amount of work I've actually asked for - but the only one I received was that single-task allocation (somebody else's weird timeout 1108392847, not new work)

And then while I was typing, this came in:

SETI@home 04/11/2012 13:50:37 Sending scheduler request: To fetch work.
SETI@home 04/11/2012 13:52:36 Scheduler request completed: got 20 new tasks

Some resends at last, but it took the scheduler two minutes to assemble the list.
ID: 1302073 · Report as offensive
WendyR
Volunteer tester
Joined: 1 Aug 05
Posts: 44
Credit: 1,962,140
RAC: 0
United States
Message 1302085 - Posted: 4 Nov 2012, 14:26:33 UTC - in response to Message 1302073.  

I have been able to report tasks using the NNT trick. That responds pretty quickly, and the tasks disappear from my local computer, and show up on the web in a few seconds.

It seems that GETTING new tasks is the issue right now. The web page claims I should have 160 some tasks, but the local computer shows 28. Occasionally, I actually get something, most of the successful ones seem to be the resending 20 tasks variety.
ID: 1302085 · Report as offensive


 