Message boards : Number crunching : Panic Mode On (79) Server Problems?
MikeN · Joined: 24 Jan 11 · Posts: 319 · Credit: 64,719,409 · RAC: 85
Cricket graphs show everything maxed out, so I suspect it is just a feeding frenzy after such a long outage: everyone trying to refill their WU allowance at the same time. I only know of two solutions:
1. Serious update button abuse.
2. Patience; try again in a few hours when things have settled down a little.
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13746 · Credit: 208,696,464 · RAC: 304
> Cricket graphs show everything maxed out so I suspect it is just a feeding frenzy after such a long outage.
Nope, that results in different errors, such as the well-known timeout. And if they kept the routing through the campus network, it would mean it's not affected by the upload & download traffic.
EDIT - Just pinged the Scheduler; it looks like it's back off the campus network. However, packet loss varies between 0 & 25%. Previously it was around 75%, and even then it was possible to contact the Scheduler. You'd rarely get a reply, but you could contact it.
Grant
Darwin NT
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13746 · Credit: 208,696,464 · RAC: 304
> Cricket graphs show everything maxed out so I suspect it is just a feeding frenzy after such a long outage.
Hmm, it could be system-load related: one system has managed to contact the Scheduler a couple of times (though most attempts still give "Couldn't connect to server" errors), while the other system is completely unable to. But if that's the case, it means they've changed some settings somewhere. In the past, no matter how bad the load, you could generally contact the Scheduler; it's just been this last month and a bit that we started getting the timeouts. As it is, inbound traffic is only around 10Mb/s; usually it's around 14, closer to 20 after an outage. And I suspect the low inbound traffic is due to the inability to contact the Scheduler.
Grant
Darwin NT
kittyman · Joined: 9 Jul 00 · Posts: 51468 · Credit: 1,018,363,574 · RAC: 1,004
> Cricket graphs show everything maxed out so I suspect it is just a feeding frenzy after such a long outage.
Inbound traffic is about average.... 86K MB results received in the last hour. The rigs have been getting a tad bit of work here and there. Mostly CPU tasks, so the GPUs have been mostly falling back to Einstein. Oh well, the kitties will take whatever they can claw out of the servers for now and happily crunch it.
"Freedom is just Chaos, with better lighting." - Alan Dean Foster
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13746 · Credit: 208,696,464 · RAC: 304
> Inbound traffic is about average....86K MB results received in the last hour.
The number of results per hour may be, but the number of bytes per second is way down. And since the number of results being returned is about average, that means it must be the traffic to the Scheduler that's dropped off significantly, as it's no longer using the campus network.
Grant
Darwin NT
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14653 · Credit: 200,643,578 · RAC: 874
> Inbound traffic is about average....86K MB results received in the last hour.
Other way round. At the moment it's been switched back, so that the scheduler traffic is attempting to use the Hurricane Electric link we're used to seeing on the gigabitethernet2_3 Cricket graph. That's the same setting that we had at the beginning of this week, and the beginning of the month. The difference seems to be that for the last three weeks our requests reached the scheduler but the replies got lost; now the difficulty seems to be connecting to the scheduler in the first place, so that the full request message doesn't get sent.
Bill G · Joined: 1 Jun 01 · Posts: 1282 · Credit: 187,688,550 · RAC: 182
One of my computers had run dry, with a 5-day backoff, so I did an update: got 20 lost tasks, then while they were downloading got another 121 tasks. Now here is the interesting thing: normally you get a max of two downloads at a time, but for a while there I was getting 3 at the same time, and of course one was an AP, so it would appear that things are changing for the AP crunching.
SETI@home classic workunits: 4,019
SETI@home classic CPU time: 34,348 hours
Keith White · Joined: 29 May 99 · Posts: 392 · Credit: 13,035,233 · RAC: 22
It's been 7 hours since recovery and I'm still getting:
11/24/2012 6:22:06 PM | | Project communication failed: attempting access to reference site
11/24/2012 6:22:06 PM | SETI@home | Scheduler request failed: Failure when receiving data from the peer
11/24/2012 6:22:10 PM | | Internet access OK - project servers may be temporarily down.
At least I got through once, okay twice, to report the 100+ done workunits, but that was hours ago.
"Life is just nature's way of keeping meat fresh." - The Doctor
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13746 · Credit: 208,696,464 · RAC: 304
> Quick script to provide the percentage of failures on your machine.
Thanks for that.
Scheduler Requests: 821
Scheduler Success: 42 %
Scheduler Failure: 57 %
Scheduler Timeout: 22 % of total
Scheduler Timeout: 38 % of failures
That's since 26/10/2012 18:30hrs (UTC +9:30), so just under one month's worth.
Other system:
Scheduler Requests: 431
Scheduler Success: 54 %
Scheduler Failure: 45 %
Scheduler Timeout: 0 % of total
Scheduler Timeout: 0 % of failures
That's since 9/11/2012 16:30 (UTC +9:30), so just over 2 weeks' worth.
Grant
Darwin NT
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13746 · Credit: 208,696,464 · RAC: 304
> It's been 7 hours since recovery and I'm still getting
For me it's mostly "Couldn't connect to server" errors.
Grant
Darwin NT
Cosmic_Ocean · Joined: 23 Dec 00 · Posts: 3027 · Credit: 13,516,867 · RAC: 13
> Quick script to provide the percentage of failures on your machine.
So I decided to run this on mine. I had to slightly modify it, since the old builds of BOINC had successes worded as "scheduler request succeeded" rather than "completed".
Since Jan 30, 2012:
Scheduler Requests: 15685
Scheduler Success: 89 %
Scheduler Failure: 10 %
Scheduler Timeout: 1 % of total
Scheduler Timeout: 14 % of failures
Since Oct 1, 2012:
Scheduler Requests: 2877
Scheduler Success: 72 %
Scheduler Failure: 27 %
Scheduler Timeout: 3 % of total
Scheduler Timeout: 13 % of failures
Since Nov 1, 2012:
Scheduler Requests: 564
Scheduler Success: 63 %
Scheduler Failure: 36 %
Scheduler Timeout: 16 % of total
Scheduler Timeout: 43 % of failures
My log size is set to 100MB. Currently it is coming up on 9MB since it started Jan 30.
Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
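[Editor's note: the "quick script" the posters are running isn't reproduced in the thread. As a rough illustration only, a minimal Python sketch of a log parser along these lines might look like the following. The function name, sample log lines, and match strings are assumptions; as noted in the post above, BOINC's exact message wording varies between client versions.]

```python
# Hypothetical sketch of a BOINC message-log parser like the one discussed.
# Counts scheduler contacts by scanning each log line for the phrases the
# posters mention: "Scheduler request completed" (newer clients),
# "Scheduler request succeeded" (older builds), and "Scheduler request failed",
# treating failures that mention a timeout as the timeout bucket.

def scheduler_stats(lines):
    success = failure = timeout = 0
    for line in lines:
        if ("Scheduler request completed" in line
                or "Scheduler request succeeded" in line):
            success += 1
        elif "Scheduler request failed" in line:
            failure += 1
            if "timeout" in line.lower() or "timed out" in line.lower():
                timeout += 1
    total = success + failure
    return {
        "requests": total,
        "success_pct": round(100 * success / total) if total else 0,
        "failure_pct": round(100 * failure / total) if total else 0,
        "timeout_pct_total": round(100 * timeout / total) if total else 0,
        "timeout_pct_failures": round(100 * timeout / failure) if failure else 0,
    }

# Invented sample lines in roughly the client's log style, for demonstration.
sample = [
    "24-Nov-2012 18:22:06 [SETI@home] Scheduler request completed",
    "24-Nov-2012 18:25:10 [SETI@home] Scheduler request failed: Couldn't connect to server",
    "24-Nov-2012 18:30:44 [SETI@home] Scheduler request failed: Timeout was reached",
    "24-Nov-2012 18:40:02 [SETI@home] Scheduler request completed",
]
print(scheduler_stats(sample))
```

In practice you would feed it the client's saved message log (e.g. read the file line by line) rather than an in-memory sample.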
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13746 · Credit: 208,696,464 · RAC: 304
> At least I got through once, okay twice to report the 100+ done workunits but that was hours ago.
One machine has managed to connect a few times, taking 2 minutes or so to get a response.
Grant
Darwin NT
Claggy · Joined: 5 Jul 99 · Posts: 4654 · Credit: 47,537,079 · RAC: 4
I've managed to get to my 200 task limit; all but about 10 of them are Shorties.
Claggy
zoom3+1=4 · Joined: 30 Nov 03 · Posts: 65759 · Credit: 55,293,173 · RAC: 49
> I've managed to get to my 200 task limit, all but about 10 of them are Shorties,
200? I got 100 and that's it...
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
juan BFP · Joined: 16 Mar 07 · Posts: 9786 · Credit: 572,710,851 · RAC: 3,799
100 per CPU/GPU = 200 on a GPU host
zoom3+1=4 · Joined: 30 Nov 03 · Posts: 65759 · Credit: 55,293,173 · RAC: 49
> 100 per CPU/GPU = 200 on a GPU host
I only use my GPUs, of course...
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
juan BFP · Joined: 16 Mar 07 · Posts: 9786 · Credit: 572,710,851 · RAC: 3,799
> 100 per CPU/GPU = 200 on a GPU host
0 CPU + 100 GPU = 100. That's why you have a 100 WU cache only.
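[Editor's note: the cache arithmetic in the exchange above (a 100-task cap per resource type, so 200 for a host crunching on CPU and GPU but only 100 for a GPU-only host) can be sketched as follows. The function and the cap constant are illustrative, not anything from the BOINC code.]

```python
# Hypothetical illustration of the per-resource task limit being discussed.
PER_RESOURCE_LIMIT = 100  # assumed cap, as stated in the posts above

def task_limit(uses_cpu, uses_gpu):
    # One 100-task allowance per resource type actually in use.
    return PER_RESOURCE_LIMIT * (int(uses_cpu) + int(uses_gpu))

print(task_limit(uses_cpu=True, uses_gpu=True))   # CPU + GPU host -> 200
print(task_limit(uses_cpu=False, uses_gpu=True))  # GPU-only host -> 100
```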
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768
Get Shorty! I thought 4-minute shorties were a pain until I saw where the 1 and only AstroPulse I snagged went. Check it out: http://setiathome.berkeley.edu/workunit.php?wuid=1110816835
After working out how long 660,846.5 seconds is, I can't find it in me to complain about a 4-minute shorty...
Cosmic_Ocean · Joined: 23 Dec 00 · Posts: 3027 · Credit: 13,516,867 · RAC: 13
Wooow.. 7.64 days of run time, but only 2 seconds of CPU time before erroring out. That's brutal.
Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
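[Editor's note: as a quick check of the figure quoted above, the 660,846.5-second run time works out like this:]

```python
# Convert the quoted AstroPulse run time from seconds to days and hours.
run_time_s = 660_846.5
days, rem = divmod(run_time_s, 86_400)  # 86,400 seconds per day
hours = rem / 3_600
print(f"{days:.0f} days {hours:.1f} hours")  # 7 days 15.6 hours, i.e. ~7.65 days
```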
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13746 · Credit: 208,696,464 · RAC: 304
> Timeouts, Timeouts, Timeouts!!!!!!!
Now it's Couldn't connect to server, Couldn't connect to server, Couldn't connect to server!!!
Grant
Darwin NT
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.