Panic Mode On (76) Server Problems?


Joined: 18 Sep 03
Posts: 840
Credit: 1,578,326
RAC: 52
Germany
Message 1258965 - Posted: 11 Jul 2012, 16:50:09 UTC - in response to Message 1258953.

Yeah... but that was earlier, I think, and at about 21:00 Berkeley time the graphs were down to less than 30 Mbit/s. And I don't think they turned the databases on at midnight for some short period in which such an amount of WUs could be generated.
____________
.

Donald L. Johnson
Project donor
Joined: 5 Aug 02
Posts: 6368
Credit: 801,034
RAC: 1,653
United States
Message 1259001 - Posted: 11 Jul 2012, 17:32:46 UTC - in response to Message 1258965.

Yeah... but that was earlier, I think, and at about 21:00 Berkeley time the graphs were down to less than 30 Mbit/s. And I don't think they turned the databases on at midnight for some short period in which such an amount of WUs could be generated.

While the Science databases are offline, the BOINC master database is still up, and the scheduler and upload/download servers are online.

Any Tasks returned as "Error while computing", "Aborted" or "Timed-out" should still be available for reassignment to crunchers until two "successful" results for that WU are returned.
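
The reissue rule described above can be sketched in a few lines of Python. This is a simplified model, not BOINC's actual scheduler code: the function name `tasks_to_reissue`, the `in_progress` parameter and the "Success" label are assumptions made for illustration; the three failure labels and the quorum of two come from the post.

```python
QUORUM = 2  # two "successful" results per WU, as stated in the post
SUCCESS = "Success"  # assumed label for a successful result
FAILED_OUTCOMES = {"Error while computing", "Aborted", "Timed-out"}


def tasks_to_reissue(outcomes, in_progress=0):
    """Return how many more copies of a WU would need to be sent out.

    outcomes    -- labels of the results returned so far for this WU
    in_progress -- copies currently out with crunchers (hypothetical parameter)
    """
    successes = 0
    for outcome in outcomes:
        if outcome == SUCCESS:
            successes += 1
        elif outcome not in FAILED_OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome!r}")
    # Failed results do not count towards the quorum, so each one
    # leaves a slot that can be handed to another cruncher.
    return max(0, QUORUM - successes - in_progress)


print(tasks_to_reissue(["Success", "Aborted"]))
print(tasks_to_reissue(["Success", "Timed-out", "Success"]))
```

In this model a WU with one success and one aborted task still needs one more copy sent out, which would explain small amounts of work being handed out even while no new WUs are being generated.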
____________
Donald
Infernal Optimist / Submariner, retired

Dave
Joined: 29 Mar 02
Posts: 774
Credit: 23,193,139
RAC: 0
United Kingdom
Message 1259024 - Posted: 11 Jul 2012, 17:59:10 UTC

Cataclysmic drop in RAC atm despite constantly producing (I know I have because SpeedFan log shows constant temps).

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5955
Credit: 62,493,801
RAC: 40,496
Australia
Message 1259034 - Posted: 11 Jul 2012, 18:17:22 UTC - in response to Message 1259024.


Looks like the validators may be running again, but still no new work being produced, yet as noted network traffic is maxed out.
____________
Grant
Darwin NT.

DesO
Joined: 2 Feb 12
Posts: 144
Credit: 2,624,617
RAC: 0
United Kingdom
Message 1259037 - Posted: 11 Jul 2012, 18:20:12 UTC - in response to Message 1259024.

Cataclysmic drop in RAC atm despite constantly producing (I know I have because SpeedFan log shows constant temps).



Ditto, as my RAC is approx. 30% down! This is weakly to mildly disappointing, because my willingness to help is equal to my competitive nature and constant pull towards higher achievement. The RAC drop is the same for all of us, I think, so it's an equal playing field?

However, we do the best with what we have resource-wise. It's obvious SETI is under-resourced, and so this month I plan to donate. Who knows, it could make a difference.

Best me

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4664
Credit: 123,762,968
RAC: 96,208
United States
Message 1259040 - Posted: 11 Jul 2012, 18:24:20 UTC

[As of 11 Jul 2012 | 18:02:58 UTC]
Tasks RTS: MB:10 AP:0

The AP db is online, but not the main db. Perhaps they are online & will show on the next update.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 3770
Credit: 21,503,452
RAC: 15,499
Sweden
Message 1259103 - Posted: 11 Jul 2012, 20:29:02 UTC

Work will soon flow again. Splitters running and all databases too.
____________

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4664
Credit: 123,762,968
RAC: 96,208
United States
Message 1259104 - Posted: 11 Jul 2012, 20:32:28 UTC - in response to Message 1259103.

Work will soon flow again. Splitters running and all databases too.

I started getting work requests filled with more than 1 or 2 about 10 minutes ago. Granted they have only been 10 or 12.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8829
Credit: 53,630,261
RAC: 48,632
United Kingdom
Message 1259129 - Posted: 11 Jul 2012, 21:24:50 UTC - in response to Message 1259103.

Work will soon flow again. Splitters running and all databases too.

"Flow" may be a bit of an overstatement, but I've got some allocated, at least.

Dave
Joined: 29 Mar 02
Posts: 774
Credit: 23,193,139
RAC: 0
United Kingdom
Message 1259135 - Posted: 11 Jul 2012, 21:38:29 UTC

More like a sort of gloopy dribble.

SciManStev
Project donor
Volunteer tester
Joined: 20 Jun 99
Posts: 4908
Credit: 84,422,934
RAC: 30,512
United States
Message 1259155 - Posted: 11 Jul 2012, 22:41:13 UTC

Yipee! I see PaddyM as the new science database! It's great to see it working!

Steve
____________
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website

Donald L. Johnson
Project donor
Joined: 5 Aug 02
Posts: 6368
Credit: 801,034
RAC: 1,653
United States
Message 1259158 - Posted: 11 Jul 2012, 23:01:36 UTC

[As of 11 Jul 2012 | 22:50:16 UTC]

Only BOINC databases, Astropulse Science database, and upload/download servers online.

Still working on the transitions, I guess.
____________
Donald
Infernal Optimist / Submariner, retired

Martin Longbow
Joined: 12 Dec 00
Posts: 4
Credit: 46,983,597
RAC: 38,976
United States
Message 1259660 - Posted: 13 Jul 2012, 2:03:07 UTC

Hi, I wasn't sure where to post so I'm writing here.
I keep getting a single file that tries to transfer:
27ap12ad.2303.12030.9.10.169_0_0
I've tried to shut BOINC down and restart it, I've restarted the computer (including a cold boot), and I have highlighted the file and clicked "Abort Transfer" (received a message that says "Are you sure you want to Abort this file transfer") and clicked "Yes".
But then the file tries to download again.
I'm not sure if it is a problem on my side or on the server side.
Thanks in advance for any help,
Martin
____________

rob smith
Project donor
Volunteer tester
Joined: 7 Mar 03
Posts: 8819
Credit: 63,214,127
RAC: 82,403
United Kingdom
Message 1259695 - Posted: 13 Jul 2012, 5:18:54 UTC

Martin,
What happens during transfer of that file?

Does it just stop for hours with no message like "Retry in 4hr", or a project backoff message like "Project backoff for 17hrs"? If so, these are normal server-side messages. Two choices here: either let them eventually download, or use the re-try button to force them through.

Reloading BOINC will not cure the problem, indeed it might even make it worse.
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Martin Longbow
Joined: 12 Dec 00
Posts: 4
Credit: 46,983,597
RAC: 38,976
United States
Message 1259773 - Posted: 13 Jul 2012, 11:01:05 UTC - in response to Message 1259695.

Hi Rob,
The file seems to start downloading and then it disappears from the "Transfers" list as if it had downloaded successfully.

I seem to have figured it out. I deleted a cuda fermi file named:
27ap12ad.2303.12030.9.10.169_0

This listed the file as "Aborted" and then I "Updated". After the update, the problem disappeared. LOL, only took 5 hours to figure this out (all night long on my day off).

Thanks a lot Rob. This community is great, and if anyone else has this problem, at least I'll be able to help, or they may catch this post.

Sleepy but successful,
Martin
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5955
Credit: 62,493,801
RAC: 40,496
Australia
Message 1260024 - Posted: 13 Jul 2012, 22:23:39 UTC - in response to Message 1259773.


The validators & assimilators appear to have caught up, and the file purgers are almost there, and database queries are, if anything, lower than normal. Yet the replicator continues to fall behind.
____________
Grant
Darwin NT.

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8829
Credit: 53,630,261
RAC: 48,632
United Kingdom
Message 1260036 - Posted: 13 Jul 2012, 22:56:24 UTC - in response to Message 1260024.

The validators & assimilators appear to have caught up, and the file purgers are almost there, and database queries are, if anything, lower than normal. Yet the replicator continues to fall behind.

See Technical News.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5955
Credit: 62,493,801
RAC: 40,496
Australia
Message 1260123 - Posted: 14 Jul 2012, 3:21:22 UTC - in response to Message 1260036.

The validators & assimilators appear to have caught up, and the file purgers are almost there, and database queries are, if anything, lower than normal. Yet the replicator continues to fall behind.

See Technical News.

I noticed that.
Looks like more tweaking required.
____________
Grant
Darwin NT.

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4248
Credit: 34,984,778
RAC: 21,338
United Kingdom
Message 1260216 - Posted: 14 Jul 2012, 9:56:19 UTC

I've been having trouble reporting tasks: scheduler contacts keep timing out, and I have had to limit max_tasks_reported to 20 to get them reported.

Claggy
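
For anyone wanting to try the same workaround, here is a small Python sketch that builds the configuration fragment. The option name `max_tasks_reported` and the value 20 come from the post above; the surrounding `cc_config` / `options` layout and the helper name `build_cc_config` are assumptions based on how the BOINC client's cc_config.xml is usually laid out, so check them against the BOINC client configuration documentation before use.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_tasks_reported):
    """Build a minimal cc_config.xml body that caps tasks reported per request.

    The element layout (cc_config > options > max_tasks_reported) is an
    assumption to be verified against the BOINC documentation.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "max_tasks_reported").text = str(max_tasks_reported)
    return ET.tostring(root, encoding="unicode")


# Print the fragment with the limit used in the post.
print(build_cc_config(20))
```

The printed XML would go into cc_config.xml in the BOINC data directory, after which the client needs to re-read its config file or be restarted for the limit to take effect.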

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5955
Credit: 62,493,801
RAC: 40,496
Australia
Message 1260217 - Posted: 14 Jul 2012, 10:08:16 UTC - in response to Message 1260216.

I've been having trouble reporting tasks: scheduler contacts keep timing out, and I have had to limit max_tasks_reported to 20 to get them reported.

Claggy

Just had a look in my log; only a couple of Scheduler requests have timed out in the last 5 hours or so here.
____________
Grant
Darwin NT.


Copyright © 2014 University of California