Panic Mode On (22) Server problems

Message boards : Number crunching : Panic Mode On (22) Server problems

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 925792 - Posted: 13 Aug 2009, 12:05:39 UTC


Network traffic's taken a dive, even though there's work ready to send & the splitters are still running.
Grant
Darwin NT
ID: 925792
[B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1175
Credit: 4,754,897
RAC: 0
United Kingdom
Message 925796 - Posted: 13 Aug 2009, 12:57:10 UTC - in response to Message 925792.  

Well, I have been getting no new work since 11:40 UTC, and there is no work on the server page. Can someone put in more data to be split, please?
ID: 925796
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 925799 - Posted: 13 Aug 2009, 13:13:54 UTC - in response to Message 925796.  

Well, I have been getting no new work since 11:40 UTC, and there is no work on the server page. Can someone put in more data to be split, please?

There's oodles of work ready to be split - at least a terabyte loaded and primed.

I suspect they need to take a lot of completed work off - I bet the NAS storage box is bursting at the seams. Several people have tried to alert them - unsuccessfully - to the stuck assimilators: and the number of results awaiting validation - basically, all those shorties where one wingmate has reported, but the other hasn't - has grown very rapidly to record levels.

No, belay that: how can 'awaiting validation' be so much higher than 'results in the field'? It must also include both halves of the workunits in the assimilation queue.
ID: 925799
mich181189
Volunteer tester
Joined: 2 May 06
Posts: 3
Credit: 123,706
RAC: 0
United Kingdom
Message 925812 - Posted: 13 Aug 2009, 14:42:35 UTC - in response to Message 925799.  

Yes. As it says at the bottom of the page, "Results returned and awaiting validation" is where not all computers have returned their result yet. Once all the results for a workunit are returned, they are grouped together and put under "Workunits waiting for validation", which is only 18 right now. Once validated, they are assimilated, and once assimilated, they are deleted. The problem is the stuck assimilators.
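A minimal Python sketch of the pipeline described above, under simplifying assumptions: each workunit needs a fixed number of results (two here), validation and assimilation are single passes, and assimilated work is deleted straight away. The names `Stage`, `Workunit`, `validate`, `assimilate` and `counts` are hypothetical and are not BOINC's actual server code. It only illustrates why a stuck assimilator makes validated work pile up on storage.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Stage(Enum):
    # Hypothetical stage names, loosely following the server status page wording.
    IN_PROGRESS = auto()               # at least one result still out with a host
    WAITING_FOR_VALIDATION = auto()    # all results returned, validator not run yet
    WAITING_FOR_ASSIMILATION = auto()  # validated, waiting for an assimilator
    DELETED = auto()                   # assimilated and removed from storage


@dataclass
class Workunit:
    quorum: int = 2  # assumption: two hosts ("wingmates") per workunit
    returned: int = 0
    stage: Stage = Stage.IN_PROGRESS

    def report_result(self) -> None:
        """One host reports; once every result is in, the workunit moves on to validation."""
        self.returned += 1
        if self.stage is Stage.IN_PROGRESS and self.returned >= self.quorum:
            self.stage = Stage.WAITING_FOR_VALIDATION


def validate(wus: list[Workunit]) -> None:
    """Validator pass (no real result comparison in this sketch)."""
    for wu in wus:
        if wu.stage is Stage.WAITING_FOR_VALIDATION:
            wu.stage = Stage.WAITING_FOR_ASSIMILATION


def assimilate(wus: list[Workunit], assimilator_running: bool) -> None:
    """Assimilator pass; a stuck assimilator leaves validated work on disk."""
    if not assimilator_running:
        return
    for wu in wus:
        if wu.stage is Stage.WAITING_FOR_ASSIMILATION:
            # Simplification: deletion happens immediately after assimilation.
            wu.stage = Stage.DELETED


def counts(wus: list[Workunit]) -> dict[str, int]:
    """How many workunits sit in each stage."""
    return {stage.name: sum(wu.stage is stage for wu in wus) for stage in Stage}
```

For example, with three workunits where two have both results back and one has only one, calling `validate` and then `assimilate(..., assimilator_running=False)` leaves two workunits waiting for assimilation and one still in progress; nothing is deleted until the assimilator runs again.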
ID: 925812
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 925818 - Posted: 13 Aug 2009, 15:44:25 UTC - in response to Message 925614.  


And the Assimilators haven't assimilated anything for the last few days.
Log jam approaching.

Looks like the logs have hit the fan.......
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 925818
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 925844 - Posted: 13 Aug 2009, 18:23:28 UTC

And the splitters have started back up!!

Now it will probably be a bit before the backlog and bandwidth clear so we can start uploading again.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 925844
Zap de Ridder
Volunteer tester
Joined: 9 Jan 00
Posts: 227
Credit: 1,468,844
RAC: 1
Netherlands
Message 925871 - Posted: 13 Aug 2009, 20:18:04 UTC - in response to Message 925844.  

Stupid me.
Einstein servers are down, so I thought let's try Seti again for APs.
I forgot to set the cache down to 0.25 days (I already had Seti/Einstein at 100/500), but it looks like I'm a day late, so now I have 17 MBs to do. I got 7 first and thought I'd instantly set "no new work allowed". Too late: another 10 were downloaded.
11 "long" ones in total. I'm not really happy with that.
ID: 925871
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 925877 - Posted: 13 Aug 2009, 20:31:48 UTC

Yeah, Matt just explained what was going on a few minutes ago in Tech News. Should be on the road to recovery now from how it sounds.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 925877
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 925899 - Posted: 13 Aug 2009, 23:30:48 UTC

Two panic threads ago, I let slip a little secret:

Hic!

Being a little dis-inhibited, I'm going to reveal that when I turned off networking before going to the pub, I turned off networking for this project only: the code exists (no, I didn't write it) and I'm testing it. I found another (small, cosmetic-only) bug this week, which I haven't reported to the author yet, but apart from that I believe it's nearly ready to submit to BOINC as a ready-made patch. When that time comes, I hope you'll all lobby for trac #139 to be actioned. It really helps. (message 918226)

I'm pleased to say that Thyme Lawn has now officially announced this enhancement via a comment added to http://boinc.berkeley.edu/trac/ticket/139.

I find this an incredibly useful tool: it's currently holding back 7 uploads ultimately destined for Einstein, when they get their failed (again) fileserver repaired (again).

David Anderson was busy writing an NSF grant application, deadline yesterday (or today, as it still is in their part of the world). Once he's had a chance for some well-deserved sleep, I hope there will be cross-project support for him to include the facility as soon as possible.
ID: 925899
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 926146 - Posted: 15 Aug 2009, 0:02:36 UTC - in response to Message 925899.  
Last modified: 15 Aug 2009, 0:47:04 UTC

The assimilators are slowly (very slowly) assimilating, and the validators are also slowly getting through the backlog, but now I'm getting "no work available" messages. The ready-to-send queue is down to zero, and the splitters, while splitting, haven't picked up the pace from 6/s; usually 15/s is the minimum just to keep up with demand without building up a buffer.



EDIT: Result creation rate now below 3/s.
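A back-of-the-envelope sketch of what those rates mean, assuming demand holds steady at the roughly 15 results/s quoted above and result creation stays constant over the period. The function name `ready_to_send_after` is hypothetical. Because the ready-to-send queue cannot go negative, any deficit shows up as requests that get "no work available".

```python
def ready_to_send_after(queue: float, creation_rate: float,
                        demand_rate: float, seconds: float) -> tuple[float, float]:
    """Return (results left in the ready-to-send queue, requests that found no work).

    Assumes both rates (results per second) are constant for the whole period.
    """
    remaining = queue + (creation_rate - demand_rate) * seconds
    if remaining >= 0:
        return remaining, 0.0
    # The queue is empty, so the rest of the deficit becomes unmet requests.
    return 0.0, -remaining
```

With an empty queue, 6/s against 15/s leaves 9 × 3600 = 32,400 requests per hour unmet; at 3/s it is 12 × 3600 = 43,200.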
Grant
Darwin NT
ID: 926146
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 926226 - Posted: 15 Aug 2009, 8:16:19 UTC - in response to Message 926146.  

The assimilators are slowly (very slowly) assimilating, and the validators are also slowly getting through the backlog, but now I'm getting "no work available" messages. The ready-to-send queue is down to zero, and the splitters, while splitting, haven't picked up the pace from 6/s; usually 15/s is the minimum just to keep up with demand without building up a buffer.



EDIT: Result creation rate now below 3/s.

Looking at the Cricket graph, it looks like the WU storage is just bursting at the seams, so there is no space for new WUs to be made. The backlog of shorties needs to be returned for things to start stabilizing yet again.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 926226
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 926230 - Posted: 15 Aug 2009, 8:51:47 UTC - in response to Message 926226.  
Last modified: 15 Aug 2009, 8:53:35 UTC

Looking at the Cricket graph, it looks like the WU storage is just bursting at the seams, so there is no space for new WUs to be made. The backlog of shorties needs to be returned for things to start stabilizing yet again.

I think it might be a repeat of the download/work creation issue from a few weeks ago.
Even when the Validator and Assimilator backlogs were at their peak, work was still being pumped out at a good rate, with the ready-to-send buffer remaining full.
But even as those backlogs have slowly dropped, the work creation rate hasn't increased.
Grant
Darwin NT
ID: 926230
yank (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 15 Aug 99
Posts: 522
Credit: 22,545,639
RAC: 0
United States
Message 926246 - Posted: 15 Aug 2009, 13:00:46 UTC

Start of a bad day... no SETI work units... no Einstein work units... no Bionic simap work units... no Leiden Classic work units... finally got a bunch of World Community Grid.
http://boinc.mundayweb.com/teamStats.php?userID=14824
ID: 926246
RandyC
Joined: 20 Oct 99
Posts: 714
Credit: 1,704,345
RAC: 0
United States
Message 926247 - Posted: 15 Aug 2009, 13:30:16 UTC - in response to Message 926246.  

Start of a bad day... no SETI work units... no Einstein work units... no Bionic simap work units... no Leiden Classic work units... finally got a bunch of World Community Grid.


Malaria has plenty if you restart that project.
ID: 926247
kevin6912
Volunteer tester
Joined: 18 Jul 99
Posts: 17
Credit: 10,539,602
RAC: 0
United States
Message 926275 - Posted: 15 Aug 2009, 15:47:11 UTC - in response to Message 925899.  


I'm pleased to say that Thyme Lawn has now officially announced this enhancement via a comment added to http://boinc.berkeley.edu/trac/ticket/139.

I find this an incredibly useful tool: it's currently holding back 7 uploads ultimately destined for Einstein, when they get their failed (again) fileserver repaired (again).

David Anderson was busy writing an NSF grant application, deadline yesterday (or today, as it still is in their part of the world). Once he's had a chance for some well-deserved sleep, I hope there will be cross-project support for him to include the facility as soon as possible.


Seeing as David Anderson feels this type of functionality is a workaround and not needed, how are people going to get this type of control over their network connections?

ID: 926275
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 926290 - Posted: 15 Aug 2009, 16:54:24 UTC - in response to Message 926275.  

I'm pleased to say that Thyme Lawn has now officially announced this enhancement via a comment added to http://boinc.berkeley.edu/trac/ticket/139.

I find this an incredibly useful tool: it's currently holding back 7 uploads ultimately destined for Einstein, when they get their failed (again) fileserver repaired (again).

David Anderson was busy writing an NSF grant application, deadline yesterday (or today, as it still is in their part of the world). Once he's had a chance for some well-deserved sleep, I hope there will be cross-project support for him to include the facility as soon as possible.

Seeing as David Anderson feels this type of functionality is a workaround and not needed, how are people going to get this type of control over their network connections?

It rather depends on the answer to a couple of related questions, which possibly boil down to a single question:

a) How are people going to get any kind of control over the BOINC development process?
b) How are people going to get any kind of control over David Anderson?
ID: 926290
cliff west
Joined: 7 May 01
Posts: 211
Credit: 16,180,728
RAC: 15
United States
Message 926316 - Posted: 15 Aug 2009, 17:54:14 UTC - in response to Message 926290.  

I ran out of CUDA units... oops.
ID: 926316
James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 926324 - Posted: 15 Aug 2009, 18:48:10 UTC

Glad I stocked up on WUs for the Mac yesterday morning.
The old P4 has two AP v5.05 WUs, so she's set for at least another 60 or so hours.
Worried about the i7, though: only 7 WUs left. I never could fill up the cache on that one yesterday; it only had 8 running with 8 standing by.

Old James
ID: 926324
yank (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 15 Aug 99
Posts: 522
Credit: 22,545,639
RAC: 0
United States
Message 926350 - Posted: 15 Aug 2009, 20:54:15 UTC - in response to Message 926247.  

Start of a bad day..no SETI work units... No Einstein work unit...no Bionic simap work units... no Leiden Classic work units...finally got a bunch of World Community Grid.


Malaria has plenty if you restart that project.



Just took some. Thanks

http://boinc.mundayweb.com/teamStats.php?userID=14824
ID: 926350
FiveHamlet
Joined: 5 Oct 99
Posts: 783
Credit: 32,638,578
RAC: 0
United Kingdom
Message 926362 - Posted: 15 Aug 2009, 21:53:34 UTC - in response to Message 926350.  

Happy 10th anniversary, Yank!
ID: 926362
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.