Panic Mode On (43) Server problems

Kevin Olley

Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1076816 - Posted: 13 Feb 2011, 15:01:17 UTC - in response to Message 1076812.  

that was sorted late last year

I am running 6.10.58 on all 8 rigs, and I seem to recall that everything worked out OK the last time we had an outage of this magnitude.



I am on 6.10.58 so will hold off upgrading, will see how it goes.

Kevin


Kevin


ID: 1076816 · Report as offensive
bill

Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1076817 - Posted: 13 Feb 2011, 15:01:20 UTC - in response to Message 1076812.  

I'm thinking that was back around 6.10.47, plus something was tweaked server-side a year ago. Sorry, I don't remember much more and don't have any records.
ID: 1076817 · Report as offensive
Profile Vipin Palazhi
Joined: 29 Feb 08
Posts: 286
Credit: 167,386,578
RAC: 0
India
Message 1076818 - Posted: 13 Feb 2011, 15:23:15 UTC

Any idea how much longer this outage is going to last? No updates on the Technical News thread. I still have work for another day or so.

ID: 1076818 · Report as offensive
bill

Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1076820 - Posted: 13 Feb 2011, 15:31:15 UTC - in response to Message 1076818.  

My guess is right before they take it down for the Tuesday 3 day outage.
But I'm an optimist. OTOH, when and if I run out of Seti work, I still have
Milkyway, Einstein, Rosetta, Docking, Enigma, and others to crunch.
ID: 1076820 · Report as offensive
Dave

Joined: 29 Mar 02
Posts: 778
Credit: 25,001,396
RAC: 0
United Kingdom
Message 1076829 - Posted: 13 Feb 2011, 16:01:51 UTC

I'm guessing sometime during Monday working hours (UT-0800).
ID: 1076829 · Report as offensive
Profile Gundolf Jahn

Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 1076832 - Posted: 13 Feb 2011, 16:05:54 UTC - in response to Message 1076818.  

...No updates on the technical new thread...

Message 1075931 - Posted: 10 Feb 2011 | 22:13:48 UTC:
Meanwhile, we'll be off for the foreseeable future. Like at least until next week, I imagine. Bummer.

Looks like an (actual) update to me. ;-)

Gruß,
Gundolf
ID: 1076832 · Report as offensive
Profile Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1076839 - Posted: 13 Feb 2011, 16:13:16 UTC - in response to Message 1076833.  

My guess is well after the coming Tuesday outage. They will not get it up on Monday, that I'm sure about. Tuesday is outage day, so Tuesday is not the day. With some luck it will happen on Wednesday, but even that is doubtful. I even doubt it will happen at all in the coming week.

Don't be surprised if it doesn't happen until earliest Monday Jan 21. It may even take longer, yeah much much longer.....

Edit: It may even be so, that it doesn't happen at all, ever again.....

Monday 21 Jan is a Federal Holiday here in the US (President's Day), so they probably won't work that day, either. So if we are not fixed and back up by Friday 18th, hope for Tuesday 22nd or Wednesday 23rd.

But sooner or later, Matt & company WILL succeed in getting everything back online, at least for a day or two (or more). (8{)
Donald
Infernal Optimist / Submariner, retired
ID: 1076839 · Report as offensive
Profile Zeus Fab3r
Joined: 17 Jan 01
Posts: 649
Credit: 275,335,635
RAC: 597
Serbia
Message 1076841 - Posted: 13 Feb 2011, 16:15:29 UTC - in response to Message 1076816.  
Last modified: 13 Feb 2011, 16:16:00 UTC

that was sorted late last year
I am running 6.10.58 on all 8 rigs, and I seem to recall that everything worked out OK the last time we had an outage of this magnitude.

I am on 6.10.58 so will hold off upgrading, will see how it goes.

Kevin


IIRC, Vyper was the first to draw more attention to this question, and the issue was resolved thanks to Jord and Dr D.A. His rig was running 6.10.56 back then, as it is now. Check... the thread "Manually uploading client_state.xml to s@h".

Who the hell is General Failure and why is he reading my harddisk?¿
ID: 1076841 · Report as offensive
Profile Vipin Palazhi
Joined: 29 Feb 08
Posts: 286
Credit: 167,386,578
RAC: 0
India
Message 1076842 - Posted: 13 Feb 2011, 16:17:20 UTC

I wanted to give a Valentine's Day gift to my crunchers by showering them with lots of work units. Alas, that won't happen.

ID: 1076842 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14676
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1076848 - Posted: 13 Feb 2011, 16:27:22 UTC - in response to Message 1076841.  

IIRC, Vyper was the first that drew more attention to this question and the issue is resolved thanks to Jord and Dr D.A. His rig was running 6.10.56 back then, as it is running now. Check... Manually uploading client_state.xml to s@h

That takes us to changeset 22389, which only affected the server code:

scheduler: fix crashing bug when client reports a large # (1000+) of results (256KB not enough for query in this case)

That happened round about client v6.11.8, so if nobody has suffered the problem since then - without upgrading - I think we can take it that a client upgrade won't be needed this time either.
ID: 1076848 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1076877 - Posted: 13 Feb 2011, 17:18:09 UTC - in response to Message 1076841.  

that was sorted late last year
I am running 6.10.58 on all 8 rigs, and I seem to recall that everything worked out OK the last time we had an outage of this magnitude.

I am on 6.10.58 so will hold off upgrading, will see how it goes.

Kevin


IIRC, Vyper was the first that drew more attention to this question and the issue is resolved thanks to Jord and Dr D.A. His rig was running 6.10.56 back then, as it is running now. Check... Manually uploading client_state.xml to s@h

Thanks. I looked for that thread for a while, but did not locate it.
I am not sure what version of Boinc I was running at the time, but everything was handled without manual intervention on my part.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1076877 · Report as offensive
Profile arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1076882 - Posted: 13 Feb 2011, 17:38:31 UTC - in response to Message 1076833.  



Don't be surprised if it doesn't happen until earliest Monday Jan 21. It may even take longer, yeah much much longer.....

Monday 21 Jan is a Federal Holiday here in the US (President's Day), so they probably won't work that day, either. So if we are not fixed and back up by Friday 18th, hope for Tuesday 22nd or Wednesday 23rd.


I hope you both mean February 21st.

ID: 1076882 · Report as offensive
Kevin Olley

Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1076895 - Posted: 13 Feb 2011, 18:06:13 UTC - in response to Message 1076882.  



Don't be surprised if it doesn't happen until earliest Monday Jan 21. It may even take longer, yeah much much longer.....

Monday 21 Jan is a Federal Holiday here in the US (President's Day), so they probably won't work that day, either. So if we are not fixed and back up by Friday 18th, hope for Tuesday 22nd or Wednesday 23rd.


I hope you both mean February 21st.



Could do with it earlier than that. I got a bunch of shorties just before this started, and 17/02/2011 is the earliest deadline. Some are done and some are still in my download queue; it would be a shame to see them go to waste.

Kevin


Kevin


ID: 1076895 · Report as offensive
Profile Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1076901 - Posted: 13 Feb 2011, 18:16:56 UTC - in response to Message 1076848.  

IIRC, Vyper was the first that drew more attention to this question and the issue is resolved thanks to Jord and Dr D.A. His rig was running 6.10.56 back then, as it is running now. Check... Manually uploading client_state.xml to s@h

That takes us to changeset 22389, which only affected the server code:

scheduler: fix crashing bug when client reports a large # (1000+) of results (256KB not enough for query in this case)

That happened round about client v6.11.8, so if nobody has suffered the problem since then - without upgrading - I think we can take it that a client upgrade won't be needed this time either.

And if you scroll down to the last post, you'll find that there was also a client change in v6.12.2, which made it possible to limit the # of completed tasks reported per RPC. However, it seems that this change is mostly intended for those who have problems caused by a slow internet connection.
ID: 1076901 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14676
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1076923 - Posted: 13 Feb 2011, 18:36:57 UTC - in response to Message 1076901.  

IIRC, Vyper was the first that drew more attention to this question and the issue is resolved thanks to Jord and Dr D.A. His rig was running 6.10.56 back then, as it is running now. Check... Manually uploading client_state.xml to s@h

That takes us to changeset 22389, which only affected the server code:

scheduler: fix crashing bug when client reports a large # (1000+) of results (256KB not enough for query in this case)

That happened round about client v6.11.8, so if nobody has suffered the problem since then - without upgrading - I think we can take it that a client upgrade won't be needed this time either.

And if you scroll down to the last post, you'll find out, that there was also a client change in v6.12.2, which made it possible to limit the # of completed tasks reported per RPC. However, it seems that this change is mostly intended for those who get problems caused by slow internet connection.

Josef Segur reported problems with the way that was originally coded in changeset 22467. I rather doubt that anyone has tested it to destruction yet....

Corrections were made (22500, 22503, 22505.....), but if anyone has the courage, and the reportable task list, to test, I'm sure that would be appreciated. ;)
ID: 1076923 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1076926 - Posted: 13 Feb 2011, 18:43:13 UTC - in response to Message 1076923.  

IIRC, Vyper was the first that drew more attention to this question and the issue is resolved thanks to Jord and Dr D.A. His rig was running 6.10.56 back then, as it is running now. Check... Manually uploading client_state.xml to s@h

That takes us to changeset 22389, which only affected the server code:

scheduler: fix crashing bug when client reports a large # (1000+) of results (256KB not enough for query in this case)

That happened round about client v6.11.8, so if nobody has suffered the problem since then - without upgrading - I think we can take it that a client upgrade won't be needed this time either.

And if you scroll down to the last post, you'll find out, that there was also a client change in v6.12.2, which made it possible to limit the # of completed tasks reported per RPC. However, it seems that this change is mostly intended for those who get problems caused by slow internet connection.

Josef Segur reported problems with the way that was originally coded in changeset 22467. I rather doubt that anyone has tested it to destruction yet....

Corrections were made (22500, 22503, 22505.....), but if anyone has the courage, and the reportable task list, to test, I'm sure that would be appreciated. ;)

Right now, the Frozen 920 has 3,149 results it would like to report.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1076926 · Report as offensive
Profile Tim Norton
Volunteer tester
Joined: 2 Jun 99
Posts: 835
Credit: 33,540,164
RAC: 0
United Kingdom
Message 1076984 - Posted: 13 Feb 2011, 21:17:18 UTC

Since the server-side "fix" [22389] and any subsequent tweaks, I have on several occasions, due to outages or network issues at my end, reported well above 1,000 results in one go, usually in the 1-3k range, without any problem.

Basically it takes the first 1,000 results and leaves the remainder for reporting next time, etc. It still has to upload the whole file, though.

This is on hosts with the 6.10.58 client.

I don't think I am alone in this, as Mark and anybody in the top 100+ will probably have used this feature a time or two already, even if they have not seen it happen.

As I was lucky(?) enough to have a full cache just prior to this outage, I will be reporting just under 4,000 tasks as and when we get going again, although I intend to leave it a day or so after we come back up before I report, to ease the strain in a small way.

The newer client may well have extra features to control this, which is good, as it will probably stress the servers less if we can limit the number reported in one go.
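
For anyone who wants to picture the split, here is a minimal Python sketch of the "first 1,000 now, the rest next time" behaviour described above. It is an illustration only and not the scheduler's actual code: the function name `report_in_batches`, the batch size parameter, and the made-up result names are all assumptions, and the 3,149 count is simply the figure mentioned earlier in this thread.

```python
def report_in_batches(pending, batch_size=1000):
    """Yield successive batches of at most batch_size results.

    Hypothetical helper: each yielded batch stands for what one
    scheduler contact would accept; the rest waits for the next contact.
    """
    for start in range(0, len(pending), batch_size):
        yield pending[start:start + batch_size]


# Illustrative backlog: 3,149 placeholder result names.
pending = [f"result_{i}" for i in range(3149)]
batches = list(report_in_batches(pending))

# One entry per scheduler contact needed to clear the backlog.
print([len(batch) for batch in batches])  # prints [1000, 1000, 1000, 149]
```

Under those assumptions, a backlog of 3,149 results would take four scheduler contacts to clear.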
Tim

ID: 1076984 · Report as offensive
Profile Dirk Sadowski
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1077029 - Posted: 14 Feb 2011, 1:31:55 UTC
Last modified: 14 Feb 2011, 1:52:15 UTC

Just to let you know..

The 1,000-result split was done on the server. No special BOINC version was needed. It can be switched on or off, depending on the S@h crew. I don't know whether it is on now.

I always had problems reporting more than ~1,000 results.

I was in contact with Dr. Anderson and he asked Rom to add <max_tasks_reported>N</max_tasks_reported> to the client (cc_config.xml).
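
As a rough illustration of where that tag might go, the Python sketch below builds a small cc_config.xml body containing it. Treat the layout as an assumption: the post only names the tag and the file, so nesting it under `<options>` inside `<cc_config>` is my guess at the structure, the value 1000 is just an example for N, and `build_cc_config` is a hypothetical helper name. Check the BOINC client configuration documentation before relying on it.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_tasks_reported):
    """Return a cc_config.xml body as a string.

    Assumed layout (not confirmed in this thread):
    <cc_config><options><max_tasks_reported>N</max_tasks_reported></options></cc_config>
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "max_tasks_reported").text = str(max_tasks_reported)
    # encoding="unicode" makes tostring return a str instead of bytes.
    return ET.tostring(root, encoding="unicode")


# 1000 is an illustrative value for N, not a recommendation.
config_xml = build_cc_config(1000)
print(config_xml)
```

The printed string could then be saved as cc_config.xml in the BOINC data directory, assuming the client version in use recognises the tag.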

I updated today to the latest development version of BOINC, V6.12.14 (because my 940 BE + 4x GTX260 have thousands of results uploaded), and it works fine. But so far there has been no successful S@h scheduler contact.. ;-)
ID: 1077029 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 66276
Credit: 55,293,173
RAC: 49
United States
Message 1077035 - Posted: 14 Feb 2011, 2:04:16 UTC

It would be nice to be able to tell a Fermi card to run 2 or 3 WUs per GPU while having an older card do 1 WU per GPU at a time.
Savoir-Faire is everywhere!
The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST

ID: 1077035 · Report as offensive
Profile Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1077041 - Posted: 14 Feb 2011, 2:40:33 UTC - in response to Message 1076882.  
Last modified: 14 Feb 2011, 2:43:51 UTC



Don't be surprised if it doesn't happen until earliest Monday Jan 21. It may even take longer, yeah much much longer.....


Monday 21 Jan is a Federal Holiday here in the US (President's Day), so they probably won't work that day, either. So if we are not fixed and back up by Friday 18th, hope for Tuesday 22nd or Wednesday 23rd.


I hope you both mean February 21st.

Yes, arkayn, you are correct, as usual. Hadn't had my 2nd cup of coffee yet.
Donald
Infernal Optimist / Submariner, retired
ID: 1077041 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.