Panic Mode On (58) Server problems?

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1160188 - Posted: 8 Oct 2011, 13:36:03 UTC - in response to Message 1160178.  

I cannot remember the last time the daily cricket graph looked the way it does at present - a perfect, solid mass of green. Long may it continue.

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;ranges=d%3Aw%3Am%3Ay;view=Octets

But if nothing is actually uploading?

Uploads are working fine here, and for the majority of users - otherwise we couldn't be seeing "Results received in last hour 48,463" on the server status page.

You may have been caught by the HE router fault - you could try the proxy solutions suggested in the two sticky 'HE connection problems' threads at the top of this board.
ID: 1160188 · Report as offensive
Profile KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 1160193 - Posted: 8 Oct 2011, 14:00:45 UTC - in response to Message 1160188.  

You may have been caught by the HE router fault - you could try the proxy solutions suggested in the two sticky 'HE connection problems' threads at the top of this board.


Never any problems before. Downloads have been fine. The upload problem only started a couple of hours ago.

ID: 1160193 · Report as offensive
Profile KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 1160201 - Posted: 8 Oct 2011, 14:34:21 UTC - in response to Message 1160188.  
Last modified: 8 Oct 2011, 14:38:02 UTC

Uploads are working fine here, and for the majority of users - otherwise we couldn't be seeing "Results received in last hour 48,463" on the server status page.

It may be totally irrelevant but I see that results received is down to 45,088 so perhaps I am not the only one with sudden upload problems?

[Edit]
Now I see that Bruno is actually in disabled state, so no wonder uploads are not going through!

ID: 1160201 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1160210 - Posted: 8 Oct 2011, 14:48:07 UTC - in response to Message 1160201.  

Uploads are working fine here, and for the majority of users - otherwise we couldn't be seeing "Results received in last hour 48,463" on the server status page.

It may be totally irrelevant but I see that results received is down to 45,088 so perhaps I am not the only one with sudden upload problems?

[Edit]
Now I see that Bruno is actually in disabled state, so no wonder uploads are not going through!

Honestly, uploads are going through normally here - I have none waiting (across five busy machines), and three have gone through since I started typing this reply.

There are regular 'false negatives' on the server status page. My rule of thumb is not to believe it unless the same server shows as disabled on two consecutive updates (i.e. updates with different timestamps - Bruno is running on the 14:40 UTC copy of the page).
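That rule of thumb can be written down as a small check. This is only a sketch: the `Snapshot` type, the `confirmed_disabled` function and the timestamps in the usage lines are made up for illustration and are not part of the server status page or any BOINC tool.

```python
from typing import NamedTuple


class Snapshot(NamedTuple):
    """One copy of the server status page (hypothetical structure)."""
    timestamp: str        # the page's own timestamp, e.g. "14:40 UTC"
    disabled: frozenset   # names of servers shown as disabled in that copy


def confirmed_disabled(previous: Snapshot, current: Snapshot) -> set:
    """Return servers shown as disabled on two consecutive, distinct updates."""
    if previous.timestamp == current.timestamp:
        # Same copy of the page seen twice: this proves nothing new.
        return set()
    # Only believe a 'disabled' flag that survives into the next update.
    return set(previous.disabled & current.disabled)


# Usage with invented data: one bad reading is ignored, two in a row count.
first = Snapshot("10:00 UTC", frozenset({"bruno"}))
second = Snapshot("10:10 UTC", frozenset())
third = Snapshot("10:20 UTC", frozenset({"bruno"}))
fourth = Snapshot("10:30 UTC", frozenset({"bruno"}))
print(confirmed_disabled(first, second))   # set()
print(confirmed_disabled(third, fourth))   # {'bruno'}
```

A single 'disabled' reading is treated as a possible false negative; only a flag that persists across two different timestamps is reported.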
ID: 1160210 · Report as offensive
Profile Lint trap

Joined: 30 May 03
Posts: 871
Credit: 28,092,319
RAC: 0
United States
Message 1160211 - Posted: 8 Oct 2011, 14:57:14 UTC - in response to Message 1160201.  
Last modified: 8 Oct 2011, 14:59:24 UTC

Uploads are working fine here, and for the majority of users - otherwise we couldn't be seeing "Results received in last hour 48,463" on the server status page.

It may be totally irrelevant but I see that results received is down to 45,088 so perhaps I am not the only one with sudden upload problems?

[Edit]
Now I see that Bruno is actually in disabled state, so no wonder uploads are not going through!



Bruno looks fine on the latest script run (Update)...

Downloads have slowed to molasses speed for me and the numbers are starting to pile up. Uploading and reporting have not been a problem at all for a couple of days.

Lt

I see Richard beat me to it... re Bruno. He must be faster at two-finger typing than I am... :)
ID: 1160211 · Report as offensive
Profile Sunny129
Joined: 7 Nov 00
Posts: 190
Credit: 3,163,755
RAC: 0
United States
Message 1160217 - Posted: 8 Oct 2011, 15:13:02 UTC
Last modified: 8 Oct 2011, 15:13:59 UTC

i just want to say that i'm getting AP GPU tasks for the first time in ~2 weeks now. they've been flowing for the past ~36 hours or so, albeit sparsely - my AP queue has yet to climb above ~10 tasks. i should also note that this isn't enough to keep my GPU crunching constantly, but i understand we're still debugging some problems. the last time i posted about the server problems was in the "Panic Mode On (55) Server problems" thread over 2 weeks ago.
ID: 1160217 · Report as offensive
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1160229 - Posted: 8 Oct 2011, 15:54:23 UTC - in response to Message 1160211.  

Downloads are also slow (as a snail), but uploads aren't happening at all.

It's not only SETI, though: CPDN downloads aren't getting through either.
Average throughput is between 200 and 1,000 bytes/sec (1.6 to 8 kbit/s), followed by
a waiting period and several retries.


ID: 1160229 · Report as offensive
Profile Lint trap

Joined: 30 May 03
Posts: 871
Credit: 28,092,319
RAC: 0
United States
Message 1160230 - Posted: 8 Oct 2011, 16:05:25 UTC


See Soft^Spirit's new "Highlight: Update from Jeff 10/8/11" thread.

After the PAIX router's memory is increased many of these quirky, intermittent connection issues might disappear. (We can all hope!)

Lt

ID: 1160230 · Report as offensive
Kevin Olley

Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1160232 - Posted: 8 Oct 2011, 16:09:15 UTC

Uploads and reporting running fine here.

Downloads very slow.


Kevin


ID: 1160232 · Report as offensive
Starman
Joined: 15 May 99
Posts: 204
Credit: 81,351,915
RAC: 25
Canada
Message 1160241 - Posted: 8 Oct 2011, 16:29:46 UTC - in response to Message 1160232.  

Downloads are slow, but I'm getting lots of work now, including a good handful of APs. In fact my one rig is becoming a bit of a piggy wiggy with them; it seems to be getting the lion's share, both in numbers and as a % of total work.

Brett
ID: 1160241 · Report as offensive
MikeN

Joined: 24 Jan 11
Posts: 319
Credit: 64,719,409
RAC: 85
United Kingdom
Message 1160242 - Posted: 8 Oct 2011, 16:33:00 UTC

Uploads, downloads and reporting all working really well here. Hope that's not tempting fate.
ID: 1160242 · Report as offensive
MikeN

Joined: 24 Jan 11
Posts: 319
Credit: 64,719,409
RAC: 85
United Kingdom
Message 1160248 - Posted: 8 Oct 2011, 16:57:41 UTC - in response to Message 1160179.  

OK, whoever it is that has "the yellow one" in custody, prepare to bring it out in about 30 minutes.

/me ducks.

LOL

You mean ?


Yes, exactly that one. So it was you who had it in custody. Well, it's time to release the little bugger..


It's now four and a half hours since the 30-minute duck warning and no sign of him. I wonder if the fact that I have just roasted his legs, and that they are now in my belly along with half a bottle of Chianti, could have anything to do with it.
ID: 1160248 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1160249 - Posted: 8 Oct 2011, 16:59:35 UTC - in response to Message 1160248.  

My kitties said to tell you he was delicious! :-)


PROUD MEMBER OF Team Starfire World BOINC
ID: 1160249 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1160251 - Posted: 8 Oct 2011, 17:06:10 UTC

All seems well (aside from the much nicer limits for you power crunchers). The avian friend is happy, and my cache is full..with the correct ETAs I might add.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1160251 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1160253 - Posted: 8 Oct 2011, 17:09:38 UTC - in response to Message 1160186.  


As you know, I reverted my host 3751792 to stock when you reported the 'no work to stock GPUs' bug earlier this week.

I'm seeing correct runtime estimates with a DCF currently around 1.2 (it'll drop back again when I reach the next batch of shorties). Since I'm running GPU only, and the card is well fast enough to show APR/estimate anomalies, I'm assuming that the 'non-anonymous-platform' bits of [trac]changeset:24217[/trac] were never applied here, even though we know the change in the ratio limit from 2 to 10 is active. I think the verdict on the APR outlier code is the old Scottish standby of 'not proven', but we ought to test it sometime, for both the stock and anonymous platform cases.
...

For any host with more than 20 validations for an app_version, it's nearly inconceivable that APR/estimates anomalies will make much difference.

For non-anonymous hosts the APR is sent to the host as the <flops> for that app version, so the ratio limit of 10 would have to be exceeded between two requests to the Scheduler for any capping to take place. 1.01^231 = 9.96, so even 231 near-zero validated runtimes wouldn't trigger the capping. OTOH, assuming the APR was about right before that, a shift of ~10 would put the host on the border of the range where -177 elapsed time exceeded errors could happen.
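The compounding arithmetic can be checked with a short Python sketch. It assumes, as the 1.01 factor above implies, that a single validation can move the APR by at most about 1%; the function name and the `step` parameter are illustrative only and are not taken from the BOINC source.

```python
import math


def validations_to_exceed(ratio_limit: float, step: float = 0.01) -> int:
    """Smallest n for which (1 + step) ** n is strictly above ratio_limit."""
    # Closed-form first guess from n > ln(limit) / ln(1 + step).
    n = math.ceil(math.log(ratio_limit) / math.log1p(step))
    # Nudge upward if rounding left us at or below the limit.
    while (1 + step) ** n <= ratio_limit:
        n += 1
    return n


print(round(1.01 ** 231, 2))        # 9.96, still under the limit of 10
print(validations_to_exceed(10.0))  # 232
```

Under that assumption, 231 consecutive maximal shifts still leave the ratio just below 10, and the 232nd would be the first to cross it.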

For anonymous hosts where the users are allowing totally inadequate <flops> based on the Whetstone benchmark to be sent by the core client, it is effectively guaranteed that estimates for most GPU processing will be a mess. Those who are getting by because a DCF down near 0.02 is enough to compensate for the bad <flops> are merely in danger of DCF going even lower and restricting work fetch. Whatever happens to APR matters not, since the user has chosen to operate in the zone where the server estimate is always capped.
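To see how a DCF near 0.02 can mask a bad <flops>, here is a simplified model. It assumes the client's estimate is roughly the task's estimated operation count divided by <flops>, then scaled by DCF; the task size and both speeds are invented numbers chosen only to make the ratio come out at 50.

```python
import math


def estimated_runtime(rsc_fpops_est: float, flops: float, dcf: float) -> float:
    """Simplified runtime estimate in seconds (assumed model, not BOINC code)."""
    return rsc_fpops_est / flops * dcf


TASK_FPOPS = 3e13        # hypothetical task size in floating-point operations
REAL_GPU_FLOPS = 1e11    # hypothetical effective GPU speed
WHETSTONE_FLOPS = 2e9    # hypothetical CPU benchmark figure, 50x too low

good = estimated_runtime(TASK_FPOPS, REAL_GPU_FLOPS, dcf=1.0)
raw = estimated_runtime(TASK_FPOPS, WHETSTONE_FLOPS, dcf=1.0)
masked = estimated_runtime(TASK_FPOPS, WHETSTONE_FLOPS, dcf=0.02)

print(good)    # about 300 s with a sensible <flops>
print(raw)     # about 15000 s with the benchmark figure and DCF = 1
print(masked)  # about 300 s again once DCF has sunk to 0.02
assert math.isclose(good, masked)
```

In this model the estimate only looks right because DCF has absorbed the whole 50x error, which is the fragile position described above.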

For all host app versions, the first 10 validations are critical. If the APR isn't somewhat reasonable at that point it will take a lot of good runtimes to shift it to a better approximation. That's where the runtime_outlier logic will be helpful, and I too hope the Astropulse validator code will be updated soon. The project gets about 600 new hosts a day (either really new or new hostID), it's not nice to leave them exposed to a known weakness in the system.
                                                                 Joe
ID: 1160253 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1160263 - Posted: 8 Oct 2011, 17:49:35 UTC - in response to Message 1160253.  

For all host app versions, the first 10 validations are critical. If the APR isn't somewhat reasonable at that point it will take a lot of good runtimes to shift it to a better approximation. That's where the runtime_outlier logic will be helpful, and I too hope the Astropulse validator code will be updated soon. The project gets about 600 new hosts a day (either really new or new hostID), it's not nice to leave them exposed to a known weakness in the system.
                                                                 Joe

And when v7 goes live on SETI, the project will get about a quarter of a million new hosts in the first month - at least, as far as the application_details are concerned. We really ought to prevail on David to consider that number before the event.....

Anonymous_platform hosts return their GPU hardware characteristics in sched_request. I really don't see why that can't be used to seed APR with a first approximation, instead of using a totally irrelevant CPU metric.
ID: 1160263 · Report as offensive
Profile soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1160264 - Posted: 8 Oct 2011, 17:59:24 UTC - in response to Message 1160263.  

Are there any plans of making version 6.12 usable by high end hosts before trying to roll to V7?
Janice
ID: 1160264 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1160267 - Posted: 8 Oct 2011, 18:05:18 UTC - in response to Message 1160264.  

Are there any plans of making version 6.12 usable by high end hosts before trying to roll to V7?

Versions of what?

The discussion with Joe was about the SETI science application - currently at v6.03 for CPUs, v6.08/09/10 for CUDA GPUs.

Version 6.12 sounds like a BOINC version number - I'm not having any problems with BOINC v6.12.34, though I don't run what you would call a 'high end host'.

What issues make it unusable? I haven't seen any reported on the boinc_alpha mailing list: that would be a better venue for discussing BOINC issues than here, though I can pass on messages if needed.
ID: 1160267 · Report as offensive
Profile KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 1160271 - Posted: 8 Oct 2011, 18:15:34 UTC - in response to Message 1160267.  

Still nothing uploading from here. I do not believe Bruno is allowing them. I see his vote monitor function is disabled.

ID: 1160271 · Report as offensive
Profile KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 1160277 - Posted: 8 Oct 2011, 18:25:49 UTC

Why would it start now? Never happened before.


ID: 1160277 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.