Panic Mode On (68) Server problems?


arkayn
Volunteer tester
Joined: 14 May 99
Posts: 3594
Credit: 47,346,074
RAC: 985
United States
Message 1196373 - Posted: 17 Feb 2012, 16:47:28 UTC

Due to database errors and now shorties, it is time for another thread.
____________

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8375
Credit: 46,741,398
RAC: 21,662
United Kingdom
Message 1196387 - Posted: 17 Feb 2012, 17:11:26 UTC

But don't let us lose Cliff's last post in the previous thread: message 1196377

Those 'lost' WUs are becoming a problem. I've had about 7 resends of the same WUs despite the fact that I already have them on my rig. They are chewing up my ISP's download quota with the endless resends of the exact same WUs.
The amount of space used on my HDD is static, since these WUs simply overwrite the copies already there.

Someone at S@H HQ needs to sort out the handshaking problem BOINC and the servers seem to have.

That's a new one on me. Anybody else seeing it? I'll have a scout round my rigs, just in case.

If there are a lot of multiple (unnecessary) resends, that might account for the high traffic flows we were seeing last week.

James Sotherden
Joined: 16 May 99
Posts: 8549
Credit: 31,417,764
RAC: 57,607
United States
Message 1196389 - Posted: 17 Feb 2012, 17:16:00 UTC

Anybody else having slow response times in the forums? It just seems to lag.
____________

Old James

arkayn
Volunteer tester
Joined: 14 May 99
Posts: 3594
Credit: 47,346,074
RAC: 985
United States
Message 1196392 - Posted: 17 Feb 2012, 17:27:21 UTC - in response to Message 1196389.

Anybody else having slow response times in the forums? It just seems to lag.


Sometimes yes, and sometimes no.
____________

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8375
Credit: 46,741,398
RAC: 21,662
United Kingdom
Message 1196400 - Posted: 17 Feb 2012, 18:03:20 UTC - in response to Message 1196387.

But don't let us lose Cliff's last post in the previous thread: message 1196377

Those 'lost' WUs are becoming a problem. I've had about 7 resends of the same WUs despite the fact that I already have them on my rig. They are chewing up my ISP's download quota with the endless resends of the exact same WUs.
The amount of space used on my HDD is static, since these WUs simply overwrite the copies already there.

Someone at S@H HQ needs to sort out the handshaking problem BOINC and the servers seem to have.

That's a new one on me. Anybody else seeing it? I'll have a scout round my rigs, just in case.

If there are a lot of multiple (unnecessary) resends, that might account for the high traffic flows we were seeing last week.

Well, I found about 80 resends across my two busiest rigs (which feels high) - but absolutely none of them were repeat resends. I'd need to see some real evidence before calling this one a bug.

uglybiker
Joined: 6 Dec 02
Posts: 15
Credit: 3,010,958
RAC: 0
United States
Message 1196419 - Posted: 17 Feb 2012, 19:31:20 UTC - in response to Message 1196389.

Anybody else having slow response times in the forums? It just seems to lag.

The only lag I'm seeing right now is from my wingmen.

____________

Chris S
Volunteer tester
Joined: 19 Nov 00
Posts: 31145
Credit: 11,319,171
RAC: 20,478
United Kingdom
Message 1196428 - Posted: 17 Feb 2012, 19:44:59 UTC

Yes I am seeing forum lag.

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8375
Credit: 46,741,398
RAC: 21,662
United Kingdom
Message 1196452 - Posted: 17 Feb 2012, 21:12:16 UTC - in response to Message 1196428.

Yes I am seeing forum lag.

A whole load of other sites that I was trying to access this afternoon were slow too. That seems to have cleared up, and with it access to SETI has returned to normal as well.

Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 3334
Credit: 19,050,009
RAC: 19,760
Sweden
Message 1196455 - Posted: 17 Feb 2012, 21:17:34 UTC

We've had such a good AP run for many hours now, but I fear we will soon once again run out of AP work. Crap, I haven't filled my caches yet, I still need at least 175 AP tasks.


____________

Michel448a
Volunteer tester
Joined: 27 Oct 00
Posts: 1201
Credit: 2,891,635
RAC: 0
Canada
Message 1196466 - Posted: 17 Feb 2012, 21:52:03 UTC - in response to Message 1196400.
Last modified: 17 Feb 2012, 21:56:22 UTC

But don't let us lose Cliff's last post in the previous thread: message 1196377

Those 'lost' WUs are becoming a problem. I've had about 7 resends of the same WUs despite the fact that I already have them on my rig. They are chewing up my ISP's download quota with the endless resends of the exact same WUs.
The amount of space used on my HDD is static, since these WUs simply overwrite the copies already there.

Someone at S@H HQ needs to sort out the handshaking problem BOINC and the servers seem to have.

That's a new one on me. Anybody else seeing it? I'll have a scout round my rigs, just in case.

If there are a lot of multiple (unnecessary) resends, that might account for the high traffic flows we were seeing last week.

Well, I found about 80 resends across my two busiest rigs (which feels high) - but absolutely none of them were repeat resends. I'd need to see some real evidence before calling this one a bug.


One thing is sure: if you got a "resend", you never paid twice for the kB downloaded. Tasks are only resent if you never downloaded them in the first place.

The only way to have downloaded a task twice is if you formatted the disk or manually deleted the file between the two sends.

So no worries: a resend means one unique download, and it happens at the moment you see the word "resend".

You can verify this: quickly open your BOINC directory / seti / and check the names of the files whose download has not started yet (for example, those at the bottom of the queue).
____________

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2237
Credit: 8,451,117
RAC: 4,117
United States
Message 1196481 - Posted: 17 Feb 2012, 22:16:10 UTC
Last modified: 17 Feb 2012, 22:45:07 UTC

I've been seeing not only forum lag, but setiathome.berkeley.edu lag since yesterday afternoon. Sometimes it takes 15-20 seconds for a page to start loading, and other times it is nearly instant.

I just updated my libcurl.dll because, when one download server was chosen and failed, changing the hosts file didn't affect which server got used when hitting "retry now." Running ipconfig /flushdns didn't help, but /release and /renew did. Even though the hosts file is a DNS override, that was the only way I could get it to switch servers. Of course, shutting BOINC down and starting it back up would probably work, but that should be moot now with the updated libcurl.

And I saw something that I haven't seen in quite a while in my BOINC Manager: a scrollbar on the tasks tab. All APs. My manager is sized to 1024x720, and 35 tasks is when the scrollbar appears for me. I'm excited. I had to push two B3_P1s through, and then the next-in-line started after I was done with that and a task finished. It was a B6_P0 and it also has 100% blanking. I thought the common failure was one of the B5 channels, but maybe it is B6_P0.

edit: Okay, so this was annoying me. I pulled up job_log_setiathome.berkeley.edu.txt from the old rig and out of the last ~1800 APs that it did, there were 71 100% blanked tasks.

36 - B3_P1
30 - B6_P0
3 - B6_P1
1 - B0_P1
1 - B4_P0


So I think I have my answer. B6_P0 is the one that fails a lot.. but I think it sometimes has successes, unlike B3_P1.
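For anyone who wants to check the arithmetic, here is a minimal sketch that re-adds the per-channel counts from the post and works out each channel's share. It uses only the numbers quoted above; the 1800 total is the poster's approximate figure, so the overall percentage is a rough estimate, and the variable names are made up for illustration.

```python
from collections import Counter

# Per-channel counts of 100% blanked AP tasks, copied from the post above.
blanked = Counter({"B3_P1": 36, "B6_P0": 30, "B6_P1": 3, "B0_P1": 1, "B4_P0": 1})

total_blanked = sum(blanked.values())
approx_tasks = 1800  # "the last ~1800 APs", so only approximate

# Print channels from most to least affected, with their share of blanked tasks.
for channel, count in blanked.most_common():
    share = 100 * count / total_blanked
    print(f"{channel}: {count:2d} ({share:.1f}% of blanked tasks)")

overall = 100 * total_blanked / approx_tasks
print(f"blanked overall: {total_blanked} of ~{approx_tasks} ({overall:.1f}%)")
```

The counts do add up to the 71 mentioned, and B3_P1 plus B6_P0 together account for 66 of them.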
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 12137
Credit: 6,421,959
RAC: 8,225
United States
Message 1196482 - Posted: 17 Feb 2012, 22:17:20 UTC - in response to Message 1196452.

Yes I am seeing forum lag.

A whole load of other sites that I was trying to access this afternoon were slow too. That seems to have cleared up, and with it access to SETI has returned to normal as well.

www.googleanylitics.com ?

____________

Zapped Sparky
Volunteer moderator
Volunteer tester
Joined: 30 Aug 08
Posts: 6667
Credit: 1,200,844
RAC: 46
United Kingdom
Message 1196551 - Posted: 18 Feb 2012, 1:19:37 UTC - in response to Message 1196389.

Anybody else having slow response times in the forums? It just seems to lag.

Yep, I've had that for a few months. Pretty much "transferring data from setiathome.berkeley.edu", while nothing is actually being transferred.
____________
In an alternate universe, it was a ZX81 that asked for clothes, boots and motorcycle.

Client error 418: I'm a teapot

Tropical Goldfish Fish 13: You're not crazy if you crunch for Seti :)

cliff
Joined: 16 Dec 07
Posts: 322
Credit: 2,509,590
RAC: 0
United Kingdom
Message 1196663 - Posted: 18 Feb 2012, 5:34:14 UTC - in response to Message 1196387.

Hi,
Quick update: I got a trial copy of VisualRoute and traced the route from me to the SETI web server.
VR reports that between IP addresses 128.32.255.105 and 128.32.18.150 there is a non-responding link. It won't respond to VR asking for an ID or whatever.
It also shows that traffic from the UK to S@H is routed via Chicago & White Plains, and the route is described as good up to that first IP address.

Dunno if this is any help or not..

I do know that adding the download server addresses to my hosts file has not improved the connectivity; BOINC is still having problems with HTTP dropouts.
However, the 'lost' WU situation seems to have cleared up.

I'm getting a LOT of WU.. 57, 49 at a time:-) Building up a fair reserve, but the endless 'retry' situation persists:-/

There is IMHO a problem somewhere in the Berkeley area that's stuffing up contact between BOINC and the servers.
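As a quick sanity check on the "Berkeley area" reading, the sketch below tests whether both hop addresses sit inside one network block. Treating 128.32.0.0/16 as the campus block is an assumption made for this check, not something VisualRoute reported, and the variable names are illustrative only.

```python
import ipaddress

# The two hop addresses reported in the post above.
hops = ["128.32.255.105", "128.32.18.150"]

# Assumption: treat 128.32.0.0/16 as the campus block for this check.
campus_block = ipaddress.ip_network("128.32.0.0/16")

# Map each hop to whether it falls inside the assumed block.
inside = {hop: ipaddress.ip_address(hop) in campus_block for hop in hops}
for hop, is_inside in inside.items():
    print(f"{hop} inside {campus_block}: {is_inside}")
```

If both come back as inside the block, the silent link sits between two addresses in the same /16, which fits the guess that the trouble is at the far end rather than on the transatlantic route.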

Regards,

____________
Cliff,
Been there, Done that, Still no damm T shirt!

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2237
Credit: 8,451,117
RAC: 4,117
United States
Message 1196673 - Posted: 18 Feb 2012, 5:44:16 UTC

Something is fishy. Still plenty of channels/tapes for AP to split.. but the creation rate is really low, and there isn't a pile of ready to send, either.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5697
Credit: 56,410,025
RAC: 48,911
Australia
Message 1196763 - Posted: 18 Feb 2012, 8:47:49 UTC - in response to Message 1196663.

There is IMHO a problem somewhere in the Berkeley area that's stuffing up contact between BOINC and the servers.

For some time, for those of us not in the US, it's been quicker to use a US proxy than to go directly (50-100 kB/s vs. 2 kB/s or less).

Lately I've just set my hosts file to use the "good" download server and have had no problems with downloads (within a couple of attempts the WU will download).
____________
Grant
Darwin NT.

Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1668
Credit: 203,553,951
RAC: 24,828
Australia
Message 1196767 - Posted: 18 Feb 2012, 9:00:09 UTC - in response to Message 1196763.

There is IMHO a problem somewhere in the Berkeley area that's stuffing up contact between BOINC and the servers.

For some time, for those of us not in the US, it's been quicker to use a US proxy than to go directly (50-100 kB/s vs. 2 kB/s or less).

Lately I've just set my hosts file to use the "good" download server and have had no problems with downloads (within a couple of attempts the WU will download).

+1

But then I'm only just down the road from Grant. (By NT standards anyway.) :-)

T.A.

Anthony Arbuzoff
Volunteer tester
Joined: 6 Apr 00
Posts: 204
Credit: 2,386,618
RAC: 2,919
Russia
Message 1196768 - Posted: 18 Feb 2012, 9:01:59 UTC

Alarm! The count of "Results ready to send" has been decreasing lately! Are the splitters on holiday again?
____________

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8375
Credit: 46,741,398
RAC: 21,662
United Kingdom
Message 1196770 - Posted: 18 Feb 2012, 9:33:37 UTC

Remember that Matt said (in Technical News):

...you have to recreate a whole new table from scratch ... and repopulate it with all the data from the "full" table. We have a billion workunits in that table, so to speed this process up we only moved over workunits 90 days old (or newer) before turning the projects on again. We only need 90 days of recent workunits around for the assimilators to work, but to get the NTPCkrs rolling again we need to repopulate the whole thing, which we'll do more casually.

My guess is that they set the science database to copy the other 946 million workunits over the weekend: and if the science database is that busy, the splitters and assimilators - both of which need to access it - will be kept waiting quite a lot of the time.
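A back-of-envelope sketch of what those figures imply, taking Matt's "a billion" as a round 1,000,000,000 and the 946 million as the remainder still to be copied. Both numbers are rounded, and the copy schedule itself is only a guess, so the results are rough.

```python
# Figures from the two paragraphs above; both are rounded.
total_workunits = 1_000_000_000  # "a billion workunits in that table"
remaining = 946_000_000          # "the other 946 million workunits"
window_days = 90                 # only workunits 90 days old or newer were moved

# Rows already moved, and the average creation rate that implies.
already_copied = total_workunits - remaining
per_day = already_copied / window_days

print(f"already copied: {already_copied:,}")
print(f"implied new workunits per day: {per_day:,.0f}")
print(f"fraction still to copy: {remaining / total_workunits:.1%}")
```

On those numbers, the 90-day slice is roughly 54 million rows, so the bulk of the table would still be waiting to be copied.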

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2237
Credit: 8,451,117
RAC: 4,117
United States
Message 1196773 - Posted: 18 Feb 2012, 9:51:47 UTC - in response to Message 1196770.

Remember that Matt said (in Technical News):

...you have to recreate a whole new table from scratch ... and repopulate it with all the data from the "full" table. We have a billion workunits in that table, so to speed this process up we only moved over workunits 90 days old (or newer) before turning the projects on again. We only need 90 days of recent workunits around for the assimilators to work, but to get the NTPCkrs rolling again we need to repopulate the whole thing, which we'll do more casually.

My guess is that they set the science database to copy the other 946 million workunits over the weekend: and if the science database is that busy, the splitters and assimilators - both of which need to access it - will be kept waiting quite a lot of the time.


I forgot about that. Does make sense now. I would have thought queries/sec would be higher than they are for that operation though.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact


Copyright © 2014 University of California