Panic Mode On (68) Server problems?

arkayn (Project Donor)
Volunteer tester
Joined: 14 May 99
Posts: 4097
Credit: 51,575,896
RAC: 1,692
United States
Message 1196373 - Posted: 17 Feb 2012, 16:47:28 UTC

Due to database errors and now shorties, it is time for another thread.



ID: 1196373
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 11132
Credit: 83,456,783
RAC: 40,661
United Kingdom
Message 1196387 - Posted: 17 Feb 2012, 17:11:26 UTC

But don't let us lose Cliff's last post in the previous thread: message 1196377

Those 'lost' WUs are becoming a problem. I've had about 7 resends of the same WUs despite the fact that I already have them on my rig. They are chewing up my ISP's download quota with the endless resends of the exact same WUs.
The amount of space used on my HDD is static, since these WUs simply overwrite the ones already there.

Someone at S@H HQ needs to sort out the handshaking problem BOINC and the servers seem to have.

That's a new one on me. Anybody else seeing it? I'll have a scout round my rigs, just in case.

If there are a lot of multiple (unnecessary) resends, that might account for the high traffic flows we were seeing last week.

ID: 1196387
James Sotherden (Project Donor)
Joined: 16 May 99
Posts: 10133
Credit: 65,578,809
RAC: 34,805
United States
Message 1196389 - Posted: 17 Feb 2012, 17:16:00 UTC

Anybody else having slow response times in the forums? Just seems to lag.

Old James

ID: 1196389
arkayn (Project Donor)
Volunteer tester
Joined: 14 May 99
Posts: 4097
Credit: 51,575,896
RAC: 1,692
United States
Message 1196392 - Posted: 17 Feb 2012, 17:27:21 UTC - in response to Message 1196389.

Anybody else having slow response times in the forums? Just seems to lag.


Sometimes yes, and sometimes no.

ID: 1196392
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 11132
Credit: 83,456,783
RAC: 40,661
United Kingdom
Message 1196400 - Posted: 17 Feb 2012, 18:03:20 UTC - in response to Message 1196387.

But don't let us lose Cliff's last post in the previous thread: message 1196377

Those 'lost' WUs are becoming a problem. I've had about 7 resends of the same WUs despite the fact that I already have them on my rig. They are chewing up my ISP's download quota with the endless resends of the exact same WUs.
The amount of space used on my HDD is static, since these WUs simply overwrite the ones already there.

Someone at S@H HQ needs to sort out the handshaking problem BOINC and the servers seem to have.

That's a new one on me. Anybody else seeing it? I'll have a scout round my rigs, just in case.

If there are a lot of multiple (unnecessary) resends, that might account for the high traffic flows we were seeing last week.

Well, I found about 80 resends across my two busiest rigs (which feels high) - but absolutely none of them were repeat resends. I'd need to see some real evidence before calling this one a bug.

ID: 1196400
uglybiker
Volunteer tester
Joined: 6 Dec 02
Posts: 29
Credit: 9,690,484
RAC: 0
United States
Message 1196419 - Posted: 17 Feb 2012, 19:31:20 UTC - in response to Message 1196389.

Anybody else having slow response times in the forums? Just seems to lag.

The only lag I'm seeing right now is from my wingmen.

ID: 1196419
Chris S (Crowdfunding Project Donor)
Volunteer tester
Joined: 19 Nov 00
Posts: 38176
Credit: 21,205,548
RAC: 27,664
United Kingdom
Message 1196428 - Posted: 17 Feb 2012, 19:44:59 UTC

Yes I am seeing forum lag.

ID: 1196428
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 11132
Credit: 83,456,783
RAC: 40,661
United Kingdom
Message 1196452 - Posted: 17 Feb 2012, 21:12:16 UTC - in response to Message 1196428.

Yes I am seeing forum lag.

A whole load of other sites that I was trying to access this afternoon were slow too. That seems to have cleared up, and with it access to SETI has returned to normal as well.

ID: 1196452
Tutankhamon "Communist"
Volunteer tester
Joined: 1 Nov 08
Posts: 6081
Credit: 37,580,725
RAC: 14,629
Sweden
Message 1196455 - Posted: 17 Feb 2012, 21:17:34 UTC

We've had such a good AP run for many hours now, but I fear we will soon once again run out of AP work. Crap, I haven't filled my caches yet, I still need at least 175 AP tasks.


This is a test of the Emergency Moron System. Had there been a real moron in the room, there would've been a small mushroom cloud in the place where the idiot had been standing.

ID: 1196455
Michel448a
Volunteer tester
Joined: 27 Oct 00
Posts: 1201
Credit: 2,891,635
RAC: 0
Canada
Message 1196466 - Posted: 17 Feb 2012, 21:52:03 UTC - in response to Message 1196400.
Last modified: 17 Feb 2012, 21:56:22 UTC

But don't let us lose Cliff's last post in the previous thread: message 1196377

Those 'lost' WUs are becoming a problem. I've had about 7 resends of the same WUs despite the fact that I already have them on my rig. They are chewing up my ISP's download quota with the endless resends of the exact same WUs.
The amount of space used on my HDD is static, since these WUs simply overwrite the ones already there.

Someone at S@H HQ needs to sort out the handshaking problem BOINC and the servers seem to have.

That's a new one on me. Anybody else seeing it? I'll have a scout round my rigs, just in case.

If there are a lot of multiple (unnecessary) resends, that might account for the high traffic flows we were seeing last week.

Well, I found about 80 resends across my two busiest rigs (which feels high) - but absolutely none of them were repeat resends. I'd need to see some real evidence before calling this one a bug.


One thing is sure: if you got a "resend", you never paid for twice the amount of kB downloaded. A task is only resent if you never downloaded it in the first place.

The only way to have downloaded it twice is if you formatted the drive or manually deleted the files between the two sends.

So no worries: a resend = one unique download, and it happens at the moment you see the word "resend".

You can verify this: quickly open your BOINC dir, go to the SETI folder, and check the names of the files whose download has not started yet (like those at the bottom of the queue).
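
A minimal Python sketch of that check, for anyone who would rather script it than race the download queue by hand. The function names and the task file names are hypothetical, and the project folder is whatever directory your BOINC client uses for SETI; nothing here talks to the BOINC client itself.

```python
from pathlib import Path


def list_project_files(project_dir):
    """Return the names of regular files currently in project_dir."""
    return {p.name for p in Path(project_dir).iterdir() if p.is_file()}


def not_yet_downloaded(expected_names, present_names):
    """Return the expected names that are absent from present_names, in order."""
    present = set(present_names)
    return [name for name in expected_names if name not in present]


# Hypothetical task file names, for illustration only.
queued = ["task_a.wu", "task_b.wu", "task_c.wu"]
on_disk = {"task_a.wu"}
pending = not_yet_downloaded(queued, on_disk)
```

In real use, `on_disk` would come from `list_project_files(...)` pointed at the SETI project folder. If a task marked as a resend shows up in `pending`, its file was not on disk yet, which is what the post above predicts.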

ID: 1196466
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2871
Credit: 10,619,636
RAC: 317
United States
Message 1196481 - Posted: 17 Feb 2012, 22:16:10 UTC
Last modified: 17 Feb 2012, 22:45:07 UTC

I've been seeing not only forum lag, but setiathome.berkeley.edu lag since yesterday afternoon. Sometimes it takes 15-20 seconds for a page to start loading, and other times it is nearly instant.

I just updated my libcurl.dll. Before the update, when one download server was chosen and failed, changing the hosts file didn't affect which server would get used when hitting "retry now." Doing an ipconfig /flushdns didn't do it, but /release and /renew would. Even though the hosts file is a DNS override, that was the only way I could get it to switch servers. Of course, shutting BOINC down and starting it back up would probably work, but that should be moot now with the updated libcurl.
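
For anyone unsure what a hosts-file override actually says, here is a small Python sketch that parses hosts-file text into a name-to-address map. The hostname and the 192.0.2.x addresses are placeholders (documentation-range addresses), not the real download servers, and "first entry wins" is an assumption about typical resolver behaviour. It only models the file, not any caching inside the OS or libcurl, which is where the stale server choice described above would be coming from.

```python
def parse_hosts(text):
    """Map each hostname in hosts-file text to the first address listed for it."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Assumption: the first matching entry is the one that gets used.
            mapping.setdefault(name.lower(), address)
    return mapping


# Hypothetical override: pin a placeholder download host to one address.
SAMPLE = """
# pin the download host
192.0.2.10   download.example.org
192.0.2.20   download.example.org   # shadowed by the entry above
"""

overrides = parse_hosts(SAMPLE)
```

Editing the file changes what this map contains, but a client that has already resolved the name may keep using the old address until its own cache is dropped.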

And I saw something that I haven't seen in quite a while in my BOINC Manager. A scrollbar on the tasks tab. All APs. My manager is sized to 1024x720. 35 tasks is when the scrollbar appears for me. I'm excited. Had to push two B3_P1s through, and then the next-in-line started after I was done with that and a task finished.. it was a B6_P0 and it also has 100% blanking. I thought the common failure was one of the B5 channels.. but maybe it is B6_P0.

edit: Okay, so this was annoying me. I pulled up job_log_setiathome.berkeley.edu.txt from the old rig and out of the last ~1800 APs that it did, there were 71 100% blanked tasks.

36 - B3_P1
30 - B6_P0
 3 - B6_P1
 1 - B0_P1
 1 - B4_P0


So I think I have my answer. B6_P0 is the one that fails a lot.. but I think it sometimes has successes, unlike B3_P1.
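
A quick Python check of the tally above, using only the counts listed in this post. The five channels sum to the stated 71, and B3_P1 plus B6_P0 account for roughly 93% of the blanked tasks. The 1800 total is the post's own "~1800", so the overall blanked rate (a little under 4%) is approximate.

```python
from collections import Counter

# Counts of 100% blanked AP tasks per channel, copied from the tally above.
blanked = Counter({"B3_P1": 36, "B6_P0": 30, "B6_P1": 3, "B0_P1": 1, "B4_P0": 1})

total_blanked = sum(blanked.values())
approx_tasks = 1800  # "~1800" in the post, so treat this as approximate

top_two = blanked.most_common(2)
top_two_share = sum(count for _, count in top_two) / total_blanked
overall_rate = total_blanked / approx_tasks

print(f"{total_blanked} blanked, top two channels: {top_two_share:.0%}")
print(f"approximate blanked rate: {overall_rate:.1%}")
```
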
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)

ID: 1196481
Gary Charpentier (Crowdfunding Project Donor)
Volunteer tester
Joined: 25 Dec 00
Posts: 18602
Credit: 21,342,012
RAC: 19,550
United States
Message 1196482 - Posted: 17 Feb 2012, 22:17:20 UTC - in response to Message 1196452.

Yes I am seeing forum lag.

A whole load of other sites that I was trying to access this afternoon were slow too. That seems to have cleared up, and with it access to SETI has returned to normal as well.

www.googleanylitics.com ?

ID: 1196482
Dimly Lit Lightbulb 😀 (Project Donor)
Volunteer tester
Joined: 30 Aug 08
Posts: 14363
Credit: 2,922,656
RAC: 6,250
United Kingdom
Message 1196551 - Posted: 18 Feb 2012, 1:19:37 UTC - in response to Message 1196389.

Anybody else having slow response times in the forums? Just seems to lag.

Yep, I've had that for a few months. Pretty much "transferring data from setiathome.berkeley.edu", while nothing is actually being transferred.

ID: 1196551
cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1196663 - Posted: 18 Feb 2012, 5:34:14 UTC - in response to Message 1196387.

Hi,
Quick update: I got a trial copy of VisualRoute and checked the route from me to the SETI web server.
VR reports that between IP addresses 128.32.255.105 and 128.32.18.150 there is a non-responding link. It won't respond to VR asking for ID or whatever.
It also shows that traffic from the UK to S@H is routed via Chicago and White Plains, and the route is described as good up to that first IP address.
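
A small Python sketch, using the standard `ipaddress` module, that checks whether the two addresses around the silent link sit in the same /16 block. The 128.32.0.0/16 boundary is an assumption picked only because both addresses start with 128.32; the check says nothing about who operates that block. Also worth keeping in mind: a hop that does not answer trace probes is not necessarily a broken hop, since routers are often configured to ignore them.

```python
import ipaddress

# The two addresses reported on either side of the non-responding link.
hop_a = ipaddress.ip_address("128.32.255.105")
hop_b = ipaddress.ip_address("128.32.18.150")

# Assumed block for illustration: the /16 that both addresses appear to share.
block = ipaddress.ip_network("128.32.0.0/16")

same_block = hop_a in block and hop_b in block
print("both hops inside", block, "->", same_block)
```

If `same_block` is true, the silent link lies between two addresses of one network rather than on the long-haul leg, which would fit the observation that the route is good up to the first address.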

Dunno if this is any help or not..

I do know that adding the download server addresses to my hosts file has 'not' improved the connectivity; BOINC is still having problems with HTTP dropouts.
However the 'lost' WU situation seems to have cleared up.

I'm getting a LOT of WU.. 57, 49 at a time:-) Building up a fair reserve, but the endless 'retry' situation persists:-/

There is IMHO a problem somewhere in the Berkeley area that's stuffing up contact between BOINC and the servers.

Regards,


Cliff,
Been there, Done that, Still no damm T shirt!

ID: 1196663
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2871
Credit: 10,619,636
RAC: 317
United States
Message 1196673 - Posted: 18 Feb 2012, 5:44:16 UTC

Something is fishy. Still plenty of channels/tapes for AP to split.. but the creation rate is really low, and there isn't a pile of ready to send, either.


Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)

ID: 1196673
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 7474
Credit: 90,820,360
RAC: 45,047
Australia
Message 1196763 - Posted: 18 Feb 2012, 8:47:49 UTC - in response to Message 1196663.

There is IMHO a problem somewhere in the Berkeley area that's stuffing up contact between BOINC and the servers.

For some time now, for those of us not in the US, it's been quicker to use a US proxy than to go directly (50-100kB/s v 2kB/s or less).

Lately I've just set my hosts file to use the "good" download server & have had no problems with downloads (within a couple of attempts the WU will download).
Grant
Darwin NT

ID: 1196763
Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1790
Credit: 225,254,910
RAC: 10,104
Australia
Message 1196767 - Posted: 18 Feb 2012, 9:00:09 UTC - in response to Message 1196763.

There is IMHO a problem somewhere in the Berkeley area that's stuffing up contact between BOINC and the servers.

For some time now, for those of us not in the US, it's been quicker to use a US proxy than to go directly (50-100kB/s v 2kB/s or less).

Lately I've just set my hosts file to use the "good" download server & have had no problems with downloads (within a couple of attempts the WU will download).

+1

But then I'm only just down the road from Grant. (By NT standards anyway.) :-)

T.A.

ID: 1196767
Belthazor
Volunteer tester
Joined: 6 Apr 00
Posts: 218
Credit: 3,974,155
RAC: 4,026
Russia
Message 1196768 - Posted: 18 Feb 2012, 9:01:59 UTC

Alarm! The count of "Results ready to send" has been decreasing lately! Are the splitters on holiday again?


ID: 1196768
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 11132
Credit: 83,456,783
RAC: 40,661
United Kingdom
Message 1196770 - Posted: 18 Feb 2012, 9:33:37 UTC

Remember that Matt said (in Technical News):

...you have to recreate a whole new table from scratch ... and repopulate it with all the data from the "full" table. We have a billion workunits in that table, so to speed this process up we only moved over workunits 90 days old (or newer) before turning the projects on again. We only need 90 days of recent workunits around for the assimilators to work, but to get the NTPCkrs rolling again we need to repopulate the whole thing, which we'll do more casually.

My guess is that they set the science database to copy the other 946 million workunits over the weekend: and if the science database is that busy, the splitters and assimilators - both of which need to access it - will be kept waiting quite a lot of the time.
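
A back-of-envelope check of those figures in Python, taking "a billion" and "946 million" at face value (both are round numbers, so the results are rough). It implies about 54 million workunits in the 90-day window that was moved first, or roughly 600,000 per day, and a remaining copy about 17.5 times larger than the part already moved.

```python
total_workunits = 1_000_000_000  # "a billion" in Matt's note, approximate
still_to_copy = 946_000_000      # the guess above for what remains
window_days = 90                 # age cutoff for the workunits moved first

already_moved = total_workunits - still_to_copy
per_day = already_moved / window_days
remaining_ratio = still_to_copy / already_moved

print(f"moved first: {already_moved:,} (~{per_day:,.0f} per day)")
print(f"remaining copy is ~{remaining_ratio:.1f}x larger")
```

That ratio is one way to see why the science database could stay busy for a good while during the repopulation.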

ID: 1196770
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2871
Credit: 10,619,636
RAC: 317
United States
Message 1196773 - Posted: 18 Feb 2012, 9:51:47 UTC - in response to Message 1196770.

Remember that Matt said (in Technical News):

...you have to recreate a whole new table from scratch ... and repopulate it with all the data from the "full" table. We have a billion workunits in that table, so to speed this process up we only moved over workunits 90 days old (or newer) before turning the projects on again. We only need 90 days of recent workunits around for the assimilators to work, but to get the NTPCkrs rolling again we need to repopulate the whole thing, which we'll do more casually.

My guess is that they set the science database to copy the other 946 million workunits over the weekend: and if the science database is that busy, the splitters and assimilators - both of which need to access it - will be kept waiting quite a lot of the time.


I forgot about that. Does make sense now. I would have thought queries/sec would be higher than they are for that operation though.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)

ID: 1196773


 
©2016 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.