Panic Mode On (68) Server problems?

Profile arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1196373 - Posted: 17 Feb 2012, 16:47:28 UTC

Due to database errors and now shorties, it is time for another thread.

ID: 1196373 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1196387 - Posted: 17 Feb 2012, 17:11:26 UTC

But don't let us lose Cliff's last post in the previous thread: message 1196377

Those 'lost' WU are becoming a problem. I've had about 7 resends of the same WUs despite the fact I already have them on my rig. They are chewing up my ISP's data d/l quota with the endless resends of the exact same WU.
The amount of space used on my HDD is static since these WUs simply overwrite the ones already there.

Someone at S@H HQ needs to sort out the handshaking problem Boinc and the servers seem to have.

That's a new one on me. Anybody else seeing it? I'll have a scout round my rigs, just in case.

If there are a lot of multiple (unnecessary) resends, that might account for the high traffic flows we were seeing last week.
ID: 1196387 · Report as offensive
Profile James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 1196389 - Posted: 17 Feb 2012, 17:16:00 UTC

Anybody else having slow response times in the forums? Just seems to lag.

Old James
ID: 1196389 · Report as offensive
Profile arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1196392 - Posted: 17 Feb 2012, 17:27:21 UTC - in response to Message 1196389.  

Anybody else having slow response times in the forums? Just seems to lag.


Sometimes yes, and sometimes no.

ID: 1196392 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1196400 - Posted: 17 Feb 2012, 18:03:20 UTC - in response to Message 1196387.  

But don't let us lose Cliff's last post in the previous thread: message 1196377

Those 'lost' WU are becoming a problem. I've had about 7 resends of the same WUs despite the fact I already have them on my rig. They are chewing up my ISP's data d/l quota with the endless resends of the exact same WU.
The amount of space used on my HDD is static since these WUs simply overwrite the ones already there.

Someone at S@H HQ needs to sort out the handshaking problem Boinc and the servers seem to have.

That's a new one on me. Anybody else seeing it? I'll have a scout round my rigs, just in case.

If there are a lot of multiple (unnecessary) resends, that might account for the high traffic flows we were seeing last week.

Well, I found about 80 resends across my two busiest rigs (which feels high) - but absolutely none of them were repeat resends. I'd need to see some real evidence before calling this one a bug.
ID: 1196400 · Report as offensive
uglybiker
Volunteer tester
Joined: 6 Dec 02
Posts: 32
Credit: 11,417,951
RAC: 42
United States
Message 1196419 - Posted: 17 Feb 2012, 19:31:20 UTC - in response to Message 1196389.  

Anybody else having slow response times in the forums? Just seems to lag.

The only lag I'm seeing right now is from my wingmen.

ID: 1196419 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1196452 - Posted: 17 Feb 2012, 21:12:16 UTC - in response to Message 1196428.  

Yes I am seeing forum lag.

A whole load of other sites that I was trying to access this afternoon were slow too. That seems to have cleared up, and with it access to SETI has returned to normal as well.
ID: 1196452 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1196481 - Posted: 17 Feb 2012, 22:16:10 UTC
Last modified: 17 Feb 2012, 22:45:07 UTC

I've been seeing not only forum lag, but setiathome.berkeley.edu lag since yesterday afternoon. Sometimes it takes 15-20 seconds for a page to start loading, and other times it is nearly instant.

I just updated my libcurl.dll because, when one download server was chosen and failed, changing the hosts file didn't affect which server got used when hitting "retry now." Doing an ipconfig /flushdns didn't do it, but /release and /renew would. Even though the hosts file is a DNS override, that was the only way I could get it to switch servers. Of course, shutting BOINC down and starting it back up would probably work, but that should be moot now with the updated libcurl.

And I saw something that I haven't seen in quite a while in my BOINC Manager: a scrollbar on the tasks tab. All APs. My manager is sized to 1024x720, and 35 tasks is when the scrollbar appears for me. I'm excited. Had to push two B3_P1s through, and then the next-in-line started after I was done with that and a task finished. It was a B6_P0 and it also has 100% blanking. I thought the common failure was one of the B5 channels, but maybe it is B6_P0.

edit: Okay, so this was annoying me. I pulled up job_log_setiathome.berkeley.edu.txt from the old rig and out of the last ~1800 APs that it did, there were 71 100% blanked tasks.

36 - B3_P1
30 - B6_P0
 3 - B6_P1
 1 - B0_P1
 1 - B4_P0


So I think I have my answer. B6_P0 is the one that fails a lot.. but I think it sometimes has successes, unlike B3_P1.
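
For anyone wanting to repeat this kind of tally, here is a minimal Python sketch. It assumes each job log line has the form `<timestamp> ue <sec> ct <sec> fe <flops> nm <task name> ...`, that the channel label (such as B3_P1) appears in the task name, and that a 100% blanked task can be recognised by an unusually short CPU time. None of that is stated above, so the 60-second cutoff and the sample task names are purely illustrative.

```python
import re
from collections import Counter

# Matches a channel label such as "B3_P1" inside a task name (assumed naming).
CHANNEL_RE = re.compile(r"_(B\d+_P\d+)_")


def parse_job_log_line(line):
    """Split one job log line into a dict of its key/value fields.

    Assumed layout: "<timestamp> ue <sec> ct <sec> fe <flops> nm <name> ...".
    """
    parts = line.split()
    # parts[0] is the timestamp; the remaining tokens alternate key, value.
    return dict(zip(parts[1::2], parts[2::2]))


def tally_short_tasks(lines, max_cpu_seconds):
    """Count tasks per channel whose CPU time is at or below the cutoff."""
    counts = Counter()
    for line in lines:
        fields = parse_job_log_line(line)
        match = CHANNEL_RE.search(fields.get("nm", ""))
        if match is None or "ct" not in fields:
            continue  # not a task line we can classify
        try:
            cpu_seconds = float(fields["ct"])
        except ValueError:
            continue  # malformed CPU time, skip the line
        if cpu_seconds <= max_cpu_seconds:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from a real log.
    sample = [
        "1329500000 ue 100.0 ct 5.0 fe 1e13 nm ap_demo_B3_P1_00001_20120217_00001.wu_0",
        "1329500100 ue 100.0 ct 9000.0 fe 1e13 nm ap_demo_B6_P0_00002_20120217_00002.wu_0",
        "1329500200 ue 100.0 ct 4.0 fe 1e13 nm ap_demo_B6_P0_00003_20120217_00003.wu_1",
    ]
    for channel, n in tally_short_tasks(sample, max_cpu_seconds=60).most_common():
        print(f"{n:2d} - {channel}")
```

To run it against a real log, pass `open("job_log_setiathome.berkeley.edu.txt")` as `lines` and pick a cutoff that suits your own rig.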
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1196481 · Report as offensive
Profile Gary Charpentier Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 30608
Credit: 53,134,872
RAC: 32
United States
Message 1196482 - Posted: 17 Feb 2012, 22:17:20 UTC - in response to Message 1196452.  

Yes I am seeing forum lag.

A whole load of other sites that I was trying to access this afternoon were slow too. That seems to have cleared up, and with it access to SETI has returned to normal as well.

www.googleanylitics.com ?

ID: 1196482 · Report as offensive
Profile Dimly Lit Lightbulb 😀
Volunteer tester
Joined: 30 Aug 08
Posts: 15399
Credit: 7,423,413
RAC: 1
United Kingdom
Message 1196551 - Posted: 18 Feb 2012, 1:19:37 UTC - in response to Message 1196389.  

Anybody else having slow response times in the forums? Just seems to lag.

Yep, I've had that for a few months. Pretty much "transferring data from setiathome.berkeley.edu", while nothing is actually being transferred.

Member of the People Encouraging Niceness In Society club.

ID: 1196551 · Report as offensive
Profile cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1196663 - Posted: 18 Feb 2012, 5:34:14 UTC - in response to Message 1196387.  

Hi,
Quick update: got a trial copy of Visual Route and checked the path from me to the SETI web server.
VR reports that between IP addresses 128.32.255.105 and 128.32.18.150 there is a non-responding link. It won't respond to VR asking for ID or whatever.
It also shows that traffic from the UK to S@H is routed via Chicago & White Plains, and the route is described as good up to that 1st IP address.

Dunno if this is any help or not..

Do know that adding the download server addresses to my hosts file has 'not' improved the connectivity, boinc is still having problems with http dropouts.
However the 'lost' WU situation seems to have cleared up.

I'm getting a LOT of WU.. 57, 49 at a time:-) Building up a fair reserve, but the endless 'retry' situation persists:-/

There is IMHO a problem somewhere in the Berkeley area that's stuffing up contact between boinc and the servers..

Regards,

Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1196663 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1196673 - Posted: 18 Feb 2012, 5:44:16 UTC

Something is fishy. Still plenty of channels/tapes for AP to split.. but the creation rate is really low, and there isn't a pile of ready to send, either.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1196673 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1196763 - Posted: 18 Feb 2012, 8:47:49 UTC - in response to Message 1196663.  

There is IMHO a problem somewhere in the Berkeley area that's stuffing up contact between boinc and the servers..

For some time for those of us not in the US it's been quicker to use a US proxy than just go directly (50-100kB/s v 2kB/s or less).

Lately i've just set my hosts file to use the "good" download server & have had no problems with downloads (within a couple of attempts the WU will download).
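
If you want to double-check which address a hosts file actually pins a name to, a rough Python sketch along these lines can help. The hostname `download.example.org` and the 192.0.2.x addresses are placeholders rather than the real download servers, and "first entry wins" is how resolvers generally treat duplicate names, so take that as an assumption too.

```python
def parse_hosts(text):
    """Return {hostname: address} for every entry in hosts-file text."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Keep the first entry seen for a name (assumed resolver behaviour).
            mapping.setdefault(name.lower(), address)
    return mapping


if __name__ == "__main__":
    # Placeholder content: not the real server names or addresses.
    example = """
    # pin the "good" download server
    192.0.2.10  download.example.org
    192.0.2.11  download.example.org   # ignored, an earlier entry exists
    """
    print(parse_hosts(example).get("download.example.org"))
```

On Windows the real file normally lives at C:\Windows\System32\drivers\etc\hosts, and on Linux at /etc/hosts. As noted earlier in the thread, a change may not be picked up by BOINC straight away.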
Grant
Darwin NT
ID: 1196763 · Report as offensive
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1196767 - Posted: 18 Feb 2012, 9:00:09 UTC - in response to Message 1196763.  

There is IMHO a problem somewhere in the Berkeley area that's stuffing up contact between boinc and the servers..

For some time for those of us not in the US it's been quicker to use a US proxy than just go directly (50-100kB/s v 2kB/s or less).

Lately i've just set my hosts file to use the "good" download server & have had no problems with downloads (within a couple of attempts the WU will download).

+1

But then I'm only just down the road from Grant. (By NT standards anyway.) :-)

T.A.
ID: 1196767 · Report as offensive
Profile Belthazor
Volunteer tester
Joined: 6 Apr 00
Posts: 219
Credit: 10,373,795
RAC: 13
Russia
Message 1196768 - Posted: 18 Feb 2012, 9:01:59 UTC

Alarm! The "Results ready to send" count has been falling lately! Are the splitters on holiday again?
ID: 1196768 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1196770 - Posted: 18 Feb 2012, 9:33:37 UTC

Remember that Matt said (in Technical News):

...you have to recreate a whole new table from scratch ... and repopulate it with all the data from the "full" table. We have a billion workunits in that table, so to speed this process up we only moved over workunits 90 days old (or newer) before turning the projects on again. We only need 90 days of recent workunits around for the assimilators to work, but to get the NTPCkrs rolling again we need to repopulate the whole thing, which we'll do more casually.

My guess is that they set the science database to copy the other 946 million workunits over the weekend: and if the science database is that busy, the splitters and assimilators - both of which need to access it - will be kept waiting quite a lot of the time.

ID: 1196770 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1196773 - Posted: 18 Feb 2012, 9:51:47 UTC - in response to Message 1196770.  

Remember that Matt said (in Technical News):

...you have to recreate a whole new table from scratch ... and repopulate it with all the data from the "full" table. We have a billion workunits in that table, so to speed this process up we only moved over workunits 90 days old (or newer) before turning the projects on again. We only need 90 days of recent workunits around for the assimilators to work, but to get the NTPCkrs rolling again we need to repopulate the whole thing, which we'll do more casually.

My guess is that they set the science database to copy the other 946 million workunits over the weekend: and if the science database is that busy, the splitters and assimilators - both of which need to access it - will be kept waiting quite a lot of the time.


I forgot about that. Does make sense now. I would have thought queries/sec would be higher than they are for that operation though.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1196773 · Report as offensive
Profile Belthazor
Volunteer tester
Joined: 6 Apr 00
Posts: 219
Credit: 10,373,795
RAC: 13
Russia
Message 1196777 - Posted: 18 Feb 2012, 10:02:30 UTC

To me, that explanation doesn't hold up. If it were so, the splitters would have to be disabled, but they are running, just without producing new WUs. Curious situation. Moreover, if I were S@H staff, I would hardly schedule such a task for a holiday when I myself was out of the lab.
ID: 1196777 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1196778 - Posted: 18 Feb 2012, 10:09:36 UTC - in response to Message 1196773.  

Remember that Matt said (in Technical News):

...you have to recreate a whole new table from scratch ... and repopulate it with all the data from the "full" table. We have a billion workunits in that table, so to speed this process up we only moved over workunits 90 days old (or newer) before turning the projects on again. We only need 90 days of recent workunits around for the assimilators to work, but to get the NTPCkrs rolling again we need to repopulate the whole thing, which we'll do more casually.

My guess is that they set the science database to copy the other 946 million workunits over the weekend: and if the science database is that busy, the splitters and assimilators - both of which need to access it - will be kept waiting quite a lot of the time.

I forgot about that. Does make sense now. I would have thought queries/sec would be higher than they are for that operation though.

I think the query rate is for the BOINC database, rather than the science database.

At least, new WUs are still being split (11/sec), even if they're being allocated as fast as they can be produced - so there are none spare for a 'ready to send' buffer.
ID: 1196778 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1196780 - Posted: 18 Feb 2012, 10:21:52 UTC - in response to Message 1196770.  

Remember that Matt said (in Technical News):

...you have to recreate a whole new table from scratch ... and repopulate it with all the data from the "full" table. We have a billion workunits in that table, so to speed this process up we only moved over workunits 90 days old (or newer) before turning the projects on again. We only need 90 days of recent workunits around for the assimilators to work, but to get the NTPCkrs rolling again we need to repopulate the whole thing, which we'll do more casually.

My guess is that they set the science database to copy the other 946 million workunits over the weekend: and if the science database is that busy, the splitters and assimilators - both of which need to access it - will be kept waiting quite a lot of the time.


Also related is this from Matt's previous post:

Speaking of network competition - yes, we're aware that we are dropping all kinds of connections during uploads/downloads. This isn't because of our router (which was definitely the problem over the summer before we added RAM to it), but somewhere else further up the pipeline. Still figuring this out, but it's certainly load related.

which still needs to be fixed (which is where a good proxy will get you around it).

Cheers.
ID: 1196780 · Report as offensive