Panic Mode On (28) Server problems

FiveHamlet
Joined: 5 Oct 99
Posts: 783
Credit: 32,638,578
RAC: 0
United Kingdom
Message 963868 - Posted: 16 Jan 2010, 18:31:28 UTC - in response to Message 963866.  

And if all else fails READ THE MANUAL.

Dave
ID: 963868
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 963870 - Posted: 16 Jan 2010, 18:40:21 UTC - in response to Message 963868.  
Last modified: 16 Jan 2010, 18:44:37 UTC

And if all else fails READ THE MANUAL.

Dave

LOL......there actually IS a Manual..........


Tells you all you need to know about how to screw your Boinc client into the ground.......

This one works especially well......
<ignore_cuda_dev>, <ignore_ati_dev>
ignore (don't use) a specific NVIDIA or ATI GPU. You can ignore more than one. New in 6.10.19

You can ignore more than one Cuda card at a time.......LOL.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 963870
Richard Haselgrove (Project Donor)
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 963871 - Posted: 16 Jan 2010, 18:42:48 UTC - in response to Message 963868.  
Last modified: 16 Jan 2010, 18:43:31 UTC

And if all else fails READ THE MANUAL.

I've never yet come across a manual which would own up to statements like

"This router will scramble DNS lookups if run continuously for more than n months."

But I do find I have to reboot my mobile [cell] phone every couple of weeks, and my HDD digital TV recorder every month or two - Windows XP is stable compared to either of those, LOL.
ID: 963871
Profile dnolan
Joined: 30 Aug 01
Posts: 1228
Credit: 47,779,411
RAC: 32
United States
Message 963873 - Posted: 16 Jan 2010, 18:44:28 UTC - in response to Message 963868.  

And if all else fails READ THE MANUAL.

Dave


When I did tech support I frequently got people calling in and I would tell them, "Go find the manual and I'll tell you what page to look at...", and they would reply, "I don't need a manual, I have the support number..." or something similar.

-Dave
ID: 963873
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 963874 - Posted: 16 Jan 2010, 18:46:17 UTC - in response to Message 963873.  

And if all else fails READ THE MANUAL.

Dave


When I did tech support I frequently got people calling in and I would tell them, "Go find the manual and I'll tell you what page to look at...", and they would reply, "I don't need a manual, I have the support number..." or something similar.

-Dave

Aww come on now......the classic has to be the lady who called in and said her cupholder would not retract anymore.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 963874
FiveHamlet
Joined: 5 Oct 99
Posts: 783
Credit: 32,638,578
RAC: 0
United Kingdom
Message 963876 - Posted: 16 Jan 2010, 18:55:13 UTC

Don't you just love it when there are LOADS of Shorties about.
You do a thousand and your RAC goes down LOL
Server getting a bit of a hammering.

Dave
ID: 963876
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 963881 - Posted: 16 Jan 2010, 19:07:59 UTC - in response to Message 963876.  

Don't you just love it when there are LOADS of Shorties about.
You do a thousand and your RAC goes down LOL
Server getting a bit of a hammering.

Dave

Dunno.......my rigs are always digging into a 10 day lunch.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 963881
Profile hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 963906 - Posted: 16 Jan 2010, 20:16:40 UTC - in response to Message 963876.  

Don't you just love it when there are LOADS of Shorties about.
You do a thousand and your RAC goes down LOL
Server getting a bit of a hammering.

Dave

Going thru the same thing...At least the bounce back is good.
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 963906
FiveHamlet
Joined: 5 Oct 99
Posts: 783
Credit: 32,638,578
RAC: 0
United Kingdom
Message 963907 - Posted: 16 Jan 2010, 20:17:44 UTC - in response to Message 963906.  

Yep
ID: 963907
OzzFan (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 963939 - Posted: 16 Jan 2010, 22:14:36 UTC - in response to Message 963871.  

And if all else fails READ THE MANUAL.

I've never yet come across a manual which would own up to statements like

"This router will scramble DNS lookups if run continuously for more than n months."

But I do find I have to reboot my mobile [cell] phone every couple of weeks, and my HDD digital TV recorder every month or two - Windows XP is stable compared to either of those, LOL.


I have to reboot my wireless router and my wireless access point (I need both to cover my house since there's a lot of interference) every so often otherwise I can't access the management interfaces. They keep working regardless, so it's not a big deal, but it's annoying when I try to change a setting remotely only to find that I have to walk over to the device and power cycle it just to do what I want.

Don't have the problem with my DVR, thank the flying spaghetti monster. I'll find out how well my new cell phone works when I get it on Wednesday.
ID: 963939
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 963955 - Posted: 16 Jan 2010, 23:11:40 UTC - in response to Message 963550.  


Flipped a few log flags, found it was trying the wrong IP address. Backtracked up the IP/DNS trail, and found that my local machine DNS cache looked like this:

         setiboincdata.ssl.berkeley.edu
         ----------------------------------------
         Record Name . . . . . : setiboincdata.ssl.berkeley.edu
         A (Host) Record . . . : 16.240.68.208

         boinc2.ssl.berkeley.edu
         ----------------------------------------
         Record Name . . . . . : boinc2.ssl.berkeley.edu
         A (Host) Record . . . : 208.68.240.13


Finger of suspicion? I power-cycled the router, and all the stuck uploads went through at the first attempt, from all three affected machines.


Richard,

I think I'd call the vendor and explain to them that it's fine that this interferes when you're using the 'net solely for entertainment, but sooner or later it is going to send your online banking details to some third party.

... and if they empty your bank account, they know who you're going to call.

DNS is mission critical, and if the router is messing this up (and I think we finally know the culprit) that's inexcusable.
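
As a rough illustration of the check Richard did by eye, the sketch below parses a cache listing in the format quoted above and flags any A record that only lands in the expected network once its octets are reversed. The sample text is copied from the quote; the helper names and the `208.68.240.` prefix (inferred from the boinc2 record) are assumptions for illustration, not part of any BOINC or Windows tool.

```python
import re

# Sample listing copied from the quoted post.
SAMPLE = """\
setiboincdata.ssl.berkeley.edu
----------------------------------------
Record Name . . . . . : setiboincdata.ssl.berkeley.edu
A (Host) Record . . . : 16.240.68.208

boinc2.ssl.berkeley.edu
----------------------------------------
Record Name . . . . . : boinc2.ssl.berkeley.edu
A (Host) Record . . . : 208.68.240.13
"""

NAME_RE = re.compile(r"Record Name[ .]*:\s*(\S+)")
ADDR_RE = re.compile(r"A \(Host\) Record[ .]*:\s*(\d+\.\d+\.\d+\.\d+)")


def parse_cache(text):
    """Return {record name: [IPv4 addresses]} from a cache listing."""
    records = {}
    current = None
    for line in text.splitlines():
        match = NAME_RE.search(line)
        if match:
            current = match.group(1)
            records.setdefault(current, [])
            continue
        match = ADDR_RE.search(line)
        if match and current is not None:
            records[current].append(match.group(1))
    return records


def reverse_octets(ip):
    """Return the dotted quad with its four octets in reverse order."""
    return ".".join(reversed(ip.split(".")))


def suspicious(records, expected_prefix):
    """Flag addresses outside expected_prefix whose reversal falls inside it."""
    hits = []
    for name, addrs in records.items():
        for ip in addrs:
            flipped = reverse_octets(ip)
            if not ip.startswith(expected_prefix) and flipped.startswith(expected_prefix):
                hits.append((name, ip, flipped))
    return hits


if __name__ == "__main__":
    # The prefix is an assumption inferred from the boinc2 record.
    for name, cached, flipped in suspicious(parse_cache(SAMPLE), "208.68.240."):
        print(f"{name}: cached {cached}, reversed octets give {flipped}")
```

A check like this only tells you that the cached value looks byte-swapped; it says nothing about whether the router, the ISP, or something else produced it.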

-- Ned

ID: 963955
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 963962 - Posted: 16 Jan 2010, 23:26:31 UTC - in response to Message 963955.  

Or could it be the ISP, and repowering the router just fetched updated DNS?

Slashdot had a Story of another Strange Glitch: AT&T Glitch Connects Users To Wrong Accounts

Claggy
ID: 963962
Aurora Borealis
Volunteer tester
Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 963964 - Posted: 16 Jan 2010, 23:33:50 UTC

I couldn't count how many service calls I did, when I was in the TV repair business back in the stone age, where all I did was put the plug back in the wall socket. Got to love them dogs, cats and vacuum cleaners.

Boinc V7.2.42
Win7 i5 3.33G 4GB, GTX470
ID: 963964
Scarecrow

Joined: 15 Jul 00
Posts: 4520
Credit: 486,601
RAC: 0
United States
Message 963968 - Posted: 16 Jan 2010, 23:45:04 UTC - in response to Message 963964.  

I couldn't count how many service calls I did, when I was in the TV repair business back in the stone age, where all I did was put the plug back in the wall socket. Got to love them dogs, cats and vacuum cleaners.


Hee hee... I had that very scenario a while back at a rather important US military installation. "Mission critical line printer" went stone dead. 02:00 service call with the highest priority was issued. I got there and found a large waste container had been slid behind the printer unplugging the power cord from the wall. Lots of red faces in khaki uniforms in the room at that point. I told them that I would sleep much better when I got back home knowing they were on duty. :)
_________________
*** BOFH excuse #205:
Quantum dynamics are affecting the transistors

ID: 963968
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 963970 - Posted: 16 Jan 2010, 23:45:55 UTC - in response to Message 963962.  

Or could it be the ISP, and repowering the router just fetched updated DNS?

Slashdot had a Story of another Strange Glitch: AT&T Glitch Connects Users To Wrong Accounts

Claggy

It could be his ISP, but it's unlikely. Across their entire customer base, this would generate enough complaints that they'd know about it.

... and if it was rare, it wouldn't hit Richard so consistently.

Most ISPs use BIND or something derived from BIND. I've not seen anyone talk about a BIND bug that does this.
ID: 963970
Richard Haselgrove (Project Donor)
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 963979 - Posted: 17 Jan 2010, 0:42:09 UTC - in response to Message 963970.  

Or could it be the ISP, and repowering the router just fetched updated DNS?

Slashdot had a Story of another Strange Glitch: AT&T Glitch Connects Users To Wrong Accounts

Claggy

It could be his ISP, but it's unlikely. Across their entire customer base, this would generate enough complaints that they'd know about it.

... and if it was rare, it wouldn't hit Richard so consistently.

Most ISPs use BIND or something derived from BIND. I've not seen anyone talk about a BIND bug that does this.

Yes, I've seen this before - and that's why I checked the IP addresses so thoroughly, and recognised the 'reversed octets' when I saw them.

But "consistently"? Maybe three times in as many years? Evidence is limited, my friend. Remember how much effort (and research on your part) we needed to fix the libcurl bug - and nobody was accepting ownership of that one, either. How many of your clients research the data, and present documented facts, before they pick up the phone and demand their service be restored? I'd like to think that the only 'consistency' here is my attempt to research and comprehend the problem, and lay the facts out in public, before I attempt to apportion any blame.

I've used the Draytek Vigor range, and must confess to liking and recommending them, since 2001. First, because their 2200USB model was the only legal way of connecting multiple computers (running seti@home classic, of course) to the new-fangled ADSL service and Alcatel 'Frog' modem that came with it that year: and secondly, because it's the only reasonably-priced router range that I've found that can accept VPN termination in the router, rather than pass-through to the server I'm trying to manage remotely. I like that double security.

So: it could be the router, and it could be the ISP. Since I have - for certain - also been sent a virus-infected email by that same ISP's helpdesk (see silicon.com, and other press reports), I am at least as inclined to blame the ISP as the router. Anyone that can help me tie-break those possibilities is welcome to chip in.
ID: 963979
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 963987 - Posted: 17 Jan 2010, 1:23:32 UTC - in response to Message 963979.  

Yes, I've seen this before - and that's why I checked the IP addresses so thoroughly, and recognised the 'reversed octets' when I saw them.

But "consistently"? Maybe three times in as many years? Evidence is limited, my friend. Remember how much effort (and research on your part) we needed to fix the libcurl bug - and nobody was accepting ownership of that one, either. How many of your clients research the data, and present documented facts, before they pick up the phone and demand their service be restored? I'd like to think that the only 'consistency' here is my attempt to research and comprehend the problem, and lay the facts out in public, before I attempt to apportion any blame.

None of my clients research problems (or features), so I follow a couple of rules.

I test and verify everything as much as possible.

I eat my own dog-food -- which is to say, I'm probably the most critical user I have, and I use all my own services every day.

The origin of the problem is that we've got a lot of little-endian processors in a big-endian internet. (explanation)

Little endian processors have to reverse the bytes to make the internet work, and something is doing that just a tiny bit wrong.
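
To make the byte-order point concrete, here is a small sketch of how one misplaced conversion could turn a correct address into the one seen in the cache. It assumes the intended address is simply the reversal of the cached 16.240.68.208; nobody in this thread has shown that any router firmware actually does this, so treat it as an illustration of the mechanism only.

```python
import socket
import struct


def misread_as_little_endian(ip):
    """Read the four network-order address bytes as a little-endian integer,
    then write that integer back out in network order (the simulated bug)."""
    raw = socket.inet_aton(ip)                    # 4 bytes, network (big-endian) order
    wrong_value = struct.unpack("<I", raw)[0]     # bug: little-endian read
    return socket.inet_ntoa(struct.pack("!I", wrong_value))


if __name__ == "__main__":
    intended = "208.68.240.16"   # assumed: reversal of the cached record
    print(misread_as_little_endian(intended))     # prints 16.240.68.208
```

Applying the same mistake twice swaps the bytes back, which is one reason a bug like this can hide: code paths with a matched pair of wrong conversions look fine, and only an unmatched one shows the reversed octets.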

I'd think that if the average user hit this three times a year, your ISP would get tons of calls about it -- and I think if resetting the router fixes it, that's a very interesting tell-tale.

... and as a developer, three times a year is a nightmare.

That's why I don't think it's your ISP, but that's an opinion. We need a way to test.

Does this happen to every machine when it happens? Manually point some of your machines to your ISP's DNS and leave some pointed at the router.

Maybe that'll prove it -- all we have to do is wait about six months.
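
A quicker way to run the same comparison, without re-pointing whole machines, might be to ask both resolvers directly and compare the answers. The sketch below hand-builds a minimal DNS A query using only the standard library. The function names are made up for illustration, it ignores truncation, retries and TCP fallback, and it has not been tried against a Vigor router.

```python
import random
import socket
import struct


def build_query(name, txid):
    """Build a minimal DNS query for the A record of `name`."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)  # RD set, one question
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)            # QTYPE=A, QCLASS=IN


def skip_name(msg, offset):
    """Return the offset just past a (possibly compressed) name."""
    while True:
        length = msg[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:      # compression pointer: two bytes, ends the name
            return offset + 2
        offset += 1 + length


def parse_a_records(msg):
    """Return the IPv4 addresses found in the answer section of a reply."""
    _, _, qdcount, ancount, _, _ = struct.unpack("!HHHHHH", msg[:12])
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(msg, offset) + 4          # skip QTYPE and QCLASS
    addrs = []
    for _ in range(ancount):
        offset = skip_name(msg, offset)
        rtype, rclass, _ttl, rdlength = struct.unpack("!HHIH", msg[offset:offset + 10])
        offset += 10
        if rtype == 1 and rclass == 1 and rdlength == 4:
            addrs.append(socket.inet_ntoa(msg[offset:offset + 4]))
        offset += rdlength
    return addrs


def resolve_via(server, name, timeout=3.0):
    """Send one UDP query for `name` to `server` and return its A records."""
    txid = random.randrange(0x10000)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, txid), (server, 53))
        reply, _ = sock.recvfrom(4096)
    if struct.unpack("!H", reply[:2])[0] != txid:
        raise ValueError("reply does not match the query sent")
    return parse_a_records(reply)


def compare(name, servers):
    """Ask each server for `name`; return {server: sorted list of addresses}."""
    return {server: sorted(resolve_via(server, name)) for server in servers}
```

Calling `compare("setiboincdata.ssl.berkeley.edu", [router_ip, isp_dns_ip])` with your own router's LAN address and your ISP's resolver would show whether the two disagree at that moment. Since the fault seems to turn up only a few times a year, it would have to run on a schedule and log the results to be of any use.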

ID: 963987
Richard Haselgrove (Project Donor)
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 963992 - Posted: 17 Jan 2010, 1:37:37 UTC - in response to Message 963987.  

Yes, I've seen this before - and that's why I checked the IP addresses so thoroughly, and recognised the 'reversed octets' when I saw them.

But "consistently"? Maybe three times in as many years? Evidence is limited, my friend. Remember how much effort (and research on your part) we needed to fix the libcurl bug - and nobody was accepting ownership of that one, either. How many of your clients research the data, and present documented facts, before they pick up the phone and demand their service be restored? I'd like to think that the only 'consistency' here is my attempt to research and comprehend the problem, and lay the facts out in public, before I attempt to apportion any blame.

None of my clients research problems (or features), so I follow a couple of rules.

I test and verify everything as much as possible.

I eat my own dog-food -- which is to say, I'm probably the most critical user I have, and I use all my own services every day.

The origin of the problem is that we've got a lot of little-endian processors in a big-endian internet. (explanation)

Little endian processors have to reverse the bytes to make the internet work, and something is doing that just a tiny bit wrong.

I'd think that if the average user hit this three times a year, your ISP would get tons of calls about it -- and I think if resetting the router fixes it, that's a very interesting tell-tale.

... and as a developer, three times a year is a nightmare.

That's why I don't think it's your ISP, but that's an opinion. We need a way to test.

Does this happen to every machine when it happens? Manually point some of your machines to your ISP's DNS and leave some pointed at the router.

Maybe that'll prove it -- all we have to do is wait about six months.

Indeed. The problem, as I stated in the initial post, only affected the three machines I had earlier updated to v6.10.29 (first false trail!).

I got as far as checking that my Vista32, BOINC v5.10.13 box could upload to SETI OK through the same router, and the v6.10.29 boxes could upload to Einstein. That didn't help! But I neglected to note what upstream DNS the router had been assigned at the time.

No problems since then, so I'm placing it on record (for my own future reference) that BT's DNS servers 62.6.40.178 (p) and 194.72.9.38 (s) are working correctly as at the time of this posting. Since the router has been online for 32:26:41 (I also like the Vigor's nice clean management interface, and fast - 3 second - reboot time for this model), I think those are the DNS addresses I got at the power cycle.
ID: 963992
Profile Dirk Sadowski
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 963997 - Posted: 17 Jan 2010, 2:04:00 UTC
Last modified: 17 Jan 2010, 2:08:51 UTC


Speaking of service people/call centers...

You know the sitcom 'The IT Crowd' ?

The first question they ask when they get a call: 'Have you already switched it OFF/ON?'

:-D


Original:
'Have you tried turning it off and on again?'
'Are you sure it's plugged in?'



____________
[Optimized project applications, for to increase your PC performance (double RAC)!][Overview of abbreviations, which are used often in forum and their meaning.]
ID: 963997
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 964010 - Posted: 17 Jan 2010, 3:14:16 UTC - in response to Message 963992.  
Last modified: 17 Jan 2010, 3:14:58 UTC

No problems since then, so I'm placing it on record (for my own future reference) that BT's DNS servers 62.6.40.178 (p) and 194.72.9.38 (s) are working correctly as at the time of this posting. Since the router has been online for 32:26:41 (I also like the Vigor's nice clean management interface, and fast - 3 second - reboot time for this model), I think those are the DNS addresses I got at the power cycle.

... and these things are wicked-difficult to solve, because the information cached on your clients came from a lookup that may have been cached at BT, and probably depends on intermediate cached results at BT, such as the top-level servers for .EDU, and possibly berkeley.edu.

There are way too many cooks who could season the broth.

Something as simple as a lookup for setiboincdata.ssl.berkeley.edu can cascade out to 25 or 30 lookups easily.

Your router might cache, and it might just be a proxy.

Add to that a once every four-month type of interval, and it's a mess.

One of the common support calls we get is "my E-Mail doesn't work!" and one of my first questions is "can you surf the web?"

When they can't, I have them cycle power on their DSL/Cable Modem and router.

... and that fixes their E-Mail. :-)
ID: 964010


 