it's the AP Splitter processes killing the Scheduler
Author | Message |
---|---|
rob smith Joined: 7 Mar 03 Posts: 22202 Credit: 416,307,556 RAC: 380 |
The fact that a proxy connection works while direct connection doesn't suggests to me that there is a routing problem between the user and the lab, not a problem within the lab. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
The fact that a proxy connection works while direct connection doesn't suggests to me that there is a routing problem between the user and the lab, not a problem within the lab. It suggests there is something odd somewhere; it's been that way for years. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Right you are, sir. Yep, although they still take a while to finally come through. It's all sorts of weird. Grant Darwin NT |
tbret Joined: 28 May 99 Posts: 3380 Credit: 296,162,071 RAC: 40 |
The fact that a proxy connection works while direct connection doesn't suggests to me that there is a routing problem between the user and the lab, not a problem within the lab. I'm asking a question, not arguing. I don't understand something. I don't understand how it ever works if there is no cause for it to fail that begins in the lab. Since this stuff tends to show its ugly head when we return from a Tuesday time-out, I've always assumed that "something changes in the lab." Where else might it be happening? Let's assume that I'm really sort-of stupid and just don't know nothin' about nothin'. It shore looks to this dumb-dumb like turnin' on the AP Splitter has shore 'nuf flung boogers all over our connections to that-there faincy Schedule thingy. Now, if'n it flung 'em 15 miles down the road and clogged up somebody's ear hole way out yonder... ...well, I don't quite get how that happens. I don't deny that it seems to be happening, I just don't know what the mechanics of that might look like. |
tbret Joined: 28 May 99 Posts: 3380 Credit: 296,162,071 RAC: 40 |
The fact that a proxy connection works while direct connection doesn't suggests to me that there is a routing problem between the user and the lab, not a problem within the lab. OK, so just looking at this one step, as you defined it above. Why do we ever not-need a proxy for it to work? |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
I don't deny that it seems to be happening, I just don't know what the mechanics of that might look like. Same here. And it started happening right after a weekly outage, 3 or 4 weeks ago, hence the suspicions relating to the server configuration. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Why do we ever not-need a proxy for it to work? That's what makes it so screwy. Using a proxy, even when everything is running well, has always (well, at least for the last couple of years) resulted in faster downloads & uploads. The problem with using a proxy is they frequently go AWOL after a few days & then you have to find another one. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Now, just to make life more interesting, I'm now getting "HTTP internal server error" in response to some of the Scheduler requests while using the proxy. Grant Darwin NT |
Jim Bohan Joined: 23 Dec 01 Posts: 58 Credit: 65,355,247 RAC: 6 |
Hi. For the last couple of days my systems seemed to work fine. I have two fairly high-end AMD processors, one a 4-core and the other a 6-core, and a little Intel i3 laptop. I was receiving and sending WUs with no problem. Today I have tried to do an Update 6 times, with or without the NNT thing, and it still will not send the work. I keep getting the error that the project server is down. I have over 70 WUs on one machine, 12 on another, and about 6 on the laptop that won't update. What the heck is going on? Is there something I can do in my configuration to help fix this? Perplexed, << Jim >> Member B-52 Stratofortress Association Retired Air Force |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
... OK, I think the statute of limitations has run out on this one - let's let the cat out of the bag. Eric told me that David had seen the problems starting to build up, late in the evening of Saturday 3 November. In response, he deliberately turned off 'resend lost results', thinking this would reduce the load on Synergy and allow it to function normally again. Turned out slightly differently.... I think that just shows that programmers and sysops are different animals: you shouldn't expect either to be able to do the other's job. |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
... Did they agree to test your theory of "missing ACKs"? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
... No, I haven't pitched it to them yet (unless anyone from the lab is reading this thread). Also, remember I posted at about 5:30 pm their time, when they will have been shutting up the lab at the end of the working day: it's now about 2:30 am for them, which is a time of day (OK, night) when I would not advocate making experimental server configuration changes. I think I'd want to make further tests (perhaps including via a proxy), and review in daylight the logs I captured last night, before making a total fool of myself in the eyes of the lab. |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
No, I haven't pitched it to them yet (unless anyone from the lab is reading this thread). Also, remember I posted at about 5:30 pm their time, when they will have been shutting up the lab at the end of the working day: it's now about 2:30 am for them, which is a time of day (OK, night) when I would not advocate making experimental server configuration changes. You will never be a fool if you try to help. Your theory is the first one I've seen that really explains everything; I hope you can test it soon and help us all leave these dark days behind. I really don't believe the proxy admin will let us keep using it for long. |
Cruncher-American Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
Hey - here's a new one for you to contemplate: I just ran the Ghost Detector on my no-ghost machine (Fermibox2) and it said "Hmm, Server indicates less WU 'In Progress' than client_state.xml thinks you have on board. Aborted" Now what does THAT mean? Any ideas? How do I fix this? Or: does it need fixing? EDIT: Guess it was some sort of transient problem - after an Update (61 WUs), I tried GD again, and it was happy. So, in the immortal words of Roseanne Roseannadanna, "Nevermind" |
Horacio Joined: 14 Jan 00 Posts: 536 Credit: 75,967,266 RAC: 0 |
In response, he deliberately turned off 'resend lost results', thinking this would reduce the load on Synergy and allow it to function normally again. Turned out slightly differently.... Well, at least it was something easy to fix and not some obscure bug in the scheduler that nobody has time to fix... |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
In response, he deliberately turned off 'resend lost results', thinking this would reduce the load on Synergy and allow it to function normally again. Turned out slightly differently.... No, scheduler bugs get fixed quickly by David if someone submits the bug in the first place. I submitted a scheduler bug on the 6th; it was fixed on the 7th, and it had further changes on the 8th. Now, getting it onto the project can be slow, especially if people are away in China or touring the world playing music, and the ones still here are snowed in under an avalanche of other problems. Claggy |
Horacio Joined: 14 Jan 00 Posts: 536 Credit: 75,967,266 RAC: 0 |
Now, getting it onto the project can be slow, especially if people are away in China or touring the world playing music, and the ones still here are snowed in under an avalanche of other problems. Which means that in practice the bug is still not fixed, because nobody has time to do it... ;D |
tbret Joined: 28 May 99 Posts: 3380 Credit: 296,162,071 RAC: 40 |
Eric told me that David had seen <snip> Thank you, thank you, thank you. Just knowing communication is happening gives me some hope. |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Now, getting it onto the project can be slow, especially if people are away in China or touring the world playing music, and the ones still here are snowed in under an avalanche of other problems. It's a minor bug fix and doesn't really need to be deployed immediately. We now know why there was a huge increase in ghosts, and it wasn't because of this bug. Claggy |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Now, getting it onto the project can be slow, especially if people are away in China or touring the world playing music, and the ones still here are snowed in under an avalanche of other problems. Now everything is explained... I just don't understand: what could be more important than keeping the project working fine? |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.