Panic Mode On (79) Server Problems?
Author | Message |
---|---|
bill Joined: 16 Jun 99 Posts: 861 Credit: 29,352,955 RAC: 0 |
You're welcome, MG. Now the grim solution for the near future can begin. Think Lifeboat. https://en.wikipedia.org/wiki/Lifeboat_%28film%29 |
KB7RZF Joined: 15 Aug 99 Posts: 9549 Credit: 3,308,926 RAC: 2 |
I've only got 42 ghost work units on my new laptop, and none on my other, slower machines. But I've stopped crunching for a little while here. I know it doesn't matter much, since most machines are set-it-and-forget-it setups, but I just hate it when my computers can't get work. So I'm working on other projects for now. Gotta keep those processors warm somehow. LOL |
Draconian Joined: 16 Mar 03 Posts: 21 Credit: 1,809,058 RAC: 0 |
The proxy server is working perfectly - the data rate hitting the scheduler from individual hosts is too high. This 200 WU queue has only made it worse, I would think - my system keeps asking for tasks about every 5 minutes. I'm thinking, why not open up the queue and have a mandatory backoff after WUs are sent? After all, if you just filled your queue with 3 or 4 days' worth of data, you don't need to talk to the project for AT LEAST a day. Just any way that the load can be taken off the scheduler - not letting the systems ask for data all the time when they don't have to - would help. I have a 200 WU queue - my client contacts the project, advises that it has 2 units to report and requests more work... when I still have 198 in my queue. Maybe set the project so that a host does not contact it until half the queue remains (when the project is up and running well)? |
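The two suggestions in the post above (a mandatory backoff after a fetch, and no contact until half the queue is gone) could be combined into one client-side check. The sketch below is only an illustration and not BOINC's actual scheduling logic: the function name, the 24-hour backoff, the 0.5 refill fraction, and the rule that an empty host may always ask are assumptions layered on the post. Reporting finished work is not modeled at all.

```python
def should_contact_scheduler(tasks_in_queue: int,
                             hours_since_last_fetch: float,
                             capacity: int = 200,
                             refill_fraction: float = 0.5,
                             backoff_hours: float = 24.0) -> bool:
    """Decide whether a host should contact the scheduler (hypothetical policy)."""
    # Assumption: a host with nothing left to crunch may always ask for work.
    if tasks_in_queue == 0:
        return True
    # Mandatory backoff: stay quiet for a while after the last successful fetch.
    if hours_since_last_fetch < backoff_hours:
        return False
    # Otherwise ask only once the queue has drained to the refill threshold.
    return tasks_in_queue <= capacity * refill_fraction


print(should_contact_scheduler(198, 0.1))   # False
print(should_contact_scheduler(100, 30.0))  # True
```

Under this reading, the host from the post (198 of 200 tasks still queued) would stay silent, and the scheduler would only hear from it again after the backoff has passed and about half the queue has been crunched.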
Ianab Joined: 11 Jun 08 Posts: 732 Credit: 20,635,586 RAC: 5 |
Basically a victim of their own success... With all the new high-powered CUDA crunchers that have been coming online, the amount of work in progress has become too much for the database to handle reliably. This led to the recent timeout issues as the database gradually got more sluggish, then to ghost work units, which further compound the database issues. A downward spiral, until eventually the database broke completely... I guess once it's fixed, the short-term fix will be to stop splitting for a while, clear the ghosts and get the work units in progress back to a number the database can handle, then restrict new work units to keep a sensible number in progress. So expect some ongoing issues over the short term. I see the plan is to make bigger work units, which should help a lot by making the database only 25% the size, but then that puts more pressure on the internet connection??? I assume there will be some gnashing of teeth, tearing of hair and threats to leave, like there usually is when problems occur. The other 99.9% of us will just sigh, select some other projects, and wait it out... Ian |
Bernie Vine Joined: 26 May 99 Posts: 9954 Credit: 103,452,613 RAC: 328 |
Personally, I am happy that Eric has taken the time to explain the problem. Whatever they decide to do is up to them. If it increases crunching time and lowers RAC, so be it. I have always been here for the science. I have actually stopped most of my crunching here, just leaving one machine to "fly the flag". I will not restart at SETI@Home; I will give them time to sort it out. As should we all. |
Draconian Joined: 16 Mar 03 Posts: 21 Credit: 1,809,058 RAC: 0 |
Possibly - but if the problem is what they stated it is, then the proxy server I am using wouldn't make a difference at all. It doesn't change the timeout - it only changes the RATE of data that is hitting SETI. When I use a proxy server, it's basically flawless. Turn off the proxy, and only once in a great while will I get through to the scheduler. I think what we are seeing is another form of an old computer hack - flood the system with data and eventually it will do something wrong (this used to be used to break into systems). I think the same thing is happening here. When I am on my proxy, the data flow to SETI is moderated - after all, the proxy is sending who knows how much data to multiple places - and it works great. When not on the proxy, SETI basically gets my full upload speed - sure, a small amount of data, but at full speed. An analogy: we used to have to interleave hard drives because the system wasn't fast enough to read data straight from the drive. |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
[quote][...]I see the plan is to make bigger work units, which should help a lot by making the database only 25% the size, but then that puts more pressure on the internet connection???[...][/quote] If they do it the way they doubled MB a while back, the WUs will stay the same size; they just increase the FFT resolution and make you do four times more work on the same data file. That change in resolution is just something that gets changed in the XML-type header for the WU itself, but as always, it requires testing to see whether it works as expected. There's always the possibility that by increasing the precision, you end up with many more false positives than you would expect. It's kind of like looking at satellite photos of things like the surface of another planet. It used to be something like 10-meter resolution, which meant that every pixel represented 100 square meters, and the color of that pixel was determined by whichever color was most abundant in that 100-square-meter area. As technology improved, I believe we've gotten it down to less than 5 square meters per pixel, so you end up with a huge increase in detail, and now instead of "this 100-square-meter area is roughly brown and flat" you have "okay, so there's probably a tree there, and oh look, a huge boulder, and it turns out this tree is on the edge of a cliff" because you now have 20 pixels to describe what you only saw in one before. But increased detail can also be a burden, because it might be possible to end up with more signals in one WU, so -9 overflows may become much more likely, unless the limit gets increased for that as well. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up) |
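The numbers in the satellite analogy above hold up as simple arithmetic. A small sketch, using a hypothetical helper name and taking the post's figures (10-meter pixels before, roughly 5 square meters per pixel after) at face value:

```python
def pixels_per_old_pixel(old_side_m: float, new_area_m2: float) -> float:
    """How many new pixels cover the ground area of one old pixel."""
    old_area_m2 = old_side_m ** 2   # a 10 m x 10 m pixel covers 100 square meters
    return old_area_m2 / new_area_m2


print(pixels_per_old_pixel(10, 5))  # 20.0
```

That is the "20 pixels to describe what you only saw in one before" from the post. It says nothing about how a finer FFT resolution would change the signal count per WU, which is the part the post says would need testing.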
HAL9000 Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
[quote][...]I see the plan is to make bigger work units, which should help a lot by making the database only 25% the size, but then that puts more pressure on the internet connection???[...][/quote] As I recall, it was a few years ago when they last cranked up the dial on the resolution, and I think it was stated that it was at the max to get any sort of useful data. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url] |
Qui-Gon Joined: 15 May 99 Posts: 2940 Credit: 19,199,902 RAC: 11 |
One of the red flags that a site has been hacked or spoofed is that you find grammatical or spelling errors that are out of the ordinary, and that make a message hard to read. Did anyone else notice the most recent message on the front page seems to have such errors? For example, "the lookup of result in process", "hosts being assigned large number or [of?] results to compute", and "The host. think it received", among others. These are not normal for the seti@home front page or any technical message one usually finds on the site. |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
[quote]One of the red flags that a site has been hacked or spoofed is that you find grammatical or spelling errors that are out of the ordinary, and that make a message hard to read. Did anyone else notice the most recent message on the front page seems to have such errors? For example, "the lookup of result in process", "hosts being assigned large number or [of?] results to compute", and "The host. think it received", among others. These are not normal for the seti@home front page or any technical message one usually finds on the site.[/quote] OR it could simply have been that Eric was tired on a Sunday when he composed that posting and was more concerned with getting the info out than with being grammatically correct... "Freedom is just Chaos, with better lighting." Alan Dean Foster |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Believe in the kittyman... The kitties are always right! |
Lint trap Joined: 30 May 03 Posts: 871 Credit: 28,092,319 RAC: 0 |
I hope Eric et al don't forget to hit the "Turbo" button when they get everything running again....I just ordered upgrades from Newegg!! My current mobo/cpu is 5 yo, so yep, it's about time...:) Lt |
Qui-Gon Joined: 15 May 99 Posts: 2940 Credit: 19,199,902 RAC: 11 |
[quote]One of the red flags that a site has been hacked or spoofed is that you find grammatical or spelling errors that are out of the ordinary, and that make a message hard to read. Did anyone else notice the most recent message on the front page seems to have such errors? For example, "the lookup of result in process", "hosts being assigned large number or [of?] results to compute", and "The host. think it received", among others. These are not normal for the seti@home front page or any technical message one usually finds on the site.[/quote] Sure, that could be, but they've had a long time to determine what this problem was, and a long enough time to correct the front page message. I don't recall any messages from Eric in the past that contained so many faults. I'm not saying that aliens have the team held in the basement of the server closet, forcing them to write messages that will throw us off the scent. I'm just commenting on the abnormality of the way this issue is being explained. If I have mistakes made yous maybe see them and wondering you are. |
musicplayer Joined: 17 May 10 Posts: 2430 Credit: 926,046 RAC: 0 |
Oh, Eric is wearing glasses, isn't he? Be thankful that he is not the one who is biting you here. |
TPCBF Joined: 18 May 99 Posts: 54 Credit: 4,594,980 RAC: 0 |
[quote]One of the red flags that a site has been hacked or spoofed is that you find grammatical or spelling errors that are out of the ordinary, and that make a message hard to read. Did anyone else notice the most recent message on the front page seems to have such errors? For example, "the lookup of result in process", "hosts being assigned large number or [of?] results to compute", and "The host. think it received", among others. These are not normal for the seti@home front page or any technical message one usually finds on the site.[/quote] I doubt that this hints at a hacked site rather than the "normal" typos of a sysadmin trying to get some info out quickly (which is appreciated), possibly on a smartphone or other touchscreen-encumbered device... The "host. think" part is IMHO a clear indication: that happens to me when I accidentally type two blanks while walking or driving (as a (bus) passenger!). My Android phone (and I know iPhones/iPads do the same) interprets this as the "end of a sentence" and replaces those two spaces with a dot, and you just keep typing another space to move on... Ralf |
ivan Joined: 5 Mar 01 Posts: 783 Credit: 348,560,338 RAC: 223 |
The roads are rolling. |
Gary Charpentier Joined: 25 Dec 00 Posts: 30593 Credit: 53,134,872 RAC: 32 |
[quote]One of the red flags that a site has been hacked or spoofed is that you find grammatical or spelling errors that are out of the ordinary, and that make a message hard to read. Did anyone else notice the most recent message on the front page seems to have such errors? For example, "the lookup of result in process", "hosts being assigned large number or [of?] results to compute", and "The host. think it received", among others. These are not normal for the seti@home front page or any technical message one usually finds on the site.[/quote] Of course it is a hack: we have an explanation message and we know the staff never does that, and we know Matt is away so it wasn't him. It is signed by Eric, but we know he never writes here. Ergo it must be a hack :) Eric, thanks for taking the time before the Green Bay game to work on it. |
rob smith Joined: 7 Mar 03 Posts: 22149 Credit: 416,307,556 RAC: 380 |
After a short break the servers are getting back on their feet, but are still somewhat unstable - latest request was greeted thus: 19/11/2012 21:30:30 SETI@home Sending scheduler request: To fetch work. Not sure what's going on, but that doesn't look like a well patient to me.... Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
[quote]After a short break the servers are getting back on their feet, but are still somewhat unstable - latest request was greeted thus:[/quote] One of my machines actually updated, reported, and downloaded new files. The other two say the same thing:
11/19/2012 4:36:36 PM | SETI@home | Sending scheduler request: To fetch work.
11/19/2012 4:36:36 PM | SETI@home | Reporting 60 completed tasks
11/19/2012 4:36:36 PM | SETI@home | Requesting new tasks for NVIDIA
11/19/2012 4:36:40 PM | | Project communication failed: attempting access to reference site
11/19/2012 4:36:40 PM | SETI@home | Scheduler request failed: Server returned nothing (no headers, no data)
11/19/2012 4:36:42 PM | | Internet access OK - project servers may be temporarily down...
Another attempt:
19-Nov-2012 16:31:52 [SETI@home] Fetching scheduler list
19-Nov-2012 16:32:07 [SETI@home] Master file download succeeded
19-Nov-2012 16:32:12 [SETI@home] Sending scheduler request: Requested by user.
19-Nov-2012 16:32:12 [SETI@home] Reporting 60 completed tasks
19-Nov-2012 16:32:12 [SETI@home] Requesting new tasks for NVIDIA
19-Nov-2012 16:32:46 [SETI@home] Scheduler request failed: HTTP bad gateway
19-Nov-2012 16:32:52 [---] Using proxy info from GUI
19-Nov-2012 16:32:52 [---] Not using a proxy
19-Nov-2012 16:33:13 [SETI@home] update requested by user
19-Nov-2012 16:33:16 [SETI@home] Sending scheduler request: Requested by user.
19-Nov-2012 16:33:16 [SETI@home] Reporting 60 completed tasks
19-Nov-2012 16:33:16 [SETI@home] Requesting new tasks for NVIDIA
19-Nov-2012 16:33:18 [SETI@home] Scheduler request failed: Failure when receiving data from the peer
19-Nov-2012 16:33:39 [---] Exit requested by user...
I'm also getting the notorious 'Timeout was reached'. I've restarted BOINC, tried a proxy, reinstalled BOINC... given up for now... |
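For anyone comparing event logs like the ones above across several machines, the failure reasons can be tallied with a few lines of Python. This is a minimal sketch, assuming only that each failure line contains the text "Scheduler request failed:" followed by the reason, as in the pasted logs; the function name and the sample list are made up for illustration, and the sample lines are copied from the post.

```python
import re
from collections import Counter

# Captures the failure reason that follows "Scheduler request failed:".
FAILURE = re.compile(r"Scheduler request failed: (.+)$")


def count_failures(lines):
    """Count each distinct scheduler failure reason in an iterable of log lines."""
    reasons = Counter()
    for line in lines:
        match = FAILURE.search(line.strip())
        if match:
            reasons[match.group(1)] += 1
    return reasons


sample = [
    "11/19/2012 4:36:40 PM | SETI@home | Scheduler request failed: Server returned nothing (no headers, no data)",
    "19-Nov-2012 16:32:46 [SETI@home] Scheduler request failed: HTTP bad gateway",
    "19-Nov-2012 16:33:16 [SETI@home] Reporting 60 completed tasks",
    "19-Nov-2012 16:33:18 [SETI@home] Scheduler request failed: Failure when receiving data from the peer",
]
for reason, n in count_failures(sample).items():
    print(n, reason)
```

Because the pattern keys on the message text rather than the timestamp, it handles both log layouts shown in the post (the pipe-separated one and the bracketed one).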
Richard Haselgrove Joined: 4 Jul 99 Posts: 14644 Credit: 200,643,578 RAC: 874 |
I think for the time being, I'm treating it as what they always warn us about - a period of a few hours congestion after an outage. We've had 200,000 results returned in an hour, 1,400 queries a second, and we've still got 94 Mbit/sec. Assuming they leave the splitters turned off until all the current results ready to send have been allocated and downloaded (which I hope they do), we'll get a better idea how well the scheduler copes with 'report only'. |
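To put the figures in the post above on a common footing, the hourly result count can be converted to a per-second rate and set against the query rate. This is rough arithmetic only: it assumes the 200,000 results and the 1,400 queries a second describe the same period, and that the queries are database queries, neither of which the post states outright. The helper name is hypothetical.

```python
def per_second(count_per_hour: float) -> float:
    """Convert an hourly count into an average per-second rate."""
    return count_per_hour / 3600


results_per_second = per_second(200_000)
# Assumption: both figures cover the same hour.
queries_per_result = 1_400 / results_per_second

print(round(results_per_second, 1))  # 55.6
print(round(queries_per_result, 1))  # 25.2
```

So on those assumptions the servers were absorbing roughly 56 returned results every second, with on the order of 25 queries per returned result.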