Panic Mode On (79) Server Problems?


bill
Joined: 16 Jun 99
Posts: 859
Credit: 23,005,575
RAC: 24,050
United States
Message 1307624 - Posted: 19 Nov 2012, 5:56:10 UTC - in response to Message 1307586.

You're welcome, MG. Now the grim solution
for the near future can begin.

Think Lifeboat.

https://en.wikipedia.org/wiki/Lifeboat_%28film%29

KB7RZF
Volunteer tester
Joined: 15 Aug 99
Posts: 9463
Credit: 3,109,881
RAC: 2,184
United States
Message 1307630 - Posted: 19 Nov 2012, 6:18:01 UTC

I've only got 42 ghost work units on my new laptop, and none on my other, slower machines. But I've stopped crunching for a little while here. I know it doesn't matter much, since most machines are set-it-and-forget-it setups, but I just hate it when my computers can't get work. So I'm working on other projects for now. Gotta keep those processors warm somehow. LOL
____________

Draconian
Volunteer tester
Joined: 16 Mar 03
Posts: 21
Credit: 1,809,058
RAC: 0
United States
Message 1307662 - Posted: 19 Nov 2012, 8:15:48 UTC
Last modified: 19 Nov 2012, 8:35:07 UTC

Proxy server working perfectly - the data rate hitting the scheduler from individual hosts is too high.
This 200 WU limit has only made it worse, I would think - my system keeps asking for tasks about every 5 minutes. I'm thinking, why not open up the queue and have a mandatory backoff after WUs are sent? After all, if you just filled your queue up with 3 or 4 days' worth of data, you don't need to talk to the project for AT LEAST a day.
Any way that the load can be taken off the scheduler would help - for example, not letting the systems ask for data all the time when they don't have to.

I have a 200 WU queue - I contact the project and the client says it has 2 units to report and is requesting more work... when I still have 198 in my queue.
Maybe set the project so that a host does not contact it until half the queue remains (when the project is up and running well)?
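
A rough sketch of that idea in Python, purely as an illustration. The class `ContactPolicy`, its field names, and the one-day backoff value are hypothetical; none of this is taken from the actual BOINC client or scheduler, and the "never idle on an empty queue" rule is an extra assumption added so a host cannot starve.

```python
from dataclasses import dataclass


@dataclass
class ContactPolicy:
    """Hypothetical client-side rule, not actual BOINC behaviour."""
    queue_capacity: int = 200          # the 200 WU limit mentioned above
    refill_fraction: float = 0.5       # only ask for work at or below half a queue
    backoff_seconds: int = 24 * 3600   # assumed quiet period after the queue is filled

    def should_contact(self, tasks_in_queue: int, seconds_since_last_fill: float) -> bool:
        # Assumption: an empty queue always overrides the backoff,
        # so the host never sits idle.
        if tasks_in_queue == 0:
            return True
        # Inside the mandatory backoff window: stay quiet,
        # even if there are finished results waiting to be reported.
        if seconds_since_last_fill < self.backoff_seconds:
            return False
        # After the backoff, contact only once the queue has drained
        # to the refill threshold.
        return tasks_in_queue <= self.queue_capacity * self.refill_fraction


policy = ContactPolicy()
# 198 tasks left, 5 minutes after the last fill: no scheduler request.
print(policy.should_contact(tasks_in_queue=198, seconds_since_last_fill=300))
```

With these made-up defaults, the host with 198 of 200 tasks left would stay silent instead of hitting the scheduler every 5 minutes.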
____________

Ianab
Volunteer tester
Joined: 11 Jun 08
Posts: 659
Credit: 12,258,207
RAC: 10,970
New Zealand
Message 1307665 - Posted: 19 Nov 2012, 8:20:41 UTC

Basically a victim of their own success....

With all the new high-powered CUDA crunchers that have been coming online, the amount of work in progress has become too much for the database to handle reliably. This led to the recent timeout issues as the database gradually got more sluggish, then to ghost work units, which further compounded the database issues. A downward spiral until eventually the database broke completely...

I guess once it's fixed, the short-term plan will be to stop splitting for a while, clear the ghosts, and get the work units in progress back down to a number the database can handle. Then restrict new work units to keep a sensible number in progress.

So expect some ongoing issues over the short term.

I see the plan is to make bigger work units, which should help a lot by making the database only 25% the size, but then that puts more pressure on the internet connection???

I assume there will be some gnashing of teeth, tearing of hair and threats to leave, like there usually is when problems occur. The other 99.9% of us will just sigh, select some other projects, and wait it out...

Ian

Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 6906
Credit: 25,762,185
RAC: 38,947
United Kingdom
Message 1307668 - Posted: 19 Nov 2012, 8:28:25 UTC
Last modified: 19 Nov 2012, 8:28:43 UTC

Personally I am happy that Eric has taken time to explain the problem. Whatever they decide to do is up to them. If it increases crunching time and lowers RAC, so be it. I have always been here for the science.

I have actually stopped most of my crunching here just leaving one machine to "fly the flag". I will not restart at SETI@Home, I will give them time to sort it out. As should we all.
____________


Today is life, the only life we're sure of. Make the most of today.

Draconian
Volunteer tester
Joined: 16 Mar 03
Posts: 21
Credit: 1,809,058
RAC: 0
United States
Message 1307669 - Posted: 19 Nov 2012, 8:29:26 UTC - in response to Message 1307665.

Possibly - but if the problem is what they stated it is, then the proxy server I am using wouldn't make a difference at all. It doesn't change the timeout - it only changes the RATE of data hitting SETI.
When I use a proxy server, it's basically flawless. Turn off the proxy, and only once in a great while do I get through to the scheduler.

I think what we are seeing is another form of an old computer hack: flood the system with data and eventually it will do something wrong (this used to be used to break into systems). I think the same thing is happening here.

When I am on my proxy, the data flow to SETI is moderated - after all, the proxy is sending who knows how much data to multiple places - and it works great. When not on the proxy, SETI basically gets my full upload speed. Sure, it's a small amount of data, but at full speed.
An analogy: we used to have to interleave hard drives because the system wasn't fast enough to read data straight from the drive.
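
To make "moderating the data rate" concrete, here is a minimal token-bucket throttle in Python. This is only a guess at the kind of smoothing a proxy might apply; how the actual proxy behaves is unknown, and the class `TokenBucket`, its parameters, and the numbers in the example are all hypothetical.

```python
class TokenBucket:
    """Hypothetical throttle; the real proxy's behaviour is unknown."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s    # long-run average rate allowed
        self.capacity = burst_bytes     # largest burst allowed at once
        self.tokens = burst_bytes       # start with a full bucket
        self.last = 0.0                 # time of the last refill, in seconds

    def allow(self, nbytes: int, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes       # spend tokens and let the data through
            return True
        return False                    # over the limit: caller waits and retries


bucket = TokenBucket(rate_bytes_per_s=1000, burst_bytes=2000)
print(bucket.allow(2000, now=0.0))    # initial burst fits in the full bucket
print(bucket.allow(500, now=0.25))    # only 250 bytes refilled so far, so this is held back
```

The timestamps are passed in explicitly so the behaviour is easy to follow; a real throttle would read a clock instead.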

____________

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2246
Credit: 8,596,723
RAC: 4,305
United States
Message 1307745 - Posted: 19 Nov 2012, 16:20:54 UTC - in response to Message 1307665.

[...]I see the plan is to make bigger work units, which should help a lot by making the database only 25% the size, but then that puts more pressure on the internet connection???[...]


If they do it like they doubled MB a while back, the WUs will stay the same size, they just increase the FFT resolution and make you do four times more work on the same data file. That change in the resolution is just something that gets changed in the XML-type header for the WU itself, but as always, it requires testing to see if it is going to work like it is expected. There's always that possibility that by increasing the precision, you may end up with many more false positives than you would expect.

It's kind of like looking at some of the satellite photos for things like the surface of another planet. It used to be something like 10-meter resolution, which meant that every pixel represented 100 square meters, and the particular color of that pixel was determined by what color was the most abundant in that 100-square-meter area.

As technology improved, I believe we've gotten it down to less than 5 square meters per pixel, so you end up with a huge increase in detail, and now instead of "this 100-square-meter patch is roughly brown and flat" you have "okay, so there's probably a tree there, and oh look, a huge boulder, and it turns out this tree is on the edge of a cliff" because you now have 20 pixels to describe what you only saw in one before.

But increased detail can also be a burden, because it might be possible to end up with more signals in one WU, so -9 overflows may become much more likely, unless the limit gets increased for that, as well.
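
The arithmetic behind the satellite analogy can be checked in a few lines of Python. The function and variable names are made up for illustration, and the 5 square meters figure is just the rough number quoted above.

```python
def area_per_pixel(side_m: float) -> float:
    """Ground area covered by one square pixel with the given side length."""
    return side_m * side_m


old_area_m2 = area_per_pixel(10.0)   # 10-meter resolution: 10 m x 10 m per pixel
new_area_m2 = 5.0                    # "less than 5 square meters per pixel"

# How many new pixels cover the ground that one old pixel used to cover.
pixels_per_old_pixel = old_area_m2 / new_area_m2

print(old_area_m2)            # 100.0
print(pixels_per_old_pixel)   # 20.0
```

So at 5 square meters per pixel you get 20 pixels where there used to be one, and more if the area per pixel is smaller still.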
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4081
Credit: 111,713,670
RAC: 147,349
United States
Message 1307761 - Posted: 19 Nov 2012, 17:15:44 UTC - in response to Message 1307745.
Last modified: 19 Nov 2012, 17:21:36 UTC

[...]I see the plan is to make bigger work units, which should help a lot by making the database only 25% the size, but then that puts more pressure on the internet connection???[...]


If they do it like they doubled MB a while back, the WUs will stay the same size, they just increase the FFT resolution and make you do four times more work on the same data file. That change in the resolution is just something that gets changed in the XML-type header for the WU itself, but as always, it requires testing to see if it is going to work like it is expected. There's always that possibility that by increasing the precision, you may end up with many more false positives than you would expect.

It's kind of like looking at some of the satellite photos for things like the surface of another planet. It used to be something like 10-meter resolution, which meant that every pixel represented 100 square meters, and the particular color of that pixel was determined by what color was the most abundant in that 100-square-meter area.

As technology improved, I believe we've gotten it down to less than 5 square meters per pixel, so you end up with a huge increase in detail, and now instead of "this 100-square-meter patch is roughly brown and flat" you have "okay, so there's probably a tree there, and oh look, a huge boulder, and it turns out this tree is on the edge of a cliff" because you now have 20 pixels to describe what you only saw in one before.

But increased detail can also be a burden, because it might be possible to end up with more signals in one WU, so -9 overflows may become much more likely, unless the limit gets increased for that, as well.


As I recall it was a few years ago when they last cranked up the dial on the resolution & I think it was stated that it was at the max to get any sort of useful data.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Qui-Gon
Volunteer tester
Joined: 15 May 99
Posts: 2910
Credit: 6,550,807
RAC: 1,768
United States
Message 1307765 - Posted: 19 Nov 2012, 17:32:30 UTC

One of the red flags that a site has been hacked or spoofed is that you find grammatical or spelling errors that are out of the ordinary, and that make a message hard to read. Did anyone else notice the most recent message on the front page seems to have such errors? For example, "the lookup of result in process", "hosts being assigned large number or [of?] results to compute", and "The host. think it received", among others. These are not normal for the seti@home front page or any technical message one usually finds on the site.

msattler
Project donor
Volunteer tester
Joined: 9 Jul 00
Posts: 38923
Credit: 578,732,382
RAC: 514,951
United States
Message 1307767 - Posted: 19 Nov 2012, 17:36:09 UTC - in response to Message 1307765.

One of the red flags that a site has been hacked or spoofed is that you find grammatical or spelling errors that are out of the ordinary, and that make a message hard to read. Did anyone else notice the most recent message on the front page seems to have such errors? For example, "the lookup of result in process", "hosts being assigned large number or [of?] results to compute", and "The host. think it received", among others. These are not normal for the seti@home front page or any technical message one usually finds on the site.

OR it could have simply been that Eric was tired on a Sunday when he composed that posting and was more concerned with getting the info out than worrying about being grammatically correct..........
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

juan BFB
Project donor
Volunteer tester
Joined: 16 Mar 07
Posts: 5229
Credit: 285,070,848
RAC: 453,674
Brazil
Message 1307770 - Posted: 19 Nov 2012, 17:39:58 UTC
Last modified: 19 Nov 2012, 17:40:26 UTC

Believe in the kittyman... The kitties are always right!
____________

Lint trap
Project donor
Joined: 30 May 03
Posts: 859
Credit: 26,296,685
RAC: 14,107
United States
Message 1307771 - Posted: 19 Nov 2012, 17:41:03 UTC

I hope Eric et al don't forget to hit the "Turbo" button when they get everything running again....I just ordered upgrades from Newegg!!

My current mobo/cpu is 5 yo, so yep, it's about time...:)

Lt

Chris S
Project donor
Volunteer tester
Joined: 19 Nov 00
Posts: 31452
Credit: 12,182,623
RAC: 28,647
United Kingdom
Message 1307772 - Posted: 19 Nov 2012, 17:42:20 UTC

Exactly Mark. We all know Qui-Gon of old, about par for the course :-)

Qui-Gon
Volunteer tester
Joined: 15 May 99
Posts: 2910
Credit: 6,550,807
RAC: 1,768
United States
Message 1307776 - Posted: 19 Nov 2012, 17:49:12 UTC - in response to Message 1307767.

One of the red flags that a site has been hacked or spoofed is that you find grammatical or spelling errors that are out of the ordinary, and that make a message hard to read. Did anyone else notice the most recent message on the front page seems to have such errors? For example, "the lookup of result in process", "hosts being assigned large number or [of?] results to compute", and "The host. think it received", among others. These are not normal for the seti@home front page or any technical message one usually finds on the site.

OR it could have simply been that Eric was tired on a Sunday when he composed that posting and was more concerned with getting the info out than worrying about being grammatically correct..........

Sure, that could be, but they've had a long time to determine what this problem was, and a long enough time to correct the front page message. I don't recall any messages from Eric in the past that contained so many faults.

I'm not saying that aliens have the team held in the basement of the server closet, forcing them to write messages that will throw us off the scent. I'm just commenting on the abnormality of the way this issue is being explained. If I have mistakes made yous maybe see them and wondering you are.

musicplayer
Joined: 17 May 10
Posts: 1431
Credit: 687,186
RAC: 3
Message 1307784 - Posted: 19 Nov 2012, 17:56:18 UTC
Last modified: 19 Nov 2012, 17:59:33 UTC

Oh, Eric is wearing glasses, isn't he?

Be thankful that he is not the one who is biting you here.

TPCBF
Joined: 18 May 99
Posts: 50
Credit: 993,810
RAC: 1,751
United States
Message 1307812 - Posted: 19 Nov 2012, 18:41:29 UTC - in response to Message 1307767.

One of the red flags that a site has been hacked or spoofed is that you find grammatical or spelling errors that are out of the ordinary, and that make a message hard to read. Did anyone else notice the most recent message on the front page seems to have such errors? For example, "the lookup of result in process", "hosts being assigned large number or [of?] results to compute", and "The host. think it received", among others. These are not normal for the seti@home front page or any technical message one usually finds on the site.

OR it could have simply been that Eric was tired on a Sunday when he composed that posting and was more concerned with getting the info out than worrying about being grammatically correct..........

I doubt this hints at a hacked site; more likely these are the "normal" typos of a sysadmin trying to get some info out quickly (which is appreciated), possibly on a smartphone or some other touchscreen-encumbered device...
The "host. think" part is IMHO a clear indication. That happens to me when I accidentally type two blanks while walking or riding (as a bus passenger!). My Android phone (and I know iPhones/iPads do the same) interprets this as the "end of a sentence" and replaces those two spaces with a dot, and you just keep typing another space to move on...

Ralf

Chris S
Project donor
Volunteer tester
Joined: 19 Nov 00
Posts: 31452
Credit: 12,182,623
RAC: 28,647
United Kingdom
Message 1307816 - Posted: 19 Nov 2012, 18:46:08 UTC

If I have mistakes made yous maybe see them and wondering you are.

Too much spare time methinks you have.

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 602
Credit: 135,558,118
RAC: 132,944
United Kingdom
Message 1307869 - Posted: 19 Nov 2012, 20:45:36 UTC

The roads are rolling.
____________

Gary Charpentier
Project donor
Volunteer tester
Joined: 25 Dec 00
Posts: 12402
Credit: 6,712,440
RAC: 8,839
United States
Message 1307874 - Posted: 19 Nov 2012, 20:57:15 UTC - in response to Message 1307765.

One of the red flags that a site has been hacked or spoofed is that you find grammatical or spelling errors that are out of the ordinary, and that make a message hard to read. Did anyone else notice the most recent message on the front page seems to have such errors? For example, "the lookup of result in process", "hosts being assigned large number or [of?] results to compute", and "The host. think it received", among others. These are not normal for the seti@home front page or any technical message one usually finds on the site.

Of course it is a hack, we have an explanation message and we know the staff never does that, and we know Matt is away so it wasn't him. It is signed by Eric, but we know he never writes here. Ergo it must be a hack :)

Eric, thanks for taking the time before the Green Bay game to work on it.


____________

rob smith
Project donor
Volunteer tester
Joined: 7 Mar 03
Posts: 8309
Credit: 55,258,648
RAC: 75,250
United Kingdom
Message 1307886 - Posted: 19 Nov 2012, 21:37:33 UTC

After a short break the servers are getting back on their feet, but are still somewhat unstable - latest request was greeted thus:

19/11/2012 21:30:30 SETI@home Sending scheduler request: To fetch work.
19/11/2012 21:30:30 SETI@home Reporting 142 completed tasks, requesting new tasks for GPU
19/11/2012 21:30:32 SETI@home Scheduler request failed: Server returned nothing (no headers, no data)



Not sure what's going on, but that doesn't look like a well patient to me....
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?


Copyright © 2014 University of California