Panic Mode On (77) Server Problems?

Zapiao
Volunteer tester

Send message
Joined: 29 Oct 01
Posts: 110
Credit: 122,278
RAC: 0
Portugal
Message 1290453 - Posted: 2 Oct 2012, 22:57:33 UTC - in response to Message 1290446.  
Last modified: 2 Oct 2012, 22:58:21 UTC

Where can I find info on how the data is collected at Arecibo and how it's "given" to SETI? Someone spoke about tapes...
By your command !!!
ID: 1290453 · Report as offensive
.clair.

Send message
Joined: 4 Nov 04
Posts: 1300
Credit: 55,390,408
RAC: 69
United Kingdom
Message 1290454 - Posted: 2 Oct 2012, 23:00:54 UTC

well,
There is a hundred megabits of bandwidth going somewhere doing something,
One thing i do know about it is it aint comeing here an doing anything.
Looks like Einstein will be heating the house tonight :¬)
ID: 1290454 · Report as offensive
Zapiao
Volunteer tester

Send message
Joined: 29 Oct 01
Posts: 110
Credit: 122,278
RAC: 0
Portugal
Message 1290455 - Posted: 2 Oct 2012, 23:04:25 UTC - in response to Message 1290454.  

well,
There is a hundred megabits of bandwidth going somewhere doing something,
One thing i do know about it is it aint comeing here an doing anything.
Looks like Einstein will be heating the house tonight :¬)

If you weren't English I would ask you to translate... to English
By your command !!!
ID: 1290455 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1290460 - Posted: 2 Oct 2012, 23:16:51 UTC - in response to Message 1290455.  


Inbound network traffic is already dropping off, yet i'm having problems uploading work.
Whatever the upload issue has been after the last few outages, it appears to be back again.
Grant
Darwin NT
ID: 1290460 · Report as offensive
Profile Wiggo
Avatar

Send message
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1290464 - Posted: 2 Oct 2012, 23:24:16 UTC - in response to Message 1290460.  


Inbound network traffic is already dropping off, yet i'm having problems uploading work.
Whatever the upload issue has been after the last few outages, it appears to be back again.

Yes, things haven't come back right for the last 3 outages now, so it seems that something is being overlooked somewhere.

Cheers.
ID: 1290464 · Report as offensive
chromespringer
Avatar

Send message
Joined: 3 Dec 05
Posts: 296
Credit: 55,183,482
RAC: 0
United States
Message 1290467 - Posted: 2 Oct 2012, 23:35:39 UTC

10/2/2012 5:29:03 PM | | Internet access OK - project servers may be temporarily down.
10/2/2012 5:29:33 PM | SETI@home | Temporarily failed upload of 26jl12ae.19088.3748.3.10.101_0_0: transient HTTP error
10/2/2012 5:29:33 PM | SETI@home | Backing off 2 min 47 sec on upload of 26jl12ae.19088.3748.3.10.101_0_0
10/2/2012 5:29:35 PM | | Project communication failed: attempting access to reference site
10/2/2012 5:29:36 PM | | Internet access OK - project servers may be temporarily down.
10/2/2012 5:29:36 PM | SETI@home | Computation for task 25jl12ab.23974.4975.7.10.239_0 finished
10/2/2012 5:29:38 PM | SETI@home | Started upload of 25jl12ab.23974.4975.7.10.239_0_0
10/2/2012 5:30:07 PM | SETI@home | Temporarily failed upload of 25jl12ab.23974.4975.7.10.239_0_0: transient HTTP error
10/2/2012 5:30:07 PM | SETI@home | Backing off 3 min 11 sec on upload of 25jl12ab.23974.4975.7.10.239_0_0
10/2/2012 5:30:11 PM | | Project communication failed: attempting access to reference site
10/2/2012 5:30:12 PM | | Internet access OK - project servers may be temporarily down.

nothing has changed here .. 0 work for cpu and 0 work for gpu on
machine xxxx033 :(
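For anyone wondering why the two back-off times in the log above differ (2 min 47 sec, then 3 min 11 sec): clients that retry failed transfers commonly wait a randomized, growing interval so that thousands of hosts don't all retry at the same moment. Below is a minimal Python sketch of that general technique. It is not BOINC's actual code; the function name, the 60-second base, the 4-hour cap and the jitter range are all assumptions made up for illustration.

```python
import random


def backoff_delay(failures, base=60.0, cap=4 * 3600.0, rng=random):
    """Return a retry delay in seconds after `failures` consecutive failures.

    Hypothetical parameters: 60 s base, 4 h cap, and a random delay between
    50% and 100% of the exponential ceiling. Illustrative values only.
    """
    if failures < 1:
        raise ValueError("failures must be >= 1")
    # Limit the exponent so very long failure streaks cannot overflow a float.
    exponent = min(failures - 1, 30)
    ceiling = min(cap, base * 2 ** exponent)
    return rng.uniform(0.5 * ceiling, ceiling)


if __name__ == "__main__":
    for n in range(1, 6):
        delay = backoff_delay(n)
        print(f"failure {n}: retry in {int(delay // 60)} min {int(delay % 60)} sec")
```

The random component is the point: if every host backed off by exactly the same amount, they would all hit the upload server again together.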
ID: 1290467 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1290468 - Posted: 2 Oct 2012, 23:35:47 UTC - in response to Message 1290464.  
Last modified: 2 Oct 2012, 23:36:49 UTC

Yes, things haven't come back right for the last 3 outages now, so it seems that something is being overlooked somewhere.

Or all the shorties presently in the system have found yet another limitation.

We had, what- a couple of months? where there were barely any shorties at all, and still quite a few VLARs being sent out. Now we've got shorties, lots & lots of shorties. So for one of my cards, instead of doing 3 WUs every 15-20min it's now doing 3 every 4-5min. Over 3 times the throughput.
Could be the systems are hitting their limit of RAM, or they need more HDDs for more I/O.
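To put numbers on the "over 3 times the throughput" figure, here is a quick worked calculation in Python using only the figures quoted above (3 WUs per 15-20 min before, 3 WUs per 4-5 min now). The helper name is made up for the sketch.

```python
def tasks_per_hour(tasks, minutes):
    """Convert 'tasks finished every N minutes' into tasks per hour."""
    return tasks * 60.0 / minutes


# Before the shorties: 3 WUs every 15-20 minutes.
before = [tasks_per_hour(3, m) for m in (15, 20)]  # 12.0 and 9.0 per hour
# With shorties: 3 WUs every 4-5 minutes.
after = [tasks_per_hour(3, m) for m in (4, 5)]     # 45.0 and 36.0 per hour

# Most conservative and most generous ratios between the two ranges.
lowest_speedup = min(after) / max(before)   # 36 / 12
highest_speedup = max(after) / min(before)  # 45 / 9

if __name__ == "__main__":
    print(f"before: {min(before):.0f}-{max(before):.0f} WUs/hour")
    print(f"after:  {min(after):.0f}-{max(after):.0f} WUs/hour")
    print(f"speed-up: {lowest_speedup:.1f}x to {highest_speedup:.1f}x")
```

So a single card goes from roughly 9-12 results an hour to 36-45, and every one of those extra results is another upload and, eventually, another scheduler request.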

Whatever it is, the problems- inability to upload & frequent "Project has no tasks available", "No tasks sent" or "Scheduler reached timeout" messages- all seem to have started around the time all the shorties started coming through en masse.


EDIT- one system finally managed to upload enough WUs to request more- and now the Scheduler is just timing out on the request. By the time it can request more work again, there'll be too many uploads backed up for it to be able to do so.
Grant
Darwin NT
ID: 1290468 · Report as offensive
Profile Arvid Almstrom
Avatar

Send message
Joined: 23 Mar 00
Posts: 98
Credit: 137,331,372
RAC: 0
Australia
Message 1290470 - Posted: 2 Oct 2012, 23:40:03 UTC - in response to Message 1290366.  

What is the explanation behind a "shorties storm"? They don't seem to originate from the same tapes, yet almost everything being sent out is VHAR units. Is this a server problem? I'm curious to read some info on this ;) Work flows slowly at best right now, which is better than nothing at all of course.

Kind regards.

In simple terms: SETI gets its data for free, by taking its own copy of the data being recorded during the course of astronomical observations at the Arecibo radio telescope.

Different groups of radio astronomers are allocated observing time on the telescope, according to an observatory schedule which can be searched online if you're really interested. Each separate group of observers has control of the telescope during their assigned time slot, and controls its movement and observing patterns.

Some astronomers are interested in long, steady, deep-space observations of, near enough, point sources. The focal point of the telescope remains steady in relation to the sky - the recordings have a low 'angle range' between the beginning and end of the 109 seconds we study in each workunit. Those sessions create the 'VLAR' tasks when we get to crunch the recordings.

Other observing teams are more interested in fast surveys of large parts of the sky. They use the observatory's radio antenna in what is known as a 'basketweave' mode, with the telescope nodding from side to side while the earth turns under the sky. That leads to the high angle range tasks - we know them as 'shorties', because it's not worth doing such intense analysis when potential signal sources remain in the field of view for such a short time.

And in between, there are observations - or even recordings taken during telescope maintenance - where the antenna is not being actively steered at all, but simply receiving whatever happens to be coming from the sky patch directly overhead as the earth turns. That gives us the normal, mid-AR tasks which form our staple diet.
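The three categories described above boil down to sorting each workunit by its angle range (AR). A minimal Python sketch of that idea follows. The two cut-off values and the function name are assumptions chosen for illustration; the thread itself does not state where the project draws the lines.

```python
# Illustrative cut-offs only: these numbers are assumptions for the sketch,
# not values stated anywhere in this thread.
VLAR_MAX_AR = 0.12   # below this: telescope tracking a point, very low AR
VHAR_MIN_AR = 1.127  # above this: fast sky survey, very high AR ("shortie")


def classify_by_angle_range(angle_range):
    """Label a workunit by how far the beam moved across the sky
    during the recording, as expressed by its angle range."""
    if angle_range < 0:
        raise ValueError("angle range cannot be negative")
    if angle_range < VLAR_MAX_AR:
        return "VLAR"
    if angle_range > VHAR_MIN_AR:
        return "VHAR (shortie)"
    return "mid-AR"


if __name__ == "__main__":
    for ar in (0.01, 0.42, 2.5):
        print(f"AR {ar}: {classify_by_angle_range(ar)}")
```

With these assumed cut-offs, a tracked point source lands in VLAR, a drifting overhead patch lands in mid-AR, and a basketweave survey lands in the shortie bucket.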


Hi Richard,

Thank you for this information. I have been crunching SETI work for 12 1/5 years and I did not know this. I thought it had to do with the angle at which the antenna was pointing relative to the sky, and that at some angles it picked up less information or the signal was degraded for some reason or other.

Many thanks for this info,

Arvid
Arvid Almstrom
ID: 1290470 · Report as offensive
Profile Arvid Almstrom
Avatar

Send message
Joined: 23 Mar 00
Posts: 98
Credit: 137,331,372
RAC: 0
Australia
Message 1290475 - Posted: 2 Oct 2012, 23:50:32 UTC - in response to Message 1290470.  

I have just had a record number of WUs being processed on my computer: 6 out of 8 slots were processing. Unfortunately, it only lasted about 1 1/2 minutes and I am now back to 0 of 8.

I think the uploading / reporting and general slowness is related to the AP Splitters running and that this or their WU's somehow causes the project to take a major dive.

I have not had WU's for my main computer for many days and now I have received a few tasks but I cannot report them or my status to ask for more work.

I hope something gets discovered soon, and I hope that the staff are aware of the problems we are having with communication to the SETI project as a whole.

Arvid
Arvid Almstrom
ID: 1290475 · Report as offensive
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Avatar

Send message
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1290491 - Posted: 3 Oct 2012, 0:21:31 UTC
Last modified: 3 Oct 2012, 0:24:03 UTC

Interestingly, Matt has posted in the Technical News thread http://setiathome.berkeley.edu/forum_thread.php?id=69594 and seems to be totally unaware that there have been any problems at all!!

In fact quite the reverse!!

The download servers have been trading off for a bit - we are now currently settled on using vader and georgem as the download server pair. As well, I just moved from apache to nginx on those servers. I think it's working well, but if any of you notice weird behavior let me know!

ID: 1290491 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1290511 - Posted: 3 Oct 2012, 1:38:09 UTC - in response to Message 1290491.  
Last modified: 3 Oct 2012, 1:41:16 UTC

Things are certainly borked upload wise- prior to the outage & since whatever was done on Sunday there had been a steady stream of work being returned- 100,000 results per hour.
Apart from a very brief burst it has dropped down to 54,000/hr & is still declining. Not because there isn't work to be returned, but because it can't be returned.


And on those very rare occasions where i can ask for work, the usual response from the Scheduler at the moment is to not give any, or just timeout.


EDIT- with caches continuing to shrink, if the upload backlog ever fully clears, the Scheduler is going to be hammered even more than it is now. And it's not coping now.
And this is happening even with new AP work disabled & no work available to go out. When AP starts up again it's going to be even worse.
Grant
Darwin NT
ID: 1290511 · Report as offensive
Profile Akio
Avatar

Send message
Joined: 18 May 11
Posts: 375
Credit: 32,129,242
RAC: 0
United States
Message 1290520 - Posted: 3 Oct 2012, 2:07:50 UTC

Sounds like we're in for one heck of a ride. Shorties galore over here, but uploads and downloads are still hit and miss.
ID: 1290520 · Report as offensive
robertmiles
Volunteer tester

Send message
Joined: 16 Jan 12
Posts: 213
Credit: 4,117,756
RAC: 6
United States
Message 1290530 - Posted: 3 Oct 2012, 2:45:08 UTC - in response to Message 1290475.  

I have just had a record number of WUs being processed on my computer: 6 out of 8 slots were processing. Unfortunately, it only lasted about 1 1/2 minutes and I am now back to 0 of 8.

I think the uploading / reporting and general slowness is related to the AP Splitters running and that this or their WU's somehow causes the project to take a major dive.

I have not had WU's for my main computer for many days and now I have received a few tasks but I cannot report them or my status to ask for more work.

I hope something gets discovered soon, and I hope that the staff are aware of the problems we are having with communication to the SETI project as a whole.

Arvid


Would it be practical to have the AP Splitter program run as a different type of workunit?

ID: 1290530 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1290532 - Posted: 3 Oct 2012, 2:45:20 UTC - in response to Message 1290520.  


Once again, i'm processing work faster than it can be uploaded. Even with never ending clicks on the re-try button.
Grant
Darwin NT
ID: 1290532 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1290533 - Posted: 3 Oct 2012, 2:46:33 UTC - in response to Message 1290532.  


Once again, i'm processing work faster than it can be uploaded. Even with never ending clicks on the re-try button.

Not sure i understand the question- AP WUs are a different type of WU, that's why they are processed by a different programme to the MultiBeam WUs.
Grant
Darwin NT
ID: 1290533 · Report as offensive
robertmiles
Volunteer tester

Send message
Joined: 16 Jan 12
Posts: 213
Credit: 4,117,756
RAC: 6
United States
Message 1290537 - Posted: 3 Oct 2012, 2:54:29 UTC - in response to Message 1290455.  

well,
There is a hundred megabits of bandwidth going somewhere doing something,
One thing i do know about it is it aint comeing here an doing anything.
Looks like Einstein will be heating the house tonight :¬)

If you weren't English I would ask you to translate... to English


I'd expect clive to mean that Einstein@Home workunits will be running tonight, and the heat from that will be warming the house.
ID: 1290537 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1290550 - Posted: 3 Oct 2012, 4:04:11 UTC - in response to Message 1290511.  

Things are certainly borked upload wise- prior to the outage & since whatever was done on Sunday there had been a steady stream of work being returned- 100,000 results per hour.
Apart from a very brief burst it has dropped down to 54,000/hr & is still declining.

It's now down to less than 45,000/hr, and i notice that the splitters are unable to get much above 25/s & so the ready to send buffer has actually dropped down to 1,700, and continues to fall.
So even if people eventually do upload all of their completed work, and a Scheduler request finally does go through- there won't be any work left to allocate.
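A quick worked calculation on the figures quoted in this post and the one it replies to (100,000, 54,000 and 45,000 results per hour, and splitters at 25/s). It is only arithmetic on those numbers; the helper names are made up, and comparing splitter output with the return rate assumes that, in steady state, work has to be created about as fast as it is returned.

```python
def percent_drop(before, after):
    """Percentage fall from `before` to `after`."""
    return 100.0 * (before - after) / before


def per_second(per_hour):
    """Convert an hourly rate to a per-second rate."""
    return per_hour / 3600.0


NORMAL_RETURN_PER_HOUR = 100_000  # steady rate reported before the outage
SPLITTER_RATE_PER_SEC = 25        # what the splitters are managing now

if __name__ == "__main__":
    for now in (54_000, 45_000):
        drop = percent_drop(NORMAL_RETURN_PER_HOUR, now)
        print(f"{now:,}/hr is a {drop:.0f}% drop from 100,000/hr")
    print(f"100,000/hr is about {per_second(NORMAL_RETURN_PER_HOUR):.1f} results/s")
    print(f"25/s of splitter output is {SPLITTER_RATE_PER_SEC * 3600:,} WUs/hr")
```

On those assumptions, the return rate has fallen by a bit more than half, and 25/s of new work (90,000/hr) would not quite cover the normal 100,000/hr even if the uploads were flowing.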

Something is very, very wrong.

Grant
Darwin NT
ID: 1290550 · Report as offensive
tbret
Volunteer tester
Avatar

Send message
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1290567 - Posted: 3 Oct 2012, 4:56:39 UTC - in response to Message 1290491.  

Interestingly, Matt has posted in the Technical News thread http://setiathome.berkeley.edu/forum_thread.php?id=69594 and seems to be totally unaware that there have been any problems at all!!

In fact quite the reverse!!



Yes, and this is very curious. Very, very curious. Unless Matt has isolated himself completely, he must be aware that the project is not functioning correctly and hasn't for some time now.

I thought about offering to send him a "cruncher" to use at home (something basic with a GPU) so he could have a user experience, first hand. But then I supposed if he has no trusted confidant here who can get his attention, a cruncher on the shelf at home probably would have the same success wrestling him away from whatever else he is doing.

Obviously something is wrong, yet he's asked for reports of anomalies.

In the BOOK, Jurassic Park, Crichton wrote-in a reason nobody on the island was aware of the number of loose and rampaging dinosaurs. Being unaware that they could breed, the monitoring system "listened-for" 6 of this, 8 of that, 3 of the other. Once that number was accounted-for, the system quit counting. (it stopped looking, so never picked-up on the fact that there were 18 of this, 12 of that, and 46 of the other)
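The failure mode described above is easy to show in a few lines. This is a toy Python sketch of a tally that stops counting once it reaches the expected number, next to one that counts everything; the function names and the "this/that/other" labels are invented for the illustration, and the figures are the ones from the paragraph above.

```python
from collections import Counter


def count_until_expected(sightings, expected):
    """Flawed tally: stops counting a kind once the expected number is reached."""
    counts = {kind: 0 for kind in expected}
    for kind in sightings:
        if kind in counts and counts[kind] < expected[kind]:
            counts[kind] += 1
    return counts


def count_all(sightings):
    """Honest tally: counts every sighting, expected or not."""
    return dict(Counter(sightings))


if __name__ == "__main__":
    expected = {"this": 6, "that": 8, "other": 3}
    sightings = ["this"] * 18 + ["that"] * 12 + ["other"] * 46

    capped = count_until_expected(sightings, expected)
    print("monitor says all accounted for:", capped == expected)
    print("actual population:", count_all(sightings))
```

The capped tally reports exactly the expected numbers, so the check "everything accounted for" passes even though the real totals are 18, 12 and 46.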

I wonder if Matt (et al) aren't falling victim to the same sort of thing: The 100Mb pipe is full and there are uploads, downloads, and results created and received; therefore everything must be working as well as it can (or some inadequately-deductive equivalent of that).

It is exactly as you say: "...and seems to be totally unaware..."

Our usual grousing about the servers' speed and inadequate bandwidth may have made him deaf to our complaints.

I think he has dozens, if not hundreds, of people who would be thrilled by an opportunity to make him acutely aware of what is happening, if we only had the means.
ID: 1290567 · Report as offensive
Profile S@NL Etienne Dokkum
Volunteer tester
Avatar

Send message
Joined: 11 Jun 99
Posts: 212
Credit: 43,822,095
RAC: 0
Netherlands
Message 1290578 - Posted: 3 Oct 2012, 5:13:13 UTC

It looks like the air conditioning in the Space Science lab has quit, and therefore the servers that are not in the "closet" have been disabled:

http://setiathome.berkeley.edu/forum_thread.php?id=69594

So for the duration of today (it's night in California now) we all have to sit this one out and hope for the best...
ID: 1290578 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1290587 - Posted: 3 Oct 2012, 5:42:34 UTC - in response to Message 1290578.  

It looks like the air conditioning in the Space Science lab has quit, and therefore the servers that are not in the "closet" have been disabled:

And even with no AP going out, the system is tied up in knots.

Grant
Darwin NT
ID: 1290587 · Report as offensive
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.