Gasping for Air (May 14 2007)

Message boards : Technical News : Gasping for Air (May 14 2007)
MidnightDevil

Send message
Joined: 25 Jan 04
Posts: 1
Credit: 56,584
RAC: 0
Portugal
Message 567589 - Posted: 15 May 2007, 2:50:46 UTC - in response to Message 567571.  

15/05/2007 03:47:56||Project communication failed: attempting access to reference site
15/05/2007 03:47:56|SETI@home|[file_xfer] Temporarily failed download of 16fe05ab.10775.21937.36058.3.100: system connect
15/05/2007 03:47:56|SETI@home|Backing off 1 hr 25 min 16 sec on download of file 16fe05ab.10775.21937.36058.3.100
15/05/2007 03:47:59||Access to reference site succeeded - project servers may be temporarily down.


Is it working or not? :D
ID: 567589 · Report as offensive
Profile gypsy
Avatar

Send message
Joined: 25 Jan 03
Posts: 5
Credit: 148,584
RAC: 0
United States
Message 567593 - Posted: 15 May 2007, 2:55:21 UTC - in response to Message 567413.  

Sorry, I hope this is the right place to post my question. When will I be able to get some work? Do I need to be patient? Please let me know, because all I get is that the server is temporarily down. I'm not a techie, so please forgive me, but I know you have had problems; I guess I just want to get this computer to do something worthwhile. Thanks

gypsy
ID: 567593 · Report as offensive
Profile Ragnarock_83835
Avatar

Send message
Joined: 18 Feb 06
Posts: 16
Credit: 1,053,610
RAC: 0
United States
Message 567597 - Posted: 15 May 2007, 2:58:03 UTC - in response to Message 567448.  

2007-05-15 00:49:43 [SETI@home] [file_xfer] Started upload of file 29ja04ab.27703.4690.467332.3.113_1_0
2007-05-15 00:49:43 [SETI@home] [file_xfer] Started upload of file 29ja04ab.27703.6272.1003390.3.57_2_0
2007-05-15 00:49:46 [---] Project communication failed: attempting access to reference site
2007-05-15 00:49:46 [SETI@home] [file_xfer] Temporarily failed upload of 29ja04ab.27703.4690.467332.3.113_1_0: system connect
2007-05-15 00:49:46 [SETI@home] Backing off 8 min 30 sec on upload of file 29ja04ab.27703.4690.467332.3.113_1_0
2007-05-15 00:49:46 [SETI@home] [file_xfer] Temporarily failed upload of 29ja04ab.27703.6272.1003390.3.57_2_0: system connect
2007-05-15 00:49:46 [SETI@home] Backing off 28 min 36 sec on upload of file 29ja04ab.27703.6272.1003390.3.57_2_0
2007-05-15 00:49:46 [SETI@home] [file_xfer] Started upload of file 29se04ab.22429.31842.317338.3.64_0_0
2007-05-15 00:49:46 [SETI@home] [file_xfer] Started upload of file 29se04ab.22429.31842.317338.3.62_0_0
2007-05-15 00:49:47 [---] Access to reference site succeeded - project servers may be temporarily down.

Is this a part of some of the recovery problems or what? It seems as if BOINC is able to access the servers, but none is responding or receiving?

Johan


The answer was in the very first post....

copied and pasted
"Bruno is dropping lots of packets right now, resulting in all kinds of upload/download snags and showing up as "disabled" on the server status page. This should clear up over time."

ID: 567597 · Report as offensive
BarryAZ

Send message
Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 567600 - Posted: 15 May 2007, 3:04:19 UTC

I'm wondering if we should celebrate the arrival of new hardware and all the work being done in server swapomania by NOT 'unsuspending' our project use of SETI. It seems quite clear that the new (and still changing) configuration needs some air to recover. Downloads are currently a bit hit and miss, and uploads, well, uploads just ain't happening right now.

So perhaps by Wednesday (if there is no additional outage for backup tomorrow) or Friday would be a time to look at the project as being truly online ready.


ID: 567600 · Report as offensive
BigBrother

Send message
Joined: 27 Jul 99
Posts: 18
Credit: 3,769,057
RAC: 24
Sweden
Message 567602 - Posted: 15 May 2007, 3:08:50 UTC - in response to Message 567597.  

The answer was in the very first post....

copied and pasted
"Bruno is dropping lots of packets right now, resulting in all kinds of upload/download snags and showing up as "disabled" on the server status page. This should clear up over time."


That's what I suspected, but no improvement in at least 12 hours...? Surely there must be more glitches than a deceased drive and strained connections, as the same post said that "bruno has been taking up most of the slack". It doesn't seem to me as if Bruno is doing much of that.

Johan
ID: 567602 · Report as offensive
Josh

Send message
Joined: 28 Jun 99
Posts: 6
Credit: 2,272,265
RAC: 0
Mexico
Message 567610 - Posted: 15 May 2007, 3:26:52 UTC - in response to Message 567593.  

Sorry, I hope this is the right place to post my question. When will I be able to get some work? Do I need to be patient? Please let me know, because all I get is that the server is temporarily down. I'm not a techie, so please forgive me, but I know you have had problems; I guess I just want to get this computer to do something worthwhile. Thanks

gypsy


OK, try this:
Hit the tab in BOINC that says "Advanced" and then "Retry communications".

Look at the tab where it says "Messages" and you will see what your computer is trying to do.

If this does not work, then go to the "Projects" tab, hit "Update", and go back to "Messages" to see what your machine is doing.

In any case, you will not get anything working until you get some WUs onto your machine, and the results will not be reflected in the various stats until you "send" all your results.

Hope this helps and do not worry too much... me, I just got some work for the Beagle and no results from Santa Maria, Pinta is "dead in the water" since some weeks ago, anyway.

Be patient and cool!

Josh.
ID: 567610 · Report as offensive
Profile Blurf
Volunteer tester

Send message
Joined: 2 Sep 06
Posts: 8962
Credit: 12,678,685
RAC: 0
United States
Message 567616 - Posted: 15 May 2007, 3:36:33 UTC

Woohoo! One of my machines just uploaded a WU!

Just one but it's a start!!

Be patient friends...all is not lost.


ID: 567616 · Report as offensive
Profile marsinph
Volunteer tester

Send message
Joined: 7 Apr 01
Posts: 172
Credit: 23,823,824
RAC: 0
Belgium
Message 567635 - Posted: 15 May 2007, 4:15:58 UTC - in response to Message 567616.  

Woohoo! One of my machines just uploaded a WU!

Just one but it's a start!!

Be patient friends...all is not lost.


You're a happy one!!!
I let BOINC run by itself, but here is the result:
14/05/2007 20:47:17|SETI@home|[file_xfer] Started upload of file 04mr05ab.8726.12370.517322.3.66_2_0
14/05/2007 20:47:39||Project communication failed: attempting access to reference site
14/05/2007 20:47:39|SETI@home|[file_xfer] Temporarily failed upload of 04mr05ab.8726.12370.517322.3.66_2_0: system connect
14/05/2007 20:47:39|SETI@home|Backing off 1 min 0 sec on upload of file 04mr05ab.8726.12370.517322.3.66_2_0
14/05/2007 20:47:40||Access to reference site succeeded - project servers may be temporarily down.
.......
and always the same until
.......
15/05/2007 6:13:11|SETI@home|[file_xfer] Started upload of file 04mr05ab.8726.12370.517322.3.66_2_0
15/05/2007 6:13:33||Project communication failed: attempting access to reference site
15/05/2007 6:13:33|SETI@home|[file_xfer] Temporarily failed upload of 04mr05ab.8726.12370.517322.3.66_2_0: system connect
15/05/2007 6:13:33|SETI@home|Backing off 2 hr 5 min 39 sec on upload of file 04mr05ab.8726.12370.517322.3.66_2_0
15/05/2007 6:13:35||Access to reference site succeeded - project servers may be temporarily down.

It is the same on all my computers. No WU has been downloaded or uploaded.
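
The growing back-off intervals in logs like these are easier to compare once they are converted to seconds. The following is a minimal sketch, assuming only the "Backing off ... on upload/download of file ..." wording seen in the logs pasted in this thread; the function name `parse_backoff` and the regular expression are illustrative and are not part of BOINC.

```python
import re

# Pattern built from the message wording seen in the pasted logs (an assumption,
# not an official BOINC log specification). Hours and minutes are optional.
BACKOFF_RE = re.compile(
    r"Backing off (?:(\d+) hr )?(?:(\d+) min )?(\d+) sec "
    r"on (upload|download) of file (\S+)"
)


def parse_backoff(line):
    """Return (seconds, direction, filename), or None if the line is not a back-off message."""
    match = BACKOFF_RE.search(line)
    if match is None:
        return None
    hours, minutes, seconds, direction, filename = match.groups()
    total = int(hours or 0) * 3600 + int(minutes or 0) * 60 + int(seconds)
    return total, direction, filename


# Two lines copied from the log above.
sample = [
    "14/05/2007 20:47:39|SETI@home|Backing off 1 min 0 sec on upload of file 04mr05ab.8726.12370.517322.3.66_2_0",
    "15/05/2007 6:13:33|SETI@home|Backing off 2 hr 5 min 39 sec on upload of file 04mr05ab.8726.12370.517322.3.66_2_0",
]

for entry in sample:
    print(parse_backoff(entry))
```

Run over a whole message log, this would show how the retry delay for one file grows from a minute to a couple of hours as the failures pile up.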


ID: 567635 · Report as offensive
Profile Ragnarock_83835
Avatar

Send message
Joined: 18 Feb 06
Posts: 16
Credit: 1,053,610
RAC: 0
United States
Message 567645 - Posted: 15 May 2007, 4:32:47 UTC - in response to Message 567602.  
Last modified: 15 May 2007, 5:00:10 UTC

The answer was in the very first post....

copied and pasted
"Bruno is dropping lots of packets right now, resulting in all kinds of upload/download snags and showing up as "disabled" on the server status page. This should clear up over time."



That's what I suspected, but no improvement in at least 12 hours...? Surely there must be more glitches than a deceased drive and strained connections, as the same post said that "bruno has been taking up most of the slack". It doesn't seem to me as if Bruno is doing much of that.

Johan


OK, but you must remember that they have been down so long that everyone has completed their work units, so now they have 1,444,386 computers trying to contact their server at once, and the connection is going to be very, very poor until things start to idle down.....
Also, some people still have their cache set for 5-10 days' worth of work, so the server is actually going to need to make 10-30 million WUs until it is actually caught up.....that's a staggering "work load" for any server....especially considering the bandwidth needed to pump those WUs out.....this is also compounded by the fact that the few WUs being completed now need to be sent back to SETI......
So the wait could be longer than a 12 hour period; realistically, about a few days if not more....
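
As a rough sanity check on the figures in this post, the backlog can be expressed per computer. This is only a sketch using the poster's own numbers (1,444,386 computers, 10-30 million WUs); nothing here comes from the project's servers, and the function name is made up for illustration.

```python
HOSTS = 1_444_386  # number of computers quoted in the post above


def wus_per_host(total_wus, hosts=HOSTS):
    """Average number of workunits each computer would need to receive."""
    return total_wus / hosts


low = wus_per_host(10_000_000)   # lower end of the poster's 10-30 million estimate
high = wus_per_host(30_000_000)  # upper end of the estimate

print(f"{low:.1f} to {high:.1f} WUs per computer")
```

That works out to roughly 7 to 21 WUs per computer, which is the size of refill the estimate implies for an average host.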
ID: 567645 · Report as offensive
Crazy Phoenix

Send message
Joined: 25 Jul 01
Posts: 2
Credit: 1,056,518
RAC: 0
Belgium
Message 567686 - Posted: 15 May 2007, 6:09:48 UTC - in response to Message 567645.  
Last modified: 15 May 2007, 6:10:46 UTC


OK, but you must remember that they have been down so long that everyone has completed their work units, so now they have 1,444,386 computers trying to contact their server at once, and the connection is going to be very, very poor until things start to idle down.....
Also, some people still have their cache set for 5-10 days' worth of work, so the server is actually going to need to make 10-30 million WUs until it is actually caught up.....that's a staggering "work load" for any server....especially considering the bandwidth needed to pump those WUs out.....this is also compounded by the fact that the few WUs being completed now need to be sent back to SETI......
So the wait could be longer than a 12 hour period; realistically, about a few days if not more....


In my case, I have three computers running. One is far away, so I only know that 14 WUs were assigned to it, but the two others got some of them. The faster one, set to 5 days between communications, has downloaded 13 WUs, of which 5 are finished, and the other one, set to three days, has downloaded 9 WUs, of which 6 are completed.

One question: what will happen to the WUs uploaded just before the problem, or the ones that were reported as timed out during the outage period? For example, this WU: http://setiathome.berkeley.edu/workunit.php?wuid=126729761 is reported as "No reply".
Just to know.

Thanks for the job and the update,
Good work Matt (And the others who have helped solving the issue ;-) ).
ID: 567686 · Report as offensive
lanjoe9

Send message
Joined: 11 Apr 00
Posts: 14
Credit: 61,081
RAC: 1
Mexico
Message 567695 - Posted: 15 May 2007, 6:30:07 UTC

W00t!!
Great news!!
Keep up the good work and thanks for all the trouble you people have taken to make this project continue!
ID: 567695 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 567710 - Posted: 15 May 2007, 7:11:55 UTC - in response to Message 567492.  

Say... is that Kryten and Koloth I see listed on eBay???

LOL

(Two) slightly used servers


Yeah, only used by a little old lady to download recipes on Sundays.

"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 567710 · Report as offensive
zombie67 [MM]
Volunteer tester
Avatar

Send message
Joined: 22 Apr 04
Posts: 758
Credit: 27,771,894
RAC: 0
United States
Message 567720 - Posted: 15 May 2007, 7:35:35 UTC - in response to Message 567708.  
Last modified: 15 May 2007, 7:36:43 UTC

So far it's been almost 12 hours, and I think I've restarted the PC once for M$, and I've uploaded a WU and reported a grand total of 1 time each. Something is broken.

I'm beginning to agree.

1) The network bits/second should not be declining while my ability to download is still effectively nothing.

2) The network bits/second should not be declining while my ability to upload is nothing.
Dublin, California
Team: SETI.USA
ID: 567720 · Report as offensive
Profile JerWA

Send message
Joined: 3 Apr 99
Posts: 13
Credit: 4,262,442
RAC: 0
United States
Message 567728 - Posted: 15 May 2007, 7:44:48 UTC - in response to Message 567720.  
Last modified: 15 May 2007, 7:47:07 UTC

So far it's been almost 12 hours, and I think I've restarted the PC once for M$, and I've uploaded a WU and reported a grand total of 1 time each. Something is broken.

I'm beginning to agree.

1) The network bits/second should not be declining while my ability to download is still effectively nothing.

2) The network bits/second should not be declining while my ability to upload is nothing.

If you look at the graphs you should notice something "out of spec" right away, in that outbound bytes are spiking WAY out of average, and sometimes higher than receiving. I think, as mentioned in the first post, the problem is simply flooding the network connection on their side. The BOINC client has never been very happy about unstable connections, and I think that's what we're dealing with at the moment.

I'd say it should sort itself out over time, except that with over 600,000 users and a week of backlogged workunits, we (users) may be generating enough traffic to keep it in a constant state of flux until/unless Berkeley does something on their side to throttle it down to something the servers are happy with.

Edit: Hey Batman, don't suppose you could be convinced to move those logs someplace else so that it's a link rather than something we have to scroll past? Just a thought.
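
For readers wondering how clients can ease off a flooded server on their own, the usual technique is randomized exponential back-off. The sketch below is a generic illustration of that idea, not BOINC's actual algorithm; the function name, the 60-second base, and the 4-hour cap are all assumptions chosen for the example.

```python
import random


def backoff_delay(failures, base=60.0, cap=4 * 3600.0, rng=random):
    """Seconds to wait before the next retry after `failures` consecutive failures.

    The ceiling doubles with each failure up to `cap`, and the actual delay is
    drawn uniformly below that ceiling so that many clients do not retry in step.
    All parameter values are hypothetical.
    """
    ceiling = min(cap, base * (2 ** failures))
    return rng.uniform(0.0, ceiling)


# Example: delays chosen by one client over ten consecutive failures.
rng = random.Random(2007)  # seeded only to make the example repeatable
for n in range(10):
    print(n, round(backoff_delay(n, rng=rng)))
```

The randomization matters as much as the doubling: without it, every client that failed at the same moment would come back at the same moment too.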
ID: 567728 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13722
Credit: 208,696,464
RAC: 304
Australia
Message 567762 - Posted: 15 May 2007, 8:30:00 UTC - in response to Message 567708.  

So far it's been almost 12 hours, and I think I've restarted the PC once for M$, and I've uploaded a WU and reported a grand total of 1 time each,
....

Just a bit of the log would have been sufficient.
Posting all that tends to just annoy & irritate others, and people that are annoyed & irritated tend to be not so helpful to those causing the annoyance & irritation.


Something is Broken.

As Matt mentioned in his opening post, Bruno is stressing under the load & dropping connections here, there & everywhere, which of course exacerbates the whole situation. This is in addition to its tendency to do odd things at random; a reboot during tomorrow's outage should help settle it down, which should help reduce the load on it much faster. Which should help to keep it happy.
Grant
Darwin NT
ID: 567762 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13722
Credit: 208,696,464
RAC: 304
Australia
Message 567764 - Posted: 15 May 2007, 8:33:32 UTC - in response to Message 567728.  

If you look at the graphs you should notice something "out of spec" right away, in that outbound bytes are spiking WAY out of average, and sometimes higher than receiving.

What looks really weird to me is its jagged appearance. Normally the load maxes out for a while & then drops away, but that graph shows lots of short, frequent periods of no data throughput, and even as the load is slowly dropping those anomalies are still occurring.

Grant
Darwin NT
ID: 567764 · Report as offensive
Conrad Human
Volunteer tester

Send message
Joined: 17 Nov 00
Posts: 67
Credit: 2,009,224
RAC: 0
South Africa
Message 567770 - Posted: 15 May 2007, 8:44:58 UTC

Maybe it is because there were 1,400,000+ workunits assigned over the weekend and everyone is trying to download them.
Downloading these at the max speed of 60 Mbps will take roughly 1.5 days.

I think it is getting better.
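
The download-time estimate in this post can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the 1,400,000 workunits, the 60 Mbps link, and the 1.5 days are the poster's figures, the whole link is assumed to carry nothing but downloads with no protocol overhead or retries, and the per-workunit size is derived from those figures rather than taken from the project.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(n_files, bytes_per_file, link_bits_per_sec):
    """Days needed to push n_files of the given size through a fully used link."""
    total_bits = n_files * bytes_per_file * 8
    return total_bits / link_bits_per_sec / SECONDS_PER_DAY


def implied_file_bytes(n_files, days, link_bits_per_sec):
    """File size (bytes) that would make the transfer take exactly `days`."""
    return days * SECONDS_PER_DAY * link_bits_per_sec / 8 / n_files


size = implied_file_bytes(1_400_000, 1.5, 60e6)
print(f"implied workunit size: {size / 1000:.0f} kB")
print(f"check: {transfer_days(1_400_000, size, 60e6):.2f} days")
```

In other words, the 1.5-day figure corresponds to about 694 kB per workunit; smaller workunits would clear proportionally faster, and dropped connections or retries would stretch the time out.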
ID: 567770 · Report as offensive
zombie67 [MM]
Volunteer tester
Avatar

Send message
Joined: 22 Apr 04
Posts: 758
Credit: 27,771,894
RAC: 0
United States
Message 567778 - Posted: 15 May 2007, 8:53:03 UTC - in response to Message 567770.  

I think it is getting better.

How is throughput going down "getting better"?
Dublin, California
Team: SETI.USA
ID: 567778 · Report as offensive
TarracoServer
Volunteer tester

Send message
Joined: 11 Apr 07
Posts: 38
Credit: 595,022
RAC: 0
Spain
Message 567785 - Posted: 15 May 2007, 9:01:18 UTC


...
The server situation will be in major flux, and generally in a positive direction, over the next week or so. I'll be trying to keep updating the server status page, but I make no guarantees about its accuracy.

Thanks again for your patience during the past couple of weeks. While I appreciate the kind words and sentiments I should point out that this past weekend for me wasn't exactly restful time off. I was working at my other job.

- Matt


Yeah, Matt. This was a crazy weekend! Thx for all the info. We'll try to be patient while reconfiguring thumper and the other servers ;)

About the spikes on the graph:

  • Matt mentions that yesterday he was cutting some wires on the rack: the dead (flat) line was because the servers were unplugged from the net
  • Bruno choked: no data was transferred to BOINC clients
  • Bruno is on again, but overloaded: BOINC clients will get the work slowly
  • Because of this overload, Bruno can appear as disabled on the status page



So, we must be patient ;) Everything is on, but there's a lot of work in the queue.


ID: 567785 · Report as offensive
Conrad Human
Volunteer tester

Send message
Joined: 17 Nov 00
Posts: 67
Credit: 2,009,224
RAC: 0
South Africa
Message 567788 - Posted: 15 May 2007, 9:04:21 UTC
Last modified: 15 May 2007, 9:08:03 UTC

I think the HTTP errors are becoming less frequent.

Think of it this way: if you are trying to copy 100 files in 100 copy instances, it takes ages, but as soon as some of them finish, the copy speed increases a lot.

There must still be a lot of unsent units out there, but they can only get fewer.
Bruno is still under stress; it shows "disabled".


ID: 567788 · Report as offensive

 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.