Guess what's wrong with uploading...

Iztok s52d (and friends)

Joined: 12 Jan 01
Posts: 136
Credit: 393,469,375
RAC: 116
Slovenia
Message 138274 - Posted: 18 Jul 2005, 15:34:58 UTC - in response to Message 138264.  

To me, it looks like the folks that have modified their seti client software, to retry uploads every minute, are the problem.

I've got the standard Windows release, and it randomly retries, typically after 2-3 hours!

The every minute folks are bringing the server down.


It is simple: every undelivered result is a separate process, trying to get home.
Even with exponential/random delay: if the average time between retries is 30 minutes and there are 60 WUs waiting, the machine will connect on average every 30 seconds. Maximum 2 connections simultaneously, but...

Imagine 100 000 PCs. So, everyone with a PC faster than a 400 MHz Pentium is doing it ;-)
Look into your logs.
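
The arithmetic above can be checked with a small sketch. It assumes every pending result retries independently with the same mean interval, so the attempt rates simply add up. The function names are made up for illustration, and applying the same state to all 100 000 hosts is an assumption, not a measurement.

```python
def mean_seconds_between_connects(pending_results: int, mean_retry_minutes: float) -> float:
    """Average gap between connection attempts from one host.

    Assumes each pending result retries independently with the same
    mean interval, so the per-result attempt rates add up.
    """
    if pending_results <= 0:
        raise ValueError("pending_results must be positive")
    return mean_retry_minutes * 60.0 / pending_results


def fleet_connects_per_second(hosts: int, pending_results: int, mean_retry_minutes: float) -> float:
    """Aggregate attempt rate if every host were in the same state (an assumption)."""
    return hosts / mean_seconds_between_connects(pending_results, mean_retry_minutes)


print(mean_seconds_between_connects(60, 30))              # 30.0
print(round(fleet_connects_per_second(100_000, 60, 30)))  # 3333
```

So even with a 30-minute average backoff, a fleet in that state would hit the server a few thousand times per second.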

I summarized the logs over a few PCs:

Date        Uploads started  Uploads finished
2005-07-18             3849                75
2005-07-17             4731               145
2005-07-16             4967               143
2005-07-15             2231               126
2005-07-14             2342               143
2005-07-13             2777               217
2005-07-12             1914                13
2005-07-11              145               145
2005-07-10              160               160
2005-07-09              166               166
2005-07-08              172               168
2005-07-07              269               192
2005-07-06              401               139
2005-07-05              187               184
2005-07-04              185               185
2005-07-03              156               156
2005-07-02              134               134
2005-07-01              356               198

And this is the log for a modest 150 WU/day installation.
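
To make the trend in those counts easier to see, here is a rough sketch that turns each day's started/finished counts into a success ratio. The numbers are copied from the log summary in this post; the `LOG` layout and the `success_ratio` helper are only illustrative.

```python
# (date, uploads started, uploads finished), copied from the log summary
LOG = [
    ("2005-07-18", 3849, 75),
    ("2005-07-17", 4731, 145),
    ("2005-07-16", 4967, 143),
    ("2005-07-15", 2231, 126),
    ("2005-07-14", 2342, 143),
    ("2005-07-13", 2777, 217),
    ("2005-07-12", 1914, 13),
    ("2005-07-11", 145, 145),
    ("2005-07-10", 160, 160),
    ("2005-07-09", 166, 166),
    ("2005-07-08", 172, 168),
    ("2005-07-07", 269, 192),
    ("2005-07-06", 401, 139),
    ("2005-07-05", 187, 184),
    ("2005-07-04", 185, 185),
    ("2005-07-03", 156, 156),
    ("2005-07-02", 134, 134),
    ("2005-07-01", 356, 198),
]


def success_ratio(started: int, finished: int) -> float:
    """Fraction of started upload attempts that finished."""
    return finished / started if started else 0.0


for date, started, finished in LOG:
    ratio = success_ratio(started, finished)
    print(f"{date}  started={started:5d}  finished={finished:4d}  ok={ratio:6.1%}")
```

On the normal days the ratio sits at or near 100%; from 2005-07-12 on, only a small fraction of the attempts finish.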

BR
Iztok




ID: 138274 · Report as offensive
Don Erway
Volunteer tester

Joined: 18 May 99
Posts: 305
Credit: 471,946
RAC: 0
United States
Message 138287 - Posted: 18 Jul 2005, 15:51:56 UTC - in response to Message 138267.  

To me, it looks like the folks that have modified their seti client software, to retry uploads every minute, are the problem.

I've got the standard Windows release, and it randomly retries, typically after 2-3 hours!

The every minute folks are bringing the server down.


I run optimised clients (SETI and BOINC) and I have not seen this before. My retries are all between a few minutes and 3 hours 15 minutes. Has anyone else heard of this? It's a futile activity for sure if it's true, 'cos it will never get better that way.


See this post, which is older in this same thread:

http://setiathome.berkeley.edu/forum_thread.php?id=17147#137900

It shows backing off 1 minute, every time.


ID: 138287 · Report as offensive
cjsoftuk
Volunteer tester
Joined: 3 Sep 04
Posts: 248
Credit: 183,721
RAC: 0
United Kingdom
Message 138288 - Posted: 18 Jul 2005, 15:52:19 UTC

Well I agree with the timer idea, but have a look at this:
Download Status
That's the download folder accessibility info, and
that is the upload status (HTTP GET only, no data sent).
Seems strange!
ID: 138288 · Report as offensive
trux
Volunteer tester
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 138294 - Posted: 18 Jul 2005, 15:57:19 UTC - in response to Message 138287.  

To me, it looks like the folks that have modified their seti client software, to retry uploads every minute, are the problem.

I've got the standard Windows release, and it randomly retries, typically after 2-3 hours!

The every minute folks are bringing the server down.


I run optimised clients (SETI and BOINC) and I have not seen this before. My retries are all between a few minutes and 3 hours 15 minutes. Has anyone else heard of this? It's a futile activity for sure if it's true, 'cos it will never get better that way.


See this post, which is older in this same thread:

http://setiathome.berkeley.edu/forum_thread.php?id=17147#137900

It shows backing off 1 minute, every time.


Wrong. Your link to the log shows the backoff growing from 1 to 8 minutes, and that's pretty normal behaviour. The delays grow from small values up to about 3-4 hours, and then get smaller again.
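
For anyone unsure what a growing, randomized backoff looks like, here is a minimal sketch. It is NOT the actual BOINC client code: the doubling rule, the 60-second base, the 4-hour cap, and the name `backoff_delays` are all assumptions chosen to roughly match the delays described in this thread, and it does not model the delays getting smaller again.

```python
import random


def backoff_delays(attempts, base_s=60.0, cap_s=4 * 3600.0, rng=None):
    """Return retry delays (in seconds) for consecutive failed attempts.

    Hypothetical policy: the ceiling doubles after each failure
    (60 s, 120 s, 240 s, ...) until it reaches cap_s, and the actual
    delay is drawn uniformly from [ceiling / 2, ceiling] so that many
    clients do not all retry at the same moment.
    """
    rng = rng or random.Random()
    delays = []
    for n in range(attempts):
        ceiling = min(cap_s, base_s * (2 ** n))
        delays.append(rng.uniform(ceiling / 2, ceiling))
    return delays


for attempt, delay in enumerate(backoff_delays(10, rng=random.Random(0)), start=1):
    print(f"attempt {attempt:2d}: wait {delay / 60:6.1f} min")
```

Under this sketch the first few waits fall within 1 to 8 minutes, and later ones level off below the 4-hour cap.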
trux
BOINC software
Freediving Team
Czech Republic
ID: 138294 · Report as offensive
Prognatus
Joined: 6 Jul 99
Posts: 1600
Credit: 391,546
RAC: 0
Norway
Message 138343 - Posted: 18 Jul 2005, 17:14:27 UTC

OK, philmor is out of work (flak-flak-flak-flak-sound as the tape reaches the end) and there are no more tapes in the queue! ...But the Berkeley plane is still running on 4 splitter engines... Maybe they won't add more tapes, so that uploads can get a breather? Hey, maybe the Gods on Olympus have heard our cries? ;)

ID: 138343 · Report as offensive
metal1633

Joined: 8 Jun 05
Posts: 2
Credit: 463
RAC: 0
United States
Message 138346 - Posted: 18 Jul 2005, 17:19:31 UTC - in response to Message 138032.  

Probably won't be looked at till tomorrow, July 18th Berkeley time. :-) Until then we will just have to wait. Downloads are fine, as every time I finish a WU I d/l a new one. Just no uploading. Just keep on crunching. It will get fixed, just like it always does. :-)

Jeremy
I am no longer downloading either.


ID: 138346 · Report as offensive
KWSN Sir Clark
Volunteer tester

Joined: 17 Aug 02
Posts: 139
Credit: 1,002,493
RAC: 8
United Kingdom
Message 138356 - Posted: 18 Jul 2005, 17:33:11 UTC - in response to Message 138346.  

I've now set mine to No New Work and will only download work once all the others have uploaded.....

Still got three other projects happily crunching away instead of SETI

ID: 138356 · Report as offensive
MJKelleher
Volunteer tester
Joined: 1 Jul 99
Posts: 2048
Credit: 1,575,401
RAC: 0
United States
Message 138584 - Posted: 18 Jul 2005, 22:30:48 UTC - in response to Message 138245.  

I'm running only SETI and SETI Beta ATM and have set my cache at 10 days.
Still have 67 WUs in cache and no problems with deadlines.
Detached from Einstein because of the deadline issue.

I have to say that somebody has to respect that a (few) crunchers only want to run seti.
Sorry for that.

Just my 2$.

greetz Mike

Speaking (writing?) for myself alone, I do respect those who only want to run SETI. However, they don't get my sympathy. It's their choice to only run the one project, regardless of the warnings of potential outages and work shortfalls. You willingly (?) run the risk of running out of work and leaving your CPU idle. That's fine too, but you (the generic, not the specific you) shouldn't get all up in arms when that risk actually happens!

Tossing out a thought -- what if SETI does exist.... and is communicating by gravity waves instead of radio? Einstein@home may find him first! 8-)



ID: 138584 · Report as offensive
gregh
Joined: 10 Jun 99
Posts: 220
Credit: 4,292,549
RAC: 0
Australia
Message 138596 - Posted: 18 Jul 2005, 22:48:45 UTC - in response to Message 138038.  

There,.....do I win?

You win a free chili cheese dog in the Cafe!



Can he pick the breed or is he only allowed to accept a mutt? ;-}
ID: 138596 · Report as offensive
KB7RZF
Volunteer tester
Joined: 15 Aug 99
Posts: 9549
Credit: 3,308,926
RAC: 2
United States
Message 138598 - Posted: 18 Jul 2005, 22:50:37 UTC
Last modified: 18 Jul 2005, 22:50:50 UTC

From the front page

July 18, 2005
We are still trying to figure out if there is a software reason for the connection drops but are looking at a hardware solution for the short term. We are working to get a more powerful data server machine online. There are several parts to making this work and, barring any show stoppers, we hope to have it online in the next couple of days.
ID: 138598 · Report as offensive
DecBassI
Joined: 21 May 05
Posts: 152
Credit: 86,905
RAC: 0
United Kingdom
Message 138601 - Posted: 18 Jul 2005, 22:51:57 UTC - in response to Message 138584.  

I'm running only SETI and SETI Beta ATM and have set my cache at 10 days.
Still have 67 WUs in cache and no problems with deadlines.
Detached from Einstein because of the deadline issue.

I have to say that somebody has to respect that a (few) crunchers only want to run seti.
Sorry for that.

Just my 2$.

greetz Mike

Speaking (writing?) for myself alone, I do respect those who only want to run SETI. However, they don't get my sympathy. It's their choice to only run the one project, regardless of the warnings of potential outages and work shortfalls. You willingly (?) run the risk of running out of work and leaving your CPU idle. That's fine too, but you (the generic, not the specific you) shouldn't get all up in arms when that risk actually happens!

Tossing out a thought -- what if SETI does exist.... and is communicating by gravity waves instead of radio? Einstein@home may find him first! 8-)




good point!

ID: 138601 · Report as offensive
Fuzzy Hollynoodles
Volunteer tester
Joined: 3 Apr 99
Posts: 9659
Credit: 251,998
RAC: 0
Message 138615 - Posted: 18 Jul 2005, 23:08:37 UTC

From this post. Thank you, Byron!


"I'm trying to maintain a shred of dignity in this world." - Me

ID: 138615 · Report as offensive
EclipseHA

Joined: 28 Jul 99
Posts: 1018
Credit: 530,719
RAC: 0
United States
Message 138714 - Posted: 19 Jul 2005, 1:35:28 UTC

One thing that strikes me as a point to ponder with the current problem is this:

The work on the breaker clearly required that all the systems be shut down. It might have been some time since all the servers were hard booted. There could be Linux/Unix boxes that had an uptime of weeks or months prior to this.

Along with that, on Linux/Unix, there might be changes to configs which are not used until either the server is restarted, the given service is killed and restarted, or the given service is "HUPed" (kill -HUP).

If, for example, the Apache config was modified to change max connections or timeouts weeks ago, but the service (httpd or httpsd) wasn't restarted to use the new config, problems might not appear until a hard boot.

By that time, folks could be scratching their heads saying "but it worked like this for the last few weeks, why did it break now? It can't be the config..."


Anyway, just a thought. I've seen it before, and have done it myself...

As a suggestion... It seems they need to bring the DB down every week for backups. At that time, bring all the servers down. If something's not working after the restart, the config changes should only be a few days old, and might be easier to recall/backout/fix, than if it was a month or two back...
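
One way to catch this kind of "edited but never reloaded" config before a reboot does is to compare the config file's modification time with the time the service last started. The sketch below is only an illustration under that assumption: the helper names are made up, the timestamps in the example are invented, and how you obtain the service start time depends on the system.

```python
import os
from datetime import datetime, timezone


def config_is_stale(config_mtime: datetime, service_started: datetime) -> bool:
    """True if the config changed after the service last (re)started,
    meaning the running process has probably not loaded the current file."""
    return config_mtime > service_started


def file_mtime(path: str) -> datetime:
    """Modification time of a file as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc)


# Example with invented timestamps: config edited weeks after the last restart.
started = datetime(2005, 5, 1, 12, 0, tzinfo=timezone.utc)
edited = datetime(2005, 6, 20, 9, 30, tzinfo=timezone.utc)
print(config_is_stale(edited, started))  # True
```

Running a check like this at the weekly backup window would flag pending config changes while they are still only a few days old.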
ID: 138714 · Report as offensive
IT_Eagle03
Joined: 22 Nov 99
Posts: 5
Credit: 154,363
RAC: 0
United States
Message 138926 - Posted: 19 Jul 2005, 7:32:54 UTC - in response to Message 137834.  

The SETI chipmunk died. :(


July 18, 2005
We are still trying to figure out if there is a software reason for the connection drops but are looking at a hardware solution for the short term. We are working to get a more powerful data server machine online. There are several parts to making this work and, barring any show stoppers, we hope to have it online in the next couple of days.


Well, it seems they can rebuild him. They have the technology. They have the capability to make the world's first bionic chipmunk. The SETI Chipmunk will be that chipmunk. Better than he was before.

Better . . . stronger . . . faster.
ID: 138926 · Report as offensive
N/A
Volunteer tester

Joined: 18 May 01
Posts: 3718
Credit: 93,649
RAC: 0
Message 138930 - Posted: 19 Jul 2005, 7:58:04 UTC

I think the server committed suicide after exchanging a few words with a paranoid android...
ID: 138930 · Report as offensive
Peter M. Nielsen
Volunteer tester

Joined: 18 May 99
Posts: 19
Credit: 19,661
RAC: 0
Denmark
Message 138934 - Posted: 19 Jul 2005, 8:45:07 UTC - in response to Message 137987.  

My CPU is IDLE folks.

It might be a good idea to attach other projects while the SETI-guys are working out their difficulties...

- Peter


ID: 138934 · Report as offensive
Mr.Pernod
Volunteer tester
Joined: 8 Feb 04
Posts: 350
Credit: 1,015,988
RAC: 0
Netherlands
Message 139063 - Posted: 19 Jul 2005, 16:03:36 UTC

my guess would be that someone forgot to renice the seti-clients running on the server after the last reboot....
;)
ID: 139063 · Report as offensive
Mr_Zeno

Joined: 5 Mar 04
Posts: 2
Credit: 75,587
RAC: 0
United Kingdom
Message 139112 - Posted: 19 Jul 2005, 17:48:03 UTC
Last modified: 19 Jul 2005, 17:50:42 UTC

It's a pity that BOINC doesn't have a built-in email address that it could send completed WUs to. Maybe have it send a WU after five consecutive upload failures when the WU is two days from its report deadline.

Just a thought.

Jase
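
The trigger proposed in this post could be sketched roughly as below. This is purely hypothetical: BOINC has no such email fallback as far as this thread says, the function name and parameters are invented, and "two days from the report deadline" is read here as "two days or less remaining".

```python
from datetime import datetime, timedelta


def should_use_fallback(consecutive_failures, now, report_deadline,
                        min_failures=5, window=timedelta(days=2)):
    """Hypothetical rule from the post: at least five upload failures
    in a row AND no more than two days left until the report deadline."""
    return consecutive_failures >= min_failures and report_deadline - now <= window


now = datetime(2005, 7, 19, 17, 48)
print(should_use_fallback(5, now, datetime(2005, 7, 21)))  # True
print(should_use_fallback(5, now, datetime(2005, 7, 25)))  # False
```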

ID: 139112 · Report as offensive
ampoliros
Volunteer tester
Joined: 24 Sep 99
Posts: 152
Credit: 3,542,579
RAC: 5
United States
Message 139127 - Posted: 19 Jul 2005, 18:15:45 UTC

I'm up to about 35 WUs waiting to upload on my "4-project-PC". That makes me afraid to look at how many my "seti-only" monster crunchers (2) have in their upload queues.

I have plenty of work on all computers and I shouldn't start missing deadlines for another week, but it still makes me nervous.

7,049 S@H Classic Credits
ID: 139127 · Report as offensive
N/A
Volunteer tester

Joined: 18 May 01
Posts: 3718
Credit: 93,649
RAC: 0
Message 139202 - Posted: 19 Jul 2005, 20:22:09 UTC - in response to Message 139112.  

I see your 2¢ and raise you another 2×2¢ thought - When a WU completes on the host, the result file contains the time when the WU was finished, right? Doesn't that mean that when the server gets back online it'll be told "Yeah, here's the finished WU which was done on time but couldn't be uploaded after 15 tries"? The system would sort itself out.

But since I have an inclination towards being wrong, the alternative is that the deadline should be extended by the amount of time the server is down.

So... those're my thoughts. Anyone wanna raise the stakes further?
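
Both ideas in this post are easy to write down. The sketch below is only an illustration: whether the server actually judges a result by its completion time or by its report time is not established in this thread, and the function names and example dates are invented.

```python
from datetime import datetime


def finished_on_time(finished_at: datetime, deadline: datetime) -> bool:
    """Idea 1: judge a result by when it finished, not by when it was uploaded."""
    return finished_at <= deadline


def extended_deadline(deadline: datetime, outage_start: datetime,
                      outage_end: datetime) -> datetime:
    """Idea 2: push the deadline back by the length of the server outage."""
    if outage_end < outage_start:
        raise ValueError("outage_end must not be before outage_start")
    return deadline + (outage_end - outage_start)


deadline = datetime(2005, 7, 20)
print(finished_on_time(datetime(2005, 7, 18), deadline))                          # True
print(extended_deadline(deadline, datetime(2005, 7, 12), datetime(2005, 7, 19)))  # 2005-07-27 00:00:00
```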
ID: 139202 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.