Guess what's wrong with uploading...
Author | Message |
---|---|
ML1 Joined: 25 Nov 01 Posts: 20267 Credit: 7,508,002 RAC: 20 |
Here on the boards, we have many experts of various types. Now here is your chance to tell Berkeley what they don't know and what to fix! I'll start off with a few of my guesses. We already have the clues that the upload/download server is 100% CPU bound. Also, the Cogent link is not bandwidth saturated... So here goes: Too many files accumulated in the upload directory causing the filesystem to choke; Database enquiries issues causing excessive server(s) delays; DOS attack with junk WUs; Bad network card or a bad router causing confusion; Too many CPU processes causing the server to thrash; Unexpectedly high disk fragmentation; Very high throughput of DOWNLOADING WUs for starving half crazed users desperate to load up to The Max; ... I'll leave a few options for the other experts to have a chance :) So who gets the prize for best guess? Good luck, Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
IT_Eagle03 Joined: 22 Nov 99 Posts: 5 Credit: 154,363 RAC: 0 |
The SETI chipmunk died. :( |
CJOrtega Joined: 15 May 99 Posts: 186 Credit: 1,126,273 RAC: 0 |
Too many people hitting the retry/update button. |
Astro Joined: 16 Apr 02 Posts: 8026 Credit: 600,015 RAC: 0 |
Matt set his beer on the UL/DL server and someone knocked it over (alcohol abuse). |
KWSN - MajorKong Joined: 5 Jan 00 Posts: 2892 Credit: 1,499,890 RAC: 0 |
Too many people hitting the retry/update button. I agree with CJOrtega. Too many people tap-dancing on the update button. The upload/download server can't catch up as long as people are doing this, in my opinion. Quite the Prisoner's Dilemma. https://youtu.be/iY57ErBkFFE #Texit Don't blame me, I voted for Johnson(L) in 2016. Truth is dangerous... especially when it challenges those in power. |
Kevin N. Shapley Joined: 1 Jan 00 Posts: 100 Credit: 2,539,295 RAC: 0 |
PacBell. Oops, I mean SBC ;) - Oderint dum metuant |
ML1 Joined: 25 Nov 01 Posts: 20267 Credit: 7,508,002 RAC: 20 |
Matt set his beer on the UL/DL server... Or rather Matt is enjoying a beer while watching "top" over a remote link while the CPU is maxed out and the load average goes exponential as top gobbles more and more CPU trying to calculate the load! OK, another guess: The server is choked with multiple remote links or nfs mounts getting polled. Or worse still, they are using Linux and have left "fam" enabled and the kernel has gone into meltdown trying to service fam's requests for what files have changed...! Mmmmm, more beer here I think, Cheers, Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
ML1 Joined: 25 Nov 01 Posts: 20267 Credit: 7,508,002 RAC: 20 |
Matt set his beer on the UL/DL server... Or better yet, he's started up Xorg server to view the system stats graphically and X has burped into its 100% CPU usage mode...! ;) Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
Ned Slider Joined: 12 Oct 01 Posts: 668 Credit: 4,375,315 RAC: 0 |
Someone thought they'd try out a Win X64 installation over the weekend. Windows and servers don't mix :D *** My Guide to Compiling Optimised BOINC and SETI Clients *** *** Download Optimised BOINC and SETI Clients for Linux Here *** |
Iztok s52d (and friends) Joined: 12 Jan 01 Posts: 136 Credit: 393,469,375 RAC: 116 |
Hi! While monitoring uploads, I quite often saw a result uploaded but not ACKed, so it went back into the retry queue. Now... If I do not get a connection, that is fine. But if I get one, it should do the job. Maybe some timer? Anyhow, if successful uploads outpace the work we do, then the queues will slowly shrink. If not, then we might discover some limits on the client side. Let us hope they find the magic parameter tomorrow, and we are back to normal soon. BR Iztok |
Tigher Joined: 18 Mar 04 Posts: 1547 Credit: 760,577 RAC: 0 |
Hi! Yes I saw this too. Connection accepted and ack'd and established. Data sent to server giving file size and ack'd but then a tcp/ip reset from the server. Strange behaviour for sure. |
ML1 Joined: 25 Nov 01 Posts: 20267 Credit: 7,508,002 RAC: 20 |
Yes I saw this too. Connection accepted and ack'd and established. Data sent to server giving file size and ack'd but then a tcp/ip reset from the server. Strange behaviour for sure. Interesting. A resource limit hit at their end? Too many open connections or a fs timeout?... OK, is their HDD full?! Regards, Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
Tigher Joined: 18 Mar 04 Posts: 1547 Credit: 760,577 RAC: 0 |
Yes I saw this too. Connection accepted and ack'd and established. Data sent to server giving file size and ack'd but then a tcp/ip reset from the server. Strange behaviour for sure. Well perhaps. I saw in another thread, I think, that the server was unable to find files it needed. So....full....corruption.....who knows, but it seems serious to me. Ian |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
In case you missed it, if you RetryNow an upload and then cancel it in midstream, the upload is actually accepted and posted to your account. Kinda weird, but it seems to work. I suppose if everyone did this, then the disk might overflow or otherwise crash the upload system. May this Farce be with You |
ML1 Joined: 25 Nov 01 Posts: 20267 Credit: 7,508,002 RAC: 20 |
In case you missed it, if you RetryNow an upload and then cancel it in midstream, the upload is actually accepted and posted to your account. Kinda weird, but it seems to work. I hope you don't mean "abort" the WU? The 'Abort' trick has been tested elsewhere (see Misfit's posts). Your WU is dumped and you get zero credit and zero science. DO NOT ABORT YOUR WUs! Just be patient and let the WUs get returned as and when they can. I'm getting a steady trickle returned OK after a few (automatic) attempts each. Good luck, Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
ML1 Joined: 25 Nov 01 Posts: 20267 Credit: 7,508,002 RAC: 20 |
Yes I saw this too. Connection accepted and ack'd and established. Data sent to server giving file size and ack'd but then a tcp/ip reset from the server. Strange behaviour for sure.... Whatever it is, a lot of the bandwidth is being lost to repeated upload attempts:<br>2005-07-17 19:19:37 [SETI@home] Started upload of 07fe05aa.532.7760.47148.134_4_0<br>2005-07-17 19:20:52 [SETI@home] Temporarily failed upload of 07fe05aa.532.7760.47148.134_4_0<br>2005-07-17 19:20:52 [SETI@home] Backing off 1 minutes and 0 seconds on transfer of file 07fe05aa.532.7760.47148.134_4_0<br>2005-07-17 19:21:52 [SETI@home] Started upload of 07fe05aa.532.7760.47148.134_4_0<br>2005-07-17 19:23:43 [SETI@home] Temporarily failed upload of 07fe05aa.532.7760.47148.134_4_0<br>2005-07-17 19:23:43 [SETI@home] Backing off 1 minutes and 0 seconds on transfer of file 07fe05aa.532.7760.47148.134_4_0<br>2005-07-17 19:24:43 [SETI@home] Started upload of 07fe05aa.532.7760.47148.134_4_0<br>2005-07-17 19:25:58 [SETI@home] Temporarily failed upload of 07fe05aa.532.7760.47148.134_4_0<br>2005-07-17 19:25:58 [SETI@home] Backing off 1 minutes and 0 seconds on transfer of file 07fe05aa.532.7760.47148.134_4_0<br>2005-07-17 19:26:58 [SETI@home] Started upload of 07fe05aa.532.7760.47148.134_4_0<br>2005-07-17 19:29:15 [SETI@home] Temporarily failed upload of 07fe05aa.532.7760.47148.134_4_0<br>2005-07-17 19:29:15 [SETI@home] Backing off 1 minutes and 0 seconds on transfer of file 07fe05aa.532.7760.47148.134_4_0<br>2005-07-17 19:30:15 [SETI@home] Started upload of 07fe05aa.532.7760.47148.134_4_0<br>2005-07-17 19:31:30 [SETI@home] Temporarily failed upload of 07fe05aa.532.7760.47148.134_4_0<br>2005-07-17 19:31:30 [SETI@home] Backing off 2 minutes and 1 seconds on transfer of file 07fe05aa.532.7760.47148.134_4_0<br>2005-07-17 19:33:31 [SETI@home] Started upload of 07fe05aa.532.7760.47148.134_4_0<br>2005-07-17 19:34:46 [SETI@home] Temporarily failed upload of 07fe05aa.532.7760.47148.134_4_0<br>2005-07-17 19:34:46 [SETI@home] Backing off 1 minutes and 18 seconds on transfer of file 07fe05aa.532.7760.47148.134_4_0<br>2005-07-17 19:36:04 [SETI@home] Started upload of 07fe05aa.532.7760.47148.134_4_0<br>2005-07-17 19:37:19 [SETI@home] Temporarily failed upload of 07fe05aa.532.7760.47148.134_4_0<br>2005-07-17 19:37:19 [SETI@home] Backing off 8 minutes and 31 seconds on transfer of file 07fe05aa.532.7760.47148.134_4_0<br>Mmmm, calling all Experts, any other ideas? Regards, Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
KB7RZF Joined: 15 Aug 99 Posts: 9549 Credit: 3,308,926 RAC: 2 |
I've done what a few others have done and just disabled BOINC network access, and I'll leave it that way until probably Monday afternoon, when hopefully (crossing fingers) everything is working or semi-working again. I've got plenty of WUs for my computer to crunch, plus another project for it to work on, with no need to connect to anything. And like someone else said, hitting the retry/update buttons only throws more data at the servers, and since they are overloaded already, why overload them more? Doesn't make sense to me. Oh well. Keep on crunchin' everyone!! Jeremy |
tekwyzrd Joined: 21 Nov 01 Posts: 767 Credit: 30,009 RAC: 0 |
I had two WUs upload today (due on July 28th) but am still unable to get units due on the 26th and 27th to upload. Well, at least I still have time until the deadline. I'm not beating at the server with repeated retries but rather am leaving the connection enabled while online. They upload if and when they want to. Currently the completed unit with the longest time to report is scheduled to retry first. I wonder why they're uploading in such an odd order. |
StokeyBob Joined: 31 Aug 03 Posts: 848 Credit: 2,218,691 RAC: 0 |
If the upload and download would just spend enough time to finish the job it started we wouldn't need a system that needs to retry over and over. Why does it download just enough information to give your machine a work unit number and then quit? Then when you go back later it is going to have to look up that work unit to send you the information. That is just making more work for itself. This can be very stressful on some of us. You never know when you may have to reinstall your operating system. |
JERFilm Joined: 20 Apr 02 Posts: 4 Credit: 4,131,391 RAC: 0 |
There must be some priority to downloading new work units. I seem to get downloads but have as many as 11 uploads sitting around waiting. Seems so strange since they take about 3 seconds to do - the eleven of them would take less time to upload than one WU downloaded.....hmmmmm...... |
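
Several posts in this thread argue that hammering the retry/update button keeps the upload server from catching up, and the client log quoted earlier shows waits growing from 1 minute to over 8 minutes between attempts. The sketch below illustrates the general idea of randomized exponential backoff that such waits are consistent with. It is not BOINC's actual code: the function name `backoff_delay`, the 60-second base, the doubling rule, and the 4-hour cap are all assumptions chosen for illustration.

```python
import random


def backoff_delay(failures, base=60.0, cap=4 * 3600.0, rng=random):
    """Return a randomized wait (seconds) after `failures` consecutive failures.

    Hypothetical helper: the base, cap, and doubling rule are illustrative
    assumptions, not values taken from the BOINC client.
    """
    if failures < 1:
        raise ValueError("failures must be >= 1")
    # Limit the exponent so very long failure streaks cannot overflow a float.
    exponent = min(failures - 1, 30)
    # The upper bound doubles with each failure but never exceeds the cap.
    ceiling = min(cap, base * 2 ** exponent)
    # Choose uniformly between base and ceiling so clients do not retry in lockstep.
    return rng.uniform(base, ceiling)


if __name__ == "__main__":
    for n in range(1, 9):
        print(f"after {n} failure(s): wait {backoff_delay(n):.0f} s")
```

The randomization matters as much as the growth: if every client waited exactly the same doubling interval, they would all return at once and overload the server again. Pressing retry by hand resets that spacing, which is the behaviour CJOrtega and MajorKong warn against.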
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.