Guess what's wrong with uploading...

Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20395
Credit: 7,508,002
RAC: 20
United Kingdom
Message 137938 - Posted: 17 Jul 2005, 22:03:09 UTC - in response to Message 137935.  
Last modified: 17 Jul 2005, 22:05:18 UTC

If the upload and download would just spend enough time to finish the job it started we wouldn't need a system that needs to retry over and over.

And there is a symptom of whatever fault they are suffering.

The Boinc system could fault out more efficiently/gracefully. However, I would guess that whatever the root problem is, it is something that wasn't expected and hasn't been directly programmed for.

... That is just making more work for itself.

This can be very stressful on some of us. You never know when you may have to reinstall your operating system.

I'm sure the devs will learn from this latest situation and improve the system so that things run smoothly once more. I would expect them to add some more error trapping so that whatever the problem is now, it is circumvented or fails gracefully in the future.


Rest easy. Let the computers suffer the frustration. You can dream of perfection when the system is fully developed :)

(At least the random exponential backoff mechanism works nicely :O )

Happy crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 137938 · Report as offensive
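
For readers unfamiliar with the "random exponential backoff" mentioned above, here is a minimal sketch of the general technique. It is not the BOINC client's actual code; the function names, the 60-second base and the 4-hour cap are assumptions chosen only for illustration.

```python
import random


def backoff_delay(attempt, base=60.0, cap=4 * 3600.0, rng=random):
    """Return how many seconds to wait before retry number `attempt` (0-based).

    `base` and `cap` are hypothetical values, not BOINC's real settings.
    """
    # The upper bound doubles after each failed attempt, up to `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    # Pick a random point below the ceiling so that many clients
    # do not all retry at the same moment.
    return rng.uniform(0, ceiling)


def retry_schedule(max_attempts, seed=None):
    """Return the list of delays for `max_attempts` consecutive failures."""
    rng = random.Random(seed)
    return [backoff_delay(n, rng=rng) for n in range(max_attempts)]
```

Calling `retry_schedule(8)` gives eight waits whose upper bound doubles each time. The randomness is the important part: without it, every client that failed together would come back together.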
Scarecrow

Joined: 15 Jul 00
Posts: 4520
Credit: 486,601
RAC: 0
United States
Message 137952 - Posted: 17 Jul 2005, 22:28:20 UTC - in response to Message 137832.  


Database enquiries issues causing excessive server(s) delays;

DOS attack with junk WUs;

Bad network card or a bad router causing confusion;

Too many CPU processes causing the server to thrash;

Unexpectedly high disk fragmentation;

Very high throughput of DOWNLOADING WUs for starving half crazed users desperate to load up to The Max;


The dilithium crystals, they've broken down, Captain

A failure on the rectified flyback side of the hyperbolic gonkulator

Or, as was the case with one of our servers a few months back with all sorts of dropped connections and weird network errors, a single poorly crimped end on a cable. Our LFE's new mantra.... "Never overlook the obvious"

ID: 137952 · Report as offensive
Profile JoatHome

Joined: 26 Oct 99
Posts: 8
Credit: 751,493
RAC: 0
Germany
Message 137955 - Posted: 17 Jul 2005, 22:37:18 UTC - in response to Message 137952.  


Or, as was the case with one of our servers a few months back with all sorts of dropped connections and weird network errors, a single poorly crimped end on a cable. Our LFE's new mantra.... "Never overlook the obvious"


I guess, in the case of replacing a lot of old cables with new ones during the "Update", there is a BIG CHANCE for something like that.

I've got a few years' experience in that...

Let's see what's going on on Monday.

joathome

ID: 137955 · Report as offensive
Robert Nelson
Volunteer tester
Joined: 13 Aug 99
Posts: 43
Credit: 3,632,674
RAC: 1
United States
Message 137964 - Posted: 17 Jul 2005, 23:10:39 UTC

An imbalance in the antimatter containment vessel, thus the total annihilation of all work units...... hmmmm, I like the poorly crimped cable that wiggles every time the earth shakes out there, and thus the intermittent nature of the connections.
ID: 137964 · Report as offensive
The Jedi Alliance - Ranger
Joined: 27 Dec 00
Posts: 72
Credit: 60,982,863
RAC: 0
United States
Message 137967 - Posted: 17 Jul 2005, 23:17:58 UTC

I've read in several posts that the UL/DL server is CPU bound. I've also read in several posts that Berkeley uses scripts a lot. Scripts are generally interpreted rather than compiled. In my experience interpreted code is synonymous with CPU pig. When my interpreted code has become a CPU bottleneck, I've invested in a compiled program rather than hardware. Perhaps if they look into this now the onslaught of Classic users moving over to BOINC won't be so bad. This may not be the problem today, but it will be in time.

ID: 137967 · Report as offensive
Profile Graeme of Boinc UK

Joined: 25 Nov 02
Posts: 114
Credit: 1,250,273
RAC: 0
United Kingdom
Message 137970 - Posted: 17 Jul 2005, 23:34:59 UTC

Bad network card or router.

Strange how nobody else looks at the obvious problems.
Then again you only see these problems if you are prepared
to "Think outside of the box" lol !
Congratulations ML1 for hitting the nail "on the head"

Kind regards,
Graeme Murphy.
www.setiuk.com


ID: 137970 · Report as offensive
duffy

Joined: 21 Apr 02
Posts: 4
Credit: 20,448,302
RAC: 0
United States
Message 137973 - Posted: 17 Jul 2005, 23:45:33 UTC

My guess is that one of the inputs to a router-hub has reverted to 10M instead of 100M, and this caused the whole router-hub to slow down to that speed, so the uploads time out before they can finish. Happy crunching!
ID: 137973 · Report as offensive
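
The 10M-versus-100M guess above can be put into rough numbers. This is a back-of-the-envelope sketch only: the file size, the number of simultaneous clients and the timeout are all made-up values, and it assumes the link is shared evenly with no protocol overhead.

```python
def transfer_seconds(size_bytes, link_mbit, concurrent_clients):
    """Seconds to move one file when the link is shared evenly between clients."""
    # Each client's share of the link, in bits per second.
    share_bits_per_s = link_mbit * 1_000_000 / concurrent_clients
    return size_bytes * 8 / share_bits_per_s


def times_out(size_bytes, link_mbit, concurrent_clients, timeout_s):
    """True if the transfer would take longer than the (hypothetical) timeout."""
    return transfer_seconds(size_bytes, link_mbit, concurrent_clients) > timeout_s
```

With an invented 50,000-byte result file, 2,000 clients connected at once and a 60-second timeout, `transfer_seconds` gives 8 seconds at 100 Mbit/s and 80 seconds at 10 Mbit/s, so only the slower link would time out. Real figures would differ, but the tenfold slowdown is the point.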
metal1633

Joined: 8 Jun 05
Posts: 2
Credit: 463
RAC: 0
United States
Message 137987 - Posted: 18 Jul 2005, 0:42:18 UTC
Last modified: 18 Jul 2005, 0:53:47 UTC

Well whatever it is I hope they get it fixed. I have 9 finished units waiting for upload and the oldest is past its deadline and am also waiting for new units to download. My CPU is IDLE folks.
ID: 137987 · Report as offensive
Profile Graeme of Boinc UK

Joined: 25 Nov 02
Posts: 114
Credit: 1,250,273
RAC: 0
United Kingdom
Message 137996 - Posted: 18 Jul 2005, 1:00:05 UTC
Last modified: 18 Jul 2005, 1:00:48 UTC

Well now there is a scenario.
In another post I made last night, the "experts" said that this problem would/should not result in anyone losing credit due to the workunit being past its deadline time & date.
As I suspected last night, this scenario is now upon us and "we" are losing credit at "our" expense.
Expense being that it is us who choose to run this upon our computers and pay for the power that they consume so why oh why is Berkeley not considering extending the report deadlines instead of sitting upon their hands and making a "minor" mention of this "major" problem on the front page of the website?

Kind regards,
Graeme Murphy.
www.setiuk.com


ID: 137996 · Report as offensive
Profile gregh

Joined: 10 Jun 99
Posts: 220
Credit: 4,292,549
RAC: 0
Australia
Message 138030 - Posted: 18 Jul 2005, 1:54:51 UTC

Still no idea when it will be fixed anyone? All server status items show green which is obviously not really the case.

Don't want to whinge about it. Just hoping it will be fixed soon.

Thanks anyone who may answer in a similar tone.
ID: 138030 · Report as offensive
Profile Jim Baize
Volunteer tester

Joined: 6 May 00
Posts: 758
Credit: 149,536
RAC: 0
United States
Message 138031 - Posted: 18 Jul 2005, 1:58:29 UTC - in response to Message 138030.  

That just means that all the systems are running. It doesn't mean that they are all running properly or within specified parameters.

Jim

Still no idea when it will be fixed anyone? All server status items show green which is obviously not really the case.

Don't want to whinge about it. Just hoping it will be fixed soon.

Thanks anyone who may answer in a similar tone.


ID: 138031 · Report as offensive
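
The distinction made above, between "running" and "running properly", can be sketched as a status check that looks at more than whether the process is alive. This says nothing about how the real server status page is built; the `ServerSample` fields and the 95% threshold are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServerSample:
    """One hypothetical measurement of a server."""
    name: str
    process_running: bool
    success_rate: float  # fraction of recent requests completed, 0.0 to 1.0


def status(sample, min_success_rate=0.95):
    """Classify a sample as 'down', 'degraded' or 'ok'."""
    if not sample.process_running:
        return "down"
    # The process is up, but that alone does not mean it is doing its job.
    if sample.success_rate < min_success_rate:
        return "degraded"
    return "ok"
```

A page that only tests `process_running` would show green for an upload server that drops nine connections out of ten; a check like `status` would flag it as degraded.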
KB7RZF
Volunteer tester
Joined: 15 Aug 99
Posts: 9549
Credit: 3,308,926
RAC: 2
United States
Message 138032 - Posted: 18 Jul 2005, 1:58:37 UTC

Probably won't be looked at till tomorrow, July 18th, Berkeley time. :-) Until then we will just have to wait. Downloads are fine, as every time I finish a WU I d/l a new one. Just no uploading. Just keep on crunching. It will get fixed, just like it always does. :-)

Jeremy
ID: 138032 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 138036 - Posted: 18 Jul 2005, 2:08:19 UTC

Let's see... the game is to "Guess what's wrong with uploading?" Is that correct? So whoever is closest to being right wins? OK then, my guess is that:

They are taking too long.

There,.....do I win?
ID: 138036 · Report as offensive
Profile Misfit
Volunteer tester
Joined: 21 Jun 01
Posts: 21804
Credit: 2,815,091
RAC: 0
United States
Message 138038 - Posted: 18 Jul 2005, 2:10:36 UTC - in response to Message 138036.  
Last modified: 18 Jul 2005, 2:41:27 UTC

There,.....do I win?

You win a free chili cheese dog in the Cafe!
me@rescam.org
ID: 138038 · Report as offensive
The frozen

Joined: 2 Jun 99
Posts: 11
Credit: 261,900
RAC: 0
Germany
Message 138066 - Posted: 18 Jul 2005, 2:53:15 UTC - in response to Message 138038.  
Last modified: 18 Jul 2005, 3:00:15 UTC

Here is another guess:

The UL/DL server has defective memory. Maybe it is physically defective, maybe it is "only" malfunctioning due to something weird, like a power cable from a UPS that is unshielded or not shielded well enough (some UPSes have been rearranged, as far as I remember). Or too much heat. Or a software error (like the suggested kernel error), but this seems the weirdest possibility to me...

Edit:

This text from the technical news page is what led me to the ideas posted above:


July 13, 2005 - 22:00 UTC
Around noon today the master science database server rebooted itself due to a fatal memory upset. This may have been caused by a kernel bug (as evidenced by certain signatures in the logs), and we are currently applying a patch that may prevent this from happening again.

ID: 138066 · Report as offensive
Profile Jim Baize
Volunteer tester

Joined: 6 May 00
Posts: 758
Credit: 149,536
RAC: 0
United States
Message 138071 - Posted: 18 Jul 2005, 2:57:39 UTC

Personally, I think it is harmonics that are disrupting the information flow, causing the connections to error out and drop.

Jim
ID: 138071 · Report as offensive
Profile mikey
Volunteer tester
Joined: 17 Dec 99
Posts: 4215
Credit: 3,474,603
RAC: 0
United States
Message 138115 - Posted: 18 Jul 2005, 4:02:46 UTC - in response to Message 137937.  

There must be some priority to downloading new work units. I seem to get downloads but have as many as 11 uploads sitting around waiting. Seems so strange, since they take about 3 seconds to do - the eleven of them would take less time to upload than one WU downloaded.....hmmmmm......

This was done a few outages ago so people would at least have units to crunch without people cranking up their cache levels. Downloads are a higher priority than uploads during outages.

ID: 138115 · Report as offensive
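
The "downloads before uploads" behaviour described above amounts to a priority queue. The sketch below shows the general idea only. Whether BOINC does this on the server or the client, and how, is not stated in the thread, so `TransferQueue` and its constants are hypothetical.

```python
import heapq
import itertools

# Lower number is served first (an assumption for this sketch).
DOWNLOAD, UPLOAD = 0, 1


class TransferQueue:
    """Serve downloads before uploads, first come first served within each kind."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker that preserves arrival order

    def add(self, kind, name):
        heapq.heappush(self._heap, (kind, next(self._order), name))

    def next_transfer(self):
        kind, _, name = heapq.heappop(self._heap)
        return name
```

If eleven uploads are queued and one download arrives afterwards, `next_transfer()` still returns the download first, which matches what people in this thread are seeing.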
Profile JavaPersona
Volunteer tester

Joined: 4 Jun 99
Posts: 112
Credit: 471,529
RAC: 0
United States
Message 138136 - Posted: 18 Jul 2005, 5:42:14 UTC

After the planned power outage on the 12th the servers were reset and optimized. It resulted in all servers and communications hardware working at optimum efficiency. Slightly unoptimal clients are attempting to connect. The SETI hardware is not tolerant of less-than-optimal clients and is impatiently seeking better connections.

/sarcasm off
ID: 138136 · Report as offensive
K L Hildy

Joined: 19 May 99
Posts: 7
Credit: 894,098
RAC: 0
United States
Message 138141 - Posted: 18 Jul 2005, 5:57:21 UTC

With 53 uploads waiting, I sure hope something happens soon.
Good luck Keith
KeithH
ID: 138141 · Report as offensive
Profile Joe Nevole

Joined: 10 Feb 00
Posts: 7
Credit: 3,047,388
RAC: 21
United States
Message 138147 - Posted: 18 Jul 2005, 6:46:38 UTC
Last modified: 18 Jul 2005, 6:48:13 UTC

I have 24 units waiting. They're good units. Why doesn't anyone want them? It's hurtful!
ID: 138147 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.