Working as Expected (Jul 13 2009)


Message boards : Technical News : Working as Expected (Jul 13 2009)

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8442
Credit: 48,084,865
RAC: 65,533
United Kingdom
Message 920262 - Posted: 22 Jul 2009, 8:35:09 UTC - in response to Message 920217.

Observation FWIW.

Several hours after the beginning of the Tuesday outage there was near zero download bandwidth in use and the usual 5 MBits/sec or so on the upload side from hosts trying work requests each hour. Packet rates were low both ways. I had as usual been running with Network activity disabled and had only 5 uploads ready to go, none had been tried. I enabled Network, BOINC tried and tried and tried, eventually one upload was successful before the backoffs became ridiculous.

The obvious conclusion is that saturated download isn't the only cause of difficult uploads. I wonder if the internal 1 GBit network might have been near saturation doing the database backup, and whether that part which is invisible to us may make most of our theorizing moot.
Joe

To add to that - I had about 20 tasks waiting to upload at the beginning of maintenance. I waited maybe 15 minutes, and hit retry - they all went through at the first attempt. Upload traffic was about 10 Mbits at the time. But later in the outage, as more tasks finished, I saw uploads start to stack up again before there were any downloads - must have been some other internal issue, as Joe says.
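
The "backoffs" in Joe's observation are the growing delays the client inserts after each failed transfer. As a rough illustration only, here is a sketch of randomized exponential backoff in Python. The function name, the 60-second base and the 4-hour cap are assumptions made up for this example; they are not taken from BOINC's source.

```python
import random


def backoff_delays(attempts, base=60.0, cap=4 * 3600.0, rng=None):
    """Return a list of retry delays (seconds), one per failed attempt.

    Hypothetical parameters: `base` is the ceiling for the first delay and
    `cap` is the largest ceiling allowed. Neither is BOINC's real setting.
    """
    rng = rng or random.Random()
    delays = []
    for n in range(attempts):
        # The ceiling doubles after every failure until it reaches the cap.
        ceiling = min(cap, base * (2 ** n))
        # Pick a random point in the upper half of the window so that many
        # hosts retrying at once do not all come back at the same instant.
        delays.append(rng.uniform(ceiling / 2, ceiling))
    return delays


if __name__ == "__main__":
    for i, delay in enumerate(backoff_delays(8), start=1):
        print(f"failure {i}: wait about {delay / 60:.1f} min")
```

With these made-up numbers the retry ceiling passes one hour at the seventh consecutive failure, which is roughly how a handful of failed uploads can turn into waits of hours.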

Invisible Man
Joined: 24 Jun 01
Posts: 22
Credit: 1,129,336
RAC: 0
United Kingdom
Message 920263 - Posted: 22 Jul 2009, 8:36:47 UTC

Well, this is the 179th reply to Matt's message of the 13th July. It is now the 22nd.

Has he been run over by a bus, has he got swine flu, on holiday or what? Obviously there is a very real problem here to solve, but just a few lines from somebody at Berkeley would be appreciated.

Anyway, keep up the good work!!!
____________

Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 357,745
RAC: 38
Germany
Message 920265 - Posted: 22 Jul 2009, 8:48:19 UTC - in response to Message 920263.

Has he been run over by a bus, has he got swine flu, on holiday or what?

"Holiday" it is. And there have been several posts (in other threads) to mention that.

Gruß,
Gundolf

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8442
Credit: 48,084,865
RAC: 65,533
United Kingdom
Message 920274 - Posted: 22 Jul 2009, 10:28:27 UTC - in response to Message 920217.

Observation FWIW.

I'm getting consistently:

22/07/2009 11:16:55||[http_debug] [ID#0] info: Empty reply from server
22/07/2009 11:16:55||[http_debug] HTTP error: server returned nothing (no headers, no data)

That sounds like a server problem, not a network problem.
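
One way to check that reasoning is to sort failures by where they occur. Below is a minimal sketch in Python, assuming its standard `http.client` module rather than the libcurl-based code that BOINC, as far as I know, uses. An "empty reply" means the TCP connection was established and the server then closed it without sending anything, whereas a network problem usually shows up earlier as a timeout, a refusal, or a failed name lookup. The function names and category labels are invented for this example.

```python
import http.client
import socket


def classify_failure(exc):
    """Map an exception from an HTTP attempt to a rough failure category."""
    # Connection opened, then closed by the server with no response at all:
    # the closest Python equivalent of curl's "Empty reply from server".
    if isinstance(exc, http.client.RemoteDisconnected):
        return "empty reply (server side)"
    # The connection attempt was actively rejected (nothing listening, or
    # the listener turned it away).
    if isinstance(exc, ConnectionRefusedError):
        return "connection refused"
    # No answer within the time limit: congestion, packet loss, or overload.
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    # The hostname could not be resolved.
    if isinstance(exc, socket.gaierror):
        return "name lookup failed"
    return "other"


def probe(host, path="/", timeout=10.0):
    """Try one GET request and report either the status or the failure type."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("GET", path)
        return f"HTTP {conn.getresponse().status}"
    except (OSError, http.client.HTTPException) as exc:
        return classify_failure(exc)
    finally:
        conn.close()
```

Calling `probe("upload.example.org")` (a placeholder hostname, not the project's real server) a few times in a row would show whether the failures are consistently of the "empty reply" kind.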

Grant (SSSF)
Send message
Joined: 19 Aug 99
Posts: 5774
Credit: 57,558,512
RAC: 48,377
Australia
Message 920279 - Posted: 22 Jul 2009, 10:48:09 UTC - in response to Message 920274.


On uploads or downloads?
There's (effectively) no network traffic of mention at the moment, but my uploads are having greater difficulty getting through than when things were previously at their absolute limits.
____________
Grant
Darwin NT.

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8442
Credit: 48,084,865
RAC: 65,533
United Kingdom
Message 920281 - Posted: 22 Jul 2009, 10:51:22 UTC - in response to Message 920279.


On uploads or downloads?
There's (effectively) no network traffic of mention at the moment, but my uploads are having greater difficulty getting through than when things were previously at their absolute limits.

Uploads.

Grant (SSSF)
Send message
Joined: 19 Aug 99
Posts: 5774
Credit: 57,558,512
RAC: 48,377
Australia
Message 920283 - Posted: 22 Jul 2009, 10:54:47 UTC - in response to Message 920281.
Last modified: 22 Jul 2009, 10:58:15 UTC

I'm thinking something got tweaked during the outage, and hasn't worked quite as expected.


EDIT- Another result just joined the queue.
Tried manually retrying it; most times it times out instantly (as with the tweak that kills off excess connections under heavy load instead of queueing them). When it doesn't time out instantly, it takes about 30 seconds for the download to actually start.
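
For anyone unsure of the difference, here is a sketch of "drop excess connections instead of queueing them" in Python. It only illustrates the general technique; nothing here is known about how the Berkeley servers are actually configured, and the class name and the limit of 2 are made up for the example.

```python
import threading


class ConnectionLimiter:
    """Admit at most `max_active` connections; refuse the rest immediately."""

    def __init__(self, max_active):
        self._slots = threading.BoundedSemaphore(max_active)

    def try_accept(self):
        # blocking=False returns at once: True if a slot was free, False if
        # the server is full. A queueing server would block here instead,
        # which the client sees as a long wait before the transfer starts.
        return self._slots.acquire(blocking=False)

    def release(self):
        # Call when an admitted connection finishes, freeing its slot.
        self._slots.release()


if __name__ == "__main__":
    limiter = ConnectionLimiter(max_active=2)
    print([limiter.try_accept() for _ in range(3)])  # third one is refused
    limiter.release()
    print(limiter.try_accept())  # a slot is free again
```

From the client side, a limiter like this would look like an instant failure when the server is full, which matches the first symptom; the 30-second delay would need some other explanation.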
____________
Grant
Darwin NT.

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13565
Credit: 29,789,116
RAC: 16,670
United States
Message 920285 - Posted: 22 Jul 2009, 10:57:27 UTC - in response to Message 920259.

I'm happy to hand the issue over to the mods for comment.

I believe this was a hot-button topic recently, and I believe Eric himself chimed in and said that linking to the Cricket graph wasn't an issue.

I didn't actually read him saying anything,


Ah, but I did read him saying something, only it was where you guys can't see it. ;)
____________

ML1
Volunteer tester
Joined: 25 Nov 01
Posts: 8359
Credit: 4,097,471
RAC: 1,240
United Kingdom
Message 920293 - Posted: 22 Jul 2009, 11:18:20 UTC

Wow!

About 70Mbit/s on the downloads, and:

All http connections look sweet and smooth and fast. A complete change for the better from previous days.

Either everyone has gone on holiday or some data-rate management fixes have been implemented. (Or?)

Good stuff,

Happy crunchin',
Martin

____________
See new freedom: Mageia4
Linux Voice See & try out your OS Freedom!
The Future is what We make IT (GPLv3)

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8442
Credit: 48,084,865
RAC: 65,533
United Kingdom
Message 920304 - Posted: 22 Jul 2009, 12:19:30 UTC - in response to Message 920293.

Wow!

About 70Mbit/s on the downloads, and:

All http connections look sweet and smooth and fast. A complete change for the better from previous days.

Either everyone has gone on holiday or some data-rate management fixes have been implemented. (Or?)

Good stuff,

Happy crunchin',
Martin

Is that both directions, or just downloads?

I'm just getting the very occasional successful upload - all others get the 'empty server reply'.

If your http upload works differently from the Windows one, I may just have to consider Linux after all..... :-)

PhonAcq
Joined: 14 Apr 01
Posts: 1622
Credit: 22,068,013
RAC: 4,140
United States
Message 920308 - Posted: 22 Jul 2009, 12:35:20 UTC

Things aren't that good. I've had about 3h of frequent rejected communications after a large upload/download cycle. Cricket implies there is headroom, but my message log is full of failures.

ML1
Volunteer tester
Joined: 25 Nov 01
Posts: 8359
Credit: 4,097,471
RAC: 1,240
United Kingdom
Message 920310 - Posted: 22 Jul 2009, 12:38:36 UTC - in response to Message 920304.
Last modified: 22 Jul 2009, 12:41:11 UTC

Wow!

About 70Mbit/s on the downloads, and:

All http connections look sweet and smooth and fast. A complete change for the better from previous days.

Is that both directions, or just downloads?

I'm just getting the very occasional successful upload - all others get the 'empty server reply'.

If your http upload works differently from the Windows one, I may just have to consider Linux after all..... :-)

Downloads show about 55Mbit/s and uploads look steady at just under 10Mbits/s.

All my uploads cleared immediately except for just two WUs. The WU cache built up quite quickly to the usual 0.5 day cache.

... OK, so a bit of 'playing' and I've now got 25+ or so WUs. They all downloaded in pairs, consistently, all sweet and smooth, no resends.

The two pending uploads are from before the maintenance shutdown last night and appear to get their connections refused. DNS change?... Or are Berkeley now limiting the maximum number of simultaneous connections rather than limiting WU generation rate?

Whatever, all a vast improvement over the chaos of a saturated link.

I'll see what happens to the two pending WU uploads...


(And hey, it's always worth at least a giggle to give Linux a try ;-) )

Happy crunchin',
Martin

[edit] Might be a coincidence but you might actually notice the little blip in the Cricket chart just now. Was that really me? [/edit]
____________
See new freedom: Mageia4
Linux Voice See & try out your OS Freedom!
The Future is what We make IT (GPLv3)

HAL
Joined: 28 Mar 03
Posts: 704
Credit: 870,617
RAC: 0
United States
Message 920311 - Posted: 22 Jul 2009, 12:46:24 UTC - in response to Message 920308.
Last modified: 22 Jul 2009, 12:52:34 UTC

I used to have a 2-processor Pentium D, according to the old version of BOINC installed on it. Taking advantage of the fact that it was out of work, I installed the new version, which I was told was available when I started it up.
NOW BOINC tells me it is a single processor - and my preferences are correct. In fact the CPU benchmark tells me it is a single processor, but everybody else tells me it has 2.
I guess I'll just take it off BOINC completely until the smoke from the current situation clears away.
Indeed, here things are getting worse.

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13565
Credit: 29,789,116
RAC: 16,670
United States
Message 920316 - Posted: 22 Jul 2009, 13:17:22 UTC - in response to Message 920311.

I used to have a 2-processor Pentium D, according to the old version of BOINC installed on it. Taking advantage of the fact that it was out of work, I installed the new version, which I was told was available when I started it up.
NOW BOINC tells me it is a single processor - and my preferences are correct. In fact the CPU benchmark tells me it is a single processor, but everybody else tells me it has 2.
I guess I'll just take it off BOINC completely until the smoke from the current situation clears away.
Indeed, here things are getting worse.


Your issue seems to be different than the server issues everyone has been going on about in this thread. The servers have no control over how many CPUs BOINC "sees", but your preferences can limit how many CPUs BOINC can use.

Indeed, the Pentium D was Intel's first dual-core attempt. I would double-check your local preferences to see whether you accidentally set a CPU limit there.

Otherwise, you may want to ask for assistance over in the Q&A Forum. I'm sure we can get your situation solved for you.
____________

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 597
Credit: 134,185,758
RAC: 116,485
United Kingdom
Message 920409 - Posted: 22 Jul 2009, 19:25:51 UTC - in response to Message 919884.

Another frustrating Mon. Have been trying to upload/download work units for three weeks. Very sporadic at best. Managing only one connection per week for two weeks in a row. I wish someone would let us know what's up. I started running BOINC for SETI@home when the classic S@H was shut down and am not really interested in running other "filler" projects. Thanks, and any information would be very welcome.


I have a theory, and my theory is this. When things get reset at the Lab,
their DHCP server doesn't necessarily reassign the same IP number to a
given machine. So BOINC keeps trying to send results, etc. to a "stale"
address, because it doesn't know the assignment has changed. When this
happens I find that stopping BOINC, flushing the DNS cache[1], and
restarting BOINC often helps.

[1] Windows: ipconfig /flushdns
Linux: (maybe) sudo /etc/init.d/nscd restart
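
To test the stale-address theory before flushing anything, one can compare the address seen on an earlier lookup with a fresh one. Here is a small Python sketch; `resolve` and `address_changed` are names invented for this example, and the hostname in the usage note is a placeholder rather than the project's real upload server. Note that `socket.getaddrinfo` goes through the operating system's resolver, so it may itself return a cached answer.

```python
import socket


def resolve(hostname):
    """Return the sorted IPv4 addresses the OS resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, 80, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); for IPv4 the
    # sockaddr is an (ip, port) pair.
    return sorted({info[4][0] for info in infos})


def address_changed(previous, hostname, resolver=resolve):
    """True if none of the previously seen addresses is still returned."""
    current = resolver(hostname)
    return not set(previous) & set(current)
```

For example, `address_changed(["192.0.2.10"], "upload.example.org")` would return True if the name no longer resolves to the address noted earlier. If it returns False while transfers still fail, a stale address is probably not the cause.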

____________

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13565
Credit: 29,789,116
RAC: 16,670
United States
Message 920413 - Posted: 22 Jul 2009, 19:48:01 UTC - in response to Message 920409.

Another frustrating Mon. Have been trying to upload/download work units for three weeks. Very sporadic at best. Managing only one connection per week for two weeks in a row. I wish someone would let us know what's up. I started running BOINC for SETI@home when the classic S@H was shut down and am not really interested in running other "filler" projects. Thanks, and any information would be very welcome.


I have a theory, and my theory is this. When things get reset at the Lab,
their DHCP server doesn't necessarily reassign the same IP number to a
given machine. So BOINC keeps trying to send results, etc. to a "stale"
address, because it doesn't know the assignment has changed. When this
happens I find that stopping BOINC, flushing the DNS cache[1], and
restarting BOINC often helps.

[1] Windows: ipconfig /flushdns
Linux: (maybe) sudo /etc/init.d/nscd restart


I'm certain that any internet-facing machines are using static IP addresses.
____________

Anthony Liggins
Joined: 23 Aug 99
Posts: 14
Credit: 454,120
RAC: 0
United Kingdom
Message 920414 - Posted: 22 Jul 2009, 19:48:49 UTC - in response to Message 919427.

Hi

I am fairly new to the forum, but reading this thread has compelled me to add to it.

An easy way to look at this is

32 bit = slow
64 bit = fast
GPU = fast

all you have to do then is find a happy medium and throttle between them.

Anthony.

____________

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13565
Credit: 29,789,116
RAC: 16,670
United States
Message 920418 - Posted: 22 Jul 2009, 19:54:19 UTC - in response to Message 920414.

Hi

I am fairly new to the forum, but reading this thread has compelled me to add to it.

An easy way to look at this is

32 bit = slow
64 bit = fast
GPU = fast

all you have to do then is find a happy medium and throttle between them.

Anthony.


What about modern computers operating in 32bit mode? I also wouldn't consider my AMD Athlon XP 3200+, which is 32bit only, a "slow" computer. Nor would I call my Pentium 4 Extreme Edition 3.2GHz w/HT - also 32bit only - a "slow" computer, even if by today's standards technology has surpassed them.
____________

Anthony Liggins
Joined: 23 Aug 99
Posts: 14
Credit: 454,120
RAC: 0
United Kingdom
Message 920439 - Posted: 22 Jul 2009, 20:59:36 UTC - in response to Message 920418.
Last modified: 22 Jul 2009, 21:01:29 UTC

Well, that's simple: the database will know whether a processor is true 32-bit, true 64-bit, or 64-bit running in 32-bit mode (which would still be marked as fast), since it displays the processor information and the operating system that a participant is using.

How long does it take your machine to crunch an average wu?

I have 2 retired Supermicro servers, each with (2) 3.2GHz/533 DP Xeons. On average it takes about 8hrs to crunch a normal wu, 11hrs for the larger ones, and about 3hrs for the short ones.

Anthony.

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13565
Credit: 29,789,116
RAC: 16,670
United States
Message 920450 - Posted: 22 Jul 2009, 21:36:32 UTC - in response to Message 920439.

Well, that's simple: the database will know whether a processor is true 32-bit, true 64-bit, or 64-bit running in 32-bit mode (which would still be marked as fast), since it displays the processor information and the operating system that a participant is using.


64-bit still hasn't caught on in the general market, so why should SETI consider only 64-bit CPUs running 64-bit OSes to be "fast"? That is what I was trying to get across.

How long does it take your machine to crunch an average wu?


The speed isn't entirely the point. SETI shouldn't be only about speed, except for those who are only interested in RAC.

I have 2 retired Supermicro servers, each with (2) 3.2GHz/533 DP Xeons. On average it takes about 8hrs to crunch a normal wu, 11hrs for the larger ones, and about 3hrs for the short ones.

Anthony.


Sounds about as fast as my current file server which is a dual Xeon 3.4GHz/800 DP with 4GB of DDR2 RAM. Still not necessarily "slow" by any means.
____________


Copyright © 2014 University of California