Out of the fire and into the pit of sulfuric acid. (Feb 19, 2010)

Profile RottenMutt
Joined: 15 Mar 01
Posts: 1011
Credit: 230,314,058
RAC: 0
United States
Message 971925 - Posted: 20 Feb 2010, 0:22:36 UTC - in response to Message 971905.  

Maybe look at the configuration of your switches. No work is uploading, and scheduler requests aren't getting through either.
ID: 971925 · Report as offensive
Swibby Bear

Joined: 1 Aug 01
Posts: 246
Credit: 7,945,093
RAC: 0
United States
Message 971928 - Posted: 20 Feb 2010, 0:40:24 UTC
Last modified: 20 Feb 2010, 0:44:18 UTC

I'm not sure you understand. All of my WUs are finally uploaded, but I CANNOT CONNECT TO REPORT them, and that has been the case for more than 48 hours!!! I don't think this has anything to do with splitters being down.
ID: 971928 · Report as offensive
the3dge

Joined: 16 May 99
Posts: 19
Credit: 248,813,983
RAC: 0
United States
Message 971930 - Posted: 20 Feb 2010, 0:41:58 UTC - in response to Message 971899.  

It's aggravating that some people simply aren't seeing this problem while others have been seeing this even before the AC crash and the first group is saying "all is well".


Fwiw, I've got more than a couple of computers that have been trying to send results over the last 72 to 96 hours. They managed to get about 75% through since the recovery from the A/C failure (but I was certainly having problems before then). The rest are sitting here trying to upload but getting HTTP errors and backing off, and when they can connect to download they're being told the project has no jobs available (that may be true given the onslaught of people trying to catch up, I don't know).

I'm in the 'something's still not quite right' camp, but just holding out, since there's obviously not much we can do.

If nothing else it's a good reminder to take a breath. Maybe some of us get tunnel vision for number crunching... must get more! must get more! Or maybe that's just me. Dunno yet.

Either way, hats off to the S@H team for giving it all they've got, and hope all have a great weekend.

ID: 971930 · Report as offensive
Profile RottenMutt
Joined: 15 Mar 01
Posts: 1011
Credit: 230,314,058
RAC: 0
United States
Message 971933 - Posted: 20 Feb 2010, 0:48:57 UTC - in response to Message 971816.  

... One of the raid arrays on thumper lost a drive...


You guys seem to lose way too many drives! How many do you have?



anybody got work?
ID: 971933 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65709
Credit: 55,293,173
RAC: 49
United States
Message 971935 - Posted: 20 Feb 2010, 0:57:58 UTC - in response to Message 971933.  

... One of the raid arrays on thumper lost a drive...


You guys seem to lose way too many drives! How many do you have?



anybody got work?

Until about 2am PST, sure, I've got work. After that, I highly doubt it. It's going to be colder than usual in the morning here (I leave the heater off at night since I'm the only living person around here anymore). Mom doesn't need anything, as she's happy on her shelf in my closet for eternity. :D
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 971935 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 971945 - Posted: 20 Feb 2010, 1:11:23 UTC - in response to Message 971881.  

Superjoker: as explained elsewhere the backoffs are our friend as without them the servers would be flooded with requests & no-one would get anywhere. The longer the backoffs the better as it spreads the load more.

Backoffs are a perfect technique for spreading the load when the complete system is capable of handling (with an adequate safety margin) the aggregate anticipated demand averaged over an extended period of time. That's how SETI normally runs, and a few backoffs to shave the peaks and fill the troughs are exactly what's needed.

Backoffs do not help if the aggregate load exceeds - over an extended period - the system's capacity to absorb work. Then you have to take more drastic action, to reduce demand or increase supply.

For the last 4.5 days (only), SETI's capacity to absorb work has been below demand. I see no sign that demand has increased: instead, it seems to me that capacity has decreased (hopefully, temporarily).

No amount of smoothing (backoffs) will solve this. What is needed is to restore the status quo ante on the capacity side.
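
A rough way to see the distinction drawn above is a toy queue model. This is only a sketch, not anything measured on the SETI servers: the function name, the tick-based model, and all of the numbers are made up for illustration. Smoothing the arrivals (which is what backoffs do) clears the backlog when average demand is below capacity, but does nothing once capacity drops below average demand.

```python
def backlog_over_time(arrivals, capacity):
    """Return the number of queued requests after each tick.

    arrivals: requests arriving in each tick (hypothetical numbers).
    capacity: requests the server can absorb per tick (hypothetical).
    """
    backlog = 0
    history = []
    for arriving in arrivals:
        # Anything the server cannot absorb this tick stays queued.
        backlog = max(0, backlog + arriving - capacity)
        history.append(backlog)
    return history


bursty = [30, 0, 30, 0, 30, 0]   # averages 15 per tick, but peaky
smoothed = [15] * 6              # same total, spread evenly (what backoffs aim for)

print(backlog_over_time(bursty, capacity=20))    # [10, 0, 10, 0, 10, 0]
print(backlog_over_time(smoothed, capacity=20))  # [0, 0, 0, 0, 0, 0]
print(backlog_over_time(smoothed, capacity=10))  # [5, 10, 15, 20, 25, 30]
```

In the last case the load is already perfectly smooth, yet the backlog grows every tick, which is the "restore capacity" situation described above.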

BOINC has a built-in mechanism for this as well. If more than 2 * ncpus results are waiting to upload, no work will be requested. This was originally put in place to limit the growth on the client for a couple of projects where the upload time exceeded the crunch time. It also has the effect of limiting the work being done if uploads are slowed for an extended period.
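
As a minimal sketch of the rule described above (this is not taken from the BOINC client source; the function and parameter names are hypothetical and only restate the 2 * ncpus threshold from the post):

```python
def should_request_work(results_waiting_to_upload: int, ncpus: int) -> bool:
    """Return False once more than 2 * ncpus results are stuck waiting to upload."""
    return results_waiting_to_upload <= 2 * ncpus


# On a hypothetical 4-CPU host, the cutoff is 8 pending uploads.
print(should_request_work(8, ncpus=4))   # True
print(should_request_work(9, ncpus=4))   # False
```

Under this rule, a host with hundreds of stalled uploads would stop asking for new work until the upload queue drains.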


BOINC WIKI
ID: 971945 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 971947 - Posted: 20 Feb 2010, 1:13:24 UTC - in response to Message 971933.  

... One of the raid arrays on thumper lost a drive...


You guys seem to lose way too many drives! How many do you have?



anybody got work?

Lots and lots of work from many projects (just not much from SETI).


BOINC WIKI
ID: 971947 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 971951 - Posted: 20 Feb 2010, 1:19:08 UTC


I cannot upload thousands of results, and I cannot report them or request new work.

For days now my PCs have been sitting idle.

It's not on my side: I switched the DSL router off and on and rebooted the PCs.


Yes, sure, I'm being patient.
This is only for your information.


____________
[Optimized project applications, for to increase your PC performance (double RAC)!][Overview of abbreviations, which are used often in forum and their meaning.]
ID: 971951 · Report as offensive
Profile hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 971952 - Posted: 20 Feb 2010, 1:19:25 UTC

Guess this weekend sucks for seti....

2/19/2010 5:17:07 PM SETI@home Reporting 384 completed tasks, not requesting new tasks
2/19/2010 5:17:29 PM Project communication failed: attempting access to reference site
2/19/2010 5:17:30 PM Internet access OK - project servers may be temporarily down.
2/19/2010 5:17:32 PM SETI@home Scheduler request failed: Couldn't connect to server

Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 971952 · Report as offensive
Cruncher-American

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 971962 - Posted: 20 Feb 2010, 1:30:49 UTC - in response to Message 971952.  

Guess this weekend sucks for seti....

2/19/2010 5:17:07 PM SETI@home Reporting 384 completed tasks, not requesting new tasks
2/19/2010 5:17:29 PM Project communication failed: attempting access to reference site
2/19/2010 5:17:30 PM Internet access OK - project servers may be temporarily down.
2/19/2010 5:17:32 PM SETI@home Scheduler request failed: Couldn't connect to server


Dittos here - has anybody a clue as to why this is happening? It's been several days, and not a hint as to what is going on.....
ID: 971962 · Report as offensive
Profile RottenMutt
Joined: 15 Mar 01
Posts: 1011
Credit: 230,314,058
RAC: 0
United States
Message 971979 - Posted: 20 Feb 2010, 1:50:09 UTC - in response to Message 971935.  
Last modified: 20 Feb 2010, 1:51:15 UTC

Until about 2am PST, sure, I've got work. After that, I highly doubt it. It's going to be colder than usual in the morning here (I leave the heater off at night since I'm the only living person around here anymore). Mom doesn't need anything, as she's happy on her shelf in my closet for eternity. :D



We've got snow in the forecast for early next week, so it looks like I will be in the cold with you. It's just me and my Doberman; I pity the person who tries to rip me off. I'm sure the staff went home, so this is it for the weekend: just noise on the inlet pipe from clients trying to upload, but no one is home.
ID: 971979 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65709
Credit: 55,293,173
RAC: 49
United States
Message 971982 - Posted: 20 Feb 2010, 1:53:51 UTC - in response to Message 971962.  

Guess this weekend sucks for seti....

2/19/2010 5:17:07 PM SETI@home Reporting 384 completed tasks, not requesting new tasks
2/19/2010 5:17:29 PM Project communication failed: attempting access to reference site
2/19/2010 5:17:30 PM Internet access OK - project servers may be temporarily down.
2/19/2010 5:17:32 PM SETI@home Scheduler request failed: Couldn't connect to server


Dittos here - has anybody a clue as to why this is happening? It's been several days, and not a hint as to what is going on.....

Nope, and it was happening before the last outage. When I said something, others said I was wrong; turns out I was right (told ya so). But there is still no idea where the blockage comes from, as SETI doesn't seem to be getting anything, and we don't get acks for uploads and can't report.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 971982 · Report as offensive
Profile Lint trap

Joined: 30 May 03
Posts: 871
Credit: 28,092,319
RAC: 0
United States
Message 971988 - Posted: 20 Feb 2010, 2:09:45 UTC
Last modified: 20 Feb 2010, 2:10:25 UTC

Pathping is reporting some packet losses in the San Jose area.

The command in XP is

C:\pathping 208.68.240.16 boinc2.ssl.berkeley.edu

The ip address is the upload server (from my hosts file). You need both parts - the ip and the name. Pathping /? for options.

The losses aren't huge pct-wise, but they are consistent.

Martin
ID: 971988 · Report as offensive
Bob Giel
Volunteer tester

Joined: 11 Jan 04
Posts: 76
Credit: 5,419,128
RAC: 0
United States
Message 971993 - Posted: 20 Feb 2010, 2:15:44 UTC

Had over 100 uploads at 15:30 Central time. It is now 20:14 Central time and all uploads have completed. The "ready to upload" feature is still having problems connecting. I'm just letting the system do what it does.
ID: 971993 · Report as offensive
Profile Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 30608
Credit: 53,134,872
RAC: 32
United States
Message 971998 - Posted: 20 Feb 2010, 2:24:01 UTC

Eric:

Thanks for the update.

FYI: While the recovery from the A/C outage was happening, my Mac (10.5.8) was not having problems reporting work units, but my PC (XP Pro) was timing out. I suspect that the Mac has a better TCP/IP stack. I'm pointing this out because I don't know whether there is anything you can tune on your side to assist in a recovery for the PC crowd. If both sides don't time out at about the same time, that will clog the pipes and make it more painful for everyone.

Hope the rain isn't too fast and furious up there. About to start here in LA LA land.

ID: 971998 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65709
Credit: 55,293,173
RAC: 49
United States
Message 972024 - Posted: 20 Feb 2010, 3:57:14 UTC - in response to Message 971988.  

Pathping is reporting some packet losses in the San Jose area.

The command in XP is

C:\pathping 208.68.240.16 boinc2.ssl.berkeley.edu

The ip address is the upload server (from my hosts file). You need both parts - the ip and the name. Pathping /? for options.

The losses aren't huge pct-wise, but they are consistent.

Martin

Thanks Martin, I never knew about pathping. Here are my results using XP64:
Microsoft Windows [Version 5.2.3790]
(C) Copyright 1985-2003 Microsoft Corp.

C:\Documents and Settings\Administrator.PC1>pathping 208.68.240.16 boinc2.ssl.berkeley.edu

Tracing route to boinc2.ssl.berkeley.edu [208.68.240.18]
over a maximum of 30 hops:
  0  pc1.westell.com [192.168.1.45]
  1  dslrouter.westell.com [192.168.1.1]
  2  L100.LSANCA-DSL-35.verizon-gni.net [71.105.32.1]
  3  9-0-2935.LSANCA-LCR-09.verizon-gni.net [130.81.136.14]
  4  so-4-0-0-0.LAX01-BB-RTR1.verizon-gni.net [130.81.28.72]
  5  0.so-6-3-0.XT1.LAX9.ALTER.NET [152.63.10.153]
  6  0.ge-7-1-0.XL3.SJC7.ALTER.NET [152.63.48.254]
  7  POS6-0-0.GW4.SJC7.ALTER.NET [152.63.48.241]
  8  teliasonera-test-gw.customer.alter.net [157.130.215.70]
  9  hurricane-113209-sjo-bb1.c.telia.net [213.248.86.54]
 10  64.71.140.42
 11  208.68.243.254
 12  boinc2.ssl.berkeley.edu [208.68.240.18]

Computing statistics for 300 seconds...
            Source to Here   This Node/Link
Hop  RTT    Lost/Sent = Pct  Lost/Sent = Pct  Address
  0                                           pc1.westell.com [192.168.1.45]
                                0/ 100 =  0%   |
  1    1ms     0/ 100 =  0%     0/ 100 =  0%  dslrouter.westell.com [192.168.1.1]
                                0/ 100 =  0%   |
  2   37ms     0/ 100 =  0%     0/ 100 =  0%  L100.LSANCA-DSL-35.verizon-gni.net [71.105.32.1]
                                0/ 100 =  0%   |
  3   38ms     0/ 100 =  0%     0/ 100 =  0%  9-0-2935.LSANCA-LCR-09.verizon-gni.net [130.81.136.14]
                                0/ 100 =  0%   |
  4   40ms     2/ 100 =  2%     2/ 100 =  2%  so-4-0-0-0.LAX01-BB-RTR1.verizon-gni.net [130.81.28.72]
                                0/ 100 =  0%   |
  5   42ms     1/ 100 =  1%     1/ 100 =  1%  0.so-6-3-0.XT1.LAX9.ALTER.NET [152.63.10.153]
                                0/ 100 =  0%   |
  6   50ms     1/ 100 =  1%     1/ 100 =  1%  0.ge-7-1-0.XL3.SJC7.ALTER.NET [152.63.48.254]
                                0/ 100 =  0%   |
  7   48ms     0/ 100 =  0%     0/ 100 =  0%  POS6-0-0.GW4.SJC7.ALTER.NET [152.63.48.241]
                                0/ 100 =  0%   |
  8   59ms     1/ 100 =  1%     1/ 100 =  1%  teliasonera-test-gw.customer.alter.net [157.130.215.70]
                                0/ 100 =  0%   |
  9   54ms     1/ 100 =  1%     1/ 100 =  1%  hurricane-113209-sjo-bb1.c.telia.net [213.248.86.54]
                                0/ 100 =  0%   |
 10   57ms     0/ 100 =  0%     0/ 100 =  0%  64.71.140.42
                                5/ 100 =  5%   |
 11   57ms     5/ 100 =  5%     0/ 100 =  0%  208.68.243.254
                                8/ 100 =  8%   |
 12   55ms    13/ 100 = 13%     0/ 100 =  0%  boinc2.ssl.berkeley.edu [208.68.240.18]

Trace complete.

C:\Documents and Settings\Administrator.PC1>

The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 972024 · Report as offensive
Profile RottenMutt
Joined: 15 Mar 01
Posts: 1011
Credit: 230,314,058
RAC: 0
United States
Message 972030 - Posted: 20 Feb 2010, 4:36:04 UTC
Last modified: 20 Feb 2010, 4:46:36 UTC


C:\>pathping 208.68.240.16 boinc2.ssl.berkeley.edu

Tracing route to boinc2.ssl.berkeley.edu [208.68.240.18]
over a maximum of 30 hops:
  0  SkullStation [192.168.1.113]
  1  ORION [192.168.1.1]
  2  cab1-1.1scom.net [66.182.x.x]
  3  DSL2-5.1scom.net [66.182.x.x]
  4  Texas-Independent-Energy6328-custidna.cust-rtr.swbell.net [151.164.76.189]
  5     *     bb2-p7-1.rcsntx.sbcglobal.net [151.164.190.248]
  6  ppp-151-164-52-78.rcsntx.swbell.net [151.164.52.78]
  7     *     asn6939-he.eqdltx.sbcgobal.net [151.164.248.150]
  8  10gigabitethernet1-2.core1.lax1.he.net [72.52.92.57]
  9  10gigabitethernet1-3.core1.pao1.he.net [72.52.92.21]
 10     *     64.71.140.42
 11  208.68.243.254
 12  boinc2.ssl.berkeley.edu [208.68.240.18]

Computing statistics for 300 seconds...
            Source to Here   This Node/Link
Hop  RTT    Lost/Sent = Pct  Lost/Sent = Pct  Address
  0                                           SkullStation [192.168.1.113]
                                0/ 100 =  0%   |
  1    0ms     0/ 100 =  0%     0/ 100 =  0%  ORION [192.168.1.1]
                                0/ 100 =  0%   |
  2   14ms     0/ 100 =  0%     0/ 100 =  0%  cab1-1.1scom.net [66.182.x.x]
                                0/ 100 =  0%   |
  3   15ms     0/ 100 =  0%     0/ 100 =  0%  DSL2-5.1scom.net [66.182.x.x]
                                0/ 100 =  0%   |
  4   12ms     0/ 100 =  0%     0/ 100 =  0%  Texas-Independent-Energy6328-custidna.cust-rtr.swbell.net [151.164.76.189]
                                0/ 100 =  0%   |
  5  ---     100/ 100 =100%   100/ 100 =100%  bb2-p7-1.rcsntx.sbcglobal.net [151.164.190.248]
                                0/ 100 =  0%   |
  6  ---     100/ 100 =100%   100/ 100 =100%  ppp-151-164-52-78.rcsntx.swbell.net [151.164.52.78]
                                0/ 100 =  0%   |
  7   13ms     0/ 100 =  0%     0/ 100 =  0%  asn6939-he.eqdltx.sbcgobal.net [151.164.248.150]
                                0/ 100 =  0%   |
  8   51ms     0/ 100 =  0%     0/ 100 =  0%  10gigabitethernet1-2.core1.lax1.he.net [72.52.92.57]
                                0/ 100 =  0%   |
  9   59ms     0/ 100 =  0%     0/ 100 =  0%  10gigabitethernet1-3.core1.pao1.he.net [72.52.92.21]
                                0/ 100 =  0%   |
 10   59ms     0/ 100 =  0%     0/ 100 =  0%  64.71.140.42
                                6/ 100 =  6%   |
 11   64ms     6/ 100 =  6%     0/ 100 =  0%  208.68.243.254
                                1/ 100 =  1%   |
 12   62ms     7/ 100 =  7%     0/ 100 =  0%  boinc2.ssl.berkeley.edu [208.68.240.18]

Trace complete.

Hops 5 and 6 don't look good, and it looks like there is some loss at the upload server and at the last hop.
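
For anyone who wants to pull the lossy links out of a report like the one above without reading every row, here is a rough Python sketch. It assumes the XP-era pathping layout shown in this thread, where the rows ending in `|` give the loss on the link leading to the next hop; the function name, regular expressions, and threshold are my own and have not been tested against other pathping versions. As I understand Microsoft's description of pathping, loss on those link rows is the figure that suggests congestion on the path, while 100% loss on a single router row (like hops 5 and 6 here, with clean hops after them) often just means that router does not answer the probes.

```python
import re

# Link rows look like: "      6/ 100 =  6%   |"
LINK_RE = re.compile(r"^\s*(\d+)/\s*(\d+)\s*=\s*(\d+)%\s+\|\s*$")
# Hop rows look like: " 11   64ms   6/ 100 =  6%   0/ 100 =  0%  208.68.243.254"
HOP_RE = re.compile(
    r"^\s*(\d+)\s+(?:\d+ms|---)\s+\d+/\s*\d+\s*=\s*\d+%"
    r"\s+\d+/\s*\d+\s*=\s*\d+%\s+(\S.*)$"
)


def lossy_links(report, threshold_pct=1):
    """Return (from_hop, to_hop, loss_pct, to_address) for links at or above threshold."""
    results = []
    pending_link_loss = None
    for line in report.splitlines():
        link = LINK_RE.match(line)
        if link:
            # Remember the link loss until we see the hop it leads to.
            pending_link_loss = int(link.group(3))
            continue
        hop = HOP_RE.match(line)
        if hop:
            hop_no = int(hop.group(1))
            if pending_link_loss is not None and pending_link_loss >= threshold_pct:
                # pathping numbers hops consecutively, so the link starts at hop_no - 1.
                results.append((hop_no - 1, hop_no, pending_link_loss, hop.group(2).strip()))
            pending_link_loss = None
    return results


SAMPLE = """\
 10   59ms     0/ 100 =  0%     0/ 100 =  0%  64.71.140.42
                                6/ 100 =  6%   |
 11   64ms     6/ 100 =  6%     0/ 100 =  0%  208.68.243.254
                                1/ 100 =  1%   |
 12   62ms     7/ 100 =  7%     0/ 100 =  0%  boinc2.ssl.berkeley.edu [208.68.240.18]
"""

for from_hop, to_hop, pct, address in lossy_links(SAMPLE):
    print(f"hop {from_hop} -> hop {to_hop} ({address}): {pct}% link loss")
```

Run on the last three hops of the trace above, this reports the 6% loss between hops 10 and 11 and the 1% loss between hops 11 and 12.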
ID: 972030 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65709
Credit: 55,293,173
RAC: 49
United States
Message 972031 - Posted: 20 Feb 2010, 4:38:41 UTC - in response to Message 972030.  
Last modified: 20 Feb 2010, 4:54:29 UTC

C:\>pathping 208.68.240.16 boinc2.ssl.berkeley.edu

Tracing route to boinc2.ssl.berkeley.edu [208.68.240.18]
over a maximum of 30 hops:
  0  SkullStation [192.168.1.113]
  1  ORION [192.168.1.1]
  2  cab1-1.1scom.net [66.182.x.x]
  3  DSL2-5.1scom.net [66.182.x.x]
  4  Texas-Independent-Energy6328-custidna.cust-rtr.swbell.net [151.164.76.189]
  5     *     bb2-p7-1.rcsntx.sbcglobal.net [151.164.190.248]
  6  ppp-151-164-52-78.rcsntx.swbell.net [151.164.52.78]
  7     *     asn6939-he.eqdltx.sbcgobal.net [151.164.248.150]
  8  10gigabitethernet1-2.core1.lax1.he.net [72.52.92.57]
  9  10gigabitethernet1-3.core1.pao1.he.net [72.52.92.21]
 10     *     64.71.140.42
 11  208.68.243.254
 12  boinc2.ssl.berkeley.edu [208.68.240.18]

Computing statistics for 300 seconds...
            Source to Here   This Node/Link
Hop  RTT    Lost/Sent = Pct  Lost/Sent = Pct  Address
  0                                           SkullStation [192.168.1.113]
                                0/ 100 =  0%   |
  1    0ms     0/ 100 =  0%     0/ 100 =  0%  ORION [192.168.1.1]
                                0/ 100 =  0%   |
  2   14ms     0/ 100 =  0%     0/ 100 =  0%  cab1-1.1scom.net [66.182.x.x]
                                0/ 100 =  0%   |
  3   15ms     0/ 100 =  0%     0/ 100 =  0%  DSL2-5.1scom.net [66.182.x.x]
                                0/ 100 =  0%   |
  4   12ms     0/ 100 =  0%     0/ 100 =  0%  Texas-Independent-Energy6328-custidna.cust-rtr.swbell.net [151.164.76.189]
                                0/ 100 =  0%   |
  5  ---     100/ 100 =100%   100/ 100 =100%  bb2-p7-1.rcsntx.sbcglobal.net [151.164.190.248]
                                0/ 100 =  0%   |
  6  ---     100/ 100 =100%   100/ 100 =100%  ppp-151-164-52-78.rcsntx.swbell.net [151.164.52.78]
                                0/ 100 =  0%   |
  7   13ms     0/ 100 =  0%     0/ 100 =  0%  asn6939-he.eqdltx.sbcgobal.net [151.164.248.150]
                                0/ 100 =  0%   |
  8   51ms     0/ 100 =  0%     0/ 100 =  0%  10gigabitethernet1-2.core1.lax1.he.net [72.52.92.57]
                                0/ 100 =  0%   |
  9   59ms     0/ 100 =  0%     0/ 100 =  0%  10gigabitethernet1-3.core1.pao1.he.net [72.52.92.21]
                                0/ 100 =  0%   |
 10   59ms     0/ 100 =  0%     0/ 100 =  0%  64.71.140.42 <<Hurricane Electric
                                6/ 100 =  6%   |
 11   64ms     6/ 100 =  6%     0/ 100 =  0%  208.68.243.254 <<Seti@Home
                                1/ 100 =  1%   |
 12   62ms     7/ 100 =  7%     0/ 100 =  0%  boinc2.ssl.berkeley.edu [208.68.240.18]

Trace complete.


I thought I'd make it more readable, so I added a couple of [pre] and [size] tags to your output. I also had to remove the [quote] tags.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 972031 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65709
Credit: 55,293,173
RAC: 49
United States
Message 972033 - Posted: 20 Feb 2010, 4:48:45 UTC

I just had a thought, and I could be wrong, but could somebody be throttling the project like we're a bunch of P2P file transfers?
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 972033 · Report as offensive
Profile RottenMutt
Joined: 15 Mar 01
Posts: 1011
Credit: 230,314,058
RAC: 0
United States
Message 972045 - Posted: 20 Feb 2010, 5:09:19 UTC - in response to Message 972033.  

Is there any way to encrypt the transactions so they aren't blocked?
ID: 972045 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.