Posts by GALIFREAN

81) Message boards : Number crunching : Upload/Download problems? Please see here (Message 149738)
Posted: 10 Aug 2005 by GALIFREAN
Post:
Check this out from the Classic status page:
Last updated: 15:45:00 (UTC) on 2005-08-10


This web page gets updated automatically every five minutes.

--------------------------------------------------------------------------------
Data Server Status
The data server is up and running, but outgoing data rate is low
Right now the data server machine is up, but its outgoing data rate is very low. This can be due to many factors, the most likely being our server's dedicated connection to the outside world is currently broken. Please try connecting later.

For more information about previous outages, please check our Technical News page.

Wonder if that's the problem.

Seems like the Classic team is more forthcoming with their constituents!

BTW, I bolded the important part.

82) Message boards : Number crunching : Oh My! Another outage! (Message 149620)
Posted: 10 Aug 2005 by GALIFREAN
Post:
The Doctor may be suffering from hangover effects from the DNS changes 14 days or so ago. It might be wise for him to flush his DNS cache; that has worked for others. If that fails he could use a proxy. I'll find the thread with the details in it. That may help too, as it has for others.

Edit: http://setiathome.berkeley.edu/forum_thread.php?id=18042 has some proxy info in it.

Regards


Hi Tigher, just posted on the upload/download page about this. The Doc is pretty frustrated; hope we don't lose him and all the others having problems.

I sent an email to Cogent. Maybe that will help. They might know about this particular glitch.
83) Message boards : Number crunching : Upload/Download problems? Please see here (Message 149615)
Posted: 10 Aug 2005 by GALIFREAN
Post:
Did some more digging; pretty sure it's a problem with Cogent at Los Angeles. The data server is unreachable from LA and DC, and the DNS is not registered in Cogent's database. Sent an email to dns@cogentco.com. Hopefully they will look into it soon. Maybe it'll persuade them to move faster if they get more than one email.
84) Message boards : Number crunching : Oh My! Another outage! (Message 149612)
Posted: 10 Aug 2005 by GALIFREAN
Post:
OK, I'm going to send them an email with the particulars.
85) Message boards : Number crunching : Oh My! Another outage! (Message 149607)
Posted: 10 Aug 2005 by GALIFREAN
Post:
Does anyone know where the data server is located? I'm looking at a network map, and Cogent has some pretty goofy routes.
86) Message boards : Number crunching : Oh My! Another outage! (Message 149605)
Posted: 10 Aug 2005 by GALIFREAN
Post:
Looking Glass Results: Los Angeles
--------------------------------------------------------------------------------
Query: trace
Addr: 66.28.250.125

trace 66.28.250.125

Type escape sequence to abort.
Tracing the route to 66.28.250.125

1 g10-0-224.core01.lax01.atlas.cogentco.com (66.250.4.5) 4 msec 4 msec 0 msec
2 p14-0.core01.sjc01.atlas.cogentco.com (66.28.4.74) 16 msec 12 msec 12 msec
3 p4-0.core01.sfo01.atlas.cogentco.com (66.28.4.93) 16 msec 16 msec 12 msec
4 UC-Berkeley.demarc.cogentco.com (66.250.4.74) 16 msec 16 msec 16 msec
5 * * *
6 * * *
7 * * *
8 * * *
9 * * *
10 * * *
11 * * *
12 * * *
13 * * *
14 * * *
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
_______

Didn't wait for it to finish, but same results. Looks like either end of the country can't connect.
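The pattern in both traces (answers up to the Berkeley demarc, then nothing but `* * *`) can be spotted mechanically. A minimal sketch in Python; it assumes Cisco-style `trace` output like the Looking Glass listings above, and the parsing helper name is my own:

```python
import re

def last_responding_hop(trace_text):
    """Return (hop_number, first_field) for the last hop that answered,
    or None if no hop responded. Lines like '5  * * *' count as silent."""
    last = None
    for line in trace_text.splitlines():
        m = re.match(r"\s*(\d+)\s+(\S+)", line)
        if not m:
            continue
        hop, first_field = int(m.group(1)), m.group(2)
        if first_field != "*":  # at least one probe came back for this hop
            last = (hop, first_field)
    return last

trace = """\
3 p4-0.core01.sfo01.atlas.cogentco.com (66.28.4.93) 16 msec 16 msec 12 msec
4 UC-Berkeley.demarc.cogentco.com (66.250.4.74) 16 msec 16 msec 16 msec
5 * * *
6 * * *"""
print(last_responding_hop(trace))  # → (4, 'UC-Berkeley.demarc.cogentco.com')
```

Run against both traces above, this would report hop 4 (LA) and hop 8 (DC) as the last responders, i.e. the Berkeley demarc in each case.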

87) Message boards : Number crunching : Oh My! Another outage! (Message 149601)
Posted: 10 Aug 2005 by GALIFREAN
Post:
Looking Glass Results: Washington, DC


--------------------------------------------------------------------------------


Query: trace
Addr: 66.28.250.125

trace 66.28.250.125

Type escape sequence to abort.
Tracing the route to 66.28.250.125

1 f29.ba01.b005944-0.dca01.atlas.cogentco.com (66.250.56.189) 0 msec 4 msec 4 msec
2 g0-7.core01.dca01.atlas.cogentco.com (66.28.6.189) 4 msec 4 msec 4 msec
3 p6-0.core01.jfk02.atlas.cogentco.com (66.28.4.82) 8 msec 8 msec 8 msec
4 p15-0.core02.jfk02.atlas.cogentco.com (66.28.4.14) 8 msec 8 msec 8 msec
5 p12-0.core01.mci01.atlas.cogentco.com (154.54.3.202) 44 msec 44 msec 44 msec
6 p10-0.core02.sfo01.atlas.cogentco.com (66.28.4.209) 80 msec 80 msec 76 msec
7 p15-0.core01.sfo01.atlas.cogentco.com (66.28.4.69) 168 msec 144 msec 212 msec
8 UC-Berkeley.demarc.cogentco.com (66.250.4.74) 84 msec 80 msec 80 msec
9 * * *
10 * * *
11 * * *
12 * * *
13 * * *
14 * * *
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *

88) Message boards : Number crunching : Oh My! Another outage! (Message 149598)
Posted: 10 Aug 2005 by GALIFREAN
Post:
Stand by for some interesting results.
89) Message boards : Number crunching : Oh My! Another outage! (Message 149593)
Posted: 10 Aug 2005 by GALIFREAN
Post:
Ned, what do you think of the results on my trace? Something fishy towards the end, no?

Not really.

Reverse-DNS problems are pretty common, and unless we're talking mail servers, things will generally work with broken rDNS.

The ping times are reasonable.


OK, thanks. Just asking because I don't remember seeing that come up before.


ok, last resort here.
I am powering off and clearing all DNS and routing tables here at the lab, and rebooting from a cold start all routers, hubs, and computers. If that doesn't do it and everything else still works [like it does right now with no problems] then I am dumping BOINC!

doc


Doc, it's not BOINC! My other computer is running SETI and Einstein, and the only problem was with getting to SETI. Had no problems with Einstein at the same time.
90) Message boards : Number crunching : Oh My! Another outage! (Message 149591)
Posted: 10 Aug 2005 by GALIFREAN
Post:
Guys, I've been sort of following along here (mostly in my sleep) so if this isn't pertinent, please disregard and slap me around after I wake up. Running traceroute from my Linux machine to setiboincdata.ssl.berkeley.edu, it seems to stop at the same point mentioned,
"8 UC-Berkeley.demarc.cogentco.com (66.250.4.74) 70.477 ms 66.128 ms 67.650 ms"
but I don't have, or haven't had any connect problems at all.


That's strange. That's when I had problems, didn't clear up till I was able to ping the data server again.

And no slaps for you; we had a heck of a mess to pick up the last time. :)
91) Message boards : Number crunching : Oh My! Another outage! (Message 149590)
Posted: 10 Aug 2005 by GALIFREAN
Post:
Ned, what do you think of the results on my trace? Something fishy towards the end, no?

Not really.

Reverse-DNS problems are pretty common, and unless we're talking mail servers, things will generally work with broken rDNS.

The ping times are reasonable.


OK, thanks. Just asking because I don't remember seeing that come up before.
92) Message boards : Number crunching : Oh My! Another outage! (Message 149587)
Posted: 10 Aug 2005 by GALIFREAN
Post:
Ned, what do you think of the results on my trace? Something fishy towards the end, no?
93) Message boards : Number crunching : Oh My! Another outage! (Message 149579)
Posted: 10 Aug 2005 by GALIFREAN
Post:
OK, here is the complete tracert log:

Unable to resolve target system name setiboincdata.ssl.berley.edu.

C:\Documents and Settings\Dr. D.H.Chevalier>tracert 66.28.250.125

Tracing route to 66.28.250.125 over a maximum of 30 hops

1

This is what I get:
08/10/05 02:02:46 Fast traceroute setiboincdata.ssl.berkeley.edu
Trace setiboincdata.ssl.berkeley.edu (66.28.250.125) ...
1 192.168.2.1 1ms 0ms 1ms TTL: 0 (No rDNS)
2 172.31.255.252 36ms 27ms 46ms TTL: 0 (No rDNS)
3 192.168.24.53 27ms 27ms 28ms TTL: 0 (No rDNS)
4 63.210.100.197 27ms 28ms 30ms TTL: 0 (ge-8-0-169.ipcolo2.Chicago1.Level3.net ok)
5 4.68.101.1 28ms 27ms 26ms TTL: 0 (ae-1-51.bbr1.Chicago1.Level3.net ok)
6 209.244.8.10 28ms 26ms 28ms TTL: 0 (so-6-0-0.edge1.Chicago1.Level3.net ok)
7 4.68.127.130 29ms 28ms 28ms TTL: 0 (No rDNS)
8 154.54.2.237 29ms 26ms 29ms TTL: 0 (p12-0.core01.ord01.atlas.cogentco.com bogus rDNS: host not found [authoritative])
9 66.28.4.185 76ms 74ms 75ms TTL: 0 (p5-0.core01.sfo01.atlas.cogentco.com bogus rDNS: host not found [authoritative])
10 66.250.4.74 76ms 75ms 75ms TTL: 0 (UC-Berkeley.demarc.cogentco.com bogus rDNS: host not found [authoritative])
11 66.28.250.125 77ms 78ms 78ms TTL:243 (setiboincdata.ssl.berkeley.edu ok)
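Ned's point about broken rDNS can be checked hop by hop. A small sketch assuming the annotation format of the "Fast traceroute" listing above ("No rDNS", "bogus rDNS", or a hostname followed by "ok"); the function name is my own:

```python
def rdns_status(trace_text):
    """Map each hop's IP to its reverse-DNS status as reported in
    'Fast traceroute'-style output: 'none', 'bogus', or 'ok'."""
    status = {}
    for line in trace_text.splitlines():
        parts = line.split()
        # Hop lines start with a hop number, then the responding IP.
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        ip = parts[1]
        if "No rDNS" in line:
            status[ip] = "none"
        elif "bogus rDNS" in line:
            status[ip] = "bogus"
        elif " ok)" in line:
            status[ip] = "ok"
    return status

trace = """\
1 192.168.2.1 1ms 0ms 1ms TTL: 0 (No rDNS)
9 66.28.4.185 76ms 74ms 75ms TTL: 0 (p5-0.core01.sfo01.atlas.cogentco.com bogus rDNS: host not found [authoritative])
11 66.28.250.125 77ms 78ms 78ms TTL:243 (setiboincdata.ssl.berkeley.edu ok)"""
print(rdns_status(trace))
```

On the full trace above this flags the Cogent hops (8-10) as bogus rDNS while the final SETI data server resolves fine, which matches Ned's assessment that the path itself works.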

94) Message boards : Number crunching : Oh My! Another outage! (Message 149573)
Posted: 10 Aug 2005 by GALIFREAN
Post:
[quote]
On my tracert, it stops at UC-Berkeley.demarc.cogentco.com [66.250.4.74]

doc
[/quote]


I believe that's where it has been stopping for everyone else who's having a problem. I was having the same thing 3 days ago. Sun evening/Mon morning, it cleared up.
95) Message boards : Number crunching : Oh My! Another outage! (Message 149569)
Posted: 10 Aug 2005 by GALIFREAN
Post:
[quote]
It's not on my end. I have too many computers using different connections on separate T3's, so it IS a BOINC problem.

doc
[/quote]


No, it's not on your end, because you're able to reach the scheduler. It's the data server that's not reachable. A whole bunch of other people are having the same problem. There are apparently some servers in between you and the data server that are "black hole"-ing the connection. The easiest workaround is to use a proxy for now, and get on the horn with your ISP and tell them you are not going to pay big bucks to not be able to connect. They usually jump for big users like you.

The data server is at a different location than the scheduler.
96) Message boards : Number crunching : Oh My! Another outage! (Message 149555)
Posted: 10 Aug 2005 by GALIFREAN
Post:

I am about to dump some 92 work units down the drain and regain computer power if BOINC doesn't get its act together!


All I am running is SETI

doc

It might be good to look at the log for any hints as to why you have work waiting to upload.

It sounds like you're getting to the scheduler (on Berkeley's wire) but not to the upload/download servers (on the Cogent wire).


here is a portion of my message log:

8/10/2005 12:00:30 AM||schedule_cpus: must schedule
8/10/2005 12:00:31 AM|SETI@home|Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
8/10/2005 12:00:32 AM|SETI@home|Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi succeeded
8/10/2005 12:14:03 AM|SETI@home|Started upload of 27oc03aa.19746.3280.834636.41_0_0
8/10/2005 12:14:24 AM|SETI@home|Temporarily failed upload of 27oc03aa.19746.3280.834636.41_0_0: -106
8/10/2005 12:14:24 AM|SETI@home|Backing off 1 hours, 45 minutes, and 14 seconds on upload of file 27oc03aa.19746.3280.834636.41_0_0
8/10/2005 12:50:30 AM|SETI@home|Started upload of 26fe05aa.10476.19728.709662.235_3_0
8/10/2005 12:50:52 AM|SETI@home|Temporarily failed upload of 26fe05aa.10476.19728.709662.235_3_0: -106
8/10/2005 12:50:52 AM|SETI@home|Backing off 2 hours, 52 minutes, and 23 seconds on upload of file 26fe05aa.10476.19728.709662.235_3_0
8/10/2005 12:53:47 AM|SETI@home|Started download of 26fe05aa.10476.26800.834662.85
8/10/2005 12:54:09 AM|SETI@home|Temporarily failed download of 26fe05aa.10476.26800.834662.85: -106
8/10/2005 12:54:09 AM|SETI@home|Backing off 2 hours, 35 minutes, and 49 seconds on download of file 26fe05aa.10476.26800.834662.85
8/10/2005 1:00:30 AM||schedule_cpus: time 3600.078125

doc



Sounds like the same problem discussed here. Read back at least a couple days if not from the beginning.

I would try pinging the server to find out whether you are able to reach it. If not, there are a whole bunch of proxies in the thread I linked to. And, no offense, there are how-tos there too.
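Doc's message log above can also be scanned programmatically: the tell-tale sign is the "Temporarily failed upload/download ... -106" lines while scheduler requests still succeed. A minimal sketch, assuming the BOINC log line format shown above (the helper name is my own):

```python
import re

def transient_failures(log_text):
    """Collect (filename, direction) pairs for 'Temporarily failed'
    upload/download lines that ended with error -106."""
    pat = re.compile(r"Temporarily failed (upload|download) of (\S+): -106")
    out = []
    for line in log_text.splitlines():
        m = pat.search(line)
        if m:
            out.append((m.group(2), m.group(1)))
    return out

log = """\
8/10/2005 12:14:24 AM|SETI@home|Temporarily failed upload of 27oc03aa.19746.3280.834636.41_0_0: -106
8/10/2005 12:32:01 AM|SETI@home|Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi succeeded
8/10/2005 12:54:09 AM|SETI@home|Temporarily failed download of 26fe05aa.10476.26800.834662.85: -106"""
print(transient_failures(log))
```

A log where the scheduler lines succeed but every file transfer shows up in this list is exactly the "scheduler on Berkeley's wire, data server on the Cogent wire" split described in this thread.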
97) Message boards : Number crunching : Oh My! Another outage! (Message 149552)
Posted: 10 Aug 2005 by GALIFREAN
Post:
It isn't a BOINC problem, it is a SETI BOINC problem. Consider re-allocating the CPU power to other BOINC projects which ARE running well. For me, that is Climate and Einstein, but there are a couple of other projects as well.



I am about to dump some 92 work units down the drain and regain computer power if BOINC doesn't get its act together!



All I am running is SETI

doc


Doc, is your net connection "always on"? I believe there have been some DNS problems. Would it be possible for you to cycle your connection?
98) Message boards : Number crunching : Oh My! Another outage! (Message 149548)
Posted: 10 Aug 2005 by GALIFREAN
Post:

Dr., not to presume anything, but have you looked at the messages as to why you can't upload? There have been a lot of problems, but that seems like a long time not to be able to upload. I don't have as many units as you do, but there has never been more than 24 hours of upload problems on my end.



Yes, the messages only say "scheduling successful"; however, nothing happens and the screen saver says no work unit (i.e., BOINC is idle).

doc



Something sounds familiar to me, but I just can't put my finger on it yet. What about the work-screen messages? Does it say "waiting to upload" or something like that?
99) Message boards : Number crunching : I'm not complaining but..... (Message 149544)
Posted: 10 Aug 2005 by GALIFREAN
Post:
I’m not complaining but…

It seems to me that with the transitioner backlog now down to 0 hours, the "in progress" and "waiting for validation" numbers should be dropping. They are not; in fact, those numbers are still rising. Without more information from the developers on what is happening at the Berkeley end, a diagnosis is difficult. Are all our client computers simply overwhelming the old Berkeley servers with completed work units? Maybe! If so, is there a solution other than Berkeley shutting down every Wednesday for 3 hours to do a database backup?

I observed that also, and agree there might be a problem.

Why are so many of our brothers and sisters having to resort to proxies to contact the Berkeley servers when they were not having problems before? Since there has been no info from Berkeley on this, Tigher, could you post on whether you and your helpers were able to figure anything out, and your conclusions? I am very interested in this problem. Thanks!


My (conspiracy) theory is that some ISPs are blocking them (the Berkeley servers) to regain some bandwidth.

Why did the cheerleaders get their pom-poms burned?


I still have mine!!

Bottom line is that I see a great need for one of the developers to post some information officially on this site concerning problems and what is being planned to address those problems. And I mean posting at least 2 or 3 times per week, not once every 6 to 8 weeks or longer, as has been the custom, with little to no real information. My feeling right now is that Berkeley BOINC/SETI is operating like a person stumbling around in the dark with nowhere to go, no direction, and no plan. Matt and JM7 are here almost every night posting and answering questions. They are very informative and greatly appreciated by us on the forums, but I am sure they are here on their own time and not being compensated. An official source is needed with timely information on the problems being experienced in the system and the solutions.


Maybe we should all go on strike?

And another thing… the plan to entice the crunchers at Classic to make the move here, now, is not going well. The suggestion is being taken as insulting at the Classic site, as well it should be. If Berkeley is indeed waiting for the 2 billion mark for advertising and marketing purposes, maybe we should be going back to Classic and make that goal happen sooner. After all, we benefit from Classic's demise, not the other way around. Something to think about!


My sentiments exactly, and I'm still doing Classic along with BOINC.

100) Message boards : Number crunching : Oh My! Another outage! (Message 149528)
Posted: 10 Aug 2005 by GALIFREAN
Post:
I have 23 computers running, or did run, BOINC. They have been waiting for 7 days to dump and reload work units. Every other week it's the same problem.

I am about to dump some 92 work units down the drain and regain computer power if BOINC doesn't get its act together!




Dr., not to presume anything, but have you looked at the messages as to why you can't upload? There have been a lot of problems, but that seems like a long time not to be able to upload. I don't have as many units as you do, but there has never been more than 24 hours of upload problems on my end.




©2020 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.