Panic Mode On (58) Server problems?

Message boards : Number crunching : Panic Mode On (58) Server problems?

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1160285 - Posted: 8 Oct 2011, 18:33:59 UTC - in response to Message 1160277.  

Why would it start now? Never happened before.

We can't answer the 'why' until we've worked out what 'it' is.

For starters, have you tried the basic network tests (ping and tracert) to the upload server? Check

ping 208.68.240.16
tracert setiboincdata.ssl.berkeley.edu
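For anyone who would rather script these two checks than type them, here is a minimal Python sketch that builds the matching command lines. The function name `build_network_test_commands` is made up for illustration. The flags are the usual ones: `-n` sets the ping count on Windows, `-c` does the same on Linux and macOS, and the route tool is `tracert` on Windows and `traceroute` elsewhere.

```python
import platform


def build_network_test_commands(ip, hostname, system=None):
    """Return (ping_command, trace_command) as argument lists for the given OS.

    `system` defaults to the current platform; pass "Windows" or "Linux"
    explicitly to see what the other platform would run.
    """
    system = system or platform.system()
    if system == "Windows":
        # Windows ping sends four echo requests by default; -n makes it explicit.
        ping = ["ping", "-n", "4", ip]
        trace = ["tracert", hostname]
    else:
        # Linux and macOS ping runs until interrupted unless -c is given.
        ping = ["ping", "-c", "4", ip]
        trace = ["traceroute", hostname]
    return ping, trace


ping_cmd, trace_cmd = build_network_test_commands(
    "208.68.240.16", "setiboincdata.ssl.berkeley.edu", system="Windows"
)
print(" ".join(ping_cmd))
print(" ".join(trace_cmd))
```

Each list can be handed to `subprocess.run(..., capture_output=True, text=True)` if you want to capture the output instead of reading it in a console window.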
ID: 1160285 · Report as offensive
Profile KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 1160286 - Posted: 8 Oct 2011, 18:34:16 UTC - in response to Message 1160283.  

Why would it start now? Never happened before.


It never happened before to others either, until it suddenly did. My car hasn't broken down before either, but it will sooner or later.

Not very helpful! If you are right, then I have no idea how to do anything about it. Other threads are gobbledegook to me!

ID: 1160286 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1160293 - Posted: 8 Oct 2011, 18:57:39 UTC - in response to Message 1160264.  

Are there any plans of making version 6.12 usable by high end hosts before trying to roll to V7?

I presume you're referring to the increased backoffs in BOINC 6.12.x. As that's a fundamental design feature of the series, I don't expect the BOINC devs to modify it. They're in bugfix-only mode for that branch and, of course, assume that because 6.12 is the recommended version it's reasonable to consider its effects as if all users had adopted the recommendation.

The issue isn't really the backoffs so much as work delivery here, and I hope that some progress in being able to deliver what is assigned can be made before S@h v7 is rolled out. I don't know what's possible within the University of California hierarchy though.
Joe
ID: 1160293 · Report as offensive
Profile KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 1160294 - Posted: 8 Oct 2011, 18:59:29 UTC - in response to Message 1160285.  

Why would it start now? Never happened before.

We can't answer the 'why' until we've worked out what 'it' is.

For starters, have you tried the basic network tests (ping and tracert) to the upload server? Check

ping 208.68.240.16
tracert setiboincdata.ssl.berkeley.edu

You are dealing with an idiot here.
How do I do that and what do any results mean?

ID: 1160294 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1160296 - Posted: 8 Oct 2011, 19:03:13 UTC - in response to Message 1160263.  

... when v7 goes live on SETI, the project will get about a quarter of a million new hosts in the first month - at least, as far as the application_details are concerned. We really ought to prevail on David to consider that number before the event.....

And v7 should have CPU, OpenCL ATI, CUDA NVIDIA, and maybe OpenCL NVIDIA application versions, so multiply that quarter of a million by maybe 2 to get the effective "active applications" count...
Joe
ID: 1160296 · Report as offensive
Profile KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 1160300 - Posted: 8 Oct 2011, 19:12:28 UTC - in response to Message 1160294.  

I found out how to do it!
What next?

Check

ping 208.68.240.16
tracert setiboincdata.ssl.berkeley.edu

You are dealing with an idiot here.
How do I do that and what do any results mean?
Microsoft Windows [Version 6.0.6002]
Copyright (c) 2006 Microsoft Corporation. All rights reserved.

C:\Windows\system32>ping 208.68.240.16

Pinging 208.68.240.16 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 208.68.240.16:
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

C:\Windows\system32>tracert setiboincdata.ssl.berkeley.edu

Tracing route to setiboincdata.ssl.berkeley.edu [208.68.240.16]
over a maximum of 30 hops:

1 57 ms 99 ms 99 ms 192.168.254.254
2 46 ms 48 ms 49 ms anchor-hg-3-lo100.router.demon.net [194.159.161.34]
3 47 ms 47 ms 47 ms anchor-access-4-s2010.router.demon.net [194.217.23.37]
4 48 ms 46 ms 47 ms gi7-0-0-dar3.lah.uk.cw.net [194.159.161.90]
5 47 ms 48 ms 46 ms xe-0-1-0-xur1.lns.uk.cw.net [193.195.25.70]
6 52 ms 48 ms 48 ms lonap.he.net [193.203.5.128]
7 134 ms 130 ms 130 ms 10gigabitethernet6-3.core1.ash1.he.net [72.52.92.137]
8 207 ms 210 ms 201 ms 10gigabitethernet7-4.core1.pao1.he.net [184.105.213.177]
9 * * * Request timed out.
10 * * * Request timed out.
11 * * * Request timed out.
12 * * * Request timed out.
13 * * * Request timed out.
14 * * * Request timed out.
15 * * * Request timed out.
16 * * * Request timed out.
17 * * * Request timed out.
18 * * * Request timed out.
19 * * * Request timed out.
20 * * * Request timed out.
21 * * * Request timed out.
22 * * * Request timed out.
23 * * * Request timed out.
24 * * * Request timed out.
25 * * * Request timed out.
26 * * * Request timed out.
27 * * * Request timed out.
28 * * * Request timed out.
29 * * * Request timed out.
30 * * * Request timed out.

Trace complete.

C:\Windows\system32>

ID: 1160300 · Report as offensive
Kevin Olley

Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1160301 - Posted: 8 Oct 2011, 19:15:28 UTC - in response to Message 1160267.  

Are there any plans of making version 6.12 usable by high end hosts before trying to roll to V7?

Versions of what?

The discussion with Joe was about the SETI science application - currently at v6.03 for CPUs, v6.08/09/10 for CUDA GPUs.

Version 6.12 sounds like a BOINC version number - I'm not having any problems with BOINC v6.12.34, though I don't run what you would call a 'high end host'.

What issues make it unusable? I haven't seen any reported on the boinc_alpha mailing list: that would be a better venue for discussing boinc issues than here, though I can pass on messages if needed.


ATM only 5 of the 40 "Top Hosts" are running v6.12; the rest are running v6.10.

I think the main problem is the increased backoff times. It's no good looking at a backed-off download queue when your CPUs and GPUs are sitting back scratching their nether regions.

What could show interesting figures is if the "in progress" figures on the "tasks" page for each machine showed how many in progress tasks were still awaiting download.



Kevin


ID: 1160301 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1160303 - Posted: 8 Oct 2011, 19:19:44 UTC - in response to Message 1160294.  

Why would it start now? Never happened before.

We can't answer the 'why' until we've worked out what 'it' is.

For starters, have you tried the basic network tests (ping and tracert) to the upload server? Check

ping 208.68.240.16
tracert setiboincdata.ssl.berkeley.edu

You are dealing with an idiot here.
How do I do that and what do any results mean?

I refuse to accept that I'm dealing with an idiot. I may very well be dealing with someone who has expertise in some subject area different from computing, but that's not the same thing at all.

OK, one at a time.

First open a "Command Prompt" window - similar to what we used to use as a 'DOS prompt'. There are many ways of doing that, so - since I don't know whether you'll be using your Vista machine or one of your XP machines for this - here's a way which should work with the default settings on any of them.

Click the 'Start' button, click on 'All programs'. From the list, click on 'Accessories' (yellow folder icon), and you should see 'Command Prompt' near the top of the (alphabetical) list. Click it.

In the command prompt window which opens, type that first line I gave you, exactly as it stands:

ping 208.68.240.16

and press the return key at the end. Then wait.

After a few seconds, you should see four lines of results.

Either: lines of numbers, starting with 'Reply from'. That's good.
Or: "Request timed out". That's bad.

Which do you get?
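The good/bad reading of the four result lines can be sketched in a few lines of Python. This is only an illustration: `classify_ping_output` is a hypothetical helper, and it assumes the English-language Windows wording ("Reply from", "Request timed out") seen in this thread. Lines that begin with "Reply from" but report the destination as unreachable are counted as failures.

```python
def classify_ping_output(output):
    """Classify pasted ping output as 'good', 'bad', 'intermittent' or 'unknown'."""
    replies = 0
    failures = 0
    for line in output.splitlines():
        text = line.strip()
        if text.startswith("Request timed out") or "unreachable" in text.lower():
            failures += 1
        elif text.startswith("Reply from"):
            replies += 1
    if replies and failures:
        return "intermittent"
    if replies:
        return "good"
    if failures:
        return "bad"
    # Neither pattern was found, e.g. different language or OS wording.
    return "unknown"


sample = """Pinging 208.68.240.16 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.
"""
print(classify_ping_output(sample))
```

Four timeouts in a row, as in the sample, only say that no reply came back. They do not say where along the path the packets were lost, which is what the tracert step is for.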
ID: 1160303 · Report as offensive
Profile KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 1160306 - Posted: 8 Oct 2011, 19:21:48 UTC - in response to Message 1160303.  

Either: lines of numbers, starting with 'Reply from'. That's good.
Or: "Request timed out". That's bad.

Which do you get?

All bad then!

ID: 1160306 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1160311 - Posted: 8 Oct 2011, 19:39:45 UTC - in response to Message 1160300.  

I found out how to do it!
What next?

There! I said I wasn't dealing with an idiot - you beat me to it :-)

Both of those are classic symptoms of the Hurricane Electric connection problem - especially, since you get the line referencing "10gigabitethernet7-4.core1.pao1.he.net", and nothing but asterisks below that.
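Spotting the last hop that answered before the asterisks start can also be done mechanically. The Python sketch below assumes Windows `tracert` output with each hop on a single line. `last_responding_hop` and the `HOP_LINE` pattern are illustrative names, not part of any tool.

```python
import re

# Hop number, three timing columns ("207 ms", "<1 ms" or "*"), then the host
# name or the "Request timed out." message.
HOP_LINE = re.compile(r"^\s*(\d+)\s+((?:(?:<?\d+\s+ms|\*)\s+){3})(.+)$")


def last_responding_hop(tracert_output):
    """Return (hop_number, host_text) for the last hop that replied, or None."""
    last = None
    for line in tracert_output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header, footer or blank line
        host_text = match.group(3).strip()
        if host_text.startswith("Request timed out"):
            continue
        last = (int(match.group(1)), host_text)
    return last


sample = """\
  7   134 ms   130 ms   130 ms  10gigabitethernet6-3.core1.ash1.he.net [72.52.92.137]
  8   207 ms   210 ms   201 ms  10gigabitethernet7-4.core1.pao1.he.net [184.105.213.177]
  9     *        *        *     Request timed out.
 10     *        *        *     Request timed out.
"""
print(last_responding_hop(sample))
```

If the last hop that replied is an `he.net` router, as in the sample, the trace matches the pattern described above.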

You could wait until Jeff's new memory has arrived, and until they've figured out a way to break into the security cage - or, since we're on a roll, you could try using a proxy.

Look in the 'Temporary Fix...' thread, and see what proxies have been mentioned as working recently. The newest one (at the time of writing) seems to be

216.24.193.211:8080

Open BOINC Manager, in Advanced View. Assuming it's one of your BOINC v6.12.34 machines, go to the Tools menu, and click on 'Display and network options'.

Click on the third tab, 'HTTP Proxy'.

Check 'Connect via HTTP proxy server'
Put 216.24.193.211 in the address box.
Put 8080 in the Port box.

(that's the two halves of the proxy line above, splitting it at the ':'. If you try a different proxy, do the same thing - splitting it into 'address' and 'port' - with any other proxy description)

Leave the rest blank, and click 'OK'. Now retry your uploads. Judging by what people have said in the threads, you may need to experiment with different proxies until you find one which works for you. It may also be slow, but if it works at all, that's better than nothing.
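The address/port split described above, written out as a short Python sketch. `split_proxy` is a hypothetical helper name; it splits on the last ':' and checks that the port is a number in the valid range, so a mistyped proxy line is caught before it goes into the dialog.

```python
def split_proxy(proxy):
    """Split 'address:port' into (address, port) for the two dialog boxes."""
    address, separator, port = proxy.strip().rpartition(":")
    if not separator or not address or not port.isdigit():
        raise ValueError("expected 'address:port', got %r" % proxy)
    port_number = int(port)
    if not 1 <= port_number <= 65535:
        raise ValueError("port out of range: %d" % port_number)
    return address, port_number


address, port = split_proxy("216.24.193.211:8080")
print("Address box:", address)
print("Port box:", port)
```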
ID: 1160311 · Report as offensive
Profile James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 1160312 - Posted: 8 Oct 2011, 19:40:22 UTC

I was down to my last work unit, which was an AP, and couldn't get any work all day. Now this might be a coincidence, but I pinged per John's instructions and now I have a ton of work downloading. But I also finished that AP at the same time.

Whatever happened, I'm happy.

Old James
ID: 1160312 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1160314 - Posted: 8 Oct 2011, 19:44:32 UTC - in response to Message 1160303.  

Richard, In the first window that opens, shouldn't you type cmd, then hit enter? I forgot that step and then when I tried a ping or tracert in the command prompt window it ran then disappeared as soon as it finished.


PROUD MEMBER OF Team Starfire World BOINC
ID: 1160314 · Report as offensive
Profile KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 1160315 - Posted: 8 Oct 2011, 19:46:15 UTC - in response to Message 1160311.  

It may also be slow, but if it works at all, that's better than nothing.

Wow! I bow down to genius. It worked! (at least for now)
Many, many thanks.
Most have now uploaded and I have reported them. Just waiting for the rest..........
All gone from my main machine. Now I have to check the other one.
That is brilliant.
A truly grateful idiot here.
I rely on these things to work and am utterly lost without someone to hold my hand.

ID: 1160315 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1160316 - Posted: 8 Oct 2011, 19:50:52 UTC - in response to Message 1160314.  

Richard, In the first window that opens, shouldn't you type cmd, then hit enter? I forgot that step and then when I tried a ping or tracert in the command prompt window it ran then disappeared as soon as it finished.

No, you're thinking of typing CMD in the 'Run...' prompt direct from the start menu. That's one of the alternative ways into the Command Window, but Run... isn't displayed by default in Vista or Windows 7 (you have to customise the start menu first), which is why I avoided it.

When I go into 'step by step' mode like that, I'm sitting at a desk with both an XP and a Windows 7 computer in front of me, and working through every step on both machines as I type. It's slow, but it catches the gotchas before they reach the user I'm advising.
ID: 1160316 · Report as offensive
Profile KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 1160317 - Posted: 8 Oct 2011, 19:51:26 UTC

Well, that's both machines clear and back uploading and maybe even downloading.
That set of instructions needs writing in a permanent message for everyone.
I feel happy again :)


ID: 1160317 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1160320 - Posted: 8 Oct 2011, 19:57:27 UTC - in response to Message 1160315.  

It may also be slow, but if it works at all, that's better than nothing.

It worked! (at least for now)

Good to hear. Keep an eye on these message boards, and remember to switch off your proxy access as soon as the router is fixed. All you have to do is uncheck 'Connect via HTTP proxy server' - you can leave the numbers in there, in case you need to use it again.

Hopefully, we won't need a permanent message, because it'll turn out only to be a temporary problem - but the instructions are here for now.
ID: 1160320 · Report as offensive
Profile soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1160325 - Posted: 8 Oct 2011, 20:11:48 UTC - in response to Message 1160301.  

Are there any plans of making version 6.12 usable by high end hosts before trying to roll to V7?

Versions of what?

The discussion with Joe was about the SETI science application - currently at v6.03 for CPUs, v6.08/09/10 for CUDA GPUs.

Version 6.12 sounds like a BOINC version number - I'm not having any problems with BOINC v6.12.34, though I don't run what you would call a 'high end host'.

What issues make it unusable? I haven't seen any reported on the boinc_alpha mailing list: that would be a better venue for discussing boinc issues than here, though I can pass on messages if needed.


ATM only 5 of the 40 "Top Hosts" are running v6.12; the rest are running v6.10.

I think the main problem is the increased backoff times. It's no good looking at a backed-off download queue when your CPUs and GPUs are sitting back scratching their nether regions.

What could show interesting figures is if the "in progress" figures on the "tasks" page for each machine showed how many in progress tasks were still awaiting download.




This is exactly the issue.. last I looked only 1 of the top 40 hosts was running 6.12.

The problem with the backoffs for the faster crunchers under load (this may not have appeared in alpha or beta testing) is that it is impossible to maintain a working amount of work units without constant baby-sitting and button abuse. When the backoffs increase beyond 2-3 hours, the large hosts will run dry. When it gets to 10 hours plus, you might as well turn it off for the first couple of days after an outage. Even a minor one.

It may help the lesser machines fill as a result, since they no longer have to contend with the faster ones.
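To give a feel for how quickly a retry delay can climb into the range of hours, here is a rough Python sketch of a generic doubling backoff. This is NOT BOINC's actual algorithm, which isn't described in this thread. The 60-second base, the factor of 2 and the 24-hour cap are all made-up assumptions, chosen only to illustrate the shape of the curve.

```python
def backoff_delay(failures, base=60, factor=2, cap=24 * 3600):
    """Delay in seconds after `failures` consecutive failed attempts.

    Hypothetical parameters: 60 s base, doubling each time, capped at 24 h.
    """
    if failures < 1:
        return 0
    return min(cap, base * factor ** (failures - 1))


def failures_until_delay_exceeds(threshold, **params):
    """Number of consecutive failures before the delay first exceeds `threshold`."""
    cap = params.get("cap", 24 * 3600)
    if threshold >= cap:
        raise ValueError("threshold is never exceeded because of the cap")
    failures = 1
    while backoff_delay(failures, **params) <= threshold:
        failures += 1
    return failures


for hours in (2, 3, 10):
    count = failures_until_delay_exceeds(hours * 3600)
    print("delay passes %d h after %d consecutive failures" % (hours, count))
```

Under these assumed numbers only a handful of extra failures separate a two-hour wait from a ten-hour one, which fits the experience described above of a host going from "running low" to "might as well turn it off".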
Janice
ID: 1160325 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1160331 - Posted: 8 Oct 2011, 20:31:12 UTC

Long back-offs don't only affect the big hitters; they can also affect those with small windows of opportunity. If you've only got an hour of internet access available in which to do all your uploads and downloads and are "immediately" hit with a two-hour back-off, it's a total waste of time.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1160331 · Report as offensive
Profile Gary Charpentier Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 30608
Credit: 53,134,872
RAC: 32
United States
Message 1160334 - Posted: 8 Oct 2011, 20:38:10 UTC
Last modified: 8 Oct 2011, 20:39:26 UTC

The back off can have a more insidious effect on those who crunch more than one project. Scheduler decides it needs work and asks Seti which replies no work plus back off. Scheduler still wants work but knows it isn't coming from Seti because of the back off so it grabs another project and that's all she wrote. The crunch percentages now mean nothing. Unfortunately BOINC was designed assuming that all projects always have work available and when it was realized that wouldn't be the case a lot of stuff to deal with it was hacked on. The hacks are fighting each other.

Personally I think it is time to retire this BOINC and go with a brand new BOINC2.
ID: 1160334 · Report as offensive
Profile James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 1160390 - Posted: 8 Oct 2011, 22:29:39 UTC

I had to abuse the button but finally got everything downloaded. WOW, I have never seen so many VLARs in my life. All except two on the CPU are VLARs, and a lot of 1s at that.

Well, it's work, so who am I to complain.

Old James
ID: 1160390 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.