Panic Mode On (58) Server problems? |
Message boards : Number crunching : Panic Mode On (58) Server problems?
| Author | Message |
|---|---|
|
All seems well (aside from the much nicer limits for you power crunchers). The avian friend is happy, and my cache is full, with the correct ETAs, I might add. | |
| ID: 1160251 · | |
For any host with more than 20 validations for an app_version, it's nearly inconceivable that APR/estimate anomalies will make much difference. For non-anonymous hosts the APR is sent to the host as the <flops> for that app version, so the ratio limit of 10 would have to be exceeded between two requests to the Scheduler for any capping to take place. 1.01^231 = 9.96, so even 231 near-zero validated runtimes wouldn't trigger the capping. OTOH, assuming the APR was about right before that, a shift of ~10 would put the host on the border of the range where -177 "elapsed time exceeded" errors could happen. For anonymous hosts where the users are allowing totally inadequate <flops> based on the Whetstone benchmark to be sent by the core client, it is effectively guaranteed that estimates for most GPU processing will be a mess. Those who are getting by because DCF down near 0.02 is enough to compensate for the bad <flops> are merely in danger of DCF going even lower and restricting work fetch; whatever happens to APR matters not, since the user has chosen to operate in the zone where the server estimate is always capped. For all host app versions, the first 10 validations are critical. If the APR isn't somewhat reasonable at that point, it will take a lot of good runtimes to shift it to a better approximation. That's where the runtime_outlier logic will be helpful, and I too hope the Astropulse validator code will be updated soon. The project gets about 600 new hosts a day (either really new or new hostID); it's not nice to leave them exposed to a known weakness in the system. Joe | |
| ID: 1160253 · | |
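For anyone who wants to check Joe's arithmetic, here is a minimal Python sketch. It assumes, as the 1.01^231 figure in his post implies, that each validated result can shift the rate by a factor of about 1.01, and that capping only starts once the accumulated ratio exceeds 10. The function name is made up for illustration, and this is not the actual BOINC scheduler code.

```python
def steps_to_exceed(ratio_limit: float, step_factor: float) -> int:
    """Count how many multiplications by step_factor it takes to exceed ratio_limit."""
    if step_factor <= 1.0:
        raise ValueError("step_factor must be greater than 1")
    ratio = 1.0
    steps = 0
    while ratio <= ratio_limit:
        ratio *= step_factor
        steps += 1
    return steps


# Drift after 231 steps of 1% each (the figure quoted in the post).
drift_after_231 = 1.01 ** 231
print(round(drift_after_231, 2))

# First step count at which the assumed limit of 10 is passed.
print(steps_to_exceed(10.0, 1.01))
```

Under those assumptions the ratio stays just below 10 after 231 steps and only passes it on the next one, which matches the point made in the post.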
For all host app versions, the first 10 validations are critical. If the APR isn't somewhat reasonable at that point it will take a lot of good runtimes to shift it to a better approximation. That's where the runtime_outlier logic will be helpful, and I too hope the Astropulse validator code will be updated soon. The project gets about 600 new hosts a day (either really new or new hostID), it's not nice to leave them exposed to a known weakness in the system. And when v7 goes live on SETI, the project will get about a quarter of a million new hosts in the first month - at least, as far as the application_details are concerned. We really ought to prevail on David to consider that number before the event..... Anonymous_platform hosts return their GPU hardware characteristics in sched_request. I really don't see why that can't be used to seed APR with a first approximation, instead of using a totally irrelevant CPU metric. | |
| ID: 1160263 · | |
|
Are there any plans of making version 6.12 usable by high end hosts before trying to roll to V7? | |
| ID: 1160264 · | |
Are there any plans of making version 6.12 usable by high end hosts before trying to roll to V7? Versions of what? The discussion with Joe was about the SETI science application - currently at v6.03 for CPUs, v6.08/09/10 for CUDA GPUs. Version 6.12 sounds like a BOINC version number - I'm not having any problems with BOINC v6.12.34, though I don't run what you would call a 'high end host'. What issues make it unusable? I haven't seen any reported on the boinc_alpha mailing list: that would be a better venue than here for discussing BOINC issues, though I can pass on messages if needed. | |
| ID: 1160267 · | |
|
Still nothing uploading from here. I do not believe Bruno is allowing them. I see his vote monitor function is disabled. | |
| ID: 1160271 · | |
Still nothing uploading from here. I do not believe Bruno is allowing them. I see his vote monitor function is disabled. Uploads work fine for almost everyone else. You're having other problems, perhaps the HE issue. ____________ /The grumpy old Swede. "I'm so old, that 98% of all trees in the forest, are younger than I am" | |
| ID: 1160275 · | |
|
Why would it start now? Never happened before. | |
| ID: 1160277 · | |
Why would it start now? Never happened before. It never happened to others before either, until it suddenly did. My car hasn't broken down before either, but it will sooner or later. The HE connection problem seems to hit randomly. Some are hit by it, others never. ____________ /The grumpy old Swede. "I'm so old, that 98% of all trees in the forest, are younger than I am" | |
| ID: 1160283 · | |
Why would it start now? Never happened before. We can't answer the 'why' until we've worked out what 'it' is. For starters, have you tried the basic network tests (ping and tracert) to the upload server? Check: ping 208.68.240.16 and then tracert setiboincdata.ssl.berkeley.edu | |
| ID: 1160285 · | |
Why would it start now? Never happened before. Not very helpful! If you are right, then I have no idea how to do anything about it. Other threads are gobbledegook to me! | |
| ID: 1160286 · | |
Are there any plans of making version 6.12 usable by high end hosts before trying to roll to V7? I presume you're referring to the increased backoffs in BOINC 6.12.x, and as that's a fundamental design of the series I don't expect the BOINC devs to modify it. They're in bugfix-only mode for that branch, and of course they assume that, because 6.12 is the recommended version, it's reasonable to consider its effects as if all users had adopted the recommendation. The issue here isn't really the backoffs so much as work delivery, and I hope that some progress in being able to deliver what is assigned can be made before S@h v7 is rolled out. I don't know what's possible within the University of California hierarchy though. Joe | |
| ID: 1160293 · | |
Why would it start now? Never happened before. You are dealing with an idiot here. How do I do that and what do any results mean? | |
| ID: 1160294 · | |
... when v7 goes live on SETI, the project will get about a quarter of a million new hosts in the first month - at least, as far as the application_details are concerned. We really ought to prevail on David to consider that number before the event..... And v7 should have CPU, OpenCL ATI, CUDA NVIDIA, and maybe OpenCL NVIDIA application versions, so multiply that quarter of a million by maybe 2 to get the effective "active applications" count... Joe | |
| ID: 1160296 · | |
|
I found out how to do it!
Check You are dealing with an idiot here. How do I do that and what do any results mean?
Microsoft Windows [Version 6.0.6002]
Copyright (c) 2006 Microsoft Corporation. All rights reserved.
C:\Windows\system32>ping 208.68.240.16
Pinging 208.68.240.16 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Ping statistics for 208.68.240.16:
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),
C:\Windows\system32>tracert setiboincdata.ssl.berkeley.edu
Tracing route to setiboincdata.ssl.berkeley.edu [208.68.240.16]
over a maximum of 30 hops:
1 57 ms 99 ms 99 ms 192.168.254.254
2 46 ms 48 ms 49 ms anchor-hg-3-lo100.router.demon.net [194.159.161.34]
3 47 ms 47 ms 47 ms anchor-access-4-s2010.router.demon.net [194.217.23.37]
4 48 ms 46 ms 47 ms gi7-0-0-dar3.lah.uk.cw.net [194.159.161.90]
5 47 ms 48 ms 46 ms xe-0-1-0-xur1.lns.uk.cw.net [193.195.25.70]
6 52 ms 48 ms 48 ms lonap.he.net [193.203.5.128]
7 134 ms 130 ms 130 ms 10gigabitethernet6-3.core1.ash1.he.net [72.52.92.137]
8 207 ms 210 ms 201 ms 10gigabitethernet7-4.core1.pao1.he.net [184.105.213.177]
9 * * * Request timed out.
10 * * * Request timed out.
11 * * * Request timed out.
12 * * * Request timed out.
13 * * * Request timed out.
14 * * * Request timed out.
15 * * * Request timed out.
16 * * * Request timed out.
17 * * * Request timed out.
18 * * * Request timed out.
19 * * * Request timed out.
20 * * * Request timed out.
21 * * * Request timed out.
22 * * * Request timed out.
23 * * * Request timed out.
24 * * * Request timed out.
25 * * * Request timed out.
26 * * * Request timed out.
27 * * * Request timed out.
28 * * * Request timed out.
29 * * * Request timed out.
30 * * * Request timed out.
Trace complete.
C:\Windows\system32> | |
| ID: 1160300 · | |
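A trace like the one above is easier to read if you only look for the last hop that answered. The Python sketch below does that for Windows tracert text. The sample is abridged from the post (with the wrapped IP addresses joined back together), the function and constant names are made up for illustration, and the pattern assumes the usual Windows layout of hop number, three timing columns, then the host.

```python
import re
from typing import Optional, Tuple

# Hop number, three timing columns ("47 ms", "<1 ms" or "*"), then the host text.
HOP_LINE = re.compile(r"^\s*(\d+)\s+(?:(?:<?\d+\s+ms|\*)\s+){3}(.+?)\s*$")


def last_responding_hop(trace: str) -> Optional[Tuple[int, str]]:
    """Return (hop number, host) for the last hop that did not time out."""
    last = None
    for line in trace.splitlines():
        match = HOP_LINE.match(line)
        if match is None:
            continue  # header, footer or blank line
        hop, target = int(match.group(1)), match.group(2)
        if target.startswith("Request timed out"):
            continue
        last = (hop, target)
    return last


# Abridged from the trace posted in this thread.
SAMPLE_TRACE = """\
Tracing route to setiboincdata.ssl.berkeley.edu [208.68.240.16]
over a maximum of 30 hops:
  6    52 ms    48 ms    48 ms  lonap.he.net [193.203.5.128]
  7   134 ms   130 ms   130 ms  10gigabitethernet6-3.core1.ash1.he.net [72.52.92.137]
  8   207 ms   210 ms   201 ms  10gigabitethernet7-4.core1.pao1.he.net [184.105.213.177]
  9     *        *        *     Request timed out.
 10     *        *        *     Request timed out.
Trace complete.
"""

print(last_responding_hop(SAMPLE_TRACE))
```

For the sample, the last hop that answers is hop 8 on he.net, with nothing but timeouts after it.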
Are there any plans of making version 6.12 usable by high end hosts before trying to roll to V7? ATM only 5 of the 40 "Top Hosts" are running v6.12; the rest are running v6.10. I think the main problem is the increased back-off times. It's no good looking at a backed-off download queue when your CPUs and GPUs are sitting back scratching their nether regions. What could show interesting figures is if the "in progress" figures on the "tasks" page for each machine showed how many in-progress tasks were still awaiting download. ____________ Kevin | |
| ID: 1160301 · | |
Why would it start now? Never happened before. I refuse to accept that I'm dealing with an idiot. I may very well be dealing with someone who has expertise in some subject area different from computing, but that's not the same thing at all. OK, one at a time. First open a "Command Prompt" window - similar to what we used to use as a 'DOS prompt'. There are many ways of doing that, so - since I don't know whether you'll be using your Vista machine or one of your XP machines for this - here's a way which should work with the default settings on any of them. Click the 'Start' button, click on 'All programs'. From the list, click on 'Accessories' (yellow folder icon), and you should see 'Command Prompt' near the top of the (alphabetical) list. Click it. In the command prompt window which opens, type that first line I gave you, exactly as it stands: ping 208.68.240.16 and press the return key at the end. Then wait. After a few seconds, you should see four lines of results. Either: lines of numbers, starting with 'Reply from'. That's good. Or: "Request timed out". That's bad. Which do you get? | |
| ID: 1160303 · | |
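The good/bad rule of thumb in the post above can be written down as a tiny Python check on pasted ping output. This is only a sketch: the function name is made up, the timed-out sample is the one posted in this thread, and the "Reply from" sample is an illustrative line, not real output from the upload server.

```python
def classify_ping(output: str) -> str:
    """Classify Windows ping output as 'good', 'bad', 'mixed' or 'unknown'."""
    lines = [line.strip() for line in output.splitlines()]
    replies = sum(line.startswith("Reply from") for line in lines)
    timeouts = sum(line.startswith("Request timed out") for line in lines)
    if replies and timeouts:
        return "mixed"
    if replies:
        return "good"
    if timeouts:
        return "bad"
    return "unknown"


# Output as posted in this thread: every request timed out.
TIMED_OUT = """\
Pinging 208.68.240.16 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.
"""

# Illustrative only: the shape of a successful reply line, with made-up timings.
REPLIED = """\
Pinging 208.68.240.16 with 32 bytes of data:
Reply from 208.68.240.16: bytes=32 time=180ms TTL=50
"""

print(classify_ping(TIMED_OUT))
print(classify_ping(REPLIED))
```

Four timeouts in a row count as "bad" here, which is the result reported a couple of posts up.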
Either: lines of numbers, starting with 'Reply from'. That's good. All bad then! | |
| ID: 1160306 · | |
I found out how to do it! There! I said I wasn't dealing with an idiot - you beat me to it :-) Both of those are classic symptoms of the Hurricane Electric connection problem - especially since you get the line referencing "10gigabitethernet7-4.core1.pao1.he.net", and nothing but asterisks below that. You could wait until Jeff's new memory has arrived, and until they've figured out a way to break into the security cage - or, since we're on a roll, you could try using a proxy. Look in the 'Temporary Fix...' thread, and see what proxies have been mentioned as working recently. The newest one (at the time of writing) seems to be 216.24.193.211:8080. Open BOINC Manager, in Advanced View. Assuming it's one of your BOINC v6.12.34 machines, go to the Tools menu, and click on 'Display and network options'. Click on the third tab, 'HTTP Proxy'. Check 'Connect via HTTP proxy server'. Put 216.24.193.211 in the address box. Put 8080 in the Port box. (Those are the two halves of the proxy line above, split at the ':'. If you try a different proxy, do the same thing - splitting it into 'address' and 'port' - with any other proxy description.) Leave the rest blank, and click 'OK'. Now retry your uploads. Judging by what people have said in the threads, you may need to experiment with different proxies until you find one which works for you. It may also be slow, but if it works at all, that's better than nothing. | |
| ID: 1160311 · | |
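The "split it at the ':'" step from the post above looks like this in Python. It is a small sketch with a made-up function name; it only splits and sanity-checks the text, and does not test whether the proxy actually works.

```python
from typing import Tuple


def split_proxy(entry: str) -> Tuple[str, int]:
    """Split an 'address:port' proxy description into its two halves."""
    address, separator, port = entry.strip().rpartition(":")
    if not separator or not address or not port.isdigit():
        raise ValueError(f"expected 'address:port', got {entry!r}")
    port_number = int(port)
    if not 1 <= port_number <= 65535:
        raise ValueError(f"port out of range: {port_number}")
    return address, port_number


# The proxy mentioned in the post: address goes in one box, port in the other.
address, port = split_proxy("216.24.193.211:8080")
print(address)
print(port)
```

The first value goes in the address box and the second in the Port box of the 'HTTP Proxy' tab.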
|
I was down to my last work unit, which was an AP, and couldn't get any work all day. Now this might be a coincidence, but I pinged per John's instructions and now I have a ton of work downloading. But I also finished that AP at the same time. | |
| ID: 1160312 · | |
| Copyright © 2013 University of California |