Message boards :
Number crunching :
Can't report or get new tasks
Author | Message |
---|---|
dahls Send message Joined: 24 Oct 04 Posts: 135 Credit: 178,942,502 RAC: 217 |
This http://setiathome.berkeley.edu/result.php?resultid=1911973186 has already timed out. From the client_state.xml: <result> <name>30no10ac.2377.138478.4.10.197_1</name> <final_cpu_time>4374.346997</final_cpu_time> <final_elapsed_time>4406.200996</final_elapsed_time> <exit_status>0</exit_status> <state>5</state> <platform>x86_64-pc-linux-gnu</platform> <version_num>528</version_num> <fpops_cumulative>28857695071522.894531</fpops_cumulative> <stderr_out> <![CDATA[ <stderr_txt> Unrecognized XML in parse_init_data_file: hostid Skipping: 5256434 Skipping: /hostid Unrecognized XML in parse_init_data_file: starting_elapsed_time Skipping: 0.000000 Skipping: /starting_elapsed_time Unrecognized XML in parse_init_data_file: computation_deadline Skipping: 1306694498.136000 Skipping: /computation_deadline Unrecognized XML in GLOBAL_PREFS::parse_override: mod_time Skipping: /mod_time Unrecognized XML in GLOBAL_PREFS::parse_override: run_gpu_if_user_active Skipping: 0 Skipping: /run_gpu_if_user_active Unrecognized XML in GLOBAL_PREFS::parse_override: max_ncpus_pct Skipping: 100.000000 Skipping: /max_ncpus_pct Unrecognized XML in parse_init_data_file: hostid Skipping: 5256434 Skipping: /hostid Unrecognized XML in parse_init_data_file: starting_elapsed_time Skipping: 0.000000 Skipping: /starting_elapsed_time Unrecognized XML in parse_init_data_file: computation_deadline Skipping: 1306694498.136000 Skipping: /computation_deadline Unrecognized XML in GLOBAL_PREFS::parse_override: mod_time Skipping: /mod_time Unrecognized XML in GLOBAL_PREFS::parse_override: run_gpu_if_user_active Skipping: 0 Skipping: /run_gpu_if_user_active Unrecognized XML in GLOBAL_PREFS::parse_override: max_ncpus_pct Skipping: 100.000000 Skipping: /max_ncpus_pct setiathome_enhanced 5.28 Revision: 26 g++ (GCC) 4.1.2 (Ubuntu 4.1.2-0ubuntu4) libboinc: BOINC 6.1.0 Work Unit Info: ............... 
WU true angle range is : 4.875261 Optimal function choices: ----------------------------------------------------- name ----------------------------------------------------- v_BaseLineSmooth (no other) v_vGetPowerSpectrumUnrolled 0.00025 0.00000 v_ChirpData 0.01037 0.00000 v_vTranspose4x16ntw 0.00359 0.00000 AK SSE folding 0.00062 0.00000 Flopcounter: 10120524941780.222656 Spike count: 0 Pulse count: 0 Triplet count: 2 Gaussian count: 0 </stderr_txt> ]]> </stderr_out> <ready_to_report/> <completed_time>1305809600.339605</completed_time> <wu_name>30no10ac.2377.138478.4.10.197</wu_name> <report_deadline>1306698099.000000</report_deadline> <received_time>1305509100.678899</received_time> <file_ref> <file_name>30no10ac.2377.138478.4.10.197_1_0</file_name> <open_name>result.sah</open_name> </file_ref> </result> To me it seems like something IS wrong. Please confirm if I'm right :) |
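The `<received_time>`, `<completed_time>` and `<report_deadline>` values in the excerpt above look like Unix epoch seconds. A minimal Python sketch, assuming they are ordinary UTC-based epoch values, converts them to readable dates and works out how much slack remained between completion and the deadline. The helper name `to_utc` is only illustrative and is not part of BOINC.

```python
from datetime import datetime, timezone

# Values copied from the <result> block quoted in the post above.
received_time = 1305509100.678899
completed_time = 1305809600.339605
report_deadline = 1306698099.000000


def to_utc(epoch_seconds: float) -> datetime:
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Time between finishing the task and the report deadline, in days.
slack_days = (report_deadline - completed_time) / 86400

if __name__ == "__main__":
    print("received :", to_utc(received_time).isoformat())
    print("completed:", to_utc(completed_time).isoformat())
    print("deadline :", to_utc(report_deadline).isoformat())
    print(f"slack    : {slack_days:.1f} days")
```

Under that assumption the task finished well before its deadline, which would fit the later discussion in this thread: the result timed out because it was never reported, not because it was computed late.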
rob smith Send message Joined: 7 Mar 03 Posts: 22240 Credit: 416,307,556 RAC: 380 |
While S@H is getting over the latest unplanned outage, don't expect things to be normal. Tables will be out of sync for a couple of days to come, and there will be times when WUs don't get through in either direction. One or more of your PCs may not be able to return, or collect, data for no apparent reason. For example, one of my PCs has returned about 50 WUs, effectively emptying its cache, but has only received two WUs in the last few days; another has returned about one hundred WUs and has completely replenished its cache; one WU was stuck in the output channel for the best part of a day until a retry finally didn't time out. Watching the time-out/retry process, it's obvious there are quite a number of factors in play, but in general it's as if the time-out setting is a bit tight under the current server load conditions, which is leading to a very high proportion of retry traffic, which is making the situation worse. It is possible that a small increase in the time-out would improve the overall throughput quite markedly, so reducing the load on the servers and the amount of traffic on S@H's internet connection... Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Gundolf Jahn Send message Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
Take a look at the list of tasks. Many of them are in progress according to the list, but most of them have completed. As long as a task is in your client_state.xml, it hasn't been reported, so it is listed as in progress and can even still time out. See to it that your tasks get reported after uploading them; otherwise you might have wasted time and energy. Gruß, Gundolf |
dahls Send message Joined: 24 Oct 04 Posts: 135 Credit: 178,942,502 RAC: 217 |
Take a look at the list of tasks. Many of them are in progress according to the list, but most of them have completed. And how am I supposed to do that? On all my other machines, this is done automatically... |
perryjay Send message Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
Hi Dahls, Try going to your BOINC Manager, advanced view, projects tab, click on SETI, and hit the update button. This should report any completed work. PS: Also make sure you see a button there that says "No new tasks". If it says "Allow new tasks", click it. You may also check in the activity tab to make sure you have allowed network activity. PROUD MEMBER OF Team Starfire World BOINC |
dahls Send message Joined: 24 Oct 04 Posts: 135 Credit: 178,942,502 RAC: 217 |
Hi Dahls, Sorry, I forgot to say that this machine is a server without a monitor. Command line only. I have tried different commands: ./boinccmd --file_tramsfer http://setiathome.berkeley.edu/ 13dc10af.17799.90277.11.10.163_0 retry ./boinccmd --file_transfer http://setiathome.berkeley.edu/ 13dc10af.17799.90277.11.10.163_0 retry ./boinccmd --file_transfer http://setiathome.berkeley.edu/ 13dc10af.17799.90277.11.10.158_0 retry ...but that does not seem to change the current state. Still, the last recorded contact, according to the "Your computers" list, is May 18th for this machine! BTW, I just stopped downloading new tasks (nomorework). |
Gundolf Jahn Send message Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
No, that's a retry to upload a task. perryjay spoke about a project update. So, the command should be boinccmd --project http://setiathome.berkeley.edu/ update Here is an excerpt from boinccmd --help: usage: boinc_cmd [--host hostname] [--passwd passwd] command Commands: --project URL op project operation op = reset | detach | update | suspend | resume | nomorework | allowmorework Did you check the client log? Its name should contain "stdoutdae". BTW, BOINC clients can be controlled remotely over a LAN. Gruß, Gundolf |
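For a headless machine, the project operation described above can be wrapped in a small script. The following is a hedged Python sketch: the set of operations comes from the `boinccmd --help` excerpt quoted in the post, while the function names `build_project_cmd` and `run_project_op` and the default `./boinccmd` path are assumptions made for illustration.

```python
import subprocess
from typing import List

# Operations listed in the `boinccmd --help` excerpt quoted above.
PROJECT_OPS = {
    "reset", "detach", "update", "suspend",
    "resume", "nomorework", "allowmorework",
}


def build_project_cmd(url: str, op: str, boinccmd: str = "./boinccmd") -> List[str]:
    """Build the argument list for a `--project URL op` call."""
    if op not in PROJECT_OPS:
        # e.g. "retry" belongs to --file_transfer, not to --project
        raise ValueError(f"unsupported project operation: {op!r}")
    return [boinccmd, "--project", url, op]


def run_project_op(url: str, op: str) -> subprocess.CompletedProcess:
    """Run the command; requires a local BOINC client to talk to."""
    return subprocess.run(
        build_project_cmd(url, op),
        capture_output=True, text=True, check=False,
    )


if __name__ == "__main__":
    # Only prints the command line, so it is safe without BOINC installed.
    print(" ".join(build_project_cmd("http://setiathome.berkeley.edu/", "update")))
```

Rejecting unknown operations up front makes the upload-retry versus project-update mix-up discussed here harder to repeat in a script.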
dahls Send message Joined: 24 Oct 04 Posts: 135 Credit: 178,942,502 RAC: 217 |
No, that's a retry to upload a task. perryjay spoke about a project update. So, the command should be I have tried './boinccmd --project URL update'. Output can be found at http://www.dahl-stamnes.net/dahls/Ymse/boinc.110530.html The last few lines indicate that 14dc10ac.31820.19290.9.10.247.vlar_0_0 was uploaded, but the status for http://setiathome.berkeley.edu/result.php?resultid=1914836952 is still "in Progress". A './boinccmd --get_results' shows: name: 14dc10ac.31820.19290.9.10.247.vlar_0 WU name: 14dc10ac.31820.19290.9.10.247.vlar project URL: http://setiathome.berkeley.edu/ report deadline: Mon Jul 4 03:23:41 2011 ready to report: yes got server ack: no final CPU time: 12402.417545 state: 5 scheduler state: 0 exit_status: 0 signal: 0 suspended via GUI: no active_task_state: 0 stderr_out: app version num: 0 checkpoint CPU time: 0.000000 current CPU time: 0.000000 fraction done: 0.000000 swap size: 0.000000 working set size: 0.000000 estimated CPU time remaining: 0.000000 supports graphics: no (Perhaps some kind soul could explain what the different status codes, 5 in this case, mean?) BTW, what causes the "Last contact" column to be updated for a computer (on the "Your computers" page)? The last known date for contact with this computer is, according to this page, May 18th. But the log indicates that it has downloaded several working sets since then, and tried to upload and report results... For all the other computers the "Last contact" column seems to be OK. 2nd BTW: Thanks for all the feedback so far. :) |
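The `--get_results` output above already carries the two fields that matter for this problem: "ready to report" and "got server ack". As a rough Python sketch, assuming the real output has one `key: value` pair per line and that each task starts with a `name:` line (the post shows it flattened), the unreported tasks could be listed like this. The function names are hypothetical.

```python
from typing import Dict, List

# A shortened sample built from the fields quoted in the post above.
SAMPLE = """\
   name: 14dc10ac.31820.19290.9.10.247.vlar_0
   WU name: 14dc10ac.31820.19290.9.10.247.vlar
   project URL: http://setiathome.berkeley.edu/
   report deadline: Mon Jul 4 03:23:41 2011
   ready to report: yes
   got server ack: no
   state: 5
"""


def parse_results(text: str) -> List[Dict[str, str]]:
    """Split `key: value` lines into one dict per task."""
    tasks: List[Dict[str, str]] = []
    for raw in text.splitlines():
        if ":" not in raw:
            continue  # skip separators and headings
        key, _, value = raw.strip().partition(":")
        key, value = key.strip(), value.strip()
        if key == "name":
            tasks.append({})  # a `name:` line starts a new task
        if tasks:
            tasks[-1][key] = value
    return tasks


def unreported(tasks: List[Dict[str, str]]) -> List[str]:
    """Names of tasks that are finished but not yet acknowledged."""
    return [
        t["name"] for t in tasks
        if t.get("ready to report") == "yes" and t.get("got server ack") == "no"
    ]


if __name__ == "__main__":
    print(unreported(parse_results(SAMPLE)))
```

A non-empty list after a project update would suggest the scheduler request itself is failing, which is the direction the thread takes next.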
Gundolf Jahn Send message Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
The last few lines indicate that 14dc10ac.31820.19290.9.10.247.vlar_0_0 was uploaded, but the status for http://setiathome.berkeley.edu/result.php?resultid=1914836952 is still "in Progress". Once again, uploading doesn't change anything about the "in Progress" state. You must report an uploaded task to get it acknowledged. However, your host obviously has problems with that: 30-May-2011 18:36:33 [SETI@home] Reporting 287 completed tasks, not requesting new tasks 30-May-2011 18:36:33 [---] [http_debug] HTTP_OP::init_post(): http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi 30-May-2011 18:36:35 [---] [http_debug] [ID#4] info: About to connect() to setiboinc.ssl.berkeley.edu port 80 (#2) 30-May-2011 18:36:35 [---] [http_debug] [ID#4] info: Trying 208.68.240.20... 30-May-2011 18:36:35 [---] [http_debug] [ID#4] info: Connected to setiboinc.ssl.berkeley.edu (208.68.240.20) port 80 (#2) 30-May-2011 18:36:35 [---] [http_debug] [ID#4] Sent header to server: POST /sah_cgi/cgi HTTP/1.1 User-Agent: BOINC client (x86_64-pc-linux-gnu 6.10.17) Host: setiboinc.ssl.berkeley.edu Accept: */* Accept-Encoding: deflate, gzip Content-Type: application/x-www-form-urlencoded Content-Length: 829814 Expect: 100-continue 30-May-2011 18:36:35 [---] [http_debug] [ID#4] Received header from server: HTTP/1.1 100 Continue 30-May-2011 18:37:38 [---] [http_debug] [ID#4] Received header from server: HTTP/1.1 500 Internal Server Error What has happened to that server after May 18th? Check whether setting <http_1_0>1</http_1_0> in the <options> part of your cc_config.xml helps with the problem. Gruß, Gundolf |
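The pattern in the log above (a `100 Continue` followed about a minute later by a `500 Internal Server Error`) can be picked out of a long `http_debug` log mechanically. This is a minimal Python sketch under two assumptions: each log entry sits on its own line in the real file, and the month abbreviation parses under an English locale. The names `status_events` and `seconds_to_first_error` are illustrative only.

```python
import re
from datetime import datetime
from typing import List, Optional, Tuple

# Three entries copied from the log quoted in the post above.
LOG = """\
30-May-2011 18:36:33 [SETI@home] Reporting 287 completed tasks, not requesting new tasks
30-May-2011 18:36:35 [---] [http_debug] [ID#4] Received header from server: HTTP/1.1 100 Continue
30-May-2011 18:37:38 [---] [http_debug] [ID#4] Received header from server: HTTP/1.1 500 Internal Server Error
"""

HEADER_RE = re.compile(
    r"^(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) .*"
    r"Received header from server: HTTP/\d\.\d (\d{3})"
)


def status_events(log: str) -> List[Tuple[datetime, int]]:
    """Return (timestamp, HTTP status) for each received status line."""
    events = []
    for line in log.splitlines():
        match = HEADER_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%d-%b-%Y %H:%M:%S")
            events.append((stamp, int(match.group(2))))
    return events


def seconds_to_first_error(log: str) -> Optional[float]:
    """Seconds from the first status line to the first 5xx, if any."""
    events = status_events(log)
    if not events:
        return None
    start = events[0][0]
    for stamp, code in events:
        if code >= 500:
            return (stamp - start).total_seconds()
    return None


if __name__ == "__main__":
    print(status_events(LOG))
    print(seconds_to_first_error(LOG))
```

Running the same scan over a log taken after changing `cc_config.xml` would be one way to see whether the server-side error still appears at the same point.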
dahls Send message Joined: 24 Oct 04 Posts: 135 Credit: 178,942,502 RAC: 217 |
It has been up and running the last 266 days. It's a SQL server. I'll try to set http_1_0 to 1 and see what happens during the night. BTW, I attached the machine to Einstein@home today. Seems like I'm getting the same kind of error when trying to report completed work sets... :( |
dahls Send message Joined: 24 Oct 04 Posts: 135 Credit: 178,942,502 RAC: 217 |
Short update. After I added the http_1_0 option to cc_config.xml and restarted the manager, I got this line: libgcc_s.so.1 must be installed for pthread_cancel to work It's not in the stdout, probably stderr. Ran 'yum search libgcc' and 'yum install libgcc', which said: # yum search libgcc Loaded plugins: refresh-packagekit ========================================================= Matched: libgcc ========================================================== libgcc.i686 : GCC version 4.4 shared support library libgcc.x86_64 : GCC version 4.4 shared support library # yum install libgcc Loaded plugins: refresh-packagekit Setting up Install Process Package libgcc-4.4.4-10.fc12.x86_64 already installed and latest version Nothing to do |
dahls Send message Joined: 24 Oct 04 Posts: 135 Credit: 178,942,502 RAC: 217 |
It has been crunching SETI data...
Did not seem to do any good. Status is still the same. I also noticed that it has processed several Einstein working sets too since yesterday, but Einstein@home says that it has not been in contact with this machine since I attached to it yesterday. The log indicates that the completed Einstein work has been uploaded. As I understand it, there is a difference between uploading and reporting completed data - right? If so, it seems like this machine is able to upload data but not able to report it. And it makes me wonder... |
Khangollo Send message Joined: 1 Aug 00 Posts: 245 Credit: 36,410,524 RAC: 0 |
I had a similar problem with scheduler RPCs. The problem was a transparent HTTP proxy (apparently weird/misconfigured) and BOINC's use of HTTP 1.1. The http_1_0 option solved the problem. Are you sure you created cc_config.xml correctly (in the BOINC data root directory) and restarted the BOINC client (not the manager)? <cc_config> <options> <http_1_0>1</http_1_0> </options> </cc_config> If that doesn't help, try connecting through an HTTP proxy if you run one or your ISP has one. You can configure it with BOINC Manager (Options). |
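Since a malformed `cc_config.xml` is one of the things being ruled out here, the file can be generated and checked programmatically. Below is a small Python sketch using only the standard library. It reproduces the three-element structure quoted in the post; the function names are hypothetical, and writing the file into the BOINC data directory is deliberately left out because that path differs between installations.

```python
import xml.etree.ElementTree as ET


def make_cc_config(http_1_0: bool = True) -> str:
    """Build the minimal cc_config.xml shown in the post above."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "http_1_0").text = "1" if http_1_0 else "0"
    return ET.tostring(root, encoding="unicode")


def http_1_0_enabled(xml_text: str) -> bool:
    """True if the text is well-formed and sets options/http_1_0 to 1."""
    root = ET.fromstring(xml_text)  # raises ParseError on broken XML
    value = root.findtext("options/http_1_0", default="0")
    return root.tag == "cc_config" and value.strip() == "1"


if __name__ == "__main__":
    text = make_cc_config()
    print(text)
    print(http_1_0_enabled(text))
```

Parsing an existing file with `http_1_0_enabled` is a quick way to catch a typo in a tag name, although it cannot tell whether the client was restarted and actually picked the option up.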
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
And it would be a good idea to post extracts from your message logs with [http_debug], as you did right at the very beginning of this thread. Not the whole thing - just the first 20 lines or so at the start (that should show us whether the http_1_0 has worked), and another for the http_debug after you've clicked update to initiate a scheduler reporting request. [As you've realised, uploading and reporting are two different things. The log you posted on 29 May was an upload event. Ah, found your externally-hosted log now. Some of that rings a bell: people on other projects have been mentioning those very slow transfers. I'll have a look round]. |
dahls Send message Joined: 24 Oct 04 Posts: 135 Credit: 178,942,502 RAC: 217 |
And it would be a good idea to post extracts from your message logs with [http_debug], as you did right at the very beginning of this thread. Not the whole thing - just the first 20 lines or so at the start (that should show us whether the http_1_0 has worked), and another for the http_debug after you've clicked update to initiate a scheduler reporting request. Here is the latest output from boinc: http://www.dahl-stamnes.net/dahls/Ymse/boinc.110531.html. An update command was issued at 22:33:26 in this log. No data has been reported for seti@home and einstein@home, but a lot of data has been processed...
And what is the difference? How am I supposed to force a report (or is that what I'm doing when I request an update)? Please: no clicking instructions... I'm using the command-line interface on a Linux machine. :) |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
[As you've realised, uploading and reporting are two different things. The log you posted on 29 May was an upload event. Ah, found your externally-hosted log now. Some of that rings a bell: people on other projects have been mentioning those very slow transfers. I'll have a look round]. Found the thread I was looking for: it's at climateprediction.net. The suggestion by Thyme Lawn (a technically-competent moderator on that project) is: It looks like your proxy server has allowed BOINC (strictly speaking libcurl) to send it more data than it can pass on to the upload server within BOINC's 5 minute upload inactivity timeout (if BOINC doesn't receive an acknowledgement for the data it has passed to the proxy server before the inactivity timeout expires it classes the connection as failed). In other words: His (and possibly your) computer can send data from BOINC to the proxy very quickly. The proxy can only send data to the project slowly - very slowly when the communications channels are saturated. This confuses BOINC monitoring and timing processes, and you get the 30-May-2011 18:41:43 [---] [http_debug] [ID#4] info: Operation too slow. Less than 10 bytes/sec transfered the last 300 seconds shown in your external log. Sorry about the click. I assumed that as a server operator, you would be able to translate that into the equivalent --project URL operation from Boinccmd. A GUI update is the same as a Boinccmd update: yes, that's what triggers the reporting of any completed work. |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Quite aside from the eventual 5 minute timeout, dahls' logs invariably show an HTTP error 500 about one minute after initiating the POST to communicate with the Scheduler. It puzzles me that the 6.10.17 client doesn't stop trying at that point and instead waits for Curl to time out the attempt. For this latest log, that error 500 is coming from a server running CentOS; for the earlier log at http://www.dahl-stamnes.net/dahls/Ymse/boinc.110530.html the server was running Fedora. That probably isn't related; it may just suggest the project staff are experimenting to see if CentOS will be more capable or stable. It's not the 800+ KiB post size either, since that has just gradually grown due to previous failures, and the Einstein attempts are failing in the same way with only 100+ KiB to post. Joe |
dahls Send message Joined: 24 Oct 04 Posts: 135 Credit: 178,942,502 RAC: 217 |
I wonder which proxy server that is. I don't use any proxy server. I also wonder why this problem occurs on this machine ONLY. I have other machines that have never had this kind of problem. The first time I got this problem was last year. Since then I have reinstalled BOINC. This makes BOINC and SETI work for some time, but each time I have reinstalled, the time it worked has become shorter and shorter. The last time I reinstalled was on May 18th... it never worked again after that. The 3rd thing I wonder about is how to fix this... Thanks for your reply :) |
Gundolf Jahn Send message Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
Did you try an advanced forum search already, here and at Einstein@home? Search for "500 Internal Server Error" (without quotation marks) and at least six months into the past. Gruß, Gundolf |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
How is that host connected to the Net? Could it be a poor wireless signal, or a duff network cable or NIC? I have tended to connect to the net via wireless hotspots; sometimes the connection is too slow to report WUs at all, while other projects where I don't have WUs report OK. Claggy |