Can't report or get new tasks

dahls Joined: 24 Oct 04 Posts: 135 Credit: 178,942,502 RAC: 217	Message 1111159 - Posted: 29 May 2011, 20:24:30 UTC

This http://setiathome.berkeley.edu/result.php?resultid=1911973186 has already timed out. From the client_state.xml:

<result>
    <name>30no10ac.2377.138478.4.10.197_1</name>
    <final_cpu_time>4374.346997</final_cpu_time>
    <final_elapsed_time>4406.200996</final_elapsed_time>
    <exit_status>0</exit_status>
    <state>5</state>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>528</version_num>
    <fpops_cumulative>28857695071522.894531</fpops_cumulative>
    <stderr_out>
<![CDATA[
<stderr_txt>
Unrecognized XML in parse_init_data_file: hostid
Skipping: 5256434
Skipping: /hostid
Unrecognized XML in parse_init_data_file: starting_elapsed_time
Skipping: 0.000000
Skipping: /starting_elapsed_time
Unrecognized XML in parse_init_data_file: computation_deadline
Skipping: 1306694498.136000
Skipping: /computation_deadline
Unrecognized XML in GLOBAL_PREFS::parse_override: mod_time
Skipping: /mod_time
Unrecognized XML in GLOBAL_PREFS::parse_override: run_gpu_if_user_active
Skipping: 0
Skipping: /run_gpu_if_user_active
Unrecognized XML in GLOBAL_PREFS::parse_override: max_ncpus_pct
Skipping: 100.000000
Skipping: /max_ncpus_pct
Unrecognized XML in parse_init_data_file: hostid
Skipping: 5256434
Skipping: /hostid
Unrecognized XML in parse_init_data_file: starting_elapsed_time
Skipping: 0.000000
Skipping: /starting_elapsed_time
Unrecognized XML in parse_init_data_file: computation_deadline
Skipping: 1306694498.136000
Skipping: /computation_deadline
Unrecognized XML in GLOBAL_PREFS::parse_override: mod_time
Skipping: /mod_time
Unrecognized XML in GLOBAL_PREFS::parse_override: run_gpu_if_user_active
Skipping: 0
Skipping: /run_gpu_if_user_active
Unrecognized XML in GLOBAL_PREFS::parse_override: max_ncpus_pct
Skipping: 100.000000
Skipping: /max_ncpus_pct
setiathome_enhanced 5.28 Revision: 26 g++ (GCC) 4.1.2 (Ubuntu 4.1.2-0ubuntu4)
libboinc: BOINC 6.1.0
Work Unit Info:
...............
WU true angle range is : 4.875261
Optimal function choices:
-----------------------------------------------------
name
-----------------------------------------------------
v_BaseLineSmooth (no other)
v_vGetPowerSpectrumUnrolled 0.00025 0.00000
v_ChirpData 0.01037 0.00000
v_vTranspose4x16ntw 0.00359 0.00000
AK SSE folding 0.00062 0.00000
Flopcounter: 10120524941780.222656
Spike count: 0
Pulse count: 0
Triplet count: 2
Gaussian count: 0
</stderr_txt>
]]>
    </stderr_out>
    <ready_to_report/>
    <completed_time>1305809600.339605</completed_time>
    <wu_name>30no10ac.2377.138478.4.10.197</wu_name>
    <report_deadline>1306698099.000000</report_deadline>
    <received_time>1305509100.678899</received_time>
    <file_ref>
        <file_name>30no10ac.2377.138478.4.10.197_1_0</file_name>
        <open_name>result.sah</open_name>
    </file_ref>
</result>

For me it seems like something IS wrong. Please confuse me if I'm right :)

ID: 1111159
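Editor's sketch: the `<result>` entry above can be checked mechanically. The Python below is a minimal illustration, not a BOINC tool. It assumes the `<result>` elements can be wrapped in a single root element and parsed as well-formed XML (a real client_state.xml may need more lenient handling), and the function name `unreported_results` is made up for this example. It lists results that carry `<ready_to_report/>` and compares `<report_deadline>` with a given time.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Trimmed copy of the <result> entry posted above, wrapped in a root
# element (the wrapper is an assumption made so the snippet parses).
SAMPLE = """<client_state>
<result>
    <name>30no10ac.2377.138478.4.10.197_1</name>
    <state>5</state>
    <ready_to_report/>
    <completed_time>1305809600.339605</completed_time>
    <report_deadline>1306698099.000000</report_deadline>
</result>
</client_state>"""


def unreported_results(xml_text, now):
    """Return (name, deadline, overdue) for results still waiting to be reported."""
    rows = []
    for result in ET.fromstring(xml_text).iter("result"):
        if result.find("ready_to_report") is None:
            continue  # not finished, or already acknowledged by the server
        deadline = datetime.fromtimestamp(
            float(result.findtext("report_deadline")), tz=timezone.utc
        )
        rows.append((result.findtext("name"), deadline, now > deadline))
    return rows


# Time of the forum post, used here as "now".
posted = datetime(2011, 5, 29, 20, 24, 30, tzinfo=timezone.utc)
for name, deadline, overdue in unreported_results(SAMPLE, posted):
    print(name, deadline.isoformat(), "OVERDUE" if overdue else "ok")
```

For the posted entry the deadline decodes to 29 May 2011, 19:41:39 UTC, about 43 minutes before the post was written, which agrees with the "timed out" status on the website.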

rob smith Volunteer moderator Volunteer tester Joined: 7 Mar 03 Posts: 22240 Credit: 416,307,556 RAC: 380	Message 1111172 - Posted: 29 May 2011, 20:59:41 UTC - in response to Message 1111159.

While S@H is getting over the latest unplanned outage, don't expect things to be normal. Tables will be out of sync for a couple of days to come, and there will be times when WUs don't get through in either direction. One or more of your PCs may not be able to return or collect data for no apparent reason. For example, one of my PCs has returned about 50 WUs, effectively emptying its cache, but has only received two WUs in the last few days; another has returned about one hundred WUs and has completely replenished its cache; one WU was stuck in the output channel for the best part of a day until the retry didn't time out.

Watching the time-out/retry process, it's obvious there are quite a number of factors in play, but in general it's as if the time-out setting is a bit tight under the current server load conditions. That leads to a very high proportion of retry traffic, which makes the situation worse. It is possible that a small increase in the time-out would improve overall throughput quite markedly, reducing the load on the servers and the amount of traffic on S@H's internet connection.

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

ID: 1111172

Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0	Message 1111196 - Posted: 29 May 2011, 21:53:52 UTC - in response to Message 1111153.

> Take a look at the list of tasks. Many of them are in progress according to the list, but most of them have completed. As an example: http://setiathome.berkeley.edu/result.php?resultid=1910005227 claims to be in progress. But the client_state.xml file indicates that it has completed (if I have understood the file correctly).

As long as a task is in your client_state.xml, it hasn't been reported; it is therefore listed as in progress and can even still time out. Make sure your tasks get reported after uploading them, otherwise you might have wasted time and energy.

Gruß, Gundolf

ID: 1111196

dahls Joined: 24 Oct 04 Posts: 135 Credit: 178,942,502 RAC: 217	Message 1111372 - Posted: 30 May 2011, 13:43:27 UTC - in response to Message 1111196.

>> Take a look at the list of tasks. Many of them are in progress according to the list, but most of them have completed. As an example: http://setiathome.berkeley.edu/result.php?resultid=1910005227 claims to be in progress. But the client_state.xml file indicates that it has completed (if I have understood the file correctly).
> As long as a task is in your client_state.xml, it hasn't been reported; it is therefore listed as in progress and can even still time out. Make sure your tasks get reported after uploading them, otherwise you might have wasted time and energy.
> Gruß, Gundolf

And how am I supposed to do that? On all my other machines, this is done automatically...

ID: 1111372

perryjay Volunteer tester Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0	Message 1111383 - Posted: 30 May 2011, 14:18:11 UTC - in response to Message 1111372. Last modified: 30 May 2011, 14:21:42 UTC

Hi Dahls,

Try going to your BOINC Manager, advanced view, projects tab, click on SETI, and hit the update button. This should send in any completed work to be reported.

PS: Also make sure you see a button there that says "No new tasks". If it says "Allow new tasks", click it. You may also check in the activity tab to make sure you have allowed network activity.

PROUD MEMBER OF Team Starfire World BOINC

ID: 1111383

dahls Joined: 24 Oct 04 Posts: 135 Credit: 178,942,502 RAC: 217	Message 1111404 - Posted: 30 May 2011, 15:43:25 UTC - in response to Message 1111383.

> Hi Dahls,
> Try going to your BOINC Manager, advanced view, projects tab, click on SETI, and hit the update button. This should send in any completed work to be reported.
> PS: Also make sure you see a button there that says "No new tasks". If it says "Allow new tasks", click it. You may also check in the activity tab to make sure you have allowed network activity.

Sorry, I forgot to say that this machine is a server without a monitor. Command-line only. I have tried different commands:

./boinccmd --file_tramsfer http://setiathome.berkeley.edu/ 13dc10af.17799.90277.11.10.163_0 retry
./boinccmd --file_transfer http://setiathome.berkeley.edu/ 13dc10af.17799.90277.11.10.163_0 retry
./boinccmd --file_transfer http://setiathome.berkeley.edu/ 13dc10af.17799.90277.11.10.158_0 retry

...but that does not seem to change the current state. Still, the last recorded contact, according to the "Your computers" list, is May 18th for this machine!

BTW, I just stopped downloading new tasks (nomorework).

ID: 1111404

Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0	Message 1111428 - Posted: 30 May 2011, 16:51:20 UTC - in response to Message 1111404.

No, that's a retry to upload a task. perryjay spoke about a project update. So, the command should be

boinccmd --project http://setiathome.berkeley.edu/ update

Here is an excerpt from boinccmd --help:

usage: boinc_cmd [--host hostname] [--passwd passwd] command
Commands:
 --project URL op          project operation
   op = reset | detach | update | suspend | resume | nomorework | allowmorework

Did you check the client log? Its name should contain "stdoutdae".

BTW, BOINC clients can be controlled remotely over a LAN.

Gruß, Gundolf

ID: 1111428

dahls Joined: 24 Oct 04 Posts: 135 Credit: 178,942,502 RAC: 217	Message 1111484 - Posted: 30 May 2011, 19:29:34 UTC - in response to Message 1111428.

> No, that's a retry to upload a task. perryjay spoke about a project update. So, the command should be
> boinccmd --project http://setiathome.berkeley.edu/ update
> Here is an excerpt from boinccmd --help:
> usage: boinc_cmd [--host hostname] [--passwd passwd] command
> Commands:
>  --project URL op          project operation
>    op = reset | detach | update | suspend | resume | nomorework | allowmorework
> Did you check the client log? Its name should contain "stdoutdae".
> BTW, BOINC clients can be controlled remotely over a LAN.
> Gruß, Gundolf

I have tried './boinccmd --project URL update'. Output can be found at http://www.dahl-stamnes.net/dahls/Ymse/boinc.110530.html

The last few lines indicate that 14dc10ac.31820.19290.9.10.247.vlar_0_0 was uploaded, but the status for http://setiathome.berkeley.edu/result.php?resultid=1914836952 is still "in Progress". A './boinccmd --get_results' shows:

   name: 14dc10ac.31820.19290.9.10.247.vlar_0
   WU name: 14dc10ac.31820.19290.9.10.247.vlar
   project URL: http://setiathome.berkeley.edu/
   report deadline: Mon Jul 4 03:23:41 2011
   ready to report: yes
   got server ack: no
   final CPU time: 12402.417545
   state: 5
   scheduler state: 0
   exit_status: 0
   signal: 0
   suspended via GUI: no
   active_task_state: 0
   stderr_out:
   app version num: 0
   checkpoint CPU time: 0.000000
   current CPU time: 0.000000
   fraction done: 0.000000
   swap size: 0.000000
   working set size: 0.000000
   estimated CPU time remaining: 0.000000
   supports graphics: no

(Perhaps some kind soul would explain what the different status codes, 5 in this case, mean?)

BTW, what causes the "Last contact" column to be updated for a computer (on the "Your computers" page)? The last known date for contact with this computer is, according to this page, May 18th. But the log indicates that it has downloaded several working sets since then, and tried to upload and report results... For all the other computers the "Last contact" column seems to be OK.

2nd BTW: Thanks for all feedback so far. :)

ID: 1111484
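Editor's sketch, regarding the question about status codes: the mapping below is written from memory of the result-state constants in BOINC's source (RESULT_NEW through RESULT_UPLOAD_FAILED in common_defs.h) and should be treated as an assumption to check against the client version in use. The parser is a hypothetical helper for the plain-text layout shown above; it picks out tasks that are ready to report but have no server acknowledgement.

```python
# Assumed meaning of the numeric "state" field (from memory of BOINC's
# common_defs.h; verify against your client's source before relying on it).
RESULT_STATES = {
    0: "new",
    1: "files downloading",
    2: "files downloaded",
    3: "compute error",
    4: "files uploading",
    5: "files uploaded",
    6: "aborted",
    7: "upload failed",
}

# Excerpt of the --get_results output posted above.
SAMPLE = """\
   name: 14dc10ac.31820.19290.9.10.247.vlar_0
   WU name: 14dc10ac.31820.19290.9.10.247.vlar
   project URL: http://setiathome.berkeley.edu/
   ready to report: yes
   got server ack: no
   state: 5
"""


def parse_results(text):
    """Split 'key: value' lines into one dict per task (a new task starts at 'name')."""
    results = []
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if not sep:
            continue  # separator lines or empty fields such as 'stderr_out:'
        if key == "name":
            results.append({})
        if results:
            results[-1][key] = value
    return results


def awaiting_report(results):
    """Tasks that finished and uploaded but were never acknowledged by the scheduler."""
    return [
        r for r in results
        if r.get("ready to report") == "yes" and r.get("got server ack") == "no"
    ]


for task in awaiting_report(parse_results(SAMPLE)):
    print(task["name"], "->", RESULT_STATES.get(int(task["state"]), "unknown"))
```

Under that assumed mapping, state 5 means the output files were uploaded; the task still counts as "in progress" on the website until a scheduler request reports it and the server acknowledges it.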

Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0	Message 1111494 - Posted: 30 May 2011, 20:02:48 UTC - in response to Message 1111484.

> The last few lines indicate that 14dc10ac.31820.19290.9.10.247.vlar_0_0 was uploaded, but the status for http://setiathome.berkeley.edu/result.php?resultid=1914836952 is still "in Progress".

Once again, uploading doesn't change anything about the "in Progress" state. You must report an uploaded task to get it acknowledged. However, your host obviously has problems with that:

30-May-2011 18:36:33 [SETI@home] Reporting 287 completed tasks, not requesting new tasks
30-May-2011 18:36:33 [---] [http_debug] HTTP_OP::init_post(): http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
30-May-2011 18:36:35 [---] [http_debug] [ID#4] info: About to connect() to setiboinc.ssl.berkeley.edu port 80 (#2)
30-May-2011 18:36:35 [---] [http_debug] [ID#4] info: Trying 208.68.240.20...
30-May-2011 18:36:35 [---] [http_debug] [ID#4] info: Connected to setiboinc.ssl.berkeley.edu (208.68.240.20) port 80 (#2)
30-May-2011 18:36:35 [---] [http_debug] [ID#4] Sent header to server: POST /sah_cgi/cgi HTTP/1.1
User-Agent: BOINC client (x86_64-pc-linux-gnu 6.10.17)
Host: setiboinc.ssl.berkeley.edu
Accept: */*
Accept-Encoding: deflate, gzip
Content-Type: application/x-www-form-urlencoded
Content-Length: 829814
Expect: 100-continue
30-May-2011 18:36:35 [---] [http_debug] [ID#4] Received header from server: HTTP/1.1 100 Continue
30-May-2011 18:37:38 [---] [http_debug] [ID#4] Received header from server: HTTP/1.1 500 Internal Server Error

What has happened to that server after May 18th?

Try setting <http_1_0>1</http_1_0> in the <options> part of your cc_config.xml to see whether that helps with the problem.

Gruß, Gundolf

ID: 1111494
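Editor's sketch: the delay between the scheduler POST and the error reply can be pulled out of an [http_debug] log automatically. The Python below is only an illustration; the function name `post_to_error_delay` and the shortened sample log are assumptions made for this example, and it relies on the English month abbreviations that appear in the log above.

```python
import re
from datetime import datetime

# Shortened version of the log excerpt above (multi-line headers collapsed).
LOG = """\
30-May-2011 18:36:33 [SETI@home] Reporting 287 completed tasks, not requesting new tasks
30-May-2011 18:36:35 [---] [http_debug] [ID#4] Sent header to server: POST /sah_cgi/cgi HTTP/1.1
30-May-2011 18:36:35 [---] [http_debug] [ID#4] Received header from server: HTTP/1.1 100 Continue
30-May-2011 18:37:38 [---] [http_debug] [ID#4] Received header from server: HTTP/1.1 500 Internal Server Error
"""

STAMP = re.compile(r"^(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) ")
STATUS = re.compile(r"Received header from server: HTTP/\d\.\d (\d{3})")


def post_to_error_delay(log_text):
    """Return (status_code, seconds) for the first error reply after a POST, or None."""
    sent = None
    for line in log_text.splitlines():
        stamp = STAMP.match(line)
        if not stamp:
            continue  # continuation lines of a multi-line header carry no timestamp
        when = datetime.strptime(stamp.group(1), "%d-%b-%Y %H:%M:%S")
        if "Sent header to server: POST" in line:
            sent = when
        status = STATUS.search(line)
        if status and sent is not None and int(status.group(1)) >= 400:
            return int(status.group(1)), (when - sent).total_seconds()
    return None


print(post_to_error_delay(LOG))
```

For the excerpt above this gives a 500 reply 63 seconds after the POST headers were sent, while the interim "100 Continue" arrives immediately.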

dahls Joined: 24 Oct 04 Posts: 135 Credit: 178,942,502 RAC: 217	Message 1111507 - Posted: 30 May 2011, 20:23:15 UTC - in response to Message 1111494.

> What has happened to that server after May 18th?
> Try setting <http_1_0>1</http_1_0> in the <options> part of your cc_config.xml to see whether that helps with the problem.
> Gruß, Gundolf

It has been up and running for the last 266 days. It's a SQL server.

I'll try to set http_1_0 to 1 and see what happens during the night.

BTW, I attached the machine to Einstein@home today. Seems like I'm getting the same kind of error when trying to report completed work sets... :(

ID: 1111507

dahls Joined: 24 Oct 04 Posts: 135 Credit: 178,942,502 RAC: 217	Message 1111511 - Posted: 30 May 2011, 20:32:30 UTC - in response to Message 1111507.

Short update. After I added the http_1_0 to the cc_config.xml and restarted the manager, I got this line:

libgcc_s.so.1 must be installed for pthread_cancel to work

It's not in the stdout, probably stderr. Did a 'yum install libgcc', which said:

# yum search libgcc
Loaded plugins: refresh-packagekit
========================================================= Matched: libgcc ==========================================================
libgcc.i686 : GCC version 4.4 shared support library
libgcc.x86_64 : GCC version 4.4 shared support library
# yum install libgcc
Loaded plugins: refresh-packagekit
Setting up Install Process
Package libgcc-4.4.4-10.fc12.x86_64 already installed and latest version
Nothing to do

ID: 1111511

dahls Joined: 24 Oct 04 Posts: 135 Credit: 178,942,502 RAC: 217	Message 1111731 - Posted: 31 May 2011, 13:38:44 UTC - in response to Message 1111494.

> What has happened to that server after May 18th?

It has been crunching SETI data...

> Try setting <http_1_0>1</http_1_0> in the <options> part of your cc_config.xml to see whether that helps with the problem.
> Gruß, Gundolf

Did not seem to do any good. Status is still the same.

I also noticed that it has processed several Einstein working sets since yesterday, but Einstein@home says that it has not been in contact with this machine since I attached to it yesterday. The log indicates that the completed Einstein work has been uploaded.

As I understand it, there is a difference between uploading and reporting completed data - right? If so, it seems like this machine is able to upload data but not able to report it. And it makes me wonder...

ID: 1111731

Khangollo Joined: 1 Aug 00 Posts: 245 Credit: 36,410,524 RAC: 0	Message 1111741 - Posted: 31 May 2011, 13:47:48 UTC Last modified: 31 May 2011, 13:54:47 UTC

I had a similar problem with the scheduler RPC. The problem was a transparent HTTP proxy (apparently weird or misconfigured) combined with BOINC's use of HTTP 1.1. The http_1_0 option solved the problem.

Are you sure you created cc_config.xml correctly (in the BOINC data root directory) and restarted the BOINC client (not the manager)?

<cc_config>
    <options>
        <http_1_0>1</http_1_0>
    </options>
</cc_config>

If that doesn't help, try connecting through an HTTP proxy if you run one or your ISP has one. You can configure it with BOINC Manager (Options).

ID: 1111741
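Editor's sketch: since a misplaced tag is an easy mistake to make, the file content can be generated and checked programmatically. The Python below is a minimal illustration; the helper names `build_cc_config` and `uses_http_1_0` are invented for this example, and where the file must be saved (the BOINC data directory, as stated above) is left to the reader.

```python
import xml.etree.ElementTree as ET


def build_cc_config(http_1_0=True):
    """Return cc_config.xml content with <http_1_0> nested inside <options>."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "http_1_0").text = "1" if http_1_0 else "0"
    return ET.tostring(root, encoding="unicode")


def uses_http_1_0(xml_text):
    """True only if the flag is set and sits in the expected place."""
    root = ET.fromstring(xml_text)
    return root.tag == "cc_config" and root.findtext("options/http_1_0") == "1"


config = build_cc_config()
print(config)
print(uses_http_1_0(config))

# A flag placed directly under <cc_config> instead of <options> is not accepted.
print(uses_http_1_0("<cc_config><http_1_0>1</http_1_0></cc_config>"))
```

The check only confirms the structure of the file; the client still has to be restarted (or told to re-read its configuration) before the option can take effect.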

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874	Message 1111745 - Posted: 31 May 2011, 14:06:45 UTC

And it would be a good idea to post extracts from your message logs with [http_debug], as you did right at the very beginning of this thread. Not the whole thing - just the first 20 lines or so at the start (that should show us whether the http_1_0 has worked), and another for the http_debug after you've clicked update to initiate a scheduler reporting request.

[As you've realised, uploading and reporting are two different things. The log you posted on 29 May was an upload event. Ah, found your externally-hosted log now. Some of that rings a bell: people on other projects have been mentioning those very slow transfers. I'll have a look round].

ID: 1111745

dahls Joined: 24 Oct 04 Posts: 135 Credit: 178,942,502 RAC: 217	Message 1111782 - Posted: 31 May 2011, 16:26:21 UTC - in response to Message 1111745.

> And it would be a good idea to post extracts from your message logs with [http_debug], as you did right at the very beginning of this thread. Not the whole thing - just the first 20 lines or so at the start (that should show us whether the http_1_0 has worked), and another for the http_debug after you've clicked update to initiate a scheduler reporting request.

Here is the latest output from boinc: http://www.dahl-stamnes.net/dahls/Ymse/boinc.110531.html. An update command was issued @ 22:33:26 in this log. No data has been reported for seti@home and einstein@home, but a lot of data has been processed...

> [As you've realised, uploading and reporting are two different things. The log you posted on 29 May was an upload event. Ah, found your externally-hosted log now. Some of that rings a bell: people on other projects have been mentioning those very slow transfers. I'll have a look round].

And what is the difference? How am I supposed to force a report (or is that what I'm doing when I request an update)? Please: no clicking-instructions... I'm using the command interface on a Linux machine. :)

ID: 1111782

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874	Message 1111799 - Posted: 31 May 2011, 21:46:39 UTC - in response to Message 1111782.

>> [As you've realised, uploading and reporting are two different things. The log you posted on 29 May was an upload event. Ah, found your externally-hosted log now. Some of that rings a bell: people on other projects have been mentioning those very slow transfers. I'll have a look round].
> And what is the difference? How am I supposed to force a report (or is that what I'm doing when I request an update)? Please: no clicking-instructions... I'm using the command interface on a Linux machine. :)

Found the thread I was looking for: it's at climateprediction.net. The suggestion by Thyme Lawn (a technically-competent moderator on that project) is:

> It looks like your proxy server has allowed BOINC (strictly speaking libcurl) to send it more data than it can pass on to the upload server within BOINC's 5 minute upload inactivity timeout (if BOINC doesn't receive an acknowledgement for the data it has passed to the proxy server before the inactivity timeout expires it classes the connection as failed). Is there any way you can reduce the proxy server's cache size (or disable caching) to test this out?

In other words: his (and possibly your) computer can send data from BOINC to the proxy very quickly. The proxy can only send data to the project slowly - very slowly when the communications channels are saturated. This confuses BOINC's monitoring and timing processes, and you get the

30-May-2011 18:41:43 [---] [http_debug] [ID#4] info: Operation too slow. Less than 10 bytes/sec transfered the last 300 seconds

shown in your external log.

Sorry about the click. I assumed that as a server operator, you would be able to translate that into the equivalent

--project URL operation
Do operation on a project, identified by its master URL. Operations:
reset: delete current work and get more;
detach: delete current work and don't get more;
update: contact scheduling server;
suspend: stop work for project;
resume: resume work for project;
nomorework: finish current work but don't get more;
allowmorework: undo nomorework
detach_when_done: detach project
from Boinccmd.

A GUI update is the same as a Boinccmd update: yes, that's what triggers the reporting of any completed work.

ID: 1111799
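Editor's sketch: the "Operation too slow" message quoted above comes from libcurl's low-speed check (its `CURLOPT_LOW_SPEED_LIMIT` and `CURLOPT_LOW_SPEED_TIME` options). The Python below is a deliberately simplified model of that rule, not libcurl's actual implementation; the function name and the per-second samples are assumptions for illustration, and the defaults of 10 bytes/sec and 300 seconds are taken from the log line itself.

```python
def low_speed_abort_time(samples, limit_bps=10, window_s=300):
    """Return the time at which a transfer would be abandoned, or None.

    samples: (elapsed_seconds, bytes_per_second) pairs in time order.
    The transfer is given up once the speed has stayed below limit_bps
    for window_s consecutive seconds.
    """
    slow_since = None
    for elapsed, speed in samples:
        if speed < limit_bps:
            if slow_since is None:
                slow_since = elapsed  # start of the slow stretch
            elif elapsed - slow_since >= window_s:
                return elapsed
        else:
            slow_since = None  # any fast sample resets the clock
    return None


# The whole request body leaves the client in a burst, then nothing more
# moves while the far end is (or appears to be) stalled.
burst_then_stall = [(0, 800_000)] + [(t, 0) for t in range(1, 401)]
print(low_speed_abort_time(burst_then_stall))

# A slow but steady trickle above the limit is never abandoned.
trickle = [(t, 50) for t in range(0, 1000)]
print(low_speed_abort_time(trickle))
```

In this model a sender that hands over its data in one burst and then hears nothing is cut off five minutes after the stall begins, which matches the pattern Thyme Lawn describes: the client has finished sending, so every later second counts as "too slow".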

Josef W. Segur Volunteer developer Volunteer tester Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 1111943 - Posted: 1 Jun 2011, 6:34:22 UTC

Quite aside from the eventual 5 minute timeout, dahls' logs invariably show an HTTP error 500 about one minute after initiating the POST to communicate with the Scheduler. It puzzles me that the 6.10.17 client doesn't stop trying at that point and instead waits for Curl to time out the attempt.

For this latest log, that error 500 is coming from a server running CentOS; for the earlier log at http://www.dahl-stamnes.net/dahls/Ymse/boinc.110530.html the server was running Fedora. That probably isn't related; it may suggest the project staff are experimenting to see if CentOS will be more capable or stable.

It's not the 800+ KiB post size, since that has just gradually grown due to previous failures, and the Einstein attempts are failing in the same way with only 100+ KiB to post.

Joe

ID: 1111943

dahls Joined: 24 Oct 04 Posts: 135 Credit: 178,942,502 RAC: 217	Message 1112092 - Posted: 1 Jun 2011, 17:14:43 UTC - in response to Message 1111799.

> Found the thread I was looking for: it's at climateprediction.net. The suggestion by Thyme Lawn (a technically-competent moderator on that project) is:
>> It looks like your proxy server has allowed BOINC (strictly speaking libcurl) to send it more data than it can pass on to the upload server within BOINC's 5 minute upload inactivity timeout (if BOINC doesn't receive an acknowledgement for the data it has passed to the proxy server before the inactivity timeout expires it classes the connection as failed). Is there any way you can reduce the proxy server's cache size (or disable caching) to test this out?
> In other words: his (and possibly your) computer can send data from BOINC to the proxy very quickly. The proxy can only send data to the project slowly - very slowly when the communications channels are saturated. This confuses BOINC's monitoring and timing processes, and you get the
> 30-May-2011 18:41:43 [---] [http_debug] [ID#4] info: Operation too slow. Less than 10 bytes/sec transfered the last 300 seconds
> shown in your external log.

I wonder which proxy server that is. I don't use any proxy server.

I also wonder why this problem occurs on this machine ONLY. I have other machines that have never had this kind of problem. The first time I got this problem was last year. Since then I have reinstalled BOINC several times. Each reinstall made BOINC and SETI work for some time, but each time the period during which it worked became shorter and shorter. The last time I reinstalled was on May 18th... it never worked again after that.

The 3rd thing I wonder about - how to fix this...

Thanks for your reply :)

ID: 1112092

Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0	Message 1112134 - Posted: 1 Jun 2011, 19:45:42 UTC - in response to Message 1112092.

Did you try an advanced forum search already, here and at Einstein@home? Search for "500 Internal Server Error" (without quotation marks) and at least six months into the past.

Gruß, Gundolf

ID: 1112134

Claggy Volunteer tester Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1112135 - Posted: 1 Jun 2011, 19:52:15 UTC - in response to Message 1112092. Last modified: 1 Jun 2011, 19:52:44 UTC

How is that host connected to the Net? Could it be a poor wireless signal, or a duff network cable or NIC? I have tended to connect to the net via wireless hotspots; sometimes the connection is too slow to report WUs at all, while other projects where I don't have WUs report OK.

Claggy

ID: 1112135
