Message boards :
Number crunching :
Problems...
Author | Message |
---|---|
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
OK, seems to be failing again (often but not always) this morning.
08/03/2010 11:11:52 SETI@home update requested by user
08/03/2010 11:11:56 SETI@home Sending scheduler request: Requested by user.
08/03/2010 11:11:56 SETI@home Reporting 7 completed tasks, not requesting new tasks
08/03/2010 11:11:56 [http_debug] HTTP_OP::init_post(): http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
08/03/2010 11:11:56 [http_debug] HTTP_OP::libcurl_exec(): ca-bundle set
08/03/2010 11:11:57 [http_debug] [ID#1] Info: timeout on name lookup is not supported
08/03/2010 11:11:57 [http_debug] [ID#1] Info: About to connect() to setiboinc.ssl.berkeley.edu port 80 (#0)
08/03/2010 11:11:57 [http_debug] [ID#1] Info: Trying 208.68.240.20...
08/03/2010 11:11:57 [http_debug] [ID#1] Info: Connected to setiboinc.ssl.berkeley.edu (208.68.240.20) port 80 (#0)
08/03/2010 11:11:57 [http_debug] [ID#1] Sent header to server: POST /sah_cgi/cgi HTTP/1.1
08/03/2010 11:11:57 [http_debug] [ID#1] Sent header to server: User-Agent: BOINC client (windows_intelx86 6.10.36)
08/03/2010 11:11:57 [http_debug] [ID#1] Sent header to server: Host: setiboinc.ssl.berkeley.edu
08/03/2010 11:11:57 [http_debug] [ID#1] Sent header to server: Accept: */*
08/03/2010 11:11:57 [http_debug] [ID#1] Sent header to server: Accept-Encoding: deflate, gzip
08/03/2010 11:11:57 [http_debug] [ID#1] Sent header to server: Content-Type: application/x-www-form-urlencoded
08/03/2010 11:11:57 [http_debug] [ID#1] Sent header to server: Content-Length: 108233
08/03/2010 11:11:57 [http_debug] [ID#1] Sent header to server: Expect: 100-continue
08/03/2010 11:11:57 [http_debug] [ID#1] Sent header to server:
08/03/2010 11:11:57 [http_debug] [ID#1] Info: Empty reply from server
08/03/2010 11:11:57 [http_debug] [ID#1] Info: Connection #0 to host setiboinc.ssl.berkeley.edu left intact
08/03/2010 11:11:57 [http_debug] HTTP error: Server returned nothing (no headers, no data)
08/03/2010 11:12:01 SETI@home Scheduler request failed: Server returned nothing (no headers, no data)
Notice that it's not a time-out. The server sent something, but it wasn't the expected 100-continue. Next stop, wireshark. |
W-K 666 Send message Joined: 18 May 99 Posts: 19367 Credit: 40,757,560 RAC: 67 |
OK, seems to be failing again (often but not always) this morning. I was going to post exactly the same error message. I had a successful connection at 11:00:14, but since 11:15:55 every connection except one (and that was "no work from project") has errored out as above. That is on both computers; all times are UTC. |
Siran d'Vel'nahr Send message Joined: 23 May 99 Posts: 7379 Credit: 44,181,323 RAC: 238 |
Greetings,
I know nothing of "failed scheduler conversation"(s), but I would surely like to know what this is all about. I got home from work last night and saw pages of this:
3/7/2010 8:14:01 PM Resuming computation
I have never seen this before. It must be something new to v6.10.36 of BOINC. I don't want BOINC suspended every 10 seconds simply because my PC is multi-tasking. Every PC in the world, that is running BOINC, is doing something else as well. I was able to work around this annoyance by going into my preferences and changing a setting that went from:
3/7/2010 8:40:48 PM suspend work if non-BOINC CPU load exceeds 1 %
... to not suspending at all. The initial value was set to 25%. Setting it to 0% was the only way to get BOINC to run continuously without getting suspended. I'm only glad they added the option, to the preferences, to let BOINC run continuously.
Keep on BOINCing...! :)
CAPT Siran d'Vel'nahr - L L & P _\\// Winders 11 OS? "What a piece of junk!" - L. Skywalker "Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath |
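The workaround described above comes down to a single preference value. The following is a minimal sketch only: it assumes the setting is stored under the tag name `suspend_cpu_usage` inside a `global_preferences` element (the layout used by BOINC's local preferences override file) and that a value of 0 disables the check. Neither detail is stated in the post, and the helper name `build_override` is hypothetical, so compare against your own client's preferences file before relying on it.

```python
import xml.etree.ElementTree as ET


def build_override(cpu_usage_limit_percent):
    """Build a preferences fragment holding the non-BOINC CPU load limit.

    The tag names are assumptions of this sketch (see the note above).
    """
    root = ET.Element("global_preferences")
    limit = ET.SubElement(root, "suspend_cpu_usage")
    limit.text = str(cpu_usage_limit_percent)
    return ET.tostring(root, encoding="unicode")


# 0 is assumed to mean "never suspend because of other CPU load".
print(build_override(0))
```

The sketch only prints the fragment; it deliberately does not write into the BOINC data directory.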
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Wireshark got one at the first attempt (that was easy). First handshake went OK: [SYN] [SYN, ACK] [ACK] POST [FIN, ACK] [ACK] but then the scheduler sent out a couple of [RST] packets. Seems like it thinks it's busy (like Bruno a couple of weeks ago), but Anakin can usually cope with more than 424 database queries/sec and 40,000 results/hour. |
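The capture described above can be summarised mechanically. This is a minimal sketch, assuming the per-packet labels have already been copied out of the capture by hand into a list of strings; the function name `summarize_capture` is hypothetical, and the sample list is transcribed from the post (with two trailing resets standing in for "a couple of [RST] packets").

```python
def summarize_capture(packet_labels):
    """Return (handshake_ok, was_reset) for one captured exchange.

    handshake_ok: the first three packets are SYN, SYN+ACK, ACK.
    was_reset: at least one later packet carries the RST flag.
    """
    handshake_ok = packet_labels[:3] == ["SYN", "SYN, ACK", "ACK"]
    was_reset = any("RST" in label.split(", ") for label in packet_labels[3:])
    return handshake_ok, was_reset


observed = ["SYN", "SYN, ACK", "ACK", "POST", "FIN, ACK", "ACK", "RST", "RST"]
print(summarize_capture(observed))
```

A completed handshake followed by a reset matches the post's reading: the connection itself was fine, and the server dropped it afterwards.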
Jerold Russell Send message Joined: 9 Jul 99 Posts: 10 Credit: 15,498,845 RAC: 4 |
Can someone point me to the correct Forum to solve this problem: Upgraded to 6.10.36 on my MacBook Pro and can't upload. Keep getting "temporarily failed upload of....:can't resolve host name". |
tullio Send message Joined: 9 Apr 04 Posts: 8797 Credit: 2,930,782 RAC: 1 |
Can someone point me to the correct Forum to solve this problem: Upgraded to 6.10.36 on my MacBook Pro and can't upload. Keep getting "temporarily failed upload of....:can't resolve host name". Maybe in Questions and answers, Macintosh? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"Scheduler request failed: Server returned nothing (no headers, no data)" is a new phenomenon, at least at the frequency we're seeing it now. One of my CUDA hosts has BOINC logs going back to 17 January 2010. The "returned nothing" error didn't start until 21 February - this graph shows the daily scheduler request/success/error numbers for the period. (Direct link) |
B. Rathwell Send message Joined: 10 May 07 Posts: 3 Credit: 1,380,366 RAC: 0 |
When I go to my preference page and look at tasks I see a lot that say still in progress even though I know I have reported them, some as far back as Jan and early Feb. They should say pending or waiting for validation, shouldn't they?
1536270455 580924004 7 Mar 2010 11:35:24 UTC 21 Apr 2010 13:59:40 UTC In progress --- --- --- --- SETI@home Enhanced v6.03
1533981957 579904102 5 Mar 2010 22:31:57 UTC 21 Apr 2010 6:48:56 UTC In progress --- --- --- --- SETI@home Enhanced v6.03
1533338688 579618103 5 Mar 2010 10:06:30 UTC 20 Apr 2010 4:06:22 UTC In progress --- --- --- --- SETI@home Enhanced v6.03
1532542715 579217519 4 Mar 2010 19:40:11 UTC 7 Mar 2010 18:29:55 UTC Completed, waiting for validation 59,516.11 50,568.01 168.98 pending SETI@home Enhanced v6.03
1531843414 578963739 4 Mar 2010 6:45:06 UTC 7 Mar 2010 1:10:37 UTC Completed, waiting for validation 52,623.58 37,749.92 82.40 pending SETI@home Enhanced v6.03
1530972524 578596642 3 Mar 2010 9:54:54 UTC 5 Mar 2010 20:37:25 UTC Completed, waiting for validation 49,497.88 42,795.20 118.67 pending SETI@home Enhanced v6.03
1529962724 578160157 2 Mar 2010 9:58:54 UTC 17 Apr 2010 18:28:58 UTC In progress --- --- --- --- SETI@home Enhanced v6.03
1529958295 578158233 2 Mar 2010 9:52:43 UTC 17 Apr 2010 18:22:46 UTC In progress --- --- --- --- SETI@home Enhanced v6.03
1529552339 577973597 2 Mar 2010 0:26:16 UTC 4 Mar 2010 3:25:43 UTC Completed, waiting for validation 43,210.56 37,843.81 83.24 pending SETI@home Enhanced v6.03
1527906075 577259813 28 Feb 2010 12:16:14 UTC 3 Mar 2010 0:02:33 UTC Completed, waiting for validation 43,437.66 36,454.64 83.50 pending SETI@home Enhanced v6.03
1527581180 577119085 28 Feb 2010 5:49:28 UTC 25 Apr 2010 23:22:52 UTC In progress --- --- --- --- SETI@home Enhanced v6.03
1527572996 577115461 28 Feb 2010 5:36:48 UTC 15 Apr 2010 17:43:28 UTC In progress --- --- --- --- SETI@home Enhanced v6.03
1527568164 577113355 28 Feb 2010 5:28:38 UTC 15 Apr 2010 17:35:18 UTC In progress --- --- --- --- SETI@home Enhanced v6.03
1527564455 577111624 28 Feb 2010 5:22:28 UTC 16 Apr 2010 18:35:54 UTC In progress --- --- --- --- SETI@home Enhanced v6.03
1527559634 577109282 28 Feb 2010 5:16:13 UTC 13 Mar 2010 23:32:53 UTC In progress --- --- --- --- SETI@home Enhanced v6.03
1519260246 573417542 21 Feb 2010 14:04:43 UTC 26 Feb 2010 7:50:38 UTC Completed, waiting for validation 287,324.27 264,175.40 813.30 pending Astropulse v505 v5.05
1519259818 573417462 21 Feb 2010 13:58:55 UTC 18 Mar 2010 13:58:55 UTC In progress --- --- --- --- Astropulse v505 v5.05
1510402072 569752765 8 Feb 2010 13:54:59 UTC 28 Mar 2010 13:59:30 UTC In progress --- --- --- --- SETI@home Enhanced v6.03
1508064768 568748689 6 Feb 2010 10:49:09 UTC 8 Feb 2010 14:49:50 UTC Completed, waiting for validation 40,710.17 38,673.11 118.20 pending SETI@home Enhanced v6.03
1488854811 560408193 20 Jan 2010 23:38:36 UTC 10 Mar 2010 11:22:48 UTC In progress --- --- --- --- SETI@home Enhanced v6.03
B. Rathwell |
W-K 666 Send message Joined: 18 May 99 Posts: 19367 Credit: 40,757,560 RAC: 67 |
The details for tasks are read from the replica database. Earlier this was off; it is now back on, but ~30 hrs behind. My guess is that it will probably not catch up before the Tuesday maintenance, and the overworked staff will try to fix it then. |
Pappa Send message Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0 |
Wireshark got one at the first attempt (that was easy). Thank you, Richard. This was conveyed to Matt and Eric. The odd part is that I have <http_debug>1</http_debug> set for logging and it will not "hiccup" for me (maybe I should go buy a Lotto ticket, snicker). Realistically, the messages in the BOINC core are "incomplete or less descriptive" than they should be. It might be more realistically stated as: the connection to the server was completed; BOINC sent the "sched_request_setiathome.berkeley.edu.xml" to the server; while waiting for the "sched_reply_setiathome.berkeley.edu.xml" from the server, nothing happened. BOINC will retry later. I would presume that the RST packets mean that it has not analyzed the data from the "sched_request_setiathome.berkeley.edu.xml" yet. Okay (SWAG), as I think about the note I got from Eric that due the database crash (reload), the transitioner is going to be looking for results that "are not there" (the Validate issue). The transitioner could be holding up the scheduler (looking for the missing files). This would mean as all the missing results were resent and/or flushed from the system the problem would be selfhealing. Then as my machines have went through so many WU's and I have a short turnaround time, I am not seeing the issue (I am past it). Those with 10 day Caches or slower machines could be affected for a longer period of time. Regards Please consider a Donation to the Seti Project. |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
The odd part is I have <http_debug>1</http_debug> set for logging and it will not "hickup" for me Do a couple of updates in quick succession. It will do it only about half of the time. The first couple may be going out correctly, then all of a sudden you'll hit the problem. The BOINC Devs are also busy with this problem, since it came up through another thread I was sending them updates about with comms problems on 6.10.36 |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Okay (SWAG), as I think about the note I got from Eric that due the database crash (reload), the transitioner is going to be looking for results that "are not there" (the Validate issue). The transitioner could be holding up the scheduler (looking for the missing files). This would mean as all the missing results were resent and/or flushed from the system the problem would be selfhealing. Then as my machines have went through so many WU's and I have a short turnaround time, I am not seeing the issue (I am past it). Those with 10 day Caches or slower machines could be affected for a longer period of time.
I don't see how the missing uploads could have anything to do with it. At the time the server is issuing the [RST] packets, it only has two pieces of information available to it:
a) The IP address the POST request is coming from.
b) The number of bytes in the file I'm trying to POST.
There certainly wasn't time during the transaction I captured to do any sort of a database lookup to see which host might be calling from this IP address (and there are six to choose from). As it happens, it was 3755243 this time. Normally, that counts as 'fast, small cache', but it's the one which had a small accident with a CUDA card at Beta recently, so it will have passed a lot of "pseudo -9" on Sunday: it has 122 SETI tasks onboard at the moment, but 1823 tasks known to the database (awaiting validation/purging).
But, I stress: the server didn't know any of that at the time. All it knew was that some host wanted to send a sched_request containing 108233 bytes - and it threw up its hands in horror and said 'go away'. |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
The difference is this:
09-Mar-10 00:36:38 [http_debug] [ID#1] Sent header to server: Host: setiboinc.ssl.berkeley.edu
09-Mar-10 00:36:38 [http_debug] [ID#1] Sent header to server: Accept: */*
09-Mar-10 00:36:38 [http_debug] [ID#1] Sent header to server: Accept-Encoding: deflate, gzip
09-Mar-10 00:36:38 [http_debug] [ID#1] Sent header to server: Content-Type: application/x-www-form-urlencoded
09-Mar-10 00:36:38 [http_debug] [ID#1] Sent header to server: Content-Length: 7375
09-Mar-10 00:36:38 [http_debug] [ID#1] Sent header to server: Expect: 100-continue
09-Mar-10 00:36:38 [http_debug] [ID#1] Sent header to server:
09-Mar-10 00:36:38 [http_debug] [ID#1] Received header from server: HTTP/1.1 100 Continue
09-Mar-10 00:36:39 [http_debug] [ID#1] Received header from server: HTTP/1.1 200 OK
09-Mar-10 00:36:39 [http_debug] [ID#1] Received header from server: Date: Mon, 08 Mar 2010 23:36:33 GMT
When it doesn't get through it's doing this:
09-Mar-10 00:37:40 [http_debug] [ID#1] Sent header to server: Host: setiboinc.ssl.berkeley.edu
09-Mar-10 00:37:40 [http_debug] [ID#1] Sent header to server: Accept: */*
09-Mar-10 00:37:40 [http_debug] [ID#1] Sent header to server: Accept-Encoding: deflate, gzip
09-Mar-10 00:37:40 [http_debug] [ID#1] Sent header to server: Content-Type: application/x-www-form-urlencoded
09-Mar-10 00:37:40 [http_debug] [ID#1] Sent header to server: Content-Length: 7375
09-Mar-10 00:37:40 [http_debug] [ID#1] Sent header to server: Expect: 100-continue
09-Mar-10 00:37:40 [http_debug] [ID#1] Sent header to server:
09-Mar-10 00:37:40 [http_debug] [ID#1] Info: Empty reply from server
09-Mar-10 00:37:40 [http_debug] [ID#1] Info: Connection #0 to host setiboinc.ssl.berkeley.edu left intact
09-Mar-10 00:37:40 [http_debug] HTTP error: Server returned nothing (no headers, no data) |
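The two outcomes shown in the logs above can be told apart automatically. This is a minimal sketch, assuming the [http_debug] lines of one exchange have already been collected into a list; the function name `classify_exchange` and the result labels are hypothetical, and the sample lines are abbreviated from the logs in the post.

```python
def classify_exchange(debug_lines):
    """Classify one scheduler exchange from its [http_debug] lines.

    Returns "ok" if a 200 response header was logged, "empty-reply" if
    the client logged an empty reply from the server, else "unknown".
    """
    if any("Received header from server: HTTP/1.1 200" in line for line in debug_lines):
        return "ok"
    if any("Empty reply from server" in line for line in debug_lines):
        return "empty-reply"
    return "unknown"


good = [
    "09-Mar-10 00:36:38 [http_debug] [ID#1] Sent header to server: Expect: 100-continue",
    "09-Mar-10 00:36:38 [http_debug] [ID#1] Received header from server: HTTP/1.1 100 Continue",
    "09-Mar-10 00:36:39 [http_debug] [ID#1] Received header from server: HTTP/1.1 200 OK",
]
bad = [
    "09-Mar-10 00:37:40 [http_debug] [ID#1] Sent header to server: Expect: 100-continue",
    "09-Mar-10 00:37:40 [http_debug] [ID#1] Info: Empty reply from server",
]
print(classify_exchange(good), classify_exchange(bad))
```

Matching on the message text rather than the timestamp keeps the sketch independent of the date format, which differs between the logs quoted in this thread.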
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
When you do it with BOINC v5.10.13, you get
SETI@home 09/03/2010 07:21:45 Scheduler request failed: failed sending data to the peer
Why the log can't say "Server reset connection", I'll never know!
Edit - I checked the logs for that v5.10.13 machine yesterday, when I was preparing the graph. Over the last two weeks, it has had a scheduler contact failure rate of 46% (271 failures / 589 attempts): the CUDA machine had a failure rate of 'only' 37% (692 failures / 1868 attempts). |
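The failure rates quoted above can be reproduced from the counts, and the counts themselves could be tallied from a client log. This is a minimal sketch, not the method actually used for the graph: it assumes each relevant log line contains a DD/MM/YYYY date and one of the two message fragments seen in this thread, and the names `tally_by_day` and `failure_rate` are hypothetical.

```python
import re
from collections import Counter

# Assumed date format, as in the log lines quoted in this thread.
DATE = re.compile(r"\b(\d{2}/\d{2}/\d{4})\b")


def tally_by_day(log_lines):
    """Count scheduler requests and failures per day (keyed by date string)."""
    requests, failures = Counter(), Counter()
    for line in log_lines:
        match = DATE.search(line)
        if match is None:
            continue  # no recognisable date on this line
        day = match.group(1)
        if "Sending scheduler request" in line:
            requests[day] += 1
        elif "Scheduler request failed" in line:
            failures[day] += 1
    return requests, failures


def failure_rate(failures, attempts):
    """Failure rate as a percentage of attempts."""
    return 100.0 * failures / attempts


sample = [
    "08/03/2010 11:11:56 SETI@home Sending scheduler request: Requested by user.",
    "08/03/2010 11:12:01 SETI@home Scheduler request failed: Server returned nothing (no headers, no data)",
    "SETI@home 09/03/2010 07:21:45 Scheduler request failed: failed sending data to the peer",
]
requests, failures = tally_by_day(sample)
print(dict(requests), dict(failures))

# Counts quoted in the post.
print(f"v5.10.13 host: {failure_rate(271, 589):.0f}%")   # 46%
print(f"CUDA host: {failure_rate(692, 1868):.0f}%")      # 37%
```

Searching for the date anywhere in the line lets the same tally handle both column orders seen in the thread (date first in 6.10.36, project name first in 5.10.13).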
Fred J. Verster Send message Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
Hi, the upload and download problems seem to have vanished over the last few days. I only have (small) problems with BOINC 6.10.18 losing its connection to local host a few times a day. All 3 QUADs have this problem: one runs 6.10.15 (x86), another 6.10.18 (both Q6600s, same build/mobo), and one x64 (6.10.18). Not my LT (WLAN)? Btw, I've set the caches to 3 days on all (4) hosts. |
T-Armstrong Send message Joined: 2 Feb 10 Posts: 9 Credit: 312,965 RAC: 0 |
@ Pappa Hello Pappa, I always have problems when I pay with my MasterCard Platinum credit card. When I click "send", this comes back: You're not a "Präfax". What's that? I want to help, but it does not work. |
T-Armstrong Send message Joined: 2 Feb 10 Posts: 9 Credit: 312,965 RAC: 0 |
Sorry, so yesterday I sent $200 and it came back within 5 minutes. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
What about the validation errors problem? The last post in that thread is from 6 March, but my host experienced one on 8 March: http://setiathome.berkeley.edu/workunit.php?wuid=578704554 |
Fred J. Verster Send message Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
Hi Raistmer, I saw only a few invalid WUs, 4 to be exact (1 on 8 March, 2 on 7 March, 1 on 27 Feb), and a few errors: a -9 and a -12*, and some VLAR kills.
Work Unit Info: ...............
WU true angle range is : 0.261920
After app init: total GPU memory 536543232 free GPU memory 487837696
Exception detected inside cudaAcc_find_triplets, dumping client state
icfft=204075, PoT_activity=0, PoT_freq_bin=-1
SETI@home error -12 Unknown error
cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel
File: ..\analyzePoT.cpp
Line: 348
</stderr_txt> ]]>
And this error . (Upper host is mine)
Btw, I mentioned before that all 3 QUADs were having problems with BOINC 6.10.18/6.10.15 too. That is not correct: my (Q6600; ASUS-P5E; ATI HD5770; 2x1GB DDR2; WIN XP x86) host has none. Only 1 QUAD, the latter, with BOINC 6.10.15, doesn't have the problem of losing its connection to local host?! Also no upload problems in this part of the globe, atm. I'm seriously thinking of downgrading BOINC 6.10.18 to 6.10.15 or 6.6.36. I can't remember those having had connection problems. The up- and downloading problems are, IMO, not directly related to a specific BOINC version, although I've experienced a lot of other troubles with 6.10.18. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
And I see a completely good result with a "validate error" state. Look: http://setiathome.berkeley.edu/result.php?resultid=1532205824 |