Incorrect timing in the "Not sending work - last request too recent:" logic.

Message boards : Number crunching : Incorrect timing in the "Not sending work - last request too recent:" logic.
Cherokee150
Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1970089 - Posted: 13 Dec 2018, 5:38:48 UTC
Last modified: 13 Dec 2018, 5:49:39 UTC

Has anyone else encountered this error? I first noticed it about two and a half days ago.

It may be a software problem with SETI and/or BOINC, or it just might be a problem with my configuration.
I have been trying different settings on my Core2Quad computer to maximize the throughput for SETI. With the current settings I am now getting two error messages. The type of messages and their appearance on the same day lead me to believe they may be related.
The main error message (and likely the cause of the second message) appears to be a timing logic error.

Here is what I've found:
1. My Core2Quad is currently set to receive no CPU units.
2. If a GPU unit completes within SETI's 303-second wait time, the client sends a new scheduler request to fetch work.
3. The request is denied with "Not sending work - last request too recent:".
4. The "too recent" time listed is always between 296 and 298 seconds.
5. After a set number of errors I get a message like the following: "New system time (1544639648) < old system time (1544639739); clearing timeouts", which I believe is error tallying logic attempting to clear a possible discrepancy between BOINC's and the client computer's clocks.
6. I have verified that the "Communication deferred 00:05:03" countdown that shows in BOINC Manager's Project/Status column does reach zero just before the next request for more work is made.
7. This cycle repeats approximately every 5 minutes (303 seconds).
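A minimal sketch of the server-side check described in steps 2-4 (my assumption of how the denial works, not actual BOINC scheduler code):

```python
SERVER_BACKOFF = 300   # this project's minimum seconds between scheduler requests
CLIENT_WAIT = 303      # the deferral the client is told to observe (step 7)

def request_allowed(elapsed_since_last_request):
    """Server-side check (sketch): deny work if the previous request
    arrived under SERVER_BACKOFF seconds ago."""
    return elapsed_since_last_request >= SERVER_BACKOFF

# A request arriving 296-298 s after the previous one is denied:
print(request_allowed(297))   # False -> "Not sending work - last request too recent: 297 sec"
print(request_allowed(303))   # True
```

This would explain why the "too recent" value is always just under 300: the early request triggered by a completing GPU task lands a few seconds inside the backoff window.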

It would seem that this problem will most likely be limited to very fast rigs. It also appears that communication between the client and the server does occur on each of these failed attempts to obtain new work. If this problem is occurring on many of the very fast rigs, then it is most likely causing both a large amount of unnecessary internet traffic between those clients and the server, and also many wasted CPU cycles on Berkeley's server.
Luckily for me, I occasionally get a wait between completed units long enough to get new work before my 100 unit cache runs dry. However, this problem cycle usually starts again shortly.

I am running BOINC version 7.14.2 (x64) on Microsoft Windows 10 Professional x64 Edition (10.00.17134.00).

If you like, I will be happy to provide any other information that would help in diagnosing this problem.

Thanks!
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1970093 - Posted: 13 Dec 2018, 5:48:43 UTC - in response to Message 1970089.  

2. If a GPU unit completes within SETI's 303 second wait time, the scheduler sends the "To fetch work." request.

I'd suggest going back to the default setting of not requesting new work as soon as a WU completes. Just let it request work at the end of each request timeout period, as it normally does.
Grant
Darwin NT
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1970095 - Posted: 13 Dec 2018, 5:51:26 UTC - in response to Message 1970093.  

+1

You are requesting work too soon; that is why it's refusing you new work. I complete work units in 57 seconds, but I let BOINC wait until the default timer calls for new work.
Cherokee150
Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1970098 - Posted: 13 Dec 2018, 6:06:49 UTC - in response to Message 1970093.  

Thank you, Grant. I don't know how you do it, but you seem to very often be the first or second to respond to questions, and usually within a few minutes of posting! I don't know what we would do without you and the amazing group of fellow SETIzens who are always there for us!! I think most of us know those to whom I am referring. :-)

I will try your suggestion immediately, Grant. I don't remember if I ever changed it, but it's highly possible I did sometime during my 19 years of processing SETI. Is that a cc_config setting, an online Preferences setting, or somewhere else?
Cherokee150
Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1970099 - Posted: 13 Dec 2018, 6:11:23 UTC - in response to Message 1970095.  

Thank you, too, Zalster. I will change that setting as soon as I locate it. As I mentioned to Grant, perhaps one of you could point me in the right direction to make that change.
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1970100 - Posted: 13 Dec 2018, 6:24:31 UTC
Last modified: 13 Dec 2018, 6:28:14 UTC

The only one that comes to mind is the <report_results_immediately> option in the cc_config.xml file <options> </options> section.

Edit-
Or report_results_immediately used in app_info.xml
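For reference, a minimal cc_config.xml showing where that option lives (a sketch; 0 is the default, 1 turns immediate reporting on):

```xml
<cc_config>
  <options>
    <report_results_immediately>0</report_results_immediately>
  </options>
</cc_config>
```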
Grant
Darwin NT
Cherokee150
Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1970101 - Posted: 13 Dec 2018, 7:21:54 UTC - in response to Message 1970100.  
Last modified: 13 Dec 2018, 7:41:30 UTC

I have checked cc_config.xml and found:
<report_results_immediately>0</report_results_immediately>

I also checked app_info.xml and found no <report_results_immediately> line.

Is there anywhere else or, perhaps, some other option that might be causing BOINC to report immediately? My configuration is a little unusual right now in that:
1. app_info.xml's <CUDA> is set to <count>0.5</count> so that only two GPU units are running concurrently.
2. The online Preference "Use CPU" is unchecked.

Could only running very fast GPU units without CPU units be causing a problem?

EDIT-
Come to think of it, my results do not normally post immediately. They usually sit for a while before BOINC sends them off. During this problem, there are fewer than 100 GPU units in my cache and no CPU units. This makes me wonder even more whether it might be related to the two-GPU, no-CPU configuration I currently have set up. I don't believe the problem started before the leftover CPU units in my cache cleared out; they were in there from when the online Preference "Use CPU" was checked.
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1970102 - Posted: 13 Dec 2018, 7:31:42 UTC - in response to Message 1970101.  
Last modified: 13 Dec 2018, 7:34:00 UTC

Could only running very fast GPU units without CPU units be causing a problem?

Nope, it would have to be due to some setting somewhere that's overriding the usual behaviour of the Manager.

Edit- unless there is some new bug in the latest version, which is what you appear to be running (I'm still on 7.6.33).
Grant
Darwin NT
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22200
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1970105 - Posted: 13 Dec 2018, 7:47:04 UTC

I think this might be caused by a hardware issue with your computer:

1544639648
Is equivalent to:
12/12/2018 @ 6:34pm (UTC)
2018-12-12T18:34:08+00:00 in ISO 8601
Wed, 12 Dec 2018 18:34:08 +0000

1544639739
Is equivalent to:
12/12/2018 @ 6:35pm (UTC)
2018-12-12T18:35:39+00:00 in ISO 8601
Wed, 12 Dec 2018 18:35:39 +0000

Last time I saw something like this it was the PC real-time clock starting to fail, another source can be the battery on the motherboard failing, or a bus timing issue. Or something else setting & resetting the RTC continuously.
It is worth noting that your tasks are running in about 10-11 minutes, which is "about right" for SoG tasks on a GTX1070.
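The two timestamps decoded above can be checked with a short script; note that the "new" system time is 91 seconds *earlier* than the "old" one, i.e. the clock stepped backwards:

```python
from datetime import datetime, timezone

def to_utc(ts):
    """Convert a Unix timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)

new = to_utc(1544639648)   # "new system time" from the log message
old = to_utc(1544639739)   # "old system time"
print(new.isoformat())              # 2018-12-12T18:34:08+00:00
print((old - new).total_seconds())  # 91.0 -> the clock jumped back 91 seconds
```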
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Cherokee150
Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1970106 - Posted: 13 Dec 2018, 7:47:26 UTC - in response to Message 1970102.  

You may have something there, Grant. I did install the latest version, 7.14.2 (x64), very recently, on November 22.
I wonder if anyone else may have noticed this with the new release?
Cherokee150
Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1970111 - Posted: 13 Dec 2018, 8:27:37 UTC - in response to Message 1970105.  
Last modified: 13 Dec 2018, 8:31:40 UTC

Rob, you may have found the answer! I apologize for neglecting to mention that I recently had to replace my Intel CPU, as the original failed after running at 100% CPU 24x365 for over seven years - mostly SETI, of course. :-) Luckily, I was able to locate an identical Intel CPU. I have been slowly overclocking it a few MHz at a time. It is definitely possible that I currently have it set at a speed that is interfering with the bus timing. The bus is currently overclocked by 2.7%.

Since the overclocking testing was going quite well, and I have had no SETI errors (we all know how good SETI is at thoroughly testing a rig), the idea that the overclocking might be the cause completely escaped me. I will immediately try a different clock setting to see if that corrects the problem!

I do very careful testing on overclocking, stepping up only the minimum amount possible each time. Therefore, it could take up to a day to know for sure if the overclocking frequency currently in use was the problem. If I am lucky, I might be able to verify that my SETI problem is fixed in just a few hours. Regardless, I will let all of you know what I find.

Thanks again to each of you for your help. Each of you gave me pertinent information that, combined, quickly led to a plausible hypothesis. Now, I'm off to test the theory! ;-)

EDIT-
By the way, Rob, I feel the same about Pluto!
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1970113 - Posted: 13 Dec 2018, 8:40:47 UTC

You get that effect as well when you have a very low "Store at least N days of work" and a high "Store up to an additional N days of work". The first is the low-water mark: when BOINC sends one task back, it immediately tries to top the cache up again by at least one task to restore that mark.
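That low-water/high-water behaviour could be sketched like this (a simplified illustration, not BOINC's actual work-fetch code; the parameter names are mine):

```python
def should_fetch(buffered_days, min_days, extra_days):
    """BOINC-style hysteresis (sketch): fetch when the buffer drops below
    the low-water mark, and top up toward min_days + extra_days of work."""
    low = min_days
    high = min_days + extra_days
    if buffered_days < low:
        return high - buffered_days   # days of work to request
    return 0.0

# A very low "at least" value means finishing a single task can dip the
# buffer below the mark and trigger an immediate work request:
print(should_fetch(buffered_days=0.04, min_days=0.05, extra_days=2.0))  # ~2.01
print(should_fetch(buffered_days=1.0, min_days=0.05, extra_days=2.0))   # 0.0
```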
Cherokee150
Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1970115 - Posted: 13 Dec 2018, 8:55:11 UTC - in response to Message 1970113.  
Last modified: 13 Dec 2018, 9:01:44 UTC

Thank you, Ageless (sorry, I don't know your given name). While that is not the case here, your information is very useful to me, as I very often tweak the work and additional work stores on all my machines, usually in an attempt to get more than a handful of astropulse units when they become available. Astropulses have always been very important to me for a very special reason, which I might relate some day.

EDIT-
By the way, my name is David.
Cherokee150
Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1970304 - Posted: 14 Dec 2018, 14:36:12 UTC

Hi everyone!
I promised an update on this. I reset the bus, memory and CPU speeds, updated the system time via time.nist.gov, and restarted SETI. The problem disappeared. I then increased the speeds again, and the problem returned. So, for now, I will leave the system at the defaults.

Thank you again for your help!
David
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22200
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1970341 - Posted: 14 Dec 2018, 18:45:23 UTC

Glad you've sorted it without having to spend any money :-)
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Kissagogo27 Special Project $75 donor
Joined: 6 Nov 99
Posts: 716
Credit: 8,032,827
RAC: 62
France
Message 1970457 - Posted: 15 Dec 2018, 11:30:12 UTC
Last modified: 15 Dec 2018, 11:31:54 UTC

You can modify the interval between automatic time updates in the registry.

This key is for XP (via Regedit); you will have to find the equivalent location on Windows 7 or 10:

HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\W32Time\TimeProviders\NtpClient

Modify SpecialPollInterval: if you set it to 3600, the time is updated every hour. The default setting is 604800 seconds, i.e. one week.

I sometimes have the RTC fail at a cold startup, counting time too fast.
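On Windows 7/10 the equivalent value normally sits under CurrentControlSet; a hedged sketch of setting it from an elevated command prompt (assumes the standard W32Time service layout):

```bat
reg add "HKLM\SYSTEM\CurrentControlSet\Services\W32Time\TimeProviders\NtpClient" /v SpecialPollInterval /t REG_DWORD /d 3600 /f
net stop w32time
net start w32time
```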
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1970459 - Posted: 15 Dec 2018, 11:52:19 UTC - in response to Message 1970457.  

But if the clock is drifting by more than 1% in five minutes [*], resetting once an hour is nowhere near enough - and the public NTP servers won't thank you for pestering them every five minutes. You'll probably get your IP address blacklisted. The best solution would be to set up an authoritative time source (Windows Domain Controller or equivalent) on your local LAN: update that daily, and your problematic machine can pester it every second if you want.

* There's an inbuilt 1% safeguard in the BOINC system already, to try and prevent this type of failure. This project has set a backoff time of 300 seconds, but our clients are told to wait for 303 seconds.
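That 1% figure also lines up with the 296-298 second denials reported at the top of the thread; as a back-of-the-envelope check (my arithmetic, not from BOINC source):

```python
server_backoff = 300
safety_factor = 1.01          # BOINC's 1% margin against client clock drift
client_wait = round(server_backoff * safety_factor)
print(client_wait)            # 303 -- the deferral the client observes

# A clock running ~2% fast erodes that margin: a 303 s wait measured on
# such a clock elapses in real time after only
print(303 / 1.02)             # ~297 real seconds -- inside the 300 s backoff,
                              # matching the 296-298 s denials reported above
```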