Incorrect timing in the "Not sending work - last request too recent:" logic.

Message boards : Number crunching : Incorrect timing in the "Not sending work - last request too recent:" logic.
Cherokee150
Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1970089 - Posted: 13 Dec 2018, 5:38:48 UTC
Last modified: 13 Dec 2018, 5:49:39 UTC

Has anyone else encountered this error? I first noticed it about two and a half days ago.

It may be a software problem with SETI and/or BOINC, or it just might be a problem with my configuration.
I have been trying different settings on my Core2Quad computer to maximize the throughput for SETI. With the current settings I am now getting two error messages. The type of messages and their appearance on the same day lead me to believe they may be related.
The main error message (and likely the cause of the second message) appears to be a timing logic error.

Here is what I've found:
1. My Core2Quad is currently set to receive no CPU units.
2. If a GPU unit completes within SETI's 303-second wait time, the client sends a new scheduler request to fetch work.
3. The request is denied with "Not sending work - last request too recent:".
4. The "too recent" time listed is always between 296 and 298 seconds.
5. After a set number of errors I get a message like the following: "New system time (1544639648) < old system time (1544639739); clearing timeouts", which I believe is error tallying logic attempting to clear a possible discrepancy between BOINC's and the client computer's clocks.
6. I have verified that the "Communication deferred 00:05:03" countdown that shows in BOINC Manager's Project/Status column does reach zero just before the next request for more work is made.
7. This cycle repeats approximately every 5 minutes (303 seconds).
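A minimal sketch of the server-side check described in steps 2-4 (my assumption of how the denial works, not actual BOINC scheduler code):

```python
SERVER_BACKOFF = 300   # this project's minimum seconds between scheduler requests
CLIENT_WAIT = 303      # the deferral the client is told to observe (step 7)

def request_allowed(elapsed_since_last_request):
    """Server-side check (sketch): deny work if the previous request
    arrived under SERVER_BACKOFF seconds ago."""
    return elapsed_since_last_request >= SERVER_BACKOFF

# A request arriving 296-298 s after the previous one is denied:
print(request_allowed(297))   # False -> "Not sending work - last request too recent: 297 sec"
print(request_allowed(303))   # True
```

This would explain why the "too recent" value is always just under 300: the early request triggered by a completing GPU task lands a few seconds inside the backoff window.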

It would seem that this problem will most likely be limited to very fast rigs. It also appears that communication between the client and the server does occur on each of these failed attempts to obtain new work. If this problem is occurring on many of the very fast rigs, then it is most likely causing both a large amount of unnecessary internet traffic between those clients and the server, and also many wasted CPU cycles on Berkeley's server.
Luckily for me, I occasionally get a wait between completed units long enough to get new work before my 100 unit cache runs dry. However, this problem cycle usually starts again shortly.

I am running BOINC version 7.14.2 (x64) on Microsoft Windows 10 Professional x64 Edition (10.00.17134.00).

If you like, I will be happy to provide any other information that would help in diagnosing this problem.

Thanks!
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1970093 - Posted: 13 Dec 2018, 5:48:43 UTC - in response to Message 1970089.  

2. If a GPU unit completes within SETI's 303 second wait time, the scheduler sends the "To fetch work." request.

I'd suggest going back to the default setting of not requesting new work as soon as a WU completes. Just let it request work at the end of each request timeout period, as it normally does.
Grant
Darwin NT
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1970095 - Posted: 13 Dec 2018, 5:51:26 UTC - in response to Message 1970093.  

+1

You are requesting work too soon; that is why it's refusing you new work. I complete work units in 57 seconds, but I let BOINC wait until the default timer calls for new work.
Cherokee150
Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1970098 - Posted: 13 Dec 2018, 6:06:49 UTC - in response to Message 1970093.  

Thank you, Grant. I don't know how you do it, but you seem to very often be the first or second to respond to questions, and usually within a few minutes of posting! I don't know what we would do without you and the amazing group of fellow SETIzens who are always there for us!! I think most of us know those to whom I am referring. :-)

I will try your suggestion immediately, Grant. I don't remember if I ever changed it, but it's highly possible I did sometime during my 19 years of processing SETI. Is that a cc_config setting, an online Preferences setting, or somewhere else?
Cherokee150
Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1970099 - Posted: 13 Dec 2018, 6:11:23 UTC - in response to Message 1970095.  

Thank you, too, Zalster. I will change that setting as soon as I locate it. As I mentioned to Grant, perhaps one of you could point me in the right direction to make that change.
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1970100 - Posted: 13 Dec 2018, 6:24:31 UTC
Last modified: 13 Dec 2018, 6:28:14 UTC

The only one that comes to mind is the <report_results_immediately> option in the cc_config.xml file <options> </options> section.

Edit-
Or report_results_immediately used in app_info.xml
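For reference, a minimal cc_config.xml showing where that option lives (a sketch; 0 is the default, 1 turns immediate reporting on):

```xml
<cc_config>
  <options>
    <report_results_immediately>0</report_results_immediately>
  </options>
</cc_config>
```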
Grant
Darwin NT
Cherokee150
Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1970101 - Posted: 13 Dec 2018, 7:21:54 UTC - in response to Message 1970100.  
Last modified: 13 Dec 2018, 7:41:30 UTC

I have checked cc_config.xml and found:
<report_results_immediately>0</report_results_immediately>

I also checked app_info.xml and found no <report_results_immediately> line.

Is there anywhere else or, perhaps, some other option that might be causing BOINC to report immediately? My configuration is a little unusual right now in that:
1. app_info.xml's <CUDA> is set to <count>0.5</count> so that only two GPU units are running concurrently.
2. The online Preference "Use CPU" is unchecked.

Could only running very fast GPU units without CPU units be causing a problem?

EDIT-
Come to think of it, my results do not normally post immediately. They usually sit for a while before BOINC sends them off. During this problem, there are fewer than 100 GPU units in my cache and no CPU units. This makes me wonder even more whether it might be related to the two-GPU, no-CPU configuration I currently have set up. I don't believe the problem started before the leftover CPU units in my cache cleared out; they were in there from when the online Preference "Use CPU" was checked.
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1970102 - Posted: 13 Dec 2018, 7:31:42 UTC - in response to Message 1970101.  
Last modified: 13 Dec 2018, 7:34:00 UTC

Could only running very fast GPU units without CPU units be causing a problem?

Nope, it would have to be due to some setting somewhere that's overriding the usual behaviour of the Manager.

Edit- unless there is some new bug in the latest version, which is what you appear to be running (I'm still on 7.6.33).
Grant
Darwin NT
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22200
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1970105 - Posted: 13 Dec 2018, 7:47:04 UTC

I think this might be caused by a hardware issue with your computer:

1544639648
Is equivalent to:
12/12/2018 @ 6:34pm (UTC)
2018-12-12T18:34:08+00:00 in ISO 8601
Wed, 12 Dec 2018 18:34:08 +0000

1544639739
Is equivalent to:
12/12/2018 @ 6:35pm (UTC)
2018-12-12T18:35:39+00:00 in ISO 8601
Wed, 12 Dec 2018 18:35:39 +0000

Last time I saw something like this it was the PC real-time clock starting to fail, another source can be the battery on the motherboard failing, or a bus timing issue. Or something else setting & resetting the RTC continuously.
It is worth noting that your tasks are running in about 10-11 minutes, which is "about right" for SoG tasks on a GTX1070.
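The two timestamps decoded above can be checked with a short script; note that the "new" system time is 91 seconds *earlier* than the "old" one, i.e. the clock stepped backwards:

```python
from datetime import datetime, timezone

def to_utc(ts):
    """Convert a Unix timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)

new = to_utc(1544639648)   # "new system time" from the log message
old = to_utc(1544639739)   # "old system time"
print(new.isoformat())              # 2018-12-12T18:34:08+00:00
print((old - new).total_seconds())  # 91.0 -> the clock jumped back 91 seconds
```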
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Cherokee150
Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1970106 - Posted: 13 Dec 2018, 7:47:26 UTC - in response to Message 1970102.  

You may have something there, Grant. I did install the latest version, 7.14.2 (x64), very recently, on November 22.
I wonder if anyone else may have noticed this with the new release?
Cherokee150
Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1970111 - Posted: 13 Dec 2018, 8:27:37 UTC - in response to Message 1970105.  
Last modified: 13 Dec 2018, 8:31:40 UTC

Rob, you may have found the answer! I apologize for neglecting to mention that I recently had to replace my Intel CPU, as the original failed after running at 100% CPU 24x365 for over seven years - mostly SETI, of course. :-) Luckily, I was able to locate an identical Intel CPU. I have been slowly overclocking it a few MHz at a time. It is definitely possible that I currently have it set at a speed that is interfering with the bus timing. The bus is currently overclocked by 2.7%.

Since the overclocking testing was going quite well, and I have had no SETI errors (we all know how good SETI is at thoroughly testing a rig), the idea that the overclocking might be the cause completely escaped me. I will immediately try a different clock setting to see if that corrects the problem!

I do very careful testing on overclocking, stepping up only the minimum amount possible each time. Therefore, it could take up to a day to know for sure if the overclocking frequency currently in use was the problem. If I am lucky, I might be able to verify that my SETI problem is fixed in just a few hours. Regardless, I will let all of you know what I find.

Thanks again to each of you for your help. Each of you gave me pertinent information that, combined, quickly led to a plausible hypothesis. Now, I'm off to test the theory! ;-)

EDIT-
By the way, Rob, I feel the same about Pluto!
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1970113 - Posted: 13 Dec 2018, 8:40:47 UTC

You get that effect as well when you have a very low "Store at least N days of work" and a high "Store up to an additional N days of work". The first is the low-water mark: when BOINC sends one task back, it immediately tries to top the cache up again by at least one task to restore that mark.
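That low-water/high-water behaviour could be sketched like this (a simplified illustration, not BOINC's actual work-fetch code; the parameter names are mine):

```python
def should_fetch(buffered_days, min_days, extra_days):
    """BOINC-style hysteresis (sketch): fetch when the buffer drops below
    the low-water mark, and top up toward min_days + extra_days of work."""
    low = min_days
    high = min_days + extra_days
    if buffered_days < low:
        return high - buffered_days   # days of work to request
    return 0.0

# A very low "at least" value means finishing a single task can dip the
# buffer below the mark and trigger an immediate work request:
print(should_fetch(buffered_days=0.04, min_days=0.05, extra_days=2.0))  # ~2.01
print(should_fetch(buffered_days=1.0, min_days=0.05, extra_days=2.0))   # 0.0
```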
Cherokee150
Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1970115 - Posted: 13 Dec 2018, 8:55:11 UTC - in response to Message 1970113.  
Last modified: 13 Dec 2018, 9:01:44 UTC

Thank you, Ageless (sorry, I don't know your given name). While that is not the case here, your information is very useful to me, as I very often tweak the work and additional work stores on all my machines, usually in an attempt to get more than a handful of astropulse units when they become available. Astropulses have always been very important to me for a very special reason, which I might relate some day.

EDIT-
By the way, my name is David.
Cherokee150
Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1970304 - Posted: 14 Dec 2018, 14:36:12 UTC

Hi everyone!
I promised an update on this. I reset the bus, memory and CPU speeds, updated the system time via time.nist.gov, and restarted SETI. The problem disappeared. I then increased the speeds again, and the problem returned. So, for now, I will leave the system at the defaults.

Thank you again for your help!
David
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22200
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1970341 - Posted: 14 Dec 2018, 18:45:23 UTC

Glad you've sorted it without having to spend any money :-)
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Kissagogo27 Special Project $75 donor
Joined: 6 Nov 99
Posts: 716
Credit: 8,032,827
RAC: 62
France
Message 1970457 - Posted: 15 Dec 2018, 11:30:12 UTC
Last modified: 15 Dec 2018, 11:31:54 UTC

You can modify the interval between automatic time updates in the registry.

This key is for XP (via Regedit); you will have to find the equivalent location on Windows 7 or 10:

HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\W32Time\TimeProviders\NtpClient

Modify SpecialPollInterval: if you set it to 3600, the time is updated every hour. The default setting is 604800 seconds, i.e. one week.

I sometimes have the RTC fail at a cold startup, counting time too fast.
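On Windows 7/10 the equivalent value normally sits under CurrentControlSet; a hedged sketch of setting it from an elevated command prompt (assumes the standard W32Time service layout):

```bat
reg add "HKLM\SYSTEM\CurrentControlSet\Services\W32Time\TimeProviders\NtpClient" /v SpecialPollInterval /t REG_DWORD /d 3600 /f
net stop w32time
net start w32time
```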
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1970459 - Posted: 15 Dec 2018, 11:52:19 UTC - in response to Message 1970457.  

But if the clock is drifting by more than 1% in five minutes [*], resetting once an hour is nowhere near enough - and the public NTP servers won't thank you for pestering them every five minutes. You'll probably get your IP address blacklisted. The best solution would be to set up an authoritative time source (Windows Domain Controller or equivalent) on your local LAN: update that daily, and your problematic machine can pester it every second if you want.

* There's an inbuilt 1% safeguard in the BOINC system already, to try and prevent this type of failure. This project has set a backoff time of 300 seconds, but our clients are told to wait for 303 seconds.
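That 1% figure also lines up with the 296-298 second denials reported at the top of the thread; as a back-of-the-envelope check (my arithmetic, not from BOINC source):

```python
server_backoff = 300
safety_factor = 1.01          # BOINC's 1% margin against client clock drift
client_wait = round(server_backoff * safety_factor)
print(client_wait)            # 303 -- the deferral the client observes

# A clock running ~2% fast erodes that margin: a 303 s wait measured on
# such a clock elapses in real time after only
print(303 / 1.02)             # ~297 real seconds -- inside the 300 s backoff,
                              # matching the 296-298 s denials reported above
```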