Got *much* more work than asked for
Author | Message |
---|---|
Juha Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
This has got to be the first time someone has complained about receiving work on this board... 21.1.2009 10:37:30|SETI@home|Requesting 6198 seconds of new work, and reporting 1 completed tasks 21.1.2009 10:37:35|SETI@home|Scheduler RPC succeeded [server version 607] 21.1.2009 10:37:35|SETI@home|Deferring communication for 11 sec 21.1.2009 10:37:35|SETI@home|Reason: requested by project After this, BOINC downloaded 20 workunits, totalling about 100-130 hours of crunch time. Estimated crunch time per workunit is around 3 hours or 10 hours. As far as I can tell, all the time stats, DCF and all the other metrics were sane at the time of the request (the scheduler request and reply have not yet been overwritten). I have the cache set to 0.5+0.5, so a hundred hours is way too much. This may have something to do with reporting a -9 result at the same time, although I am not quite sure it's related. I think this is a bug of some sort, likely something on the server side. I'm using BOINC 5.10.13 and this is the first time anything like this has happened, so I don't think it is a bug on this end. I haven't been around the boards for a while, so excuse me if this has already been reported in some other thread. -Juha |
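To put the mismatch in numbers, here is a minimal sketch that pulls the requested seconds out of a message-log line in the format quoted above and compares it with the work received. The function names are hypothetical, and the split of 10 tasks at about 3 hours plus 10 tasks at about 10 hours is an assumption based on the estimates in the post, not a measured figure.

```python
import re
from typing import Optional

# Matches the request line format quoted in the post above.
REQUEST_RE = re.compile(r"Requesting (\d+) seconds of new work")

def requested_seconds(log_line: str) -> Optional[int]:
    """Return the number of seconds requested in a log line, or None."""
    match = REQUEST_RE.search(log_line)
    return int(match.group(1)) if match else None

def oversupply_factor(requested_s: int, received_hours: float) -> float:
    """How many times more work arrived than was asked for."""
    return received_hours * 3600 / requested_s

line = ("21.1.2009 10:37:30|SETI@home|Requesting 6198 seconds of new work, "
        "and reporting 1 completed tasks")
asked = requested_seconds(line)

# Assumed split: 10 tasks at ~3 h and 10 tasks at ~10 h (130 h in total).
received_hours = 10 * 3 + 10 * 10
print(asked, round(oversupply_factor(asked, received_hours), 1))
```

Under that assumption the download is roughly 75 times larger than the request; even the low end of the 100-130 hour range is more than 50 times larger.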
skildude Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
I looked at your new work. You should be fine with the work you have. You have about 50% small WUs with a short (1 week) TAT and 50% large WUs with a long (3+ weeks) TAT. I don't think you'll have any time issues, judging by how fast you are returning the small WUs. 10 small WUs should take you about 30 hours to return. That's not very long considering you have a week to do them. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
Mike Bader Joined: 18 May 99 Posts: 231 Credit: 20,366,214 RAC: 33 |
I also got a ton of work units, many times the usual number, on just one of my machines. Yes, they were all very short. We will see if they are in fact short or if maybe the benchmark estimate just glitched. computer 4728132 Mike Bader BOINC V7.16.5 http://setiathome.berkeley.edu/team_join_form.php?id=5 - Join Our International Team |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
I had the same thing on 15 January, with a 69-second work request being filled with a 12-day allocation - reported in Work fetch anomaly. So although it's still rare, there does seem to be more than a one-off problem: that perhaps makes it worth investigating? |
skildude Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
It looks like they received short WUs with a few long ones sprinkled in. I don't think that it's an anomaly, just that the server has been sending out short WUs recently. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
"It looks like they received short WUs with a few long ones sprinkled in. I don't think that it's an anomaly, just that the server has been sending out short WUs recently." No. I received 5 Astropulse tasks on a slow, single-core P4 that should never, ever, be allocated more than one AP task at a time - they take 2 days to run, and I have a 1 day, 50% resource share cache. |
Juha Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
"I looked at your new work. You should be fine with the work you have. You have about 50% small WUs with a short (1 week) TAT and 50% large WUs with a long (3+ weeks) TAT. I don't think you'll have any time issues, judging by how fast you are returning the small WUs. 10 small WUs should take you about 30 hours to return. That's not very long considering you have a week to do them." Yes, one could say I got lucky. Had I had a larger cache, say 5 days, I think I would have had a hard time returning the shorties in time. I also have some Spinhenge workunits on board, and those too have a one-week deadline. For some other cruncher this may have meant missed deadlines. -Juha |
Juha Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
"I had the same thing on 15 January, with a 69-second work request being filled with a 12-day allocation - reported in Work fetch anomaly. So although it's still rare, there does seem to be more than a one-off problem: that perhaps makes it worth investigating?" I thought this was a new problem and looked at the first page only. Your thread is, of course, on the second page. IIRC, the server is allowed to send at most 20 workunits at a time. What I find interesting is that that is what I got, as if the server didn't bother counting how much work it had already assigned to me and just gave as much as it's allowed. Yours doesn't quite match that, or maybe the server didn't have enough work at hand at that time. There's a post at Beta that sounds like the same issue. So that's four reports. Might need investigating. Btw, I have preferences set to allow AP but it's not included in app_info.xml, so that is why I only got MB workunits. Twenty 300 hour AP workunits would have been fun. -Juha |
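The behaviour Juha describes would be consistent with a send loop that checks only the per-request task cap and not the seconds requested. The sketch below is purely illustrative and is not BOINC's scheduler code: `MAX_TASKS_PER_RPC`, `fill_request`, the `honour_request` switch and the task durations are all hypothetical. It only shows how the two limits would normally interact.

```python
MAX_TASKS_PER_RPC = 20  # per-request cap recalled in the post; assumed here

def fill_request(requested_s, task_estimates_s, honour_request=True):
    """Pick tasks until the request is covered or the cap is reached.

    task_estimates_s: estimated run time in seconds of each available task.
    honour_request=False mimics a server that ignores the seconds asked for.
    """
    sent = []
    for estimate in task_estimates_s:
        if len(sent) >= MAX_TASKS_PER_RPC:
            break  # hard cap on tasks per scheduler reply
        if honour_request and sum(sent) >= requested_s:
            break  # the request is already covered
        sent.append(estimate)
    return sent

available = [3 * 3600] * 30  # thirty hypothetical 3-hour tasks
print(len(fill_request(6198, available)))                        # 1
print(len(fill_request(6198, available, honour_request=False)))  # 20
```

With the request honoured, a single 3-hour task already covers 6198 seconds; with it ignored, the loop stops only at the cap, which matches the 20 workunits reported.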
Richard Haselgrove Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
"So that's four reports. Might need investigating." On two different servers, both of which are likely to have been recently updated with the very latest server patches. Also, there are two reports at BOINC dev of work being allocated with no work request at all! Might be related. |
Juha Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
"Also, there are two reports at BOINC dev of work being allocated with no work request at all! Might be related." So what we really have is several cases where the scheduler ignores some of the constraints given to it. It sure looks related. -Juha |
W-K 666 Joined: 18 May 99 Posts: 19321 Credit: 40,757,560 RAC: 67 |
I have this report: 20/01/2009 12:15:51|SETI@home|Sending scheduler request: To report completed tasks 20/01/2009 12:15:51|SETI@home|Reporting 5 tasks 20/01/2009 12:16:01|SETI@home|Scheduler RPC succeeded [server version 607] 20/01/2009 12:16:01|SETI@home|Deferring communication for 11 sec 20/01/2009 12:16:01|SETI@home|Reason: requested by project on host 688149, which resulted in 20 tasks downloaded. But the next page also shows 21 tasks d/loaded @ 04:31, no msgs, s/ware update required re-boot between d/loads. All times GMT (AKA UTC). [off-topic]Q? Why do we use UTC? By definition, UTC time is only known after the event.[/off-topic] [edit] These d/loads totaled over 200hrs of work on a quad with a 1 day cache, shared with Einstein 75:25. |
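As a rough sanity check on those figures, the sketch below estimates how many CPU-hours a cache setting should hold. The formula (days x 24 x cores x resource share) is a simplification assumed here for illustration; the real client also factors in DCF, on-fraction and similar statistics. It is also an assumption that SETI@home is the project holding the 75% share.

```python
def cache_cpu_hours(cache_days, ncpus, resource_share):
    """Approximate CPU-hours of work one project should hold in the cache."""
    return cache_days * 24 * ncpus * resource_share

# Quad-core host, 1-day cache, assumed 75% share for SETI@home.
expected = cache_cpu_hours(1, 4, 0.75)
print(expected)                  # 72.0
print(round(200 / expected, 1))  # 2.8
```

On that estimate, the 200+ hours downloaded is nearly three times what the cache setting should allow, without counting any work already on the host.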
Richard Haselgrove Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
You can still see what the messages were by retrieving the stdoutdae.txt file (I think that's the right name, I don't have a copy here) from the BOINC data directory. |
W-K 666 Joined: 18 May 99 Posts: 19321 Credit: 40,757,560 RAC: 67 |
"You can still see what the messages were by retrieving the stdoutdae.txt file (I think that's the right name, I don't have a copy here) from the BOINC data directory." No, I can't; mine has been corrupt for some reason since June last year. And I have been too busy (maybe that should read "idle") to do anything about it. |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
"You can still see what the messages were by retrieving the stdoutdae.txt file (I think that's the right name, I don't have a copy here) from the BOINC data directory." Hm. Well, if it is corrupt, one thing you can do is shut BOINC down and just delete it. It will create a new one from scratch. One thing I did for mine was to set it, in cc_config, to rotate once it gets to 100 MB. Oftentimes with a 3-5 week uptime, a 2 MB log file only goes back about 15-20 days, depending on how much work a system can do and how many (if any) debugging flags you have set. Just more options for you if you decide that you want to do that. [on topic] I have also noticed that the requested work seconds and the tasks that get assigned don't really seem to have any correlation. I still use 6.2.19, and I've noticed this oddity since the 5.x.x days. Requesting <100 seconds of work will get either one normal MB, one shorty, or one AP. That doesn't really seem to add up at all. The other thing I've noticed is that sometimes, for example, requesting 1500 seconds of work results in one full-length MB, but requesting 1100 seconds of work results in 20 shorties. I don't know, it just seems like there's an issue somewhere. *shrug* Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
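For anyone wanting to try the larger log limit mentioned above, here is a minimal sketch that builds a cc_config.xml body with a 100 MB limit for stdoutdae.txt. The option name `max_stdout_file_size` is given as recalled from the BOINC client configuration documentation and should be treated as an assumption: check it against the documentation for your client version. The helper name `build_cc_config` is hypothetical.

```python
import xml.etree.ElementTree as ET

def build_cc_config(max_stdout_bytes):
    """Return cc_config.xml content that sets the stdoutdae.txt size limit."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    # Assumed option name; verify it for your BOINC client version.
    ET.SubElement(options, "max_stdout_file_size").text = str(max_stdout_bytes)
    return ET.tostring(root, encoding="unicode")

config_xml = build_cc_config(100 * 1024 * 1024)  # 100 MB expressed in bytes
print(config_xml)
```

The resulting text would go into cc_config.xml in the BOINC data directory, and the client typically needs to re-read its config file or be restarted before the new limit applies.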
RandyC Joined: 20 Oct 99 Posts: 714 Credit: 1,704,345 RAC: 0 |
"You can still see what the messages were by retrieving the stdoutdae.txt file (I think that's the right name, I don't have a copy here) from the BOINC data directory." Easy enough to fix then... 1. Stop BOINC 2. Rename stdoutdae.txt [edit]or delete it[/edit] 3. Start BOINC |
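Step 2 of the fix above can be sketched as a small helper that renames the log with a timestamp, so the old (possibly corrupt) file is kept for inspection. This is only an illustration: `set_aside_log` is a hypothetical name, the data directory path depends on your installation, and BOINC should be stopped before running it and started again afterwards.

```python
import tempfile
import time
from pathlib import Path

def set_aside_log(data_dir, name="stdoutdae.txt"):
    """Rename the log so the client starts a fresh one.

    Returns the new path, or None if the log does not exist.
    """
    log = Path(data_dir) / name
    if not log.exists():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = log.with_name(f"{log.stem}-{stamp}{log.suffix}")
    log.rename(backup)
    return backup

# Demonstration in a throwaway directory, not a real BOINC data directory.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "stdoutdae.txt").write_text("old log\n")
    moved = set_aside_log(demo_dir)
    still_there = (Path(demo_dir) / "stdoutdae.txt").exists()
    print(moved.name, still_there)
```

Renaming rather than deleting costs nothing and leaves the old messages available if anyone wants to look for the missing work request later.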
W-K 666 Joined: 18 May 99 Posts: 19321 Credit: 40,757,560 RAC: 67 |
This is what I get from a new stderrdae.txt file: UNRECOGNIZED: suspend_if_no_recent_input UNRECOGNIZED: max_ncpus_pct UNRECOGNIZED: suspend_if_no_recent_input UNRECOGNIZED: max_ncpus_pct UNRECOGNIZED: day_prefs UNRECOGNIZED: /global_preferences Ideas, please? |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Those messages only tell you that the science application version you are running wasn't built against the latest BOINC API. Seeing how you run an optimized 5.28, that's the reason. |
W-K 666 Joined: 18 May 99 Posts: 19321 Credit: 40,757,560 RAC: 67 |
"Those messages only tell you that the science application version you are running wasn't built against the latest BOINC API. Seeing how you run an optimized 5.28, that's the reason." Jord, now you've really got me confused. AFAIK, Richard runs the same versions of BOINC and the SETI app, but he has a good stderrdae.txt. Richard's Computer ID 3751792 Report deadline 28 Jan 2009 9:36:08 UTC CPU time 1164.969 stderr out <core_client_version>5.10.13</core_client_version> <![CDATA[ <stderr_txt> Windows optimized S@H Enhanced application by Alex Kan Version info: SSSE3x (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan SSSE3x Win32 Build 59 , Ported by : Jason G, Raistmer, JDWhale CPUID: Intel(R) Core(TM)2 Quad CPU @ 2.40GHz Speed: 4 x 2398 MHz Cache: L1=64K L2=4096K Features: MMX SSE SSE2 SSE3 SSSE3 My Computer ID 688149 Report deadline 27 Jan 2009 4:31:48 UTC CPU time 821.7813 stderr out <core_client_version>5.10.13</core_client_version> <![CDATA[ <stderr_txt> Windows optimized S@H Enhanced application by Alex Kan Version info: SSSE3x (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan SSSE3x Win32 Build 41 , Ported by : Jason G, Raistmer, JDWhale CPUID: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz Speed: 4 x 2996 MHz Cache: L1=64K L2=4096K Features: MMX SSE SSE2 SSE3 SSSE3 Andy |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Richard is still using 5.8.something, isn't he? Or was that Brian Silvers? Edit... I see he's using 5.10.13 .. these messages only show when you're using a BOINC 6 with an older application. Anyway, I have them too... <core_client_version>6.6.2</core_client_version> <![CDATA[ <stderr_txt> Unrecognized XML in parse_init_data_file: computation_deadline Skipping: 1234496995.136000 Skipping: /computation_deadline Unrecognized XML in parse_init_data_file: computation_deadline Skipping: 1234496995.136000 Skipping: /computation_deadline Windows optimized S@H Enhanced application by Alex Kan Version info: SSE2x (AMD/Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan SSE2x Win32 Build 44 , Ported by : Jason G, Raistmer, JDWhale CPUID: Intel(R) Pentium(R) 4 CPU 3.00GHz Speed: 1 x 2999 MHz Work Unit Info: ............... Credit multiplier is : 2.85 WU true angle range is : 0.418633 Unrecognized XML in parse_init_data_file: computation_deadline Skipping: 1234496995.136000 Skipping: /computation_deadline Unrecognized XML in parse_init_data_file: computation_deadline Skipping: 1234496995.136000 Skipping: /computation_deadline Restarted at 29.10 percent. Unrecognized XML in parse_init_data_file: computation_deadline Skipping: 1234496995.136000 Skipping: /computation_deadline Unrecognized XML in parse_init_data_file: computation_deadline Skipping: 1234496995.136000 Skipping: /computation_deadline Restarted at 92.28 percent. Flopcounter: 16093706979741.047000 Spike count: 0 Pulse count: 1 Triplet count: 0 Gaussian count: 0 called boinc_finish </stderr_txt> ]]> Also running 5.28 |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
@ Andy, Now I'm back at home, I can confirm those filenames. stderrdae.txt : BOINC (daemon) errors, like those UNRECOGNIZED: suspend_if_no_recent_input UNRECOGNIZED: max_ncpus_pct - yes, I get them too, with a v5 client and a v5 SETI app. Maybe it's because I run Einstein too. stdoutdae.txt : BOINC (daemon) output, the message log we were looking for in the first place. stderr.txt : An application (SETI, in this case) output file, copied to the application task page like the sample you showed us. |