Got *much* more work than asked for

Message boards : Number crunching : Got *much* more work than asked for

Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 855982 - Posted: 21 Jan 2009, 14:46:33 UTC

This has got to be the first time someone has complained about receiving work on this board...

21.1.2009 10:37:30|SETI@home|Requesting 6198 seconds of new work, and reporting 1 completed tasks
21.1.2009 10:37:35|SETI@home|Scheduler RPC succeeded [server version 607]
21.1.2009 10:37:35|SETI@home|Deferring communication for 11 sec
21.1.2009 10:37:35|SETI@home|Reason: requested by project


After this, BOINC downloaded 20 workunits, totalling about 100-130 hours of crunch time. The estimated crunch time per workunit is either around 3 hours or around 10 hours.

As far as I can tell, all the time stats, DCF and other metrics were sane at the time of the request (the scheduler request and reply files have not yet been overwritten). My cache is set to 0.5+0.5 days, so a hundred hours is way too much.
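To put those numbers in proportion, here is a small back-of-the-envelope sketch in Python. It assumes the 20 tasks split roughly evenly between the ~3-hour and ~10-hour estimates (that split is an assumption, not something read from the log) and treats the 0.5+0.5 cache as one day of work for a single CPU; the host's core count isn't known here.

```python
HOUR = 3600
DAY = 24 * HOUR

def overfetch_ratio(requested_s: float, delivered_s: float) -> float:
    """How many times more work arrived than was asked for."""
    return delivered_s / requested_s

requested_s = 6198  # from "Requesting 6198 seconds of new work"
# Assumed split: 10 tasks at ~3 h and 10 at ~10 h (upper end of 100-130 h).
delivered_s = 10 * 3 * HOUR + 10 * 10 * HOUR
# The two 0.5-day cache settings; ignores the number of CPU cores.
cache_s = (0.5 + 0.5) * DAY

print(f"delivered: {delivered_s / HOUR:.0f} h")
print(f"vs. request: {overfetch_ratio(requested_s, delivered_s):.0f}x")
print(f"vs. whole cache: {delivered_s / cache_s:.1f}x")
```

Under these assumptions the reply carried roughly 76 times the requested amount, and about five times the entire cache.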

This may have something to do with reporting a -9 result at the same time, although I am not quite sure if it's related.

I think this is a bug of some sort, most likely on the server side. I'm using BOINC 5.10.13 and this is the first time anything like this has happened, so I don't think the bug is on this end.

I haven't been around the boards for a while so excuse me if this has already been reported in some other thread.

-Juha
ID: 855982
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 855984 - Posted: 21 Jan 2009, 14:54:22 UTC - in response to Message 855982.  

I looked at your new work. You should be fine with what you have: about 50% are small WUs with a short (1 week) TAT and 50% are large WUs with a long (3+ weeks) TAT. Judging by how fast you are returning the small WUs, I don't think you'll have any time issues: 10 small WUs should take you about 30 hours to return. That's not very long considering you have a week to do them.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 855984
Mike Bader Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 231
Credit: 20,366,214
RAC: 33
Message 855990 - Posted: 21 Jan 2009, 15:13:45 UTC - in response to Message 855984.  

I also got a ton of work units, many times the usual number, but just on one of my machines.
Yes, they were all very short.
We will see if they are in fact short, or if the benchmark estimate just glitched.
Computer 4728132


Mike Bader
BOINC V7.16.5
http://setiathome.berkeley.edu/team_join_form.php?id=5 - Join Our International Team
ID: 855990
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 855992 - Posted: 21 Jan 2009, 15:14:21 UTC

I had the same thing on 15 January, with a 69-second work request being filled with a 12-day allocation, reported in the "Work fetch anomaly" thread. So although it's still rare, this does seem to be more than a one-off problem: perhaps that makes it worth investigating?
ID: 855992
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 855996 - Posted: 21 Jan 2009, 15:21:59 UTC

It looks like they received short WUs with a few long ones sprinkled in. I don't think it's an anomaly; the server has just been sending out short WUs recently.


ID: 855996
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 855997 - Posted: 21 Jan 2009, 15:25:53 UTC - in response to Message 855996.  

[quote]It looks like they received short WUs with a few long ones sprinkled in. I don't think it's an anomaly; the server has just been sending out short WUs recently.[/quote]

No.

I received 5 Astropulse tasks on a slow, single-core P4 that should never, ever be allocated more than one AP task at a time: they take 2 days to run, and I have a 1-day cache with a 50% resource share.
ID: 855997
Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 856000 - Posted: 21 Jan 2009, 15:41:45 UTC - in response to Message 855984.  

[quote]I looked at your new work. You should be fine with what you have: about 50% are small WUs with a short (1 week) TAT and 50% are large WUs with a long (3+ weeks) TAT. Judging by how fast you are returning the small WUs, I don't think you'll have any time issues: 10 small WUs should take you about 30 hours to return. That's not very long considering you have a week to do them.[/quote]

Yes, one could say I got lucky. Had I had a larger cache, say 5 days, I think I would have had a hard time returning the shorties in time. I also have some Spinhenge workunits on board, and those too have a one-week deadline.

For some other cruncher this may have meant missed deadlines.

-Juha
ID: 856000
Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 856004 - Posted: 21 Jan 2009, 16:06:29 UTC - in response to Message 855992.  

[quote]I had the same thing on 15 January, with a 69-second work request being filled with a 12-day allocation, reported in the "Work fetch anomaly" thread. So although it's still rare, this does seem to be more than a one-off problem: perhaps that makes it worth investigating?[/quote]

I thought this was a new problem and only looked at the first page. Your thread is, of course, on the second page.

IIRC, the server is allowed to send at most 20 workunits at a time. What I find interesting is that this is exactly what I got, as if the server didn't bother counting how much work it had already assigned to me and just gave as much as it's allowed to.

Yours doesn't quite match that, though maybe the server didn't have enough work on hand at the time.
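The "didn't bother counting" hypothesis can be illustrated with a toy model. The Python below is a hypothetical sketch, not BOINC's actual scheduler code: it assumes the scheduler walks a list of candidate tasks, adds each task's estimated runtime to a running total, and stops once the request is covered or the per-request cap (20, as recalled above) is reached. The buggy variant simply never updates the running total, so only the cap stops it.

```python
from typing import List, Sequence

MAX_TASKS_PER_RPC = 20  # per-request cap as recalled in the post (assumption)

def fill_request(requested_s: float, candidates: Sequence[float],
                 max_tasks: int = MAX_TASKS_PER_RPC) -> List[float]:
    """Pick tasks until their estimated runtimes cover the request."""
    sent: List[float] = []
    assigned_s = 0.0
    for est_s in candidates:
        if assigned_s >= requested_s or len(sent) >= max_tasks:
            break
        sent.append(est_s)
        assigned_s += est_s  # count the work already handed out
    return sent

def fill_request_buggy(requested_s: float, candidates: Sequence[float],
                       max_tasks: int = MAX_TASKS_PER_RPC) -> List[float]:
    """Same loop, but the running total is never updated."""
    sent: List[float] = []
    assigned_s = 0.0
    for est_s in candidates:
        if assigned_s >= requested_s or len(sent) >= max_tasks:
            break
        sent.append(est_s)
        # assigned_s is never incremented, so only max_tasks stops the loop
    return sent

shorties = [3 * 3600.0] * 40  # hypothetical queue of ~3-hour tasks
print(len(fill_request(6198, shorties)))        # 1
print(len(fill_request_buggy(6198, shorties)))  # 20
```

With a 6198-second request, the first function sends one ~3-hour task and the buggy one sends 20, the same count received here. Because tasks are indivisible, even the correct loop sends one whole task for any non-zero request, however small.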

There's a post at Beta that sounds like the same issue. So that's four reports. Might need investigating.

Btw, my preferences allow AP, but it isn't included in my app_info.xml, which is why I only got MB workunits. Twenty 300-hour AP workunits would have been fun.

-Juha
ID: 856004
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 856007 - Posted: 21 Jan 2009, 16:12:36 UTC - in response to Message 856004.  

[quote]So that's four reports. Might need investigating.

-Juha[/quote]

On two different servers, both of which are likely to have been recently updated with the very latest server patches.

Also there are two reports at BOINC dev of work being allocated with no work request at all! Might be related.
ID: 856007
Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 856020 - Posted: 21 Jan 2009, 17:04:45 UTC - in response to Message 856007.  

[quote]Also there are two reports at BOINC dev of work being allocated with no work request at all! Might be related.[/quote]

So what we really have is several cases where the scheduler ignores some of the constraints given to it. It sure looks related.

-Juha
ID: 856020
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19403
Credit: 40,757,560
RAC: 67
United Kingdom
Message 856022 - Posted: 21 Jan 2009, 17:10:37 UTC
Last modified: 21 Jan 2009, 17:12:43 UTC

I have this report:
20/01/2009 12:15:51|SETI@home|Sending scheduler request: To report completed tasks
20/01/2009 12:15:51|SETI@home|Reporting 5 tasks
20/01/2009 12:16:01|SETI@home|Scheduler RPC succeeded [server version 607]
20/01/2009 12:16:01|SETI@home|Deferring communication for 11 sec
20/01/2009 12:16:01|SETI@home|Reason: requested by project

on host 688149, which resulted in 20 tasks being downloaded.

But the next page also shows 21 tasks downloaded at 04:31, with no messages (a software update required a reboot between downloads).

All times are GMT (AKA UTC). [off-topic]Q: Why do we use UTC? By definition, UTC time is only known after the event.[/off-topic]

[edit] These downloads totalled over 200 hrs of work on a quad with a 1-day cache, shared with Einstein 75:25.
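How far over that is can be estimated with a deliberately simplified model, sketched in Python below. The model is an assumption: all four cores crunch around the clock, and the 75:25 resource share scales the one-day cache directly. BOINC's actual work-fetch calculation is more involved, so treat the result as a rough order of magnitude.

```python
def seti_capacity_hours(cores: int, cache_days: float, seti_share: float) -> float:
    """CPU-hours of SETI work a cache can hold under this simple model."""
    return cores * cache_days * 24 * seti_share

capacity = seti_capacity_hours(cores=4, cache_days=1.0, seti_share=0.75)
downloaded = 200  # "over 200 hrs", from the post
print(f"capacity: {capacity:.0f} h; downloaded is {downloaded / capacity:.1f}x that")
```

On these assumptions the SETI share of a one-day cache is about 72 CPU-hours, so 200 hours is nearly three times what should have been fetched.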
ID: 856022
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 856025 - Posted: 21 Jan 2009, 17:19:54 UTC - in response to Message 856022.  

You can still see what the messages were by retrieving the stdoutdae.txt file (I think that's the right name, don't have a copy here) from the BOINC data directory.
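Once you have the log, the work requests can be pulled out with a short script. This Python sketch assumes the `time|project|message` line layout seen in the logs quoted in this thread; other client versions may word the message differently, so the pattern is a guess to adjust, not a documented format.

```python
import re
from typing import Iterable, Iterator, Tuple

# Matches lines such as
# "21.1.2009 10:37:30|SETI@home|Requesting 6198 seconds of new work, ..."
REQUEST_RE = re.compile(
    r"^(?P<when>[^|]+)\|(?P<project>[^|]+)\|Requesting (?P<secs>\d+) seconds of new work"
)

def work_requests(lines: Iterable[str]) -> Iterator[Tuple[str, str, int]]:
    """Yield (timestamp, project, requested_seconds) for each work request."""
    for line in lines:
        m = REQUEST_RE.match(line.strip())
        if m:
            yield m["when"], m["project"], int(m["secs"])

sample = [
    "21.1.2009 10:37:30|SETI@home|Requesting 6198 seconds of new work, and reporting 1 completed tasks",
    "21.1.2009 10:37:35|SETI@home|Scheduler RPC succeeded [server version 607]",
    "20/01/2009 12:15:51|SETI@home|Sending scheduler request: To report completed tasks",
]
for when, project, secs in work_requests(sample):
    print(f"{when}  {project}  requested {secs} s")
```

To scan the real file, pass an open file object, e.g. `with open(path, encoding="utf-8", errors="replace") as f: list(work_requests(f))`. A scheduler RPC with no matching "Requesting" line, like the 20/01 report above, yields nothing, meaning no work was asked for on that contact.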
ID: 856025
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19403
Credit: 40,757,560
RAC: 67
United Kingdom
Message 856027 - Posted: 21 Jan 2009, 17:23:59 UTC - in response to Message 856025.  

[quote]You can still see what the messages were by retrieving the stdoutdae.txt file (I think that's the right name, don't have a copy here) from the BOINC data directory.[/quote]

No, I can't. Mine has been corrupt for some reason since June last year, and I have been too busy (maybe should read "idle") to do anything about it.
ID: 856027
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 856048 - Posted: 21 Jan 2009, 18:23:46 UTC - in response to Message 856027.  

[quote][quote]You can still see what the messages were by retrieving the stdoutdae.txt file (I think that's the right name, don't have a copy here) from the BOINC data directory.[/quote]

No, I can't. Mine has been corrupt for some reason since June last year, and I have been too busy (maybe should read "idle") to do anything about it.[/quote]

Hm. Well, if it's corrupt, one thing you can do is shut BOINC down and just delete it; BOINC will create a new one from scratch. For mine, I set cc_config to rotate the log once it reaches 100 MB. With a 3-5 week uptime, a 2 MB log file often only goes back about 15-20 days, depending on how much work a system does and how many debugging flags (if any) you have set.

Just more options for you if you decide that you want to do that.


[on topic] I have also noticed that the requested work seconds and the tasks that get assigned don't really seem to correlate. I still use 6.2.19, and I've noticed this oddity since the 5.x.x days. Requesting <100 seconds of work will get either one normal MB, one shorty, or one AP, which doesn't really add up at all. The other thing I've noticed is that sometimes, for example, requesting 1500 seconds of work results in one full-length MB, while requesting 1100 seconds results in 20 shorties. I don't know, it just seems like there's an issue somewhere. *shrug*
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 856048
RandyC
Joined: 20 Oct 99
Posts: 714
Credit: 1,704,345
RAC: 0
United States
Message 856050 - Posted: 21 Jan 2009, 18:28:10 UTC - in response to Message 856027.  
Last modified: 21 Jan 2009, 18:28:58 UTC

[quote][quote]You can still see what the messages were by retrieving the stdoutdae.txt file (I think that's the right name, don't have a copy here) from the BOINC data directory.[/quote]

No, I can't. Mine has been corrupt for some reason since June last year, and I have been too busy (maybe should read "idle") to do anything about it.[/quote]


Easy enough to fix then...
1. Stop BOINC
2. Rename stdoutdae.txt [edit]or delete it[/edit]
3. Start BOINC
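If you'd rather keep the old file than delete it, step 2 can be scripted. This Python sketch only renames the log; stopping and restarting BOINC (steps 1 and 3) is left to you, and the data-directory path you pass in is an assumption about your install.

```python
import time
from pathlib import Path
from typing import Optional

def set_aside_log(data_dir: str, name: str = "stdoutdae.txt") -> Optional[Path]:
    """Rename a BOINC log so the client starts a fresh one.

    Run this only while BOINC is stopped. Returns the new path,
    or None if there was no log to move.
    """
    log = Path(data_dir) / name
    if not log.exists():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = log.with_name(f"{name}.{stamp}.old")  # keep the old copy
    log.rename(target)
    return target
```

The timestamped name means repeated runs don't overwrite earlier copies, so an old log stays available if you later want to go back and look for the missing messages.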
ID: 856050
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19403
Credit: 40,757,560
RAC: 67
United Kingdom
Message 856055 - Posted: 21 Jan 2009, 18:37:04 UTC

This is what I get from a new stderrdae.txt file:

UNRECOGNIZED: suspend_if_no_recent_input
UNRECOGNIZED: max_ncpus_pct
UNRECOGNIZED: suspend_if_no_recent_input
UNRECOGNIZED: max_ncpus_pct
UNRECOGNIZED: day_prefs
UNRECOGNIZED: /global_preferences

Ideas, please?
ID: 856055
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 856057 - Posted: 21 Jan 2009, 18:39:18 UTC - in response to Message 856055.  

Those messages only tell you that the science application version you are running wasn't built against the latest BOINC API. Seeing how you run an optimized 5.28, that's the reason.
ID: 856057
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19403
Credit: 40,757,560
RAC: 67
United Kingdom
Message 856065 - Posted: 21 Jan 2009, 18:47:16 UTC - in response to Message 856057.  

[quote]Those messages only tell you that the science application version you are running wasn't built against the latest BOINC API. Seeing how you run an optimized 5.28, that's the reason.[/quote]

Jord,
Now you've really got me confused. AFAIK, Richard runs the same versions of BOINC and the SETI app, but he has a good stderrdae.txt.

Richard's:
Computer ID 3751792
Report deadline 28 Jan 2009 9:36:08 UTC
CPU time 1164.969
stderr out

<core_client_version>5.10.13</core_client_version>
<![CDATA[
<stderr_txt>
Windows optimized S@H Enhanced application by Alex Kan
Version info: SSSE3x (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan
SSSE3x Win32 Build 59 , Ported by : Jason G, Raistmer, JDWhale

CPUID: Intel(R) Core(TM)2 Quad CPU @ 2.40GHz
Speed: 4 x 2398 MHz
Cache: L1=64K L2=4096K
Features: MMX SSE SSE2 SSE3 SSSE3

Mine:
Computer ID 688149
Report deadline 27 Jan 2009 4:31:48 UTC
CPU time 821.7813
stderr out

<core_client_version>5.10.13</core_client_version>
<![CDATA[
<stderr_txt>
Windows optimized S@H Enhanced application by Alex Kan
Version info: SSSE3x (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan
SSSE3x Win32 Build 41 , Ported by : Jason G, Raistmer, JDWhale

CPUID: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
Speed: 4 x 2996 MHz
Cache: L1=64K L2=4096K
Features: MMX SSE SSE2 SSE3 SSSE3

Andy
ID: 856065
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 856076 - Posted: 21 Jan 2009, 18:56:51 UTC - in response to Message 856065.  
Last modified: 21 Jan 2009, 18:58:14 UTC

Richard is still using 5.8.something, isn't he? Or was that Brian Silvers?
Edit: I see he's using 5.10.13. These messages only show up when you're using BOINC 6 with an older application.

Anyway, I have them too...
<core_client_version>6.6.2</core_client_version>
<![CDATA[
<stderr_txt>
Unrecognized XML in parse_init_data_file: computation_deadline
Skipping: 1234496995.136000
Skipping: /computation_deadline
Unrecognized XML in parse_init_data_file: computation_deadline
Skipping: 1234496995.136000
Skipping: /computation_deadline
Windows optimized S@H Enhanced application by Alex Kan
Version info: SSE2x (AMD/Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan
SSE2x Win32 Build 44 , Ported by : Jason G, Raistmer, JDWhale

CPUID: Intel(R) Pentium(R) 4 CPU 3.00GHz
Speed: 1 x 2999 MHz
Work Unit Info:
...............
Credit multiplier is : 2.85
WU true angle range is : 0.418633
Unrecognized XML in parse_init_data_file: computation_deadline
Skipping: 1234496995.136000
Skipping: /computation_deadline
Unrecognized XML in parse_init_data_file: computation_deadline
Skipping: 1234496995.136000
Skipping: /computation_deadline
Restarted at 29.10 percent.
Unrecognized XML in parse_init_data_file: computation_deadline
Skipping: 1234496995.136000
Skipping: /computation_deadline
Unrecognized XML in parse_init_data_file: computation_deadline
Skipping: 1234496995.136000
Skipping: /computation_deadline
Restarted at 92.28 percent.

Flopcounter: 16093706979741.047000

Spike count: 0
Pulse count: 1
Triplet count: 0
Gaussian count: 0
called boinc_finish

</stderr_txt>
]]>

Also running 5.28
ID: 856076
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 856085 - Posted: 21 Jan 2009, 19:09:26 UTC

@ Andy,

Now I'm back at home, I can confirm those filenames.

stderrdae.txt : BOINC (daemon) errors, like those

UNRECOGNIZED: suspend_if_no_recent_input
UNRECOGNIZED: max_ncpus_pct

- yes, I get them too, with a v5 client and a v5 SETI app. Maybe it's because I run Einstein too.

stdoutdae.txt : BOINC (daemon) output, the message log we were looking for in the first place.

stderr.txt : An application (SETI, in this case) output file, copied to the application task page like the sample you showed us.
ID: 856085


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.