Message boards : Number crunching : Got *much* more work than asked for
skildude (Joined: 4 Oct 00; Posts: 9541; Credit: 50,759,529; RAC: 60)
Let's do some math. You had 20 SETI WUs that you complete in less than 15 minutes on a Phenom, so that's 1.25 hours' worth of work. Divide that into 24 hours: that's approximately 5% (I rounded up the WU time) of your total time for 24 hours. Since you've already aborted most of those WUs, this is pointless. However, you say you have SETI at 4%. I'd say your BOINC is working properly and would have completed the 20 WUs in about 5% of your daily time. Not bad for something that works on its own without needing someone watching over it. I would even bet that your repeated aborting of SETI WUs (I see you've done this for the last 3 weeks) is the reason for it fetching more and more work. BOINC wants, or has in the past wanted, to keep the CPU running at the percentage you've requested.

In a rich man's house there is no place to spit but his face. Diogenes of Sinope
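Taking the post's 1.25-hour total at face value, the percentage arithmetic can be checked in a couple of lines. The figures come from the post itself, not from any BOINC API:

```python
# Figures quoted from the post: 20 tasks, about 1.25 hours of total runtime.
tasks = 20
hours_of_work = 1.25
share_of_day = hours_of_work / 24.0   # fraction of a 24-hour day

print(f"{share_of_day:.1%}")          # roughly 5%, matching the claim
```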
W-K 666 (Joined: 18 May 99; Posts: 19402; Credit: 40,757,560; RAC: 67)
> Let's do some math

In that particular case 20 MB units might not be much, but did you look at one of my previous posts, 856162? No user button pushing, and it got 11 AP + 9 MB tasks, over 150 hours of work, when requesting 170 seconds of work. This is not a CUDA-capable computer, and it is still using v5.10.13. So, to conclude:
It is not only button abusers.
It is not only CUDA-capable machines or recent clients.
It is not always just a little bit of extra work.
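For scale, the mismatch described in post 856162 can be put as a single ratio (figures taken from the post):

```python
# Figures from the post: a 170-second request answered with "over 150 hrs" of work.
requested_seconds = 170
delivered_seconds = 150 * 3600          # 150 hours expressed in seconds

overshoot = delivered_seconds / requested_seconds
print(round(overshoot))                 # the server sent ~3000x what was asked for
```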
skildude (Joined: 4 Oct 00; Posts: 9541; Credit: 50,759,529; RAC: 60)
I did the simple math that archae86 hadn't done. His BOINC process is working fine and it's delivering SETI work at the level that he asked for it, nothing more. I never implied there wasn't a glitch, but everyone that suddenly downloads 20 MB WUs thinks that they have the glitch, and since not everyone says what they set their cache to, I have to assume it's set to longer than 1 day. It's all about the pertinent info. Yes, people here have seen and been victims of the glitch; not everyone here is a victim.

In a rich man's house there is no place to spit but his face. Diogenes of Sinope
MarkJ (Joined: 17 Feb 08; Posts: 1139; Credit: 80,854,192; RAC: 5)
I can confirm that BOINC 6.2.19 also gets work when it's not requested. See the log below. All projects were set to NNW; I was trying to report a task. As you can see, it still gets work despite not asking for any. Seeing as 5.10.45 is doing the same thing, this would suggest the bug is on the server side, as BOINC hasn't been updated.

24/01/2009 11:21:10 AM||Starting BOINC client version 6.2.19 for windows_intelx86
24/01/2009 11:21:10 AM||log flags: task, file_xfer, sched_ops
24/01/2009 11:21:10 AM||Libraries: libcurl/7.18.0 OpenSSL/0.9.8e zlib/1.2.3
24/01/2009 11:21:10 AM||Data directory: C:\Documents and Settings\Seti\Application Data\BOINC
24/01/2009 11:21:10 AM||Running under account Seti
24/01/2009 11:21:10 AM|SETI@home|Found app_info.xml; using anonymous platform
24/01/2009 11:21:10 AM||Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz [x86 Family 6 Model 15 Stepping 11]
24/01/2009 11:21:10 AM||Processor features: fpu tsc pae nx sse sse2 mmx
24/01/2009 11:21:10 AM||OS: Microsoft Windows XP: Professional x86 Editon, Service Pack 3, (05.01.2600.00)
24/01/2009 11:21:10 AM||Memory: 1.99 GB physical, 3.84 GB virtual
24/01/2009 11:21:10 AM||Disk: 148.78 GB total, 138.69 GB free
24/01/2009 11:21:10 AM||Local time is UTC +11 hours
24/01/2009 11:21:10 AM|Einstein@Home|URL: http://einstein.phys.uwm.edu/; Computer ID: 1243052; location: home; project prefs: default
24/01/2009 11:21:10 AM|orbit@home|URL: http://orbit.psi.edu/oah/; Computer ID: 6892; location: home; project prefs: default
24/01/2009 11:21:10 AM|SETI@home|URL: http://setiathome.berkeley.edu/; Computer ID: 4238333; location: home; project prefs: default
24/01/2009 11:21:10 AM||General prefs: from SETI@home (last modified 20-Jul-2008 13:15:35)
24/01/2009 11:21:10 AM||Computer location: home
24/01/2009 11:21:10 AM||General prefs: no separate prefs for home; using your defaults
24/01/2009 11:21:10 AM||Reading preferences override file
24/01/2009 11:21:10 AM||Preferences limit memory usage when active to 1019.21MB
24/01/2009 11:21:10 AM||Preferences limit memory usage when idle to 1834.58MB
24/01/2009 11:21:10 AM||Preferences limit disk usage to 37.25GB
24/01/2009 11:21:13 AM||Can't resolve hostname CNF7411NB1 in remote_hosts.cfg
24/01/2009 11:21:13 AM||file projects/setiathome.berkeley.edu/ap_5.00r69_SSE3.exe not found
24/01/2009 11:21:13 AM||Suspending network activity - user is active
24/01/2009 11:21:13 AM|SETI@home|Restarting task ap_10dc08ad_B4_P1_00275_20090117_26141.wu_0 using astropulse version 500
24/01/2009 11:21:13 AM|SETI@home|Restarting task ap_10dc08ad_B6_P0_00072_20090117_09535.wu_1 using astropulse version 500
24/01/2009 11:21:13 AM|SETI@home|Restarting task ap_10dc08ad_B6_P0_00075_20090117_09535.wu_1 using astropulse version 500
24/01/2009 11:21:13 AM|SETI@home|Restarting task 07no08aa.3235.72.16.8.3_0 using setiathome_enhanced version 528
24/01/2009 11:21:54 AM||Resuming network activity
24/01/2009 11:21:54 AM|SETI@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 1 completed tasks
24/01/2009 11:21:59 AM|SETI@home|Scheduler request succeeded: got 15 new tasks
24/01/2009 11:22:01 AM|SETI@home|Started download of 15dc08ae.19202.40748.16.8.74
24/01/2009 11:22:01 AM|SETI@home|Started download of 08no08ae.21335.13978.7.8.245
24/01/2009 11:22:14 AM||Suspending network activity - user request

BOINC blog
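The pattern in logs like the one above (a 0-second request answered with new tasks) can be confirmed mechanically. This is a standalone illustrative script, not part of BOINC; the sample lines are abbreviated from the post:

```python
import re

# Scan client-log lines and flag scheduler replies that granted tasks
# when the matching request asked for 0 seconds of work.
def find_unrequested_work(lines):
    anomalies = []
    last_request = None
    for line in lines:
        m = re.search(r"Requesting (\d+) seconds of work", line)
        if m:
            last_request = int(m.group(1))
            continue
        m = re.search(r"got (\d+) new tasks", line)
        if m and last_request == 0 and int(m.group(1)) > 0:
            anomalies.append(int(m.group(1)))
    return anomalies

log = [
    "11:21:54|SETI@home|Sending scheduler request: Requested by user. "
    "Requesting 0 seconds of work, reporting 1 completed tasks",
    "11:21:59|SETI@home|Scheduler request succeeded: got 15 new tasks",
]
print(find_unrequested_work(log))  # [15]
```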
Slow_Target (Joined: 5 Oct 02; Posts: 58; Credit: 6,704,641; RAC: 2)
It's not just a user-triggered problem. Mine started when it restarted after the 6.6.2 install. It just kept repeating the 354240.00-second CUDA work request over and over. I have CUDA (8600GT) but was set to no CUDA. It only stopped when set to NNT, but started up again with ANT. Changed to 6.4.5 and it seems to be OK again. 600+ WUs with ~60 APs is a lot more than my 4-day cache should be. My normal disk usage is about 50-70 MB, and it is at about 700 MB.
dnolan (Joined: 30 Aug 01; Posts: 1228; Credit: 47,779,411; RAC: 32)
> Let's do some math

Is this in response to archae86's post? Because if it is, I'm seeing a couple of AP tasks in the list he posted; that's not 1.25 hours' worth of work even on Mark's frozen Nehi machine.

-Dave
archae86 (Joined: 31 Aug 99; Posts: 909; Credit: 1,582,816; RAC: 0)
> I did the simple math that archae86 hadn't done. His BOINC process is working fine and it's delivering SETI work at the level that he asked for it, nothing more.

My BOINC process asked for 1 second of work from SETI. That should have been served with 1 WU by the Berkeley site. The anomaly here was on the Berkeley side. It does not know, and is not told, the state of my work from other projects. Short of computer telepathy, it could not have been somehow figuring out what my host "really" wanted and sending that instead. It is just supposed to translate the number of seconds of work requested, given the estimate of work for each WU and its current estimate of my host's ability to perform work. Were the BOINC process on my host cleverly asking for more, it would have been expressed as a request for more seconds of work (it is quite happy to ask for hundreds of thousands of seconds, when indicated).

As to simple math, it is true that I did not do that; instead, I let BOINCView do it for me, as documented in the post. I have plenty of accumulated observing experience on which I base my confidence in the math BOINCView does in this case. It noticed the Astropulse work, which, while only three of the twenty WUs, weighs rather more heavily in the work it represents.
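The behaviour archae86 says the scheduler is *supposed* to exhibit, turning a request in seconds into a task count via the server's own runtime estimate, can be sketched as simple arithmetic. This is a hedged illustration of the post's reasoning, not actual BOINC server code:

```python
import math

# Expected translation: requested seconds divided by the server's runtime
# estimate for one task, rounded up, with a zero request yielding no tasks.
def expected_task_count(request_seconds, est_seconds_per_task):
    if request_seconds <= 0:
        return 0
    return max(1, math.ceil(request_seconds / est_seconds_per_task))

print(expected_task_count(1, 3600))      # a 1-second request -> 1 task
print(expected_task_count(0, 3600))      # a 0-second request -> 0 tasks
print(expected_task_count(86400, 3600))  # a full day's request -> 24 tasks
```

Under this model, archae86's 1-second request granting 20 tasks is unambiguously a server-side miscalculation.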
Ianab (Joined: 11 Jun 08; Posts: 732; Credit: 20,635,586; RAC: 5)
There is certainly a gremlin in the system, as I've noticed it on several PCs. One is an OLD PIII that sits in the corner just doing its thing. It takes about 30 hours to crunch a normal work unit and has a 3-day cache, so it normally has 3 or 4 WUs. It asked for 17 seconds of work and got 17 WUs! About 3 weeks' work. Another machine I had switched to 'No new work' and ran the cache out. Manually updated to report the last few work units, requested 0 seconds, got 20 new work units??? No CUDA or new clients involved, although I wonder if something has spazzed out in the programming changes that were being made to allow more work units to be sent to high-spec CUDA machines? Did the system mistake my ole PIII for a CUDA-equipped i7???? I wish... :-)

Ian
Alinator (Joined: 19 Apr 05; Posts: 4178; Credit: 4,647,982; RAC: 0)
> I did the simple math that archae86 hadn't done. His BOINC process is working fine and it's delivering SETI work at the level that he asked for it, nothing more.

Agreed. One point of clarification, though: the project does know the exact state of all the work which is onboard the host at the time of a connection. It's included in the scheduler request. Here's an example from the last one to SAH from one of my hosts:

<other_results />
<in_progress_results>
  <ip_result>
    <name>h1_0722.55_S5R4__376_S5R4a_1</name>
    <report_deadline>1233001475.000000</report_deadline>
    <cpu_time_remaining>177103.386424</cpu_time_remaining>
  </ip_result>
  <ip_result>
    <name>wu_164284800_1227694842_65230_0</name>
    <report_deadline>1232977824.000000</report_deadline>
    <cpu_time_remaining>90368.861345</cpu_time_remaining>
  </ip_result>
</in_progress_results>
</scheduler_request>

I chose this host because this contact was supposed to be just to report a completed task (as determined by the CC; no button pushing here):

<scheduler_request>
<authenticator>CLASSIFIED</authenticator>
<hostid>782084</hostid>
<rpc_seqno>394</rpc_seqno>
<core_client_major_version>5</core_client_major_version>
<core_client_minor_version>10</core_client_minor_version>
<core_client_release>13</core_client_release>
<work_req_seconds>0.000000</work_req_seconds>
<resource_share_fraction>0.009615</resource_share_fraction>
<rrs_fraction>0.500000</rrs_fraction>
<prrs_fraction>0.250000</prrs_fraction>
<estimated_delay>0.000000</estimated_delay>
<duration_correction_factor>1.871565</duration_correction_factor>
<platform_name>anonymous</platform_name>

For reference, this host runs a CI/CO of 0.01/0.25 days respectively, and a quick look at the link I gave for it shows that the project, completely out of the blue and on its own, decided to pop the host with 6 new tasks totaling 543530 seconds (6.29 days) when it should have sent none.
There is no other way to explain this than that the project is currently screwing up how it handles work requests and what it calculates for work assignments. In the case of my host, this has guaranteed blown deadlines for at least 2 (probably 3) of the 6 SAH tasks it sent. The reason is that it must now run the two tasks for the other projects in HP/EDF first (due to deadlines sooner than the SAH tasks'), which will leave only about 4 days till deadline for SAH to run over 6 days of work. That's the worst case I have so far, but it also botched the assignment to my T2400, and sent 20 to it when it should have sent one. In its case, though (CI/CO = 0.01/0.6667 days), it has sufficient performance overhead to work through it without missing a deadline. However, it is guaranteed to diverge from Resource Share until this gets fixed (and the sooner the better).

Alinator
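The `<in_progress_results>` block quoted above is machine-readable, so the work the host already has committed can be totalled directly. A standard-library sketch (not BOINC code) over the quoted fragment:

```python
import xml.etree.ElementTree as ET

# The XML fragment is copied from the scheduler request quoted in the post.
fragment = """<in_progress_results>
  <ip_result>
    <name>h1_0722.55_S5R4__376_S5R4a_1</name>
    <report_deadline>1233001475.000000</report_deadline>
    <cpu_time_remaining>177103.386424</cpu_time_remaining>
  </ip_result>
  <ip_result>
    <name>wu_164284800_1227694842_65230_0</name>
    <report_deadline>1232977824.000000</report_deadline>
    <cpu_time_remaining>90368.861345</cpu_time_remaining>
  </ip_result>
</in_progress_results>"""

root = ET.fromstring(fragment)
total = sum(float(r.findtext("cpu_time_remaining"))
            for r in root.iter("ip_result"))
print(f"{total / 86400:.2f} days of work already queued")  # ~3.10 days
```

So the server had, in its own hands, the data showing roughly three days of work already on board when it added another 6.29 days.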
archae86 (Joined: 31 Aug 99; Posts: 909; Credit: 1,582,816; RAC: 0)
> The anomaly here was on the Berkeley side. It does not know, and is not told, the state of my work from other projects.

Alinator, thanks for the useful information. With the clue provided by your post, I noticed the scheduler request and reply files in the top-level BOINC directory. Opening them for Einstein and SETI (my only two projects) showed a WU-by-WU accounting of results in progress for both projects in the request sent to each project. So the specific assertion in the second quoted sentence (my own) is entirely false. Nevertheless, I think we generally agree on the description of the Berkeley-side response as being in error. Thanks for educating me. At Ye Aulde Microprocessor Workes we called this sort of conversation "violent agreement". It can actually be pretty productive at times.
OzzFan (Joined: 9 Apr 02; Posts: 15691; Credit: 84,761,841; RAC: 28)
Reports in this thread seem to indicate that this issue is not limited to a particular BOINC version (so it cannot be a bug within the BOINC client code; older versions that worked fine are now exhibiting this flaw), and is not limited to 'button pushers' or 'CUDA users'. I have PM'd Eric to take a look at this thread.
Alinator (Joined: 19 Apr 05; Posts: 4178; Credit: 4,647,982; RAC: 0)
Agreed. AFAICT, this is not a client-side issue, but rather a SAH-specific server-side problem. None of the other projects I run have exhibited any unexpected request handling and/or work assignments. Unfortunately, it doesn't appear to be a cut-and-dried bug, as I have also had SAH requests go through correctly since Richard first reported it 10 days or so ago in the Work Fetch Anomaly thread.

Alinator
Fred W (Joined: 13 Jun 99; Posts: 2524; Credit: 11,954,210; RAC: 0)
An interesting(?) side note to this: although the problem is not entirely BOINC-version-dependent, while I was running v6.6.2 for a couple of days on my quaddie with CUDA (1-day cache set), I got 900 WUs downloaded following the first server contact of the day. That first request was for 0 seconds of work, but after the first 20 had downloaded, the client continuously requested 86400 seconds for both the CPU and GPU until the daily limit was reached. I reverted to BOINC 6.6.0 about 12 hours ago and have not received any more WUs (which was the aim of the exercise, as even with my 25% CPDN share suspended it is going to take several days with everything running in high priority to clear the backlog).

F.
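The runaway pattern Fred W describes (the client re-requesting a full day of work until the project's daily quota bites) can be sketched as a loop. The quota of 900 and the roughly 20-task reply size are taken from the post; everything else is illustrative, not BOINC client logic:

```python
# Simulate a buggy client that keeps asking for more work after every
# reply; the only thing that stops it is the project's daily task quota.
def runaway_fetch(daily_quota, tasks_per_reply):
    granted = 0
    while granted < daily_quota:
        granted += min(tasks_per_reply, daily_quota - granted)
    return granted

print(runaway_fetch(daily_quota=900, tasks_per_reply=20))  # stops only at 900
```

The point of the sketch: with no sane per-request cap, the quota becomes the *only* brake, which is exactly why the recently raised CUDA quota made the symptom so much worse.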
Richard Haselgrove (Joined: 4 Jul 99; Posts: 14679; Credit: 200,643,578; RAC: 874)
We seem to have two quite separate problems being reported in this thread.

First, the SETI server issuing large amounts of work in response to a small, or null, work request. This is pretty much independent of the BOINC client being used; my Work Fetch Anomaly was on BOINC v5.10.13. The event was timed at 15 Jan 2009 5:56:54 UTC, if that helps track down the precise server code update which brought on the problem.

Second, and very particular to the BOINC v6.6.2 client, is a client-initiated problem of actually requesting excess work until inhibited by the daily quota.

Murphy's law strikes: the daily quota has recently been increased for CUDA-capable machines, and Matt has recently installed lots of extra storage. So the old failsafe of stopping work production when the disks get full ain't gonna happen for a while yet. The download pipe has been saturated all night, and will be doubly saturated from an hour ago with the release of the new daily quota. About the only limit in operation right now will be the stalled uploads (being shouldered out of the way by the downloads), which should inhibit all scheduler contact when the pending uploads on each host reach 2 x nCPUs.
Eric Steensels (Joined: 16 May 99; Posts: 14; Credit: 7,015,744; RAC: 10)
Hi guys, I have the same problem in BOINC 6.4.5:

23/01/2009 17:54:29|SETI@home|Started upload of 10dc08af.20296.18068.13.8.174_0_0
23/01/2009 17:54:29|SETI@home|Started upload of 10dc08af.20296.18068.13.8.188_0_0
23/01/2009 17:54:34|SETI@home|Finished upload of 10dc08af.20296.18068.13.8.174_0_0
23/01/2009 17:54:34|SETI@home|Finished upload of 10dc08af.20296.18068.13.8.188_0_0
23/01/2009 18:00:06|SETI@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 18 completed tasks
23/01/2009 18:00:11|SETI@home|Scheduler request completed: got 0 new tasks
23/01/2009 18:00:26|SETI@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
23/01/2009 18:00:31|SETI@home|Scheduler request completed: got 20 new tasks
23/01/2009 18:00:33|SETI@home|Started download of 15dc08ae.2372.38703.15.8.148
23/01/2009 18:00:33|SETI@home|Started download of 15dc08ae.2372.38703.15.8.175

I hope I'll get everything done before the deadline.

Greetings, Eric
Bukken (Joined: 3 Apr 99; Posts: 50; Credit: 3,007,776; RAC: 0)
I had a try at 6.6.2 also, but it had no text in the BOINC panels and I couldn't make it work, so the next day I shifted back to 6.6.0. My cache was then filled with 225 AP and 1200 MB workunits??? Talk about a day's work!
skildude (Joined: 4 Oct 00; Posts: 9541; Credit: 50,759,529; RAC: 60)
I saw 20 WUs that had a return date of 1 week. These were not AP WUs. We don't know if he's checked his account page to make sure SETI doesn't send AP. As far as I can see from his work, he constantly aborts work, which puts SETI in a work debt, which makes BOINC want to do more SETI to catch up.

In a rich man's house there is no place to spit but his face. Diogenes of Sinope
Alinator (Joined: 19 Apr 05; Posts: 4178; Credit: 4,647,982; RAC: 0)
> I saw 20 WUs that had a return date of 1 week. These were not AP WUs. We don't know if he's checked his account page to make sure SETI doesn't send AP. As far as I can see from his work, he constantly aborts work, which puts SETI in a work debt, which makes BOINC want to do more SETI to catch up.

I don't know how you are coming up with archae86 having been 'aborting tasks constantly'. I looked over all his hosts and found 3 or 4 instances of aborted tasks dating back to the 14th of this month, all related in time to cases where the project erroneously sent his host a ton of work (as he related and documented in his posts).

Second, aborting a task has absolutely no effect on the LTD situation for a project. What it does affect is the amount of overall host cache and individual project cache slack on the host. However, it should be intuitively obvious that if the host is currently running at cache equilibrium, the normal course of operation is for the host to make small requests for work (1 to a few hundred seconds or so) for its projects as slack opens up while the current work is processed. IOW, you should never get walloped with multiple task assignments unless the requested number of seconds of work is greater than the estimated runtime of the proposed task(s) to be sent.

Alinator
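Alinator's equilibrium argument can be put as arithmetic: a request should only cover the slack between the cache target and the work already queued. A hypothetical sketch (the function and its field names are invented for illustration, not BOINC internals):

```python
# At cache equilibrium, the next request covers only the newly opened slack:
# target cache (in seconds, scaled by resource share) minus work still queued.
def work_request_seconds(cache_days, resource_share, queued_seconds):
    target = cache_days * 86400 * resource_share
    return max(0.0, target - queued_seconds)

# A host near equilibrium asks for only a few hundred seconds...
print(work_request_seconds(0.25, 0.5, 10500.0))  # 300.0
# ...so a multi-task wallop is only justified by a much larger request.
print(work_request_seconds(4.0, 1.0, 0.0))       # 345600.0
```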
dnolan (Joined: 30 Aug 01; Posts: 1228; Credit: 47,779,411; RAC: 32)
> I saw 20 WUs that had a return date of 1 week. These were not AP WUs. We don't know if he's checked his account page to make sure SETI doesn't send AP. As far as I can see from his work, he constantly aborts work, which puts SETI in a work debt, which makes BOINC want to do more SETI to catch up.

Well, both of the below are cut from the list posted:

stoll5 SETI@home 1/23/2009 5:22 Started download of ap_20dc08ad_B5_P0_00028_20090123_17965.wu
stoll5 SETI@home 1/23/2009 5:22 Started download of ap_20dc08ad_B4_P1_00356_20090123_15293.wu

But that aside, I think you're missing the point. His machine requested 1 second of work; with a request like that, he should only have gotten 1 WU of any kind. Aborting or not (and I don't think this increases debt, though I could be wrong), you're trying to fit the math of adding up some WUs to a situation where you think the system somehow just knows what the host wanted, but again, I would point you to the first line: "Requesting 1 seconds of work". I just don't see how you can say that getting all that work from the 1-second request is something that should be expected. If that were true, why wouldn't every request for any amount of work result in downloads that keep coming until the daily limit is reached? It seems like what you're saying is that the amount requested just doesn't matter in any way, and I don't think that's correct (or at least, when the system was working properly, it wasn't).

-Dave
Alinator (Joined: 19 Apr 05; Posts: 4178; Credit: 4,647,982; RAC: 0)
> I saw 20 WUs that had a return date of 1 week. These were not AP WUs. We don't know if he's checked his account page to make sure SETI doesn't send AP. As far as I can see from his work, he constantly aborts work, which puts SETI in a work debt, which makes BOINC want to do more SETI to catch up.

Agreed. If the host has become cache-overloaded for its CI/CO, aborting a task should not result in the project sending any more work under any circumstances (at least until the cache overload is gone, that is).

Alinator
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.