Suddenly BOINC Decides to Abandon 71 APs...WTH?
Author | Message |
---|---|
William Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
I could be mistaken, but I believe he wants to invalidate all of them, because he believes you did something dodgy. And since when does the average user edit CS? Not to mention read the boards? A person who won't read has no advantage over one who can't read. (Mark Twain) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
I might be mistaken but iirc the client sends a list of all tasks (of that project) on board in the request. The server will currently mark all current tasks as abandoned, unconditionally. Nothing left over. So on the first successful RPC after that, the nack/abort reply could be sent for all tasks listed as present/running, before assessing the need for new tasks to replace them. But better not to abandon them in the first place, of course. |
William Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
I mean the original problem is 'tasks are being marked as abandoned that are still present on the host' [and being processed there quite in vain] Either tell the client to wipe them [bad for AP], or have the server do something more sensible than now. A person who won't read has no advantage over one who can't read. (Mark Twain) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
I could be mistaken, but I believe he wants to invalidate all of them, because he believes you did something dodgy. Some people at CPDN used to nurse their long ones (up to four months) very assiduously, even rewinding them and trying again if they crashed. It's a different culture in the different projects. |
William Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
I could be mistaken, but I believe he wants to invalidate all of them, because he believes you did something dodgy. Even at CPDN the users editing CS will be a minority. A person who won't read has no advantage over one who can't read. (Mark Twain) |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
hmmm. yeah, to me the lock option is still looking the most likely candidate, and not that hard if you can do it without recording active locks in the host table. I can't see a reason the client should have 2 or more requests in progress. The way I would do it is something along these lines: in some_config.h: #define DEFAULT_RPCHOSTLOCK_TIMEOUT 1800 // seconds, generous for crusty ol' servers #define DEFAULT_RPCHOSTLOCK_CHECK_INTERVAL 10 // garbage collect locks // volatiles because rpc threads could be running on different Threads\CPUs etc volatile int rpclockinterval = DEFAULT_RPCHOSTLOCK_CHECK_INTERVAL; volatile int rpclocktimeout = DEFAULT_RPCHOSTLOCK_TIMEOUT; at start of authentication, as soon as you have a valid userid and hostid: acquire_some_mutex(); bool busy = myrpclockhostlist.hasentry(hostid); if ( !busy ) myrpclockhostlist.addentry(hostid, timestamp); release_some_mutex(); // check and add under one mutex hold, released on every path if ( busy ) { reject_rpcWithMessage(...); } else { do_rpc_things(); myrpclockhostlist.removeentry(hostid); // uses mutexes inside to access the shared lock list } and in a timer thread, garbage collect the locks at rpclockinterval intervals : some_timer_thread() { acquire_some_mutex(); for ( i = myrpclockhostlist.begin(); i != myrpclockhostlist.end(); i++) { if ( currenttimedate-myrpclockhostlist.gettimestamp(i) > rpclocktimeout ) myrpclockhostlist.removeentry(i); } release_some_mutex(); } "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
William Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
sched/handle_request.cpp#L387 has more consistent logic - it actually checks whether the client reported tasks before abandoning the lot. But THAT is the code that should get called if we are looking at a detach/reattach where no tasks should be present. That's when you really want to mark tasks as abandoned - you detached (silent, server never gets told, tasks idle out), you reattach and the DB is cleaned up. So he does a security check there but not when the rpc_seqno goes out of sync?! So what if we had: if ((g_request->allow_multiple_clients != 1) && (g_request->other_results.size() == 0)) { mark_results_over(host); } i.e. the check for tasks on the host is put into line 426 that currently has no such security check? A person who won't read has no advantage over one who can't read. (Mark Twain) |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
We considered adding the cpid search into the other case where it uses the IPs etc, but doing so would destroy the copying client state prevention logic. (because cpid would match) "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
William Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
looks complicated ;) yes, it deals with the problem of server and client talk going out of sync. I'd prefer my simpler approach of just making sure the server does proper checks ;) You can still smooth out server-client connections - that might help with some of the other hiccups we keep getting with flaky comms. A person who won't read has no advantage over one who can't read. (Mark Twain) |
William Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
We considered adding the cpid search into the other case where it uses the IPs etc, but doing so would destroy the copying client state prevention logic. (because cpid would match) (translated from German) Well, I still find it an outrageous impertinence to insinuate that someone is cheating, when the most likely explanation is that they restored a sh**** backup. edit: yes yes, I'm translating that outburst edit2: I still think it's exceedingly impertinent to insinuate that you were doing something dodgy, when the most probable cause is having reverted to a backup for some reason. edit3: and I certainly have no trouble telling him THAT A person who won't read has no advantage over one who can't read. (Mark Twain) |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
looks complicated ;) Not as complicated as it looks. It's one of those spinny latches labelled 'occupied' like on a public toilet door. I presume they have public toilets in Germany. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
edit2: I still think it's exceedingly impertinent to insinuate that you were doing something dodgy, when the most probable cause is having reverted to a backup for some reason. No, the whole thing is built around the idea that hosts and users are unreliable. I can live with that. What I can't live with is the insinuation that the server is right/complete in every common case it should handle reasonably [when the code clearly says it cannot]. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
William Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
edit2: I still think it's exceedingly impertinent to insinuate that you were doing something dodgy, when the most probable cause is having reverted to a backup for some reason. exactly - so just check the host really hasn't anything running before we ditch the lot. If you want to be more sophisticated, clean out what's really not there. A person who won't read has no advantage over one who can't read. (Mark Twain) |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
edit2: I still think it's exceedingly impertinent to insinuate that you were doing something dodgy, when the most probable cause is having reverted to a backup for some reason. Well, except for the legitimate no move/copy case: the entrance to the public toilet cloned you, you walked into a public toilet with no door, and your clone followed you in a couple of minutes later, or you walked in on the clone. Which one should security shoot? [Hint: always use public toilets with latching doors] "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
William Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
Well except for the legitimate no move/copy case, The entrance to the public toilet cloned you, you walked into a public toilet with no door, and your clone followed you in a couple of minutes later, Or you walked in on the clone. Which one should security shoot ? Doesn't help if you are using the urinal, pardon me, row of buckets. I'd shoot the door and merge the clones... A person who won't read has no advantage over one who can't read. (Mark Twain) |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
I'd shoot the door and merge the clones... Well the Boinc technique appears to be no doors, make another clone and shoot the first two occupants, leaving the mess behind. [Edit:] Zombie hosts! lol "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
I could be mistaken, but I believe he wants to invalidate all of them, because he believes you did something dodgy. Thinking while I walked - it would be interesting to see how older server code (like Einstein - old - and CPDN - even older) handles the 'faked seqno' test. I can do CPDN (probably got most experience with that project, out of this little group) - anyone willing for Einstein, or should I do that myself, as well? But I still need to eat the fruits of my walk - lunch! Edit - talking of CPDN, they've launched a new set of tasks today: http://www.climateprediction.net/new-experiment-launched-weatherhome-2015-western-us-drought/ |
William Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
I'm not running either right now and any experiments would have to wait for the next window of opportunity. A person who won't read has no advantage over one who can't read. (Mark Twain) |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Thinking while I walked - it would be interesting to see how older server code (like Einstein - old - and CPDN - even older) handles the 'faked seqno' test. I can do CPDN (probably got most experience with that project, out of this little group) - anyone willing for Einstein, or should I do that myself, as well? You're it :). Could be worth bouncing my pseudocode off Oliver/Bernd if they are still prodding at the issue. Feel free. It's not complete code but they'd understand it, and the concept of resource locks, I believe, well and truly enough to adapt to their needs. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
William Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
since we know that one easy way to generate a new hostid (and thereby wipe silly APR entries) was to trigger the low rpc seqno code, we know that area of code is fairly new. I expect CPDN and Einstein to hand out a fresh hostid - actually then marking the old ones on the old hostid as abandoned makes sense, since you are not accessing that DB entry any more. But it still leaves the problem that you have stale tasks on the host. New, better server code doesn't reach conservative projects. New, better client code doesn't reach conservative users. As Richard suggested, I think it's best to check out other projects and then try several independent improvements. Small, easy to understand, easy to do things have the best chance of getting done ;) [at least if you're not doing it yourself and going through the whole 'git-pull' diplomacy nightmare] A person who won't read has no advantage over one who can't read. (Mark Twain) |
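
The per-host lock discussed in the thread (reject a second scheduler request for a host while one is still in progress, and garbage-collect locks that outlive a timeout) could be sketched roughly as below. This is only an illustration under stated assumptions: `HostRpcLocks`, `try_acquire`, `release` and `collect_stale` are hypothetical names, not BOINC server code, and an integer hostid plus a single in-process lock table are assumed. Unlike the pseudocode in the thread, the check and the insert happen under one mutex hold, so two requests for the same host cannot both pass the check.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <mutex>
#include <unordered_map>

// Hypothetical per-host RPC lock table (illustrative names, not BOINC's).
class HostRpcLocks {
public:
    using Clock = std::chrono::steady_clock;

    explicit HostRpcLocks(std::chrono::seconds timeout) : timeout_(timeout) {
        assert(timeout.count() > 0);  // a zero timeout would expire locks immediately
    }

    // Atomic test-and-set: returns true if the caller now holds the lock for
    // hostid, false if another request for that host is already recorded.
    bool try_acquire(int hostid, Clock::time_point now = Clock::now()) {
        std::lock_guard<std::mutex> guard(mutex_);
        // emplace does nothing and reports false if the key already exists.
        return locks_.emplace(hostid, now).second;
    }

    // Called when the request for hostid has finished.
    void release(int hostid) {
        std::lock_guard<std::mutex> guard(mutex_);
        locks_.erase(hostid);
    }

    // Intended to run periodically from a timer thread: drops locks older
    // than the timeout and returns how many were removed.
    std::size_t collect_stale(Clock::time_point now = Clock::now()) {
        std::lock_guard<std::mutex> guard(mutex_);
        std::size_t removed = 0;
        for (auto it = locks_.begin(); it != locks_.end();) {
            if (now - it->second > timeout_) {
                it = locks_.erase(it);  // erase returns the next valid iterator
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

private:
    std::mutex mutex_;
    std::unordered_map<int, Clock::time_point> locks_;
    std::chrono::seconds timeout_;
};
```

In a request handler this would be used as: call `try_acquire(hostid)` once the host is authenticated, reject the request with a message if it returns false, otherwise do the RPC work and call `release(hostid)` at the end. The 1800-second timeout and 10-second check interval from the thread would be passed in by the caller.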
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.