Posts by William


1) Message boards : Number crunching : Panic Mode On (98) Server Problems? (Message 1703690)
Posted 14 days ago by William
Credits? what credits?

You not only screw up credits, you also mess with a few personal and global variables held server-side that are in turn used for runtime estimates.
Messed up credit is just a secondary effect.

These days a few people moving work across devices will not influence the bigger picture much - the impact is on your own APR and on credit (for you and your wingmate) of course.
The latter makes it an ethical decision ;)
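
To make the runtime-estimate side effect concrete, here is a minimal sketch. It assumes, as a simplification, that the server keeps one average processing rate (APR) per host and app version and estimates runtime as the task's estimated operation count divided by that APR. The class, field names and numbers are made up for illustration and are not the actual server code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AppVersionStats:
    """Hypothetical per-host, per-app-version record held on the server."""
    rates: List[float] = field(default_factory=list)  # observed ops/sec per task

    def record(self, fpops_est: float, elapsed_s: float) -> None:
        # Each reported task contributes one rate sample.
        self.rates.append(fpops_est / elapsed_s)

    @property
    def apr(self) -> float:
        # Simplification: plain mean of all samples.
        return sum(self.rates) / len(self.rates)

    def estimate_runtime(self, fpops_est: float) -> float:
        return fpops_est / self.apr


FPOPS_EST = 3.6e13  # made-up per-task operation estimate

cpu = AppVersionStats()
for _ in range(9):
    cpu.record(FPOPS_EST, 3600.0)  # nine honest CPU results, one hour each

honest_estimate = cpu.estimate_runtime(FPOPS_EST)

# One task that was sent for the CPU app but crunched on a GPU in 6 minutes.
cpu.record(FPOPS_EST, 360.0)
skewed_estimate = cpu.estimate_runtime(FPOPS_EST)

print(f"honest: {honest_estimate:.0f} s, after one rescheduled task: {skewed_estimate:.0f} s")
```

Under these assumptions a single rescheduled task is enough to roughly halve the estimate for every later CPU task on that host.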

I would discourage rescheduling in general - you don't want to rock the boat more than you absolutely have to.

I live by the principle 'there are no rules, only guidelines'.
That makes you responsible for your actions.
If you can stand before the angry mob and say 'yes I did it, I had my reasons' - well. But don't be amazed if you get lynched before you manage to explain your reasons.
2) Message boards : Number crunching : Suddenly BOINC Decides to Abandon 71 APs...WTH? (Message 1699813)
Posted 26 days ago by William
So, after a nice email we've had the fix applied.

I had a quick test and it doesn't look like it has been deployed yet though.
3) Message boards : Number crunching : Bad News on BOINC funding (Message 1698936)
Posted 29 days ago by William
I'll let Richard explain why he thinks branching is bad.
4) Message boards : Number crunching : Bad News on BOINC funding (Message 1698934)
Posted 29 days ago by William
Excuse me for being acerbic, but this is a disaster and some of us are trying to limit the damage.
5) Message boards : Number crunching : Bad News on BOINC funding (Message 1698933)
Posted 29 days ago by William
What is all this panic about?

AFAIK SETI@home has been functioning w/o US government funding for 3, 4, 5+ years already. So what? The only sign of past funding is the line at the bottom of the SETI beta site, "AstroPulse is funded in part by the NSF through grant AST-0307956", which should have been removed quite a long time ago. Even that funded part has since been changed heavily in its algorithm, not to mention all the GPU versions, which were never funded by the NSF. Mentioning RFBR grants would be more appropriate :P ;D ;D ;D


Yes dear, this is not about SETI funding, it is about BOINC funding.

BOINC being the platform SETI is running on, as you might be aware, since you occasionally post to the BOINC mailing lists.

If you don't want to worry about the bigger picture, fair enough. Just don't complain if the gallery you wanted to hang your picture in closes.
6) Message boards : Number crunching : Bad News on BOINC funding (Message 1698931)
Posted 29 days ago by William
ROFLMAO

Sorry Richard, I don't think I can explain what is making me laugh that hard at your comment :D

Yes, with the right group of people it _can_ work.
But is it the right group of people? I think that is where we are having our doubts.

I usually have no problem with letting people try to live up to their promises and expectations first, and hitting them over the head when they fall short or mess up.
In this case I am a bit reluctant to let them mess it up; I feel too much depends on things continuing to run smoothly. (*)
On the other hand I don't quite see how I can prevent 'them' from making silly mistakes.

(*) Not that it was running smoothly :D But at least it was bumping along.
7) Message boards : Number crunching : Bad News on BOINC funding (Message 1698927)
Posted 29 days ago by William
Ok, first imagine some pretty coarse language.

I'm not going to write it since a) I would have to self-mod myself and b) I don't have a good command of coarse language. So just imagine the worst you can and you are probably pretty close to what I'd be writing if I were the person to write such things.

Second

@Jord April eh? We should have known. We knew he was applying for funding; when he didn't say he'd acquired it we should have known... We (i.e. Richard and I) figured _something_ was going on when that Project Governance draft turned up.
Why work to the last second without saying anything? The change could have been so much smoother if we'd known. Oh well. No crying over spilt milk eh?

Third

No crying over spilt milk. The cataclysm has happened, now we need to look at the broken pieces and decide how to go on. [in fact some 'we' already are]
As always in such cases, you can quit, you can start new and you can try to salvage.

Fourth

Strange that we are all quite agreed that a committee doesn't really work for this - you can convince one person (like talking to David until he sees reason) but convincing a group of people is something else entirely.
If we know that, why the heck did 'they' think it would work?
I never wanted push rights. Now I feel I should have made sure I had them. [not that anybody in their right mind would have given them to me :D]

Fifth

Somebody please fix my coffee machine.
8) Message boards : Cafe SETI : Happy Birthday Eric!!! (Message 1698086)
Posted 3 Jul 2015 by William
Is that a cake?!

Happy birthday Eric!
9) Message boards : Number crunching : Suddenly BOINC Decides to Abandon 71 APs...WTH? (Message 1697264)
Posted 1 Jul 2015 by William
What's up here?
PING 169.229.217.150 (169.229.217.150): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4
Request timeout for icmp_seq 5
Request timeout for icmp_seq 6
Request timeout for icmp_seq 7
Request timeout for icmp_seq 8

--- 169.229.217.150 ping statistics ---
10 packets transmitted, 0 packets received, 100.0% packet loss

Bad Address? Is that an Aussie Addy?

Some servers don't react to pings, even if they work fine.
Might be a bad apple or just an unresponsive apple.
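
If you want to check whether a server is alive without relying on ICMP, you can try opening a TCP connection instead. A minimal sketch in Python, using only the standard `socket` module; the function name and the default timeout are my own choices:

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        # create_connection raises OSError on refusal, timeout or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the address above you could call `tcp_reachable("169.229.217.150", 80)`; port 80 is an assumption on my part (plain HTTP), and a refused or filtered port still doesn't prove the machine is down.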
10) Message boards : Number crunching : Suddenly BOINC Decides to Abandon 71 APs...WTH? (Message 1697258)
Posted 1 Jul 2015 by William
What we need - and it might well be in your code - is a one-off "resend lost tasks" check triggered as part of the seqno/detach/authentication path we're exploring.

No, I didn't do that. I proposed it as the more sophisticated solution:
Assume 'backup' and run the 'resend_lost_tasks' logic against the client request (that does send 'other tasks' anyway).
Actually I said just check the 'other tasks' list against the server held list and mark those abandoned that are not present.
How old are people's backups? Weekly? Half a year? 'When I remember to do one'?
Does the client abort 'not started by deadline' on its own?
What about tasks you crunched ages ago? No need to redo them.
Dang. That's getting complicated.
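
The simple version of that check can be sketched as a set comparison. This is a rough illustration in Python, not the scheduler code: the function, the field names and the task names are hypothetical, and it assumes both sides identify tasks by name.

```python
from typing import NamedTuple, Set


class Reconciliation(NamedTuple):
    keep: Set[str]      # server thinks in progress, client still has them
    abandon: Set[str]   # server thinks in progress, client no longer has them
    stale: Set[str]     # client reports them, server has no open record


def reconcile(server_in_progress: Set[str], client_other_tasks: Set[str]) -> Reconciliation:
    """Compare the server-held task list against the client's 'other tasks'."""
    return Reconciliation(
        keep=server_in_progress & client_other_tasks,
        abandon=server_in_progress - client_other_tasks,
        stale=client_other_tasks - server_in_progress,
    )


# Made-up example: an old backup brings back ap_99, which was reported long ago.
result = reconcile(
    server_in_progress={"ap_01", "ap_02", "ap_03"},
    client_other_tasks={"ap_02", "ap_03", "ap_99"},
)
print(result)
```

The `stale` bucket is where the 'tasks you crunched ages ago' from an old backup would show up; what to do with them is a separate question.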
11) Message boards : Number crunching : Suddenly BOINC Decides to Abandon 71 APs...WTH? (Message 1697253)
Posted 1 Jul 2015 by William
@Richard please run the scenario where task abandonment was the right choice past me again. I didn't really become involved until yesterday.

If the client has been genuinely detached, or for some other reason (like Ivan's finger-fumble, or installing an anonymous platform app with the wrong version/plan_class combo) all tasks once present on the host are no longer available, the question arises 'what to do with the records of those tasks still present in the server's database'?

There are three choices:

- Wait until they time out naturally at deadline.
- Send them back to the host as lost tasks (requires costly server resources to identify).
- Bring forward the deadline, or otherwise mark them as 'never going to report', so that wingmates can take over and complete the WU.

The safety check I am proposing looks at whether the client reports 'other_tasks' - unless you managed to lose the files but keep the client state (CS) entries, you then still have files and I propose not to abandon them. If the client doesn't have anything, the work is truly lost and the server can kill it. Files present in CS but not physically on disk are covered by 'resend lost tasks'.
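
As a sketch, the safety check boils down to one decision per scheduler request. The names below are hypothetical and the real request handling is more involved; this only shows the rule as proposed:

```python
from enum import Enum
from typing import Sequence


class Action(Enum):
    KEEP = "leave the server-side task records alone"
    ABANDON = "mark the server-side task records as abandoned"


def on_suspected_detach(client_other_tasks: Sequence[str]) -> Action:
    """Decide what to do when the server suspects the host lost its work.

    If the client still reports any tasks, assume it still has files and
    keep everything; only an empty report counts as truly lost.
    """
    return Action.KEEP if client_other_tasks else Action.ABANDON
```

With this rule a host that reverted to a backup keeps its cache, while a freshly detached host (empty report) frees its tasks for the wingmates straight away.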

edit: good point - I'll recheck the 'genuine detach' logic and test it. When I'm more awake and some coffee has kicked in.
12) Message boards : Number crunching : Suddenly BOINC Decides to Abandon 71 APs...WTH? (Message 1697247)
Posted 1 Jul 2015 by William
hmmm, that's where I looked for the word timeout, duh, must be time for coffee

I would set that to half an hour. If the responses never arrive in that time then something's broken.

ROFLMAO

Just a side note: IIRC, while BOINC waits for a response, nothing much else happens, especially not contacts to other projects. You might want to keep that in mind if you run more than just SETI :)
13) Message boards : Number crunching : Suddenly BOINC Decides to Abandon 71 APs...WTH? (Message 1697239)
Posted 1 Jul 2015 by William
@TBar kindly mention your UTC offset when posting log messages. I can work out the offset for Europe and the UK but when a country has several timezones...

@Jeff I don't see what kind of fraud might mess up rpc seqno. Being me, i.e. not paranoid, my base assumption is that whatever people do is down to ignorance, stupidity and bad luck and not to criminal energy. The science has to be protected from criminals [though we don't really do a good job at that ourselves; whenever money (or fame) is involved, some people will cheat]. There are other parts of the code that deal with willful cheating.
Until somebody comes up with a not completely far-fetched example of how fraud would mess up the rpc seqno, I will consider the matter as resulting from the timeout problem discussed earlier or from using a backup, and therefore not in need of cheatproofing.

@Ivan rm -rf * eh? Been there, done that - not on purpose: a slight typo, and I ended up with a ' ' between the start of the filename and the * ... Thankfully just my user dir vanished and we were doing daily backups. But walk up to the sysadmin and tell him why you need your backup restored...

@Richard please run the scenario where task abandonment was the right choice past me again. I didn't really become involved until yesterday.

@Jason I agree, improving client-server comms is desirable. Getting that BOINC improvement coded and accepted however is a different kettle of fish. I'll take the path of least resistance first.

@TBar Oh, it's your thread. Never mind.
14) Message boards : Number crunching : Suddenly BOINC Decides to Abandon 71 APs...WTH? (Message 1697077)
Posted 30 Jun 2015 by William
Well, we actually have two conditions - one is the backup scenario, the other is the 'ask twice - rpcs get out of order' one that keeps killing the cache of some people here. The bits of code triggered are the same.
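
A rough sketch of why both conditions end up in the same place, assuming the server simply compares the rpc seqno in the request against the last one it stored for that host. The record type and function are hypothetical, and what the real scheduler does after the comparison is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class HostRecord:
    """Hypothetical server-side host entry."""
    host_id: int
    rpc_seqno: int  # last sequence number the server accepted


def handle_request(stored: HostRecord, request_seqno: int) -> bool:
    """Return True if this request would trip the low-seqno path."""
    if request_seqno < stored.rpc_seqno:
        return True
    stored.rpc_seqno = request_seqno
    return False


# Condition 1: the client was restored from a backup taken at seqno 95.
backup_host = HostRecord(host_id=1, rpc_seqno=120)
backup_tripped = handle_request(backup_host, 95)

# Condition 2: request 120 times out, the client asks again with 121,
# and the server ends up handling 121 before the delayed 120 arrives.
flaky_host = HostRecord(host_id=2, rpc_seqno=119)
out_of_order = [handle_request(flaky_host, seqno) for seqno in (121, 120)]

print(backup_tripped, out_of_order)
```

From the server's side the two cases look identical: a seqno lower than the stored one. That is why a fix for one has to be careful not to break the other.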
15) Message boards : Number crunching : Suddenly BOINC Decides to Abandon 71 APs...WTH? (Message 1697068)
Posted 30 Jun 2015 by William
Since we know that one easy way to generate a new hostid (and thereby wipe silly APR entries) was to trigger the low rpc seqno code, we know that area of code is fairly new.
I expect CPDN and Einstein to hand out a fresh hostid - in that case marking the old tasks on the old hostid as abandoned actually makes sense, since you are not accessing that DB entry any more. But it still leaves the problem that you have stale tasks on the host.

New, better server code doesn't reach conservative projects.
New, better client code doesn't reach conservative users.

As Richard suggested I think it's best to check out other projects and then try several independent improvements.

Small, easy to understand, easy to do things have the best chance of getting done ;) [at least if you're not doing it yourself and going through the whole 'git-pull' diplomacy nightmare]
16) Message boards : Number crunching : Suddenly BOINC Decides to Abandon 71 APs...WTH? (Message 1697049)
Posted 30 Jun 2015 by William

Thinking while I walked - it would be interesting to see how older server code (like Einstein - old - and CPDN - even older) handles the 'faked seqno' test. I can do CPDN (probably got most experience with that project, out of this little group) - anyone willing for Einstein, or should I do that myself, as well?


I'm not running either right now and any experiments would have to wait for the next window of opportunity.
17) Message boards : Number crunching : Suddenly BOINC Decides to Abandon 71 APs...WTH? (Message 1697046)
Posted 30 Jun 2015 by William
Well, except for the legitimate no move/copy case: the entrance to the public toilet cloned you. You walked into a public toilet with no door, and your clone followed you in a couple of minutes later, or you walked in on the clone. Which one should security shoot?

[Hint: always use public toilets with latching doors ]

Doesn't help if you are using the urinal, pardon me, row of buckets.

I'd shoot the door and merge the clones...
18) Message boards : Number crunching : Suddenly BOINC Decides to Abandon 71 APs...WTH? (Message 1697043)
Posted 30 Jun 2015 by William
edit2: I still think it's exceedingly impertinent to insinuate that you were doing something dodgy, when the most probable cause is having reverted to a backup for some reason.


No, the whole thing is built around the idea that hosts and users are unreliable. I can live with that. What I can't live with is the insinuation that the server is right/complete in every common case it should handle reasonably.


Exactly - so just check the host really hasn't anything running before we ditch the lot.
If you want to be more sophisticated, clean out what's really not there.
19) Message boards : Number crunching : Suddenly BOINC Decides to Abandon 71 APs...WTH? (Message 1697040)
Posted 30 Jun 2015 by William
We considered adding the cpid search into the other case where it uses the IPs etc, but doing so would destroy the copying client state prevention logic. (because cpid would match)

[Originally in German:] Well, I still find it an outrageous impertinence to imply that someone is cheating when the most likely thing is that they restored a sh**** backup.

edit: yes yes, I'm translating that outburst

edit2: I still think it's exceedingly impertinent to insinuate that you were doing something dodgy, when the most probable cause is having reverted to a backup for some reason.

edit3: and I certainly have no trouble telling him THAT
20) Message boards : Number crunching : Suddenly BOINC Decides to Abandon 71 APs...WTH? (Message 1697039)
Posted 30 Jun 2015 by William

The way I would do it is something along these lines:

Looks complicated ;)

Yes, it deals with the problem of server and client talk going out of synch.

I'd prefer my simpler approach of just making sure the server does proper checks ;)

You can still smooth out server client connections - that might help with some of the other hiccups we keep getting with flaky comms.



Copyright © 2015 University of California