Q. Where are these Units . . . ???

Message boards : Number crunching : Q. Where are these Units . . . ???
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 555402 - Posted: 28 Apr 2007, 13:58:34 UTC



Q. Where are these Units . . . ???

(Columns: result ID, workunit ID, sent, deadline or time reported, server state, outcome, client state, CPU time (s), claimed credit, granted credit.)

529745819 127865975 28 Apr 2007 12:52:57 UTC 25 May 2007 23:32:04 UTC In Progress Unknown New --- --- --- working

These (below) are the ones in question . . . ????

529745415 127865827 28 Apr 2007 12:52:04 UTC 25 May 2007 23:27:52 UTC In Progress Unknown New --- --- ---
529744850 127865633 28 Apr 2007 12:51:14 UTC 2 May 2007 21:01:14 UTC In Progress Unknown New --- --- ---
529744847 127865648 28 Apr 2007 12:51:14 UTC 2 May 2007 21:01:14 UTC In Progress Unknown New --- --- ---
529744730 127865614 28 Apr 2007 12:50:56 UTC 2 May 2007 21:00:56 UTC In Progress Unknown New --- --- ---
529744674 127865590 28 Apr 2007 12:50:56 UTC 2 May 2007 21:00:56 UTC In Progress Unknown New --- --- ---
529744344 127865493 28 Apr 2007 12:49:53 UTC 23 May 2007 9:16:55 UTC In Progress Unknown New --- --- ---
529322416 127728590 27 Apr 2007 22:57:49 UTC 22 May 2007 18:22:25 UTC In Progress Unknown New --- --- ---


524111436 126045391 21 Apr 2007 12:59:23 UTC 28 Apr 2007 12:49:53 UTC Over Success Done 15,468.13 29.32 29.31 returned today, couldn't connect for a few days . . .


Question: where are the others listed for me????


BOINC Wiki . . .

Science Status Page . . .
ID: 555402
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 555423 - Posted: 28 Apr 2007, 14:40:04 UTC

I'm not sure I'm following you here, nobody.

Are you saying the ones shown as 'In Progress' aren't onboard and aren't on the Transfer tab?

Alinator
ID: 555423
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 555425 - Posted: 28 Apr 2007, 14:44:20 UTC - in response to Message 555402.  

Your first ghost units, nobody? If so, just let them run out; there's nothing you can do. If legitimate work is there, other hosts will take it up and they'll slowly disappear off the list.

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 555425
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 555427 - Posted: 28 Apr 2007, 14:46:31 UTC - in response to Message 555423.  

I'm not sure I'm following you here, nobody.

Are you saying the ones shown as 'In Progress' aren't onboard and aren't on the Transfer tab?

Alinator


Yes to both questions - this is another *deja vu* for me . . . there is ONLY ONE crunchin' right now . . . but @ Berkeley, THEY say I have just uploaded the others . . . NO I HAVEN'T . . . though - and THIS is a *key* - THEY definitely WENT somewhere . . . ;)


BOINC Wiki . . .

Science Status Page . . .
ID: 555427
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 555428 - Posted: 28 Apr 2007, 14:47:39 UTC - in response to Message 555425.  

Your first ghost units, nobody? If so, just let them run out; there's nothing you can do. If legitimate work is there, other hosts will take it up and they'll slowly disappear off the list.


*NOT* ghost units, Jason . . . it's 'something else' . . . (for sure)


BOINC Wiki . . .

Science Status Page . . .
ID: 555428
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 555431 - Posted: 28 Apr 2007, 14:52:50 UTC
Last modified: 28 Apr 2007, 15:02:16 UTC

Well, I'm confused now. Why do you think they've been uploaded? AFAICT they're showing as normal for results in progress. IOW, sent but no word back yet.

So, as Jason said, if that's the case and your host in fact didn't receive them, they would be ghosts.

Alinator
ID: 555431
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 555436 - Posted: 28 Apr 2007, 14:58:39 UTC - in response to Message 555428.  
Last modified: 28 Apr 2007, 14:59:42 UTC

Maybe you're saying you got them, processed them, then uploaded them, and they're not showing? In which case they'll probably trickle through sooner or later; it does look unusual, though.
[If they're showing "Ready to report", hit Update, dude :D]

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 555436
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 555438 - Posted: 28 Apr 2007, 15:00:24 UTC - in response to Message 555431.  

Well, I'm confused now. Why do you think they've been uploaded? AFAICT they're showing as normal for results in progress. IOW, sent but no word back yet.

So, as Jason said, if that's the case they would be ghosts.

Alinator


> Sorry (and this is NOT a slam or whatever), but I just can't go through all this again. YOU TWO have been as righteous, forthright and helpful as anyone I have seen since I started with this project (February 29, 2000). BUT the problem I refer to has NOT been understood - and most likely WON'T be understood - so I will NOW stop asking, and stop referring to that which (btw) I DO understand the cause of. I just thought I might get something here - an inkling of understanding - regarding the problem, which is on the Berkeley side, and which has to be corrected at some point before it creates a larger problem for the *science* . . .


BOINC Wiki . . .

Science Status Page . . .
ID: 555438
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 555440 - Posted: 28 Apr 2007, 15:07:54 UTC - in response to Message 555436.  
Last modified: 28 Apr 2007, 15:09:02 UTC

Maybe you're saying you got them, processed them, then uploaded them, and they're not showing? In which case they'll probably trickle through sooner or later; it does look unusual, though.
[If they're showing "Ready to report", hit Update, dude :D]


Funny, eh? ;))))) Not being *facetious* either - BUT they just got UPLOADED - maybe my new *CRAY* supercomputer has the ability to crunch faster than one can upload . . . har har har . . . and so on and so forth . . . NO, Jason - they don't *EXIST* on my system - THEY exist someplace else, and they're NOT called ghosts in this reference . . .

Thanks for the assist - or the attempt - and that goes to the two of you.

> Callin' Berkeley . . . oh Berkeley - Please Contact ASAP . . . ;)
edit - sp




BOINC Wiki . . .

Science Status Page . . .
ID: 555440
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 555442 - Posted: 28 Apr 2007, 15:10:40 UTC - in response to Message 555440.  

Let us know how you go, please, nobody - it is an interesting situation. Obviously you have some history with it that we don't know about.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 555442
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 555444 - Posted: 28 Apr 2007, 15:17:54 UTC - in response to Message 555442.  


Let us know how you go, please, nobody - it is an interesting situation. Obviously you have some history with it that we don't know about.


> Yes sir!!! The beginning of which - at this time I wouldn't know where to begin . . . ;))) I shall *publish* the results at some time in the future . . . though I seem to have forgotten when that time shall be ;)))

So later, YOU TWO FINE INDIVIDUALS - thanks for the attempt . . . it was *warranted* - so thank you both again . . . have a wonderful day . . .

> @ Eric - call when you aren't busy, Sir . . .


BOINC Wiki . . .

Science Status Page . . .
ID: 555444
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 555463 - Posted: 28 Apr 2007, 15:36:18 UTC
Last modified: 28 Apr 2007, 15:37:31 UTC

No offense taken; I'm just trying to work the problem from the 'through the keyhole' view I have of your host from half a world away. I'm going on the assumption that there is a problem, but if we can't get on the same page about it, there's no hope of determining what happened or how to resolve it. ;-)

The part that's not making sense to me is that the last result report time shown is the same as the 'sent' time of the first of the group of 7 new results, which the project says were sent between 12:49:53 and 12:52:57 UTC.

Breaking down the supposedly sent ones, there was the initial single result 'sent' at 12:49:53 UTC, followed by 2 pairs sent at roughly 1 minute intervals later, and closed off with another single a minute after the second pair.

Based on my experience, this is telling me BOINC was requesting work and the project thought it was sending it. However, the ~1 minute interval is the key here. Nothing about the result DL made it to your machine; as far as your client could tell at that point, the project had gone down, which is why it repeated its request 1 minute later. I'm assuming you do have RID #'s 529322416 and 529745819 onboard at this point, and when it got the latter its work request was fulfilled, so it stopped requesting more.

I think if you go back and look at the Messages tab you'll see there was only one DL from the project shown, since if the host had gotten anything back from the project about the others, it would have retried the DL of those results rather than issue a new work request.
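
As a quick sanity check on that spacing, here is a minimal Python sketch (it just differences the 'sent' timestamps from the result list pasted at the top of the thread) showing the gaps between successive requests:

from datetime import datetime

# 'Sent' times of the seven 28 Apr results from the list above, oldest first.
sent = [
    "28 Apr 2007 12:49:53",
    "28 Apr 2007 12:50:56",
    "28 Apr 2007 12:50:56",
    "28 Apr 2007 12:51:14",
    "28 Apr 2007 12:51:14",
    "28 Apr 2007 12:52:04",
    "28 Apr 2007 12:52:57",
]
times = [datetime.strptime(s, "%d %b %Y %H:%M:%S") for s in sent]
for earlier, later in zip(times, times[1:]):
    # Prints gaps from 0:00:00 up to about a minute - repeated requests, not one batch.
    print(later - earlier)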

Alinator
ID: 555463
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 555469 - Posted: 28 Apr 2007, 15:43:03 UTC - in response to Message 555463.  

No offense taken; I'm just trying to work the problem from the 'through the keyhole' view I have of your host from half a world away. I'm going on the assumption that there is a problem, but if we can't get on the same page about it, there's no hope of determining what happened or how to resolve it. ;-)

The part that's not making sense to me is that the last result report time shown is the same as the 'sent' time of the first of the group of 7 new results, which the project says were sent between 12:49:53 and 12:52:57 UTC.

Breaking down the supposedly sent ones, there was the initial single result 'sent' at 12:49:53 UTC, followed by 2 pairs sent at roughly 1 minute intervals later, and closed off with another single a minute after the second pair.

Based on my experience, this is telling me BOINC was requesting work and the project thought it was sending it. However, the ~1 minute interval is the key here. Nothing about the result DL made it to your machine; as far as your client could tell at that point, the project had gone down, which is why it repeated its request 1 minute later. I'm assuming you do have RID #'s 529322416 and 529745819 onboard at this point, and when it got the latter its work request was fulfilled, so it stopped requesting more.

I think if you go back and look at the Messages tab you'll see there was only one DL from the project shown, since if the host had gotten anything back from the project about the others, it would have retried the DL of those results rather than issue a new work request.

Alinator


Right you be, Alinator - *almost* 100% correct - the problem shall be *published* at a later date . . . fyi (that which I refer to, Sir) ;)

Thanks for the input . . .


BOINC Wiki . . .

Science Status Page . . .
ID: 555469
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 555472 - Posted: 28 Apr 2007, 15:46:46 UTC - in response to Message 555463.  

Too many variables for me, Alinator.

I'll wait, interested, to hear what comes of it, because without more history anything is possible :D

Some ideas, which could be wide-ranging: e.g. the 'no heartbeat' messages on that single successful workunit sound like nobody is dealing with some bigger issue with his setup (like the remain-in-memory setting), or maybe an interaction with Berkeley or his ISP (image file verification). To me it looks like firewall (aka ZoneAlarm) or ISP issues, especially given the "no heartbeats" on that first workunit.


"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 555472
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 555483 - Posted: 28 Apr 2007, 16:00:04 UTC - in response to Message 555469.  



<snip>

Right you be, Alinator - *almost* 100% correct - the problem shall be *published* at a later date . . . fyi (that which I refer to, Sir) ;)

Thanks for the input . . .



I think I get your drift now! :-)

You're saying that SAH should implement 'auto ghost recovery' like they do at EAH for example. ;-)

Agreed, that would be a good idea and it has been discussed here before, but IIRC it would require more backend changes than the team has been willing to undertake. Perhaps when they get MB, AP, RTPC, and some of the other stuff which has been backburnered for eons cleared out, they will revisit this issue.

Alinator
ID: 555483
Keck_Komputers
Volunteer tester
Joined: 4 Jul 99
Posts: 1575
Credit: 4,152,111
RAC: 1
United States
Message 555611 - Posted: 28 Apr 2007, 21:56:13 UTC - in response to Message 555483.  



<snip>

Right you be, Alinator - *almost* 100% correct - the problem shall be *published* at a later date . . . fyi (that which I refer to, Sir) ;)

Thanks for the input . . .



I think I get your drift now! :-)

You're saying that SAH should implement 'auto ghost recovery' like they do at EAH for example. ;-)

Agreed, that would be a good idea and it has been discussed here before, but IIRC it would require more backend changes than the team has been willing to undertake. Perhaps when they get MB, AP, RTPC, and some of the other stuff which has been backburnered for eons cleared out, they will revisit this issue.

Alinator

Not too many changes - too much load on the database. The change is just adding a tag to the project's config.xml file. However, it will cause a more involved search of the database each time a scheduler RPC occurs. With the chronic database problems here, that extra load is not acceptable.
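
For reference, the tag being talked about is (IIRC) resend_lost_results in the project's config.xml - roughly like this, with the exact surrounding structure depending on the server code version:

<boinc>
  <config>
    ...
    <!-- Re-send results the database says are in progress on a host
         but which the host does not report having on board. -->
    <resend_lost_results/>
  </config>
</boinc>

The single tag is the easy part; the extra database lookup it forces on every scheduler RPC is the load being referred to above.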
BOINC WIKI

BOINCing since 2002/12/8
ID: 555611
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 555626 - Posted: 28 Apr 2007, 22:21:22 UTC - in response to Message 555611.  

Not too many changes - too much load on the database. The change is just adding a tag to the project's config.xml file. However, it will cause a more involved search of the database each time a scheduler RPC occurs. With the chronic database problems here, that extra load is not acceptable.


Oh, absolutely - I didn't mean to imply it was a difficult backend software change to get BOINC to do it. It's a matter of making sure there's enough of the hardware resource pie to go around for everyone.

Another thing we should keep in mind, which has become apparent since the advent of the Staff Blog forum, is that there is a considerable amount of other pretty heavy-duty science being performed on the SSL's gear, above and beyond what's used to keep us busy.

Alinator
ID: 555626
Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 556137 - Posted: 29 Apr 2007, 18:57:27 UTC
Last modified: 29 Apr 2007, 19:55:47 UTC

I just saw my first "ghost" workunit today. It only happened to 1 WU, but it is my first.

I have looked in the "D:\Program Files\BOINC\projects\setiathome.berkeley.edu" folder, and in BOINC Manager and BoincView: result 24fe05aa.11170.22530.29822.3.48_1 did not download to host 3001110. It's there on my results page, it's not stuck in the Transfers, it's just not on this machine. This PC currently has 4 SETI main WUs (there are also 4 SETI Beta and 1 Rosetta WU as well).

I also took a look at the messages in "stderrdae.txt" in the BOINC folder; I have some of the core client debug messages enabled, including [work_fetch_debug]:

2007-04-29 13:00:07 [---] Resuming network activity
2007-04-29 13:00:07 [---] [work_fetch_debug] Request work fetch: timer
2007-04-29 13:00:07 [---] [work_fetch_debug] compute_work_requests(): start
2007-04-29 13:00:07 [---] [work_fetch_debug] compute_work_requests(): cpu_shortfall 0.000000, overall urgency OK
2007-04-29 13:00:07 [SETI@home] [work_fetch_debug] best project so far
2007-04-29 13:00:07 [SETI@home Beta Test] [work_fetch_debug] project has no shortfall
2007-04-29 13:00:07 [rosetta@home] [work_fetch_debug] project has suspended result
2007-04-29 13:00:07 [SETI@home] [work_fetch_debug] compute_work_requests(): work req 11375.407843, shortfall 11375.407843, urgency Need
2007-04-29 13:00:07 [---] [sched_op_debug] SCHEDULER_OP::init_op_project(): starting op for http://setiathome.berkeley.edu/
2007-04-29 13:00:07 [---] [work_fetch_debug] time_until_work_done(): est 62619.372374 ssr 300.000000 apr 0.688603 prs 100.000000
2007-04-29 13:00:08 [SETI@home] Sending scheduler request: To fetch work
2007-04-29 13:00:08 [SETI@home] Requesting 11375 seconds of new work
2007-04-29 13:00:28 [SETI@home] Scheduler request failed: HTTP internal server error
2007-04-29 13:00:28 [SETI@home] Deferring communication for 1 min 0 sec
2007-04-29 13:00:28 [SETI@home] Reason: scheduler request failed
2007-04-29 13:00:28 [---] [work_fetch_debug] Request work fetch: RPC complete
2007-04-29 13:00:28 [---] [work_fetch_debug] compute_work_requests(): start
2007-04-29 13:00:28 [---] [work_fetch_debug] compute_work_requests(): cpu_shortfall 0.000000, overall urgency OK
2007-04-29 13:00:28 [SETI@home] [work_fetch_debug] work fetch: project not contactable
2007-04-29 13:00:28 [SETI@home Beta Test] [work_fetch_debug] project has no shortfall
2007-04-29 13:00:28 [rosetta@home] [work_fetch_debug] project has suspended result
2007-04-29 13:01:28 [---] [work_fetch_debug] Request work fetch: Project backoff ended
2007-04-29 13:01:28 [---] [work_fetch_debug] Request work fetch: timer
2007-04-29 13:01:28 [---] [work_fetch_debug] compute_work_requests(): start
2007-04-29 13:01:28 [---] [work_fetch_debug] compute_work_requests(): cpu_shortfall 0.000000, overall urgency OK
2007-04-29 13:01:28 [SETI@home] [work_fetch_debug] best project so far
2007-04-29 13:01:28 [SETI@home Beta Test] [work_fetch_debug] project has no shortfall
2007-04-29 13:01:28 [rosetta@home] [work_fetch_debug] project has suspended result
2007-04-29 13:01:28 [SETI@home] [work_fetch_debug] compute_work_requests(): work req 11395.553706, shortfall 11395.553706, urgency Need
2007-04-29 13:01:28 [---] [sched_op_debug] SCHEDULER_OP::init_op_project(): starting op for http://setiathome.berkeley.edu/
2007-04-29 13:01:28 [---] [work_fetch_debug] time_until_work_done(): est 62619.372374 ssr 300.000000 apr 0.688831 prs 100.000000
2007-04-29 13:01:28 [SETI@home] Sending scheduler request: To fetch work
2007-04-29 13:01:28 [SETI@home] Requesting 11396 seconds of new work
2007-04-29 13:01:33 [SETI@home] Scheduler RPC succeeded [server version 509]
2007-04-29 13:01:33 [SETI@home] Project requested delay of 11.000000 seconds
2007-04-29 13:01:33 [SETI@home] Deferring communication for 11 sec
2007-04-29 13:01:33 [SETI@home] Reason: requested by project
2007-04-29 13:01:33 [---] [work_fetch_debug] Request work fetch: RPC complete
2007-04-29 13:01:33 [---] [work_fetch_debug] compute_work_requests(): start
2007-04-29 13:01:33 [---] [work_fetch_debug] compute_work_requests(): cpu_shortfall 0.000000, overall urgency Don't need
2007-04-29 13:01:35 [SETI@home] [file_xfer] Started download of file 24fe05aa.11170.22530.29822.3.197
2007-04-29 13:01:44 [---] [work_fetch_debug] Request work fetch: Project backoff ended
2007-04-29 13:01:44 [---] [work_fetch_debug] compute_work_requests(): start
2007-04-29 13:01:44 [---] [work_fetch_debug] compute_work_requests(): cpu_shortfall 0.000000, overall urgency Don't need
2007-04-29 13:01:50 [SETI@home] [file_xfer] Finished download of file 24fe05aa.11170.22530.29822.3.197
2007-04-29 13:01:50 [SETI@home] [file_xfer] Throughput 18914 bytes/sec

2007-04-29 13:02:44 [---] [work_fetch_debug] Request work fetch: timer
2007-04-29 13:02:44 [---] [work_fetch_debug] compute_work_requests(): start
2007-04-29 13:02:44 [---] [work_fetch_debug] compute_work_requests(): cpu_shortfall 0.000000, overall urgency Don't need

I know this WU will get re-sent if the other 2 computers don't reach a quorum, but it's going to sit on my results page until 24 May 2007 8:27:41 UTC and I can't do anything about it. :-(
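
A minimal Python sketch of that kind of check (the path is the one implied by the folder mentioned above and may differ on other installs; it assumes client_state.xml keeps one <name> line per <result> block, which may vary by client version) lists the result names the client actually has on board, for comparison against the web site's result list:

# List the <name> of every <result> block in client_state.xml.
# Path assumed from the install above - adjust for your own BOINC directory.
path = r"D:\Program Files\BOINC\client_state.xml"

in_result = False
with open(path, encoding="utf-8", errors="replace") as f:
    for raw in f:
        line = raw.strip()
        if line == "<result>":
            in_result = True
        elif line == "</result>":
            in_result = False
        elif in_result and line.startswith("<name>") and line.endswith("</name>"):
            # Anything the web site shows as 'In Progress' but missing here is a ghost.
            print(line[len("<name>"):-len("</name>")])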
Sir Arthur C Clarke 1917-2008
ID: 556137
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 556201 - Posted: 29 Apr 2007, 20:27:28 UTC

LOL...

Well, messy looking? Yes.

Your host's fault? Probably not, so no harm, no foul! Like Goggles Pisano said, "Whatsa behind me isa none of my concern..." :-)

Alinator
ID: 556201
