Takin' Care of Business and Workin' Overtime (May 11 2007)

Profile Pinemarten
Avatar

Send message
Joined: 14 Jun 00
Posts: 29
Credit: 19,400
RAC: 0
Canada
Message 566825 - Posted: 14 May 2007, 0:45:32 UTC

Would it be possible to have the BOINC software (in a future upgrade?) set the clients to 'suspend' if this happens again?
Or possibly a 'pop-up' window saying something like:

"Server not contacted for X time. Please set project to SUSPEND and go to this site (insert hotlink here) for important messages and/or further instructions".
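The check behind such a pop-up can be sketched in a few lines. This is only an illustration of the idea in the post, not BOINC code: the function name `should_prompt_suspend` and the `max_silence` threshold are hypothetical, and the "X time" value is left to the caller.

```python
from datetime import datetime, timedelta


def should_prompt_suspend(last_contact: datetime,
                          now: datetime,
                          max_silence: timedelta) -> bool:
    """Return True when the project server has been silent longer than max_silence.

    Hypothetical helper: a real client would take last_contact from its own
    record of the last successful scheduler contact.
    """
    return now - last_contact > max_silence


# Example: last successful contact was about a day ago, threshold is 12 hours.
last_ok = datetime(2007, 5, 13, 0, 0)
checked_at = datetime(2007, 5, 14, 0, 45)
if should_prompt_suspend(last_ok, checked_at, timedelta(hours=12)):
    print("Server not contacted for over 12 hours; consider suspending the project.")
```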
ID: 566825 · Report as offensive
Doug Brandt

Send message
Joined: 16 May 99
Posts: 1
Credit: 932,443
RAC: 0
United States
Message 566838 - Posted: 14 May 2007, 1:16:10 UTC - in response to Message 566834.  

Managed to download 7 WUs and crunched them all...now they won't upload.

I've got 8 WUs staged to download, but they won't. :-(

Back to Einstein crunchin 'til the logjam breaks.




I am in the same boat.----Can't Upload, ---Can't Download new work.----Must still be MAJOR Server problems,---Although Server Status Page indicates that ALL is Well.----Sure hope Matt can fix this tomorrow.

D.B.
ID: 566838 · Report as offensive
Ricky@SETI.USA
Avatar

Send message
Joined: 4 Sep 04
Posts: 453
Credit: 1,586,857
RAC: 0
United States
Message 566841 - Posted: 14 May 2007, 1:18:20 UTC

I have been trying to download 6 WU's almost all day long with the following error:

5/13/2007 21:07:32|SETI@home|[file_xfer] Started download of file 16fe05ab.10775.13792.59646.3.195
5/13/2007 21:07:35|SETI@home|[file_xfer] Started download of file 16fe05ab.10775.13792.59646.3.201
5/13/2007 21:07:54||Project communication failed: attempting access to reference site
5/13/2007 21:07:54|SETI@home|[file_xfer] Temporarily failed download of 16fe05ab.10775.13792.59646.3.195: system connect
5/13/2007 21:07:54|SETI@home|Backing off 3 hr 35 min 33 sec on download of file 16fe05ab.10775.13792.59646.3.195
5/13/2007 21:07:54|SETI@home|[file_xfer] Started download of file 16fe05ab.10775.13792.59646.3.194
5/13/2007 21:07:55||Access to reference site succeeded - project servers may be temporarily down.
5/13/2007 21:07:57||Project communication failed: attempting access to reference site
5/13/2007 21:07:57|SETI@home|[file_xfer] Temporarily failed download of 16fe05ab.10775.13792.59646.3.201: system connect
5/13/2007 21:07:57|SETI@home|Backing off 1 hr 31 min 12 sec on download of file 16fe05ab.10775.13792.59646.3.201
5/13/2007 21:07:57|SETI@home|[file_xfer] Started download of file 16fe05ab.10775.13792.59646.3.180
5/13/2007 21:07:58||Access to reference site succeeded - project servers may be temporarily down.
5/13/2007 21:08:16||Project communication failed: attempting access to reference site
5/13/2007 21:08:16|SETI@home|[file_xfer] Temporarily failed download of 16fe05ab.10775.13792.59646.3.194: system connect
5/13/2007 21:08:16|SETI@home|Backing off 16 min 0 sec on download of file 16fe05ab.10775.13792.59646.3.194
5/13/2007 21:08:16|SETI@home|[file_xfer] Started download of file 16fe05ab.10775.13792.59646.3.200
5/13/2007 21:08:17||Access to reference site succeeded - project servers may be temporarily down.
5/13/2007 21:08:19||Project communication failed: attempting access to reference site
5/13/2007 21:08:19|SETI@home|[file_xfer] Temporarily failed download of 16fe05ab.10775.13792.59646.3.180: system connect
5/13/2007 21:08:19|SETI@home|Backing off 3 hr 13 min 44 sec on download of file 16fe05ab.10775.13792.59646.3.180
5/13/2007 21:08:19|SETI@home|[file_xfer] Started download of file 16fe05ab.10775.13792.59646.3.186
5/13/2007 21:08:20||Access to reference site succeeded - project servers may be temporarily down.
5/13/2007 21:08:38||Project communication failed: attempting access to reference site
5/13/2007 21:08:38|SETI@home|[file_xfer] Temporarily failed download of 16fe05ab.10775.13792.59646.3.200: system connect
5/13/2007 21:08:38|SETI@home|Backing off 3 hr 42 min 11 sec on download of file 16fe05ab.10775.13792.59646.3.200
5/13/2007 21:08:39||Access to reference site succeeded - project servers may be temporarily down.
5/13/2007 21:08:41||Project communication failed: attempting access to reference site

It's been over 6 hours I think since they were 1st set to download. At this rate they will be due before I get them downloaded!
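For anyone who wants to see how long each file is being held back, the "Backing off" lines in a log like the one above can be summarised with a short script. This is a rough sketch that assumes the message format shown in this post (optional `hr`, `min` and `sec` parts followed by the file name); the function name `parse_backoffs` is made up for illustration, and other client versions may word the message differently.

```python
import re
from datetime import timedelta

# Matches e.g. "Backing off 3 hr 35 min 33 sec on download of file NAME"
# and "Backing off 16 min 0 sec on download of file NAME".
BACKOFF_RE = re.compile(
    r"Backing off "
    r"(?:(?P<hr>\d+) hr )?"
    r"(?:(?P<min>\d+) min )?"
    r"(?:(?P<sec>\d+) sec )?"
    r"on download of file (?P<name>\S+)"
)


def parse_backoffs(log_text):
    """Map each file name to the most recent backoff interval seen in the log."""
    backoffs = {}
    for line in log_text.splitlines():
        match = BACKOFF_RE.search(line)
        if match:
            backoffs[match.group("name")] = timedelta(
                hours=int(match.group("hr") or 0),
                minutes=int(match.group("min") or 0),
                seconds=int(match.group("sec") or 0),
            )
    return backoffs


sample = (
    "5/13/2007 21:07:54|SETI@home|Backing off 3 hr 35 min 33 sec on download of file 16fe05ab.10775.13792.59646.3.195\n"
    "5/13/2007 21:08:16|SETI@home|Backing off 16 min 0 sec on download of file 16fe05ab.10775.13792.59646.3.194\n"
)
for name, wait in parse_backoffs(sample).items():
    print(name, wait)
```

The intervals in the log differ widely from file to file (16 minutes up to more than 3 hours), so the longest one is the figure that matters when estimating when every download will have been retried.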

ID: 566841 · Report as offensive
Profile ofp1

Send message
Joined: 5 Jan 01
Posts: 5
Credit: 59,369,927
RAC: 0
Venezuela
Message 566850 - Posted: 14 May 2007, 1:37:39 UTC

One of my computers downloaded some WUs with a deadline of May 17 2007???
BOINC will have to process them first, because otherwise they will be reported past their deadline and then invalidated. I don't know what is going to happen.

I am sure this will be fixed on Monday. :)
ID: 566850 · Report as offensive
Profile twc122171

Send message
Joined: 30 Jul 02
Posts: 1
Credit: 418,624
RAC: 0
United States
Message 566858 - Posted: 14 May 2007, 2:18:44 UTC

Looks like the server is overloaded. Is this the case?
ID: 566858 · Report as offensive
Modesto
Volunteer tester

Send message
Joined: 4 Jul 04
Posts: 47
Credit: 321,752
RAC: 0
Canada
Message 566888 - Posted: 14 May 2007, 3:07:41 UTC - in response to Message 566858.  

Looks like the server is overloaded. Is this the case?


I have no idea myself, but it seems like a reasonable possibility with so much traffic (uploads/downloads, update requests...), so just in case, I'm suspending BOINC's network activity for the night. I figure that at this rate I won't succeed anyway, so I might as well lighten the load. If overload is the case and a significant number of people back off, it will be better for all of us looking for results, and more importantly for the project as a whole.

back tomorrow...
ID: 566888 · Report as offensive
Profile Pooh Bear 27
Volunteer tester
Avatar

Send message
Joined: 14 Jul 03
Posts: 3224
Credit: 4,603,826
RAC: 0
United States
Message 566913 - Posted: 14 May 2007, 4:00:05 UTC - in response to Message 566453.  

You may be right, but "Last block split" has no numbers, and "Results ready to send" is 0. If results were being created but not sent then it should be >0.

Wherever the bottleneck is, I think we can agree that there are currently none being sent out.

I'm not too concerned personally as I have enough work on other projects for about 2 days, but when I enabled network access to upload other results SETI started to download 4 WUs that are all stuck at 0% in the Transfers queue. I have disabled "network access" again until I see a healthier Cricket graph.

The servers were still assigning results to people (stuck downloads), but they are not necessarily downloading (I have several waiting to download). That's why it was at 0. Now it's over 140K as of tonight.

Because it's a weekend, and the guys did work a lot of "free OT" already, I am sure they took a well needed rest, and sometime in the next 12-18 hours things will start getting straightened out. It is currently 21:00 Sunday Berkeley time.
ID: 566913 · Report as offensive
Profile JustNOC'ng

Send message
Joined: 31 Aug 00
Posts: 3
Credit: 319,632
RAC: 0
United States
Message 566965 - Posted: 14 May 2007, 6:24:02 UTC

Thanks to the fine folks for working overtime to get the project flowing again.

I did manage to get some work units, and as others have reported, I now have results to upload, but they are currently "stuck". I'm also stuck trying to download some work. I would say that the servers are very busy.

I'm currently doing work on another project and my spare cycles are being productively used.

Once more, thanks folks.

Gil
Where ever you go, there you are !

ID: 566965 · Report as offensive
emmdeb

Send message
Joined: 3 Jan 02
Posts: 10
Credit: 105,054
RAC: 0
France
Message 566982 - Posted: 14 May 2007, 7:52:34 UTC

Hello,

BRAVO Matt, Eric and all the fellows.

Emmanuel
ID: 566982 · Report as offensive
TarracoServer
Volunteer tester

Send message
Joined: 11 Apr 07
Posts: 38
Credit: 595,022
RAC: 0
Spain
Message 566983 - Posted: 14 May 2007, 7:54:19 UTC

Be patient, guys. Remember that Thumper was "out of order" for a week, so all the systems have been working at a "low state" due to this problem.

Now Thumper is on again and the splitters are online, but before you can get any WU, it must be created, registered, "tested", and finally sent (well, I suppose this is the way that koloth, penguin, klaatu, and kosh work ;) )

Surely, over today and/or tomorrow, everybody will be able to download/upload WUs/results without any problem, but right now it's logical to think that the whole system is reloading after this "small" electroshock ;)

Thanks to the whole team for the news these days and for their work.
ID: 566983 · Report as offensive
Stoffel
Avatar

Send message
Joined: 22 Aug 99
Posts: 1
Credit: 647,843
RAC: 0
Germany
Message 566990 - Posted: 14 May 2007, 8:43:24 UTC - in response to Message 566983.  

Hi there,

are there a few pictures of the new server? I like big "machines" ;-)

Greets from Germany

Stoffel
ID: 566990 · Report as offensive
lee clissett

Send message
Joined: 12 Jun 00
Posts: 46
Credit: 2,647,496
RAC: 0
United Kingdom
Message 567059 - Posted: 14 May 2007, 12:11:27 UTC

Well, I'm sat here with 22 completed units. Hope it gets sorted when Matt comes in. I thought it might be nutty for a few days.
ID: 567059 · Report as offensive
lee clissett

Send message
Joined: 12 Jun 00
Posts: 46
Credit: 2,647,496
RAC: 0
United Kingdom
Message 567068 - Posted: 14 May 2007, 12:18:24 UTC

At least other projects have had help with their work during the downtime, so it's not all bad. I bet the guys will have some work to do today, and more headaches. Whatever you get paid, it's not enough for all the sorting out you have to do. You should have been brain surgeons, it might have been easier... lol
ID: 567068 · Report as offensive
Profile Rev. Tim Olivera

Send message
Joined: 15 Jan 06
Posts: 20
Credit: 1,717,714
RAC: 0
United States
Message 567096 - Posted: 14 May 2007, 13:08:41 UTC - in response to Message 565486.  

It worked for about 10min on Saturday!!

5/14/2007 9:01:17 AM||Starting BOINC client version 5.8.16 for windows_intelx86
5/14/2007 9:01:17 AM||log flags: task, file_xfer, sched_ops
5/14/2007 9:01:17 AM||Libraries: libcurl/7.16.0 OpenSSL/0.9.8a zlib/1.2.3
5/14/2007 9:01:17 AM||Data directory: C:\Program Files\BOINC
5/14/2007 9:01:17 AM||Processor: 2 GenuineIntel Intel(R) Pentium(R) 4 CPU 3.40GHz [x86 Family 15 Model 3 Stepping 4] [fpu tsc sse sse2 mmx]
5/14/2007 9:01:17 AM||Memory: 1.50 GB physical, 3.35 GB virtual
5/14/2007 9:01:17 AM||Disk: 14.65 GB total, 5.85 GB free
5/14/2007 9:01:18 AM|SETI@home|URL: http://setiathome.berkeley.edu/; Computer ID: 2896286; location: home; project prefs: default
5/14/2007 9:01:18 AM||General prefs: from SETI@home (last modified 2006-01-30 10:18:19)
5/14/2007 9:01:18 AM||Host location: home
5/14/2007 9:01:18 AM||General prefs: no separate prefs for home; using your defaults
5/14/2007 9:01:23 AM|SETI@home|Sending scheduler request: Requested by user
5/14/2007 9:01:23 AM|SETI@home|(not requesting new work or reporting completed tasks)
5/14/2007 9:01:28 AM|SETI@home|Scheduler RPC succeeded [server version 509]
5/14/2007 9:01:28 AM|SETI@home|Deferring communication for 11 sec
5/14/2007 9:01:28 AM|SETI@home|Reason: requested by project
5/14/2007 9:06:29 AM||Suspending computation - user is active
5/14/2007 9:06:38 AM|SETI@home|Sending scheduler request: Requested by user
5/14/2007 9:06:38 AM|SETI@home|(not requesting new work or reporting completed tasks)
5/14/2007 9:06:43 AM|SETI@home|Scheduler RPC succeeded [server version 509]
5/14/2007 9:06:43 AM|SETI@home|Deferring communication for 11 sec
5/14/2007 9:06:43 AM|SETI@home|Reason: requested by project

ID: 567096 · Report as offensive
Profile Keith T.
Volunteer tester
Avatar

Send message
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 567124 - Posted: 14 May 2007, 14:00:26 UTC - in response to Message 567096.  
Last modified: 14 May 2007, 14:07:36 UTC

It worked for about 10min on Saturday!!...


It worked for about 14 hours on Saturday, then something broke about midnight Berkeley time.

See Cricket graphs for more.



See this thread in NC forum for some constructive suggestions.

Keith T.
ID: 567124 · Report as offensive
baracutio
Volunteer tester

Send message
Joined: 25 Sep 01
Posts: 6
Credit: 150,423
RAC: 0
Germany
Message 567126 - Posted: 14 May 2007, 14:03:57 UTC - in response to Message 567096.  

It worked for about 10min on Saturday!!

5/14/2007 9:06:29 AM||Suspending computation - user is active


@Rev. Tim Olivera

In your general preferences, change 'do work while computer is in use' to 'yes'.
ID: 567126 · Report as offensive
Profile Adri
Volunteer tester

Send message
Joined: 27 Apr 07
Posts: 56
Credit: 132,673
RAC: 0
Malaysia
Message 567130 - Posted: 14 May 2007, 14:15:48 UTC

Hmmm... it seems that I can obtain the WUs... but they refuse to download at all...
ID: 567130 · Report as offensive
TarracoServer
Volunteer tester

Send message
Joined: 11 Apr 07
Posts: 38
Credit: 595,022
RAC: 0
Spain
Message 567152 - Posted: 14 May 2007, 14:38:48 UTC - in response to Message 567124.  
Last modified: 14 May 2007, 14:43:13 UTC

It worked for about 10min on Saturday!!...


It worked for about 14 hours on Saturday, then something broke about midnight Berkeley time.

See Cricket graphs for more.



See this thread in NC forum for some constructive suggestions.

Keith T.


Yeah, you're right, but if you check the server status, you'll find that the servers are on and working (except kosh, which is fixing missed results) and there are results ready to send (so WUs are ready or nearly ready to send). Don't forget that there is a lot of work on the splitters, and it's possible that no new WU is ready to send.

Probably the work done on Saturday was prior to the "burn out" of Thumper. Remember that Matt, Eric & Jeff used a bit of a "trick" to let the BOINC client continue working, and now that trick must be undone, the servers must assimilate the work done while Thumper was out, insert those results into Thumper's database, etc...

There's a lot of work to do! ;)
That's why I said that the rest of us must be very patient (me included ;) ), let the team fit Thumper into the system correctly, and... wait, wait, wait ;)
I'm sure the whole system will work perfectly in a short time.

ID: 567152 · Report as offensive
Profile Francesco Forti
Avatar

Send message
Joined: 24 May 00
Posts: 334
Credit: 204,421,005
RAC: 15
Switzerland
Message 567156 - Posted: 14 May 2007, 14:42:50 UTC - in response to Message 567124.  

It worked for about 10min on Saturday!!...


It worked for about 14 hours on Saturday, then something broke about midnight Berkeley time.



I think that this is intentional.
By blocking uploads and downloads, we now have:
a) Results ready to send 342,172
b) Workunits waiting for validation 0
c) Workunits waiting for assimilation 0
d) Workunits waiting for deletion 0
e) Results in progress 1,341,312 (that is standard)

so the system is ready (within a few hours) to restart
without having a big backlog to assimilate
and with enough results ready to send.

isn't it?

Bye,
Franz




ID: 567156 · Report as offensive
TarracoServer
Volunteer tester

Send message
Joined: 11 Apr 07
Posts: 38
Credit: 595,022
RAC: 0
Spain
Message 567162 - Posted: 14 May 2007, 14:47:46 UTC - in response to Message 567156.  
Last modified: 14 May 2007, 14:52:13 UTC

It worked for about 10min on Saturday!!...


It worked for about 14 hours on Saturday, then something broke about midnight Berkeley time.



I think that this is intentional.
By blocking uploads and downloads, we now have:
a) Results ready to send 342,172
b) Workunits waiting for validation 0
c) Workunits waiting for assimilation 0
d) Workunits waiting for deletion 0
e) Results in progress 1,341,312 (that is standard)

so the system is ready (within a few hours) to restart
without having a big backlog to assimilate
and with enough results ready to send.

isn't it?

Bye,
Franz





In fact, yes, but remember that work units are one thing and results are another.
Work units are RAW data, preprocessed from the tapes, and results are the results obtained from that data (at least, I think so. MATT, HELP ;) ).

If you look at the bottom of the server status page:
Database/file status states

Results ready to send: For each workunit, four "empty" results are generated that are then sent out to individual users to be filled with data. This is the number of empty results ready to be sent out.
Results in progress: Number of results that haven't been returned by their clients, or are waiting for "quorum" to be reached so validation could take place.
Workunits waiting for validation: The number of workunits that reached quorum and are awaiting validation.
Workunits waiting for assimilation: The number of workunits waiting to have data from their canonical result input into the master science database.
Workunits/Results waiting for deletion: The number of workunits or results which can be deleted from disk, as the workunit has been assimilated, and there is no more use for it or its constituent results.
Transitioner backlog: Amount of time that the transitioner is behind (i.e. the age of the oldest result in the database waiting to be transitioned to another state).


So, no: Results ready to send aren't work units.
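
The relationship described in the status-page text can be modelled as a small data structure. This is only an illustrative sketch, not the project's actual schema: the four results per workunit come from the quoted text, while the quorum size (`ASSUMED_QUORUM`), the class names and the helper `workunits_represented` are assumptions made up for this example.

```python
from dataclasses import dataclass, field

RESULTS_PER_WORKUNIT = 4  # from the status-page text quoted above
ASSUMED_QUORUM = 3        # assumption: the thread does not state the quorum size


@dataclass
class Result:
    """One 'empty' result sent to a user, filled with data when returned."""
    returned: bool = False


@dataclass
class Workunit:
    """A chunk of raw data with its set of results."""
    name: str
    results: list = field(
        default_factory=lambda: [Result() for _ in range(RESULTS_PER_WORKUNIT)]
    )

    def returned_count(self) -> int:
        return sum(1 for r in self.results if r.returned)

    def ready_for_validation(self, quorum: int = ASSUMED_QUORUM) -> bool:
        # A workunit waits for validation once enough results have come back.
        return self.returned_count() >= quorum


def workunits_represented(results_ready: int) -> int:
    """Rough workunit count, assuming every ready result belongs to a freshly
    split workunit (re-issued results would make the true number differ)."""
    return results_ready // RESULTS_PER_WORKUNIT


wu = Workunit("example_wu")  # hypothetical name
for r in wu.results[:3]:
    r.returned = True
print(wu.ready_for_validation())
print(workunits_represented(342172))
```

Under that assumption, the 342,172 results ready to send quoted above would correspond to roughly 85,543 workunits, which is why the two counts should not be read as the same thing.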

ID: 567162 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.