Bits (Oct 09 2007)

Matt Lebofsky Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0	Message 656897 - Posted: 9 Oct 2007, 21:59:10 UTC Today was the usual Tuesday outage, which went fine. Of course, we preceded this by having the project off all night to clean out various backlogged queues. It's at the point that if one part of the backend fails for long enough, the result table gets bloated and wreaks havoc on the whole system. But we were fully drained by this morning, and the database backup/compression went smoothly. We're catching up now. Somebody asked what "db_purge.x86_64" is. In order to speed up the process of reducing the db_purge queue we wanted to run that process on the system where the actual archives are being stored to disk. This was thumper, a 64-bit machine, so that meant compiling a 64-bit version of the purger. The suffix "x86_64" denotes that. During the outage Jeff and I reconfigured the workunit volume on our Snap Appliance to be a grouped set of mirrors instead of one big RAID 5. The idea is that this will vastly help disk I/O - we'll start putting workunits back on this system in due time and monitor progress. We shall see how well this helps. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Pooh Bear 27 Volunteer tester Joined: 14 Jul 03 Posts: 3224 Credit: 4,603,826 RAC: 0	Message 656900 - Posted: 9 Oct 2007, 22:04:19 UTC Matt, So, did you play any gigs in Kiwiland? Or just pure vacation?

Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0	Message 656906 - Posted: 9 Oct 2007, 22:09:12 UTC Matt, Nice Work @ Berkeley - and to You & Jeff - Keep it up . . .

Neil Blaikie Volunteer tester Joined: 17 May 99 Posts: 143 Credit: 6,652,341 RAC: 0	Message 656910 - Posted: 9 Oct 2007, 22:13:04 UTC Good to see you are back from vacation, nice and refreshed, hopefully. It's a very nice part of the world and the perfect time to escape fall in the Northern Hemisphere. Keep up the good work and, as usual, thanks for keeping us informed.

BenchZowner Joined: 19 Jul 07 Posts: 2 Credit: 126,630 RAC: 0	Message 657030 - Posted: 10 Oct 2007, 1:08:45 UTC Thanks for the update Matt, and I hope you had a nice time on your vacation. One quick question: are you running any software RAID arrays on the project machines?

KWSN THE Holy Hand Grenade! Volunteer tester Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0	Message 657034 - Posted: 10 Oct 2007, 1:15:50 UTC Not a complaint, but... when I went to get more WUs, some of them came over as "0 bytes" on the "transfers" tab of the client... I've never noticed that before from active WUs! Hello, from Albany, CA!...

n7rfa Volunteer tester Joined: 13 Apr 04 Posts: 370 Credit: 9,058,599 RAC: 0	Message 657045 - Posted: 10 Oct 2007, 1:31:02 UTC - in response to Message 657034. There's a whole slew of these bad WUs. I just aborted all of them that were on two of my systems. (As soon as my wife gets off the other I'll do it there also.) To abort them you have to find the zero-length WU files and then use the file name to find them on the Tasks tab. Unfortunately, this results in a new file being sent out to someone else.
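The "find the zero-length WU files" step can be scripted rather than done by eye. What follows is only a minimal sketch: the function name `zero_length_files` is made up for illustration, and the directory you pass in is an assumption (it should be whatever folder your BOINC client stores the SETI@home downloads in, which varies by installation). It only lists file names; matching them against the Tasks tab is still a manual step.

```python
from pathlib import Path


def zero_length_files(project_dir):
    """Return the sorted names of regular files in project_dir that are 0 bytes long.

    project_dir is an assumption: point it at the folder where your client
    keeps the downloaded workunit files.
    """
    return sorted(
        p.name
        for p in Path(project_dir).iterdir()
        if p.is_file() and p.stat().st_size == 0
    )
```

Calling `zero_length_files(some_dir)` returns a list of file names that can then be looked up on the Tasks tab.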

Jesse Viviano Joined: 27 Feb 00 Posts: 100 Credit: 3,949,583 RAC: 0	Message 657135 - Posted: 10 Oct 2007, 3:20:44 UTC - in response to Message 657045. It would be better if you let your machine try to crunch them. It would help the admins to troubleshoot the splitters if they see a bunch of errors stating "Compute error" and "SETI@home error -6 Bad workunit header" instead of "Aborted by user". If enough bad workunit header results, rather than aborts, kill the work unit, it would be easier for the admins to do a database search for which work units need to be resplit.

n7rfa Volunteer tester Joined: 13 Apr 04 Posts: 370 Credit: 9,058,599 RAC: 0	Message 657299 - Posted: 10 Oct 2007, 13:03:51 UTC - in response to Message 657135. I'll make it easier for them: 17mr07ab and 18mr07aa.

Robert Ribbeck Joined: 7 Jun 02 Posts: 644 Credit: 5,283,174 RAC: 0	Message 657470 - Posted: 10 Oct 2007, 20:16:04 UTC Noticed today that WUs are not being taken according to their due dates. Is that a SETI or BOINC problem?

John McLeod VII Volunteer developer Volunteer tester Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0	Message 657487 - Posted: 10 Oct 2007, 20:58:12 UTC - in response to Message 657470. BOINC runs the tasks from any particular project in First In First Out order unless there is a chance that this will miss a deadline. In that case it runs them in Earliest Deadline First order. This is by design. BOINC WIKI
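The FIFO-unless-a-deadline-is-at-risk rule described above can be illustrated with a toy model. This is a rough sketch and not BOINC's actual scheduler code: it assumes a single CPU, one project, and perfectly known run times, and the `Task` fields and function names are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    arrival: int      # order in which the task was received (lower = earlier)
    deadline: float   # seconds from now until the task is due
    remaining: float  # estimated seconds of computation still needed


def fifo_misses_deadline(tasks):
    """Simulate running the tasks back to back in arrival order.

    Returns True if any task would finish after its deadline.
    """
    elapsed = 0.0
    for task in sorted(tasks, key=lambda t: t.arrival):
        elapsed += task.remaining
        if elapsed > task.deadline:
            return True
    return False


def run_order(tasks):
    """Return the tasks in FIFO order, or in Earliest Deadline First order
    if plain FIFO would miss at least one deadline."""
    if fifo_misses_deadline(tasks):
        return sorted(tasks, key=lambda t: t.deadline)
    return sorted(tasks, key=lambda t: t.arrival)
```

In this model, a later-arriving task with a tight deadline jumps the queue only when running everything in arrival order would make some task late, which is why tasks may appear to be picked "out of order" relative to their due dates.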

Clyde C. Phillips, III Joined: 2 Aug 00 Posts: 1851 Credit: 5,955,047 RAC: 0	Message 657915 - Posted: 11 Oct 2007, 18:38:27 UTC I did see a few zero-crunch-length "Client Errors" workunits on one of my machines.
