Why is there no work?

Geek@Play (Volunteer tester) | Joined: 31 Jul 01 | Posts: 2467 | Credit: 86,146,931 | RAC: 0
Message 908224 - Posted: 16 Jun 2009, 22:53:37 UTC. Last modified: 16 Jun 2009, 23:02:08 UTC

I don't get it. A lot of crunchers are complaining that they can't get any work, yet (at this time) there are 67,570 SETI@home Multibeam (MB) work units waiting to be sent out from the Berkeley servers. Server bandwidth is less than half of the 100 Mbit/s it is capable of. And it's not because of the normal Tuesday outage or the recovery period that follows; this has been going on for several weeks now.

So... if the work is plentiful and the demand is high, why isn't it being sent out?

Boinc....Boinc....Boinc....Boinc....

JohnDK (Volunteer tester) | Joined: 28 May 00 | Posts: 1222 | Credit: 451,243,443 | RAC: 1,127
Message 908228 - Posted: 16 Jun 2009, 23:07:57 UTC - in response to Message 908224. Last modified: 16 Jun 2009, 23:11:55 UTC

I might very well have missed an official explanation, but my guess is that since the APs ran out, people are now getting MBs instead. To cover the many more MBs needed, the servers simply can't keep up with the requests; they are badly under-dimensioned for this situation...

Cosmic_Ocean | Joined: 23 Dec 00 | Posts: 3027 | Credit: 13,516,867 | RAC: 13
Message 908229 - Posted: 16 Jun 2009, 23:11:51 UTC

Something Matt mentioned last week in one of the tech news posts is that the numbers on the status page are no longer exact as they used to be; they're a "good guess". This puts much less strain on the database. The accurate method locked the database while everything was scanned to count how many units were ready to send, and while the DB was locked, new units could not be created (split), and so on. Some logic was shuffled around and some code rewritten, and now it uses a different method that lets the database keep working while it is read for the status page numbers.

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)

Geek@Play (Volunteer tester) | Joined: 31 Jul 01 | Posts: 2467 | Credit: 86,146,931 | RAC: 0
Message 908244 - Posted: 16 Jun 2009, 23:58:11 UTC - in response to Message 908229.

OK... that explains it. I missed the posting where it went from an exact number to a SWAG.

Boinc....Boinc....Boinc....Boinc....

1mp0£173 (Volunteer tester) | Joined: 3 Apr 99 | Posts: 8423 | Credit: 356,897 | RAC: 0
Message 908250 - Posted: 17 Jun 2009, 0:03:58 UTC - in response to Message 908224.

> ... If the work is plentiful and the demand is high why aren't they being sent out?

Because there can be work that is "ready" but isn't available to the scheduler until the feeder puts it into the (100 work unit) feeder queue. I forget the exact speed of the feeder, but obviously BOINC clients are requesting work faster than the feeder is supplying it to the scheduler.

TerryG | Joined: 11 Mar 01 | Posts: 16 | Credit: 15,351,703 | RAC: 37
Message 908268 - Posted: 17 Jun 2009, 0:39:55 UTC

Not sure if the fact that the replica database is still offline has anything to do with this; I'm sure I've seen a lack of work units being sent out before when this happens.

darengosse | Joined: 8 Mar 06 | Posts: 9 | Credit: 1,045,896 | RAC: 0
Message 908284 - Posted: 17 Jun 2009, 1:33:51 UTC - in response to Message 908224.

> ... If the work is plentiful and the demand is high why aren't they being sent out?

Hello. I am also having a lot of trouble getting work. When I check the server status page, on June 16, 2009 at 10:30:11 UTC it showed "Data distribution state: Results ready to send: SETI@home = 22,149", and at 13:30:17 UTC it showed "Results ready to send: SETI@home = 106,474", with all the download servers listed as "Running".

My question: why do I constantly get this message in BOINC, all day long: "Message from server: (Project has no jobs available)"? I receive no work at all, or at most one task at a time, and very rarely at that, even though I set my preferences to keep a 3.50-day work buffer. I should also point out that this is the project where I have my lowest RAC.

Thanks very much in advance to anyone who can explain why...?

Josef W. Segur (Volunteer developer, Volunteer tester) | Joined: 30 Oct 99 | Posts: 4504 | Credit: 1,414,761 | RAC: 0
Message 908291 - Posted: 17 Jun 2009, 1:53:57 UTC - in response to Message 908250.

> ... If the work is plentiful and the demand is high why aren't they being sent out?
>
> Because there can be work that is "ready" but it isn't available to the scheduler until the feeder puts it into the (100 work unit) feeder queue. I forget the exact speed of the feeder, but obviously BOINC clients are requesting work faster than the feeder is supplying it to the scheduler.

The Feeder tries to refill the list at 2-second intervals, but other database activity can slow that process a lot. Matt Lebofsky's post last December is still worth reading. It seems to me that something is overloading the database and effectively blocking the Feeder for long periods.

Joe

Geek@Play (Volunteer tester) | Joined: 31 Jul 01 | Posts: 2467 | Credit: 86,146,931 | RAC: 0
Message 908311 - Posted: 17 Jun 2009, 3:30:04 UTC - in response to Message 908291. Last modified: 17 Jun 2009, 3:36:55 UTC

Thanks Joe... this was my feeling too, but I lack the scientific data and background to back it up. I was basing my opinion on the fact that all database tasks are running slowly: Validators, Assimilators, etc. Also, Berkeley used to handle traffic easily that now seems to be choking it. And what's new, has been getting a lot of effort, and requires massive database access?? NTPCKR

Boinc....Boinc....Boinc....Boinc....

ST | Joined: 28 Nov 06 | Posts: 1 | Credit: 203,721 | RAC: 0
Message 908338 - Posted: 17 Jun 2009, 8:06:04 UTC

I stopped SETI@HOME last year because of the constant "NO WORK", and last week I received a request to rejoin, which I did. But it is still the same old "NO WORK" problem; if it can't be resolved, I will stop again.

1mp0£173 (Volunteer tester) | Joined: 3 Apr 99 | Posts: 8423 | Credit: 356,897 | RAC: 0
Message 908343 - Posted: 17 Jun 2009, 8:20:27 UTC - in response to Message 908311.

> ... And what's new and what's been getting a lot of effort and requires massive database access??

I'm getting work, but it looks like we could have a string of shorties, which is something else that's changed. Are we also seeing some of the mount issues that Matt has talked about, or is someone else "scraping" for stats, or what else could possibly be the issue? Don't know.

Virtual Boss* (Volunteer tester) | Joined: 4 May 08 | Posts: 417 | Credit: 6,440,287 | RAC: 0
Message 908344 - Posted: 17 Jun 2009, 8:22:48 UTC - in response to Message 908338.

> ... But it is still the same old problem of "NO WORK", if it can't be resolved, then I will stop again.

Unfortunately, it looks like your timing was not very good. I have been crunching for SETI for 13 months, and this is the only time any of my hosts has run out of work due to project problems, and so far only 1 host out of 8.

DJStarfox | Joined: 23 May 01 | Posts: 1066 | Credit: 1,226,053 | RAC: 2
Message 908558 - Posted: 17 Jun 2009, 23:10:12 UTC - in response to Message 908250.

> ... Because there can be work that is "ready" but it isn't available to the scheduler until the feeder puts it into the (100 work unit) feeder queue. I forget the exact speed of the feeder, but obviously BOINC clients are requesting work faster than the feeder is supplying it to the scheduler.

Ned, I agree. The 100-result limit on the feeder is holding back throughput for getting those work units out into the field. It really needs to be 1000 for SETI.

Westsail and Pyxey (Volunteer tester) | Joined: 26 Jul 99 | Posts: 338 | Credit: 20,544,999 | RAC: 0
Message 908597 - Posted: 18 Jun 2009, 1:02:01 UTC - in response to Message 908558.

> Ned, I agree. The 100 result limit on the feeder is holding back throughput for getting those work units out in the field. It really needs to be 1000 for SETI.

Is there a specific reason for the number 100? Is it limited by hardware in some way, or was it just chosen as a good number because significantly less throughput was required in the past? Maybe it is a server setting that could be changed easily? Or how about running more than one instance? No doubt the recent dramatic increase in the throughput potential of an individual host has added a lot more work for the servers in a very short time.

"The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov

Josef W. Segur (Volunteer developer, Volunteer tester) | Joined: 30 Oct 99 | Posts: 4504 | Credit: 1,414,761 | RAC: 0
Message 908615 - Posted: 18 Jun 2009, 2:39:07 UTC - in response to Message 908597.

> ... Is there a specific reason for the 100 number? Is it in some way hardware limited etc. or...was just chose as a good number because significant less throughput was required in the past. Maybe it is server setting that could be changed easily? Or how about more than one instance running?

That 100 is the default setting in BOINC, and there is a warning about increasing it in sched_shmem.h:

    // Default number of work items in shared mem.
    // You can configure this in config.xml (<shmem_work_items>)
    // If you increase this above 100,
    // you may exceed the max shared-memory segment size
    // on some operating systems.
    //
    #define MAX_WU_RESULTS 100

As noted there, it can be changed fairly easily. Whether it would help much here I don't know. My impression is that the Feeder has been effectively blocked for minutes at a time recently, so feeding 1000 at a time would be only a minor help.

For some period last year they were running two Feeders and Schedulers, one pair handling odd-numbered tasks and the other even-numbered ones. That had issues too, and if other activity on the database is the cause of the feeding delays, IMO the extra instance would just add to the problem.

Joe

bloodrain (Volunteer tester) | Joined: 8 Dec 08 | Posts: 231 | Credit: 28,112,547 | RAC: 1
Message 908618 - Posted: 18 Jun 2009, 2:44:31 UTC - in response to Message 908615.

True. I finally got some work, but I have not been able to upload it since early today.

DJStarfox | Joined: 23 May 01 | Posts: 1066 | Credit: 1,226,053 | RAC: 2
Message 908628 - Posted: 18 Jun 2009, 3:45:29 UTC - in response to Message 908615.

> ... My impression is the Feeder has been effectively blocked for minutes at a time recently so feeding 1000 at a time would be only a minor help.

Hmm... if the feeder is blocked for such a long time, then it would make even more sense for it to have a larger buffer. And there is a way to increase the shared memory size on Linux. Even once the DB problem is solved, I don't see any long-term harm in having a big feeder buffer. I'd love to know: what is the average number of scheduler requests per minute?

Ahem...

    # Set shared memory limits by including these lines in /etc/sysctl.conf
    # kernel.shmmax is the maximum segment size in bytes (default is 32M on most 2.6.2x kernels)
    # kernel.shmall is the system-wide total, counted in pages rather than bytes
    kernel.shmmax=268435456
    kernel.shmall=268435456

From what I know of computer architecture, increasing this value will cause more cache misses on the CPU(s).

skildude | Joined: 4 Oct 00 | Posts: 9541 | Credit: 50,759,529 | RAC: 60
Message 908632 - Posted: 18 Jun 2009, 3:55:51 UTC

Please refer to the "Server Outage" sticky or the panic mode thread for server problems.

In a rich man's house there is no place to spit but his face. Diogenes Of Sinope

darengosse | Joined: 8 Mar 06 | Posts: 9 | Credit: 1,045,896 | RAC: 0
Message 908686 - Posted: 18 Jun 2009, 10:18:40 UTC - in response to Message 908632.

Hello. My previous message was useful, because my 2 computers received 54 WUs in total, but it is impossible to upload the results: 18 results have been stuck uploading since June 17 at 20:41:23 UTC, with this message permanently shown in BOINC: (Temporarily failed upload - Internet access OK - project servers may be temporarily down).

Please excuse me, but I think that SETI has, to use a French expression, "eyes bigger than its belly"!! Indeed, why have almost 1 million users and keep accepting new ones if they are unable to keep up with the pace...

http://www.boincstats.com/signature/user_754953_project-1.gif

darengosse | Joined: 8 Mar 06 | Posts: 9 | Credit: 1,045,896 | RAC: 0
Message 908712 - Posted: 18 Jun 2009, 12:48:52 UTC - in response to Message 908686.

> ... Indeed, why have almost 1 d' million; users and to accept the new ones, if they are unable to follow the rate/rhythm .......

Sorry, but I think there is an error in my previous message. It should read: why have almost 1 million users and keep accepting new ones if they are unable to keep up with the pace...

Jean-Paul

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.