Why is there no work?

Message boards : Number crunching : Why is there no work?

Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 908224 - Posted: 16 Jun 2009, 22:53:37 UTC
Last modified: 16 Jun 2009, 23:02:08 UTC

I don't get it. A lot of crunchers are complaining that they can't get any work. Yet (at this time) there are 67,570 Seti MB work units waiting to be sent out from the Berkeley servers. Server bandwidth is at less than half of the 100 MB it is capable of.

And it's not because of the normal Tuesday outage or the recovery period that follows. This has been going on for several weeks now. So.........

If the work is plentiful and the demand is high, why aren't the work units being sent out?
Boinc....Boinc....Boinc....Boinc....
ID: 908224
JohnDK
Crowdfunding Project Donor, Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 908228 - Posted: 16 Jun 2009, 23:07:57 UTC - in response to Message 908224.  
Last modified: 16 Jun 2009, 23:11:55 UTC

I might very well have missed an official explanation, but my guess is that after the APs ran out, people now get MBs instead. With so many more MBs needed, the servers simply can't keep up with the requests; they're way under-dimensioned for this situation...
ID: 908228
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 908229 - Posted: 16 Jun 2009, 23:11:51 UTC

Something Matt mentioned last week in one of the tech news posts is that the numbers on the status page are no longer exact like they used to be; they're a "good guess". This makes the load on the database much less strenuous. The accurate method locked the database while the whole thing was scanned to get a count of how many results are ready to send. While the DB was locked, new units could not be created (split), and so on. Some logic was shuffled around and some code rewritten, and now the count is obtained differently, which lets the database keep working while it is read for the status page numbers.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 908229
Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 908244 - Posted: 16 Jun 2009, 23:58:11 UTC - in response to Message 908229.  

Ok...........that explains it. I missed the posting where it went from an exact number to a SWAG.

Boinc....Boinc....Boinc....Boinc....
ID: 908244
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 908250 - Posted: 17 Jun 2009, 0:03:58 UTC - in response to Message 908224.  

> I don't get it. A lot of crunchers are complaining that they can't get any work. Yet (at this time) there are 67,570 Seti MB work units waiting to be sent out from the Berkeley servers. Server bandwidth is at less than half of the 100 MB it is capable of.
>
> And it's not because of the normal Tuesday outage or the recovery period that follows. This has been going on for several weeks now. So.........
>
> If the work is plentiful and the demand is high, why aren't the work units being sent out?

Because there can be work that is "ready" but it isn't available to the scheduler until the feeder puts it into the (100 work unit) feeder queue.

I forget the exact speed of the feeder, but obviously BOINC clients are requesting work faster than the feeder is supplying it to the scheduler.
ID: 908250
TerryG
Joined: 11 Mar 01
Posts: 16
Credit: 15,351,703
RAC: 37
United Kingdom
Message 908268 - Posted: 17 Jun 2009, 0:39:55 UTC

Not sure if the fact the replica database is still offline has anything to do with this - I'm sure I've seen a lack of work units being sent out before when this happens.
ID: 908268
darengosse
Joined: 8 Mar 06
Posts: 9
Credit: 1,045,896
RAC: 0
France
Message 908284 - Posted: 17 Jun 2009, 1:33:51 UTC - in response to Message 908224.  

> I don't get it. A lot of crunchers are complaining that they can't get any work. Yet (at this time) there are 67,570 Seti MB work units waiting to be sent out from the Berkeley servers. Server bandwidth is at less than half of the 100 MB it is capable of.
>
> And it's not because of the normal Tuesday outage or the recovery period that follows. This has been going on for several weeks now. So.........
>
> If the work is plentiful and the demand is high, why aren't the work units being sent out?

Hello. I am also having a lot of trouble obtaining work.
When I check the server status page:
on June 16, 2009 at 10:30:11 UTC, data distribution state:
Results ready to send: SETI@home = 22,149
on June 16, 2009 at 13:30:17 UTC, data distribution state:
Results ready to send: SETI@home = 106,474
and all the download servers are shown as Running.
My question: why do I constantly receive, all day long in BOINC, this message:
"Message from server: (Project has no jobs available)", and I receive no work, or at most 1 task at a time, and that very seldom...
Yet I have set my preferences to a work buffer of 3.50 days.
I should add that this is, moreover, the project where I have the lowest RAC.
Thank you very much in advance for explaining why.

ID: 908284
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 908291 - Posted: 17 Jun 2009, 1:53:57 UTC - in response to Message 908250.  

>> ...
>> If the work is plentiful and the demand is high why aren't they being sent out?
>
> Because there can be work that is "ready" but it isn't available to the scheduler until the feeder puts it into the (100 work unit) feeder queue.
>
> I forget the exact speed of the feeder, but obviously BOINC clients are requesting work faster than the feeder is supplying it to the scheduler.

The Feeder tries to refill the list at 2 second intervals, but other database activity can slow that process a lot. Matt Lebofsky's post last December is still worth reading. It seems to me that something is overloading the database and effectively blocking the Feeder for long periods.
                                                               Joe
ID: 908291
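For anyone who wants to see the feeder argument in numbers, here is a toy Python simulation. It is a rough sketch and not BOINC code: only the 100-slot array and the 2-second refill interval come from the posts above. The request rate (40 per second), the 30-second database stall, and the function name `simulate` are made-up assumptions for illustration.

```python
def simulate(slots=100, refill_interval=2, requests_per_sec=40,
             blocked=(), duration=60, ready=10**6):
    """Toy model of feeder + scheduler. Returns (served, refused).

    slots            -- size of the shared-memory array (100 per the thread)
    refill_interval  -- seconds between feeder refills (2 per the thread)
    requests_per_sec -- ASSUMED client demand, purely illustrative
    blocked          -- seconds during which the database stalls the feeder
    ready            -- results "ready to send" in the database
    """
    queue = slots  # the feeder starts with a full array
    served = refused = 0
    for t in range(duration):
        # The feeder tops the array up at each interval unless it is blocked.
        if t % refill_interval == 0 and t not in blocked:
            refill = min(slots - queue, ready)
            queue += refill
            ready -= refill
        # The scheduler can only hand out what is in the array right now,
        # no matter how many results are "ready" in the database.
        take = min(queue, requests_per_sec)
        queue -= take
        served += take
        refused += requests_per_sec - take  # "Project has no jobs available"
    return served, refused


if __name__ == "__main__":
    print(simulate())                                   # feeder never blocked
    print(simulate(blocked=range(10, 40)))              # 30 s database stall
    print(simulate(slots=1000, blocked=range(10, 40)))  # same stall, 1000 slots
```

With these assumed numbers, an unblocked feeder serves every request, while a 30-second stall turns away roughly half of them even though a million results are "ready". The third call shows the same stall with a 1000-slot array, which shortens the dry spell but does not remove it.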
Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 908311 - Posted: 17 Jun 2009, 3:30:04 UTC - in response to Message 908291.  
Last modified: 17 Jun 2009, 3:36:55 UTC

Thanks Joe........this was my feeling also but I lack the scientific data and background to back it up. I was basing my opinion on the fact that all database tasks are running slow: Validators, Assimilators, etc. And also the fact that Berkeley used to easily handle traffic that is now seemingly choking it.

And what's new and what's been getting a lot of effort and requires massive database access??

NTPCKR
Boinc....Boinc....Boinc....Boinc....
ID: 908311
ST
Joined: 28 Nov 06
Posts: 1
Credit: 203,721
RAC: 0
Malaysia
Message 908338 - Posted: 17 Jun 2009, 8:06:04 UTC

I stopped SETI@HOME last year because of the constant "NO WORK", and last week received a request to join back, which I did. But it is still the same old problem of "NO WORK". If it can't be resolved, then I will stop again.
ID: 908338
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 908343 - Posted: 17 Jun 2009, 8:20:27 UTC - in response to Message 908311.  

> Thanks Joe........this was my feeling also but I lack the scientific data and background to back it up. I was basing my opinion on the fact that all database tasks are running slow: Validators, Assimilators, etc. And also the fact that Berkeley used to easily handle traffic that is now seemingly choking it.
>
> And what's new and what's been getting a lot of effort and requires massive database access??

I'm getting work, but it looks like we could have a string of shorties -- which is something else that's changed.

Are we also seeing some of the mount issues that Matt has talked about, or is someone else "scraping" for stats, or what else could possibly be an issue?

Don't know.
ID: 908343
Virtual Boss*
Volunteer tester
Joined: 4 May 08
Posts: 417
Credit: 6,440,287
RAC: 0
Australia
Message 908344 - Posted: 17 Jun 2009, 8:22:48 UTC - in response to Message 908338.  

> I stopped SETI@HOME last year because of the constant "NO WORK", and last week received a request to join back, which I did. But it is still the same old problem of "NO WORK". If it can't be resolved, then I will stop again.


Unfortunately it looks like your timing was not very good.

I have been crunching for Seti for 13 months and this is the only time that any of my hosts has run out of work due to project problems, but only 1 host out of 8 so far.
ID: 908344
DJStarfox
Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 908558 - Posted: 17 Jun 2009, 23:10:12 UTC - in response to Message 908250.  

>> I don't get it. A lot of crunchers are complaining that they can't get any work. Yet (at this time) there are 67,570 Seti MB work units waiting to be sent out from the Berkeley servers. Server bandwidth is at less than half of the 100 MB it is capable of.
>>
>> And it's not because of the normal Tuesday outage or the recovery period that follows. This has been going on for several weeks now. So.........
>>
>> If the work is plentiful and the demand is high, why aren't the work units being sent out?
>
> Because there can be work that is "ready" but it isn't available to the scheduler until the feeder puts it into the (100 work unit) feeder queue.
>
> I forget the exact speed of the feeder, but obviously BOINC clients are requesting work faster than the feeder is supplying it to the scheduler.


Ned,
I agree. The 100 result limit on the feeder is holding back throughput for getting those work units out in the field. It really needs to be 1000 for SETI.
ID: 908558
Westsail and *Pyxey*
Volunteer tester
Joined: 26 Jul 99
Posts: 338
Credit: 20,544,999
RAC: 0
United States
Message 908597 - Posted: 18 Jun 2009, 1:02:01 UTC - in response to Message 908558.  


> Ned,
> I agree. The 100 result limit on the feeder is holding back throughput for getting those work units out in the field. It really needs to be 1000 for SETI.

Is there a specific reason for the 100 number? Is it in some way hardware limited etc., or was it just chosen as a good number because significantly less throughput was required in the past?
Maybe it is a server setting that could be changed easily? Or how about more than one instance running?

No doubt the recent dramatic increase in throughput potential of an individual host has added a lot more work for the server in a very short time frame.
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov
ID: 908597
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 908615 - Posted: 18 Jun 2009, 2:39:07 UTC - in response to Message 908597.  


>> Ned,
>> I agree. The 100 result limit on the feeder is holding back throughput for getting those work units out in the field. It really needs to be 1000 for SETI.
>
> Is there a specific reason for the 100 number? Is it in some way hardware limited etc., or was it just chosen as a good number because significantly less throughput was required in the past?
> Maybe it is a server setting that could be changed easily? Or how about more than one instance running?

That 100 is the default setting in BOINC, and there is a warning about increasing it in sched_shmem.h:

// Default number of work items in shared mem.
// You can configure this in config.xml (<shmem_work_items>)
// If you increase this above 100,
// you may exceed the max shared-memory segment size
// on some operating systems.
//
#define MAX_WU_RESULTS      100

As noted there, it can be changed fairly easily. Whether it would help much here I don't know. My impression is the Feeder has been effectively blocked for minutes at a time recently so feeding 1000 at a time would be only a minor help.

For some period last year they were running two Feeders and Schedulers, one pair handling odd numbered tasks and the other even numbered. That had issues too, and if other activity on the database is the cause of the feeding delays, IMO the extra instance would just add to the problem.
                                                                Joe
ID: 908615
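As a small illustration of the config.xml override mentioned in that header comment, the Python sketch below reads `<shmem_work_items>` from a config string and falls back to the compiled-in default of 100. The tag name and the default come from the quoted sched_shmem.h comment. The surrounding `<boinc><config>` layout, the helper name `shmem_work_items`, and the value 1000 are assumptions for illustration, and this is not how the feeder itself parses its configuration.

```python
import xml.etree.ElementTree as ET

# Default from the quoted header: #define MAX_WU_RESULTS 100
DEFAULT_SHMEM_WORK_ITEMS = 100

# ASSUMED layout of a project config.xml; only the <shmem_work_items>
# tag name is confirmed by the comment quoted in the thread.
EXAMPLE_CONFIG = """
<boinc>
  <config>
    <shmem_work_items>1000</shmem_work_items>
  </config>
</boinc>
"""


def shmem_work_items(config_xml: str) -> int:
    """Return the configured feeder array size, or the default if unset."""
    root = ET.fromstring(config_xml)
    node = root.find(".//shmem_work_items")
    if node is None or not (node.text or "").strip():
        return DEFAULT_SHMEM_WORK_ITEMS
    return int(node.text)


if __name__ == "__main__":
    print(shmem_work_items(EXAMPLE_CONFIG))
    print(shmem_work_items("<boinc><config/></boinc>"))
```

Whether raising the value actually helps is the open question in this thread; the sketch only shows where the knob lives.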
bloodrain
Volunteer tester
Joined: 8 Dec 08
Posts: 231
Credit: 28,112,547
RAC: 1
Antarctica
Message 908618 - Posted: 18 Jun 2009, 2:44:31 UTC - in response to Message 908615.  

True. I finally got some work but have not been able to upload it since early today.
ID: 908618
DJStarfox
Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 908628 - Posted: 18 Jun 2009, 3:45:29 UTC - in response to Message 908615.  


> // Default number of work items in shared mem.
> // You can configure this in config.xml (<shmem_work_items>)
> // If you increase this above 100,
> // you may exceed the max shared-memory segment size
> // on some operating systems.
> //
> #define MAX_WU_RESULTS      100
>
> As noted there, it can be changed fairly easily. Whether it would help much here I don't know. My impression is the Feeder has been effectively blocked for minutes at a time recently so feeding 1000 at a time would be only a minor help.
>
> For some period last year they were running two Feeders and Schedulers, one pair handling odd numbered tasks and the other even numbered. That had issues too, and if other activity on the database is the cause of the feeding delays, IMO the extra instance would just add to the problem.


Hmm... if the feeder is blocked for such a long time, then it would make even more sense for it to have a larger buffer. And there is a way to increase the shared memory size on Linux. Even when the problem of DB is solved, I don't see any long term harm in having a big feeder buffer.

I'd love to know: what is the average number of scheduler requests per minute?

Ahem...
# Set shared memory limits by including these lines in /etc/sysctl.conf
# Default shmmax is 32M on most 2.6.2x kernels
# shmmax is in bytes; shmall is in pages (256M here, assuming 4096-byte pages)
kernel.shmmax=268435456
kernel.shmall=65536

From what I know of computer architecture, increasing this value will cause more cache misses on the CPU(s).
ID: 908628
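Both sides of the buffer-size argument can be put into rough numbers. The Python sketch below estimates how long a full feeder array lasts while the feeder is blocked, and whether an array of a given size fits under a shared-memory segment limit. The 32M and 256M limits are the figures quoted above. The request rate of 40 per second and the 16 KiB per slot are hypothetical placeholders, since neither the real scheduler load nor the real size of a work item in shared memory is given in this thread.

```python
SHMMAX_32M = 32 * 1024 * 1024   # default quoted above for 2.6.2x kernels
SHMMAX_256M = 268435456         # value proposed above for kernel.shmmax

ASSUMED_REQUESTS_PER_SEC = 40   # hypothetical demand, not a measured figure
ASSUMED_BYTES_PER_SLOT = 16 * 1024  # hypothetical size of one work item


def seconds_covered(slots, requests_per_sec):
    """Seconds a full array can serve requests while the feeder is blocked."""
    return slots / requests_per_sec


def segment_fits(slots, bytes_per_slot, shmmax):
    """True if an array of `slots` items fits in one segment of `shmmax` bytes."""
    return slots * bytes_per_slot <= shmmax


if __name__ == "__main__":
    for slots in (100, 1000, 10000):
        print(
            slots,
            seconds_covered(slots, ASSUMED_REQUESTS_PER_SEC),
            segment_fits(slots, ASSUMED_BYTES_PER_SLOT, SHMMAX_32M),
            segment_fits(slots, ASSUMED_BYTES_PER_SLOT, SHMMAX_256M),
        )
```

Under these assumptions a 1000-slot array buys seconds rather than minutes of cover, which is consistent with Joe's "only a minor help" if the feeder really is blocked for minutes at a time, while the memory limit would only start to matter at much larger array sizes.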
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 908632 - Posted: 18 Jun 2009, 3:55:51 UTC

Please refer to the stickied "Server Outage" thread or the panic mode thread for server problems.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 908632
darengosse
Joined: 8 Mar 06
Posts: 9
Credit: 1,045,896
RAC: 0
France
Message 908686 - Posted: 18 Jun 2009, 10:18:40 UTC - in response to Message 908632.  

Hello. My previous message was useful, because in total my 2 computers received 54 WUs, but it is impossible to send the results.
18 results have been stuck uploading since June 17 at 20:41:23 UTC.
Message shown permanently in BOINC: (Temporarily failed upload - Internet access OK - project servers may be temporarily down).
Please excuse me, but I think that SETI has (according to a French expression) "eyes bigger than its belly"!
Indeed, why have almost 1 d' million; users and to accept the new ones, if they are unable to follow the rate/rhythm .......
http://www.boincstats.com/signature/user_754953_project-1.gif

ID: 908686
darengosse
Joined: 8 Mar 06
Posts: 9
Credit: 1,045,896
RAC: 0
France
Message 908712 - Posted: 18 Jun 2009, 12:48:52 UTC - in response to Message 908686.  

> Hello. My previous message was useful, because in total my 2 computers received 54 WUs, but it is impossible to send the results.
> 18 results have been stuck uploading since June 17 at 20:41:23 UTC.
> Message shown permanently in BOINC: (Temporarily failed upload - Internet access OK - project servers may be temporarily down).
> Please excuse me, but I think that SETI has (according to a French expression) "eyes bigger than its belly"!
> Indeed, why have almost 1 d' million; users and to accept the new ones, if they are unable to follow the rate/rhythm .......
> http://www.boincstats.com/signature/user_754953_project-1.gif

Sorry, but I think there is an error in my previous message.
It should read: why have almost 1 million users, and accept new ones, if they are unable to keep up with the pace....
Jean-Paul

ID: 908712


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.