anyone getting new work?

Message boards : Number crunching : anyone getting new work?



Jack Gulley

Joined: 4 Mar 03
Posts: 423
Credit: 526,566
RAC: 0
United States
Message 257941 - Posted: 6 Mar 2006, 0:18:43 UTC
Last modified: 6 Mar 2006, 0:31:40 UTC

Glad you were able to catch some.

During the outage I backed my preferences down to 0.2 Days knowing that it would be almost impossible to catch more than two or three WU at a time anyway. And to let others grab a few. I wanted to do some testing and checking anyway as I had moved, cleaned up and reconfigured my machines during the outage. Then every four hours or so I have been bumping it up 0.1 at a time. Went dry a number of times, but now back up to my normal 0.7 and still having trouble keeping it full like everyone else.

The problem is the creation rate has been dropping for some reason, down to almost 12 per second, which is not much more than what it takes to handle the burn rate. A problem Berkeley is well aware of. This will cause the recovery to stretch out even longer than I first expected. Was hoping someone would go in and look at the splitters, as the current configuration can produce around 19 per second if the MDB is not too busy. And it is not right now, as the server has been able to reduce the large number of outstanding file deletes down to almost nothing, and all Data Base updates and requests seem to be happening faster than normal.

If you want to see the "best information WE have" on this, take a look at Scarecrow's excellent graphs and click on the Week option at the top. This will give you a historical perspective of the outage. Look at the Result Creation Rate/Second 1 Week chart, and on the left you will see the "steady state" burn replacement rate of 12 to 13 WU's per second before the unexpected outage. On the chart above, you can see that at that rate the Ready to Send queue was still slowly dropping, most likely due to people filling their queues prior to a planned outage later that day.

Going back through the charts to older time frames, you will see that this average holds up. The overall results processing rate average has been relatively flat for the past two months, not showing much increase, as the number of active users has actually dropped and people are exploring other projects. The charts do not show the actual results processing rate, as Berkeley does not think we need to know that information. It can only be inferred from the average Creation rate (times four) when the In Process number is holding steady. The charts are sort of coarse, because Berkeley does not want to use their processor time to update these status page numbers very often. If they had real programs and not "scripts" handling things, they could update those status numbers in real time, not just every few hours.
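To put a number on that inference, here is a rough sketch in Python. It is only my reading of the charts, not anything Berkeley publishes: the function name is made up for illustration, and the factor of four is the "times four" mentioned above. It only makes sense while the In Process number is holding steady, because only then are results coming back about as fast as they go out.

```python
SECONDS_PER_DAY = 86_400


def inferred_processing_rate(creation_rate_per_sec, results_per_wu=4):
    """Estimate results processed per second from the chart's creation rate.

    Assumes In Process is flat, so results are returned as fast as they are
    created. results_per_wu=4 is the "times four" factor from the post.
    """
    if creation_rate_per_sec < 0 or results_per_wu <= 0:
        raise ValueError("rates and factors must be positive")
    return creation_rate_per_sec * results_per_wu


# Steady state range read off the chart before the outage: 12 to 13 per second.
for rate in (12, 13):
    per_sec = inferred_processing_rate(rate)
    per_day = per_sec * SECONDS_PER_DAY
    print(f"creation {rate}/s -> about {per_sec} results/s, {per_day:,} per day")
```

If the chart already counts results rather than work units, pass results_per_wu=1 instead.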

Next, to see the hill that has to be climbed to fill all the queues back up, scroll down to the In Progress chart. On the left you see the steady state "results" outstanding of around 2,400,000. Then the steady drop when the MDB went off line while completed results were being returned. When the MDB was brought back online this number had dropped to around 1,000,000. Now the splitters are trying to fill both the steady state burn rate and the backlog of 1,400,000 results. While it has recovered by 3/5 it is slowing down because most crunchers are getting some work now and returning it, and the splitters have slowed down some from their peak of 16 to 17 WU per second.
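For a feel of how long that hill takes to climb, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption layered on the chart numbers above: the rates are treated as constant (in reality the burn rate climbs as queues refill), the function name is hypothetical, and since it is not clear whether the chart counts work units or results, results_per_unit is left as a parameter (1 if the chart counts results, 4 if each unit goes out four times).

```python
SECONDS_PER_DAY = 86_400


def days_to_refill(backlog, creation_rate, burn_rate, results_per_unit=1):
    """Days needed to rebuild `backlog` results at a constant surplus rate.

    Rates are per second. Returns infinity when creation does not exceed
    the burn rate, because the backlog then never shrinks.
    """
    surplus = (creation_rate - burn_rate) * results_per_unit
    if surplus <= 0:
        return float("inf")
    return backlog / surplus / SECONDS_PER_DAY


BACKLOG = 1_400_000  # results missing from In Progress when the MDB came back
BURN = 12.5          # midpoint of the 12 to 13 per second steady state

# 12 = current rate, 16.5 = recent peak, 19 = what the current configuration
# can reach, 38 = that rate doubled (a what-if, not a measured figure).
for creation in (12, 16.5, 19, 38):
    days = days_to_refill(BACKLOG, creation, BURN)
    print(f"creation {creation:>4}/s -> {days:.1f} days to refill")
```

The useful part is the comparison between rows: at a creation rate near the burn rate the backlog effectively never clears, which is why the drop toward 12 per second matters so much.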

From tracking these charts over time, I know that when the scheduler is off line, the current splitter configuration can do about 19, maybe 20 per second. And that they can double that rate with hand feeding care from another old machine. Something that is not going to happen over a weekend or when they are real busy with other major problems. Maybe Monday when they are there or during the Data Base compress and backup I expect they will want/need to do.

But it now looks like we are going to have to live with this No Work from Project problem through most of Monday.

To better compare what is going on with the charts, use Scarecrow's other chart page and first select the 2-Graph Compare option. Now you can select any two charts and time frames and see them both on the same page.

I would really hate to be a Dial-up user and not have a second phone line with BOINC. And have more than one system to feed at times like this. A single line was manageable with Seti@home Classic because SetiQue would get in, get work and cache it for all local systems. The whole BOINC system is designed around the assumption that it will have slow servers and programs, on a slow Internet connection, and that those resources must be protected and assumed to be overloaded at all times, even when it is not necessary. If they gave special preference to Dial-up users for getting work, then most people would just set that option so they get priority also. As for not supporting Windows 98/ME very well, it just shows the mindset of the programmers and their low opinion of older systems (and people who use them), and their unwillingness to bother to test and fix their problems with the older operating systems. At least the next version will allow more of the older systems to be used. You can bet they are already looking at any problems with Vista.

So we either wait for the queues to trickle fill or we move on to something else. Many Setiathome crunchers have for now. Not true that there is a big steady increase in Active users. Sure the Total number of participants has been rising, that will always be the case. But the number of Active users has actually been dropping for the past month. You can see that in the charts on this tracking web page. And the same for the Active Host computers being used. The overall Recent Average Credit has still been going up (newer faster systems) except for a dip caused by this long outage. But I expect that is more due to increased usage of the optimized Crunch3r application and Trux's correcting BOINC Manager, than to any real increase in active users or hosts.

ID: 257941
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 257994 - Posted: 6 Mar 2006, 2:03:50 UTC

Thanks for pointing out the Team OcUK site! That was the one I couldn't think of when I read your original post. I have been monitoring Scarecrow's data graphs, and the new features he brought online recently are really handy for this kind of work. ;-)

Have to admit though, the Team OcUK data kind of explodes some of the theories which have been put forth regarding performance issues and their causes. I agree with your assessment on overall throughput in light of this data. IOW, the "hardcore" segment would appear to be bringing more horsepower and/or increasing the efficiency of their current gear, rather than any new penetration of significant untapped potential. I know I have! :-)

Alinator

ID: 257994
Kicksvette

Joined: 16 Nov 01
Posts: 2
Credit: 9,657,808
RAC: 29
United States
Message 258001 - Posted: 6 Mar 2006, 2:12:10 UTC

Unfortunately, I'm stuck with dial-up for a bit longer until the cable hookup becomes available on my block. But that means I'm a total crapshoot at getting any workunits right now. Actually I haven't gotten any since the outage. I normally keep a 3 day supply just to get through days when the system is down or days when I'm away. My new dog usually burns about 20 units a day and has now been dry for 3 days. Is it frustrating? DUH!!! Will I keep trying? Not so sure anymore. The outages are becoming more common than way back when I started with Classic on my old dog that could handle about 2 a day. If I stumble into a more reliable project, I'm probably done here.
ID: 258001
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 258010 - Posted: 6 Mar 2006, 2:30:42 UTC

Well you may want to take a look at Einstein, which is similar to SAH in that they're looking for cool signals from deep space as well.

EAH is as good as any project when it comes to uptime reliability. The only catch for a DU user is there is a core data file which needs to be DL'ed periodically which is kind of large, so there is a longish connect period required when that happens.

Alinator
ID: 258010
Aurora Borealis
Volunteer tester

Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 258011 - Posted: 6 Mar 2006, 2:49:49 UTC - in response to Message 258001.  
Last modified: 6 Mar 2006, 2:50:48 UTC

Unfortunately, I'm stuck with dial-up for a bit longer until the cable hookup becomes available on my block. But that means I'm a total crapshoot at getting any workunits right now. Actually I haven't gotten any since the outage. I normally keep a 3 day supply just to get through days when the system is down or days when I'm away. My new dog usually burns about 20 units a day and has now been dry for 3 days. Is it frustrating? DUH!!! Will I keep trying? Not so sure anymore. The outages are becoming more common than way back when I started with Classic on my old dog that could handle about 2 a day. If I stumble into a more reliable project, I'm probably done here.

Personally I don't see the problem. Boinc Seti is not less reliable than Classic was. Classic also had long periods of down time. It's just more visible now because of faster computers and the instant feedback provided by the Boinc Manager. All projects have down times, it is the nature of surviving on a shoestring budget.
As many have said before, Boinc was specifically designed to allow attaching to multiple projects. It even allows for heavily weighted sharing so that backup projects rarely need to be crunched except at times like this. My system never runs out of work or stops crunching for a worthwhile project. And at the rate new projects come on line, there will be a multitude of choices available.

Boinc V7.2.42
Win7 i5 3.33G 4GB, GTX470
ID: 258011
Jack Gulley

Joined: 4 Mar 03
Posts: 423
Credit: 526,566
RAC: 0
United States
Message 258014 - Posted: 6 Mar 2006, 3:14:44 UTC

The thing you have to watch out for on the Team OcUK site is while the "dates" shown may be current, the actual data may be several days old. And on some of the charts the numbers on the sides are presented wrong. But if you know and adjust for that, the graphs are accurate, except for the most recent few days.

While off the charts now, they once showed that only about 80,000 or fewer Active new users were picked up from the closing Seti@home Classic side. And you have to adjust that with the knowledge that some unknown number of Seti@home classic crunchers had long before activated their BOINC/Seti account, and like me, had one old slow machine crunching a few for BOINC. Just my guess, but I would estimate that maybe as many as 20,000 of the 120,000 ACTIVE BOINC participants before the Classic shutdown started were also still Active Classic crunchers. Some people are just slow to complete the switch when they have several scattered systems.

Officially there were still over 200,000 Active Seti@home crunchers before November (despite numbers posted by Matt). But the information I was trying to track on other sites suggests that maybe as many as 320,000 people (or at least systems) were still running Seti@home Classic, but not all were doing enough work to show up as Active.

When I compared the volume of WU's being sent out and returned by Classic users (on the Cogent link during BOINC outages) only around 1/3 of that volume stayed on the link, meaning it moved to BOINC/Seti. That supports the claims that at least half of the last full time Classic crunchers, or at least half of their systems, were not moved to BOINC/Seti. Just from my own limited personal contacts, about half of them dropped Seti@home and did not give BOINC more than a two day try, not wanting to start over learning something new. Of the others, while they now have BOINC running on a few machines, the number of machines is a lot smaller. The biggest objections are security concerns, no SetiQue type of local cache for their networked machines and that BOINC window template that comes up on the screen when booting the system.

So while there has been an increase in WU/results demands, it has been nowhere near what was predicted, nor has a lot of additional new hardware been necessary. Most of the changes have been to fix existing problems that have been showing up, with the exception of their current WU splitting limitations. I feel that could be solved by an off line splitter and a little code to handle this "pre split format" data coming into the system data base from another old system and its disk drive instead of directly from the tapes or images of the tapes. Such a machine could pre split a number of old tapes and have them sitting ready to go into the data base when needed.
ID: 258014



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.