SETI loves me.

Profile soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1012920 - Posted: 6 Jul 2010, 20:28:59 UTC - in response to Message 1012864.  

CPU tasks: 158 (158 VLAR, 0 VHAR)
GPU tasks: 141 (0 VLAR, 55 VHAR)

I for sure haven't enough GPUs for 3 days. I have a 4 days cache, tried raising it to 5,6,10 days, but Boinc simply wouldn't get more WUs. Can someone explain that??


It is obvious other blocks have been put in place.

Fully loaded on a 3-day cache, I had enough work for approximately 2 days (more if you count other projects and the CPU in the math; the CPU does far less than a low-end GPU).

I raised it to 6 days. The amount of work I had on hand DID NOT CHANGE, and it still refused to request more, even after a reboot.

So basically, we have badly written limits (combining CPU and GPU). We suffered a workaround all weekend in the form of an ADDITIONAL badly written limit (20 units only), compounded by a replica database crash, and now we are back to the original badly written limit.

The chance of finding extraterrestrial intelligence seems dismal in light of our efforts to find terrestrial intelligence.
Janice
ID: 1012920 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1012951 - Posted: 6 Jul 2010, 21:30:18 UTC - in response to Message 1012920.  

Soft^spirit,

The separate DCFs were supposed to set the right timing on each different type of WU. By the time they found out it wasn't working they didn't have time to get it fixed so they went back to the old way with just one DCF for all. This also isn't working but.... Hopefully something can be worked out to get the separate DCFs to work right and get the times down to what they should be. Once that is done, we should all get the correct amount of work and things will go much smoother.
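For anyone following along, here is a minimal sketch of why one shared DCF can throw off the estimates. It assumes the client scales each task's raw runtime estimate by the duration correction factor, and that a single shared factor ends up tuned to one task type. The task names, runtimes, and helper functions are made up for illustration; this is not the actual BOINC client code.

```python
# Hypothetical raw estimates vs. real runtimes, in seconds.
# All numbers are illustrative assumptions, not measurements.
TASKS = {
    "cpu_task": {"raw_estimate": 20_000.0, "actual": 10_000.0},
    "gpu_task": {"raw_estimate": 20_000.0, "actual": 500.0},
}

def ideal_dcf(task):
    """The correction factor that would make this task type's estimate exact."""
    return task["actual"] / task["raw_estimate"]

def estimated_runtime(task, dcf):
    """Estimate shown to the user: the raw estimate scaled by the DCF."""
    return task["raw_estimate"] * dcf

# Separate DCFs: each task type gets its own factor, so estimates match reality.
separate = {name: estimated_runtime(t, ideal_dcf(t)) for name, t in TASKS.items()}

# One shared DCF tuned to CPU tasks: GPU estimates come out far too long,
# so the client believes the cache holds more work than it really does.
shared_dcf = ideal_dcf(TASKS["cpu_task"])
shared = {name: estimated_runtime(t, shared_dcf) for name, t in TASKS.items()}

overestimate = shared["gpu_task"] / TASKS["gpu_task"]["actual"]
print(separate)
print(shared)
print(f"GPU estimate is {overestimate:.0f}x too long with the shared DCF")
```

With numbers like these, a cache that looks full on paper would drain much sooner than expected on the GPU side, which would match the "raised my cache but got no more work" reports in this thread.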


PROUD MEMBER OF Team Starfire World BOINC
ID: 1012951 · Report as offensive
Profile soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1012964 - Posted: 6 Jul 2010, 22:31:35 UTC - in response to Message 1012951.  

Well, perry, I will believe it when I see it. But before Jeff's thread got hijacked, he had at least read it and seemed to understand. I am hoping he understands it was a problem in the previous limits as well.

So I am not without hope. Still a bit dismayed at this raw test bed we seem to be used for...

And of course being a former technician.... I am a natural enemy of anyone from engineering. ;)
Janice
ID: 1012964 · Report as offensive
Profile SciManStev Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Jun 99
Posts: 6652
Credit: 121,090,076
RAC: 0
United States
Message 1012965 - Posted: 6 Jul 2010, 22:35:36 UTC - in response to Message 1012964.  

Well, perry, I will believe it when I see it. But before Jeff's thread got hijacked, he had at least read it and seemed to understand. I am hoping he understands it was a problem in the previous limits as well.

So I am not without hope. Still a bit dismayed at this raw test bed we seem to be used for...

And of course being a former technician.... I am a natural enemy of anyone from engineering. ;)


Uh Oh! I'm an engineer! Back when I was a technician, I could really relate to that. I found things that were next to impossible to repair or even wire up. Now as a software engineer I use what I learn, and listen to those I am programming for. When I get requests I take them very seriously, and do my best to implement them as long as they do not compromise the end goal. :)

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1012965 · Report as offensive
Profile soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1012983 - Posted: 6 Jul 2010, 23:15:45 UTC - in response to Message 1012965.  

I have seen hundreds of thousands of dollars in engineered hardware.. that could have been done for several thousand... if they had talked to a tech first.
And it would have worked better.

And massive upgrades that would have paid off 10 times.. if anyone had listened to the "cheap seats"... But they saved $10 on something like a cable. So instead it took 6 months longer to get going... and never got fully utilized.

Could I have designed those systems? Nope. But I could walk up to them and in 5 minutes ask a few questions that could have saved thousands... if they had not already ordered everything.

Which always made the engineers very angry.

Janice
ID: 1012983 · Report as offensive
Profile SciManStev Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Jun 99
Posts: 6652
Credit: 121,090,076
RAC: 0
United States
Message 1012984 - Posted: 6 Jul 2010, 23:23:34 UTC - in response to Message 1012983.  
Last modified: 6 Jul 2010, 23:24:30 UTC

I totally understand what you are saying. I am trying to get approval to automate a temperature chamber. It will cost about $100,000. I have already developed the root structure in a massive database, and automated test stands, and we would be able to test 12 units at one time, rather than one at a time. I have been trying to do this for almost 8 years, and I'm finally getting closer. My system has already had a major impact on customer satisfaction and unifying test methods. I have only been an engineer for a year, and a technician for 30 years at least. My company makes force feedback servo inclinometers and accelerometers, as well as other products.

Edit: I have always worked closely with our engineers, as close communication is how things get done.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1012984 · Report as offensive
Profile hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 1012986 - Posted: 6 Jul 2010, 23:27:34 UTC - in response to Message 1012864.  

CPU tasks: 158 (158 VLAR, 0 VHAR)
GPU tasks: 141 (0 VLAR, 55 VHAR)

I for sure haven't enough GPUs for 3 days. I have a 4 days cache, tried raising it to 5,6,10 days, but Boinc simply wouldn't get more WUs. Can someone explain that??

Your DCF probably screwed you. I had to set mine to 0.01 a few times this morning, but I got lots of work.
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 1012986 · Report as offensive
Profile soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1012990 - Posted: 6 Jul 2010, 23:37:55 UTC - in response to Message 1012984.  

I totally understand what you are saying. I am trying to get approval to automate a temperature chamber. It will cost about $100,000. I have already developed the root structure in a massive database, and automated test stands, and we would be able to test 12 units at one time, rather than one at a time. I have been trying to do this for almost 8 years, and I'm finally getting closer. My system has already had a major impact on customer satisfaction and unifying test methods. I have only been an engineer for a year, and a technician for 30 years at least. My company makes force feedback servo inclinometers and accelerometers, as well as other products.

Edit: I have always worked closely with our engineers, as close communication is how things get done.

Steve


Unfortunately the corporation decided meetings went much more smoothly without technician input. And eventually dropped having us sign off jobs as completed as well. Much easier to have someone that had no clue do it.
Janice
ID: 1012990 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1012992 - Posted: 6 Jul 2010, 23:40:46 UTC - in response to Message 1012986.  

CPU tasks: 158 (158 VLAR, 0 VHAR)
GPU tasks: 141 (0 VLAR, 55 VHAR)

I for sure haven't enough GPUs for 3 days. I have a 4 days cache, tried raising it to 5,6,10 days, but Boinc simply wouldn't get more WUs. Can someone explain that??

Your DCF probably screwed you. I had to set mine to 0.01 a few times this morning, but I got lots of work.

Mine is 0.018013 and the completion times are correct on GPU/CPU/AP. I did try lowering the DCF, but it didn't help.
ID: 1012992 · Report as offensive
Profile SciManStev Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Jun 99
Posts: 6652
Credit: 121,090,076
RAC: 0
United States
Message 1012993 - Posted: 6 Jul 2010, 23:40:54 UTC - in response to Message 1012990.  


Unfortunately the corporation decided meetings went much more smoothly without technician input. And eventually dropped having us sign off jobs as completed as well. Much easier to have someone that had no clue do it.


I've seen that before too. There is the right way, the wrong way, and the boss's way. You kind of have to go along with the one who signs the paychecks. :)

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1012993 · Report as offensive
Profile soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1013002 - Posted: 6 Jul 2010, 23:54:18 UTC - in response to Message 1012993.  

I fixed that.

Maybe not a perfect fix. But no longer an issue.
Janice
ID: 1013002 · Report as offensive
Profile SciManStev Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Jun 99
Posts: 6652
Credit: 121,090,076
RAC: 0
United States
Message 1013006 - Posted: 6 Jul 2010, 23:59:17 UTC - in response to Message 1013002.  

I fixed that.

Maybe not a perfect fix. But no longer an issue.


I can't wait until I'm 396 and I can finally stop working. I'm counting the seconds. :)

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1013006 · Report as offensive
Ellis Hardin

Joined: 15 Mar 01
Posts: 33
Credit: 26,603,764
RAC: 0
United States
Message 1013007 - Posted: 7 Jul 2010, 0:16:57 UTC

I noticed a glitch a few weeks ago. Something messed up and all the estimated completion times were way off for some reason. Some work units that had been estimated to complete in an hour, were now estimated to complete in a minute or two, etc. One of the results of the glitch was that my client then downloaded about 30 days of work (I had it set to 10 days at the time).
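A rough sketch of how estimates that are too short turn into an oversized cache, assuming the client simply keeps requesting tasks until the sum of their estimated runtimes covers the cache setting. The function names, the one-hour runtime, and the 20-minute estimate are hypothetical, chosen so that a 10-day setting ends up holding about 30 days of real work; the real client logic is more involved.

```python
SECONDS_PER_DAY = 86_400

def tasks_requested(cache_days, estimated_runtime_s):
    """Number of tasks needed to fill the cache, judged by estimates alone."""
    target = cache_days * SECONDS_PER_DAY
    # Ceiling division: keep asking until the estimated total reaches the target.
    return -(-target // estimated_runtime_s)

def real_days_of_work(task_count, actual_runtime_s):
    """How long those tasks really take on a single processor."""
    return task_count * actual_runtime_s / SECONDS_PER_DAY

actual = 3_600        # assumed true runtime: one hour per task
bad_estimate = 1_200  # assumed estimate after the glitch: 20 minutes

good = tasks_requested(10, actual)        # estimates match reality
bad = tasks_requested(10, bad_estimate)   # estimates are 3x too short

print(good, real_days_of_work(good, actual))
print(bad, real_days_of_work(bad, actual))
```

In this toy model the cache setting never changed; only the estimates did, and the amount of real work on hand scales by the same factor the estimates are off by.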

They also need to get some real software engineers to look at the BOINC code. Is a database-driven system the best system to control a project like this? Or is it just all they know?

ID: 1013007 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 1013015 - Posted: 7 Jul 2010, 0:32:59 UTC - in response to Message 1012824.  

Bill Walker wrote:

From Matt's post over in Technical News:
Data wise, we were able to get back to merging our various spike tables together full bore - doing so while the project was up was causing all kinds of headaches. We'll have to turn the merge off over the weekend, of course. I also was able to do a whole bunch of data integrity testing - it's nice to be able to pull 1 Gbyte of signals out of the science database without the query getting blocked, or worrying about blocking other queries.

I guess everything is tied together, to the point where you can analyze the data, or create the data, but not both at the same time.

Splitting new S@H Enhanced work or assimilating results do involve the S@H science database, so are effectively ruled out during these outages. Because downloads and uploads don't involve either science or BOINC database access, those could remain enabled with the least impact, although at least 3 servers are involved; bane, vader, and bruno. Having forums alive of course needs the BOINC database servers and "data-driven web pages" at a minimum. The more that is running, the more likely something will go wrong and interrupt planned work during the outage, so although they could theoretically just disable all splitters, validators, and assimilators and let the remaining "Results ready to be sent" be issued, I don't expect that.
                                                              Joe

While it is true that sending work and reporting work do not involve the science database, sending new work relies on the splitters, which do require the science database.


BOINC WIKI
ID: 1013015 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1013016 - Posted: 7 Jul 2010, 0:33:50 UTC - in response to Message 1013007.  

Anyone that wants can look at the BOINC code. Everything is open source.


PROUD MEMBER OF Team Starfire World BOINC
ID: 1013016 · Report as offensive
Profile soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1013018 - Posted: 7 Jul 2010, 0:36:53 UTC - in response to Message 1013006.  

I fixed that.

Maybe not a perfect fix. But no longer an issue.


I can't wait until I'm 396 and I can finally stop working. I'm counting the seconds. :)

Steve


I had enough.. there was of course a last straw.. But I finally decided to just lower my expectations.
Janice
ID: 1013018 · Report as offensive
Profile SciManStev Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Jun 99
Posts: 6652
Credit: 121,090,076
RAC: 0
United States
Message 1013021 - Posted: 7 Jul 2010, 0:41:32 UTC - in response to Message 1013007.  

I noticed a glitch a few weeks ago. Something messed up and all the estimated completion times were way off for some reason. Some work units that had been estimated to complete in an hour, were now estimated to complete in a minute or two, etc. One of the results of the glitch was that my client then downloaded about 30 days of work (I had it set to 10 days at the time).

They also need to get some real software engineers to look at the BOINC code. Is a database-driven system the best system to control a project like this? Or is it just all they know?


SETI is a work in progress. Both on the SETI side and the user side, there are some very sharp people working on this in order to solve many different issues. I am still learning several computer languages, but I can say for certain, based on their posts, that the developers on both sides are definitely the ones you want in those positions. Even on the user side, look how calm and precise the developers are. They know what they are talking about, and don't lose their cool. BOINC is a collection of systems. You have the BOINC infrastructure, with the individual project apps that process some sort of raw data depending on which project you crunch for. And you have us, the crunchers, working without some of the knowledge the developers have. Each of those different projects has its own specialists who know the ins and outs of everything. I think that things will work out in the long run, and the issues we face today will be history.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1013021 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1013044 - Posted: 7 Jul 2010, 1:38:22 UTC - in response to Message 1013015.  

Bill Walker wrote:

From Matt's post over in Technical News:
Data wise, we were able to get back to merging our various spike tables together full bore - doing so while the project was up was causing all kinds of headaches. We'll have to turn the merge off over the weekend, of course. I also was able to do a whole bunch of data integrity testing - it's nice to be able to pull 1 Gbyte of signals out of the science database without the query getting blocked, or worrying about blocking other queries.

I guess everything is tied together, to the point where you can analyze the data, or create the data, but not both at the same time.

Splitting new S@H Enhanced work or assimilating results do involve the S@H science database, so are effectively ruled out during these outages. Because downloads and uploads don't involve either science or BOINC database access, those could remain enabled with the least impact, although at least 3 servers are involved; bane, vader, and bruno. Having forums alive of course needs the BOINC database servers and "data-driven web pages" at a minimum. The more that is running, the more likely something will go wrong and interrupt planned work during the outage, so although they could theoretically just disable all splitters, validators, and assimilators and let the remaining "Results ready to be sent" be issued, I don't expect that.
                                                              Joe

While it is true that sending work and reporting work do not involve the science database, sending new work relies on the splitters, which do require the science database.

Very true, as I said. But it is creating new work which hits the science database; after that there's often a significant delay in the "Results ready to send" queue before sending. The splitters can be disabled or "not running" and still many tasks can be sent. I won't claim that's practical; there's ample evidence they don't want to stop a splitter in the middle of a channel, so it would be a fuzzy shutdown as channels finish.
                                                               Joe
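To make the point about the "Results ready to send" queue concrete, here is a toy model, assuming the splitter and the scheduler only interact through a queue. The class and method names are invented for this sketch and do not correspond to actual BOINC server components.

```python
from collections import deque

class ReadyToSendQueue:
    """Toy stand-in for the 'Results ready to send' queue."""

    def __init__(self):
        self._tasks = deque()
        self._next_id = 0

    def split(self, count, science_db_online):
        """Splitter: creating new work needs the science database."""
        if not science_db_online:
            return 0
        for _ in range(count):
            self._tasks.append(f"task-{self._next_id}")
            self._next_id += 1
        return count

    def send(self, count):
        """Scheduler: handing out queued work does not touch the science database."""
        sent = []
        while self._tasks and len(sent) < count:
            sent.append(self._tasks.popleft())
        return sent

queue = ReadyToSendQueue()
queue.split(5, science_db_online=True)              # before the outage
created = queue.split(5, science_db_online=False)   # during the outage: nothing new
first = queue.send(3)                               # still served from the queue
second = queue.send(3)                              # only 2 left, then it runs dry
print(created, first, second)
```

The sketch leaves out the part about not stopping a splitter mid-channel; it only shows that sending can continue for a while after splitting stops.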
ID: 1013044 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1013060 - Posted: 7 Jul 2010, 2:05:03 UTC - in response to Message 1013007.  

I noticed a glitch a few weeks ago. Something messed up and all the estimated completion times were way off for some reason. Some work units that had been estimated to complete in an hour, were now estimated to complete in a minute or two, etc. One of the results of the glitch was that my client then downloaded about 30 days of work (I had it set to 10 days at the time).

That glitch was a transition to a new system of estimating run times (and granting credit). It has temporarily been turned off for those of us using anonymous platform; you can expect a similar glitch whenever it is tried again. But it ought to be just that, a temporary glitch which declines fairly quickly. There could be additional bugs, and because BOINC must be prepared to handle projects with many different characteristics, it seldom achieves performance ideal for any one project.

They also need to get some real software engineers to look at the BOINC code. Is a database-driven system the best system to control a project like this? Or is it just all they know?

You might want to look at the Wikipedia entry for Dr. Anderson and follow some of the external links. One paper he, Eric Korpela, and Rom Walton authored specifically discusses some of the design decisions.
                                                                 Joe
ID: 1013060 · Report as offensive
Profile Odan

Joined: 8 May 03
Posts: 91
Credit: 15,331,177
RAC: 0
United Kingdom
Message 1013148 - Posted: 7 Jul 2010, 8:52:19 UTC - in response to Message 1012657.  

Wasn't that the idea?

Not with a 10 days cache. Or haven't you read Jeff's post either? Seems like a lot of people haven't done so. Or all must've thought it wasn't meant for their eyes.

While in the same thread I pointed to, people are even saying they're increasing from whatever cache they had to a 10 day cache in anticipation of the limit to be lifted. Uhuh, good way to get a limit increase next week, instead of a limit lift.

Before you think I say this because I didn't get any, I did... Took me 5 hours to get 31 tasks in on the i3. Blistering high download speeds of 0.24KB and 0.03KB/sec on downloads. I liked those. Something new for my 25Mbit connection to deal with. :)

Those 31 probably won't get me through the outage, were it not that I also have 28 Einstein, 2 CPDN and at least 50 Primegrid. I'll weather it.

No, I was posting this because it's always nice to see everyone follow admin's request. But y'all seem to think they don't listen to your complaints, so why should you listen to them, right? :)


Hi Jord,

I feel your frustration. I did reduce my cache BTW, but I also fiddled with some DCFs to get what I expect to just about tide me over until the servers come back online. All in all, if I had just left my cache at 8 days I would have ended up with about the same 4-ish days of work.

I do look forward to the time when we have more accurate cache filling. I know this is being attempted & I wait patiently for the improvement & realise that I am effectively taking part in RC testing after Beta testing was unfortunately too small to be conclusive.

One thing that occurred to me while I was waiting for the 20 WU limit to be removed was that it was somewhat counter productive.

During the approx 3.5 days of restricted issue the number of MB in progress gradually fell. I believe that this is at least partly due to the draining of remaining larger caches for the fastest crunchers among us who can actually crunch more than 20 units at a time(!) as well as the slightly more usual among us who get through 20 WU quite quickly.

AP exhibited an interesting ratcheting up pattern that seems to be related to AP units being available for something like 1 hour in every 4 or so much more freely than at other times (don't shoot me here I'm really not quite sure what happened but it looks this way to me :) )

Anyway, my point is that the 20 WU limit was a very good idea for the initial restart; it allowed almost everyone to get some form of cache & to get crunching straight away without us being able to max out the downloads. Unfortunately I think it was too severe for too long if S@h still wants to be available for people to crunch continually (I'm not sure whether that is what the team wants; I haven't seen any statement either way. I know that nobody guarantees 24/7 availability; I just wonder if Berkeley wants us all to continue crunching at about the same rate or if they want a reduction).

Right, back to the point :) If you look at the Cricket graphs for the week you can see that for those 3+ days we were only downloading WUs at something like 45% of practical capacity for most of that time. Remember that during that time the reservoir of work in progress was falling steadily; individuals were seeing their previously still large caches draining steadily and panicked, increasing their cache sizes further in the hope of snagging WUs when the flood gates opened. For these people we effectively had a 6 day cache drain so that when the limit was removed for 26 hours or so there was a lot of cache to fill up.

If some of the 55% of available bandwidth could have been freed up sooner this would have reduced the cache drain and given longer for the caches to fill up. It would also have helped a bit to calm the urge in some people to increase cache sizes "just in case".

Of course, if the splitters could not have kept up it wouldn't have mattered but we did start this latest outage with a reservoir of WUs on the servers that could not be sent; I also noticed not all the splitters were running all the time so it appeared that there was some capacity not used.

Anyway, I hope that we all have a calm outage :) and that everything comes up smoothly on Friday. I hope there is a bit freer availability of WUs.

Happy crunching all!
ID: 1013148 · Report as offensive

 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.