Panic Mode On (92) Server Problems?

Profile S@NL Etienne Dokkum
Volunteer tester
Joined: 11 Jun 99
Posts: 212
Credit: 43,822,095
RAC: 0
Netherlands
Message 1606643 - Posted: 28 Nov 2014, 20:16:35 UTC

Happy birthday, Juan! Don't toast too much now ;-)
ID: 1606643 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1606650 - Posted: 28 Nov 2014, 20:48:14 UTC - in response to Message 1606554.  

You are in this case, you're running a Boinc without OpenCL detection:

Another reason not to run Einstein anymore.
OpenCL app without own OpenCL detection.

No, OpenCL support done the way it should be done - collaborating with BOINC, letting each part of the system play its proper part in the overall picture.

Taking OpenCL detection down a level to the science application layer (instead of the infrastructure layer) was a useful stopgap when the science app was ready before the infrastructure caught up, but that's history now. Time to grow up.


Wrong way of thinking.
Einstein ignores thousands of users who won't use BOINC 7.
Their loss, not mine.
Built-in OpenCL detection wouldn't hurt.

Actually, Mike, I have no problems myself with Einstein and BOINC 6, but then I'm NVIDIA-equipped only here, though I also have the same aversion to BOINC 7. ;-)

Cheers.
ID: 1606650 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1606657 - Posted: 28 Nov 2014, 21:09:21 UTC - in response to Message 1606546.  

You are in this case, you're running a Boinc without OpenCL detection:

Another reason not to run Einstein anymore.
OpenCL app without own OpenCL detection.

No, OpenCL support done the way it should be done - collaborating with BOINC, letting each part of the system play its proper part in the overall picture.

Taking OpenCL detection down a level to the science application layer (instead of the infrastructure layer) was a useful stopgap when the science app was ready before the infrastructure caught up, but that's history now. Time to grow up.

Having OpenCL detection in the science application also facilitates standalone application testing. An init_data.xml file can be used instead, but the BOINC WIKI AppDebug page doesn't even mention that. The Einstein developers of course have enough experience to do whatever is necessary to alpha test their applications, but a new project might not.

BOINC is a prerequisite for doing SETI@home work, and I accept that. It will never be ideal for any specific project simply because there are too many conflicting needs. For my dial-up situation it is getting worse and worse, of course, and sometime I'll need to start paying an additional 500 USD per year for the kind of anemic broadband now available where I live.
Joe
ID: 1606657 · Report as offensive
Profile cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1606679 - Posted: 28 Nov 2014, 22:22:06 UTC - in response to Message 1606650.  

Hi Wiggo,

What perzakkerly are the problems with BOINC 7.x.xx with E@H?

I [touch wood] haven't had any with my rig running 7.2.42 or now 7.4.27.

I run both GPU & CPU tasks ok, no errors, no invalids, so far at any rate.
Or is it an ATI problem? I only run NV GPUs.

Regards,
Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1606679 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1606682 - Posted: 28 Nov 2014, 22:31:49 UTC
Last modified: 28 Nov 2014, 22:32:50 UTC

You shouldn't have any problems, Cliff.

It's only when you have AMD/ATI GPUs combined with older v6 versions of BOINC that you'll have problems there.

So you have nothing to worry about. ;-)

I just don't like the BOINC versions above 6.10.60.

Cheers.
ID: 1606682 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1606695 - Posted: 28 Nov 2014, 22:56:59 UTC - in response to Message 1606657.  

Having OpenCL detection in the science application also facilitates standalone application testing. An init_data.xml file can be used instead, but the BOINC WIKI AppDebug page doesn't even mention that. The Einstein developers of course have enough experience to do whatever is necessary to alpha test their applications, but a new project might not.

I'm pretty certain that a new project would be desperate not to have to bother with all the detection gubbins in their first-ever GPU app. If BOINC provides all of that for free - let's run with it. They probably test by setting up a closed BOINC server on their lab network (no mean feat by itself, but useful practice), and run the apps there.
ID: 1606695 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1606709 - Posted: 28 Nov 2014, 23:11:06 UTC

Someone mentioned that their RAC doesn't seem to be falling/is frozen. I can see where you get that idea.. from your hosts page. Mine still shows my main cruncher as having 3637.38, whilst my overall is reported as 696.75. In fact, I just opened up the statistics XML file and took a look, and the useraverage has been steadily falling, but the host average has been static/frozen since the 7th.

The single-core machine that still gets some occasional MB re-sends has a constantly-changing RAC because it is still getting work.

I know the AP database has been down a while, but.. shouldn't RAC still function correctly even though all the reported work is still waiting for validation?
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1606709 · Report as offensive
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1606735 - Posted: 29 Nov 2014, 0:33:57 UTC - in response to Message 1606709.  
Last modified: 29 Nov 2014, 0:36:20 UTC

Someone mentioned that their RAC doesn't seem to be falling/is frozen. I can see where you get that idea.. from your hosts page. Mine still shows my main cruncher as having 3637.38, whilst my overall is reported as 696.75. In fact, I just opened up the statistics XML file and took a look, and the useraverage has been steadily falling, but the host average has been static/frozen since the 7th.

The single-core machine that still gets some occasional MB re-sends has a constantly-changing RAC because it is still getting work.

I know the AP database has been down a while, but.. shouldn't RAC still function correctly even though all the reported work is still waiting for validation?

Well, on my best machine, according to the statistics XML file, on 15 November the user RAC was 38032 and today it is 16122; the host average was 11503 and is now 4694.

So it does seem to be falling!!
ID: 1606735 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1606747 - Posted: 29 Nov 2014, 1:29:55 UTC - in response to Message 1606709.  

Someone mentioned that their RAC doesn't seem to be falling/is frozen. I can see where you get that idea.. from your hosts page. Mine still shows my main cruncher as having 3637.38, whilst my overall is reported as 696.75. In fact, I just opened up the statistics XML file and took a look, and the useraverage has been steadily falling, but the host average has been static/frozen since the 7th.

The single-core machine that still gets some occasional MB re-sends has a constantly-changing RAC because it is still getting work.

I know the AP database has been down a while, but.. shouldn't RAC still function correctly even though all the reported work is still waiting for validation?

My AP only machines have had a flat RAC for weeks. The ones doing MB work have changed accordingly.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1606747 · Report as offensive
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1606780 - Posted: 29 Nov 2014, 3:05:52 UTC - in response to Message 1606747.  

My AP only machines have had a flat RAC for weeks. The ones doing MB work have changed accordingly.

That makes sense because APs have not been updated for a long time.
ID: 1606780 · Report as offensive
Profile cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1606787 - Posted: 29 Nov 2014, 3:12:50 UTC - in response to Message 1606682.  

Hi Wiggo,

Ahh, OK
Won't bother too much then :-)

BTW I got another whole 1 WU, resend.. lasted about 26mins:-)

Regards,
Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1606787 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1606807 - Posted: 29 Nov 2014, 4:53:25 UTC - in response to Message 1606780.  

My AP only machines have had a flat RAC for weeks. The ones doing MB work have changed accordingly.

That makes sense because APs have not been updated for a long time.

The calculation of RAC has a built-in decay value every 7 days. While the processes to calculate RAC could be part of one of the app processes, that wouldn't make complete sense: once a host has all of its tasks purged, which app process would take control of handling the RAC calculation?
Perhaps it is handled by the app processes for the last task the host completed, which might make retiring app processes for older versions tricky, as it would leave numerous hosts at a static level if the process was retired.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1606807 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1606852 - Posted: 29 Nov 2014, 8:56:28 UTC - in response to Message 1606807.  

My AP only machines have had a flat RAC for weeks. The ones doing MB work have changed accordingly.

That makes sense because APs have not been updated for a long time.

The calculation of RAC has a built-in decay value every 7 days. While the processes to calculate RAC could be part of one of the app processes, that wouldn't make complete sense: once a host has all of its tasks purged, which app process would take control of handling the RAC calculation?
Perhaps it is handled by the app processes for the last task the host completed, which might make retiring app processes for older versions tricky, as it would leave numerous hosts at a static level if the process was retired.

RAC is defined to have a 7-day decay half-life, but credit and RAC aren't actually calculated on a timed basis: they are simply updated by whatever fraction is appropriate every time a 'credit related event' occurs - i.e. when new credit is granted.

At the moment, people who have crunched MB in the past should still be getting new credit fairly regularly, as the long tail of pending results trickles in, and their RAC will be reasonably accurate. People who crunch AP only will see their RAC unchanged since the last time the AP validator ran and granted them credit. Ironically, the first thing that will happen when the AP validators start up again is that RAC will be recalculated with the first task validated, and all of that pent-up decay will drive RAC sharply downwards.
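
To make the "pent-up decay" concrete, here is a minimal Python sketch of an event-driven average with a 7-day half-life. It is only an illustration of the idea described above, not the BOINC server code: the function name `update_rac`, the way new credit is spread over the elapsed interval, and the 700-credit figure are all assumptions made for the example.

```python
import math

SECONDS_PER_DAY = 86400.0
HALF_LIFE_SECONDS = 7 * SECONDS_PER_DAY  # the 7-day half-life from the post


def update_rac(rac, last_update, now, new_credit, half_life=HALF_LIFE_SECONDS):
    """Hypothetical helper: decay `rac` over the gap since `last_update`,
    then blend in `new_credit` granted at time `now` (times in seconds)."""
    elapsed = max(now - last_update, 0.0)
    weight = math.exp(-elapsed * math.log(2) / half_life)  # 0.5 per half-life
    rac *= weight
    if 1.0 - weight > 1e-6:
        # Treat the new credit as earned evenly over the elapsed interval.
        rac += (1.0 - weight) * (new_credit / (elapsed / SECONDS_PER_DAY))
    else:
        # Near-zero gap: use the limiting form to avoid dividing by ~0.
        rac += math.log(2) * new_credit * SECONDS_PER_DAY / half_life
    return rac, now


# An AP-only host whose RAC has sat at 3637.38 for 21 days (three half-lives),
# then has one task validated. The 700 credits are an invented figure.
frozen_rac = 3637.38
new_rac, _ = update_rac(frozen_rac, last_update=0.0,
                        now=21 * SECONDS_PER_DAY, new_credit=700.0)
print(f"before: {frozen_rac:.2f}  after: {new_rac:.2f}")
```

Nothing happens to the stored value during the 21 quiet days; the whole three half-lives of decay (a factor of one eighth on the old value) is applied in one go when the next credit arrives, which is why a frozen RAC would be expected to drop sharply at that moment.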

All credit calculations are done on the server, and what you see on the statistics tab in BOINC Manager (or in the project stats XML file) is simply the state of play as reported by the server the last time your client contacted the server to request work.

The question of what happens to a retired host after its last task is validated and purged is an interesting one. Most of mine have RAC below 0.1, but some have 0.00 - so some other decay mechanism must be in place. I don't know what that is, but I might have a look around.
ID: 1606852 · Report as offensive
Profile ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1606856 - Posted: 29 Nov 2014, 9:28:38 UTC

Down to one in-process job. Now have Einstein running on several of my "best" machines in the interim. I did have a problem with it on my work Win7 desktop -- a quad-core i7 (with hyper-threading) with 4 GB of RAM. It spawned eight tasks of 400 MB each, and sent the machine into swap territory! I eventually managed to regain enough control to be able to set local preferences to "use at most 50% memory" but it took a while! I'm away, imminently, for the next five weeks so I'm not overly concerned about interactive response for the nonce.
ID: 1606856 · Report as offensive
Profile Michel Makhlouta
Volunteer tester
Joined: 21 Dec 03
Posts: 169
Credit: 41,799,743
RAC: 0
Lebanon
Message 1607030 - Posted: 29 Nov 2014, 21:30:01 UTC

Off topic but would induce panic. What is wrong with the threads? It's not sorted by time anymore.
ID: 1607030 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65733
Credit: 55,293,173
RAC: 49
United States
Message 1607032 - Posted: 29 Nov 2014, 21:36:45 UTC

OK, so I signed up for Beta. Of course, until the new motherboard is online in Jan 2015 I can't do any Beta GPU work: I lack a large enough case (an Azza GT1 is needed), the driver is too old, and the motherboard power regulators are nearly shot.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1607032 · Report as offensive
Profile Julie
Volunteer moderator
Volunteer tester
Joined: 28 Oct 09
Posts: 34053
Credit: 18,883,157
RAC: 18
Belgium
Message 1607033 - Posted: 29 Nov 2014, 21:39:03 UTC

Not a lot of tasks left on my main rig:( The GPU isn't crunching anymore...
rOZZ
Music
Pictures
ID: 1607033 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1607035 - Posted: 29 Nov 2014, 21:42:26 UTC - in response to Message 1607030.  

Off topic but would induce panic. What is wrong with the threads? It's not sorted by time anymore.

That can happen if your browser history/cookies are cleared.
Grant
Darwin NT
ID: 1607035 · Report as offensive
Profile Julie
Volunteer moderator
Volunteer tester
Joined: 28 Oct 09
Posts: 34053
Credit: 18,883,157
RAC: 18
Belgium
Message 1607044 - Posted: 29 Nov 2014, 21:59:58 UTC

Praying for Bruno... (and some cash=>for the project)
rOZZ
Music
Pictures
ID: 1607044 · Report as offensive
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1607056 - Posted: 29 Nov 2014, 22:16:45 UTC
Last modified: 29 Nov 2014, 22:27:37 UTC

I wonder if the people who provide the Internet bandwidth that the work units go out and come back on have wondered what happened, and why the connection isn't using terabytes/petabytes of data. It would be interesting to know how much data has been saved while the project has only been sending out resends. I bet when the project comes fully back online there will be a traffic jam for some time. On a very quick calculation, if 1000 AP units were sent out an hour, that is a saving of just over 196 gig a day (more like a terabyte a day). I got this by going 1024 × 8 = 8192, × 24 = 196608, divided by 1000 = 196.608.
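
The back-of-envelope figure can be checked with a few lines of Python. This is only a sketch: it reads the post's "1024 × 8" as roughly 1000 AP units an hour at 8 MB each, which is an interpretation rather than something the post states, and the names below are made up for the example. The last step is the MB-to-GB conversion, a division by 1000.

```python
MB_PER_AP_UNIT = 8      # assumed AP workunit size in MB (the "8" in the post)
UNITS_PER_HOUR = 1024   # the post's stand-in for "1000 units an hour"


def daily_traffic_gb(units_per_hour, mb_per_unit):
    """Outbound download volume per day, in decimal gigabytes."""
    mb_per_day = units_per_hour * mb_per_unit * 24
    return mb_per_day / 1000  # MB -> GB


saved_gb = daily_traffic_gb(UNITS_PER_HOUR, MB_PER_AP_UNIT)
print(f"{saved_gb:.3f} GB per day")
```

Under these assumptions the result is a little under 200 GB a day, so reaching a terabyte a day would need roughly five times as many units sent per hour.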
ID: 1607056 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.