Panic Mode On (90) Server Problems?

merle van osdol
Joined: 23 Oct 02
Posts: 809
Credit: 1,980,117
RAC: 0
United States
Message 1582729 - Posted: 6 Oct 2014, 21:18:58 UTC

1984
Soylent Green
ID: 1582729
aad
Joined: 3 Apr 99
Posts: 101
Credit: 204,131,099
RAC: 26
Netherlands
Message 1582758 - Posted: 6 Oct 2014, 22:37:05 UTC

There's a second ap validator 4 running on synergy.
So something changed!
Someone duplicated it to get it working again?
Doesn't work though.

I'm with HAL9000; it will happen tomorrow during the weekly maintenance.
ID: 1582758
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1582760 - Posted: 6 Oct 2014, 22:45:02 UTC - in response to Message 1582758.  

There's a second ap validator 4 running on synergy.
So something changed!
Someone duplicated it to get it working again?
Doesn't work though.

I'm with HAL9000; it will happen tomorrow during the weekly maintenance.

It also lists 5 instances of ap assimilator 4. I don't think any of them are doing anything. It has been that way for at least a few hours. Could it be the result of trying to fix things, but not quite working?
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1582760
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1582778 - Posted: 6 Oct 2014, 23:32:03 UTC

Stopping the validator in advance of a switchover doesn't make any sense. It could be the reason they kept the splitters fed longer than usual, though.

FWIW, my oldest report date on an AP task that has two reports pending is Oct. 2, and I only have one or two that were reported that early still pending. Most are from the 3rd or later.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1582778
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1582845 - Posted: 7 Oct 2014, 2:27:55 UTC

The "Results waiting for db purging" count has stopped at 2,849. I wonder if that is the total stuck in limbo.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1582845
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1582854 - Posted: 7 Oct 2014, 2:46:37 UTC - in response to Message 1582845.  

The "Results waiting for db purging" count has stopped at 2,849. I wonder if that is the total stuck in limbo.

I could be wrong, but I think DB purging has stopped at 2,849 because no work has been validated for some time, meaning there are no results to delete. However, this does not explain why the results sitting there have not been deleted. The only explanation I can come up with is that the command has not been triggered. These are just my thoughts.
ID: 1582854
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1582876 - Posted: 7 Oct 2014, 4:06:10 UTC - in response to Message 1582854.  
Last modified: 7 Oct 2014, 4:07:45 UTC

The "Results waiting for db purging" count has stopped at 2,849. I wonder if that is the total stuck in limbo.

I could be wrong, but I think DB purging has stopped at 2,849 because no work has been validated for some time, meaning there are no results to delete. However, this does not explain why the results sitting there have not been deleted. The only explanation I can come up with is that the command has not been triggered. These are just my thoughts.

I was thinking it stopped for the same reason. As I mentioned before, I wonder if the remaining 2,849 could be those stuck in limbo, like this one:
http://setiathome.berkeley.edu/workunit.php?wuid=1398656868
Those require a clean-up script to be run manually to push the stuck ones through the system.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1582876
Jeff Buck
Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1582877 - Posted: 7 Oct 2014, 4:09:04 UTC

Hmmm....I see that my "SETI@home preferences" page now includes an option for AstroPulse v7, which defaults to "no" (at least mine did). I would've thought they'd set it up to default to whatever a user's AstroPulse v6 choice was set to. Oh well, I just changed mine to "yes", so it'll be ready whenever v7 rolls out.

The WU details pages also now include an AstroPulse v7 count and link. Must be getting close! Now, if they could just get those pesky AstroPulse v6 WUs validated. ;^)
ID: 1582877
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1582880 - Posted: 7 Oct 2014, 4:15:45 UTC - in response to Message 1582854.  

I could be wrong, but I think DB purging has stopped at 2,849 because no work has been validated for some time, meaning there are no results to delete

"purging" is not "delete"

"Results waiting for db purging" means the task is already validated and assimilated, and its files are deleted.
Only the info we see on the web pages remains to be purged (to vanish after ~24 h).

So the 2,843 really must be tasks ('Results' in their terminology) that are "stuck in limbo" and look like:
http://setiathome.berkeley.edu/workunit.php?wuid=1481107605
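
As a rough illustration of that sequence, here is a minimal Python sketch. The stage names come from the explanation above; the `Stage` enum, the `PIPELINE` list, and both helper functions are hypothetical and are not part of the BOINC server code.

```python
from enum import Enum, auto


class Stage(Enum):
    """Server-side stages named in the post above (simplified, hypothetical model)."""
    VALIDATED = auto()
    ASSIMILATED = auto()
    FILES_DELETED = auto()
    PURGED = auto()


# Assumed order in which a finished result moves through the server.
PIPELINE = [Stage.VALIDATED, Stage.ASSIMILATED, Stage.FILES_DELETED, Stage.PURGED]


def waiting_for_db_purge(done):
    """True if every stage before PURGED is complete and PURGED is not."""
    earlier = set(PIPELINE[:-1])
    return earlier <= set(done) and Stage.PURGED not in done


def next_stage(done):
    """Return the first stage not yet completed, or None if all are done."""
    for stage in PIPELINE:
        if stage not in done:
            return stage
    return None
```

In this model a task that is still waiting on the validator, like the AstroPulse v6 backlog, would not count toward "Results waiting for db purging"; only tasks that have cleared the first three stages would.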
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1582880
Zalster
Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1582891 - Posted: 7 Oct 2014, 4:30:47 UTC - in response to Message 1582880.  
Last modified: 7 Oct 2014, 4:32:47 UTC

AP v7.05 was released on beta sometime this evening, so obviously there are still issues that need to be worked out. When they are ready, they will release it.
ID: 1582891
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1582893 - Posted: 7 Oct 2014, 4:33:39 UTC - in response to Message 1582877.  

Hmmm....I see that my "SETI@home preferences" page now includes an option for AstroPulse v7, which defaults to "no" (at least mine did). I would've thought they'd set it up to default to whatever a user's AstroPulse v6 choice was set to. Oh well, I just changed mine to "yes", so it'll be ready whenever v7 rolls out.

The WU details pages also now include an AstroPulse v7 count and link. Must be getting close! Now, if they could just get those pesky AstroPulse v6 WUs validated. ;^)

We still don't have any apps yet, but if someone is working this late they might be in the process. I'm not sure if they will show up tomorrow. I think there is still an issue with the Mac GPU app, but I'm not sure if that is an app issue or a hardware config issue.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1582893
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1582894 - Posted: 7 Oct 2014, 4:34:14 UTC - in response to Message 1582891.  

Why did you 'answer' me?
(I did not say anything about release of AstroPulse v7)
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1582894
Zalster
Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1582899 - Posted: 7 Oct 2014, 4:44:25 UTC - in response to Message 1582894.  

I clicked the nearest reply button..I'm lazy like that, sorry Bil. As for the Mac GPU issue, it seems to work with the newer ones (until Apple decides to mess with the drivers again), though older Macs seem to have that error report. Even with the error report they validate, so who knows. I'm guessing hardware configuration issues, but I'm no expert.
ID: 1582899
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1582900 - Posted: 7 Oct 2014, 4:48:02 UTC - in response to Message 1582899.  

I clicked the nearest reply button..

[Post to thread] is even nearer ;)
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1582900
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1582904 - Posted: 7 Oct 2014, 5:11:37 UTC - in response to Message 1582899.  
Last modified: 7 Oct 2014, 5:29:19 UTC

I clicked the nearest reply button..I'm lazy like that, sorry Bil. As for the Mac GPU issue, it seems to work with the newer ones (until Apple decides to mess with the drivers again), though older Macs seem to have that error report. Even with the error report they validate, so who knows. I'm guessing hardware configuration issues, but I'm no expert.

You should note the problem is with the older Mac nVidia GPUs. I suspect it's related to the nVidia problems on other platforms; see the thread "@Pre-FERMI nVidia GPU users: Important warning". There isn't any problem with the Mac ATI app. In fact, I'm set. Just waiting for one to pop up.
<app_info>
    <app>
        <name>astropulse_v7</name>
    </app>
    <file_info>
        <name>astropulse_7.04_x86_64-apple-darwin__opencl_ati_mac</name>
        <executable/>
    </file_info>
    <file_info>
        <name>ap_cmdline_7.04_x86_64-apple-darwin__opencl_ati_mac.txt</name>
    </file_info>
    <app_version>
        <app_name>astropulse_v7</app_name>
        <version_num>704</version_num>
        <plan_class>opencl_ati_mac</plan_class>
        <avg_ncpus>0.1</avg_ncpus>
        <max_ncpus>1</max_ncpus>
        <flops>900000000000</flops>
        <coproc>
            <type>ATI</type>
            <count>1</count>
        </coproc>
        <file_ref>
            <file_name>astropulse_7.04_x86_64-apple-darwin__opencl_ati_mac</file_name>
            <main_program/>
        </file_ref>
        <file_ref>
            <file_name>ap_cmdline_7.04_x86_64-apple-darwin__opencl_ati_mac.txt</file_name>
            <open_name>ap_cmdline.txt</open_name>
        </file_ref>
    </app_version>
    <app>
        <name>astropulse_v6</name>
    </app>
    <file_info>
        <name>ap_6.07r1874_sse3_clATI_OSX64</name>
        <executable/>
....

Tue Oct  7 01:15:40 2014 | SETI@home | No tasks sent
Tue Oct  7 01:15:40 2014 | SETI@home | No tasks are available for AstroPulse v6
Tue Oct  7 01:15:40 2014 | SETI@home | No tasks are available for AstroPulse v7
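
For anyone hand-editing a file like this, one way to catch mismatched names before restarting the client is to parse the file and cross-check its entries. The sketch below is only an illustration under stated assumptions, not BOINC's own validation: `check_app_info` is a made-up helper, and `SAMPLE` is a trimmed, closed-off stand-in built from the names above (the real file continues past the "...." and is not reproduced here).

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the file above, closed off so that it parses.
SAMPLE = """
<app_info>
    <app>
        <name>astropulse_v7</name>
    </app>
    <file_info>
        <name>astropulse_7.04_x86_64-apple-darwin__opencl_ati_mac</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>astropulse_v7</app_name>
        <version_num>704</version_num>
        <plan_class>opencl_ati_mac</plan_class>
        <file_ref>
            <file_name>astropulse_7.04_x86_64-apple-darwin__opencl_ati_mac</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>
"""


def check_app_info(xml_text):
    """Return a list of problems found; an empty list means these checks passed."""
    # ET.fromstring raises xml.etree.ElementTree.ParseError if the text
    # is not well-formed XML (for example, an unclosed tag).
    root = ET.fromstring(xml_text)
    problems = []
    app_names = {app.findtext("name") for app in root.findall("app")}
    file_names = {info.findtext("name") for info in root.findall("file_info")}
    for version in root.findall("app_version"):
        app_name = version.findtext("app_name")
        if app_name not in app_names:
            problems.append(f"app_version refers to undeclared app {app_name!r}")
        for ref in version.findall("file_ref"):
            file_name = ref.findtext("file_name")
            if file_name not in file_names:
                problems.append(f"file_ref refers to undeclared file {file_name!r}")
    return problems


if __name__ == "__main__":
    for problem in check_app_info(SAMPLE):
        print(problem)
```

The two cross-checks (every `app_name` has a matching `<app>`, every `file_name` has a matching `<file_info>`) are a guess at the most common copy-and-paste slips; passing them does not guarantee the client will accept the file.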
ID: 1582904
Darth Beaver
Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1582985 - Posted: 7 Oct 2014, 9:59:19 UTC

I personally could not care less about V7. JUST FIX THE DAM F$$$ing validation already! Freaking hell, it's Tuesday night here, so there's no dam F#$%ing excuse not to have someone look at it. Why do any work when you're not getting the credit, for god's sake?

You watch, we'll have the outage and when it comes back it will just F$#% up because the servers will try to catch up. Better to fix it now so they won't be overworked after the outage.
ID: 1582985
Wiggo
Joined: 24 Jan 00
Posts: 34859
Credit: 261,360,520
RAC: 489
Australia
Message 1582990 - Posted: 7 Oct 2014, 10:22:30 UTC
Last modified: 7 Oct 2014, 10:23:53 UTC

Take it easy Glenn, don't blow a fuse there mate, we'll all get paid in the end, eventually (so long as no funny scripts are thrown in again). ;-)

Well, my 3570K CPU is doing the last 2 of its APs (30 mins will see them done) and is now starting back on a diet of MBs on all of its CPU cores.

My 2500K still has another 1.5 days worth yet.

Cheers.
ID: 1582990
Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1582991 - Posted: 7 Oct 2014, 10:26:01 UTC - in response to Message 1582985.  

I personally could not care less about V7. JUST FIX THE DAM F$$$ing validation already! Freaking hell, it's Tuesday night here, so there's no dam F#$%ing excuse not to have someone look at it. Why do any work when you're not getting the credit, for god's sake?

You watch, we'll have the outage and when it comes back it will just F$#% up because the servers will try to catch up. Better to fix it now so they won't be overworked after the outage.

Language, please. Just calm down.

The validator backlog really does not matter in the grand scheme of things. It isn't stopping anybody working on the tasks they already have. It won't prevent new work being split when those of us who work on MB catch up on the raw data supply.

Validation, in the sort of numbers we're talking about (about 75 minutes of MB production), is a very quick process. They'll be cleared in minutes once the staff task of exploring and fixing whatever is holding up the validation process bubbles to the top of their ToDo list. And that's a long list, and there are too few people to process it. More money would help.

In the meantime, what are the real problems for the project with this delay?

1) The data files for unvalidated tasks probably aren't being deleted from the storage arrays. At 8 MB a throw, that mounts up, but isn't critical.
2) A few tens of thousands of extra rows in the database can't be purged. In a database that can handle 10 million rows, that's negligible.
3) Some 5- or 6-year-old recordings aren't being assimilated into the science database until a few days after processing.
4) And gollum points (credit) are also a few days late. Is that really all that this comes down to?
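
To put rough numbers on points 1 and 2, here is a back-of-the-envelope sketch. The 8 MB file size and the 10-million-row figure come from the list above; the backlog of 30,000 rows is purely an assumed stand-in for "a few tens of thousands", and treating every row as holding its own data file is an upper-bound simplification.

```python
# Figures quoted in the list above.
DATA_FILE_MB = 8                # "8 MB a throw"
DB_ROWS_HANDLED = 10_000_000    # "a database that can handle 10 million rows"

# Hypothetical backlog size; the post only says "a few tens of thousands".
ASSUMED_BACKLOG_ROWS = 30_000


def backlog_storage_gb(files, mb_per_file=DATA_FILE_MB):
    """Disk space held by undeleted data files, in GB (1 GB = 1000 MB)."""
    return files * mb_per_file / 1000


def backlog_row_fraction(rows, handled=DB_ROWS_HANDLED):
    """Share of the quoted row count taken up by rows that can't be purged."""
    return rows / handled


if __name__ == "__main__":
    print(backlog_storage_gb(ASSUMED_BACKLOG_ROWS), "GB of data files at most")
    print(f"{backlog_row_fraction(ASSUMED_BACKLOG_ROWS):.1%} of the quoted row count")
```

Under those assumptions the backlog ties up at most 240 GB of data files and about 0.3% of the quoted row count, which fits the "mounts up, but isn't critical" and "negligible" descriptions.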
ID: 1582991
merle van osdol
Joined: 23 Oct 02
Posts: 809
Credit: 1,980,117
RAC: 0
United States
Message 1583014 - Posted: 7 Oct 2014, 11:33:36 UTC - in response to Message 1582991.  

This delay makes no difference to me at all, especially now that you have brought it to light. Thanks.
ID: 1583014
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1583032 - Posted: 7 Oct 2014, 12:14:28 UTC - in response to Message 1583014.  

This delay makes no difference to me at all, especially now that you have brought it to light. Thanks.

We are analyzing data recorded a few years ago from stars that are many light years away. There isn't really anything time sensitive about the project.
Having one part of it delayed a few days or weeks in the grand scheme of things doesn't matter.
Discussing and speculating on what may be wrong is just something to do to pass the time.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1583032


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.