Panic Mode On (74) Server problems?

Message boards : Number crunching : Panic Mode On (74) Server problems?
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1233171 - Posted: 19 May 2012, 1:47:49 UTC - in response to Message 1233157.  

No main GPU work available though.

?
GPU work is all I'm getting at the moment. I'm not expecting any CPU work till the GPU cache has been topped up sufficiently.
So far there's only been a couple of shorties in sight, so hopefully it won't take too long.

Grant
Darwin NT
ID: 1233171 · Report as offensive
Profile Slavac
Volunteer tester
Avatar

Send message
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1233174 - Posted: 19 May 2012, 1:49:08 UTC - in response to Message 1233171.  

Just got word from Eric. Good news is that there doesn't appear to have been any lost hardware. We were worried about lost hardware in light of Bane's death along with Dan's old workstation during the last outage.


Executive Director GPU Users Group Inc. -
brad@gpuug.org
ID: 1233174 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1233176 - Posted: 19 May 2012, 1:51:39 UTC - in response to Message 1233174.  

Just got word from Eric. Good news is that there doesn't appear to have been any lost hardware. We were worried about lost hardware in light of Bane's death along with Dan's old workstation during the last outage.

Excellent news....
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1233176 · Report as offensive
Profile Misfit
Volunteer tester
Avatar

Send message
Joined: 21 Jun 01
Posts: 21804
Credit: 2,815,091
RAC: 0
United States
Message 1233177 - Posted: 19 May 2012, 1:53:28 UTC - in response to Message 1233171.  

No main GPU work available though.

?
GPU work is all I'm getting at the moment. I'm not expecting any CPU work till the GPU cache has been topped up sufficiently.
So far there's only been a couple of shorties in sight, so hopefully it won't take too long.

I had to complain about it before I got any apparently.
me@rescam.org
ID: 1233177 · Report as offensive
David S
Volunteer tester
Avatar

Send message
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1233217 - Posted: 19 May 2012, 3:52:21 UTC - in response to Message 1233071.  

This WU looks like it'll be fun. _0 and _1 both failed on download. _2 managed to get it, supposedly. _3 and _4 (me) failed on download. _5 is unsent at the moment.

Check this one - Too many errors (may have bug).

I'm getting a huge amount of download errors now. What's going on?

Yeah, me too, lots of these:

1735 SETI@home 2012-05-18 21:17:31 [error] File 23fe11ab.28540.3508.8.10.14 has wrong size: expected 375361, got 0
1736 SETI@home 2012-05-18 21:17:31 [error] Checksum or signature error for 23fe11ab.28540.3508.8.10.14

Could that be due to the RAID resynchs Matt mentioned earlier?
Or maybe just saturation on the download pipe?

See Khangollo's message 1232644, posted before the power failure, showing that kind of error. Whatever the problem is, it's not an aftereffect of the power failure.
                                                                  Joe

I suspect some of the drive volumes were unreachable by the download servers shortly before everything shut down. Are all the servers on the same UPS?

Since these download errors were already being discussed nearly four hours before the power went out, they can't very well be a result of it.

If the Cricket graph is a reliable indicator of the time (I don't see why it wouldn't be), the outage started at just about precisely (suspiciously precisely) 2000 PDT, or 0300 UTC. I have 39 such download errors, most if not all from several hours before 0300 UTC, so once again, they are clearly not a result of the power outage.
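
The timing argument above (count the "wrong size" errors and see how many fall before the outage began) can be sketched in a few lines of Python. This is only an illustration: the line layout is inferred from the two sample log lines quoted earlier in this post, the function names are made up, and the cutoff value is hypothetical. The log timestamps are in the client's local time, so the outage start would have to be converted to the same clock before comparing.

```python
import re
from datetime import datetime

# Layout inferred from the two quoted log lines only (an assumption).
# Project names containing spaces would not match this pattern.
WRONG_SIZE = re.compile(
    r"^(?P<seq>\d+)\s+(?P<project>\S+)\s+"
    r"(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+\[error\]\s+"
    r"File (?P<name>\S+) has wrong size: "
    r"expected (?P<expected>\d+), got (?P<got>\d+)$"
)


def parse_wrong_size(lines):
    """Return (timestamp, file name, expected, got) for each 'wrong size' line."""
    hits = []
    for line in lines:
        m = WRONG_SIZE.match(line.strip())
        if m:
            stamp = datetime.strptime(m["stamp"], "%Y-%m-%d %H:%M:%S")
            hits.append((stamp, m["name"], int(m["expected"]), int(m["got"])))
    return hits


def split_by_cutoff(hits, cutoff):
    """Split hits into those logged before the cutoff and those at or after it."""
    before = [h for h in hits if h[0] < cutoff]
    after = [h for h in hits if h[0] >= cutoff]
    return before, after


sample_lines = [
    "1735 SETI@home 2012-05-18 21:17:31 [error] File 23fe11ab.28540.3508.8.10.14 "
    "has wrong size: expected 375361, got 0",
    "1736 SETI@home 2012-05-18 21:17:31 [error] Checksum or signature error "
    "for 23fe11ab.28540.3508.8.10.14",
]

# Hypothetical cutoff in the same local clock as the log; not the real outage time.
cutoff = datetime(2012, 5, 18, 22, 0, 0)
before, after = split_by_cutoff(parse_wrong_size(sample_lines), cutoff)
print(len(before), "wrong-size errors before the cutoff,", len(after), "after")
```

If most of the errors land in the "before" list, as with the 39 mentioned above, the outage cannot be their cause.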

What I'm wondering about is that some of my cancelled WUs still show as in progress for other users.

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1233217 · Report as offensive
Claggy
Volunteer tester

Send message
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1233306 - Posted: 19 May 2012, 6:54:14 UTC - in response to Message 1233115.  
Last modified: 19 May 2012, 6:54:37 UTC

Another issue, maybe...

Although SETI Beta is up, the usual http://setiweb.ssl.berkeley.edu/beta/ link gives a 404 not found status for me. http://setiathome.ssl.berkeley.edu/beta/ or just http://setiathome.berkeley.edu/beta/ will get the front page, but the cookies (and the certificate for logging in) are all for setiweb. Anyone else seeing the same?
                                                                   Joe

I'm logged on to the first URL already; I normally have to go through the certificate error to do so. Looking at my account on the 2nd and 3rd URLs gives me the certificate error.

Claggy
ID: 1233306 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1233337 - Posted: 19 May 2012, 9:51:47 UTC


Network traffic has dropped off slightly, which isn't a good sign.
My last 8 requests for work resulted in "Project has no tasks available" messages.
Till now, since the outage, maybe as few as 1 in 7 requests got that response.
Grant
Darwin NT
ID: 1233337 · Report as offensive
Claggy
Volunteer tester

Send message
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1233456 - Posted: 19 May 2012, 15:06:06 UTC - in response to Message 1233217.  

I've just had a couple of download errors at Seti Beta, so I tried to download one of them with GetRight. Instead of being the normal 8 MB, it's only 4.1 MB.

http://boinc2.ssl.berkeley.edu/beta/download/25f/ap_20au10ae_B2_P0_00061_20120515_03292.wu
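
The manual check described above (fetch the file separately and compare its size with what it should be) can be sketched as a small Python helper. This is a rough sketch, not how the BOINC client does it: `check_download` is a made-up name, and using MD5 for the optional checksum is an assumption about what the "Checksum or signature error" message refers to. The demo uses a throwaway temporary file rather than a real workunit.

```python
import hashlib
import os
import tempfile


def check_download(path, expected_size, expected_md5=None):
    """Compare a local file with its expected size and, optionally, MD5 digest."""
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        return f"wrong size: expected {expected_size}, got {actual_size}"
    if expected_md5 is not None:
        digest = hashlib.md5()
        with open(path, "rb") as fh:
            # Read in 1 MiB chunks so large workunits are not loaded at once.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected_md5.lower():
            return "checksum mismatch"
    return "ok"


# Demo with a throwaway 10-byte file standing in for a truncated download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
demo_path = tmp.name

print(check_download(demo_path, 10))   # size matches
print(check_download(demo_path, 20))   # file is shorter than expected
os.remove(demo_path)
```

A file that comes back at roughly half its expected size, like the 4.1 MB one here, would fail the size test before any checksum is even computed.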

Claggy
ID: 1233456 · Report as offensive
Profile arkayn
Volunteer tester
Avatar

Send message
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1233499 - Posted: 19 May 2012, 16:30:24 UTC - in response to Message 1233461.  

I've just had a couple of download errors at Seti Beta, so I tried to download one of them with GetRight. Instead of being the normal 8 MB, it's only 4.1 MB.

http://boinc2.ssl.berkeley.edu/beta/download/25f/ap_20au10ae_B2_P0_00061_20120515_03292.wu

Claggy



Yup, I downloaded it too. 4.1Mb....


Indeed.

ID: 1233499 · Report as offensive
Claggy
Volunteer tester

Send message
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1233506 - Posted: 19 May 2012, 16:49:19 UTC - in response to Message 1233461.  

I've just had a couple of download errors at Seti Beta, so I tried to download one of them with GetRight. Instead of being the normal 8 MB, it's only 4.1 MB.

http://boinc2.ssl.berkeley.edu/beta/download/25f/ap_20au10ae_B2_P0_00061_20120515_03292.wu

Claggy



Yup, I downloaded it too. 4.1Mb....


Since none of my wingmen could get the whole file either, there's probably a failed drive or screwed filesystem out there:

http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=3987470

Claggy
ID: 1233506 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Send message
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1233525 - Posted: 19 May 2012, 17:18:38 UTC - in response to Message 1233306.  

Another issue, maybe...

Although SETI Beta is up, the usual http://setiweb.ssl.berkeley.edu/beta/ link gives a 404 not found status for me. http://setiathome.ssl.berkeley.edu/beta/ or just http://setiathome.berkeley.edu/beta/ will get the front page, but the cookies (and the certificate for logging in) are all for setiweb. Anyone else seeing the same?
                                                                   Joe

I'm logged on to the first URL already; I normally have to go through the certificate error to do so. Looking at my account on the 2nd and 3rd URLs gives me the certificate error.

Claggy

I had upgraded my Opera browser during the power outage; that seems to have been at least part of the problem, as an older version of Opera on my laptop worked OK. But today the new browser is also working on the setiweb URL...
------------------

Other issue ---
Since none of my wingmen could get the whole file either, there's probably a failed drive or screwed filesystem out there:

http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=3987470

Claggy

I had one of those, too. I think it's actually related to the problems with 0 byte WUs noted earlier in this thread; the creation time of 15 May 2012 | 19:59:56 UTC is about the same. I had emailed Eric about those, and he said that georgem had to be rebooted and the splitters lost their NFS mounts to WU storage in that time period, though he was puzzled why parts of the system thought the files were properly written.
                                                                 Joe
ID: 1233525 · Report as offensive
Profile SciManStev (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Avatar

Send message
Joined: 20 Jun 99
Posts: 6651
Credit: 121,090,076
RAC: 0
United States
Message 1233593 - Posted: 19 May 2012, 20:40:10 UTC
Last modified: 19 May 2012, 20:40:47 UTC

I just noticed I have 2863 Cuda tasks, and 0 CPU tasks. My chiller just cut off, then on, or I never would have known it. I think this is a first. I usually run out of GPU tasks a couple of days before I run out of CPU tasks.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1233593 · Report as offensive
Profile shizaru
Volunteer tester
Avatar

Send message
Joined: 14 Jun 04
Posts: 1130
Credit: 1,967,904
RAC: 0
Greece
Message 1233605 - Posted: 19 May 2012, 21:14:51 UTC - in response to Message 1233593.  

I just noticed I have 2863 Cuda tasks, and 0 CPU tasks. My chiller just cut off, then on, or I never would have known it. I think this is a first. I usually run out of GPU tasks a couple of days before I run out of CPU tasks.

Steve


Sounds like you need to do the "uncheck-GPU-tasks-in-Seti-prefs-and-update-project-in-BM" hocus pocus. It has instantly worked for me, three times so far...
ID: 1233605 · Report as offensive
Profile SciManStev (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Avatar

Send message
Joined: 20 Jun 99
Posts: 6651
Credit: 121,090,076
RAC: 0
United States
Message 1233622 - Posted: 19 May 2012, 21:33:11 UTC

I just used the rescheduler to shift a few over to the CPU. My chiller is running constantly again. When more CPU units are downloaded, all will be good.....

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1233622 · Report as offensive
Profile elbeejay
Avatar

Send message
Joined: 17 Oct 01
Posts: 18
Credit: 173,951
RAC: 0
United States
Message 1234403 - Posted: 21 May 2012, 5:40:19 UTC

Why is there NO DATA to download from the SETI site???
NONE FOR A WHILE NOW......
ID: 1234403 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1234405 - Posted: 21 May 2012, 5:45:04 UTC - in response to Message 1234403.  

Why is there NO DATA to download from the SETI site???
NONE FOR A WHILE NOW......

Uhhh...
I have 8 rigs online, and am getting work quite readily.
What Boinc messages or errors are you getting?
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1234405 · Report as offensive
bill

Send message
Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1234409 - Posted: 21 May 2012, 5:50:21 UTC - in response to Message 1234403.  

Just got 47 gpu wus. Tried rebooting?
ID: 1234409 · Report as offensive
Profile Wiggo
Avatar

Send message
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1234413 - Posted: 21 May 2012, 6:03:38 UTC - in response to Message 1234403.  
Last modified: 21 May 2012, 6:04:46 UTC

Why is there NO DATA to download from the SETI site???
NONE FOR A WHILE NOW......

Go to your Computing preferences and under "Network usage" you'll see "Maintain enough tasks to keep busy for at least" and "... and up to an additional". I set both to 4 days, which gives me 4-8 days of work on hand and sees me through any server/connection problems.

Cheers.
ID: 1234413 · Report as offensive
Claggy
Volunteer tester

Send message
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1234442 - Posted: 21 May 2012, 10:20:20 UTC - in response to Message 1234413.  
Last modified: 21 May 2012, 10:20:46 UTC

Why is there NO DATA to download from the SETI site???
NONE FOR A WHILE NOW......

Go to your Computing preferences and under "Network usage" you'll see "Maintain enough tasks to keep busy for at least" and "... and up to an additional". I set both to 4 days, which gives me 4-8 days of work on hand and sees me through any server/connection problems.

Cheers.

elbeejay is running Boinc 7.0.25, and since Boinc 7 has a new scheduler, elbeejay should set his preferences to the following instead:

Maintain enough tasks to keep busy for at least 4 days
(max 10 days).

... and up to an additional 0.01 days

This will cache four days' work, and will ask for work often to keep it topped up.
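
To make the trade-off concrete, here is a toy Python model of the two settings. It is only a sketch under stated assumptions, not BOINC's actual work-fetch logic: it assumes the client requests work whenever the cache drops below the "at least" value and refills to "at least" plus "additional", that work is consumed at a constant rate, and that the server always has tasks to send. `simulate_cache` and `TICKS_PER_DAY` are made-up names.

```python
TICKS_PER_DAY = 100  # model time in 0.01-day steps to avoid float drift


def simulate_cache(min_days, additional_days, total_days=30):
    """Count work requests over total_days in a simple low/high-water model."""
    low = round(min_days * TICKS_PER_DAY)            # "keep busy for at least"
    high = low + round(additional_days * TICKS_PER_DAY)  # "... up to an additional"
    level = high                                      # start with a full cache
    requests = 0
    for _ in range(round(total_days * TICKS_PER_DAY)):
        level -= 1                # one tick of work gets crunched
        if level < low:           # cache fell below the minimum
            level = high          # assume the server fills the request completely
            requests += 1
    return requests


small_topup = simulate_cache(4, 0.01)  # the setting suggested above
large_topup = simulate_cache(4, 4)     # both values set to 4 days
print("requests in 30 days:", small_topup, "vs", large_topup)
```

In this model the 0.01-day top-up keeps the cache pinned near four days at the cost of very frequent requests, while a 4-day top-up lets the cache swing between four and eight days with only a handful of requests. The real client also applies backoffs and request limits, so actual counts would differ.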

Claggy
ID: 1234442 · Report as offensive
Profile Wiggo
Avatar

Send message
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1234448 - Posted: 21 May 2012, 10:56:51 UTC - in response to Message 1234442.  

Why is there NO DATA to download from the SETI site???
NONE FOR A WHILE NOW......

Go to your Computing preferences and under "Network usage" you'll see "Maintain enough tasks to keep busy for at least" and "... and up to an additional". I set both to 4 days, which gives me 4-8 days of work on hand and sees me through any server/connection problems.

Cheers.

elbeejay is running Boinc 7.0.25, and since Boinc 7 has a new scheduler, elbeejay should set his preferences to the following instead:

Maintain enough tasks to keep busy for at least 4 days
(max 10 days).

... and up to an additional 0.01 days

This will cache four days' work, and will ask for work often to keep it topped up.

Claggy

I prefer not to be bugging the servers constantly, so others can have their fair chance at them, plus it relieves the pressure on the connection as well.

Cheers.
ID: 1234448 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.