The Server Issues / Outages Thread - Panic Mode On! (119)

Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2044934 - Posted: 15 Apr 2020, 11:39:38 UTC - in response to Message 2044931.  

My host 8670176 got the batch which started this conversation this morning. Four have been completed and validated, and are still visible after a longer wait than usual in recent times. One more is ready to report, and I've picked up another task.
ID: 2044934
Profile Tom M
Volunteer tester

Send message
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2044936 - Posted: 15 Apr 2020, 11:49:18 UTC - in response to Message 2044470.  
Last modified: 15 Apr 2020, 11:58:39 UTC

Eric says that they have been running with resends on about 1/3 of the time.
If things get tangled up, it gets turned off, and then back on again when things clear.
So, many ghost tasks should have already been resent.
Not sure how many are still out there.

Meow.
Unfortunately, in order to pick up a resend, the host in question has to contact the server and actively request work.

https://setiathome.berkeley.edu/show_host_detail.php?hostid=8873201 still hasn't phoned home since 8 April: unless that spoofer gives his machine a kick, we'll never know whether the tasks are bunkered or ghosts.


I see that the system just contacted the server. :)
Now if he/she would take advantage of the "re-scheduler" software that allows you to "move" GPU tasks to the CPU... they might get even more processing done :)

I just looked. I have 2,000+ pending. One of my wingmen has been polling regularly but doesn't seem to be getting much processing done :(
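As the point quoted above says, a resend only goes out when the host itself contacts the scheduler and asks for work. A minimal sketch of nudging a local client to do that, assuming the standard boinccmd tool that ships with the BOINC client is installed and on the PATH:

    # Ask the local BOINC client to contact the SETI@home scheduler so that
    # any pending "lost task" resends can be picked up. Assumes boinccmd
    # (shipped with the BOINC client) is installed and on the PATH.
    import subprocess

    PROJECT_URL = "http://setiathome.berkeley.edu/"

    def request_scheduler_contact() -> None:
        # "update" tells the client to report finished tasks and request
        # work from the named project at its next scheduler RPC.
        subprocess.run(["boinccmd", "--project", PROJECT_URL, "update"],
                       check=True)

    if __name__ == "__main__":
        request_scheduler_contact()

Of course, this only helps for a machine you control; a silent spoofed host stays silent until its owner does the same.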

Tom M
A proud member of the OFA (Old Farts Association).
ID: 2044936
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 2044937 - Posted: 15 Apr 2020, 11:51:26 UTC

A quick look at my outstanding Tasks shows that for most of them, resends aren't due until late May or early June.
Grant
Darwin NT
ID: 2044937
Profile Keith T.
Volunteer tester
Avatar

Send message
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 2044938 - Posted: 15 Apr 2020, 11:56:04 UTC - in response to Message 2044934.  

My host 8670176 got the batch which started this conversation this morning. Four have been completed and validated, and are still visible after a longer wait than usual in recent times. One more is ready to report, and I've picked up another task.


We're effectively back to the old replication and quorum levels that SETI had back in 2007, when I first started using BOINC.

My very first post on NC https://setiathome.berkeley.edu/forum_thread.php?id=37367 is worth a read, as it attracted responses from some distinguished forum members, including Joe Segur.
I only recently learned that Joe died in 2015. RIP Joe.
ID: 2044938
Profile Tom M
Volunteer tester

Send message
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2044940 - Posted: 15 Apr 2020, 12:03:04 UTC - in response to Message 2044939.  


And how the He** can he still get so many tasks? Yesterday, for example, he got a whole bunch.
https://setiathome.berkeley.edu/results.php?hostid=8873201



My understanding is that the host is not "getting new tasks" but has a huge cache that the system is (still) munching through.

Tom M
A proud member of the OFA (Old Farts Association).
ID: 2044940
Profile Keith T.
Volunteer tester
Avatar

Send message
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 2044950 - Posted: 15 Apr 2020, 12:45:57 UTC - in response to Message 2044931.  

https://setiathome.berkeley.edu/workunit.php?wuid=3947300995 , also one of mine, shows where this policy should work. My Windows machine is a relative tortoise compared to some of the "big boys", but it usually produces the decider on many Validation Inconclusive WUs.
Yes, that's the way it should go - and it happens to be a short-deadline VHAR, as well.

But I don't hold out a lot of hope for your new extra wingmate on WU 3893947696.


995 should return in less than 2 hours; it's currently running on my single Intel GPU, and wall clock time is counting down at about the correct rate :-)
ETA before 11:00 UTC


https://setiathome.berkeley.edu/workunit.php?wuid=3947300995 Completed and Validated
My _3 was the deciding tie-breaker

Results will probably be purged in under 2 hours


Now gone, but I did save a copy in offline Chrome
ID: 2044950
Profile Keith T.
Volunteer tester
Avatar

Send message
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 2044972 - Posted: 15 Apr 2020, 14:29:40 UTC - in response to Message 2044938.  

My host 8670176 got the batch which started this conversation this morning. Four have been completed and validated, and are still visible after a longer wait than usual in recent times. One more is ready to report, and I've picked up another task.


We're effectively back to the old replication and quorum levels that SETI had back in 2007, when I first started using BOINC.

My very first post on NC https://setiathome.berkeley.edu/forum_thread.php?id=37367 is worth a read, as it attracted responses from some distinguished forum members, including Joe Segur.

I only recently learned that Joe died in 2015. RIP Joe.



I started https://setiathome.berkeley.edu/workunit.php?wuid=3898489118 before the 2nd result had been returned, but my result was "Credit value only", not needed for the science!

If you look at that old thread from 2007, SETI was handing out a lot more work than was needed when it was Replication 4, Quorum 3.
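As a quick illustration of that point (a worked example, not project data): issuing more initial replicas than the quorum means a fixed fraction of the work sent out is surplus even when every result comes back and validates.

    # Fraction of initially issued results that exceed the validation quorum,
    # assuming every result is returned and validates. 4/3 are the 2007
    # settings mentioned above; a replication-equals-quorum setup is shown
    # for contrast.
    def surplus_fraction(initial_replication: int, quorum: int) -> float:
        return (initial_replication - quorum) / initial_replication

    print(f"Replication 4, Quorum 3: {surplus_fraction(4, 3):.0%} surplus")  # 25%
    print(f"Replication 3, Quorum 3: {surplus_fraction(3, 3):.0%} surplus")  # 0%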
ID: 2044972
juan BFP
Volunteer tester
Avatar

Send message
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2044975 - Posted: 15 Apr 2020, 14:46:04 UTC - in response to Message 2044972.  
Last modified: 15 Apr 2020, 14:47:02 UTC

If you look at that old thread from 2007, SETI was handing out a lot more work than was needed when it was Replication 4, Quorum 3.

Agreed, but after that the GPU crunching revolution started. A single GPU now has the crunching power of an entire fleet from that era, and that changes everything. Crunching a WU on a CPU back then took about 6 hours (IIRC); now the same WU (with 2x the precision, BTW) can be crunched on a GPU in about a minute or less. And don't forget the CPU work itself: where a host had 1 or 2 cores in 2007, a CPU with 8 or more cores is now common.
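The speedup described there, as rough arithmetic (the timings are the ones recalled in the post, not measurements):

    # Back-of-envelope GPU vs. CPU speedup using the figures recalled above:
    # ~6 hours per WU on a 2007-era CPU core vs. ~1 minute on a modern GPU.
    cpu_seconds_per_wu = 6 * 3600   # about 6 hours, as recalled (IIRC)
    gpu_seconds_per_wu = 60         # about a minute or less

    speedup = cpu_seconds_per_wu / gpu_seconds_per_wu
    print(f"Single GPU vs. single 2007 CPU core: ~{speedup:.0f}x")  # ~360x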

On the other hand, your post reminds me of a lot of fellow crunchers from that time, some of whom are no longer with us. I really miss a lot of them.

BTW, by an incredible coincidence, I started crunching SETI on 16 Mar 2007, so at the time of your post I hadn't even completed a month of crunching.
Yes, it has been a very long and nice ride. LOL
ID: 2044975
Kevin Olley

Send message
Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 2044978 - Posted: 15 Apr 2020, 14:53:07 UTC - in response to Message 2044934.  

My host 8670176 got the batch which started this conversation this morning. Four have been completed and validated, and are still visible after a longer wait than usual in recent times. One more is ready to report, and I've picked up another task.


Take a look at my TR, specifically the valid tasks: most, if not all, of them are awaiting another host to return a result.

Those that have gone out to more than one machine, and where all have been returned, expire within a very short time.

Unfortunately, where they are sending to multiple machines they are not shortening the deadlines, so even when these tasks are validated they may continue to hang around for a long time.
Kevin


ID: 2044978
juan BFP
Volunteer tester
Avatar

Send message
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2044980 - Posted: 15 Apr 2020, 15:01:59 UTC

Maybe a wise thing to do is to abort, and resend with a short deadline, all the WUs from hosts whose last contact was before April 1.
If a host has not contacted the servers in the last 15 days, it is highly probable that it is a powered-down host.
Just a suggestion.
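A rough sketch of that policy, purely for illustration: the table and column names below are hypothetical, not the real BOINC server schema, and the cutoff is the 1 April date proposed above.

    # Hypothetical sketch: find in-progress tasks held by hosts whose last
    # scheduler contact was before 1 April 2020, i.e. candidates for abort
    # and resend with a short deadline. Table/column names are illustrative,
    # not the actual BOINC server database schema.
    import sqlite3
    from datetime import datetime, timezone

    CUTOFF = datetime(2020, 4, 1, tzinfo=timezone.utc).timestamp()

    def stale_task_ids(db_path: str) -> list:
        con = sqlite3.connect(db_path)
        try:
            rows = con.execute(
                """
                SELECT task.id
                FROM task
                JOIN host ON host.id = task.host_id
                WHERE task.state = 'in_progress'
                  AND host.last_contact < ?
                """,
                (CUTOFF,),
            ).fetchall()
        finally:
            con.close()
        return [task_id for (task_id,) in rows]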
ID: 2044980
Profile Keith T.
Volunteer tester
Avatar

Send message
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 2044984 - Posted: 15 Apr 2020, 15:33:17 UTC - in response to Message 2044980.  

Maybe a wise thing to do is to abort, and resend with a short deadline, all the WUs from hosts whose last contact was before April 1.
If a host has not contacted the servers in the last 15 days, it is highly probable that it is a powered-down host.
Just a suggestion.


If the Project Staff were in the lab, and not working from home, then it might be worthwhile.

CA is still on lockdown as far as I know (https://covid19.ca.gov/), so they may want to wait until they are back.
ID: 2044984
juan BFP
Volunteer tester
Avatar

Send message
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2044985 - Posted: 15 Apr 2020, 15:45:39 UTC - in response to Message 2044984.  
Last modified: 15 Apr 2020, 15:46:04 UTC

Maybe a wise thing to do is to abort, and resend with a short deadline, all the WUs from hosts whose last contact was before April 1.
If a host has not contacted the servers in the last 15 days, it is highly probable that it is a powered-down host.
Just a suggestion.


If the Project Staff were in the lab, and not working from home, then it might be worthwhile.

CA is still on lockdown as far as I know (https://covid19.ca.gov/), so they may want to wait until they are back.

Yes, I know that; we are on lockdown too, but in these times you don't need to be physically close to the servers to work with them.
And AFAIK the lab guys are already writing and running some scripts remotely.
ID: 2044985
Profile Keith T.
Volunteer tester
Avatar

Send message
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 2044988 - Posted: 15 Apr 2020, 15:57:35 UTC - in response to Message 2044978.  
Last modified: 15 Apr 2020, 16:12:33 UTC

My host 8670176 got the batch which started this conversation this morning. Four have been completed and validated, and are still visible after a longer wait than usual in recent times. One more is ready to report, and I've picked up another task.


Take a look at my TR, specifically the valid tasks: most, if not all, of them are awaiting another host to return a result.

Those that have gone out to more than one machine, and where all have been returned, expire within a very short time.

Unfortunately, where they are sending to multiple machines they are not shortening the deadlines, so even when these tasks are validated they may continue to hang around for a long time.


I think getting a Canonical Result into the Science Database is the higher priority.
Emptying the BOINC Database may not finish for a few weeks (or months) after the last Valid Results have been returned.
There are still over 2 million results and tasks out "in the field"; maybe when that drops below a few hundred thousand, the plan might change.

At the current rate of return on the SSP, most of the results should be back in about 17 days.
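Illustrative arithmetic only: the 2 million figure is from the post above, while the daily return rate below is an assumed round number chosen to show how an estimate of roughly 17 days falls out; it is not a value read off the SSP.

    # Back-of-envelope drain time for the outstanding results. The field
    # count comes from the post; the return rate is an assumption for
    # illustration, not an actual SSP figure.
    results_in_the_field = 2_000_000
    assumed_returns_per_day = 120_000   # hypothetical round number

    days_to_drain = results_in_the_field / assumed_returns_per_day
    print(f"~{days_to_drain:.0f} days to return most results")  # ~17 days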
ID: 2044988
rob smith
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22222
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2044990 - Posted: 15 Apr 2020, 16:00:45 UTC

While one doesn't need to be physically close to the servers to do work on them, one does need a reliable, solid connection, probably needs to be on the correct side of the Uni's firewall, and needs the correct set of tools at hand to do the job. It may also depend on the terms of furlough they are on. For example, I could well do 70% of my job from home, and indeed have done so since the middle of last year, but I am furloughed and the terms of my furlough state that I shouldn't even read my work emails, let alone do any of my real job.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2044990
Profile Keith T.
Volunteer tester
Avatar

Send message
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 2044993 - Posted: 15 Apr 2020, 16:11:06 UTC - in response to Message 2044985.  

Maybe a wise thing to do is to abort, and resend with a short deadline, all the WUs from hosts whose last contact was before April 1.
If a host has not contacted the servers in the last 15 days, it is highly probable that it is a powered-down host.
Just a suggestion.


If the Project Staff were in the lab, and not working from home, then it might be worthwhile.

CA is still on lockdown as far as I know (https://covid19.ca.gov/), so they may want to wait until they are back.

Yes, I know that; we are on lockdown too, but in these times you don't need to be physically close to the servers to work with them.
And AFAIK the lab guys are already writing and running some scripts remotely.


They might decide to cancel Tasks in the next week or so if they get a Canonical Result, which might upset a few people who have cached thousands of WUs :) but it might upset some very slow crunchers too.
Are people like the "1337" guy, and a few others, here to get the science completed as quickly as possible, or are they just here for the BOINC credits?
ID: 2044993
AllgoodGuy

Send message
Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2044996 - Posted: 15 Apr 2020, 16:27:48 UTC - in response to Message 2044990.  
Last modified: 15 Apr 2020, 16:52:16 UTC

While one doesn't need to be physically close to the servers to do work on them, one does need a reliable, solid connection, probably needs to be on the correct side of the Uni's firewall, and needs the correct set of tools at hand to do the job. It may also depend on the terms of furlough they are on. For example, I could well do 70% of my job from home, and indeed have done so since the middle of last year, but I am furloughed and the terms of my furlough state that I shouldn't even read my work emails, let alone do any of my real job.

Hard to say about UCB, but California's "so-called" lockdown really isn't one. People are still working, just in smaller numbers to meet the social distancing requests. Do you guys not know that California is the most liberal state in the US? Most of the shelter-in-place orders are "recommendations." Berkeley is down the road a ways from the hot spot of California... Santa Clara County. I would lay bets that at least one of these people is going into the shop daily; my guess would be Eric. And working remotely really isn't a big issue. I ran 7 data centers around the world, and managed an average of 17 Gbps to those data centers in India, Ireland, and the US from a remote spot in Spanish Fork, Utah. I live in what is described as the 394th best connected city in California (pop. 22K), and my connection is 1 Gbps down, 50 Mbps up. I don't think they, sitting in the 'burbs of the 8th largest city in California (Oakland, pop. 429K; Berkeley itself is the 51st largest at 121K, all in the Bay Area), less than 50 miles from Silicon Valley (San Jose, 1M+), really have any difficulty working from home.
ID: 2044996
Ian&Steve C.
Avatar

Send message
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2044997 - Posted: 15 Apr 2020, 16:32:04 UTC - in response to Message 2044993.  
Last modified: 15 Apr 2020, 16:39:05 UTC

I'd say the 1337 guy and others are still doing the work faster than the majority of casual users who download hundreds of tasks and then turn their computer off for weeks at a time, or who have a system so slow that they can only do a handful of tasks every day but still download hundreds of tasks anyway.

Projects that employ short deadlines don't have to deal with the bloat on their servers that comes from users like that. Task distribution absolutely should scale to the relative speed of the system: fast systems should be allotted more work than others since they can get through it faster. My fastest system was capable of over 20,000 WUs per day; there's no reason I shouldn't be able to cache more than someone running a Raspberry Pi or other similarly slow system.
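A hypothetical sketch of that kind of scaling; the function, the one-day cache and the cap are illustrative choices, not BOINC's actual scheduler policy:

    # Illustrative only: allot cached work in proportion to a host's
    # measured throughput, with a server-side cap. Not BOINC's real logic.
    def tasks_to_allot(measured_wus_per_day: float,
                       cache_days: float = 1.0,
                       hard_cap: int = 25_000) -> int:
        """Roughly `cache_days` worth of work, never more than the cap."""
        return min(int(measured_wus_per_day * cache_days), hard_cap)

    print(tasks_to_allot(20_000))   # fast rig from the post: 20000 tasks
    print(tasks_to_allot(25))       # Raspberry Pi-class host: 25 tasks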

I'm really enjoying GPUGRID for these reasons. 5 day deadlines FTW
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2044997
Profile Siran d'Vel'nahr
Volunteer tester
Avatar

Send message
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 2044998 - Posted: 15 Apr 2020, 16:32:42 UTC - in response to Message 2044993.  

Maybe a wise thing to do is to abort, and resend with a short deadline, all the WUs from hosts whose last contact was before April 1.
If a host has not contacted the servers in the last 15 days, it is highly probable that it is a powered-down host.
Just a suggestion.


If the Project Staff were in the lab, and not working from home, then it might be worthwhile.

CA is still on lockdown as far as I know (https://covid19.ca.gov/), so they may want to wait until they are back.

Yes, I know that; we are on lockdown too, but in these times you don't need to be physically close to the servers to work with them.
And AFAIK the lab guys are already writing and running some scripts remotely.


They might decide to cancel Tasks in the next week or so if they get a Canonical Result, which might upset a few people who have cached thousands of WUs :) but it might upset some very slow crunchers too.
Are people like the "1337" guy, and a few others, here to get the science completed as quickly as possible, or are they just here for the BOINC credits?

Hi Keith,

The answer should be obvious, CREDITS! lol ;) Every time I see that SETI has tasks to go out, I hit Update and try to get a few of them, but alas, the scheduler backs me off for 30 minutes again and again. :(
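For what it's worth, those repeated 30-minute deferrals behave like a capped backoff from the scheduler. The sketch below is a generic capped, jittered backoff for illustration only; it is not BOINC's actual deferral policy, and the 30-minute cap simply mirrors the delay mentioned above.

    # Generic capped exponential backoff with jitter, for illustration only;
    # BOINC's real request-deferral policy differs in detail.
    import random

    def backoff_delays(base_seconds: float = 60.0,
                       cap_seconds: float = 30 * 60):
        """Yield successive wait times, doubling up to the cap."""
        delay = base_seconds
        while True:
            # Jitter spreads out clients so they don't all retry at once.
            yield min(delay, cap_seconds) * random.uniform(0.5, 1.0)
            delay *= 2

    delays = backoff_delays()
    print([round(next(delays)) for _ in range(6)])  # grows toward ~1800 s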

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 2044998
Ian&Steve C.
Avatar

Send message
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2044999 - Posted: 15 Apr 2020, 16:39:42 UTC - in response to Message 2044998.  

The answer should be obvious, CREDITS! lol ;)


SETI gives low credits compared to other projects. If someone were only after BOINC credits, they'd crunch something useless like Collatz...
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2044999
Profile Keith T.
Volunteer tester
Avatar

Send message
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 2045003 - Posted: 15 Apr 2020, 16:47:49 UTC - in response to Message 2044999.  

The answer should be obvious, CREDITS! lol ;)


SETI gives low credits compared to other projects. If someone were only after BOINC credits, they'd crunch something useless like Collatz...


I've never attached to Collatz, but I did visit their website, and the admin has recommended that people switch to other projects for the moment! https://boinc.thesonntags.com/collatz/forum_thread.php?id=167
ID: 2045003