Posts by Freewill

41) Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118) (Message 2033374)
Posted 21 Feb 2020 by Freewill (Project Donor)
Post:
Should they be denied the chance to participate in something they find interesting, just because the 24/7 club don't like it when they can't get thousands of tasks every day?

No, everyone should be able to participate as much as they wish to. I just wish the servers and database could accommodate all the interest. Perhaps setting the number of tasks based on average turnaround time would account for both machine speed and uptime. For example, someone running 1 hr/day on CPU only should need fewer tasks to reach an average turnaround of, say, 10 days than someone with 8 x 2080 Tis.
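
To put rough numbers on that idea, here is a minimal sketch. It assumes a host works through its cache in order, so the average turnaround is roughly the cache size divided by the tasks finished per day. The function name and the per-task run times are made up for illustration; they are not project figures.

```python
def tasks_for_turnaround(task_seconds, devices, hours_per_day, target_days):
    """Cache size that gives roughly target_days average turnaround.

    Assumes every device runs one task at a time and all tasks take
    task_seconds (a simplification; real run times vary).
    """
    tasks_per_day = devices * hours_per_day * 3600 / task_seconds
    return round(tasks_per_day * target_days)


# Hypothetical hosts: a CPU-only machine on 1 hr/day with 1-hour tasks,
# and an 8-GPU machine running 24/7 with 1-minute tasks.
casual = tasks_for_turnaround(task_seconds=3600, devices=1, hours_per_day=1, target_days=10)
gpu_rig = tasks_for_turnaround(task_seconds=60, devices=8, hours_per_day=24, target_days=10)
print(casual, gpu_rig)
```

Under these assumed run times, the casual host needs only a handful of tasks while the GPU host needs orders of magnitude more for the same turnaround target.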

If I run out of tasks, at least my 24/7 club dues will go down for the month. :)
42) Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118) (Message 2033333)
Posted 21 Feb 2020 by Freewill (Project Donor)
Post:
While it MAY have more work than can be processed (a claim for which there is NO evidence), if there is a problem delivering that work to the users then it makes no sense to attempt to grab all one can, and so turn the average user away because they can't get work due to the greed of a very vocal minority.

I seem to recall from another thread that SAH is only taking a few percent of the Breakthrough Listen data from Green Bank. That's my evidence. Plus, as far as I recall, we have never run out of tapes since I've been here. Regardless, the servers cannot dish out the stack of tapes they have loaded: everyone's caches are dropping, yet I see plenty of tapes mounted and unprocessed.
43) Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118) (Message 2033316)
Posted 21 Feb 2020 by Freewill (Project Donor)
Post:
Hi Siran,
The project has far more work than can be processed. Even with current computing power, the database and servers cannot handle the load during this "steady state" period between outages. The spoofing just helps fast PCs keep processing during an outage. At times like today, I don't think it's a problem, other than results out in the field. That number is only about 1/3 of results returned and awaiting validation. I get no more priority for task downloads than you do on a Pi system. Every 5 min, we each get a shot.

You may recall that when they increased the tasks per CPU and GPU from 100 to 300(?), the system jammed up. Even with moderate GPUs, that is not a lot. They need to find and address the root cause. Various solutions have been suggested and I'm sure they've considered all of them. I'm ready to contribute some $ if they'll just tell us what they need.

Roger
44) Message boards : Number crunching : SETI/BOINC Milestones [ v2.0 ] - XXIX (Message 2031313)
Posted 8 Feb 2020 by Freewill (Project Donor)
Post:
Nice work, Wiggo!
45) Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118) (Message 2031259)
Posted 7 Feb 2020 by Freewill (Project Donor)
Post:
+1
May not be easy to implement, but makes sense! I agree with Ville Saari. Errors can happen for many reasons, including me making a bad edit in an XML file :) but Invalids need to be driven to zero.
46) Message boards : Number crunching : Don't know where it should go? Stick it here! (Message 2031195)
Posted 7 Feb 2020 by Freewill (Project Donor)
Post:
Cats are good people.

Yes, they are. We are currently servants to a couple of 6-year-olds.
47) Message boards : Number crunching : SETI/BOINC Milestones [ v2.0 ] - XXIX (Message 2030782)
Posted 4 Feb 2020 by Freewill (Project Donor)
Post:
Congratulations, Mr. Kevvy! Huge milestone. At least we know ET wasn't there at that time. Onward!
48) Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118) (Message 2030424)
Posted 2 Feb 2020 by Freewill (Project Donor)
Post:
Running empty again. Shooting down the host to save some electric power.


. . Following Grumpy's example then ? :)

Stephen

It seems a bit extreme unless you wanted to build a new system anyway. ;)
49) Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118) (Message 2030350)
Posted 1 Feb 2020 by Freewill (Project Donor)
Post:
What if the aliens are gumming up the system because we're close to finding them? Hmmm.
50) Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118) (Message 2030345)
Posted 1 Feb 2020 by Freewill (Project Donor)
Post:
Just started getting "Scheduler request failed: Timeout was reached" notices.
51) Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118) (Message 2030284)
Posted 1 Feb 2020 by Freewill (Project Donor)
Post:
Thanks, Richard!
52) Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118) (Message 2030278)
Posted 1 Feb 2020 by Freewill (Project Donor)
Post:
Just got a bunch of WUs now, but all are resends _2 or higher.
But downloading them, now that is another thing :-)
I managed to score 50 resends on one of my systems. When they finally downloaded, all done in under 4 minutes. Only 3 of them weren't noise bombs.

How does one tell if the jobs are resends?
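
Going by the quoted remark that resends are "_2 or higher", one way to check is to look at the number after the last underscore in the task name. This is a minimal sketch. It assumes task names end in `_N` and that a workunit initially goes out as `_0` and `_1`, so anything from `_2` upward is a resend. The helper names and the example task names are hypothetical.

```python
import re


def result_index(task_name):
    """Return the trailing _N index of a task name, or None if there is none."""
    match = re.search(r"_(\d+)$", task_name)
    return int(match.group(1)) if match else None


def is_resend(task_name, initial_replication=2):
    """Treat indexes at or above the assumed initial replication as resends."""
    index = result_index(task_name)
    return index is not None and index >= initial_replication


# Hypothetical task names, for illustration only.
for name in ("example_workunit.vlar_0", "example_workunit.vlar_1", "example_workunit.vlar_2"):
    print(name, "resend" if is_resend(name) else "original")
```

If the project sends out a different number of initial copies, `initial_replication` would need to change to match.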
53) Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118) (Message 2030099)
Posted 31 Jan 2020 by Freewill (Project Donor)
Post:
Quorum of 1 Bad Example. Here's a failure of the Approach. The PC with the validated result has 461 invalids, but returned ahead of my PC by 7 seconds! It has no stderr output, so what did it return?

https://setiathome.berkeley.edu/workunit.php?wuid=3860444977

Curious Example. Saw this from my overnight run. Both my PC and the Apple found the same number of Autocorrs and Pulses, with similar peaks and times (rounding error diffs?). The Apple got credit because it was returned first. It's not clear whether one or both are correct. Our track records are similar in numbers; mine is better in percentage of valid tasks.

https://setiathome.berkeley.edu/workunit.php?wuid=3859724613

I don't understand how the first PC above was allowed to have a quorum of 1. In general, this seems to be introducing questionable results into the science database. More importantly, it may reject an ET result.
54) Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118) (Message 2029959)
Posted 30 Jan 2020 by Freewill (Project Donor)
Post:
To add a few more data points, none of my hosts are currently showing invalid tasks.
55) Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118) (Message 2029357)
Posted 26 Jan 2020 by Freewill (Project Donor)
Post:
... And just maybe release their wish list for better hardware that can handle the loads Seti will be dealing with in the future ...
Just savoring the irony of those messages last fall that the workload was increasing and they needed more folks doing more work to keep up with supply.
"Build it and they will come ..." or is that "Be careful what you ask for. You just might get it."?

Clearly, the volunteer processing power is nowhere close to the bottleneck at present. Half my crunching capacity is unfed at the moment.
56) Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118) (Message 2029220)
Posted 25 Jan 2020 by Freewill (Project Donor)
Post:
For Me the problems with the Failing Uploads seem to be getting Worse. This morning I found all machines, except the fastest one, working fine. I found the top Mining machine was clogged with failed Uploads, dozens of them. The only machine without any Uploads waiting on retries was the slowest one. Trying to clear the Uploads on the one machine also Failed, countless times. I tried everything, then tried using my USB/Ethernet adapter which finally allowed the Uploads to clear. But, even with the USB adapter I now have an average of 6 retries waiting on that machine. It seems if you get very many they just Fail altogether and then rapidly start piling up until the Downloads stop. At that point it becomes difficult to get the Uploads to clear.
It's Not getting any better...

I have seen a few uploads go into retry for a few minutes on each machine. They clear when I hit retry or clear themselves if I'm not logged on. My hosts have slowly been refilling their caches. Here's the event log info for a recent one:
Sat 25 Jan 2020 03:19:00 PM EST |  | Project communication failed: attempting access to reference site
Sat 25 Jan 2020 03:19:00 PM EST | SETI@home | Temporarily failed upload of blc35_2bit_guppi_58691_86094_HIP80163_0111.7431.409.22.45.44.vlar_2_r1267369358_0: transient HTTP error
Sat 25 Jan 2020 03:19:00 PM EST | SETI@home | Backing off 00:03:34 on upload of blc35_2bit_guppi_58691_86094_HIP80163_0111.7431.409.22.45.44.vlar_2_r1267369358_0
Sat 25 Jan 2020 03:19:01 PM EST |  | Internet access OK - project servers may be temporarily down.

I hadn't really seen this until today.
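
For anyone who wants to tally these retries rather than eyeball them, here is a minimal sketch that pulls the backoff times out of event log text. It only assumes the pipe-separated layout visible in the four lines above (timestamp, project, message); the function names are made up for illustration and are not part of the BOINC client.

```python
import re
from datetime import timedelta

# The four event log lines quoted above, copied verbatim.
LOG = """\
Sat 25 Jan 2020 03:19:00 PM EST |  | Project communication failed: attempting access to reference site
Sat 25 Jan 2020 03:19:00 PM EST | SETI@home | Temporarily failed upload of blc35_2bit_guppi_58691_86094_HIP80163_0111.7431.409.22.45.44.vlar_2_r1267369358_0: transient HTTP error
Sat 25 Jan 2020 03:19:00 PM EST | SETI@home | Backing off 00:03:34 on upload of blc35_2bit_guppi_58691_86094_HIP80163_0111.7431.409.22.45.44.vlar_2_r1267369358_0
Sat 25 Jan 2020 03:19:01 PM EST |  | Internet access OK - project servers may be temporarily down.
"""

BACKOFF = re.compile(r"Backing off (\d+):(\d+):(\d+) on upload of (\S+)")


def parse_line(line):
    """Split one log line into (timestamp, project, message)."""
    timestamp, project, message = (part.strip() for part in line.split("|", 2))
    return timestamp, project, message


def upload_backoffs(log_text):
    """Map each upload file name to the backoff delay reported for it."""
    backoffs = {}
    for line in log_text.splitlines():
        if line.count("|") < 2:
            continue  # not a log line in the expected layout
        _, _, message = parse_line(line)
        match = BACKOFF.search(message)
        if match:
            hours, minutes, seconds, filename = match.groups()
            backoffs[filename] = timedelta(
                hours=int(hours), minutes=int(minutes), seconds=int(seconds)
            )
    return backoffs


backoffs = upload_backoffs(LOG)
for name, delay in backoffs.items():
    print(name, int(delay.total_seconds()), "seconds")
```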
57) Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118) (Message 2029013)
Posted 24 Jan 2020 by Freewill (Project Donor)
Post:
This will be a cold weekend if they don't kick the servers into action.
58) Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118) (Message 2028908)
Posted 24 Jan 2020 by Freewill (Project Donor)
Post:
I'm pretty sure my (idle) 2070 Supers are faster than the i7-5820K. ;)
59) Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118) (Message 2028887)
Posted 24 Jan 2020 by Freewill (Project Donor)
Post:
Getting "No tasks available" with 770k ready to send on the server. I gave up trying to understand the logic behind this...

100 other hosts beat you to the RTS buffer first.


. . :)

. . There are clearly still issues to be worked out ... :(

. . Unless those 100 hosts got 8,000 WUs each :)

Stephen

? ?

The actual buffer that you download from only holds about 250 tasks at any time. It is constantly replenished from the ready-to-send (RTS) buffer that shows on the server status page (SSP). If 5 hosts hit it at the same time and each get 50 tasks, it empties very fast before refilling. If your request comes in after it was emptied, you get the "no tasks are available" message.

I don't understand why they don't make the 250 task buffer larger. Seems to be a bottleneck.
Can't get enough jobs to keep my top machine's GPUs busy. Why does it keep giving that one CPU tasks?
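
The small-buffer effect described in the quoted explanation can be sketched in a few lines. This is a toy model only: the 250-task buffer, the 50-task requests, and the 770k ready-to-send figure come from the posts above, while the class and its methods are hypothetical and are not the real server code.

```python
class FeederBuffer:
    """Toy model of a small download buffer fed from a large ready-to-send pool."""

    def __init__(self, capacity=250):
        self.capacity = capacity
        self.available = capacity

    def request(self, wanted):
        """Hand out up to `wanted` tasks; 0 means 'no tasks are available'."""
        granted = min(wanted, self.available)
        self.available -= granted
        return granted

    def refill(self, ready_to_send):
        """Top the buffer up from the pool and return what is left in the pool."""
        moved = min(self.capacity - self.available, ready_to_send)
        self.available += moved
        return ready_to_send - moved


buffer = FeederBuffer(capacity=250)
ready_to_send = 770_000

# Six hosts each ask for 50 tasks before the next refill happens.
granted = [buffer.request(50) for _ in range(6)]
print(granted)  # [50, 50, 50, 50, 50, 0]

ready_to_send = buffer.refill(ready_to_send)
print(buffer.available, ready_to_send)
```

In this model the sixth host goes away empty-handed even though hundreds of thousands of tasks are waiting in the pool, which matches the "No tasks available" experience. A larger `capacity` would let more requests succeed between refills.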
60) Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118) (Message 2028362)
Posted 18 Jan 2020 by Freewill (Project Donor)
Post:
I'm now getting messages that the project is down for maintenance. Looks like someone came in on their weekend to kick the servers.

*edit* Just saw Mr. Kevvy's message right after I posted. Great to hear they're on the job. And gave us a brief update. :)




 