Panic Mode On (99) Server Problems?

Message boards : Number crunching : Panic Mode On (99) Server Problems?

Profile Zombu2
Volunteer tester

Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1710534 - Posted: 9 Aug 2015, 19:08:23 UTC - in response to Message 1710485.  

I'd expect they have some script running that warns them when the number of results/tasks waiting for 'X' passes magic number N, and that it then plays a trumpet at them. Perhaps the sound's been turned off.


hrhr the warning script is us, since we're the first to bitch and moan if something is wrong
I came down with a bad case of i don't give a crap
ID: 1710534 · Report as offensive
Profile Gary Charpentier · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 30640
Credit: 53,134,872
RAC: 32
United States
Message 1710554 - Posted: 9 Aug 2015, 20:33:45 UTC - in response to Message 1710525.  

Roll on the release of the OpenCL application for VLARs on GPUs.

Thinking aloud - would it be possible to make that application a special application only for use against VLARs? This might overcome the performance hit that was being suffered in Beta when using the application for normals and shorties.

There is a CPU App sitting at Beta right now that would basically Double the number of VLARs processed per hour in OSX. Right now the FLOPS reading is;
Mac OS X/Intel 7.00 29 May 2013, 21:14:00 UTC 30,342 GigaFLOPS
That number would Double as soon as the CPU App is released, that's a Lot of VLARs. Yet, the App just sits at Beta even though not a Single Machine has had a problem with it in the Months it's been there.

Strange isn't it...

And everyone is free to join Beta, have it downloaded, and now change main to anonymous platform and crunch with it.

Charlie, the Mac programmer, was being funded by BOINC. As BOINC is broke ......
ID: 1710554 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1710567 - Posted: 9 Aug 2015, 20:57:37 UTC - in response to Message 1710554.  

Roll on the release of the OpenCL application for VLARs on GPUs.

Thinking aloud - would it be possible to make that application a special application only for use against VLARs? This might overcome the performance hit that was being suffered in Beta when using the application for normals and shorties.

There is a CPU App sitting at Beta right now that would basically Double the number of VLARs processed per hour in OSX. Right now the FLOPS reading is;
Mac OS X/Intel 7.00 29 May 2013, 21:14:00 UTC 30,342 GigaFLOPS
That number would Double as soon as the CPU App is released, that's a Lot of VLARs. Yet, the App just sits at Beta even though not a Single Machine has had a problem with it in the Months it's been there.

Strange isn't it...

And everyone is free to join Beta, have it downloaded, and now change main to anonymous platform and crunch with it.

Charlie, the Mac programmer, was being funded by BOINC. As BOINC is broke ......

Yes, Everyone can install apps under anonymous platform, yet percentage wise very few do. I was responding to someone inquiring about an App to work VLARs faster, seems they can play havoc with the system when they are abundant. Charlie is not needed in this instance. The App has existed for Months, has Passed all tests, and is ready to be moved from Beta to Main. I believe it is a simple matter for someone, say Eric, to place the Apps on Main and create a couple new Plan Classes. No One from BOINC is needed.
If VLARs are causing problems, there is help available Now.
ID: 1710567 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1710575 - Posted: 9 Aug 2015, 21:13:08 UTC - in response to Message 1710567.  

There are apps for both CPU and GPU on Beta for VLARs. The plan class already exists for the GPU there.

I can't speak for the CPU like TBar does, as he is much more familiar with those.

The GPU app was said to need more testers, though not on Nvidia GPUs, as they had plenty of those.
ID: 1710575 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1710729 - Posted: 10 Aug 2015, 7:12:00 UTC - in response to Message 1710431.  
Last modified: 10 Aug 2015, 7:17:35 UTC

I'd reckon the file deletion program is returning an error for result files associated with a ResultID >= 2^32, accounting for the line

Result files waiting for deletion 7,847,326

on the server status page. And if the result file can't be deleted, the result status in the database can't transition to "ready to purge". So both the file storage area on the server disks is filling up (slowly, these are small files), and the table rowcount in the database is growing inexorably. Better put that on their ToDo list for Tuesday.

Maybe that will also account for the Database queries being 3 times higher than they usually are (3,000/s v 1,000/s)?


EDIT- and do we know if there is any particular reason for the lack of a 2nd download server? Been getting quite a few stalled downloads lately. Eventually they do download, but the caches tend to run down significantly by the time that occurs.
Grant
Darwin NT
ID: 1710729 · Report as offensive
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1710732 - Posted: 10 Aug 2015, 7:21:51 UTC

EDIT- and do we know if there is any particular reason for the lack of a 2nd download server? Been getting quite a few stalled downloads lately. Eventually they do download, but the caches tend to run down significantly by the time that occurs.


I might be wrong but I thought that it had been used as the new BOINC server, after that crashed last weekend.
ID: 1710732 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1710748 - Posted: 10 Aug 2015, 8:32:40 UTC - in response to Message 1710729.  

I'd reckon the file deletion program is returning an error for result files associated with a ResultID >= 2^32, accounting for the line

Result files waiting for deletion 7,847,326

on the server status page. And if the result file can't be deleted, the result status in the database can't transition to "ready to purge". So both the file storage area on the server disks is filling up (slowly, these are small files), and the table rowcount in the database is growing inexorably. Better put that on their ToDo list for Tuesday.

Maybe that will also account for the Database queries being 3 times higher than they usually are (3,000/s v 1,000/s)?

I suggested that to Eric last night as a possible line of enquiry, especially since the file deleter program has a 'retry after 1 hour' default error handler. He says "Could be. The deleter is certainly getting a lot of "no file to delete" errors."

But no progress apart from that. I guess we may have to live with this until maintenance day.
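
A minimal sketch of the kind of failure being guessed at above, assuming (and this is only the hypothesis from the quoted post, not a confirmed diagnosis) that some step stores the ResultID in an unsigned 32-bit integer. The function names are made up for illustration and are not taken from the BOINC file deleter.

```python
# Hypothetical illustration only: this is not the project's file deleter code.
UINT32_MODULUS = 2 ** 32  # number of distinct values in an unsigned 32-bit integer


def truncate_to_uint32(result_id: int) -> int:
    """Return the value that survives when an ID is stored in 32 unsigned bits."""
    return result_id % UINT32_MODULUS


def id_survives_32_bit_storage(result_id: int) -> bool:
    """True if the ID is unchanged after being squeezed into 32 bits."""
    return truncate_to_uint32(result_id) == result_id


last_safe_id = UINT32_MODULUS - 1   # 4,294,967,295
first_affected_id = UINT32_MODULUS  # 4,294,967,296

print(id_survives_32_bit_storage(last_safe_id))   # True
print(truncate_to_uint32(first_affected_id))      # 0
print(truncate_to_uint32(first_affected_id + 5))  # 5
```

If something like this were happening, a lookup built from the truncated ID would point at a file name that does not exist, which would at least be consistent with the "no file to delete" errors mentioned.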
ID: 1710748 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1710752 - Posted: 10 Aug 2015, 8:48:36 UTC - in response to Message 1710492.  

I just sent in a report that the security of the web sites (BOINC and Seti) is a bit lacking, in that Poodle can attack, the certificate is going to be blocked next year and such fun things. I think that with stuff like that they turn me up. ;-)

Poodle can no longer attack.
As for the certificates, that's going to be looked into soon, Matt promised me.

At present, the BOINC and Seti web sites use an on-campus generated certificate which is SHA-1 only. SHA-1 is going to be phased out, by some already next year, by others after 2016. Anything still running SHA-1 by then will not be reachable with current browsers.
ID: 1710752 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1710892 - Posted: 10 Aug 2015, 13:59:14 UTC - in response to Message 1710732.  


I might be wrong but I thought that it had been used as the new BOINC server, after that crashed last weekend.



So, just wondering: since Matt had to move one of the backup servers to BOINC to help them out, is that in any way affecting service on Seti?

I know Matt also commented on not having a budget to get new components, so does that mean he is in need of a new server to replace the backup one that he moved?

Just thinking out loud......
ID: 1710892 · Report as offensive
Profile Zombu2
Volunteer tester

Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1710910 - Posted: 10 Aug 2015, 14:11:48 UTC - in response to Message 1710892.  


I might be wrong but I thought that it had been used as the new BOINC server, after that crashed last weekend.



So, just wondering: since Matt had to move one of the backup servers to BOINC to help them out, is that in any way affecting service on Seti?

I know Matt also commented on not having a budget to get new components, so does that mean he is in need of a new server to replace the backup one that he moved?

Just thinking out loud......


What kinda hardware specs would we be looking at? I'm pretty sure that if we need some sort of machine, we could all throw in together and get something used off eBay.

just an idea though
I came down with a bad case of i don't give a crap
ID: 1710910 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1710925 - Posted: 10 Aug 2015, 14:20:06 UTC

Perhaps a fundraiser for a small VM machine they would move some light weight services to. Then when a box dies it doesn't require a lot of work to get things going. Just restart the VM.

A larger VM setup could also be helpful for those times when they get an OS kernel panic & a machine locks up. With a VM server monitoring things it would reboot a locked up machine.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1710925 · Report as offensive
Profile Zombu2
Volunteer tester

Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1710930 - Posted: 10 Aug 2015, 14:24:37 UTC - in response to Message 1710925.  
Last modified: 10 Aug 2015, 14:25:45 UTC

Perhaps a fundraiser for a small VM machine they would move some light weight services to. Then when a box dies it doesn't require a lot of work to get things going. Just restart the VM.

A larger VM setup could also be helpful for those times when they get an OS kernel panic & a machine locks up. With a VM server monitoring things it would reboot a locked up machine.


Some older machines can be bought on eBay for little coin. I'm sure we could come up with $1,000 or $1,500 to buy one of these.

$1,000-1,500 would buy a pretty decent server... it's used, but it is something.

The Dell R910 I just bought was 200-something bucks, 24 cores, 32 gigs of RAM, which I then upgraded to hex-core CPUs and a lil more RAM.
I came down with a bad case of i don't give a crap
ID: 1710930 · Report as offensive
Profile JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1710962 - Posted: 10 Aug 2015, 15:03:27 UTC

Just a question with no disrespect intended: when is Yuri's funding going to start, or is that strictly for purchasing observing time?

Seems to me a few thousand dollars of that enormous endowment would be well spent upgrading equipment and software for more efficient analysis and distribution of the data.

Again, just a question.

"Sour Grapes make a bitter Whine." <(0)>
ID: 1710962 · Report as offensive
Profile SciManStev · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Jun 99
Posts: 6652
Credit: 121,090,076
RAC: 0
United States
Message 1710965 - Posted: 10 Aug 2015, 15:07:27 UTC

I may have missed this, as I only read up 20 or so posts, but I have 348 tasks in progress, and normally I would only have 300: 100 for the CPU, and 100 each for my GPUs. Did something change?

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1710965 · Report as offensive
Profile ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1710971 - Posted: 10 Aug 2015, 15:23:59 UTC - in response to Message 1710965.  

I may have missed this, as I only read up 20 or so posts, but I have 348 tasks in progress, and normally I would only have 300: 100 for the CPU, and 100 each for my GPUs. Did something change?

Yes, you added a computer... 7715048 has 300 tasks, 5483835 has 48.
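
The arithmetic behind that answer, as a small sketch. The task counts are the ones quoted in the post; the limit of 100 tasks for the CPU plus 100 per GPU comes from the question, and the GPU count of two is only inferred from the 300 total. `per_host_limit` is a made-up helper, not a BOINC function.

```python
# Figures quoted in the posts above; the host IDs are used only as labels.
tasks_in_progress = {
    7715048: 300,
    5483835: 48,
}


def per_host_limit(gpu_count: int, per_device: int = 100) -> int:
    """Assumed limit: per_device tasks for the CPU plus per_device for each GPU."""
    return per_device * (1 + gpu_count)


# The account-wide total is the sum over hosts, so a second host record
# pushes it past the limit that applies to any single host.
account_total = sum(tasks_in_progress.values())

print(account_total)      # 348
print(per_host_limit(2))  # 300
```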
ID: 1710971 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1710973 - Posted: 10 Aug 2015, 15:30:33 UTC - in response to Message 1710962.  

Just a question with no disrespect intended: when is Yuri's funding going to start, or is that strictly for purchasing observing time?

Seems to me a few thousand dollars of that enormous endowment would be well spent upgrading equipment and software for more efficient analysis and distribution of the data.

Again, just a question.

I would imagine that it would take several months to plan what money needs to go where & then it may be slated for a 2016 or 2017 budget for SETI@home.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1710973 · Report as offensive
Profile SciManStev · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Jun 99
Posts: 6652
Credit: 121,090,076
RAC: 0
United States
Message 1710975 - Posted: 10 Aug 2015, 15:34:40 UTC - in response to Message 1710971.  
Last modified: 10 Aug 2015, 15:36:31 UTC

I may have missed this, as I only read up 20 or so posts, but I have 348 tasks in progress, and normally I would only have 300: 100 for the CPU, and 100 each for my GPUs. Did something change?

Yes, you added a computer... 7715048 has 300 tasks, 5483835 has 48.

Rats. I kept the same folders intact when I upgraded to Win 10, but did have to jump from BOINC 6.10.58 to the current version, and I guess I overwrote my machine ID. At least I know what happened.

Thank you!

Steve

Edit: I was able to merge them.
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1710975 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1710983 - Posted: 10 Aug 2015, 16:13:06 UTC - in response to Message 1710479.  

I reckon the dB can hold out until Tuesday - we've been higher than this, with the WU table exploding too, and lived to tell the tale ...

Do you remember what the previous magic number was? We're now rapidly approaching 10 million results ready to delete.
ID: 1710983 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1710985 - Posted: 10 Aug 2015, 16:16:33 UTC - in response to Message 1710973.  

Just a question with no disrespect intended: when is Yuri's funding going to start, or is that strictly for purchasing observing time?

Seems to me a few thousand dollars of that enormous endowment would be well spent upgrading equipment and software for more efficient analysis and distribution of the data.

Again, just a question.

I would imagine that it would take several months to plan what money needs to go where & then it may be slated for a 2016 or 2017 budget for SETI@home.

I think a higher priority for the endowment funds might be to upgrade the staff - by hiring more of them, and maybe having someone whose fulltime first priority is keeping an eye on the smooth running of the project, and curing the little quirks shown up by internal monitoring or raised by us pesky minions.
ID: 1710985 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1710996 - Posted: 10 Aug 2015, 16:28:28 UTC - in response to Message 1710983.  

I reckon the dB can hold out until Tuesday - we've been higher than this, with the WU table exploding too, and lived to tell the tale ...

Do you remember what the previous magic number was? We're now rapidly approaching 10 million results ready to delete.

Quoting from 4 November 2012:

"Results out in the field" is currently 10,725,146 - an increase of 200,000 since this morning, and an increase of more than a million in the last 24 hours.

We were in Panic Mode On (78) at the time.

And the reason was that the splitters were running amok, while many users were being allocated ghosts. It took a long while (about a week, IIRC) to get all those ghosts as 'resent lost results', but cleaning up this spillage should be a lot quicker - the files can safely be deleted, and the results purged, with no further processing.

The difficult bit will be deciding which results are ready for treatment, since they're normally accessed via the workunit, and they've gone already.
ID: 1710996 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.