Taking over SETI rig management for my dad, settings and configurations?

Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1919362 - Posted: 17 Feb 2018, 3:01:00 UTC

yes, I would like to transfer the E5440 system to be able to use the SSE4.1 app, and the 750ti system to use the special sauce. I just need to do some more research into how to make these changes on Linux. I looked at the downloads and the Linux versions don't seem to have an "installer" like Windows does, just a compressed folder with some files inside, but i don't know what to do with them exactly or where to put them.


The E5440 Linux system won't be able to use the special app because the 760 doesn't have compute capability 5.0. You can still transition it over from the SSE3 CPU app to the SSE4.1 app.

You will have to write your own app_info.xml file to use the SSE4.1 application. You can just copy the CPU parts from the special app's app_info and edit them into a new app_info for the 760 machine. You will have to research how app_infos are constructed to get the SoG application in shape, though, since I don't think there is a stock app_info for the default Linux apps. Maybe there is; I don't know.

Anonymous platform
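
For anyone who has never seen an app_info.xml, here is a rough Python sketch that writes out the bare skeleton of a CPU-only one. It follows the general anonymous platform layout (app, file_info, app_version) as I understand it. The app name, version number and executable file name below are placeholders I am assuming for illustration, not values taken from the real download, so check them against the app_info that ships with the special app before using anything like this.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def build_app_info(app_name, exe_name, version_num):
    """Return a minimal CPU-only app_info.xml document as a string.

    All three arguments are supplied by the caller; the values used in the
    example at the bottom are hypothetical placeholders.
    """
    root = ET.Element("app_info")

    # Declare the application this anonymous-platform entry belongs to.
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name

    # Declare the executable file that lives in the project directory.
    file_info = ET.SubElement(root, "file_info")
    ET.SubElement(file_info, "name").text = exe_name
    ET.SubElement(file_info, "executable")

    # Tie the application to that executable.
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "version_num").text = str(version_num)
    file_ref = ET.SubElement(version, "file_ref")
    ET.SubElement(file_ref, "file_name").text = exe_name
    ET.SubElement(file_ref, "main_program")

    # Pretty-print the root element only, so no XML declaration is emitted.
    raw = ET.tostring(root, encoding="unicode")
    return minidom.parseString(raw).documentElement.toprettyxml(indent="    ")


if __name__ == "__main__":
    # Hypothetical values: replace with the real app name and file name.
    print(build_app_info("setiathome_v8", "example_sse41_cpu_app", 800))
```

A real file for the 760 machine would also need an app_version entry for the GPU application, which this sketch leaves out.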
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1919362
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1919363 - Posted: 17 Feb 2018, 3:17:02 UTC - in response to Message 1919360.  

The MOST important thing you need to do is to make sure the apps have the Permission and executable bits set in the properties of the files.

And empty your cache before making the change over- just set the machine to No new tasks.
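
On Linux the executable bit mentioned above is what `chmod +x` sets from a terminal. As a minimal sketch of the same thing in Python (the function name is my own, and the path you pass in would be wherever you unpacked the app files):

```python
import os
import stat


def make_executable(path):
    """Add the execute bit for user, group and others, like `chmod +x`.

    Returns the resulting permission bits (for example 0o755).
    """
    current = os.stat(path).st_mode
    updated = current | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    os.chmod(path, updated)
    return stat.S_IMODE(updated)


if __name__ == "__main__":
    import sys

    # Usage: python make_exec.py <app file> [<app file> ...]
    for app_path in sys.argv[1:]:
        print(app_path, oct(make_executable(app_path)))
```

The file also has to be readable by the account the BOINC client runs under, which is a separate check from the execute bit.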


how do you do this exactly? i see no setting in boinc called "no new tasks". the closest i can see is disabling the network connection.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1919363
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1919369 - Posted: 17 Feb 2018, 3:26:43 UTC - in response to Message 1919363.  

If you are using the Advanced View in the BOINC Manager, No New Tasks (which we abbreviate to NNT) is under the Projects tab. Select the SETI project and click No New Tasks. That stops the client from requesting any more work. Then you can process what you have onboard in your cache, report all of it, and then shut down the Manager and Client to start your editing and transition over to the new applications.
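
If you would rather do it from a terminal (handy on the Linux boxes), the boinccmd tool has `nomorework` and `allowmorework` project operations that do the same as the button. A small sketch that only builds the command by default; the project URL is an assumption and has to match the master URL your client is actually attached with:

```python
import subprocess
from typing import List

# Assumed master URL; check the Projects tab or `boinccmd --get_project_status`.
SETI_URL = "http://setiathome.berkeley.edu/"


def nnt_command(project_url: str, allow: bool = False) -> List[str]:
    """Build the boinccmd call for No New Tasks (or for allowing work again)."""
    operation = "allowmorework" if allow else "nomorework"
    return ["boinccmd", "--project", project_url, operation]


def set_no_new_tasks(project_url: str = SETI_URL, dry_run: bool = True) -> List[str]:
    """Return the command; only execute it when dry_run is False."""
    command = nnt_command(project_url)
    if not dry_run:
        # boinccmd must be able to reach the running client, e.g. by being
        # run from the BOINC data directory so it can read the RPC password.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    print(" ".join(set_no_new_tasks()))
```

Call `nnt_command(SETI_URL, allow=True)` to get the command that turns work fetch back on after the changeover.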
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1919369
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1919371 - Posted: 17 Feb 2018, 3:33:14 UTC

ah! i dont know how i missed that. i was looking all over
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1919371
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1919470 - Posted: 17 Feb 2018, 17:08:06 UTC

so should the -sbs be tied to GPU memory available?

all of my 760/750ti cards have 2GB VRAM, and the 1050ti cards have 4GB VRAM.

will going higher increase performance? i see grant recommending 1024 (1GB), but the readme suggests much smaller values in the range of 190-250MB or so.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1919470
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1919476 - Posted: 17 Feb 2018, 17:30:54 UTC - in response to Message 1919470.  

That document was written in the early days of the app, and the developer was erring on the side of caution with regard to how much any of the suggested tunings would impact the responsiveness of a target system. In other words, he was trying to avoid keyboard lag, which would put off anyone attempting to use his new app. He later came up with a better method, -use_sleep, for low-end cards.

I think that -sbs 1024 would be fine with all your cards as Grant suggests.
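
To put the numbers from the question side by side, here is a quick bit of arithmetic in Python. It assumes -sbs is given in MB (as the "1024 (1GB)" in the question implies) and that one task runs per GPU. It only shows what share of each card's VRAM the buffer setting would ask for, not how fast anything will run.

```python
# VRAM sizes as stated in the question above.
CARDS_VRAM_GB = {"GTX 760": 2, "GTX 750 Ti": 2, "GTX 1050 Ti": 4}


def sbs_share(sbs_mb, vram_gb):
    """Fraction of a card's VRAM that an -sbs value (in MB) represents."""
    return sbs_mb / (vram_gb * 1024)


if __name__ == "__main__":
    # 190 and 250 are the readme range from the question, 1024 is Grant's value.
    for sbs_mb in (190, 250, 1024):
        for card, vram_gb in CARDS_VRAM_GB.items():
            share = sbs_share(sbs_mb, vram_gb)
            print(f"-sbs {sbs_mb:>4} on {card:<11} ({vram_gb} GB): {share:.0%} of VRAM")
```

So -sbs 1024 is half of a 2GB card and a quarter of a 4GB card. If you run more than one task per GPU, each task has its own buffer, so scale accordingly.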
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1919476
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1919500 - Posted: 17 Feb 2018, 20:24:21 UTC

is there a shortage of work today?

Summit2 (Win7, E5-2680v2) is only crunching 4 CPU WUs and the normal 2x GPU WUs with a whole list of GPU tasks ready to go.
Summit3 (Win7, E5-2697v2) isn't doing any WUs and has nothing in the task list.

clicking update on the project doesn't seem to help. these systems should normally be doing about 38-46 CPU WUs each (set to 100% CPU usage and only allocating 2 cores for each GPU WU)
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1919500
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1919502 - Posted: 17 Feb 2018, 20:36:34 UTC - in response to Message 1919500.  

The project ran out of work about 4AM this morning. There was a short reprieve when they loaded 2 more tapes later in the morning, but those are about finished. With everyone out of work, there are no tasks to be downloaded and the RTS buffer was never able to build.

Unless they load more tapes this weekend the project will remain out of work. You should attach to a backup project with resource share of zero to fall back on when SETI runs out of work.

Or just turn off the computers and give your power bill a break.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1919502
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1919504 - Posted: 17 Feb 2018, 20:41:55 UTC

Got it, I was just making sure that it wasn't just me and that there wasn't something wrong with the systems.

I'll see if my dad wants to add a backup project. I see he at least at one time was crunching Enigma, but personally I think Einstein is more interesting.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1919504
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1919547 - Posted: 17 Feb 2018, 22:22:48 UTC - in response to Message 1919504.  

Got it, I was just making sure that it wasn't just me and that there wasn't something wrong with the systems.

I'll see if my dad wants to add a backup project. I see he at least at one time was crunching Enigma, but personally I think Einstein is more interesting.

If you select a backup project, set the resource share for it to 0.
That way BOINC will only run it if there is no Seti work available. Once work becomes available again it will finish off whatever other project work it is running & not request any more till the next time Seti runs out of work.
Me, I just use it as a chance to save some money on the power bill.
Grant
Darwin NT
ID: 1919547
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1919549 - Posted: 17 Feb 2018, 22:38:53 UTC - in response to Message 1919547.  

Got it, I was just making sure that it wasn't just me and that there wasn't something wrong with the systems.

I'll see if my dad wants to add a backup project. I see he at least at one time was crunching Enigma, but personally I think Einstein is more interesting.

If you select a backup project, set the resource share for it to 0.
That way BOINC will only run it if there is no Seti work available. Once work becomes available again it will finish off whatever other project work it is running & not request any more till the next time Seti runs out of work.
Me, I just use it as a chance to save some money on the power bill.

I go a step further with this by using venue location preferences here for my rigs so that they can grab their max caches' worth from this project, but my backup projects just use the default preferences, which are set to only grab half a day's work. ;-)

Cheers.
ID: 1919549
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1924526 - Posted: 14 Mar 2018, 17:10:21 UTC

updating progress, nearing 160k RAC, and almost 80mil total!

can't thank you guys enough for the help!
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1924526
Zalster (Special Project $250 donor)
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1924530 - Posted: 14 Mar 2018, 17:17:40 UTC - in response to Message 1924526.  

Congrats Steve
ID: 1924530
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1935586 - Posted: 14 May 2018, 2:06:47 UTC
Last modified: 14 May 2018, 2:09:28 UTC

another update:

Sierra-Prize2 system was retired and re-purposed as a FreeNAS box. It had been running for somewhere around 10-12 years or longer and had accumulated something around 30 million credits. The core components are now living the easy life as a NAS. I pulled the motherboard/CPUs and tossed them into a 2U 8-bay server case, maxed the RAM out to 32GB, and loaded up FreeNAS. Supermicro X7DA8+, 2x E5440 Xeon quad cores. I wanted to run SETI in a Windows VM on this box, on about half of the CPU, to earn some small number of credits, but unfortunately these CPUs don't support the level of virtualization needed by bhyve: they lack Extended Page Tables (EPT) support.

I also finally loaded up the "Special Sauce" Cuda90 app for the Linux Machines. just default settings for now, i haven't tweaked anything beyond what it comes with.

SIERRA-SUMMIT:
Ubuntu x64
Supermicro X7DA8+
2x Xeon E5440 2.83GHz 4-core
16GB DDR2 FBDIMM ECC
2x EVGA GTX 1050ti SSC

SIERRA-SPARE:
Ubuntu x64
Supermicro X9DRi-LN4F+ v1.10
2x Xeon E5-2690 2.9 GHz 8-core w/HT
16GB DDR3-10600 reg ECC
2x EVGA GTX 750Ti FTW

We'll see how these turn out.

But one thing I noticed is that the WUs from SIERRA-SUMMIT are being processed slower than those from SIERRA-SPARE, even though the 1050tis should be much more capable than the 750tis. Does anyone know why?

My only guess is PCIe bandwidth. but i thought SETI didn't care much about bandwidth?

The X7DA8+ board has
1x PCIe x16 (gen1)
1x PCIe x4 (gen1, in x16 slot size)
link to specs: http://www.supermicro.com/products/motherboard/xeon1333/5000X/X7DA8_.cfm
WUs from each card run more or less the same speed. if PCIe bandwidth was my problem i'd expect the second card in the x4 slot to run much slower. but it's not.

The X9DRi-LN4F+ board has
4x PCIe x16 (gen 3)
link to specs: https://www.supermicro.com/products/motherboard/Xeon/C600/X9DRi-LN4F_.cfm
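
For reference, the theoretical peak bandwidth of those slots can be worked out from the standard PCIe per-lane signalling rates and line encodings. This is a back-of-envelope sketch only: it gives raw link ceilings per direction, not what SETI actually moves over the bus.

```python
# Standard PCIe signalling rate per lane in GT/s and line-encoding efficiency.
GIGATRANSFERS_PER_S = {1: 2.5, 2: 5.0, 3: 8.0}
ENCODING_EFFICIENCY = {1: 8 / 10, 2: 8 / 10, 3: 128 / 130}


def lane_mb_per_s(gen):
    """Usable MB/s per lane, per direction, for a PCIe generation."""
    bits_per_s = GIGATRANSFERS_PER_S[gen] * 1e9 * ENCODING_EFFICIENCY[gen]
    return bits_per_s / 8 / 1e6


def link_gb_per_s(gen, lanes):
    """Theoretical GB/s per direction for a link of the given width."""
    return lane_mb_per_s(gen) * lanes / 1000


if __name__ == "__main__":
    slots = [
        ("X7DA8+ slot 1", 1, 16),
        ("X7DA8+ slot 2", 1, 4),
        ("X9DRi-LN4F+ slots", 3, 16),
    ]
    for name, gen, lanes in slots:
        print(f"{name}: gen{gen} x{lanes} = {link_gb_per_s(gen, lanes):.2f} GB/s")
```

That works out to about 4 GB/s and 1 GB/s for the two X7DA8+ slots versus roughly 15.75 GB/s per slot on the X9DRi-LN4F+.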

another thing that i noticed is that the SIERRA-SUMMIT system's RAC has been dropping steadily over the past month or so. it used to pull about 13k (while running a single GTX 760), and now it's back down to about 8k or so. I can't see anything obviously wrong with the system, seti is still crunching along, no errors, the WUs aren't erroring out.

is there any way to check why a system would start becoming less productive?
is there a way to organize the BOINC credit data better than what's shown on the BOINCstats page?

i'd love to be able to isolate my credits earned per day via CPU tasks and GPU tasks, and over greater time scales than 30 days. any way i can do that? it might help me troubleshoot.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1935586
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1935609 - Posted: 14 May 2018, 8:29:22 UTC - in response to Message 1935586.  
Last modified: 14 May 2018, 8:33:35 UTC

is there any way to check why a system would start becoming less productive?

Check the runtimes for the GPU & CPU work (although you can't really compare the run times for the stock applications and the optimized ones. You need to compare the same application run times for the same type of WU).
A quick look shows incredibly long run times for CPU work on the E5440. My Core 2 Duo when overloaded trying to feed 2 GPUs was able to do them in less than 6 hours. It's 14+ hours on that system.
I'd check its actual clock speed and temperature.
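
Since that box runs Ubuntu, one way to do that check is to read the per-core "cpu MHz" lines from /proc/cpuinfo and whatever thermal zones the kernel exposes under /sys/class/thermal. A minimal sketch (the function names are my own). On an older server board the thermal zones may simply not exist, in which case lm-sensors or the BMC is the place to look.

```python
import glob
import re


def parse_cpu_mhz(cpuinfo_text):
    """Extract the per-core 'cpu MHz' values from /proc/cpuinfo contents."""
    matches = re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, re.MULTILINE)
    return [float(value) for value in matches]


def read_cpu_mhz(path="/proc/cpuinfo"):
    """Read the current clock of every logical core (Linux only)."""
    with open(path) as handle:
        return parse_cpu_mhz(handle.read())


def read_thermal_zones(pattern="/sys/class/thermal/thermal_zone*/temp"):
    """Return {sysfs path: degrees C}; empty if no zones are exposed."""
    temperatures = {}
    for zone_path in sorted(glob.glob(pattern)):
        try:
            with open(zone_path) as handle:
                # The kernel reports millidegrees Celsius.
                temperatures[zone_path] = int(handle.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue
    return temperatures


if __name__ == "__main__":
    clocks = read_cpu_mhz()
    if clocks:
        print(f"cores: {len(clocks)}, min {min(clocks):.0f} MHz, max {max(clocks):.0f} MHz")
    for zone, temp_c in read_thermal_zones().items():
        print(f"{zone}: {temp_c:.1f} C")
```

Run it while the machine is crunching. Clocks sitting well below the 2.83GHz rating under full load would point at throttling.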


EDIT- have you changed any memory modules recently? We've had multi socket systems in the past that have had the memory in the wrong slots (for the number of modules & number of CPUs) resulting in a massive system memory I/O bottleneck. Just putting the memory in the channels specified in the manual made for a huge improvement in crunching times.
Grant
Darwin NT
ID: 1935609
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1935632 - Posted: 14 May 2018, 12:22:50 UTC - in response to Message 1935609.  


Check the runtimes for the GPU & CPU work (although you can't really compare the run times for the stock applications and the optimized ones. You need to compare the same application run times for the same type of WU).
A quick look shows incredibly long run times for CPU work on the E5440. My Core 2 Duo when overloaded trying to feed 2 GPUs was able to do them in less than 6 hours. It's 14+ hours on that system.
I'd check its actual clock speed and temperature.


how do i filter out only GPU work? usually when i check i have to sort through pages and pages of CPU work before i even see the GPU tasks that were validated


EDIT- have you changed any memory modules recently? We've had multi socket systems in the past that have had the memory in the wrong slots (for the number of modules & number of CPUs) resulting in a massive system memory I/O bottleneck. Just putting the memory in the channels specified in the manual made for a huge improvement in crunching times.


memory isn't improperly populated, as every memory slot is full lol. but i did add some more memory to this system, maybe around the time it started going downhill. let me pull that memory and see if it makes any difference.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1935632
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1935658 - Posted: 14 May 2018, 15:10:28 UTC - in response to Message 1935632.  

For me it is just the opposite: tons of GPU pages before I ever find a CPU task. You can either whack a huge number into the offset position in the URL of the first page of tasks, to get much deeper into your validated tasks list and hope a CPU task shows up, or use the BoincTasks History file to locate a CPU task (they are colored differently than GPU tasks) and then copy and paste the task name into the search box on the site for your validated tasks. For me that is usually much faster in the end.
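
The offset trick can be sketched like this. The parameter names (hostid, offset, show_names, state) and the meaning of state=4 as "valid" are what I recall from the task list URLs on the site, so treat them as assumptions and compare with the URL in your own browser. The host ID is a made-up placeholder.

```python
from urllib.parse import urlencode

BASE_URL = "https://setiathome.berkeley.edu/results.php"


def tasks_url(hostid, offset=0, state=4):
    """Build a task-list URL for one host, skipping `offset` tasks.

    state=4 is assumed to select validated tasks.
    """
    query = urlencode(
        {"hostid": hostid, "offset": offset, "show_names": 0, "state": state}
    )
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    # Hypothetical host ID; jump 2000 tasks deep into the validated list.
    print(tasks_url(1234567, offset=2000))
```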
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1935658
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1935661 - Posted: 14 May 2018, 15:31:30 UTC - in response to Message 1935658.  

does some of this history get "lost" or deleted over time? as in, is there some website where i can pull all the work units i've run in say the past year, and sort them by type? if not it's ok, I'm just curious if anything like this exists. I'm very much data driven with stuff like this. i like to see long timescale graphs.

but i think the problem with this machine is certainly the extremely long run times. i'm going to pull the memory that i added to this system back out. it's ECC, but it's possible that it's going bad.

how much does disk access matter? could a very slow disk speed impede performance? this system has the OS on 2x very old SCSI drives (raid-1), but the raid is still working fine according to the controller. when i was transferring files from a USB stick to this system (for the special sauce), it was obscenely slow. copying files from a USB drive at just 192kBps. whether that speed was limited by the CPU's ability to move the files around, or the disk's limit of receiving the files, i'm not sure.

i guess i have some things that i can check lol. tonight i'll pull the suspected faulty ram, re-paste the CPU TIM, and go from there.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1935661
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1935668 - Posted: 14 May 2018, 16:09:19 UTC

Disk access speed is a low contributor to SETI calculation performance. The only times the disk is accessed are during the first couple of seconds when the data is read, every time SETI does a snapshot (only a few bytes of data once a minute by default, or less often if you are worried about disk access performance), and when the run is completed (about 30 kBytes), so only a blink of an eye.

USB sticks tend to be very slow for bulk data moves, particularly if they are USB 1.x, and it sounds like yours is.
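
For a sense of scale, here is the arithmetic behind that, using the raw signalling rates of USB 1.1 full speed (12 Mbit/s) and USB 2.0 high speed (480 Mbit/s). Real-world throughput is always lower than these ceilings, and the 100 MB file size is just a made-up example.

```python
# Raw bus signalling rates in Mbit/s (protocol overhead not included).
USB_RAW_MBIT = {"USB 1.1 full speed": 12, "USB 2.0 high speed": 480}

OBSERVED_KB_PER_S = 192  # the copy speed reported earlier in the thread


def raw_ceiling_kb_per_s(mbit_per_s):
    """Convert a raw bus rate in Mbit/s to kB/s (1 kB = 1000 bytes)."""
    return mbit_per_s * 1_000_000 / 8 / 1000


def transfer_seconds(size_mb, rate_kb_per_s):
    """Time to move size_mb megabytes at a sustained rate in kB/s."""
    return size_mb * 1000 / rate_kb_per_s


if __name__ == "__main__":
    for name, mbit in USB_RAW_MBIT.items():
        ceiling = raw_ceiling_kb_per_s(mbit)
        print(f"{name}: ceiling {ceiling:.0f} kB/s, "
              f"observed is {OBSERVED_KB_PER_S / ceiling:.1%} of that")
    # Hypothetical 100 MB of files at the observed rate.
    print(f"100 MB at {OBSERVED_KB_PER_S} kB/s: "
          f"{transfer_seconds(100, OBSERVED_KB_PER_S) / 60:.1f} minutes")
```

The observed 192 kB/s sits under the 1500 kB/s raw ceiling of a 1.1 port and nowhere near what a 2.0 port can do.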

The work units and results are only kept on the "live" database for twenty four hours after validation - there are a few tools that can help you get data in that time, but I wouldn't recommend doing it as the amount of data you will accrue over the course of a year is simply vast. You could try one of the BOINC manager tools or SETI Sprint and see if that does what you are looking for.

Keith's approach is about as good as it gets for a manual data grab.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1935668
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1935674 - Posted: 14 May 2018, 18:00:31 UTC - in response to Message 1935668.  

hey thanks. you might be right, i didn't think the USB ports were so old to be 1.1, but i guess they are, which might explain the really slow speed.

but the memory seems to be a prime suspect right now.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1935674