High performance Linux clients at SETI

Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 1990316 - Posted: 16 Apr 2019, 23:55:06 UTC

Okay, I'm excited to try this, but this is the first time I'm updating an existing special app installation. I have the BOINC.7z package and understand about updating my nvidia drivers. I also got the part about updating the boinc files.

Can someone let me know which of the files from the setiathome.berkeley.edu folder need to be updated? I assume all of them should be updated, but would appreciate confirmation. Also, is -nobs still good to use on the command line in app_info?

Thanks!
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1990321 - Posted: 17 Apr 2019, 0:25:54 UTC - in response to Message 1990316.  

You can just unpack the new All-in-One package in the Download directory and just copy and paste the five main executables into your existing BOINC folder. You shouldn't copy the boinc file if you are running a spoofed client though or you will lose your spoofed gpus. Stay with the 7.15.0 boinc client file in that case. You can also copy and paste the new setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101 executable into your setiathome project directory. You need to edit your existing app_info and do a Find and Replace with the Text Editor to change over from the existing special app to the new 0.98b1 CUDA10.1 special app. As long as the app_info special app filename is correct you just start up right where you left off before you stopped BOINC. And yes I would continue to use the -nobs parameter.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 1990326 - Posted: 17 Apr 2019, 1:37:28 UTC
Last modified: 17 Apr 2019, 1:37:52 UTC

I have it working on my T-3500, but I just noticed the GTX-980 card is only drawing 46 watts and nvidia-smi shows 0% volatile GPU-Util on the Cuda 10.1. Memory use is only 99 MB! Seems to be processing the job fairly well, however.
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1990328 - Posted: 17 Apr 2019, 1:58:16 UTC - in response to Message 1990326.  

Probably something to do with card firmware and the 418.56 drivers that nvidia-smi is having a hard time interpreting. Need to compare other Maxwell cards to see if that is the commonality. Somebody said the new 10.1 app was working well on their 750 TI.
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1990335 - Posted: 17 Apr 2019, 3:54:26 UTC
Last modified: 17 Apr 2019, 4:03:40 UTC

Just upgraded (I think) my all-GTX 1060 3GB box. Once I figure out the power short I will be adding a GTX 750 Ti to run 7 GPUs.
https://setiathome.berkeley.edu/show_host_detail.php?hostid=8684146

Before the upgrade, the estimated processing time for the gpu tasks that were waiting to run was 2 minutes 55 seconds.

When I checked again just after the update it had gone down to 2 minutes 50 seconds.

This system is running with 0.33 cpu to 1 gpu for the gpu tasks. And without the -nobs parameter.

If I run with the -nobs parameter I have to run 1 cpu to 1 gpu. Otherwise the task manager pegs at 100%.

Last time I ran with -nobs and 1 to 1, it was crunching at around 2:45-2:50 per task. With 0.33 to 1, and without -nobs, it was averaging about 3:08 per task earlier today. When I updated it was running 2:55 per task.
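
To put those per-task times in perspective, here is a rough Python sketch that converts them to tasks per day. It assumes one task at a time per GPU and treats the quoted figures as typical elapsed times; the helper names are made up for illustration.

```python
def mmss_to_seconds(mmss: str) -> int:
    """Convert a 'M:SS' run time, as quoted in the post, to seconds."""
    minutes, seconds = (int(part) for part in mmss.split(":"))
    return minutes * 60 + seconds


def tasks_per_day(mmss: str) -> float:
    """Tasks one GPU would finish in 24 hours at this run time (assumes no idle gaps)."""
    return 86400 / mmss_to_seconds(mmss)


# Run times quoted in the post; the labels are only descriptive.
for label, run_time in [
    ("0.33 CPU, no -nobs, before update", "3:08"),
    ("0.33 CPU, no -nobs, after update", "2:55"),
    ("1 CPU, -nobs (upper figure)", "2:50"),
]:
    print(f"{label}: {tasks_per_day(run_time):.0f} tasks/day per GPU")
```

The differences look small per task, but they add up across several GPUs running around the clock.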

The oldest card(s) I have are gtx 750Ti's.

HTH,
Tom
A proud member of the OFA (Old Farts Association).
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1990339 - Posted: 17 Apr 2019, 4:43:30 UTC

Not quite sure how you get Task Manager (or System Monitor, to give it its proper name) to show 100% usage, unless you are running all 16 threads on CPU and GPU tasks. Can't you just use a <project_max_concurrent> statement and knock a few threads out of use to limit your usage to 75-80%?
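
For reference, <project_max_concurrent> lives in an app_config.xml file in the project directory. The Python sketch below works out a limit for the 16-thread machine mentioned above and prints a minimal file body. It assumes each running task occupies about one thread; the function names and the choice to round down are illustrative, not anything BOINC prescribes.

```python
import math


def max_concurrent(threads: int, target_fraction: float) -> int:
    """Number of tasks to allow so that roughly target_fraction of the threads are busy."""
    return max(1, math.floor(threads * target_fraction))


def app_config_xml(limit: int) -> str:
    """Build a minimal app_config.xml body that caps concurrent tasks for the project."""
    return (
        "<app_config>\n"
        f"  <project_max_concurrent>{limit}</project_max_concurrent>\n"
        "</app_config>\n"
    )


# 16 threads at a 75% target; adjust both numbers for your own host.
limit = max_concurrent(16, 0.75)
print(app_config_xml(limit))
```

The client has to reread its config files (or be restarted) before a new limit applies.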
Profile Wiggo
Joined: 24 Jan 00
Posts: 34854
Credit: 261,360,520
RAC: 489
Australia
Message 1990358 - Posted: 17 Apr 2019, 11:10:01 UTC

Please don't add the 750 Ti's to that rig Tom as I'll be taking a benchmark from it.

After I get my old wagon done for another 12 months on the road (by the end of this month) I'm getting a couple of SSDs to dual boot these 2 rigs of mine, to get a bit more heat out of them in the winter that is quickly coming up here. ;-)

Cheers.
Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 1990359 - Posted: 17 Apr 2019, 11:35:26 UTC

After the update to the new version, I found this very long job running on my 1060. Usual run time is about 2:50. This is over 36 min and still going. I don't know the different job types that well, but others with similar names have been running in the normal range. I killed it, but wanted to document it for the group in case it is useful:
_________________________________________
Application: Local: setiathome_v8 8.01 (cuda90)
Name: blc32_2bit_guppi_58406_02921_HIP116398_0037.31677.818.22.45.231.vlar
State: Running
Received: Sun 14 Apr 2019 07:56:44 AM EDT
Report deadline: Thu 06 Jun 2019 12:56:26 PM EDT
Resources: 1 CPU + 1 NVIDIA GPU (device 2)
Estimated computation size: 16,326 GFLOPs
CPU time: 00:00:00
CPU time since checkpoint: 00:00:00
Elapsed time: 00:27:15
Estimated time remaining: 00:00:37
Fraction done: 97.780%
Virtual memory size: 40.76 GB
Working set size: 28.79 MB
Directory: slots/9
Process ID: 11763
Progress rate: 3.588% per minute
Executable: setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1990361 - Posted: 17 Apr 2019, 12:10:41 UTC - in response to Message 1990359.  

Looking at that list, I'd say that

CPU time: 00:00:00
CPU time since checkpoint: 00:00:00
represent a task which never got started and hasn't done any work - it's stuck.

Fraction done: 97.780%
is pseudo progress - designed to reassure you when running a project application which can't report its own fraction done. But all SETI apps do report their own fraction done, so this is more confusing than reassuring here.

I'm assuming that anyone running a special build knows their way round the system: find the stderr.txt file in the slot directory where this particular task is running, and see if that contains any clues. Once you've done that investigation, try suspending that particular task and allowing it to run again once it's cleared itself out of memory - that sometimes kicks 'em into life.
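
The symptom described here (elapsed time piling up while CPU time stays at zero) is easy to check mechanically. A minimal Python sketch follows, assuming times in the HH:MM:SS form shown in the task properties; the function names and the five-minute threshold are arbitrary illustrative choices.

```python
def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string, as shown in the task properties, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def looks_stuck(cpu_time: str, elapsed: str, min_elapsed_s: int = 300) -> bool:
    """Flag a task that has run for a while without accumulating any CPU time."""
    return to_seconds(cpu_time) == 0 and to_seconds(elapsed) >= min_elapsed_s


# Values taken from the task properties posted above.
print(looks_stuck("00:00:00", "00:27:15"))
```

A flag from a check like this is only a hint; the stderr.txt in the slot directory is still the place to look for the actual cause.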
Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 1990364 - Posted: 17 Apr 2019, 12:41:36 UTC - in response to Message 1990361.  

Thanks, Richard. I recognized it was probably hung since the job type looked normal, but I was rushing out the door to my day job. ;)

This was actually the 980 card; seems like every program wants to enumerate the cards differently. The display hung shortly thereafter (it was on one of the 1060s). I had to hard reboot. When it came back, the nvidia-smi is now showing power usage and GPU % usage at normal, loaded levels for the 980 versus my earlier post above. Strange. I'm hoping things settle down now with a couple of reboots under its belt.
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1990365 - Posted: 17 Apr 2019, 12:49:37 UTC - in response to Message 1990364.  
Last modified: 17 Apr 2019, 12:50:02 UTC

Looks like you aborted your copy of WU 3434109110, but Mr. Kevvy completed his OK.

No diagnostic data retained after an abort, but you can see what x41p_V0.97, Cuda 9.00 special should have looked like.
Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 1990370 - Posted: 17 Apr 2019, 13:27:07 UTC
Last modified: 17 Apr 2019, 13:38:32 UTC

Greetings,

I extracted the zip file that I got from the link here. When I clicked that link, it started the download immediately. The modification dates/times on the 5 executable files are the same as the dates/times on the files I currently have. If the zip file I got is newer than what I currently have, shouldn't the dates/times be later? Or is this another Linux quirk that needs to be gotten used to? ;) I will not copy those files over until I get a definitive answer one way or the other. :)

Have a great day! :)

Siran

[edit]
I tested the link on this PC (Winders) and the download starts immediately too. So, I have ruled that out for the Linux box.
[/edit]

[edit2]
Duh! I forgot. When I went to copy the first file and got the window telling me the file exists and asking whether I want to overwrite it, not only were the dates/times the same, the file sizes were the same too.
[/edit2]
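
Dates and sizes are only a hint; comparing checksums settles whether two copies of a file are really identical. A minimal Python sketch using the standard hashlib module is below. The function names are illustrative, and the paths in the trailing comment are assumptions about where the package was unpacked.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(a: Path, b: Path) -> bool:
    """True if both files have the same size and the same SHA-256 digest."""
    return a.stat().st_size == b.stat().st_size and sha256_of(a) == sha256_of(b)


# Example call (hypothetical paths, adjust to your own layout):
# same_content(Path.home() / "Downloads/BOINC/boinc", Path.home() / "BOINC/boinc")
```

If this returns True for every file being compared, copying them over changes nothing.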
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1990375 - Posted: 17 Apr 2019, 13:52:24 UTC - in response to Message 1990370.  
Last modified: 17 Apr 2019, 13:59:51 UTC

You just retrieved the old copy you had on your system or something. The new file at the same location is completely different. Twice the size of the old and has completely different creation dates for every file. Try again. The new file is 250MB in size since it contains the source code in /Docs. Clear your browser caches.
Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 1990379 - Posted: 17 Apr 2019, 14:10:22 UTC - in response to Message 1990375.  

Hi Keith,

Ok, I'll try again. Thanks. :)

Have a great day! :)

Siran
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1990382 - Posted: 17 Apr 2019, 14:16:59 UTC - in response to Message 1990375.  

Sounds like he was only comparing the 5 boinc files in the root of the BOINC package. Those were already previously released (the 7.14.2 update) and have not changed.

What changed was the contents further in, in the project/seti folder where the seti special app and config files reside. That’s what you need out of this new package.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 1990383 - Posted: 17 Apr 2019, 14:49:43 UTC - in response to Message 1990382.  

Hi Ian,

Then the instructions are quite misleading:
If you already have a BOINC folder in Home, Stop BOINC, unpack the new BOINC folder to Downloads, copy the desired files from the downloaded BOINC and paste them into the BOINC in Home. The Five BOINC files are;
boinc
boinccmd
boincmgr
boincscr
switcher
BOINC files for the older systems are in the docs folder.
If you want to use the run_client & run_manager scripts, you need to add your user folder name to the location replacing 'user'.
The download is a little larger than last time, but it seems to be downloading very slowly at present. Don't know, it uploaded pretty quick, as usual.

Yes, those 5 files are the same as what I have. I do have the newest zip file which has files/folders with Apr 16, 2019 dates.

So, now I know what the "new" files are. I now have to decide whether or not it is worth the hassle to change those files, and do the "find and replace" in another file that Keith mentioned in an earlier post, just to shave a few seconds off of the computation times. I'll look into what I may need to change and decide then.

I like the idea of using Linux vs Windows 10, but there is such a steep learning curve and that is why the majority of users shy away from Linux. Why cannot the Linux community figure out how to install software without the constant use of the terminal? It's grand that they have made Linux easier to use with the GUI and such, but still. Nobody wants to flash back to the 90s and do things as was done in MS-DOG.

Have a great day! :)

Siran
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1990389 - Posted: 17 Apr 2019, 15:24:47 UTC - in response to Message 1990383.  
Last modified: 17 Apr 2019, 15:27:06 UTC

Perhaps the reason No One Else had your problem is they were able to identify an OLD Quoted Post from the Last release and the New Actual Post which clearly identifies What Is New;
A New version has been uploaded to the same location.
New in this version is an upgrade to 0.98b1 featuring numerous improvements to compatibility and speed. Most notable is much better handling of the Arecibo files producing fewer inconclusive results. All Users should update to this version to reduce the number of repeated Validation attempts.
Why are you quoting an Old Quoted Post from the Last Release?
Why cannot the Windows community figure out how to read a Post? Perhaps you should go back to Windows, you won't have much success in Linux if you can't read a post and conclude what's Quoted and What's New.
;-)
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1990393 - Posted: 17 Apr 2019, 15:45:48 UTC - in response to Message 1990383.  
Last modified: 17 Apr 2019, 15:46:24 UTC

Siran,

copy these files over:
They are in /BOINC/projects/setiathome.berkeley.edu/. Copy them to the same directory in your existing BOINC folder.

setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90 (this file is new)
setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101 (this file is new)
app_info.xml (overwrite your existing file with this newer version)

the contents of the app_info.xml file are as follows by default:

1 <app_info>
2  <app>
3     <name>setiathome_v8</name>
4  </app>
5     <file_info>
6       <name>setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90</name>
7      <executable/>
8     </file_info>
9     <app_version>
10       <app_name>setiathome_v8</app_name>
11       <platform>x86_64-pc-linux-gnu</platform>
12      <version_num>801</version_num>
13       <plan_class>cuda90</plan_class>
14       <cmdline></cmdline>
15       <coproc>
16         <type>NVIDIA</type>
17         <count>1</count>
18       </coproc>
19      <avg_ncpus>0.1</avg_ncpus>
20       <max_ncpus>0.1</max_ncpus>
21       <file_ref>
22          <file_name>setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90</file_name>
          <main_program/>
      </file_ref>
    </app_version>
  <app>
     <name>astropulse_v7</name>
  </app>
     <file_info>
       <name>astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100</name>
        <executable/>
     </file_info>
     <file_info>
       <name>AstroPulse_Kernels_r2751.cl</name>
     </file_info>
     <file_info>
       <name>ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt</name>
     </file_info>
    <app_version>
      <app_name>astropulse_v7</app_name>
      <platform>x86_64-pc-linux-gnu</platform>
      <version_num>708</version_num>
      <plan_class>opencl_nvidia_100</plan_class>
      <coproc>
        <type>NVIDIA</type>
        <count>1</count>
      </coproc>
      <avg_ncpus>0.1</avg_ncpus>
      <max_ncpus>0.1</max_ncpus>
      <file_ref>
         <file_name>astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100</file_name>
          <main_program/>
      </file_ref>
      <file_ref>
         <file_name>AstroPulse_Kernels_r2751.cl</file_name>
      </file_ref>
      <file_ref>
         <file_name>ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt</file_name>
         <open_name>ap_cmdline.txt</open_name>
      </file_ref>
    </app_version>
   <app>
      <name>setiathome_v8</name>
   </app>
      <file_info>
         <name>MBv8_8.22r3711_sse41_intel_x86_64-pc-linux-gnu</name>
         <executable/>
      </file_info>
     <app_version>
     <app_name>setiathome_v8</app_name>
     <platform>x86_64-pc-linux-gnu</platform>
     <version_num>800</version_num>   
      <file_ref>
        <file_name>MBv8_8.22r3711_sse41_intel_x86_64-pc-linux-gnu</file_name>
        <main_program/>
      </file_ref>
    </app_version>
   <app>
      <name>astropulse_v7</name>
   </app>
     <file_info>
       <name>ap_7.05r2728_sse3_linux64</name>
        <executable/>
     </file_info>
    <app_version>
       <app_name>astropulse_v7</app_name>
       <version_num>704</version_num>
       <platform>x86_64-pc-linux-gnu</platform>
       <plan_class></plan_class>
       <file_ref>
         <file_name>ap_7.05r2728_sse3_linux64</file_name>
          <main_program/>
       </file_ref>
    </app_version>
</app_info>


So by default you can see that it is set up to use the CUDA 9.0 version of the app. If you would like to use the CUDA 10.1 version, all you have to do is replace the filename with the filename of the app you want to use. In this case, since the filenames are identical except for the CUDA version at the end, you can just replace "90" with "101" and it will use that one. There are 2 places where you need to make this change (lines 6 and 22; I added line numbers to guide you).
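
The same two-place edit can also be scripted. The following is a minimal Python sketch, assuming app_info.xml sits in the directory the script is run from; the helper name switch_app_build is made up for illustration. It replaces the full executable filename rather than every "90", so the plan_class line and other numbers in the file are left alone.

```python
from pathlib import Path
from typing import Tuple

OLD_APP = "setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90"
NEW_APP = "setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101"


def switch_app_build(xml_text: str, old: str = OLD_APP, new: str = NEW_APP) -> Tuple[str, int]:
    """Return the edited text and how many filename occurrences were replaced."""
    count = xml_text.count(old)
    return xml_text.replace(old, new), count


if __name__ == "__main__":
    path = Path("app_info.xml")  # assumed location: run from the project directory
    if path.exists():
        original = path.read_text()
        updated, count = switch_app_build(original)
        print(f"found {count} occurrence(s) of the old filename")
        if count == 2:  # the <name> and <file_name> entries
            path.with_name(path.name + ".bak").write_text(original)  # keep a backup
            path.write_text(updated)
```

Stop BOINC before editing the file, and check that the cuda101 executable is actually present in the project directory before restarting.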
Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 1990394 - Posted: 17 Apr 2019, 15:47:25 UTC - in response to Message 1990389.  

Hi TBar,

I quoted it from this post by you, that is why. You quoted the instructions from some other post that I think you also posted. Perhaps you should have removed the old instructions and made new ones for a noobie like me to understand. I was going by those instructions because that is what I assumed you were implying. You said nothing about copying any other files and doing a "find and replace" in a text file for the new app(s) to take effect. I am not a programming expert; I do not know the mechanics of how BOINC actually works, especially in Linux. I am not a Linux guru either.

Does that help explain why I am confused on what needed to be done with this "new" version?

Have a great day! :)

Siran
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1990401 - Posted: 17 Apr 2019, 16:16:23 UTC - in response to Message 1990394.  

Actually I intended to quote the old post with instructions and the link in it. Most others were able to figure it out. I also mentioned the ReadMe files which also state how to change to the CUDA 10.1 App;
From the ReadMe in setiathome.berkeley.edu/docs
7) If you have an AMD CPU move the AMD CPU App in the folder 'For AMD CPUs' to the root level, change the App names in the CPU section of the app_info.xml (<name> & <file_name>), and see if that works better.
If you have a CUDA 10.1 driver you can use the CUDA101 App, change the app_info.xml to name the CUDA 10.1 App in the Two locations, <name> & <file_name>

It's the Same in Every version of BOINC on Every Platform, absolutely Nothing different in Windows, Linux, or Mac.
Windows = http://mikesworld.eu/download.html
Linux = http://lunatics.kwsn.info/index.php?action=downloads;cat=48
Mac = https://arkayn.us/forum/index.php?PHPSESSID=1f59a52c29828c5235c1d51133e07d30&topic=191.0

What's interesting is someone that has been a member of SETI for so long and still doesn't know how to load an App....