AP task (NV) hitting 2GB memory limit and resetting over and over again

Message boards : Number crunching : AP task (NV) hitting 2GB memory limit and resetting over and over again
Harri Liljeroos
Joined: 29 May 99
Posts: 3992
Credit: 85,281,665
RAC: 126
Finland
Message 1698082 - Posted: 3 Jul 2015, 9:47:19 UTC

Hi
here's the task http://setiathome.berkeley.edu/workunit.php?wuid=1835824773

It is running on my GTX970 and the CPU usage and memory usage are very high. The CPU usage is about 95% and memory usage goes up to 2 GB and then the application just restarts itself. The application is Lunatics AP7_win_x86_SSE2_OpenCL_NV_r2721.exe with parameters:
-unroll 18 -oclFFT_plan 256 16 256 -use_sleep -ffa_block 16384 -ffa_block_fetch 8192 -hp -tune 1 64 8 1 -tune 2 64 8 1


Is there something I could try before aborting it?
ID: 1698082 · Report as offensive
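For readers unfamiliar with the format, the parameter string in the post above is a run of switches, each followed by zero or more numeric values. Here is a minimal Python sketch that groups them so it is easier to see what is set. The helper name `group_switches` is made up for illustration and is not part of the Lunatics package; what each switch actually does is described in the app's readme, not here.

```python
def group_switches(cmdline):
    """Group a switch string into (switch, [values]) pairs.

    A token starting with '-' opens a new switch; the tokens that follow,
    up to the next switch, are treated as its values. This is only a
    grouping aid: it would misread a negative number as a switch.
    """
    groups = []
    for token in cmdline.split():
        if token.startswith("-"):
            groups.append((token, []))
        elif groups:
            groups[-1][1].append(token)
    return groups


# The parameter string quoted in the post above.
CMDLINE = ("-unroll 18 -oclFFT_plan 256 16 256 -use_sleep -ffa_block 16384 "
           "-ffa_block_fetch 8192 -hp -tune 1 64 8 1 -tune 2 64 8 1")

for switch, values in group_switches(CMDLINE):
    print(switch, " ".join(values))
```

Switches printed with nothing after them (here `-use_sleep` and `-hp`) are plain on/off flags in this string.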
Darth Beaver Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1698084 - Posted: 3 Jul 2015, 10:11:00 UTC - in response to Message 1698082.  

Harri Liljeroos wrote: "Is there something I could try before aborting it?"


That depends. How many units are you running at one time on the GPU?

How many units is the CPU running at one time?

Are you using the iGPU?
ID: 1698084 · Report as offensive
Darth Beaver Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1698085 - Posted: 3 Jul 2015, 10:19:49 UTC
Last modified: 3 Jul 2015, 10:22:39 UTC

I must ask: why do you think the unroll command is doing it?

I would delete the -unroll option, restart BOINC, and see if that stops it, instead of aborting the unit, if that is what you meant.
ID: 1698085 · Report as offensive
Darth Beaver Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1698087 - Posted: 3 Jul 2015, 10:35:55 UTC

I also just looked at your machine: the Nvidia driver is 344 and it has OpenCL 1.1, but your iGPU has OpenCL 1.2. Possible problems there from a driver mismatch?

Upgrade your Nvidia driver or, better, roll back the iGPU (HD 4000) driver to an earlier one. Sorry, I can't tell you which one; I haven't got an iGPU.
ID: 1698087 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1698088 - Posted: 3 Jul 2015, 10:36:24 UTC - in response to Message 1698082.  

Upgrade your Lunatics to v0.43a, which contains the r2737 app, which eliminates this problem.
ID: 1698088 · Report as offensive
Darth Beaver Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1698091 - Posted: 3 Jul 2015, 10:47:27 UTC - in response to Message 1698088.  


Just wondering, Richard: I thought there was only a problem with the older cards, and that's why Eric made the 0.43a version?
Just being curious.
ID: 1698091 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1698092 - Posted: 3 Jul 2015, 10:49:58 UTC - in response to Message 1698091.  


Eric didn't make the Lunatics 0.43a installer; Richard did that.

Claggy
ID: 1698092 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1698093 - Posted: 3 Jul 2015, 10:50:48 UTC - in response to Message 1698091.  


*I* made the v0.43a version, because after v0.43 went live, it turned out that some special cases - like this workunit - had been missed in testing, and were badly handled by the application which Raistmer had originally supplied to me for distribution.
ID: 1698093 · Report as offensive
Darth Beaver Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1698094 - Posted: 3 Jul 2015, 10:53:17 UTC - in response to Message 1698092.  

Oops, sorry Richard. I thought it was one of you but wasn't sure. Getting old; the brain doesn't remember well.
ID: 1698094 · Report as offensive
Darth Beaver Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1698095 - Posted: 3 Jul 2015, 10:54:18 UTC - in response to Message 1698093.  

Oops, sorry, and thanks for the info.
ID: 1698095 · Report as offensive
Harri Liljeroos
Joined: 29 May 99
Posts: 3992
Credit: 85,281,665
RAC: 126
Finland
Message 1698107 - Posted: 3 Jul 2015, 11:46:45 UTC

I just installed Lunatics 0.43a, let's see what happens.

A few answers to your questions:
- I am running 3 WUs at a time on the 970. Two were APs and one was an Einstein BRP6.
- Two CPU cores were free to feed the GPU.
- The Intel GPU was not used.
- 6 WUs were running on the CPU: 2 MBs and 4 CPDN PNWs.

Thank you to everybody for helping.
ID: 1698107 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1698181 - Posted: 3 Jul 2015, 15:55:20 UTC - in response to Message 1698107.  

I have 970s and have been running Lunatics 0.43a for quite a while now. I too typically run two SETI tasks and one BRP6 task on each card at the same time with no issues, or two SETI tasks and one MW task on each card. The MW combination is the only time I see an increase in completion times for SETI tasks on that card: MW tasks take more computing power and more PCIe bus time, and slow down other tasks.

Keith
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1698181 · Report as offensive
Harri Liljeroos
Joined: 29 May 99
Posts: 3992
Credit: 85,281,665
RAC: 126
Finland
Message 1698223 - Posted: 3 Jul 2015, 17:51:02 UTC

The problematic WU finished without a major hitch with the new app, and so have the rest of the AP WUs since then.

The only thing to point out is that the app seems to use a high amount of CPU from time to time, which causes some lag on the computer. For example, the clock app stops for maybe 5 seconds before catching up again. And when running two APs at the same time, the apps seem to sync that high CPU usage, making it seem even worse.

Maybe removing the -hp parameter from the command line, or allowing only one AP at a time in app_config, could improve the user experience.
ID: 1698223 · Report as offensive
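As a rough illustration of the "only one AP at a time in app_config" idea from the post above, here is a small Python sketch that writes out the text of an `app_config.xml`. Treat the details as assumptions: the app name `astropulse_v7` is a guess that should be checked against the name your own client reports, the helper `build_app_config` is made up for this example, and `max_concurrent` as used here limits all tasks of that app (CPU and GPU alike), which may or may not be what you want.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, max_concurrent):
    """Return app_config.xml text that caps concurrent tasks of one app."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.indent(root)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


# "astropulse_v7" is an assumed app name, not confirmed anywhere in this thread.
print(build_app_config("astropulse_v7", 1))
```

The resulting text would go into a file named `app_config.xml` in the project's folder under the BOINC data directory, after which the client has to re-read its config files or be restarted.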
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1698639 - Posted: 5 Jul 2015, 12:28:57 UTC

CPU use depends on a number of things such as the amount of radar blanking that was applied to the data. (I think if there is a lot of blanking a lot of CPU use is required)
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1698639 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1698652 - Posted: 5 Jul 2015, 12:58:30 UTC - in response to Message 1698639.  

rob smith wrote: "CPU use depends on a number of things such as the amount of radar blanking that was applied to the data. (I think if there is a lot of blanking a lot of CPU use is required)"

That was with Astropulse v6. We've moved on to Astropulse v7 now, which doesn't have to do those calculations, so there's no extra CPU use.

Claggy
ID: 1698652 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1698666 - Posted: 5 Jul 2015, 13:57:59 UTC

...That's strange because I just did a check on one of my crunchers.

It has 2 GTX 980 GPUs and an AMD FX8 CPU.

I'm running 2 AP tasks per GPU.

Windows task manager reports:
4 off AP7_win_64_AVX_CPU_r2692.exe, each at ~13% (1 CPU core each)
and
4 off AP7_win_x86_SSE2_OpenCL_NV-r2737.exe*32 each at ~13% (1 CPU core each)
All 8 CPU cores are "red lined" at 100%.


The BOINC gui manager says I'm running four GPU (openCL_Nvidia_100) and four CPU tasks (sse2).

So what is happening? Are the "CPU" tasks actually using two cores each, or are the GPU cores using "50%" of a GPU plus a CPU core each, or what?
ID: 1698666 · Report as offensive
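The numbers in the post above can be checked with a little arithmetic. Task Manager shows each process as a share of the whole machine, so on an 8-core CPU one fully busy core shows up as 12.5%, which is the "~13%" reported. A short Python sketch of that calculation follows; the function name is made up for illustration, and the core and process counts are the ones given in the post.

```python
def share_of_total(cores_used, total_cores):
    """Percent of whole-machine CPU capacity taken by `cores_used` full cores."""
    return 100.0 * cores_used / total_cores


TOTAL_CORES = 8      # "All 8 CPU cores" in the post above
PROCESSES = 4 + 4    # four CPU apps plus four OpenCL_NV apps

per_process = share_of_total(1, TOTAL_CORES)
print(f"one full core = {per_process:.1f}% of the machine")
print(f"{PROCESSES} such processes = {PROCESSES * per_process:.0f}%")
```

In other words, eight processes each holding one core account for the full 100%: the four GPU tasks are each occupying a whole CPU core in addition to their share of a GPU.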
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1698675 - Posted: 5 Jul 2015, 14:28:49 UTC - in response to Message 1698666.  

rob smith wrote: "So what is happening? Are the "CPU" tasks actually using two cores each, or are the GPU cores using "50%" of a GPU plus a CPU core each, or what?"

It's the OpenCL_NV application which has the high level of CPU use, nothing to do with radar blanking in that case.
ID: 1698675 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1698676 - Posted: 5 Jul 2015, 14:31:50 UTC - in response to Message 1698666.  
Last modified: 5 Jul 2015, 14:32:36 UTC

rob smith wrote: "So what is happening? Are the "CPU" tasks actually using two cores each, or are the GPU cores using "50%" of a GPU plus a CPU core each, or what?"

That's because of the Nvidia OpenCL 100% CPU usage bug, which has existed since the 27x.xx drivers. There is also a school of thought that the NV OpenCL apps could be programmed differently to work around that problem.

Claggy
ID: 1698676 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1698743 - Posted: 5 Jul 2015, 18:40:24 UTC

The AP app has the -use_sleep switch, which reduces CPU usage significantly.
Please read the readme file.


With each crime and every kindness we birth our future.
ID: 1698743 · Report as offensive
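For anyone editing the AP command line by hand, here is a minimal Python sketch of the two changes mentioned in this thread: making sure `-use_sleep` is present, and dropping a value-less switch such as `-hp`. Both helper names are made up for illustration, the example string is hypothetical, and the sketch only manipulates text; it does not know where the command line file lives on your machine.

```python
def ensure_switch(cmdline, switch):
    """Return cmdline with `switch` appended unless it is already present."""
    tokens = cmdline.split()
    if switch not in tokens:
        tokens.append(switch)
    return " ".join(tokens)


def drop_switch(cmdline, switch):
    """Return cmdline without a value-less switch such as -hp.

    Only safe for switches that take no values; a switch followed by
    numbers would leave those numbers behind.
    """
    return " ".join(t for t in cmdline.split() if t != switch)


# Hypothetical starting line, not taken from any machine in this thread.
line = "-unroll 18 -ffa_block 16384 -hp"
line = ensure_switch(line, "-use_sleep")
line = drop_switch(line, "-hp")
print(line)
```

The edited text still has to be saved back to the command line file, and it only takes effect for tasks that start after the change.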
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1698759 - Posted: 5 Jul 2015, 19:25:29 UTC

Thanks all.
Edited the AP command line file on the machine in question.

Argh, why does it decide to run a pile of MBs just before I started to do the edit? Now I'll have to wait until a pile of APs start....

(In fact bigger ARGHHH - no APs around for the GPUs, so an even longer wait...)
ID: 1698759 · Report as offensive



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.