Linux CUDA 'Special' App finally available, featuring Low CPU use

Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1868661 - Posted: 21 May 2017, 19:53:53 UTC - in response to Message 1868653.  

Stock clock.
I haven't tested AVX yet due to lack of time to do the test properly.
AVX SHOULD be faster, but "time will tell", once the current panic at work is over and I can devote more than a few minutes on the fora.....

Yes, after looking at your benchmark ratings, I figured you were at stock CPU clock. I would try to find the time to try out the AVX Linux version. It seems to be really fast on my 1700X. I don't have any SSE4.x version to compare against in Windows, though.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1868661 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1868663 - Posted: 21 May 2017, 19:56:07 UTC - in response to Message 1868600.  
Last modified: 21 May 2017, 20:25:49 UTC


That's right. My hope was that, while the Linux IOMMU situation is still problematic for Ryzen, M$ might have somehow addressed it, so the reverse situation might work: passing through a GPU from a Windows host to Linux. From Keith's description, though, that is probably not the case, though it's uncertain. Not sure how long it will take AMD+MotherboardVendors+LinuxCommunity to come up with a solution from the other direction, but it seems they are working on it.


. . So net result is when I upgrade to the Ryzen unit I should stick with Win10 for a while before implementing Linux?

Stephen


I noticed this part of a comment from Elmor over in the ROG overclocking thread regarding changes to the SMU/FW update to AGESA 1.0.0.6.


For Linux/VM users: Good news! The IOMMU got patched, and the groupings are sane now. You can do PCI passthrough without needing a patched kernel.

Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1868663 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1868703 - Posted: 21 May 2017, 22:35:26 UTC - in response to Message 1868628.  
Last modified: 21 May 2017, 22:40:57 UTC

Curious where you have your 1700 clocked at. The second thing is did you finally determine that the SSE4.2 CPU app is faster than the AVX app? I was wondering why your CPU task completion times are significantly longer than mine.


. . Glad you mentioned that Keith, I have been wondering about SSE4.1/4.2 versus AVX myself ...

. . And we still have to wait for the results ... ???

Stephen

ID: 1868703 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1868878 - Posted: 22 May 2017, 23:36:34 UTC

As time has permitted over the last few weeks, I've run many different test scenarios in an attempt to identify what factor(s) might be causing the poor performance of the GTX 960 on my host 8253697 when running the Special App (both 6.5 and 8.0) under Linux.

The one obvious factor that differentiates that card from the other two in the box is the use of a riser cable from a PCIe x8(x4) slot. That, by itself, seemed unlikely to cause a problem (at least to me, if not to TBar), since that card/cable/slot configuration has worked just fine in Windows with both Cuda50 and SoG.

On the hardware side, I first tried swapping that 960 with another 960 that's working just fine with the Linux Special App in another box (no difference), then with a GTX 750Ti (slightly worse performance than the 960), and then tried a different riser cable (no difference). I also moved the monitor cable from the 960 to one of the other cards (possible infinitesimal difference). There's not much I could do with the PCIe slot itself, the bus or the motherboard.

To see if the Linux OS might be a problem, I ran the Linux SoG app (2 tasks at a time) for a few hours. As it happened, the 960 only caught guppi VLARs in that stretch, but the run times, although slightly longer (perhaps due to a different BLCnn batch), were still roughly comparable to those from the Windows SoG app.

Noticing that at least a few of the other Linux Special App users are adding a "pfb" value to the command line, I tried running with a value of '15', since that's what I used to use for "pfblockspersm" in Cuda50. This might have shown some barely perceptible improvement, though probably not statistically significant.

Then I decided to try Petri's latest Cuda 8.0 app, the "zi3t2b" which runs without Blocking Sync. BINGO! Run times dropped dramatically, down to approximately what I'm getting with the four 960s in my other Linux host 8257247. On that box, 2 of the 960s are on the board in x16(x16) slots, while 2 are on risers from x16(x8) slots (which appears to have no impact on performance). For good measure, I also tried running that app with the "-bs" option set, which caused the performance to go in the toilet again.

Here are some performance comparisons, with each Average Run Time representing six tasks in the specified AR:

BASELINE: Host 7057115 running Win8.1 "SoG" (2/GPU)
High AR ---- Avg RT = 10:33, Tasks/Hr = 11.37
Normal AR -- Avg RT = 19:36, Tasks/Hr = 6.12
VLAR ------- Avg RT = 32:01, Tasks/Hr = 3.74

Host 8253697 running Linux "Cuda6.5"
High AR ---- Avg RT = 7:10, Tasks/Hr = 8.37, Change = -26.4%
Normal AR -- Avg RT = 14:11, Tasks/Hr = 4.23, Change = -30.9%
VLAR ------- Avg RT = 13:55, Tasks/Hr = 4.31, Change = +15.2%

Host 8253697 running Linux "Cuda8.0" (w/ built-in Blocking Sync)
High AR ---- Avg RT = 6:35, Tasks/Hr = 9.11, Change = -19.9%
Normal AR -- Avg RT = 13:20, Tasks/Hr = 4.50, Change = -26.5%
VLAR ------- Avg RT = 13:08, Tasks/Hr = 4.57, Change = +22.2%

Host 8253697 running Linux "Cuda8.0" (w/o Blocking Sync)
High AR ---- Avg RT = 2:28, Tasks/Hr = 24.32, Change = +113.9%
Normal AR -- Avg RT = 5:08, Tasks/Hr = 11.69, Change = +91.0%
VLAR ------- Avg RT = 8:28, Tasks/Hr = 7.09, Change = +89.6%
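For anyone who wants to reproduce the Tasks/Hr and Change columns, the arithmetic appears to be: tasks per hour = (tasks running at once) x 60 / (average run time in minutes), and change = ratio to the baseline minus one. Here is a minimal Python sketch of that. The function names are made up for illustration, and the factor of 2 for the SoG baseline is an assumption taken from the "(2/GPU)" note.

```python
def run_time_minutes(mm_ss: str) -> float:
    """Convert an 'MM:SS' average run time into fractional minutes."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) + int(seconds) / 60.0


def tasks_per_hour(avg_run_time: str, concurrent: int = 1) -> float:
    """Throughput for one GPU running `concurrent` tasks at a time."""
    return concurrent * 60.0 / run_time_minutes(avg_run_time)


def percent_change(tasks_hr: float, baseline_tasks_hr: float) -> float:
    """Relative throughput change versus the baseline, in percent."""
    return (tasks_hr / baseline_tasks_hr - 1.0) * 100.0


if __name__ == "__main__":
    # High AR figures from the table: Win8.1 SoG baseline (2 per GPU)
    # versus Linux Cuda8.0 without Blocking Sync (1 per GPU).
    baseline = tasks_per_hour("10:33", concurrent=2)
    no_bs = tasks_per_hour("2:28")
    print(f"baseline {baseline:.2f}/hr, no-BS {no_bs:.2f}/hr, "
          f"change {percent_change(no_bs, baseline):+.1f}%")
```

Comparing throughput rather than raw run time is what makes the 2-at-a-time SoG numbers comparable with the 1-at-a-time Special App numbers.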

So, at least for a GTX 960, it would appear that Blocking Sync might not be a good choice for a card tied into a PCIe slot that's less than x8 electrical. Why that would be the case is not something I have the expertise to explain.

Whether this also applies to other cards, or to other motherboard/bus setups which have an x4 or x1 slot, is beyond my ability to test. Perhaps others can do so. In any event, hopefully this info will be useful to other users in the future.
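For others who want to test this, it helps to confirm what link width each card actually negotiated rather than going by the slot label. On Linux the kernel normally exposes this per PCI device in sysfs as current_link_width and max_link_width. The sketch below reads those files; the function name is made up, and it assumes the attributes are present on your kernel (devices without them are simply skipped).

```python
from pathlib import Path


def read_link_widths(sysfs_root: str = "/sys/bus/pci/devices") -> dict:
    """Map PCI address -> (current_link_width, max_link_width).

    Devices that do not expose both attributes are left out.
    """
    widths = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return widths
    for dev in sorted(root.iterdir()):
        current = dev / "current_link_width"
        maximum = dev / "max_link_width"
        if not (current.is_file() and maximum.is_file()):
            continue
        try:
            widths[dev.name] = (int(current.read_text().strip()),
                                int(maximum.read_text().strip()))
        except (OSError, ValueError):
            # Unreadable or non-numeric attribute: skip this device.
            continue
    return widths


if __name__ == "__main__":
    for address, (cur, mx) in read_link_widths().items():
        print(f"{address}: running x{cur}, capable of x{mx}")
```

A card that reports a current width well below its maximum is sitting in a narrower electrical slot (or behind a riser that negotiated down), which is the situation being compared in this thread.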
ID: 1868878 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1868905 - Posted: 23 May 2017, 1:56:57 UTC - in response to Message 1868878.  
Last modified: 23 May 2017, 2:01:48 UTC

Interesting, considering I have two machines running the same App with cards in electrical x4 slots that don't show this behavior with the Blocking Sync. In fact, in over a Year of testing I have never seen such a slowdown as your One machine shows, and there have been many, many different App versions tested. This machine is running a GTX 950 in an electrical x4 slot listed as device 3, http://setiathome.berkeley.edu/results.php?hostid=6796479&offset=340. On that machine device 2 is an identical GTX 950 in an electrical x16 slot. This machine is running a GTX 750Ti in an electrical x4 slot listed as device 3, https://setiathome.berkeley.edu/results.php?hostid=7769537&offset=300. Both machines are running an Old Intel Motherboard classed somewhere around 'workstation' with the first running dual processors. For comparison with the 750Ti, this other machine is running 750Ti cards in electrical x8 slots, https://setiathome.berkeley.edu/results.php?hostid=6906726&offset=220. As you can see the difference is nowhere near the difference your one machine shows, so, it must be something other than just the x4 slot with the Blocking Sync.
ID: 1868905 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1868907 - Posted: 23 May 2017, 2:49:59 UTC - in response to Message 1868905.  

Yes, there certainly could be other factors at work. I was trying to isolate one variable at a time and Blocking Sync was the only thing that really popped. BTW, I did the testing with Petri's 8.0 app last Thursday and most of the tasks have validated and gone, but here are links to a couple that are still pending at the moment: http://setiathome.berkeley.edu/result.php?resultid=5745745690 and http://setiathome.berkeley.edu/result.php?resultid=5745322027. I think I've captured most of the others in my archives, though, should their Stderr output ever prove useful.

This machine is running a GTX 950 in an electrical x4 slot listed as device 3, http://setiathome.berkeley.edu/results.php?hostid=6796479&offset=340.
One thing that does pique my curiosity in comparing a couple of your tasks from that device with similar ones from my testing is that yours appear to be using somewhat more CPU time than mine. For instance, on High AR tasks, mine used between 29 and 35 seconds, whereas the one of yours I looked at used 48 seconds. Similarly, on a normal AR, yours looked to be using about 100 seconds where mine only used between 61 and 71 seconds. Perhaps there's a difference in the polling frequency that has an impact.
ID: 1868907 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1868910 - Posted: 23 May 2017, 3:43:31 UTC - in response to Message 1868907.  

...in comparing a couple of your tasks from that device with similar ones from my testing is that yours appear to be using somewhat more CPU time than mine. For instance, on High AR tasks, mine used between 29 and 35 seconds, whereas the one of yours I looked at used 48 seconds. Similarly, on a normal AR, yours looked to be using about 100 seconds where mine only used between 61 and 71 seconds. Perhaps there's a difference in the polling frequency that has an impact.
Have you tried running the Apps at Normal Priority as discussed back here, https://setiathome.berkeley.edu/forum_thread.php?id=80636&postid=1855061#1855061

<cc_config>
  <log_flags>
     <sched_op_debug>1</sched_op_debug>
  </log_flags>
  <options>
     <no_priority_change>1</no_priority_change>
     <use_all_gpus>1</use_all_gpus>
     <max_file_xfers_per_project>8</max_file_xfers_per_project>
     <save_stats_days>365</save_stats_days>
     <skip_cpu_benchmarks>1</skip_cpu_benchmarks>
  </options>
</cc_config>

I think others found the no_priority_change option doesn't work when using the version of BOINC from the Repository; it does work with the Berkeley version of BOINC.
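If you want to double-check that the option is really set in the file BOINC reads, a small Python sketch along these lines would do it. The helper name is made up, and the embedded XML is just a trimmed copy of the config above; in practice you would read your own cc_config.xml from the BOINC data directory, whose location varies by install.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cc_config shown above, used here as sample input.
CC_CONFIG = """<cc_config>
  <options>
     <no_priority_change>1</no_priority_change>
     <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>"""


def option_enabled(xml_text: str, name: str) -> bool:
    """Return True if <options><name>1</name></options> is present."""
    root = ET.fromstring(xml_text)
    node = root.find(f"./options/{name}")
    return node is not None and (node.text or "").strip() == "1"


if __name__ == "__main__":
    print("no_priority_change set:",
          option_enabled(CC_CONFIG, "no_priority_change"))
```

This only confirms the file contents. Whether the client honours the option still has to be checked on the running processes.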
ID: 1868910 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1868911 - Posted: 23 May 2017, 3:57:24 UTC - in response to Message 1868910.  

I hadn't tried it before, but have set it now. BOINC seems to accept it.

Mon 22 May 2017 08:49:39 PM PDT |  | Config: run apps at regular priority

It shouldn't take too long to see if it makes a difference, at least with the 6.5 app that I'm currently running.
ID: 1868911 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1868912 - Posted: 23 May 2017, 4:08:21 UTC - in response to Message 1868911.  

I believe you need to look at Top to see if it's actually working, https://setiathome.berkeley.edu/forum_thread.php?id=80636&postid=1855477#1855477
If it's working the GPU App will run at Priority 20 Nice 0, if not it will be Priority 30 Nice 10.
ID: 1868912 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1868913 - Posted: 23 May 2017, 4:24:26 UTC - in response to Message 1868912.  
Last modified: 23 May 2017, 4:40:52 UTC

Looks OK, I think.

 2748 jeff      20   0 28.382g 406416 294872 S  14.6  6.7   0:56.12 setiathome_x41p      
 2756 jeff      20   0 28.393g 435824 312876 S  10.3  7.1   0:34.73 setiathome_x41p      
 2735 jeff      20   0 28.393g 435452 312552 S   8.3  7.1   0:46.89 setiathome_x41p      
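In top's default column layout, the third and fourth fields are PR (priority) and NI (nice), so the lines above show Priority 20, Nice 0. For checking this without eyeballing it, here is a minimal Python sketch that parses such a line. It assumes the default top field order (PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND); the function names are made up.

```python
# One of the lines from the top output above, used as sample input.
SAMPLE = (" 2748 jeff      20   0 28.382g 406416 294872 S  14.6  6.7"
          "   0:56.12 setiathome_x41p")


def parse_top_line(line: str) -> dict:
    """Pull PID, priority, nice and command out of a default-layout top row."""
    fields = line.split()
    return {
        "pid": int(fields[0]),
        "user": fields[1],
        "priority": fields[2],   # kept as text: top prints 'rt' for realtime
        "nice": int(fields[3]),
        "command": fields[11],
    }


def is_normal_priority(line: str) -> bool:
    """True when the task runs at Priority 20, Nice 0."""
    row = parse_top_line(line)
    return row["priority"] == "20" and row["nice"] == 0


if __name__ == "__main__":
    print(parse_top_line(SAMPLE), is_normal_priority(SAMPLE))
```

A row showing 30 and 10 in those two columns would mean the option is not taking effect.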

EDIT 1: But no discernible impact on the 960's first task, a guppi VLAR, after adding the option.
http://setiathome.berkeley.edu/result.php?resultid=5755772741
The 13:44 Run Time is within the previous range.

EDIT 2: And a shortie (AR=5.602958) still took 6:33, although that is slightly faster than the 7:10 average in my test sample.
http://setiathome.berkeley.edu/result.php?resultid=5755834797
ID: 1868913 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1868922 - Posted: 23 May 2017, 6:05:48 UTC - in response to Message 1868878.  
Last modified: 23 May 2017, 6:06:01 UTC

...
So, at least for a GTX 960, it would appear that Blocking Sync might not be a good choice for a card tied into a PCIe slot that's less than x8 electrical. Why that would be the case is not something I have the expertise to explain.
...


Present blocking sync implementation is relatively simple/naive, and there are likely too many of them. Once the pulsefinding (and any other serious) wrinkles are ironed out, we can look at scaling the synchronisation on a timed basis, with something akin to a frames per second target (perhaps launches per second, and scale the launches). Before other issues are addressed, that would be putting the cart before the horse though.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1868922 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1868934 - Posted: 23 May 2017, 7:44:33 UTC - in response to Message 1868913.  
Last modified: 23 May 2017, 8:00:13 UTC

OK, let's see how this works on the HP 'Workstation' board seen here, https://setiathome.berkeley.edu/forum_thread.php?id=80636&postid=1855727#1855727
From the image, it has One x1, and One electrical x4 slot that I usually don't use. They are spaced in a manner that you can run the Two 750Ti in either those slots or the Two x16 slots.
Since this is a $22 board, I had no qualms about removing the end plate on the x1 slot so it can take a x16 card. I performed the operation the day I received the board.
Other than having to reconfigure the xorg.conf file, the only problem so far is I can't convince it to use the fan control on the Top card using the x1 slot. So, it's running a little hotter than normal.
It doesn't look that much different than running both cards in the x16 slots, https://setiathome.berkeley.edu/results.php?hostid=6906726&offset=200 The monitor is connected to the lower card in the x4 slot.
I'll let it run for a while, but for now, it doesn't appear to be having any trouble. I'm Not using any risers, both cards are mounted in slots and do not use external power connections. Yes, that means the 750Ti is pulling the power from the x1 slot only.
The pciBusID = 52 is the x1 slot,
setiathome_CUDA: Found 2 CUDA device(s):
Device 1: GeForce GTX 750 Ti, 1998 MiB, regsPerBlock 65536
computeCap 5.0, multiProcs 5
pciBusID = 40, pciSlotID = 0
Device 2: GeForce GTX 750 Ti, 2000 MiB, regsPerBlock 65536
computeCap 5.0, multiProcs 5
pciBusID = 52, pciSlotID = 0
ID: 1868934 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1868938 - Posted: 23 May 2017, 9:36:47 UTC

Hmmmm, it appears the HP board isn't having any trouble using the x4 slot with the Blocking Sync either. In fact, the times are within a few seconds of the 750Ti in the Intel board running in a x4 slot. The times for the HP x1 slot are about what would be expected. The HP board will run the slots at x8 if both x16 slots are used. The times for the HP board using the Blocking Sync are around:
Shorties:
x1 around 203 seconds
x4 around 173 seconds
x16(8) around 162 seconds
BLC03:
x1 around 700 seconds
x4 around 660 seconds
x16(8) around 640 seconds

There is another machine I'm aware of running the Blocking Sync with a x4 slot. It's similar to one of my machines, Slots 1 & 2 are x16, Slots 3 & 4 are electrically x4, http://setiathome.berkeley.edu/results.php?hostid=7942417&offset=340
But wait, there is at least one more. Jeff's machine with the 4 GTX 960s ran the Blocking Sync for a while and didn't appear to be having problems with the Two x4 slots, https://setiathome.berkeley.edu/results.php?hostid=8257247&offset=2280
So, the score would be 5 different machines Not having problems with the Blocking Sync and a x4 slot, and One machine having problems. 5 to 1...
For now, the Only machine I know of having that problem is Jeff's One machine.
ID: 1868938 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 14039
Credit: 208,696,464
RAC: 304
Australia
Message 1868939 - Posted: 23 May 2017, 9:52:17 UTC
Last modified: 23 May 2017, 9:55:08 UTC

Could it be IRQ/DPC related?
Is the odd system out low on physical RAM?
Or lots of video cards, lots of video memory to manage resulting in system resource contention/overhead?
Grant
Darwin NT
ID: 1868939 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1868951 - Posted: 23 May 2017, 23:18:48 UTC - in response to Message 1868938.  

But wait, there is at least one more. Jeff's machine with the 4 GTX 960s ran the Blocking Sync for a while and didn't appear to be having problems with the Two x4 slots, https://setiathome.berkeley.edu/results.php?hostid=8257247&offset=2280
No x4 slots on that machine. It's an HP xw9400 with 2 x16(x16) slots and 2 x16(x8) slots. I've mentioned several times that the riser cables on that machine are on the x16(x8) slots. That's why I was drawing a distinction between x8 (which works fine with Blocking Sync) and x4 (which doesn't).
ID: 1868951 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1868957 - Posted: 23 May 2017, 23:30:19 UTC - in response to Message 1868939.  

Could it be IRQ/DPC related?
Is the odd system out low on physical RAM?
Or lots of video cards, lots of video memory to manage resulting in system resource contention/overhead?
I can certainly try to check the IRQ, though I'm not sure where that info shows up in Linux.

The system has 6GB of ECC memory, and the last time I checked, GKrellM was showing less than half of that in use. In fact, I actually removed the memory krell from the monitor display since there was always so much free memory available. The GTX 960 itself has 2GB RAM but slightly less than 1.6GB appears to be in use with the pfb set to 8 by autotune.
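On the question of where IRQ info shows up in Linux: per-interrupt counts for each CPU are listed in /proc/interrupts. Below is a minimal Python sketch that totals the counts per interrupt line. The sample text is a made-up illustration of the usual layout, not output from the machine discussed here, and the function name is hypothetical.

```python
# Hypothetical sample in the usual /proc/interrupts layout (2 CPUs).
SAMPLE = """\
           CPU0       CPU1
  0:         46          0   IO-APIC   2-edge      timer
 16:     120000     340000   IO-APIC  16-fasteoi   nvidia
NMI:          5          7   Non-maskable interrupts
ERR:          0
"""


def parse_interrupts(text: str) -> dict:
    """Map each IRQ label to its total count and its source description."""
    lines = text.splitlines()
    if not lines:
        return {}
    n_cpus = len(lines[0].split())       # header row names one column per CPU
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        tokens = rest.split()
        counts = []
        for token in tokens[:n_cpus]:    # leading numeric fields are the counts
            if not token.isdigit():
                break
            counts.append(int(token))
        table[label.strip()] = {
            "total": sum(counts),
            "source": " ".join(tokens[len(counts):]),
        }
    return table


if __name__ == "__main__":
    for irq, info in parse_interrupts(SAMPLE).items():
        print(f"{irq:>4}  {info['total']:>8}  {info['source']}")
```

On a live system you would pass in the contents of /proc/interrupts instead of SAMPLE and compare two readings taken a few seconds apart to get a rate.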
ID: 1868957 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1868997 - Posted: 24 May 2017, 3:07:32 UTC - in response to Message 1868951.  
Last modified: 24 May 2017, 3:26:33 UTC

But wait, there is at least one more. Jeff's machine with the 4 GTX 960s ran the Blocking Sync for a while and didn't appear to be having problems with the Two x4 slots, https://setiathome.berkeley.edu/results.php?hostid=8257247&offset=2280
No x4 slots on that machine. It's an HP xw9400 with 2 x16(x16) slots and 2 x16(x8) slots. I've mentioned several times that the riser cables on that machine are on the x16(x8) slots. That's why I was drawing a distinction between x8 (which works fine with Blocking Sync) and x4 (which doesn't).
OK, so we're back to 4 machines Not having trouble with x4 Slots versus your One machine with the riser that is. The one item that sticks in your face is the Riser; it's probably introducing just enough delay that the Blocking Sync doesn't work correctly with That Riser in That Slot. The x4 Slot in my HP xw4600s (I have 2) is actually an open ended x8 slot that's wired for x4. It will take a x16 card without any trouble. If your x4 slot can take a x16 card I'd suggest you try it with the card mounted in the slot rather than the Riser. I have 3 different machines that don't have any trouble with the card mounted in the x4 Slot. Two of the machines work 24/7 using a x4 Slot, which is why I don't think there is a problem for Most people who don't use a Riser cable. There doesn't seem to be any trouble using even the x1 Slot with the card mounted in the Slot.

If these ever hit NewEgg, no one will need a Riser, and my Old HP will take Four 1050 Ti with just an additional 8 inch Desk fan and an open case.
http://techreport.com/news/31888/inno3d-squeezes-a-geforce-gtx-1050-ti-into-a-single-slot#metal

ID: 1868997 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1869004 - Posted: 24 May 2017, 3:33:58 UTC - in response to Message 1868997.  

...it's probably introducing just enough delay that the Blocking Sync doesn't work correctly with That Riser in That Slot.
Certainly could be a possibility. I think Jason's the latency expert around here. The 6" cable is the shortest I can use, just enough to get the card outside the box.

If your x4 slot can take a x16 card I'd suggest you try it with the card mounted in the slot rather than the Riser.
Wish I could, but that machine is a Dell T7400 and the slot arrangement just won't allow it.


The x8(x4) slot is #28 and is tucked in tight under the x16 slot (#27) where the 980 currently resides. And just below the slot is a cover which forms an airflow tunnel over the memory sticks and CPUs. I can barely get my fingers in to seat the riser cable without pulling something else out.
ID: 1869004 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 14039
Credit: 208,696,464
RAC: 304
Australia
Message 1869019 - Posted: 24 May 2017, 4:26:55 UTC - in response to Message 1868957.  
Last modified: 24 May 2017, 4:27:54 UTC

Could it be IRQ/DPC related?
Is the odd system out low on physical RAM?

I can certainly try to check the IRQ, though I'm not sure where that info shows up in Linux.

Not so much the IRQ used, but the OS/CPU overhead in servicing requests.

On my C2D with 2 GTX 750Tis running CUDA 50 with 2 WUs at a time, when crunching Arecibo WUs the IRQ/DPC CPU overhead can peak as high as 20%, with periods of 15% for several seconds. While the CPU is handling that load, it's not supplying CPU time to crunching WUs.
I thought system overhead may be playing a role in your reduced GPU performance depending on the blocking sync setting.
Grant
Darwin NT
ID: 1869019 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1869032 - Posted: 24 May 2017, 5:03:46 UTC - in response to Message 1869019.  

Oh, okay, another possible latency issue, then. The thing is, with Blocking Sync on, the CPU usage is fairly minimal, but with it off, a full CPU is basically dedicated to each GPU task. If there was a DPC latency issue, I would think it would be more obvious without Blocking Sync than with it, though my understanding of that interaction is certainly pretty fuzzy. ;^)
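The CPU-usage difference described here is the general trade-off between a blocking wait and a spin (polling) wait. The Python sketch below is only an analogy for that trade-off, not the Special App's code and not CUDA: a timer thread stands in for a GPU kernel finishing, and the host either sleeps on an event or polls a flag in a tight loop. All names are made up for illustration.

```python
import threading
import time


def cpu_seconds_while_waiting(blocking: bool, wait_s: float = 0.2) -> float:
    """CPU time this process burns while waiting `wait_s` for a 'kernel'."""
    done = threading.Event()
    # The timer stands in for a GPU kernel that completes after wait_s.
    kernel = threading.Timer(wait_s, done.set)
    start_cpu = time.process_time()
    kernel.start()
    if blocking:
        done.wait()                 # sleep until signalled: near-zero CPU
    else:
        while not done.is_set():    # spin: keeps a core busy the whole time
            pass
    kernel.join()
    return time.process_time() - start_cpu


if __name__ == "__main__":
    print(f"blocking wait: {cpu_seconds_while_waiting(True):.3f} s CPU")
    print(f"spin wait:     {cpu_seconds_while_waiting(False):.3f} s CPU")
```

The spin version notices completion sooner but occupies a core while it waits. The blocking version frees the core but only resumes after the scheduler wakes it, so each wait picks up some wake-up latency, which is one place a per-sync delay could accumulate.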
ID: 1869032 · Report as offensive
 