Expert exposes how Nvidia "sandbagged" PhysX: on the CPU it runs only single-threaded and on the x87 instruction set

MaY_bE
Regular Member · Joined 3/27/10 · Messages: 82 · Reaction score: 6 · Points: 8
Don't forget, NV is the "real money, fake performance" company.
 

gx19921211
Senior Member · Joined 9/16/06 · Messages: 533 · Reaction score: 0 · Points: 0
Don't forget that NV doesn't just tie in games, it ties in graphics software too. (picks nose)
 

kwjko
吉祥天賜 · Joined 5/30/08 · Messages: 1,731 · Reaction score: 0 · Points: 36 · Age: 43 · Website: www.facebook.com
PhysX is evidently software-driven; written well, it would actually run decently on the CPU.

Conversely, plenty of programs could also borrow the GPU to offload computation from the CPU, but developers can't be bothered to do it...
 

Jason929
Advanced Member · Joined 3/11/10 · Messages: 203 · Reaction score: 0 · Points: 16
But NV's CUDA cores do accelerate some video-conversion software (Xilisoft's Video Converter, for example).
Is that CUDA related to PhysX?
 

zaqwsxdsa
Advanced Member · Joined 10/24/08 · Messages: 328 · Reaction score: 0 · Points: 16
I don't see what there is to be snarky about...

The thing is theirs... they can do whatever they like with it!!!

Isn't that how it works??? NV isn't running a charity!!!

Intel isn't running a charity either... and neither is AMD!!!

Who wouldn't want things to favor themselves and disadvantage their rivals...


Unless PhysX weren't NV's property... but open, where everyone could tinker with it... like Android...


Why don't you curse Apple out... for making finished apps pass Apple's review before they can even be uploaded???

Why don't you complain that selling an app means buying the SDK from Apple???
 
Last edited:

raizzeyui
Senior Member · Joined 10/7/09 · Messages: 645 · Reaction score: 0 · Points: 0
zaqwsxdsa said: (quoting the post above in full)

+1. It's just that plenty of people love being snarky, and do a little evangelizing while they're at it :PPP:
 

eLove
Honorary Member · Joined 4/5/08 · Messages: 1,394 · Reaction score: 8 · Points: 38
zaqwsxdsa said: (quoting the post above in full)
The point is that the vast majority of games carrying the "PhysX" badge actually run their physics on the CPU.
If PhysX were optimized for CPU multi-threading and at the code level, wouldn't consumers be better off?
I posted this article precisely because "they could optimize but don't, yet still tie a pile of games to running PhysX on the CPU."
When it won't run well you have to spend money on higher-end hardware; is that a price you're willing and content to pay?

Don't let the snark aimed at NVIDIA blind you to the truth of the matter and jump out to defend NVIDIA's honor.
The fact is that an unoptimized PhysX hurts the consumer above all.
 

eLove
Honorary Member · Joined 4/5/08 · Messages: 1,394 · Reaction score: 8 · Points: 38
NVIDIA's official response

Nvidia: We're not hobbling CPU PhysX
SSE will soon be enabled by default

Nvidia has hit out at claims it's deliberately hobbling CPU PhysX, describing the reports as "factually inaccurate."

Speaking to THINQ, Nvidia's senior PR manager Bryan Del Rizzo said "any assertion that we are somehow hamstringing the CPU is patently false."
Del Rizzo states that "Nvidia has and will continue to invest heavily on PhysX performance for all platforms - including CPU-only on the PC."

The response follows a recent report on the Web claiming CPU-PhysX was unnecessarily reliant on x87 instructions, rather than SSE.
The report also suggested PhysX wasn't properly multi-threaded, with the test benchmarks showing a dependence on just one CPU core.

Let's start with multi-threading, which Del Rizzo says is readily available in CPU-PhysX, and "it's up to the developer to allocate threads as they see fit based on their needs."
He points out that you only need to look at the scaling shown in the CPU tests in 3DMark Vantage and FluidMark to see that CPU-PhysX is perfectly capable of scaling performance as more cores are added.

[Image: FluidMark benchmark showing CPU-PhysX scaling across cores]


However, he notes that the current PhysX 2.x code "dates back to a time when multi-core CPUs were somewhat of a rarity," explaining why CPU-PhysX isn't automatically multi-threaded by default.
Yet despite this, Del Rizzo says it's easy enough for a developer to implement multi-threading in PhysX 2.x.


"There are some flags that control the use of 'worker threads' which perform functions in the rigid body pipeline," he says as an example, "and each NxScene runs in a separate thread."

The point appears to be moot in the long-term anyway, as Nvidia is apparently planning to introduce new automatic multi-threading features in the forthcoming PhysX 3.0 SDK.

This "uses a task-based approach that was developed in conjunction with our Apex product to add in more automatic support for multi-threading," explains Dell Rizzo.

The new SDK will automatically take advantage of however many cores are available, or the number of cores set by the developer,
and will also provide the option of a "thread pool" from which "the physics simulation can draw resources that run across all cores."
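PhysX 3.0 had not shipped when this response was given, but for reference, the dispatcher model the released 3.x SDK ended up exposing looks roughly like this (names from the released PhysX 3.x API; the exact foundation-creation macro varied across releases):

    // Sketch of the task/dispatcher model in the shipped PhysX 3.x SDK:
    // the application hands the scene a CPU dispatcher (a thread pool)
    // sized to however many cores it wants to use.
    #include "PxPhysicsAPI.h"
    using namespace physx;

    static PxDefaultAllocator     gAllocator;
    static PxDefaultErrorCallback gErrorCallback;

    int main()
    {
        PxFoundation* foundation = PxCreateFoundation(PX_PHYSICS_VERSION, gAllocator, gErrorCallback);
        PxPhysics*    physics    = PxCreatePhysics(PX_PHYSICS_VERSION, *foundation, PxTolerancesScale());

        PxSceneDesc desc(physics->getTolerancesScale());
        desc.gravity       = PxVec3(0.0f, -9.81f, 0.0f);
        desc.filterShader  = PxDefaultSimulationFilterShader;
        desc.cpuDispatcher = PxDefaultCpuDispatcherCreate(4); // thread pool with 4 workers
        PxScene* scene     = physics->createScene(desc);

        scene->simulate(1.0f / 60.0f); // tasks fan out across the pool automatically
        scene->fetchResults(true);

        scene->release();
        physics->release();
        foundation->release();
        return 0;
    }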


In addition to the new multi-threading features, Del Rizzo also says "SSE will be turned on by default" in the new SDK.
However, he notes that "not all developers want SSE enabled by default, because they still want support for older CPUs for their software versions."

Why do games developers still want to provide support for CPUs that are over ten years old?
Del Rizzo says it's up to the game devs and what they demand, but he reiterates that it's definitely not a deliberate attempt to hobble CPU PhysX.
The original report by Real World Technologies showed the Dark Basic PhysX Soft Body demo (below) made heavy use of x87 instructions, rather than SSE.

[Image: screenshot of the Dark Basic PhysX Soft Body demo]


"We have hundreds of developers who are using PhysX in their applications," he says, "and we have a responsibility to ensure we do not break compatibility with any platforms once they have shipped.
Historically, we couldn't become dependent on any hardware feature like SSE after the first revision has shipped."

He also points out that the PhysX 2.x SDK does feature at least some SSE code, and SSE isn't necessarily faster anyway.
"We have found sometimes non-SSE code can result in higher performance than SSE vector code in many situations," he says.
However, SSE will apparently be the way forward for CPU-PhysX in the long term.
"We will continue to use SSE and we plan to enable it by default in future releases," says Del Rizzo.


In short, it looks as though there's a fair bit of legacy detritus in the current PhysX SDK, partly due to the demands from games devs.
Nevertheless, there are already ways in which developers can use multi-threading in CPU-PhysX, and full SSE support and improved multi-threading will be coming shortly.

This doesn't look like a company trying to deliberately cripple CPU-PhysX to make its GPUs look good.
 
Last edited:

eLove
Honorary Member · Joined 4/5/08 · Messages: 1,394 · Reaction score: 8 · Points: 38
Another related story

Did NVIDIA cripple its CPU gaming physics library to spite Intel?

A new investigation by David Kanter at Realworldtech adds to the pile of circumstantial evidence that NVIDIA has apparently crippled the performance of CPUs on its popular, cross-platform physics acceleration library, PhysX.
If it's true that PhysX has been hobbled on x86 CPUs, then this move is part of a larger campaign to make the CPU, and Intel in particular, look weak and outdated.
The PhysX story is important, because in contrast to the usual sniping over conference papers and marketing claims, the PhysX issue could affect real users.

We talked to NVIDIA today about Kanter's article, and gave the company a chance to air its side of the story.
So we'll first take a look at the RWT piece, and then we'll look at NVIDIA's response.

Oh my God, it's full of cruft


When NVIDIA acquired Ageia in 2008, the GPU maker had no intention of getting into the dedicated physics accelerator hardware business.
Rather, the game plan was to give the GPU a new, non-graphics, yet still gaming-oriented advantage over the CPU and over ATI's GPUs.
NVIDIA did this by ditching Ageia's accelerator add-in board and porting the platform's core physics libraries, called PhysX, to NVIDIA GPUs using CUDA.
PhysX is designed to make it easy for developers to add high-quality physics simulation to their games, so that cloth drapes the way it should, balls bounce realistically, and smoke and fragments (mostly from exploding barrels) fly apart in a lifelike manner.
In recognition of the fact that game developers, by and large, don't bother to release PC-only titles anymore, NVIDIA also wisely ported PhysX to the leading game consoles, where it runs quite well on console hardware.

If there's no NVIDIA GPU in a gamer's system, PhysX will default to running on the CPU, but it doesn't run very well there.
You might think that the CPU's performance deficit is due simply to the fact that GPUs are far superior at physics emulation, and that the CPU's poor showing on PhysX is just more evidence that the GPU is really the component best-equipped to give gamers realism.

Some early investigations into PhysX performance showed that the library uses only a single thread when it runs on a CPU.
This is a shocker for two reasons. First, the workload is highly parallelizable, so there's no technical reason for it not to use as many threads as possible; and second, it uses hundreds of threads when it runs on an NVIDIA GPU.
So the fact that it runs single-threaded on the CPU is evidence of neglect on NVIDIA's part at the very least, and possibly malign neglect at that.
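To see why the workload counts as highly parallelizable, consider a toy explicit-integration step (not PhysX code): each body updates independently, so the body array splits cleanly across however many threads the machine has.

    // Toy illustration: an explicit integration step has no cross-body
    // dependencies, so the array can be partitioned across threads.
    #include <algorithm>
    #include <functional>
    #include <thread>
    #include <vector>

    struct Body { float px, py, pz, vx, vy, vz; };

    void integrateRange(std::vector<Body>& bodies, size_t begin, size_t end, float dt)
    {
        for (size_t i = begin; i < end; ++i) {
            bodies[i].vy -= 9.81f * dt;        // gravity
            bodies[i].px += bodies[i].vx * dt; // advance positions
            bodies[i].py += bodies[i].vy * dt;
            bodies[i].pz += bodies[i].vz * dt;
        }
    }

    void integrateAll(std::vector<Body>& bodies, float dt, unsigned workers)
    {
        std::vector<std::thread> pool;
        const size_t chunk = (bodies.size() + workers - 1) / workers;
        for (unsigned w = 0; w < workers; ++w) {
            const size_t begin = w * chunk;
            const size_t end   = std::min(bodies.size(), begin + chunk);
            if (begin < end)
                pool.emplace_back(integrateRange, std::ref(bodies), begin, end, dt);
        }
        for (std::thread& t : pool) t.join(); // each worker touched a disjoint range
    }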

But the big kicker detailed by Kanter's investigation is that PhysX on a CPU appears to exclusively use x87 floating-point instructions, instead of the newer SSE instructions.

x87 = old and busted

The x87 floating-point math extensions have long been one of the ugliest legacy warts on x86.
Stack-based and register-starved, x87 is hard to optimize and needs more instructions and memory accesses than comparable RISC hardware to accomplish the same task.
Intel finally fixed this issue with the Pentium 4 by introducing a set of SSE scalar, single- and double-precision floating-point instructions that could completely replace x87, giving programmers access to more and larger registers, a flat register file (as opposed to x87's stack structure), and, of course, floating-point vector formats.

Intel formally deprecated x87 in 2005, and every x86 processor from both Intel and AMD has long supported SSE.
For the past few years, x87 support has been included in x86 processors solely for backwards compatibility, so that you can still run old, deprecated, unoptimized code on them.
Why then, in 2010, does NVIDIA's PhysX emit x87 instructions, and not scalar SSE or, even better, vector SSE?
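Part of what makes the question pointed is that for scalar float math the choice is largely a compiler flag: the compiler, not the source, decides whether to emit x87 or SSE. A minimal illustration using GCC's -mfpmath switch (MSVC's /arch:SSE2 is the rough x86 equivalent):

    /* The same function compiles to x87 stack code (fld/fmul/fstp) or to
     * scalar SSE (mulss/addss) purely by flag choice:
     *
     *   gcc -O2 -m32 -mfpmath=387        integrate.c    (x87 code)
     *   gcc -O2 -m32 -msse2 -mfpmath=sse integrate.c    (scalar SSE code)
     */
    float integrate(float v, float a, float dt)
    {
        return v + a * dt; /* identical source either way */
    }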

NVIDIA: the bottlenecks are elsewhere

We spent some time talking through the issue with NVIDIA's Ashutosh Rege, Senior Director of Content and Technology, and Mike Skolones, Product Manager for PhysX.
The gist of the pair's argument is that PhysX games are typically written for a console first (usually the PS3), and then they're ported to the PC.
And when the games go from console to PC, the PC runs them faster and better than the console without much, if any, optimization effort.

"It's fair to say we've got more room to improve on the CPU. But it's not fair to say, in the words of that article, that we're intentionally hobbling the CPU," Skolones told Ars.
"The game content runs better on a PC than it does on a console, and that has been good enough."

NVIDIA told us that it has never really been asked by game developers to spend any effort making the math-intensive parts of PhysX faster—when it gets asked for optimization help, it's typically for data structures or in an area that's bandwidth- or memory-bound.

"Most of the developer feedback to us is all around console issues, and as you can image that's the number one consideration for a lot of developers," Skolones said.

Even after they made their case, we were fairly direct in asking them why they couldn't do a simple recompile and use SSE instead of x87.
It's not that hard, so why keep on with the old, old, horrible x87 cruft?

The answer to this was twofold: first, and least important, is the fact that all the AAA developers have access to the PhysX source and can (and do) compile it with SSE on their own.

The second answer was more important, and surprised me a bit: the PhysX 2.X code base is so ancient (it goes back to well before 2005, when x87 was deprecated), and it has such major problems elsewhere, that they were insisting that it was just kind of pointless to change over to SSE.

When you're talking about changing a compiler flag, something that could have shipped in any point revision, the combination of "nobody ever asked for it, and it wouldn't help real games anyway because the bottlenecks are elsewhere" is not quite convincing.
It never occurred to anyone over the past five years to just make this tiny change to SSE? Really?

Of all the answers we got for why PhysX still uses x87, the most convincing ones were the ones rooted in game developer apathy towards the PC as a platform.
Rege ultimately summed it up by arguing that if they weren't giving developers what they wanted, then devs would quit using PhysX; so they do give them what they want, and what they want are console optimizations.
What nobody seems to care about are PC optimizations like non-crufty floating-point and (even better) vectorization.

"It's a creaky old codebase, there's no denying it," Skolones told Ars. "That's why we're eager to improve it with 3.0."

Wait for 3.0

It's rare that you talk to people at a company and they spend as much time slagging their codebase as the NVIDIA guys did on the PC version of PhysX.
It seemed pretty clear that PhysX 2.x has a ton of legacy issues, and that the big ground-up rewrite that's coming next year with 3.0 will make a big difference.
The 3.0 release will use SSE scalar at the very least, and they may do some vectorization if they can devote the engineering resources to it.

As for how big of a difference 3.0 would bring for PhysX on the PC, we and NVIDIA had divergent takes.
Rege expressed real skepticism that a combination of greater multithreading, SSE scalar, and vectorization would yield a 2X performance boost for the CPU on the specific kernels that Kanter tested.
We don't know those kernels very well, but our intuition tells us that a 2X boost shouldn't be unreasonable.
Intel and AMD have spent a lot of effort in the past few years, both on the hardware side and in their compilers, making SSE execute very quickly—and none at all on x87.

Even if there was a 2X speedup to be had on a set of test kernels, that wouldn't translate to a 2X speedup in game performance.
Individual frames vary greatly in how much physics processing is going on, depending on what's happening in the scene.
So it's very hard to say what kind of average speedup an all-out optimization effort would deliver, which makes it even harder to speculate about 3.0.
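As a rough Amdahl's-law sanity check (our arithmetic, not a figure from either article): if physics takes a fraction p of frame time and only that part gets s times faster, the whole frame speeds up by

    1 / ((1 - p) + p / s)

so even a full 2X kernel speedup (s = 2) on a frame that is 30% physics (p = 0.3) only buys about a 1.18X frame-rate gain.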

It's also the case that the PC is the least sexy gaming platform that PhysX supports.
When the list of PhysX platforms includes the iPhone and all the consoles, it's easy to imagine that both developers and NVIDIA itself spend the majority of their effort elsewhere.

But still, when you boil it all down, we keep coming back to the point that it's so easy to switch from x87 to SSE, and x87 has been deprecated for so long, and it's so much to NVIDIA's advantage to be able to tout the GPU's superiority over the CPU for physics, that it's very hard to shake the feeling that there's some kind of malicious neglect going on.
Think about it: if game developers really don't care that much about PC physics performance, and it's "good enough" with x87 code, why make a small change that might give the CPU an unneeded boost?
Why not just let it struggle along at "good enough"?
 
Last edited:

aasa
Regular Member · Joined 1/10/10 · Messages: 58 · Reaction score: 0 · Points: 6
RWT and David Kanter have made the news.

I wonder whether that will drag the caliber of RWT's staff down... ;rr;
 