Sandra Standard tests with different parameters

j@cko · 10/4/04

隞乩��葫閰? 蝚砌��皜祈岫閮擃隞暻潭瘜��臭誑��...
蝚砌��葫閰行皜祈岫雿輻銝��閮擃祝�餃�?
蝚砌��葫閰行閰脖��刻圾隤芯�, it's pretty straight foward.
��洵鈭葫閰行�唾牧�停�臭��典隞�憯��隤芸 AMD 蝟餌絞銝? TRAS �?10 撌血�舀雿喟, �隞交隞予撠梯�港�靘隤芯�皜祆葫�? �支�皜祈岫 TRAS ��詨��雿輻鈭�� TRC. 隢��, 憒�葫閰?procedure 銝憭梯炊�敹質��... 雓?
System Setup:
CPU: 3500+ @ Default Vcore
Mobo: MSI K8N Neo2 rev 1.0/BIOS 1.35 BETA
RAM: G.Skill 431 TCCD 2 x 512MB/Slot 1 & 2
Video: R350
HDD: Hitachi 160 Gig SATA @ SATA 3
Power: Antec Neo Power 480

BIOS Setting:
Aggressive Timing: Disabled
AGP Aperture: 64MB
SATA 1 & 2: Disabled

��皜祈岫

------------------------------------------------------------------------------
Sandra Standard Memory Bandwidth Benchmarks
Sandra Standard version: 2004.10.9.133

System Setup:
CPU: 3500+ @ Default Vcore/9 x 265 = 2385 MHz
Mobo: MSI K8N Neo2 rev 1.0/BIOS 1.35 BETA
RAM: G.Skill 431 TCCD 2 x 512MB/Slot 1 & 2
Video: R350
HDD: Hitachi 160 Gig SATA @ SATA 3
Power: Antec Neo Power 480

BIOS Setting:
VDIMM: 2.7
HT: 4x
AGP Freq: 66
Aggressive Timing: Disabled
AGP Aperture: 64MB
SATA 1 & 2: Disabled

Background Processes

Testing Results

----------------------------------------------------------------

狂少 · 10/4/04

Great testing...
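I've found that setting tRAS to 6 works well.
This must have taken quite a bit of time!

**Valuable reference material**
Pinned for 3 days for reference!!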

33totoro33 · 10/4/04

Looks like 7-3-3-2.5 is the best for you??
I'm using 8-3-3-2.5, which shouldn't be bad either.

j@cko · 10/4/04

撣�刻��憭��銝鈭?additional �葫閰?..

��鈭

敹�閫�牧�皜祈岫 procedure....
�?trial 1, 2 and 3 �隞?.. 瘥身摰��皜?3 皜?�嗅��銝��葫閰?..
�隞乩��舀銝�身摰�刻�鈭�銝敺�頝�銝?..

j@cko · 10/4/04

Originally posted by 33totoro33@Oct 4 2004, 01:07 AM
Looks like 7-3-3-2.5 is the best for you??
I'm using 8-3-3-2.5, which shouldn't be bad either.

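I think 6 and 7 are both good... the gap in the scores is small enough that it could just be measurement error... the English word for it is "insignificant".... so 6 & 7 are effectively the same... on my system, anyway...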
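
One rough way to check whether a gap like that is just noise is to compare the difference between the average scores with the run-to-run spread of the repeated trials. The Python sketch below assumes three trials per setting; the scores in it are made-up placeholders, not results from this thread, and `within_noise` is a hypothetical helper, not a formal statistical test.

```python
from statistics import mean, stdev

def within_noise(scores_a, scores_b):
    """Crude check: is the gap between the two means no larger than
    the larger of the two run-to-run standard deviations?"""
    gap = abs(mean(scores_a) - mean(scores_b))
    noise = max(stdev(scores_a), stdev(scores_b))
    return gap <= noise

# Placeholder bandwidth scores (MB/s), three trials each. NOT real data.
tras_6 = [6010, 6025, 6018]
tras_7 = [6015, 6022, 6020]
tras_other = [5900, 5905, 5898]

print(within_noise(tras_6, tras_7))      # gap is smaller than the spread
print(within_noise(tras_6, tras_other))  # gap is much larger than the spread
```

With only three trials per setting this is a sanity check at best; more trials would give a more trustworthy estimate of the spread.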

我愛電腦 · 10/4/04

�?....,�颲鈭銝摰鈭�撠��?#33;
��......!!

j@cko · 10/4/04

餈賢....
隞乩��?chart 銝�隤芰 default 撠望� default setting...(銝剜銝��獐隤?
瘥��葫閰西��渡�隞亙� (independent variables)... �嗡��賊�臭誑銝鋆⊿a64tweaker��貊銝?

The explanation below comes from the AMD forum: http://forums.amd.com/index.php?showtopic=12017

RAS - Row Address Strobe or Row Address Select.
CAS - Column Address Strobe or Column Address Select.
tRAS - Active to precharge delay. This is the minimum delay between activating a row and precharging it.
tRCD - RAS to CAS Delay. The time required between RAS and CAS access.
tCL - (or CL) CAS Latency.
tRP - RAS Precharge. The time required to switch from one row to the next row, i.e. switch internal memory banks.
tCLK - CLocK. The length of a clock cycle.
Command Rate - This is the delay between Chip Select (CS), i.e. when an IC is selected, and the time commands can be issued to the IC.
Latency - The time between when a request is made and the request is answered. For example, if you are in a restaurant, the latency would be the time between when you ordered your meal and the time you received it. Therefore, in memory terms, it is the total time required before data can be written to or read from the memory.

Some of the above terms are more important to system stability and performance than are others. However, it is important to understand the role of each of these settings/signals in order to understand the whole. Therefore, the numbers 2-3-2-6-T1 refer to CL-tRCD-tRP-tRAS-Command Rate and are measured in clock cycles.
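
To make the CL-tRCD-tRP-tRAS-Command Rate ordering concrete, here is a minimal Python sketch that splits such a string into named fields. The `Timings` type and `parse_timings` function are hypothetical names made up for this example, and it assumes exactly the five-field order given above.

```python
from typing import NamedTuple

class Timings(NamedTuple):
    cl: float          # CAS latency (can be fractional, e.g. 2.5)
    trcd: int          # RAS to CAS delay
    trp: int           # RAS precharge
    tras: int          # active to precharge delay
    command_rate: int  # 1 for T1, 2 for T2

def parse_timings(text: str) -> Timings:
    """Parse 'CL-tRCD-tRP-tRAS-CommandRate', e.g. '2-3-2-6-T1'."""
    parts = text.strip().split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {text!r}")
    cl, trcd, trp, tras, cmd = parts
    cmd_digits = cmd.upper().replace("T", "")  # accepts 'T1' or '1T'
    return Timings(float(cl), int(trcd), int(trp), int(tras), int(cmd_digits))

print(parse_timings("2-3-2-6-T1"))
```

All values are in memory clock cycles, so the same string means different absolute times at different memory clocks.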

tRAS
Memory architecture is like a spreadsheet with row upon row and column upon column with each row being 1 bank. In order for the CPU to access memory, it must first determine which Row or Bank in the memory that is to be accessed and activate that row via the RAS signal. Once activated, the row can be accessed over and over until the data is exhausted. This is why tRAS has little effect on overall system performance but could impact system stability if set incorrectly.

tRCD
There is a delay from when a row is activated to when the cell (or column) is activated via the CAS signal and data can be written to or read from a memory cell. This delay is called tRCD. When memory is accessed sequentially, the row is already active and tRCD will not have much impact. However, if memory is not accessed in a linear fashion, the current active row must be deactivated and then a new row selected/activated. It is this example where low tRCD's can improve performance. However, like any other memory timing, putting this too low for the module can result in instability.

CAS Latency
Certainly, one of the most important timings is the CAS Latency, which is also the one most people understand. Since data is often accessed sequentially (same row), the CPU only needs to select the next column in the row to get the next piece of data. In other words, CAS Latency is the delay between the CAS signal and the availability of valid data on the data pins (DQ). Therefore, the latency between column accesses (CAS) plays an important role in the performance of the memory. The lower the latency, the better the performance. However, the memory modules must be capable of supporting low latency settings.

tRP
tRP is the time required to terminate one row access and begin the next row access. Another way to look at this is that tRP is the delay required between deactivating the current row and selecting the next row. Therefore, in conjunction with tRCD, the time required (or clock cycles required) to switch banks (or rows) and select the next cell for either reading, writing or refreshing is a combination of tRP and tRCD.
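
As a rough illustration of how CL, tRCD and tRP add up, the Python sketch below counts the cycles before data appears for three cases: the row is already open, no row is open, or a different row must be closed first. This is a simplified model and an assumption of this example; it ignores tRAS limits, command rate and burst length, and `access_cycles` is a hypothetical helper.

```python
def access_cycles(cl, trcd, trp, row_state):
    """Approximate cycles until data is available, by row state."""
    if row_state == "hit":       # row already active: only CAS latency
        return cl
    if row_state == "closed":    # no active row: activate, then read
        return trcd + cl
    if row_state == "conflict":  # wrong row active: precharge, activate, read
        return trp + trcd + cl
    raise ValueError(f"unknown row state: {row_state!r}")

# Example with CL=2, tRCD=3, tRP=2 (the 2-3-2 part of 2-3-2-6-T1).
for state in ("hit", "closed", "conflict"):
    print(state, access_cycles(2, 3, 2, state))
```

This is why sequential access mostly exercises CL, while scattered access also pays for tRCD and tRP.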

tRAS
Next comes tRAS. This is the delay required between the active and precharge commands. In other words, how long the memory must wait before the next memory access can begin.

tCLK
This is simply the clock used for the memory. Note that frequency is 1/t. Therefore, if memory were running at 100 MHz, the cycle time of the memory would be 1/100 MHz, or 10 ns.
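
The frequency-to-cycle-time conversion can be sketched in a couple of lines of Python. The function name is made up for this example. The 265 MHz line assumes the memory runs at the same 265 MHz clock as the 9 x 265 CPU setting listed in the test setup, which the thread does not state outright.

```python
def cycle_time_ns(freq_mhz: float) -> float:
    """Length of one clock cycle in nanoseconds: t = 1 / f."""
    return 1000.0 / freq_mhz

print(cycle_time_ns(100))            # the 100 MHz example above
print(cycle_time_ns(200))            # DDR400 base clock
print(round(cycle_time_ns(265), 2))  # assumed memory clock of the test system
```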

Command Rate
The Command Rate is the time needed between the chip select signal and when commands can be issued to the RAM module IC. Typically, this is either 1 clock or 2.

j@cko · 10/4/04

This one is from Anandtech: http://anandtech.com/memory/showdoc.aspx?i=2223&p=5

Memory Timings and Bandwidth Explained
With that brief overview of the memory subsystem, we are ready to talk about memory timings. There are usually four and sometimes five timings listed with memory. They are expressed as a set of numbers, e.g. 2-3-2-7, corresponding to CAS-tRCD-tRP-tRAS. On modules that list a fifth number, it is usually the CMD value, e.g. 1T. Some might also include a range for the tRAS value. These are really only a small subset of the total number of timing figures that memory companies use, but they tend to be the more important ones and encapsulate the other values. So, what does each setting mean? By referring back to the previous sections on how memory is accessed, we can explain where each value comes into play.

The most common discussion on timing is the CAS Latency, or CL value. CAS stands for Column Address Strobe. This is the number of memory cycles that elapse between the time a column is requested from an active page and the time that the data is ready to begin bursting across the bus. This is the most common occurrence, and so, CAS Latency generally has the largest impact on overall memory performance for applications that depend on memory latency. Applications that depend on memory bandwidth do not care as much about CAS latency, though. Of course, there are other factors that come into play, as our tests with OCZ 3500EB RAM have shown that a well designed CL2.5 RAM can keep up with and sometimes even outperform CL2 RAM. Note that purely random memory accesses will stress the other timings more than the CL, as there is little spatial locality in that case. Random memory access is not typical for general computing, which explains why theoretical memory benchmarks that use it as a performance metric frequently have little to no correlation with real world performance.

The next value is tRCD, which is referred to as the RAS to CAS Delay. This is the delay in memory cycles between the time a row is activated and when a column of data within the row can actually be requested. It comes into play when a request arrives for data that is not in an active row, so it occurs less frequently than CL and is generally not as important. As mentioned a moment ago, certain applications and benchmarks can have different memory access patterns, though, which can make tRCD more of a factor.

The term tRP stands for the time for RAS Precharge, which can be somewhat confusing. Time for a Row Precharge is another interpretation of the term and explains the situation better. tRP is the time in memory cycles that is required to flush an active row out of the sense amp ("cache") before a new row can be requested. As with tRCD, this only comes into play when a request is made to an inactive row.

Moving on, we have the tRAS - or more properly tRASmin - which is the minimum time that a row must remain active before a new row within that bank can be activated. In other words, after a row is activated, it cannot be closed and another row in the same bank be opened until a minimum amount of time (tRASmin) has elapsed. This is why having more memory banks can help to improve memory performance, provided it does not slow down other areas of the memory. There is less chance that a new page/row will need to be activated in a bank for which tRASmin has not elapsed. Taken together, tRP and tRAS are also referred to as the Row Cycle time (tRC), as they occur together.
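
Since tRC is described here as tRP and tRAS taken together, a quick Python sketch shows how changing tRAS moves the row cycle time. The tRP value of 3 is an assumption for illustration, and `row_cycle_time` is a hypothetical helper.

```python
def row_cycle_time(tras: int, trp: int) -> int:
    """tRC in memory clock cycles: tRAS (row active) plus tRP (precharge)."""
    return tras + trp

# Assumed example: tRP fixed at 3 cycles while tRAS is varied.
for tras in (6, 7, 8):
    print(f"tRAS={tras}, tRP=3 -> tRC={row_cycle_time(tras, 3)} cycles")
```

Each extra tRAS cycle adds one cycle to the minimum gap between opening two different rows in the same bank, which only matters when such back-to-back row changes actually happen.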

CMD is the command rate of the memory. The command rate specifies how many consecutive clock cycles that commands need to be presented to the DRAMs before the DRAMs sample the address and command bus wires. The package of the memory controller, the wires of the address and command buses, and the package of the DRAM all have some electrical capacitance. As electrical 1's and 0's in the commands are sent from the memory controller to the DRAMs, the capacitance of these (and other) elements of the memory system slow the rate at which an electrical transition between a 1 and a 0 (and vice versa) can occur. At ever-increasing memory bus clock speeds, the clock period shrinks, meaning that there is less time available for the transition between a 1 and a 0 (and vice versa) to occur. Because of the way that addresses and commands are routed to DRAMs on memory modules, the total capacitance on these wires may be so high that transitions between 1 and 0 cannot occur reliably in only one clock cycle. For this reason, commands may need to be sent for 2 consecutive clock cycles so that they can be assured of settling to their appropriate values before the DRAMs take action. A 2T command rate means that commands are presented for 2 consecutive clocks to the DRAMs. In some implementations, command rate is always 1T, while in others, it may be either 1T or 2T. On DDR/DDR2, for instance, using high-quality memory modules (which cost a little more) and/or reducing the number of memory modules on each channel can allow 1T command rates. If you are wondering how the command rate can impact performance, that explanation will have hopefully made it clear that CMD can be just as important as CL. Every memory access will incur the CMD and CL delays, so removing one memory clock cycle from each benefits every memory access.

In addition to all of these timings, the question of memory bandwidth still remains. Bandwidth is the rate at which data can be sent from the DRAMs over the memory bus. Lower timings allow faster access to the data, while higher bandwidth allows access to more data. Applications that access large amounts of data - either sequentially or randomly - usually benefit from increased bandwidth. Bandwidth can be increased either by increasing the number of memory channels (i.e. dual-channel) or by increasing the clock speed of the memory. Doubling memory bandwidth will never lead to a doubling of actual performance except in theoretical benchmarks, but it could provide a significant boost in performance. Many games and multimedia benchmarks process large amounts of data that cannot reside within the cache of the CPU, and being able to retrieve the data faster can help out. All other things being equal, more bandwidth will never hurt performance.
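
The two ways of raising bandwidth mentioned above (more channels, higher clock) can be put into one small formula. This Python sketch computes theoretical peak bandwidth only; the function name and default values (64-bit bus, two transfers per clock for DDR) are assumptions of this example.

```python
def peak_bandwidth_mb_s(base_clock_mhz, transfers_per_clock=2,
                        bus_width_bits=64, channels=1):
    """Theoretical peak bandwidth in MB/s."""
    bytes_per_transfer = bus_width_bits / 8
    return base_clock_mhz * transfers_per_clock * bytes_per_transfer * channels

print(peak_bandwidth_mb_s(200))              # single-channel DDR400
print(peak_bandwidth_mb_s(200, channels=2))  # dual-channel DDR400
```

Measured results, such as a Sandra memory bandwidth score, will come in below this theoretical figure.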

It is important to make clear that this is only a very brief overview of common RAM timings. Memory is really very complex, and stating that lower CAS Latencies and higher bandwidths are better is a generalization. It compares to stating that "larger caches and higher clock speeds are better" in the CPU realm. This is often true, but there are many other factors that come into play. For CPUs, we also need to consider pipeline lengths, number of in-flight instructions, specific instruction latencies, number and type of execution units, etc. RAM has numerous other timings that can come into play, and the memory controller, FSB, and many other influences can also affect the resulting performance and efficiency of a system. Some people might think that designing memory is relatively simple compared to working on CPUs, but especially with rising clock speeds, this is not the case.

j@cko · 10/4/04

http://anandtech.com/memory/showdoc.aspx?i=2223&p=6

Memory Latencies Explained
One big question that remains is latency. All the bandwidth in the world will not help if you have to wait forever to get the needed data. It is important to note, however, that higher latencies can be compensated for. The Pentium 4, for example, has improved buffering, sophisticated prefetch logic, and the ability to have many outstanding memory requests. It loves bandwidth, and performance has been helped substantially by increasing the bus speeds, even with higher memory latencies. Graphics chips also tend to be more forgiving of higher latencies. Any design can be modified to work with higher or lower latencies, of course; it is but one facet of the overall goal which needs to be addressed. Still, the question remains, how does memory latency relate to timings and bandwidth?

The simple answer is that it is directly related to the memory timings, but you cannot compare timings directly. The reason for this is that the memory timings are relative to the base clock speed of the RAM - they are the number of memory clock cycles that each operation requires. For DDR memory, this means that the cycle time is calculated using one half of the data transfer speed. PC3200 DDR memory has a 64-bit bus that transfers up to 3200 MB/s. Converting that to a clock speed means converting bytes to bits (multiply by eight), then divide by that bus width, and we get the effective clock speed; the base clock speed is half the effective clock speed.

PC3200:
3200 MB/s * 8 bits = 25600 Mb/s
25600 Mb/s / 64-bits = 400 MHz
400 MHz / 2 = 200 MHz base clock speed
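
The same three steps can be written as a small Python function. The function name and the `transfers_per_clock` parameter are made up for this sketch; the parameter is set to 2 for DDR.

```python
def base_clock_mhz(bandwidth_mb_s, bus_width_bits=64, transfers_per_clock=2):
    """Recover the base clock from a module's rated bandwidth."""
    megabits_per_s = bandwidth_mb_s * 8              # bytes -> bits
    effective_mhz = megabits_per_s / bus_width_bits  # transfers per second
    return effective_mhz / transfers_per_clock       # DDR: 2 transfers per clock

print(base_clock_mhz(3200))  # PC3200
```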

Other memory types may use quad or even octal data rates, but if we convert those into the base clock speed, we can compare latencies. Where timings are listed in clock cycles, latency is listed in nanoseconds (ns). A CL of 2.0 sounds better than a CL of 5.0, but depending on the memory clock, it may actually be closer than we would at first expect. By converting all of the timings into nanoseconds, we can compare performance. We will save detailed comparisons for the next installment, but as an example, suppose we have two memory types - one with a CL of 4.0 and a base clock speed of 333 MHz, and the second with a CL of 2.5 and a base clock speed of 200 MHz.

-----------------------------------------------------
CL     Clock Speed    Cycle Time    Real Latency
2.5    200 MHz        5.0 ns        12.5 ns
4.0    333 MHz        3.0 ns        12.0 ns
-----------------------------------------------------

In this specific example, we see that even with a CL that's 60% higher, the effective latency can actually end up being slightly lower (12.0 ns versus 12.5 ns). This is something that we will examine further in the next article of this series.
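
The comparison works out as CL multiplied by the cycle time. A minimal Python sketch, with a hypothetical function name, reproduces the two cases:

```python
def real_latency_ns(cl: float, base_clock_mhz: float) -> float:
    """CAS latency in nanoseconds: cycles times cycle length."""
    return cl * 1000.0 / base_clock_mhz

for cl, mhz in [(2.5, 200), (4.0, 333)]:
    print(f"CL {cl} @ {mhz} MHz: {real_latency_ns(cl, mhz):.1f} ns")
```

The same conversion can be applied to tRCD, tRP and tRAS to compare full timing sets across different memory clocks.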

An Anecdote
Getting the whole picture of how memory performance impacts system performance is still a very difficult task. If all this talk of timings and latencies has not helped, let us provide another comparison. Think of the CPU as a cook at a restaurant, busily working to keep up with customer demand. There is a process that occurs. Waiters or cashiers take the orders and send them to the cook, the cook prepares the food, and the final result is delivered to the customer. Sounds simple enough, right? Let's look at some of the details.

When an order for a dish comes in, certain common items (e.g. fries, rice, soup, salads, etc.) may already be prepared, so delivering them to the customer occurs rapidly. We can think of this as the processor finding something in the L1 cache. This is great when it occurs, but it only occurs for a very limited number of items. Most of the time, the cook will need to begin preparing the order, so he will get the items from the cupboard, freezer and refrigerator and begin cooking them. This time, the ingredients are in the L2/L3 cache. So far so good, but where does RAM come into play?

As items are pulled from the fridge, freezer, etc., the restaurant will need to restock them. The supplies have to be ordered from headquarters or whomever the restaurant uses. This is akin to system RAM (or maybe even the hard drive, but we'll leave that out of the analogy for now). If the restaurant can anticipate needs properly, it can order the supplies in advance. Sometimes, though, supplies run low - or maybe you didn't order the correct amount of supplies - and you need to send someone off to a local store for additional ingredients. This is a cache miss, and the store is the system RAM. In a time-critical situation such as this one, the cook wants the ingredients ASAP. A closer store would be better, or perhaps a store with faster checkout lanes, but provided that the trip does not take a really long time, any store is about as good as another. Basically, system RAM with its timings and latencies can have an impact, but a really fast memory controller (i.e. a store next door) with slower RAM (slow checkout lanes) can be more important than having the fastest RAM in the world.

This is all well and good for smaller restaurants and chains, but a large corporation (e.g. McDonald's) cannot simply walk next door to pick up some frozen burgers. In this case, the whole supply chain needs to be highly efficient. Instead of ordering supplies once a week, inventories might be checked every night, and orders placed as necessary. Headquarters has forecasts based on past requirements and may send orders to their suppliers months in advance. This supply chain correlates loosely with the idea of outstanding memory requests, prefetch logic, deeper buffers, etc. Bandwidth also comes into play here, as a large chain might have several large trailers of supplies en route at any point in time, while a smaller chain might be able to get by with only one or two moderately-sized delivery vans.

With faster processors, faster buses, faster RAM, etc., the analogy is moving towards all processors being large corporations with huge demands. Early 8088 and 8086 processors could just wander to the local store as necessary - like what most adults do for their own cooking needs. As the amount of data being processed increases, though, everything becomes exponentially more difficult. There is a big jump from running one small restaurant that serves a few dozen people daily to serving hundreds of people daily, to running several locations, to running a corporation that has locations scattered across the world. That is essentially what we have seen in the world of computer processors. We have gone from running a local "mom-and-pop" burger joint to running McDonald's, Burger King, and several other hamburger chains.

This analogy is probably flawed at numerous levels, but hopefully it helps. If you think about it, the complexity of any one subsystem of the modern PC is probably hundreds of times greater than that of the entire original IBM PC. The change did not occur instantly, but even the largest of technology corporations are going to have a lot of trouble staying at the top of every area of computers.

kasin · 10/4/04

��J@cko��澈皜祈岫 :)
颲�? ��?:MMM:
