Lab 5 – Caches in computer processor architectures
Objectives
In this experiment we will explore caching techniques in our MIPS architecture.
Description
Caching techniques are used in computer architectures to increase average memory performance. In the first part of the experiment we take a look at the cache internals. In the second part a program (matrix multiplication) is written to compare the performance of a cached and a non-cached situation.
The architectures explored in this experiment can be downloaded from this location.
Experiment – Exploring a cache architecture
To answer the questions, load the architecture “mips_bp_cache” in the SIM-PL Executer.
Questions
1. What is the size in bytes of one cache line in this architecture?
The cache is two-way set associative.
2. How many bits are used for the address fields "offset", "index" and "tag"?
Located directly above the cache is a component with the text “& -128”.
3. Describe the functionality of this component.
In the new pipeline stage TC, there are three components with the text “==”.
4. Describe the functionality of these components.
5. Is it possible to replace the rightmost “==” with another type of gate?
Note: the forwarder component receives its data from the ALU output. As a result, a memory read is never forwarded, and instructions that read the result of an LW instruction must wait at least two cycles. No hazard unit is implemented, so to avoid hazards you must insert NOP instructions or other useful instructions yourself.
Experiment – Multiplying Matrices
A matrix is a rectangular array of numbers arranged in rows and columns. Matrices are used a lot in Computer Science, both in theory and in practice. Multiplying matrices quickly is important, so we will look at a simple way to improve the performance of matrix multiplication.
Hint: For some background information take a look at the following wiki page:
Wikipedia Matrix multiplication
Questions
6. Implement the naive matrix multiplication algorithm for 16x16 matrices in the MIPS architecture (using MIPS assembly language), according to the C source code given below.
As a starting point use the file "Framework.wasm".
Use the matrices "a" and "b" from "matrixes.wasm". These matrices are stored in "row-major" order, meaning that the first row is stored first, then the second row, and so on. The correct result looks like 0, 1, 2, ..., 254, 255 instead of random numbers.
C source code:
for (int i = 0; i < 16; i++)
{
    for (int j = 0; j < 16; j++)
    {
        int sum = 0;
        for (int k = 0; k < 16; k++)
        {
            sum += a[i * 16 + k] * b[k * 16 + j];
        }
        res[i * 16 + j] = sum;
    }
}
"bt" is the matrix "b" but transposed (the first column of "b" is the first row of "bt", and so on)
7. Change your program to use "bt" instead of "b", while still getting the same result, meaning you will have to index into "bt" the "other way around".
8. Compare the performance of both programs and explain the difference.
Hint:
When running the simulation in the SIM-PL Executer, use the yellow button “Simulate a number of cycles” to step forward n cycles through the simulation. The expected number of cycles is somewhere between 30k and 100k.