Chapter 2

  1. The total number of cycles is \(\sum_{i=1}^n(\hbox{CPI}_i\times I_i) = 45000\times 1+32000\times 2+15000\times 2+8000\times 2 = 155000\)
    The time for 1 cycle (the clock period) is \(1/40 \hbox{ MHz} = 25 \times 10^{-9} \hbox{ s} = 25 \hbox{ ns}\)
    The total time is \(\hbox{total cycles}\times\hbox{time per cycle} = 155000\hbox{ cycles}\times 25\times 10^{-9}\hbox{ s} = 3.875\times 10^{-3} \hbox{ s}\)

    \(\hbox{CPI} = {\hbox{total cycles}/\hbox{total instr}} = 155000/100000 = 1.55 \hbox{ cycles/instruction}\).
    \(\hbox{MIPS} = {({\hbox{total instr}/1000000})\over\hbox{total time}} = {{100000/1000000}\over{3.875\times 10^{-3}}} = 25.81\hbox{ MIPS}\)
    The total execution time (from above) = 3.875 ms.
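    The same arithmetic can be checked with a short Python sketch. It uses only the instruction counts, the CPIs and the 40 MHz clock from the calculation above; the variable names are illustrative.

```python
# (instruction count, CPI) for each instruction class, as in the sum above
mix = [(45000, 1), (32000, 2), (15000, 2), (8000, 2)]
clock_hz = 40e6  # 40 MHz clock

total_instr = sum(count for count, _ in mix)
total_cycles = sum(count * cpi for count, cpi in mix)
cycle_time = 1 / clock_hz              # 25 ns per cycle
exec_time = total_cycles * cycle_time  # total execution time in seconds
cpi = total_cycles / total_instr       # average cycles per instruction
mips = total_instr / (exec_time * 1e6) # millions of instructions per second

print(f"cycles={total_cycles}, time={exec_time * 1e3:.3f} ms, CPI={cpi:.2f}, MIPS={mips:.2f}")
# prints: cycles=155000, time=3.875 ms, CPI=1.55, MIPS=25.81
```

    Equivalently, MIPS is the clock rate in MHz divided by the CPI: 40 / 1.55, about 25.81.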

    1. We divide all the values in the table by R's values to get normalized values.
      Computer
      R M Z
      Benchmark E \({417\over 417} = 1\) \({244\over 417} = 0.585\) \({134\over 417} = 0.321\)
      Benchmark F \({83\over 83} = 1\) \({70\over 83} = 0.843\) \({70\over 83} = 0.843\)
      Benchmark H \({66\over 66} = 1\) \({153\over 66} = 2.318\) \({135\over 66} = 2.045\)
      Benchmark I \({39449\over 39449} = 1\) \({35527\over 39449} = 0.901\) \({66000\over 39449} = 1.673\)
      Benchmark K \({772\over 772} = 1\) \({368\over 772} = 0.477\) \({369\over 772} = 0.478\)
      For computer R, the arithmetic mean is \({1+1+1+1+1\over 5} = 1 \)
      For computer M, the arithmetic mean is \({0.585+0.843+2.318+0.901+0.477\over 5} = 1.025 \)
      For computer Z, the arithmetic mean is \({0.321+0.843+2.045+1.673+0.478\over 5} = 1.072\)

    2. We divide all the values in the table by M's values to get normalized values.
      Computer
      R M Z
      Benchmark E \({417\over 244} = 1.709\) \({244\over 244} = 1\) \({134\over 244} = 0.549\)
      Benchmark F \({83\over 70} = 1.186\) \({70\over 70} = 1\) \({70\over 70} = 1\)
      Benchmark H \({66\over 153} = 0.431\) \({153\over 153} = 1\) \({135\over 153} = 0.882\)
      Benchmark I \({39449\over 35527} = 1.110\) \({35527\over 35527} = 1\) \({66000\over 35527} = 1.858\)
      Benchmark K \({772\over 368} = 2.098\) \({368\over 368} = 1\) \({369\over 368} = 1.003\)
      For computer R, the arithmetic mean is \({1.709+1.186+0.431+1.110+2.098\over 5} = 1.307 \)
      For computer M, the arithmetic mean is \({1+1+1+1+1\over 5} = 1 \)
      For computer Z, the arithmetic mean is \({0.549+1+0.882+1.858+1.003\over 5} = 1.058\)

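    3. In part a, machine Z has the highest average, so it is the slowest; it takes 1.072 times as long as the reference machine R.
      In part b, machine R has the highest average, so it is the slowest. The arithmetic mean of normalized times therefore ranks the machines differently depending on which machine is used as the reference.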
    4. Using the geometric mean instead, for the data in part a:
      machine R: \(\root 5 \of {1*1*1*1*1} = 1\)
      machine M: \(\root 5 \of {0.585*0.843*2.318*0.901*0.477} = 0.867\)
      machine Z: \(\root 5 \of {0.321*0.843*2.045*1.673*0.478} = 0.850\)

      For the data in part b:
      machine R: \(\root 5 \of {1.709*1.186*0.431*1.110*2.098} = 1.153\)
      machine M: \(\root 5 \of {1*1*1*1*1} = 1\)
      machine Z: \(\root 5 \of {0.549*1*0.882*1.858*1.003} = 0.980\)

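      In both cases machine R is the slowest [largest geometric mean of the normalized times], and the ranking R, M, Z (slowest to fastest) is the same whichever machine is the reference.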
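    The normalization and averaging in parts 1, 2 and 4 can be reproduced with a small Python sketch. The benchmark times are the numerators and denominators of the ratios in the tables above; the helper names `normalize`, `arithmetic_mean` and `geometric_mean` are illustrative, not from the source.

```python
import math

# Benchmark times for E, F, H, I, K on each computer (taken from the ratios above).
times = {
    "R": [417, 83, 66, 39449, 772],
    "M": [244, 70, 153, 35527, 368],
    "Z": [134, 70, 135, 66000, 369],
}

def normalize(times, ref):
    """Divide each machine's time by the reference machine's time on the same benchmark."""
    return {m: [t / r for t, r in zip(ts, times[ref])] for m, ts in times.items()}

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # n-th root of the product, computed through logarithms
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

for ref in ("R", "M"):
    norm = normalize(times, ref)
    for m in ("R", "M", "Z"):
        print(f"reference {ref}, machine {m}: "
              f"arithmetic {arithmetic_mean(norm[m]):.3f}, "
              f"geometric {geometric_mean(norm[m]):.3f}")
```

    The printed values can be compared against the hand calculations. The ratio of two machines' geometric means does not depend on the reference machine, which is why part 4 ranks the machines identically in both cases while the arithmetic means in parts 1 and 2 do not.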

    1. The speedup is the ratio of the main memory access time \(T_2\) [before enhancement] to the cache access time \(T_1\) [after enhancement]: \(T_2\over T_1\)
    2. The average access time is the probability of being in cache (the hit ratio \(H\)) times the cache access time, plus the probability of not being in cache (\(1-H\)) times the main memory access time.
      \(T_\mbox{ave} = H * T_1 + (1-H) * T_2\)
      Hence the average speedup is \({T_2\over H * T_1 + (1-H) * T_2}\)
    3. On a miss, instead of going straight to main memory (time = \(T_2\)), we have to try the cache first and then main memory (time = \(T_1 + T_2\)).
      \(T_\mbox{ave} = H * T_1 + (1-H) * (T_1+T_2)\)
      Hence the average speedup is \({T_2\over H * T_1 + (1-H) * (T_1 + T_2)}\)
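    A minimal Python sketch of the three speedup formulas, with \(T_1\) as the cache access time, \(T_2\) as the main memory access time and \(H\) as the hit ratio. The inputs in the example calls (10 ns, 100 ns, H = 0.9) are hypothetical values chosen only for illustration.

```python
def speedup_all_hits(t1, t2):
    """Part 1: every access goes to the cache (T1) instead of main memory (T2)."""
    return t2 / t1

def speedup_parallel(h, t1, t2):
    """Part 2: a hit costs T1; a miss goes straight to main memory and costs T2."""
    t_ave = h * t1 + (1 - h) * t2
    return t2 / t_ave

def speedup_sequential(h, t1, t2):
    """Part 3: a miss first checks the cache (T1) and then main memory (T2)."""
    t_ave = h * t1 + (1 - h) * (t1 + t2)  # algebraically equal to t1 + (1 - h) * t2
    return t2 / t_ave

# Hypothetical values: T1 = 10 ns, T2 = 100 ns, H = 0.9
print(speedup_all_hits(10, 100))
print(speedup_parallel(0.9, 10, 100))
print(speedup_sequential(0.9, 10, 100))
```

    With those hypothetical values, part 1 gives a speedup of 10, part 2 gives 100/19 (about 5.26), and part 3 gives 5.0, because checking the cache first adds \(T_1\) to every miss.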

Chapter 3

    1. The instruction is 32 bits and the opcode is 8 bits, so the remaining 24 bits hold an address. Thus \(2^{24}\) = 16,777,216 bytes = 16 MB of memory can be addressed directly.
    2. In either case a data item can be 32 bits wide [32-bit microprocessor], so with a 16-bit data bus it takes 2 bus cycles to transfer one data item. An address is only 24 bits wide, so it can be sent in one bus cycle over a 32-bit address bus, but it needs 2 bus cycles over a 16-bit address bus.
    3. The program counter holds an address, so it must be 24 bits wide. The instruction register holds a whole instruction, so it must be 32 bits wide.
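    The sizing arguments above can be written out as a small Python sketch. It assumes byte-addressable memory (as the answer to part 1 does) and a 16-bit data bus in both bus configurations; `bus_cycles` is a hypothetical helper that counts how many transfers an item of a given width needs.

```python
INSTR_BITS = 32                       # instruction width
OPCODE_BITS = 8                       # first byte is the opcode
ADDR_BITS = INSTR_BITS - OPCODE_BITS  # remaining 24 bits form the address

# Assumes byte-addressable memory: one address per byte.
addressable_bytes = 2 ** ADDR_BITS

def bus_cycles(item_bits: int, bus_width_bits: int) -> int:
    """Bus transfers needed to move item_bits over a bus of the given width (ceiling division)."""
    return -(-item_bits // bus_width_bits)

pc_bits = ADDR_BITS   # the program counter holds an address
ir_bits = INSTR_BITS  # the instruction register holds a whole instruction

print(addressable_bytes)                       # 16777216
print(bus_cycles(24, 32), bus_cycles(24, 16))  # 24-bit address on a 32-bit vs 16-bit address bus: 1 2
print(bus_cycles(32, 16))                      # 32-bit data item on an assumed 16-bit data bus: 2
```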

    1. The original instruction takes 16 cycles [4+3+3+3+3].
      With 2 wait states for both the operand fetch and the operand store, the instruction takes 20 cycles [4+3+5+3+5].
      The percent increase is \({\hbox{new - old}\over\hbox{old}} = {20-16\over 16} = 25\%\)
    2. Now that the fourth step takes 13 cycles instead of 3, the instruction without wait states takes 26 cycles [4+3+3+13+3].
      The instruction with wait states now takes 30 cycles [4+3+5+13+5].
      The percent increase is \({30 - 26\over 26} = 15.4\%\)
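    The cycle counting can be sketched in Python as follows. The helper names are illustrative, and the positions of the operand fetch and operand store (third and fifth steps) are inferred from the bracketed sums rather than stated in the source.

```python
def instruction_cycles(steps, wait_states=0, memory_steps=()):
    """Total clock cycles: each step's base cycles, plus wait states on the listed memory steps."""
    return sum(cycles + (wait_states if i in memory_steps else 0)
               for i, cycles in enumerate(steps))

def percent_increase(old, new):
    return (new - old) / old * 100

# Step order follows the bracketed sums; indices 2 and 4 are assumed to be
# the operand fetch and operand store, the only steps that incur wait states.
OPERAND_STEPS = {2, 4}

for steps in ([4, 3, 3, 3, 3], [4, 3, 3, 13, 3]):
    old = instruction_cycles(steps)
    new = instruction_cycles(steps, wait_states=2, memory_steps=OPERAND_STEPS)
    print(old, new, f"{percent_increase(old, new):.1f}%")
```

    This prints `16 20 25.0%` and then `26 30 15.4%`: the same 4 extra cycles matter less once the instruction is longer.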

Chapter 11