Deeply Pipelined AES Implementation on FPGA

AES is composed of five main components: SubBytes, ShiftRows, MixColumns, AddRoundKey, and KeyExpansion. The authors focused only on the SubBytes and KeyExpansion components with respect to encryption. SubBytes replaces each byte with another one to introduce dissimilarity between the original text and the cipher text. The AES algorithm for encryption and decryption is shown in Figure 1.

Figure 1: AES Algorithm.
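As an illustration of the byte replacement SubBytes performs, the following is a minimal software sketch of the textbook S-box definition: a multiplicative inverse in GF(2^8) followed by an affine transformation. It is not the hardware structure used in the design, and the function names (gf_mul, gf_inv, sub_byte) are hypothetical helpers introduced here.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce back into 8 bits
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8); 0 maps to 0 by AES convention."""
    if a == 0:
        return 0
    # a^254 equals a^-1 because the multiplicative group has order 255.
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def sub_byte(a: int) -> int:
    """AES S-box value: field inverse followed by the affine transformation."""
    x = gf_inv(a)
    y = x
    for shift in range(1, 5):
        # XOR in x rotated left by 1, 2, 3 and 4 bits
        y ^= ((x << shift) | (x >> (8 - shift))) & 0xFF
    return y ^ 0x63


if __name__ == "__main__":
    print(hex(sub_byte(0x53)))
```

A hardware implementation would typically replace the loop-based inverse with combinational logic or a lookup table; the sketch only shows what value each input byte maps to.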

AES Components

Performance Evaluation

To evaluate the performance of the proposed solution, we count how many cycles each step takes. Table 1 shows how many cycles each component needs for computation and for its register, respectively. The total number of cycles to get the first valid result is (1 + (9 * (10 + 1)) + 9) = 109. A more important metric is the average clocks per operation (CPO). Once the pipeline is full, each further result follows one cycle after the previous one, so k operations take 108 + k cycles and the CPO is equal to (108 + k)/k, where k is the number of operations. As k increases, the CPO gets closer to 1.

Table 1: Component Timing
Component      Computation (cycles)   Register (cycles)
PreRound       0                      1
Rounds(1-9)    10                     1
Round(10)      9                      0
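The latency and CPO arithmetic can be checked with a short sketch. The cycle counts are taken from Table 1; the constant and function names are illustrative and not part of the original design.

```python
# (computation, register) cycles per stage, taken from Table 1
PRE_ROUND = (0, 1)
MIDDLE_ROUND = (10, 1)   # applies to each of rounds 1-9
FINAL_ROUND = (9, 0)
NUM_MIDDLE_ROUNDS = 9


def first_result_latency() -> int:
    """Cycles until the first valid result leaves the pipeline."""
    return (sum(PRE_ROUND)
            + NUM_MIDDLE_ROUNDS * sum(MIDDLE_ROUND)
            + sum(FINAL_ROUND))


def clocks_per_operation(k: int) -> float:
    """Average cycles per operation when k operations are streamed back to back."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # The first result takes the full latency; each later one adds one cycle.
    total_cycles = first_result_latency() + (k - 1)
    return total_cycles / k


if __name__ == "__main__":
    print(first_result_latency())
    for k in (1, 10, 1000, 1_000_000):
        print(k, clocks_per_operation(k))
```

For a single operation the CPO equals the full latency, while for long streams of operations it approaches 1.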

The FPGA device used was a Xilinx Virtex-6 XC6VLX550T. The maximum frequency reported by the Xilinx software is 462.515 MHz, where the critical delay is 2.126 ns. Table 2 compares the performance of our design with that of the design proposed in [2].

Table 2: Performance Results
Design       Device                Slices   Critical Delay (ns)   Max Freq (MHz)   Throughput (Gbps)   Efficiency (Mbps/slice)
[2]          Virtex-6 XC6VLX240T   6361     1.17                  849.185          108.69              17.08
Our Design   Virtex-6 XC6VLX550T   8258     1.385                 721.891          92.42               11.19
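The throughput and efficiency columns of Table 2 can be cross-checked with a short calculation. The sketch below assumes that a fully pipelined design accepts one 128-bit block per clock cycle (consistent with a CPO close to 1), so that throughput is the maximum frequency multiplied by 128 bits. This formula is an assumption and is not stated explicitly in the source; the function names are illustrative.

```python
BLOCK_BITS = 128  # AES block size in bits


def throughput_gbps(fmax_mhz: float) -> float:
    """Throughput in Gbps, assuming one block completes per clock cycle."""
    return fmax_mhz * BLOCK_BITS / 1000.0


def efficiency_mbps_per_slice(fmax_mhz: float, slices: int) -> float:
    """Throughput in Mbps divided by the number of occupied slices."""
    return fmax_mhz * BLOCK_BITS / slices


if __name__ == "__main__":
    # (max frequency in MHz, slices) as listed in Table 2
    designs = {"[2]": (849.185, 6361), "Our Design": (721.891, 8258)}
    for name, (fmax, slices) in designs.items():
        print(name,
              round(throughput_gbps(fmax), 2),
              round(efficiency_mbps_per_slice(fmax, slices), 2))
```

Under this assumption the computed values land within a few hundredths of the figures in Table 2.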

Future Work

There are a couple of improvements that the authors did not consider. First, the multiplicative inverse (x^-1) in GF(2^4) uses 3 of (x^2) and 2 of (x); however, it is not pipelined. Pipelining this component might reduce the critical delay. Another improvement to consider is generating RoundKey(10) from the initial RoundKey(0) in the decryption operation.
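To make the second improvement concrete: decryption begins with RoundKey(10), so that key has to be derived from the initial key before the first block can be processed. The following is a minimal software sketch of the standard AES-128 key schedule run forward from RoundKey(0) to RoundKey(10). It only illustrates the computation and is not a proposal for the hardware structure; the helper names (_gf_mul, _build_sbox, last_round_key) are hypothetical.

```python
def _gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def _build_sbox() -> list:
    """Build the AES S-box from the field inverse and affine transformation."""
    sbox = []
    for a in range(256):
        inv = 0
        if a:
            inv = 1
            for _ in range(254):  # a^254 is the inverse of a
                inv = _gf_mul(inv, a)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = _build_sbox()


def last_round_key(key: bytes) -> bytes:
    """Run the AES-128 key schedule forward and return RoundKey(10)."""
    if len(key) != 16:
        raise ValueError("AES-128 expects a 16-byte key")
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]  # RoundKey(0)
    rcon = 1
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]        # RotWord
            temp = [SBOX[b] for b in temp]    # SubWord
            temp[0] ^= rcon                   # add the round constant
            rcon = _gf_mul(rcon, 2)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return bytes(b for w in words[40:44] for b in w)


if __name__ == "__main__":
    key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
    print(last_round_key(key).hex())
```

The key in the usage example is the AES-128 example key from FIPS-197 Appendix A.1, for which the specification lists the final round key as d014f9a8 c9ee2589 e13f0cc8 b6630ca6.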


References