Monday, December 29, 2008

Nios II, C++, Quartus II

Spent about a week porting my x86 Frogger game onto an Altera DE2 development board. The port was guided by an existing DE2 implementation of Frogger by a PhD student called Willie, but his game was written to run on a Real-Time Operating System (RTOS) called MicroC/OS-II, and its software was written in C. That implementation runs at about 1 frame per second, though I think some of that comes from self-imposed delay statements. My implementation of the game, as a single-threaded programme, runs at about 14.6 frames per second, which is a significant speed-up. I removed a lot of hardware blocks that I knew I wouldn't use from the synthesised system, mainly to decrease the time needed to compile and synthesise the SOPC for the DE2 board. This should help when I have to recompile the system while developing my reactive unit.

System on Programmable Chip (SOPC) components:

  • SDRAM (8 Megabytes; 4 MB for programme memory and 4 MB for exception instructions)
  • SRAM (512 Kilobytes) for VGA frame buffer
  • DMA controller for SDRAM and SRAM components
  • On-chip memory (32 Bytes)
  • On-chip memory (10 Kilobytes)
  • DMA controller for both on-chip memories
  • Nios II CPU (Fast configuration with I/D cache)
  • EPCS Serial Flash Controller (FPGA configuration)
  • JTAG UART (Downloading programme data and debugging)
  • Timer (100 MHz) for system clock
  • Timer (100 MHz) for time stamping
  • Push buttons (4) for player inputs
  • Slide switches (18) for changing player number
  • Red LEDs (18) for displaying player number
  • LCD (16 x 2) for displaying game information
  • Seven-segment displays (8) for displaying the number of DMA interrupts
  • VGA (80 x 60 pixel mode)
  • VGA Controller (640 x 480 extended by Willie)
  • ISP 1362 for USB
It took around 2 days just to reduce the list of components to this. Most of the time, when I removed a component, the DMA interrupts would stop occurring. That is probably a symptom of something else; maybe the Nios II processor stopped working. I'm not sure.
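A frame rate like the 14.6 frames per second quoted above can be measured with the 100 MHz time-stamping timer in the list. The sketch below covers only the arithmetic, as host-side C++. On the board the tick counts could come from the HAL's alt_timestamp_start() and alt_timestamp() calls (declared in sys/alt_timestamp.h), but the helper names averageFps and ticksPerFrame are hypothetical, and how the original figure was actually measured isn't stated.

```cpp
#include <cassert>
#include <cstdint>

// Assumed timer frequency: the 100 MHz time-stamping timer from the SOPC list.
constexpr std::uint32_t kTimestampHz = 100000000u;

// Hypothetical helper: average frame rate over a measured interval.
// elapsedTicks is the number of timer ticks spent rendering `frames` frames.
double averageFps(std::uint64_t elapsedTicks, std::uint32_t frames,
                  std::uint32_t timerHz = kTimestampHz) {
    assert(elapsedTicks > 0);
    return static_cast<double>(frames) * timerHz / static_cast<double>(elapsedTicks);
}

// Hypothetical helper: timer ticks that elapse per frame at a given frame rate.
std::uint64_t ticksPerFrame(double fps, std::uint32_t timerHz = kTimestampHz) {
    assert(fps > 0.0);
    return static_cast<std::uint64_t>(timerHz / fps);
}
```

At 14.6 frames per second, ticksPerFrame() gives about 6.85 million ticks per frame, compared with 100 million ticks at 1 frame per second.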


The rightmost push button on the board, KEY0, seems dodgy. If you move the top of the button from side to side while it's depressed, the voltage level from the button changes from high to low. Trying to debounce that would be futile, considering the button is already depressed when the voltage changes from high to low.


Improving VGA performance
Understanding how to use the DMA controller and memories to get a VGA output took a bit of time; porting the code from C to C++, however, was easy. The drawing method works like this: pixel information is written to a one-dimensional array of size videoWidth x videoHeight in SDRAM; this array is then copied to SRAM; then the VGA controller is told to read from SRAM via a FIFO channel.
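The three-step drawing path can be modelled on a host PC to make the data flow concrete. This is a minimal sketch assuming one byte per pixel: std::vector stands in for both the SDRAM frame buffer and the SRAM buffer, std::memcpy stands in for the DMA copy, and the FramePipeline type and its method names are hypothetical. On the board, step 3 is performed by the VGA controller reading SRAM through its FIFO.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical host-side model of the SDRAM -> SRAM -> VGA drawing path.
struct FramePipeline {
    std::size_t width, height;
    std::vector<std::uint8_t> sdramFrame; // step 1: the game draws here
    std::vector<std::uint8_t> sramFrame;  // step 2: copy read by the VGA controller

    FramePipeline(std::size_t w, std::size_t h)
        : width(w), height(h), sdramFrame(w * h), sramFrame(w * h) {}

    // Step 1: write a pixel into the one-dimensional SDRAM array.
    void putPixel(std::size_t x, std::size_t y, std::uint8_t colour) {
        assert(x < width && y < height);
        sdramFrame[y * width + x] = colour;
    }

    // Step 2: copy the whole frame to SRAM (a DMA transfer on the board).
    void copyToSram() {
        std::memcpy(sramFrame.data(), sdramFrame.data(), sdramFrame.size());
    }

    // Step 3 stand-in: what the VGA controller would read from SRAM.
    std::uint8_t scanOut(std::size_t x, std::size_t y) const {
        assert(x < width && y < height);
        return sramFrame[y * width + x];
    }
};
```

A pixel drawn in step 1 only becomes visible to the "VGA" side after copyToSram(), which is the ordering the board relies on.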

I have tried removing the second step (copying to SRAM), but this causes major glitching in the video drawing: the video becomes hazy. I'm not sure what is causing it and don't have the time to investigate. I haven't found the register information for the VGA controller, so I have no way of determining how fast the picture is being redrawn. For now, I think counting DMA interrupts is the way to go.
  1. The method of transferring the frame buffer from SDRAM to SRAM was improved so that it occurs as one DMA transaction. Previously it happened as 300 transactions of 600 bytes (one per line of the 600 x 300 pixel video). This was possible because the whole frame needs just 175.8 Kilobytes (600 x 300 = 180,000 bytes at one byte per pixel).

  2. The transfer mode used by the DMA controller was set to bytes (8 bits), but half words (2 bytes or 16 bits) could be used instead. Changing the mode to half words improved performance quite a bit. The DMA controller supports transfers up to quad words, but because the SRAM is only 16 bits wide, half words are the realistic maximum transfer mode.

  3. The functions used to draw to the background and foreground were optimised as well. This involved pre-calculating the starting base address of each line of the frame buffer and of the block. The frame buffer is an array of size videoWidth x videoHeight in SDRAM, while the block is an array containing the pixel information of a game asset; the pre-calculated base addresses are therefore offsets into the frame buffer and block arrays. Also, simple changes were made to the initial values of the for-loop counters to further simplify the arithmetic.

  4. The function for redrawing the background (which happens far less often than drawing frogs, logs and tokens) was improved by exploiting the fact that videoWidth is conveniently divisible by 4. This meant I could transfer 4 pixels at a time to the frame buffer in SDRAM rather than 1 pixel at a time. I tried the same optimisation in the functions used to draw to the background and foreground, but the benefit was offset by the extra overhead of determining whether there were still enough pixels left in a line to transfer them as a group of 4 pixels, 2 pixels or just 1 pixel.

  5. Finally, the C code was ported to C++ by packaging the required functions and variables into a Render class.
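The arithmetic behind optimisations 1 and 2 can be checked with a few lines of code. This sketch assumes a 600 x 300 frame at one byte per pixel and a 16-bit (2-byte) SRAM bus; the constant and function names are hypothetical, and capping the useful transfer width at the SRAM bus width is a simplification of the reasoning in point 2 rather than a model of the DMA controller itself.

```cpp
#include <cassert>
#include <cstdint>

// Assumptions: 600 x 300 frame, one byte per pixel, 16-bit SRAM data bus.
constexpr std::uint32_t kWidth = 600;
constexpr std::uint32_t kHeight = 300;
constexpr std::uint32_t kSramBusBytes = 2;

constexpr std::uint32_t frameBytes(std::uint32_t w, std::uint32_t h) { return w * h; }

// Frame size in kilobytes (1 KB = 1024 bytes).
constexpr double frameKilobytes(std::uint32_t w, std::uint32_t h) {
    return static_cast<double>(frameBytes(w, h)) / 1024.0;
}

// Simplification: a wider DMA mode (e.g. a 16-byte quad word) gains nothing
// once it exceeds the width of the narrowest bus involved, the SRAM.
constexpr std::uint32_t usefulTransferBytes(std::uint32_t requested) {
    return requested < kSramBusBytes ? requested : kSramBusBytes;
}

// Individual bus transfers needed to move `bytes` at `transferBytes` each.
std::uint32_t busTransfers(std::uint32_t bytes, std::uint32_t transferBytes) {
    assert(transferBytes > 0 && bytes % transferBytes == 0);
    return bytes / transferBytes;
}
```

With these numbers, the old scheme issued kHeight (300) transactions of kWidth (600) bytes each, whereas the new scheme issues one 180,000-byte transaction. Switching from byte to half-word mode halves the bus transfers from 180,000 to 90,000.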
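Optimisation 3 amounts to computing the start address of each frame line and block line once per row, then walking both with pointer increments instead of recomputing (y + row) * frameWidth + x for every pixel. A minimal sketch, assuming one byte per pixel and a block that lies entirely inside the frame; blitBlock is a hypothetical name, and it uses std::size_t rather than the short ints in the Render header for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical blit using pre-calculated line base addresses.
void blitBlock(std::uint8_t* frame, std::size_t frameWidth,
               std::size_t x, std::size_t y,
               std::size_t width, std::size_t height,
               const std::uint8_t* block) {
    assert(frame != nullptr && block != nullptr);
    // Base address of the first frame line the block touches: one multiply, once.
    std::uint8_t* frameLine = frame + y * frameWidth + x;
    const std::uint8_t* blockLine = block;
    for (std::size_t row = 0; row < height; ++row) {
        std::uint8_t* dst = frameLine;
        const std::uint8_t* src = blockLine;
        for (std::size_t col = 0; col < width; ++col) {
            *dst++ = *src++;
        }
        frameLine += frameWidth; // next frame line: an add instead of a multiply
        blockLine += width;      // next block line
    }
}
```

For example, drawing a 2 x 2 block at (2, 1) into an 8-pixel-wide frame writes to offsets 10, 11, 18 and 19.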
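Optimisation 4 relies only on the line width being a multiple of 4, so every line splits into whole 4-pixel groups with no 2-pixel or 1-pixel tail to check for. The sketch below assumes one byte per pixel; the function names are hypothetical, and each 4-byte group is moved through a 32-bit temporary with std::memcpy so the copy stays well-defined regardless of alignment.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copy one line as groups of 4 pixels (one 32-bit word per group).
void copyLineBy4(std::uint8_t* dst, const std::uint8_t* src, std::size_t lineWidth) {
    assert(lineWidth % 4 == 0); // the property the optimisation depends on
    for (std::size_t i = 0; i < lineWidth; i += 4) {
        std::uint32_t group;
        std::memcpy(&group, src + i, 4);
        std::memcpy(dst + i, &group, 4);
    }
}

// Hypothetical background redraw: every line is copied 4 pixels at a time.
void redrawBackgroundSketch(std::uint8_t* frame, const std::uint8_t* background,
                            std::size_t width, std::size_t height) {
    for (std::size_t row = 0; row < height; ++row) {
        copyLineBy4(frame + row * width, background + row * width, width);
    }
}
```

The foreground and background block functions can't assume this, because a block's width and position are arbitrary. That is where the extra group-size checks mentioned in point 4 come from.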

The resulting Render class header file is listed below:


#ifndef RENDER
#define RENDER

#include "constants.h"
#include "frog.h"
#include "logRow.h"
#include "token.h"
#include "Game/assets.h"

#include "altera_avalon_pio_regs.h"

#include <sys/alt_irq.h>
#include <sys/alt_dma.h>

// Link as C. Avoids undefined function references during the linking stage.
extern "C" {
#include "vga_controller_ext.h"
}

class Render {
public:
    // Public variables
    volatile bool dmaCompleted;
    vga_controller_dev* vga;

    // Public functions
    Render(void);
    Render(void (*)(void*, alt_u32));
    void (*handleDmaInterrupt)(void*, alt_u32);
    void repaint(void);

    void drawToForeground(const short int x,
                          const short int y,
                          const unsigned short int width,
                          const unsigned short int height,
                          const unsigned char *block);

    void drawToBackground(const short int x,
                          const short int y,
                          const unsigned short int width,
                          const unsigned short int height,
                          const unsigned char *block);

    void drawChar(const short int horiz_offset,
                  const short int vert_offset,
                  const int colour,
                  const char character,
                  const char *font);

    void drawString(const short int horiz_offset,
                    const short int vert_offset,
                    const int colour,
                    const char *font,
                    const char string[]);

    void redrawBackground(void);
    void game(Frog frog[], const LogRow logRow[], Token& tokens);
    void restart(void);
    void won(int, int);

private:
    // Private variables
    unsigned short int SCREEN_WIDTH;
    unsigned short int SCREEN_HEIGHT;
    unsigned int PIXEL_COUNT;
    vga_frame_buffer_struct* vgaFrameBuffer;
    unsigned int frameAddress;
    unsigned char *bufferImage;

    // Location of current pixel of the block to draw
    const unsigned char *blockPixel;

    // Start address of the current video line (frame buffer in sdram)
    const unsigned char *currentFrameBaseDelta;

    // Start address of the current block line
    const unsigned char *currentBlockBaseDelta;

    Assets assets;

    // Private functions
    void removeFrogElements(const Frog& frog);
    void frogElements(Frog& frog);
    void logRowElements(const LogRow logRow[]);
    void tokenElements(Token& tokens);
};

#endif /*RENDER*/
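The header pairs a volatile dmaCompleted flag with a handler of type void (*)(void*, alt_u32), which matches the signature of interrupt handlers passed to the legacy HAL alt_irq_register(). One plausible way these fit together is sketched below: the handler receives a context pointer, sets the completion flag and bumps an interrupt counter (like the count shown on the seven-segment displays), and the repaint path refuses to start a new frame copy until the previous one has finished. This is a host-side sketch under assumptions: std::uint32_t stands in for alt_u32, RenderState and startFrameCopy are hypothetical, and the actual Render implementation isn't shown in the post.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the parts of Render touched by the DMA interrupt.
struct RenderState {
    volatile bool dmaCompleted = true;         // no transfer in flight yet
    volatile std::uint32_t dmaInterrupts = 0;  // e.g. shown on seven-segment displays
};

// Handler with the (void* context, id) shape; std::uint32_t replaces alt_u32.
void handleDmaInterrupt(void* context, std::uint32_t /*id*/) {
    RenderState* state = static_cast<RenderState*>(context);
    state->dmaCompleted = true;
    state->dmaInterrupts = state->dmaInterrupts + 1;
}

// Returns false (and does nothing) if the previous frame copy hasn't finished.
bool startFrameCopy(RenderState& state) {
    if (!state.dmaCompleted) {
        return false;
    }
    state.dmaCompleted = false; // cleared before the transfer is issued
    // On the board, the SDRAM -> SRAM DMA transfer would be issued here.
    return true;
}
```

The flag is volatile because it is written inside the interrupt handler and read in the main loop; without it, the compiler could cache the value and the main loop might never see the transfer finish.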



Everything is working, so now it's time to start working on the reactive unit for the Nios II processor. Time to learn about tightly coupled memory and custom instructions!!
