Monday, December 29, 2008

Nios II, C++, Quartus II

Spent about a week porting my x86 Frogger game onto an Altera DE2 development board. The port was guided by an existing implementation of Frogger on the DE2 board by a PhD student called Willie, but that game was written in C to use a Real-Time Operating System (RTOS) called MicroC/OS-II. That implementation runs at about 1 frame per second, though I think there are some self-imposed delay statements. My implementation of the game, as a single-threaded programme, runs at about 14.6 frames per second, which is a significant speed-up. I removed a lot of hardware blocks from the synthesised system that I knew I wouldn't use. This was done mainly to decrease the time needed to compile and synthesise the SOPC for the DE2 board. This should help when I have to recompile the system while I develop my reactive unit.

System on Programmable Chip (SOPC) components:

  • SDRAM (8 Megabytes; 4 MB for programme memory and 4 MB for exception instructions)
  • SRAM (512 Kilobytes) for VGA frame buffer
  • DMA controller for SDRAM and SRAM components
  • On-chip memory (32 Bytes)
  • On-chip memory (10 Kilobytes)
  • DMA controller for both on-chip memories
  • Nios II CPU (Fast configuration with I/D cache)
  • EPCS Serial Flash Controller (FPGA configuration)
  • JTAG UART (Downloading programme data and debugging)
  • Timer (100 MHz) for system clock
  • Timer (100 MHz) for time stamping
  • Push buttons (4) for player inputs
  • Slide switches (18) for changing player number
  • Red LED (18) for displaying player number
  • LCD (16 x 2) for displaying game information
  • Seven-segment LED (8) for displaying the number of DMA interrupts
  • VGA (80 x 60 pixel mode)
  • VGA Controller (640 x 480 extended by Willie)
  • ISP 1362 for USB
It took around 2 days just to reduce the list of components to this. Most of the time when I removed a component, the DMA interrupts would stop occurring. It's probably the symptom of something else. Maybe the Nios II processor stopped working. Not sure.

The rightmost push button on the board, KEY0, seems dodgy. If you move the top of the button from side to side while it's depressed, the voltage level from the button changes from high to low. Trying to de-bounce that would be futile considering the button is already depressed when the voltage changes from high to low.

Improving VGA performance
Understanding the use of the DMA controller and memories to get a VGA output took a bit of time. However, porting the code from C to C++ was easy. The method of drawing to the VGA goes like this: pixel information is written to a one-dimensional array in SDRAM equal in size to videoWidth x videoHeight; this array is then copied to SRAM; then the VGA controller is told to read from SRAM via a FIFO channel.
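
A rough host-side sketch of that three-step path, just to make the data flow concrete. This is not the board code: the names (putPixel, copyFrame, kVideoWidth, kVideoHeight) are made up for illustration, the 600 x 300 size is the figure quoted in this post, one byte per pixel is an assumption, and std::memcpy stands in for the DMA transfer from SDRAM to SRAM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed video size (the 600 x 300 figure from this post), one byte per pixel.
constexpr std::size_t kVideoWidth  = 600;
constexpr std::size_t kVideoHeight = 300;
constexpr std::size_t kPixelCount  = kVideoWidth * kVideoHeight;

// Step 1: write a pixel into the one-dimensional SDRAM-side array.
inline void putPixel(std::vector<std::uint8_t>& sdram,
                     std::size_t x, std::size_t y, std::uint8_t colour) {
    assert(x < kVideoWidth && y < kVideoHeight);
    sdram[y * kVideoWidth + x] = colour;  // row-major index
}

// Step 2: copy the whole frame to the SRAM-side array.
// std::memcpy is only a stand-in for the DMA transfer used on the board.
inline void copyFrame(const std::vector<std::uint8_t>& sdram,
                      std::vector<std::uint8_t>& sram) {
    assert(sdram.size() == kPixelCount && sram.size() == kPixelCount);
    std::memcpy(sram.data(), sdram.data(), kPixelCount);
}

// Step 3 (not modelled here): the VGA controller scans the SRAM copy out
// to the display through its FIFO.
```

On the host you would allocate two std::vector<std::uint8_t> buffers of kPixelCount bytes, call putPixel on the first and then copyFrame to move the finished frame across.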

I have tried removing the second step (copying to SRAM) but this causes major glitching in the video drawing as the video becomes hazy. I'm not sure what is causing it and don't have the time to investigate. I haven't found the register information for the VGA controller so there's no way for me to determine how fast the picture is being redrawn. I think DMA counting is the way for me to go right now.
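
The DMA-counting idea boils down to a small bit of arithmetic, sketched below. The helper name framesPerSecond and its parameters are hypothetical, and it assumes exactly one DMA completion per repainted frame and a free-running timestamp timer with a known tick rate.

```cpp
#include <cstdint>

// Hypothetical helper: estimate frames per second from the number of
// completed frame-buffer DMA transfers and the timer ticks that elapsed
// over the same measurement window.
// Assumption: one DMA completion corresponds to one repainted frame.
inline double framesPerSecond(std::uint32_t dmaCompletions,
                              std::uint64_t elapsedTicks,
                              std::uint64_t ticksPerSecond) {
    if (elapsedTicks == 0 || ticksPerSecond == 0) {
        return 0.0;  // nothing measured yet
    }
    const double seconds = static_cast<double>(elapsedTicks) /
                           static_cast<double>(ticksPerSecond);
    return static_cast<double>(dmaCompletions) / seconds;
}
```

For example, 146 completions counted over 1,000,000,000 ticks of a 100 MHz timer (10 seconds) works out to 14.6 frames per second.
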
  1. The method of transferring the frame buffer from SDRAM to SRAM was improved so that it occurred as one DMA transaction. Before it was happening as 300 transactions of 600 bytes (600 x 300 pixel video size). This was possible as the memory required is just 175.8 Kilobytes.

  2. The transfer mode used by the DMA controller was set to bytes (8-bits) but half words (2 bytes or 16-bits) could be used instead. Changing the mode to half words improved performance quite a bit. The DMA controller supports up to quad words but because the SRAM is only 16-bits wide, half words is the realistic maximum for transfer mode.

  3. The functions used to draw to the background and foreground were optimised as well. This involved pre-calculating the starting base addresses for each line of the frame buffer and block. The frame buffer is an array of size videoWidth x videoHeight that exists in SDRAM while the block is an array containing pixel information of a game asset. Hence, the pre-calculated base addresses are basically offsets into the frame buffer and block arrays. Also, simple changes to the initial values of the for-loop counters were made to further simplify the arithmetic used.

  4. The function for redrawing the background (which seldom happens when compared to drawing frogs, logs and tokens) was improved by the simple fact that the videoWidth was conveniently divisible by 4. This meant I could transfer 4 pixels to the frame buffer in SDRAM at a time rather than 1 pixel at a time. This optimisation was tried out with the functions used to draw to the background and foreground but the benefits were offset by an increase in the overhead required to determine whether there were still enough pixels left in one line to be transferred as a group of 4 pixels or 2 pixels or just 1 pixel.

  5. Finally the C code was ported to C++ by packaging the functions and variables required into a Render class.
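
A minimal sketch of the ideas in steps 1 to 4, written for a host machine rather than the Nios II. None of these names (dmaBeats, FrameBuffer, drawBlock, copyBackground) come from the real Render class; they are illustrative only, and one byte per pixel is assumed. dmaBeats shows why half-word mode halves the number of bus transfers for the 600 x 300 frame (180,000 bytes, about 175.8 Kilobytes), drawBlock uses per-line offsets computed once up front, and copyBackground moves 4 pixels per iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Bus transfers needed to move `bytes` at a given width in bytes (rounded up).
constexpr std::size_t dmaBeats(std::size_t bytes, std::size_t widthBytes) {
    return (bytes + widthBytes - 1) / widthBytes;
}

// Hypothetical frame buffer with the start offset of every line precomputed.
struct FrameBuffer {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint8_t> pixels;    // one byte per pixel, row-major
    std::vector<std::size_t> rowOffset;  // rowOffset[y] == y * width

    FrameBuffer(std::size_t w, std::size_t h)
        : width(w), height(h), pixels(w * h, 0), rowOffset(h) {
        for (std::size_t y = 0; y < h; ++y) rowOffset[y] = y * w;
    }
};

// Draw a w x h block at (x, y). The per-line multiply is replaced by a table
// lookup and the inner loop is plain pointer increments.
// No clipping: the caller must keep the block inside the frame.
inline void drawBlock(FrameBuffer& fb, std::size_t x, std::size_t y,
                      std::size_t w, std::size_t h, const std::uint8_t* block) {
    assert(x + w <= fb.width && y + h <= fb.height);
    const std::uint8_t* src = block;
    for (std::size_t row = 0; row < h; ++row) {
        std::uint8_t* dst = fb.pixels.data() + fb.rowOffset[y + row] + x;
        for (std::size_t col = 0; col < w; ++col) *dst++ = *src++;
    }
}

// Copy a full-frame background 4 pixels per iteration. Only valid when the
// pixel count is divisible by 4, so no leftover-pixel checks are needed.
inline void copyBackground(FrameBuffer& fb, const std::uint8_t* background) {
    const std::size_t total = fb.width * fb.height;
    assert(total % 4 == 0);
    std::uint8_t* dst = fb.pixels.data();
    for (std::size_t i = 0; i < total; i += 4) {
        std::uint32_t four;  // 4 pixels packed into one 32-bit word
        std::memcpy(&four, background + i, sizeof four);
        std::memcpy(dst + i, &four, sizeof four);
    }
}
```

The trade-off from step 4 shows up in the difference between the two functions: copyBackground can skip remainder handling because the whole frame is a multiple of 4 pixels, whereas a block of arbitrary width would need extra checks on every line.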

The resulting Render class header file is listed below:

#ifndef RENDER
#define RENDER

#include "constants.h"
#include "frog.h"
#include "logRow.h"
#include "token.h"
#include "Game/assets.h"
#include "altera_avalon_pio_regs.h"
#include <sys/alt_irq.h>
#include <sys/alt_dma.h>

// Link as C. Avoids undefined function references during linking stage.
extern "C" {
#include "vga_controller_ext.h"
}

class Render {
public:
    // Public variables
    volatile bool dmaCompleted;
    vga_controller_dev* vga;

    // Public functions
    Render(void (*)(void*, alt_u32));
    void (*handleDmaInterrupt)(void *, alt_u32);
    void repaint(void);
    void drawToForeground(const short int x,
                          const short int y,
                          const unsigned short int width,
                          const unsigned short int height,
                          const unsigned char *block);
    void drawToBackground(const short int x,
                          const short int y,
                          const unsigned short int width,
                          const unsigned short int height,
                          const unsigned char *block);
    void drawChar(const short int horiz_offset,
                  const short int vert_offset,
                  const int colour,
                  const char character,
                  const char *font);
    void drawString(const short int horiz_offset,
                    const short int vert_offset,
                    const int colour,
                    const char *font, const char string[]);
    void redrawBackground(void);
    void game(Frog frog[], const LogRow logRow[], Token& tokens);
    void restart(void);
    void won(int, int);

private:
    // Private variables
    unsigned short int SCREEN_WIDTH;
    unsigned short int SCREEN_HEIGHT;
    unsigned int PIXEL_COUNT;
    vga_frame_buffer_struct* vgaFrameBuffer;
    unsigned int frameAddress;
    unsigned char *bufferImage;
    // Location of current pixel of the block to draw
    const unsigned char *blockPixel;
    // Start address of the current video line (frame buffer in sdram)
    const unsigned char *currentFrameBaseDelta;
    // Start address of the current block line
    const unsigned char *currentBlockBaseDelta;
    Assets assets;

    // Private functions
    void removeFrogElements(const Frog& frog);
    void frogElements(Frog& frog);
    void logRowElements(const LogRow logRow[]);
    void tokenElements(Token& tokens);
};

#endif /*RENDER*/

Everything is working so now it's time to start working on the reactive unit for the Nios II processor. Time to learn about closely-coupled memory and custom instructions!!

Thursday, December 18, 2008

Computer Systems Engineering with First Class Honours

Compaq Presario replacement

Finally ordered and assembled our replacement computer for the old and dead Compaq Presario. The motherboard has a good selection of legacy ports that may be useful later on for some random FPGA projects. Got a 500 GB hard disk so that should last until the next motherboard meltdown. Integrated graphics is working just fine with Vista Home Premium. Also, an Intel Celeron E1400 (dual core) with 2 Gigs of RAM should be enough for internet surfing. Managed to hack in remote desktop so I can now administer it from my room! The monitor is the only thing that might need upgrading now. Scanner software is a bit old as well but the scanner itself still works fine.

Wednesday, December 10, 2008

Python, CherryPy and Verilog

Decided to try and upgrade Python from 2.5.2 to 3.0 on my Xubuntu partition.  I decided to try and install all the packages in a bid to get rid of all the module warnings.  I did succeed in the end but I probably installed a lot of useless sources.  Took ages trying to find the right sources to install as well.  Anyway, got Python 3.0 to compile and install so that was good.

Now I know that CherryPy 3.1.1 is incompatible with Python 3.0 so I was prepared to manually "update"/hack the source files of CherryPy so that it would run.  I was quite successful in updating the syntax and finding replacement modules to use in place of the deprecated ones.  A lot of the changes were just guesses though because some of the Python 2.x syntax looked quite "loose"; especially the except and raise statements.  However, the new module "socket" that replaces "Socket" had changed quite a lot so I was unable to find the updated method calls.  Namely, I couldn't find an equivalent to socket._fileobject(None)._rbuf in Python 3.0 so that was when I gave up on "updating" CherryPy.  I *think* I found something similar called makefile() but the resulting object doesn't have an _rbuf attribute.  Not entirely sure what _rbuf stands for.  Could be "receive buffer".  I think the point of the line is to get a file object expected by the socket in its receive buffer and try and find out what type of object it is.  Guess I'll just have to wait for the official implementation.  Could be a few years . . .

Tried some more Verilog.  Stuck on how you should statically define an array.  Right now I've only got the array *initialised* within the "always" block which to me is clearly the wrong place; but it works okay for now.  There's also the distinction of packed and unpacked arrays which concerns the layout of memory.  Also, I'm not sure how you specify the range for an integer.

Saturday, December 6, 2008


A list of languages I've touched on so far during the holidays:
  • Python
  • Ruby
  • C / C++
  • Java
  • JavaScript
  • ActionScript 3, MXML
  • LaTeX
  • Bash
  • Simulink
  • Verilog

I went to the Microsoft Imagine Cup seminar session yesterday.  It was a fairly good session.  They talked about getting an idea through imagining and then realising the idea.  Tiny bit of talk about commercialisation but not enough to give the audience an impression of the business side of things.  No talk of writing up a venture summary, business plan or having an elevator pitch for investors to look at.  They then went through how you should do the presentation to the judges, including demonstrations.  Lastly they went through the submission dates for the different competitions.  There was also something about an eBOX.

(Tag-less entry)

Tuesday, December 2, 2008

Flex 3, ActionScript 3, LaTeX

Spent the last two days working on some Adobe Flex 3 and ActionScript 3 goodness. Been porting the BlueMesh GUI that I had made for my Part Four Project from pure ActionScript 2 to ActionScript 3 and MXML. The use of MXML has definitely cut down on development time for the front end stuff (GUI). It has taken only two days to get the GUI ported from work that had taken at least 5 months to sort out. If only we were able to use Adobe Flash Player 9 then maybe we would have had a kick arse relational editor (BlueMesh). The port has quite a bit of ActionScript but it's mainly for testing out popups and alert boxes. No more need for JavaScript calls! Everything is nicely contained in the MXML file. Still haven't tried passing external variable values into the Flash application though. It should work similarly to ActionScript 2. You use FlashVars to pass values into the application.

Also installed TeX Live on my Xubuntu installation so I could have a play around with LaTeX. Will have to use it to create future reports. I guess you wouldn't call LaTeX a "document programming language". Rather it's more like a "document description language" (DDL) because you describe the structure of your intended document but the actual implementation of the document (in DVI, PS or PDF) is at the mercy of the LaTeX compiler you choose to use.

Hmm, shouldn't be long before I can try some VHDL or Verilog and play around with FPGAs. Hopefully Xilinx ones. Still want to try and play around with Simulink on MATLAB. Wonder what the function block idea will be like compared to other "function block" design languages, say, FBDK.