For the past few days, I've spent a lot of time bashing my head against the problem of building a microcontroller into an FPGA using a synthesizable core.
I'm using The Wonderful AVR8 Core of Ruslan Lepetenok, as seen on OpenCores and used in Papilio.
Thing is, it's not real well documented... as in, there doesn't seem to be a handy document explaining the required timing relationships among the various signals.
Also, it's written in VHDL, and I'm speaking Verilog these days, so reading somebody else's code in a foreign programming language takes some effort.
And I can't just embed the Papilio code as-is; for one thing, it'd need to be adapted somewhat for Spartan 6 use, and for another I need a wildly different set of peripherals... one of which is an SPI flash controller that uses DMA to do initial program load*, which implies rather nonstandard memory (and CPU reset) requirements.
Anyway, after a few days of head-bashing (including silly mistakes on my part, like not having the buses hooked up right) and much fiddly soldering to connect a centigrid header to the Spartan 6 eval board so's I could hook up a logic analyzer to a decent number of FPGA pins, I saw the CPU come out of reset, load what looked like reasonable-ish numbers out of the first couple of words of program memory, and... take a flying leap into hyperspace.
A bit of further investigation revealed that the AVR core expects to be dealing with straight-up asynchronous static memory, whereas Xilinx's memory core generator produces only synchronous memory... so every read has one clock cycle of latency, no matter how slowly you're running. So the processor was getting each instruction word one clock late. Word 0 holds 0x940C, the first word of a two-word JMP, and the following word holds the jump target; with the data arriving a clock late, the processor saw the word at address 0 twice, took the opcode itself as the target, and jumped to word 0x940C.
Hooking a 2X clock up to the memory cores had, to a first approximation, the desired effect: according to the front-panel lights (or rather, the antique logic analyzer standing in for even-more-antique front-panel lights), the processor is coming out of reset, jumping to word 0x0046, and then doing things that look more or less reasonable.
I'm thinking that an out-of-phase memory clock might be better than a 2X clock, but that can wait until tomorrow.
Anyway, tomorrow: first thing (after household chores) is to get this thing to send the letter 'Z' to a serial port.
Then to make it wait for characters, and echo them.
After that... well, the rest is easy. Throw together some rinky-dink serial-port monitor with reasonable extensibility, and I'll be set for hardware bring-up when the real boards come in, probably early in the coming week.
* If the number 60133 sprang to mind, You Might Just Be A Data General Geek. Yes, it's much like IPL on an old mini. No, I didn't seriously consider using a NOVA core, mainly on account of not having a toolchain handy.
Update: Got the 'Z' sent just after breakfast. Then off to chores; now (lunchtime) I'm back to work, and the echo program is running (complete with mangling the character to prove it really went by way of the processor).
So: synthesized AVR core, program memory, and UART are working. I think the UART wants a more suitable clock frequency: 20 MHz / (16 × 115200) ≈ 10.85 isn't an integer, so dividing 20 MHz down to a 16X clock for 115200 baud leaves a substantial error (with the nearest divisor, 11, the actual rate is about 113,636 baud, roughly 1.4% slow). Since the TX/RX data registers are designed to cross clock domains, and the clock manager has plenty more tricks available, this should not be a problem. (I do need to revisit the TX IDLE status logic; TX RDY is asserted after reset, but TX IDLE is not.)
If the data memory also works, I'll be able to call printf, and if I can read from the UART and use printf, then it'll be all (well, mostly) downhill from there.
Oh, and a happy note: the Spartan 6 seems to be much less finicky about the SPI configuration signals than the Spartan 3E. I can leave the Aardvark hooked up to the SPI header, and the FPGA still loads just fine, despite the long and untidy wires.
And maybe, in my copious free time, I'll write up some helpful user documentation for that AVR core.
Update 2: For a while, it looked like RAM wasn't working. Now it appears that RAM is working... but either avr-gcc is sometimes generating wonky code, or perfectly good code is sometimes not doing what it ought to. The latest development: the LPM instruction doesn't appear to be working, and in fact seems to be returning the result of the last LD. This is consistent across code tweaks and recompiles, so it's presumably not a timing issue. Perhaps the core doesn't work properly on a Spartan 6 for some reason? I suppose I could try adapting the Papilio project to work with a Spartan 3E board I have handy (would require re-doing the UCF, but maybe only a few lines of it, and the rest of the pin assignments could just be deleted and left up to the compiler's discretion). If it works on the 3E but not the 6... oog. I really don't want to dive into that forest of Somebody Else's VHDL without a map.
Or maybe it's an optimization thing, and I just need to find the right tweaks in the process settings.
Update 3: Oho! The LPM instruction comes in three forms:
LPM (r0 implied)
LPM rd,Z
LPM rd,Z+
It seems that this here core decodes only opcode 0x95C8 as LPM, i.e., the first, ancestral form. The avr-gcc output, meanwhile, uses the other two forms.
So maybe I need to tell the compiler not to use those... time to dig into the AVR-specific options to gcc (and worry about the presence of the third form in what appears to be startup code). I compiled for ATmega64; perhaps this was the wrong choice.
...Ah. Examining the datasheets, I find that the ATmega103 has only the first form of LPM, while the ATmega64 has all three. So: try compiling for the 103.
/bin/go
Now my memory dump does... oh, foo. The part that's reading from program memory is now returning correct data. The part that's reading from the bottom of real RAM, starting from 0x100, is showing power-up default contents... because... the 103 memory map has RAM starting at 0x60. Grrrr.
Anyway, it looks like the CPU core is indeed working, as long as I use the right instruction set. Now I just need to figure out how to tell avr-gcc to use the m103 instruction set for compiling, the m103 startup code, and the m64 memory map... or, cram my peripherals into 64 bytes (0x20..0x5F, memory mapped), which shouldn't be too much of a problem, especially if I shrink the SPI command buffer from 32 bytes (enough to hold typical write data, too) to, say, 4 bytes.
Maybe tomorrow I'll get printf working (in among all the interruptions).
Update 4: To get a custom memory map, copy, e.g., /usr/lib/ldscripts/avr3.x to your source directory, rename it (e.g., "ldscript.ld"), and change the memory definitions to suit your target (here, so that data RAM starts at 0x100 as on the m64). Then update your Makefile thusly:
MCU_TARGET = atmega103
MCU_DEVCODE = m103
override CFLAGS = -g -Wall $(OPTIMIZE) -mmcu=$(MCU_TARGET) $(DEFS)
override LDFLAGS = -Wl,-Map,$(PRG).map,-Tldscript.ld
Update 5: During RAM write operations, the data shows up on the bus one cycle ahead of the address and the write enable. This is fine for single writes, as the data will still be there when the write happens. For stack pushes, though, it causes trouble, presumably because when writes come in consecutive cycles the bus already carries the next byte by the time the write happens. So: add a one-cycle delay to the write data (dbusout) path, and all's warm and snuggly.
Update 6: During I/O write operations (OUT, SBI, CBI), the data is placed on the same bus... but not a cycle early. So the added pipeline register needs to be bypassed during I/O write cycles. Since the compiler will helpfully use these instructions for memory-mapped addresses that fall within the I/O range, it's important to have I/O writes work correctly.