I wrote a little emulator for the ATmega32 microcontroller that emulates some instructions of the Parallax Propeller CPU. The Propeller has eight 32-bit cores (called cogs) and can reach 160 MIPS.
V0.2:
The emulation is a little bit slower than the real chip (about 3 orders of magnitude).

But... it can run on an ATmega32 and blink an LED.

V0.3:
Now running 4 cogs. The memory of each cog had to be limited to 64 longs because the ATmega32 has only 2 KB of RAM.
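
To make the memory budget concrete, here is a minimal sketch of how the reduced cog memory could be laid out in C. It is not the emulator's actual code: the names (cog_t, cog_read, cog_write, cog_addr) are made up for illustration, and wrapping out-of-range addresses back into the 64-long window is an assumption, not something the emulator is known to do. Only the numbers (4 cogs, 64 longs each, 2 KB of SRAM) come from the notes above.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the post; every identifier below is hypothetical. */
#define NUM_COGS   4u    /* emulated cogs (the real Propeller has 8)   */
#define COG_LONGS  64u   /* longs per emulated cog (a real cog has 512) */

/* One emulated cog: its private RAM, one 32-bit long per address. */
typedef struct {
    uint32_t mem[COG_LONGS];
} cog_t;

static cog_t cogs[NUM_COGS];

/* 4 cogs * 64 longs * 4 bytes = 1024 bytes, half of the 2 KB SRAM.
   The rest stays free for the stack and the emulator's own state. */
static_assert(NUM_COGS * COG_LONGS * sizeof(uint32_t) == 1024u,
              "cog RAM should total 1 KB");

/* Assumption: addresses beyond the reduced memory simply wrap around.
   This works with a mask because 64 is a power of two. */
static inline uint8_t cog_addr(uint16_t addr)
{
    return (uint8_t)(addr & (COG_LONGS - 1u));
}

/* Read one long from the given cog's RAM. */
uint32_t cog_read(uint8_t cog, uint16_t addr)
{
    return cogs[cog % NUM_COGS].mem[cog_addr(addr)];
}

/* Write one long to the given cog's RAM. */
void cog_write(uint8_t cog, uint16_t addr, uint32_t value)
{
    cogs[cog % NUM_COGS].mem[cog_addr(addr)] = value;
}
```

With this layout, cog_write(1, 5, x) followed by cog_read(1, 5) returns x, and the four cog memories together occupy exactly 1 KB. Going to 8 cogs at 64 longs would already use the whole 2 KB and leave nothing for the stack, which is why the cog count or the cog size has to give.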