WARNING! DO NOT POWER UP THE MODIFIED LAMP DIRECTLY FROM THE MAINS! YOU NEED AN ISOLATED DC VOLTAGE, BETWEEN 5 AND 30V!

INTRODUCTION – INSPIRATION FROM A DISTORTED REALITY

Last year, I was impressed by the news about someone being able to run Doom on a pregnancy test.

Sadly, the story was a little bit exaggerated.

In fact, all the news websites failed to understand (or to report correctly) what Foone (the author) actually did, even though Foone had clearly documented all the steps that eventually led to the viral video.

Foone was actually playing Doom on a PC, where custom code resized and dithered the Doom window and streamed it to a small OLED display through the USB port of a Teensy 3.2 board. A small Bluetooth keyboard was connected to the PC so that Foone could actually play. The OLED display was fitted inside the plastic case of an electronic pregnancy test that the author had analyzed earlier to see how it works.

Although all of this was very far from porting Doom to a microcontroller, it was very inspiring, and it prompted me to look for something I could port Doom to.

STATE OF THE ART AND RAM REQUIREMENTS

The original Doom required 4MB of RAM, i.e. much more than what you typically find in the microcontroller of a cheap consumer device. However, this relatively large requirement stemmed from the need to copy all the data (in particular graphics) from the hard disk to RAM, in order to access it quickly. In an embedded system or in a videogame console, instead, constant data can stay in flash/ROM (provided you can access it fast enough), saving a lot of RAM.

There are already ports to low-end devices (SNES Doom, Vicdoom, i.e. Doom on a VIC-20, or Doom on the TI-83 calculator), but these were heavily stripped down to fit memory and computing power limitations. Instead, I wanted to keep the engine close to the original.

How far can you optimize Doom without sacrificing its main features, while still being able to play at least some maps of the shareware version? (Spoiler alert: all shareware maps work with this port!)

Or, put the opposite way: what is the minimum amount of RAM required to play at least the shareware Doom? (Given, of course, reasonable computing power.)

So I began researching the current state of the art in terms of RAM usage, and I found this forum thread, which pointed me to doomhack's excellent Doom port to the Game Boy Advance (GBA), based on PrBoom. Doomhack also documents the RAM usage, which can go as high as about 197000 bytes (192 kB) for level E1M6 of the shareware version.

Excerpt of the memory requirement spreadsheet. E1M6 is the heaviest of all shareware maps, in terms of memory.

Later, however, I found that this spreadsheet does not take into account:

Therefore, if we also include the frame buffer and the other memory regions used in the GBA port, the total RAM required to play all the shareware levels could be somewhat larger than 256kB.

In the meantime, for other reasons, I stumbled across the IKEA TRÅDFRI lamp series. These use a Silicon Labs RF microcontroller, which is already powerful enough for Doom (a 40 MHz Cortex M4), but it features only 32 kB of RAM: too little to run vanilla Doom without stripping too many features. So I had to drop the idea of porting Doom to it.

Later, I saw in a GitHub repository that a newer IKEA TRÅDFRI lamp (the RGB GU10 version) mounts the MGM210L module from Silicon Labs (you can find the same module at several distributors like Mouser, Farnell, Digikey, Newark, RS-Components, etc.), which is based on a more powerful (and memory-rich) microcontroller: the Silicon Labs EFR32MG21AxxxF1024, featuring more RAM and a very powerful 80MHz Cortex M33.

The IKEA TRÅDFRI GU10 RGB lamp. After disassembly, of course. Before accidentally breaking the glass, of course.

In particular, the microcontroller of the MGM210L has 96+12 kB of RAM and 1MB of internal flash. By comparison, the GBA has 384kB of RAM and direct memory mapped access to up to 32 MB ROM on the cartridge.

Although these specifications suggested that there was not enough memory to play all the shareware levels, there was still hope of at least creating a proof-of-concept project. In fact, looking at the memory requirements shown above, I noticed that E1M1 could still be played, as it required just 90kB of RAM (this was, of course, before I realized that the spreadsheet did not count all the memory contributions mentioned above…). That was enough to convince me to start porting Doom to that IKEA TRÅDFRI lamp.

The goal was to play at least E1M1, with as many features of the original shareware Doom engine as possible, without any requirements about audio.

THE PORT TO THE MGM210L

My port is based on doomhack's GBA Doom port, as it already features many memory optimizations. No need to reinvent the wheel.

No doubt, the MGM210L packs enough computing power to run Doom at very high speed. The issue is the small amount of RAM and flash available. Indeed, the real challenge of this project is memory optimization, both in terms of size and access speed.

Since the shareware Doom WAD is around 4MB, I needed to add external memory.

SD/MMC cards are ruled out here: Doom is quite memory-access intensive and requires high memory bandwidth and low latency. The filesystem overhead and high latency of SD cards make them unsuitable for this job. The ideal solution would be a parallel flash memory, but 1) the MG21 does not support it, and 2) even if it did, the MGM210L module only has 10 I/O pins. After all, the MG21 is designed as an RF MCU for IoT applications, not to run Doom.

Therefore, the only acceptable solution was an external SPI memory. The size used on the prototype is 8MB and was determined by two factors:

Although the MG21 microcontroller series does not support dual or quad SPI memories, using a hardware trick (and software optimization, see here for details) I was able to use a QSPI flash and read it in dual-SPI mode, doubling the bandwidth with respect to a single-line SPI flash. Furthermore, I overclocked the CPU peripheral bus to 80MHz. According to the datasheet, the maximum bus speed is 50MHz. However, I found that this figure is very conservative: the system ran flawlessly at 80MHz too, at least with the peripherals used in this project and at room temperature. This allows a 40MHz SPI clock which, thanks to the dual-SPI trick, gives a peak bandwidth of 10 MB/s, i.e. 80 Mbit/s.
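The hardware trick itself is described in the linked details, not here, so the sketch below does not reproduce it. It only illustrates what a dual-output read means for software. On common QSPI NOR flashes (for example the Fast Read Dual Output command, 0x3B, on Winbond W25Q parts), each clock carries two bits, MSB first: IO1 carries bits 7, 5, 3, 1 and IO0 carries bits 6, 4, 2, 0 of every byte. Assuming, hypothetically, that each line is captured as its own single-line bit stream, every captured byte holds half of two consecutive data bytes, and software must interleave them back. `dual_lanes_merge` and its helpers are names invented for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Move the 4 bits of a nibble (b3..b0) to bit positions 6, 4, 2, 0. */
static uint8_t spread_nibble(uint8_t n)
{
    return (uint8_t)(((n & 0x8) << 3) | ((n & 0x4) << 2) |
                     ((n & 0x2) << 1) |  (n & 0x1));
}

/* Inverse of spread_nibble: collect bits 6, 4, 2, 0 into b3..b0.
 * Only used to simulate what each lane would see. */
static uint8_t gather_nibble(uint8_t b)
{
    return (uint8_t)(((b >> 3) & 0x8) | ((b >> 2) & 0x4) |
                     ((b >> 1) & 0x2) |  (b & 0x1));
}

/* Hypothetical: io1[i] / io0[i] are bytes captured MSB first from the IO1 /
 * IO0 lines. The high nibble belongs to data byte 2*i, the low nibble to
 * data byte 2*i+1. IO1 holds the odd bit positions, IO0 the even ones. */
static void dual_lanes_merge(const uint8_t *io1, const uint8_t *io0,
                             uint8_t *out, size_t n_lane_bytes)
{
    for (size_t i = 0; i < n_lane_bytes; i++) {
        out[2 * i]     = (uint8_t)((spread_nibble(io1[i] >> 4) << 1) |
                                    spread_nibble(io0[i] >> 4));
        out[2 * i + 1] = (uint8_t)((spread_nibble(io1[i] & 0x0F) << 1) |
                                    spread_nibble(io0[i] & 0x0F));
    }
}
```

On a real part, a per-byte bit shuffle like this would normally be replaced by a lookup table or avoided altogether by the way the lines are wired, which is presumably where the "software optimization" comes in.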

This is just one part of the job. The GBA Doom code expects data in a memory-mapped region, whereas in my case most of the data is stored in the external flash. Therefore, all data accesses had to be wrapped to fetch data directly over SPI.
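To illustrate what "wrapping" an access means, here is a minimal sketch under stated assumptions. Code that could read `patch->width` through a memory-mapped pointer instead asks a helper to fetch the field from its address in the external flash. The `spi_flash_read` driver is hypothetical and simulated with an array so the example runs on a PC. The field offsets follow the standard Doom patch header (four 16-bit fields followed by 32-bit column offsets), and WAD data is little-endian.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the external SPI flash. On the board this would
 * be a driver that sends a read command and clocks the data in over SPI. */
static uint8_t ext_flash_image[256];

static void spi_flash_read(uint32_t addr, void *dst, size_t len)
{
    memcpy(dst, &ext_flash_image[addr], len);
}

/* WAD data is little-endian, so multi-byte fields are assembled byte-wise. */
static uint16_t ext_read_u16(uint32_t addr)
{
    uint8_t b[2];
    spi_flash_read(addr, b, sizeof b);
    return (uint16_t)(b[0] | (b[1] << 8));
}

/* Doom patch header: width, height, leftoffset, topoffset (16 bit each). */
#define PATCH_WIDTH_OFS   0
#define PATCH_HEIGHT_OFS  2

/* Wrapped accessors: the patch is identified by its flash address, not by a
 * dereferenceable pointer. */
static uint16_t patch_width(uint32_t patch_addr)
{
    return ext_read_u16(patch_addr + PATCH_WIDTH_OFS);
}

static uint16_t patch_height(uint32_t patch_addr)
{
    return ext_read_u16(patch_addr + PATCH_HEIGHT_OFS);
}
```

Every such access pays the cost of an SPI transaction, which is why the placement of data (internal versus external flash) matters so much for the frame rate.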

Floor and ceiling textures (flats) are not accessed sequentially, so leaving them in the external flash would make rendering painfully slow (probably single-digit FPS). Therefore, these textures are copied to the internal flash. In the shareware version, these textures, together with the code, amount to about 550kB, so there are still about 450kB of internal flash, which can be used to store as many wall textures as possible. This improves the frame rate a lot, because this data can be fetched at very high speed (peaking at 320 MB/s).
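How the wall textures are chosen for the remaining ~450kB is not described here. Purely as a hypothetical sketch, a greedy first-fit planner could walk the level's wall textures in some priority order (for example, most frequently drawn first) and copy each one that still fits. `tex_t` and `plan_texture_cache` are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one wall texture used by the current level. */
typedef struct {
    uint32_t size;      /* bytes the texture occupies */
    int      internal;  /* output: 1 if copied to internal flash */
} tex_t;

/* Greedy first-fit: visit textures in the given order and copy each one to
 * internal flash if it fits in the budget left after code and flats.
 * Textures that do not fit stay in external flash and are read over SPI.
 * Returns the number of bytes used. */
static uint32_t plan_texture_cache(tex_t *walls, size_t n, uint32_t budget)
{
    uint32_t used = 0;
    for (size_t i = 0; i < n; i++) {
        if (walls[i].size <= budget - used) {   /* used <= budget always */
            walls[i].internal = 1;
            used += walls[i].size;
        } else {
            walls[i].internal = 0;
        }
    }
    return used;
}
```

First-fit is simple but not optimal; a knapsack-style choice weighted by how often each texture is drawn could do better, at the cost of more work at level load.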

Now that I have discussed how I solved the issue of reading data at a decent speed, a major issue remains: RAM. This was the hardest part.

Cutting down as many bytes as possible was quite an effort. The list of optimizations is rather long and can be found here. Almost every single data structure has been modified to save memory down to the last byte.

By far, the biggest improvement was to use 16-bit pointers instead of 32-bit ones. In the GBA Doom port, most pointers point to structures stored in RAM, and these structures are 4-byte aligned. Since there is less than 128kB of RAM, the useful address width is limited to 17 bits. Moreover, since the structures are 4-byte aligned, the 2 LSBs are always zero and need not be stored, leaving 15 bits: a 16-bit integer can therefore hold all the information needed to locate something in RAM.
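A minimal sketch of the 16-bit pointer idea, assuming a single contiguous RAM region and 4-byte-aligned objects. On the MG21 the base would be the start of SRAM; here a static array stands in for it so the code runs on a PC. The type `shortptr_t`, the two conversion functions and the NULL convention (offset 0 is never handed out, so 0 can mean NULL) are assumptions of this sketch, not necessarily how the port does it.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the MCU's SRAM: 96 kB, 4-byte aligned. */
static uint32_t ram_pool[96 * 1024 / 4];

typedef uint16_t shortptr_t;

/* Long -> short: offset from the RAM base, divided by 4 because the two
 * LSBs of a 4-byte-aligned address are always zero. 96 kB / 4 = 24576
 * words, which easily fits in 16 bits. */
static inline shortptr_t ptr_to_short(const void *p)
{
    if (p == NULL)
        return 0;
    return (shortptr_t)(((uintptr_t)p - (uintptr_t)ram_pool) >> 2);
}

/* Short -> long: shift back and add the base. */
static inline void *short_to_ptr(shortptr_t s)
{
    if (s == 0)
        return NULL;
    return (void *)((uintptr_t)ram_pool + ((uintptr_t)s << 2));
}
```

For example, a list node declared with `shortptr_t next;` instead of `struct node *next;` saves 2 bytes per link, at the cost of a shift and an add on each conversion.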

On one hand, this trick saves a lot of RAM; on the other, it adds some overhead to convert a short pointer to a long one and vice versa. This is not a big issue: we have an 80MHz Cortex M33.

After a lot of memory optimizations, I was able to run the full shareware episode, including E1M6! And all of this using less than 108kB of RAM!

Furthermore, using only 3kB of flash and a few bytes of RAM, I was able to bring back the Z-depth lighting effect!

LET’S ADD AUDIO

After almost everything was ported and fixed (menu, intermission screens, savegame, etc), something was still missing… Audio!

Luckily, the MG21 features DMA and timers that can work in PWM mode. Doom audio samples are 8-bit, 11025 Hz. Since I expect a frame rate well above 10Hz, I used a 1024-sample buffer, which is filled/refreshed every frame. This means that the minimum frame rate we need is 10.8 fps; going below that threshold will cause audio glitches. Here I explain why interrupts cannot be used for this purpose.
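The 10.8 fps threshold follows directly from the buffer length and the sample rate: one buffer lasts about 93 ms, so a new one must be produced at least that often.

```latex
T_{\text{buffer}} = \frac{1024}{11025\ \text{Hz}} \approx 92.9\ \text{ms},
\qquad
f_{\min} = \frac{1}{T_{\text{buffer}}} = \frac{11025}{1024} \approx 10.77\ \text{fps}
```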

Since 8 channels of 8-bit samples are mixed, each sample of the audio buffer (the sum of all 8 channels, which needs 11 bits) is larger than one byte, so the 1024-sample buffer takes 2048 bytes. To recover these 2 kB, further RAM optimizations had to be done.
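A minimal sketch of the mixing step, assuming unsigned 8-bit samples (as in Doom's sound lumps, where silence is 128), no per-channel volume or panning, and that the mixed value is used directly as a PWM compare value. `channel_t` and `mix_channels` are hypothetical names, not taken from the port.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_CHANNELS  8
#define AUDIO_SAMPLES 1024

/* Hypothetical channel state. */
typedef struct {
    const uint8_t *data;  /* unsigned 8-bit PCM, NULL when idle */
    size_t         pos;   /* next sample to play */
    size_t         len;   /* total samples */
} channel_t;

/* Sum all channels into a 16-bit buffer. Eight 8-bit values add up to at
 * most 8 * 255 = 2040, which needs 11 bits: hence 2 bytes per sample and
 * 1024 * 2 = 2048 bytes in total. Idle or finished channels contribute the
 * midpoint 128, so silence stays centred at 8 * 128 = 1024. */
static void mix_channels(channel_t ch[NUM_CHANNELS], uint16_t out[AUDIO_SAMPLES])
{
    for (size_t i = 0; i < AUDIO_SAMPLES; i++) {
        uint16_t acc = 0;
        for (int c = 0; c < NUM_CHANNELS; c++) {
            if (ch[c].data != NULL && ch[c].pos < ch[c].len)
                acc += ch[c].data[ch[c].pos++];
            else
                acc += 128;
        }
        out[i] = acc;  /* e.g. PWM compare value with a period of 2048 counts */
    }
}
```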

Note that only sound effects are implemented; music is not.

THE HARDWARE

Block diagram of the prototype

It is extremely simple: the IKEA module is soldered to a DC-DC converter board, which already generates 3.3V and 5V. All you need is the display itself (I used a fairly standard 160x128-pixel TFT), the 8 MB SPI flash, a common shift register, 8 pushbuttons and a few passives. That's it!

Prototype schematics. Simple, isn't it?

The module accepts input from 5 to 30V; however, you need to remove one resistor if you are planning to use it at low voltages (e.g. 12V). That resistor prevents the DC-DC converter IC from turning on if the input voltage is too low (see picture below).

The 74HC165 parallel-input, serial-output shift register is used because the MGM210L module has very few I/O pins. This cheap IC allows connecting 8 pushbuttons. Using two ICs would allow for a bigger keyboard, but for the sake of simplicity and size I stuck to an 8-key layout.

Keyboard, with function layout. If you are unhappy with the key assignments, you can remap them by changing a header file.
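The 74HC165 is read by pulsing its SH/LD input low to latch the 8 button inputs and then clocking the bits out serially, D7 first. A minimal sketch of the software side, assuming the byte is clocked in MSB first (so bit i holds input Di) and that each button pulls its input low when pressed. The key names, bit assignments and the `key_map` table are hypothetical stand-ins for the header file mentioned above.

```c
#include <stdint.h>

/* Hypothetical game functions, one bit each. */
enum {
    KEY_UP     = 1u << 0, KEY_DOWN  = 1u << 1,
    KEY_LEFT   = 1u << 2, KEY_RIGHT = 1u << 3,
    KEY_FIRE   = 1u << 4, KEY_USE   = 1u << 5,
    KEY_WEAPON = 1u << 6, KEY_ALT   = 1u << 7
};

/* Remap table: entry i is the function of the button wired to input Di.
 * Changing this table is all that is needed to reassign keys. */
static const uint8_t key_map[8] = {
    KEY_UP, KEY_DOWN, KEY_LEFT, KEY_RIGHT,
    KEY_FIRE, KEY_USE, KEY_WEAPON, KEY_ALT
};

/* raw: byte shifted out of the 74HC165 after latching. Buttons are
 * active-low, so a 0 bit means "pressed". */
static uint8_t decode_keys(uint8_t raw)
{
    uint8_t pressed = (uint8_t)~raw;
    uint8_t keys = 0;
    for (int i = 0; i < 8; i++)
        if (pressed & (1u << i))
            keys |= key_map[i];
    return keys;
}
```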

Note that the external flash is clocked at 40MHz, the display at 30MHz (note: overclock! According to its datasheet, its maximum clock frequency would be 16MHz), and the shift register only at 5MHz. This is not a problem: the MG21 makes it very easy to switch clocks (and peripheral I/O).

Finally, audio is connected using just a capacitor. Due to lack of space, I did not even mount a low-pass filter, as the PWM frequency is very high, and I rely on the amplified speaker's internal filtering. Still, a filter is recommended, as it improves the quality. When I make the PCB, there will certainly be enough space for a simple RC low-pass filter.
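As a worked example with hypothetical component values (not taken from the article): a first-order RC low-pass filter has the cutoff frequency below. Choosing R = 1 kΩ and C = 22 nF puts the cutoff just above the ~5.5 kHz Nyquist limit of 11025 Hz audio, while the much higher PWM carrier is attenuated.

```latex
f_c = \frac{1}{2\pi R C}
    = \frac{1}{2\pi \cdot 1\,\mathrm{k\Omega} \cdot 22\,\mathrm{nF}}
    \approx 7.2\ \mathrm{kHz}
```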

BUILDING THE PROTOTYPE

You need to get the MGM210L from the IKEA lamp (you might also find the MGM210L at conventional distributors). The module is not write-locked, so it can easily be programmed using any SWD programmer. In the IKEA lamp it is soldered on a small board which, as mentioned, provides a 30V to 5V (and 3.3V) converter. Since the voltage conversion circuit is very useful, and separating the two boards is quite difficult, I suggest keeping them joined and using the DC-DC converter.

To do this, you need to do some wiring, and you also need to remove the resistor marked R25, as seen in the picture shown earlier.

Top connections. You need to wire a few more connections, as shown in the next picture.
You need to add some wires on the back for 5V output, GND and input voltage.

I used prototyping boards but, as you can see, you'll end up with a mess of wires. In the future I plan to release a KiCad PCB layout in the GitHub repository, so the design will be much more compact.

This is the board to which we will attach the RF module breakout board shown before. It contains the 8MB QSPI flash and the shift register. On the back, a mess of wires and some SMT passives.
As promised the mess of wires. On the left the original power connector of the high-voltage board, soldered to a small piece of prototyping board.

The RF module board goes on the top of the previous one. The connector on the left is the debug header, which is also used to upload the WAD file.

WARNING: the whole system (except the display) will fit in the original IKEA lamp, but you need to power it at low voltage! Do not power it from the mains if you don't want to start a fire.

PROGRAMMING AND FINALIZING THE LAMP

The device can be programmed using any SWD programmer attached to the debug header. The program can be compiled using Silicon Labs' Simplicity Studio V5. NOTE: at the end of programming, you might get an error. Ignore it; it will work.

Then you need a WAD file converted to a particular format, compatible with this port. The already converted shareware WAD file is in the repository. If you want to try other WADs (note: I have not tested any other WAD, so it might not work!) you need to use mg21wadutil (derived from doomhack's GBA wad util), found in the same repository.

The converted WAD must be uploaded to the external flash via the YMODEM protocol (XMODEM is supported too). For this operation you need a USB to TTL UART converter connected to the debug header signals shown above (note: TX and RX are relative to the MGM210L, so you need to connect TX to RX and vice versa!).

To upload the WAD, power up the device and keep the "use", "change weapon" and "alt" buttons pressed to initiate YMODEM reception. Then use Tera Term to send the file via the YMODEM protocol.
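Each YMODEM data block (128 or 1024 bytes) is followed by a CRC-16, high byte first, which the receiver checks before accepting the block. The sketch below is the standard XMODEM/YMODEM CRC (polynomial 0x1021, initial value 0, no reflection, no final XOR), shown for reference; it is not code taken from the port, and `ymodem_crc16` is a name chosen here.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16 as used by XMODEM-CRC and YMODEM, computed bit by bit. */
static uint16_t ymodem_crc16(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(buf[i] << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}
```

As a sanity check, the standard test string "123456789" gives 0x31C3.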

Be aware that the upload process is very slow, and it will take more than 10 minutes for the shareware WAD. In the meantime go and have some coffee or tea…

Once the upload is complete, reset the device, and you should see DOOM running!

After the WAD upload, you might want to test that it works before fitting everything in the lamp.

Once you verified that it works, you might want to put everything inside the lamp...

Some tape is enough to hold everything in place. Please note that the lamp holder is connected to 4 AA batteries, NOT TO THE MAINS!

PERFORMANCE

With a Cortex M33, you might expect Doom to run even at 320x200 and 35 fps. After all, Doom required a 486 DX 33MHz to run at a good frame rate.

You are right, but the issue is the data access speed. Flats (ceilings and floors), as well as some cached textures are accessed at high speed, but sprites and uncached data are accessed at most at 10MB/s. And this value is the peak sequential read speed. Actual figures are much, much lower.
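To see why actual figures fall so far below the 10 MB/s peak, here is a rough cost model, not a measurement. It assumes one Dual Output Fast Read (0x3B) per access: 8 clocks of command, 24 of address and 8 dummy clocks (typical of many QSPI NOR flashes) before the first data bit, then 4 clocks per byte at 40 MHz. Chip-select timing and CPU/DMA setup are ignored, so real numbers would be lower still. The constants and `effective_bytes_per_s` are illustrative.

```c
#include <stdint.h>

#define SPI_CLOCK_HZ     40000000.0
#define OVERHEAD_CLOCKS  (8 + 24 + 8)  /* command + address + dummy */
#define CLOCKS_PER_BYTE  4             /* 2 bits per clock */

/* Effective throughput of a single read of n_bytes, in bytes per second. */
static double effective_bytes_per_s(uint32_t n_bytes)
{
    double clocks = OVERHEAD_CLOCKS + (double)CLOCKS_PER_BYTE * n_bytes;
    return (double)n_bytes * SPI_CLOCK_HZ / clocks;
}
```

Under this model a 1-byte read achieves about 0.9 MB/s, and even a 128-byte read only about 9.3 MB/s, which is why scattered accesses to uncached textures and sprites are so costly.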

I verified, using an STM32 microcontroller featuring memory-mapped QSPI support, that the QSPI speed has a dramatic effect on the frame rate. When the QSPI clock is 40 MHz or above (160 Mbit/s), the frame rate rarely drops below 30 fps, and frequently caps at 35 fps. With lower bandwidths, the frame rate drops, especially on levels featuring many different uncached textures in the same scene.

That said, we get a frame rate which sometimes still caps at 35 fps, but on complex scenes with many sprites (in terms of data to be fetched externally, not in terms of textures) it can drop to 16 fps: still playable.

COMPARISON WITH GBA DOOM PORT

The GBA Doom port also features stereo audio, while our prototype only has a mono channel (8 channels are software-mixed into a single one). Furthermore, music is not implemented in our port.

As in the GBA port, demo support is broken, so I have disabled it. Another reason is that every level change performs a write operation on the flash. This means that after you have changed level 10000 times, your flash will be out of specification. This does not mean it won't work anymore: it means that the flash will probably not retain the stored data for 10 years. However, this only affects the upper region of the flash (the code part is never overwritten), so it will not be a huge issue.

However, unlike the GBA port, my version features the Z-depth lighting effect. In the GBA port, it was removed to save some 11kB of RAM. I managed to bring this feature back using less than 3kB of flash and 8 bytes (yes, bytes!) of additional RAM.
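How the effect was squeezed into flash is not explained here. One possible approach, shown purely as a hypothetical sketch, is to store vanilla Doom's `zlight` table as 8-bit colormap indices instead of pointers: 16 × 128 pointers take 8 kB on a 32-bit CPU, while 16 × 128 bytes take 2 kB and can be a `const` table in flash. The constants and loop follow vanilla Doom's `R_InitLightTables`; `zlight_index` and `build_zlight_index` are names invented here, and SCREENWIDTH is left at the original 320 even though this port's display is 160 pixels wide.

```c
#include <stdint.h>

/* Constants from the original Doom renderer. */
#define LIGHTLEVELS      16
#define MAXLIGHTZ        128
#define NUMCOLORMAPS     32
#define LIGHTZSHIFT      20
#define LIGHTSCALESHIFT  12
#define DISTMAP          2
#define FRACBITS         16
#define FRACUNIT         (1 << FRACBITS)
#define SCREENWIDTH      320  /* original value; a 160-pixel build would differ */

/* Colormap index per (light level, distance). On the target this would be
 * precomputed and emitted as a const array; here it is built at run time. */
static uint8_t zlight_index[LIGHTLEVELS][MAXLIGHTZ];

/* 16.16 fixed-point division, as in Doom's FixedDiv (without overflow checks). */
static int32_t fixed_div(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a << FRACBITS) / b);
}

static void build_zlight_index(void)
{
    for (int i = 0; i < LIGHTLEVELS; i++) {
        int startmap = ((LIGHTLEVELS - 1 - i) * 2) * NUMCOLORMAPS / LIGHTLEVELS;
        for (int j = 0; j < MAXLIGHTZ; j++) {
            int32_t scale = fixed_div((SCREENWIDTH / 2) * FRACUNIT,
                                      (j + 1) << LIGHTZSHIFT);
            scale >>= LIGHTSCALESHIFT;
            int level = startmap - scale / DISTMAP;
            if (level < 0)
                level = 0;
            if (level >= NUMCOLORMAPS)
                level = NUMCOLORMAPS - 1;
            /* Vanilla stores colormaps + level * 256; an index is enough. */
            zlight_index[i][j] = (uint8_t)level;
        }
    }
}
```

At draw time, the renderer would then compute `colormaps + zlight_index[light][z] * 256` instead of loading a stored pointer.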

Furthermore, doomhack implemented a mip-mapping-like algorithm on composite textures. This was probably to increase the software cache hit ratio, at the expense of lower detail on some textures. We cannot afford a software cache (doomhack used 16kB for it), so there is no point in implementing mip-mapping: without the cache, full detail does not affect speed. Therefore, I have restored full-detail rendering on composite textures. Note that I restored it AFTER taking the pictures and shooting the video, where you can clearly see the mip-map effect on some walls.

On the IKEA lamp, Doom runs faster than on the GBA, but this is expected: an 80 MHz Cortex M33 is about 8 times faster than a 16.7 MHz ARM7TDMI.

In terms of memory usage, the port on the IKEA lamp requires much less RAM. For instance, level E1M1 requires less than 60kB (including frame buffer and stack), versus about 87.4kB on the GBA port. The GBA figure excludes 32kB of IWRAM, the stack, the memory overhead and the 38400-byte frame buffer; including them gives a total of about 157kB. Level E1M6 occupies less than 108kB on the IKEA TRÅDFRI port, versus the already mentioned amount exceeding 256kB.

"LAMP" OR "LIGHT BULB"? 

The correct technical word for "light bulb" is "lamp"! IKEA itself refers to this particular device as a "self ballasted LED lamp".

SOME MORE SCREENSHOTS

E1M1. The ammo counter shows 315. This means that the framerate is 31.5 fps.

E1M2. This level is more complex, and has many more textures, so some of them are directly loaded from the external flash. This explains the lower framerate (26.7 fps).
The complete system in action!

VIDEO

Here is a short clip showing the performance on level E1M6. Note that in this video, mip-mapping on composite textures had not been disabled yet: the actual graphics look much better. Also, the camera does not capture the exact colors, which are much better in reality.

I WANT MORE DETAILS!

I have written a more detailed article, especially about optimization, here.

CONCLUSIONS

I made it!

Several features are still missing:

Finally, two last remarks.

NOTES

  1. Do not power up the lamp directly from the mains. Maximum voltage is 30V (if you use the DC-DC converter board).
  2. In all the pictures, the AMMO count shows the current framerate multiplied by 10. E.g. 300 = 30 fps. In the current github repository, the frame rate counter is disabled!
  3. Yes, you might notice that the lamp case is broken. Unfortunately, the lamp fell on the ground and broke into 3 pieces (it is made of glass). A bit of superglue helped :)