melonDS RSS
open_in_new https://melonds.kuribo64.net/rss.php
The latest news on melonDS.
Hardware renderer progress -- by Arisotura
Hey hey, little status update! I've been having fun lately.
The hardware renderer has been progressing nicely. It's always exciting when you're able to assemble the various parts you've been working on into something coherent, and it suddenly starts looking a lot like a finished product. It's no longer just something I'm looking at in RenderDoc.
Those screenshots were taken with 4x upscaling (click them for full-size versions). The last one demonstrates hi-res rotscale in action. Not bad, I dare say.
It's not done yet, though, so I'll go over what remains to be done.
Mosaic
Shouldn't be very difficult to add.
Except for, you know, sprite mosaic. I don't really know yet how I'll handle that one. The way it works is intuitive if you're processing pixels left-to-right within a scanline, but this isn't how a modern GPU works.
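For the BG side at least, mosaic boils down to snapping sample coordinates to a coarse grid, which maps onto a shader easily enough. A minimal sketch of that idea (my names, not melonDS code):

```cpp
// Minimal sketch of BG mosaic as coordinate snapping (hypothetical names, not melonDS code).
// width/height are the mosaic block dimensions derived from the MOSAIC register.
struct MosaicSize { int width; int height; };

inline void ApplyBGMosaic(int& x, int& y, const MosaicSize& m)
{
    // every pixel inside a mosaic block samples the block's top-left pixel
    x -= x % m.width;
    y -= y % m.height;
    // sprite mosaic can't be expressed this simply: the hardware effectively
    // counts pixels left-to-right across the scanline, as noted above
}
```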
Display capture
What was previously in place was a bit of a hack made to work with the old approach, so I will have to rework it. At least now I have the required system for tracking VRAM banks. But the whole part of capturing video output to OpenGL textures, and reusing them where needed, will need to be redone.
I also need to think of a way to sync hi-res captures back to emulated VRAM when needed.
Of course, support for this also needs to be added to the 3D renderers, for the sake of render-to-texture. Shouldn't be too difficult to add it to the compute renderer, but the old OpenGL renderer is another deal. I had designed that rather lazily, just streaming raw VRAM to the GPU, and never improved it. I could backport the texture cache the compute renderer uses, but it will take a while.
Mid-frame rendering
I need to add provisions in case certain things get changed mid-frame. I quickly did it for OAM, as shown by that iCarly screenshot above. The proof of concept works, but it can be improved upon, and extended to other things too.
The way this works is similar to the blargSNES hardware renderer. When certain state gets modified mid-frame, a section of the screen gets rendered with the old state - that section starts at the top of the screen, or wherever the previous section ended if there was one. Upon VBlank, we finish rendering the rest of the frame in the same way.
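In pseudo-C++, the bookkeeping might look like this (a sketch of the idea, not the actual implementation):

```cpp
// Sketch of blargSNES-style sectioned rendering (hypothetical names).
// When tracked state changes mid-frame, everything from the end of the
// previous section up to the current scanline is rendered with the old state.
struct FrameSectioner
{
    int SectionStart = 0;   // first scanline not yet rendered

    void OnStateChange(int curLine)
    {
        if (curLine > SectionStart)
        {
            RenderSection(SectionStart, curLine);  // uses the old (pre-change) state
            SectionStart = curLine;
        }
        LatchNewState();
    }

    void OnVBlank()
    {
        RenderSection(SectionStart, 192);          // finish the frame the same way
        SectionStart = 0;
    }

    void RenderSection(int start, int end);
    void LatchNewState();
};
```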
There are exceptions for things that are very likely to be changed mid-frame (ie. window positions, BG scroll positions), and are relatively inexpensive to deal with. It's worth noting that on the DS, it's less frequent for video registers to be changed mid-frame, because the hardware is more flexible. By comparison, in something like a SNES game, you will see a lot more mid-frame state changes.
Either way, time will tell what is worth accounting for and what needs to be optimized.
Filtering
Hey, let's not forget why we came in here in the first place.
Hopefully, with the way the shaders are structured, it shouldn't be too hard to slot in filters. Bilinear, bicubic, HQX, xBRZ, you name it.
What I'm not too sure about is how well it'll work with sprites. It's not uncommon for games to splice together several small sprites to form bigger graphical elements, and I'm not sure how that'll work with filtering. We'll see.
Misc. things
The usual, a lot of code cleanup, optimization work, fixing up little details, and so on.
And work to make the code nicer to work with, which isn't particularly exciting.
This gives you a rough idea of where things are at, and what's left to be done. To conclude, I'll leave you with an example of what happens when Arisotura goofs up her OpenGL calls:
Have fun!
Hardware rendering, the fun -- by Arisotura
This whole thing I'm working on gives me flashbacks from blargSNES. The goal and constraints are different, though. We weren't doing upscaling on the 3DS, but also, we had no fragment shaders, so we were much more limited in what we could do.
Anyway, these days, I'm waist-deep into OpenGL. I'm determined to go further than my original approach to upscaling, and it's a lot of fun too.
I might as well talk more about that approach, and what its limitations are.
First, let's talk about how 2D layers are composited on the DS.
There are 6 basic layers: BG0, BG1, BG2, BG3, sprites (OBJ) and backdrop. Sprites are pre-rendered and treated as a flat layer (which means you can't blend a sprite with another sprite). Backdrop is a fixed color (entry 0 of the standard palette), which basically fills any space not occupied by another layer.
For each pixel, the PPU keeps track of the two topmost layers, based on priority orders.
Then, you have the BLDCNT register, which lets you choose a color effect to be applied (blending or fade effects), and the target layers it may apply to. For blending, the "1st target" is the topmost pixel, and the "2nd target" is the pixel underneath. If the layers both pixels belong to are adequately selected in BLDCNT, they will be blended together, using the coefficients in the BLDALPHA register. Fade effects work in a similar fashion, except since they only apply to the topmost pixel, there's no "2nd target".
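For reference, the blending math itself is simple. A sketch of the per-channel computation (coefficients in 1/16 steps; treat the exact clamping value as an assumption):

```cpp
// Sketch of BLDALPHA-style blending between the two topmost pixels, per colour channel.
// eva/evb are the 1st/2nd target coefficients from BLDALPHA, in 1/16 steps (0..16).
inline int BlendChannel(int top, int bottom, int eva, int evb, int maxval)
{
    int val = (top * eva + bottom * evb) >> 4;   // divide by 16
    return (val > maxval) ? maxval : val;        // clamp to the channel maximum
}
```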
Then you also have the window feature, which can exclude not only individual layers from a given region, but can also disable color effects. There are also a few special cases: semi-transparent sprites, bitmap sprites, and the 3D layer. Those all ignore the color effect and 1st target selections in BLDCNT, as well as the window settings.
In melonDS, the 2D renderer renders all layers according to their priority order, and keeps track of the last two values for each pixel: when writing a pixel, the previous value is pushed down to a secondary buffer. This way, at the end, the two buffers can be composited together to form the final video frame.
I've talked a bit about how 3D upscaling was done: basically, the 3D layer is replaced with a placeholder. The final compositing step is skipped, and instead, the incomplete buffer is sent to the GPU. There, a compositor shader can sample this buffer and the actual hi-res 3D layer, and finish the work. This requires keeping track of not just the last two values, but the last three values for any given pixel: if a given 3D layer pixel turns out to be fully transparent, we need to be able to composite the pixels underneath "as normal".
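A sketch of that per-pixel bookkeeping (hypothetical names; the real code is more involved):

```cpp
#include <cstdint>

// Sketch of the per-pixel value stack described above. Writing a pixel pushes the
// previous value down; with the 3D-placeholder scheme, a third slot is kept so the
// layers underneath can still be composited if the 3D pixel ends up transparent.
struct PixelStack
{
    std::uint32_t Val[3];   // [0] = topmost, [1] = underneath, [2] = one more level

    void Push(std::uint32_t color)
    {
        Val[2] = Val[1];
        Val[1] = Val[0];
        Val[0] = color;
    }
};
```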
This approach was good in that it allowed for performant upscaling with minimal modifications to the 2D renderer. However, it was inherently limited in what was doable.
This became apparent when I started working on hi-res display capture. My very crude implementation, built on top of that old approach, worked fine for the simpler cases like dual-screen 3D. However, it was evident that anything more complex wouldn't work.
For example, in this post, I showed a render-to-texture demo that uses display capture. I also made a similar demo that renders to a rotating BG layer rather than a 3D cube:
And this is what it looks like when upscaled with the old approach:
Basically, when detecting that a given layer is going to render a display capture, the renderer replaces it with a placeholder, like for the actual 3D layer. The placeholder values include the coordinates within the source bitmap, and the compositor shader uses them to sample the actual hi-res bitmap.
The fatal flaw here is that this calculation doesn't account for the BG layer's rotation. Hence why it looks like shit. Linear interpolation could solve this issue, but it's just one of many problems with this approach.
Another big issue was filtering.
The basic reason is that when you're applying an upscaling filter to an image, for each given position within the destination image, you're going to be looking at not only the nearest pixel from the source image, but also the surrounding pixels, in an attempt at inferring the missing detail. For example, a bilinear filter works on a 2x2 block of source pixels, while it's 4x4 for a bicubic filter, and as much as 5x5 for xBRZ.
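To make those kernel sizes concrete, here's what a plain bilinear sample looks like when the source is just a flat image (a generic sketch, not melonDS shader code):

```cpp
#include <algorithm>
#include <cmath>

// Generic bilinear sampling sketch: one destination pixel reads a 2x2 block of
// source pixels and weights them by the fractional position.
float SampleBilinear(const float* src, int w, int h, float x, float y)
{
    int x0 = (int)std::floor(x), y0 = (int)std::floor(y);
    int x1 = std::min(x0 + 1, w - 1), y1 = std::min(y0 + 1, h - 1);
    float fx = x - x0, fy = y - y0;

    float top    = src[y0*w + x0] * (1-fx) + src[y0*w + x1] * fx;
    float bottom = src[y1*w + x0] * (1-fx) + src[y1*w + x1] * fx;
    return top * (1-fy) + bottom * fy;
}
```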
In our case, the different graphical layers are smooshed together into a weird 3-layer cake. This makes it a major pain to perform filtering: say you're looking at a source pixel from BG2, you'd want to find neighboring BG2 pixels, but they may be at different levels within the layer cake, or they may just not be part of it at all. All in all, it's a massive pain in the ass to work with.
Back in 2020, I had attempted to implement an xBRZ filter as a bit of a demo, to see how it'd work. I had even recorded a video of it on Super Princess Peach, and it was looking pretty decent... but due to the aforementioned issues, there were always weird glitches and other oddball problems, and it was evident that this was stretching beyond the limits of the old renderer approach. The xBRZ filter shader did remain in the melonDS codebase, unused...
-
So, basically, I started working on a proper hardware-accelerated 2D renderer.
As of now, I'm able to decode individual BG layers and sprites to flat textures. The idea is that doing so will simplify filtering a whole lot: instead of having to worry about the original format of the layer, the tiles, the palettes, and so on, it would just be a matter of fetching pixels from a flat texture.
Here's an example of sprite rendering. They are first pre-rendered to an atlas texture, then they're placed on a hi-res sprite layer.
This type of renderer allows for other nifty improvements too: for example, hi-res rotation/scaling.
Next up is going to be rendering BG layers to similar hi-res layers. Once it's all done, the layers can be sent to the compositor shader and the job can be finished. I also have to think of provisions to deal with possible mid-frame setup changes. Anyone remember that midframe-OAM-modifying foodie game?
There will also be some work on the 3D renderers, to add support for things like render-to-texture, but also possibly adding 3D enhancements such as texture filtering.
-
I can hear the people already, "why make this with OpenGL, that's old, you should use Vulkan".
Yeah, OpenGL is no longer getting updates, but it's a stable and mature API, and it isn't going to be deprecated any time soon. For now, I see no reason to stop using it.
However, I'm also reworking the way renderers work in melonDS.
Back then, Generic made changes to the system, so he could add different 2D renderers for the Switch port: a version of the software renderer that uses NEON SIMD, and a hardware-accelerated renderer that uses Deko3D.
I'm building upon this, but I want to also integrate things better: for example, figuring out a way to couple the 2D and 3D renderers better, and generally a cleaner API.
The idea is to also make it easier to implement different renderers. For example, the current OpenGL renderer is made with fast upscaling in mind, but we could have different renderers for mobile platforms (ie. OpenGL ES), that are first and foremost aimed at just being fast. Of course, we could also have a Vulkan renderer, or Direct3D, Metal, whatever you like.
melonDS 1.1 is out! -- by Arisotura
As promised, here is the new release: melonDS 1.1.
So, what's new in this release?
EDIT - there was an issue with the release builds that had been posted, so if your JIT option is greyed out and you're not using an x64 Mac, please redownload the release.
DSP HLE
This is going to be a big change for DSi gamers out there.
If you've been playing DSi titles in melonDS, you may have noticed that sometimes they run very slow. Single-digit framerates. Wouldn't be a big deal if melonDS was always this slow, but obviously, it generally performs much better, so this sticks out like a sore thumb.
This is because those titles use the DSi's DSP. What is the DSP, you ask? A specific-purpose (read: weird) processor that doesn't actually do much besides being very annoying and resource-intensive to emulate. They use it for such tasks as downscaling pictures or playing a camera shutter sound when you take a picture.
With help from CasualPokePlayer, we were able to figure out the 3 main classes of DSP ucodes those games use, determine their functionality, and implement HLE equivalents in melonDS. Thus, those wonderful DSP features can be emulated without utterly wrecking performance.
DSP HLE is a setting, which you will find in the emulation settings dialog, DSi-mode tab. It is enabled by default.
Note that if it fails to recognize a game's DSP ucode, it will fall back to LLE. Similarly, homebrew ucodes will also fall back to LLE. There's the idea of adding a DSP JIT to help with this, but it's not a very high priority right now.
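For the curious, the dispatch itself is conceptually simple; a sketch of the idea (all the names here, and the idea of identifying a ucode by hashing its binary, are assumptions rather than melonDS's actual API):

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of the HLE-vs-LLE dispatch for DSP ucodes (hypothetical names).
struct DSPBackend { virtual ~DSPBackend() = default; };
struct HLE_AACUcode      : DSPBackend {};
struct HLE_GraphicsUcode : DSPBackend {};
struct HLE_G711Ucode     : DSPBackend {};
struct LLE_Teakra        : DSPBackend {};

enum class UcodeClass { AAC_SDK, Graphics_SDK, G711_SDK, Unknown };

UcodeClass IdentifyUcode(const std::uint8_t* code, std::size_t len)
{
    (void)code; (void)len;
    return UcodeClass::Unknown;   // a real version would hash/fingerprint the loaded binary
}

DSPBackend* CreateDSPBackend(const std::uint8_t* code, std::size_t len)
{
    switch (IdentifyUcode(code, len))
    {
    case UcodeClass::AAC_SDK:      return new HLE_AACUcode();
    case UcodeClass::Graphics_SDK: return new HLE_GraphicsUcode();
    case UcodeClass::G711_SDK:     return new HLE_G711Ucode();
    default:                       return new LLE_Teakra();  // unrecognized or homebrew: fall back to LLE
    }
}
```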
DSi microphone input
This was one of the last big missing features in DSi mode, and it is now implemented, thus further closing the gap between DS and DSi emulation in melonDS.
The way external microphone input works was also changed: instead of keeping your mic open at all times, melonDS will only open it when needed. This should help under certain circumstances, such as when using Bluetooth audio.
High-quality audio resampling
The implementation of DSP audio involved several changes to the way melonDS produces sound. Namely, melonDS used to output at 32 KHz, but with the new DSi audio hardware, this was changed to 47 KHz. I had added in some simple resampling, so melonDS would produce 47 KHz audio in all cases. But this caused audio quality issues for a number of people.
Nadia took the matter in her hands and replaced my crude resampler with a high-quality blip-buf resampler. Not only are all those problems eliminated, but it also means the melonDS core now outputs at a nice 48 KHz frequency, much easier for frontends to deal with than the previous weird numbers.
Cheat database support
If you've used cheats in melonDS, surely you've found it inconvenient to have to manually enter them into the editor. But this is no more: you can now grab, for example, the latest R4 cheat database (usrcheat.dat) and import your cheat codes from it.
The cheat import dialog will show you which game entries match your current game, show the cheat codes they contain, and let you select which codes to import. You can also choose whether to clear any previously existing cheat codes or to keep them when importing new codes.
melonDS's cheat code system was also improved in order to fully preserve the structure found in usrcheat.dat. Categories and cheat codes can now have descriptions, categories have an option to allow only one code in them to be enabled, and codes can be created at the root, without having to be in a category.
The cheat file format (.mch) was also modified to support this. The parser is backwards-compatible, so it will recognize old .mch files just fine. However, older melonDS versions won't be able to read the new files.
The cheat editor UI was also revamped to add support for the new functionality, and generally be more flexible and easier to work with. For example, it's now possible to reorder your cheat codes by moving them around in the list.
Compute shader renderer fix
Those of you who have tried the compute shader renderer may have noticed that it could start to glitch out at really high resolutions. This was due to running out of tile space.
We merged FireNX70's pull request, which implements tile size scaling in order to alleviate this problem. This means the renderer should now be able to go pretty high in resolution without issues.
Wayland OpenGL fix
If you use Wayland and have tried to use the OpenGL renderers, you may have noticed that it made the melonDS window glitchy, especially when using hiDPI scaling.
I noticed that glitch too, but had absolutely no idea where to start looking for a fix. So I kinda just... didn't use OpenGL, and put that on the backburner.
Until a while ago, when I felt like trying modern PCSX2. I was impressed by how smoothly it ran, compared to what it was like back in 2007... but more importantly, I realized that it was rendering 3D graphics in its main window alongside UI elements, that it uses Qt and OpenGL just like melonDS, and that it was flawless, no weird glitchiness.
So I went and asked the PCSX2 team about it. Turns out they originally took their OpenGL context code from DuckStation, but improved upon it. Funnily enough, melonDS's context code also comes from there. Small world.
In the end, the PCSX2 folks told me about what they did to fix Wayland issues. I tried one of the fixes that involved just two lines of code, and... it completely fixed the glitchiness in melonDS. So, thanks there!
BSD CI
We now have CI for FreeBSD, OpenBSD and NetBSD, courtesy of Rayyan and Izder456. This means we're able to provide builds for those platforms, too.
Adjustments were also done to the JIT recompiler so it will work on those platforms.
Fixing a bunch of nasty bugs
For example: it has been reported that melonDS 1.0 could randomly crash after a while if multiple instances were opened. Kind of a problem, given that local multiplayer is one of melonDS's selling points. So, this bug has been fixed.
Another fun example: it sometimes occurred that melonDS wouldn't output any sound, for some mysterious reason. As it was random and seemingly had a pretty low chance of occurring, I was really not looking forward to trying to reproduce and fix it... But Nadia saved the day by providing a build that exhibited this issue 100% of the time. With a reliable way to reproduce the bug, I was able to track it down and fix it.
Nadia also fixed another bug that caused possible crashes that appeared to be JIT-related, but turned out to be entirely unrelated.
All in all, melonDS 1.1 should be more stable and reliable.
There's also the usual slew of misc bugfixes and improvements.
However, we realized that there's a bug with the JIT that causes a crash on x86 Macs. We will do our best to fix this, but in the meantime, we had to disable that setting under that platform.
Future plans
The hi-res display capture stuff will be for release 1.2. Even if I could rush to finish it for 1.1, it wouldn't be wise. Something of this scope will need comprehensive testing.
I also have more ideas for further releases. I want to experiment with RTCom support, netplay, a different type of UI, ...
And then there's also changes I have in mind for this website. The current layout was nice in the early days, but there's a lot of posts now, and it's hard to find specific posts. I'd also want the homepage to present information in a more attractive manner, make it more evident what the latest melonDS version is, maybe have less outdated screenshots, ... so much to do.
Anyway, you can grab melonDS 1.1 on the downloads page, as usual.
You can also donate to the project if you want, that's always appreciated.
Hi-res display capture: we're getting there! -- by Arisotura
Sneak peek of the blackmagic3 branch:
(click them for full-res versions)
Those are both dual-screen 3D scenes, but notice how both screens are nice and smooth and hi-res.
Now, how far along are we actually with this?
As I said in the previous post, this is an improved version of the old renderer, which was based on a simple but limited approach. At the time, it was easy enough to hack that on top of the existing 2D engine. But now, we're reaching the limits of what is possible with this approach. So, consider this a first step. The second step will be to build a proper OpenGL-powered 2D engine, which will open up more crazy possibilities as far as graphical enhancements go.
I don't know if this first step will make it in melonDS 1.1, or if it will be for 1.2. Turns out, this is already a big undertaking.
I added code to keep track of which VRAM blocks are used for display captures. It's not quite finished; it's missing some details, like freeing capture buffers that are no longer in use, or syncing them back to emulated VRAM if the CPU tries to access it.
It also needs extensive testing and optimization. For this first iteration, for once, I tried to actually build something that works, rather than spend too much time trying to imagine the perfect design. So, basically, it works, but it's inefficient... Of course, the sheer complexity of VRAM mapping on the DS doesn't help at all. Do you remember? You can make the VRAM banks overlap!
So, yeah. Even if we end up making a new renderer, all this effort won't go to waste: we will have the required apparatus for hi-res display capture.
So far, this renderer does its thing. It detects when a display capture is used, and replaces it with an adequate hi-res version. For the typical use cases, like dual-screen 3D or motion blur, it does the job quite well.
However, I made a demo of "render-to-rotscale-BG": like my render-to-texture demo in the previous post, but instead of rendering the captured texture on the faces of a bigger cube, it is simply rendered on a rotating 128x128 BG layer. Nothing very fancy, but those demos serve to test the various possibilities display capture offers, and some games also do similar things.
Anyway, this render-to-rotscale demo looks like crap when upscaling is used. That's because the renderer's shader works on the assumption that display capture buffers will be drawn normally and not transformed. The shader starts from original-resolution coordinates and interpolates between them in order to sample the higher-resolution images. In the case of a rotated/scaled BG layer, the interpolation would need to take the BG layer's transform matrix into account.
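Roughly, fixing that means carrying the BG's affine parameters into the hi-res sampling step. A sketch of the idea (pa..pd follow the usual rotscale parameter naming; everything else is hypothetical):

```cpp
// Sketch: sampling a hi-res captured frame through a rotscale BG's transform.
// pa..pd and refX/refY are the BG's affine parameters (8.8 fixed point on hardware,
// shown as floats here for clarity); 'scale' is the upscaling factor.
struct Affine { float pa, pb, pc, pd, refX, refY; };

void RotscaleSampleCoords(const Affine& m, float scale, int hiresX, int hiresY,
                          float& srcX, float& srcY)
{
    // work in original-resolution screen coordinates, keeping the fractional part
    float x = hiresX / scale;
    float y = hiresY / scale;

    // apply the BG transform at sub-pixel precision, then map back into hi-res space
    srcX = (m.refX + m.pa * x + m.pb * y) * scale;
    srcY = (m.refY + m.pc * x + m.pd * y) * scale;
    // srcX/srcY now index into the hi-res captured bitmap (a filtered fetch is optional)
}
```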
I decided to postpone this to the second step. Just think of the possibilities the improved renderer would offer: hi-res rotation/scale, antialiasing, filtering on layers and sprites, ...
And the render-to-texture demo won't even work for now. This one is tricky, it will require some communication between the 2D and 3D renderers. It might also require reworking the way texturing is done: for example my old OpenGL renderer just streams raw VRAM to the GPU and lets the shader do the decoding. It's lazy, but it was a simple way to get texturing working. But again, a proper texture cache here would open up more enhancement possibilities. Generic did use such a cache in his compute shader renderer, so it could probably serve for both renderers.
That's about it for what this renderer can do, for now. I also have a lot of cleanup and tying-loose-ends to do. I made a mess.
Stay tuned!
Display capture: oh, the fun! -- by Arisotura
This is going to be a juicy technical post related to what I'm working on at the moment.
Basically, if you've used 3D upscaling in melonDS, you know that there are games where it doesn't work. For example, games that display 3D graphics on both screens: they will flicker between the high-resolution picture and a low-resolution version. Or in other instances, it might just not work at all, and all you get is low-res graphics.
It's linked to the way the DS video hardware works. There are two 2D tile engines, but there is only one 3D engine. The output from that 3D engine is sent to the main 2D engine, where it is treated as BG0. You can also change which screen each 2D engine is connected to. But you can only render 3D graphics on one screen.
So how do those games render 3D graphics on both screens?
This is where display capture comes in. The principle is as follows: you render a 3D scene to the top screen, all while capturing that frame to, say, VRAM bank B. On the next frame, you switch screens and render a 3D scene to the bottom screen. Meanwhile, the top screen will display the previously captured frame from VRAM bank B, and the frame you're rendering will get captured to VRAM bank C. On the next frame, you render 3D graphics to the top screen again, and the bottom screen displays the capture from VRAM bank C. And so on.
This way, you're effectively rendering 3D graphics to both screens, albeit at 30 FPS. This is a typical use case for display capture, but not the only possibility.
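In pseudocode, the ping-pong described above looks something like this (hypothetical helper names standing in for the game's actual register setup):

```cpp
// Sketch of the dual-screen 3D ping-pong using display capture.
enum Screen   { TOP_SCREEN, BOTTOM_SCREEN };
enum VRAMBank { VRAM_BANK_B, VRAM_BANK_C };

void Render3DScene(Screen s);                          // draw this frame's 3D scene
void CaptureDisplayTo(VRAMBank b);                     // capture the rendered frame
void DisplayCapturedBitmap(Screen s, VRAMBank b);      // show a previously captured frame

void RunFrame(int frame)
{
    if ((frame & 1) == 0)
    {
        Render3DScene(TOP_SCREEN);                          // 3D goes to the top screen...
        CaptureDisplayTo(VRAM_BANK_B);                      // ...and gets captured for later
        DisplayCapturedBitmap(BOTTOM_SCREEN, VRAM_BANK_C);  // bottom shows last frame's capture
    }
    else
    {
        Render3DScene(BOTTOM_SCREEN);
        CaptureDisplayTo(VRAM_BANK_C);
        DisplayCapturedBitmap(TOP_SCREEN, VRAM_BANK_B);
    }
}
```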
Display capture can receive input from two sources: source A, which is either the main 2D engine output or the raw 3D engine output, and source B, which is either a VRAM bank or the main memory display FIFO. Then you can either select source A, or source B, or blend the two together. The result from this will be written to the selected output VRAM bank. You can also choose to capture the entire screen or a region of it (128x128, 256x64 or 256x128).
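A sketch of what the per-pixel capture logic amounts to (hypothetical names; the exact clamping is an assumption):

```cpp
// Sketch of display capture source selection and blending, per colour channel.
// Source A is 2D engine or raw 3D output; source B is a VRAM bank or the display FIFO.
enum class CaptureMode { SourceA, SourceB, Blend };

inline int CapturePixelChannel(CaptureMode mode, int srcA, int srcB,
                               int eva, int evb, int maxval)
{
    switch (mode)
    {
    case CaptureMode::SourceA: return srcA;
    case CaptureMode::SourceB: return srcB;
    default:
    {
        // blend, with coefficients in 1/16 steps like the 2D colour effects
        int val = (srcA * eva + srcB * evb) >> 4;
        return (val > maxval) ? maxval : val;
    }
    }
}
```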
All in all, quite an interesting feature. You can use it to do motion blur effects in hardware, or even to render graphics to a texture. Some games even do software processing on captured frames to apply custom effects. It is also used by the aging cart to verify the video hardware: it renders a scene and checksums the captured output.
For example, here's a demo of render-to-texture I just put together, based on the libnds examples:
(video file)
The way this is done isn't very different from how dual-screen 3D is done.
Anyway, this stuff is very related to what I'm working on, so I'm going to explain a bit how upscaling is done in melonDS.
When I implemented the OpenGL renderer, I first followed the same approach as other emulators: render 3D graphics with OpenGL, read out the framebuffer and send it to the 2D renderer. Simple. Then, in order to support upscaling, I just had to increase the resolution of the 3D framebuffer. To compensate for this, the 2D renderer would push out more pixels.
The issue was that it was suboptimal: if I pushed the scaling factor to 4x, it would get pretty slow. On one hand, in the 2D renderer, pushing out more pixels takes more CPU time. On the other hand, on a PC, reading back from GPU memory is slow. The overhead tends to grow quadratically when you increase the output resolution.
So instead, I went for a different approach. The 2D renderer renders at 256x192, but the 3D layer is replaced with placeholder values. This incomplete framebuffer is then sent to the GPU along with the high-resolution 3D framebuffer, and the two are spliced together. The final high-resolution output can be sent straight to the screen, never leaving GPU memory. This approach is a lot faster than the previous one.
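Conceptually, the compositor's job per output pixel looks like this (a C++ sketch of the logic rather than the actual GLSL; blending of the 3D layer with the 2D layers is omitted):

```cpp
#include <cstdint>

// Sketch of the splice the compositor performs (hypothetical names). For each hi-res
// output pixel: look up the corresponding low-res 2D pixel; if it's the 3D placeholder,
// substitute the hi-res 3D sample, falling back to the value underneath when the 3D
// pixel turns out fully transparent.
struct RGBA { std::uint8_t r, g, b, a; };

RGBA CompositePixel(RGBA loRes2D, RGBA under2D, RGBA hiRes3D, bool isPlaceholder)
{
    if (!isPlaceholder)
        return loRes2D;      // regular 2D pixel, nothing to do

    if (hiRes3D.a == 0)
        return under2D;      // 3D pixel is transparent: use the layers underneath

    return hiRes3D;          // otherwise the hi-res 3D pixel wins
}
```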
This is what was originally implemented in melonDS 0.8. Since this rendering method bypassed the regular frame presentation logic, it was a bit of a hack - the final compositing step was done straight in the frontend, for example. The renderer in modern melonDS is a somewhat more refined version of this, but the same basic idea remains.
There is also an issue with this approach: display capture. The initial solution was to downscale the GPU framebuffer to 256x192 and read that back, so it could be stored in the emulated VRAM, "as normal". Because it goes through emulated VRAM, the captured frame has to be at the original resolution. This is why upscaling in melonDS has those issues.
To work around this, one would need to detect when a VRAM bank is being used as a destination for a display capture, and replace it with a high-resolution version in further frames, in the same way as the 3D layer itself. But obviously, it's more complicated than that. There are several issues. For one, the game could still decide to access a captured frame in VRAM (to read it back or to do custom processing), so that access needs to be served. There are also several different ways a captured frame can be reused: as a bitmap BG layer (BG2 or BG3), as a bunch of bitmap sprites, or even as a texture in 3D graphics. This is kinda why it has been postponed for so long.
There are even more annoying details, if we consider all the possibilities: while an API like OpenGL gives you an identifier for a texture, and you can only use it within its bounds, the DS isn't like that. When you specify a texture on the 3D engine, you're really just giving it a VRAM address. You could technically point it in the middle of a previously captured frame, or before... Tricky to work with. I made a few demos (like the aforementioned render-to-texture demo) to exercise display capture, but the amount of possibilities makes it tricky.
So I'm basically trying to add support for high-resolution display capture.
The first step is to make a separate 2D renderer for OpenGL, which will go with the OpenGL 3D renderers. This removes the GLCompositor wart and the other hacks, and integrates the OpenGL rendering functionality more cleanly (which, in turn, will make it easier to implement other renderers in the future, too).
I'm also reworking this compositor to work around the original limitations, and make it easier to splice in high-resolution captured frames. I have a pretty good roadmap as far as the 2D renderer is concerned. For 3D, I'll have to see what I can do...
However, there will be more steps to this. I'm acutely aware of the limitations of the current approach: for example, it doesn't lend itself to applying filters to 2D graphics. I tried in the past, but kept running into issues.
There are several more visual improvements we could add to melonDS - 2D layer/sprite filtering, 3D texture filtering, etc... Thus, the second step of this adventure will be to rework the 2D renderer to do more of the actual rendering work on the GPU. A bit like the hardware renderer I had made for blargSNES a decade ago. This approach would make it easier to apply enhancements to 2D assets or even replace them with better versions entirely, much like user texture packs.
This is not entirely unlike the Deko3D renderer Generic made for the Switch port.
But hey, one step at a time... First, I want to get high-resolution capture working.
There's one detail that Nadia wants to fix before we can release melonDS 1.1. Depending on how long this takes, and how long I take, 1.1 might include my improvements too. If not, that will be for 1.2. We'll see.
Happy birthday melonDS! -- by Arisotura
Those of you who know your melonDS lore know that today is a special day. melonDS is 9 years old!
...hey, I don't control this. 9 is brown. I don't make the rules.
Anyway, yeah. 9 years of melonDS. That's quite the achievement. Sometimes I don't realize it has been so long...
-
As far as I'm concerned, there hasn't been a lot -- 2025 has had a real shitty start for me. A lot of stuff came forward that has been really rough.
On the flip side, it has also been the occasion to get a fresh start. I was told about IFS therapy in March, and I was able to get started. It has been intense, but so far it has been way more effective than previous attempts at therapy.
I'm hopeful for 2026 to be a better year. I'm also going to hopefully get started with a new job which is looking pretty cool, so that's nice too.
-
As far as melonDS is concerned, I have several ideas. First of all, we'll release melonDS 1.1 pretty soon, with a nice bundle of fixes and improvements.
I also have ideas for further releases. RTCom support, netplay, ... there's a lot of potential. I guess we can also look at RetroAchievements, since that seems to be a popular request. There are also the long-standing issues we should finally address, like the lack of upscaling in dual-screen 3D scenes.
Basically, no shortage of things to do. It's just a matter of actually doing them. You know how that goes...
I also plan some upgrades to the site. I have some basic ideas for a homepage redesign, for updating the information that is presented, and presenting it in a nicer way... Some organization for the blog would be nice too, like splitting the posts into categories: release posts, technical shito, ...
I also have something else in mind: adding in a wiki, to host all information related to melonDS. The FAQ, help pages specific to certain topics, maybe compatibility info, ...
And maybe, just maybe, a screenshot section that isn't just a few outdated pics. Maybe that could go on the wiki too...
-
Regardless, happy birthday melonDS! And thank you all, you have helped make this possible!
The joys of programming -- by Arisotura
There have been reports of melonDS 1.0 crashing at random when running multiple instances. This is kind of a problem, especially when local multiplayer is one of melonDS's selling points.
So I went and tried to reproduce it. I could get the 1.0 and 1.0RC builds to crash just by having two instances open, even if they weren't running anything. But I couldn't get my dev build to crash. I thought, great, one of those cursed bugs that don't manifest when you try to debug them. I think our team knows all about this.
Not having too many choices, I took the 1.0 release build and started hacking away at it with a hex editor. Basic idea being, if it crashes with no ROM loaded, the emulator core isn't the culprit, and there aren't that many places to look. So I could nop out function calls until it stopped crashing.
In the end, I ended up rediscovering a bug that I had already fixed.
The SDL joystick API isn't thread-safe. When running multiple emulator instances in 1.0, this API will get called from several different threads, since each instance gets its own emulation thread. I had already addressed this 2.5 months ago, by adding adequate locking.
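The fix is conceptually just a mutex around the joystick calls; a minimal sketch (the actual locking in melonDS may be structured differently):

```cpp
#include <mutex>
#include <SDL2/SDL.h>

// Sketch: serialize SDL joystick access across emu threads.
// SDL's joystick API isn't thread-safe, so every caller goes through one mutex.
static std::mutex JoystickLock;

static Sint16 ReadJoystickAxis(SDL_Joystick* joy, int axis)
{
    std::lock_guard<std::mutex> lock(JoystickLock);
    SDL_JoystickUpdate();                    // refresh joystick state
    return SDL_JoystickGetAxis(joy, axis);   // then read the axis safely
}
```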
I guess at the time of 1.0, it slipped through the cracks due to its random nature, as with many threading-related bugs.
Regardless, if you know your melonDS lore, you know what is coming soon. There will be a new release which will include this fix, among other fun shit. In the meantime, you can use the nightly builds.
Sneak peek -- by Arisotura
Just showing off some of what I've been working on lately:
This is adjacent to something else I want to work on, but it's also been in the popular request list, since having to manually enter cheat codes into melonDS isn't very convenient: it would be much easier to just import them from an existing cheat code database, like the R4 cheat database (also known as usrcheat.dat).
It's also something I've long thought of doing. I had looked at usrcheat.dat and figured the format wasn't very complicated. It was just, you know, actually making myself do it... typical ADHD stuff. But lately I felt like giving it a look. I first wrote a quick PHP parser to make sure I'd gotten the usrcheat.dat format right, then started implementing it into melonDS.
For now, it's in a separate branch, because it's still quite buggy and unfinished, but it's getting there.
The main goal is complete: it parses usrcheat.dat, extracts the relevant entries, and imports them into your game's cheat file. By default, it only shows the game entry that matches your game by checksum, which is largely guaranteed to be the correct one. However, if none is found, it shows all the game entries that match by game code.
It also shows a bunch of information about the codes and lets you choose which ones you want to import. By default, all of them will be imported.
This part is mostly done, besides some minor UI/layout bugs.
The rest is adding new functionality to the existing cheat code editor. I modified the melonDS cheat file format (the .mch files) to add support for the extra information usrcheat.dat provides, all while keeping it backwards compatible, so older .mch files will still load just fine. But, of course, I also need to expose that in the UI somehow.
I also found a bug: if you delete a cheat code, the next one in the list gets its code list erased, due to the way the UI works. Not great.
I'm thinking of changing the way this works: selecting a cheat code would show the information in disabled/read-only fields, you'd click an "Edit" button to modify those fields, then you'd get "Save" or "Cancel" buttons... This would avoid much of the oddities of the current interface.
The rest is a bunch of code cleanup...
Stay tuned!
Not much to say lately... -- by Arisotura
Yeah...
I guess real life matters don't really help. At least my mental health is genuinely improving, so there's that.
But as far as my projects are concerned, this seems to be one of those times where I just don't feel like coding. Well, not quite, I have been committing some stuff to melonDS... just not anything that would be worthy of a juicy blog post. I also had another project: porting my old SNES emulator to the Wii U gamepad. I got it to a point where it runs and displays some graphics, but for now I don't seem motivated to continue working on it...
But I still have some ideas for melonDS.
One idea I had a while ago: using the WFC connection presets to store settings for different altWFC servers. For example, using connection 1 for Wiimmfi, connection 2 for Kaeru, etc... and melonDS would provide some way of switching between them. I don't know how really useful it would be, or how feasible it would be wrt patching requirements, but it could be interesting.
Another idea would be RTCom support. Basically RTCom is a protocol that is used on the 3DS in DS mode to add support for analog sticks and such. It involves game patches and ARM11-side patches to forward input data to RTC registers. The annoying aspect of this is that each game seems to have its own ARM11-side patch, and I don't really look forward to implementing a whole ARM11 interpreter just for this.
But maybe input enhancements, be it RTCom stuff or niche GBA slot addons, would make a good use case for Lua scripting, or some kind of plugin system... I don't know.
There are other big ideas, of course. The planned UI redesign, the netplay stuff, website changes, ...
Oh well.
Fix for macOS builds -- by Arisotura
Not a lot to talk about these days, as far as melonDS is concerned. I have some ideas, some big, some less big, but I'm also being my usual ADHD self.
Anyway, there have been complaints about the macOS builds we distribute: they're distributed as nested zip files.
Apparently, those people aren't great fans of Russian dolls...
The issue is caused by the way our macOS CI works. I'd have to ask Nadia, but I think there isn't really an easy fix for this. So the builds available on Github will have to stay nested.
However, since I control this site's server, I can fix the issue here. So now, all the macOS builds on the melonDS site have been fixed: they're no longer nested. Similarly, new builds should also be fixed as they're uploaded. Let us know if anything goes wrong with this.
DSP HLE: reaching the finish line -- by Arisotura
As mentioned in the last post, the last things that needed to be added to DSP HLE were the G711 ucode and the basic audio functions common to all ucodes.
G711 was rather trivial to add. The rest was... more involved. The reason is that adding the new audio features required reworking some of melonDS's audio support.
melonDS was originally built around the DS hardware. As far as sound is concerned, the DS has a simple 16-channel mixer, and the output from said mixer is degraded to 10-bit and PWM'd to the speakers. Microphone input is even simpler: the mic is connected to the touchscreen controller's AUX input. Reading said input gives you the current mic level, and you need to set up a timer to manually sample it at the frequency you want.
So obviously, in the early days, melonDS was built around that design.
The DSi, however, provides a less archaic system for producing and acquiring sound.
Namely, it adds a TI audio amplifier, which is similar to the one found in the Wii U gamepad, for example. This audio amplifier also doubles as a touchscreen controller, for DS backwards compatibility.
Instead of the old PWM stuff, output from the DS mixer is sent to the amplifier over an I2S interface. There is some extra hardware to support that interface. It is possible to set the sampling frequency to 32 KHz (like the DS) or 47 KHz. There is also a ratio for mixing outputs from the DS mixer and the DSP.
Microphone input also goes through an I2S interface. The DSi provides hardware to automatically sample mic input at a preset frequency, and it's even possible to automate it entirely with a NDMA transfer. All in all, quite the upgrade compared to the DS. Oh, and mic input is also routed to the DSP, and the ucodes have basic functions for mic sampling too.
All fine and dandy...
So I based my work on CasualPokePlayer's PR, which implemented some of the new DSi audio functionality. I added support for mixing in DSP audio. The "play sound" command in itself was trivial to implement in HLE, once there was something in place to actually output DSP audio.
I added support for the different I2S sampling frequencies. The 47 KHz setting doesn't affect DS mixer output beyond improving quality somewhat, but it affects the rate at which DSP output plays.
For this, I changed melonDS to always produce audio at 47 KHz (instead of 32 KHz). In 32 KHz mode (or in DS mode), audio output will be resampled to 47 KHz. It was the easiest way to deal with this sort of feature.
Then, I added support for audio output to Teakra. I had to record the audio output to a file to check whether it was working correctly, because it's just too slow with DSP LLE, but a couple bugfixes later, it was working.
Then came the turn of microphone input. The way it's done on the DSP is a bit weird. Given the "play sound" command will DMA sound data from ARM9 memory, I thought the mic commands would work in a similar way, but no. Instead, they just continually write mic input to a circular buffer in DSP memory, and that's about it. Then you need to set up a timer to periodically read that circular buffer with PDATA.
But there was more work to be done around mic input.
I implemented a centralized hub for mic functionality, so buffers and logic wouldn't be duplicated in several places in melonDS. I also changed the way mic input data is fed into melonDS. With the hub in place, I could also add logic for starting and stopping mic recording. This way, when using an external microphone, it's possible to only request the mic when the game needs it, instead of hogging it all the time.
Besides that, it wasn't very hard to make this new mic input system work, including in DSi mode. I made some changes to CasualPokePlayer's code to make it more accurate to hardware. I also added the mic sampling commands to DSP HLE, and added BTDMP input support to Teakra so it can also receive mic input.
The tricky part was getting input from an external mic to play nicely and smoothly, but I got there.
There is only one problem left: libnds homebrew in DSi mode will have noisy mic input. This is a timing issue: libnds uses a really oddball way to sample the DSi mic (in order to keep it compatible with the old DS APIs), where it will repeatedly disable, flush and re-enable the mic interface, and wait to receive one sample. The wait is a polling loop with a timeout counter, but it's running too fast on melonDS, so sometimes it doesn't get sampled properly.
So, this is mostly it. What's left is a bunch of cleanup, misc testing, and adding the new state to savestates.
DSP HLE is done. It should allow most DSP titles to play at decent speeds; however, if a game uses an unrecognized DSP ucode, it will fall back to LLE.
The dsp_hle branch also went beyond the original scope of DSP HLE, but it's all good.
DSi mode finally gets microphone input. I believe this was the last big missing feature, so this brings DSi mode on par with DS mode.
The last big thing that might need to be added would be a DSP JIT, if there's demand for this. I might look into it at some point, would be an occasion to learn new stuff.
I'm thinking those changes might warrant a lil' release.
Having fun with DSP HLE -- by Arisotura
If you pay attention to the melonDS repo, you might have noticed the new branch: dsp_hle.
And this branch is starting to see some results...
These screenshots might not look too impressive at first glance, but pay closer attention to the framerates. Those would be more like 4 FPS with DSP LLE.
Also, we still haven't fixed the timing issue which affects the DSi sound app -- I had to hack the cache timings to get it to run, and I haven't committed that. We'll fix this in a better way, I promise.
So, how does this all work?
CasualPokePlayer has done research on the different DSP ucodes in existence. So far, three main classes have been identified:
• AAC SDK: provides an AAC decoder. The DSi sound app uses an early version of this ucode. It is also found in a couple other titles.
• Graphics SDK: provides functions for scaling bitmaps and converting YUV pictures to 15-bit RGB.
• G711 SDK: provides a G711 (A-law and µ-law) codec.
All ucodes also share basic audio functionality, allowing them to play simple sounds (for example, a camera shutter sound) and record microphone input. They are fairly simple, but emulating them in melonDS will require some reworking of the audio system.
It's not like this DSP is used to its fullest here, but hey, it's something.
I first started working on the graphics ucode, which is used in Let's Golf. It's used to scale down the camera picture from 144x144 to 48x48 -- you can see it in the screenshot above.
There's always something satisfying about reverse-engineering things and figuring out how they work. I even wrote a little test homebrew that loads a graphics SDK ucode and tests all the aspects of the bitmap scaling command (and another one for the yuv2rgb command). I went as far as to work out the scaling algorithms so I could replicate them to the pixel. I also added delays to roughly simulate how long the graphics commands would take on hardware.
The way the scaling command works is a bit peculiar. You give it the size of your source bitmap, then you give it a rectangle within said bitmap, and X/Y scaling factors. Then it takes the specified rectangle of the source bitmap and scales it using the factors.
You can also specify a filtering mode. Nearest neighbor, bilinear and bicubic are supported. Nothing special to say about them, other than the fact bicubic uses somewhat atypical equations.
However, there's also a fourth filtering mode: one-third. This mode does what it says on the tin: it ignores the provided X/Y scaling factors, and scales the provided bitmap down to one third of its original size. The way it works is fairly simple: for every 3x3 block of source pixels, it averages the 8 outer pixels' values and uses that as the destination value. This also means that it requires that the source dimensions be multiples of 3.
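Based on that description, the one-third mode boils down to something like this (a sketch of the described algorithm, assuming a single-channel buffer for brevity):

```cpp
#include <cstdint>

// Sketch of the "one-third" mode: each 3x3 block of source pixels produces one
// destination pixel, the average of the 8 outer pixels (the center pixel is ignored).
// Source dimensions must be multiples of 3.
void ScaleOneThird(const std::uint8_t* src, int srcW, int srcH, std::uint8_t* dst)
{
    for (int by = 0; by < srcH; by += 3)
    for (int bx = 0; bx < srcW; bx += 3)
    {
        int sum = 0;
        for (int y = 0; y < 3; y++)
        for (int x = 0; x < 3; x++)
        {
            if (x == 1 && y == 1) continue;              // skip the center pixel
            sum += src[(by + y) * srcW + (bx + x)];
        }
        dst[(by / 3) * (srcW / 3) + (bx / 3)] = (std::uint8_t)(sum / 8);
    }
}
```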
Interestingly, the example of Let's Golf would be a perfect candidate for one-third scaling, but they chose bicubic instead.
After this, I felt like looking at the DSi sound app.
The AAC ucode doesn't use the pipe to receive commands. That's why it was working in LLE, despite the bugs that broke the pipe. Instead, commands and parameters are just sent through the CMD1 register. Also, the decoded sound data isn't sent to the audio output directly, but transferred to ARM9 RAM. Hence why the DSi sound app was somehow functional (albeit very slow).
My first step was to figure out what the command parameters mean and what kind of data is sent to the AAC ucode, so I could understand how to use it. I didn't exactly feel like replicating an entire AAC decoder in my HLE implementation, so I went with faad2 instead.
This has been the occasion for me to learn about AAC, MP4 files and such. I even modified Gericom's DSiDsp homebrew to work with the DSi AAC ucode, turning it into the worst MP4 audio player ever.
DSiDsp is an example of AAC decoding, but it's made to work with the 3DS AAC ucode. Besides the differences in the communication protocol and how memory is accessed, they also don't quite take in the same kind of data. The 3DS ucode takes AAC frames with ADTS headers. ADTS is a possible transport layer for AAC audio, where each frame gets a small header specifying the sampling frequency, channel configuration, and other parameters. All fine and dandy.
The DSi AAC ucode, however, doesn't take ADTS or ADIF headers, just raw AAC frames. The sampling frequency and channel configuration are specified in the parameters for the decoder command.
So I had to figure out how to make this work nicely with faad2. I also ran into an issue where the DSi sound app will first send an all-zero AAC frame before sending the correct data, which seems to be a bug in the sound app itself. The AAC ucode doesn't seem to be affected by this (it just returns an error code), but third-party AAC decoders don't like it. faad2 gets put into a bogus state where no further frames can be decoded. fdk-aac barfs over the memory in a seemingly random way, eventually causing a crash. So I had to hack around the issue and ignore all-zero frames.
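The workaround itself is tiny; roughly something like this (a sketch, with hypothetical naming around a plain all-zero check):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Sketch of the workaround: drop all-zero AAC frames before they reach the third-party
// decoder, since they put faad2 into a state where no further frames decode.
bool ShouldDecodeAACFrame(const std::uint8_t* frame, std::size_t len)
{
    bool allZero = std::all_of(frame, frame + len,
                               [](std::uint8_t b) { return b == 0; });
    return !allZero;   // if all-zero, skip it (and report an error, like the real ucode does)
}
```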
So melonDS decodes AAC audio now. It's still rough around the edges, but it's pretty neat.
Now I guess the final step would be to reverse-engineer and implement the G711 ucode.
As for LLE options for DSP emulation, some sort of recompiler (JIT or otherwise) would be the way to go. I was thinking about how such a thing might work, but I've never written that sort of thing before. It could be the occasion to learn about this. It would be worthwhile if someone out there is trying to develop for the DSP, but as far as commercial titles are concerned, HLE should cover all the needs, all while being way more efficient.
The AV codec stuff is also adjacent to something else that's on my mind. It's not related to the DSP, but it's something I want to experiment with eventually. If it pans out, I'll let you guys know!
melonDS 1.0 is out -- by Arisotura
Finally, the "proper" melonDS 1.0 release is here. Sorry that it took so long...
Anyway, this is pretty much the same as the 1.0 RC, but we fixed a bunch of bugs that were found in said RC.
Namely, you can now use multiple windows with OpenGL under Windows.
However, depending on how good your OpenGL driver is, doing so may reduce performance. It's due to having multiple OpenGL contexts sharing data, but for now we don't really know what we can do about it. If you have ideas, let us know!
Speaking of multiple windows, I also added a way to tell melonDS windows apart, because things could get pretty confusing. They now get a tag in their title, for example [p1:w2] means first multiplayer instance, second window.
We also merged asie's add-on support PR, so this release includes support for the Motion Pak and the Guitar Grip.
We merged some other PRs as well, among which one that lowers audio latency.
This release also includes the DSi camera fixes that were discussed in the previous posts. DSi titles that ran into issues while trying to use the camera should now work with no problems.
However, since they tend to also use the DSP at the same time, the performance will be abysmal...
DSP HLE is something I want to experiment with, but that will be for a further release.
As far as future plans are concerned, I also want to redesign the site's homepage, but you'll find out when I get around to that.
Anyway, you can find the release on our downloads page, as usual. Enjoy!
Let's Golf: the horseman of apocalypse that wasn't one -- by Arisotura
I figure I need to resolve the dramatic tension from the previous post.
So the issue we had was a screen-sized DMA3 transfer interfering with NDMA1, which is used to transfer camera data...
Shortly after I made that post, I recalled that Jakly mentioned VRAM timings, and started to figure it out.
I said that when I reproduced the setup in a homebrew, it would trigger a data overrun error and get stuck. But there was one thing I was missing: my homebrew was using a camera resolution of 256x192, with DMA set to run every 4 scanlines, the standard stuff. However, Let's Golf uses cropping to achieve a resolution of 144x144, and runs the DMA every 7 scanlines. This means more time between each NDMA1 transfer.
Regarding VRAM, the DSi supports accessing it over the 32-bit bus, instead of the old 16-bit bus. One effect is that this makes it possible to write to VRAM in 8-bit units, which wasn't possible on the DS. Another effect is that it affects timings: for example, a 32-bit DMA transfer from main RAM to VRAM would take 2 cycles per word, instead of 3.
I did the math, and with such timings, the screen-sized DMA3 transfer would have enough time that it could run between two NDMA1 transfers without disrupting the camera operation. But with the 16-bit bus timings, DMA3 would definitely take too long.
I even modified my homebrew to use the same 144x144 resolution as Let's Golf, and added a key to toggle the 32-bit bus for VRAM (via SCFG_EXT9 bit 13). Suddenly, my homebrew was running just fine as long as the 32-bit bus was enabled, but when it was disabled, it would trigger the data overrun error and get stuck.
So, basically, this is nothing fancy, just a case of "this works out of pure luck".
I added support for the new VRAM timings in melonDS. But this wasn't enough: Let's Golf would still get stuck.
I looked at it, and it was still possible that DMA3 would start right before a NDMA1 transfer, which would be the worst time. The camera FIFO would definitely be almost full at that point, and further scanlines would overflow before NDMA1 had a chance to run.
I thought about it, and... there was no way this could work unless there were some double-buffering shenanigans involved.
I modified my homebrew to try and verify this hypothesis. My idea was to start a camera transfer, let it fill and overrun the FIFO, and use a dummy NDMA to track it. The NDMA channel would be configured like for a camera transfer, but it wouldn't actually read camera data, and the length would be set to 1 word. I would just use the NDMA IRQ as a way to sense the camera transfer condition. This way, I could know how many times it fires before hitting the overrun error.
This didn't reveal anything, as the NDMA was only fired once.
My next idea was to wait for the NDMA IRQ, and from that point, count how long it takes for the data overrun to be raised (by checking bit 4 in CAM_CNT). The timings I measured from this definitely suggested that a second block of camera data was being transferred after the NDMA IRQ, before it would raise the data overrun error.
Which, in turn, tended to confirm my hypothesis that there are actually two FIFO buffers for camera data. Thus, when one buffer is filled entirely (ie. as much as the DMA interval allows), if the other buffer is empty, the buffers are swapped and the DMA transfer is fired, otherwise the data overrun error is raised.
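In code, that model ends up looking roughly like this (a sketch of the double-buffering behaviour described above, with hypothetical names):

```cpp
// Sketch of the double-buffered camera FIFO model (hypothetical names).
// Scanlines accumulate into the write buffer; when it holds a full DMA block,
// we either swap buffers and kick the DMA, or raise the data overrun error.
struct CameraFIFO
{
    bool ReadBufferFull = false;   // a block is waiting to be DMA'd out
    int  WriteFill      = 0;       // words accumulated in the write buffer
    int  BlockLen       = 0;       // words per DMA block (set by the DMA interval)
    bool DataOverrun    = false;   // mirrors the error bit in CAM_CNT

    void OnBlockComplete()
    {
        if (!ReadBufferFull)
        {
            ReadBufferFull = true; // swap buffers...
            WriteFill = 0;
            TriggerCameraDMA();    // ...and fire the transfer
        }
        else
            DataOverrun = true;    // the other buffer wasn't drained in time
    }

    void OnDMADone() { ReadBufferFull = false; }
    void TriggerCameraDMA();
};
```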
So I reworked melonDS's camera interface implementation to take this into account. Fixed a bunch of other issues. Had some "how the fuck did this even work?" moments.
There are still minor issues I need to iron out, but: finally, Let's Golf is fixed, and I haven't seen regressions...
More camera trouble... -- by Arisotura
So I had made a nice post about Let's Golf and how it was fixed...
But, obviously, as far as DSi camera support is concerned, it wasn't all.
I looked at another game that was running into issues with the camera: Assassin's Creed II. The Wanted feature in the menu uses the camera, if you're playing the game on a DSi, but on melonDS, it just showed nothing at all.
Quick investigation showed why.
Normally, when using the camera, games will set up the DMA with a block length of N scanlines, and a total length matching the length of the full camera picture. The DMA channel will also be set to trigger an IRQ when it's done.
However, this game does things differently. The DMA channel has no total length setting, and is just set to repeat infinitely. It transfers picture data to a small temporary buffer, from which the game reads when needed. No idea why they did it this way, but regardless, it shows why it's good to emulate things accurately. Anyway, a NDMA channel that is set to "repeat infinitely" will trigger an IRQ after each block, but due to an oversight, melonDS never triggered any IRQ.
After fixing this, I did have the camera feed showing up in the game's UI thing, but it was rolling. Heh. Couldn't have been so simple.
This turned out to be because of the timings for the camera transfer. The timings melonDS used were a big fat guess, and were way too fast for that game.
I dug up my old camera test homebrew and modified it to track camera timings from the DSi. Took some time to figure out the logic behind the numbers I was getting -- there was more time between camera DMA transfers when running in 256x192 mode, than in 640x480 mode. In fact, it makes sense: internally, the camera always runs at 640x480, and the HSync/VSync are the same, but when told to output a 256x192 picture, the camera simply skips some of the scanlines.
Once I understood that fact, I was able to put together a timing model that more closely resembled the real deal. And this fixed Assassin's Creed II -- the camera preview thing was working flawlessly.
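The relationship can be summed up in a few lines (a sketch; the constants are placeholders, the point is the scanline-skipping ratio, not exact numbers):

```cpp
// Sketch: spacing of camera DMA transfers under the "always 640x480 internally" model.
// The sensor runs at full resolution and simply skips scanlines for smaller outputs,
// so output lines (and the DMA blocks built from them) are spaced further apart at
// lower resolutions. internalLineTime is a placeholder for the sensor's line period.
double CameraDMAInterval(int outputHeight, int linesPerDMA, double internalLineTime)
{
    double skipRatio = 480.0 / outputHeight;            // e.g. 2.5 for a 192-line output
    return linesPerDMA * skipRatio * internalLineTime;  // time between two DMA blocks
}
```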
I then checked Let's Golf, and... it was rolling, again.
Welcome to emulation whack-a-mole.
But, that's the thing. I have researched this bug long and hard, and I can't figure it out.
The basic issue is as follows: the game uses NDMA1 to transfer camera picture data to main RAM, but it also periodically (every 2 frames) transfers video data from main RAM to VRAM, to be displayed on the top screen, and does so using DMA3 (the old DMA, not NDMA).
Camera output isn't synchronized to the LCD framerate. This means that the DMA3 transfer may occur while a camera transfer is in progress. In melonDS, it meant that the NDMA1 transfer couldn't run because DMA3 was already running.
This highlighted a bug with how melonDS handled camera DMA: it assumed that the "try to fire a DMA transfer" operation would result in a DMA transfer effectively starting, but when that wasn't the case, things went south. I remodeled the camera FIFO to fix this problem (and raise a data overrun error instead of skipping a chunk). Let's Golf was no longer rolling, but it was just getting stuck on the same camera frame.
So clearly there was more to it...
But I can't figure it out.
I ran hardware tests, thinking that maybe NDMA1 should have priority over DMA3. But nope. NDMA can't preempt old DMA, no matter the settings.
I modified my camera homebrew to reproduce what Let's Golf does, and it gets stuck after a couple camera frames or less, with a camera data overrun error. So I can't understand why the game works fine.
I even modified the game itself, to track things like when the camera IRQ occurs, or whether data overrun errors happen, and it revealed nothing.
So, yeah. I'm absolutely stumped.
Who would have thought that a camera thing in a golf game would become a new horseman of the apocalypse...
It feels weird after the last post, but I might ship melonDS 1.0 with this broken. But hey, the fixes do fix a bunch of other DSi games.
Windows OpenGL issues fixed, finally! -- by Arisotura
I went on a quest and battled the worst enemy imaginable.
Worse than a thousand orcs.
I fought endless privacy settings screens. Warded off all sorts of bullshit offers. Resisted the temptation to throw my brain in a lake.
I installed Windows 10 on my old laptop Crepe, so I could finally fix the issues with multiple windows and OpenGL.
CasualPokePlayer greatly helped me understand the problem, too.
Basically, due to the way melonDS works, when using OpenGL, we create the GL context on the UI thread, then use it on the emu thread exclusively. This is so that the OpenGL renderers can access OpenGL without needing extra locking. Window redrawing is also done on the emu thread.
The issue was due to how I originally implemented the multi-window mode. When a second window is created, it shares its GL context with its parent window. This way, everything OpenGL related will work on all windows. Except it turned out that the parent context was created on the UI thread, then made current on the emu thread, before the child context was created. Windows doesn't like that, and thus, fails to create the child context.
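For illustration, here's roughly the kind of ordering that avoids the problem, using Qt's context-sharing API (a sketch under my own assumptions, not the actual melonDS fix):

#include <QOpenGLContext>
#include <QThread>

// Sketch: create the shared child context while the parent still belongs to the
// UI thread and isn't current anywhere else, then hand the parent to the emu thread.
QOpenGLContext* CreateChildContext(QOpenGLContext* parent, QThread* emuThread)
{
    QOpenGLContext* child = new QOpenGLContext;
    child->setFormat(parent->format());
    child->setShareContext(parent);   // sharing must be set up before create()
    child->create();

    parent->moveToThread(emuThread);  // only now does the emu thread take over the parent
    return child;
}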
So it took some reworking to get this working smoothly, but the issue is fixed now.
This means that the proper 1.0 release will be soon -- for real. This issue was the last showstopper, basically.
I also fixed a couple other issues. For one, I added a way to tell melonDS windows apart, since multi-window made things pretty confusing. So now they get a [p1:w1] type tag that says which instance and which window it is. I also fixed a bug with the way windows were parented for second multiplayer instances.
I might try to fix some other misc. stability issues, if I can reproduce them. I will also likely rework the DSi camera timing model a bit, to fix a couple games.
DSP HLE is tempting me, as a next project, but it won't be for 1.0.
DSi bugfixes -- by Arisotura
So, yeah... The current mood around melonDS is "it does most things pretty well already", as far as emulation is concerned. While it's certainly true for the DS side of things, there's still a lot to do with the DSi. And this randomly piqued my interest.
Notably, we still have a bunch of games that can't get ingame, generally hanging on something DSP-related.
So I took one of those games: Let's Golf.
When run for the first time, the game has you create a profile, by first entering your name, then taking a selfie to be used as a profile icon. All fine. However, here the game freezes before you can get to the selfie part.
So I dug into it to try and figure out why it was freezing. We knew it was trying to use the DSP, but not much beyond that... anyway, I extracted the game's DSP binary to get more insight. The DSP was just running its main loop, waiting for incoming commands. On the other hand, the ARM9 was stuck, waiting for... something.
The ARM9 was using the PDATA registers to read DSP memory; however, that got stuck. The bug turned out to be silly: PDATA reads/writes can have a fixed length set, but the code handling them wasn't calculating that length correctly, so not enough data was being returned. Hence the ARM9 getting stuck.
After fixing that, I could get to the selfie screen thingy.
However, two issues: 1) the camera preview was rolling (as shown above), and 2) when trying to take a picture, it would just freeze.
A third issue is that it runs at like 4 FPS, but that's common to anything DSP-related for now, and we'll address it later.
I decided to first try and fix the camera rolling issue, because I thought that maybe the second issue was linked.
So I first logged what was going on with the camera. What settings are used on the camera, on the interface, and on the associated DMA channel.
cam1: start, width=256 height=192 read=0024 format=0002
CAM TRANSFER: CNT=EE06 CROP=00180038/00A700C6
ARM9 NDMA1 START MODE 0B, 04004204->0221E0E0 LEN=10368 BLK=504 CNT=CB044000
So what does this all mean?
The camera is set to 256x192, nothing special there. However, the settings used on the camera interface enable cropping, which is interesting. Basically, it only retains a 144x144 section of the camera frame.
Which is confirmed by the DMA parameters. The total length is 10368 words, which is the length of a 144x144 frame. The block length is 504 words, which corresponds to 7 scanlines of said frame. This matches the DMA interval setting on the camera interface, too.
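For reference, the arithmetic checks out (assuming 16 bits per pixel): 144 x 144 pixels x 2 bytes = 41472 bytes = 10368 32-bit words, matching the total length; one 144-pixel scanline is 288 bytes = 72 words, and 504 / 72 = 7 scanlines per block.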
So what's the issue there? If you happen to remember, I've discussed the camera interface before. Long story short, it has a FIFO buffer that holds 512 words worth of picture data, and it triggers a DMA transfer every N scanlines, with N=1..16. When modelling this system in melonDS, I had assumed that the frame height would be a multiple of this N.
But the issue became apparent after I looked at my logs: 144 is not a multiple of 7.
Which meant that it was discarding the last 4 scanlines of each camera frame. Hence the rolling.
So I decided to force a final DMA transfer at the end of each camera frame if there's anything left in the FIFO, which seems to be what the hardware does. Either way, this fixed the rolling issue entirely.
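In pseudocode terms, the end-of-frame handling now looks roughly like this (a sketch with hypothetical names, not the actual implementation):

extern int fifoLevel;        // words currently sitting in the camera FIFO
void StartCameraDMA();       // hypothetical helper

void OnCameraFrameEnd()
{
    // a DMA transfer normally fires every N scanlines; if the frame height isn't
    // a multiple of N, the last few scanlines are still sitting in the FIFO here
    if (fifoLevel > 0)
        StartCameraDMA();    // force one final transfer for the leftover scanlines
}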
But it didn't fix the second issue, which turned out to be unrelated.
Another apparent issue is that the camera input looks very squished in melonDS. It might be good to add a "preserve aspect ratio" scaling mode to avoid that.
So I had to dig deeper, once again. When trying to take a picture, the game would send a command to the DSP to tell it to scale the picture to a different size, then send it a bunch of parameters via PDATA, then... get stuck.
This one proved to be tricky to figure out. I had no real idea what was going wrong... did a lot of logging and tracing, but couldn't really figure it out.
Eventually, CasualPokePlayer enlightened me about what was being sent to the DSP.
The mechanism used to transfer command parameters is what they call the pipe. It's a simple FIFO buffer: the ARM9 writes data into the buffer, then updates the buffer write position, and does all that by using PDATA.
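Conceptually, a pipe write boils down to something like this (a very rough sketch with made-up field names; the real structure lives in DSP memory and is accessed through PDATA):

#include <cstdint>

void WriteDspWord(uint16_t addr, uint16_t value);  // hypothetical helper, goes through PDATA

struct DspPipe
{
    uint16_t bufferAddr;  // start of the ring buffer, in DSP memory
    uint16_t length;      // buffer length, in words
    uint16_t readPos;     // updated by the DSP as it consumes data
    uint16_t writePos;    // updated by the ARM9 as it writes data
};

void PipeWrite(DspPipe& pipe, const uint16_t* data, int count)
{
    for (int i = 0; i < count; i++)
    {
        WriteDspWord(pipe.bufferAddr + pipe.writePos, data[i]);
        pipe.writePos = (pipe.writePos + 1) % pipe.length;
    }
    // the DSP sees new data by comparing readPos against writePos
}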
In this situation, the parameters that were being sent for the scaling command looked correct, but the pipe write position was zero, which, according to CasualPokePlayer, was suspicious. He was on to something.
Looking at the code that was determining that value, I found that it was broken because for some reason the pipe length was zero. So I traced writes to that variable. I found that it was part of a bunch of variables that were initialized from DSP memory, through a PDATA read.
Looking closer, there was another bug with PDATA reads that caused them to be off by one. When I fixed that, the freeze was fixed entirely.
Finally, we get to play golf with a goofy face in melonDS. What's not to love!
The game also runs quite well past the profile selfie part. Yep, it uses the DSP just to scale a picture.
Also, looking at the game's DSP binary, I had an idea.
This binary only supports two commands: scaling and yuv2rgb. There's also a separate command channel, and it has one command, for playing sound effects (presumably, the camera shutter sound).
It would be totally feasible to HLE this shit.
According to CasualPokePlayer, most of the games/apps that use the DSP seem to use one variant or another of this binary, with extra features, but the same basic idea. The only exception is the DSi sound app, which uses the DSP for AAC decoding.
Obviously, it would still be worth it to pursue a DSP JIT, at least for the sake of homebrew. But the HLE route seems to also be viable, and it's piquing my interest now. Maybe not for 1.0, but I want to give it a try.
Proper melonDS 1.0 release "soon" -- by Arisotura
Apologies for taking so long to do a proper release.
Regardless, I think most, if not all, of the bugs that were found in the 1.0 RC have been fixed, so expect the proper 1.0 release soon. Hopefully.
The main issue would be the lack of functional Windows CI, but we're working on it...
We also have fun plans in mind for further releases, but we'll see once we get there.
Technical issues #2 -- by Arisotura
You might have noticed the lack of Windows builds on the nightlies page.
Sorry about it. The Windows CI is broken. Our CI expert Nadia has been on it for a while, so hopefully it should be back up again... someday.
On the flip side, the little loser stopped trying to take the server down, so that's at least one positive.
Hardware rendering, the fun -- by Arisotura
This whole thing I'm working on gives me flashbacks to blargSNES. The goal and constraints are different, though. We weren't doing upscaling on the 3DS, but also, we had no fragment shaders, so we were much more limited in what we could do.
Anyway, these days, I'm waist-deep into OpenGL. I'm determined to go further than my original approach to upscaling, and it's a lot of fun too.
I might as well talk more about that approach, and what its limitations are.
First, let's talk about how 2D layers are composited on the DS.
There are 6 basic layers: BG0, BG1, BG2, BG3, sprites (OBJ) and backdrop. Sprites are pre-rendered and treated as a flat layer (which means you can't blend a sprite with another sprite). Backdrop is a fixed color (entry 0 of the standard palette), which basically fills any space not occupied by another layer.
For each pixel, the PPU keeps track of the two topmost layers, based on priority orders.
Then, you have the BLDCNT register, which lets you choose a color effect to be applied (blending or fade effects), and the target layers it may apply to. For blending, the "1st target" is the topmost pixel, and the "2nd target" is the pixel underneath. If the layers both pixels belong to are adequately selected in BLDCNT, they will be blended together, using the coefficients in the BLDALPHA register. Fade effects work in a similar fashion, except since they only apply to the topmost pixel, there's no "2nd target".
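For reference, the blending math itself is simple; this is the standard GBA/DS-style formula, with the EVA/EVB coefficients coming from BLDALPHA (a per-channel sketch on 5-bit values):

// Standard 2D hardware alpha blend, per channel. eva and evb come from BLDALPHA,
// each effectively capped at 16; the result is clamped to the 5-bit maximum.
int BlendChannel(int top, int bottom, int eva, int evb)
{
    int result = (top * eva + bottom * evb) / 16;
    return (result > 31) ? 31 : result;
}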
Then you also have the window feature, which can exclude not only individual layers from a given region, but can also disable color effects. There are also a few special cases: semi-transparent sprites, bitmap sprites, and the 3D layer. Those all ignore the color effect and 1st target selections in BLDCNT, as well as the window settings.
In melonDS, the 2D renderer renders all layers according to their priority order, and keeps track of the last two values for each pixel: when writing a pixel, the previous value is pushed down to a secondary buffer. This way, at the end, the two buffers can be composited together to form the final video frame.
I've talked a bit about how 3D upscaling was done: basically, the 3D layer is replaced with a placeholder. The final compositing step is skipped, and instead, the incomplete buffer is sent to the GPU. There, a compositor shader can sample this buffer and the actual hi-res 3D layer, and finish the work. This requires keeping track of not just the last two values, but the last three values for any given pixel: if a given 3D layer pixel turns out to be fully transparent, we need to be able to composite the pixels underneath "as normal".
This approach was good in that it allowed for performant upscaling with minimal modifications to the 2D renderer. However, it was inherently limited in what was doable.
It became apparent as I started to work on hi-res display capture. My very crude implementation, built on top of that old approach, worked fine for the simpler cases like dual-screen 3D. However, it was evident that anything more complex wouldn't work.
For example, in this post, I showed a render-to-texture demo that uses display capture. I also made a similar demo that renders to a rotating BG layer rather than a 3D cube:
And this is what it looks like when upscaled with the old approach:
Basically, when detecting that a given layer is going to render a display capture, the renderer replaces it with a placeholder, like for the actual 3D layer. The placeholder values include the coordinates within the source bitmap, and the compositor shader uses them to sample the actual hi-res bitmap.
The fatal flaw here is that this calculation doesn't account for the BG layer's rotation. Hence why it looks like shit. Linear interpolation could solve this issue, but it's just one of many problems with this approach.
Another big issue was filtering.
The basic reason is that when you're applying an upscaling filter to an image, for each given position within the destination image, you're going to be looking at not only the nearest pixel from the source image, but also the surrounding pixels, in an attempt at inferring the missing detail. For example, a bilinear filter works on a 2x2 block of source pixels, while it's 4x4 for a bicubic filter, and as much as 5x5 for xBRZ.
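As a reminder of why those neighbors matter, here's what a plain bilinear sample looks like (a generic sketch, not melonDS code):

// Generic bilinear sampling: each destination pixel mixes a 2x2 block of source
// pixels according to the fractional position. Single channel for simplicity.
float BilinearSample(const float* src, int w, int h, float x, float y)
{
    int x0 = (int)x, y0 = (int)y;
    int x1 = (x0 + 1 < w) ? x0 + 1 : x0;
    int y1 = (y0 + 1 < h) ? y0 + 1 : y0;
    float fx = x - x0, fy = y - y0;

    float top    = src[y0*w + x0] * (1.0f - fx) + src[y0*w + x1] * fx;
    float bottom = src[y1*w + x0] * (1.0f - fx) + src[y1*w + x1] * fx;
    return top * (1.0f - fy) + bottom * fy;
}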
In our case, the different graphical layers are smooshed together into a weird 3-layer cake. This makes it a major pain to perform filtering: say you're looking at a source pixel from BG2, you'd want to find neighboring BG2 pixels, but they may be at different levels within the layer cake, or they may just not be part of it at all. All in all, it's a massive pain in the ass to work with.
Back in 2020, I had attempted to implement an xBRZ filter as a bit of a demo, to see how it'd work. I had even recorded a video of it on Super Princess Peach, and it was looking pretty decent... but due to the aforementioned issues, there were always weird glitches and other oddities, and it was evident that this was stretching beyond the limits of the old renderer approach. The xBRZ filter shader did remain in the melonDS codebase, unused...
-
So, basically, I started working on a proper hardware-accelerated 2D renderer.
As of now, I'm able to decode individual BG layers and sprites to flat textures. The idea is that doing so will simplify filtering a whole lot: instead of having to worry about the original format of the layer, the tiles, the palettes, and so on, it would just be a matter of fetching pixels from a flat texture.
Here's an example of sprite rendering. They are first pre-rendered to an atlas texture, then they're placed on a hi-res sprite layer.
This type of renderer allows for other nifty improvements too: for example, hi-res rotation/scaling.
Next up is going to be rendering BG layers to similar hi-res layers. Once it's all done, the layers can be sent to the compositor shader and the job can be finished. I also have to think of provisions to deal with possible mid-frame setup changes. Anyone remember that midframe-OAM-modifying foodie game?
There will also be some work on the 3D renderers, to add support for things like render-to-texture, but also possibly adding 3D enhancements such as texture filtering.
-
I can hear the people already, "why make this with OpenGL, that's old, you should use Vulkan".
Yeah, OpenGL is no longer getting updates, but it's a stable and mature API, and it isn't going to be deprecated any time soon. For now, I see no reason to stop using it.
However, I'm also reworking the way renderers work in melonDS.
A while back, Generic made changes to the renderer system so he could add different 2D renderers for the Switch port: a version of the software renderer that uses NEON SIMD, and a hardware-accelerated renderer that uses Deko3D.
I'm building upon this, but I want to also integrate things better: for example, figuring out a way to couple the 2D and 3D renderers better, and generally a cleaner API.
The idea is to also make it easier to implement different renderers. For example, the current OpenGL renderer is made with fast upscaling in mind, but we could have different renderers for mobile platforms (ie. OpenGL ES), that are first and foremost aimed at just being fast. Of course, we could also have a Vulkan renderer, or Direct3D, Metal, whatever you like.
melonDS 1.1 is out! -- by Arisotura
As promised, here is the new release: melonDS 1.1.
So, what's new in this release?
EDIT - there was an issue with the release builds that had been posted, so if your JIT option is greyed out and you're not using an x64 Mac, please redownload the release.
DSP HLE
This is going to be a big change for DSi gamers out there.
If you've been playing DSi titles in melonDS, you may have noticed that sometimes they run very slow. Single-digit framerates. Wouldn't be a big deal if melonDS was always this slow, but obviously, it generally performs much better, so this sticks out like a sore thumb.
This is because those titles use the DSi's DSP. What is the DSP, you ask? A specific-purpose (read: weird) processor that doesn't actually do much besides being very annoying and resource-intensive to emulate. They use it for such tasks as downscaling pictures or playing a camera shutter sound when you take a picture.
With help from CasualPokePlayer, we were able to figure out the 3 main classes of DSP ucodes those games use, determine their functionality, and implement HLE equivalents in melonDS. Thus, those wonderful DSP features can be emulated without utterly wrecking performance.
DSP HLE is a setting, which you will find in the emulation settings dialog, DSi-mode tab. It is enabled by default.
Note that if it fails to recognize a game's DSP ucode, it will fall back to LLE. Similarly, homebrew ucodes will also fall back to LLE. There's the idea of adding a DSP JIT to help with this, but it's not a very high priority right now.
DSi microphone input
This was one of the last big missing features in DSi mode, and it is now implemented, thus further closing the gap between DS and DSi emulation in melonDS.
The way external microphone input works was also changed: instead of keeping your mic open at all times, melonDS will only open it when needed. This should help under certain circumstances, such as when using Bluetooth audio.
High-quality audio resampling
The implementation of DSP audio involved several changes to the way melonDS produces sound. Namely, melonDS used to output at 32 KHz, but with the new DSi audio hardware, this was changed to 47 KHz. I had added in some simple resampling, so melonDS would produce 47 KHz audio in all cases. But this caused audio quality issues for a number of people.
Nadia took the matter into her own hands and replaced my crude resampler with a high-quality blip-buf resampler. Not only are all those problems eliminated, but it also means the melonDS core now outputs at a nice 48 KHz frequency, much easier for frontends to deal with than the previous weird numbers.
Cheat database support
If you've used cheats in melonDS, surely you've found it inconvenient to have to manually enter them into the editor. But this is no more: you can now grab the latest R4 cheat database (usrcheat.dat) for example, and import your cheat codes from that.
The cheat import dialog will show you which game entries match your current game, show the cheat codes they contain, and let you select which codes to import. You can also choose whether to clear any previously existing cheat codes or to keep them when importing new codes.
melonDS's cheat code system was also improved in order to fully preserve the structure found in usrcheat.dat. Categories and cheat codes can now have descriptions, categories have an option to allow only one code in them to be enabled, and codes can be created at the root, without having to be in a category.
The cheat file format (.mch) was also modified to add support for this. The parser is backwards-compatible, so it will recognize old .mch files just fine. However, new files won't be able to be recognized by older melonDS versions.
The cheat editor UI was also revamped to add support for the new functionality, and generally be more flexible and easier to work with. For example, it's now possible to reorder your cheat codes by moving them around in the list.
Compute shader renderer fix
Those of you who have tried the compute shader renderer may have noticed that it could start to glitch out at really high resolutions. This was due to running out of tile space.
We merged FireNX70's pull request, which implements tile size scaling in order to alleviate this problem. This means the renderer should now be able to go pretty high in resolution without issues.
Wayland OpenGL fix
If you use Wayland and have tried to use the OpenGL renderers, you may have noticed that it made the melonDS window glitchy, especially when using hiDPI scaling.
I noticed that glitch too, but had absolutely no idea where to start looking for a fix. So I kinda just... didn't use OpenGL, and put that on the backburner.
Until a while ago, when I felt like trying modern PCSX2. I was impressed by how smoothly it ran, compared to what it was like back in 2007... but more importantly, I realized that it was rendering 3D graphics in its main window alongside UI elements, that it uses Qt and OpenGL just like melonDS, and that it was flawless, no weird glitchiness.
So I went and asked the PCSX2 team about it. Turns out they originally took their OpenGL context code from DuckStation, but improved upon it. Funnily enough, melonDS's context code also comes from there. Small world.
In the end, the PCSX2 folks told me about what they did to fix Wayland issues. I tried one of the fixes that involved just two lines of code, and... it completely fixed the glitchiness in melonDS. So, thanks there!
BSD CI
We now have CI for FreeBSD, OpenBSD and NetBSD, courtesy of Rayyan and Izder456. This means we're able to provide builds for those platforms, too.
Adjustments were also done to the JIT recompiler so it will work on those platforms.
Fixing a bunch of nasty bugs
For example: it has been reported that melonDS 1.0 could randomly crash after a while if multiple instances were opened. Kind of a problem, given that local multiplayer is one of melonDS's selling points. So, this bug has been fixed.
Another fun example: it sometimes occurred that melonDS wouldn't output any sound, for some mysterious reason. As it was random and seemingly had a pretty low chance of occurring, I was really not looking forward to trying to reproduce and fix it... But Nadia saved the day by providing a build that exhibited this issue 100% of the time. With a reliable way to reproduce the bug, I was able to track it down and it was fixed.
Nadia also fixed another bug that caused possible crashes that appeared to be JIT-related, but turned out to be entirely unrelated.
All in all, melonDS 1.1 should be more stable and reliable.
There's also the usual slew of misc bugfixes and improvements.
However, we realized that there's a bug with the JIT that causes a crash on x86 Macs. We will do our best to fix this, but in the meantime, we had to disable that setting on that platform.
Future plans
The hi-res display capture stuff will be for release 1.2. Even if I could rush to finish it for 1.1, it wouldn't be wise. Something of this scope will need comprehensive testing.
I also have more ideas that will also be for further releases. I want to experiment with RTCom support, netplay, a different type of UI, ...
And then there's also changes I have in mind for this website. The current layout was nice in the early days, but there's a lot of posts now, and it's hard to find specific posts. I'd also want the homepage to present information in a more attractive manner, make it more evident what the latest melonDS version is, maybe have less outdated screenshots, ... so much to do.
Anyway, you can grab melonDS 1.1 on the downloads page, as usual.
You can also donate to the project if you want, that's always appreciated.
Hi-res display capture: we're getting there! -- by Arisotura
Sneak peek of the blackmagic3 branch:
(click them for full-res versions)
Those are both dual-screen 3D scenes, but notice how both screens are nice and smooth and hi-res.
Now, how far along are we actually with this?
As I said in the previous post, this is an improved version of the old renderer, which was based on a simple but limited approach. At the time, it was easy enough to hack that on top of the existing 2D engine. But now, we're reaching the limits of what is possible with this approach. So, consider this a first step. The second step will be to build a proper OpenGL-powered 2D engine, which will open up more crazy possibilities as far as graphical enhancements go.
I don't know if this first step will make it in melonDS 1.1, or if it will be for 1.2. Turns out, this is already a big undertaking.
I added code to keep track of which VRAM blocks are used for display captures. It's not quite finished; it's still missing some details, like freeing capture buffers that are no longer in use, or syncing them back to emulated VRAM if the CPU tries to access it.
It also needs extensive testing and optimization. For this first iteration, for once, I tried to actually build something that works, rather than spend too much time trying to imagine the perfect design. So, basically, it works, but it's inefficient... Of course, the sheer complexity of VRAM mapping on the DS doesn't help at all. Do you remember? You can make the VRAM banks overlap!
So, yeah. Even if we end up making a new renderer, all this effort won't go to waste: we will have the required apparatus for hi-res display capture.
So far, this renderer does its thing. It detects when a display capture is used, and replaces it with an adequate hi-res version. For the typical use cases, like dual-screen 3D or motion blur, it does the job quite well.
However, I made a demo of "render-to-rotscale-BG": like my render-to-texture demo in the previous post, but instead of rendering the captured texture on the faces of a bigger cube, it is simply rendered on a rotating 128x128 BG layer. Nothing very fancy, but those demos serve to test the various possibilities display capture offers, and some games also do similar things.
Anyway, this render-to-rotscale demo looks like crap when upscaling is used. It's because the renderer's shader works with the assumption that display capture buffers will be drawn normally and not transformed. The shader goes from original-resolution coordinates and interpolates between them in order to sample higher-resolution images. In the case of a rotated/scaled BG layer, the interpolation would need to take the BG layer's transform matrix into account.
I decided to postpone this to the second step. Just think of the possibilities the improved renderer would offer: hi-res rotation/scale, antialiasing, filtering on layers and sprites, ...
And the render-to-texture demo won't even work for now. This one is tricky, it will require some communication between the 2D and 3D renderers. It might also require reworking the way texturing is done: for example my old OpenGL renderer just streams raw VRAM to the GPU and lets the shader do the decoding. It's lazy, but it was a simple way to get texturing working. But again, a proper texture cache here would open up more enhancement possibilities. Generic did use such a cache in his compute shader renderer, so it could probably serve for both renderers.
That's about it for what this renderer can do, for now. I also have a lot of cleanup and tying-loose-ends to do. I made a mess.
Stay tuned!
Display capture: oh, the fun! -- by Arisotura
This is going to be a juicy technical post related to what I'm working on at the moment.
Basically, if you've used 3D upscaling in melonDS, you know that there are games where it doesn't work. For example, games that display 3D graphics on both screens: they will flicker between the high-resolution picture and a low-resolution version. Or in other instances, it might just not work at all, and all you get is low-res graphics.
It's linked to the way the DS video hardware works. There are two 2D tile engines, but there is only one 3D engine. The output from that 3D engine is sent to the main 2D engine, where it is treated as BG0. You can also change which screen each 2D engine is connected to. But you can only render 3D graphics on one screen.
So how do those games render 3D graphics on both screens?
This is where display capture comes in. The principle is as follows: you render a 3D scene to the top screen, all while capturing that frame to, say, VRAM bank B. On the next frame, you switch screens and render a 3D scene to the bottom screen. Meanwhile, the top screen will display the previously captured frame from VRAM bank B, and the frame you're rendering will get captured to VRAM bank C. On the next frame, you render 3D graphics to the top screen again, and the bottom screen displays the capture from VRAM bank C. And so on.
This way, you're effectively rendering 3D graphics to both screens, albeit at 30 FPS. This is a typical use case for display capture, but not the only possibility.
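In pseudocode, that ping-pong pattern boils down to something like this (a simplified sketch with hypothetical helper names):

enum Screen   { TOP_SCREEN, BOTTOM_SCREEN };
enum VRAMBank { VRAM_BANK_B, VRAM_BANK_C };

void Show3DOn(Screen s);                               // hypothetical helpers
void CaptureFrameTo(VRAMBank b);
void ShowCapturedFrameOn(Screen s, VRAMBank b);

bool topScreenTurn = true;

void OnFrame()
{
    if (topScreenTurn)
    {
        Show3DOn(TOP_SCREEN);                             // render 3D to the top screen...
        CaptureFrameTo(VRAM_BANK_B);                      // ...while capturing it to bank B
        ShowCapturedFrameOn(BOTTOM_SCREEN, VRAM_BANK_C);  // bottom shows last frame's capture
    }
    else
    {
        Show3DOn(BOTTOM_SCREEN);
        CaptureFrameTo(VRAM_BANK_C);
        ShowCapturedFrameOn(TOP_SCREEN, VRAM_BANK_B);
    }
    topScreenTurn = !topScreenTurn;
}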
Display capture can receive input from two sources: source A, which is either the main 2D engine output or the raw 3D engine output, and source B, which is either a VRAM bank or the main memory display FIFO. Then you can either select source A, or source B, or blend the two together. The result from this will be written to the selected output VRAM bank. You can also choose to capture the entire screen or a region of it (128x128, 256x64 or 256x128).
All in all, quite an interesting feature. You can use it to do motion blur effects in hardware, or even to render graphics to a texture. Some games even do software processing on captured frames to apply custom effects. It is also used by the aging cart to verify the video hardware: it renders a scene and checksums the captured output.
For example, here's a demo of render-to-texture I just put together, based on the libnds examples:
(video file)
The way this is done isn't very different from how dual-screen 3D is done.
Anyway, this stuff is very related to what I'm working on, so I'm going to explain a bit how upscaling is done in melonDS.
When I implemented the OpenGL renderer, I first followed the same approach as other emulators: render 3D graphics with OpenGL, read out the framebuffer and send it to the 2D renderer. Simple. Then, in order to support upscaling, I just had to increase the resolution of the 3D framebuffer. To compensate for this, the 2D renderer would push out more pixels.
The issue was that it was suboptimal: if I pushed the scaling factor to 4x, it would get pretty slow. On one hand, in the 2D renderer, pushing out more pixels takes more CPU time. On the other hand, on a PC, reading back from GPU memory is slow. The overhead tends to grow quadratically when you increase the output resolution.
So instead, I went for a different approach. The 2D renderer renders at 256x192, but the 3D layer is replaced with placeholder values. This incomplete framebuffer is then sent to the GPU along with the high-resolution 3D framebuffer, and the two are spliced together. The final high-resolution output can be sent straight to the screen, never leaving GPU memory. This approach is a lot faster than the previous one.
This is what was originally implemented in melonDS 0.8. Since this rendering method bypassed the regular frame presentation logic, it was a bit of a hack - the final compositing step was done straight in the frontend, for example. The renderer in modern melonDS is a somewhat more refined version of this, but the same basic idea remains.
There is also an issue with this approach: display capture. The initial solution was to downscale the GPU framebuffer to 256x192 and read that back, so it could be stored in the emulated VRAM, "as normal". Due to going through the emulated VRAM, the captured frame has to be at the original resolution. This is why upscaling in melonDS has those issues.
To work around this, one would need to detect when a VRAM bank is being used as a destination for a display capture, and replace it with a high-resolution version in further frames, in the same way as the 3D layer itself. But obviously, it's more complicated than that. There are several issues. For one, the game could still decide to access a captured frame in VRAM (to read it back or to do custom processing), so that still needs to work. There are also several different ways a captured frame can be reused: as a bitmap BG layer (BG2 or BG3), as a bunch of bitmap sprites, or even as a texture in 3D graphics. This is kinda why it has been postponed for so long.
There are even more annoying details, if we consider all the possibilities: while an API like OpenGL gives you an identifier for a texture, and you can only use it within its bounds, the DS isn't like that. When you specify a texture on the 3D engine, you're really just giving it a VRAM address. You could technically point it in the middle of a previously captured frame, or before... Tricky to work with. I made a few demos (like the aforementioned render-to-texture demo) to exercise display capture, but the amount of possibilities makes it tricky.
So I'm basically trying to add support for high-resolution display capture.
The first step is to make a separate 2D renderer for OpenGL, which will go with the OpenGL 3D renderers. The goal is to remove the GLCompositor wart and the other hacks, and to integrate the OpenGL rendering functionality more cleanly (and thus make it easier to implement other renderers in the future, too).
I'm also reworking this compositor to work around the original limitations, and make it easier to splice in high-resolution captured frames. I have a pretty good roadmap as far as the 2D renderer is concerned. For 3D, I'll have to see what I can do...
However, there will be more steps to this. I'm acutely aware of the limitations of the current approach: for example, it doesn't lend itself to applying filters to 2D graphics. I tried in the past, but kept running into issues.
There are several more visual improvements we could add to melonDS - 2D layer/sprite filtering, 3D texture filtering, etc... Thus, the second step of this adventure will be to rework the 2D renderer to do more of the actual rendering work on the GPU. A bit like the hardware renderer I had made for blargSNES a decade ago. This approach would make it easier to apply enhancements to 2D assets or even replace them with better versions entirely, much like user texture packs.
This is not entirely unlike the Deko3D renderer Generic made for the Switch port.
But hey, one step at a time... First, I want to get high-resolution capture working.
There's one detail that Nadia wants to fix before we can release melonDS 1.1. Depending how long this takes, and how long I take, 1.1 might include my improvements too. If not, that will be for 1.2. We'll see.
Happy birthday melonDS! -- by Arisotura
Those of you who know your melonDS lore know that today is a special day. melonDS is 9 years old!
...hey, I don't control this. 9 is brown. I don't make the rules.
Anyway, yeah. 9 years of melonDS. That's quite the achievement. Sometimes I don't realize it has been so long...
-
As far as I'm concerned, there hasn't been a lot going on -- 2025 has had a real shitty start for me. A lot of stuff came forward that has been really rough.
On the flip side, it has also been the occasion to get a fresh start. I was told about IFS therapy in March, and I was able to get started. It has been intense, but so far it has been way more effective than previous attempts at therapy.
I'm hopeful for 2026 to be a better year. I'm also going to hopefully get started with a new job which is looking pretty cool, so that's nice too.
-
As far as melonDS is concerned, I have several ideas. First of all, we'll release melonDS 1.1 pretty soon, with a nice bundle of fixes and improvements.
I also have ideas for further releases. RTCom support, netplay, ... there's a lot of potential. I guess we can also look at RetroAchievements, since that seems to be a popular request. There's also the long-standing issues we should finally address, like the lack of upscaling in dual-screen 3D scenes.
Basically, no shortage of things to do. It's just a matter of actually doing them. You know how that goes...
I also plan some upgrades to the site. I have some basic ideas for a homepage redesign, for updating the information that is presented, and presenting it in a nicer way... Some organization for the blog would be nice too, like splitting the posts into categories: release posts, technical shito, ...
I also have something else in mind: adding in a wiki, to host all information related to melonDS. The FAQ, help pages specific to certain topics, maybe compatibility info, ...
And maybe, just maybe, a screenshot section that isn't just a few outdated pics. Maybe that could go on the wiki too...
-
Regardless, happy birthday melonDS! And thank you all, you have helped make this possible!
The joys of programming -- by Arisotura
There have been reports of melonDS 1.0 crashing at random when running multiple instances. This is kind of a problem, especially when local multiplayer is one of melonDS's selling points.
So I went and tried to reproduce it. I could get the 1.0 and 1.0RC builds to crash just by having two instances open, even if they weren't running anything. But I couldn't get my dev build to crash. I thought, great, one of those cursed bugs that don't manifest when you try to debug them. I think our team knows all about this.
Not having too many choices, I took the 1.0 release build and started hacking away at it with a hex editor. Basic idea being, if it crashes with no ROM loaded, the emulator core isn't the culprit, and there aren't that many places to look. So I could nop out function calls until it stopped crashing.
In the end, I ended up rediscovering a bug that I had already fixed.
The SDL joystick API isn't thread-safe. When running multiple emulator instances in 1.0, this API will get called from several different threads, since each instance gets its own emulation thread. I had already addressed this 2.5 months ago, by adding adequate locking.
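The locking itself is nothing fancy; conceptually, it's just a mutex around the joystick calls (a sketch, not the actual melonDS code):

#include <mutex>
#include <SDL2/SDL.h>

// Serialize access to the (non-thread-safe) SDL joystick API, since every
// emulator instance polls input from its own emu thread.
static std::mutex joystickMutex;

Sint16 ReadAxis(SDL_Joystick* joy, int axis)
{
    std::lock_guard<std::mutex> lock(joystickMutex);
    SDL_JoystickUpdate();                   // refresh joystick state
    return SDL_JoystickGetAxis(joy, axis);  // then read the axis we want
}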
I guess at the time of 1.0, it slipped through the cracks due to its random nature, as with many threading-related bugs.
Regardless, if you know your melonDS lore, you know what is coming soon. There will be a new release which will include this fix, among other fun shit. In the meantime, you can use the nightly builds.
Sneak peek -- by Arisotura
Just showing off some of what I've been working on lately:
This is adjacent to something else I want to work on, but it's also been on the popular request list, since having to manually enter cheat codes into melonDS isn't very convenient: it would be much easier to just import them from an existing cheat code database, like the R4 cheat database (also known as usrcheat.dat).
It's also something I've long thought of doing. I had looked at usrcheat.dat and figured the format wasn't very complicated. It was just, you know, actually making myself do it... typical ADHD stuff. But lately I felt like giving it a look. I first wrote a quick PHP parser to make sure I'd gotten the usrcheat.dat format right, then started implementing it into melonDS.
For now, it's in a separate branch, because it's still quite buggy and unfinished, but it's getting there.
The main goal is complete, which is, it parses usrcheat.dat, extracts the relevant entries, and imports them into your game's cheat file. By default, it only shows the game entry which matches your game by checksum, which is largely guaranteed to be the correct one. However, if none is found, it shows all the game entries that match by game code.
It also shows a bunch of information about the codes and lets you choose which ones you want to import. By default, all of them will be imported.
This part is mostly done, besides some minor UI/layout bugs.
The rest is adding new functionality to the existing cheat code editor. I modified the melonDS cheat file format (the .mch files) to add support for the extra information usrcheat.dat provides, all while keeping it backwards compatible, so older .mch files will still load just fine. But, of course, I also need to expose that in the UI somehow.
I also found a bug: if you delete a cheat code, the next one in the list gets its code list erased, due to the way the UI works. Not great.
I'm thinking of changing the way this works: selecting a cheat code would show the information in disabled/read-only fields, you'd click an "Edit" button to modify those fields, then you'd get "Save" or "Cancel" buttons... This would avoid much of the oddities of the current interface.
The rest is a bunch of code cleanup...
Stay tuned!
Not much to say lately... -- by Arisotura
Yeah...
I guess real life matters don't really help. At least, my mental health is genuinely improving, so there's that.
But as far as my projects are concerned, this seems to be one of those times where I just don't feel like coding. Well, not quite, I have been committing some stuff to melonDS... just not anything that would be worthy of a juicy blog post. I also had another project: porting my old SNES emulator to the WiiU gamepad. I got it to a point where it runs and displays some graphics, but for now I don't seem motivated to continue working on it...
But I still have some ideas for melonDS.
One idea I had a while ago: using the WFC connection presets to store settings for different altWFC servers. For example, using connection 1 for Wiimmfi, connection 2 for Kaeru, etc... and melonDS would provide some way of switching between them. I don't know how really useful it would be, or how feasible it would be wrt patching requirements, but it could be interesting.
Another idea would be RTCom support. Basically RTCom is a protocol that is used on the 3DS in DS mode to add support for analog sticks and such. It involves game patches and ARM11-side patches to forward input data to RTC registers. The annoying aspect of this is that each game seems to have its own ARM11-side patch, and I don't really look forward to implementing a whole ARM11 interpreter just for this.
But maybe input enhancements, be it RTCom stuff or niche GBA slot addons, would make a good use case for Lua scripting, or some kind of plugin system... I don't know.
There are other big ideas, of course. The planned UI redesign, the netplay stuff, website changes, ...
Oh well.
Fix for macOS builds -- by Arisotura
Not a lot to talk about these days, as far as melonDS is concerned. I have some ideas, some big, some less big, but I'm also being my usual ADHD self.
Anyway, there have been complaints about the macOS builds we distribute: they're distributed as nested zip files.
Apparently, those people aren't great fans of Russian dolls...
The issue is caused by the way our macOS CI works. I'd have to ask Nadia, but I think there isn't really an easy fix for this. So the builds available on Github will have to stay nested.
However, since I control this site's server, I can fix the issue here. So now, all the macOS builds on the melonDS site have been fixed; they're no longer nested. Similarly, new builds should also be fixed as they're uploaded. Let us know if anything goes wrong with this.
DSP HLE: reaching the finish line -- by Arisotura
As mentioned in the last post, the remaining things that needed to be added to DSP HLE were the G711 ucode, and the basic audio functions common to all ucodes.
G711 was rather trivial to add. The rest was... more involved. The reason was that adding the new audio features required reworking some of melonDS's audio support.
melonDS was originally built around the DS hardware. As far as sound is concerned, the DS has a simple 16-channel mixer, and the output from said mixer is degraded to 10-bit and PWM'd to the speakers. Microphone input is even simpler: the mic is connected to the touchscreen controller's AUX input. Reading said input gives you the current mic level, and you need to set up a timer to manually sample it at the frequency you want.
So obviously, in the early days, melonDS was built around that design.
The DSi, however, provides a less archaic system for producing and acquiring sound.
Namely, it adds a TI audio amplifier, which is similar to the one found in the Wii U gamepad, for example. This audio amplifier also doubles as a touchscreen controller, for DS backwards compatibility.
Instead of the old PWM stuff, output from the DS mixer is sent to the amplifier over an I2S interface. There is some extra hardware to support that interface. It is possible to set the sampling frequency to 32 KHz (like the DS) or 47 KHz. There is also a ratio for mixing outputs from the DS mixer and the DSP.
Microphone input also goes through an I2S interface. The DSi provides hardware to automatically sample mic input at a preset frequency, and it's even possible to automate it entirely with a NDMA transfer. All in all, quite the upgrade compared to the DS. Oh, and mic input is also routed to the DSP, and the ucodes have basic functions for mic sampling too.
All fine and dandy...
So I based my work on CasualPokePlayer's PR, which implemented some of the new DSi audio functionality. I added support for mixing in DSP audio. The "play sound" command in itself was trivial to implement in HLE, once there was something in place to actually output DSP audio.
I added support for the different I2S sampling frequencies. The 47 KHz setting doesn't affect DS mixer output beyond improving quality somewhat, but it affects the rate at which DSP output plays.
For this, I changed melonDS to always produce audio at 47 KHz (instead of 32 KHz). In 32 KHz mode (or in DS mode), audio output will be resampled to 47 KHz. It was the easiest way to deal with this sort of feature.
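As an illustration of the principle (a naive linear-interpolation sketch, not the actual melonDS resampler):

#include <cstdint>
#include <vector>

// Naive linear-interpolation resampler from inRate to outRate. Real code would
// carry the fractional position across calls; this just shows the idea.
std::vector<int16_t> Resample(const std::vector<int16_t>& in, double inRate, double outRate)
{
    std::vector<int16_t> out;
    double step = inRate / outRate;  // e.g. 32000.0 / 47000.0
    for (double pos = 0.0; pos + 1.0 < in.size(); pos += step)
    {
        size_t i = (size_t)pos;
        double frac = pos - i;
        out.push_back((int16_t)(in[i] * (1.0 - frac) + in[i + 1] * frac));
    }
    return out;
}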
Then, I added support for audio output to Teakra. I had to record the audio output to a file to check whether it was working correctly, because it's just too slow with DSP LLE, but a couple bugfixes later, it was working.
Then came the turn of microphone input. The way it's done on the DSP is a bit weird. Given the "play sound" command will DMA sound data from ARM9 memory, I thought the mic commands would work in a similar way, but no. Instead, they just continually write mic input to a circular buffer in DSP memory, and that's about it. Then you need to set up a timer to periodically read that circular buffer with PDATA.
But there was more work to be done around mic input.
I implemented a centralized hub for mic functionality, so buffers and logic wouldn't be duplicated in several places in melonDS. I also changed the way mic input data is fed into melonDS. With the hub in place, I could also add logic for starting and stopping mic recording. This way, when using an external microphone, it's possible to only request the mic when the game needs it, instead of hogging it all the time.
Besides that, it wasn't very hard to make this new mic input system work, including in DSi mode. I made some changes to CasualPokePlayer's code to make it more accurate to hardware. I also added the mic sampling commands to DSP HLE, and added BTDMP input support to Teakra so it can also receive mic input.
The tricky part was getting input from an external mic to play nicely and smoothly, but I got there.
There is only one problem left: libnds homebrew in DSi mode will have noisy mic input. This is a timing issue: libnds uses a really oddball way to sample the DSi mic (in order to keep it compatible with the old DS APIs), where it will repeatedly disable, flush and re-enable the mic interface, and wait to receive one sample. The wait is a polling loop with a timeout counter, but it's running too fast on melonDS, so sometimes it doesn't get sampled properly.
So, this is mostly it. What's left is a bunch of cleanup, misc testing, and adding the new state to savestates.
DSP HLE is done. It should allow most DSP titles to play at decent speeds, however if a game uses an unrecognized DSP ucode, it will fall back to LLE.
The dsp_hle branch also went beyond the original scope of DSP HLE, but it's all good.
DSi mode finally gets microphone input. I believe this was the last big missing feature, so this brings DSi mode on par with DS mode.
The last big thing that might need to be added would be a DSP JIT, if there's demand for this. I might look into it at some point, would be an occasion to learn new stuff.
I'm thinking those changes might warrant a lil' release.
Having fun with DSP HLE -- by Arisotura
If you pay attention to the melonDS repo, you might have noticed the new branch: dsp_hle.
And this branch is starting to see some results...
These screenshots might not look too impressive at first glance, but pay closer attention to the framerates. Those would be more like 4 FPS with DSP LLE.
Also, we still haven't fixed the timing issue which affects the DSi sound app -- I had to hack the cache timings to get it to run, and I haven't committed that. We'll fix this in a better way, I promise.
So, how does this all work?
CasualPokePlayer has done research on the different DSP ucodes in existence. So far, three main classes have been identified:
• AAC SDK: provides an AAC decoder. The DSi sound app uses an early version of this ucode. It is also found in a couple other titles.
• Graphics SDK: provides functions for scaling bitmaps and converting YUV pictures to 15-bit RGB.
• G711 SDK: provides a G711 (A-law and µ-law) codec.
All ucodes also share basic audio functionality, allowing them to play simple sounds (for example, a camera shutter sound) and record microphone input. They are fairly simple, but emulating them in melonDS will require some reworking of the audio system.
It's not like this DSP is used to its fullest here, but hey, it's something.
I first started working on the graphics ucode, which is used in Let's Golf. It's used to scale down the camera picture from 144x144 to 48x48 -- you can see it in the screenshot above.
There's always something satisfying about reverse-engineering things and figuring out how they work. I even wrote a little test homebrew that loads a graphics SDK ucode and tests all the aspects of the bitmap scaling command (and another one for the yuv2rgb command). I even went as far as to work out the scaling algorithms so I could replicate them to the pixel. I also added delays to roughly simulate how long the graphics commands would take on hardware.
The way the scaling command works is a bit peculiar. You give it the size of your source bitmap, then you give it a rectangle within said bitmap, and X/Y scaling factors. Then it takes the specified rectangle of the source bitmap and scales it using the factors.
You can also specify a filtering mode. Nearest neighbor, bilinear and bicubic are supported. Nothing special to say about them, other than the fact bicubic uses somewhat atypical equations.
However, there's also a fourth filtering mode: one-third. This mode does what it says on the tin: it ignores the provided X/Y scaling factors, and scales the provided bitmap down to one third of its original size. The way it works is fairly simple: for every 3x3 block of source pixels, it averages the 8 outer pixels' values and uses that as the destination value. This also means that it requires that the source dimensions be multiples of 3.
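Just to illustrate that description (a sketch based on the behavior above, not the actual DSP code):

#include <cstdint>

// "One-third" downscale: for each 3x3 block of source pixels, average the 8 outer
// pixels and ignore the center one. Single channel; srcW/srcH must be multiples of 3.
void DownscaleOneThird(const uint8_t* src, int srcW, int srcH, uint8_t* dst)
{
    for (int y = 0; y < srcH; y += 3)
    for (int x = 0; x < srcW; x += 3)
    {
        int sum = 0;
        for (int dy = 0; dy < 3; dy++)
        for (int dx = 0; dx < 3; dx++)
        {
            if (dx == 1 && dy == 1) continue;  // skip the center pixel
            sum += src[(y + dy) * srcW + (x + dx)];
        }
        dst[(y / 3) * (srcW / 3) + (x / 3)] = (uint8_t)(sum / 8);
    }
}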
Interestingly, the example of Let's Golf would be a perfect candidate for one-third scaling, but they chose bicubic instead.
After this, I felt like looking at the DSi sound app.
The AAC ucode doesn't use the pipe to receive commands. That's why it was working in LLE, despite the bugs that broke the pipe. Instead, commands and parameters are just sent through the CMD1 register. Also, the decoded sound data isn't sent to the audio output directly, but transferred to ARM9 RAM. Hence why the DSi sound app was somehow functional (albeit very slow).
My first step was to figure out what the command parameters mean and what kind of data is sent to the AAC ucode, so I could understand how to use it. I didn't exactly feel like replicating an entire AAC decoder in my HLE implementation, so I went with faad2 instead.
This has been the occasion for me to learn about AAC, MP4 files and such. I even modified Gericom's DSiDsp homebrew to work with the DSi AAC ucode, turning it into the worst MP4 audio player ever.
DSiDsp is an example of AAC decoding, but it's made to work with the 3DS AAC ucode. Besides the differences in the communication protocol and how memory is accessed, they also don't quite take in the same kind of data. The 3DS ucode takes AAC frames with ADTS headers. ADTS is a possible transport layer for AAC audio, where each frame gets a small header specifying the sampling frequency, channel configuration, and other parameters. All fine and dandy.
The DSi AAC ucode, however, doesn't take ADTS or ADIF headers, just raw AAC frames. The sampling frequency and channel configuration are specified in the parameters for the decoder command.
So I had to figure out how to make this work nicely with faad2. I also ran into an issue where the DSi sound app will first send an all-zero AAC frame before sending the correct data, which seems to be a bug in the sound app itself. The AAC ucode doesn't seem to be affected by this (it just returns an error code), but third-party AAC decoders don't like it. faad2 gets put into a bogus state where no further frames can be decoded. fdk-aac barfs over the memory in a seemingly random way, eventually causing a crash. So I had to hack around the issue and ignore all-zero frames.
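For the curious, driving faad2 with raw frames looks roughly like this. It's a simplified sketch rather than melonDS's actual code: the 2-byte AudioSpecificConfig is built from the frequency index and channel count given in the decoder command parameters (hypothetical variable names), and all-zero frames are dropped before they ever reach the decoder:

#include <neaacdec.h>

// Sketch: initialize faad2 for raw AAC-LC frames (no ADTS/ADIF headers).
// freqIndex and numChannels are assumed to come from the ucode command parameters.
NeAACDecHandle InitRawAacDecoder(int freqIndex, int numChannels, unsigned long* outRate, unsigned char* outChans)
{
    NeAACDecHandle dec = NeAACDecOpen();

    // 2-byte AudioSpecificConfig: 5 bits object type (2 = AAC LC),
    // 4 bits sampling frequency index, 4 bits channel configuration.
    unsigned char asc[2];
    asc[0] = (unsigned char)((2 << 3) | (freqIndex >> 1));
    asc[1] = (unsigned char)(((freqIndex & 1) << 7) | (numChannels << 3));

    NeAACDecInit2(dec, asc, sizeof(asc), outRate, outChans);
    return dec;
}

// Sketch: decode one raw frame, ignoring the bogus all-zero frames
// the DSi sound app sends first.
short* DecodeFrame(NeAACDecHandle dec, unsigned char* frame, unsigned long len, NeAACDecFrameInfo* info)
{
    bool allZero = true;
    for (unsigned long i = 0; i < len; i++) if (frame[i]) { allZero = false; break; }
    if (allZero) return nullptr;

    return (short*)NeAACDecDecode(dec, info, frame, len);
}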
So melonDS decodes AAC audio now. It's still rough around the edges, but it's pretty neat.
Now I guess the final step would be to reverse-engineer and implement the G711 ucode.
As for LLE options for DSP emulation, some sort of recompiler (JIT or otherwise) would be the way to go. I was thinking about how such a thing might work, but I've never written that sort of thing before. It could be the occasion to learn about this. It would be worthwhile if someone out there is trying to develop for the DSP, but as far as commercial titles are concerned, HLE should cover all the needs, all while being way more efficient.
The AV codec stuff is also adjacent to something else that's on my mind. It's not related to the DSP, but it's something I want to experiment with eventually. If it pans out, I'll let you guys know!
melonDS 1.0 is out -- by Arisotura
Finally, the "proper" melonDS 1.0 release is here. Sorry that it took so long...
Anyway, this is pretty much the same as the 1.0 RC, but we fixed a bunch of bugs that were found in said RC.
Namely, you can now use multiple windows with OpenGL under Windows.
However, depending on how good your OpenGL driver is, doing so may reduce performance. It's due to having multiple OpenGL contexts sharing data, but for now we don't really know what we can do about it. If you have ideas, let us know!
Speaking of multiple windows, I also added a way to tell melonDS windows apart, because things could get pretty confusing. They now get a tag in their title, for example [p1:w2] means first multiplayer instance, second window.
We also merged asie's add-on support PR, so this release includes support for the Motion Pak and the Guitar Grip.
We merged some other PRs as well; among them, one that lowers audio latency.
This release also includes the DSi camera fixes that were discussed in the previous posts. DSi titles that ran into issues while trying to use the camera should now work with no problems.
However, since they tend to also use the DSP at the same time, the performance will be abysmal...
DSP HLE is something I want to experiment with, but that will be for a further release.
As far as future plans are concerned, I also want to redesign the site's homepage, but you'll find out when I get around to that.
Anyway, you can find the release on our downloads page, as usual. Enjoy!
Let's Golf: the horseman of the apocalypse that wasn't one -- by Arisotura
I figure I need to resolve the dramatic tension from the previous post.
So the issue we had was a screen-sized DMA3 transfer interfering with NDMA1, which is used to transfer camera data...
Shortly after I made that post, I recalled that Jakly mentioned VRAM timings, and started to figure it out.
I said that when I reproduced the setup in a homebrew, it would trigger a data overrun error and get stuck. But there was one thing I was missing: my homebrew was using a camera resolution of 256x192, with DMA set to run every 4 scanlines, the standard stuff. However, Let's Golf uses cropping to achieve a resolution of 144x144, and runs the DMA every 7 scanlines. This means more time between each NDMA1 transfer.
Regarding VRAM, the DSi supports accessing it over the 32-bit bus, instead of the old 16-bit bus. One effect is that this makes it possible to write to VRAM in 8-bit units, which wasn't possible on the DS. Another effect is that it affects timings: for example, a 32-bit DMA transfer from main RAM to VRAM would take 2 cycles per word, instead of 3.
I did the math, and with such timings, the screen-sized DMA3 transfer would have enough time that it could run between two NDMA1 transfers without disrupting the camera operation. But with the 16-bit bus timings, DMA3 would definitely take too long.
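For a rough sense of the magnitudes involved -- my own back-of-the-envelope numbers, assuming a plain 256x192 16-bit frame copied as 32-bit words:

256 x 192 pixels x 2 bytes = 98304 bytes = 24576 words
24576 words x 3 cycles/word (16-bit bus) = 73728 cycles
24576 words x 2 cycles/word (32-bit bus) = 49152 cycles

Shaving a third off the copy time is roughly the difference between DMA3 fitting between two camera blocks and not.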
I even modified my homebrew to use the same 144x144 resolution as Let's Golf, and added a key to toggle the 32-bit bus for VRAM (via SCFG_EXT9 bit 13). Suddenly, my homebrew was running just fine as long as the 32-bit bus was enabled, but when it was disabled, it would trigger the data overrun error and get stuck.
So, basically, this is nothing fancy, just a case of "this works out of pure luck".
I added support for the new VRAM timings in melonDS. But this wasn't enough: Let's Golf would still get stuck.
I looked at it, and it was still possible that DMA3 would start right before a NDMA1 transfer, which would be the worst time. The camera FIFO would definitely be almost full at that point, and further scanlines would overflow before NDMA1 had a chance to run.
I thought about it, and... there was no way this could work unless there were some double-buffering shenanigans involved.
I modified my homebrew to try and verify this hypothesis. My idea was to start a camera transfer, let it fill and overrun the FIFO, and use a dummy NDMA to track it. The NDMA channel would be configured like for a camera transfer, but it wouldn't actually read camera data, and the length would be set to 1 word. I would just use the NDMA IRQ as a way to sense the camera transfer condition. This way, I could know how many times it fires before hitting the overrun error.
This didn't reveal anything, as the NDMA was only fired once.
My next idea was to wait for the NDMA IRQ, and from that point, count how long it takes for the data overrun to be raised (by checking bit 4 in CAM_CNT). The timings I measured from this definitely suggested that a second block of camera data was being transferred after the NDMA IRQ, before it would raise the data overrun error.
Which, in turn, tended to confirm my hypothesis that there are actually two FIFO buffers for camera data. Thus, when one buffer is filled entirely (ie. as much as the DMA interval allows), if the other buffer is empty, the buffers are swapped and the DMA transfer is fired, otherwise the data overrun error is raised.
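In emulator terms, the model I ended up with looks roughly like this -- a simplified sketch of the idea, not the actual melonDS code:

#include <cstdint>
#include <utility>

// Hypothetical helper that kicks off the camera NDMA transfer.
void StartCameraDMA(const uint32_t* data, int count);

// Simplified model of the double-buffered FIFO hypothesis: scanlines fill the
// "write" buffer; once a full DMA block has accumulated, the buffers are
// swapped and a transfer is requested -- unless the "read" buffer hasn't been
// drained yet, in which case the data overrun error is raised.
struct CameraFIFO
{
    static const int kBufSize = 512;          // words per buffer (hypothetical size)
    uint32_t buf[2][kBufSize];
    int  level[2] = {0, 0};                   // words currently held in each buffer
    int  wr = 0, rd = 1;                      // which buffer is being filled / drained
    bool overrun = false;

    void PushBlock(const uint32_t* words, int count)  // one DMA block worth of scanlines
    {
        for (int i = 0; i < count; i++)
            buf[wr][level[wr]++] = words[i];

        if (level[rd] == 0)                   // other buffer already drained by the NDMA?
        {
            std::swap(wr, rd);                // swap buffers and fire the transfer
            StartCameraDMA(buf[rd], level[rd]);
        }
        else
            overrun = true;                   // data overrun (CAM_CNT bit 4)
    }

    void OnDMAComplete() { level[rd] = 0; }   // the NDMA finished draining the read buffer
};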
So I reworked melonDS's camera interface implementation to take this into account. Fixed a bunch of other issues. Had some "how the fuck did this even work?" moments.
There are still minor issues I need to iron out, but: finally, Let's Golf is fixed, and I haven't seen regressions...
More camera trouble... -- by Arisotura
So I had made a nice post about Let's Golf and how it was fixed...
But, obviously, as far as DSi camera support is concerned, it wasn't all.
I looked at another game that was running into issues with the camera: Assassin's Creed II. The Wanted feature in the menu uses the camera, if you're playing the game on a DSi, but on melonDS, it just showed nothing at all.
Quick investigation showed why.
Normally, when using the camera, games will set up the DMA with a block length of N scanlines, and a total length matching the length of the full camera picture. The DMA channel will also be set to trigger an IRQ when it's done.
However, this game does things differently. The DMA channel has no total length setting, and is just set to repeat infinitely. It transfers picture data to a small temporary buffer, from which the game reads when needed. No idea why they did it this way, but regardless, it shows why it's good to emulate things accurately. Anyway, a NDMA channel that is set to "repeat infinitely" will trigger an IRQ after each block, but due to an oversight, melonDS never triggered any IRQ.
After fixing this, I did have the camera feed showing up in the game's UI thing, but it was rolling. Heh. Couldn't have been so simple.
This turned out to be because of the timings for the camera transfer. The timings melonDS used were a big fat guess, and were way too fast for that game.
I dug up my old camera test homebrew and modified it to track camera timings from the DSi. Took some time to figure out the logic behind the numbers I was getting -- there was more time between camera DMA transfers when running in 256x192 mode than in 640x480 mode. In fact, it makes sense: internally, the camera always runs at 640x480, and the HSync/VSync are the same, but when told to output a 256x192 picture, the camera simply skips some of the scanlines.
Once I understood that fact, I was able to put together a timing model that more closely resembled the real deal. And this fixed Assassin's Creed II -- the camera preview thing was working flawlessly.
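The shape of the model is roughly this -- an illustrative sketch with a made-up constant; the real numbers come from the hardware measurements:

// Sketch of the idea: the sensor always scans 640x480 internally, so the time
// between two transferred scanlines depends on how many internal lines are
// skipped for the requested output height. kInternalLineCycles is a placeholder,
// not the value measured on hardware.
const double kInternalLineCycles = 1000.0;   // ARM9 cycles per internal 640-pixel line (made up)

double CyclesBetweenCameraDMAs(int outputHeight, int dmaIntervalLines)
{
    double internalLinesPerOutputLine = 480.0 / outputHeight;  // 2.5 for 192, 1.0 for 480
    return dmaIntervalLines * internalLinesPerOutputLine * kInternalLineCycles;
}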
I then checked Let's Golf, and... it was rolling, again.
Welcome to emulation whack-a-mole.
But, that's the thing. I have researched this bug long and hard, and I can't figure it out.
The basic issue is as follows: the game uses NDMA1 to transfer camera picture data to main RAM, but it also periodically (every 2 frames) transfers video data from main RAM to VRAM, to be displayed on the top screen, and does so using DMA3 (the old DMA, not NDMA).
Camera output isn't synchronized to the LCD framerate. This means that the DMA3 transfer may occur while a camera transfer is in progress. In melonDS, it meant that the NDMA1 transfer couldn't run because DMA3 was already running.
This highlighted a bug with how melonDS handled camera DMA: it assumed that the "try to fire a DMA transfer" operation would result in a DMA transfer effectively starting, but when that wasn't the case, things went south. I remodeled the camera FIFO to fix this problem (and raise a data overrun error instead of skipping a chunk). Let's Golf was no longer rolling, but it was just getting stuck on the same camera frame.
So clearly there was more to it...
But I can't figure it out.
I ran hardware tests, thinking that maybe NDMA1 should have priority over DMA3. But nope. NDMA can't preempt old DMA, no matter the settings.
I modified my camera homebrew to reproduce what Let's Golf does, and it gets stuck after a couple camera frames or less, with a camera data overrun error. So I can't understand why the game works fine.
I even modified the game itself, to track things like when the camera IRQ occurs, or whether data overrun errors happen, and it revealed nothing.
So, yeah. I'm absolutely stumped.
Who would have thought that a camera thing in a golf game would become a new horseman of the apocalypse...
It feels weird after the last post, but I might ship melonDS 1.0 with this broken. But hey, the fixes do fix a bunch of other DSi games.
Windows OpenGL issues fixed, finally! -- by Arisotura
I went on a quest and battled the worst enemy imaginable.
Worse than a thousand orcs.
I fought endless privacy settings screens. Warded off all sorts of bullshit offers. Resisted the temptation to throw my brain in a lake.
I installed Windows 10 on my old laptop Crepe, so I could finally fix the issues with multiple windows and OpenGL.
CasualPokePlayer greatly helped me understand the problem, too.
Basically, due to the way melonDS works, when using OpenGL, we create the GL context on the UI thread, then use it on the emu thread exclusively. This is so that the OpenGL renderers can access OpenGL without needing extra locking. Window redrawing is also done on the emu thread.
The issue was due to how I originally implemented the multi-window mode. When a second window is created, it shares its GL context with its parent window. This way, everything OpenGL related will work on all windows. Except it turned out that the parent context was created on the UI thread, then made current on the emu thread, before the child context was created. Windows doesn't like that, and thus, fails to create the child context.
So it took some reworking to get this working smoothly, but the issue is fixed now.
This means that the proper 1.0 release will be soon -- for real. This issue was the last showstopper, basically.
I also fixed a couple other issues. For one, I added a way to tell melonDS windows apart, since multi-window made things pretty confusing. So now they get a [p1:w1] type tag that says which instance and which window it is. I also fixed a bug with the way windows were parented for second multiplayer instances.
I might try to fix some other misc. stability issues, if I can reproduce them. I will also likely rework the DSi camera timing model a bit, to fix a couple games.
DSP HLE is tempting me, as a next project, but it won't be for 1.0.
DSi bugfixes -- by Arisotura
So, yeah... The current mood around melonDS is "it does most things pretty well already", as far as emulation is concerned. While it's certainly true for the DS side of things, there's still a lot to do with the DSi. And this randomly piqued my interest.
Notably, we still have a bunch of games that can't get ingame, generally hanging on something DSP-related.
So I took one of those games: Let's Golf.
When run for the first time, the game has you create a profile, by first entering your name, then taking a selfie to be used as a profile icon. All fine. However, here the game freezes before you can get to the selfie part.
So I dug into it to try and figure out why it was freezing. We knew it was trying to use the DSP, but not much beyond that... anyway, I extracted the game's DSP binary to get more insight. The DSP was just running its main loop, waiting for incoming commands. On the other hand, the ARM9 was stuck, waiting for... something.
The ARM9 was using the PDATA registers to read the DSP memory; however, that got stuck. The bug turned out to be silly: PDATA reads/writes can have a fixed length set, but the code handling them wasn't calculating that length correctly, so there wasn't enough data being returned. Hence the ARM9 getting stuck.
After fixing that, I could get to the selfie screen thingy.
However, two issues: 1) the camera preview was rolling (as shown above), and 2) when trying to take a picture, it would just freeze.
A third issue is that it runs at like 4 FPS, but that's common to anything DSP-related for now, and we'll address it later.
I decided to first try and fix the camera rolling issue, because I thought that maybe the second issue was linked.
So I first logged what was going on with the camera. What settings are used on the camera, on the interface, and on the associated DMA channel.
cam1: start, width=256 height=192 read=0024 format=0002
CAM TRANSFER: CNT=EE06 CROP=00180038/00A700C6
ARM9 NDMA1 START MODE 0B, 04004204->0221E0E0 LEN=10368 BLK=504 CNT=CB044000
So what does this all mean?
The camera is set to 256x192, nothing special there. However, the settings used on the camera interface enable cropping, which is interesting. Basically, it only retains a 144x144 section of the camera frame.
Which is confirmed by the DMA parameters. The total length is 10368 words, which is the length of a 144x144 frame (144x144 pixels at 2 bytes each is 41472 bytes, or 10368 32-bit words). The block length is 504 words, which corresponds to 7 scanlines of said frame. This matches the DMA interval setting on the camera interface, too.
So what's the issue there? If you happen to remember, I've discussed the camera interface before. Long story short, it has a FIFO buffer that holds 512 words worth of picture data, and it triggers a DMA transfer every N scanlines, with N=1..16. When modelling this system in melonDS, I had assumed that the frame height would always be a multiple of N.
But the issue became apparent after I looked at my logs: 144 is not a multiple of 7.
Which meant that it was discarding the last 4 scanlines of each camera frame. Hence the rolling.
So I decided to force a final DMA transfer at the end of each camera frame if there's anything left in the FIFO, which seems to be what the hardware does. Either way, this fixed the rolling issue entirely.
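In code terms, the fix amounts to something like this at the end of each camera frame -- a sketch of the idea, with a hypothetical helper, not the actual code:

// Hypothetical helper that requests a camera DMA transfer of the given length.
void FireCameraDMA(int lengthWords);

// End-of-frame flush: with a 144-line frame and a 7-line DMA interval,
// 144 mod 7 = 4 scanlines are still sitting in the FIFO after the last full
// block, so one final, shorter transfer is forced for them.
void OnCameraFrameEnd(int fifoLevelWords)
{
    if (fifoLevelWords > 0)
        FireCameraDMA(fifoLevelWords);
}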
But it didn't fix the second issue, which turned out to be unrelated.
Another apparent issue is that the camera input looks very squished in melonDS. It might be good to add a "preserve aspect ratio" scaling mode to avoid that.
So I had to dig deeper, once again. When trying to take a picture, the game would send a command to the DSP to tell it to scale the picture to a different size, then send it a bunch of parameters via PDATA, then... get stuck.
This one proved to be tricky to figure out. I had no real idea what was going wrong... did a lot of logging and tracing, but couldn't really figure it out.
Eventually, CasualPokePlayer enlightened me about what was being sent to the DSP.
The mechanism used to transfer command parameters is what they call the pipe. It's a simple FIFO buffer: the ARM9 writes data into the buffer, then updates the buffer write position, and does all that by using PDATA.
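As a rough mental model -- my own sketch with hypothetical field names, not the actual ucode structures:

#include <cstdint>

// Hypothetical helper wrapping a PDATA write to DSP memory.
void WriteDspWord(uint16_t addr, uint16_t value);

// Rough mental model of the pipe: a ring buffer in DSP memory plus a
// read/write position pair. The ARM9 copies parameters into the buffer, then
// updates the write position, all through PDATA.
struct DspPipe
{
    uint16_t bufferAddr;   // DSP-memory address of the data buffer
    uint16_t length;       // buffer length, in words
    uint16_t readPos;      // advanced by the DSP as it consumes data
    uint16_t writePos;     // advanced by the ARM9 after writing data
};

void PipeWrite(DspPipe& pipe, const uint16_t* data, int count)
{
    for (int i = 0; i < count; i++)
    {
        WriteDspWord(pipe.bufferAddr + pipe.writePos, data[i]);
        pipe.writePos = (pipe.writePos + 1) % pipe.length;   // wrap around the ring
    }
    // the DSP then picks up the new writePos and processes the queued parameters
}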
In this situation, the parameters that were being sent for the scaling command looked correct, but the pipe write position was zero, which, according to CasualPokePlayer, was suspicious. He was on to something.
Looking at the code that was determining that value, I found that it was broken because for some reason the pipe length was zero. So I traced writes to that variable. I found that it was part of a bunch of variables that were initialized from DSP memory, through a PDATA read.
Looking closer, there was another bug with PDATA reads that caused them to be off by one. When I fixed that, the freeze was fixed entirely.
Finally, we get to play golf with a goofy face in melonDS. What's not to love!
The game also runs quite well past the profile selfie part. Yep, it uses the DSP just to scale a picture.
Also, looking at the game's DSP binary, I had an idea.
This binary only supports two commands: scaling and yuv2rgb. There's also a separate command channel, and it has one command, for playing sound effects (presumably, the camera shutter sound).
It would be totally feasible to HLE this shit.
According to CasualPokePlayer, most of the games/apps that use the DSP seem to use one or another variant of this binary, with extra features, but same basic idea. The only exception is the DSi sound app, which uses the DSP for AAC decoding.
Obviously, it would still be worth it to pursue a DSP JIT, at least for the sake of homebrew. But the HLE route seems to also be viable, and it's piquing my interest now. Maybe not for 1.0, but I want to give it a try.
Proper melonDS 1.0 release "soon" -- by Arisotura
Apologies for taking so long to do a proper release.
Regardless, I think most, if not all, of the bugs that were found in the 1.0 RC have been fixed, so expect the proper 1.0 release soon. Hopefully.
The main issue would be the lack of functional Windows CI, but we're working on it...
We also have fun plans in mind for further releases, but we'll see once we get there.
Technical issues #2 -- by Arisotura
You might have noticed the lack of Windows builds on the nightlies page.
Sorry about it. The Windows CI is broken. Our CI expert Nadia has been on it for a while, so hopefully it should be back up again... someday.
On the flip side, the little loser stopped trying to take the server down, so that's at least one positive.
Hardware rendering, the fun -- by Arisotura
This whole thing I'm working on gives me flashbacks from blargSNES. The goal and constraints are different, though. We weren't doing upscaling on the 3DS, but also, we had no fragment shaders, so we were much more limited in what we could do.
Anyway, these days, I'm waist-deep into OpenGL. I'm determined to go further than my original approach to upscaling, and it's a lot of fun too.
I might as well talk more about that approach, and what its limitations are.
First, let's talk about how 2D layers are composited on the DS.
There are 6 basic layers: BG0, BG1, BG2, BG3, sprites (OBJ) and backdrop. Sprites are pre-rendered and treated as a flat layer (which means you can't blend a sprite with another sprite). Backdrop is a fixed color (entry 0 of the standard palette), which basically fills any space not occupied by another layer.
For each pixel, the PPU keeps track of the two topmost layers, based on priority orders.
Then, you have the BLDCNT register, which lets you choose a color effect to be applied (blending or fade effects), and the target layers it may apply to. For blending, the "1st target" is the topmost pixel, and the "2nd target" is the pixel underneath. If the layers both pixels belong to are adequately selected in BLDCNT, they will be blended together, using the coefficients in the BLDALPHA register. Fade effects work in a similar fashion, except since they only apply to the topmost pixel, there's no "2nd target".
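As a rough illustration of what that blending boils down to per pixel (written from memory of GBATEK, so treat the details as approximate rather than as melonDS's actual code):

#include <cstdint>
#include <algorithm>

// Alpha blending between the two topmost pixels. eva/evb are the BLDALPHA
// coefficients, in 1/16 units (0..16); each 5-bit channel is blended
// separately and clamped to 31.
uint16_t BlendPixels(uint16_t top, uint16_t bottom, int eva, int evb)
{
    uint16_t out = 0;
    for (int shift = 0; shift <= 10; shift += 5)   // R, G, B channels of a 15-bit color
    {
        int a = (top    >> shift) & 0x1F;
        int b = (bottom >> shift) & 0x1F;
        int c = std::min(31, (a * eva + b * evb) >> 4);
        out |= (uint16_t)(c << shift);
    }
    return out;
}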
Then you also have the window feature, which can exclude not only individual layers from a given region, but can also disable color effects. There are also a few special cases: semi-transparent sprites, bitmap sprites, and the 3D layer. Those all ignore the color effect and 1st target selections in BLDCNT, as well as the window settings.
In melonDS, the 2D renderer renders all layers according to their priority order, and keeps track of the last two values for each pixel: when writing a pixel, the previous value is pushed down to a secondary buffer. This way, at the end, the two buffers can be composited together to form the final video frame.
I've talked a bit about how 3D upscaling was done: basically, the 3D layer is replaced with a placeholder. The final compositing step is skipped, and instead, the incomplete buffer is sent to the GPU. There, a compositor shader can sample this buffer and the actual hi-res 3D layer, and finish the work. This requires keeping track of not just the last two values, but the last three values for any given pixel: if a given 3D layer pixel turns out to be fully transparent, we need to be able to composite the pixels underneath "as normal".
This approach was good in that it allowed for performant upscaling with minimal modifications to the 2D renderer. However, it was inherently limited in what was doable.
It became apparent as I started to work on hi-res display capture. My very crude implementation, built on top of that old approach, worked fine for the simpler cases like dual-screen 3D. However, it was evident that anything more complex wouldn't work.
For example, in this post, I showed a render-to-texture demo that uses display capture. I also made a similar demo that renders to a rotating BG layer rather than a 3D cube:
And this is what it looks like when upscaled with the old approach:
Basically, when detecting that a given layer is going to render a display capture, the renderer replaces it with a placeholder, like for the actual 3D layer. The placeholder values include the coordinates within the source bitmap, and the compositor shader uses them to sample the actual hi-res bitmap.
The fatal flaw here is that this calculation doesn't account for the BG layer's rotation. Hence why it looks like shit. Linear interpolation could solve this issue, but it's just one of many problems with this approach.
Another big issue was filtering.
The basic reason is that when you're applying an upscaling filter to an image, for each given position within the destination image, you're going to be looking at not only the nearest pixel from the source image, but also the surrounding pixels, in an attempt at inferring the missing detail. For example, a bilinear filter works on a 2x2 block of source pixels, while it's 4x4 for a bicubic filter, and as much as 5x5 for xBRZ.
In our case, the different graphical layers are smooshed together into a weird 3-layer cake. This makes it a major pain to perform filtering: say you're looking at a source pixel from BG2, you'd want to find neighboring BG2 pixels, but they may be at different levels within the layer cake, or they may just not be part of it at all. All in all, it's a massive pain in the ass to work with.
Back in 2020, I had attempted to implement an xBRZ filter as a bit of a demo, to see how it'd work. I had even recorded a video of it on Super Princess Peach, and it was looking pretty decent... but due to the aforementioned issues, there were always weird glitches and other oddball issues, and it was evident that this was stretching beyond the limits of the old renderer approach. The xBRZ filter shader did remain in the melonDS codebase, unused...
-
So, basically, I started working on a proper hardware-accelerated 2D renderer.
As of now, I'm able to decode individual BG layers and sprites to flat textures. The idea is that doing so will simplify filtering a whole lot: instead of having to worry about the original format of the layer, the tiles, the palettes, and so on, it would just be a matter of fetching pixels from a flat texture.
Here's an example of sprite rendering. They are first pre-rendered to an atlas texture, then they're placed on a hi-res sprite layer.
This type of renderer allows for other nifty improvements too: for example, hi-res rotation/scaling.
Next up is going to be rendering BG layers to similar hi-res layers. Once it's all done, the layers can be sent to the compositor shader and the job can be finished. I also have to think of provisions to deal with possible mid-frame setup changes. Anyone remember that midframe-OAM-modifying foodie game?
There will also be some work on the 3D renderers, to add support for things like render-to-texture, but also possibly adding 3D enhancements such as texture filtering.
-
I can hear the people already, "why make this with OpenGL, that's old, you should use Vulkan".
Yeah, OpenGL is no longer getting updates, but it's a stable and mature API, and it isn't going to be deprecated any time soon. For now, I see no reason to stop using it immediately.
However, I'm also reworking the way renderers work in melonDS.
Back then, Generic made changes to the system, so he could add different 2D renderers for the Switch port: a version of the software renderer that uses NEON SIMD, and a hardware-accelerated renderer that uses Deko3D.
I'm building upon this, but I want to also integrate things better: for example, figuring out a way to couple the 2D and 3D renderers better, and generally a cleaner API.
The idea is to also make it easier to implement different renderers. For example, the current OpenGL renderer is made with fast upscaling in mind, but we could have different renderers for mobile platforms (ie. OpenGL ES), that are first and foremost aimed at just being fast. Of course, we could also have a Vulkan renderer, or Direct3D, Metal, whatever you like.
melonDS 1.1 is out! -- by Arisotura
As promised, here is the new release: melonDS 1.1.
So, what's new in this release?
EDIT - there was an issue with the release builds that had been posted, so if your JIT option is greyed out and you're not using an x64 Mac, please redownload the release.
DSP HLE
This is going to be a big change for DSi gamers out there.
If you've been playing DSi titles in melonDS, you may have noticed that sometimes they run very slow. Single-digit framerates. Wouldn't be a big deal if melonDS was always this slow, but obviously, it generally performs much better, so this sticks out like a sore thumb.
This is because those titles use the DSi's DSP. What is the DSP, you ask? A specific-purpose (read: weird) processor that doesn't actually do much besides being very annoying and resource-intensive to emulate. They use it for such tasks as downscaling pictures or playing a camera shutter sound when you take a picture.
With help from CasualPokePlayer, we were able to figure out the 3 main classes of DSP ucodes those games use, determine their functionality, and implement HLE equivalents in melonDS. Thus, those wonderful DSP features can be emulated without utterly wrecking performance.
DSP HLE is a setting, which you will find in the emulation settings dialog, DSi-mode tab. It is enabled by default.
Note that if it fails to recognize a game's DSP ucode, it will fall back to LLE. Similarly, homebrew ucodes will also fall back to LLE. There's the idea of adding a DSP JIT to help with this, but it's not a very high priority right now.
DSi microphone input
This was one of the last big missing features in DSi mode, and it is now implemented, thus further closing the gap between DS and DSi emulation in melonDS.
The way external microphone input works was also changed: instead of keeping your mic open at all times, melonDS will only open it when needed. This should help under certain circumstances, such as when using Bluetooth audio.
High-quality audio resampling
The implementation of DSP audio involved several changes to the way melonDS produces sound. Namely, melonDS used to output at 32 KHz, but with the new DSi audio hardware, this was changed to 47 KHz. I had added in some simple resampling, so melonDS would produce 47 KHz audio in all cases. But this caused audio quality issues for a number of people.
Nadia took the matter in her hands and replaced my crude resampler with a high-quality blip-buf resampler. Not only are all those problems eliminated, but it also means the melonDS core now outputs at a nice 48 KHz frequency, much easier for frontends to deal with than the previous weird numbers.
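For those unfamiliar with it, blip_buf is blargg's little band-limited resampling library. Driving it looks roughly like this -- a generic usage sketch with placeholder rates, not melonDS's actual integration:

#include "blip_buf.h"

// Generic blip_buf usage sketch: deltas are added at emulated-clock timestamps,
// and the library produces band-limited output samples at the target rate.
blip_t* blip;
int prev = 0;

void InitResampler()
{
    blip = blip_new(48000 / 10);                 // room for up to 1/10 second of output
    blip_set_rates(blip, 32768.0, 48000.0);      // input rate is a placeholder, not the real DS value
}

void PushInputSample(unsigned time, int sample)  // time in input-clock units within the frame
{
    blip_add_delta(blip, time, sample - prev);   // blip_buf works on deltas, not raw samples
    prev = sample;
}

int EndFrame(unsigned frameLength, short* out, int maxSamples)
{
    blip_end_frame(blip, frameLength);           // make this frame's samples available
    return blip_read_samples(blip, out, maxSamples, 0);  // 0 = mono
}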
Cheat database support
If you've used cheats in melonDS, surely you've found it inconvenient to have to manually enter them into the editor. But this is no more: you can now grab the latest R4 cheat database (usrcheat.dat) for example, and import your cheat codes from that.
The cheat import dialog will show you which game entries match your current game, show the cheat codes they contain, and let you select which codes to import. You can also choose whether to clear any previously existing cheat codes or to keep them when importing new codes.
melonDS's cheat code system was also improved in order to fully preserve the structure found in usrcheat.dat. Categories and cheat codes can now have descriptions, categories have an option to allow only one code in them to be enabled, and codes can be created at the root, without having to be in a category.
The cheat file format (.mch) was also modified to add support for this. The parser is backwards-compatible, so it will recognize old .mch files just fine. However, new files won't be able to be recognized by older melonDS versions.
The cheat editor UI was also revamped to add support for the new functionality, and generally be more flexible and easier to work with. For example, it's now possible to reorder your cheat codes by moving them around in the list.
Compute shader renderer fix
Those of you who have tried the compute shader renderer may have noticed that it could start to glitch out at really high resolutions. This was due to running out of tile space.
We merged FireNX70's pull request, which implements tile size scaling in order to alleviate this problem. This means the renderer should now be able to go pretty high in resolution without issues.
Wayland OpenGL fix
If you use Wayland and have tried to use the OpenGL renderers, you may have noticed that it made the melonDS window glitchy, especially when using hiDPI scaling.
I noticed that glitch too, but had absolutely no idea where to start looking for a fix. So I kinda just... didn't use OpenGL, and put that on the backburner.
Until a while ago, when I felt like trying modern PCSX2. I was impressed by how smoothly it ran, compared to what it was like back in 2007... but more importantly, I realized that it was rendering 3D graphics in its main window alongside UI elements, that it uses Qt and OpenGL just like melonDS, and that it was flawless, no weird glitchiness.
So I went and asked the PCSX2 team about it. Turns out they originally took their OpenGL context code from DuckStation, but improved upon it. Funnily enough, melonDS's context code also comes from there. Small world.
In the end, the PCSX2 folks told me about what they did to fix Wayland issues. I tried one of the fixes that involved just two lines of code, and... it completely fixed the glitchiness in melonDS. So, thanks there!
BSD CI
We now have CI for FreeBSD, OpenBSD and NetBSD, courtesy of Rayyan and Izder456. This means we're able to provide builds for those platforms, too.
Adjustments were also done to the JIT recompiler so it will work on those platforms.
Fixing a bunch of nasty bugs
For example: it has been reported that melonDS 1.0 could randomly crash after a while if multiple instances were opened. Kind of a problem, given that local multiplayer is one of melonDS's selling points. So, this bug has been fixed.
Another fun example: it sometimes occurred that melonDS wouldn't output any sound, for some mysterious reason. As it was random and seemingly had a pretty low chance of occurring, I was really not looking forward to trying to reproduce and fix it... But Nadia saved the day by providing a build that exhibited this issue 100% of the time. With a reliable way to reproduce the bug, I was able to track it down and it was fixed.
Nadia also fixed another bug that caused possible crashes that appeared to be JIT-related, but turned out to be entirely unrelated.
All in all, melonDS 1.1 should be more stable and reliable.
There's also the usual slew of misc bugfixes and improvements.
However, we realized that there's a bug with the JIT that causes a crash on x86 Macs. We will do our best to fix this, but in the meantime, we had to disable that setting on that platform.
Future plans
The hi-res display capture stuff will be for release 1.2. Even if I could rush to finish it for 1.1, it wouldn't be wise. Something of this scope will need comprehensive testing.
I also have more ideas that will also be for further releases. I want to experiment with RTCom support, netplay, a different type of UI, ...
And then there's also changes I have in mind for this website. The current layout was nice in the early days, but there's a lot of posts now, and it's hard to find specific posts. I'd also want the homepage to present information in a more attractive manner, make it more evident what the latest melonDS version is, maybe have less outdated screenshots, ... so much to do.
Anyway, you can grab melonDS 1.1 on the downloads page, as usual.
You can also donate to the project if you want, that's always appreciated.
Hi-res display capture: we're getting there! -- by Arisotura
Sneak peek of the blackmagic3 branch:
(click them for full-res versions)
Those are both dual-screen 3D scenes, but notice how both screens are nice and smooth and hi-res.
Now, how far along are we actually with this?
As I said in the previous post, this is an improved version of the old renderer, which was based on a simple but limited approach. At the time, it was easy enough to hack that on top of the existing 2D engine. But now, we're reaching the limits of what is possible with this approach. So, consider this a first step. The second step will be to build a proper OpenGL-powered 2D engine, which will open up more crazy possibilities as far as graphical enhancements go.
I don't know if this first step will make it in melonDS 1.1, or if it will be for 1.2. Turns out, this is already a big undertaking.
I added code to keep track of which VRAM blocks are used for display captures. It's not quite finished; it's missing some details, like freeing capture buffers that are no longer in use, or syncing them with emulated VRAM if the CPU tries to access VRAM.
It also needs extensive testing and optimization. For this first iteration, for once, I tried to actually build something that works, rather than spend too much time trying to imagine the perfect design. So, basically, it works, but it's inefficient... Of course, the sheer complexity of VRAM mapping on the DS doesn't help at all. Do you remember? You can make the VRAM banks overlap!
So, yeah. Even if we end up making a new renderer, all this effort won't go to waste: we will have the required apparatus for hi-res display capture.
So far, this renderer does its thing. It detects when a display capture is used, and replaces it with an adequate hi-res version. For the typical use cases, like dual-screen 3D or motion blur, it does the job quite well.
However, I made a demo of "render-to-rotscale-BG": like my render-to-texture demo in the previous post, but instead of rendering the captured texture on the faces of a bigger cube, it is simply rendered on a rotating 128x128 BG layer. Nothing very fancy, but those demos serve to test the various possibilities display capture offers, and some games also do similar things.
Anyway, this render-to-rotscale demo looks like crap when upscaling is used. It's because the renderer's shader works with the assumption that display capture buffers will be drawn normally and not transformed. The shader goes from original-resolution coordinates and interpolates between them in order to sample higher-resolution images. In the case of a rotated/scaled BG layer, the interpolation would need to take the BG layer's transform matrix into account.
I decided to postpone this to the second step. Just think of the possibilities the improved renderer would offer: hi-res rotation/scale, antialiasing, filtering on layers and sprites, ...
And the render-to-texture demo won't even work for now. This one is tricky: it will require some communication between the 2D and 3D renderers. It might also require reworking the way texturing is done: for example, my old OpenGL renderer just streams raw VRAM to the GPU and lets the shader do the decoding. It's lazy, but it was a simple way to get texturing working. But again, a proper texture cache here would open up more enhancement possibilities. Generic did use such a cache in his compute shader renderer, so it could probably serve for both renderers.
That's about it for what this renderer can do, for now. I also have a lot of cleanup and tying-loose-ends to do. I made a mess.
Stay tuned!
Display capture: oh, the fun! -- by Arisotura
This is going to be a juicy technical post related to what I'm working on at the moment.
Basically, if you've used 3D upscaling in melonDS, you know that there are games where it doesn't work. For example, games that display 3D graphics on both screens: they will flicker between the high-resolution picture and a low-resolution version. Or in other instances, it might just not work at all, and all you get is low-res graphics.
It's linked to the way the DS video hardware works. There are two 2D tile engines, but there is only one 3D engine. The output from that 3D engine is sent to the main 2D engine, where it is treated as BG0. You can also change which screen each 2D engine is connected to. But you can only render 3D graphics on one screen.
So how do those games render 3D graphics on both screens?
This is where display capture comes in. The principle is as follows: you render a 3D scene to the top screen, all while capturing that frame to, say, VRAM bank B. On the next frame, you switch screens and render a 3D scene to the bottom screen. Meanwhile, the top screen will display the previously captured frame from VRAM bank B, and the frame you're rendering will get captured to VRAM bank C. On the next frame, you render 3D graphics to the top screen again, and the bottom screen displays the capture from VRAM bank C. And so on.
This way, you're effectively rendering 3D graphics to both screens, albeit at 30 FPS. This is a typical use case for display capture, but not the only possibility.
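In pseudo-code, the per-frame dance looks something like this. Every name here is a made-up helper standing in for the real DISPCNT / DISPCAPCNT / POWCNT1 register writes:

// Illustrative sketch of dual-screen 3D via display capture.
enum Screen   { TOP_SCREEN, BOTTOM_SCREEN };
enum VRAMBank { VRAM_BANK_B, VRAM_BANK_C };

void AssignMainEngineTo(Screen s);      // swap which screen the main 2D engine drives
void Render3DScene(Screen s);           // draw this screen's 3D scene (fed to the main engine as BG0)
void CaptureTo(VRAMBank b);             // enable display capture into the given bank
void DisplayVRAM(Screen s, VRAMBank b); // show a raw VRAM bitmap on the other screen

void OnVBlank(int frame)
{
    if (frame & 1)
    {
        AssignMainEngineTo(TOP_SCREEN);            // 3D goes to the top screen this frame
        Render3DScene(TOP_SCREEN);
        CaptureTo(VRAM_BANK_B);                    // capture it as it is displayed
        DisplayVRAM(BOTTOM_SCREEN, VRAM_BANK_C);   // bottom screen replays last frame's capture
    }
    else
    {
        AssignMainEngineTo(BOTTOM_SCREEN);
        Render3DScene(BOTTOM_SCREEN);
        CaptureTo(VRAM_BANK_C);
        DisplayVRAM(TOP_SCREEN, VRAM_BANK_B);
    }
}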
Display capture can receive input from two sources: source A, which is either the main 2D engine output or the raw 3D engine output, and source B, which is either a VRAM bank or the main memory display FIFO. Then you can either select source A, or source B, or blend the two together. The result from this will be written to the selected output VRAM bank. You can also choose to capture the entire screen or a region of it (128x128, 256x64 or 256x128).
All in all, quite an interesting feature. You can use it to do motion blur effects in hardware, or even to render graphics to a texture. Some games even do software processing on captured frames to apply custom effects. It is also used by the aging cart to verify the video hardware: it renders a scene and checksums the captured output.
For example, here's a demo of render-to-texture I just put together, based on the libnds examples:
(video file)
The way this is done isn't very different from how dual-screen 3D is done.
Anyway, this stuff is very related to what I'm working on, so I'm going to explain a bit how upscaling is done in melonDS.
When I implemented the OpenGL renderer, I first followed the same approach as other emulators: render 3D graphics with OpenGL, read out the framebuffer and send it to the 2D renderer. Simple. Then, in order to support upscaling, I just had to increase the resolution of the 3D framebuffer. To compensate for this, the 2D renderer would push out more pixels.
The issue was that it was suboptimal: if I pushed the scaling factor to 4x, it would get pretty slow. On one hand, in the 2D renderer, pushing out more pixels takes more CPU time. On the other hand, on a PC, reading back from GPU memory is slow. The overhead tends to grow quadratically when you increase the output resolution.
So instead, I went for a different approach. The 2D renderer renders at 256x192, but the 3D layer is replaced with placeholder values. This incomplete framebuffer is then sent to the GPU along with the high-resolution 3D framebuffer, and the two are spliced together. The final high-resolution output can be sent straight to the screen, never leaving GPU memory. This approach is a lot faster than the previous one.
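Conceptually, the compositing step then does something like this for every output pixel -- a simplified sketch in plain C++; the real thing is a fragment shader, and the marker format here is made up:

#include <cstdint>

// The low-res 2D output contains a placeholder value wherever the 3D layer
// would be; for those pixels, the hi-res 3D framebuffer is sampled instead.
const uint32_t kPlaceholder3D = 0xFF000001;   // hypothetical marker value

uint32_t CompositePixel(uint32_t lowres2D, uint32_t hires3D, uint32_t below)
{
    if (lowres2D != kPlaceholder3D)
        return lowres2D;          // plain 2D pixel: keep it as-is
    if ((hires3D >> 24) != 0)     // 3D pixel has nonzero alpha
        return hires3D;
    return below;                 // 3D was transparent: fall back to the layer underneath
}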
This is what was originally implemented in melonDS 0.8. Since this rendering method bypassed the regular frame presentation logic, it was a bit of a hack - the final compositing step was done straight in the frontend, for example. The renderer in modern melonDS is a somewhat more refined version of this, but the same basic idea remains.
There is also an issue with this approach: display capture. The initial solution was to downscale the GPU framebuffer to 256x192 and read that back, so it could be stored in the emulated VRAM, "as normal". Due to going through the emulated VRAM, the captured frame has to be at the original resolution. This is why upscaling in melonDS has those issues.
To work around this, one would need to detect when a VRAM bank is being used as a destination for a display capture, and replace it with a high-resolution version in further frames, in the same way as the 3D layer itself. But obviously, it's more complicated than that. There are several issues. For one, the game could still decide to access a captured frame in VRAM (to read it back or to do custom processing), so that needs to be handled. There are also several different ways a captured frame can be reused: as a bitmap BG layer (BG2 or BG3), as a bunch of bitmap sprites, or even as a texture in 3D graphics. This is kinda why it has been postponed for so long.
There are even more annoying details, if we consider all the possibilities: while an API like OpenGL gives you an identifier for a texture, and you can only use it within its bounds, the DS isn't like that. When you specify a texture on the 3D engine, you're really just giving it a VRAM address. You could technically point it in the middle of a previously captured frame, or before... Tricky to work with. I made a few demos (like the aforementioned render-to-texture demo) to exercise display capture, but the amount of possibilities makes it tricky.
So I'm basically trying to add support for high-resolution display capture.
The first step is to make a separate 2D renderer for OpenGL, which will go with the OpenGL 3D renderers. The goal is to remove the GLCompositor wart and the other hacks, and to integrate the OpenGL rendering functionality more cleanly (and thus make it easier to implement other renderers in the future, too).
I'm also reworking this compositor to work around the original limitations, and make it easier to splice in high-resolution captured frames. I have a pretty good roadmap as far as the 2D renderer is concerned. For 3D, I'll have to see what I can do...
However, there will be more steps to this. I'm acutely aware of the limitations of the current approach: for example, it doesn't lend itself to applying filters to 2D graphics. I tried in the past, but kept running into issues.
There are several more visual improvements we could add to melonDS - 2D layer/sprite filtering, 3D texture filtering, etc... Thus, the second step of this adventure will be to rework the 2D renderer to do more of the actual rendering work on the GPU. A bit like the hardware renderer I had made for blargSNES a decade ago. This approach would make it easier to apply enhancements to 2D assets or even replace them with better versions entirely, much like user texture packs.
This is not entirely unlike the Deko3D renderer Generic made for the Switch port.
But hey, one step at a time... First, I want to get high-resolution capture working.
There's one detail that Nadia wants to fix before we can release melonDS 1.1. Depending how long this takes, and how long I take, 1.1 might include my improvements too. If not, that will be for 1.2. We'll see.
Happy birthday melonDS! -- by Arisotura
Those of you who know your melonDS lore know that today is a special day. melonDS is 9 years old!
...hey, I don't control this. 9 is brown. I don't make the rules.
Anyway, yeah. 9 years of melonDS. That's quite the achievement. Sometimes I don't realize it has been so long...
-
As far as I'm concerned, there hasn't been a lot -- 2025 has had a real shitty start for me. A lot of stuff came forward that has been really rough.
On the flip side, it has also been the occasion to get a fresh start. I was told about IFS therapy in March, and I was able to get started. It has been intense, but so far it has been way more effective than previous attempts at therapy.
I'm hopeful for 2026 to be a better year. I'm also going to hopefully get started with a new job which is looking pretty cool, so that's nice too.
-
As far as melonDS is concerned, I have several ideas. First of all, we'll release melonDS 1.1 pretty soon, with a nice bundle of fixes and improvements.
I also have ideas for further releases. RTCom support, netplay, ... there's a lot of potential. I guess we can also look at RetroAchievements, since that seems to be a popular request. There are also the long-standing issues we should finally address, like the lack of upscaling in dual-screen 3D scenes.
Basically, no shortage of things to do. It's just a matter of actually doing them. You know how that goes...
I also plan some upgrades to the site. I have some basic ideas for a homepage redesign, for updating the information that is presented, and presenting it in a nicer way... Some organization for the blog would be nice too, like splitting the posts into categories: release posts, technical shito, ...
I also have something else in mind: adding in a wiki, to host all information related to melonDS. The FAQ, help pages specific to certain topics, maybe compatibility info, ...
And maybe, just maybe, a screenshot section that isn't just a few outdated pics. Maybe that could go on the wiki too...
-
Regardless, happy birthday melonDS! And thank you all, you have helped make this possible!
The joys of programming -- by Arisotura
There have been reports of melonDS 1.0 crashing at random when running multiple instances. This is kind of a problem, especially when local multiplayer is one of melonDS's selling points.
So I went and tried to reproduce it. I could get the 1.0 and 1.0RC builds to crash just by having two instances open, even if they weren't running anything. But I couldn't get my dev build to crash. I thought, great, one of those cursed bugs that don't manifest when you try to debug them. I think our team knows all about this.
Not having too many choices, I took the 1.0 release build and started hacking away at it with a hex editor. Basic idea being, if it crashes with no ROM loaded, the emulator core isn't the culprit, and there aren't that many places to look. So I could nop out function calls until it stopped crashing.
In the end, I ended up rediscovering a bug that I had already fixed.
The SDL joystick API isn't thread-safe. When running multiple emulator instances in 1.0, this API will get called from several different threads, since each instance gets its own emulation thread. I had already addressed this 2.5 months ago, by adding adequate locking.
I guess at the time of 1.0, it slipped through the cracks due to its random nature, as with many threading-related bugs.
Regardless, if you know your melonDS lore, you know what is coming soon. There will be a new release which will include this fix, among other fun shit. In the meantime, you can use the nightly builds.
Sneak peek -- by Arisotura
Just showing off some of what I've been working on lately:
This is adjacent to something else I want to work on, but it's also been in the popular request list, since having to manually enter cheat codes into melonDS isn't very convenient: would be much easier to just import them from an existing cheat code database, like the R4 cheat database (also known as usrcheat.dat).
It's also something I've long thought of doing. I had looked at usrcheat.dat and figured the format wasn't very complicated. It was just, you know, actually making myself do it... typical ADHD stuff. But lately I felt like giving it a look. I first wrote a quick PHP parser to make sure I'd gotten the usrcheat.dat format right, then started implementing it into melonDS.
For now, it's in a separate branch, because it's still quite buggy and unfinished, but it's getting there.
The main goal is complete, which is, it parses usrcheat.dat, extracts the relevant entries, and imports them into your game's cheat file. By default, it only shows the game entry which matches your game by checksum, which is largely guaranteed to be the correct one. However, if none is found, it shows all the game entries that match by game code.
It also shows a bunch of information about the codes and lets you choose which ones you want to import. By default, all of them will be imported.
This part is mostly done, besides some minor UI/layout bugs.
The rest is adding new functionality to the existing cheat code editor. I modified the melonDS cheat file format (the .mch files) to add support for the extra information usrcheat.dat provides, all while keeping it backwards compatible, so older .mch files will still load just fine. But, of course, I also need to expose that in the UI somehow.
I also found a bug: if you delete a cheat code, the next one in the list gets its code list erased, due to the way the UI works. Not great.
I'm thinking of changing the way this works: selecting a cheat code would show the information in disabled/read-only fields, you'd click an "Edit" button to modify those fields, then you'd get "Save" or "Cancel" buttons... This would avoid much of the oddities of the current interface.
The rest is a bunch of code cleanup...
Stay tuned!
Not much to say lately... -- by Arisotura
Yeah...
I guess real life matters don't really help. At least my mental health is genuinely improving, so there's that.
But as far as my projects are concerned, this seems to be one of those times where I just don't feel like coding. Well, not quite, I have been committing some stuff to melonDS... just not anything that would be worthy of a juicy blog post. I also had another project: porting my old SNES emulator to the WiiU gamepad. I got it to a point where it runs and displays some graphics, but for now I don't seem motivated to continue working on it...
But I still have some ideas for melonDS.
One idea I had a while ago: using the WFC connection presets to store settings for different altWFC servers. For example, using connection 1 for Wiimmfi, connection 2 for Kaeru, etc... and melonDS would provide some way of switching between them. I don't know how useful it would really be, or how feasible it would be with regard to patching requirements, but it could be interesting.
Another idea would be RTCom support. Basically RTCom is a protocol that is used on the 3DS in DS mode to add support for analog sticks and such. It involves game patches and ARM11-side patches to forward input data to RTC registers. The annoying aspect of this is that each game seems to have its own ARM11-side patch, and I don't really look forward to implementing a whole ARM11 interpreter just for this.
But maybe input enhancements, be it RTCom stuff or niche GBA slot addons, would make a good use case for Lua scripting, or some kind of plugin system... I don't know.
There are other big ideas, of course. The planned UI redesign, the netplay stuff, website changes, ...
Oh well.
Fix for macOS builds -- by Arisotura
Not a lot to talk about these days, as far as melonDS is concerned. I have some ideas, some big, some less big, but I'm also being my usual ADHD self.
Anyway, there have been complaints about the macOS builds we distribute: they come as nested zip files.
Apparently, those people aren't great fans of Russian dolls...
The issue is caused by the way our macOS CI works. I'd have to ask Nadia, but I think there isn't really an easy fix for this. So the builds available on GitHub will have to stay nested.
However, since I control this site's server, I can fix the issue here. So now, all the macOS builds on the melonDS site have been fixed: they're no longer nested. Similarly, new builds should also be fixed as they're uploaded. Let us know if anything goes wrong with this.
DSP HLE: reaching the finish line -- by Arisotura
As of the last post, what remained to be added to DSP HLE was the G711 ucode and the basic audio functions common to all ucodes.
G711 was rather trivial to add. The rest was... more involved. The reason is that adding the new audio features required reworking some of melonDS's audio support.
melonDS was originally built around the DS hardware. As far as sound is concerned, the DS has a simple 16-channel mixer, and the output from said mixer is degraded to 10-bit and PWM'd to the speakers. Microphone input is even simpler: the mic is connected to the touchscreen controller's AUX input. Reading said input gives you the current mic level, and you need to set up a timer to manually sample it at the frequency you want.
So obviously, in the early days, melonDS was built around that design.
The DSi, however, provides a less archaic system for producing and acquiring sound.
Namely, it adds a TI audio amplifier, which is similar to the one found in the Wii U gamepad, for example. This audio amplifier also doubles as a touchscreen controller, for DS backwards compatibility.
Instead of the old PWM stuff, output from the DS mixer is sent to the amplifier over an I2S interface. There is some extra hardware to support that interface. It is possible to set the sampling frequency to 32 kHz (like the DS) or 47 kHz. There is also a ratio for mixing outputs from the DS mixer and the DSP.
Microphone input also goes through an I2S interface. The DSi provides hardware to automatically sample mic input at a preset frequency, and it's even possible to automate it entirely with an NDMA transfer. All in all, quite the upgrade compared to the DS. Oh, and mic input is also routed to the DSP, and the ucodes have basic functions for mic sampling too.
All fine and dandy...
So I based my work on CasualPokePlayer's PR, which implemented some of the new DSi audio functionality. I added support for mixing in DSP audio. The "play sound" command in itself was trivial to implement in HLE, once there was something in place to actually output DSP audio.
I added support for the different I2S sampling frequencies. The 47 kHz setting doesn't affect DS mixer output beyond improving quality somewhat, but it affects the rate at which DSP output plays.
For this, I changed melonDS to always produce audio at 47 kHz (instead of 32 kHz). In 32 kHz mode (or in DS mode), audio output will be resampled to 47 kHz. It was the easiest way to deal with this sort of feature.
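The resampling itself is nothing fancy; a minimal linear-interpolation sketch (not the actual melonDS resampler) would look like this:

// Minimal linear-interpolation resampler sketch (not the actual melonDS code).
// Converts a block of samples from inRate (eg. 32000) to outRate (eg. 47000).
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<int16_t> Resample(const std::vector<int16_t>& in, double inRate, double outRate)
{
    std::vector<int16_t> out;
    double step = inRate / outRate;                       // input samples consumed per output sample
    for (double pos = 0.0; pos + 1.0 < (double)in.size(); pos += step)
    {
        size_t i = (size_t)pos;
        double frac = pos - (double)i;
        out.push_back((int16_t)(in[i] * (1.0 - frac) + in[i + 1] * frac));
    }
    return out;
}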
Then, I added support for audio output to Teakra. I had to record the audio output to a file to check whether it was working correctly, because it's just too slow with DSP LLE, but a couple bugfixes later, it was working.
Then came the turn of microphone input. The way it's done on the DSP is a bit weird. Given the "play sound" command will DMA sound data from ARM9 memory, I thought the mic commands would work in a similar way, but no. Instead, they just continually write mic input to a circular buffer in DSP memory, and that's about it. Then you need to set up a timer to periodically read that circular buffer with PDATA.
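In HLE terms, this is about as simple as it sounds; roughly (hypothetical names, not the actual melonDS code):

// Rough sketch of the mic sampling behavior described above (hypothetical names).
// Every new mic sample just gets written into a circular buffer in DSP memory;
// the ARM9 is expected to drain it periodically through PDATA.
#include <cstdint>

void OnMicSample(uint16_t* dspMem, uint32_t bufAddr, uint32_t bufLen,
                 uint32_t& writePos, int16_t sample)
{
    dspMem[bufAddr + writePos] = (uint16_t)sample;  // store the new sample
    writePos = (writePos + 1) % bufLen;             // wrap around: it's a circular buffer
}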
But there was more work to be done around mic input.
I implemented a centralized hub for mic functionality, so buffers and logic wouldn't be duplicated in several places in melonDS. I also changed the way mic input data is fed into melonDS. With the hub in place, I could also add logic for starting and stopping mic recording. This way, when using an external microphone, it's possible to only request the mic when the game needs it, instead of hogging it all the time.
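Conceptually, the hub is just a single owner of the mic buffer that the various consumers (DS mic, DSi sampling hardware, DSP) pull from. Something along these lines, heavily simplified and with made-up names:

// Simplified sketch of a centralized mic hub (hypothetical names, not the actual code).
#include <cstddef>
#include <cstdint>
#include <deque>

class MicHub
{
public:
    void Start() { if (m_users++ == 0) OpenExternalMic(); }   // only grab the mic when someone needs it
    void Stop()  { if (--m_users == 0) CloseExternalMic(); }

    void Feed(const int16_t* samples, size_t count)           // input side: frontend pushes samples here
    {
        m_buffer.insert(m_buffer.end(), samples, samples + count);
    }

    int16_t ReadSample()                                      // output side: DS/DSi/DSP paths pull from here
    {
        if (m_buffer.empty()) return 0;
        int16_t s = m_buffer.front();
        m_buffer.pop_front();
        return s;
    }

private:
    void OpenExternalMic() {}   // platform-specific, left empty here
    void CloseExternalMic() {}

    int m_users = 0;
    std::deque<int16_t> m_buffer;
};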
Besides that, it wasn't very hard to make this new mic input system work, including in DSi mode. I made some changes to CasualPokePlayer's code to make it more accurate to hardware. I also added the mic sampling commands to DSP HLE, and added BTDMP input support to Teakra so it can also receive mic input.
The tricky part was getting input from an external mic to play nicely and smoothly, but I got there.
There is only one problem left: libnds homebrew in DSi mode will have noisy mic input. This is a timing issue: libnds uses a really oddball way to sample the DSi mic (in order to keep it compatible with the old DS APIs), where it will repeatedly disable, flush and re-enable the mic interface, and wait to receive one sample. The wait is a polling loop with a timeout counter, but it runs too fast on melonDS, so sometimes it times out before the sample comes in.
So, this is mostly it. What's left is a bunch of cleanup, misc testing, and adding the new state to savestates.
DSP HLE is done. It should allow most DSP titles to play at decent speeds; however, if a game uses an unrecognized DSP ucode, it will fall back to LLE.
The dsp_hle branch also went beyond the original scope of DSP HLE, but it's all good.
DSi mode finally gets microphone input. I believe this was the last big missing feature, so this brings DSi mode on par with DS mode.
The last big thing that might need to be added would be a DSP JIT, if there's demand for it. I might look into it at some point; it would be an occasion to learn new stuff.
I'm thinking those changes might warrant a lil' release.
Having fun with DSP HLE -- by Arisotura
If you pay attention to the melonDS repo, you might have noticed the new branch: dsp_hle.
And this branch is starting to see some results...
These screenshots might not look too impressive at first glance, but pay closer attention to the framerates. Those would be more like 4 FPS with DSP LLE.
Also, we still haven't fixed the timing issue which affects the DSi sound app -- I had to hack the cache timings to get it to run, and I haven't committed that. We'll fix this in a better way, I promise.
So, how does this all work?
CasualPokePlayer has done research on the different DSP ucodes in existence. So far, three main classes have been identified:
• AAC SDK: provides an AAC decoder. The DSi sound app uses an early version of this ucode. It is also found in a couple other titles.
• Graphics SDK: provides functions for scaling bitmaps and converting YUV pictures to 15-bit RGB.
• G711 SDK: provides a G711 (A-law and µ-law) codec.
All ucodes also share basic audio functionality, allowing them to play simple sounds (for example, a camera shutter sound) and record microphone input. They are fairly simple, but emulating them in melonDS will require some reworking of the audio system.
It's not like this DSP is used to its fullest here, but hey, it's something.
I first started working on the graphics ucode, which is used in Let's Golf. It's used to scale down the camera picture from 144x144 to 48x48 -- you can see it in the screenshot above.
There's always something satisfying about reverse-engineering things and figuring out how they work. I wrote a little test homebrew that loads a graphics SDK ucode and tests all the aspects of the bitmap scaling command (and another one for the yuv2rgb command). I even went as far as to work out the scaling algorithms so I could replicate them to the pixel. I also added delays to roughly simulate how long the graphics commands would take on hardware.
The way the scaling command works is a bit peculiar. You give it the size of your source bitmap, then you give it a rectangle within said bitmap, and X/Y scaling factors. Then it takes the specified rectangle of the source bitmap and scales it using the factors.
You can also specify a filtering mode. Nearest neighbor, bilinear and bicubic are supported. Nothing special to say about them, other than the fact bicubic uses somewhat atypical equations.
However, there's also a fourth filtering mode: one-third. This mode does what it says on the tin: it ignores the provided X/Y scaling factors and scales the provided bitmap down to one third of its original size. The way it works is fairly simple: for every 3x3 block of source pixels, it averages the 8 outer pixels' values and uses that as the destination value. This also means the source dimensions must be multiples of 3.
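In code, the one-third mode boils down to something like this (a sketch of my understanding of it, not the ucode's actual implementation; I'm treating the bitmap as a single 8-bit channel for simplicity):

// Sketch of the one-third scaling mode (my understanding, not the actual ucode).
// For simplicity this treats the bitmap as a single 8-bit channel; the real thing
// would do this per color component.
#include <cstdint>

void ScaleOneThird(const uint8_t* src, int srcW, int srcH, uint8_t* dst)
{
    // source dimensions must be multiples of 3
    int dstW = srcW / 3;
    int dstH = srcH / 3;

    for (int dy = 0; dy < dstH; dy++)
    {
        for (int dx = 0; dx < dstW; dx++)
        {
            int sx = dx * 3, sy = dy * 3;
            int sum = 0;
            for (int y = 0; y < 3; y++)
                for (int x = 0; x < 3; x++)
                    if (!(x == 1 && y == 1))                  // skip the center pixel
                        sum += src[(sy + y) * srcW + (sx + x)];
            dst[dy * dstW + dx] = (uint8_t)(sum / 8);         // average of the 8 outer pixels
        }
    }
}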
Interestingly, Let's Golf would be a perfect candidate for one-third scaling (144x144 down to 48x48 is exactly one third), but they chose bicubic instead.
After this, I felt like looking at the DSi sound app.
The AAC ucode doesn't use the pipe to receive commands. That's why it was working in LLE, despite the bugs that broke the pipe. Instead, commands and parameters are just sent through the CMD1 register. Also, the decoded sound data isn't sent to the audio output directly, but transferred to ARM9 RAM. Which is why the DSi sound app was somehow functional (albeit very slow).
My first step was to figure out what the command parameters mean and what kind of data is sent to the AAC ucode, so I could understand how to use it. I didn't exactly feel like replicating an entire AAC decoder in my HLE implementation, so I went with faad2 instead.
This has been the occasion for me to learn about AAC, MP4 files and such. I even modified Gericom's DSiDsp homebrew to work with the DSi AAC ucode, turning it into the worst MP4 audio player ever.
DSiDsp is an example of AAC decoding, but it's made to work with the 3DS AAC ucode. Besides the differences in the communication protocol and how memory is accessed, they also don't quite take in the same kind of data. The 3DS ucode takes AAC frames with ADTS headers. ADTS is a possible transport layer for AAC audio, where each frame gets a small header specifying the sampling frequency, channel configuration, and other parameters. All fine and dandy.
The DSi AAC ucode, however, doesn't take ADTS or ADIF headers, just raw AAC frames. The sampling frequency and channel configuration are specified in the parameters for the decoder command.
So I had to figure out how to make this work nicely with faad2. I also ran into an issue where the DSi sound app will first send an all-zero AAC frame before sending the correct data, which seems to be a bug in the sound app itself. The AAC ucode doesn't seem to be affected by this (it just returns an error code), but third-party AAC decoders don't like it. faad2 gets put into a bogus state where no further frames can be decoded. fdk-aac barfs over the memory in a seemingly random way, eventually causing a crash. So I had to hack around the issue and ignore all-zero frames.
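For reference, the overall shape of decoding raw frames with faad2 looks something like this (a simplified sketch, not the actual melonDS code; the AudioSpecificConfig bytes are built from the sampling frequency index and channel count that the decoder command provides):

// Simplified sketch of decoding raw AAC frames with faad2 (not the actual melonDS code).
#include <neaacdec.h>

NeAACDecHandle InitDecoder(int freqIndex, int channels)
{
    NeAACDecHandle dec = NeAACDecOpen();

    // build a 2-byte AudioSpecificConfig: 5 bits object type (2 = AAC LC),
    // 4 bits sampling frequency index, 4 bits channel config, rest zero
    unsigned char asc[2];
    asc[0] = (unsigned char)((2 << 3) | (freqIndex >> 1));
    asc[1] = (unsigned char)(((freqIndex & 1) << 7) | (channels << 3));

    unsigned long samplerate; unsigned char nch;
    NeAACDecInit2(dec, asc, 2, &samplerate, &nch);
    return dec;
}

const short* DecodeFrame(NeAACDecHandle dec, unsigned char* frame, unsigned long len,
                         NeAACDecFrameInfo* info)
{
    // skip all-zero frames: they put faad2 in a bogus state (see above)
    bool allZero = true;
    for (unsigned long i = 0; i < len; i++)
        if (frame[i] != 0) { allZero = false; break; }
    if (allZero) return nullptr;

    return (const short*)NeAACDecDecode(dec, info, frame, len);
}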
So melonDS decodes AAC audio now. It's still rough around the edges, but it's pretty neat.
Now I guess the final step would be to reverse-engineer and implement the G711 ucode.
As for LLE options for DSP emulation, some sort of recompiler (JIT or otherwise) would be the way to go. I was thinking about how such a thing might work, but I've never written that sort of thing before. It could be the occasion to learn about this. It would be worthwhile if someone out there is trying to develop for the DSP, but as far as commercial titles are concerned, HLE should cover all the needs, all while being way more efficient.
The AV codec stuff is also adjacent to something else that's on my mind. It's not related to the DSP, but it's something I want to experiment with eventually. If it pans out, I'll let you guys know!
melonDS 1.0 is out -- by Arisotura
Finally, the "proper" melonDS 1.0 release is here. Sorry that it took so long...
Anyway, this is pretty much the same as the 1.0 RC, but we fixed a bunch of bugs that were found in said RC.
Namely, you can now use multiple windows with OpenGL under Windows.
However, depending on how good your OpenGL driver is, doing so may reduce performance. It's due to having multiple OpenGL contexts sharing data, but for now we don't really know what we can do about it. If you have ideas, let us know!
Speaking of multiple windows, I also added a way to tell melonDS windows apart, because things could get pretty confusing. They now get a tag in their title, for example [p1:w2] means first multiplayer instance, second window.
We also merged asie's add-on support PR, so this release includes support for the Motion Pak and the Guitar Grip.
We also merged some other PRs, among which one that lowers audio latency.
This release also includes the DSi camera fixes that were discussed in the previous posts. DSi titles that ran into issues while trying to use the camera should now work with no problems.
However, since they tend to also use the DSP at the same time, the performance will be abysmal...
DSP HLE is something I want to experiment with, but that will be for a further release.
As far as future plans are concerned, I also want to redesign the site's homepage, but you'll find out when I get around to that.
Anyway, you can find the release on our downloads page, as usual. Enjoy!
Let's Golf: the horseman of the apocalypse that wasn't one -- by Arisotura
I figure I need to resolve the dramatic tension from the previous post.
So the issue we had was a screen-sized DMA3 transfer interfering with NDMA1, which is used to transfer camera data...
Shortly after I made that post, I recalled that Jakly mentioned VRAM timings, and started to figure it out.
I said that when I reproduced the setup in a homebrew, it would trigger a data overrun error and get stuck. But there was one thing I was missing: my homebrew was using a camera resolution of 256x192, with DMA set to run every 4 scanlines, the standard stuff. However, Let's Golf uses cropping to achieve a resolution of 144x144, and runs the DMA every 7 scanlines. This means more time between each NDMA1 transfer.
Regarding VRAM, the DSi supports accessing it over the 32-bit bus, instead of the old 16-bit bus. One effect is that this makes it possible to write to VRAM in 8-bit units, which wasn't possible on the DS. Another effect is that it affects timings: for example, a 32-bit DMA transfer from main RAM to VRAM would take 2 cycles per word, instead of 3.
I did the math, and with such timings, the screen-sized DMA3 transfer would have enough time that it could run between two NDMA1 transfers without disrupting the camera operation. But with the 16-bit bus timings, DMA3 would definitely take too long.
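To give an idea of the orders of magnitude (assuming the top-screen frame being uploaded is a 256x192 16-bit bitmap): that's 256 x 192 x 2 bytes = 98304 bytes = 24576 words, so about 49k cycles for the whole DMA3 transfer at 2 cycles per word, versus about 74k cycles at 3 cycles per word -- a 50% increase, which is enough to blow past the window between two NDMA1 transfers.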
I even modified my homebrew to use the same 144x144 resolution as Let's Golf, and added a key to toggle the 32-bit bus for VRAM (via SCFG_EXT9 bit 13). Suddenly, my homebrew was running just fine as long as the 32-bit bus was enabled, but when it was disabled, it would trigger the data overrun error and get stuck.
So, basically, this is nothing fancy, just a case of "this works out of pure luck".
I added support for the new VRAM timings in melonDS. But this wasn't enough: Let's Golf would still get stuck.
I looked at it, and it was still possible that DMA3 would start right before an NDMA1 transfer, which would be the worst time. The camera FIFO would definitely be almost full at that point, and further scanlines would overflow before NDMA1 had a chance to run.
I thought about it, and... there was no way this could work unless there were some double-buffering shenanigans involved.
I modified my homebrew to try and verify this hypothesis. My idea was to start a camera transfer, let it fill and overrun the FIFO, and use a dummy NDMA to track it. The NDMA channel would be configured like for a camera transfer, but it wouldn't actually read camera data, and the length would be set to 1 word. I would just use the NDMA IRQ as a way to sense the camera transfer condition. This way, I could know how many times it fires before hitting the overrun error.
This didn't reveal anything, as the NDMA was only fired once.
My next idea was to wait for the NDMA IRQ, and from that point, count how long it takes for the data overrun to be raised (by checking bit 4 in CAM_CNT). The timings I measured from this definitely suggested that a second block of camera data was being transferred after the NDMA IRQ, before it would raise the data overrun error.
Which, in turn, tended to confirm my hypothesis that there are actually two FIFO buffers for camera data. Thus, when one buffer is filled entirely (ie. as much as the DMA interval allows), if the other buffer is empty, the buffers are swapped and the DMA transfer is fired, otherwise the data overrun error is raised.
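In code, the model I ended up with looks roughly like this (simplified, hypothetical names):

// Rough sketch of the double-buffered camera FIFO model (simplified, hypothetical names).
struct CamFIFO
{
    int FillBuffer = 0;      // buffer currently being filled by the camera
    int DMABuffer  = -1;     // buffer currently pending/being drained by NDMA (-1 = none)
    bool Overrun   = false;

    // called when the fill buffer has received one full DMA block worth of scanlines
    void OnBlockFilled()
    {
        if (DMABuffer < 0)
        {
            // other buffer is free: swap them and kick off the NDMA transfer
            DMABuffer = FillBuffer;
            FillBuffer ^= 1;
            TriggerCameraNDMA();
        }
        else
        {
            // other buffer hasn't been drained yet: data overrun
            Overrun = true;
        }
    }

    // called when NDMA has finished draining the pending buffer
    void OnNDMADone() { DMABuffer = -1; }

    void TriggerCameraNDMA() {}  // would start the actual NDMA transfer; omitted
};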
So I reworked melonDS's camera interface implementation to take this into account. Fixed a bunch of other issues. Had some "how the fuck did this even work?" moments.
There are still minor issues I need to iron out, but: finally, Let's Golf is fixed, and I haven't seen regressions...
More camera trouble... -- by Arisotura
So I had made a nice post about Let's Golf and how it was fixed...
But, obviously, as far as DSi camera support is concerned, it wasn't all.
I looked at another game that was running into issues with the camera: Assassin's Creed II. The Wanted feature in the menu uses the camera, if you're playing the game on a DSi, but on melonDS, it just showed nothing at all.
Quick investigation showed why.
Normally, when using the camera, games will set up the DMA with a block length of N scanlines, and a total length matching the length of the full camera picture. The DMA channel will also be set to trigger an IRQ when it's done.
However, this game does things differently. The DMA channel has no total length setting, and is just set to repeat infinitely. It transfers picture data to a small temporary buffer, from which the game reads when needed. No idea why they did it this way, but regardless, it shows why it's good to emulate things accurately. Anyway, an NDMA channel that is set to "repeat infinitely" will trigger an IRQ after each block, but due to an oversight, melonDS never triggered any IRQ.
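The fix amounts to something like this in the end-of-block logic (simplified, hypothetical names, not the exact melonDS code):

// Simplified sketch of the NDMA end-of-block logic (hypothetical struct/fields).
struct NDMAChannel
{
    bool RepeatInfinitely;
    bool IRQEnabled;
    int  BlockLength;
    int  TotalRemaining;

    void RestartBlock() {}   // re-arm for the next block; omitted
    void Stop() {}           // channel done; omitted
    void TriggerIRQ() {}     // raise the channel's IRQ; omitted
};

void OnNDMABlockDone(NDMAChannel& chan)
{
    if (chan.RepeatInfinitely)
    {
        // infinite repeat: fire the IRQ after *every* block, then keep going
        // (this per-block IRQ is what was missing)
        if (chan.IRQEnabled) chan.TriggerIRQ();
        chan.RestartBlock();
    }
    else
    {
        chan.TotalRemaining -= chan.BlockLength;
        if (chan.TotalRemaining <= 0)
        {
            // whole transfer done
            if (chan.IRQEnabled) chan.TriggerIRQ();
            chan.Stop();
        }
    }
}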
After fixing this, I did have the camera feed showing up in the game's UI thing, but it was rolling. Heh. Couldn't have been so simple.
This turned out to be because of the timings for the camera transfer. The timings melonDS used were a big fat guess, and were way too fast for that game.
I dug up my old camera test homebrew and modified it to track camera timings from the DSi. It took some time to figure out the logic behind the numbers I was getting -- there was more time between camera DMA transfers when running in 256x192 mode than in 640x480 mode. It actually makes sense: internally, the camera always runs at 640x480, and the HSync/VSync are the same, but when told to output a 256x192 picture, the camera simply skips some of the scanlines.
Once I understood that fact, I was able to put together a timing model that more closely resembled the real deal. And this fixed Assassin's Creed II -- the camera preview thing was working flawlessly.
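Roughly, the idea behind the model is this (very simplified, with made-up names; the actual constants come from the measurements):

// Very simplified sketch of the idea behind the timing model (made-up names).
// The sensor always scans 640x480 at a fixed line rate; smaller output resolutions
// just skip sensor lines, so the time between two *output* scanlines grows.
#include <cstdint>

uint64_t CyclesBetweenCameraDMABlocks(uint64_t cyclesPerSensorLine,
                                      int outputHeight, int linesPerBlock)
{
    const int sensorHeight = 480;
    // each output scanline corresponds to sensorHeight/outputHeight sensor lines
    uint64_t cyclesPerOutputLine = cyclesPerSensorLine * sensorHeight / outputHeight;
    return cyclesPerOutputLine * linesPerBlock;
}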
I then checked Let's Golf, and... it was rolling, again.
Welcome to emulation whack-a-mole.
But, that's the thing. I have researched this bug long and hard, and I can't figure it out.
The basic issue is as follows: the game uses NDMA1 to transfer camera picture data to main RAM, but it also periodically (every 2 frames) transfers video data from main RAM to VRAM, to be displayed on the top screen, and does so using DMA3 (the old DMA, not NDMA).
Camera output isn't synchronized to the LCD framerate. This means that the DMA3 transfer may occur while a camera transfer is in progress. In melonDS, it meant that the NDMA1 transfer couldn't run because DMA3 was already running.
This highlighted a bug with how melonDS handled camera DMA: it assumed that the "try to fire a DMA transfer" operation would result in a DMA transfer effectively starting, but when that wasn't the case, things went south. I remodeled the camera FIFO to fix this problem (and raise a data overrun error instead of skipping a chunk). Let's Golf was no longer rolling, but it was just getting stuck on the same camera frame.
So clearly there was more to it...
But I can't figure it out.
I ran hardware tests, thinking that maybe NDMA1 should have priority over DMA3. But nope. NDMA can't preempt old DMA, no matter the settings.
I modified my camera homebrew to reproduce what Let's Golf does, and it gets stuck after a couple camera frames or less, with a camera data overrun error. So I can't understand why the game works fine.
I even modified the game itself, to track things like when the camera IRQ occurs, or whether data overrun errors happen, and it revealed nothing.
So, yeah. I'm absolutely stumped.
Who would have thought that a camera thing in a golf game would become a new horseman of the apocalypse...
It feels weird after the last post, but I might ship melonDS 1.0 with this broken. But hey, the fixes do fix a bunch of other DSi games.
Windows OpenGL issues fixed, finally! -- by Arisotura
I went on a quest and battled the worst enemy imaginable.
Worse than a thousand orcs.
I fought endless privacy settings screens. Warded off all sorts of bullshit offers. Resisted the temptation to throw my brain in a lake.
I installed Windows 10 on my old laptop Crepe, so I could finally fix the issues with multiple windows and OpenGL.
CasualPokePlayer greatly helped me understand the problem, too.
Basically, due to the way melonDS works, when using OpenGL, we create the GL context on the UI thread, then use it on the emu thread exclusively. This is so that the OpenGL renderers can access OpenGL without needing extra locking. Window redrawing is also done on the emu thread.
The issue was due to how I originally implemented the multi-window mode. When a second window is created, it shares its GL context with its parent window. This way, everything OpenGL related will work on all windows. Except it turned out that the parent context was created on the UI thread, then made current on the emu thread, before the child context was created. Windows doesn't like that, and thus, fails to create the child context.
So it took some reworking to get this working smoothly, but the issue is fixed now.
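For the record, the gist of it in Qt terms is to create and share all the contexts on the UI thread before any of them gets handed over to the emu thread; something like this (very simplified, not the actual melonDS code):

// Very simplified sketch of the context creation order (not the actual melonDS code).
// The key point: create and share all contexts on the UI thread *before* any of them
// is made current on the emu thread; on Windows, creating a shared child context while
// the parent is current on another thread fails.
#include <QOpenGLContext>
#include <QThread>

void CreateSharedContexts(QThread* emuThread,
                          QOpenGLContext*& parentCtx, QOpenGLContext*& childCtx)
{
    parentCtx = new QOpenGLContext();
    parentCtx->create();

    childCtx = new QOpenGLContext();
    childCtx->setShareContext(parentCtx);   // share while both still live on the UI thread
    childCtx->create();

    // only now hand them over to the emu thread
    parentCtx->moveToThread(emuThread);
    childCtx->moveToThread(emuThread);
    // on the emu thread, each context then gets made current on its own surface
}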
This means that the proper 1.0 release will be soon -- for real. This issue was the last show stopper, basically.
I also fixed a couple other issues. For one, I added a way to tell melonDS windows apart, since multi-window made things pretty confusing. So now they get a [p1:w1] type tag that says which instance and which window it is. I also fixed a bug with the way windows were parented for second multiplayer instances.
I might try to fix some other misc. stability issues, if I can reproduce them. I will also likely rework the DSi camera timing model a bit, to fix a couple games.
DSP HLE is tempting me, as a next project, but it won't be for 1.0.
DSi bugfixes -- by Arisotura
So, yeah... The current mood around melonDS is "it does most things pretty well already", as far as emulation is concerned. While it's certainly true for the DS side of things, there's still a lot to do with the DSi. And this randomly piqued my interest.
Notably, we still have a bunch of games that can't get ingame, generally hanging on something DSP-related.
So I took one of those games: Let's Golf.
When run for the first time, the game has you create a profile, by first entering your name, then taking a selfie to be used as a profile icon. All fine. However, on melonDS the game freezes before you can get to the selfie part.
So I dug into it to try and figure out why it was freezing. We knew it was trying to use the DSP, but not much beyond that... anyway, I extracted the game's DSP binary to get more insight. The DSP was just running its main loop, waiting for incoming commands. On the other hand, the ARM9 was stuck, waiting for... something.
The ARM9 was using the PDATA registers to read DSP memory; however, that got stuck. The bug turned out to be silly: PDATA reads/writes can have a fixed length set, but the code handling them wasn't calculating that length correctly, so not enough data was being returned. Hence the ARM9 getting stuck.
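In broad strokes, the kind of logic involved looks like this (heavily simplified, with made-up names; the real register encoding is more involved):

// Heavily simplified sketch of a fixed-length PDATA read (made-up names and fields).
#include <cstdint>

struct DSPTransfer
{
    uint32_t Addr = 0;       // current read address in DSP data memory
    uint32_t Length = 0;     // programmed fixed length (0 = free/continuous)
    uint32_t Count = 0;      // words transferred so far

    uint16_t ReadDataMem(uint32_t) { return 0; }  // stub for the sketch
    void EndTransfer() {}                         // stop feeding PDATA; omitted
};

uint16_t ReadPDATA(DSPTransfer& xfer)
{
    uint16_t val = xfer.ReadDataMem(xfer.Addr++);
    if (xfer.Length != 0)
    {
        // the bug lived around here: the length was computed wrong, so the transfer
        // ended early and the ARM9 kept waiting for data that never came
        if (++xfer.Count >= xfer.Length)
            xfer.EndTransfer();
    }
    return val;
}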
After fixing that, I could get to the selfie screen thingy.
However, two issues: 1) the camera preview was rolling (as shown above), and 2) when trying to take a picture, it would just freeze.
A third issue is that it runs at like 4 FPS, but that's common to anything DSP-related for now, and we'll address it later.
I decided to first try and fix the camera rolling issue, because I thought that maybe the second issue was linked.
So I first logged what was going on with the camera: which settings are used on the camera itself, on the camera interface, and on the associated DMA channel.
cam1: start, width=256 height=192 read=0024 format=0002
CAM TRANSFER: CNT=EE06 CROP=00180038/00A700C6
ARM9 NDMA1 START MODE 0B, 04004204->0221E0E0 LEN=10368 BLK=504 CNT=CB044000
So what does this all mean?
The camera is set to 256x192, nothing special there. However, the settings used on the camera interface enable cropping, which is interesting. Basically, it only retains a 144x144 section of the camera frame.
Which is confirmed by the DMA parameters. The total length is 10368 words, which is the length of a 144x144 frame. The block length is 504 words, which corresponds to 7 scanlines of said frame. This matches the DMA interval setting on the camera interface, too.
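(To spell out the math: at 16 bits per pixel, 144x144 pixels = 41472 bytes = 10368 words, and 7 scanlines of 144 pixels = 2016 bytes = 504 words.)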
So what's the issue there? If you happen to remember, I've discussed the camera interface before. Long story short, it has a FIFO buffer that holds 512 words' worth of picture data, and it triggers a DMA transfer every N scanlines, with N=1..16. When modelling this system in melonDS, I had assumed that the frame height would always be a multiple of N.
But the issue became apparent after I looked at my logs: 144 is not a multiple of 7.
Which meant that it was discarding the last 4 scanlines of each camera frame. Hence the rolling.
So I decided to force a final DMA transfer at the end of each camera frame if there's anything left in the FIFO, which seems to be what the hardware does. Either way, this fixed the rolling issue entirely.
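The fix is essentially this, in the end-of-frame handling (simplified, hypothetical names):

// Simplified sketch of the end-of-frame handling (hypothetical names).
struct CameraInterface
{
    int FIFOLevel = 0;           // words currently sitting in the FIFO
    void TriggerCameraNDMA() {}  // kicks off the camera NDMA transfer; omitted
};

void OnCameraFrameEnd(CameraInterface& cam)
{
    // 144 isn't a multiple of 7, so the last partial block (4 scanlines here) never
    // reaches the DMA threshold on its own: flush whatever is left in the FIFO
    if (cam.FIFOLevel > 0)
        cam.TriggerCameraNDMA();
}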
But it didn't fix the second issue, which turned out to be unrelated.
Another apparent issue is that the camera input looks very squished in melonDS. It might be good to add a "preserve aspect ratio" scaling mode to avoid that.
So I had to dig deeper, once again. When trying to take a picture, the game would send a command to the DSP to tell it to scale the picture to a different size, then send it a bunch of parameters via PDATA, then... get stuck.
This one proved to be tricky to figure out. I had no real idea what was going wrong... did a lot of logging and tracing, but couldn't really figure it out.
Eventually, CasualPokePlayer enlightened me about what was being sent to the DSP.
The mechanism used to transfer command parameters is what they call the pipe. It's a simple FIFO buffer: the ARM9 writes data into the buffer, then updates the buffer write position, and does all that by using PDATA.
In this situation, the parameters that were being sent for the scaling command looked correct, but the pipe write position was zero, which, according to CasualPokePlayer, was suspicious. He was on to something.
Looking at the code that was determining that value, I found that it was broken because for some reason the pipe length was zero. So I traced writes to that variable. I found that it was part of a bunch of variables that were initialized from DSP memory, through a PDATA read.
Looking closer, there was another bug with PDATA reads that caused them to be off by one. When I fixed that, the freeze was gone entirely.
Finally, we get to play golf with a goofy face in melonDS. What's not to love!
The game also runs quite well past the profile selfie part. Yep, it uses the DSP just to scale a picture.
Also, looking at the game's DSP binary, I had an idea.
This binary only supports two commands: scaling and yuv2rgb. There's also a separate command channel, and it has one command, for playing sound effects (presumably, the camera shutter sound).
It would be totally feasible to HLE this shit.
According to CasualPokePlayer, most of the games/apps that use the DSP seem to use one variant or another of this binary, with extra features but the same basic idea. The only exception is the DSi sound app, which uses the DSP for AAC decoding.
Obviously, it would still be worth it to pursue a DSP JIT, at least for the sake of homebrew. But the HLE route also seems viable, and it's piquing my interest now. Maybe not for 1.0, but I want to give it a try.
Proper melonDS 1.0 release "soon" -- by Arisotura
Apologies for taking so long to do a proper release.
Regardless, I think most, if not all, of the bugs that were found in the 1.0 RC have been fixed, so expect the proper 1.0 release soon. Hopefully.
The main issue would be the lack of functional Windows CI, but we're working on it...
We also have fun plans in mind for further releases, but we'll see once we get there.
Technical issues #2 -- by Arisotura
You might have noticed the lack of Windows builds on the nightlies page.
Sorry about that. The Windows CI is broken. Our CI expert Nadia has been on it for a while, so hopefully it should be back up again... someday.
On the flip side, the little loser stopped trying to take the server down, so that's at least one positive.