Homebrew Who wants Duke Nukem 3D (JFDuke?) on the DSi?

Would you like to play Duke Nukem 3D on DSi, in a version with good quality graphics and sound?

  • Yes, certainly, I have always dreamed of it and I can't wait!

    Votes: 35 89.7%
  • No. I don't care. Duke is too vulgar and violent. And it's too difficult for me. I'm not up to it.

    Votes: 4 10.3%

  • Total voters
    39

Nikokaro

Lost philosopher... searching for a way out...
OP
Member
Joined
Feb 3, 2020
Messages
2,189
Trophies
1
Location
Nautilus (under) Lake Como, Italy 🇮🇹
XP
6,771
Country
Italy
You said that the first boss made the game struggle with it. That probably means the software sprite upscaler is having some trouble when the destination image is really huge.
I honestly don't remember saying that....
However, it's okay anyway, since it's not far from the truth: Caribbean's last big boss, flanked by two henchmen, caused some slowdowns and some jerkiness in Duke's movements, but after that it was fine and the game never froze.
Since miracles cannot be performed, all in all, it is an excellent port as it is.
 

Nikokaro

Hi @elhobbs, I noticed that you haven't uploaded any more versions of JFDuke3D to your GitHub. Should we consider the latest one you released as the final one?
I was wondering whether it would be possible to disable the lower screen with a stylus tap or button combination, and re-enable it when needed. Wouldn't that gain a few extra frames (and smoother movement) in the main game, as happens with cheretic if you disable the lower screen (map)?
Thank you for everything.
 

elhobbs

Well-Known Member
Member
Joined
Jul 28, 2008
Messages
1,044
Trophies
1
XP
3,034
Country
United States
The bottom screen is not taking much time. It is drawn once, and after that only the key whose state changes (touched/released) is redrawn each frame.
I did a little bit of profiling and identified some of the hot paths, which turned out to be exactly what should have been expected: all of the functions that have asm versions for x86. I tried a couple of simple things, like moving the functions to the ITCM section, which is supposed to provide some benefit for hot code. It did not help, though. I don't have a lot of ARM asm experience, but I plan to give it a try. Whether it will provide any benefit is another story.
I have been pretty busy lately with work/life stuff but I do plan to spend some more time on this.
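
For readers unfamiliar with ITCM placement, here is a minimal sketch of what moving a hot function there can look like. It is not taken from the JFDuke3D port: `HOT_ITCM` and `draw_column` are hypothetical names, and the sketch assumes an ARM9 build with a libnds-style linker script that provides an `.itcm` section. On any other target the macro expands to nothing, so the code still compiles.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: an ARM9 DS build whose linker script provides an ".itcm"
   output section (as libnds-style setups do). Elsewhere the macro is
   empty, so the same source builds on a PC for testing. */
#if defined(ARM9) && defined(__arm__)
# define HOT_ITCM __attribute__((section(".itcm"), long_call))
#else
# define HOT_ITCM
#endif

/* Hypothetical stand-in for a software-renderer inner loop: draws `count`
   texels down one screen column, stepping through the texture in 16.16
   fixed point and remapping each texel through a 256-entry lookup table. */
HOT_ITCM void draw_column(uint8_t *dest, int pitch, int count,
                          const uint8_t *texture, uint32_t pos, uint32_t step,
                          const uint8_t *palookup)
{
    assert(count >= 0);
    while (count-- > 0) {
        *dest = palookup[texture[pos >> 16]];
        dest += pitch;   /* next row of the same column */
        pos += step;     /* advance the fixed-point texture coordinate */
    }
}
```

ITCM only removes instruction fetch stalls. If a loop like this spends most of its time waiting on texture and framebuffer reads from main RAM, placing the code in ITCM would not be expected to change much, which may be consistent with the result reported above.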
 

mrparrot2

Well-Known Member
Member
Joined
Nov 29, 2021
Messages
106
Trophies
0
Age
29
Location
SP, Brazil
XP
568
Country
Brazil
A quick way to squeeze out some extra performance without too much work is to mark the functions you found to be problematic so that they are compiled with more aggressive optimizations. That way you can even retarget a function to be compiled in ARM mode instead of Thumb for performance.

For example, in my C&C port I have

```
#ifdef _NDS
# define OPTIMIZE_AGGRESSIVELY __attribute__((optimize("Ofast"))) __attribute__((hot)) __attribute__((target("arm")))
#else
# define OPTIMIZE_AGGRESSIVELY
#endif
```

And declare functions as

```
void OPTIMIZE_AGGRESSIVELY function(void) {
...
}
```

On the software engine drawing functions I even set `unroll-loops`. But be careful with this one, as GCC seems to have bugs in that optimization and it can grow the code size considerably.

Another way, if you want to compile an entire file with a certain optimization level, is to use pragmas

```
#pragma GCC optimize("Ofast,unroll-loops")
#pragma GCC target("arm")
... game functions
```
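
If only part of a file should get the aggressive settings, GCC's `push_options` and `pop_options` pragmas can scope them. The following is a minimal sketch under that assumption. `remap_span` and `clamp_menu_index` are hypothetical functions, and `O3` is used here instead of `Ofast` because `Ofast` also turns on `-ffast-math`.

```c
#include <assert.h>
#include <stdint.h>

/* These pragmas are GCC-specific, so they are skipped for other compilers. */
#if defined(__GNUC__) && !defined(__clang__)
# pragma GCC push_options
# pragma GCC optimize("O3,unroll-loops")
# if defined(__arm__)
   /* Emit ARM instead of Thumb code for this region (32-bit ARM targets only). */
#  pragma GCC target("arm")
# endif
#endif

/* Hypothetical hot routine: remaps a span of pixels through a lookup table. */
void remap_span(uint8_t *dest, const uint8_t *src, int count, const uint8_t *lut)
{
    assert(count >= 0);
    for (int i = 0; i < count; i++)
        dest[i] = lut[src[i]];
}

#if defined(__GNUC__) && !defined(__clang__)
# pragma GCC pop_options
#endif

/* From here on the command-line options apply again, so rarely used code
   such as menu handling keeps the smaller default (for example Thumb) build. */
int clamp_menu_index(int index, int item_count)
{
    assert(item_count > 0);
    if (index < 0)
        return 0;
    if (index >= item_count)
        return item_count - 1;
    return index;
}
```

Everything between the push and the pop is compiled with the stronger settings, and the rest of the file is left alone.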
 
  • Like
Reactions: Nikokaro

elhobbs

I will give the attributes a shot. I did try manually unrolling the loops, but that did not improve anything.
 

elhobbs

i have a suggestion.... can you use the built-in 3D engine on the DSi to speed it up a bit? just askin
the ds 3d hardware (which is the same as the dsi 3d hardware) has its own issues. there are 2 main issues:
* very little vram for textures (512k)
* 2000 triangle limit
it also basically requires creating an entirely new 3d engine for the game. it is an enormous amount of work.
so, yes it is possible. it is not a simple solution to implement in a couple hours (not by me anyway).
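
To put the 512k texture figure in perspective, here is a rough budget calculation. It is only a sketch: it assumes 8-bit paletted textures at one byte per texel, with palettes stored elsewhere, and it ignores alignment and any texture compression. The names are hypothetical and not from any DS library.

```c
#include <assert.h>
#include <stddef.h>

/* Texture memory figure quoted above: 512 KB. */
enum { DS_TEXTURE_VRAM_BYTES = 512 * 1024 };

/* Bytes needed by one 8-bit paletted texture (1 byte per texel). */
size_t texture_bytes(unsigned width, unsigned height)
{
    return (size_t)width * height;
}

/* How many textures of the given size fit in texture memory at once. */
size_t textures_that_fit(unsigned width, unsigned height)
{
    size_t bytes = texture_bytes(width, height);
    assert(bytes > 0);
    return DS_TEXTURE_VRAM_BYTES / bytes;
}
```

Under those assumptions, `textures_that_fit(64, 64)` gives 128 and `textures_that_fit(128, 128)` gives 32, so a level with many large wall and sprite tiles would need some kind of texture cache that swaps tiles in and out.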
 
  • Like
Reactions: Tarmfot

sombrerosonic

Idiot machine
Member
Joined
Jan 12, 2022
Messages
1,455
Trophies
2
Location
The Tower of pizza
XP
2,899
Country
United States
hmmm, maybe too intensive, prob would recommend compressing the sound and putting it into the cache to speed it up as well as get an FPS boost
 

mrparrot2

I added a new build with the optimizations added. I did not notice much difference.
Well... if it did improve the frame rate even a little, that is already a win.

hmmm, maybe too intensive, prob would recommend compressing the sound and putting it into the cache to speed it up as well as get an FPS boost
He is suggesting that you cache the decompressed sound to speed up the game a bit.

I would suggest not doing that. Instead, move the sound decompression to the ARM7, as I did in my C&C port.
the ds 3d hardware (which is the same as the dsi 3d hardware) has its own issues. there are 2 main issues:
* very little vram for textures (512k)
* 2000 triangle limit
it also basically requires creating an entirely new 3d engine for the game. it is an enormous amount of work.
so, yes it is possible. it is not a simple solution to implement in a couple hours (not by me anyway).

If someone knows enough OpenGL witchcraft to port the Polymost renderer to the DS, then maybe, just maybe.
:D
 
  • Like
Reactions: Tarmfot

elhobbs

I have DS 3D hardware support implemented for cquake and cheretic. I know what is involved, which is why I said it can't be implemented in an hour or two. That is not to say it can't be done; I am just not sure it would provide a benefit.
The sounds that are playing are all uncompressed PCM sounds. There is software mixing taking place, as well as unsigned/signed conversion. Duke does use a custom multi-part file format with its own event playback system. Anyway, sound does not seem to be the main performance issue. The issue seems to be the low-level render code, which is still using the unoptimized reference C implementation. This code has optimized assembly versions in the PC version. My own profiling showed this code as the slow spot, and the fact that assembly versions exist for the PC version would seem to indicate that this finding is warranted.
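
As an illustration of the per-sample work that software mixing with unsigned/signed conversion involves, here is a minimal sketch. It is not the port's mixer: it assumes two unsigned 8-bit PCM voices and a signed 8-bit output buffer, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 8-bit PCM is centred on 128; signed 8-bit PCM is centred on 0. */
static inline int8_t u8_to_s8(uint8_t sample)
{
    return (int8_t)((int)sample - 128);
}

/* Mixes two unsigned 8-bit voices into one signed 8-bit buffer,
   clamping the sum so that loud passages saturate instead of wrapping. */
void mix_u8_voices(int8_t *out, const uint8_t *a, const uint8_t *b, int count)
{
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        int mixed = u8_to_s8(a[i]) + u8_to_s8(b[i]);
        if (mixed > INT8_MAX) mixed = INT8_MAX;
        if (mixed < INT8_MIN) mixed = INT8_MIN;
        out[i] = (int8_t)mixed;
    }
}
```

The cost is a handful of integer operations per output sample, which is small next to per-pixel rendering work and fits the observation that sound is not the main bottleneck.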
 
  • Like
Reactions: Tarmfot

elhobbs

I added a couple more optimizations. It shifted the highest cost functions around a bit but it does not seem to have made much of a difference. In any case there is a new build if anyone cares to take a look.
 

Nikokaro

In any case there is a new build if anyone cares to take a look.
Have no doubt: it matters a lot to yours truly, as well as to the many other users who have shown interest and taken part in the discussion since this thread was opened. 😉
For my part, this surprising and unexpected port has made me completely lose interest in the 3DS version of the game.
As for this build, it is hard to tell whether there has been an improvement without objective evidence such as an on-screen FPS count.
 

CrashMidnick

Well-Known Member
Member
Joined
Jul 22, 2015
Messages
733
Trophies
0
Age
41
XP
2,849
Country
France
Thanks for this update :)
Do you think it would be possible to have an FPS counter? It is quite difficult to tell whether there are improvements without one.

I tested the very first level of standard Duke 3D. It seems to be a little faster, but I am not sure about that. Thx.
 

elhobbs

Use the keyboard and type “dnrate”
 

mrparrot2

Another thing that can be done to improve performance is to replace the libc functions from newlib, which are not optimized for ARM, with the ones provided by agbabi. Those are optimized for the ARM7, but they should still run faster than the ones provided by newlib, even on the ARM9.

Here is what I did. I had to patch the asm functions to declare the asm symbols as STT_FUNC; otherwise the game crashed when I called them from C++.

https://github.com/giulianobelinassi/Vanilla-Conquer/blob/ndsi_new/common/memcpy.s

(Original repo: https://github.com/felixjones/agbabi )
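
For context on why a tuned memcpy can beat a plain byte loop, here is a minimal C sketch of the basic idea: copy a 32-bit word at a time when both pointers are word-aligned, then finish with bytes. This is not agbabi's code, which is written in assembly and goes further (for example with multi-register loads and stores). `copy_bytes` is a hypothetical name, and the `may_alias` attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 32-bit word type that is allowed to alias any other object type. */
typedef uint32_t __attribute__((may_alias)) word_t;

/* Copies n bytes from src to dest (the regions must not overlap). */
void *copy_bytes(void *dest, const void *src, size_t n)
{
    uint8_t *d = dest;
    const uint8_t *s = src;

    assert(dest != NULL && src != NULL);

    /* Fast path: both pointers are 4-byte aligned, so move whole words. */
    if ((((uintptr_t)d | (uintptr_t)s) & 3u) == 0) {
        word_t *dw = (word_t *)d;
        const word_t *sw = (const word_t *)s;
        for (; n >= 4; n -= 4)
            *dw++ = *sw++;
        d = (uint8_t *)dw;
        s = (const uint8_t *)sw;
    }

    /* Tail bytes, or the whole copy when the pointers are not aligned. */
    while (n-- > 0)
        *d++ = *s++;

    return dest;
}
```

How much this helps depends on how much of the frame time is actually spent in memcpy, so a small or unmeasurable change in frame rate is a plausible outcome.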
 

elhobbs

hardly conclusive, but without this version of memcpy the first level bounces between 20 and 21 fps. with this version it stays steady at 20. either way it does not seem to impact performance much.
I did create a new build where I changed the makefile to generate arm code instead of thumb code. it does show an ever so slight performance increase, though it does push the nds file size over 1MB. this is not really an issue for dsi mode, but nds mode is not going to be happy with this - though I can't say nds mode has been a priority for me at the moment.
edit: I created a new build with some minimal changes. I had left some profiling code enabled, which is now disabled, and I enabled lto. each piece is not really impressive on its own, but all of the optimizations together are almost noticeable!
 

mrparrot2

You can also enable GCC's Profile Guided Optimization. That may improve performance some more, although I am not sure whether you can use it when cross-compiling.
https://developer.ibm.com/articles/gcc-profile-guided-optimization-to-accelerate-aix-applications/

There is -Ofast too.

But having the entire build in ARM instead of Thumb does not seem to be a good idea. You are bumping up the code size of large pieces of code that are rarely used (like menus). If you can find the pieces that benefit from ARM code, it is better to use #pragmas to compile only those that way.
 
  • Like
Reactions: Tarmfot
