Homebrew TWLbf - a tool to brute force DSi Console ID or EMMC CID

enderghast13

Member
Newcomer
Joined
Jun 8, 2017
Messages
5
Trophies
0
XP
170
Country
United States
Thanks to all of you for sharing!


Thank you for the detailed report and our first EMMC CID report! I assume they're all US region based on your location?


I don't know much about that, aren't 3DS like totally hacked already?
And this tool actually doesn't support 3DS TWL FIRM, they're encrypted differently according to GBATEK and TWLTool.
I could add support to this if such needs arise though.
Yes, these are all US region, sorry for forgetting to specify.
About my 3DS question, I was just wondering if they would be useful for archival or research purposes as there isn't much documentation on 3DS Console IDs, though I do understand that bruteforcing 3DS Console IDs is beyond the scope of this tool and practically useless to do anyways.
 

JimmyZ

Sarcastic Troll
OP
Member
Joined
Apr 2, 2009
Messages
681
Trophies
0
XP
762
Country
Zimbabwe
I've had a look at polarssl and openssl some years ago when trying to "understand how AES works"... openssl looked very confusing, and polarssl looked a bit straighter (but still very confusing and overcomplicated)... anyways, as far as I remember polarssl did support AES hardware acceleration, too. So both might be same as long as you have a PC with AES-NI support (which seems to have been invented in 2010).
Yes, PolarSSL supported AES-NI, but the version bundled with TWLTool doesn't; otherwise TWLbf with mbed TLS (PolarSSL's new name) wouldn't be 3 times faster than the previous TWLTool brute force mod.

For multi-core CPUs, I wonder if each core is having its own AES hardware? If not, then multi-threading won't actually speedup the calculations.
AES-NI is a set of CPU instructions, so I believe each core has its own AES-NI units, just like MMX/SSE/AVX; it's not a separate AES engine device like the one on the DSi.

Is that using an "optimized" SHA1 function? One older optimization mentioned here https://software.intel.com/en-us/articles/improving-the-performance-of-the-secure-hash-algorithm-1 uses general-purpose SSSE3 instructions (this should be also implemented in openssl).
I really don't know, I just call SHA1 from OpenSSL and/or mbed TLS, I suppose I couldn't do much better than them, at least in a short amount of time.

OpenSSL forces us to use a high-level interface called EVP to get hardware-specific optimizations like AES-NI. The result: on large blocks (like 8K bytes), EVP AES can be 5 times faster with AES-NI, and 1.8 times faster than mbed TLS with AES-NI (supposedly because mbed TLS lacks pipeline optimization). But on 16-byte blocks, mbed TLS is 2.5 times faster than OpenSSL EVP. I talked about this in the GitHub readme.

And for SHA1, the EVP interface and the low-level interface run at about the same speed (you can test this with "openssl speed -evp sha1" and "openssl speed sha1"); the latter is a tiny bit faster than EVP on 16-byte blocks (which is our use case). So SHA1 is either not hardware-optimized at all, or optimized to the same level in both interfaces. I first suspected the former, since SHA1 performance probably isn't that important in crypto libs targeting TLS. But apparently SHA1 in OpenSSL uses lots of exotic SIMD optimizations including SSE/AVX(2)/SHAEXT, yet when working on 16-byte blocks it's only about 1?% faster than mbed TLS's C code.

And newer intel processors should have extra opcodes SHA1RNDS4, SHA1NEXTE, SHA1MSG1/2 (not sure if/when/where that's supported, intel announced that stuff in 2013, but some other webpage mentioned it not being implemented until 2016, or so).

My ignorance! I'd never heard of those SHA1 instructions. A quick search shows they're used in OpenSSL but not in mbed TLS:
https://github.com/openssl/openssl/search?utf8=✓&q=sha1rnds4&type=
https://github.com/ARMmbed/mbedtls/search?utf8=✓&q=sha1rnds&type=

And, another (small) optimization would be appending the sha1-end-byte and sha1-padding-bytes to the CID, and then passing that directly to the 64-byte-sha1-core function (ie. avoiding the same padding to be repeated on each calculation).
Yeah I thought about that too, but my assumption is it will not impact the performance too much?
It might benefit more from preparing a series of CIDs as 512-bit blocks and feeding them to a pipeline-optimized sha1-core function, but that would only help the EMMC CID brute force, which in my opinion is less useful, and it would require me to write or modify crypto lib code; as of now I just call the libraries.
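
To make the padding idea above concrete, here is a minimal Python sketch. It is not TWLbf's code: `sha1_compress` and `sha1_16` are hypothetical names, the pure-Python compression function is far too slow for real brute forcing, and the 16-byte test value is simply the second argument from the command lines posted in this thread. It only shows that a 16-byte message plus a fixed, precomputed 48-byte padding tail fits in one 64-byte block, so a single call to the SHA-1 core gives the same digest as the library call.

```python
import hashlib
import struct

# Standard SHA-1 initial state.
H0 = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)


def rotl(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def sha1_compress(state, block):
    """Run the SHA-1 core once over a single 64-byte block."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    a, b, c, d, e = state
    for t in range(80):
        if t < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif t < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif t < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        temp = (rotl(a, 5) + f + e + k + w[t]) & 0xFFFFFFFF
        a, b, c, d, e = temp, a, rotl(b, 30), c, d
    return tuple((s + v) & 0xFFFFFFFF for s, v in zip(state, (a, b, c, d, e)))


# Padding for a 16-byte message never changes: the 0x80 end byte,
# 39 zero bytes, then the message length in bits (128) as a 64-bit
# big-endian integer. 16 + 48 = 64 bytes, exactly one block.
PAD_TAIL = b"\x80" + bytes(39) + struct.pack(">Q", 16 * 8)


def sha1_16(msg16):
    """SHA-1 of a 16-byte message using one direct call to the core."""
    if len(msg16) != 16:
        raise ValueError("expected exactly 16 bytes")
    return struct.pack(">5I", *sha1_compress(H0, msg16 + PAD_TAIL))


# Hypothetical test input: the second argument of the posted commands.
cid = bytes.fromhex("ab6778e02d034d303046504100001500")
print(sha1_16(cid) == hashlib.sha1(cid).digest())
```

In a real implementation the saving would come from skipping the library's buffering and padding logic on every candidate, not from the arithmetic itself.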

I thought the MBR and DSi partitions are using the same encryption on 3DS? That should be somewhat required to be so for DSi backwards compatibility. The MBR may contain different/extra data on 3DS (so brute forcing may fail when searching for certain "fixed" values in the MBR).
For the ConsoleID, I think the 3DS does have its own "3DS ID" (for whatever 3DS things), and separate/crippled "DSi ID" (for DSi-style eMMC encryption). The latter one being reported to be 6B27D20002000000h on one n3DS console.
I admit I don't know most of this, but TWLTool had a 3DS TWL FIRM brute routine which, IIRC, doesn't seem to work the same way. On the other hand, it's not documented anywhere, so it might be test code that doesn't work yet.

BTW, I saw that TWLTool's 3DS brute force expects the 16 bytes at offset 0x1e0 to be all zero, but I found that's not true on DSi because it has a 3rd partition, so I verify offset 0x1f0 instead, which by coincidence is similar to your efforts documented in that NGemu thread. Does 3DS TWL FIRM actually have only two partitions?
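
As a rough illustration of the check described above, the Python sketch below tests the last 16 bytes of a decrypted 512-byte MBR sector. It relies on the standard MBR layout (four 16-byte partition entries starting at 0x1be, then the 55 AA signature at 0x1fe), so the block at 0x1f0 holds the tail of the 4th entry plus the signature, while the block at 0x1e0 overlaps the 3rd entry. The expected value is the last argument of the command lines posted in this thread; the function name and the way TWLbf actually performs the comparison are assumptions.

```python
# Expected plaintext at 0x1f0..0x1ff when the 4th partition entry is
# empty: 14 zero bytes followed by the 55 AA boot signature.
EXPECTED_TAIL = bytes.fromhex("000000000000000000000000000055aa")
TAIL_OFFSET = 0x1F0


def mbr_tail_matches(sector):
    """Return True if a decrypted MBR sector ends with the expected block."""
    return bytes(sector[TAIL_OFFSET:TAIL_OFFSET + 16]) == EXPECTED_TAIL


# Hypothetical sector with a non-empty 3rd partition entry (0x1de..0x1ed):
# the 0x1e0 block is not all zero, but the 0x1f0 block still matches.
sector = bytearray(512)
sector[0x1DE:0x1EE] = b"\x01" * 16
sector[0x1FE:0x200] = b"\x55\xaa"
print(any(sector[0x1E0:0x1F0]), mbr_tail_matches(sector))
```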

About my 3DS question, I was just wondering if they would be useful for archival or research purposes as there isn't much documentation on 3DS Console IDs, though I do understand that bruteforcing 3DS Console IDs is beyond the scope of this tool and practically useless to do anyways.
I'm really not the right person to answer this, but I've never heard anybody in the 3DS hacking community talking about this, it might be a sign?

Is there a way to dump the nand without hardmod yet?
I believe that requires you to have a working dsiwarehax to run fwtool.
 
Last edited by JimmyZ,

JimmyZ

For multi-core CPUs, I wonder if each core is having its own AES hardware? If not, then multi-threading won't actually speedup the calculations.
I tried to run 4 processes on a Windows 10 i5-3450(4C4T 3.1GHz) like this:
Code:
@echo off
start /b /belownormal /affinity 1 twlbf_mbedtls console_id_bcd 0820100000000000 ab6778e02d034d303046504100001500 001f 1ced45c75e810bb6b51a5318e0fc5eee 000000000000000000000000000055aa
start /b /belownormal /affinity 2 twlbf_mbedtls console_id_bcd 0820200000000000 ab6778e02d034d303046504100001500 001f 1ced45c75e810bb6b51a5318e0fc5eee 000000000000000000000000000055aa
start /b /belownormal /affinity 4 twlbf_mbedtls console_id_bcd 0820300000000000 ab6778e02d034d303046504100001500 001f 1ced45c75e810bb6b51a5318e0fc5eee 000000000000000000000000000055aa
start /b /belownormal /affinity 8 twlbf_mbedtls console_id_bcd 0820400000000000 ab6778e02d034d303046504100001500 001f 1ced45c75e810bb6b51a5318e0fc5eee 000000000000000000000000000055aa
@echo on
They finished in 775, 781, 797 and 808 seconds respectively, while a single process on its own took 764 seconds, so it scales quite well, as expected.
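
For reference, the `/affinity` values 1, 2, 4, 8 in the batch file are hexadecimal bit masks in which bit n selects logical CPU n. A small Python helper (the name `affinity_mask` is made up for this sketch) shows how such masks can be derived for other CPU counts:

```python
def affinity_mask(cpu_indices):
    """Build the hex mask string that `start /affinity` expects."""
    mask = 0
    for index in cpu_indices:
        mask |= 1 << index  # bit n selects logical CPU n
    return format(mask, "X")


# One process per logical CPU on a 4-thread machine.
print([affinity_mask([i]) for i in range(4)])
```

Masks above 9 need hex digits, for example logical CPU 4 alone is `10` and CPUs 0 to 3 together are `F`.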

I'll do a hyper-threaded test later.
 
Last edited by JimmyZ,

JimmyZ

Hyper-Threading test on Linux, Xeon E3-1230v2 (4C8T 3.3GHz):
1 process took 680 seconds.
4 processes with -c 1, 3, 5, 7 took 917, 917, 918, 918 seconds respectively. hmm, interesting.
8 processes with -c 0, 1, 2, 3, 4, 5, 6, 7 took 968 seconds each. what?
Code:
cat /proc/cpuinfo |egrep "processor|core id"
processor       : 0
core id         : 0
processor       : 1
core id         : 1
processor       : 2
core id         : 2
processor       : 3
core id         : 3
processor       : 4
core id         : 0
processor       : 5
core id         : 1
processor       : 6
core id         : 2
processor       : 7
core id         : 3
Silly me, I should have used -c 4, 5, 6, 7 instead: logical CPUs 1 and 5 share a core, as do 3 and 7, so the first run put 4 processes on only 2 physical cores. This also means Hyper-Threading rocks for TWLbf.
Re-running 4 processes with -c 4, 5, 6, 7 took 721 seconds each, which looks about right.

BTW the Linux test commands were like this:
Code:
taskset -c 4 ./twlbf_mbedtls console_id_bcd 0820100000000000 ab6778e02d034d303046504100001500 001f 1ced45c75e810bb6b51a5318e0fc5eee 000000000000000000000000000055aa &
taskset -c 5 ./twlbf_mbedtls console_id_bcd 0820200000000000 ab6778e02d034d303046504100001500 001f 1ced45c75e810bb6b51a5318e0fc5eee 000000000000000000000000000055aa &
taskset -c 6 ./twlbf_mbedtls console_id_bcd 0820300000000000 ab6778e02d034d303046504100001500 001f 1ced45c75e810bb6b51a5318e0fc5eee 000000000000000000000000000055aa &
taskset -c 7 ./twlbf_mbedtls console_id_bcd 0820400000000000 ab6778e02d034d303046504100001500 001f 1ced45c75e810bb6b51a5318e0fc5eee 000000000000000000000000000055aa &
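
The core-sharing mistake above can be caught before launching anything by reading the topology first. The Python sketch below parses text in the `/proc/cpuinfo` format shown earlier and counts how many distinct physical cores a set of logical CPUs covers. The function names are hypothetical, and it assumes a single-socket machine (with several sockets, "physical id" would also have to be taken into account).

```python
SAMPLE = """\
processor       : 0
core id         : 0
processor       : 1
core id         : 1
processor       : 2
core id         : 2
processor       : 3
core id         : 3
processor       : 4
core id         : 0
processor       : 5
core id         : 1
processor       : 6
core id         : 2
processor       : 7
core id         : 3
"""


def parse_core_map(cpuinfo_text):
    """Map each logical CPU number to its core id."""
    core_of = {}
    processor = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "processor":
            processor = int(value)
        elif key == "core id" and processor is not None:
            core_of[processor] = int(value)
    return core_of


def distinct_cores(core_of, cpus):
    """Count the physical cores used by the given logical CPUs."""
    return len({core_of[cpu] for cpu in cpus})


core_of = parse_core_map(SAMPLE)
print(distinct_cores(core_of, [1, 3, 5, 7]))  # the slow 4-process run
print(distinct_cores(core_of, [4, 5, 6, 7]))  # the re-run
```

On a live system the same text can be read from `/proc/cpuinfo` and the chosen CPU numbers passed to `taskset -c`.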

--------------------- MERGED ---------------------------

damn so you can bruteforce all you need in 15 minutes? f*** the software exploits man, this is awesome ! :D
If the collected first-5-digit prefixes and the BCD assumption are right, and you're testing an XL, then yeah, like that :D
 
Last edited by JimmyZ,

FR0ZN

Well-Known Member
Member
Joined
Nov 2, 2013
Messages
1,385
Trophies
1
Age
37
XP
3,887
Country
United States
Let's say the 14th digit is indeed always 1, would your application also accept this?:

Code:
@echo off
start /b /belownormal /affinity 1 twlbf_mbedtls console_id_bcd 0820100000000100 ab6778e02d034d303046504100001500 1ced45c75e810bb6b51a5318e0fc5eee 000000000000000000000000000055aa 001f
start /b /belownormal /affinity 2 twlbf_mbedtls console_id_bcd 0820200000000100 ab6778e02d034d303046504100001500 1ced45c75e810bb6b51a5318e0fc5eee 000000000000000000000000000055aa 001f
start /b /belownormal /affinity 4 twlbf_mbedtls console_id_bcd 0820300000000100 ab6778e02d034d303046504100001500 1ced45c75e810bb6b51a5318e0fc5eee 000000000000000000000000000055aa 001f
start /b /belownormal /affinity 8 twlbf_mbedtls console_id_bcd 0820400000000100 ab6778e02d034d303046504100001500 1ced45c75e810bb6b51a5318e0fc5eee 000000000000000000000000000055aa 001f
@echo on

Because that would narrow down the search area even further to 10^10 possibilities (maximum of 10000000000 tries per core / thread instead of 25937424601) if I'm not mistaken.
 

JimmyZ

Let's say the 14th digit is indeed always 1, would your application also accept this?:

Code:
@echo off
start /b /belownormal /affinity 1 twlbf_mbedtls console_id_bcd 0820100000000100 ab6778e02d034d303046504100001500 1ced45c75e810bb6b51a5318e0fc5eee 000000000000000000000000000055aa 001f
start /b /belownormal /affinity 2 twlbf_mbedtls console_id_bcd 0820200000000100 ab6778e02d034d303046504100001500 1ced45c75e810bb6b51a5318e0fc5eee 000000000000000000000000000055aa 001f
start /b /belownormal /affinity 4 twlbf_mbedtls console_id_bcd 0820300000000100 ab6778e02d034d303046504100001500 1ced45c75e810bb6b51a5318e0fc5eee 000000000000000000000000000055aa 001f
start /b /belownormal /affinity 8 twlbf_mbedtls console_id_bcd 0820400000000100 ab6778e02d034d303046504100001500 1ced45c75e810bb6b51a5318e0fc5eee 000000000000000000000000000055aa 001f
@echo on

Because that would narrow down the search area even further to 10^10 possibilities (maximum of 10000000000 tries per core / thread instead of 25937424601) if I'm not mistaken.
No need, it's already been taken care of.
 

wsquan171

Well-Known Member
Member
Joined
Feb 14, 2015
Messages
289
Trophies
0
XP
669
Country
China
I'm lucky enough to have both Biggest Loser and Field Runners on my DSi so I'm able to get both Console ID and eMMC CID without doing a hardmod. To help out as much as I can, and considering DSi family life cycle has terminated, I'll share all the info here.

The console is a regular sized USA region Cyan DSi
Code:
Serial No. TW408006543
eMMC CID: 2CDF570409034D303046504100001500
Console ID: 7da02160-08a2144507087128

Yes, I know the first 4 bytes of the Console ID are not needed, but I'd like to post them here anyway. Since I didn't open up my console, I can't provide the eMMC label here.

Good work OP. Hope this will help a little bit. Good luck!
 
Last edited by wsquan171,

JimmyZ

Is that using an "optimized" SHA1 function? One older optimization mentioned here https://software.intel.com/en-us/articles/improving-the-performance-of-the-secure-hash-algorithm-1 uses general-purpose SSSE3 instructions (this should be also implemented in openssl). And newer intel processors should have extra opcodes SHA1RNDS4, SHA1NEXTE, SHA1MSG1/2 (not sure if/when/where that's supported, intel announced that stuff in 2013, but some other webpage mentioned it not being implemented until 2016, or so).
I looked into it a bit; it's really an interesting subject. The only Intel architecture supporting SHAEXT is Goldmont (Atom), released April 2016; the later Kaby Lake (Oct 2016) and the just-released Coffee Lake still don't support it, so on Intel desktop we'd have to wait for Cannonlake, expected H1 2018. On the other hand, AMD has supported SHAEXT since Ryzen, released Feb 2017.

And, another (small) optimization would be appending the sha1-end-byte and sha1-padding-bytes to the CID, and then passing that directly to the 64-byte-sha1-core function (ie. avoiding the same padding to be repeated on each calculation).
I was considering an OpenCL port, so I dug the SHA1 code out of mbed TLS and thought I might as well give your suggestion a go. With the same brute range it went from 122 seconds to 105 seconds; not that impressive, but a substantial improvement.

I'm lucky enough to have both Biggest Loser and Field Runners on my DSi so I'm able to get both Console ID and eMMC CID without doing a hardmod. To help out as much as I can, and considering DSi family life cycle has terminated, I'll share all the info here.

The console is a regular sized USA region Cyan DSi
Code:
Serial No. TW408006543
eMMC CID: 2CDF570409034D303046504100001500
Console ID: 7da02160-08a2144507087128

Yes, I know the first 4 bytes of the Console ID are not needed, but I'd like to post them here anyway. Since I didn't open up my console, I can't provide the eMMC label here.

Good work OP. Hope this will help a little bit. Good luck!
Thank you! 08A21, that's new!
 
Last edited by JimmyZ,

JimmyZ

Played a little with OpenCL and ported sha1 over. Tests show it's about 100x faster on an AMD HD7970 than on a (single-threaded) Xeon E3-1230v2, so we can probably brute 3~4 bits more in the same time frame. Not very impressive, but not bad for a fun project :D

code is here:
https://github.com/Jimmy-Z/bfCL

update: about 200x faster on friend's Fury X, and 70x faster on another friend's GTX980
 
Last edited by JimmyZ,

FR0ZN

Played a little with OpenCL and ported sha1 over. Tests show it's about 100x faster on an AMD HD7970 than on a (single-threaded) Xeon E3-1230v2, so we can probably brute 3~4 bits more in the same time frame. Not very impressive, but not bad for a fun project :D

code is here:
https://github.com/Jimmy-Z/bfCL

update: about 200x faster on friend's Fury X, and 70x faster on another friend's GTX980

So with that we should be able to walk the entire 08***********1** range in the same time?
(13^10 = 137858491849 - a lot of possibilities)
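
For a sense of scale, the counts quoted in this thread can be compared directly. The Python sketch below only redoes the arithmetic: the 10, 11 and 13 symbols per digit and the 10 unknown digits are taken from the numbers posted above, the function name is made up, and whether the GPU speedup measured for sha1 carries over to this particular search is an assumption the thread does not confirm.

```python
def search_space(symbols_per_digit, unknown_digits=10):
    """Number of candidates when each unknown digit has this many values."""
    return symbols_per_digit ** unknown_digits


bcd_only = search_space(10)   # 10^10, the narrowed range
current = search_space(11)    # 11^10, the per-thread maximum quoted earlier
wide = search_space(13)       # 13^10, the range asked about here

ratio = wide / current
print(bcd_only, current, wide)
print(round(ratio, 2))
```

The 13^10 range is a little over 5 times larger than the 11^10 range, so if a speedup anywhere near the quoted 70x to 200x applied to this search, the wider range would still finish faster than the 11^10 range does on a single CPU thread.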
 
