Skip to content

WinArm64-MSVC#1910

Draft
qurious-pixel wants to merge 3 commits into
cemu-project:mainfrom
qurious-pixel:win-arm64-msvc
Draft

WinArm64-MSVC#1910
qurious-pixel wants to merge 3 commits into
cemu-project:mainfrom
qurious-pixel:win-arm64-msvc

Conversation

@qurious-pixel

Copy link
Copy Markdown
Contributor

Build Windows on Arm using MSVC

@qurious-pixel qurious-pixel marked this pull request as draft May 14, 2026 07:44

@Crementif Crementif left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: I see that this has now been marked as a draft PR. But since I started reviewing this before then. I hope my feedback will guide you through a compatible solution that can be merged, because I think as-is there's some concerns.


Thanks for making this PR. It would definitely be nice to support windows ARM support in this project, with the assumption that the code is properly written and doesn't increase complexity for other architectures. So the fixes in this PR would ideally be as small as possible, and extend our existing ARM code so that its arm support extends to Windows. I do think this PR is kind of chaotic. From what I can see, a big issue seems to be the assembly, at least with the dependencies. Its preferable if we used the upstream version.

I did see a number of tells that a decent amount of AI usage was involved with these changes. Although its fine to use AI while writing code, we have some hesitations about PRs like this that change existing code in multiple places, potentially affecting other platforms as well. Reviewing code takes a good chunk of time away from working on our own projects within Cemu, and with the increase of AI PRs, we generally will be quite conservative with what to merge.

So, ideally in the next PR, please declare whether you've used AI, and if yes, know that we expect you to understand the changes it has made and whether those changes match the quality of what you or any other maintainer would've written manually. This includes checking the coding guidelines, bundling up unrelated changes and making a precise, targeted PR.

{
this->type = PPCREC_IML_TYPE_FPR_COMPARE;
this->operation = -999;
this->operation = 255;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first, I thought you made this change so that the value fits some uint8 type. But looking further, this type is larger then uint8... so why make this change? Is this to prevent a compiler warning? These random changes require a lot of decipher and double-checking, so I would just avoid them in the first place. Splitting up your PR into multiple changes would've also allowed to document these things.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

operation is actually a uint8, so assigning -999 causes a warning. It's just something we didn't bother to clean up yet. I will say that personally I prefer feature PRs to be focused on the feature and not randomly clean up or refactor other things unless it relates to the feature in some way. Probably the LLM saw the warning and decided to take care of it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen this error for a long time, and was trying to clean up some of the noise when building. can revert as before. just noise.

Comment on lines +38 to +39
BARRIER_FENCE();
uint64 tscStart = READ_TSC();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__rdtsc was already being used our arm code. Why not reuse that instead of redefining this. Same with Multiply64to128 etc.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the MSVC ARM code does not have the headers as-is with the Windows ARM runner. I can "install" them in to the image and use as is, but it will fail to link using the reusable code.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qurious-pixel __rdtsc is already implemented for arm. I'm not sure if MSVC uses the __aarch64__ preprocessor, but if not you'd have to add the _M_ARM64 there and wherever else it is used.

Comment thread src/Cafe/HW/Espresso/PPCTimer.cpp
Comment on lines -54 to -68
uint64 tscEnd = __rdtsc();
// derive frequency approximation from measured time difference
uint64 tscEnd = READ_TSC();

uint64 tsc_diff = tscEnd - tscStart;
uint64 hrtFreq = 0;
uint64 hrtDiff = HighResolutionTimer::getTimeDiffEx(startTick, stopTick, hrtFreq);
uint64 tsc_freq = muldiv64(tsc_diff, hrtFreq, hrtDiff);

// uint64 freqMultiplier = tsc_freq / hrtFreq;
//cemuLog_log(LogType::Force, "RDTSC measurement test:");
//cemuLog_log(LogType::Force, "TSC-diff: 0x{:016x}", tsc_diff);
//cemuLog_log(LogType::Force, "TSC-freq: 0x{:016x}", tsc_freq);
//cemuLog_log(LogType::Force, "HPC-diff: 0x{:016x}", qpc_diff);
//cemuLog_log(LogType::Force, "HPC-freq: 0x{:016x}", (uint64)qpc_freq.QuadPart);
//cemuLog_log(LogType::Force, "Multiplier: 0x{:016x}", freqMultiplier);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, comments removed.

EDIT: On hindsight. This is an issue everywhere. But I've already made a few comments on it so I'll just leave it.

Comment thread src/Cafe/CMakeLists.txt Outdated
Comment thread dist/windows/Cemu.manifest Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this whole file was rewritten by AI? A lot of changes, removal of comments and making things shorter? I'm not sure what you changed. It seems like a lot of files are now no longer compiled? I would prefer to keep the source code of dependencies the same, if that's possible.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there's some upstream changes that are merged since we forked it. Would be more ideal to not diverge from that source code.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check if ih264d had any upstream changes to make it compile-able on MSVC ARM?

SetThreadContext(hThread, &ctx);
}
ResumeThread(hThread);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You added support for setting the breakpoint registers for the debugger. But when I check the ExceptionHandler_win32.cpp code, I'm quite sure that this is not actually handled? I feel like this should've come up if you had tested it, which would've been nice. You can use the PPC debugger to make a write/read breakpoint in the PPC debugger at some memory address.

@qurious-pixel

Copy link
Copy Markdown
Contributor Author

NOTE: I see that this has now been marked as a draft PR. But since I started reviewing this before then. I hope my feedback will guide you through a compatible solution that can be merged, because I think as-is there's some concerns.

Thanks for making this PR. It would definitely be nice to support windows ARM support in this project, with the assumption that the code is properly written and doesn't increase complexity for other architectures. So the fixes in this PR would ideally be as small as possible, and extend our existing ARM code so that its arm support extends to Windows. I do think this PR is kind of chaotic. From what I can see, a big issue seems to be the assembly, at least with the dependencies. Its preferable if we used the upstream version.

I did see a number of tells that a decent amount of AI usage was involved with these changes. Although its fine to use AI while writing code, we have some hesitations about PRs like this that change existing code in multiple places, potentially affecting other platforms as well. Reviewing code takes a good chunk of time away from working on our own projects within Cemu, and with the increase of AI PRs, we generally will be quite conservative with what to merge.

So, ideally in the next PR, please declare whether you've used AI, and if yes, know that we expect you to understand the changes it has made and whether those changes match the quality of what you or any other maintainer would've written manually. This includes checking the coding guidelines, bundling up unrelated changes and making a precise, targeted PR.

Thanks for the feedback. Set this as draft since the Windows on Arm code for MSVC is giving some issues i did not experience with Clang Arm for Windows. Specifically a thread lock when closing Cemu. And interaction between WxWidgets windows and the backend.

@qurious-pixel qurious-pixel force-pushed the win-arm64-msvc branch 3 times, most recently from 276c226 to c52b5dd Compare June 6, 2026 19:23
Comment thread src/Common/Intrinsics.h
@@ -0,0 +1,101 @@
#pragma once

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intrinsics are already defined in precompiled.h, any new ones should probably stay there.

Comment thread src/Common/Intrinsics.h
Comment on lines +55 to +101
inline uint64 portable_udiv128(uint64 high, uint64 low, uint64 denominator, uint64* remainder) {
#if defined(_MSC_VER)
#if defined(_M_X64)
return _udiv128(high, low, denominator, remainder);
#else
if (high == 0) {
if (remainder) *remainder = low % denominator;
return low / denominator;
}

uint64 rem = 0;
uint64 quot = 0;
for (int i = 63; i >= 0; i--) {
rem = (rem << 1) | (high >> 63);
high <<= 1;
if (rem >= denominator) {
rem -= denominator;
quot |= (1ULL << i);
}
}
for (int i = 63; i >= 0; i--) {
rem = (rem << 1) | (low >> 63);
low <<= 1;
if (rem >= denominator) {
rem -= denominator;
quot |= (1ULL << i);
}
}
// Secure native software fallback block
uint64_t q = 0;
uint64_t r = 0;
for (int i = 127; i >= 0; i--) {
r = (r << 1) | ((i >= 64 ? high >> (i - 64) : low >> i) & 1);
if (r >= denominator) {
r -= denominator;
q |= (1ULL << (i % 64));
}
}
if (remainder) *remainder = r;
return q;
#endif
#else
unsigned __int128 dividend = ((unsigned __int128)high << 64) | low;
if (remainder) *remainder = (uint64)(dividend % denominator);
return (uint64)(dividend / denominator);
#endif
}

@exverge-0 exverge-0 Jun 8, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_udiv128 and _umul128 is an MSVC intrinsic which should be avaliable for aarch64, and is already implemented for other compilers in precompiled.hpp otherwise

Comment thread CMakeSettings.json
Comment on lines +19 to +26
{
"name": "ARM64-Release",
"configurationType": "Release",
"generator": "Ninja",
"inheritEnvironments": [ "msvc_arm64_x64" ],
"buildRoot": "${projectDir}\\out\\build\\${name}",
"installRoot": "${projectDir}\\out\\install\\${name}"
},

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's different in this vs. Release? CMake configs should be compatible cross platforms

Comment thread CMakeLists.txt
Comment on lines +54 to +55
message(STATUS "CPU Arch is: $ENV{PROCESSOR_ARCHITECTURE}")
if($ENV{PROCESSOR_ARCHITECTURE} MATCHES "x86_64|amd64|AMD64")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably use CEMU_ARCHITECTURE for consistency

Comment thread src/Common/precompiled.h
Comment on lines +387 to +392
#if defined(_MSC_VER)
__dmb(_ARM64_BARRIER_ISH); // Inner Shareable Data Memory Barrier
#else
asm volatile("" ::: "memory");
std::atomic_thread_fence(std::memory_order_seq_cst);
#endif

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memory_order_seq_cst should compile to ISH anyways so this is redundant here, just use C++'s atomic fence

Suggested change
#if defined(_MSC_VER)
__dmb(_ARM64_BARRIER_ISH); // Inner Shareable Data Memory Barrier
#else
asm volatile("" ::: "memory");
std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
std::atomic_thread_fence(std::memory_order_seq_cst);

Comment on lines +38 to +39
BARRIER_FENCE();
uint64 tscStart = READ_TSC();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qurious-pixel __rdtsc is already implemented for arm. I'm not sure if MSVC uses the __aarch64__ preprocessor, but if not you'd have to add the _M_ARM64 there and wherever else it is used.

Comment on lines +94 to +100
if(MSVC)
message(STATUS "MSVC ARM64 detected: Forcing portable C implementation.")
target_sources(ih264d PRIVATE
"decoder/arm/ih264d_function_selector.c"
"decoder/arm/ih264d_function_selector_av8.c"
)
target_compile_definitions(ih264d PRIVATE PORTABLE_C ARCH_GENERIC DISABLE_NEON)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this? MSVC should be able to support NEON.

Comment thread src/Common/Intrinsics.h
Comment on lines +9 to +33
#if defined(_M_X64) || defined(__x86_64__)
#if defined(_MSC_VER)
#include <immintrin.h>
#pragma intrinsic(__rdtsc)
#define BARRIER_FENCE() _mm_mfence()
#define READ_TSC() __rdtsc()
#else
#include <x86intrin.h>
#define BARRIER_FENCE() __builtin_ia32_mfence()
#define READ_TSC() __rdtsc()
#endif
#elif defined(_M_ARM64) || defined(__aarch64__)
#if defined(_MSC_VER)
#include <intrin.h>
#define BARRIER_FENCE() __dmb(_ARM64_BARRIER_SY)
#define READ_TSC() _ReadStatusReg(ARM64_CNTVCT_EL0)
#else
#define BARRIER_FENCE() __asm__ __volatile__("dmb sy" : : : "memory")
inline uint64 READ_TSC() {
uint64 virtual_timer;
__asm__ __volatile__("mrs %0, cntvct_el0" : "=r" (virtual_timer));
return virtual_timer;
}
#endif
#endif

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, BARRIER_FENCE is already implemented for arm64 in _mm_mfence (which you used in your implementation?) Same with rdtsc and __rdtsc()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants