WinArm64-MSVC#1910
Conversation
There was a problem hiding this comment.
NOTE: I see that this has now been marked as a draft PR. But since I started reviewing this before then. I hope my feedback will guide you through a compatible solution that can be merged, because I think as-is there's some concerns.
Thanks for making this PR. It would definitely be nice to support windows ARM support in this project, with the assumption that the code is properly written and doesn't increase complexity for other architectures. So the fixes in this PR would ideally be as small as possible, and extend our existing ARM code so that its arm support extends to Windows. I do think this PR is kind of chaotic. From what I can see, a big issue seems to be the assembly, at least with the dependencies. Its preferable if we used the upstream version.
I did see a number of tells that a decent amount of AI usage was involved with these changes. Although its fine to use AI while writing code, we have some hesitations about PRs like this that change existing code in multiple places, potentially affecting other platforms as well. Reviewing code takes a good chunk of time away from working on our own projects within Cemu, and with the increase of AI PRs, we generally will be quite conservative with what to merge.
So, ideally in the next PR, please declare whether you've used AI, and if yes, know that we expect you to understand the changes it has made and whether those changes match the quality of what you or any other maintainer would've written manually. This includes checking the coding guidelines, bundling up unrelated changes and making a precise, targeted PR.
| { | ||
| this->type = PPCREC_IML_TYPE_FPR_COMPARE; | ||
| this->operation = -999; | ||
| this->operation = 255; |
There was a problem hiding this comment.
At first, I thought you made this change so that the value fits some uint8 type. But looking further, this type is larger then uint8... so why make this change? Is this to prevent a compiler warning? These random changes require a lot of decipher and double-checking, so I would just avoid them in the first place. Splitting up your PR into multiple changes would've also allowed to document these things.
There was a problem hiding this comment.
operation is actually a uint8, so assigning -999 causes a warning. It's just something we didn't bother to clean up yet. I will say that personally I prefer feature PRs to be focused on the feature and not randomly clean up or refactor other things unless it relates to the feature in some way. Probably the LLM saw the warning and decided to take care of it?
There was a problem hiding this comment.
I have seen this error for a long time, and was trying to clean up some of the noise when building. can revert as before. just noise.
| BARRIER_FENCE(); | ||
| uint64 tscStart = READ_TSC(); |
There was a problem hiding this comment.
__rdtsc was already being used our arm code. Why not reuse that instead of redefining this. Same with Multiply64to128 etc.
There was a problem hiding this comment.
the MSVC ARM code does not have the headers as-is with the Windows ARM runner. I can "install" them in to the image and use as is, but it will fail to link using the reusable code.
There was a problem hiding this comment.
@qurious-pixel __rdtsc is already implemented for arm. I'm not sure if MSVC uses the __aarch64__ preprocessor, but if not you'd have to add the _M_ARM64 there and wherever else it is used.
| uint64 tscEnd = __rdtsc(); | ||
| // derive frequency approximation from measured time difference | ||
| uint64 tscEnd = READ_TSC(); | ||
|
|
||
| uint64 tsc_diff = tscEnd - tscStart; | ||
| uint64 hrtFreq = 0; | ||
| uint64 hrtDiff = HighResolutionTimer::getTimeDiffEx(startTick, stopTick, hrtFreq); | ||
| uint64 tsc_freq = muldiv64(tsc_diff, hrtFreq, hrtDiff); | ||
|
|
||
| // uint64 freqMultiplier = tsc_freq / hrtFreq; | ||
| //cemuLog_log(LogType::Force, "RDTSC measurement test:"); | ||
| //cemuLog_log(LogType::Force, "TSC-diff: 0x{:016x}", tsc_diff); | ||
| //cemuLog_log(LogType::Force, "TSC-freq: 0x{:016x}", tsc_freq); | ||
| //cemuLog_log(LogType::Force, "HPC-diff: 0x{:016x}", qpc_diff); | ||
| //cemuLog_log(LogType::Force, "HPC-freq: 0x{:016x}", (uint64)qpc_freq.QuadPart); | ||
| //cemuLog_log(LogType::Force, "Multiplier: 0x{:016x}", freqMultiplier); | ||
|
|
There was a problem hiding this comment.
Ditto, comments removed.
EDIT: On hindsight. This is an issue everywhere. But I've already made a few comments on it so I'll just leave it.
There was a problem hiding this comment.
I think this whole file was rewritten by AI? A lot of changes, removal of comments and making things shorter? I'm not sure what you changed. It seems like a lot of files are now no longer compiled? I would prefer to keep the source code of dependencies the same, if that's possible.
There was a problem hiding this comment.
I wonder if there's some upstream changes that are merged since we forked it. Would be more ideal to not diverge from that source code.
There was a problem hiding this comment.
Did you check if ih264d had any upstream changes to make it compile-able on MSVC ARM?
| SetThreadContext(hThread, &ctx); | ||
| } | ||
| ResumeThread(hThread); | ||
| } |
There was a problem hiding this comment.
You added support for setting the breakpoint registers for the debugger. But when I check the ExceptionHandler_win32.cpp code, I'm quite sure that this is not actually handled? I feel like this should've come up if you had tested it, which would've been nice. You can use the PPC debugger to make a write/read breakpoint in the PPC debugger at some memory address.
Thanks for the feedback. Set this as draft since the Windows on Arm code for MSVC is giving some issues i did not experience with Clang Arm for Windows. Specifically a thread lock when closing Cemu. And interaction between WxWidgets windows and the backend. |
276c226 to
c52b5dd
Compare
c52b5dd to
6a5219f
Compare
| @@ -0,0 +1,101 @@ | |||
| #pragma once | |||
There was a problem hiding this comment.
Intrinsics are already defined in precompiled.h, any new ones should probably stay there.
| inline uint64 portable_udiv128(uint64 high, uint64 low, uint64 denominator, uint64* remainder) { | ||
| #if defined(_MSC_VER) | ||
| #if defined(_M_X64) | ||
| return _udiv128(high, low, denominator, remainder); | ||
| #else | ||
| if (high == 0) { | ||
| if (remainder) *remainder = low % denominator; | ||
| return low / denominator; | ||
| } | ||
|
|
||
| uint64 rem = 0; | ||
| uint64 quot = 0; | ||
| for (int i = 63; i >= 0; i--) { | ||
| rem = (rem << 1) | (high >> 63); | ||
| high <<= 1; | ||
| if (rem >= denominator) { | ||
| rem -= denominator; | ||
| quot |= (1ULL << i); | ||
| } | ||
| } | ||
| for (int i = 63; i >= 0; i--) { | ||
| rem = (rem << 1) | (low >> 63); | ||
| low <<= 1; | ||
| if (rem >= denominator) { | ||
| rem -= denominator; | ||
| quot |= (1ULL << i); | ||
| } | ||
| } | ||
| // Secure native software fallback block | ||
| uint64_t q = 0; | ||
| uint64_t r = 0; | ||
| for (int i = 127; i >= 0; i--) { | ||
| r = (r << 1) | ((i >= 64 ? high >> (i - 64) : low >> i) & 1); | ||
| if (r >= denominator) { | ||
| r -= denominator; | ||
| q |= (1ULL << (i % 64)); | ||
| } | ||
| } | ||
| if (remainder) *remainder = r; | ||
| return q; | ||
| #endif | ||
| #else | ||
| unsigned __int128 dividend = ((unsigned __int128)high << 64) | low; | ||
| if (remainder) *remainder = (uint64)(dividend % denominator); | ||
| return (uint64)(dividend / denominator); | ||
| #endif | ||
| } |
There was a problem hiding this comment.
_udiv128 and _umul128 is an MSVC intrinsic which should be avaliable for aarch64, and is already implemented for other compilers in precompiled.hpp otherwise
| { | ||
| "name": "ARM64-Release", | ||
| "configurationType": "Release", | ||
| "generator": "Ninja", | ||
| "inheritEnvironments": [ "msvc_arm64_x64" ], | ||
| "buildRoot": "${projectDir}\\out\\build\\${name}", | ||
| "installRoot": "${projectDir}\\out\\install\\${name}" | ||
| }, |
There was a problem hiding this comment.
What's different in this vs. Release? CMake configs should be compatible cross platforms
| message(STATUS "CPU Arch is: $ENV{PROCESSOR_ARCHITECTURE}") | ||
| if($ENV{PROCESSOR_ARCHITECTURE} MATCHES "x86_64|amd64|AMD64") |
There was a problem hiding this comment.
Should probably use CEMU_ARCHITECTURE for consistency
| #if defined(_MSC_VER) | ||
| __dmb(_ARM64_BARRIER_ISH); // Inner Shareable Data Memory Barrier | ||
| #else | ||
| asm volatile("" ::: "memory"); | ||
| std::atomic_thread_fence(std::memory_order_seq_cst); | ||
| #endif |
There was a problem hiding this comment.
memory_order_seq_cst should compile to ISH anyways so this is redundant here, just use C++'s atomic fence
| #if defined(_MSC_VER) | |
| __dmb(_ARM64_BARRIER_ISH); // Inner Shareable Data Memory Barrier | |
| #else | |
| asm volatile("" ::: "memory"); | |
| std::atomic_thread_fence(std::memory_order_seq_cst); | |
| #endif | |
| std::atomic_thread_fence(std::memory_order_seq_cst); |
| BARRIER_FENCE(); | ||
| uint64 tscStart = READ_TSC(); |
There was a problem hiding this comment.
@qurious-pixel __rdtsc is already implemented for arm. I'm not sure if MSVC uses the __aarch64__ preprocessor, but if not you'd have to add the _M_ARM64 there and wherever else it is used.
| if(MSVC) | ||
| message(STATUS "MSVC ARM64 detected: Forcing portable C implementation.") | ||
| target_sources(ih264d PRIVATE | ||
| "decoder/arm/ih264d_function_selector.c" | ||
| "decoder/arm/ih264d_function_selector_av8.c" | ||
| ) | ||
| target_compile_definitions(ih264d PRIVATE PORTABLE_C ARCH_GENERIC DISABLE_NEON) |
There was a problem hiding this comment.
What's the purpose of this? MSVC should be able to support NEON.
| #if defined(_M_X64) || defined(__x86_64__) | ||
| #if defined(_MSC_VER) | ||
| #include <immintrin.h> | ||
| #pragma intrinsic(__rdtsc) | ||
| #define BARRIER_FENCE() _mm_mfence() | ||
| #define READ_TSC() __rdtsc() | ||
| #else | ||
| #include <x86intrin.h> | ||
| #define BARRIER_FENCE() __builtin_ia32_mfence() | ||
| #define READ_TSC() __rdtsc() | ||
| #endif | ||
| #elif defined(_M_ARM64) || defined(__aarch64__) | ||
| #if defined(_MSC_VER) | ||
| #include <intrin.h> | ||
| #define BARRIER_FENCE() __dmb(_ARM64_BARRIER_SY) | ||
| #define READ_TSC() _ReadStatusReg(ARM64_CNTVCT_EL0) | ||
| #else | ||
| #define BARRIER_FENCE() __asm__ __volatile__("dmb sy" : : : "memory") | ||
| inline uint64 READ_TSC() { | ||
| uint64 virtual_timer; | ||
| __asm__ __volatile__("mrs %0, cntvct_el0" : "=r" (virtual_timer)); | ||
| return virtual_timer; | ||
| } | ||
| #endif | ||
| #endif |
There was a problem hiding this comment.
Also, BARRIER_FENCE is already implemented for arm64 in _mm_mfence (which you used in your implementation?) Same with rdtsc and __rdtsc()
Build Windows on Arm using MSVC