citra/src/common
Wunk e13735b624
video_core: Implement an arm64 shader-jit backend (#7002)
* externals: Add oaknut submodule

Used for emitting ARM64 assembly

* common: Implement aarch64 ABI

Utilize oaknut to implement a stack frame.

* tests: Allow shader-jit tests for x64 and a64

Run the shader-jit tests for both x86_64 and arm64 targets

* video_core: Initialize arm64 shader-jit backend

Passes all current unit tests!

* shader_jit_a64: protect/unprotect memory when jit-ing

Required on macOS: the code memory must be fully unprotected before
writing and re-protected afterwards, otherwise memory access errors
occur.
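
A minimal POSIX sketch of the unprotect/write/re-protect cycle described
above. The helper names (`AllocateCodeRegion`, `WriteCode`) are
hypothetical and this is not Citra's implementation. On Apple silicon,
Apple's documented mechanism is `MAP_JIT` combined with
`pthread_jit_write_protect_np`; plain `mprotect` is used here only as a
portable stand-in.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: map a page-aligned region for generated code. It starts
// out read+execute, the state it is kept in between JIT passes.
void* AllocateCodeRegion(std::size_t size) {
    void* ptr = mmap(nullptr, size, PROT_READ | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return ptr == MAP_FAILED ? nullptr : ptr;
}

// Hypothetical helper: copy freshly emitted bytes into the code region.
bool WriteCode(void* region, std::size_t region_size, const std::uint8_t* code,
               std::size_t code_size) {
    if (code_size > region_size) {
        return false;
    }
    // Unprotect: make the whole region writable (and not executable) for the copy.
    if (mprotect(region, region_size, PROT_READ | PROT_WRITE) != 0) {
        return false;
    }
    std::memcpy(region, code, code_size);
    // Re-protect: drop write permission and allow execution again.
    if (mprotect(region, region_size, PROT_READ | PROT_EXEC) != 0) {
        return false;
    }
    // ARM cores need the instruction cache synchronised with the new bytes.
    char* begin = static_cast<char*>(region);
    __builtin___clear_cache(begin, begin + code_size);
    return true;
}
```

Keeping the region read+execute except during writes means it is never
writable and executable at the same time.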

* shader_jit_a64: Fix ARM64-Imm overflow

These conditionals were throwing exceptions because the immediate values
could not be encoded in the `EOR` instruction's immediate field. Instead,
the values are first materialized with `MOV` and then `EOR`-ed.
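
Background for this fix, stated as general AArch64 facts rather than
details from the commit: logical instructions such as `EOR` only accept
"bitmask immediates", i.e. a 2-, 4-, 8-, 16- or 32-bit element (64 as
well for X registers) holding a rotated run of contiguous ones,
replicated across the register. Values such as 0, all-ones or 0x12345678
have no encoding. The hypothetical `IsLogicalImm32` below (not oaknut's
API) checks a 32-bit operand; when it returns false, the workaround above
applies: load the value with `MOV` and `EOR` register-to-register.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if `value` is encodable as the immediate of a 32-bit AArch64
// logical instruction (AND/ORR/EOR/ANDS): a power-of-two-sized element of
// 2..32 bits, replicated, whose bits form one rotated run of ones.
bool IsLogicalImm32(std::uint32_t value) {
    // All-zeros and all-ones have no encoding.
    if (value == 0 || value == 0xFFFFFFFFu) {
        return false;
    }
    // Shrink the element size while the value still repeats with half the period.
    unsigned size = 32;
    while (size > 2) {
        const unsigned half = size / 2;
        const std::uint32_t half_mask = (1u << half) - 1u;
        if ((value & half_mask) != ((value >> half) & half_mask)) {
            break;
        }
        size = half;
    }
    // A rotated run of ones has exactly two bit transitions when read cyclically.
    unsigned transitions = 0;
    for (unsigned i = 0; i < size; ++i) {
        const unsigned bit = (value >> i) & 1u;
        const unsigned next = (value >> ((i + 1) % size)) & 1u;
        transitions += (bit != next) ? 1u : 0u;
    }
    return transitions == 2;
}
```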

* shader_jit_a64: Fix Geometry shader conditional

* shader_jit_a64: Replace `ADRL` with `MOVP2R`

Fixes some immediate-generation exceptions.

* common/aarch64: Fix CallFarFunction

* shader_jit_a64: Optimize `SanitizedMul`

Co-authored-by: merryhime <merryhime@users.noreply.github.com>
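
For context, the existing x64 backend describes `SanitizedMul` as a
multiply that follows the PICA rule 0 * inf = 0 instead of IEEE NaN: if
the product is NaN while neither input was, the NaN came from 0 * inf and
is replaced with 0. A scalar C++ model of that rule (the JIT applies it
per lane on vectors; this function is a hypothetical illustration, not
the emitted code):

```cpp
#include <cassert>
#include <cmath>

// Scalar model of a PICA-style "sanitized" multiply. An IEEE product is NaN
// either because an input was NaN or because 0 * inf was computed; only the
// second case is replaced by 0.
float SanitizedMul(float a, float b) {
    const float product = a * b;
    const bool input_was_nan = std::isnan(a) || std::isnan(b);
    if (std::isnan(product) && !input_was_nan) {
        return 0.0f;  // 0 * inf or inf * 0: the PICA yields 0 instead of NaN
    }
    return product;
}
```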

* shader_jit_a64: Fix address register offset behavior

Based on https://github.com/citra-emu/citra/pull/6942
Passes unit tests.

* shader_jit_a64: Fix `RET` address offset

The A64 stack is 16-byte aligned rather than 8-byte aligned, so a direct
port of the x64 code does not work. Fixes stray branches into invalid
memory in any shader with subroutines.
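
The arithmetic behind this fix: an x64 `PUSH` of a return address moves
RSP by 8 bytes, whereas on A64 SP must stay 16-byte aligned when used to
access memory, so saving a single 8-byte value typically consumes a
16-byte slot. An offset derived from 8-byte slots therefore lands in the
wrong place. A minimal sketch, where the hypothetical `SavedSlotOffset`
counts slots from SP (0 = most recent save):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round `value` up to a multiple of `alignment`, which must be a power of two.
constexpr std::size_t AlignUp(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical: offset from SP of the n-th saved 8-byte return address
// (n = 0 is the most recent save) when each save is padded to the stack alignment.
constexpr std::size_t SavedSlotOffset(std::size_t n, std::size_t stack_alignment) {
    return n * AlignUp(sizeof(std::uint64_t), stack_alignment);
}

// x64: each PUSH is 8 bytes. A64: SP stays 16-byte aligned, so each slot is 16.
static_assert(SavedSlotOffset(2, 8) == 16);
static_assert(SavedSlotOffset(2, 16) == 32);
```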

* shader_jit_a64: Increase max program size

Tuned for A64 program size.

* shader_jit_a64: Use `UBFX` for extracting loop-state

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
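
`UBFX` (unsigned bitfield extract) copies `width` bits starting at bit
`lsb` into the low bits of the destination and zero-fills the rest, so it
replaces a shift-plus-mask pair. The sketch models it in C++ and uses it
to unpack a loop uniform. The packing (X in bits 0-7, Y in bits 8-15, Z in
bits 16-23) and the `LoopState`/`DecodeLoopUniform` names are assumptions
for illustration, as is the PICA convention that a loop runs X + 1 times
with `aL` starting at Y and stepping by Z.

```cpp
#include <cassert>
#include <cstdint>

// C++ model of AArch64 UBFX: extract `width` bits starting at bit `lsb` and
// zero-extend. Assumes 0 < width <= 32 and lsb + width <= 32.
constexpr std::uint32_t Ubfx(std::uint32_t value, unsigned lsb, unsigned width) {
    const std::uint32_t mask = width == 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (value >> lsb) & mask;
}

// Hypothetical decoded loop state.
struct LoopState {
    std::uint32_t iterations;
    std::uint32_t start;
    std::uint32_t increment;
};

// Assumed packing: X = bits 0-7, Y = bits 8-15, Z = bits 16-23.
constexpr LoopState DecodeLoopUniform(std::uint32_t packed) {
    return LoopState{
        Ubfx(packed, 0, 8) + 1,  // X component: the loop runs X + 1 times
        Ubfx(packed, 8, 8),      // Y component: initial value of aL
        Ubfx(packed, 16, 8),     // Z component: increment applied to aL
    };
}
```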

* shader_jit_a64: Optimize `SUB+CMP` to `SUBS`

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Optimize `CMP+B` to `CBNZ`

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
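
Both rewrites target the loop tail. `SUBS` subtracts and sets the
condition flags in one instruction, which makes a following `CMP` against
zero redundant; `CBNZ` compares a register with zero and branches without
using the flags at all. The control flow they implement is a
decrement-and-branch, modelled below in plain C++. `RunLoop` is
hypothetical, and the PICA semantics it assumes (a fixed iteration count
of at least one, `aL` starting at `start` and advanced by `increment`)
are not spelled out in this commit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a PICA LOOP: run the body `iterations` times, with aL
// starting at `start` and advancing by `increment` after each pass. Returns
// the aL value observed by each pass of the body.
std::vector<std::uint32_t> RunLoop(std::uint32_t iterations, std::uint32_t start,
                                   std::uint32_t increment) {
    std::vector<std::uint32_t> al_values;
    std::uint32_t al = start;
    std::uint32_t counter = iterations;  // assumed >= 1
    do {
        al_values.push_back(al);  // loop body: record the aL it sees
        al += increment;
    } while (--counter != 0);  // decrement-and-branch: SUBS + B.NE, or SUB + CBNZ
    return al_values;
}
```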

* shader_jit_a64: Use `FMOV` for `ONE` vector

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Remove x86-specific documentation

* shader_jit_a64: Use `UBFX` to extract exponent

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
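
An IEEE-754 binary32 float stores its biased exponent in bits 23-30, so
on A64 the exponent is a single `UBFX Wd, Wn, #23, #8` instead of a shift
followed by an `AND`. The C++ equivalent below is a sketch; the function
names are hypothetical, and the commit does not say which routine uses
the exponent.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits without undefined behaviour.
std::uint32_t FloatBits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return bits;
}

// Biased exponent (bits 23..30); the mask drops the sign bit.
// On A64: UBFX Wd, Wn, #23, #8.
std::uint32_t BiasedExponent(float f) {
    return (FloatBits(f) >> 23) & 0xFFu;
}

// Unbiased exponent, meaningful for normal (finite, non-zero, non-denormal) inputs.
int UnbiasedExponent(float f) {
    return static_cast<int>(BiasedExponent(f)) - 127;
}
```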

* shader_jit_a64: Remove redundant MIN/MAX `SRC2`-NaN check

Special handling only needs to check SRC1 for NaN, not SRC2.
It works as follows in the four possible cases:

No NaN: No special handling needed.
Only SRC1 is NaN: The special handling is triggered because SRC1 is NaN, and SRC2 is picked.
Only SRC2 is NaN: FMAX automatically picks SRC2 because it always picks the NaN if there is one.
Both SRC1 and SRC2 are NaN: The special handling is triggered because SRC1 is NaN, and SRC2 is picked.

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
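
The four cases above can be checked with a small scalar model.
`ReferenceMax` encodes the behaviour described above as a plain
comparison (false whenever a NaN is involved, so SRC2 is returned);
`ArmFmax` is a simplified stand-in for A64 `FMAX` that ignores
signalling-NaN and default-NaN details; `JitMax` applies the single SRC1
check. All three names are hypothetical, and signed zeros are outside
the scope of this sketch.

```cpp
#include <cassert>
#include <cmath>

// Reference behaviour: a comparison with a NaN is false, so SRC2 is returned.
float ReferenceMax(float src1, float src2) {
    return src1 > src2 ? src1 : src2;
}

// Simplified model of A64 FMAX: any NaN input produces a NaN result.
float ArmFmax(float a, float b) {
    if (std::isnan(a)) return a;
    if (std::isnan(b)) return b;
    return std::fmax(a, b);
}

// JIT strategy: only SRC1 needs a NaN check; a NaN SRC2 already falls out of FMAX.
float JitMax(float src1, float src2) {
    return std::isnan(src1) ? src2 : ArmFmax(src1, src2);
}
```

`MIN` follows the same pattern with `FMIN` and the comparison reversed.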

* shader_jit/tests: Add catch-stringifier for vec2f/vec3f

* shader_jit/tests: Add Dest Mask unit test

* shader_jit_a64: Fix Dest-Mask `BSL` operand order

Passes the dest-mask unit tests now.
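
Why the operand order matters: A64 `BSL` uses its destination register
as the selector, taking each bit from the first source where the
selector bit is 1 and from the second source where it is 0. With a
per-lane all-ones/all-zeros mask in the destination, the first source
must be the newly computed value and the second the old register
contents; swapping them writes exactly the disabled components. A
lane-level C++ model (helper names hypothetical; lane 0 is assumed to
hold x):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4Bits = std::array<std::uint32_t, 4>;  // four 32-bit lanes (x, y, z, w)

// Model of A64 BSL: each result bit comes from `first` where the selector
// (the original destination) has a 1, and from `second` where it has a 0.
Vec4Bits Bsl(const Vec4Bits& selector, const Vec4Bits& first, const Vec4Bits& second) {
    Vec4Bits result{};
    for (std::size_t i = 0; i < 4; ++i) {
        result[i] = (selector[i] & first[i]) | (~selector[i] & second[i]);
    }
    return result;
}

// Hypothetical helper: build an all-ones / all-zeros lane mask from enable flags.
Vec4Bits MakeLaneMask(const std::array<bool, 4>& enabled) {
    Vec4Bits mask{};
    for (std::size_t i = 0; i < 4; ++i) {
        mask[i] = enabled[i] ? 0xFFFFFFFFu : 0u;
    }
    return mask;
}

// Masked write: enabled lanes take the new value, disabled lanes keep the old one.
Vec4Bits MaskedWrite(const std::array<bool, 4>& enabled, const Vec4Bits& new_value,
                     const Vec4Bits& old_value) {
    return Bsl(MakeLaneMask(enabled), new_value, old_value);
}
```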

* shader_jit_a64: Use `MOVI` for DestEnable mask

Also accelerates certain masking cases with `MOVI`.

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
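
The commit only says that "certain cases" use `MOVI`. One plausible
reading, offered as an assumption rather than a description of the
actual code: the `MOVI Vd.2D, #imm` form takes a 64-bit immediate whose
bytes are each 0x00 or 0xFF and writes it to both 64-bit halves, so a
lane mask fits in one instruction when the (x, y) lanes repeat in (z, w).
A hypothetical check, assuming x is lane 0:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical: for a 4 x 32-bit mask (each lane all-ones or all-zeros),
// produce the 64-bit immediate for `MOVI Vd.2D, #imm` if one exists. That form
// writes the same value to both halves, so lanes (x, y) must equal (z, w).
bool MoviMaskImmediate(const std::array<bool, 4>& enabled, std::uint64_t& imm) {
    if (enabled[0] != enabled[2] || enabled[1] != enabled[3]) {
        return false;
    }
    const std::uint64_t low = enabled[0] ? 0x00000000FFFFFFFFull : 0;   // lane 0 / lane 2
    const std::uint64_t high = enabled[1] ? 0xFFFFFFFF00000000ull : 0;  // lane 1 / lane 3
    imm = low | high;
    return true;
}
```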

* shader_jit/tests: Add source-swizzle unit test

This is not exhaustive: generating all `4^4` cases appears to crash
Catch2. Instead, I've added some component-masking (non-reordering) tests
based on the Dest-Mask unit test, plus additional ones covering
broadcasts/splats and component reordering.

* shader_jit_a64: Fix swizzle index generation

This was still generating `SHUFPS` indices rather than the byte indices needed by the `TBL` instruction. Passes all unit tests now.
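
The two instructions want different index formats. `SHUFPS` takes an
8-bit immediate holding four 2-bit lane selectors, while `TBL` takes a
vector of byte indices, one per destination byte, into the table
register. For 32-bit lanes, destination lane i must therefore fetch
bytes 4*s to 4*s + 3, where s is the source component chosen for lane i.
A sketch of both encodings (names hypothetical; selectors 0-3 stand for
x-w and lane 0 is assumed to hold x):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// For each destination lane, the source component it reads (0 = x ... 3 = w).
using Swizzle = std::array<unsigned, 4>;

// x86 SHUFPS immediate: four 2-bit lane selectors packed into one byte.
std::uint8_t ShufpsImmediate(const Swizzle& sel) {
    return static_cast<std::uint8_t>(sel[0] | (sel[1] << 2) | (sel[2] << 4) | (sel[3] << 6));
}

// A64 TBL index vector: destination lane i copies the four bytes of source
// lane sel[i], i.e. byte indices 4*sel[i] .. 4*sel[i]+3.
std::array<std::uint8_t, 16> TblIndices(const Swizzle& sel) {
    std::array<std::uint8_t, 16> indices{};
    for (std::size_t lane = 0; lane < 4; ++lane) {
        for (std::size_t byte = 0; byte < 4; ++byte) {
            indices[lane * 4 + byte] = static_cast<std::uint8_t>(sel[lane] * 4 + byte);
        }
    }
    return indices;
}
```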

* shader_jit/tests: Add `ShaderSetup` constructor to `ShaderTest`

Rather than only using the direct output of `CompileShaderSetup`, allow
a `ShaderSetup` object to be passed in directly. This makes it possible
to emit assembly that nihstro does not directly support.

* shader_jit/tests: Add `CALL` unit-test

Tests nested `CALL` instructions to eventually reach an `EX2`
instruction.

`EX2` was picked specifically because it is implemented as an even
deeper dispatch, which checks that subroutines work correctly across
both `CALL` instructions and calls into implementation routines.

* shader_jit_a64: Fix nested `BL` subroutines

`lr` was being overwritten by nested calls to `BL`, causing undefined
behavior with mixtures of `CALL`, `EX2`, and `LG2` instructions.

Each use of `BL` is now wrapped in a stack push/pop that preserves and
restores the `lr` register, so nested subroutines work properly.
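
A toy model of the bug, assuming nothing about the JIT beyond what is
stated above: `BL` writes its return address into `lr` and `RET` jumps
to whatever `lr` holds, so a nested `BL` replaces the outer return
address unless it is saved first. `ToyCpu`, `OuterReturnAddress` and the
addresses are made up for illustration; on real A64 the save would be a
16-byte-aligned store to the stack.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy link register: BL overwrites LR, RET reads it. There is no hidden
// return stack, so nothing restores an overwritten LR automatically.
struct ToyCpu {
    std::uint64_t lr = 0;
    std::vector<std::uint64_t> stack;  // stands in for memory below SP

    void Bl(std::uint64_t return_address) { lr = return_address; }
    std::uint64_t Ret() const { return lr; }
    void PushLr() { stack.push_back(lr); }
    void PopLr() {
        lr = stack.back();
        stack.pop_back();
    }
};

// An outer subroutine (called with return address 0x104) makes one nested
// call (return address 0x204). Returns where the outer RET would jump.
std::uint64_t OuterReturnAddress(bool preserve_lr) {
    ToyCpu cpu;
    cpu.Bl(0x104);                  // caller: BL outer
    if (preserve_lr) cpu.PushLr();  // fix: save LR around the nested call
    cpu.Bl(0x204);                  // outer: BL inner, overwriting LR
    if (preserve_lr) cpu.PopLr();   // fix: restore the outer return address
    return cpu.Ret();               // outer: RET
}
```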

* shader_jit/tests: Allocate generated tests on heap

These generated shader-test objects were causing the stack to overflow.
Allocate each generated test on the heap and use `unique_ptr` so that it
only exists for the lifetime of the `REQUIRE` statement.

* shader_jit_a64: Preserve `lr` register from external function calls

`EMIT` makes an external function call and must therefore preserve `lr`.

* shader_jit/tests: Add `MAD` unit-test

The inline-assembly version requires an upstream fix:
https://github.com/neobrain/nihstro/issues/68

Instead, the program code is manually configured and added.

* shader_jit/tests: Fix uninitialized instructions

These `union`-based instruction types were left uninitialized, causing
tests to fail intermittently.

* shader_jit_a64: Remove unneeded `MOV`

Leftover from the direct port of the x64 code.

* shader_jit_a64: Use `std::array` for `instr_table`

Add some type-safety and const-correctness around this type as well.

* shader_jit_a64: Avoid c-style offset casting

Add some more const-correctness to this function as well.

* video_core: Add arch preprocessor comments

* common/aarch64: Use X16 as the veneer register

https://developer.arm.com/documentation/102374/0101/Procedure-Call-Standard

* shader_jit/tests: Add uniform reading unit-test

Particularly to ensure that addresses are being properly truncated

* common/aarch64: Use `X0` as `ABI_RETURN`

`X8` holds the indirect result location (the address the result is
written to) when the result is larger than 128 bits. `X0` is the
general-case return register.

* common/aarch64: Add veneer register note

`LR` is overwritten by `BLR` anyway, so it would also be a safe veneer
register for far calls.

* shader_jit_a64: Remove unneeded scratch register from `SanitizedMul`

* shader_jit_a64: Fix CALLU condition

Should be `EQ`, not `NE`. Fixes the regression in Kid Icarus.
No known regressions remain!

---------

Co-authored-by: merryhime <merryhime@users.noreply.github.com>
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
2023-11-05 21:40:31 +01:00
aarch64 video_core: Implement an arm64 shader-jit backend (#7002) 2023-11-05 21:40:31 +01:00
dynamic_library audio_core: Replace AAC decoders with single FAAD2-based decoder. (#7098) 2023-11-04 14:56:13 -07:00
logging common: Only use libbacktrace if present. (#6827) 2023-07-31 14:24:27 -07:00
serialization Fix memory region serialization (OSK crash) 2020-04-10 16:51:01 +01:00
x64 code: Use std::span where appropriate (#6658) 2023-07-07 01:52:40 +03:00
alignment.h common: Resolve C4267 warning on MSVC 2022-05-18 00:05:40 -04:00
android_storage.cpp Various miscelaneous changes (#6496) 2023-05-03 17:24:10 +02:00
android_storage.h citra_android: Storage Access Framework implementation (#6313) 2023-03-23 14:30:52 +01:00
announce_multiplayer_room.h core, web_service: Check for error when registering rooms 2019-04-20 12:50:14 +08:00
apple_authorization.cpp common: Add C++ version of Apple authorization logic. (#6616) 2023-06-19 15:50:26 -07:00
apple_authorization.h common: Add C++ version of Apple authorization logic. (#6616) 2023-06-19 15:50:26 -07:00
arch.h build: Update to support multi-arch builds. 2023-01-07 01:09:32 -08:00
archives.h Code review actions (plus hopefully fix the linux CI) 2020-03-31 17:54:28 +01:00
assert.h Chore: Enable warnings as errors on MSVC (#6456) 2023-05-01 22:38:58 +03:00
atomic_ops.h Core: Port Exclusive memory impl from yuzu 2022-10-23 13:19:33 +05:30
bit_field.h Various miscelaneous changes (#6496) 2023-05-03 17:24:10 +02:00
bit_set.h Prefix all size_t with std:: 2018-09-06 16:03:28 -04:00
bounded_threadsafe_queue.h logging: Address some issues 2023-07-03 02:18:35 +03:00
cityhash.cpp Prefix all size_t with std:: 2018-09-06 16:03:28 -04:00
cityhash.h Port yuzu-emu/yuzu#4528: "common: Make use of [[nodiscard]] where applicable" (#5535) 2020-08-31 21:06:16 +02:00
CMakeLists.txt video_core: Implement an arm64 shader-jit backend (#7002) 2023-11-05 21:40:31 +01:00
color.h Rasterizer cache refactor (#6375) 2023-04-21 10:14:55 +03:00
common_funcs.h Chore: Enable warnings as errors on MSVC (#6456) 2023-05-01 22:38:58 +03:00
common_paths.h qt: Add support for building for iOS. (#6594) 2023-06-07 20:40:53 -07:00
common_precompiled_headers.h Port yuzu-emu/yuzu#9300: "CMake: Use precompiled headers to improve compile times" (#6213) 2022-12-17 16:06:38 +01:00
common_types.h Core: Port Exclusive memory impl from yuzu 2022-10-23 13:19:33 +05:30
construct.h Code review - general gardening 2020-03-29 16:14:36 +01:00
detached_tasks.cpp Port yuzu-emu/yuzu#4528: "common: Make use of [[nodiscard]] where applicable" (#5535) 2020-08-31 21:06:16 +02:00
detached_tasks.h Review comments - part 5 2018-10-20 10:35:55 -04:00
error.cpp Custom textures rewrite (#6452) 2023-04-27 07:38:28 +03:00
error.h Custom textures rewrite (#6452) 2023-04-27 07:38:28 +03:00
expected.h clang format (#7017) 2023-09-27 13:42:19 -03:00
file_util.cpp Implement RomFS cache and async reads. (#7089) 2023-11-02 17:19:00 -07:00
file_util.h Implement RomFS cache and async reads. (#7089) 2023-11-02 17:19:00 -07:00
hash.h Add vulkan backend (#6512) 2023-09-13 01:28:50 +03:00
linear_disk_cache.h code: Use std::span where appropriate (#6658) 2023-07-07 01:52:40 +03:00
literals.h Address review comments 2022-11-15 11:20:35 +01:00
math_util.h Rasterizer cache refactor (#6375) 2023-04-21 10:14:55 +03:00
memory_detect.cpp Address review comments 2022-11-15 11:20:35 +01:00
memory_detect.h Address review comments 2022-11-15 11:20:35 +01:00
memory_ref.cpp Added copyright notices on new files 2020-03-28 15:21:10 +00:00
memory_ref.h Chore: Enable warnings as errors on MSVC (#6456) 2023-05-01 22:38:58 +03:00
microprofile.cpp Integrate the MicroProfile profiling library 2015-08-24 22:16:28 -03:00
microprofile.h code: Cleanup and warning fixes from the Vulkan PR (#6163) 2022-11-04 23:32:57 +01:00
microprofileui.h Common: Remove section measurement from profiler (#1731) 2016-04-29 00:07:10 -07:00
misc.cpp android + common: fix warnings 2023-06-17 21:24:20 +05:30
param_package.cpp common/logging: Reduce scope of fmt include 2023-06-30 12:15:52 +03:00
param_package.h Port yuzu-emu/yuzu#4528: "common: Make use of [[nodiscard]] where applicable" (#5535) 2020-08-31 21:06:16 +02:00
polyfill_thread.h clang format (#7017) 2023-09-27 13:42:19 -03:00
precompiled_headers.h Port yuzu-emu/yuzu#9300: "CMake: Use precompiled headers to improve compile times" (#6213) 2022-12-17 16:06:38 +01:00
quaternion.h Port yuzu-emu/yuzu#4528: "common: Make use of [[nodiscard]] where applicable" (#5535) 2020-08-31 21:06:16 +02:00
ring_buffer.h code: Use std::span where appropriate (#6658) 2023-07-07 01:52:40 +03:00
scm_rev.cpp.in Add shader cache version generation 2020-01-15 19:58:33 -07:00
scm_rev.h Add shader cache version generation 2020-01-15 19:58:33 -07:00
scope_exit.h common/scope_exit: Replace std::move with std::forward in ScopeExit() 2019-04-15 17:55:44 +02:00
settings.cpp Add vulkan backend (#6512) 2023-09-13 01:28:50 +03:00
settings.h clang format (#7017) 2023-09-27 13:42:19 -03:00
slot_vector.h rasterizer_cache: Remove runtime allocation caching (#6705) 2023-08-01 03:35:41 +03:00
static_lru_cache.h Implement RomFS cache and async reads. (#7089) 2023-11-02 17:19:00 -07:00
string_literal.h common: Add StringLiteral 2022-11-22 22:52:37 +00:00
string_util.cpp Merge pull request #6602 from SachinVin/wall 2023-06-18 22:37:39 +05:30
string_util.h Implement more HTTP:C functionality (#7035) 2023-10-11 10:09:16 -07:00
swap.h general: Fix various spelling errors 2021-01-03 02:39:41 +01:00
telemetry.cpp build: Update to support multi-arch builds. 2023-01-07 01:09:32 -08:00
telemetry.h common/telemetry: Migrate namespace into the Common namespace 2021-01-04 05:17:13 +01:00
texture.cpp Custom textures rewrite (#6452) 2023-04-27 07:38:28 +03:00
texture.h Custom textures rewrite (#6452) 2023-04-27 07:38:28 +03:00
thread.cpp Custom textures rewrite (#6452) 2023-04-27 07:38:28 +03:00
thread.h Custom textures rewrite (#6452) 2023-04-27 07:38:28 +03:00
thread_queue_list.h Port yuzu-emu/yuzu#4528: "common: Make use of [[nodiscard]] where applicable" (#5535) 2020-08-31 21:06:16 +02:00
thread_worker.h Custom textures rewrite (#6452) 2023-04-27 07:38:28 +03:00
threadsafe_queue.h common: Replace lock_guard with scoped_lock 2023-06-30 12:15:52 +03:00
timer.cpp common: Resolve C4267 warning on MSVC 2022-05-18 00:05:40 -04:00
timer.h common: Resolve C4267 warning on MSVC 2022-05-18 00:05:40 -04:00
unique_function.h Custom textures rewrite (#6452) 2023-04-27 07:38:28 +03:00
vector_math.h clang format (#7017) 2023-09-27 13:42:19 -03:00
web_result.h Put WebResult into a seperate file 2018-10-27 00:39:02 +02:00
zstd_compression.cpp build: fix build failure when not using precompiled headers (#7087) 2023-10-23 17:21:35 -03:00
zstd_compression.h code: Use std::span where appropriate (#6658) 2023-07-07 01:52:40 +03:00