// Ryujinx/ARMeilleure/Instructions/InstEmitSimdCvt32.cs

using ARMeilleure.Decoders;
using ARMeilleure.IntermediateRepresentation;
using ARMeilleure.State;
using ARMeilleure.Translation;
using System;
using System.Diagnostics;
using System.Reflection;
using static ARMeilleure.Instructions.InstEmitHelper;
using static ARMeilleure.Instructions.InstEmitSimdHelper;
using static ARMeilleure.Instructions.InstEmitSimdHelper32;
using static ARMeilleure.IntermediateRepresentation.Operand.Factory;
namespace ARMeilleure.Instructions
{
    static partial class InstEmit32
    {
        private static int FlipVdBits(int vd, bool lowBit)
        {
            if (lowBit)
            {
                // Move the low bit to the top.
                return ((vd & 0x1) << 4) | (vd >> 1);
            }
            else
            {
                // Move the high bit to the bottom.
                return ((vd & 0xf) << 1) | (vd >> 4);
            }
        }
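        // Worked example for FlipVdBits (illustrative; not part of the original source).
        // In the A32 VFP encodings a single-precision register number is formed as Vd:D
        // (D in bit 0) and a double-precision one as D:Vd (D in bit 4). FlipVdBits appears
        // to convert a 5-bit register number between those two layouts; that reading of
        // its purpose is an assumption.
        //
        //   FlipVdBits(5 /* 0b00101 */, lowBit: true)   == 18 /* 0b10010 */
        //     ((5 & 0x1) << 4) = 16,  (5 >> 1) = 2    ->  16 | 2 = 18
        //   FlipVdBits(18 /* 0b10010 */, lowBit: false) == 5  /* 0b00101 */
        //     ((18 & 0xf) << 1) = 4,  (18 >> 4) = 1   ->  4 | 1 = 5
        //
        // For any input in 0..31 the two directions are exact inverses of each other.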
        private static Operand EmitSaturateFloatToInt(ArmEmitterContext context, Operand op1, bool unsigned)
        {
            MethodInfo info;
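            // Illustrative note (not part of the original source): the branch below selects
            // one of four SoftFallback helpers by source width (FP64 vs FP32) and signedness.
            // Each helper is assumed to follow the Arm saturating FP->integer conversion:
            // truncate toward zero, clamp out-of-range values to the destination range, and
            // map NaN to 0. Under that assumption:
            //   3.0e9f -> SatF32ToU32 = 3000000000,  SatF32ToS32 = 2147483647 (clamped)
            //   -1.5f  -> SatF32ToU32 = 0 (clamped),  SatF32ToS32 = -1 (truncated)
            //   NaN    -> 0 for all four helpers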
            if (op1.Type == OperandType.FP64)
            {
                info = unsigned
                    ? typeof(SoftFallback).GetMethod(nameof(SoftFallback.SatF64ToU32))
                    : typeof(SoftFallback).GetMethod(nameof(SoftFallback.SatF64ToS32));
            }
            else
            {
                info = unsigned
                    ? typeof(SoftFallback).GetMethod(nameof(SoftFallback.SatF32ToU32))
                    : typeof(SoftFallback).GetMethod(nameof(SoftFallback.SatF32ToS32));
}
return context.Call(info, op1);
}
public static void Vcvt_V(ArmEmitterContext context)
{
OpCode32Simd op = (OpCode32Simd)context.CurrOp;
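// Opc bit 0 selects an unsigned integer operand; bit 1 selects float-to-integer (otherwise integer-to-float).
bool unsigned = (op.Opc & 1) != 0;
bool toInteger = (op.Opc & 2) != 0;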
OperandType floatSize = (op.Size == 2) ? OperandType.FP32 : OperandType.FP64;
if (toInteger)
{
if (Optimizations.UseAdvSimd)
{
InstEmitSimdHelper32Arm64.EmitVectorUnaryOpF32(context, unsigned ? Intrinsic.Arm64FcvtzuV : Intrinsic.Arm64FcvtzsV);
}
else if (Optimizations.UseSse41)
{
EmitSse41ConvertVector32(context, FPRoundingMode.TowardsZero, !unsigned);
}
else
{
EmitVectorUnaryOpF32(context, (op1) =>
{
return EmitSaturateFloatToInt(context, op1, unsigned);
});
}
}
else
{
if (Optimizations.UseSse2)
{
EmitVectorUnaryOpSimd32(context, (n) =>
{
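if (unsigned)
{
// CVTDQ2PS only converts signed int32, so split each lane into 16-bit halves that
// convert exactly as signed values, then recombine:
// result = float(n >> 16) * 65536.0f + float(n & 0xFFFF), where 0x47800000 == 65536.0f.
Operand mask = X86GetAllElements(context, 0x47800000);
// High half: shift down, convert, scale back up by 2^16 (every step is exact).
Operand res = context.AddIntrinsic(Intrinsic.X86Psrld, n, Const(16));
res = context.AddIntrinsic(Intrinsic.X86Cvtdq2ps, res);
res = context.AddIntrinsic(Intrinsic.X86Mulps, res, mask);
// Low half: clear the upper 16 bits with a shift pair, then convert (exact).
Operand res2 = context.AddIntrinsic(Intrinsic.X86Pslld, n, Const(16));
res2 = context.AddIntrinsic(Intrinsic.X86Psrld, res2, Const(16));
res2 = context.AddIntrinsic(Intrinsic.X86Cvtdq2ps, res2);
// Only this final add rounds, so each lane is rounded exactly once.
return context.AddIntrinsic(Intrinsic.X86Addps, res, res2);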
}
else
{
return context.AddIntrinsic(Intrinsic.X86Cvtdq2ps, n);
}
});
}
else
{
if (unsigned)
{
EmitVectorUnaryOpZx32(context, (op1) => EmitFPConvert(context, op1, floatSize, false));
}
else
{
EmitVectorUnaryOpSx32(context, (op1) => EmitFPConvert(context, op1, floatSize, true));
}
}
}
}
public static void Vcvt_FD(ArmEmitterContext context)
{
OpCode32SimdS op = (OpCode32SimdS)context.CurrOp;
int vm = op.Vm;
int vd;
if (op.Size == 3)
{
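// Vd was decoded for the double-precision source (D:Vd), but the destination
// is a single-precision S register (Vd:D), so the index bits are re-ordered.
vd = FlipVdBits(op.Vd, false);
// Double to single.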
Operand fp = ExtractScalar(context, OperandType.FP64, vm);
Operand res = context.ConvertToFP(OperandType.FP32, fp);
InsertScalar(context, vd, res);
}
else
{
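// Vd was decoded for the single-precision source (Vd:D), but the destination
// is a double-precision D register (D:Vd), so the index bits are re-ordered.
vd = FlipVdBits(op.Vd, true);
// Single to double.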
Operand fp = ExtractScalar(context, OperandType.FP32, vm);
Operand res = context.ConvertToFP(OperandType.FP64, fp);
InsertScalar(context, vd, res);
}
}
// VCVT (floating-point to integer, floating-point) | VCVT (integer to floating-point, floating-point).
public static void Vcvt_FI(ArmEmitterContext context)
{
OpCode32SimdCvtFI op = (OpCode32SimdCvtFI)context.CurrOp;
bool toInteger = (op.Opc2 & 0b100) != 0;
OperandType floatSize = op.RegisterSize == RegisterSize.Int64 ? OperandType.FP64 : OperandType.FP32;
if (toInteger)
{
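// opc2<0> == 0 selects an unsigned integer result; 1 selects a signed one.
bool unsigned = (op.Opc2 & 1) == 0;
// op == 1 (VCVT) rounds towards zero; op == 0 (VCVTR) uses the FPSCR rounding mode.
bool roundWithFpscr = op.Opc != 1;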
if (!roundWithFpscr && Optimizations.UseAdvSimd)
{
bool doubleSize = floatSize == OperandType.FP64;
if (doubleSize)
{
Operand m = GetVecA32(op.Vm >> 1);
Operand toConvert = InstEmitSimdHelper32Arm64.EmitExtractScalar(context, m, op.Vm, doubleSize);
Intrinsic inst = (unsigned ? Intrinsic.Arm64FcvtzuGp : Intrinsic.Arm64FcvtzsGp) | Intrinsic.Arm64VDouble;
Operand asInteger = context.AddIntrinsicInt(inst, toConvert);
InsertScalar(context, op.Vd, asInteger);
}
else
{
InstEmitSimdHelper32Arm64.EmitScalarUnaryOpF32(context, unsigned ? Intrinsic.Arm64FcvtzuS : Intrinsic.Arm64FcvtzsS);
}
}
else if (!roundWithFpscr && Optimizations.UseSse41)
{
EmitSse41ConvertInt32(context, FPRoundingMode.TowardsZero, !unsigned);
}
else
{
Operand toConvert = ExtractScalar(context, floatSize, op.Vm);
// TODO: Fast Path.
if (roundWithFpscr)
{
toConvert = EmitRoundByRMode(context, toConvert);
}
// Convert to integer, truncating (rounding towards zero) and saturating to the destination range.
Operand asInteger = EmitSaturateFloatToInt(context, toConvert, unsigned);
InsertScalar(context, op.Vd, asInteger);
}
}
else
{
bool unsigned = op.Opc == 0;
Operand toConvert = ExtractScalar(context, OperandType.I32, op.Vm);
Operand asFloat = EmitFPConvert(context, toConvert, floatSize, !unsigned);
InsertScalar(context, op.Vd, asFloat);
}
}
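
// Hypothetical reference model, for illustration only: it is not called by the emitter and
// is not necessarily how EmitSaturateFloatToInt is implemented. It sketches, in managed
// code, the result the float-to-integer path above is expected to produce, assuming Arm's
// documented VCVT semantics: NaN converts to 0, the value is truncated (rounded towards
// zero), and out-of-range values clamp to the destination type's minimum or maximum.
// The FPSCR-rounded variant (which rounds before this step) and the cumulative exception
// flags are deliberately ignored. The return value is the raw 32-bit pattern written to Sd.
private static uint SaturateFloatToInt32Reference(double value, bool unsigned)
{
    // Arm converts NaN inputs to zero (and would raise Invalid Operation).
    if (double.IsNaN(value))
    {
        return 0u;
    }

    // Round towards zero before range-checking.
    double truncated = Math.Truncate(value);

    double min = unsigned ? 0.0 : int.MinValue;
    double max = unsigned ? uint.MaxValue : int.MaxValue;

    // Saturate instead of wrapping; infinities land in these branches too.
    if (truncated <= min)
    {
        return unsigned ? 0u : unchecked((uint)int.MinValue);
    }

    if (truncated >= max)
    {
        return unsigned ? uint.MaxValue : (uint)int.MaxValue;
    }

    // In range: reinterpret a signed result as its two's-complement bit pattern.
    return unsigned ? (uint)truncated : unchecked((uint)(int)truncated);
}

// Example: SaturateFloatToInt32Reference(-3.9, unsigned: false) yields the bit pattern of -3
// (0xFFFFFFFD), while the same input with unsigned: true saturates to 0.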
private static Operand EmitRoundMathCall(ArmEmitterContext context, MidpointRounding roundMode, Operand n)
{
IOpCode32Simd op = (IOpCode32Simd)context.CurrOp;
string name = nameof(Math.Round);
MethodInfo info = (op.Size & 1) == 0
? typeof(MathF).GetMethod(name, new Type[] { typeof(float), typeof(MidpointRounding) })
: typeof(Math). GetMethod(name, new Type[] { typeof(double), typeof(MidpointRounding) });
return context.Call(info, n, Const((int)roundMode));
}
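
// Maps the 2-bit RM field of VCVTA/N/P/M (opCode bits 17:16) to a rounding mode:
// 0b00 = A (to nearest, ties away from zero), 0b01 = N (to nearest, ties to even),
// 0b10 = P (towards +Inf), 0b11 = M (towards -Inf).
private static FPRoundingMode RMToRoundMode(int rm)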
{
FPRoundingMode roundMode;
switch (rm)
{
case 0b00:
roundMode = FPRoundingMode.ToNearestAway;
break;
case 0b01:
roundMode = FPRoundingMode.ToNearest;
break;
case 0b10:
roundMode = FPRoundingMode.TowardsPlusInfinity;
break;
case 0b11:
roundMode = FPRoundingMode.TowardsMinusInfinity;
break;
default:
throw new ArgumentOutOfRangeException(nameof(rm));
}
return roundMode;
}
// VCVTA/M/N/P (floating-point).
public static void Vcvt_RM(ArmEmitterContext context)
{
OpCode32SimdCvtFI op = (OpCode32SimdCvtFI)context.CurrOp; // toInteger == true (opCode<18> == 1 => Opc2<2> == 1).
OperandType floatSize = op.RegisterSize == RegisterSize.Int64 ? OperandType.FP64 : OperandType.FP32;
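bool unsigned = op.Opc == 0; // op bit: 0 = unsigned, 1 = signed integer result.
int rm = op.Opc2 & 3;        // RM (opCode bits 17:16): 00 = A, 01 = N, 10 = P, 11 = M.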
Intrinsic inst;
if (Optimizations.UseAdvSimd)
{
if (unsigned)
{
inst = rm switch {
0b00 => Intrinsic.Arm64FcvtauS,
0b01 => Intrinsic.Arm64FcvtnuS,
0b10 => Intrinsic.Arm64FcvtpuS,
0b11 => Intrinsic.Arm64FcvtmuS,
_ => throw new ArgumentOutOfRangeException(nameof(rm))
};
}
else
{
inst = rm switch {
0b00 => Intrinsic.Arm64FcvtasS,
0b01 => Intrinsic.Arm64FcvtnsS,
0b10 => Intrinsic.Arm64FcvtpsS,
0b11 => Intrinsic.Arm64FcvtmsS,
_ => throw new ArgumentOutOfRangeException(nameof(rm))
};
}
InstEmitSimdHelper32Arm64.EmitScalarUnaryOpF32(context, inst);
}
else if (Optimizations.UseSse41)
{
EmitSse41ConvertInt32(context, RMToRoundMode(rm), !unsigned);
}
else
{
Operand toConvert = ExtractScalar(context, floatSize, op.Vm);
switch (rm)
{
case 0b00: // Away
toConvert = EmitRoundMathCall(context, MidpointRounding.AwayFromZero, toConvert);
break;
case 0b01: // Nearest
toConvert = EmitRoundMathCall(context, MidpointRounding.ToEven, toConvert);
break;
case 0b10: // Towards positive infinity
toConvert = EmitUnaryMathCall(context, nameof(Math.Ceiling), toConvert);
break;
case 0b11: // Towards negative infinity
toConvert = EmitUnaryMathCall(context, nameof(Math.Floor), toConvert);
break;
}
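// Worked example (illustrative) of how the four rm encodings above round a few
// sample inputs before EmitSaturateFloatToInt narrows the result:
//   rm     mode                        2.5    -2.5    3.5
//   0b00   Away (AwayFromZero)          3      -3      4
//   0b01   Nearest (ToEven)             2      -2      4
//   0b10   +Infinity (Math.Ceiling)     3      -2      4
//   0b11   -Infinity (Math.Floor)       2      -3      3
// Assumption: EmitSaturateFloatToInt clamps out-of-range values to the target
// integer range, matching the saturating VCVT behaviour of the Arm architecture
// (e.g. 3e9f -> int.MaxValue when signed, -2.0f -> 0 when unsigned).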
Operand asInteger = EmitSaturateFloatToInt(context, toConvert, unsigned);
InsertScalar(context, op.Vd, asInteger);
}
}
// VCVTB, VCVTT (between half-precision and single/double-precision).
public static void Vcvt_TB(ArmEmitterContext context)
{
OpCode32SimdCvtTB op = (OpCode32SimdCvtTB)context.CurrOp;
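// Illustrative sketch (comments only, not emitted code). As used below, op.Op selects the
// direction (true: single/double -> half, false: half -> single/double), op.Size == 1 selects
// double rather than single precision, and op.T picks which 16-bit half of the 32-bit S register
// holds the half-precision value (VCVTT: top, VCVTB: bottom). Assuming `s` is the raw register
// value and `h` the 16-bit half (both hypothetical names), the lane selection that
// ExtractScalar16/InsertScalar16 are assumed to perform amounts to:
//   ushort h = op.T ? (ushort)(s >> 16) : (ushort)s;                         // read
//   s = op.T ? (s & 0x0000FFFFu) | ((uint)h << 16) : (s & 0xFFFF0000u) | h;  // write, other half kept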
if (Optimizations.UseF16c)
{
Debug.Assert(!Optimizations.ForceLegacySse);
if (op.Op)
{
Operand res = ExtractScalar(context, op.Size == 1 ? OperandType.FP64 : OperandType.FP32, op.Vm);
if (op.Size == 1)
{
res = context.AddIntrinsic(Intrinsic.X86Cvtsd2ss, context.VectorZero(), res);
}
res = context.AddIntrinsic(Intrinsic.X86Vcvtps2ph, res, Const(X86GetRoundControl(FPRoundingMode.ToNearest)));
res = context.VectorExtract16(res, 0);
InsertScalar16(context, op.Vd, op.T, res);
}
else
{
Operand res = context.VectorCreateScalar(ExtractScalar16(context, op.Vm, op.T));
res = context.AddIntrinsic(Intrinsic.X86Vcvtph2ps, res);
if (op.Size == 1)
{
res = context.AddIntrinsic(Intrinsic.X86Cvtss2sd, context.VectorZero(), res);
}
res = context.VectorExtract(op.Size == 1 ? OperandType.I64 : OperandType.I32, res, 0);
InsertScalar(context, op.Vd, res);
}
}
else
{
if (op.Op)
{
// Convert to half.
Operand src = ExtractScalar(context, op.Size == 1 ? OperandType.FP64 : OperandType.FP32, op.Vm);
MethodInfo method = op.Size == 1
? typeof(SoftFloat64_16).GetMethod(nameof(SoftFloat64_16.FPConvert))
: typeof(SoftFloat32_16).GetMethod(nameof(SoftFloat32_16.FPConvert));
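// Leave the host FP mode that mirrors the guest FPCR (e.g. FZ) while managed SoftFloat code runs,
// and re-enter it once the call returns.
context.ExitArmFpMode();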
context.StoreToContext();
Operand res = context.Call(method, src);
context.LoadFromContext();
context.EnterArmFpMode();
InsertScalar16(context, op.Vd, op.T, res);
}
else
{
// Convert from half.
Operand src = ExtractScalar16(context, op.Vm, op.T);
MethodInfo method = op.Size == 1
? typeof(SoftFloat16_64).GetMethod(nameof(SoftFloat16_64.FPConvert))
: typeof(SoftFloat16_32).GetMethod(nameof(SoftFloat16_32.FPConvert));
context.ExitArmFpMode();
context.StoreToContext();
Operand res = context.Call(method, src);
context.LoadFromContext();
context.EnterArmFpMode();
InsertScalar(context, op.Vd, res);
}
}
}
// VRINTA/M/N/P (floating-point).
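// Reference semantics, as an illustrative sketch using System.Math (not emitted code; `x` is a
// hypothetical double operand). The RM field (op.Opc2 & 3) selects:
//   0b00 VRINTA: Math.Round(x, MidpointRounding.AwayFromZero)
//   0b01 VRINTN: Math.Round(x, MidpointRounding.ToEven)
//   0b10 VRINTP: Math.Ceiling(x)
//   0b11 VRINTM: Math.Floor(x)
// This sketch ignores FPSCR effects such as flush-to-zero, default-NaN handling and exception flags.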
public static void Vrint_RM(ArmEmitterContext context)
{
OpCode32SimdS op = (OpCode32SimdS)context.CurrOp;
OperandType floatSize = op.RegisterSize == RegisterSize.Int64 ? OperandType.FP64 : OperandType.FP32;
int rm = op.Opc2 & 3;
if (Optimizations.UseAdvSimd)
{
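Intrinsic inst = rm switch
{
0b00 => Intrinsic.Arm64FrintaS, // VRINTA: to nearest, ties away from zero.
0b01 => Intrinsic.Arm64FrintnS, // VRINTN: to nearest, ties to even.
0b10 => Intrinsic.Arm64FrintpS, // VRINTP: towards +infinity.
0b11 => Intrinsic.Arm64FrintmS, // VRINTM: towards -infinity.
_ => throw new ArgumentOutOfRangeException(nameof(rm))
};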
InstEmitSimdHelper32Arm64.EmitScalarUnaryOpF32(context, inst);
}
else if (Optimizations.UseSse41)
{
EmitScalarUnaryOpSimd32(context, (m) =>
{
FPRoundingMode roundMode = RMToRoundMode(rm);
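// ROUNDSS/ROUNDSD only support ties-to-even, floor, ceil and truncate,
// so ties-away (VRINTA) needs a separate emulation.
if (roundMode != FPRoundingMode.ToNearestAway)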
{
Intrinsic inst = (op.Size & 1) == 0 ? Intrinsic.X86Roundss : Intrinsic.X86Roundsd;
return context.AddIntrinsic(inst, m, Const(X86GetRoundControl(roundMode)));
}
else
{
return EmitSse41RoundToNearestWithTiesToAwayOpF(context, m, scalar: true);
}
});
}
else
{
Operand toConvert = ExtractScalar(context, floatSize, op.Vm);
switch (rm)
{
case 0b00: // Away
toConvert = EmitRoundMathCall(context, MidpointRounding.AwayFromZero, toConvert);
break;
case 0b01: // Nearest
toConvert = EmitRoundMathCall(context, MidpointRounding.ToEven, toConvert);
break;
case 0b10: // Towards positive infinity
toConvert = EmitUnaryMathCall(context, nameof(Math.Ceiling), toConvert);
break;
case 0b11: // Towards negative infinity
toConvert = EmitUnaryMathCall(context, nameof(Math.Floor), toConvert);
break;
}
InsertScalar(context, op.Vd, toConvert);
}
}
// VRINTA (vector).
public static void Vrinta_V(ArmEmitterContext context)
{
if (Optimizations.UseAdvSimd)
{
InstEmitSimdHelper32Arm64.EmitVectorUnaryOpF32(context, Intrinsic.Arm64FrintaS);
}
else
{
EmitVectorUnaryOpF32(context, (m) => EmitRoundMathCall(context, MidpointRounding.AwayFromZero, m));
}
}
// VRINTM (vector).
public static void Vrintm_V(ArmEmitterContext context)
{
if (Optimizations.UseAdvSimd)
{
InstEmitSimdHelper32Arm64.EmitVectorUnaryOpF32(context, Intrinsic.Arm64FrintmS);
}
else if (Optimizations.UseSse2)
{
EmitVectorUnaryOpSimd32(context, (m) =>
{
return context.AddIntrinsic(Intrinsic.X86Roundps, m, Const(X86GetRoundControl(FPRoundingMode.TowardsMinusInfinity)));
});
}
else
{
EmitVectorUnaryOpF32(context, (m) => EmitUnaryMathCall(context, nameof(Math.Floor), m));
}
}
// VRINTN (vector).
public static void Vrintn_V(ArmEmitterContext context)
{
if (Optimizations.UseAdvSimd)
{
InstEmitSimdHelper32Arm64.EmitVectorUnaryOpF32(context, Intrinsic.Arm64FrintnS);
}
else if (Optimizations.UseSse2)
{
EmitVectorUnaryOpSimd32(context, (m) =>
{
return context.AddIntrinsic(Intrinsic.X86Roundps, m, Const(X86GetRoundControl(FPRoundingMode.ToNearest)));
});
}
else
{
EmitVectorUnaryOpF32(context, (m) => EmitRoundMathCall(context, MidpointRounding.ToEven, m));
}
}
// VRINTP (vector).
public static void Vrintp_V(ArmEmitterContext context)
{
if (Optimizations.UseAdvSimd)
{
InstEmitSimdHelper32Arm64.EmitVectorUnaryOpF32(context, Intrinsic.Arm64FrintpS);
}
else if (Optimizations.UseSse2)
{
EmitVectorUnaryOpSimd32(context, (m) =>
{
return context.AddIntrinsic(Intrinsic.X86Roundps, m, Const(X86GetRoundControl(FPRoundingMode.TowardsPlusInfinity)));
});
}
else
{
EmitVectorUnaryOpF32(context, (m) => EmitUnaryMathCall(context, nameof(Math.Ceiling), m));
}
}
// VRINTZ (floating-point).
public static void Vrint_Z(ArmEmitterContext context)
{
OpCode32SimdS op = (OpCode32SimdS)context.CurrOp;
if (Optimizations.UseAdvSimd)
{
InstEmitSimdHelper32Arm64.EmitScalarUnaryOpF32(context, Intrinsic.Arm64FrintzS);
}
else if (Optimizations.UseSse2)
{
EmitScalarUnaryOpSimd32(context, (m) =>
{
// Bit 0 of Size selects the precision: 0 = single (ROUNDSS), 1 = double (ROUNDSD).
Intrinsic inst = (op.Size & 1) == 0 ? Intrinsic.X86Roundss : Intrinsic.X86Roundsd;
return context.AddIntrinsic(inst, m, Const(X86GetRoundControl(FPRoundingMode.TowardsZero)));
});
            }
            else
            {
                EmitScalarUnaryOpF32(context, (op1) => EmitUnaryMathCall(context, nameof(Math.Truncate), op1));
            }
        }

        // VRINTX (floating-point).
        public static void Vrintx_S(ArmEmitterContext context)
        {
            if (Optimizations.UseAdvSimd)
            {
                InstEmitSimdHelper32Arm64.EmitScalarUnaryOpF32(context, Intrinsic.Arm64FrintxS);
            }
            else
            {
                EmitScalarUnaryOpF32(context, (op1) =>
                {
                    return EmitRoundByRMode(context, op1);
                });
            }
        }

        private static Operand EmitFPConvert(ArmEmitterContext context, Operand value, OperandType type, bool signed)
        {
            Debug.Assert(value.Type == OperandType.I32 || value.Type == OperandType.I64);

            if (signed)
            {
                return context.ConvertToFP(type, value);
            }
            else
            {
                return context.ConvertToFPUI(type, value);
            }
        }

        private static void EmitSse41ConvertInt32(ArmEmitterContext context, FPRoundingMode roundMode, bool signed)
        {
            // A port of the similar round function in InstEmitSimdCvt.
            OpCode32SimdCvtFI op = (OpCode32SimdCvtFI)context.CurrOp;

            bool doubleSize = (op.Size & 1) != 0;
            int shift = doubleSize ? 1 : 2;
            Operand n = GetVecA32(op.Vm >> shift);
            n = EmitSwapScalar(context, n, op.Vm, doubleSize);

            if (!doubleSize)
            {
                // Zero out NaN inputs: an ordered compare of the value with itself yields
                // an all-ones mask only when the value is not NaN.
                Operand nRes = context.AddIntrinsic(Intrinsic.X86Cmpss, n, n, Const((int)CmpCondition.OrderedQ));
                nRes = context.AddIntrinsic(Intrinsic.X86Pand, nRes, n);

                if (roundMode != FPRoundingMode.ToNearestAway)
                {
                    nRes = context.AddIntrinsic(Intrinsic.X86Roundss, nRes, Const(X86GetRoundControl(roundMode)));
                }
                else
                {
                    // ROUNDSS has no ties-to-away mode, so this case uses a helper sequence.
                    nRes = EmitSse41RoundToNearestWithTiesToAwayOpF(context, nRes, scalar: true);
                }

                Operand zero = context.VectorZero();

                Operand nCmp;
                Operand nIntOrLong2 = default;

                if (!signed)
                {
                    // Unsigned conversion: clamp negative values to zero.
                    nCmp = context.AddIntrinsic(Intrinsic.X86Cmpss, nRes, zero, Const((int)CmpCondition.NotLessThanOrEqual));
                    nRes = context.AddIntrinsic(Intrinsic.X86Pand, nRes, nCmp);
                }

                int fpMaxVal = 0x4F000000; // 2.14748365E9f (2147483648)

                Operand fpMaxValMask = X86GetScalar(context, fpMaxVal);

                // CVTSS2SI returns 0x80000000 ("integer indefinite") for values outside the int range.
                Operand nIntOrLong = context.AddIntrinsicInt(Intrinsic.X86Cvtss2si, nRes);

                if (!signed)
                {
                    // Convert the part of the value above 2^31 separately, clamped at zero.
                    nRes = context.AddIntrinsic(Intrinsic.X86Subss, nRes, fpMaxValMask);

                    nCmp = context.AddIntrinsic(Intrinsic.X86Cmpss, nRes, zero, Const((int)CmpCondition.NotLessThanOrEqual));
                    nRes = context.AddIntrinsic(Intrinsic.X86Pand, nRes, nCmp);

                    nIntOrLong2 = context.AddIntrinsicInt(Intrinsic.X86Cvtss2si, nRes);
                }

                // All-ones mask where the (biased, if unsigned) value is >= 2^31, i.e. the conversion overflowed.
                nRes = context.AddIntrinsic(Intrinsic.X86Cmpss, nRes, fpMaxValMask, Const((int)CmpCondition.NotLessThan));

                Operand nInt = context.AddIntrinsicInt(Intrinsic.X86Cvtsi2si, nRes);

                // Saturate: XOR with the overflow mask turns 0x80000000 into 0x7FFFFFFF.
                Operand dRes;
                if (signed)
                {
                    dRes = context.BitwiseExclusiveOr(nIntOrLong, nInt);
                }
                else
                {
                    // Recombine the two halves; on overflow this yields 0x7FFFFFFF + 0x80000000 = 0xFFFFFFFF.
                    dRes = context.BitwiseExclusiveOr(nIntOrLong2, nInt);
                    dRes = context.Add(dRes, nIntOrLong);
                }

                InsertScalar(context, op.Vd, dRes);
            }
            else
            {
                // Same sequence as the single-precision path, using the double-precision instructions.
                Operand nRes = context.AddIntrinsic(Intrinsic.X86Cmpsd, n, n, Const((int)CmpCondition.OrderedQ));
                nRes = context.AddIntrinsic(Intrinsic.X86Pand, nRes, n);

                if (roundMode != FPRoundingMode.ToNearestAway)
                {
                    nRes = context.AddIntrinsic(Intrinsic.X86Roundsd, nRes, Const(X86GetRoundControl(roundMode)));
                }
                else
                {
                    nRes = EmitSse41RoundToNearestWithTiesToAwayOpF(context, nRes, scalar: true);
                }

                Operand zero = context.VectorZero();

                Operand nCmp;
                Operand nIntOrLong2 = default;

                // Unsigned: clamp negative values to zero (NotLessThanOrEqual acts as "greater than" for non-NaN input).
                if (!signed)
                {
                    nCmp = context.AddIntrinsic(Intrinsic.X86Cmpsd, nRes, zero, Const((int)CmpCondition.NotLessThanOrEqual));
                    nRes = context.AddIntrinsic(Intrinsic.X86Pand, nRes, nCmp);
                }

                long fpMaxVal = 0x41E0000000000000L; // 2147483648.0000000d (2147483648)

                Operand fpMaxValMask = X86GetScalar(context, fpMaxVal);

                // CVTSD2SI yields 0x80000000 for values outside the int32 range.
                Operand nIntOrLong = context.AddIntrinsicInt(Intrinsic.X86Cvtsd2si, nRes);

                // Unsigned: also convert (value - 2^31), clamped to zero, to cover the [2^31, 2^32) range.
                if (!signed)
                {
                    nRes = context.AddIntrinsic(Intrinsic.X86Subsd, nRes, fpMaxValMask);

                    nCmp = context.AddIntrinsic(Intrinsic.X86Cmpsd, nRes, zero, Const((int)CmpCondition.NotLessThanOrEqual));
                    nRes = context.AddIntrinsic(Intrinsic.X86Pand, nRes, nCmp);

                    nIntOrLong2 = context.AddIntrinsicInt(Intrinsic.X86Cvtsd2si, nRes);
                }

                // All-ones where the remaining value is still >= 2^31, i.e. where the result must saturate.
                nRes = context.AddIntrinsic(Intrinsic.X86Cmpsd, nRes, fpMaxValMask, Const((int)CmpCondition.NotLessThan));

                Operand nLong = context.AddIntrinsicLong(Intrinsic.X86Cvtsi2si, nRes);
                nLong = context.ConvertI64ToI32(nLong);

                Operand dRes;
                if (signed)
                {
                    dRes = context.BitwiseExclusiveOr(nIntOrLong, nLong);
                }
                else
                {
                    dRes = context.BitwiseExclusiveOr(nIntOrLong2, nLong);
                    dRes = context.Add(dRes, nIntOrLong);
                }

                InsertScalar(context, op.Vd, dRes);
            }
        }
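
        // Hypothetical scalar reference model (not part of ARMeilleure and not called by any
        // emitter). It is a sketch of what the SSE4.1 sequence above computes for one
        // double-precision input, written as plain C# so the saturation trick is easy to follow
        // and test. Assumptions: CVTSD2SI with a 32-bit destination returns the x86 "integer
        // indefinite" value 0x80000000 for out-of-range inputs, and the ROUNDSD step is replaced
        // by a caller-supplied rounding delegate (e.g. System.Math.Truncate). The single-precision
        // lanes of EmitSse41ConvertVector32 follow the same steps using float arithmetic.
        internal static class SseCvtToInt32Model
        {
            private const double FpMaxVal = 2147483648.0; // 2^31, bit pattern 0x41E0000000000000

            // Stand-in for CVTSD2SI r32: truncating conversion, 0x80000000 when out of range.
            private static int Cvtsd2si(double value)
            {
                return value >= -FpMaxVal && value < FpMaxVal ? (int)value : int.MinValue;
            }

            // Returns the signed result, or the uint result's bit pattern when signed is false.
            public static int Convert(double n, System.Func<double, double> round, bool signed)
            {
                double nRes = double.IsNaN(n) ? 0.0 : n; // ordered compare + PAND: NaN becomes 0
                nRes = round(nRes);

                if (!signed)
                {
                    nRes = nRes > 0.0 ? nRes : 0.0; // negative inputs clamp to 0
                }

                int nInt = Cvtsd2si(nRes);
                int nInt2 = 0;

                if (!signed)
                {
                    nRes -= FpMaxVal; // shift [2^31, 2^32) down into the int32 range
                    nRes = nRes > 0.0 ? nRes : 0.0;
                    nInt2 = Cvtsd2si(nRes);
                }

                int mask = nRes >= FpMaxVal ? -1 : 0; // all-ones where the result must saturate

                // Signed: 0x80000000 ^ ~0 == int.MaxValue on positive overflow.
                // Unsigned: nInt is either x itself (x < 2^31) or 0x80000000, which re-adds 2^31.
                return signed ? nInt ^ mask : unchecked((nInt2 ^ mask) + nInt);
            }
        }

        // Usage: SseCvtToInt32Model.Convert(1e10, System.Math.Truncate, signed: false)
        // returns -1, i.e. 0xFFFFFFFF (uint.MaxValue) once reinterpreted as uint.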

        private static void EmitSse41ConvertVector32(ArmEmitterContext context, FPRoundingMode roundMode, bool signed)
        {
            OpCode32Simd op = (OpCode32Simd)context.CurrOp;

            EmitVectorUnaryOpSimd32(context, (n) =>
            {
                int sizeF = op.Size & 1;

                if (sizeF == 0)
                {
                    // Zero NaN lanes: the ordered self-compare is all-ones only where n is not NaN.
                    Operand nRes = context.AddIntrinsic(Intrinsic.X86Cmpps, n, n, Const((int)CmpCondition.OrderedQ));
                    nRes = context.AddIntrinsic(Intrinsic.X86Pand, nRes, n);

                    nRes = context.AddIntrinsic(Intrinsic.X86Roundps, nRes, Const(X86GetRoundControl(roundMode)));

                    Operand zero = context.VectorZero();

                    Operand nCmp;

                    if (!signed)
                    {
                        nCmp = context.AddIntrinsic(Intrinsic.X86Cmpps, nRes, zero, Const((int)CmpCondition.NotLessThanOrEqual));
                        nRes = context.AddIntrinsic(Intrinsic.X86Pand, nRes, nCmp);
                    }

                    Operand fpMaxValMask = X86GetAllElements(context, 0x4F000000); // 2.14748365E9f (2147483648)

                    Operand nInt = context.AddIntrinsic(Intrinsic.X86Cvtps2dq, nRes);

                    Operand nInt2 = default;

                    if (!signed)
                    {
                        nRes = context.AddIntrinsic(Intrinsic.X86Subps, nRes, fpMaxValMask);

                        nCmp = context.AddIntrinsic(Intrinsic.X86Cmpps, nRes, zero, Const((int)CmpCondition.NotLessThanOrEqual));
                        nRes = context.AddIntrinsic(Intrinsic.X86Pand, nRes, nCmp);

                        nInt2 = context.AddIntrinsic(Intrinsic.X86Cvtps2dq, nRes);
                    }

                    nRes = context.AddIntrinsic(Intrinsic.X86Cmpps, nRes, fpMaxValMask, Const((int)CmpCondition.NotLessThan));

                    if (signed)
                    {
                        return context.AddIntrinsic(Intrinsic.X86Pxor, nInt, nRes);
                    }
                    else
                    {
                        Operand dRes = context.AddIntrinsic(Intrinsic.X86Pxor, nInt2, nRes);
                        return context.AddIntrinsic(Intrinsic.X86Paddd, dRes, nInt);
                    }
                }
                else /* if (sizeF == 1) */
                {
                    Operand nRes = context.AddIntrinsic(Intrinsic.X86Cmppd, n, n, Const((int)CmpCondition.OrderedQ));
                    nRes = context.AddIntrinsic(Intrinsic.X86Pand, nRes, n);

                    nRes = context.AddIntrinsic(Intrinsic.X86Roundpd, nRes, Const(X86GetRoundControl(roundMode)));

                    Operand zero = context.VectorZero();

                    Operand nCmp;

                    if (!signed)
                    {
                        nCmp = context.AddIntrinsic(Intrinsic.X86Cmppd, nRes, zero, Const((int)CmpCondition.NotLessThanOrEqual));
                        nRes = context.AddIntrinsic(Intrinsic.X86Pand, nRes, nCmp);
                    }

                    Operand fpMaxValMask = X86GetAllElements(context, 0x43E0000000000000L); // 9.2233720368547760E18d (9223372036854775808)

                    Operand nLong = InstEmit.EmitSse2CvtDoubleToInt64OpF(context, nRes, false);

                    Operand nLong2 = default;

                    if (!signed)
                    {
                        nRes = context.AddIntrinsic(Intrinsic.X86Subpd, nRes, fpMaxValMask);

                        nCmp = context.AddIntrinsic(Intrinsic.X86Cmppd, nRes, zero, Const((int)CmpCondition.NotLessThanOrEqual));
                        nRes = context.AddIntrinsic(Intrinsic.X86Pand, nRes, nCmp);

                        nLong2 = InstEmit.EmitSse2CvtDoubleToInt64OpF(context, nRes, false);
                    }

                    nRes = context.AddIntrinsic(Intrinsic.X86Cmppd, nRes, fpMaxValMask, Const((int)CmpCondition.NotLessThan));

                    if (signed)
                    {
                        return context.AddIntrinsic(Intrinsic.X86Pxor, nLong, nRes);
                    }
                    else
                    {
                        Operand dRes = context.AddIntrinsic(Intrinsic.X86Pxor, nLong2, nRes);
                        return context.AddIntrinsic(Intrinsic.X86Paddq, dRes, nLong);
                    }
                }
            });
        }
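
        // Hypothetical per-lane reference model (not part of ARMeilleure). It sketches the
        // double-precision branch above, where lanes convert to 64-bit integers and the
        // saturation threshold is 2^63 (0x43E0000000000000) instead of 2^31. It assumes
        // EmitSse2CvtDoubleToInt64OpF behaves like CVTSD2SI with a 64-bit destination on each
        // lane, returning 0x8000000000000000 for out-of-range values. Rounding is again a
        // caller-supplied delegate standing in for ROUNDPD.
        internal static class SseCvtToInt64LaneModel
        {
            private const double FpMaxVal = 9223372036854775808.0; // 2^63

            // Stand-in for CVTSD2SI r64: truncating conversion, 0x8000000000000000 when out of range.
            private static long Cvtsd2si64(double value)
            {
                return value >= -FpMaxVal && value < FpMaxVal ? (long)value : long.MinValue;
            }

            // Returns the signed lane, or the ulong lane's bit pattern when signed is false.
            public static long Convert(double n, System.Func<double, double> round, bool signed)
            {
                double nRes = double.IsNaN(n) ? 0.0 : n; // NaN lanes become 0
                nRes = round(nRes);

                if (!signed)
                {
                    nRes = nRes > 0.0 ? nRes : 0.0; // negative lanes clamp to 0
                }

                long nLong = Cvtsd2si64(nRes);
                long nLong2 = 0;

                if (!signed)
                {
                    nRes -= FpMaxVal; // shift [2^63, 2^64) down into the int64 range
                    nRes = nRes > 0.0 ? nRes : 0.0;
                    nLong2 = Cvtsd2si64(nRes);
                }

                long mask = nRes >= FpMaxVal ? -1L : 0L; // PCMPPD-style all-ones saturation mask

                // Mirrors PXOR followed by PADDQ in the unsigned case.
                return signed ? nLong ^ mask : unchecked((nLong2 ^ mask) + nLong);
            }
        }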
    }
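
    // Hypothetical sanity check (not part of ARMeilleure). The fpMaxVal constants used above are
    // IEEE-754 bit patterns of 2^31 (float and double) and 2^63 (double). The decimal literals in
    // their comments round to those same values. Only System.BitConverter is used.
    internal static class SseCvtConstantsCheck
    {
        public static bool Holds()
        {
            return System.BitConverter.Int32BitsToSingle(0x4F000000) == 2147483648f
                && 2.14748365E9f == 2147483648f
                && System.BitConverter.Int64BitsToDouble(0x41E0000000000000L) == 2147483648.0
                && System.BitConverter.Int64BitsToDouble(0x43E0000000000000L) == 9223372036854775808.0
                && 9.2233720368547760E18 == 9223372036854775808.0;
        }
    }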
}