Ryujinx

Author	SHA1	Message	Date
gdkchan	6922862db8	Optimize kernel memory block lookup and consolidate RBTree implementations (#3410 ) * Implement intrusive red-black tree, use it for HLE kernel block manager * Implement TreeDictionary using IntrusiveRedBlackTree * Implement IntervalTree using IntrusiveRedBlackTree * Implement IntervalTree (on Ryujinx.Memory) using IntrusiveRedBlackTree * Make PredecessorOf and SuccessorOf internal, expose Predecessor and Successor properties on the node itself * Allocation free tree node lookup	2022-08-26 18:21:48 +00:00
merry	f5235fff29	ARMeilleure: Hardware accelerate SHA256 (#3585 ) * ARMeilleure/HardwareCapabilities: Add Sha * ARMeilleure/Intrinsic: Add X86Sha256Rnds2 * ARMeilleure: Hardware accelerate SHA256H/SHA256H2 * ARMeilleure/Intrinsic: Add X86Sha256Msg1, X86Sha256Msg2 * ARMeilleure/Intrinsic: Add X86Palignr * ARMeilleure: Hardware accelerate SHA256SU0, SHA256SU1 * PTC: Bump InternalVersion	2022-08-25 10:12:13 +00:00
gdkchan	eba682b767	Implement some 32-bit Thumb instructions (#3614 ) * Implement some 32-bit Thumb instructions * Optimize OpCode32MemMult using PopCount	2022-08-25 09:59:34 +00:00
Nicholas Rodine	7defc59b9d	A few minor documentation fixes. (#3599 ) * A few minor documentation fixes. * Removed more invalid inheritdoc instances.	2022-08-19 18:21:06 -03:00
Nicholas Rodine	951700fdd8	Removed unused usings. (#3593 ) * Removed unused usings. * Added back using, now that it's used. * Removed extra whitespace.	2022-08-18 18:04:54 +02:00
merry	6dfb6ccf8c	PreAllocator: Check if instruction supports a Vex prefix in IsVexSameOperandDestSrc1 (#3587 )	2022-08-14 17:35:08 -03:00
gdkchan	2bb9b33da1	Implement Arm32 Sha256 and MRS Rd, CPSR instructions (#3544 ) * Implement Arm32 Sha256 and MRS Rd, CPSR instructions * Add tests using Arm64 outputs	2022-08-05 19:03:50 +02:00
riperiperi	14ce9e1567	Move partial unmap handler to the native signal handler (#3437 ) * Initial commit with a lot of testing stuff. * Partial Unmap Cleanup Part 1 * Fix some minor issues, hopefully windows tests. * Disable partial unmap tests on macos for now Weird issue. * Goodbye magic number * Add COMPlus_EnableAlternateStackCheck for tests `COMPlus_EnableAlternateStackCheck` is needed for NullReferenceException handling to work on linux after registering the signal handler, due to how dotnet registers its own signal handler. * Address some feedback * Force retry when memory is mapped in memory tracking This case existed before, but returning `false` no longer retries, so it would crash immediately after unprotecting the memory... Now, we return `true` to deliberately retry. This case existed before (was just broken by this change) and I don't really want to look into fixing the issue right now. Technically, this means that on guest code partial unmaps will retry _due to this_ rather than hitting the handler. I don't expect this to cause any issues. This should fix random crashes in Xenoblade Chronicles 2. * Use IsRangeMapped * Suppress MockMemoryManager.UnmapEvent warning This event is not signalled by the mock memory manager. * Remove 4kb mapping	2022-07-29 19:16:29 -03:00
gdkchan	f7ef6364b7	Implement CPU FCVT Half <-> Double conversion variants (#3439 ) * Half <-> Double conversion support * Add tests, fast path and deduplicate SoftFloat code * PPTC version	2022-07-06 13:40:31 +02:00
gdkchan	633c5ec330	Extend uses count from ushort to uint on Operand Data structure (#3374 )	2022-06-05 14:15:27 -03:00
gdkchan	0c87bf9ea4	Refactor CPU interface to allow the implementation of other CPU emulators (#3362 ) * Refactor CPU interface * Use IExecutionContext interface on SVC handler, change how CPU interrupts invokes the handlers * Make CpuEngine take a ITickSource rather than returning one The previous implementation had the scenario where the CPU engine had to implement the tick source in mind, like for example, when we have a hypervisor and the game can read CNTPCT on the host directly. However given that we need to do conversion due to different frequencies anyway, it's not worth it. It's better to just let the user pass the tick source and redirect any reads to CNTPCT to the user tick source * XML docs for the public interfaces * PPTC invalidation due to NativeInterface function name changes * Fix build of the CPU tests * PR feedback	2022-05-31 16:29:35 -03:00
gdkchan	95017b8c66	Support memory aliasing (#2954 ) * Back to the origins: Make memory manager take guest PA rather than host address once again * Direct mapping with alias support on Windows * Fixes and remove more of the emulated shared memory * Linux support * Make shared and transfer memory not depend on SharedMemoryStorage * More efficient view mapping on Windows (no more restricted to 4KB pages at a time) * Handle potential access violations caused by partial unmap * Implement host mapping using shared memory on Linux * Add new GetPhysicalAddressChecked method, used to ensure the virtual address is mapped before address translation Also align GetRef behaviour with software memory manager * We don't need a mirrorable memory block for software memory manager mode * Disable memory aliasing tests while we don't have shared memory support on Mac * Shared memory & SIGBUS handler for macOS * Fix typo + nits + re-enable memory tests * Set MAP_JIT_DARWIN on x86 Mac too * Add back the address space mirror * Only set MAP_JIT_DARWIN if we are mapping as executable * Disable aliasing tests again (still fails on Mac) * Fix UnmapView4KB (by not casting size to int) * Use ref counting on memory blocks to delay closing the shared memory handle until all blocks using it are disposed * Address PR feedback * Make RO hold a reference to the guest process memory manager to avoid early disposal Co-authored-by: nastys <nastys@users.noreply.github.com>	2022-05-02 20:30:02 -03:00
merry	6a1a03566a	T32: Implement load/store single (immediate) (#3186 ) * T32: Implement load/store single (immediate) * tests * tidy formatting * address comments	2022-04-21 01:25:43 +02:00
gdkchan	26a881176e	Fix tail merge from block with conditional jump to multiple returns (#3267 ) * Fix tail merge from block with conditional jump to multiple returns * PPTC version bump	2022-04-09 16:56:50 +02:00
merry	df70442c46	InstEmitMemoryEx: Barrier after write on ordered store (#3193 ) * InstEmitMemoryEx: Barrier after write on ordered store * increment ptc version * 32	2022-03-19 10:32:35 -03:00
merry	bb2f9df0a1	KThread: Fix GetPsr mask (#3180 ) * ExecutionContext: GetPstate / SetPstate * Put it in NativeContext * KThread: Fix GetPsr mask * ExecutionContext: Turn methods into Pstate property * Address nit	2022-03-11 03:16:32 +01:00
merry	7af9fcbc06	T32: Implement Data Processing (Modified Immediate) instructions (#3178 ) * T32: Implement Data Processing (Modified Immediate) instructions * Update tests * switch -> lookup table	2022-03-06 22:25:01 +01:00
merry	b97ff4da5e	A32: Fix ALU immediate instructions (#3179 ) * Tests: Add A32 tests for immediate ADC/ADCS/RSC/RSCS/SBC/SBCS * A32: Fix bug in ADC/ADCS/RSC/RSCS/SBC/SBCS * CpuTestAluImm32: Add more opcodes * Increment PTC version	2022-03-05 15:23:10 -03:00
merry	747081d2c7	Decoders: Fix instruction lengths for 16-bit B instructions (#3177 )	2022-03-05 16:20:24 +01:00
merry	497199bb50	Decoder: Exit on trapping instructions, and resume execution at trapping instruction (#3153 ) * Decoder: Exit on trapping instructions, and resume execution at trapping instruction * Resume at trapping address * remove mustExit	2022-03-04 23:16:58 +01:00
merry	bd9ac0fdaa	T32: Implement B, B.cond, BL, BLX (#3155 ) * Decoders: Make IsThumb a function of OpCode32 * OpCode32: Fix GetPc * T32: Implement B, B.cond, BL, BLX * rm usings	2022-03-04 23:05:08 +01:00
merry	7b35ebc64a	T32: Implement ALU (shifted register) instructions (#3135 ) * T32: Implement ADC, ADD, AND, BIC, CMN, CMP, EOR, MOV, MVN, ORN, ORR, RSB, SBC, SUB, TEQ, TST (shifted register) * OpCodeTable: Sort T32 list * Tests: Rename RandomTestCase to PrecomputedThumbTestCase * T32: Tests for AluRsImm instructions * fix nit * fix nit 2	2022-02-22 19:11:28 -03:00
merry	dc063eac83	ARMeilleure: Implement single stepping (#3133 ) * Decoder: Implement SingleInstruction decoder mode * Translator: Implement Step * DecoderMode: Rename Normal to MultipleBlocks	2022-02-22 11:11:42 -03:00
merry	f1460d5494	A32: Fix BLX and BXWritePC (#3151 )	2022-02-22 10:41:56 -03:00
Berkan Diler	644b497df1	Collapse AsSpan().Slice(..) calls into AsSpan(..) (#3145 ) * Collapse AsSpan().Slice(..) calls into AsSpan(..) Less code and a bit faster * Collapse an Array.Clear(array, 0, array.Length) call to Array.Clear(array)	2022-02-22 10:32:10 -03:00
gdkchan	f2087ca29e	PPTC version increment (#3139 )	2022-02-17 23:52:42 -03:00
gdkchan	92d166ecb7	Enable CPU JIT cache invalidation (#2965 ) * Enable CPU JIT cache invalidation * Invalidate cache on IC IVAU	2022-02-18 02:53:18 +01:00
merry	747876dc67	Decoders: Add IOpCode32HasSetFlags (#3136 )	2022-02-18 01:33:43 +01:00
merry	98e05ee4b7	ARMeilleure: Thumb support (All T16 instructions) (#3105 ) * Decoders: Add InITBlock argument * OpCodeTable: Minor cleanup * OpCodeTable: Remove existing thumb instruction implementations * OpCodeTable: Prepare for thumb instructions * OpCodeTables: Improve thumb fast lookup * Tests: Prepare for thumb tests * T16: Implement BX * T16: Implement LSL/LSR/ASR (imm) * T16: Implement ADDS, SUBS (reg) * T16: Implement ADDS, SUBS (3-bit immediate) * T16: Implement MOVS, CMP, ADDS, SUBS (8-bit immediate) * T16: Implement ANDS, EORS, LSLS, LSRS, ASRS, ADCS, SBCS, RORS, TST, NEGS, CMP, CMN, ORRS, MULS, BICS, MVNS (low registers) * T16: Implement ADD, CMP, MOV (high reg) * T16: Implement BLX (reg) * T16: Implement LDR (literal) * T16: Implement {LDR,STR}{,H,B,SB,SH} (register) * T16: Implement {LDR,STR}{,B,H} (immediate) * T16: Implement LDR/STR (SP) * T16: Implement ADR * T16: Implement Add to SP (immediate) * T16: Implement ADD/SUB (SP) * T16: Implement SXTH, SXTB, UXTH, UXTB * T16: Implement CBZ, CBNZ * T16: Implement PUSH, POP * T16: Implement REV, REV16, REVSH * T16: Implement NOP * T16: Implement LDM, STM * T16: Implement SVC * T16: Implement B (conditional) * T16: Implement B (unconditional) * T16: Implement IT * fixup! T16: Implement ADD/SUB (SP) * fixup! T16: Implement Add to SP (immediate) * fixup! T16: Implement IT * CpuTestThumb: Add randomized tests * Remove inITBlock argument * Address nits * Use index to handle IfThenBlockState * Reduce line noise * fixup * nit	2022-02-17 19:39:45 -03:00
Berkan Diler	9ca040c0ff	Use ReadOnlySpan<byte> compiler optimization for static data (#3130 )	2022-02-17 21:38:50 +01:00
merry	ce71f9144e	InstEmitMemory32: Literal loads always have word-aligned PC (#3104 )	2022-02-11 17:51:03 -03:00
gdkchan	c3c3914ed3	Add a limit on the number of uses a constant may have (#3097 )	2022-02-09 17:42:47 -03:00
merry	86b37d0ff7	ARMeilleure: A32: Implement SHSUB8 and UHSUB8 (#3089 ) * ARMeilleure: A32: Implement UHSUB8 * ARMeilleure: A32: Implement SHSUB8	2022-02-08 10:46:42 +01:00
merry	88d3ffb97c	ARMeilleure: A32: Implement SHADD8 (#3086 )	2022-02-06 12:25:45 -03:00
merry	222b1ad7da	ARMeilleure: OpCodeTable: Add CMN (RsReg) (#3087 )	2022-02-06 02:01:05 +01:00
gdkchan	bd412afb9f	Fix small precision error on CPU reciprocal estimate instructions (#3061 ) * Fix small precision error on CPU reciprocal estimate instructions * PPTC version bump	2022-01-29 23:59:34 +01:00
gdkchan	f3bfd799e1	Fix calls passing V128 values on Linux (#3034 ) * Fix calls passing V128 values on Linux * PPTC version bump	2022-01-24 11:23:24 +01:00
gdkchan	f0824fde9f	Add host CPU memory barriers for DMB/DSB and ordered load/store (#3015 ) * Add host CPU memory barriers for DMB/DSB and ordered load/store * PPTC version bump * Revert to old barrier order	2022-01-21 12:47:34 -03:00
sharmander	60f7cba30a	Implement FCVTNS (Scalar GP) (#2953 ) * Implement FCVTNS (Scalar GP) * Update Ptc Version	2022-01-19 22:21:44 -03:00
gdkchan	bd215e447d	Fix return type mismatch on 32-bit titles (#3000 )	2022-01-16 08:39:43 -03:00
sharmander	e5f7ff1eee	CPU - Implement FCVTMS (Vector) (#2937 ) * Add FCVTMS_V Implementation to ARMeilleure * Fix opcode designation * Add tests * Amend Ptc version * Fix OpCode / Tests * Create Math.Floor helper method + Update implementation * Address gdk comments * Re-address gdk comments * Update ARMeilleure/Decoders/OpCodeTable.cs Co-authored-by: gdkchan <gab.dark.100@gmail.com> * Update Tests to use 2S (4S) and 2D Co-authored-by: gdkchan <gab.dark.100@gmail.com>	2022-01-04 16:45:28 -03:00
gdkchan	e24949ca2c	Implement CSDB instruction (#2927 )	2021-12-19 11:19:05 -03:00
Mary	00c69f2098	Remove usage of Mono.Posix.NETStandard across all projects (#2906 ) * Remove usage of Mono.Posix.NETStandard in Ryujinx project * Remove usage of Mono.Posix.NETStandard in ARMeilleure project * Remove usage of Mono.Posix.NETStandard in Ryujinx.Memory project * Address gdkchan's comments	2021-12-08 18:24:26 -03:00
Piyachet Kanda	3e2f89b4fd	Implement UHADD8 instruction (#2908 ) * Implement UHADD8 instruction along with a test unit * Update PTC revision number	2021-12-08 17:05:59 -03:00
Mary	f39fce8f54	misc: Migrate usage of RuntimeInformation to OperatingSystem (#2901 ) Very basic migration across the codebase.	2021-12-04 20:02:30 -03:00
Mary	57d3296ba4	infra: Migrate to .NET 6 (#2829 ) * infra: Migrate to .NET 6 * Rollback version naming change * Workaround .NET 6 ZipArchive API issues * ci: Switch to VS 2022 for AppVeyor CI is now ready for .NET 6 * Suppress WebClient warning in DoUpdateWithMultipleThreads * Attempt to workaround System.Drawing.Common changes on 6.0.0 * Change keyboard rendering from System.Drawing to ImageSharp * Make the software keyboard renderer multithreaded * Bump ImageSharp version to 1.0.4 to fix a bug in Image.Load * Add fallback fonts to the keyboard renderer * Fix warnings * Address caian's comment * Clean up linux workaround as it's unneeded now * Update readme Co-authored-by: Caian Benedicto <caianbene@gmail.com>	2021-11-28 21:24:17 +01:00
FICTURE7	fbf40424f4	Add an early `TailMerge` pass (#2721 ) * Add an early `TailMerge` pass Some translations can have a lot of guest calls and since for each guest call there is a call guard which may return. This can produce a lot of epilogue code for returns. This pass merges the epilogue into a single block. ``` Using filter 'hcq'. Using metric 'code size'. Total diff: -1648111 (-7.19 %) (bytes): Base: 22913847 Diff: 21265736 Improved: 4567, regressed: 14, unchanged: 144 ``` * Set PTC version * Address feedback * Handle `void` returning functions * Actually handle `void` returning functions * Fix `RegisterToLocal` logging	2021-10-18 19:51:22 -03:00
FICTURE7	69093cf2d6	Optimize LSRA (#2563 ) * Optimize `TryAllocateRegWithtoutSpill` a bit * Add a fast path for when all registers are live. * Do not query `GetOverlapPosition` if the register is already in use (i.e: free position is 0). * Do not allocate child split list if not parent * Turn `LiveRange` into a reference struct `LiveRange` is now a reference wrapping struct like `Operand` and `Operation`. It has also been changed into a singly linked-list. In micro-benchmarks traversing the linked-list was faster than binary search on `List<T>`. Even for quite large input sizes (e.g: 1,000,000), surprisingly. Could be because the code gen for traversing the linked-list is much much cleaner and there is no virtual dispatch happening when checking if intervals overlaps. * Turn `LiveInterval` into an iterator The LSRA allocates in forward order and never inspect previous `LiveInterval` once they are expired. Something similar can be done for the `LiveRange`s within the `LiveInterval`s themselves. The `LiveInterval` is turned into a iterator which expires `LiveRange` within it. The iterator is moved forward along with interval walking code, i.e: AllocateInterval(context, interval, cIndex). * Remove `LinearScanAllocator.Sources` Local methods are less susceptible to do allocations than lambdas. * Optimize `GetOverlapPosition(interval)` a bit Time complexity should be in O(n+m) instead of O(nm) now. * Optimize `NumberLocals` a bit Use the same idea as in `HybridAllocator` to store the visited state in the MSB of the Operand's value instead of using a `HashSet<T>`. * Optimize `InsertSplitCopies` a bit Avoid allocating a redundant `CopyResolver`. * Optimize `InsertSplitCopiesAtEdges` a bit Avoid redundant allocations of `CopyResolver`. * Use stack allocation for `freePositions` Avoid redundant computations. * Add `UseList` Replace `SortedIntegerList` with an even more specialized data structure. It allocates memory on the arena allocators and does not require copying use positions when splitting it. * Turn `LiveInterval` into a reference struct `LiveInterval` is now a reference wrapping struct like `Operand` and `Operation`. The rationale behind turning this in a reference wrapping struct is because a `LiveInterval` is associated with each local variable, and these intervals may themselves be split further. I've seen translations having up to 8000 local variables. To make the `LiveInterval` unmanaged, a new data structure called `LiveIntervalList` was added to store child splits. This differs from `SortedList<,>` because it can contain intervals with the same start position. Really wished we got some more of C++ template in C#. :^( * Optimize `GetChildSplit` a bit No need to inspect the remaining ranges if we've reached a range which starts after position, since the split list is ordered. * Optimize `CopyResolver` a bit Lazily allocate the fill, spill and parallel copy structures since most of the time only one of them is needed. * Optimize `BitMap.Enumerator` a bit Marking `MoveNext` as `AggressiveInlining` allows RyuJIT to promote the `Enumerator` struct into registers completely, reducing load/store code a lot since it does not have to store the struct on the stack for ABI purposes. * Use stack allocation for `use/blockedPositions` * Optimize `AllocateWithSpill` a bit * Address feedback * Make `LiveInterval.AddRange(,)` more conservative Produces no diff against master, but just for good measure.	2021-10-08 18:15:44 -03:00
FICTURE7	ecc64c934d	Add `Operand.Label` support to `Assembler` (#2680 ) * Add `Operand.Label` support to `Assembler` This adds label support to `Assembler` and enables branch tightening when compiling with relocatables. Jump management and patching has been moved to the `Assembler`. * Move instruction table to `Assembler.Table` * Set PTC internal version * Rename `Assembler.Table` to `AssemblerTable`	2021-10-05 14:04:55 -03:00
riperiperi	d92fff541b	Replace CacheResourceWrite with more general "precise" write (#2684 ) * Replace CacheResourceWrite with more general "precise" write The goal of CacheResourceWrite was to notify GPU resources when they were modified directly, by looking up the modified address/size in a structure and calling a method on each resource. The downside of this is that each resource cache has to be queried individually, they all have to implement their own way to do this, and it can only signal to resources using the same PhysicalMemory instance. This PR adds the ability to signal a write as "precise" on the tracking, which signals a special handler (if present) which can be used to avoid unnecessary flush actions, or maybe even more. For buffers, precise writes specifically do not flush, and instead punch a hole in the modified range list to indicate that the data on GPU has been replaced. The downside is that precise actions must ignore the page protection bits and always signal - as they need to notify the target resource to ignore the sequence number optimization. I had to reintroduce the sequence number increment after I2M, as removing it was causing issues in rabbids kingdom battle. However - all resources modified by I2M are notified directly to lower their sequence number, so the problem is likely that another unrelated resource is not being properly updated. Thankfully, doing this does not affect performance in the games I tested. This should fix regressions from #2624. Test any games that were broken by that. (RF4, rabbids kingdom battle) I've also added a sequence number increment to ThreedClass.IncrementSyncpoint, as it seems to fix buffer corruption in OpenGL homebrew. (this was a regression from removing sequence number increment from constant buffer update - another unrelated resource thing) * Add tests. * Add XML docs for GpuRegionHandle * Skip UpdateProtection if only precise actions were called This allows precise actions to skip reprotection costs.	2021-09-29 02:27:03 +02:00
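
Note on commit 69093cf2d6: its message says the `GetOverlapPosition(interval)` lookup should now run in O(n+m) instead of O(nm). The log does not show the code, so the following is only a rough sketch of how such a bound is commonly reached, not the project's implementation (which is C#). It assumes each interval is a list of half-open `(start, end)` ranges, sorted and non-overlapping within the list; the name `first_overlap` and the tuple layout are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: half-open range [start, end).
Range = Tuple[int, int]


def first_overlap(a: List[Range], b: List[Range]) -> Optional[int]:
    """Return the first position where two sorted range lists overlap.

    Both lists are assumed sorted by start and non-overlapping within
    themselves. Each step advances one of the two cursors, so the loop
    runs at most len(a) + len(b) times instead of comparing every pair.
    """
    i = j = 0
    while i < len(a) and j < len(b):
        a_start, a_end = a[i]
        b_start, b_end = b[j]
        if a_start < b_end and b_start < a_end:
            # The ranges intersect; the overlap begins at the later start.
            return max(a_start, b_start)
        if a_end <= b_start:
            # a[i] ends before b[j] begins, so it cannot overlap b[j]
            # or anything after it.
            i += 1
        else:
            # Otherwise b[j] ends before a[i] begins.
            j += 1
    return None


if __name__ == "__main__":
    print(first_overlap([(0, 4), (10, 14)], [(4, 8), (12, 20)]))
```

A skipped range is never looked at again, which is what keeps the walk linear in the combined length of the two lists.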


203 commits