Our Vulkan backend inserts image barriers when a texture is sampled after it is rendered. This is done via a "modification flag" which is set when a render target is unbound (presuming that a texture has finished drawing to it).
Imagine the following scenario:
- Game sets render target to texture A
- Game renders to texture A
- (render pass ends)
- Game binds texture A to a sampler
- Game sets render target to texture B
- Renders to texture B using texture A (barrier required)
Because of the previous behaviour, the check to add a barrier for sampling a texture actually happens before it is registered as modified, meaning no barrier was added at all. This isn't always the case, but it was definitely causing issues in Xenoblade 2.
This doesn't fix any more complicated issues where a texture is repeatedly sampled while it is currently being rendered.
Fixes visual glitches at lower resolutions in Xenoblade 2. May fix other cases.
* Add hide-cursor command line argument
* gtk: Adjust SettingsWindow for hide cursor options
* ava: Adjust SettingsWindow for hide cursor options
* ava: Add override check for HideCursor arg
* Remove copy&paste sins
* ava: Leave a little more room between the options
* gtk: Fix hide cursor issues
* ava: Only hide cursor if it's within the embedded window
* GPU: Keep sampled textures without any pool references alive
Occasionally games are very wasteful and clear/write to a texture without ever sampling it. As rendered textures in NVN games seem to all have overlapping memory ranges, the texture will eventually get overwritten.
Normally, this would trigger a removal from the auto delete cache, but a pool entry would keep the texture alive. However, with these textures that are never used, they will get deleted immediately and recreated on the next frame.
This change makes it so the ShortTextureCache can keep textures that have naver had a pool reference alive for a few frames, so they're not constantly being created and deleted.
This improves performance in Zelda BOTW a little.
* Cleanup
* WIP texture pre-flush
Improve performance of TextureView GetData to buffer
Fix copy/sync ordering
Fix minor bug
Make this actually work
WIP host mapping stuff
* Fix usage flags
* message
* Cleanup 1
* Fix rebase
* Fix
* Improve pre-flush rules
* Fix pre-flush
* A lot of cleanup
* Use the host memory bits
* Select the correct memory type
* Cleanup TextureGroupHandle
* Missing comment
* Remove debugging logs
* Revert BufferHandle _value access modifier
* One interrupt action at a time.
* Support D32S8 to D24S8 conversion, safeguards
* Interrupt cannot happen in sync handle's lock
Waitable needs to be checked twice now, but this should stop it from deadlocking.
* Remove unused using
* Address some feedback
* Address feedback
* Address more feedback
* Address more feedback
* Improve sync rules
Should allow for faster sync in some cases.
* GPU: Fix errors handling texture remapping
- Fixes an error where a pool entry and memory mapping changing at the same time could cause a texture to rebind its data from the wrong GPU VA (data swaps)
- Fixes an error where the texture pool could act on a mapping change before the mapping has actually been changed ("Unmapped" event happens before change, we need to signal it changed _after_ it completes)
TODO: remove textures from partially mapped list... if they aren't.
* Add Remap actions for handling post-mapping behaviours
* Remove unused code.
* Address feedback
* Nit
* Refactor attribute handling on the shader generator
* Implement gl_ViewportMask[]
* Add back the Intel FrontFacing bug workaround
* Fix GLSL transform feedback outputs mistmatch with fragment stage
* Shader cache version bump
* Fix geometry shader recognition
* PR feedback
* Delete GetOperandDef and GetOperandUse
* Remove replacements that are no longer needed on GLSL compilation on Vulkan
* Fix incorrect load for per-patch outputs
* Fix build
* Use vector transform feedback outputs with fragment shaders
* Shader cache version bump
* Fix missing outputs when vector transform feedback outputs are used
* use ArrayPool, avoid 6000-7000 allocs/sec of runtime
* use ArrayPool, avoid ~7k allocs/second during game execution
* use ArrayPool, avoid ~3000 allocs/sec during game execution
* use MemoryPool, reduce 0.5 MB/sec of new allocations during game execution
* avoid over-allocation by setting List<> Capacity when known
* remove LINQ in KTimeManager.UnscheduleFutureInvocation
* KTimeManager - avoid spinning one more time when the time has arrived
* KTimeManager - let SpinWait decide when to Thread.Yield(), and don't SpinOnce() immediately after Thread.Yield()
* use MemoryPool, reduce ~175k bytes/sec allocation during game execution
* IpcService - call commands via dynamic methods instead of reflection .Invoke(). Faster to call and with fewer allocations because parameters can be passed directly instead of as an array
* Make ButtonMappingEntry a record struct to avoid allocations. Set the List<ButtonMappingEntry> capacity according to use.
* add MemoryBuffer type for working with MemoryPool<byte>
* update changes to use MemoryBuffer
* make parameter ReadOnlySpan instead of Span
* whitespace fix
* Revert "IpcService - call commands via dynamic methods instead of reflection .Invoke(). Faster to call and with fewer allocations because parameters can be passed directly instead of as an array"
This reverts commit f2c698bdf65f049e8481c9f2ec7138d9b9a8261d.
* tweak KTimeManager spin behavior
* replace MemoryBuffer with ByteMemoryPool modeled after System.Buffers.ArrayMemoryPool<T>
* make ByteMemoryPoolBuffer responsible for renting memory
* ava: Remove unused doWhileDeferred parameters
* ava: Minimally improve swkbd dialog
It's currently impossible to get the dialog to redirect focus to the InputBox.
* ava: Fix nca extraction dialog never closing
Also contains some minor cleanup
* Added HiddenFileTypes to config state, and check to file enumeration
* Added hiddenfiletypes checkboxes to the UI
* Added Ava version of HiddenFileTypes
* Inverted Hide to Show with file types, minor formatting
* all variables with a reference to 'hidden' is now 'shown'
* one more variable name changed
* review feedback
* added FileTypes extension methof to get the correlating config value
* moved extension method to new folder and file in Ryujinx.Ui.Common
* added default case for ToggleFileType
* changed exception type to OutOfRangeException
* Added check for eventual symlink when displaying game files.
* Moved symlink check logic
* Moved symlink check logic
* Fixed prev commit
---------
Co-authored-by: Daniel Shala <danielshala00@gmail.com>
* hle: Deal with empty titleNames in some languages
* gui: Fix displaying the wrong title name
* Remove unnecessary bounds check
* Fix a NRE when getting the version string
* Restore empty string logic
* Flush in the middle of long command buffers.
* Vulkan: add situational "Fast Flush" mode
The AutoFlushCounter class was added to periodically flush Vulkan command buffers throughout a frame, which reduces latency to the GPU as commands are submitted and processed much sooner. This was done by allowing command buffers to flush when framebuffer attachments changed.
However, some games have incredibly long render passes with a large number of draws, and really aggressive data access that forces GPU sync.
The Vulkan backend could potentially end up building a single command buffer for 4-5ms if a pass has enough draws, such as in BOTW. In the scenario where sync is waited on immediately after submission, this would have to wait for the completion of a much longer command buffer than usual.
The solution is to force command buffer submission periodically in a "fast flush" mode. This will end up splitting render passes, but it will only enable if sync is aggressive enough.
This should improve performance in GPU limited scenarios, or in games that aggressively wait on synchronization. In some games, it may only kick in when res scaling. It won't trigger in games like SMO where sync is not an issue.
Improves performance in Pokemon Scarlet/Violet (res scaled) and BOTW (in general).
* Add conversions in milliseconds next to flush timers.
* ARMeilleure: Move TPIDR_EL0 and TPIDRRO_EL0 to NativeContext
Some games access these system registers several tens of thousands of times in a second from many different threads. While this isn't really crippling, it is a lot of wasted time spent in a reverse pinvoke transition.
Example games are Pokemon Scarlet/Violet and BOTW. These games have a lot of different potential bottlenecks so it's unlikely you will see a consistent improvement, but it definitely disappears from the cpu profile.
* Remove unreachable code.
* Add ulong conversion for offsets
* Nit
This seems to have been removed by the Post-Processing PR, but it is required for the display in OBS to be the right way up and properly scaled.
I've tested this with AA and FSR on MK8D and it seems to behave properly. Testing is welcome.
* ARMeilleure: Respect Fz flag for all floating point operations.
This is a change in strategy for emulating the Fz FPCR flag. Before, it was set before instructions that "needed it" and reset after. However, this missed a few hot instructions like the multiplication instruction, and the entirety of A32.
The new strategy is to set the Fz flag only in the following circumstances:
- Set to match FPCR before translated functions/loop are executed.
- Reset when calling SoftFloat methods, set when returning.
- Reset when exiting execution.
This allows us to remove the code around the existing Fz aware instructions, and get the accuracy benefits on all floating point instructions executed while in translated code.
Single step executions now need to be called with a context wrapper - right now it just contains the Fz flag initialization, and won't actually do anything on ARM.
This fixes a bug in Breath of the Wild where some physics interactions could randomly crash the game due to subnormal values not flushing to zero.
This is draft right now because I need to answer the questions:
- Does dotnet avoid changing the value of Mxcsr?
- Is it a good idea to assume that? Or should the flag set/restore be done on every managed method call, not just softfloat?
- If we assume that, do we want a unit test to verify the behaviour?
I recommend testing a bunch of games, especially games affected when this was originally added, such as #1611.
* Remove unused method
* Use FMA for Fmadd, Fmsub, Fnmadd, Fnmsub, Fmla, Fmls
...when available.
Similar implementation to A32
* Use FMA for Frecps, Frsqrts
* Don't set DAZ.
* Add round mode to ARM FP mode
* Fix mistakes
* Add test for FP state when calling managed methods
* Add explanatory comment to test.
* Cleanup
* Add A64 FPCR flags
* Vrintx_S A32 fast path on A64 backend
* Address feedback 1, re-enable DAZ
* Fix FMA instructions By Elem
* Address feedback