Just some simple changes to the buffer conversion shaders. (stride conversion, D32S8 to D24S8)
The first change is using a device local buffer for converted vertex buffers, since they're only read/written on the GPU. These paths don't trigger on NVIDIA, but if you force them to, it shows just how badly writing to host-owned memory from compute hurts performance on those GPUs. AMD GPUs are less heavily affected by this issue, but since the game in question was writing 230MB from compute, I imagine it should have some effect.
The second change is allowing the buffer conversion shaders to scale their work group count. While dividing the work between 32 invocations works OK for M1 Macs, it's not so great for anything with more cores, like AMD GPUs, which should be able to do a lot more parallel copies. Now, the count scales by roughly 100 elements per invocation.
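A rough sketch of the scaling rule described above, written in Python for brevity rather than taken from the actual change. The function name, the group size of 32 (borrowed from the "32 invocations" figure) and the 65535 cap (the minimum dispatch limit Vulkan guarantees per dimension) are assumptions for illustration only.

```python
import math

# Assumed values: 32 comes from the "32 invocations" mentioned above,
# 100 from "roughly 100 elements per invocation".
INVOCATIONS_PER_GROUP = 32
ELEMENTS_PER_INVOCATION = 100


def work_group_count(element_count: int, max_groups: int = 65535) -> int:
    """Hypothetical helper: how many work groups to dispatch for a conversion."""
    if element_count <= 0:
        # Always dispatch at least one group so the shader still runs.
        return 1
    # Invocations needed so that each one handles ~100 elements.
    invocations = math.ceil(element_count / ELEMENTS_PER_INVOCATION)
    # Round up to whole work groups, then clamp to the dispatch limit.
    groups = math.ceil(invocations / INVOCATIONS_PER_GROUP)
    return max(1, min(groups, max_groups))


if __name__ == "__main__":
    for count in (100, 3200, 3201, 1_000_000):
        print(count, work_group_count(count))
```

With these assumed constants, small buffers still use a single group, while a million-element buffer is spread over a few hundred groups instead of one.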
Some stride change cases could be improved further by either limiting vertex buffer size somehow (reading the index buffer could help, but is always risky) or only updating regions that changed, rather than invalidating the whole thing.
* Implement vertex and geometry shader conversion to compute
* Call InitializeReservedCounts for compute too
* PR feedback
* Set clip distance mask for geometry and tessellation shaders too
* Transform feedback emulation only for vertex
#5576 changed where the position was declared, but forgot to add the Invariant declaration to position when the ReducedPrecision flag was enabled. This was causing weird graphical bugs in a bunch of games, mostly to do with mismatching depth between multiple draws of the same geometry.
Maybe the attempt to add it to Position in DeclareInputOrOutput can be removed now, assuming that path is never used.
* mm: Migrate service in Horizon project
This PR migrates the `mm:u` service to the Horizon project. Things were also checked against some RE, which is why some variables are renamed; the logic should be the same as before.
Tests are welcome.
* Lock _sessionList instead
* Fix comment
* Fix Session fields order
* Move shuffle handling out of the backend to a transform pass
* Handle subgroup sizes higher than 32
* Stop using the subgroup size control extension
* Make GenerateShuffleFunction static
* Shader cache version bump
* Vulkan: Periodically free regions of the staging buffer
There was an edge case where a game could submit tens of thousands of small copies over the course of over half a minute to unique fences. This could result in a large stutter when the staging buffer became full and it tried to check and free thousands of completed fences.
This became visible with some games and mirrors on Windows, as they don't submit any buffer data via the staging buffer, but may submit copies of the support buffer.
This change makes the Vulkan backend check for staging buffer completion on each command buffer submit, so it can't get backed up with thousands of copies to check.
* Add comment
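A minimal sketch of the idea behind the staging buffer change above, in Python and not the backend's real code. `Fence`, `StagingBuffer`, `reserve` and `on_submit` are hypothetical names; the point is only that completed fences are checked at every submit, so the pending list stays short instead of being drained all at once when the buffer fills up.

```python
from collections import deque


class Fence:
    """Stand-in for a GPU fence; `signaled` is set by the test harness."""

    def __init__(self):
        self.signaled = False

    def is_signaled(self) -> bool:
        return self.signaled


class StagingBuffer:
    """Hypothetical staging buffer that frees regions in submission order."""

    def __init__(self, size: int):
        self.free_size = size
        self.pending = deque()  # (fence, size) pairs, oldest first

    def free_completed(self) -> int:
        """Release every region at the front whose fence has signaled."""
        freed = 0
        while self.pending and self.pending[0][0].is_signaled():
            _, size = self.pending.popleft()
            self.free_size += size
            freed += 1
        return freed

    def reserve(self, size: int, fence: Fence) -> bool:
        """Reserve space for a copy; falls back to a fence check when full."""
        if size > self.free_size:
            self.free_completed()
        if size > self.free_size:
            return False
        self.free_size -= size
        self.pending.append((fence, size))
        return True

    def on_submit(self) -> int:
        """Assumed hook called on each command buffer submit."""
        return self.free_completed()
```

Because regions are released in order, an unsignaled fence at the front still blocks later ones; the sketch only shows that the check now happens a little at a time.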
* pr_triage: Fix invalid workflow
* Don't assign reviewers to draft PRs
* Add team review request for developers team
* Introduce Mako to make team reviewers work
* Initial implementation of buffer mirrors
Generally slower right now, goal is to reduce render passes in games that do inline updates
Fix support buffer mirrors
Reintroduce vertex buffer mirror
Add storage buffer support
Optimisation part 1
More optimisation
Avoid useless data copies.
Remove unused cbIndex stuff
Properly set write flag for storage buffers.
Fix minor issues
Not sure why this was here.
Fix BufferRangeList
Fix some big issues
Align storage buffers rather than getting full buffer as a range
Improves mirrorability of read-only storage buffers
Increase staging buffer size, as it now contains mirrors
Fix some issues with buffers not updating
Fix buffer SetDataUnchecked offset for one of the paths when using mirrors
Fix buffer mirrors interaction with buffer textures
Fix mirror rebinding
Move GetBuffer calls on indirect draws before BeginRenderPass to avoid draws without render pass
Fix mirrors rebase
Fix rebase 2023
* Fix crash when using stale vertex buffer
Similar to `Get` with a size that's too large, just treat it as a clamp.
* Explicitly set support buffer as mirrorable
* Address feedback
* Remove unused fragment of MVK workaround
* Replace logging for staging buffer OOM
* Address format issues
* Address more format issues
* Mini cleanup
* Address more things
* Rename BufferRangeList
* Support bounding range for ClearMirrors and UploadPendingData
* Add maximum size for vertex buffer mirrors
* Enable index buffer mirrors
Enabled on all platforms for the IbStreamer.
* Feedback
* Remove mystery BufferCache change
Probably macos related?
* Fix mirrors not creating when staging buffer is empty.
* Change log level to debug
This branch changes the buffer copy fast path to notify memory tracking for all resources that aren't buffers. This fixes cases where games would copy buffer data directly into texture memory, which before would only work if the texture did not already exist. I imagine this happens when the guest driver is moving data between allocations or uploading it.
Since this only affects the fast path, cases where the source data has been modified from GPU (fast path copy destination doesn't count) will still fail to notify the texture, though I don't imagine games will do this. This should be resolved in future.
This should fix some texture issues with guest OpenGL games on Switch, such as Dragon Quest Builders.
This may also be useful in future for games that move shader data around memory, if we end up using memory tracking for those.
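As a rough illustration of the fast path behaviour described above (a Python sketch, not the emulator's code): after the raw copy, every overlapping tracked resource that is not a buffer gets told its memory changed. `TrackedResource`, `fast_path_copy` and the `dirty` flag are hypothetical names chosen for this example.

```python
class TrackedResource:
    """Hypothetical tracked resource covering [address, address + size)."""

    def __init__(self, name: str, address: int, size: int, is_buffer: bool):
        self.name = name
        self.address = address
        self.size = size
        self.is_buffer = is_buffer
        self.dirty = False  # set when memory tracking is notified

    def overlaps(self, address: int, size: int) -> bool:
        return address < self.address + self.size and self.address < address + size


def fast_path_copy(memory: bytearray, src: int, dst: int, size: int, resources):
    """Copy bytes directly, then notify non-buffer resources at the destination."""
    memory[dst:dst + size] = memory[src:src + size]
    notified = []
    for resource in resources:
        # Buffers are handled by the buffer cache itself, so only other
        # resources (such as textures) need the tracking notification here.
        if not resource.is_buffer and resource.overlaps(dst, size):
            resource.dirty = True
            notified.append(resource.name)
    return notified
```

A texture marked dirty this way would be reloaded from memory the next time it is used, which is the effect the change relies on.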
* Move some properties out of ShaderConfig
* Stop using ShaderConfig on backends
* Replace ShaderConfig usages on Translator and passes
* Move remaining properties out of ShaderConfig and delete ShaderConfig
* Remove ResourceManager property from TranslatorContext
* Move Rewriter passes to separate transform pass files
* Fix TransformPasses.RunPass on cases where a node is removed
* Move remaining ClipDistancePrimitivesWritten and UsedFeatures updates to decode stage
* Reduce excessive parameter passing a bit by using structs more
* Remove binding parameter from ShaderProperties methods since it is redundant
* Replace decoder instruction checks with switch statement
* Put GLSL on the same plan as SPIR-V for input/output declaration
* Stop mutating TranslatorContext state when Translate is called
* Pass most of the graphics state using a struct instead of individual query methods
* Auto-format
* Auto-format
* Add backend logging interface
* Auto-format
* Remove unnecessary use of interpolated strings
* Remove more modifications of AttributeUsage after decode
* PR feedback
* gl_Layer is not supported on compute
ForceDpiAware.Windows has a side effect of forcing the application DPI to be the same as the primary monitor. This isn't good if you have multiple monitors with different DPI.
On Avalonia, I don't think there are any downsides to disabling this. When it's disabled, `ForceDpiAware.GetWindowScaleFactor` always returns 1.
* It builds
(Doesn't run; waiting on FluentAvalonia Preview 5 release)
* Enable CompiledBindings by default
* Ignore `PointerPressedEventArgs` Init warning
* Define MIME and UTI Types
* Update `UserProfileImageSelectorView` to StorageProvider API
* PFS0 Magic
* Update `MainWindowViewModel` to StorageProvider API
* Update `SettingsUIView` to StorageProvider API
* Update `ApplicationHelper` to StorageProvider API
* Use `IsCheckChanged`
* Rename events
* Update Fluent Avalonia to Preview 5
* More package updates
* Fix long selection bar
* return glyph value directly, instead of using a binding
* fix menu item checkboxes
* Fix build
* Update to Preview 6
Unicorn conflict
Fix remaining package oopsie
* Fix issues from merge
* Fix some warnings
* Warnings
* Squashed commit of the following:
commit 79d1c190db
Author: Mary <mary@mary.zone>
Date: Sun Apr 16 11:38:07 2023 +0200
chore: Update Silk.NET to 2.17.1 (#4686)
commit 2bc88467eb
Author: Ac_K <Acoustik666@gmail.com>
Date: Sun Apr 16 09:37:31 2023 +0000
Update README.md
commit baf8752e74
Author: Vincenzo Nizza <vincenzonizzaufficio@gmail.com>
Date: Sun Apr 16 11:19:33 2023 +0200
Ensure the updater doesn't delete hidden or system files (#4626)
* Copy desktop.ini to update directory if it exists in HomeDir
* EnumerateFilesToDelete() exclude files with "Hidden" and "System" attributes
commit d5e4378aea
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Sun Apr 16 09:02:06 2023 +0000
nuget: bump DynamicData from 7.13.1 to 7.13.5 (#4654)
Bumps [DynamicData](https://github.com/reactiveui/DynamicData) from 7.13.1 to 7.13.5.
- [Release notes](https://github.com/reactiveui/DynamicData/releases)
- [Changelog](https://github.com/reactivemarbles/DynamicData/blob/main/ReleaseNotes.md)
- [Commits](https://github.com/reactiveui/DynamicData/compare/7.13.1...7.13.5)
---
updated-dependencies:
- dependency-name: DynamicData
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
commit 6dbcdfea47
Author: TSRBerry <20988865+TSRBerry@users.noreply.github.com>
Date: Sun Apr 16 09:09:02 2023 +0200
Ava: Fix nca extraction window never closing & minor cleanup (#4569)
* ava: Remove unused doWhileDeferred parameters
* ava: Minimally improve swkbd dialog
It's currently impossible to get the dialog to redirect focus to the InputBox.
* ava: Fix nca extraction dialog never closing
Also contains some minor cleanup
commit c5258cf082
Author: NitroTears <73270647+NitroTears@users.noreply.github.com>
Date: Sun Apr 16 11:03:35 2023 +1000
Ability to hide file types in Game List (#4555)
* Added HiddenFileTypes to config state, and check to file enumeration
* Added hiddenfiletypes checkboxes to the UI
* Added Ava version of HiddenFileTypes
* Inverted Hide to Show with file types, minor formatting
* all variables with a reference to 'hidden' are now 'shown'
* one more variable name changed
* review feedback
* added FileTypes extension method to get the correlating config value
* moved extension method to new folder and file in Ryujinx.Ui.Common
* added default case for ToggleFileType
* changed exception type to OutOfRangeException
commit 5c89e22bb9
Author: Daniel Shala <daniel.shala08@gmail.com>
Date: Sat Apr 15 18:11:24 2023 +0200
Added check for eventual symlink when displaying game files. (#4526)
* Added check for eventual symlink when displaying game files.
* Moved symlink check logic
* Moved symlink check logic
* Fixed prev commit
---------
Co-authored-by: Daniel Shala <danielshala00@gmail.com>
commit 11ecff2ff0
Author: Alex Barney <thealexbarney@gmail.com>
Date: Fri Apr 14 16:00:34 2023 -0700
Rename Hipc to Cmif where appropriate (#3880)
commit 4c3f09644a
Author: MutantAura <44103205+MutantAura@users.noreply.github.com>
Date: Wed Apr 12 20:18:40 2023 +0100
Move swkbd message null check into constructor (#4671)
commit e187a8870a
Author: TSRBerry <20988865+TSRBerry@users.noreply.github.com>
Date: Wed Apr 12 03:09:47 2023 +0200
HLE: Deal with empty title names properly (#4643)
* hle: Deal with empty titleNames in some languages
* gui: Fix displaying the wrong title name
* Remove unnecessary bounds check
* Fix a NRE when getting the version string
* Restore empty string logic
commit a64fee29dc
Author: riperiperi <rhy3756547@hotmail.com>
Date: Tue Apr 11 08:23:41 2023 +0100
Vulkan: add situational "Fast Flush" mode (#4667)
* Flush in the middle of long command buffers.
* Vulkan: add situational "Fast Flush" mode
The AutoFlushCounter class was added to periodically flush Vulkan command buffers throughout a frame, which reduces latency to the GPU as commands are submitted and processed much sooner. This was done by allowing command buffers to flush when framebuffer attachments changed.
However, some games have incredibly long render passes with a large number of draws, and really aggressive data access that forces GPU sync.
The Vulkan backend could potentially end up building a single command buffer for 4-5ms if a pass has enough draws, such as in BOTW. In the scenario where sync is waited on immediately after submission, this would have to wait for the completion of a much longer command buffer than usual.
The solution is to force command buffer submission periodically in a "fast flush" mode. This will end up splitting render passes, but it will only enable if sync is aggressive enough.
This should improve performance in GPU limited scenarios, or in games that aggressively wait on synchronization. In some games, it may only kick in when res scaling. It won't trigger in games like SMO where sync is not an issue.
Improves performance in Pokemon Scarlet/Violet (res scaled) and BOTW (in general).
* Add conversions in milliseconds next to flush timers.
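A loose sketch of the "fast flush" idea described above, in Python. This is not the AutoFlushCounter implementation: the class name, the per-frame sync counting and both thresholds are assumptions made up for illustration. It only shows the shape of the heuristic, where periodic flushing inside a render pass is enabled only once sync waits become frequent enough.

```python
class FastFlushSketch:
    """Hypothetical heuristic: flush mid-pass only when sync is aggressive."""

    def __init__(self, sync_threshold: int = 3, draws_per_flush: int = 100):
        self.sync_threshold = sync_threshold    # assumed: syncs per frame to enable
        self.draws_per_flush = draws_per_flush  # assumed: draws between forced flushes
        self.syncs_this_frame = 0
        self.draws_since_flush = 0
        self.fast_flush = False

    def register_sync_wait(self) -> None:
        """Called whenever the guest waits on GPU sync."""
        self.syncs_this_frame += 1

    def end_frame(self) -> None:
        """Re-evaluate the mode once per frame and reset the counters."""
        self.fast_flush = self.syncs_this_frame >= self.sync_threshold
        self.syncs_this_frame = 0
        self.draws_since_flush = 0

    def should_flush_on_draw(self) -> bool:
        """Return True when the command buffer should be submitted early."""
        self.draws_since_flush += 1
        if self.fast_flush and self.draws_since_flush >= self.draws_per_flush:
            self.draws_since_flush = 0
            return True
        return False
```

In a game with little sync the mode never turns on, so render passes are not split; with frequent sync waits, long passes get submitted in smaller pieces.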
commit 9ef94c8292
Author: riperiperi <rhy3756547@hotmail.com>
Date: Tue Apr 11 07:55:04 2023 +0100
ARMeilleure: Move TPIDR_EL0 and TPIDRRO_EL0 to NativeContext (#4661)
* ARMeilleure: Move TPIDR_EL0 and TPIDRRO_EL0 to NativeContext
Some games access these system registers several tens of thousands of times in a second from many different threads. While this isn't really crippling, it is a lot of wasted time spent in a reverse pinvoke transition.
Example games are Pokemon Scarlet/Violet and BOTW. These games have a lot of different potential bottlenecks so it's unlikely you will see a consistent improvement, but it definitely disappears from the cpu profile.
* Remove unreachable code.
* Add ulong conversion for offsets
* Nit
commit 915d6d044c
Author: riperiperi <rhy3756547@hotmail.com>
Date: Tue Apr 11 07:32:31 2023 +0100
OpenGL: Fix OBS/Overlays again by binding FB before present (#4668)
This seems to have been removed by the Post-Processing PR, but it is required for the display in OBS to be the right way up and properly scaled.
I've tested this with AA and FSR on MK8D and it seems to behave properly. Testing is welcome.
commit a4780ab33b
Author: MutantAura <44103205+MutantAura@users.noreply.github.com>
Date: Mon Apr 10 23:04:31 2023 +0100
Force activate parent window before dialog is shown (#4663)
* Fix build
Extraction dialogue not working
* Avalonia Preview 7
Needs Fluent Avalonia update still…
* Fix Render Scaling
* Update Fluent Avalonia
* Remove `pfs0` as runnable file type
* Restore Info.plist formatting
* Plist Format
* Update Avalonia.Svg.Skia
* Update theme code (TODO)
* switch to using theme variants for light/dark
* Fix crashes
* Text centering issues
* Update `TitleUpdateViewModel` to StorageProvider API
* Fixed for new PR
(Will crash on launch)
* Fixes…
* UI: Fix sections extraction (#4820)
* UI: Fix sections extraction
There is currently an issue when the update NCA doesn't contain the section we want to extract; this is fixed by adding a check.
I have fixed the inverted handler of ExeFs/Logo introduced in #4755.
Fixes #4521
* Addresses feedback
* Fix issues…
* Preview 8
* Fix fuck ups
* Fixes
* More cleanup
* Ava 11 RC
Maybe there is a god
* Update FluentAvalonia
* update svg
* Second RC (kill me)
* It builds
* Ava 11
* Remove unnecessary usings
* Fix build
* Formatting
* GAS GAS GAS!!!!
* Fix DLC Window Crash
* Linux runner try not to crash challenge (impossible)
* Add app.manifest
* Fix accidental Silk.NET.Vulkan bump
* Try fix truncation
* Linux fix popup Windows
* Fix cutoff text on windows
* Status bar styling fixes
* Volume Toggle Split Button Fixes
* Fix load bar color
* Fix shortcuts
* Best we're gonna get
* Fix spacing
Co-authored-by: Exhigh <exhigh01@gmail.com>
* Formatting
* Fix Profile Dropdown
* Fix Window Startup Position
* Format Fixes
* Fix stupid mistake
* Fix accidental change
* Scaling Handler (peri pls make sure is working)
* Remove Locale Reflection Binding Use + Unused Usings
* Fix formatting
Code styling
Ughhhh
Fix interface
Make TimeZoneConverter internal
* Remove bell workaround (no longer needed)
* Disable accent menu
* Update to Ava 11.0.2
* Peri suggestions
* Formatting
* Cleanup a bunch of jank
* Dependency update
* Berry fixes and suggestions
* Final suggestions
* Rename assemblyIdentity to Ryujinx.Emulator.Avalonia
---------
Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com>
Co-authored-by: Ac_K <Acoustik666@gmail.com>
Co-authored-by: Exhigh <exhigh01@gmail.com>
Co-authored-by: TSR Berry <20988865+TSRBerry@users.noreply.github.com>
* GPU: Don't sync/bind index buffer when it's not in use
Sometimes draws don't use an index buffer. It's not necessary to check or upload data for the current index buffer binding as it won't be used.
This fixes Pokemon: Legends Arceus updating a stale index buffer for every draw during its TFB pass, which was all non-indexed draws.
This probably didn't cost much on normal PCs, but it had a large impact on macOS, which the macos1 release build avoided by mirroring index buffers (the PR currently does not). Buffer mirrors are still needed for the rest of the performance.
There are additional cases where index buffers are bound or checked with non-indexed draws on the backend, but this one was straightforward to fix and has the largest impact. Testing is welcome to ensure nothing weird broke.
* Fix case with _rebind
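A small sketch of the index buffer change above, in Python with made-up names (`DrawStateSketch`, `prepare_draw`); the real state updater is more involved. The only point shown is that the index buffer is synced when the draw is indexed and skipped otherwise.

```python
class DrawStateSketch:
    """Hypothetical draw state that counts how often each buffer is synced."""

    def __init__(self):
        self.vertex_buffer_syncs = 0
        self.index_buffer_syncs = 0

    def sync_vertex_buffers(self) -> None:
        self.vertex_buffer_syncs += 1

    def sync_index_buffer(self) -> None:
        # In a real backend this would check for modified data and upload it.
        self.index_buffer_syncs += 1

    def prepare_draw(self, indexed: bool) -> None:
        """Sync only the bindings the draw will actually read."""
        self.sync_vertex_buffers()
        if indexed:
            self.sync_index_buffer()


if __name__ == "__main__":
    state = DrawStateSketch()
    for _ in range(3):
        state.prepare_draw(indexed=False)  # e.g. a TFB pass of non-indexed draws
    state.prepare_draw(indexed=True)
    print(state.vertex_buffer_syncs, state.index_buffer_syncs)
```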
* checks: Add retry logic to dotnet format style step as well
I can't imagine dotnet format whitespace ever segfaulting,
so hopefully it won't be needed there.
* checks: Replace bash scripts with unstable-commands action
* build: Add unstable-commands action for test step
* Fix incorrect fragment origin when YNegate is enabled
* Shader cache version bump
* Do not update support buffer if shader does not read gl_FragCoord
* Pass unscaled viewport size to the support buffer