* Add Texture Size Capacity and 8GB Dram Build
* Update AutoDeleteCache.cs
* Dynamic Texture Cache (WIP)
* Change to float Multiplier, in-case it needs fine-tuning.
* Delete src/src.sln
* Update AutoDeleteCache.cs
* Format
* Fix Formatting
* Add DefaultTextureSizeCapacity and MemoryScaleFactor
- Also remove redundant New Lines
* Fix 4GB dram crashing
* Format newline
* Refractor
- Added Initialize() function to TextureCache and AutoDeleteCache
- Removed GetMaxTextureCapacity() function and instead added _maxCacheMemoryUsage
- Added private const MaxTextureSizeCapacity to AutoDelete Cache
- Added TextureCache.Initialize() to MemoryManager in order to fetch MaxGpuMemory at the right time.
- Moved and Changed Logger.Info for Gpu Memory to Logger.Notice and Moved it to PrintGpuInformation function.
- Opted to use a ternary operator for the Initialize function, I think it looks cleaner than bunch of if statements.
* Update src/Ryujinx.Graphics.Gpu/Image/AutoDeleteCache.cs
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
* maxMemory to CacheMemory, use Clamp instead of Ternary. Changed MinTextureCapacity 1GiB to 512 MiB
* Update src/Ryujinx.Graphics.Gpu/Image/AutoDeleteCache.cs
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
* Format comment
* comment context
* Increase TextureSize capacity for OpenGL back to 1024
- Added a new const ulong for OpenGLTextureSizeCapacity
* Fix changes from last commit.
* Adjust last OpenGL changes.
* Remove garbage VSC file
* Update src/Ryujinx.Graphics.Gpu/Image/AutoDeleteCache.cs
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
* Update src/Ryujinx.Graphics.Gpu/Image/AutoDeleteCache.cs
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
* Update src/Ryujinx.Graphics.Gpu/Image/AutoDeleteCache.cs
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
---------
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
* Vulkan: Feedback loop improvements
This PR allows the Vulkan backend to detect attachment feedback loops. These are currently used in the following ways:
- Partial use of VK_EXT_attachment_feedback_loop_layout
- All renderable textures have AttachmentFeedbackLoopBitExt
- Compile pipelines with Color/DepthStencil feedback loop flags when present
- Support using FragmentBarrier for feedback loops (fixes regressions from https://github.com/Ryujinx/Ryujinx/pull/7012 )
TODO:
- AMD GPUs may need layout transitions for it to properly allow textures to be used in feedback loops.
- Use dynamic state for feedback loops. The background pipeline will always miss since feedback loop state isn't known on the GPU project.
- How is the barrier dependency flag used? (DXVK just ignores it, there's no vulkan validation...)
- Improve subpass dependencies to fix validation errors
* Mark field readonly
* Add feedback loop dynamic state
* fix: add MoltenVK resolver workaround
fix: add MoltenVK resolver workaround
* Formatting
* Fix more complaints
* RADV dcc workaround
* Use dynamic state properly, cleanup.
* Use aspects flags in more places
* More guarantees for buffer correct placement, defer guest requested buffers
* Split RP on indirect barrier rn
* Better handling for feedback loops.
* Qualcomm barriers suck too
* Fix condition
* Remove unused field
* Allow render pass barriers on turnip for now
* Report base and extra sets from the backend
* Pass texture set index everywhere
* Key textures using set and binding (rather than just binding)
* Start using extra sets for array textures
* Shader cache version bump
* Separate new commands, some PR feedback
* Introduce new manual descriptor set reservation method that prevents it from being used by something else while owned by an array
* Move bind extra sets logic to new method
* Should only use separate array is MaximumExtraSets is not zero
* Format whitespace
* GPU: Migrate buffers on GPU project, pre-emptively flush device local mappings
Essentially retreading #4540, but it's on the GPU project now instead of the backend. This allows us to have a lot more control + knowledge of where the buffer backing has been changed and allows us to pre-emptively flush pages to host memory for quicker readback. It will allow us to do other stuff in the future, but we'll get there when we get there.
Performance greatly improved in Hyrule Warriors: Age of Calamity. Performance notably improved in TOTK (average). Performance for BOTW restored to how it was before #4911, perhaps a bit better.
- Rewrites a bunch of buffer migration stuff. Might want to tighten up how dispose stuff works.
- Fixed an issue where the copy for texture pre-flush would happen _after_ the syncpoint.
TODO: remove a page from pre-flush if it isn't flushed after a certain number of copies.
* Add copy deactivation
* Fix dependent virtual buffers
* Remove logging
* Fix format issues (maybe)
* Vulkan: Remove backing swap
* Add explicit memory access types for most buffers
* Fix typo
* Add device local force expiry, change buffer inheritance behaviour
* General cleanup, OGL fix
* BufferPreFlush comments
* BufferBackingState comments
* Add an extra precaution to BufferMigration
This is very unlikely, but it's important to cover loose ends like this.
* Address some feedback
* Docs
* Add support for bindless textures from shader input (vertex buffer)
* Shader cache version bump
* Format whitespace
* Remove cache entries on pool removal, disable for OpenGL
* PR feedback
* Add support for large sampler arrays on Vulkan
* Shader cache version bump
* Format whitespace
* Move DescriptorSetManager to PipelineLayoutCacheEntry to allow different pool sizes per layout
* Handle array textures with different types on the same buffer
* Somewhat better caching system
* Avoid useless buffer data modification checks
* Move redundant bindings update checking to the backend
* Fix an issue where texture arrays would get the same bindings across stages on Vulkan
* Backport some fixes from part 2
* Fix typo
* PR feedback
* Format whitespace
* Add some missing XML docs
* Move some init logic out of PrintGpuInformation, then delete it
* Disable push descriptors for Intel ARC on Windows
* Re-add PrintGpuInformation just to show it in the log
* WIP barrier batch
* Add store op to image usage barrier
* Dispose the barrier batch
* Fix encoding?
* Handle read and write on the load op barrier.
Load op consumes read accesses but does not add one, as the only other operation that can read is another load.
* Simplify null check
* Insert barriers on program change in case stale bindings are reintroduced
* Not sure how I messed this one up
* Improve location of bindings barrier update
This is also important for emergency deferred clear
* Update src/Ryujinx.Graphics.Vulkan/BarrierBatch.cs
Co-authored-by: Mary Guillemard <thog@protonmail.com>
---------
Co-authored-by: Mary Guillemard <thog@protonmail.com>
* Fix Push Descriptors
* Use push descriptor templates
* Use reserved bindings
* Formatting
* Disable when using MVK
("my heart will go on" starts playing as thousands of mac users shed a tear in unison)
* Introduce limit on push descriptor binding number
The bitmask used for updating push descriptors is ulong, so only 64 bindings can be tracked for now.
* Address feedback
* Fix logic for binding rejection
Should only offset limit when reserved bindings are less than the requested one.
* Workaround pascal and older nv bug
* Add GPU number detection for nvidia
* Only do workaround if it's valid to do so.
* Replace vendor id lookup with driver name
* Create separate field for driver name, handle OpenGL
* Document changes in VulkanPhysicalDevice.cs
* Always display driver over vendor
* Replace Vulkan 1.2 requirement with VK_KHR_driver_properties
* Remove empty line
* Remove redundant unsafe block
* Apply suggestions from code review
---------
Co-authored-by: Ac_K <Acoustik666@gmail.com>
This prevents a small allocation each time this method is called. This is a top 3 SOH allocation during gameplay in most games, and eliminating it is pretty free.
* Pass MultiRange to BufferManager
* Implement support for multi-range buffers using Vulkan sparse mappings
* Use multi-range for remaining buffers, delete old methods
* Assume that more buffers are contiguous
* Dispose multi-range buffers after they are removed from the list
* Properly init BufferBounds for constant and storage buffers
* Do not try reading zero bytes data from an unmapped address on the shader cache + PR feedback
* Fix misaligned sparse buffer offsets
* Null check can be simplified
* PR feedback
* GPU: Add fallback when textureGatherOffsets is not supported.
This PR adds a fallback for GPUs or APIs that don't support an equivalent to the method `textureGatherOffsets`, where each of the 4 gathered texels has an individual offset. This is done by reusing the existing code to handle non-const offsets for texture instructions, though it has also been corrected as there were a few implementation issues.
MoltenVK reports support for this capability, and it didn't error when we initially released the MacOS build, but that has since changed. MVK still reports support, but spirv-cross has been fixed in a way that it _attempts_ to use this capability, but the metal compiler errors since it doesn't exist.
Some other fixes:
- textureGatherOffsets emulation has been changed significantly. It now uses 4 texture sample instructions (not gather), calculates a base texel (i=0 j=0) and adds the offsets onto it before converting into a tex coord. The final result is offset into a texel center, so it shouldn't be subject to interpolation, though this isn't perfect and could have some error with floating point formats with linear sampling. It is subject to texture wrap mode as it should be, which is why texelFetch was not used.
- Maybe gather should be used here with component `w` (i=0, j=0), though this multiplies number of texels fetched by 4... The way it was doing this before _was_ wrong_, but doing it right would avoid issues with texel center precision.
- textureGatherOffset (singular) now performs textureGather with the offset applied to the coords, rather than the slower fallback where each texel is fetched individually.
* Increment shader cache version, remove unused arg
* Use base texture size for gather coord offset.
Implicit LOD for gather is not supported.
* Use 4 texture gathers for offsets emulation
Avoids issues with interpolation at cost of performance
(not sure how bad this is)
* Address Feedback
* Reduce the amount of descriptor pool allocations on Vulkan
* Formatting
* Slice can be simplified
* Make GetDescriptorPoolSizes static
* Adjust CanFit calculation so that TryAllocateDescriptorSets never fails
* Remove unused field
* Replace image barriers inside render pass with more generic memory barrier
* Remove forceStorage since it was creating images with storage bit for formats that are not StorageImage compatible
* Add missing flags on subpass dependency
* Don't call vkCmdSetScissor with a scissor count of 0
* One semaphore per swapchain image
* Remove compute stage from read to write barriers
* Try to improve Pipeline.Barrier nonsense
* Set PipelineStateFlags based on supported stages
* Implement vertex and geometry shader conversion to compute
* Call InitializeReservedCounts for compute too
* PR feedback
* Set clip distance mask for geometry and tessellation shaders too
* Transform feedback emulation only for vertex
* Move shuffle handling out of the backend to a transform pass
* Handle subgroup sizes higher than 32
* Stop using the subgroup size control extension
* Make GenerateShuffleFunction static
* Shader cache version bump
* Vulkan: Periodically free regions of the staging buffer
There was an edge case where a game could submit tens of thousands of small copies over the course of over half a minute to unique fences. This could result in a large stutter when the staging buffer became full and it tried to check and free thousands of completed fences.
This became visible with some games and mirrors on Windows, as they don't submit any buffer data via the staging buffer, but may submit copies of the support buffer.
This change makes the Vulkan backend check for staging buffer completion on each command buffer submit, so it can't get backed up with 1000s of copies to check.
* Add comment
* Initial implementation of buffer mirrors
Generally slower right now, goal is to reduce render passes in games that do inline updates
Fix support buffer mirrors
Reintroduce vertex buffer mirror
Add storage buffer support
Optimisation part 1
More optimisation
Avoid useless data copies.
Remove unused cbIndex stuff
Properly set write flag for storage buffers.
Fix minor issues
Not sure why this was here.
Fix BufferRangeList
Fix some big issues
Align storage buffers rather than getting full buffer as a range
Improves mirrorability of read-only storage buffers
Increase staging buffer size, as it now contains mirrors
Fix some issues with buffers not updating
Fix buffer SetDataUnchecked offset for one of the paths when using mirrors
Fix buffer mirrors interaction with buffer textures
Fix mirror rebinding
Move GetBuffer calls on indirect draws before BeginRenderPass to avoid draws without render pass
Fix mirrors rebase
Fix rebase 2023
* Fix crash when using stale vertex buffer
Similar to `Get` with a size that's too large, just treat it as a clamp.
* Explicitly set support buffer as mirrorable
* Address feedback
* Remove unused fragment of MVK workaround
* Replace logging for staging buffer OOM
* Address format issues
* Address more format issues
* Mini cleanup
* Address more things
* Rename BufferRangeList
* Support bounding range for ClearMirrors and UploadPendingData
* Add maximum size for vertex buffer mirrors
* Enable index buffer mirrors
Enabled on all platforms for the IbStreamer.
* Feedback
* Remove mystery BufferCache change
Probably macos related?
* Fix mirrors not creating when staging buffer is empty.
* Change log level to debug
* Move support buffer update out of the backends
* Fix render scale init and remove redundant state from SupportBufferUpdater
* Stop passing texture scale to the backends
* XML docs for SupportBufferUpdater
* dotnet format style --severity info
Some changes were manually reverted.
* dotnet format analyzers --serverity info
Some changes have been minimally adapted.
* Restore a few unused methods and variables
* Silence dotnet format IDE0060 warnings
* Silence dotnet format IDE0059 warnings
* Address dotnet format CA1816 warnings
* Fix new dotnet-format issues after rebase
* Address most dotnet format whitespace warnings
* Apply dotnet format whitespace formatting
A few of them have been manually reverted and the corresponding warning was silenced
* Format if-blocks correctly
* Another rebase, another dotnet format run
* Run dotnet format whitespace after rebase
* Run dotnet format style after rebase
* Run dotnet format analyzers after rebase
* Run dotnet format style after rebase
* Run dotnet format after rebase and remove unused usings
- analyzers
- style
- whitespace
* Disable 'prefer switch expression' rule
* Add comments to disabled warnings
* Simplify properties and array initialization, Use const when possible, Remove trailing commas
* Run dotnet format after rebase
* Address IDE0251 warnings
* Address a few disabled IDE0060 warnings
* Silence IDE0060 in .editorconfig
* Revert "Simplify properties and array initialization, Use const when possible, Remove trailing commas"
This reverts commit 9462e4136c0a2100dc28b20cf9542e06790aa67e.
* dotnet format whitespace after rebase
* First dotnet format pass
* Fix naming rule violations
* Remove redundant code
* Rename generics
* Address review feedback
* Remove SetOrigin
* Implement transform feedback emulation for hardware without native support
* Stop doing some useless buffer updates and account for non-zero base instance
* Reduce redundant updates even more
* Update descriptor init logic to account for ResourceLayout
* Fix transform feedback and storage buffers not being updated in some cases
* Shader cache version bump
* PR feedback
* SetInstancedDrawVertexCount must be always called after UpdateState
* Minor typo
* Add support for VK_EXT_depth_clip_control.
* Code review feedback
Minor formatting
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
* Check .DepthClipControl to make sure the host actually supports the feature.
* Review feedback: remove Vulkan platform switch, relying on QueryHostSupportsDepthClipControl to drive the behaviour - OpenGL returns true, and any future platforms that don't support the [-1, 1] depth mode can return false for the transformation.
---------
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
* fix crash when Vulkan isn't available
* add VulkanRenderer.GetPhysicalDevices() overload that provides its own Vk API object and logs on failure
* adjustments per AcK77
* Introduce ResourceLayout
* Part 1: Use new ResourceSegments array on UpdateAndBind
* Part 2: Use ResourceLayout to build PipelineLayout
* Delete old code
* XML docs
* Fix shader cache load NRE
* Fix typo
* WIP texture pre-flush
Improve performance of TextureView GetData to buffer
Fix copy/sync ordering
Fix minor bug
Make this actually work
WIP host mapping stuff
* Fix usage flags
* message
* Cleanup 1
* Fix rebase
* Fix
* Improve pre-flush rules
* Fix pre-flush
* A lot of cleanup
* Use the host memory bits
* Select the correct memory type
* Cleanup TextureGroupHandle
* Missing comment
* Remove debugging logs
* Revert BufferHandle _value access modifier
* One interrupt action at a time.
* Support D32S8 to D24S8 conversion, safeguards
* Interrupt cannot happen in sync handle's lock
Waitable needs to be checked twice now, but this should stop it from deadlocking.
* Remove unused using
* Address some feedback
* Address feedback
* Address more feedback
* Address more feedback
* Improve sync rules
Should allow for faster sync in some cases.