Commit graph

61 commits

Author SHA1 Message Date
Ac_K
85faa9d8fa
Revert "Relax Vulkan requirements (#4228)" (#4279)
This reverts commit dca5b14493.
2023-01-13 06:04:59 +00:00
gdkchan
dca5b14493
Relax Vulkan requirements (#4228) 2023-01-13 06:09:48 +01:00
riperiperi
8fa248ceb4
Vulkan: Add workarounds for MoltenVK (#4202)
* Add MVK basics.

* Use appropriate output attribute types

* 4kb vertex alignment, bunch of fixes

* Add reduced shader precision mode for mvk.

* Disable ASTC on MVK for now

* Only request robustness2 when it is available.

* It's just the one feature actually

* Add triangle fan conversion

* Allow NullDescriptor on MVK for some reason.

* Force safe blit on MoltenVK

* Use ASTC only when formats are all available.

* Disable multilevel 3d texture views

* Filter duplicate render targets (on backend)

* Add Automatic MoltenVK Configuration

* Do not create color attachment views with formats that are not RT compatible

* Make sure that the host format matches the vertex shader input types for invalid/unknown guest formats

* Fix rebase for Vertex Attrib State

* Fix 4b alignment for vertex

* Use asynchronous queue submits for MVK

* Ensure color clear shader has correct output type

* Update MoltenVK config

* Always use MoltenVK workarounds on MacOS

* Make MVK supersede all vendors

* Fix rebase

* Various fixes on rebase

* Get portability flags from extension

* Fix some minor rebasing issues

* Style change

* Use LibraryImport for MVKConfiguration

* Rename MoltenVK vendor to Apple

Intel and AMD GPUs on MoltenVK report with those vendors; only Apple silicon reports with vendor 0x106B.

* Fix features2 rebase conflict

* Rename fragment output type

* Add missing check for fragment output types

Might have caused the crash in MK8

* Only do fragment output specialization on MoltenVK

* Avoid copy when passing capabilities

* Self feedback

* Address feedback

Co-authored-by: gdk <gab.dark.100@gmail.com>
Co-authored-by: nastys <nastys@users.noreply.github.com>
2023-01-13 01:31:21 +01:00
riperiperi
e20abbf9cc
Vulkan: Don't flush commands when creating most sync (#4087)
* Vulkan: Don't flush commands when creating most sync

When the WaitForIdle method is called, we create sync as some internal GPU method may read back written buffer data. Some games randomly intersperse compute dispatch into their render passes, which results in this happening an unbounded number of times depending on how many times they run compute.

Creating sync in Vulkan is expensive, as we need to flush the current command buffer so that it can be waited on. We have a limited number of active command buffers due to how we track resource usage, so submitting too many command buffers will force us to wait for them to return to the pool.

This PR allows less "important" sync (things which are less likely to be waited on) to wait on a command buffer's result without submitting it, instead relying on AutoFlush or another, more important sync to flush it later on.

Because of the possibility of us waiting for a command buffer that hasn't submitted yet, any thread needs to be able to force the active command buffer to submit. The ability to do this has been added to the backend multithreading via an "Interrupt", though it is not supported without multithreading.

OpenGL drivers should already be doing something similar so they don't blow up when creating lots of sync, which is why this hasn't been a problem for these games over there.

Improves Vulkan performance on Xenoblade DE, Pokemon Scarlet/Violet, and Zelda BOTW (still another large issue here)

* Add strict argument

This is technically a separate concern from whether the sync is a host syncpoint.

* Remove _interrupted variable

* Actually wait for the invoke

This is required by AMD GPUs, and also may have caused some issues on other GPUs.

* Remove unused using.

* I don't know why it added these ones.

* Address Feedback

* Fix typo
2022-12-29 15:39:04 +01:00
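The deferred-sync idea described in e20abbf9cc above can be sketched roughly as follows. This is a simplified Python model, not the project's C# code: `Renderer`, `Sync`, `CommandBuffer`, and `flush_count` are hypothetical names, and the cross-thread "Interrupt" is reduced to a direct call.

```python
class CommandBuffer:
    """Stand-in for a command buffer; only tracks whether it was submitted."""
    def __init__(self):
        self.submitted = False


class Renderer:
    """Hypothetical backend that counts how often it flushes."""
    def __init__(self):
        self.current = CommandBuffer()
        self.flush_count = 0

    def flush(self):
        # Submit the active command buffer and start a new one.
        self.current.submitted = True
        self.current = CommandBuffer()
        self.flush_count += 1

    def create_sync(self, strict):
        cb = self.current
        if strict:
            # "Important" sync: flush immediately so it can be waited on.
            self.flush()
        # Non-strict sync just remembers which command buffer it depends on.
        return Sync(self, cb)


class Sync:
    def __init__(self, renderer, command_buffer):
        self.renderer = renderer
        self.command_buffer = command_buffer

    def wait(self):
        # If nobody flushed the command buffer yet, force a submit now
        # (the real change does this through an interrupt on the backend thread).
        if not self.command_buffer.submitted:
            self.renderer.flush()
```

In this model, creating many non-strict syncs costs no flushes at all; a flush happens only when one of them is actually waited on, or when a strict sync is created.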
riperiperi
470be03c2f
GPU: Add fallback when 16-bit formats are not supported (#4108)
* Add conversion for 16 bit RGBA formats (not supported in Rosetta)

* Rebase fix

Rebase fix

* Forgot to remove this

* Fix RGBA16 format conversion

* Add RGBA4 -> RGBA8 conversion

* Handle host stride alignment

* Address Feedback Part 1

* Can't count

* Don't zero out rgb when alpha is 0

* Separate RGBA4 and 5-bit component formats

Not sure of a better way to name them...

* Add A1B5G5R5 conversion

* Put this in the right place.

* Make format naming consistent for capabilities

* Change method names
2022-12-26 15:50:27 -03:00
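As an illustration of the RGBA4 -> RGBA8 conversion mentioned in 470be03c2f above, here is a minimal Python sketch. The bit layout (R in the low nibble, A in the high nibble) is an assumption made for the example, and the function names are hypothetical; the commit does not spell out either.

```python
def expand4(value):
    """Expand a 4-bit component to 8 bits by replicating the nibble (0xF -> 0xFF)."""
    return (value << 4) | value


def rgba4_to_rgba8(pixel):
    """Convert one 16-bit RGBA4 pixel to four RGBA8 bytes.

    Assumed layout: R in bits 0-3, G in bits 4-7, B in bits 8-11, A in bits 12-15.
    """
    r = pixel & 0xF
    g = (pixel >> 4) & 0xF
    b = (pixel >> 8) & 0xF
    a = (pixel >> 12) & 0xF
    # Colour channels are converted independently of alpha, so RGB is
    # preserved even when alpha is 0.
    return bytes((expand4(r), expand4(g), expand4(b), expand4(a)))
```

Nibble replication maps 0 to 0 and 15 to 255 exactly, which a plain left shift by 4 would not.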
Hunter
c963b3c804
Added Generic Math to BitUtils (#3929)
* Generic Math Update

Updated Several functions in Ryujinx.Common/Utilities/BitUtils to use generic math

* Updated BitUtil calls

* Removed Whitespace

* Switched decrement

* Fixed changed method calls.

The method calls were originally changed by accident because I relied too much on IntelliSense doing things for me.

* Update Ryujinx.Common/Utilities/BitUtils.cs

Co-authored-by: gdkchan <gab.dark.100@gmail.com>
2022-12-26 14:11:05 +00:00
gdkchan
f906eb06c2
Implement a software ETC2 texture decoder (#4121)
* Implement a software ETC2 texture decoder

* Fix output size calculation for non-2D textures

* Address PR feedback
2022-12-21 20:39:58 -03:00
Georg Lehmann
0f50de72be
Vulkan: enable VK_EXT_custom_border_color features (#4116)
* Vulkan: enable VK_EXT_custom_border_color features

radv only creates the border color BO if this feature is enabled, so it crashed when creating samplers with custom border colors.
Fixes #4072
Fixes #3993

* Address gdkchan's comment

Co-authored-by: Mary <mary@mary.zone>
2022-12-14 20:53:33 -03:00
Andrey Sukharev
535fbec675
Use NuGet Central Package Management to manage package versions solution-wise (#4095) 2022-12-12 16:03:10 +01:00
Isaac Marovitz
851d81d24a
Fix Redundant Qualifier Warnings (#4091)
* Fix Redundant Qualifier Warnings

* Remove unnecessary using
2022-12-10 21:21:13 +01:00
riperiperi
e211c3f00a
UI: Add Metal surface creation for MoltenVK (#3980)
* Initial implementation of metal surface across UIs

* Fix SDL2 on windows

* Update Ryujinx/Ryujinx.csproj

Co-authored-by: Mary-nyan <thog@protonmail.com>

* Address Feedback

Co-authored-by: Mary-nyan <thog@protonmail.com>
2022-12-06 19:00:25 -03:00
Andrey Sukharev
4da44e09cb
Make structs readonly when applicable (#4002)
* Make all structs readonly when applicable. This should reduce the number of needless defensive copies

* Make structs with trivial boilerplate equality code record structs

* Remove unnecessary readonly modifiers from TextureCreateInfo

* Make BitMap structs readonly too
2022-12-05 14:47:39 +01:00
Mary-nyan
ae13f0ab4d
misc: Fix obsolete warnings in Ryujinx.Graphics.Vulkan (#4020)
Was caused by some merges after the Silk.NET update
2022-12-05 12:57:11 +00:00
gdkchan
17a1cab5d2
Allow SNorm buffer texture formats on Vulkan (#3957)
* Allow SNorm buffer texture formats on Vulkan

* Shader cache version bump
2022-12-04 15:36:03 -03:00
gdkchan
73aed239c3
Implement non-MS to MS copies with draws (#3958)
* Implement non-MS to MS copies with draws, simplify MS to non-MS copies and supports any host sample count

* Remove unused program
2022-12-04 15:07:11 -03:00
Andrey Sukharev
3868a00206
Use source generated regular expressions (#4005) 2022-12-04 00:43:23 +00:00
Mary-nyan
ce92e8cd04
chore: Update Silk.NET to 2.16.0 (#3953) 2022-12-01 19:11:56 +01:00
riperiperi
458452279c
GPU: Track buffer migrations and flush source on incomplete copy (#3952)
* Track buffer migrations and flush source on incomplete copy

Makes sure that the modified range list is always from the latest iteration of the buffer, and flushes earlier iterations of a buffer if the data has not been migrated yet.

* Cleanup 1

* Reduce cost for redundant signal checks on Vulkan

* Only inherit the range list if there are pending ranges.

* Fix OpenGL

* Address Feedback

* Whoops
2022-12-01 16:30:13 +01:00
gdkchan
4905101df1
Remove shader dependency on SPV_KHR_shader_ballot and SPV_KHR_subgroup_vote extensions (#3943)
* Remove shader dependency on SPV_KHR_shader_ballot and SPV_KHR_subgroup_vote extensions

* Shader cache version bump
2022-11-30 18:24:15 -03:00
Mary-nyan
d41c95dcff
chore: Update OpenTK to 4.7.5 (#3944) 2022-11-29 13:32:40 +00:00
riperiperi
1fc0f569de
GPU: Always draw polygon topology as triangle fan (#3932)
Polygon topology wasn't really supported and would only work on OpenGL on drivers that haven't removed it. As an alternative, this PR makes all cases of polygon topology use triangle fan. The topology type and transform feedback type have not been changed, as I don't think geo shader/tfb should be used with polygons.

The OpenGL spec states:
Only convex polygons are guaranteed to be drawn correctly by the GL.

For convex polygons, triangle fan is equivalent to polygon. I imagine this is probably how it works on device, as this get-out-of-jail-free card is too enticing to pass up.

This fixes the stat display in Pokemon S/V.
2022-11-28 19:18:22 -03:00
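The equivalence that 1fc0f569de above relies on (a convex polygon drawn as a triangle fan) can be sketched in Python. The function name is hypothetical and the snippet only produces index triples; it is not the emulator's draw path.

```python
def polygon_as_triangle_fan(vertex_count):
    """Return (a, b, c) index triples that cover a convex polygon as a fan.

    Every triangle shares vertex 0; consecutive pairs of the remaining
    vertices supply the other two corners.
    """
    return [(0, i, i + 1) for i in range(1, vertex_count - 1)]
```

A pentagon yields three triangles, `(0, 1, 2)`, `(0, 2, 3)` and `(0, 3, 4)`. For a concave polygon some of these triangles could fall outside the outline, which is why the guarantee quoted from the OpenGL spec only covers convex input.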
Ac_K
a1ddaa2736
ui: Fixes disposing on GTK/Avalonia and Firmware Messages on Avalonia (#3885)
* ui: Only wait on _exitEvent when MainLoop is active under GTK

This fixes a dispose issue under Horizon/GTK: we don't check whether the ApplicationClient is null, so it throws an NCE. We also don't check whether the main loop is active before waiting on an event that is set in the main loop, which could lead to a freeze.

Everything works fine in GTK now.

Related issue: https://github.com/Ryujinx/Ryujinx/issues/3873

As a side note, the same kind of issue appears in the Avalonia UI too: the firmware popup doesn't show anything and the emulator just freezes.

* TSRBerry's change

Co-authored-by: TSRBerry <20988865+TSRBerry@users.noreply.github.com>

* Fix Avalonia crashing/freezing

* Add Avalonia OpenGL fixes

* Fix firmware popup on windows

* Fixes everything

* Add _initialized bool to VulkanRenderer and OpenGL Window

Co-authored-by: TSRBerry <20988865+TSRBerry@users.noreply.github.com>
2022-11-24 15:08:27 +01:00
riperiperi
ece36b274d
GAL: Send all buffer assignments at once rather than individually (#3881)
* GAL: Send all buffer assignments at once rather than individually

The `(int first, BufferRange[] ranges)` method call has very significant performance implications when the bindings are spread out, which they almost always are in Vulkan. This change makes it so that these methods are called at most once per draw.

Significantly improves GPU thread performance in Pokemon Scarlet/Violet.

* Address Feedback

Removed SetUniformBuffers(int first, ReadOnlySpan<BufferRange> buffers)
2022-11-24 07:50:59 +00:00
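To see why spread-out bindings were costly in the situation ece36b274d above describes, consider this small Python sketch. It assumes, for illustration only, that an API taking `(first, ranges)` can bind one contiguous run of slots per call; `contiguous_runs` is a hypothetical helper that counts how many such calls a set of slots would need.

```python
def contiguous_runs(slots):
    """Group slot numbers into (first, count) runs of consecutive slots.

    Each run corresponds to one call of a (first, ranges)-style API.
    """
    runs = []
    for slot in sorted(slots):
        if runs and slot == runs[-1][0] + runs[-1][1]:
            # Slot directly follows the previous run: extend it.
            first, count = runs[-1]
            runs[-1] = (first, count + 1)
        else:
            # Gap before this slot: a new call would be needed.
            runs.append((slot, 1))
    return runs
```

For slots `[0, 1, 4, 9, 10]` this gives three runs, so three calls under the assumed old scheme, where sending every assignment together needs only one.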
gdkchan
2e43d01d36
Move gl_Layer from vertex to geometry if GPU does not support it on vertex (#3866)
* Move gl_Layer from vertex to geometry if GPU does not support it on vertex

* Shader cache version bump

* PR feedback
2022-11-18 23:27:54 -03:00
riperiperi
7373ec5792
Vulkan: Clear dummy texture to (0,0,0,0) on creation (#3867)
This might fix an issue with AMD GPUs on Linux where the data could contain random garbage. On the Switch, it always samples as 0.
2022-11-18 23:11:34 -03:00
riperiperi
131baebe2a
Vulkan: Don't create preload command buffer outside a render pass (#3864)
* Vulkan: Don't create preload buffer outside a render pass

The preload command buffer is used to avoid render pass splits and barriers when updating buffer data. However, when a render pass is not active (for example, at the start of a pass, or during compute invocations) buffer uploads can be performed at any time, so the optimization isn't as useful.

This PR makes it so that the preload command buffer is only used for buffer updates inside of a render pass. It's still used for textures as I don't want to shake things up right now regarding how the preload buffer is obtained before some other changes, and texture updates are a lot rarer anyways.

Improves performance slightly in Pokemon Scarlet/Violet (43 -> 48), as it was switching to compute, writing a bunch of buffers inline, then dispatching, then flushing commands... It uses 1 command buffer instead of 2 every time it does this now. Maybe it would be nice to find a faster way to sync without creating so many command buffers in a short period of time.

* Address feedback
2022-11-18 14:58:56 +00:00
Wunk
d536cc8ae6
Update units of memory from decimal to binary prefixes (#3716)
`MB` and `GB` can be interpreted as either base-10 or base-2 units.
`MiB` and `GiB` remove this ambiguity so that units of memory
are always interpreted using base-2 units.
2022-11-16 23:27:42 +01:00
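The size of the ambiguity that d536cc8ae6 above removes is easy to check with a few lines of Python. `format_binary` is a hypothetical helper written for this example, not something from the project.

```python
# Decimal (SI) and binary (IEC) unit sizes, in bytes.
MB = 10**6
GB = 10**9
MiB = 2**20
GiB = 2**30


def format_binary(size_bytes):
    """Format a byte count using binary prefixes only."""
    if size_bytes >= GiB:
        return f"{size_bytes / GiB:.2f} GiB"
    return f"{size_bytes / MiB:.2f} MiB"


# The same amount of memory reads differently under the two conventions.
four_gib = 4 * GiB
print(format_binary(four_gib), "is", f"{four_gib / GB:.2f}", "decimal GB")
```

The gap grows with the prefix: one MiB is about 4.9% larger than one MB, and one GiB is about 7.4% larger than one GB.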
gdkchan
f1d1670b0b
Implement HLE macro for DrawElementsIndirect (#3748)
* Implement HLE macro for DrawElementsIndirect

* Shader cache version bump

* Use GL_ARB_shader_draw_parameters extension on OpenGL

* Fix DrawIndexedIndirectCount on Vulkan when extension is not supported

* Implement DrawIndex

* Alignment

* Fix some validation errors

* Rename BaseIds to DrawParameters

* Fix incorrect index buffer and vertex buffer size in some cases

* Add HLE macros for DrawArraysInstanced and DrawElementsInstanced

* Perform a regular draw when indirect data is not modified

* Use non-indirect draw methods if indirect buffer was not GPU modified

* Only check if draw parameters match if the shader actually uses them

* Expose Macro HLE setting on GUI

* Reset FirstVertex and FirstInstance after draw

* Update shader cache version again since some people already tested this

* PR feedback

Co-authored-by: riperiperi <rhy3756547@hotmail.com>
2022-11-16 14:53:04 -03:00
gdkchan
a6a67a2b7a
Minor improvement to Vulkan pipeline state and bindings management (#3829)
* Minor improvement to Vulkan pipeline state and bindings management

* Clean up buffer textures too

* Use glBindTextureUnit
2022-11-10 13:38:38 -03:00
Mary-nyan
c6d05301aa
infra: Migrate to .NET 7 (#3795)
* Update readme to mention .NET 7

* infra: Migrate to .NET 7

.NET 7 is still in preview, but this prepares for the release coming up
next month.

* Use Random.Shared in CreateRandom

* Move UInt128Utils.cs to Ryujinx.Common project

* Fix inverted parameters in System.UInt128 constructor

* Fix Visual Studio complaints on Ryujinx.Graphics.Vic

* time: Fix missing alignment enforcement in SystemClockContext

Fixes at least Smash

* time: Fix missing alignment enforcement in SteadyClockContext

Fixes games (like recent versions of Smash) using time shared memory

* Switch to .NET 7.0.100 release

* Enable Tiered PGO

* Ensure CreateId validity requirements are met when doing random generation

Also enforce correct packing layout for other Mii structures.

This fixes Mario Kart 8 crashes related to the default Miis.
2022-11-09 20:22:43 +01:00
gdkchan
f82309fa2d
Vulkan: Implement multisample <-> non-multisample copies and depth-stencil resolve (#3723)
* Vulkan: Implement multisample <-> non-multisample copies and depth-stencil resolve

* FramebufferParams is no longer required there

* Implement Specialization Constants and merge CopyMS Shaders (#15)

* Vulkan: Initial Specialization Constants

* Replace with specialized helper shader

* Reimplement everything

Fix nonexistent interaction with Ryu pipeline caching
Decouple specialization info from data and relocate them
Generalize mapping and add type enum to better match spv types
Use local fixed scopes instead of global unmanaged allocs

* Fix misses in initial implementation

Use correct info variable in Create2DLayerView
Add ShaderStorageImageMultisample to required feature set

* Use texture for source image

* No point in using ReadOnlyMemory

* Apply formatting feedback

Co-authored-by: gdkchan <gab.dark.100@gmail.com>

* Apply formatting suggestions on shader source

Co-authored-by: gdkchan <gab.dark.100@gmail.com>

* Support conversion with samples count that does not match the requested count, other minor changes

Co-authored-by: mageven <62494521+mageven@users.noreply.github.com>
2022-11-02 18:17:19 -03:00
Wunk
3fe3598d41
Vulkan: Replace VK_EXT_debug_report usage with VK_EXT_debug_utils (#3802)
* Vulkan: Replace `VK_EXT_debug_report` usage with `VK_EXT_debug_utils`

[VK_EXT_debug_report](https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_debug_report.html)
has been deprecated for quite some time now in favor of the much more
featureful
[VK_EXT_debug_utils](https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_debug_utils.html)
extension.

This PR converts our debug-report-callback into the newer
debug-messenger pattern.

`VK_EXT_debug_utils` adds some additional diagnostic tooling for marking
debug-label scopes for queue-operations, command-buffers, and assigning
name-labels to Vulkan objects to aid in debugging (for a later PR).

* Vulkan: Fix `DebugMessenger` severity-flag classification

Extension bits between the two flags, for reference:

https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkDebugUtilsMessageSeverityFlagBitsEXT.html

https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkDebugReportFlagBitsEXT.html
2022-10-29 14:09:25 -03:00
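The severity-flag fix in 3fe3598d41 above matters because the two extensions use different bit values. The constants below are the values given in the Vulkan specification pages linked in that commit; the `classify` function and its log-level strings are illustrative assumptions, not the project's actual callback.

```python
# VkDebugReportFlagBitsEXT (old extension)
REPORT_INFORMATION = 0x01
REPORT_WARNING = 0x02
REPORT_PERFORMANCE_WARNING = 0x04
REPORT_ERROR = 0x08
REPORT_DEBUG = 0x10

# VkDebugUtilsMessageSeverityFlagBitsEXT (new extension)
UTILS_VERBOSE = 0x0001
UTILS_INFO = 0x0010
UTILS_WARNING = 0x0100
UTILS_ERROR = 0x1000


def classify(severity):
    """Map a debug-utils severity mask to a log level (hypothetical names)."""
    if severity & UTILS_ERROR:
        return "error"
    if severity & UTILS_WARNING:
        return "warning"
    if severity & UTILS_INFO:
        return "info"
    return "debug"
```

Testing a debug-utils severity against the old debug-report bits would misclassify messages: for instance, `UTILS_INFO` has the same numeric value as `REPORT_DEBUG`, and `UTILS_ERROR` matches none of the old bits.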
gdkchan
28ba55598d
Vulkan: Fix indirect buffer barrier (#3798) 2022-10-26 14:53:11 -03:00
riperiperi
9719b6a112
Vulkan: Use dynamic state for blend constants (#3793) 2022-10-25 23:49:23 +00:00
riperiperi
6e92b7a378
Dispose Vulkan TextureStorage when views hit 0 instead of immediately (#3738)
Due to the `using` statement being scoped to the `CreateTextureView` method, `TextureStorage` would be disposed as soon as the view was returned.

This was largely fine as the TextureStorage resources were being kept alive by the views holding their own references to them, but it also meant that the only Dispose call happened as soon as the texture was created.

Aliased Storages are TextureStorages created with the same allocation as another TextureStorage, if they have to be aliased as another format. We keep track of a TextureStorage's `_aliasedStorages` as they are created, and dispose them when the TextureStorage is disposed...

...except it is disposed immediately, before any aliased storages are even created. The aliased storages added after this will never be disposed.

This PR attempts to fix this by disposing TextureStorage when its view count reaches 0. The other use of texture storage - the D32S8 blit - still manually disposes the storage, but regular uses created via the GAL are now disposed by the view count.

I think this makes the most sense, as otherwise in the future this behaviour might be forgotten and more things could be added to the Dispose() method that don't work because it is not actually called at the right time.

This should reduce memory leaks in Super Mario Odyssey, most noticeably when resolution scaling. The memory usage of the game is still wildly unpredictable due to how it interacts with the texture cache, but now it shouldn't get considerably larger as you play... I hope. I've seen it typically recover back to the same level occasionally, though it can spike significantly.

Please test a bunch of games on multiple GPUs to make sure this doesn't break anything.
2022-10-18 23:52:08 +00:00
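The lifetime rule introduced by 6e92b7a378 above (dispose the storage when its view count reaches 0, taking aliased storages with it) can be modelled in a few lines of Python. This is a simplified sketch with hypothetical method names; the real classes manage Vulkan resources and are written in C#.

```python
class TextureStorage:
    """Model of a storage that is disposed when its last view goes away."""
    def __init__(self):
        self.view_count = 0
        self.disposed = False
        self.aliased_storages = []

    def create_view(self):
        self.view_count += 1
        return TextureView(self)

    def create_aliased_storage(self):
        # Aliases are tracked so they can be disposed with their parent.
        alias = TextureStorage()
        self.aliased_storages.append(alias)
        return alias

    def release_view(self):
        self.view_count -= 1
        if self.view_count == 0:
            self.dispose()

    def dispose(self):
        if self.disposed:
            return
        self.disposed = True
        for alias in self.aliased_storages:
            alias.dispose()


class TextureView:
    def __init__(self, storage):
        self.storage = storage
        self.released = False

    def dispose(self):
        # Guard against double release so the count cannot go negative.
        if not self.released:
            self.released = True
            self.storage.release_view()
```

Because disposal now happens after the last view is released, any aliased storage created during the texture's lifetime is already in the list when `dispose` walks it.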
gdkchan
a6cd044f0f
Vulkan: Fix blit levels/layers parameters being inverted (#3768) 2022-10-18 10:13:44 +02:00
riperiperi
0dbe45ae37
Fix various issues caused by Vertex/Index buffer conversions (#3762)
* Fix various issues caused by #3679

- The arguments for the 0th dummy vertex buffer were incorrect - it was given an offset of 16 rather than a size of 16.
- The wrong size was used when doing `autoBuffer.Get` on a converted vertex buffer.
- The possibility of a vertex buffer being disposed and then rebound can cause rebindings to find a different buffer where the current range is out of bounds. Avoid binding when out of range to prevent validation errors.
- The above also affects generation of converted buffers, which was a bit more fatal. Conversion functions now attempt to bound input offset/size.

* Fix offset for converted buffer
2022-10-16 19:38:58 -03:00
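The bounding of input offset/size mentioned in 0dbe45ae37 above amounts to clamping a range to the buffer it refers to. A minimal Python sketch, with a hypothetical function name and no claim to match the actual conversion code:

```python
def clamp_range(offset, size, buffer_size):
    """Clamp (offset, size) so the range lies entirely inside the buffer."""
    offset = min(max(offset, 0), buffer_size)
    size = min(max(size, 0), buffer_size - offset)
    return offset, size
```

A range that starts past the end of the buffer collapses to size 0, which a caller can treat as "skip this binding".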
riperiperi
2b50e52e48
Fix primitive count calculation for topology conversion (#3763)
Luigi's Mansion 3 performs a non-index quads draw with 6 vertices. It's meant to ignore the last two, but the index pattern's primitive count calculation was rounding up.

No idea why the game does this but this should fix random triangles in the map.
2022-10-16 19:25:40 -03:00
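The rounding issue fixed by 2b50e52e48 above is visible with simple integer arithmetic. The sketch below uses hypothetical function names and assumes each quad is converted to two triangles (six indices).

```python
def quad_count_rounded_up(vertex_count):
    """Old behaviour as described: a partial quad counts as a whole one."""
    return (vertex_count + 3) // 4


def quad_count(vertex_count):
    """Fixed behaviour: leftover vertices that do not form a quad are ignored."""
    return vertex_count // 4


def converted_index_count(vertex_count):
    """Number of triangle indices produced for a non-indexed quads draw."""
    return quad_count(vertex_count) * 6
```

With 6 vertices, rounding up yields 2 quads, and the second one would reference vertices 6 and 7, which do not exist; rounding down yields 1 quad and ignores the last two vertices.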
gdkchan
5af1327068
Vulkan: Fix sampler custom border color (#3751) 2022-10-10 08:35:44 +02:00
riperiperi
bf77d1cab9
GPU: Pass SpanOrArray for Texture SetData to avoid copy (#3745)
* GPU: Pass SpanOrArray for Texture SetData to avoid copy

Texture data is often converted before upload, meaning that an array was allocated to perform the conversion into. However, the backend SetData methods were being passed a Span of that data, and the Multithreaded layer does `ToArray()` on it so that it can be stored for later! This method can't extract the original array, so it creates a copy.

This PR changes the type passed for textures to a new ref struct called SpanOrArray, which is backed by either a ReadOnlySpan or an array. The benefit here is that we can have a ToArray method that doesn't copy if it is originally backed by an array.

This will also avoid a copy when running the ASTC decoder.

On NieR this was taking 38% of texture upload time, which it does a _lot_ of when you move between areas, so there should be a 1.6x performance boost when strictly uploading textures. No doubt this will also improve texture streaming performance in UE4 games, and maybe a small reduction with video playback.

From the numbers, it's probably possible to improve the upload rate by a further 1.6x by performing layout conversion on GPU. I'm not sure if we could improve it further than that - multithreading conversion on CPU would probably result in memory bottleneck.

This doesn't extend to buffers, since we don't convert their data on the GPU emulator side.

* Remove implicit cast to array.
2022-10-08 12:04:47 -03:00
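The copy-avoidance idea behind `SpanOrArray` in bf77d1cab9 above can be approximated in Python, using `bytearray` as the "array" and `memoryview` as the "span". This is only an analogue under those assumptions; the real type is a C# ref struct.

```python
class SpanOrArray:
    """Holds either an owned bytearray or a borrowed memoryview."""
    def __init__(self, data):
        self._data = data

    def to_array(self):
        if isinstance(self._data, bytearray):
            # Already backed by an array: return it directly, no copy.
            return self._data
        # Backed by a borrowed view: a copy is needed to keep the data.
        return bytearray(self._data)
```

A caller that has just converted texture data into a fresh array passes the array itself, so a later `to_array()` in the multithreaded layer returns the same object instead of duplicating it.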
riperiperi
1ca0517c99
Vulkan: Fix some issues with CacheByRange (#3743)
* Fix some issues with CacheByRange

- Cache now clears under more circumstances, the most important being the fast path write.
- Cache supports partial clear which should help when more buffers join.
- Fixed an issue with I8->I16 conversion where it wouldn't register the buffer for use on dispose.

Should hopefully fix issues with https://github.com/Ryujinx/Ryujinx-Games-List/issues/4010 and maybe others.

* Fix collection modified exception

* Fix accidental use of parameterless constructor

* Replay DynamicState when restoring from helper shader
2022-10-08 11:28:27 -03:00
gdkchan
a4fc9f8050
Support use of buffer ranges with size 0 (#3736) 2022-10-03 20:08:38 -03:00
gdkchan
5437d6cb13
Vulkan: Fix buffer texture storage not being updated on buffer handle reuse (#3731) 2022-10-03 19:45:33 -03:00
mageven
96bf7f8522
Avoid allocating unmanaged string per shader (#3730)
* Avoid reallocating same unmanaged string per shader

* Address PR feedback

* Rename to _disposed
2022-10-02 10:59:34 +02:00
riperiperi
f502cfaf62
Vulkan: Zero blend state when disabled or write mask is 0 (#3719)
* Zero blend state when disabled or write mask is 0

Any difference in the blend state when blend is disabled is meaningless, but Ryujinx would compare different disabled blends and compile them as separate pipelines. This change ensures that all pipelines where blend state is meaningless record it as such, which avoids compiling a bunch of pipelines that are essentially identical.

The NVIDIA driver is pretty forgiving when it comes to silly pipeline misses like this, but other drivers don't offer the same level of kindness.

This should reduce stuttering on those drivers, and might improve overall performance very slightly due to fewer pipeline variants being in the hash table.

* Fix blend possibly being wrong when an attachment is unmasked
2022-09-29 12:32:49 -03:00
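The effect of f502cfaf62 above on pipeline lookup can be illustrated by normalizing a state key before hashing it. The `BlendState` fields and the `normalize` function are hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlendState:
    enabled: bool
    write_mask: int
    src_factor: int = 0
    dst_factor: int = 0
    op: int = 0


def normalize(state):
    """Zero the blend parameters when they cannot affect the output."""
    if not state.enabled or state.write_mask == 0:
        return BlendState(enabled=False, write_mask=state.write_mask)
    return state
```

Two disabled states with different leftover factors now compare and hash equal, so a dictionary keyed on the normalized state holds a single pipeline for them instead of two.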
riperiperi
4c0eb91d7e
Convert Quads to Triangles in Vulkan (#3715)
* Add Index Buffer conversion for quads to Vulkan

Also adds a reusable repeating pattern index buffer to use for non-indexed
draws, and generalizes the conversion cache for buffers.

* Fix some issues

* End render pass before conversion

* Resume transform feedback after we ensure we're in a pass.

* Always generate UInt32 type indices for topology conversion

* No it's not.

* Remove unused code

* Rely on TopologyRemap to convert quads to tris.

* Remove double newline

* Ensure render pass ends before stride or I8 conversion
2022-09-20 18:38:48 -03:00
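A repeating index pattern for non-indexed quad draws, as mentioned in 4c0eb91d7e above, could look like the following Python sketch. The split of quad (0, 1, 2, 3) into triangles (0, 1, 2) and (0, 2, 3) is an assumption made for the example, and the function name is hypothetical.

```python
QUAD_PATTERN = (0, 1, 2, 0, 2, 3)  # assumed split of one quad into two triangles


def quads_to_triangle_indices(vertex_count, first_vertex=0):
    """Build triangle-list indices for a non-indexed draw of quads."""
    indices = []
    for quad in range(vertex_count // 4):
        base = first_vertex + quad * 4
        # The same six offsets repeat for every quad, shifted by its base vertex.
        indices.extend(base + offset for offset in QUAD_PATTERN)
    return indices
```

Because the pattern depends only on the quad number, one sufficiently long buffer generated this way can be reused across draws.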
Emmanuel Hansen
6f0395538b
Avalonia - Use embedded window for avalonia (#3674)
* wip

* use embedded window

* fix race condition on opengl Windows

* fix glx issues on prime nvidia

* fix mouse support win32

* clean up

* addressed review

* addressed review

* fix warnings

* fix software keyboard dialog

* Update Ryujinx.Ava/Ui/Applet/SwkbdAppletDialog.axaml.cs

Co-authored-by: gdkchan <gab.dark.100@gmail.com>

* remove double semi

Co-authored-by: gdkchan <gab.dark.100@gmail.com>
2022-09-19 15:05:26 -03:00
riperiperi
c3c41fa4bb
Periodically Flush Commands for Vulkan (#3689)
* Periodically Flush Commands for Vulkan

NVIDIA's OpenGL driver has a built-in mechanism to automatically flush commands to GPU when a lot have been queued. It's also pretty inconsistent, but we'll ignore that for now.

Our Vulkan implementation only submits a command buffer (flush equivalent) when it needs to. This is typically when another command buffer needs to be sequenced after it, presenting a frame, or an edge case where we flush around GPU queries to get results sooner.

This difference in flush behaviour causes a notable difference between Vulkan and OpenGL when we have to wait for commands. In the worst case, we will wait for a sync point that has just been created. In Vulkan, this sync point is created by flushing the command buffer, and storing a waitable fence that signals its completion. Our command buffer contains _every command that we queued since the last submit_, which could be an entire frame's worth of draws.

This has a huge effect on CPU <-> GPU latency. The more commands in a command buffer, the longer we have to wait for it to complete, which results in wasted time. Because we don't know when the guest will force us to wait, we always want the smallest possible latency.

By periodically flushing, we ensure that each command buffer takes a more consistent, smaller amount of time to execute, and that the back of the GPU queue isn't as far away when we need to wait for something to happen. This also might reduce time that the GPU is left inactive while commands are being built.

The main affected game is Pokemon Sword, which got significantly faster in overworld areas due to reduced waiting time when it flushes a shadow map from the main GPU thread.

Another affected game is BOTW, which gets faster depending on the area. This game flushes textures/buffers from its game thread, which is the bottleneck.

Flush latency and throughput may be improved on other games that are inexplicably slower than OpenGL. It's possible that certain games could have their performance _decreased_ slightly due to flushes not being free, but it is unlikely.

Also, flushing to get query results sooner has been tweaked to improve the number of full draw skips that can be done. (tested in SMO)

* Remove unused variable

* Fix possible issue with early query flush
2022-09-14 13:48:31 -03:00
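The periodic flush described in c3c41fa4bb above can be modelled as a queue that submits itself once enough commands are pending. The fixed count threshold here is an assumption for illustration; the commit does not state what heuristic the Vulkan backend uses, and `CommandQueue` is a hypothetical name.

```python
class CommandQueue:
    """Records commands and submits them in bounded batches."""
    def __init__(self, flush_threshold=100):
        self.flush_threshold = flush_threshold
        self.pending = []
        self.submitted = []  # each entry is one submitted command buffer

    def record(self, command):
        self.pending.append(command)
        # Flush automatically so no command buffer grows without bound.
        if len(self.pending) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.pending:
            self.submitted.append(self.pending)
            self.pending = []
```

When something must be waited on, at most `flush_threshold - 1` commands remain unsubmitted, so the wait covers a small command buffer rather than a whole frame's worth of draws.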
gdkchan
2492e7e808
Fix R4G4B4A4 format on Vulkan (#3696) 2022-09-13 07:59:38 +02:00
gdkchan
619ac86bd0
Do not output ViewportIndex on SPIR-V if GPU does not support it (#3644)
* Do not output ViewportIndex on SPIR-V if GPU does not support it

* Bump shader cache version
2022-09-10 13:20:23 +00:00