Migrates a remaining common file over to the Common namespace, making it
consistent with the rest of common files.
This also allows for high-traffic FS related code to alias the
filesystem function namespace as
namespace FS = Common::FS;
for more concise typing.
* Switch game settings to use a pointer
In order to add full per-game settings, we need to be able to tell yuzu to switch
to using either the global or game configuration. Using a pointer makes it easier
to switch.
* configuration: add new UI without changing existing funcitonality
The new UI also adds General, System, Graphics, Advanced Graphics,
and Audio tabs, but as yet they do nothing. This commit keeps yuzu
to the same functionality as originally branched.
* configuration: Rename files
These weren't included in the last commit. Now they are.
* configuration: setup global configuration checkbox
Global config checkbox now enables/disables the appropriate tabs in the game
properties dialog. The use global configuration setting is now saved to the
config, defaulting to true. This also addresses some changes requested in the PR.
* configuration: swap to per-game config memory for properties dialog
Does not set memory going in-game. Swaps to game values when opening the
properties dialog, then swaps back when closing it. Uses a `memcpy` to swap.
Also implements saving config files, limited to certain groups of configurations
so as to not risk setting unsafe configurations.
* configuration: change config interfaces to use config-specific pointers
When a game is booted, we need to be able to open the configuration dialogs
without changing the settings pointer in the game's emualtion. A new pointer
specific to just the configuration dialogs can be used to separate changes
to just those config dialogs without affecting the emulation.
* configuration: boot a game using per-game settings
Swaps values where needed to boot a game.
* configuration: user correct config during emulation
Creates a new pointer specifically for modifying the configuration while
emulation is in progress. Both the regular configuration dialog and the game
properties dialog now use the pointer Settings::config_values to focus edits to
the correct struct.
* settings: split Settings::values into two different structs
By splitting the settings into two mutually exclusive structs, it becomes easier,
as a developer, to determine how to use the Settings structs after per-game
configurations is merged. Other benefits include only duplicating the required
settings in memory.
* settings: move use_docked_mode to Controls group
`use_docked_mode` is set in the input settings and cannot be accessed from the
system settings. Grouping it with system settings causes it to be saved with
per-game settings, which may make transferring configs more difficult later on,
especially since docked mode cannot be set from within the game properties
dialog.
* configuration: Fix the other yuzu executables and a regression
In main.cpp, we have to get the title ID before the ROM is loaded, else the
renderer will reflect only the global settings and now the user's game specific
settings.
* settings: use a template to duplicate memory for each setting
Replaces the type of each variable in the Settings::Values struct with a new
class that allows basic data reading and writing. The new struct
Settings::Setting duplicates the data in memory and can manage global overrides
per each setting.
* configuration: correct add-ons config and swap settings when apropriate
Any add-ons interaction happens directly through the global values struct.
Swapping bewteen structs now also includes copying the necessary global configs
that cannot be changed nor saved in per-game settings. General and System config
menus now update based on whether it is viewing the global or per-game settings.
* settings: restore old values struct
No longer needed with the Settings::Setting class template.
* configuration: implement hierarchical game properties dialog
This sets the apropriate global or local data in each setting.
* clang format
* clang format take 2
can the docker container save this?
* address comments and style issues
* config: read and write settings with global awareness
Adds new functions to read and write settings while keeping the global state in
focus. Files now generated per-game are much smaller since often they only need
address the global state.
* settings: restore global state when necessary
Upon closing a game or the game properties dialog, we need to restore all global
settings to the original global state so that we can properly open the
configuration dialog or boot a different game.
* configuration: guard setting values incorrectly
This disables setting values while a game is running if the setting is
overwritten by a per game setting.
* config: don't write local settings in the global config
Simple guards to prevent writing the wrong settings in the wrong files.
* configuration: add comments, assume less, and clang format
No longer assumes that a disabled UI element means the global state is turned
off, instead opting to directly answer that question. Still however assumes a
game is running if it is in that state.
* configuration: fix a logic error
Should not be negated
* restore settings' global state regardless of accept/cancel
Fixes loading a properties dialog and causing the global config dialog to show
local settings.
* fix more logic errors
Fixed the frame limit would set the global setting from the game properties
dialog. Also strengthened the Settings::Setting member variables and simplified
the logic in config reading (ReadSettingGlobal).
* fix another logic error
In my efforts to guard RestoreGlobalState, I accidentally negated the IsPowered
condition.
* configure_audio: set toggle_stretched_audio to tristate
* fixed custom rtc and rng seed overwriting the global value
* clang format
* rebased
* clang format take 4
* address my own review
Basically revert unintended changes
* settings: literal instead of casting
"No need to cast, use 1U instead"
Thanks, Morph!
Co-authored-by: Morph <39850852+Morph1984@users.noreply.github.com>
* Revert "settings: literal instead of casting
"
This reverts commit 95e992a87c898f3e882ffdb415bb0ef9f80f613f.
* main: fix status buttons reporting wrong settings after stop emulation
* settings: Log UseDockedMode in the Controls group
This should have happened when use_docked_mode was moved over to the controls group
internally. This just reflects this in the log.
* main: load settings if the file has a title id
In other words, don't exit if the loader has trouble getting a title id.
* use a zero
* settings: initalize resolution factor with constructor instead of casting
* Revert "settings: initalize resolution factor with constructor instead of casting"
This reverts commit 54c35ecb46a29953842614620f9b7de1aa9d5dc8.
* configure_graphics: guard device selector when Vulkan is global
Prevents the user from editing the device selector if Vulkan is the global
renderer backend. Also resets the vulkan_device variable when the users
switches back-and-forth between global and Vulkan.
* address reviewer concerns
Changes function variables to const wherever they don't need to be changed. Sets Settings::Setting to final as it should not be inherited from. Sets ConfigurationShared::use_global_text to static.
Co-Authored-By: VolcaEM <volcaem@users.noreply.github.com>
* main: load per-game settings after LoadROM
This prevents `Restart Emulation` from restoring the global settings *after* the per-game settings were applied. Thanks to BSoDGamingYT for finding this bug.
* Revert "main: load per-game settings after LoadROM"
This reverts commit 9d0d48c52d2dcf3bfb1806cc8fa7d5a271a8a804.
* main: only restore global settings when necessary
Loading the per-game settings cannot happen after the ROM is loaded, so we have to specify when to restore the global state. Again thanks to BSoD for finding the bug.
* configuration_shared: address reviewer concerns except operator overrides
Dropping operator override usage in next commit.
Co-Authored-By: LC <lioncash@users.noreply.github.com>
* settings: Drop operator overrides from Setting template
Requires using GetValue and SetValue explicitly. Also reverts a change that broke title ID formatting in the game properties dialog.
* complete rebase
* configuration_shared: translate "Use global configuration"
Uses ConfigurePerGame to do so, since its usage, at least as of now, corresponds with ConfigurationShared.
* configure_per_game: address reviewer concern
As far as I understand, it prevents the program from unnecessarily copying strings.
Co-Authored-By: LC <lioncash@users.noreply.github.com>
Co-authored-by: Morph <39850852+Morph1984@users.noreply.github.com>
Co-authored-by: VolcaEM <volcaem@users.noreply.github.com>
Co-authored-by: LC <lioncash@users.noreply.github.com>
This moves dynamic state present in VK_EXT_extended_dynamic_state to a
separate structure in FixedPipelineState. This is structure is at the
bottom allowing us to hash and memcmp only when the extension is not
supported.
After marking buffers as resident, Nvidia's driver seems to take a
slow path. To workaround this issue, copy to a STREAM_READ buffer and
then call GetNamedBufferSubData on it.
This is a temporary solution until we have asynchronous flushing.
Make stream buffer and cached buffers as resident and query their
address. This allows us to use GPU addresses for several proprietary
Nvidia extensions.
Update validation layer string to VK_LAYER_KHRONOS_validation.
While we are at it, properly check for available validation layers
before enabling them.
Check() can throw an exception if the Vulkan result isn't successful.
We remove the check so that std::terminate isn't outright called and
allows for better debugging (should it ever actually fail).
There's no need to load contents from the CPU when a clear resets all
the contents of the underlying memory. This is already implemented on
OpenGL and the texture cache.
maxwell_to_vk: Reorder filtering modes to start with None, then Nearest, then Linear.
maxwell_to_vk: Logs filter modes under UNREACHABLE_MSG instead of UNIMPLEMENTED_MSG, since any unknown filter modes are invalid and not unimplemented.
maxwell_to_vk: Return VK_SAMPLER_MIPMAP_MODE_NEAREST instead of VK_SAMPLER_MIPMAP_MODE_LINEAR when mipmap_filter is None with the description from the VkSamplerCreateInfo(3) man page.
Instead of using as template argument a shared pointer, use the
underlying type and manage shared pointers explicitly. This can make
removing shared pointers from the cache more easy.
While we are at it, make some misc style changes and general
improvements (like insert_or_assign instead of operator[] + operator=).
This allows rendering to 3D textures with more than one slice.
Applications are allowed to render to more than one slice of a texture
using gl_Layer from a VTG shader.
This also requires reworking how 3D texture collisions are handled, for
now, this commit allows rendering to slices but not to miplevels. When a
render target attempts to write to a mipmap, we fallback to the previous
implementation (copying or flushing as needed).
- Fixes color correction 3D textures on UE4 games (rainbow effects).
- Allows Xenoblade games to render to 3D textures directly.
Games using D3D idioms can join images and samplers when a shader
executes, instead of baking them into a combined sampler image. This is
also possible on Vulkan.
One approach to this solution would be to use separate samplers on
Vulkan and leave this unimplemented on OpenGL, but we can't do this
because there's no consistent way of determining which constant buffer
holds a sampler and which one an image. We could in theory find the
first bit and if it's in the TIC area, it's an image; but this falls
apart when an image or sampler handle use an index of zero.
The used approach is to track for a LOP.OR operation (this is done at an
IR level, not at an ISA level), track again the constant buffers used as
source and store this pair. Then, outside of shader execution, join
the sample and image pair with a bitwise or operation.
This approach won't work on games that truly use separate samplers in a
meaningful way. For example, pooling textures in a 2D array and
determining at runtime what sampler to use.
This invalidates OpenGL's disk shader cache :)
- Used mostly by D3D ports to Switch
Stop ignoring image swizzles on depth and stencil images.
This doesn't fix a known issue on Xenoblade Chronicles 2 where an OpenGL
texture changes swizzles twice before being used. A proper fix would be
having a small texture view cache for this like we do on Vulkan.
The check to flip faces when viewports are negative were a left over
from the old OpenGL code. This is not required on Vulkan where we have
negative viewports.
Hardware S2R special registers match gl_Thread*MaskNV. We can trivially
implement these using Nvidia's extension on OpenGL or naively stubbing
them with the ARB instructions to match. This might cause issues if the
host device warp size doesn't match Nvidia's. That said, this is
unlikely on proper shaders.
Refer to the attached url for more documentation about these flags.
https://www.khronos.org/registry/OpenGL/extensions/NV/NV_shader_thread_group.txt
Some operations like atomicMin were ignored because they returned were
being stored to RZ. This operations have a side effect and it was being
ignored.
Instead of using boost::icl::interval_map for caching, use
boost::intrusive::set. interval_map is intended as a container where the
keys can overlap with one another; we don't need this for caching
buffers and a std::set-like data structure that allows us to search with
lower_bound is enough.
Constant attributes (in OpenGL known disabled attributes) are not
supported on Vulkan, even with extensions. To emulate this behavior we
return zero on reads from disabled vertex attributes in shader code.
This has no caching cost because attribute formats are not dynamic state
on Vulkan and we have to store it in the pipeline cache anyway.
- Fixes Animal Crossing: New Horizons terrain borders
This was a left over from OpenGL when disabled buffers where not properly
emulated. We no longer have to assert this as it is checked in vertex
buffer initialization.
This should fix grass interactions on Breath of the Wild on Vulkan.
It is currently untested against validation layers.
Nvidia's Windows 443.09 beta driver or Linux 440.66.12 is required for
now.
Reduces some header churn and reduces rebuilds when some header
internals change.
While we're at it we can also resolve a missing include in buffer_cache.
Xenoblade 2 invokes a draw call with zero vertices.
This is likely due to indirect drawing (glDrawArraysIndirect).
This causes a crash in the staging buffer pool when trying to create a
buffer with a size of zero. To workaround this, skip index buffer setup
entirely when the number of indices is zero.
Drop MemoryBarrier from the buffer cache and use Maxwell3D's register
WaitForIdle.
To implement this on OpenGL we just call glMemoryBarrier with the
necessary bits.
Vulkan lacks this synchronization primitive, so we set an event and
immediately wait for it. This is not a pretty solution, but it's what
Vulkan can do without submitting the current command buffer to the queue
(which ends up being more expensive on the CPU).
Using deko3d as reference:
4e47ba0013/source/maxwell/gpu_3d_state.cpp (L42)
We were using bits 3 and 4 to determine depth clamping, but these are
the same both enabled and disabled:
state->depthClampEnable ? 0x101A : 0x181D
The same happens on Nvidia's OpenGL driver, where they do something like
this (default capabilities, GL 4.5 compatibility):
(state & DEPTH_CLAMP) != 0 ? 0x201a : 0x281c
There's always a difference between the first bits in this register, but
bit 11 is consistently disabled on both deko3d/NVN and OpenGL. This
commit changes yuzu's behaviour to use bit 11 to determine depth
clamping.
- Fixes depth issues on Super Mario Odyssey's intro.
This reverts commit 94b0e2e5da.
preserve_contents proved to be a meaningful optimization. This commit
reintroduces it but properly implemented on OpenGL.
We have to make sure the clear removes all the previous contents of the
image.
It's not currently implemented on Vulkan because we can do smart things
there that's preferred to be introduced in a separate commit.
Deduplicate code shared between vk_pipeline_cache and gl_shader_cache as
well as shader decoder code.
While we are at it, fix a bug in gl_shader_cache where compute shaders
had an start offset of a stage shader.
Sometimes for unknown reasons NVN games can bind a render target format
of 0. This may be a yuzu bug.
With the commits before this the formats were specified without being
"packed", assuming all formats and texceptions will be written like in
the color_attachments vector.
To address this issue, iterate all render targets and pack them as they
are valid. This way they will match color_attachments.
- Fixes validation errors and graphical issues on Breath of the Wild.
The intention behind this was to assign a float to from an uint32_t, but
it was unintentionally being copied directly into the std::optional.
Copy to a temporary and assign that temporary to std::optional. This can
be replaced with std::bit_cast<float> once we are in C++20.
All drivers (even Intel) seem to have a device local memory type that is
not host visible. Remove this flag so all devices follow the same path.
This fixes a crash when trying to map to host device local memory on
integrated devices.
Introduce a default buffer getter that lazily constructs an empty
buffer. This is intended to match OpenGL's buffer 0.
Use this for disabled vertex and uniform buffers.
While we are at it, include vertex buffer usages for staging buffers to
silence validation errors.
On NVN buffers can be enabled but have no size. According to deko3d and
the behavior we see in Animal Crossing: New Horizons these buffers get
the special address of 0x1000 and limit themselves to 0xfff.
Implement buffers without a size by binding a null buffer to OpenGL
without a side.
1d1930beea/source/maxwell/gpu_3d_vbo.cpp (L62-L63)
Render.Vulkan <Error> video_core/renderer_vulkan/renderer_vulkan.cpp:CreateInstance:131: Presentation not supported on this platform
Render.Vulkan <Error> video_core/renderer_vulkan/renderer_vulkan.cpp:CreateSurface:378: Presentation not supported on this platform
Core <Critical> core/core.cpp:Load:199: Failed to initialize system (Error 5)!
Sort discrete GPUs over the rest, Nvidia over AMD, AMD over Intel, Intel
over the rest. This gives us a somewhat consistent order when Optimus
is removed (renderdoc does this when it's attached).
This can break the configuration of users with an Intel GPU that
manually remove Optimus on yuzu. That said, it's a very unlikely to
happen.
Pad FixedPipelineState's size to 384 bytes to be a multiple of 16.
Compare the whole struct with std::memcmp and hash with CityHash. Using
CityHash instead of a naive hash should reduce the number of collisions.
Improve used type traits to ensure this operation is safe.
With these changes the improvements to the hashable pipeline state are:
Optimized structure
Hash: 89 ns
Comparison: 103 ns
Construction*: 164 ns
Struct size: 384 bytes
Original structure
Hash: 148 ns
Equal: 174 ns
Construction*: 281 ns
Size: 1384 bytes
* Attribute state initialization is not measured
These measures are averages taken with std::chrono::high_accuracy_clock
on MSVC shipped on Visual Studio 16.6.0 Preview 2.1.
Nvidia recently introduced a new memory type for data streaming
(awesome!), but yuzu was assuming that all heaps had enough memory
for the assumed stream buffer size (256 MiB).
This worked fine on AMD but Nvidia's new memory heap was smaller than
256 MiB. This commit changes this assumption and allocates a bit less
than the size of the preferred heap, with a maximum of 256 MiB (to avoid
allocating all system memory on integrated devices).
- Fixes a crash on NVIDIA 450.82.0.0
Implement indexed quads (GL_QUADS used with glDrawElements*) with a
compute pass conversion.
The compute shader converts from uint8/uint16/uint32 indices to uint32.
The format is passed through push constants to avoid having different
variants of the same shader.
- Used by Fast RMX
- Used by Xenoblade Chronicles 2 (it still has graphical due to
synchronization issues on Vulkan)
The original idea of returning pointers is that handles can be moved.
The problem is that the implementation didn't take that in mind and made
everything harder to work with. This commit drops pointer to handles and
returns the handles themselves. While it is still true that handles can
be invalidated, this way we get an old handle instead of a dangling
pointer.
This problem can be solved in the future with sparse buffers.
When the dynamic state is specified, pViewports and pScissors are
ignored, quoting the specification:
pViewports is a pointer to an array of VkViewport structures, defining
the viewport transforms. If the viewport state is dynamic, this member
is ignored.
That said, AMD's proprietary driver itself seem to read it regardless of
what the specification says.
Adds optional support for Nsight Aftermath. It is enabled through
ENABLE_NSIGHT_AFTERMATH in cmake. A path to the SDK has to be provided
by the environment variable NSIGHT_AFTERMATH_SDK.
Nsight Aftermath allows an application to generate "minidumps" of the
GPU state when a device loss happens. By analysing these on Nsight we
can know what a game was doing and why it triggered a device loss.
The dump is generated inside %APPDATA%\yuzu\log\gpucrash and this
directory is deleted every time a new instance is initialized with
Nsight enabled.
To enable it on yuzu there has a to be a driver and device capable of
running Nsight Aftermath on Vulkan. That means only Turing based GPUs
on the latest stable driver, beta drivers won't work for now.
It is manually enabled in Configuration>Debug>Enable Graphics Debugging
because when using all debugging capabilities there is a runtime cost.
preserve_contents was always true. We can't assume we don't have to
preserve clears because scissored and color masked clears exist.
This removes preserve_contents and assumes it as true at all times.
Implements a reduction operation. It's an atomic operation that doesn't
return a value.
This commit introduces another primitive because some shading languages
might have a primitive for reduction operations.
Credits go to gdkchan and Ryujinx. The pull request used for this can
be found here: https://github.com/Ryujinx/Ryujinx/pull/1082
yuzu was already using the header for interpolation, but it was missing
the FragCoord.w multiplication described in the linked pull request.
This commit finally removes the FragCoord.w == 1.0f hack from the shader
decompiler.
While we are at it, this commit renames some enumerations to match
Nvidia's documentation (linked below) and fixes component declaration
order in the shader program header (z and w were swapped).
https://github.com/NVIDIA/open-gpu-doc/blob/master/Shader-Program-Header/Shader-Program-Header.html
The intention behind a Vulkan wrapper is to drop Vulkan-Hpp.
The issues with Vulkan-Hpp are:
- Regular breaks of the API.
- Copy constructors that do the same as the aggregates (fixed recently)
- External dynamic dispatch that is hard to remove
- Alias KHR handles with non-KHR handles making it impossible to use
smart handles on Vulkan 1.0 instances with extensions that were included
on Vulkan 1.1.
- Dynamic dispatchers silently change size depending on preprocessor
definitions. Different files will have different dispatch definitions,
generating all kinds of hard to debug memory issues.
In other words, Vulkan-Hpp is not "production ready" for our needs and
this wrapper aims to replace it without losing RAII and exception
safety.
Changes the GraphicsContext to be managed by the GPU core. This
eliminates the need for the frontends to fool around with tricky
MakeCurrent/DoneCurrent calls that are dependent on the settings (such
as async gpu option).
This also refactors out the need to use QWidget::fromWindowContainer as
that caused issues with focus and input handling. Now we use a regular
QWidget and just access the native windowHandle() directly.
Another change is removing the debug tool setting in FrameMailbox.
Instead of trying to block the frontend until a new frame is ready, the
core will now take over presentation and draw directly to the window if
the renderer detects that its hooked by NSight or RenderDoc
Lastly, since it was in the way, I removed ScopeAcquireWindowContext and
replaced it with a simple subclass in GraphicsContext that achieves the
same result
It's possible that the window is resized from the moment we ask for its
size to the moment a swapchain is created, causing validation issues.
To workaround this Vulkan issue request the capabilities again just
before creating the swapchain, making the race condition less likely.
Implement accessing textures through an index. It uses the same
interface as OpenGL, the main difference is that Vulkan bindings are
forced to be arrayed (the binding index doesn't change for stacked
textures in SPIR-V).
Layered framebuffer attachments is a feature that allows applications to
write attach layered textures to a single attachment. What layer the
fragments are written to is decided from the shader using gl_Layer.
SPIR-V's Layer is GLSL's gl_Layer. It lets the application choose from a
shader stage (vertex, tessellation or geometry) which framebuffer layer
write the output fragments to.
Vulkan's VertexIndex and InstanceIndex don't match with hardware. This
is because Nvidia implements gl_VertexID and gl_InstanceID. The math
that relates these is:
gl_VertexIndex = gl_BaseVertex + gl_VertexID
gl_InstanceIndex = gl_InstanceIndex + gl_InstanceID
To emulate it using what Vulkan's SPIR-V offers (the *Index variants)
this commit substracts gl_Base* from gl_*Index to obtain the OpenGL and
hardware's equivalent.
ATOM operates atomically on global memory. For now only add ATOM.ADD
since that's what was found in commercial games.
This asserts for ATOM.ADD.S32 (handling the others as unimplemented),
although ATOM.ADD.U32 shouldn't be any different.
This change forces us to change the default type on SPIR-V storage
buffers from float to uint. We could also alias the buffers, but it's
simpler for now to just use uint. While we are at it, abstract the code
to avoid repetition.
Some games like The Legend of Zelda: Breath of the Wild assign
render targets without writing them from the fragment shader. This
generates Vulkan validation errors, so silence these I previously
introduced a commit to set "vec4(0, 0, 0, 1)" for these attachments. The
problem is that this is not what games expect. This commit reverts that
change.
Front face was being forced to a certain value when cull face is
disabled. Set a default value on initialization and drop the forcefully
set front facing value with culling disabled.
Nvidia's driver defaults invalid enumerations to GL_CLAMP. Vulkan
doesn't expose GL_CLAMP through its API, but we can hack it on Nvidia's
driver using the internal driver defaults.
This currently only supports quad arrays and u8 indices.
In the future we can remove quad arrays with a table written from the
CPU, but this was used to bootstrap the other passes helpers and it
was left in the code.
The blob code is generated from the "shaders/" directory. Read the
instructions there to know how to generate the SPIR-V.
This abstractio represents the state of the 3D engine at a given draw.
Instead of changing individual bits of the pipeline how it's done in
APIs like D3D11, OpenGL and NVN; on Vulkan we are forced to put
everything together into a single, immutable object.
It takes advantage of the few dynamic states Vulkan offers.
The update descriptor is used to store in flat memory a large chunk of
staging data used to update descriptor sets through templates. It
provides a push interface to easily insert descriptors following the
current pipeline. The order used in the descriptor update template has
to be implicitly followed. We can catch bugs here using validation
layers.
The stream buffer before this commit once it was full (no more bytes to
write before looping) waiting for all previous operations to finish.
This was a temporary solution and had a noticeable performance penalty
in performance (from what a profiler showed).
To avoid this mark with fences usages of the stream buffer and once it
loops wait for them to be signaled. On average this will never wait.
Each fence knows where its usage finishes, resulting in a non-paged
stream buffer.
On the other side, the buffer cache is reimplemented using the generic
buffer cache. It makes use of the staging buffer pool and the new
stream buffer.
* Allocate memory in discrete exponentially increasing chunks until the
128 MiB threshold. Allocations larger thant that increase linearly by
256 MiB (depending on the required size). This allows to use small
allocations for small resources.
* Move memory maps to a RAII abstraction. To optimize for debugging
tools (like RenderDoc) users will map/unmap on usage. If this ever
becomes a noticeable overhead (from my profiling it doesn't) we can
transparently move to persistent memory maps without harming the API,
getting optimal performance for both gameplay and debugging.
* Improve messages on exceptional situations.
* Fix typos "requeriments" -> "requirements".
* Small style changes.
Create a large descriptor pool where we allocate all our descriptors
from. It has to be wide enough to support any pipeline, hence its large
numbers.
If the descritor pool is filled, we allocate more memory at that moment.
This way we can take advantage of permissive drivers like Nvidia's that
allocate more descriptors than what the spec requires.
This commit introduces a mechanism by which shader IR code can be
amended and extended. This useful for track algorithms where certain
information can derived from before the track such as indexes to array
samplers.
The job of this abstraction is to provide staging buffers for temporary
operations. Think of image uploads or buffer uploads to device memory.
It automatically deletes unused buffers.
This object's job is to contain an image and manage its transitions.
Since Nvidia hardware doesn't know what a transition is but Vulkan
requires them anyway, we have to state track image subresources
individually.
To avoid the overhead of tracking each subresource in images with many
subresources (think of cubemap arrays with several mipmaps), this commit
tracks when subresources have diverged. As long as this doesn't happen
we can check the state of the first subresource (that will be shared
with all subresources) and update accordingly.
Image transitions are deferred to the scheduler command buffer.
The intention behind this hasheable structure is to describe the state
of fixed function pipeline state that gets compiled to a single graphics
pipeline state object. This is all dynamic state in OpenGL but Vulkan
wants it in an immutable state, even if hardware can edit it freely.
In this commit the structure is defined in an optimized state (it uses
booleans, has paddings and many data entries that can be packed to
single integers). This is intentional as an initial implementation that
is easier to debug, implement and review. It will be optimized in later
stages, or it might change if Vulkan gets more dynamic states.
ExprCondCode visit implements the generic Visit. Use this instead of
that one.
As an intended side effect this fixes unwritten memory usages in cases
when a negation of a condition code is used.
This allows us to put VKFenceWatch inside a std::vector without storing
it in heap. On move we have to signal the fences where the new protected
resource is, adding some overhead.
VK_NV_device_diagnostic_checkpoints allows us to push data to a Vulkan
queue and then query it even after a device loss. This allows us to push
the current pipeline object and see what was the call that killed the
device.
Some games write from fragment shaders to an unexistant framebuffer
attachment or they don't write to one when it exists in the framebuffer.
Fix this by skipping writes or adding zeroes.
These shaders are used to specify code that is not dynamically generated
in the Vulkan backend. Instead of packing it inside the build system,
it's manually built and copied to the C++ file to avoid adding
unnecessary build time dependencies.
quad_array should be dropped in the future since it can be emulated with
a memory pool generated from the CPU.
Add an extra argument to query device capabilities in the future. The
intention behind this is to use native quads, quad strips, line loops
and polygons if these are released for Vulkan.
The OpenGL spec defines GL_CLAMP's formula similarly to CLAMP_TO_EDGE
and CLAMP_TO_BORDER depending on the filter mode used. It doesn't
exactly behave like this, but it's the closest we can get with what
Vulkan offers without emulating it by injecting shader code.
Introduce a worker thread approach for delegating Vulkan work derived
from dxvk's approach. https://github.com/doitsujin/dxvk
Now that the scheduler is what handles all Vulkan work related to
command streaming, store state tracking in itself. This way we can know
when to reupload Vulkan dynamic state to the queue (since this one is
invalidated between command buffers unlike NVN). We can also store the
renderpass state and graphics pipeline bound to avoid redundant binds
and renderpass begins/ends.
Update Sirit and its usage in vk_shader_decompiler. Highlights:
- Implement tessellation shaders
- Implement geometry shaders
- Implement some missing features
- Use native half float instructions when available.
- Setup more features and requirements.
- Improve logging for missing features.
- Collect telemetry parameters.
- Add queries for more image formats.
- Query push constants limits.
- Optionally enable some extensions.
Amends a few interfaces to be able to handle the migration over to the
new Memory class by passing the class by reference as a function
parameter where necessary.
Notably, within the filesystem services, this eliminates two ReadBlock()
calls by using the helper functions of HLERequestContext to do that for
us.
Abstracted ComponentType was not being used in a meaningful way.
This commit drops its usage.
There is one place where it was being used to test compatibility between
two cached surfaces, but this one is implied in the pixel format.
Removing the component type test doesn't change the behaviour.
In the process remove implementation of SUATOM.MIN and SUATOM.MAX as
these require a distinction between U32 and S32. These have to be
implemented with imageCompSwap loop.
Implement VOTE using Nvidia's intrinsics. Documentation about these can
be found here
https://developer.nvidia.com/reading-between-threads-shader-intrinsics
Instead of using portable ARB instructions I opted to use Nvidia
intrinsics because these are the closest we have to how Tegra X1
hardware renders.
To stub VOTE on non-Nvidia drivers (including nouveau) this commit
simulates a GPU with a warp size of one, returning what is meaningful
for the instruction being emulated:
* anyThreadNV(value) -> value
* allThreadsNV(value) -> value
* allThreadsEqualNV(value) -> true
ballotARB, also known as "uint64_t(activeThreadsNV())", emits
VOTE.ANY Rd, PT, PT;
on nouveau's compiler. This doesn't match exactly to Nvidia's code
VOTE.ALL Rd, PT, PT;
Which is emulated with activeThreadsNV() by this commit. In theory this
shouldn't really matter since .ANY, .ALL and .EQ affect the predicates
(set to PT on those cases) and not the registers.
This commit implements gl_ViewportIndex and gl_Layer in vertex and
geometry shaders. In the case it's used in a vertex shader, it requires
ARB_shader_viewport_layer_array. This extension is available on AMD and
Nvidia devices (mesa and proprietary drivers), but not available on
Intel on any platform. At the moment of writing this description I don't
know if this is a hardware limitation or a driver limitation.
In the case that ARB_shader_viewport_layer_array is not available,
writes to these registers on a vertex shader are ignored, with the
appropriate logging.
Instead of passing by copy an execution context through out the whole
Vulkan call hierarchy, use a command buffer view and fence view
approach.
This internally dereferences the command buffer or fence forcing the
user to be unable to use an outdated version of it on normal usage.
It is still possible to keep store an outdated if it is casted to
VKFence& or vk::CommandBuffer.
While changing this file, add an extra parameter for Flush and Finish to
allow releasing the fence from this calls.
Hardware testing revealed that SSY and PBK push to a different stack,
allowing code like this:
SSY label1;
PBK label2;
SYNC;
label1: PBK;
label2: EXIT;
Instead of having a vector of unique_ptr stored in a vector and
returning star pointers to this, use shared_ptr. While changing
initialization code, move it to a separate file when possible.
This is a first step to allow code analysis and node generation beyond
the ShaderIR class.
Fix missing OpSelectionMerge instruction. This caused devices loses on
most hardware, Intel didn't care.
Fix [-1;1] -> [0;1] depth conversions.
Conditionally use VK_EXT_scalar_block_layout. This allows us to use
non-std140 layouts on UBOs.
Update external Vulkan headers.
Keeps track of native ASTC support, VK_EXT_scalar_block_layout
availability and SSBO range.
Check for independentBlend and vertexPipelineStorageAndAtomics as a
required feature. Always enable it.
Use vk::to_string format to log Vulkan enums.
Style changes.
flushing is now responsability of children caches instead of the cache
object. This change will allow the specific cache to pass extra
parameters on flushing and will allow more flexibility.
Operations done before the main half float operation (like HAdd) were
managing a packed value instead of the unpacked one. Adding an unpacked
operation allows us to drop the per-operand MetaHalfArithmetic entry,
simplifying the code overall.
Replaces header inclusions with forward declarations where applicable
and also removes unused headers within the cpp file. This reduces a few
more dependencies on core/memory.h
Removes a few unnecessary dependencies on core-related machinery, such
as the core.h and memory.h, which reduces the amount of rebuilding
necessary if those files change.
This also uncovered some indirect dependencies within other source
files. This also fixes those.
This manages two kinds of streaming buffers: one for unified memory
models and one for dedicated GPUs. The first one skips the copy from the
staging buffer to the real buffer, since it creates an unified buffer.
This implementation waits for all fences to finish their operation
before "invalidating". This is suboptimal since it should allocate
another buffer or start searching from the beginning. There is room for
improvement here.
This could also handle AMD's "pinned" memory (a heap with 256 MiB) that
seems to be designed for buffer streaming.
The scheduler abstracts command buffer and fence management with an
interface that's able to do OpenGL-like operations on Vulkan command
buffers.
It returns by value a command buffer and fence that have to be used for
subsequent operations until Flush or Finish is executed, after that the
current execution context (the pair of command buffers and fences) gets
invalidated a new one must be fetched. Thankfully validation layers will
quickly detect if this is skipped throwing an error due to modifications
to a sent command buffer.
Handles a pool of resources protected by fences. Manages resource
overflow allocating more resources.
This class is intended to be used through inheritance.
Fences take ownership of objects, protecting them from GPU-side or
driver-side concurrent access. They must be commited from the resource
manager. Their usage flow is: commit the fence from the resource
manager, protect resources with it and use them, send the fence to an
execution queue and Wait for it if needed and then call Release. Used
resources will automatically be signaled when they are free to be
reused.
VKDevice contains all the data required to manage and initialize a
physical device. Its intention is to be passed across Vulkan objects to
query device-specific data (for example the logical device and the
dispatch loader).
This file is intended to be included instead of vulkan/vulkan.hpp. It
includes declarations of unique handlers using a dynamic dispatcher
instead of a static one (which would require linking to a Vulkan
library).