Supplements the VAAPI intel gpu decoder by implementing the D3D11VA decoder for Windows, and CUVID/VDPAU for Nvidia and AMD on drivers linux respectively.
With reference frames refreshes fix, we no longer need to buffer two frames in advance.
We can also remove other unused or otherwise unneeded variables.
This buffer was a list of EncodingData structures sorted by their bit length, with some duplication from the cpu decoder implementation.
We can take advantage of its sorted property to optimize its usage in the shader.
Thanks to wwylele for the optimization idea.
OpenGL and Vulkan images render in different coordinate systems. This allows us to specify the coordinate system of the screenshot within each renderer
Use VK_KHR_pipeline_executable_properties when enabled and available to
log statistics about the pipeline cache in a game.
For example, this is on Turing GPUs when generating a pipeline cache
from Super Smash Bros. Ultimate:
Average pipeline statistics
==========================================
Code size: 6433.167
Register count: 32.939
More advanced results could be presented, at the moment it's just an
average of all 3D and compute pipelines.
Everything else has a default constructor that does the straightforward
thing of initializing most members to a default value, except for the
size.
We explicitly initialize the size (and others, for consistency), to
prevent potential uninitialized reads from occurring. Particularly given
the largeish surface area that this struct is used in.
use_framelimiter was not being used internally by the renderers.
set_background_color was always set to true as there is no toggle for the renderer background color, instead users directly choose the color of their choice.
This setting is best referred to as a speed limit, as it involves the limits of all timing based aspects of the emulator, not only framerate.
This allows us to differentiate it from the fps unlocker setting.
GLASM is getting good enough that we can move it out of advanced
graphics settings. This removes the setting `use_assembly_shaders`,
opting for a enum class `shader_backend`. This comes with the benefits
that it is extensible for additional shader backends besides GLSL and
GLASM, and this will work better with a QComboBox.
Qt removes the related assembly shader setting from the Advanced
Graphics section and places it as a new QComboBox in the API Settings
group. This will replace the Vulkan device selector when OpenGL is
selected.
Additionally, mark all of the custom anisotropic filtering settings as
"WILL BREAK THINGS", as that is the case with a select few games.
Fixes Ori and the Blind Forest's menu on GLASM. For some reason
(probably high level optimizations) it is not sanitized on SPIR-V for
OpenGL. Vulkan is unaffected by this change.
Works around a bug where program parameters are only applied to the
current stage, and this one wasn't bound at the moment.
Affects all SSBO usages on GLASM.
This changes how Scheduler::Flush works. It queues the current command
buffer to be sent to the GPU but does not do it immediately. The Vulkan
worker thread takes care of that. Users will have to use
Scheduler::Flush + Scheduler::WaitWorker to get the previous behavior.
Scheduler::Finish is unchanged.
To avoid waiting on work never queued, Scheduler::Wait sends the current
command buffer if that's what the caller wants to wait.
Move code to separate files to be able to reuse it from OpenGL. This
greatly simplifies the pipeline cache logic on Vulkan.
Transform feedback state is not yet abstracted and it's still
intrusively stored inside vk_pipeline_cache. It will be moved when
needed on OpenGL.
Move descriptor lookup and update code to a separate thread. Delaying
this removes work from the main GPU thread and allows creating
descriptor layouts on another thread. This reduces a bit the workload
of the main thread when new pipelines are encountered.
Create multiple descriptor pools on demand. There are some degrees of
freedom what is considered a compatible pool to avoid wasting large
pools on small descriptors.
Mostly fixing unused *, implicit conversion, braced scalar init,
fpermissive, and some others.
Some Clang errors likely remain in video_core, and std::ranges is still
a pertinent issue in shader_recompiler
shader_recompiler: cmake: Force bracket depth to 1024 on Clang
Increases the maximum fold expression depth
thread_worker: Include condition_variable
Don't use list initializers in control flow
Co-authored-by: ReinUsesLisp <reinuseslisp@airmail.cc>
There's an optimization bug on non-git mesa versions where not
specifying GL_CLIENT_STORAGE_BIT causes very slow reads on the CPU
side.
Add this bit for all vendors.
On the texture cache we handle multisampled images by keeping their real
size in samples (e.g. 1920x1080 with 4 samples is 3840x2160).
This works nicely with size matches and other comparisons, but the
calculation for guest sizes was not having this in mind, and the size
was being multiplied (again) by the number of samples per dimension.
For example a 3840x2160 texture cache image had its width and height
multiplied by 2, resulting in a much larger texture.
Fix this issue.
- Fixes performance regression on cooking related titles when an
unrelated bug was fixed.
Images used as render targets were not being "prepared", causing
desynchronizations on the texture cache. Needs #6669 to avoid
performance regressions on certain cooking titles.
- Fixes black shadows on Age of Calamity.
Creates a new BasicSettings class in common/settings, and forces setting
a default and label for each setting that uses it in common/settings.
Moves defaults and labels from both frontends into common settings.
Creates a helper function in each frontend to facillitate reading the
settings now with the new default and label properties.
Settings::Setting is also now a subclass of Settings::BasicSetting. Also
adds documentation for both Setting and BasicSetting.
Fixes a regression unintentionally introduced by the garbage collector.
This makes regular memory downloads only flush the requested sizes.
This negatively affected Koei Tecmo games.
Removes common_sizes.h in favor of having `_KiB`, `_MiB`, `_GiB`, etc
user-literals within literals.h.
To keep the global namespace clean, users will have to use:
```
using namespace Common::Literals;
```
to access these literals.
There are a lot of scenarios where we don't particularly care whether or not the removal operation and just simply attempt a removal.
As such, removing the [[nodiscard]] attribute is best for these functions.
Use its std::stop_token to abort shader cache loading.
Using std::stop_token instead of std::atomic_bool allows the usage of
other utilities like std::stop_callback.
These changes should help in reducing crashes/drivers panics that may
occur due to synchronization issues between the shader completion and
later access of the decoded texture.
Per the spec, L1 is clamped to the value 0xff if it is greater than 0xff. An oversight caused us to take the maximum of L1 and 0xff, rather than the minimum.
Huge thanks to wwylele for finding this.
Co-Authored-By: Weiyi Wang <wwylele@gmail.com>
Users may want to fall back to the CPU ASTC texture decoder due to hangs
and crashes that may be caused by keeping the GPU under compute heavy
loads for extended periods of time. This is especially the case in games
such as Astral Chain which make extensive use of ASTC textures.
yuzu requires CMake 3.15 yet find_program was using REQUIRED, which is
only available on 3.18 and later. Instead, we check for
"<VAR>-NOTFOUND".
In addition, check for additional requirements before building libusb or
FFmpeg with autotools. Otherwise, CMake configuration will pass yet
compilation will fail.
* Wrong alignment in u64 LOG_DEBUG -> memcpy.
* Huge shift exponent in stride calculation for linear buffer, unused result -> skipped.
* Large shift in buffer cache if word = 0, skip checking for set bits.
Non of those were critical, so this should not change any behavior.
At least with the assumption, that the last one used masking behavior, which always yield continuous_bits = 0.
- Fixes a hang on shutdown when NVFlinger thread is waiting on a syncpoint that will never occur.
- Commonly observed when stopping emulation in Super Mario Odyssey.
* common: fs: fs_types: Create filesystem types
Contains various filesystem types used by the Common::FS library
* common: fs: fs_util: Add std::string to std::u8string conversion utility
* common: fs: path_util: Add utlity functions for paths
Contains various utility functions for getting or manipulating filesystem paths used by the Common::FS library
* common: fs: file: Rewrite the IOFile implementation
* common: fs: Reimplement Common::FS library using std::filesystem
* common: fs: fs_paths: Add fs_paths to replace common_paths
* common: fs: path_util: Add the rest of the path functions
* common: Remove the previous Common::FS implementation
* general: Remove unused fs includes
* string_util: Remove unused function and include
* nvidia_flags: Migrate to the new Common::FS library
* settings: Migrate to the new Common::FS library
* logging: backend: Migrate to the new Common::FS library
* core: Migrate to the new Common::FS library
* perf_stats: Migrate to the new Common::FS library
* reporter: Migrate to the new Common::FS library
* telemetry_session: Migrate to the new Common::FS library
* key_manager: Migrate to the new Common::FS library
* bis_factory: Migrate to the new Common::FS library
* registered_cache: Migrate to the new Common::FS library
* xts_archive: Migrate to the new Common::FS library
* service: acc: Migrate to the new Common::FS library
* applets/profile: Migrate to the new Common::FS library
* applets/web: Migrate to the new Common::FS library
* service: filesystem: Migrate to the new Common::FS library
* loader: Migrate to the new Common::FS library
* gl_shader_disk_cache: Migrate to the new Common::FS library
* nsight_aftermath_tracker: Migrate to the new Common::FS library
* vulkan_library: Migrate to the new Common::FS library
* configure_debug: Migrate to the new Common::FS library
* game_list_worker: Migrate to the new Common::FS library
* config: Migrate to the new Common::FS library
* configure_filesystem: Migrate to the new Common::FS library
* configure_per_game_addons: Migrate to the new Common::FS library
* configure_profile_manager: Migrate to the new Common::FS library
* configure_ui: Migrate to the new Common::FS library
* input_profiles: Migrate to the new Common::FS library
* yuzu_cmd: config: Migrate to the new Common::FS library
* yuzu_cmd: Migrate to the new Common::FS library
* vfs_real: Migrate to the new Common::FS library
* vfs: Migrate to the new Common::FS library
* vfs_libzip: Migrate to the new Common::FS library
* service: bcat: Migrate to the new Common::FS library
* yuzu: main: Migrate to the new Common::FS library
* vfs_real: Delete the contents of an existing file in CreateFile
Current usages of CreateFile expect to delete the contents of an existing file, retain this behavior for now.
* input_profiles: Don't iterate the input profile dir if it does not exist
Silences an error produced in the log if the directory does not exist.
* game_list_worker: Skip parsing file if the returned VfsFile is nullptr
Prevents crashes in GetLoader when the virtual file is nullptr
* common: fs: Validate paths for path length
* service: filesystem: Open the mod load directory as read only
The FPS counter was based on metrics in the nvdisp swapbuffers call. This metric would be accurate if the gpu thread/renderer were synchronous with the nvdisp service, but that's no longer the case.
This commit moves the frame counting responsibility onto the concrete renderers after their frame draw calls. Resulting in more meaningful metrics.
The displayed FPS is now made up of the average framerate between the previous and most recent update, in order to avoid distracting FPS counter updates when framerate is oscillating between close values.
The status bar update frequency was also changed from 2 seconds to 500ms.
Implements the OnClose method of the nvhost_vic device, and removes the remnants of an older implementation.
Also cleans up some of the surrounding code.
This line can only ever be reached if src is null, so dereferencing it
here is a logic bug that slipped through.
Instead, we dereference dst instead which is guaranteed to be valid.
Eliminates a potential bug vector related to inheritance. Plus, we
should generally be specifying the destructor as virtual within purely
virtual interfaces to begin with.
Amends implicit sign conversions occurring with usages of std::reduce
and also relocates it to its own utility function to reduce verbosity a
little bit.
When this was being made mandatory, these enablement of these features was removed, but this is still needed.
Fixes: 757fd1e917 ("vulkan_device: Require VK_EXT_robustness2")
Else the fence might get submited out-of-order into the queue, which makes testing them pointless.
Overhead should be tiny as the mutex is just moved from the queue to the writing code.
This was implicitly done by `is_powered_on = false`, however the explicit method allows us to block until the GPU is actually gone.
This should fix a race condition while removing the other subsystems while the GPU is still active.
It shall block until there is something to consume in the queue.
And use it for the GPU emulation instead of the spin loop.
This is only in booting the emulator, however in BOTW this is the case for about 1 second.
Avoid sending null pointer to memcpy as reported by Undefined Behaviour
Sanitizer. Replaces the std::memcpy calls in SpliceVectors with
std::copy calls. Opting to replace all the memcpy's with copy's.
Co-authored-by: LC <mathew1800@gmail.com>
Currently, the Windows versions of the Intel OpenGL driver and the AMD
proprietary OpenGL driver do not properly support (or in fact degrade)
when asynchronous shader compilation is enabled. This blocks
specifically those drivers from using this feature. This affects
AMDGPU-PRO on Linux, and AMD's and Intel's OpenGL drivers on Windows.
ASTC texture decoding is currently handled by a CPU decoder for GPU's without native ASTC decoding support (most desktop GPUs). This is the cause for noticeable performance degradation in titles which use the format extensively.
This commit adds support to accelerate ASTC decoding using a compute shader on OpenGL for GPUs without native support.
In order to force the BGRA8 conversion on Nvidia using OpenGL, we need to forbid texture copies and views with other formats.
This commit also adds a boolean relating to this, as this needs to be done only for the OpenGL api, Vulkan must remain unchanged.
OpenGL does not natively support BGR internal formats, which causes many BGR textures to render incorrectly, with Red and Blue channels swapped.
This commit aims to address this by swizzling the blue and red channels on texture copies when a BGR format is encountered.
- Uses a fixed 64MB for the cache instead of an ever growing map.
- Slightly faster by using atomics instead of a single mutex for access.
- Thanks for Rodrigo for the idea.
Some games benefit from skipping caches (Pokémon Sword), and others
don't (Animal Crossing: New Horizons). Add an heuristic to decide this
at runtime.
The cache hit ratio has to be ~98% or better to not skip the cache.
There are 16 frames of buffer.
This commit removes early placeholders for an implementation of async nvdec. With recent changes to the source code, the placeholders are no longer accurate, and can cause a nullptr dereference due to the nature of the cdma_pusher lifetime.
src/video_core/shader_notify.cpp: In member function 'void VideoCore::ShaderNotify::MarkShaderComplete()':
src/video_core/shader_notify.cpp:33:10: error: 'unique_lock' is not a member of 'std'
33 | std::unique_lock lock{mutex};
| ^~~~~~~~~~~
src/video_core/shader_notify.cpp:6:1: note: 'std::unique_lock' is defined in header '<mutex>'; did you forget to '#include <mutex>'?
5 | #include "video_core/shader_notify.h"
+++ |+#include <mutex>
6 |
src/video_core/shader_notify.cpp: In member function 'void VideoCore::ShaderNotify::MarkSharderBuilding()':
src/video_core/shader_notify.cpp:38:10: error: 'unique_lock' is not a member of 'std'
38 | std::unique_lock lock{mutex};
| ^~~~~~~~~~~
src/video_core/shader_notify.cpp:38:10: note: 'std::unique_lock' is defined in header '<mutex>'; did you forget to '#include <mutex>'?
This creates non-sRGB texture views for sRGB texture formats to allow for interfacing with these views in compute shaders using imageLoad and imageStore.
Co-Authored-By: Rodrigo Locatti <reinuseslisp@airmail.cc>
Load the current tick to a local variable, moving it out of an atomic
and allowing us to compare the value without going through a pointer
each time. This should make the loop more optimizable.
Fix a tragic off-by-one condition that causes Vulkan's stream buffer to
think it's always full, using fallback memory. The OpenGL was also
affected by this bug to a lesser extent.
We are already using robustness2 features without requiring it
explicitly, causing potential crashes on drivers without the extension.
Requiring this at boot allows better diagnostics for it and formalizes
our usage on the extension.
There was still a code path that could wait on a timeline semaphore tick
that would never be signalled.
While we are at it, make use of more STL algorithms.
Games can bind a null index buffer (size=0) where all indices are
evaluated as zero. VK_EXT_robustness2 doesn't support this and all
drivers segfault when a null index buffer is passed to
vkCmdBindIndexBuffer.
Workaround this by creating a 4 byte buffer and filling it with zeroes.
If it's read out of bounds, robustness takes care of returning zeroes as
indices.
Bind extra bytes beyond the guest API's bound range.
This is due to some games like Astral Chain operating out of bounds.
Binding the whole map range would be technically correct, but games
have large maps that make this approach unaffordable for now.
Avoids waiting idle while the GPU finishes to do work, and fixes an
issue where we'd wait forever if a single command buffer (logic tick)
all the data.
Detect when a memory region has been joined several times and increase
the size of the created buffer on those instances. The buffer is assumed
to be a "stream buffer", increasing its size should stop us from
constantly recreating it and fragmenting memory.
Ports from OpenGL the optimization to skip small 3D uniform buffer
uploads. This will take advantage of the previously introduced stream
buffer.
Fixes instances where the staging buffer offset was being ignored.
This uses a ring buffer similar to OpenGL's stream buffer for small
uploads. This stops us from allocating several small buffers, reducing
memory fragmentation and cache locality.
It uses dedicated allocations when possible.
Reimplement the buffer cache using cached bindings and page level
granularity for modification tracking. This also drops the usage of
shared pointers and virtual functions from the cache.
- Bindings are cached, allowing to skip work when the game changes few
bits between draws.
- OpenGL Assembly shaders no longer copy when a region has been modified
from the GPU to emulate constant buffers, instead GL_EXT_memory_object
is used to alias sub-buffers within the same allocation.
- OpenGL Assembly shaders stream constant buffer data using
glProgramBufferParametersIuivNV, from NV_parameter_buffer_object. In
theory this should save one hash table resolve inside the driver
compared to glBufferSubData.
- A new OpenGL stream buffer is implemented based on fences for drivers
that are not Nvidia's proprietary, due to their low performance on
partial glBufferSubData calls synchronized with 3D rendering (that
some games use a lot).
- Most optimizations are shared between APIs now, allowing Vulkan to
cache more bindings than before, skipping unnecesarry work.
This commit adds the necessary infrastructure to use Vulkan object from
OpenGL. Overall, it improves performance and fixes some bugs present on
the old cache. There are still some edge cases hit by some games that
harm performance on some vendors, this are planned to be fixed in later
commits.
Workaround an issue on Nvidia where creating a Vulkan instance from an
active OpenGL thread disables threaded optimization on the driver.
This optimization is important to have good performance on Nvidia
OpenGL.
Instead of using a two step initialization to report errors, initialize
the GPU renderer and rasterizer on the constructor and report errors
through std::runtime_error.
Some games usually write memory pages currently used by the GPU, causing
rendering issues (e.g. flashing geometry and shadows on Link's
Awakening). To workaround this issue, Guest CPU writes are delayed until
the command buffer finishes processing, but the pages are updated
immediately.
The overall behavior is:
- CPU writes are cached until they are flushed, they update the page
state, but don't change the modification state. Cached writes stop
pages from being flushed, in case games have meaningful data in it.
- Command processing writes (e.g. push constants) update the page state
and are marked to the command processor as dirty. They don't remove
the state of cached writes.
Also renames related CMake variables to match both the Find*FFmpeg* and
variables defined within the file. Fixes odd errors produced by the old
FindFFmpeg.
Citra's FindFFmpeg is slightly modified here: adds Citra's copyright at
the beginning, renames FFmpeg_INCLUDES to FFmpeg_INCLUDE_DIR, disables a
few components in _FFmpeg_ALL_COMPONENTS, and adds the missing avutil
component to the comment above.
For Linux, instructs CMake to use the FFmpeg submodule in externals.
This is HEAVILY based on our usage of the late Unicorn. Minimal change
to MSVC as it uses the yuzu-emu/ext-windows-bin. MinGW now targets the
same ext-windows-bin libraries as MSVC for FFmpeg. Adds FFMPEG_LIBRARIES
to WIN32 and simplifies video_core/CMakeLists.txt a bit.
Vulkan 1.0 didn't support creating sRGB image views on an ABGR8 VkImage
with storage usage bits. VK_KHR_maintenance2 addressed this allowing to
reduce the usage bits on a VkImageView.
To allow image store on non-sRGB image views when the VkImage is created
with sRGB, always create VkImages without sRGB and add the sRGB format
on the view.
The VertexA stage is not yet implemented, but Vulkan is adding its
descriptors, causing a discrepancy in the pushed descriptors and the
template. This generally ends up in a driver side crash.
Bypass the VertexA stage for now.
When we copy into a buffer, it might contain data modified from the GPU
on the same pages. Because of this, we have to flush the contents before
writing new data.
An alternative approach would be to write the data in place, but games
can also write data in other ways, invalidating our contents.
Fixes geometry in Zombie Panic in Wonderland DX.
Setting __GL_SHADER_DISK_CACHE_PATH we can force the cache directory to
be in yuzu's user directory to stop commonly distributed malware from
deleting our driver shader cache. And by setting
__GL_SHADER_DISK_CACHE_SKIP_CLEANUP we can have an unbounded shader
cache size.
This has only been implemented on Windows, mostly because previous tests
didn't seem to work on Linux.
Disable the precompiled cache on Nvidia's driver. There's no need to
hide information the driver already has in its own cache.
Silence the new validation layer error about SPIR-V not allowing OpUndef
on a OpTypeVoid, even when the SPIR-V spec doesn't say anything against
it.
They will be inserted as an undefined int to avoid SPIRV-Cross and
validation errors, but only when a debugging tool is attached.
Allow users of the allocator to hint memory usage for downloads. This
removes the non-descriptive boolean passed for "host visible" or not
host visible memory commits, and uses an enum to hint device local,
upload and download usages.
Fix a bug where the memory allocator could leave gaps between commits.
To fix this the allocation algorithm was reworked, although it's still
short in number of lines of code.
Rework the allocation API to self-contained movable objects instead of
naively using an unique_ptr to do the job for us. Remove the VK prefix.
Stops us from merging code with unused functions in the future.
If something is invoked behind conditionally evaluated code in
a way that the language can't see it (e.g. preprocessor macros), the
potentially unused function should use [[maybe_unused]].
It keeps track of the modified CPU and GPU ranges on a CPU page
granularity, notifying the given rasterizer about state changes
in the tracking behavior of the buffer.
Use a small vector optimization to store buffers smaller than 256 KiB
locally instead of using free store memory allocations.
With timeline semaphores we can avoid creating objects. Instead of
creating an event, grab the current tick from the scheduler and flush
the current command buffer. When the fence has to be queried/waited, we
can do so against the master semaphore instead of spinning on an event.
If Vulkan supported NVN like events or fences, we could signal from the
command buffer and wait for that without splitting things in two
separate command buffers.
Intel and AMD proprietary drivers are incapable of rendering to texture
views of different formats than the original texture. Avoid creating
these at a cache level. This will consume more memory, emulating them
with copies.
This breaks accelerated decoders trying to imageStore into images with
sRGB. The decoders are currently disabled so this won't cause issues at
runtime.
The "VK" prefix predates the "Vulkan" namespace. It was carried around
the codebase for consistency. "VKDevice" currently is a bad alias with
"VkDevice" (only an upcase character of difference) that can cause
confusion. Rename all instances of it.
For listing the available physical devices we can use Vulkan 1.0.
Now that MoltenVK supports 1.1 we can require it for running games.
Add missing documentation.
VKDevice::IsSuitable was not being called. To address this issue, check
suitability before initialization and throw an exception if it fails.
By doing this, we can deduplicate some code on queue searches.
Previosuly we would first search if a present and graphics queue
existed, then on initialization we would search again to find the index.
Report device enumeration errors with exceptions to be consistent with
other initialization related function calls. Reduces the amount of code
to maintain.
Move surface initialization code to a separate file. It's unlikely to
use this code outside of Vulkan, but keeping platform-specific code
(Win32, Xlib, Wayland) in its own translation unit keeps things cleaner.
Move more Vulkan code to report errors with exceptions and report them
through a log before notifying it with an error boolean for backwards
compatibility. In the future we can replace the rasterizer two-step
initialization to always use exceptions.
Initialize debug callbacks (messenger) from a separate file. This allows
sharing code with different backends.
Change our Vulkan error handling to use exceptions instead of error
codes, simplifying the initialization process.
The current texture cache has several points that hurt maintainability
and performance. It's easy to break unrelated parts of the cache
when doing minor changes. The cache can easily forget valuable
information about the cached textures by CPU writes or simply by its
normal usage.The current texture cache has several points that hurt
maintainability and performance. It's easy to break unrelated parts
of the cache when doing minor changes. The cache can easily forget
valuable information about the cached textures by CPU writes or simply
by its normal usage.
This commit aims to address those issues.
Without using VK_EXT_robustness2, we can't consider the 'enabled' (not
null) vertex buffers as dynamic state, as this leads to invalid Vulkan
state. Move this to static state that is always hashed and compared in
the pipeline key.
The bits for enabled vertex buffers are moved into the attribute state
bitfield. This is not 'correct' as it's not an attribute state, but that
struct has bits to spare, and it's used in an array of 32 elements (the
exact same number of vertex buffer bindings).
Most of the time people write code that always returns a value,
terminates execution, throws an exception, or uses an unconventional
jump primitive.
This is not always true when we build without asserts on mainline builds.
To avoid introducing undefined behavior on our most used builds, enforce
this warning signalling an error and stopping the build from shipping.