Commit graph

33 commits

Author SHA1 Message Date
riperiperi e20abbf9cc
Vulkan: Don't flush commands when creating most sync (#4087)
* Vulkan: Don't flush commands when creating most sync

When the WaitForIdle method is called, we create sync as some internal GPU method may read back written buffer data. Some games randomly intersperse compute dispatch into their render passes, which result in this happening an unbounded number of times depending on how many times they run compute.

Creating sync in Vulkan is expensive, as we need to flush the current command buffer so that it can be waited on. We have a limited number of active command buffers due to how we track resource usage, so submitting too many command buffers will force us to wait for them to return to the pool.

This PR allows less "important" sync (things which are less likely to be waited on) to wait on a command buffer's result without submitting it, instead relying on AutoFlush or another, more important sync to flush it later on.

Because of the possibility of us waiting for a command buffer that hasn't submitted yet, any thread needs to be able to force the active command buffer to submit. The ability to do this has been added to the backend multithreading via an "Interrupt", though it is not supported without multithreading.

OpenGL drivers should already be doing something similar so they don't blow up when creating lots of sync, which is why this hasn't been a problem for these games over there.

Improves Vulkan performance on Xenoblade DE, Pokemon Scarlet/Violet, and Zelda BOTW (still another large issue here)

* Add strict argument

This is technically a separate concern from whether the sync is a host syncpoint.

* Remove _interrupted variable

* Actually wait for the invoke

This is required by AMD GPUs, and also may have caused some issues on other GPUs.

* Remove unused using.

* I don't know why it added these ones.

* Address Feedback

* Fix typo
2022-12-29 15:39:04 +01:00
riperiperi 458452279c
GPU: Track buffer migrations and flush source on incomplete copy (#3952)
* Track buffer migrations and flush source on incomplete copy

Makes sure that the modified range list is always from the latest iteration of the buffer, and flushes earlier iterations of a buffer if the data has not been migrated yet.

* Cleanup 1

* Reduce cost for redundant signal checks on Vulkan

* Only inherit the range list if there are pending ranges.

* Fix OpenGL

* Address Feedback

* Whoops
2022-12-01 16:30:13 +01:00
gdkchan 88a0e720cb
Use RGBA16 vertex format if RGB16 is not supported on Vulkan (#3552)
* Use RGBA16 vertex format if RGB16 is not supported on Vulkan

* Catch all shader compilation exceptions
2022-08-20 16:20:27 -03:00
riperiperi 9ba73ffbe5
Prefetch capabilities before spawning translation threads. (#3338)
* Prefetch capabilities before spawning translation threads.

The Backend Multithreading only expects one thread to submit commands at a time. When compiling shaders, the translator may request the host GPU capabilities from the backend. It's possible for a bunch of translators to do this at the same time.

There's a caching mechanism in place so that the capabilities are only fetched once. By triggering this before spawning the thread, the async translation threads no longer try to queue onto the backend queue all at the same time.

The Capabilities do need to be checked from the GPU thread, due to OpenGL needing a context to check them, so it's not possible to call the underlying backend directly.

* Initialize the capabilities when setting the GPU thread + missing call in headless

* Remove private variables
2022-05-14 11:58:33 -03:00
gdkchan 43ebd7a9bb
New shader cache implementation (#3194)
* New shader cache implementation

* Remove some debug code

* Take transform feedback varying count into account

* Create shader cache directory if it does not exist + fragment output map related fixes

* Remove debug code

* Only check texture descriptors if the constant buffer is bound

* Also check CPU VA on GetSpanMapped

* Remove more unused code and move cache related code

* XML docs + remove more unused methods

* Better codegen for TransformFeedbackDescriptor.AsSpan

* Support migration from old cache format, remove more unused code

Shader cache rebuild now also rewrites the shared toc and data files

* Fix migration error with BRX shaders

* Add a limit to the async translation queue

 Avoid async translation threads not being able to keep up and the queue growing very large

* Re-create specialization state on recompile

This might be required if a new version of the shader translator requires more or less state, or if there is a bug related to the GPU state access

* Make shader cache more error resilient

* Add some missing XML docs and move GpuAccessor docs to the interface/use inheritdoc

* Address early PR feedback

* Fix rebase

* Remove IRenderer.CompileShader and IShader interface, replace with new ShaderSource struct passed to CreateProgram directly

* Handle some missing exceptions

* Make shader cache purge delete both old and new shader caches

* Register textures on new specialization state

* Translate and compile shaders in forward order (eliminates diffs due to different binding numbers)

* Limit in-flight shader compilation to the maximum number of compilation threads

* Replace ParallelDiskCacheLoader state changed event with a callback function

* Better handling for invalid constant buffer 1 data length

* Do not create the old cache directory structure if the old cache does not exist

* Constant buffer use should be per-stage. This change will invalidate existing new caches (file format version was incremented)

* Replace rectangle texture with just coordinate normalization

* Skip incompatible shaders that are missing texture information, instead of crashing

This is required if we, for example, support new texture instruction to the shader translator, and then they allow access to textures that were not accessed before. In this scenario, the old cache entry is no longer usable

* Fix coordinates normalization on cubemap textures

* Check if title ID is null before combining shader cache path

* More robust constant buffer address validation on spec state

* More robust constant buffer address validation on spec state (2)

* Regenerate shader cache with one stream, rather than one per shader.

* Only create shader cache directory during initialization

* Logging improvements

* Proper shader program disposal

* PR feedback, and add a comment on serialized structs

* XML docs for RegisterTexture

Co-authored-by: riperiperi <rhy3756547@hotmail.com>
2022-04-10 10:49:44 -03:00
Mary 6dffe0fad4
misc: Make PID unsigned long instead of long (#3043) 2022-02-09 17:18:07 -03:00
riperiperi c52158b733
Add timestamp to 16-byte/4-word semaphore releases. (#3049)
* Add timestamp to 16-byte semaphore releases.

BOTW was reading a ulong 8 bytes after a semaphore return. Turns out this is the timestamp it was trying to do performance calculation with, so I've made it write when necessary.

This mode was also added to the DMA semaphore I added recently, as it is required by a few games. (i think quake?)

The timestamp code has been moved to GPU context. Check other games with an unusually low framerate cap or dynamic resolution to see if they have improved.

* Cast dma semaphore payload to ulong to fill the space

* Write timestamp first

Might be just worrying too much, but we don't want the applcation reading timestamp if it sees the payload before timestamp is written.
2022-01-27 22:50:32 +01:00
gdkchan 42c75dbb8f
Add support for BC1/2/3 decompression (for 3D textures) (#2987)
* Add support for BC1/2/3 decompression (for 3D textures)

* Optimize and clean up

* Unsafe not needed here

* Fix alpha value interpolation when a0 <= a1
2022-01-22 19:23:00 +01:00
riperiperi cda659955c
Texture Sync, incompatible overlap handling, data flush improvements. (#2971)
* Initial test for texture sync

* WIP new texture flushing setup

* Improve rules for incompatible overlaps

Fixes a lot of issues with Unreal Engine games. Still a few minor issues (some caused by dma fast path?) Needs docs and cleanup.

* Cleanup, improvements

Improve rules for fast DMA

* Small tweak to group together flushes of overlapping handles.

* Fixes, flush overlapping texture data for ASTC and BC4/5 compressed textures.

Fixes the new Life is Strange game.

* Flush overlaps before init data, fix 3d texture size/overlap stuff

* Fix 3D Textures, faster single layer flush

Note: nosy people can no longer merge this with Vulkan. (unless they are nosy enough to implement the new backend methods)

* Remove unused method

* Minor cleanup

* More cleanup

* Use the More Fun and Hopefully No Driver Bugs method for getting compressed tex too

This one's for metro

* Address feedback, ASTC+ETC to FormatClass

* Change offset to use Span slice rather than IntPtr Add

* Fix this too
2022-01-09 13:28:48 -03:00
riperiperi ec3e848d79
Add a Multithreading layer for the GAL, multi-thread shader compilation at runtime (#2501)
* Initial Implementation

About as fast as nvidia GL multithreading, can be improved with faster command queuing.

* Struct based command list

Speeds up a bit. Still a lot of time lost to resource copy.

* Do shader init while the render thread is active.

* Introduce circular span pool V1

Ideally should be able to use structs instead of references for storing these spans on commands. Will try that next.

* Refactor SpanRef some more

Use a struct to represent SpanRef, rather than a reference.

* Flush buffers on background thread

* Use a span for UpdateRenderScale.

Much faster than copying the array.

* Calculate command size using reflection

* WIP parallel shaders

* Some minor optimisation

* Only 2 max refs per command now.

The command with 3 refs is gone. 😌

* Don't cast on the GPU side

* Remove redundant casts, force sync on window present

* Fix Shader Cache

* Fix host shader save.

* Fixup to work with new renderer stuff

* Make command Run static, use array of delegates as lookup

Profile says this takes less time than the previous way.

* Bring up to date

* Add settings toggle. Fix Muiltithreading Off mode.

* Fix warning.

* Release tracking lock for flushes

* Fix Conditional Render fast path with threaded gal

* Make handle iteration safe when releasing the lock

This is mostly temporary.

* Attempt to set backend threading on driver

Only really works on nvidia before launching a game.

* Fix race condition with BufferModifiedRangeList, exceptions in tracking actions

* Update buffer set commands

* Some cleanup

* Only use stutter workaround when using opengl renderer non-threaded

* Add host-conditional reservation of counter events

There has always been the possibility that conditional rendering could use a query object just as it is disposed by the counter queue. This change makes it so that when the host decides to use host conditional rendering, the query object is reserved so that it cannot be deleted. Counter events can optionally start reserved, as the threaded implementation can reserve them before the backend creates them, and there would otherwise be a short amount of time where the counter queue could dispose the event before a call to reserve it could be made.

* Address Feedback

* Make counter flush tracked again.

Hopefully does not cause any issues this time.

* Wait for FlushTo on the main queue thread.

Currently assumes only one thread will want to FlushTo (in this case, the GPU thread)

* Add SDL2 headless integration

* Add HLE macro commands.

Co-authored-by: Mary <mary@mary.zone>
2021-08-27 00:31:29 +02:00
gdkchan 40b21cc3c4
Separate GPU engines (part 2/2) (#2440)
* 3D engine now uses DeviceState too, plus new state modification tracking

* Remove old methods code

* Remove GpuState and friends

* Optimize DeviceState, force inline some functions

* This change was not supposed to go in

* Proper channel initialization

* Optimize state read/write methods even more

* Fix debug build

* Do not dirty state if the write is redundant

* The YControl register should dirty either the viewport or front face state too, to update the host origin

* Avoid redundant vertex buffer updates

* Move state and get rid of the Ryujinx.Graphics.Gpu.State namespace

* Comments and nits

* Fix rebase

* PR feedback

* Move changed = false to improve codegen

* PR feedback

* Carry RyuJIT a bit more
2021-07-11 17:20:40 -03:00
gdkchan fbb4019ed5
Initial support for separate GPU address spaces (#2394)
* Make GPU memory manager a member of GPU channel

* Move physical memory instance to the memory manager, and the caches to the physical memory

* PR feedback
2021-06-29 19:32:02 +02:00
gdkchan a10b2c5ff2
Initial support for GPU channels (#2372)
* Ground work for separate GPU channels

* Rename TextureManager to TextureCache

* Decouple texture bindings management from the texture cache

* Rename BufferManager to BufferCache

* Decouple buffer bindings management from the buffer cache

* More comments and proper disposal

* PR feedback

* Force host state update on channel switch

* Typo

* PR feedback

* Missing using
2021-06-24 01:51:41 +02:00
riperiperi 54ea2285f0
POWER - Performance Optimizations With Extensive Ramifications (#2286)
* Refactoring of KMemoryManager class

* Replace some trivial uses of DRAM address with VA

* Get rid of GetDramAddressFromVa

* Abstracting more operations on derived page table class

* Run auto-format on KPageTableBase

* Managed to make TryConvertVaToPa private, few uses remains now

* Implement guest physical pages ref counting, remove manual freeing

* Make DoMmuOperation private and call new abstract methods only from the base class

* Pass pages count rather than size on Map/UnmapMemory

* Change memory managers to take host pointers

* Fix a guest memory leak and simplify KPageTable

* Expose new methods for host range query and mapping

* Some refactoring of MapPagesFromClientProcess to allow proper page ref counting and mapping without KPageLists

* Remove more uses of AddVaRangeToPageList, now only one remains (shared memory page checking)

* Add a SharedMemoryStorage class, will be useful for host mapping

* Sayonara AddVaRangeToPageList, you served us well

* Start to implement host memory mapping (WIP)

* Support memory tracking through host exception handling

* Fix some access violations from HLE service guest memory access and CPU

* Fix memory tracking

* Fix mapping list bugs, including a race and a error adding mapping ranges

* Simple page table for memory tracking

* Simple "volatile" region handle mode

* Update UBOs directly (experimental, rough)

* Fix the overlap check

* Only set non-modified buffers as volatile

* Fix some memory tracking issues

* Fix possible race in MapBufferFromClientProcess (block list updates were not locked)

* Write uniform update to memory immediately, only defer the buffer set.

* Fix some memory tracking issues

* Pass correct pages count on shared memory unmap

* Armeilleure Signal Handler v1 + Unix changes

Unix currently behaves like windows, rather than remapping physical

* Actually check if the host platform is unix

* Fix decommit on linux.

* Implement windows 10 placeholder shared memory, fix a buffer issue.

* Make PTC version something that will never match with master

* Remove testing variable for block count

* Add reference count for memory manager, fix dispose

Can still deadlock with OpenAL

* Add address validation, use page table for mapped check, add docs

Might clean up the page table traversing routines.

* Implement batched mapping/tracking.

* Move documentation, fix tests.

* Cleanup uniform buffer update stuff.

* Remove unnecessary assignment.

* Add unsafe host mapped memory switch

On by default. Would be good to turn this off for untrusted code (homebrew, exefs mods) and give the user the option to turn it on manually, though that requires some UI work.

* Remove C# exception handlers

They have issues due to current .NET limitations, so the meilleure one fully replaces them for now.

* Fix MapPhysicalMemory on the software MemoryManager.

* Null check for GetHostAddress, docs

* Add configuration for setting memory manager mode (not in UI yet)

* Add config to UI

* Fix type mismatch on Unix signal handler code emit

* Fix 6GB DRAM mode.

The size can be greater than `uint.MaxValue` when the DRAM is >4GB.

* Address some feedback.

* More detailed error if backing memory cannot be mapped.

* SetLastError on all OS functions for consistency

* Force pages dirty with UBO update instead of setting them directly.

Seems to be much faster across a few games. Need retesting.

* Rebase, configuration rework, fix mem tracking regression

* Fix race in FreePages

* Set memory managers null after decrementing ref count

* Remove readonly keyword, as this is now modified.

* Use a local variable for the signal handler rather than a register.

* Fix bug with buffer resize, and index/uniform buffer binding.

Should fix flickering in games.

* Add InvalidAccessHandler to MemoryTracking

Doesn't do anything yet

* Call invalid access handler on unmapped read/write.

Same rules as the regular memory manager.

* Make unsafe mapped memory its own MemoryManagerType

* Move FlushUboDirty into UpdateState.

* Buffer dirty cache, rather than ubo cache

Much cleaner, may be reusable for Inline2Memory updates.

* This doesn't return anything anymore.

* Add sigaction remove methods, correct a few function signatures.

* Return empty list of physical regions for size 0.

* Also on AddressSpaceManager

Co-authored-by: gdkchan <gab.dark.100@gmail.com>
2021-05-24 22:52:44 +02:00
mageven 69f8722e79
Fix inconsistencies in progress reporting (#2129)
Signal and setup events correctly
Eliminate possible races
Use a single event
Mark volatiles and reduce scope of waithandles
Common handler
100ms -> 50ms
2021-03-22 19:40:07 +01:00
mageven ca5d8e58dd
Add progress reporting to PTC and Shader Cache (#2057)
* UI changes

* Add progress reporting to PTC & ShaderCache

* Account for null events and expand docs

Co-authored-by: Joshi234 <46032261+Joshi234@users.noreply.github.com>
2021-03-03 01:39:36 +01:00
riperiperi a1f77a5b6a
Implement lazy flush-on-read for Buffers (SSBO/Copy) (#1790)
* Initial implementation of buffer flush (VERY WIP)

* Host shaders need to be rebuilt for the SSBO write flag.

* New approach with reserved regions and gl sync

* Fix a ton of buffer issues.

* Remove unused buffer unmapped behaviour

* Revert "Remove unused buffer unmapped behaviour"

This reverts commit f1700e52fb8760180ac5e0987a07d409d1e70ece.

* Delete modified ranges on unmap

Fixes potential crashes in Super Smash Bros, where a previously modified range could lie on either side of an unmap.

* Cache some more delegates.

* Dispose Sync on Close

* Also create host sync for GPFifo syncpoint increment.

* Copy buffer optimization, add docs

* Fix race condition with OpenGL Sync

* Enable read tracking on CommandBuffer, insert syncpoint on WaitForIdle

* Performance: Only flush individual pages of SSBO at a time

This avoids flushing large amounts of data when only a small amount is actually used.

* Signal Modified rather than flushing after clear

* Fix some docs and code style.

* Introduce a new test for tracking memory protection.

Sucessfully demonstrates that the bug causing write protection to be cleared by a read action has been fixed. (these tests fail on master)

* Address Comments

* Add host sync for SetReference

This ensures that any indirect draws will correctly flush any related buffer data written before them. Fixes some flashing and misplaced world geometry in MH rise.

* Make PageAlign static

* Re-enable read tracking, for reads.
2021-01-17 17:08:06 -03:00
Mary 863edae328
shader cache: Fix Linux boot issues (#1709)
* shader cache: Fix Linux boot issues

This rollback the init logic back to previous state, and replicate the
way PTC handle initialization.

* shader cache: set default state of ready for translation event to false

* Fix cpu unit tests
2020-11-17 22:40:19 +01:00
Mary 48f6570557
Salieri: shader cache (#1701)
Here come Salieri, my implementation of a disk shader cache!

"I'm sure you know why I named it that."
"It doesn't really mean anything."

This implementation collects shaders at runtime and cache them to be later compiled when starting a game.
2020-11-13 00:15:34 +01:00
gdkchan 111534a74e
Remove GPU MemoryAccessor (#1423)
* Remove GPU MemoryAccessor

* Update outdated XML doc

* Update more outdated stuff
2020-07-25 16:39:45 +10:00
gdkchan 5a7df48975
New GPFifo and fast guest constant buffer updates (#1400)
* Add new structures from official docs, start migrating GPFifo

* Finish migration to new GPFifo processor

* Implement fast constant buffer data upload

* Migrate to new GPFifo class

* XML docs
2020-07-23 23:53:25 -03:00
gdkchan 4d02a2d2c0
New NVDEC and VIC implementation (#1384)
* Initial NVDEC and VIC implementation

* Update FFmpeg.AutoGen to 4.3.0

* Add nvdec dependencies for Windows

* Unify some VP9 structures

* Rename VP9 structure fields

* Improvements to Video API

* XML docs for Common.Memory

* Remove now unused or redundant overloads from MemoryAccessor

* NVDEC UV surface read/write scalar paths

* Add FIXME comments about hacky things/stuff that will need to be fixed in the future

* Cleaned up VP9 memory allocation

* Remove some debug logs

* Rename some VP9 structs

* Remove unused struct

* No need to compile Ryujinx.Graphics.Host1x with unsafe anymore

* Name AsyncWorkQueue threads to make debugging easier

* Make Vp9PictureInfo a ref struct

* LayoutConverter no longer needs the depth argument (broken by rebase)

* Pooling of VP9 buffers, plus fix a memory leak on VP9

* Really wish VS could rename projects properly...

* Address feedback

* Remove using

* Catch OperationCanceledException

* Add licensing informations

* Add THIRDPARTY.md to release too

Co-authored-by: Thog <me@thog.eu>
2020-07-12 05:07:01 +02:00
gdkchan f77694e4f7
Implement a new physical memory manager and replace DeviceMemory (#856)
* Implement a new physical memory manager and replace DeviceMemory

* Proper generic constraints

* Fix debug build

* Add memory tests

* New CPU memory manager and general code cleanup

* Remove host memory management from CPU project, use Ryujinx.Memory instead

* Fix tests

* Document exceptions on MemoryBlock

* Fix leak on unix memory allocation

* Proper disposal of some objects on tests

* Fix JitCache not being set as initialized

* GetRef without checks for 8-bits and 16-bits CAS

* Add MemoryBlock destructor

* Throw in separate method to improve codegen

* Address PR feedback

* QueryModified improvements

* Fix memory write tracking not marking all pages as modified in some cases

* Simplify MarkRegionAsModified

* Remove XML doc for ghost param

* Add back optimization to avoid useless buffer updates

* Add Ryujinx.Cpu project, move MemoryManager there and remove MemoryBlockWrapper

* Some nits

* Do not perform address translation when size is 0

* Address PR feedback and format NativeInterface class

* Remove ghost parameter description

* Update Ryujinx.Cpu to .NET Core 3.1

* Address PR feedback

* Fix build

* Return a well defined value for GetPhysicalAddress with invalid VA, and do not return unmapped ranges as modified

* Typo
2020-05-04 08:54:50 +10:00
Thog 644de99e86
Implement GPU syncpoints (#980)
* Implement GPU syncpoints

This adds support for GPU syncpoints on the GPU backend & nvservices.

Everything that was implemented here is based on my researches,
hardware testing of the GM20B and reversing of nvservices (8.1.0).

Thanks to @fincs for the informations about some behaviours of the pusher
and for the initial informations about syncpoints.

* syncpoint: address gdkchan's comments

* Add some missing logic to handle SubmitGpfifo correctly

* Handle the NV event API correctly

* evnt => hostEvent

* Finish addressing gdkchan's comments

* nvservices: write the output buffer even when an error is returned

* dma pusher: Implemnet prefetch barrier

lso fix when the commands should be prefetch.

* Partially fix prefetch barrier

* Add a missing syncpoint check in QueryEvent of NvHostSyncPt

* Address Ac_K's comments and fix GetSyncpoint for ChannelResourcePolicy == Channel

* fix SyncptWait & SyncptWaitEx cmds logic

* Address ripinperi's comments

* Address gdkchan's comments

* Move user event management to the control channel

* Fix mm implementation, nvdec works again

* Address ripinperi's comments

* Address gdkchan's comments

* Implement nvhost-ctrl close accurately + make nvservices dispose channels when stopping the emulator

* Fix typo in MultiMediaOperationType
2020-04-19 11:25:57 +10:00
Thog d6b9babe1d Keep the GUI alive when closing a game (#888)
* Keep the GUI alive when closing a game

Make HLE.Switch init when starting a game and dispose it when closing
the GlScreen.

This also make HLE in charge of disposing the audio and gpu backend.

* Address Ac_k's comments

* Make sure to dispose the Discord module and use GTK quit method

Also update Discord Precense when closing a game.

* Make sure to dispose MainWindow

* Address gdk's comments
2020-01-21 23:23:11 +01:00
gdkchan 0dbfe3c23e Re-add NVDEC project (not integrated) 2020-01-09 02:13:00 +01:00
gdkchan 59fdaa744b GPU resource disposal 2020-01-09 02:13:00 +01:00
gdkchan f7bcc884e4 Add XML documentation to Ryujinx.Graphics.Gpu 2020-01-09 02:13:00 +01:00
gdkchan 647d0962df Initialize GPU physical memory accessor from KProcess, to allow homebrew that never maps anything on the GPU to work 2020-01-09 02:13:00 +01:00
gdk 6a98c643ca Add a pass to turn global memory access into storage access, and do all storage related transformations on IR 2020-01-09 02:13:00 +01:00
gdk 16d88c21fc Improved and simplified window texture presentation 2020-01-09 02:13:00 +01:00
gdk 2437ccca0e Separate sub-channel state 2020-01-09 02:13:00 +01:00
gdk 1876b346fe Initial work 2020-01-09 02:13:00 +01:00