Commit graph

12 commits

Author SHA1 Message Date
riperiperi
bf77d1cab9
GPU: Pass SpanOrArray for Texture SetData to avoid copy (#3745)
* GPU: Pass SpanOrArray for Texture SetData to avoid copy

Texture data is often converted before upload, meaning that an array was allocated to perform the conversion into. However, the backend SetData methods were being passed a Span of that data, and the Multithreaded layer does `ToArray()` on it so that it can be stored for later! This method can't extract the original array, so it creates a copy.

This PR changes the type passed for textures to a new ref struct called SpanOrArray, which is backed by either a ReadOnlySpan or an array. The benefit here is that we can have a ToArray method that doesn't copy if it is originally backed by an array.

This will also avoid a copy when running the ASTC decoder.

On NieR this was taking 38% of texture upload time, which it does a _lot_ of when you move between areas, so there should be a 1.6x performance boost when strictly uploading textures. No doubt this will also improve texture streaming performance in UE4 games, and maybe a small reduction with video playback.

From the numbers, it's probably possible to improve the upload rate by a further 1.6x by performing layout conversion on GPU. I'm not sure if we could improve it further than that - multithreading conversion on CPU would probably result in memory bottleneck.

This doesn't extend to buffers, since we don't convert their data on the GPU emulator side.

* Remove implicit cast to array.
2022-10-08 12:04:47 -03:00
riperiperi
d64594ec74
Fix various issues with texture sync (#3302)
* Fix various issues with texture sync

A variable called _actionRegistered is used to keep track of whether a tracking action has been registered for a given texture group handle. This variable is set when the action is registered, and should be unset when it is consumed. This is used to skip registering the tracking action if it's already registered, saving some time for render targets that are modified very often.

There were two issues with this. The worst issue was that the tracking action handler exits early if the handle's modified flag is false... which means that it never reset _actionRegistered, as that was done within the Sync() method called later. The second issue was that this variable was set true after the sync action was registered, so it was technically possible for the action to run immediately, set the flag to false, then set it to true.

Both situations would lead to the action never being registered again, as the texture group handle would be sure the action is already registered. This breaks the texture for the remaining runtime, or until it is disposed.

It was also possible for a texture to register sync once, then on future frames the last modified sync number did not update. This may have caused some more minor issues.

Seems to fix the Xenoblade flashing bug. Obviously this needs a lot of testing, since it was random chance. I typically had the most luck getting it to happen by switching time of day on the event theatre screen for a while, then entering the equipment screen by pressing X on an event.

May also fix weird things like random chance air swimming in BOTW, maybe a few texture streaming bugs.

* Exchange rather than CompareExchange
2022-04-29 18:34:11 -03:00
gdkchan
0a24aa6af2
Allow textures to have their data partially mapped (#2629)
* Allow textures to have their data partially mapped

* Explicitly check for invalid memory ranges on the MultiRangeList

* Update GetWritableRegion to also support unmapped ranges
2022-02-22 13:34:16 -03:00
riperiperi
cda659955c
Texture Sync, incompatible overlap handling, data flush improvements. (#2971)
* Initial test for texture sync

* WIP new texture flushing setup

* Improve rules for incompatible overlaps

Fixes a lot of issues with Unreal Engine games. Still a few minor issues (some caused by dma fast path?) Needs docs and cleanup.

* Cleanup, improvements

Improve rules for fast DMA

* Small tweak to group together flushes of overlapping handles.

* Fixes, flush overlapping texture data for ASTC and BC4/5 compressed textures.

Fixes the new Life is Strange game.

* Flush overlaps before init data, fix 3d texture size/overlap stuff

* Fix 3D Textures, faster single layer flush

Note: nosy people can no longer merge this with Vulkan. (unless they are nosy enough to implement the new backend methods)

* Remove unused method

* Minor cleanup

* More cleanup

* Use the More Fun and Hopefully No Driver Bugs method for getting compressed tex too

This one's for metro

* Address feedback, ASTC+ETC to FormatClass

* Change offset to use Span slice rather than IntPtr Add

* Fix this too
2022-01-09 13:28:48 -03:00
riperiperi
b6e093b0fc
Force copy when auto-deleting a texture with dependencies (#2687)
When a texture is deleted by falling to the bottom of the AutoDeleteCache, its data is flushed to preserve any GPU writes that occurred. This ensures that the data appears in any textures recreated in the future, but didn't account for a texture that already existed with a copy dependency.

This change forces copy dependencies to complete if a texture falls out from from the AutoDeleteCache. (not removed via overlap, as that would be wasted effort)

Fixes broken lighting caused by pausing in SMO's Metro Kingdom. May fix some other issues.
2021-09-29 02:11:05 +02:00
riperiperi
bdc1f91a5b
Remove pool cache entries for incompatible overlapping textures (#2568)
This greatly reduces memory usage in games that aggressively reuse memory without removing dead textures from the pool, such as the Xenoblade games, UE3 games, and to a lesser extent, UE4/unity games.

This change stops memory usage from ballooning in xenoblade and some other games. It will also reduce texture view/dependency complexity in some games - for example in MK8D it will reduce the number of surface copies between lighting cubemaps generated for actors.

There shouldn't be any performance impact from doing this, though the deletion and creation of textures could be improved by improving the OpenGL texture storage cache, which is very simple and limited right now. This will be improved in future.

Another potential error has been fixed with the texture cache, which could prevent data loss when data is interchangably written to textures from both the GPU and CPU. It was possible that the dirty flag for a texture would be consumed without the data being synchronized on next use, due to the old overlap check. This check no longer consumes the dirty flag.

Please test a bunch of games to make sure they still work, and there are no performance regressions.
2021-08-20 17:52:09 -03:00
riperiperi
97aedc030d
Fix GetHandleInformation for mipmapped 3d textures (#2569)
Got this the wrong way round - was causing games to try synchronize mipmap levels of like 52 on a 3d texture with 6 levels. Also, corrected the variable name in the method that _was_ working.
2021-08-20 14:59:39 -03:00
gdkchan
fbb4019ed5
Initial support for separate GPU address spaces (#2394)
* Make GPU memory manager a member of GPU channel

* Move physical memory instance to the memory manager, and the caches to the physical memory

* PR feedback
2021-06-29 19:32:02 +02:00
riperiperi
54ea2285f0
POWER - Performance Optimizations With Extensive Ramifications (#2286)
* Refactoring of KMemoryManager class

* Replace some trivial uses of DRAM address with VA

* Get rid of GetDramAddressFromVa

* Abstracting more operations on derived page table class

* Run auto-format on KPageTableBase

* Managed to make TryConvertVaToPa private, few uses remains now

* Implement guest physical pages ref counting, remove manual freeing

* Make DoMmuOperation private and call new abstract methods only from the base class

* Pass pages count rather than size on Map/UnmapMemory

* Change memory managers to take host pointers

* Fix a guest memory leak and simplify KPageTable

* Expose new methods for host range query and mapping

* Some refactoring of MapPagesFromClientProcess to allow proper page ref counting and mapping without KPageLists

* Remove more uses of AddVaRangeToPageList, now only one remains (shared memory page checking)

* Add a SharedMemoryStorage class, will be useful for host mapping

* Sayonara AddVaRangeToPageList, you served us well

* Start to implement host memory mapping (WIP)

* Support memory tracking through host exception handling

* Fix some access violations from HLE service guest memory access and CPU

* Fix memory tracking

* Fix mapping list bugs, including a race and a error adding mapping ranges

* Simple page table for memory tracking

* Simple "volatile" region handle mode

* Update UBOs directly (experimental, rough)

* Fix the overlap check

* Only set non-modified buffers as volatile

* Fix some memory tracking issues

* Fix possible race in MapBufferFromClientProcess (block list updates were not locked)

* Write uniform update to memory immediately, only defer the buffer set.

* Fix some memory tracking issues

* Pass correct pages count on shared memory unmap

* Armeilleure Signal Handler v1 + Unix changes

Unix currently behaves like windows, rather than remapping physical

* Actually check if the host platform is unix

* Fix decommit on linux.

* Implement windows 10 placeholder shared memory, fix a buffer issue.

* Make PTC version something that will never match with master

* Remove testing variable for block count

* Add reference count for memory manager, fix dispose

Can still deadlock with OpenAL

* Add address validation, use page table for mapped check, add docs

Might clean up the page table traversing routines.

* Implement batched mapping/tracking.

* Move documentation, fix tests.

* Cleanup uniform buffer update stuff.

* Remove unnecessary assignment.

* Add unsafe host mapped memory switch

On by default. Would be good to turn this off for untrusted code (homebrew, exefs mods) and give the user the option to turn it on manually, though that requires some UI work.

* Remove C# exception handlers

They have issues due to current .NET limitations, so the meilleure one fully replaces them for now.

* Fix MapPhysicalMemory on the software MemoryManager.

* Null check for GetHostAddress, docs

* Add configuration for setting memory manager mode (not in UI yet)

* Add config to UI

* Fix type mismatch on Unix signal handler code emit

* Fix 6GB DRAM mode.

The size can be greater than `uint.MaxValue` when the DRAM is >4GB.

* Address some feedback.

* More detailed error if backing memory cannot be mapped.

* SetLastError on all OS functions for consistency

* Force pages dirty with UBO update instead of setting them directly.

Seems to be much faster across a few games. Need retesting.

* Rebase, configuration rework, fix mem tracking regression

* Fix race in FreePages

* Set memory managers null after decrementing ref count

* Remove readonly keyword, as this is now modified.

* Use a local variable for the signal handler rather than a register.

* Fix bug with buffer resize, and index/uniform buffer binding.

Should fix flickering in games.

* Add InvalidAccessHandler to MemoryTracking

Doesn't do anything yet

* Call invalid access handler on unmapped read/write.

Same rules as the regular memory manager.

* Make unsafe mapped memory its own MemoryManagerType

* Move FlushUboDirty into UpdateState.

* Buffer dirty cache, rather than ubo cache

Much cleaner, may be reusable for Inline2Memory updates.

* This doesn't return anything anymore.

* Add sigaction remove methods, correct a few function signatures.

* Return empty list of physical regions for size 0.

* Also on AddressSpaceManager

Co-authored-by: gdkchan <gab.dark.100@gmail.com>
2021-05-24 22:52:44 +02:00
riperiperi
9b7335a63b
Improve linear texture compatibility rules (#2099)
* Improve linear texture compatibility rules

Fixes an issue where small or width-aligned (rather than byte aligned) textures would fail to create a view of existing data. Creates a copy dependency as size change may be risky.

* Minor cleanup

* Remove Size Change for Copy Depenedencies

The copy to the target (potentially different sized) texture can properly deal with cropping by itself.

* Move StrideAlignment and GobAlignment into Constants
2021-03-19 02:17:38 +01:00
riperiperi
8d36681bf1
Improve handling for unmapped GPU resources (#2083)
* Improve handling for unmapped GPU resources

- Fixed a memory tracking bug that would set protection on empty PTEs
- When a texture's memory is (partially) unmapped, all pool references are forcibly removed and the texture must be rediscovered to draw with it. This will also force the texture discovery to always compare the texture's range for a match.
- RegionHandles now know if they are unmapped, and automatically unset their dirty flag when unmapped.
- Partial texture sync now loads only the region of texture that has been modified. Unmapped memory tracking handles cause dirty flags for a texture group handle to be ignored.

This greatly improves the emulator's stability for newer UE4 games.

* Address feedback, fix MultiRange slice

Fixed an issue where the size of the multi-range slice would be miscalculated.

* Update Ryujinx.Memory/Range/MultiRange.cs (feedback)

Co-authored-by: Mary <thog@protonmail.com>

Co-authored-by: Mary <thog@protonmail.com>
2021-03-06 11:43:55 -03:00
riperiperi
b530f0e110
Texture Cache: "Texture Groups" and "Texture Dependencies" (#2001)
* Initial implementation (3d tex mips broken)

This works rather well for most games, just need to fix 3d texture mips.

* Cleanup

* Address feedback

* Copy Dependencies and various other fixes

* Fix layer/level offset for copy from view<->view.

* Remove dirty flag from dependency

The dirty flag behaviour is not needed - DeferredCopy is all we need.

* Fix tracking mip slices.

* Propagate granularity (fix astral chain)

* Address Feedback pt 1

* Save slice sizes as part of SizeInfo

* Fix nits

* Fix disposing multiple dependencies causing a crash

This list is obviously modified when removing dependencies, so create a copy of it.
2021-03-02 19:30:54 -03:00