Protection for the `xgetbv` instruction for systems that do not support
`xcr0` such as nehalem processors.
The `XSAVE` cpuid indicates support for `XSAVE`, `XRESTOR`, `XSETBV`,
`XGETBV` while `OSXSAVE` indicates if the operating system itself has
`XSAVE` turned on. Both must be checked at the same time.
* Use source generated json serializers in order to improve code trimming
* Use strongly typed github releases model to fetch updates instead of raw Newtonsoft.Json parsing
* Use separate model for LogEventArgs serialization
* Make dynamic object formatter static. Fix string builder pooling.
* Do not inherit json version of LogEventArgs from EventArgs
* Fix extra space in object formatting
* Write log json directly to stream instead of using buffer writer
* Rebase fixes
* Rebase fixes
* Rebase fixes
* Enforce block-scoped namespaces in the solution. Convert style for existing code
* Apply suggestions from code review
Co-authored-by: TSRBerry <20988865+TSRBerry@users.noreply.github.com>
* Rebase indent fix
* Fix indent
* Delete unnecessary json properties
* Rebase fix
* Remove overridden json property names as they are handled in the options
* Apply suggestions from code review
Co-authored-by: TSRBerry <20988865+TSRBerry@users.noreply.github.com>
* Use default json options in github api calls
* Indentation and spacing fixes
---------
Co-authored-by: TSRBerry <20988865+TSRBerry@users.noreply.github.com>
* ARMeilleure: Add AVX512{F,VL,DQ,BW} detection
Add `UseAvx512Ortho` and `UseAvx512OrthoFloat` optimization flags as
short-hands for `F+VL` and `F+VL+DQ`.
* ARMeilleure: Add initial support for EVEX instruction encoding
Does not implement rounding, or exception controls.
* ARMeilleure: Add `X86Vpternlogd`
Accelerates the vector-`Not` instruction.
* ARMeilleure: Add check for `OSXSAVE` for AVX{2,512}
* ARMeilleure: Add check for `XCR0` flags
Add XCR0 register checks for AVX and AVX512F, following the guidelines
from section 14.3 and 15.2 from the Intel Architecture Software
Developer's Manual.
* ARMeilleure: Remove redundant `ReProtect` and `Dispose`, formatting
* ARMeilleure: Move XCR0 procedure to GetXcr0Eax
* ARMeilleure: Add `XCR0` to `FeatureInfo` structure
* ARMeilleure: Utilize `ReadOnlySpan` for Xcr0 assembly
Avoids an additional allocation
* ARMeilleure: Formatting fixes
* ARMeilleure: Fix EVEX encoding src2 register index
> Just like in VEX prefix, vvvv is provided in inverted form.
* ARMeilleure: Add `X86Vpternlogd` acceleration to `Vmvn_I`
Passes unit tests, verified instruction utilization
* ARMeilleure: Fix EVEX register operand designations
Operand 2 was being sourced improperly.
EVEX encoded instructions source their operands like so:
Operand 1: ModRM:reg
Operand 2: EVEX.vvvvv
Operand 3: ModRM:r/m
Operand 4: Imm
This fixes the improper register designations when emitting vpternlog.
Now "dest", "src1", "src2" arguments emit in the proper order in EVEX instructions.
* ARMeilleure: Add `X86Vpternlogd` acceleration to `Orn_V`
* ARMeilleure: PTC version bump
* ARMeilleure: Update EVEX encoding Debug.Assert to Debug.Fail
* ARMeilleure: Update EVEX encoding comment capitalization
* Initial implementation of migration between memory heaps
- Missing OOM handling
- Missing `_map` data safety when remapping
- Copy may not have completed yet (needs some kind of fence)
- Map may be unmapped before it is done being used. (needs scoped access)
- SSBO accesses are all "writes" - maybe pass info in another way.
- Missing keeping map type when resizing buffers (should this be done?)
* Ensure migrated data is in place before flushing.
* Fix issue where old waitable would be signalled.
- There is a real issue where existing Auto<> references need to be replaced.
* Swap bound Auto<> instances when swapping buffer backing
* Fix conversion buffers
* Don't try move buffers if the host has shared memory.
* Make GPU methods return PinnedSpan with scope
* Storage Hint
* Fix stupidity
* Fix rebase
* Tweak rules
Attempt to sidestep BOTW slowdown
* Remove line
* Migrate only when command buffers flush
* Change backing swap log to debug
* Address some feedback
* Disallow backing swap when the flush lock is held by the current thread
* Make PinnedSpan from ReadOnlySpan explicitly unsafe
* Fix some small issues
- Index buffer swap fixed
- Allocate DeviceLocal buffers using a separate block list to images.
* Remove alternative flags
* Address feedback
* Avoid copying more handles than we have space for
* Use locks instead
* Reduce nesting by combining the lock statements
* Add locks for other uses of _sessionHandles and _portHandles
* Use one object to lock instead of locking twice
* Release the lock as soon as possible
* add RecyclableMemoryStream dependency and MemoryStreamManager
* organize BinaryReader/BinaryWriter extensions
* add StreamExtensions to reduce need for BinaryWriter
* simple replacments of MemoryStream with RecyclableMemoryStream
* add write ReadOnlySequence<byte> support to IVirtualMemoryManager
* avoid 0-length array creation
* rework IpcMessage and related types to greatly reduce memory allocation by using RecylableMemoryStream, keeping streams around longer, avoiding their creation when possible, and avoiding creation of BinaryReader and BinaryWriter when possible
* reduce LINQ-induced memory allocations with custom methods to query KPriorityQueue
* use RecyclableMemoryStream in StreamUtils, and use StreamUtils in EmbeddedResources
* add constants for nanosecond/millisecond conversions
* code formatting
* XML doc adjustments
* fix: StreamExtension.WriteByte not writing non-zero values for lengths <= 16
* XML Doc improvements. Implement StreamExtensions.WriteByte() block writes for large-enough count values.
* add copyless path for StreamExtension.Write(ReadOnlySpan<int>)
* add default implementation of IVirtualMemoryManager.Write(ulong, ReadOnlySequence<byte>); remove previous explicit implementations
* code style fixes
* remove LINQ completely from KScheduler/KPriorityQueue by implementing a custom struct-based enumerator
* GPU: Fast path for adding one texture view to a group
Texture group handles must store a list of their overlapping views, so they can be properly notified when a write is detected, and a few other things relating to texture readback. This is generally created when the group is established, with each handle looping over all views to find its overlaps. This whole process was also done when only a single view was added (and no handles were changed), however...
Sonic Frontiers had a huge cubemap array with 7350 faces (175 cubemaps * 6 faces * 7 levels), so iterating over both handles and existing views added up very fast. Since we are only adding a single view, we only need to _add_ that view to the existing overlaps, rather than recalculate them all.
This greatly improves performance during loading screens and a few seconds into gameplay on the "open zone" sections of Sonic Frontiers. May improve loading times or stutters on some other games.
Note that the current texture cache rules will cause these views to fall out of the cache, as there are more than the hard cap, so the cost will be repaid when reloading the open zone.
I also added some code to properly remove overlaps when texture views are removed, since it seems that was missing.
This can be improved further by only iterating handles that overlap the view (filter by range), but so can a few places in TextureGroup, so better to do all at once. The full generation of overlaps could probably be improved in a similar way.
I recommend testing a few games to make sure nothing breaks.
* Address feedback
* Update sparsely mapped texture ranges without recreating
Important TODO in TexturePool. Smaller TODO: should I look into making textures with views also do this? It needs to be able to detect if the views can be instantly deleted without issue if they're now remapped.
* Actually do partial updates
* Signal group dirty after mappings changed
* Fix various issues (should work now)
* Further optimisation
Should load a lot less data (16x) when partial updating 3d textures.
* Improve stability
* Allow granular uploads on large textures, improve rules
* Actually avoid updating slices that aren't modified.
* Address some feedback, minor optimisation
* Small tweak
* Refactor DereferenceRequest
More specific initialization methods.
* Improve code for resetting handles
* Explain data loading a bit more
* Add some safety for setting null from different threads.
All texture sets come from the one thread, but null sets can come from multiple. Only decrement ref count if we succeeded the null set first.
* Address feedback 1
* Make a bit safer
* GPU: Scale counter results before addition
Counter results were being scaled on ReportCounter, which meant that the _total_ value of the counter was being scaled. Not only could this result in very large numbers and weird overflows if the game doesn't clear the counter, but it also caused the result to change drastically.
This PR changes scaling to be done when the value is added to the counter on the backend. This should evaluate the scale at the same time as before, on report counter, but avoiding the issue with scaling the total.
Fixes scaling in Warioware, at least in the demo, where it seems to compare old/new counters and broke down when scaling was enabled.
* Fix issues when result is partially uploaded.
Drivers tend to write the low half first, then the high half. Retry if the high half is FFFFFFFF.
* .
* Apply suggestions from code review
Co-authored-by: TSRBerry <20988865+TSRBerry@users.noreply.github.com>
* wildcard(*) needs to be outside of quotes(") for cp to work
---------
Co-authored-by: TSRBerry <20988865+TSRBerry@users.noreply.github.com>
* use Array.Empty() where instead of allocating new zero-length arrays
* structure for loops in a way that the JIT will elide array/Span bounds checking
* avoiding function calls in for loop condition tests
* avoid LINQ in a hot path
* conform with code style
* fix mistake in GetNextWaitingObject()
* fix GetNextWaitingObject() possibility of returning null if all list items have TimePoint == long.MaxValue
* make GetNextWaitingObject() behave FIFO behavior for multiple items with the same TimePoint
* Add flatpak release workflow
Co-authored-by: Mary <mary@mary.zone>
* infra: Update required SDK version to 7.0.200
---------
Co-authored-by: Mary <mary@mary.zone>
* Sockets: Properly convert error codes on MacOS
The error codes for MacOS are very different to how they are on windows or linux. An alternate mapping is used when the host operating system is MacOS.
This PR also defaults IsDhcpEnabled to true when interfaceProperties.DhcpServerAddresses is not available.
This change was already in `macos1`.
* Address feedback
* Add Post Processing Effects
* fix events and shader issues
* fix gtk upscale slider value
* fix bgra games
* don't swap swizzle if already swapped
* restore opengl texture state after effects run
* addressed review
* use single pipeline for smaa and fsr
* call finish on all pipelines
* addressed review
* attempt fix file case
* attempt fixing file case
* fix filter level tick frequency
* adjust filter slider margins
* replace fxaa shaders with original shader
* addressed review