This PR aims to reduce the memory usage in the CPU page table by moving
GPU specific parameters into a child class. This saves 1Gb of Memory for
most games.
An implementation of the cemuhook motion/touch protocol, this adds the
ability for users to connect several different devices to citra to send
direct motion and touch data to citra.
Co-Authored-By: jroweboy <jroweboy@gmail.com>
We relies on UNREACHABLE's noreturn attribute to eliminate parent's "no return value" warning. However, this was wrapped in a `if(!false)` block, which compilers may not unfold to recognize the noreturn nature.
Makes the header more general for other potential algorithms in the
future. While we're at it, include a missing <functional> include to
satisfy the use of std::less.
This was related to the source allocator being passed into the
constructor potentially having a different type than allocator being
constructed.
We simply need to provide a constructor to handle this case.
This resolves issues related to the allocator causing debug builds on
MSVC to fail.
Implement VOTE using Nvidia's intrinsics. Documentation about these can
be found here
https://developer.nvidia.com/reading-between-threads-shader-intrinsics
Instead of using portable ARB instructions I opted to use Nvidia
intrinsics because these are the closest we have to how Tegra X1
hardware renders.
To stub VOTE on non-Nvidia drivers (including nouveau) this commit
simulates a GPU with a warp size of one, returning what is meaningful
for the instruction being emulated:
* anyThreadNV(value) -> value
* allThreadsNV(value) -> value
* allThreadsEqualNV(value) -> true
ballotARB, also known as "uint64_t(activeThreadsNV())", emits
VOTE.ANY Rd, PT, PT;
on nouveau's compiler. This doesn't match exactly to Nvidia's code
VOTE.ALL Rd, PT, PT;
Which is emulated with activeThreadsNV() by this commit. In theory this
shouldn't really matter since .ANY, .ALL and .EQ affect the predicates
(set to PT on those cases) and not the registers.
This commit ensures that all backing memory allocated for the Guest CPU
is aligned to 256 bytes. This due to how gpu memory works and the heavy
constraints it has in the alignment of physical memory.
Instead of storing all block width, height and depths in their shifted
form:
block_width = 1U << block_shift;
Store them like they are provided by the emulated hardware (their
block_shift form). This way we can avoid doing the costly
Common::AlignUp operation to align texture sizes and drop CPU integer
divisions with bitwise logic (defined in Common::AlignBits).
Avoids potentially performing multiple reallocations (depending on the
size of the input data) by reserving the necessary amount of memory
ahead of time.
This is trivially doable, so there's no harm in it.