We uploaded the wrong data before. So the offset on the host GPU pointer may work for the first vertices, the last ones run out bounds.
Let's just offset the upload instead.
MSVC 19.11 (A.K.A. VS 15.3)'s C++ standard library implements P0154R1
(http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0154r1.html)
which defines two new constants within the <new> header, std::hardware_destructive_interference_size
and std::hardware_constructive_interference_size.
std::hardware_destructive_interference_size defines the minimum
recommended offset between two concurrently-accessed objects to avoid
performance degradation due to contention introduced by the
implementation (with the lower-bound being at least alignof(max_align_t)).
In other words, the minimum offset between objects necessary to avoid
false-sharing.
std::hardware_constructive_interference_size on the other hand defines
the maximum recommended size of contiguous memory occupied by two
objects accessed wth temporal locality by concurrent threads (also
defined to be at least alignof(max_align_t)). In other words the maximum
size to promote true-sharing.
So we can simply use this facility to determine the ideal alignment
size. Unfortunately, only MSVC supports this right now, so we need to
enclose it within an ifdef for the time being.
* Fix bug where default username value for yuzu_cmd create an userprofile with uninitialize data as username
* Fix format
* Apply code review changes
* Remove nullptr check
This can just be a regular function, getting rid of the need to also
explicitly undef the define at the end of the file. Given FuncReturn()
was already converted into a function, it's #undef can also be removed.
Previously the second half of the value being written would overwrite
the first half. Thankfully this wasn't a bug that was being encountered,
as the function is currently unused.
This modifies the CPU interface to more accurately match an
AArch64-supporting CPU as opposed to an ARM11 one. Two of the methods
don't even make sense to keep around for this interface, as Adv Simd is
used, rather than the VFP in the primary execution state. This is
essentially a modernization change that should have occurred from the
get-go.
The kernel does the equivalent of the following check before proceeding:
if (address + 0x8000000000 < 0x7FFFE00000) {
return ERR_INVALID_MEMORY_STATE;
}
which is essentially what our IsKernelVirtualAddress() function does. So
we should also be checking for this.
The kernel also checks if the given input addresses are 4-byte aligned,
however our Mutex::TryAcquire() and Mutex::Release() functions already
handle this, so we don't need to add code for this case.
Avoids including unnecessary headers within the audio_renderer.h header,
lessening the likelihood of needing to rebuild source files including
this header if they ever change.
Given std::vector allows forward declaring contained types, we can move
VoiceState to the cpp file and hide the implementation entirely.
We pass a hint to the QPainter instance that we want anti-aliasing on
the compatibility icons, which prevents the circles from looking fairly
jagged, and actually makes them look circular.
Courtesy of @ogniK5377.
This also moves them into the cpp file and limits the visibility to
where they're directly used. It also gets rid of unused or duplicate
error codes.
The kernel caps the size limit of shared memory to 8589930496 bytes (or
(1GB - 512 bytes) * 8), so approximately 8GB, where every GB has a 512
byte sector taken off of it.
It also ensures the shared memory is created with either read or
read/write permissions for both permission types passed in, allowing the
remote permissions to also be set as "don't care".
Part of the checking done by the kernel is to check if the given
address and size are 4KB aligned, as well as checking if the size isn't
zero. It also only allows mapping shared memory as readable or
read/write, but nothing else, and so we shouldn't allow mapping as
anything else either.
Previously, these were sitting outside of the Kernel namespace, which
doesn't really make sense, given they're related to the Thread class
which is within the Kernel namespace.
There were a few places where nested namespace specifiers weren't being
used where they could be within the service code. This amends that to
make the namespacing a tiny bit more compact.
While unlikely, it does avoid constructing a std::string and
unnecessarily calling into the memory code if a game or executable
decides to be really silly about their logging.
Given these are shown to the user, they should be translatable.
While we're at it, also set up the dialog to automatically retranslate
the dialog along with the combo boxes if it receives a LanguageChange
event.
Keeps the individual initialization of the combo boxes logically separate.
We also shouldn't be dumping this sort of thing in the constructor
directly.
"value" is already a used variable name within the outermost ranged-for
loop, so this variable was shadowing the outer one. This isn't a bug,
but it will get rid of a -Wshadow warning.
It allows us to use texture views and it reduces the overhead within the GPU driver.
But it disallows us to reallocate the texture, but we don't do so anyways.
In the end, it is the new way to allocate textures, so there is no need to use the old way.
This places the font data within cpp files, which mitigates the
possibility of the font data being duplicated within the binary if it's
referred to in more than one translation unit in the future. It also
stores the data within a std::array, which is more flexible when it
comes to operating with the standard library.
Furthermore, it makes the data arrays const. This is what we want, as it
allows the compiler to store the data within the read-only segment. As
it is, having several large sections of mutable data like this just
leaves spots in memory that we can accidentally write to (via accidental
overruns, what have you) and actually have it work. This ensures the
font data remains the same no matter what.
When a destructor isn't defaulted into a cpp file, it can cause the use
of forward declarations to seemingly fail to compile for non-obvious
reasons. It also allows inlining of the construction/destruction logic
all over the place where a constructor or destructor is invoked, which
can lead to code bloat. This isn't so much a worry here, given the
services won't be created and destroyed frequently.
The cause of the above mentioned non-obvious errors can be demonstrated
as follows:
------- Demonstrative example, if you know how the described error happens, skip forwards -------
Assume we have the following in the header, which we'll call "thing.h":
\#include <memory>
// Forward declaration. For example purposes, assume the definition
// of Object is in some header named "object.h"
class Object;
class Thing {
public:
// assume no constructors or destructors are specified here,
// or the constructors/destructors are defined as:
//
// Thing() = default;
// ~Thing() = default;
//
// ... Some interface member functions would be defined here
private:
std::shared_ptr<Object> obj;
};
If this header is included in a cpp file, (which we'll call "main.cpp"),
this will result in a compilation error, because even though no
destructor is specified, the destructor will still need to be generated by
the compiler because std::shared_ptr's destructor is *not* trivial (in
other words, it does something other than nothing), as std::shared_ptr's
destructor needs to do two things:
1. Decrement the shared reference count of the object being pointed to,
and if the reference count decrements to zero,
2. Free the Object instance's memory (aka deallocate the memory it's
pointing to).
And so the compiler generates the code for the destructor doing this inside main.cpp.
Now, keep in mind, the Object forward declaration is not a complete type. All it
does is tell the compiler "a type named Object exists" and allows us to
use the name in certain situations to avoid a header dependency. So the
compiler needs to generate destruction code for Object, but the compiler
doesn't know *how* to destruct it. A forward declaration doesn't tell
the compiler anything about Object's constructor or destructor. So, the
compiler will issue an error in this case because it's undefined
behavior to try and deallocate (or construct) an incomplete type and
std::shared_ptr and std::unique_ptr make sure this isn't the case
internally.
Now, if we had defaulted the destructor in "thing.cpp", where we also
include "object.h", this would never be an issue, as the destructor
would only have its code generated in one place, and it would be in a
place where the full class definition of Object would be visible to the
compiler.
---------------------- End example ----------------------------
Given these service classes are more than certainly going to change in
the future, this defaults the constructors and destructors into the
relevant cpp files to make the construction and destruction of all of
the services consistent and unlikely to run into cases where forward
declarations are indirectly causing compilation errors. It also has the
plus of avoiding the need to rebuild several services if destruction
logic changes, since it would only be necessary to recompile the single
cpp file.
* Joystick hotplug support (#4141)
* use SDL_PollEvent instead of SDL_JoystickUpdate
Register hot plugged controller by GUID if they were configured in a previous session
* Move SDL_PollEvent into its own thread
* Don't store SDLJoystick pointer in Input Device; Get pointer on each GetStatus call
* Fix that joystick_list gets cleared after SDL_Quit
* Add VirtualJoystick for InputDevices thats never nullptr
* fixup! Add VirtualJoystick for InputDevices thats never nullptr
* fixup! fixup! Add VirtualJoystick for InputDevices thats never nullptr
* Remove SDL_GameController, make SDL_Joystick* unique_ptr
* fixup! Remove SDL_GameController, make SDL_Joystick* unique_ptr
* Adressed feedback; fixed handling of same guid reconnects
* fixup! Adressed feedback; fixed handling of same guid reconnects
* merge the two joystick_lists into one
* make SDLJoystick a member of VirtualJoystick
* fixup! make SDLJoystick a member of VirtualJoystick
* fixup! make SDLJoystick a member of VirtualJoystick
* fixup! fixup! make SDLJoystick a member of VirtualJoystick
* SDLJoystick: Addressed review comments
* Address one missed review comment
This virtual function is called in a very hot spot, and it does nothing.
If this kind of feature is required, please be more specific and add callbacks
in the switch statement within Maxwell3D::WriteReg. There is no point in having
another switch statement within the rasterizer.
Lets us keep the generic portions of the compatibility list code
together, and allows us to introduce a type alias that makes it so we
don't need to type out a very long type declaration anymore, making the
immediate readability of some code better.
- Fixed all warnings, for renderer_opengl items, which were indicating a
possible incorrect behavior from integral promotion rules and types
larger than those in which arithmetic is typically performed.
- Added const for variables where possible and meaningful.
- Added constexpr where possible.
When not set, this tells the GPU to only use the X size when performing a DMA copy.
This is only implemented for linear->linear and tiled->tiled copies. Conversion copies still retain the assert.
This bit is unset by some games for various purposes, and by nouveau when copying the vertex buffers.
* video_core: Arithmetic overflow fix for gl_rasterizer
- Fixed warnings, which were indicating incorrect behavior from integral
promotion rules and types larger than those in which arithmetic is
typically performed.
- Added const for variables where possible and meaningful.
* Changed the casts from C to C++ style
Changed the C-style casts to C++ casts as proposed.
Took also care about signed / unsigned behaviour.
This has gotten sufficiently large enough to warrant moving it to its
own source files. Especially given it dumps the file_sys headers around
code that doesn't use it for the most part.
This'll also make it easier to introduce a type alias for the
compatibility list, so a large unordered_map type declaration doesn't
need to be specified all the time (we don't want to propagate the
game_list_p.h include via the main game_list.h header).
Given we now have the kernel as a class, it doesn't make sense to keep
the current process pointer within the System class, as processes are
related to the kernel.
This also gets rid of a subtle case where memory wouldn't be freed on
core shutdown, as the current_process pointer would never be reset,
causing the pointed to contents to continue to live.
The only reason this include was necessary, was because the constructor
wasn't defaulted in the cpp file and the compiler would inline it
wherever it was used. However, given Controller is forward declared, all
those inlined constructors would see an incomplete type, causing a
compilation failure. So, we just place the constructor in the cpp file,
where it can see the complete type definition, allowing us to remove
this include.
This is called ~3k times per frame in SMO ingame.
My laptop spends ~3ms per frame on allocating and freeing this string.
Let's just stop printing this kind of redundant information.
This patch caches VAO objects instead of re-emiting all pointers per draw call.
Configuring this pointers is known as a fast task, but it yields too many GL
calls. So for better performance, just bind the VAO instead of 16 pointers.
The idea of this cache is to avoid redundant uploads. So we are going
to cache the uploaded buffers within the stream_buffer and just reuse
the old pointers.
The next step is to implement a VBO cache on GPU memory, but for now,
I want to check the overhead of the cache management. Fetching the
buffer over PCI-E should be quite fast.
The std::string generation with its malloc and free requirement
was a noticeable overhead. Also switch to an ordered_map to
avoid the std::hash call. As those maps usually have a size of
two elements, the lookup time shall not matter.
Multi-line doc comments still need the '<' after the ///, otherwise it's
treated as a regular comment and makes the original doc comment broken
in viewers, IDEs, etc. While we're at it, also fix some typos in the
comments.
Eliminates the need to rebuild some source files if the file_util header
ever changes. This also uncovered some indirect inclusions, which have
also been fixed.
Now that we have a class representing the kernel in some capacity, we
now have a place to put the named port map, so we move it over and get
rid of another piece of global state within the core.
This isn't required to be visible to anything outside of the main source
file, and will eliminate needing to rebuild anything else including the
header if the SSL class needs to be changed in the future.
The follow-up to e2457418da, which
replaces most of the includes in the core header with forward declarations.
This makes it so that if any of the headers the core header was
previously including change, then no one will need to rebuild the bulk
of the core, due to core.h being quite a prevalent inclusion.
This should make turnaround for changes much faster for developers.
core.h is kind of a massive header in terms what it includes within
itself. It includes VFS utilities, kernel headers, file_sys header,
ARM-related headers, etc. This means that changing anything in the
headers included by core.h essentially requires you to rebuild almost
all of core.
Instead, we can modify the System class to use the PImpl idiom, which
allows us to move all of those headers to the cpp file and forward
declare the bulk of the types that would otherwise be included, reducing
compile times. This change specifically only performs the PImpl portion.
As means to pave the way for getting rid of global state within core,
This eliminates kernel global state by removing all globals. Instead
this introduces a KernelCore class which acts as a kernel instance. This
instance lives in the System class, which keeps its lifetime contained
to the lifetime of the System class.
This also forces the kernel types to actually interact with the main
kernel instance itself instead of having transient kernel state placed
all over several translation units, keeping everything together. It also
has a nice consequence of making dependencies much more explicit.
This also makes our initialization a tad bit more correct. Previously we
were creating a kernel process before the actual kernel was initialized,
which doesn't really make much sense.
The KernelCore class itself follows the PImpl idiom, which allows
keeping all the implementation details sealed away from everything else,
which forces the use of the exposed API and allows us to avoid any
unnecessary inclusions within the main kernel header.
Given std::vector is a type with a non-trivial destructor, this
variable cannot be optimized away by the compiler, even if unused.
Because of that, something that was intended to be fairly lightweight,
was actually allocating 32KB and deallocating it at the end of the
function.
Makes the class interface consistent and provides accessors for
obtaining a reference to the memory manager instance.
Given we also return references, this makes our more flimsy uses of
const apparent, given const doesn't propagate through pointers in the
way one would typically expect. This makes our mutable state more
apparent in some places.
Many containers within the standard library provide different behaviors
based on whether or not a move constructor/assignment operator can be
guaranteed not to throw or not.
Notably, implementations will generally use std::move_if_noexcept (or an
internal implementation of it) to provide strong exception guarantees.
If a move constructor potentially throws (in other words, is not
noexcept), then certain behaviors will create copies, rather than moving
the values.
For example, consider std::vector. When a std::vector calls resize(),
there are two ways the elements can be relocated to the new block of
memory (if a reallocation happens), by copy, or by moving the existing
elements into the new block of memory. If a type does not have a
guarantee that it will not throw in the move constructor, a copy will
happen. However, if it can be guaranteed that the move constructor won't
throw, then the elements will be moved.
This just allows ResultVal to be moved instead of copied all the time if
ever used in conjunction with containers for whatever reason.
Rightnow, in games use GetAvailableLanguageCodes(), there is a WriteBuffer() with size larger than the buffer_size. (Core Critical core\hle\kernel\hle_ipc.cpp:WriteBuffer:296: size (0000000000000088) is greater than buffer_size (0000000000000078))
0x88 = 17(languages) * 8
0x78 = 15(languages) * 8
GetAvailableLanguageCodes() can only support 15 languages.
After firmware 4.0.0 there are 17 supported language instead of 15, to enable this GetAvailableLanguageCodes2() need to be used.
So GetAvailableLanguageCodes() will be caped at 15 languages.
Reference:
http://switchbrew.org/index.php/Settings_services
We can make this error code an alias of the resource limit exceeded
error code, allowing us to get rid of the lingering 3DS error code of
the same type.
We already have the variable itself set up to perform this task, so we
can just return its value from the currently executing process instead
of always stubbing it to zero.
This is needed because the title IDs of update NCAs will not use the update title ID. The only sure way to tell is to look for a partition with BKTR crypto.
By having the following TTF files in your yuzu sysdata directory. You can load sharedfonts via TTF files.
FontStandard.ttf
FontChineseSimplified.ttf
FontExtendedChineseSimplified.ttf
FontChineseTraditional.ttf
FontKorean.ttf
FontNintendoExtended.ttf
FontNintendoExtended2.ttf
While convenient as a std::array, it's also quite a large set of data as
well (32KB). It being an array also means data cannot be std::moved. Any
situation where the code is being set or relocated means that a full
copy of that 32KB data must be done.
If we use a std::vector we do need to allocate on the heap, however, it
does allow us to std::move the data we have within the std::vector into
another std::vector instance, eliminating the need to always copy the
program data (as std::move in this case would just transfer the pointers
and bare necessities over to the new vector instance).
Namespaces all OpenGL code under the OpenGL namespace.
Prevents polluting the global namespace and allows clear distinction
between other renderers' code in the future.
* Added bfttf loading
We can now load system bfttf fonts from system archives AND shared memory dumps. This allows people who have installed their system nand dumps to yuzu to automatically get shared font support. We also now don't hard code the offsets or the sizes of the shared fonts and it's all calculated for us now.
* Addressed plu fixups
* Style changes for plu
* Fixed logic error for plu and added more error checks.
The previous form of initializing done here is a C-ism, an empty set of
braces is sufficient for initializing (and doesn't potentially cause
missing brace warnings, given the first member of the struct is a COORD
struct).
Gets rid of the potential for C array-to-pointer decay, and also makes
pointer arithmetic to get the end of the copy range unnecessary. We can
just use std::array's begin() and end() member functions.
25us is far too small, and would result in std::this_thread::sleep_for
being called with this as a maximum value. This means that a guest
application that produces frames instantly would only be limited to
40 kHz.
25ms is a more appropriate value, as it allows for a 60 Hz refresh
rate while providing enough slack in the negative region.
LOG_TRACE is only enabled on debug builds which can be quite slow when
trying to debug graphics issues. Instead we can log the messages to the
debug log, which is available on both release and debug builds.
Avoids the need to rebuild multiple source files if the filesystem code
headers change.
This also gets rid of a few instances of indirect inclusions being
relied upon
Avoids the need to rebuild whatever includes the romfs factory header if
the loader header ever changes. We also don't need to include the main
core header. We can instead include the headers we specifically need.
Given these functions aren't intended to be used frequently, there's no
need to keep the std::string instances allocated for the whole lifetime
of the program. It's just a waste of memory.
We have an overload of WriteBuffer that accepts containers that satisfy
the ContiguousContainer concept, which std::array does, so we only need
to pass in the array itself.
ProfileInfo is quite a large struct in terms of data, and we don't need
to perform a copy in these instances, so we can just pass constant
references instead.
We can use the constructor initializer list and just compare the
contained u128's together instead of comparing each element
individually. Ditto for comparing against an invalid UUID.
This is an OpenGL renderer-specific data type. Given that, this type
shouldn't be used within the base interface for the rasterizer. Instead,
we can pass this information to the rasterizer via reference.
Given we use a base-class type within the renderer for the rasterizer
(RasterizerInterface), we want to allow renderers to perform more
complex initialization if they need to do such a thing. This makes it
important to reserve type information.
Given the OpenGL renderer is quite simple settings-wise, this is just a
simple shuffling of the initialization code. For something like Vulkan
however this might involve doing something like:
// Initialize and call rasterizer-specific function that requires
// the full type of the instance created.
auto raster = std::make_unique<VulkanRasterizer>(some, params);
raster->CallSomeVulkanRasterizerSpecificFunction();
// Assign to base class variable
rasterizer = std::move(raster)
We were only writing to the first render target before.
Note that this is only the GLSL side of the implementation, supporting multiple render targets requires more changes in the OpenGL renderer.
Dual Source blending is not implemented and stuff that uses it might not work at all.
Moving a const reference isn't possible, so this just results in a copy
(and given ProfileInfo is composed of trivial types and aggregates, a
move wouldn't really do anything).
Before each draw call, for every enabled vertex array configured as instanced, we take the current instance id and divide it by its configured divisor, then we multiply that by the corresponding stride and increment the start address by the resulting amount. This way we can simulate the vertex array being incremented once per instance without actually using OpenGL's instancing functions.
The mode can be used to set the predicate to true depending on the result of the logic operation. In some cases, this means discarding the result (writing it to register 0xFF (Zero)).
This is used by Super Mario Odyssey.
The SSY instruction pushes an address into the stack, and the SYNC instruction pops it. The current stack depth is 20, we should figure out if this is enough or not.
Prevents potentially making copies or doing silly things by accident
with the System instance, particularly given our current core is
designed (unfortunately) around one instantiable instance.
This will prevent the accidental case of:
auto instance = System::Instance();
being compiled without warning when it's supposed to be:
auto& instance = System::Instance();
550d662 load_store_exclusive: Define s == t state to be Constraint_NONE
0b69381 A64/translate: Allow for unpredictable behaviour to be defined
6d236d4 system: Implement MRS CNTFRQ_EL0
6cbb6fb A32/testenv: Add missing headers
6729328 externals: Update xbyak to v5.67
1812bd2 Squashed 'externals/xbyak/' changes from 2794cde7..671fc805
9a95802 externals: Document subtrees
714a840 A64: Implement SQ{ADD, SUB}, and UQ{ADD, SUB}'s vector variants
8cab459 A64: Implement UQADD/UQSUB's scalar variants
18a8151 ir: Add opcodes for unsigned saturating add and subtract
a5660ee x64/reg_alloc: Use type alias for array returned by GetArgumentInfo()
29489b5 ir/value: Use type alias CoprocessorInfo for std::array<u8, 8>
e23ba26 status_register_access: Add support for bits 0 and 1 of mask to MSR
55190bd fuzz_with_unicorn: Split utility functions into fuzz_util
23b049d A32/translate/load_store: Correct detection of writeback
7ec9f15 A32/translate: Add TranslateSingleInstruction
efeecb4 A32/ir_emitter: Bug fix: IREmitter::ExceptionRaised using incorrect opcode
08d1d19 A32/decoders: Split instruction list into include file
2d929cc tests: Refactor unicorn_emu to allow for A32 unicorn
f672368 microinstruction: Improve assert messages
7ebff50 emit_x64_vector: EmitVectorNarrow16: AVX512 implementation
edce230 emit_x64_vector: EmitVectorNarrow32: prefer pblendw to loading constant
Allows querying the inverse of IsDomain() to make things more readable.
This will likely also be usable in the event of implementing
ConvertDomainToSession().
We can make the enum class type compatible with fmt by providing an
overload of operator<<.
While we're at it, perform proper bounds checking. If something exceeds
the array, it should be a hard fail, because it's, without a doubt, a
programmer error in this case.
Many of these aren't necessary and will cause this file to be required
to be recompiled whenever any changes to those files are made, which
lengthens compile times for no reason.
This also removes an unused metadata variable from AppLoader_XCI
Using LOG_TRACE here isn't a good idea because LOG_TRACE is only enabled
when yuzu is compiled in debug mode. Debug mode is also quite slow, and
so we're potentially throwing away logging messages that can provide
value when trying to boot games.
The thread field serves to indicate which thread a log is related to and
provides the length of the thread's name, so we can print that out,
ditto for modules.
Now we can know what threads are potentially spawning off logging
messages (for example Lydie & Suelle bounces between MainThread and
LoadingThread when initializing the game).
We keep track of the current instance and update an uniform in the shaders to let them know which instance they are.
Instanced vertex arrays are not yet implemented.
Previously core itself was the library containing the code to gather
common information (build info, CPU info, and OS info), however all of
this isn't core-dependent and can be moved to the common code and use
the common interfaces. We can then just call those functions from the
core instead.
This will allow replacing our CPU detection with Xbyak's which has
better detection facilities than ours. It also keeps more
architecture-dependent code in common instead of core.
These currently aren't used and contain commented out source code that
corresponds to Dolphin's JIT. Given our CPU code is organized quite
differently, we shouldn't be keeping this around (at the moment it just
adds to compile times marginally).
The filter is returned via const reference, so this was making a
pointless copy of the entire filter every time a message was being
pushed into the logger instance.
Despite being covered by a global mutex, we should still ensure that the
class handles its reference counts properly. This avoids potential
shenanigans when it comes to data races.
Given this is the root object that drives quite a bit of the kernel
object hierarchy, ensuring we always have the correct behavior (and no
races) is a good thing.
We divide the number of ticks to add by the number of cores (4) to obtain a more or less rough estimate of the actual number of ticks added. This assumes that all 4 cores are doing similar work. Previously we were adding ~4 times the number of ticks, thus making the games think that time was going way too fast.
This lets us bypass certain hangs in some games like Breath of the Wild.
We should modify our CoreTiming to support multiple cores (both running in a single thread, and in multiple host threads).
The current core may have nothing to do with the core where the new thread was scheduled to run. In case it's the same core, then the following PrepareReshedule call will take care of that.
WakeAfterDelay might be called from any host thread, so err on the side of caution and use the thread-safe CoreTiming::ScheduleEventThreadsafe.
Note that CoreTiming is still far from thread-safe, there may be more things we have to work on for it to be up to par with what we want.
Exit from AddMutexWaiter early if the thread is already waiting for a mutex owned by the owner thread.
This accounts for the possibility of a thread that is waiting on a condition variable being awakened twice in a row.
Also added more validation asserts.
This should fix one of the random crashes in Breath Of The Wild.
struct should be used when the data type is very simple or otherwise has
no invariants associated with it. Given these are used to form a
hierarchy, class should be used instead.