In case of redundant yields, the scheduler will now idle the core for
it's timeslice, in order to avoid continuously yielding the same thing
over and over.
This only encourages the use of the global system instance (which will
be phased out long-term). Instead, we use the direct system function
call directly to remove the appealing but discouraged short-hand.
If an unmapping operation fails, we shouldn't be decrementing the amount
of memory mapped and returning that the operation was successful. We
should actually be returning the error code in this case.
Avoids potentially expensive (depending on the size of the memory block)
allocations by reserving the necessary memory before performing both
insertions. This avoids scenarios where the second insert may cause a
reallocation to occur.
Avoids needing to read the same long sequence of code in both code
paths. Also makes it slightly nicer to read and debug, as the locals
will be able to be shown in the debugger.
This commit ensures that all backing memory allocated for the Guest CPU
is aligned to 256 bytes. This due to how gpu memory works and the heavy
constraints it has in the alignment of physical memory.
This was initially necessary when AArch64 JIT emulation was in its
infancy and all memory-related instructions weren't implemented.
Given the JIT now has all of these facilities implemented, we can remove
these functions from the CPU interface.
Prior to PR, Yuzu did not restore memory to RW-
on unmap of mirrored memory or unloading of NRO.
(In fact, in the NRO case, the memory was unmapped
instead of reprotected to --- on Load, so it was
actually lost entirely...)
This PR addresses that, and restores memory to RW-
as it should.
This fixes a crash in Super Smash Bros when creating
a World of Light save for the first time, and possibly
other games/circumstances.
This sets the DeviceMapped attribute for GPU-mapped memory blocks,
and prevents merging device mapped blocks. This prevents memory
mapped from the gpu from having its backing address changed by
block coalesce.
This implements svcMapPhysicalMemory/svcUnmapPhysicalMemory for Yuzu,
which can be used to map memory at a desired address by games since
3.0.0.
It also properly parses SystemResourceSize from NPDM, and makes
information available via svcGetInfo.
This is needed for games like Super Smash Bros. and Diablo 3 -- this
PR's implementation does not run into the "ASCII reads" issue mentioned
in the comments of #2626, which was caused by the following bugs in
Yuzu's memory management that this PR also addresses:
* Yuzu's memory coalescing does not properly merge blocks. This results
in a polluted address space/svcQueryMemory results that would be
impossible to replicate on hardware, which can lead to game code making
the wrong assumptions about memory layout.
* This implements better merging for AllocatedMemoryBlocks.
* Yuzu's implementation of svcMirrorMemory unprotected the entire
virtual memory range containing the range being mirrored. This could
lead to games attempting to map data at that unprotected
range/attempting to access that range after yuzu improperly unmapped
it.
* This PR fixes it by simply calling ReprotectRange instead of
Reprotect.
Prior to execution within a process beginning, the process establishes
its own TLS region for uses (as far as I can tell) related to exception
handling.
Now that TLS creation was decoupled from threads themselves, we can add
this behavior to our Process class. This is also good, as it allows us
to remove a stub within svcGetInfo, namely querying the address of that
region.
Provides a more accurate name for the memory region and also
disambiguates between the map and new map regions of memory, making it
easier to understand.
Handles the placement of the stack a little nicer compared to the
previous code, which was off in a few ways. e.g.
The stack (new map) region, shouldn't be the width of the entire address
space if the size of the region calculation ends up being zero. It
should be placed at the same location as the TLS IO region and also have
the same size.
In the event the TLS IO region contains a size of zero, we should also
be doing the same thing. This fixes our memory layout a little bit and
also resolves some cases where assertions can trigger due to the memory
layout being incorrect.
Extracts out all of the thread local storage management from thread
instances themselves and makes the owning process handle the management
of the memory. This brings the memory management slightly more in line
with how the kernel handles these allocations.
Furthermore, this also makes the TLS page management a little more
readable compared to the lingering implementation that was carried over
from Citra.
This will be necessary for making our TLS slot management slightly more
straightforward. This can also be utilized for other purposes in the
future.
We can implement the existing simpler overload in terms of this one
anyways, we just pass the beginning and end of the ASLR region as the
boundaries.
The old implementation had faulty Threadsafe methods where events could
be missing. This implementation unifies unsafe/safe methods and makes
core timing thread safe overall.
This is performing more work than would otherwise be necessary during
VMManager's destruction. All we actually want to occur in this scenario
is for any allocated memory to be freed, which will happen automatically
as the VMManager instance goes out of scope.
Anything else being done is simply unnecessary work.
Given we don't currently implement the personal heap yet, the existing
memory querying functions are essentially doing what the memory querying
types introduced in 6.0.0 do.
So, we can build the necessary machinery over the top of those and just
use them as part of info types.
These are only used from within this translation unit, so they don't
need to have external linkage. They were intended to be marked with this
anyways to be consistent with the other service functions.
Renames the members to more accurately indicate what they signify.
"OneShot" and "Sticky" are kind of ambiguous identifiers for the reset
types, and can be kind of misleading. Automatic and Manual communicate
the kind of reset type in a clearer manner. Either the event is
automatically reset, or it isn't and must be manually cleared.
The "OneShot" and "Sticky" terminology is just a hold-over from Citra
where the kernel had a third type of event reset type known as "Pulse".
Given the Switch kernel only has two forms of event reset types, we
don't need to keep the old terminology around anymore.
This reduces the boilerplate that services have to write out the current thread explicitly. Using current thread instead of client thread is also semantically incorrect, and will be a problem when we implement multicore (at which time there will be multiple current threads)
These are actually quite important indicators of thread lifetimes, so
they should be going into the debug log, rather than being treated as
misc info and delegated to the trace log.
Makes the code much nicer to follow in terms of behavior and control
flow. It also fixes a few bugs in the implementation.
Notably, the thread's owner process shouldn't be accessed in order to
retrieve the core mask or ideal core. This should be done through the
current running process. The only reason this bug wasn't encountered yet
is because we currently only support running one process, and thus every
owner process will be the current process.
We also weren't checking against the process' CPU core mask to see if an
allowed core is specified or not.
With this out of the way, it'll be less noisy to implement proper
handling of the affinity flags internally within the kernel thread
instances.
This is a holdover from Citra, where the 3DS has both
WaitSynchronization1 and WaitSynchronizationN. The switch only has one
form of wait synchronizing (literally WaitSynchonization). This allows
us to throw out code that doesn't apply at all to the Switch kernel.
Because of this unnecessary dichotomy within the wait synchronization
utilities, we were also neglecting to properly handle waiting on
multiple objects.
While we're at it, we can also scrub out any lingering references to
WaitSynchronization1/WaitSynchronizationN in comments, and change them
to WaitSynchronization (or remove them if the mention no longer
applies).
The actual behavior of this function is slightly more complex than what
we're currently doing within the supervisor call. To avoid dumping most
of this behavior in the supervisor call itself, we can migrate this to
another function.
This member variable is entirely unused. It was only set but never
actually utilized. Given that, we can remove it to get rid of noise in
the thread interface.
Essentially performs the inverse of svcMapProcessCodeMemory. This unmaps
the aliasing region first, then restores the general traits of the
aliased memory.
What this entails, is:
- Restoring Read/Write permissions to the VMA.
- Restoring its memory state to reflect it as a general heap memory region.
- Clearing the memory attributes on the region.
This gives us significantly more control over where in the
initialization process we start execution of the main process.
Previously we were running the main process before the CPU or GPU
threads were initialized (not good). This amends execution to start
after all of our threads are properly set up.
Initially required due to the split codepath with how the initial main
process instance was initialized. We used to initialize the process
like:
Init() {
main_process = Process::Create(...);
kernel.MakeCurrentProcess(main_process.get());
}
Load() {
const auto load_result = loader.Load(*kernel.GetCurrentProcess());
if (load_result != Loader::ResultStatus::Success) {
// Handle error here.
}
...
}
which presented a problem.
Setting a created process as the main process would set the page table
for that process as the main page table. This is fine... until we get to
the part that the page table can have its size changed in the Load()
function via NPDM metadata, which can dictate either a 32-bit, 36-bit,
or 39-bit usable address space.
Now that we have full control over the process' creation in load, we can
simply set the initial process as the main process after all the loading
is done, reflecting the potential page table changes without any
special-casing behavior.
We can also remove the cache flushing within LoadModule(), as execution
wouldn't have even begun yet during all usages of this function, now
that we have the initialization order cleaned up.
Our initialization process is a little wonky than one would expect when
it comes to code flow. We initialize the CPU last, as opposed to
hardware, where the CPU obviously needs to be first, otherwise nothing
else would work, and we have code that adds checks to get around this.
For example, in the page table setting code, we check to see if the
system is turned on before we even notify the CPU instances of a page
table switch. This results in dead code (at the moment), because the
only time a page table switch will occur is when the system is *not*
running, preventing the emulated CPU instances from being notified of a
page table switch in a convenient manner (technically the code path
could be taken, but we don't emulate the process creation svc handlers
yet).
This moves the threads creation into its own member function of the core
manager and restores a little order (and predictability) to our
initialization process.
Previously, in the multi-threaded cases, we'd kick off several threads
before even the main kernel process was created and ready to execute (gross!).
Now the initialization process is like so:
Initialization:
1. Timers
2. CPU
3. Kernel
4. Filesystem stuff (kind of gross, but can be amended trivially)
5. Applet stuff (ditto in terms of being kind of gross)
6. Main process (will be moved into the loading step in a following
change)
7. Telemetry (this should be initialized last in the future).
8. Services (4 and 5 should ideally be alongside this).
9. GDB (gross. Uses namespace scope state. Needs to be refactored into a
class or booted altogether).
10. Renderer
11. GPU (will also have its threads created in a separate step in a
following change).
Which... isn't *ideal* per-se, however getting rid of the wonky
intertwining of CPU state initialization out of this mix gets rid of
most of the footguns when it comes to our initialization process.