* Implement stencil testing
* Implement depth testing
* Implement face culling
* Implement front face
* Comparison functions now take OGL enums too
* Fix front facing when flipping was used
* Add depth and stencil clear values
* Add WIP support for Vertex Program A, add the FADD_I32 shader instruction, small fix on FFMA_I encoding, nits
* Add separate subroutines for program A/B, and copy attributes to a temp
* Move finalization code to main
* Add new line after flip uniform on the shader
* Handle possible case where VPB uses an output attribute written by VPA but not available on the vbo
* Address PR feedback
* Sort uniform binding to avoid possible failures in drivers fewer bindings
* Throw exception for Cbuf overflow
* Search for free bindings instead of using locked ones
* EnsureAllocated when binding buffers
* Fix uniform bindings
* Remove spaces
* Use 64 KiB UBOs when available
* Remove double colon
* Use IdentationStr and avoid division in Cbuf offset
* Add spaces
* Call OpenGL functions directly, remove the pfifo thread, some refactoring
* Fix PerformanceStatistics calculating the wrong host fps, remove wait event on PFIFO as this wasn't exactly was causing the freezes (may replace with an exception later)
* Organized the Gpu folder a bit more, renamed a few things, address PR feedback
* Make PerformanceStatistics thread safe
* Remove unused constant
* Use unlimited update rate for better pref
* Initial implementation of the texture cache
* Cache vertex and index data aswell, some cleanup
* Improve handling of the cache by storing cached ranges on a list for each page
* Delete old data from the caches automatically, ensure that the cache is cleaned when the mapping/size changes, and some general cleanup
* Initial implementation of NvMap/NvHostCtrl
* More work on NvHostCtrl
* Refactoring of nvservices, move GPU Vmm, make Vmm per-process, refactor most gpu devices, move Gpu to Core, fix CbBind
* Implement GetGpuTime, support CancelSynchronization, fix issue on InsertWaitingMutex, proper double buffering support (again, not working properly for commercial games, only hb)
* Try to fix perf regression reading/writing textures, moved syncpts and events to a UserCtx class, delete global state when the process exits, other minor tweaks
* Remove now unused code, add comment about probably wrong result codes
* Add support for events, move concept of domains to IpcService
* Support waiting for KThread, remove some test code, other tweaks
* Use move handle on NIFM since I can't test that now, it's better to leave it how it was