* Implement Thumb (32-bit) memory (ordered), multiply and bitfield instructions
* Remove public from interface
* Fix T32 BL immediate and implement signed and unsigned extend instructions
* Implement VRSRA, VRSHRN, VQSHRUN, VQMOVN, VQMOVUN, VQADD, VQSUB, VRHADD, VPADDL, VSUBL, VQDMULH and VMLAL Arm32 NEON instructions
* PPTC version
* Fix VQADD/VQSUB
* Improve MRC/MCR handling and exception messages
In case data is being recompiled as code, we don't want to throw at emit stage, instead we should only throw if it actually tries to execute
* Add EntryTable<TEntry>
* Add on translation call counting
* Add Counter
* Add PPTC support
* Make Counter a generic & use a 32-bit counter instead
* Return false on overflow
* Set PPTC version
* Print more information about the rejit queue
* Make Counter<T> disposable
* Remove Block.TailCall since it is not used anymore
* Apply suggestions from code review
Address gdkchan's feedback
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
* Fix more stale docs
* Remove rejit requests queue logging
* Make Counter<T> finalizable
Most certainly quite an odd use case.
* Make EntryTable<T>.TryAllocate set entry to default
* Re-trigger CI
* Dispose Counters before they hit the finalizer queue
* Re-trigger CI
Just for good measure...
* Make EntryTable<T> expandable
* EntryTable is now expandable instead of being a fixed slab.
* Remove EntryTable<T>.TryAllocate
* Remove Counter<T>.TryCreate
Address LDj3SNuD's feedback
* Apply suggestions from code review
Address LDj3SNuD's feedback
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
* Remove useless return
* POH approach, but the sequel
* Revert "POH approach, but the sequel"
This reverts commit 5f5abaa247.
The sequel got shelved
* Add extra documentation
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
* Implement VCNT based on AArch64 CNT
Add tests
* Update PTC version
* Address LDj's comments
* Explicit size in encoding
* Tighter tests
* Replace SoftFallback with IR helper
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
* Reduce one BitwiseAnd from IR fallback
Based on popcount64b from https://en.wikipedia.org/wiki/Hamming_weight#Efficient_implementation
* Rename parameter and add assert
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
* Add Pmull_V Sse fast path only, both "8/16B -> 8H" and "1/2D -> 1Q" variants; with Test.
* Add Clmul fast path for the 128 bits variant.
* Small optimisation (save 60 instructions) for the Sse fast path about the 128 bits variant.
* Add slow path, both variants. Fix V128 Shl/Shr when shift = 0.
* A32: Add Vmull_I P64 variant (slow path); not tested.
* A32: Add Vmull_I_P8_P64 Test and fix P64 variant.
* Added support for offline invalidation, via PPTC, of low cq translations replaced by high cq translations; both on a single run and between runs.
Added invalidation of .cache files in the event of reuse on a different user operating system.
Added .info and .cache files invalidation in case of a failed stream decompression.
Nits.
* InternalVersion = 1712;
* Nits.
* Address comment.
* Get rid of BinaryFormatter.
Nits.
* Move Ptc.LoadTranslations().
Nits.
* Nits.
* Fixed corner cases (in case backup copies have to be used). Added save logs.
* Not core fixes.
* Complement to the previous commit. Added load logs. Removed BinaryFormatter leftovers.
* Add LoadTranslations log.
* Nits.
* Removed the search and management of LowCq overlapping functions.
* Final increment of .info and .cache flags.
* Nit.
* GetIndirectFunctionAddress(): Validate that writing actually takes place in dynamic table memory range (and not elsewhere).
* Fix Ptc.UpdateInfo() due to rebase.
* Nit for retrigger Checks.
* Nit for retrigger Checks.
* Implement VFNMA.F<32/64>
* Update PTC Version
* Update Implementation & Renames & Correct Order
* Fix alignment
* Update implementation to not trigger assert
* Actually use the intrinsic that makes sense :)
* SIMD&FP load/store with scale > 4 should be undefined
* Catch more invalid encodings for FP&SIMD LDR/STR (reg variant)
* Set PTC version to PR number
* Add Fmax/minv_V & S/Ushl_S Inst.s with Tests. Fix Maxps/d & Minps/d double zero sign handling. Allows better handling of NaNs.
* Optimized EmitSse2VectorIsNaNOpF() for multiple uses per opF.
* Add CRC32 A32 instructions.
* Fix CRC32 instructions.
* Add CRC intrinsic and fast path.
Loop is currently unrolled, will look into adding temp vars after tests are added.
* Begin work on Crc tests
* Fix SSE4.2 path for CRC32C, finialize tests.
* Remove unused IR path.
* Fix spacing between prefix checks.
* This should be Src.
* PTC Version
* OpCodeTable Order
* Integer check improvement. Value and Crc can be either 32 or 64 size.
* This wasn't necessary...
* If size is 3, value type must be I64.
* Fix same src+dest handling for non crc intrinsics.
* Pre-fix (ha) issue with vex encodings
* Add Vmvn (register), tests for both Vmvn variants.
* Add Vpmin, Vpmax, improve Non-FastFp accuracy for Vpadd
* Rebase on top of PTC.
* Add Nopcode
* Increment PTC version.
* Fix nits.
* Generalize tail continues
* Fix DecodeBasicBlock
`Next` and `Branch` would be null, which is not the state expected by
the branch instructions. They end up branching or falling into a block
which is never populated by the `Translator`. This causes an assert to
be fired when building the CFG.
* Clean up Decode overloads
* Do not synchronize when branching into exit block
If we're branching into an exit block, that exit block will tail
continue into another translation which already has a synchronization.
* Remove A32 predicate tail continue
If `block` is not an exit block then the `block.Next` must exist (as
per the last instruction of `block`).
* Throw if decoded 0 blocks
Address gdkchan's feedback
* Rebuild block list instead of setting to null
Address gdkchan's feedback