* Implement SMULWB, SMULWT, SMLAWB, SMLAWT, and add tests for some multiply instructions
* Improve test descriptions
* Rename SMULH to SMUL__
* Add SSAT, SSAT16, USAT and USAT16 ARM32 instructions
* Fix new tests
* Replace AND 0xFFFF with 16-bits zero extension (more efficient)