* Use correct shader local size instead of a hardcoded size
* Remove unused uniform block
* Update XML doc
* Local memory size has 23 bits on Maxwell
* Generate compute QMD struct from NVIDIA open doc header
* Remove dummy arrays when shared or local memory is not used, plus other improvements