I couldn't merge the third block, since it depends on a value calculated in the loop (r6). The only other way I can think of would be to load the data into separate GPRs and reuse them. That would cost 2 cycles (push) and 2 cycles (pop) to save 4 cycles (load). Is my reasoning wrong? Is there any other way?