[x264-devel] arm: Load mb_y properly in mbtree_propagate_list_internal_neon
Martin Storsjö
git at videolan.org
Tue Jan 24 21:14:11 CET 2017
x264 | branch: master | Martin Storsjö <martin at martin.st> | Tue Dec 27 00:22:48 2016 +0200| [2ebdb90bd32c3d1618b1c5b360bff750b82b1d0b] | committer: Henrik Gramner
arm: Load mb_y properly in mbtree_propagate_list_internal_neon
The previous version attempted to load two stack parameters at once.
That would only have worked if they were interpreted and loaded as
32-bit elements, not when loading them as 16-bit elements.
> http://git.videolan.org/gitweb.cgi/x264.git/?a=commit;h=2ebdb90bd32c3d1618b1c5b360bff750b82b1d0b
---
common/arm/mc-a.S | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/common/arm/mc-a.S b/common/arm/mc-a.S
index 165c1fa..8c15191 100644
--- a/common/arm/mc-a.S
+++ b/common/arm/mc-a.S
@@ -1818,13 +1818,14 @@ function x264_mbtree_propagate_cost_neon
endfunc
function x264_mbtree_propagate_list_internal_neon
- vld2.16 {d4[], d5[]}, [sp] @ bipred_weight, mb_y
+ vld1.16 {d4[]}, [sp] @ bipred_weight
movrel r12, pw_0to15
vmov.u16 q10, #0xc000
vld1.16 {q0}, [r12, :128] @h->mb.i_mb_x,h->mb.i_mb_y
+ ldrh r12, [sp, #4]
vmov.u32 q11, #4
vmov.u8 q3, #32
- vdup.u16 q8, d5[0] @ mb_y
+ vdup.u16 q8, r12 @ mb_y
vzip.u16 q0, q8
ldr r12, [sp, #8]
8:
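
For readers following along without ARM hardware, the effect of the change can be modelled in plain C. This is only a sketch under stated assumptions, not code from x264: it assumes a little-endian target, assumes each stack argument occupies its own 32-bit slot (bipred_weight at sp, mb_y at sp+4, a further argument at sp+8, as the offsets and comments in the diff suggest), and all helper names (store_le32, load_le16, fill_stack, load_old, load_new) are hypothetical. The old vld2.16 reads two consecutive 16-bit elements, so its second element comes from sp+2, the upper half of the first slot, instead of sp+4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_BYTES 12 /* three 32-bit argument slots (assumed layout) */

/* Store a 32-bit value in little-endian byte order at base+offset. */
static void store_le32(uint8_t *base, size_t offset, uint32_t v)
{
    assert(offset + 4 <= STACK_BYTES);
    for (int i = 0; i < 4; i++)
        base[offset + i] = (uint8_t)(v >> (8 * i));
}

/* Load one little-endian 16-bit element, like a single .16 lane or ldrh. */
static uint16_t load_le16(const uint8_t *base, size_t offset)
{
    assert(offset + 2 <= STACK_BYTES);
    return (uint16_t)(base[offset] | (base[offset + 1] << 8));
}

/* Hypothetical model of the stack argument area at function entry. */
static void fill_stack(uint8_t *sp, int32_t bipred_weight, int32_t mb_y,
                       int32_t third_arg)
{
    store_le32(sp, 0, (uint32_t)bipred_weight);
    store_le32(sp, 4, (uint32_t)mb_y);
    store_le32(sp, 8, (uint32_t)third_arg);
}

typedef struct {
    uint16_t bipred_weight;
    uint16_t mb_y;
} stack_args;

/* Old code: vld2.16 {d4[], d5[]}, [sp] takes elements from sp+0 and sp+2. */
static stack_args load_old(const uint8_t *sp)
{
    stack_args a = { load_le16(sp, 0), load_le16(sp, 2) };
    return a;
}

/* New code: vld1.16 {d4[]}, [sp] followed by ldrh r12, [sp, #4]. */
static stack_args load_new(const uint8_t *sp)
{
    stack_args a = { load_le16(sp, 0), load_le16(sp, 4) };
    return a;
}
```

With a small non-negative bipred_weight, the upper half of the first slot is zero, so in this model load_old would report mb_y as 0 for every row, while load_new picks up the low 16 bits of the actual mb_y slot.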