<div dir="ltr"><div>Hi,</div><div><br></div>Kindly don't push this patch, as I missed adding a comment. <div><br></div><div>Thanks,</div><div>Aasaipriya</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jun 12, 2015 at 4:49 PM,  <span dir="ltr"><<a href="mailto:aasaipriya@multicorewareinc.com" target="_blank">aasaipriya@multicorewareinc.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"># HG changeset patch<br>
# User Aasaipriya Chandran <<a href="mailto:aasaipriya@multicorewareinc.com">aasaipriya@multicorewareinc.com</a>><br>
# Date 1434107886 -19800<br>
#      Fri Jun 12 16:48:06 2015 +0530<br>
# Node ID 5fc5e0d20a595a7e666f181e9f7593fbc2fbe2df<br>
# Parent  2cd9183df03edff0b148bab6e133dfe1ae4f69a1<br>
asm: avx2 interp_8tap_hv_pp for 8bpp<br>
<br>
Removing the separate x265_interp_8tap_hv_pp_16x16_avx2 asm code, since it gives the same performance as calling the interp_8tap_hv_pp_cpu C function (which calls the luma_hps and luma_vsp asm functions individually).<br>
<br>
Including ALL_LUMA_PU_T for luma_hvpp, which calls the interp_8tap_hv_pp_cpu C function.<br>
ALL_LUMA_PU_T covers all sizes except 4x4, hence luma_hvpp[4x4] is included separately.<br>
<br>
diff -r 2cd9183df03e -r 5fc5e0d20a59 source/common/x86/asm-primitives.cpp<br>
--- a/source/common/x86/asm-primitives.cpp      Thu Jun 11 17:06:46 2015 +0530<br>
+++ b/source/common/x86/asm-primitives.cpp      Fri Jun 12 16:48:06 2015 +0530<br>
@@ -2835,6 +2835,7 @@<br>
         ALL_LUMA_PU(luma_vps, interp_8tap_vert_ps, avx2);<br>
         ALL_LUMA_PU(luma_vsp, interp_8tap_vert_sp, avx2);<br>
         ALL_LUMA_PU(luma_vss, interp_8tap_vert_ss, avx2);<br>
+        p.pu[LUMA_4x4].luma_vsp = x265_interp_8tap_vert_sp_4x4_avx2;<br>
<br>
         // missing 4x8, 4x16, 24x32, 12x16 for the fill set of luma PU<br>
         p.pu[LUMA_4x4].luma_hpp = x265_interp_8tap_horiz_pp_4x4_avx2;<br>
@@ -3106,7 +3107,9 @@<br>
         p.chroma[X265_CSP_I444].pu[LUMA_64x16].filter_vss = x265_interp_4tap_vert_ss_64x16_avx2;<br>
         p.chroma[X265_CSP_I444].pu[LUMA_16x64].filter_vss = x265_interp_4tap_vert_ss_16x64_avx2;<br>
<br>
-        p.pu[LUMA_16x16].luma_hvpp = x265_interp_8tap_hv_pp_16x16_avx2;<br>
+        //p.pu[LUMA_16x16].luma_hvpp = x265_interp_8tap_hv_pp_16x16_avx2;<br>
+        ALL_LUMA_PU_T(luma_hvpp, interp_8tap_hv_pp_cpu);<br>
+        p.pu[LUMA_4x4].luma_hvpp = interp_8tap_hv_pp_cpu<LUMA_4x4>;<br>
<br>
         p.pu[LUMA_32x8].convert_p2s = x265_filterPixelToShort_32x8_avx2;<br>
         p.pu[LUMA_32x16].convert_p2s = x265_filterPixelToShort_32x16_avx2;<br>
</blockquote></div><br></div>