[x265] [arm64] Status and combined patch

chen chenm003 at 163.com
Fri Jan 21 10:33:32 UTC 2022


Hi Sebastian,


Thank you for your contribution. I reviewed the patch and made some comments; could you please take a look?


Regards,
Min Chen




At 2022-01-19 23:25:30, "Pop, Sebastian" <spop at amazon.com> wrote:

Hi Gopi,





Please find attached a patch that ports scanPosLast to arm64 NEON.





     scanPosLast  5.08x    842.11          4277.83
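As a quick sanity check on the row above, the reported speedup can be recomputed from the two timings. This is only a sketch: the row has no header, so reading the two numeric columns as "optimized time" and "C reference time" is an assumption, and the helper name `speedup` is hypothetical.

```python
# Assumption: the columns are (speedup, optimized time, C reference time).
# The row itself carries no header, so this reading is not confirmed here.

def speedup(optimized: float, reference: float) -> float:
    """Return how many times faster the optimized routine is than the reference."""
    return reference / optimized

# Timings copied from the scanPosLast row above.
rows = {
    "scanPosLast": (842.11, 4277.83),
}

for name, (optimized, reference) in rows.items():
    print(f"{name}: {speedup(optimized, reference):.2f}x")
```

Under that assumption the ratio rounds to the 5.08x shown in the row.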





When encoding a video in which scanPosLast accounted for 4.66% of the total samples, the function now accounts for 1.4% of the total samples with the patch applied.





I still see costCoeffNxN_c at 3.5% on some profiles, and I will send a patch to implement it for arm64.





Would it be possible to commit all the arm64 NEON patches to x265 git?


How can I help to speed up the process?





Thanks,


Sebastian








From: x265-devel <x265-devel-bounces at videolan.org> on behalf of Pop, Sebastian <spop at amazon.com>
Sent: Thursday, December 9, 2021 4:48 PM
To: Gopi Satykrishna Akisetty; Development for x265
Subject: Re: [x265] [arm64] Status and combined patch
 

Hi,





Attached is a patch for weight_pp and weight_sp for arm64.





             weight_pp  4.66x    182.14          849.07
             weight_sp  1.16x    621.23          718.51



Sebastian







From: x265-devel <x265-devel-bounces at videolan.org> on behalf of Pop, Sebastian <spop at amazon.com>
Sent: Monday, November 15, 2021 5:43 PM
To: Gopi Satykrishna Akisetty; Development for x265
Subject: Re: [x265] [arm64] Status and combined patch
 

Hi,





Here is a patch to implement 8bit normFact on arm64.





normFact[8x8]   6.98x    11.99           83.66
normFact[16x16] 6.40x    53.95           345.39
normFact[32x32] 5.54x    245.17          1359.08
normFact[64x64] 5.45x    996.32          5433.85

Sebastian








From: x265-devel <x265-devel-bounces at videolan.org> on behalf of Pop, Sebastian <spop at amazon.com>
Sent: Monday, November 15, 2021 4:58 PM
To: Gopi Satykrishna Akisetty; Development for x265
Subject: Re: [x265] [arm64] Status and combined patch
 

Hi,





Here is a patch to implement 8bit ssimDist on top of the previous patches.


Tested on arm64-linux.





ssimDist[4x4]   3.66x    8.67            31.72
ssimDist[8x8]   4.69x    27.65           129.62
ssimDist[16x16] 5.00x    106.38          531.60
ssimDist[32x32] 6.98x    434.51          3034.55
ssimDist[64x64] 6.72x    1792.07         12046.95



Sebastian





From: x265-devel <x265-devel-bounces at videolan.org> on behalf of Pop, Sebastian <spop at amazon.com>
Sent: Monday, October 25, 2021 7:08 PM
To: Gopi Satykrishna Akisetty; Development for x265
Subject: Re: [x265] [arm64] Status and combined patch
 

Hi Gopi,





Please find attached the updated patches to fix an issue in sad_x4[12x16] where I was using v31 uninitialized.


The patch now passes TestBench and produces the same output on the following command:


./x265 --input=/home/ubuntu/old_town_cross_444_720p50.y4m --preset slower --crf 4 --cu-lossless --no-info --hash=1 --psnr --ssim -o out.hevc
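One way to confirm that two builds "produce the same output" is to compare digests of the encoded files. The sketch below is an illustration only: the helper names are hypothetical, SHA-256 is an arbitrary choice, and it is independent of whatever x265 itself does for --hash.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as stream:
        # Read fixed-size chunks so large bitstreams are not loaded at once.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_output(first: Path, second: Path) -> bool:
    """True when both files have identical contents (by digest)."""
    return file_digest(first) == file_digest(second)
```

For example, `same_output(Path("out_master.hevc"), Path("out_patched.hevc"))` would compare an encode from the unpatched build against one from the patched build; both file names are placeholders.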





I have also tested the patch with ./build/linux/multilib.sh.





Sebastian





From: Pop, Sebastian
Sent: Friday, October 22, 2021 10:30 AM
To: Gopi Satykrishna Akisetty
Cc: Siva Viswanathan; Janani T E; Liwei Wang
Subject: Re: [EXTERNAL] [x265] [arm64] Status and combined patch
 

Thanks Gopi for the clarification.


I will make sure the values in the following fields remain the same with and without the patches:


"222539.85 kb/s, Avg QP:11.68, Global PSNR: 47.406, SSIM Mean Y: 0.9957770 (23.744 dB)"





From: Gopi Satykrishna Akisetty <gopi.satykrishna at multicorewareinc.com>
Sent: Friday, October 22, 2021 10:19 AM
To: Pop, Sebastian
Cc: Siva Viswanathan; Janani T E; Liwei Wang
Subject: RE: [EXTERNAL] [x265] [arm64] Status and combined patch
 


Hi Sebastian,


The bitstream generated on the master tip is not the same as the bitstream generated after applying the eight patches. You can get this info from the logs where bitrate, PSNR, SSIM values are printed. 
For example:
encoded 500 frames in 1823.56s (0.27 fps), 222539.85 kb/s, Avg QP:11.68, Global PSNR: 47.406, SSIM Mean Y: 0.9957770 (23.744 dB)

vs
encoded 500 frames in 1595.87s (0.31 fps), 222530.92 kb/s, Avg QP:11.68, Global PSNR: 47.405, SSIM Mean Y: 0.9957767 (23.743 dB)
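The comparison described above can be sketched as a small script that extracts the result fields from each summary line and reports which ones differ. The regular expression is written against the two log lines quoted here and is an assumption about the general log format; the function and group names are hypothetical. Encoding time and fps are deliberately left out, since they vary from run to run.

```python
import re

# Pattern fitted to the two summary lines quoted in this thread (assumption:
# other x265 summary lines follow the same layout).
SUMMARY = re.compile(
    r"encoded (?P<frames>\d+) frames in [\d.]+s \([\d.]+ fps\), "
    r"(?P<bitrate>[\d.]+) kb/s, Avg QP:(?P<qp>[\d.]+), "
    r"Global PSNR: (?P<psnr>[\d.]+), "
    r"SSIM Mean Y: (?P<ssim>[\d.]+) \((?P<ssim_db>[\d.]+) dB\)"
)


def result_fields(line: str) -> dict:
    """Extract the fields that should match between two encodes."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError("not a recognised summary line")
    return match.groupdict()


def differing_fields(first: str, second: str) -> list:
    """Names of the fields whose printed values are not identical."""
    a, b = result_fields(first), result_fields(second)
    return [name for name in a if a[name] != b[name]]


first = ("encoded 500 frames in 1823.56s (0.27 fps), 222539.85 kb/s, "
         "Avg QP:11.68, Global PSNR: 47.406, SSIM Mean Y: 0.9957770 (23.744 dB)")
second = ("encoded 500 frames in 1595.87s (0.31 fps), 222530.92 kb/s, "
          "Avg QP:11.68, Global PSNR: 47.405, SSIM Mean Y: 0.9957767 (23.743 dB)")

print(differing_fields(first, second))
```

For the two lines quoted above, the frame count and Avg QP agree while bitrate, PSNR, and both SSIM values differ. The values are compared as printed text, so any change in a printed digit counts as a difference.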



Thanks,
Gopi.


On Fri, Oct 22, 2021 at 8:43 PM Pop, Sebastian <spop at amazon.com> wrote:


Hi Gopi,


Could you please let me know exactly what I need to pay attention to in the diff between the logs on "Master Tip" and the logs "after applying 8 patches", i.e., which numbers in the diff need to be exactly the same?




Thanks,

Sebastian 





From: Gopi Satykrishna Akisetty <gopi.satykrishna at multicorewareinc.com>
Sent: Friday, October 22, 2021 9:25 AM
To: Pop, Sebastian
Cc: Siva Viswanathan; Janani T E; Liwei Wang
Subject: RE: [EXTERNAL] [x265] [arm64] Status and combined patch
 


Hi Sebastian,


We are seeing some output changes after applying the 8 patches you shared above. I have attached some sample logs below. Could you look into this issue and fix the output changes caused by the patches?


Thanks,
Gopi.


On Wed, Oct 6, 2021 at 11:09 PM Pop, Sebastian <spop at amazon.com> wrote:


Hi,





Please find attached the patches that implement in arm64 assembly about 85% of the x265 routines that are optimized for the x86 target.


The patches have been tested on AWS Graviton2 arm64-linux and on Apple M1 processor.


The assembly optimized routines are faster than the reference C and faster than the arm64 intrinsics optimized routines.





I will submit additional patches to optimize the remaining 15% of the functions for arm64.





Thanks,


Sebastian





From: Pop, Sebastian
Sent: Friday, September 24, 2021 12:46 PM
To: Gopi Satykrishna Akisetty; Development for x265
Cc: Siva Viswanathan; Janani T E; Liwei Wang
Subject: Re: [EXTERNAL] [x265] [arm64] Status and combined patch
 

I am resubmitting all the arm64 x265 patches as a compressed attachment because the x265 mailing list has a limit on email size.




Sebastian 





From: Pop, Sebastian
Sent: Friday, September 24, 2021 12:36 PM
To: Gopi Satykrishna Akisetty; Development for x265
Cc: Siva Viswanathan; Janani T E; Liwei Wang
Subject: Re: [EXTERNAL] [x265] [arm64] Status and combined patch
 

Thanks for the bug report.


I was able to reproduce the build errors on an Apple M1.


Please see attached the amended patches that pass TestBench on M1.


I have also fixed the builds with clang on arm64-linux.





Next, I will submit blockcopy.S for review, and I will make sure it passes on arm64-linux with gcc and clang and on Apple M1.





Sebastian 





From: Gopi Satykrishna Akisetty <gopi.satykrishna at multicorewareinc.com>
Sent: Monday, September 20, 2021 10:45 PM
To: Pop, Sebastian
Cc: Siva Viswanathan; Janani T E; Liwei Wang
Subject: RE: [EXTERNAL] [x265] [arm64] Status and combined patch
 


Hi,
We are seeing build errors with the patches on Apple M1 with AppleClang 12.0.5.12050022. Can you check and send the updated patches?


Thanks,
Gopi.


On Thu, Sep 16, 2021 at 6:56 AM Pop, Sebastian <spop at amazon.com> wrote:


I am re-sending the ipfilters patch as a gzipped attachment.

(The mailing list rejected the previous email with the patch larger than 200K.)





Sebastian


From: Pop, Sebastian
Sent: Wednesday, September 15, 2021 8:21 PM
To: Gopi Satykrishna Akisetty; Development for x265
Subject: Re: [x265] [arm64] Status and combined patch
 

Hi,





Please find attached a patch that ports all ip filters to arm64.


The patch is to be applied on top of p2s patch.


For the moment only 8bit is accelerated.  I am still working on 10bit and 12bit.


With this patch I have seen better results on Graviton2 compared to the NEON intrinsics compiled with gcc-11 and clang-12.





Thanks,


Sebastian

From: x265-devel <x265-devel-bounces at videolan.org> on behalf of Pop, Sebastian <spop at amazon.com>
Sent: Wednesday, September 15, 2021 7:15 PM
To: Gopi Satykrishna Akisetty
Cc: Development for x265
Subject: Re: [x265] [arm64] Status and combined patch
 

Hi,


Here is the updated patch for p2s on top of


https://bitbucket.org/multicoreware/x265_git/commits/4bf31dc15fb6d1f93d12ecf21fad5e695f0db5c0





Sebastian


From: Pop, Sebastian
Sent: Thursday, September 9, 2021 3:45 AM
To: Gopi Satykrishna Akisetty
Cc: Liwei Wang; Siva Viswanathan; Janani T E; Development for x265
Subject: Re: [EXTERNAL] [x265] [arm64] Status and combined patch
 

Hi Gopi,




Please see attached the patch for p2s.

The patch passes TestBench for 8bit, 10bit, and 12bit configurations.





Next, I will submit all the ipfilter functions.





Thanks,

Sebastian  





From: Pop, Sebastian
Sent: Wednesday, September 8, 2021 12:40 PM
To: Gopi Satykrishna Akisetty
Cc: Liwei Wang; Siva Viswanathan; Janani T E; Development for x265
Subject: Re: [EXTERNAL] [x265] [arm64] Status and combined patch
 

Thanks Gopi for the instructions.


I was able to see TestBench failing for 10bit and 12bit configurations.





Sebastian


From: Gopi Satykrishna Akisetty <gopi.satykrishna at multicorewareinc.com>
Sent: Wednesday, September 8, 2021 10:18 AM
To: Pop, Sebastian
Cc: Liwei Wang; Siva Viswanathan; Janani T E; Development for x265
Subject: RE: [EXTERNAL] [x265] [arm64] Status and combined patch
 






On Tue, Sep 7, 2021 at 8:06 PM Pop, Sebastian <spop at amazon.com> wrote:


+x265-devel@ mailing list





Hi Gopi,





Thanks for your feedback. I will check the errors you reported.

I will fix all the issues and re-submit the p2s patch for review.


Could you please send me the exact cmake flags and the commands you used to run the smoke tests?
I want to make sure my testing covers the use cases you have seen failing.

You can use the smoke-tests.txt file from the test folder in the repo  https://github.com/videolan/x265/blob/master/source/test/smoke-tests.txt




I see the following cmake flags in https://github.com/videolan/x265/blob/master/build/linux/multilib.sh#L6


# cmake ../../../source -DHIGH_BIT_DEPTH=ON -DMAIN12=ON


For 8bit you can set  WARNINGS_AS_ERRORS=OFF, ENABLE_TESTS=ON, CHECKED_BUILD=ON, ENABLE_ASSEMBLY=ON, HIGH_BIT_DEPTH=OFF
For 10bit you can set WARNINGS_AS_ERRORS=OFF, ENABLE_TESTS=ON, CHECKED_BUILD=ON, ENABLE_ASSEMBLY=ON, HIGH_BIT_DEPTH=ON

With this configuration the current code in x265/source/common/aarch64 fails to build.
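The configurations mentioned above can be written down as cmake command lines, for instance with a small helper like the one below. This is a sketch under stated assumptions: the option names and values are copied from this message, the "12bit" entry contains only the two flags quoted from multilib.sh, the source path is the one in that quoted line, and the helper name `cmake_command` is hypothetical. The script only prints the commands; it does not run cmake.

```python
# Flags common to the 8-bit and 10-bit test builds listed in this message.
COMMON = {
    "WARNINGS_AS_ERRORS": "OFF",
    "ENABLE_TESTS": "ON",
    "CHECKED_BUILD": "ON",
    "ENABLE_ASSEMBLY": "ON",
}

CONFIGS = {
    "8bit": {**COMMON, "HIGH_BIT_DEPTH": "OFF"},
    "10bit": {**COMMON, "HIGH_BIT_DEPTH": "ON"},
    # Only the two flags quoted from multilib.sh; nothing else is assumed.
    "12bit": {"HIGH_BIT_DEPTH": "ON", "MAIN12": "ON"},
}


def cmake_command(options: dict, source_dir: str = "../../../source") -> list:
    """Build a cmake argument list with one -DNAME=VALUE entry per option."""
    return ["cmake", source_dir] + [f"-D{name}={value}" for name, value in options.items()]


for label, options in CONFIGS.items():
    print(label, " ".join(cmake_command(options)))
```

Keeping the configurations in one table makes it easier to run TestBench against each bit depth in turn before submitting a patch.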


Would it be ok to remove the code in x265/source/common/aarch64 and submit the p2s routines working with 8bit, 10bit, and 12bit configurations?





On my side, I cleaned up aarch64/ipfilter8.S, and it now covers all the functions that x86_64 implements.


I will make sure ipfilter functions pass TestBench with and without HIGH_BIT_DEPTH before I submit the patch for review.





Thanks,


Sebastian


From: Gopi Satykrishna Akisetty <gopi.satykrishna at multicorewareinc.com>
Sent: Monday, September 6, 2021 12:00 AM
To: Pop, Sebastian
Cc: Liwei Wang; Siva Viswanathan; Janani T E
Subject: RE: [EXTERNAL] [x265] [arm64] Status and combined patch


Hi Pop Sebastian,
Sorry for the late reply. We have been running some tests at our end and found that the patch fails when HIGH_BIT_DEPTH is enabled: TestBench fails, and the smoke tests show output changes with decoder errors for HIGH_BIT_DEPTH-enabled builds. Could you check this at your end?


Thanks,
Gopi.


On Thu, Aug 19, 2021 at 2:44 AM <spop at amazon.com> wrote:

Hello Gopi,

Please see attached the first patch of the series.
It ports the p2s function.
Please let me know if the format of the patch is fine.
I will submit the next patches following your guidelines.

Thanks,
Sebastian


On 8/18/21 6:46 AM, Gopi Satykrishna Akisetty wrote:
> Hello Pop Sebastian,
> Thanks for the contribution of ARM64 patches. Can you resend all the
> final patches that have been reviewed by Min Chen over the development
> mailing list, so that it is easier to check and commit each of them
> individually instead of one big combined patch. Please include
> performance numbers and the specs of the test machine used in the
> patch as part of the commit message. Also send all these patches in a
> sequence so that it is easier to apply them and check.
>
> Thanks,
> Gopi.
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-arm64-port-scanPosLast-reviewed.patch
Type: application/octet-stream
Size: 6988 bytes
Desc: not available
URL: <http://mailman.videolan.org/pipermail/x265-devel/attachments/20220121/306a4fca/attachment-0001.obj>

