[x265-commits] [x265] x265: cleanup unnecessary header files

Wed May 11 03:34:14 CEST 2016

details:   http://hg.videolan.org/x265/rev/70a7342bf5a1
branches:  
changeset: 11396:70a7342bf5a1
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Tue Mar 29 12:24:24 2016 +0530
description:
x265: cleanup unnecessary header files
Subject: [x265] analysis: skip rect/amp in analysis load mode

details:   http://hg.videolan.org/x265/rev/c6ec60a476f1
branches:  
changeset: 11397:c6ec60a476f1
user:      Sagar Kotecha<sagar at multicorewareinc.com>
date:      Thu Mar 24 16:36:55 2016 +0530
description:
analysis: skip rect/amp in analysis load mode

Avoid doing rect/amp analysis in load mode if the save mode did not choose it as the best partition.
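
The rule in this description can be read as a simple predicate. The Python sketch below is illustrative only: `PartSize`, `should_try_rect_amp`, `saved_best` and `candidate` are hypothetical names chosen for the example and are not taken from the x265 source.

```python
from enum import Enum


class PartSize(Enum):
    """Simplified partition classes (hypothetical grouping for this sketch)."""
    SIZE_2Nx2N = 0
    RECT = 1   # rectangular partitions (2NxN / Nx2N)
    AMP = 2    # asymmetric motion partitions


def should_try_rect_amp(load_mode: bool, saved_best: PartSize,
                        candidate: PartSize) -> bool:
    """Return True if the rect/amp candidate should be analysed."""
    if not load_mode:
        # Normal encode: no saved decision to reuse, analyse as usual.
        return True
    # Load mode: only revisit a rect/amp shape that the save pass
    # picked as the best partition.
    return saved_best == candidate
```

With this predicate, a load-mode encode whose saved best partition is 2Nx2N skips both the rect and the amp candidates.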
Subject: [x265] remove broadcast of non-leaf CBF

details:   http://hg.videolan.org/x265/rev/ff085aa67513
branches:  
changeset: 11398:ff085aa67513
user:      Satoshi Nakagawa <nakagawa424 at oki.com>
date:      Fri Mar 25 10:27:34 2016 +0900
description:
remove broadcast of non-leaf CBF
Subject: [x265] move tables from .h to .cpp

details:   http://hg.videolan.org/x265/rev/5819d8e13b31
branches:  
changeset: 11399:5819d8e13b31
user:      Satoshi Nakagawa <nakagawa424 at oki.com>
date:      Mon Mar 28 20:47:51 2016 +0900
description:
move tables from .h to .cpp
Subject: [x265] arm: Port pixel_sa8d_8x8_neon and pixel_sa8d_16x16_neon

details:   http://hg.videolan.org/x265/rev/bdda6abf047c
branches:  
changeset: 11400:bdda6abf047c
user:      Ramya Sriraman<ramya at multicorewareinc.com>
date:      Mon Mar 28 16:16:24 2016 +0530
description:
arm: Port pixel_sa8d_8x8_neon and pixel_sa8d_16x16_neon
Subject: [x265] asm: new AVX2 version of satd_8x8 (509c -> 307c)

details:   http://hg.videolan.org/x265/rev/c9dbc1b57d84
branches:  
changeset: 11401:c9dbc1b57d84
user:      Min Chen <chenm003 at 163.com>
date:      Wed Mar 30 17:36:55 2016 -0500
description:
asm: new AVX2 version of satd_8x8 (509c -> 307c)
Subject: [x265] asm: new AVX2 version of sa8d[8x8, 16x16]

details:   http://hg.videolan.org/x265/rev/921bc48bddec
branches:  
changeset: 11402:921bc48bddec
user:      Min Chen <chenm003 at 163.com>
date:      Wed Mar 30 17:36:57 2016 -0500
description:
asm: new AVX2 version of sa8d[8x8, 16x16]
AVX:
  sa8d[8x8]    4.82x    517.79          2493.20
  sa8d[16x16]  5.65x   1952.40         11039.93

AVX2:
  sa8d[8x8]    5.13x    489.15          2507.44
  sa8d[16x16] 10.27x   1006.08         11206.09
Subject: [x265] Fix the order of fields in x265_picture

details:   http://hg.videolan.org/x265/rev/0b3dc9c48a6b
branches:  
changeset: 11403:0b3dc9c48a6b
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Thu Mar 31 16:21:38 2016 +0530
description:
Fix the order of fields in x265_picture
Subject: [x265] Fix poc value written in csv file for open-gop

details:   http://hg.videolan.org/x265/rev/46d53e6e515c
branches:  
changeset: 11404:46d53e6e515c
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Tue Apr 05 20:07:26 2016 +0530
description:
Fix poc value written in csv file for open-gop
Subject: [x265] SAO: removed redundant distortion calculation

details:   http://hg.videolan.org/x265/rev/e6373e5a18a0
branches:  
changeset: 11405:e6373e5a18a0
user:      Ashok Kumar Mishra<ashok at multicorewareinc.com>
date:      Tue Mar 29 13:10:49 2016 +0530
description:
SAO: removed redundant distortion calculation
Subject: [x265] fix threading conflict in low resolution video (Issue #260)

details:   http://hg.videolan.org/x265/rev/a49a5752efe9
branches:  
changeset: 11406:a49a5752efe9
user:      Min Chen <chenm003 at 163.com>
date:      Thu Apr 07 18:26:00 2016 -0500
description:
fix threading conflict in low resolution video (Issue #260)

The threading conflict occurs because the video resolution is too low, which makes the worker threads finish at the same time.
The root cause is in our sync logic: we release all of the filter sync-locks once the last column has been processed. More details are given below.

Time 0:
  Row0 - requests a worker thread
  Row1 - requests a worker thread

Time 1:
  Row0 - thread assignment fails, so processing will continue in FrameFilter::processRow().
  Row1 - gets a thread (since all CUs of the current row have finished encoding, allowCol is set to the last column, which means the sync logic imposes no restriction)

Time 2:
  Row1 - the thread processes beyond the Row0 bound  --> Crash here
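
The timeline can be reproduced with a toy model of the column bound. This is a sketch under stated assumptions, not x265 code and not necessarily the committed fix: the function names and the idea of clamping to the progress of the row above are hypothetical, chosen only to show why a bound based on the row's own encode progress is not enough.

```python
def filter_limit_unsafe(encoded_cols: int, num_cols: int) -> int:
    """Bound derived only from the row's own encode progress.

    Once every CU in the row is encoded the bound becomes the last
    column, so nothing holds the filter back.
    """
    return min(encoded_cols, num_cols)


def filter_limit_safe(encoded_cols: int, filtered_cols_above: int,
                      num_cols: int) -> int:
    """Bound that additionally never passes the row above (hypothetical)."""
    return min(encoded_cols, filtered_cols_above, num_cols)


NUM_COLS = 4  # a very narrow picture, as in the low-resolution case

# Time 1: Row1 is fully encoded and has a thread; Row0 has filtered nothing yet.
unsafe_bound = filter_limit_unsafe(encoded_cols=NUM_COLS, num_cols=NUM_COLS)
safe_bound = filter_limit_safe(encoded_cols=NUM_COLS, filtered_cols_above=0,
                               num_cols=NUM_COLS)
```

In the unsafe variant Row1 may run to the last column while Row0 is still at column 0, which is the Time 2 situation described above.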
Subject: [x265] [x265] '--dither' patch for 10/12-bit and for not default bit depth

details:   http://hg.videolan.org/x265/rev/66867beb300e
branches:  
changeset: 11407:66867beb300e
user:      Mateusz <mateuszb at poczta.onet.pl>
date:      Fri Apr 08 19:56:18 2016 +0530
description:
[x265] '--dither' patch for 10/12-bit and for not default bit depth
Fix the '--dither' option for 10/12-bit and for non-default bit depths (fixes #255)

Mateusz
Subject: [x265] dither: return if encoder and picture depth are the same

details:   http://hg.videolan.org/x265/rev/c7f345876a47
branches:  
changeset: 11408:c7f345876a47
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Fri Apr 08 19:53:21 2016 +0530
description:
dither: return if encoder and picture depth are the same
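
As a rough illustration of the early return, the sketch below reduces bit depth with a very simple row-to-row error diffusion and does nothing when the two depths match. It is a simplified stand-in written for this note, not the dithering code used by x265; `dither_rows` and its parameters are hypothetical.

```python
def dither_rows(rows, src_depth, dst_depth):
    """Reduce bit depth, carrying each column's rounding error to the next row."""
    if src_depth == dst_depth:
        # Picture depth already matches the encoder depth: nothing to do.
        return rows
    shift = src_depth - dst_depth
    if shift < 0:
        raise ValueError("this sketch only reduces bit depth")
    max_out = (1 << dst_depth) - 1
    half = 1 << (shift - 1)          # rounding offset
    errors = [0] * len(rows[0])      # per-column error from the previous row
    out = []
    for row in rows:
        out_row = []
        for x, value in enumerate(row):
            value += errors[x]
            quantized = max(0, min(max_out, (value + half) >> shift))
            errors[x] = value - (quantized << shift)
            out_row.append(quantized)
        out.append(out_row)
    return out
```

For a flat 10-bit picture of value 514, the 8-bit output alternates between 129 and 128 from row to row, so the average (128.5) matches 514 / 4.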
Subject: [x265] psy-rdoq: default values explained better, film grain details removed

details:   http://hg.videolan.org/x265/rev/adc5516669e1
branches:  
changeset: 11409:adc5516669e1
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Fri Apr 08 20:11:02 2016 +0530
description:
psy-rdoq: default values explained better, film grain details removed

Our recent tests and analysis showed that the best way to get a consistent encode
of film grain was to prevent QP fluctuations throughout the sequence as much as
possible. See rc-grain.
Subject: [x265] dither: fix dithering to 8-bit

details:   http://hg.videolan.org/x265/rev/5b5374cae60f
branches:  
changeset: 11410:5b5374cae60f
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Fri Apr 08 22:34:39 2016 +0530
description:
dither: fix dithering to 8-bit
Subject: [x265] rd and rdoq: clarify documentation, remove obsolete comments

details:   http://hg.videolan.org/x265/rev/31a417fa69ce
branches:  
changeset: 11411:31a417fa69ce
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Sat Apr 09 19:06:56 2016 +0530
description:
rd and rdoq: clarify documentation, remove obsolete comments

Also emphasize that rd-level 0 is not supported.
Subject: [x265] doc: update tune grain documentation

details:   http://hg.videolan.org/x265/rev/40afead3177d
branches:  
changeset: 11412:40afead3177d
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Sat Apr 09 19:32:28 2016 +0530
description:
doc: update tune grain documentation
Subject: [x265] arm: Implement interp_8tap_vert_pp_NxN NEON

details:   http://hg.videolan.org/x265/rev/a47ef6d4d647
branches:  
changeset: 11413:a47ef6d4d647
user:      Ramya Sriraman<ramya at multicorewareinc.com>
date:      Wed Mar 16 16:52:23 2016 +0530
description:
arm: Implement interp_8tap_vert_pp_NxN NEON
Subject: [x265] arm: Implement interp_8tap_vert_sp_NXN NEON

details:   http://hg.videolan.org/x265/rev/4e5198e636f0
branches:  
changeset: 11414:4e5198e636f0
user:      Ramya Sriraman<ramya at multicorewareinc.com>
date:      Tue Mar 22 11:10:43 2016 +0530
description:
arm: Implement interp_8tap_vert_sp_NXN NEON
Subject: [x265] arm: Implement interp_8tap_vert_ps_NxN NEON

details:   http://hg.videolan.org/x265/rev/3bddcd957d2d
branches:  
changeset: 11415:3bddcd957d2d
user:      Ramya Sriraman<ramya at multicorewareinc.com>
date:      Tue Mar 22 18:41:56 2016 +0530
description:
arm: Implement interp_8tap_vert_ps_NxN NEON
Subject: [x265] arm: Implement interp_4tap_vert_pp,ps &sp for NxN NEON

details:   http://hg.videolan.org/x265/rev/6d981f226980
branches:  
changeset: 11416:6d981f226980
user:      Ramya Sriraman<ramya at multicorewareinc.com>
date:      Thu Mar 24 15:25:37 2016 +0530
description:
arm: Implement interp_4tap_vert_pp,ps &sp for NxN NEON
Subject: [x265] arm: Implement sa8d for all luma and chroma partitions

details:   http://hg.videolan.org/x265/rev/7bae6968e9c5
branches:  
changeset: 11417:7bae6968e9c5
user:      Ramya Sriraman<ramya at multicorewareinc.com>
date:      Tue Apr 05 15:26:37 2016 +0530
description:
arm: Implement sa8d for all luma and chroma partitions
Subject: [x265] arm :Implement interp_8tap_horiz_pp ARM NEON

details:   http://hg.videolan.org/x265/rev/c839731b7799
branches:  
changeset: 11418:c839731b7799
user:      Radhakrishnan VR <radhakrishnan at multicorewareinc.com>
date:      Tue Mar 15 14:33:08 2016 +0530
description:
arm :Implement interp_8tap_horiz_pp ARM NEON
Subject: [x265] arm :Implement interp_8tap_horiz_ps ARM NEON

details:   http://hg.videolan.org/x265/rev/338691ddd4c9
branches:  
changeset: 11419:338691ddd4c9
user:      Radhakrishnan VR <radhakrishnan at multicorewareinc.com>
date:      Tue Mar 29 11:43:17 2016 +0530
description:
arm :Implement interp_8tap_horiz_ps ARM NEON
Subject: [x265] arm : Implement interp_4tap_horiz_pp,ps ARM NEON

details:   http://hg.videolan.org/x265/rev/4ab0c105127d
branches:  
changeset: 11420:4ab0c105127d
user:      Radhakrishnan VR <radhakrishnan at multicorewareinc.com>
date:      Tue Mar 29 11:56:32 2016 +0530
description:
arm : Implement interp_4tap_horiz_pp,ps ARM NEON
Subject: [x265] [x265] change int literals to char literals to avoid VS 2015 warnings

details:   http://hg.videolan.org/x265/rev/b242bb2e0a8d
branches:  
changeset: 11421:b242bb2e0a8d
user:      Mateusz <mateuszb at poczta.onet.pl>
date:      Tue Apr 12 22:57:50 2016 +0530
description:
[x265] change int literals to char literals to avoid VS 2015 warnings
Fix VS 2015 warning:
x265\source\common\cpu.cpp(277): warning C4838: conversion from 'int' to
'const char' requires a narrowing conversion
Subject: [x265] deblock: print 0:0 offsets as well in info SEI (fixes #258)

details:   http://hg.videolan.org/x265/rev/34353c2fb99f
branches:  
changeset: 11422:34353c2fb99f
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Wed Apr 13 00:39:43 2016 +0530
description:
deblock: print 0:0 offsets as well in info SEI (fixes #258)
Subject: [x265] asm: AVX2 version of sa8d[32x32]

details:   http://hg.videolan.org/x265/rev/7c1bd825ce26
branches:  
changeset: 11423:7c1bd825ce26
user:      Min Chen <chenm003 at 163.com>
date:      Tue Apr 12 12:30:48 2016 -0500
description:
asm: AVX2 version of sa8d[32x32]
AVX:
  sa8d[32x32]  5.47x    7403.68         40490.18

AVX2:
  sa8d[32x32]  10.57x   3783.80         40001.89
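
The columns above are not labelled in the commit message. Assuming they are, in order, the speedup over the C reference, the cycles of the assembly version, and the cycles of the C reference, the speedup figure can be checked with a short Python calculation (the helper name `speedup` is made up for this sketch):

```python
def speedup(c_cycles: float, asm_cycles: float) -> float:
    """Ratio of C reference cycles to assembly cycles."""
    return c_cycles / asm_cycles


# sa8d[32x32] figures copied from the commit message above
avx = speedup(c_cycles=40490.18, asm_cycles=7403.68)
avx2 = speedup(c_cycles=40001.89, asm_cycles=3783.80)

# AVX2 relative to AVX, comparing the two assembly timings directly
avx2_over_avx = 7403.68 / 3783.80

print(f"AVX {avx:.2f}x, AVX2 {avx2:.2f}x, AVX2 vs AVX {avx2_over_avx:.2f}x")
```

Under that reading of the columns, the computed ratios round to the 5.47x and 10.57x reported above.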
Subject: [x265] asm: rewrite interpolate hps width of [32,48,64], improve ~20%

details:   http://hg.videolan.org/x265/rev/a76406c9dc1e
branches:  
changeset: 11424:a76406c9dc1e
user:      Min Chen <chenm003 at 163.com>
date:      Tue Apr 12 12:30:51 2016 -0500
description:
asm: rewrite interpolate hps width of [32,48,64], improve ~20%
OLD:
  luma_hps[32x32]         6.32x    16429.69        103771.02
  luma_hps[32x16]         6.04x    10121.56        61140.21
  luma_hps[32x64]         6.47x    30813.70        199438.95
  luma_hps[32x24]         6.23x    13277.26        82747.75
  luma_hps[48x64]         6.13x    46002.25        282176.44
  luma_hps[64x64]         6.15x    61393.88        377670.03
  luma_hps[64x32]         6.79x    33001.77        224096.58
  luma_hps[64x48]         6.21x    47242.66        293529.16
  luma_hps[64x16]         6.51x    19207.61        125016.56

NEW:
  luma_hps[32x32]         7.66x    13404.22        102730.96
  luma_hps[32x16]         7.32x    8355.57         61133.25
  luma_hps[32x64]         7.68x    24496.17        188086.11
  luma_hps[32x24]         8.00x    10879.09        87077.93
  luma_hps[48x64]         7.62x    37094.37        282758.94
  luma_hps[64x64]         7.82x    48535.86        379390.78
  luma_hps[64x32]         7.91x    26512.17        209755.50
  luma_hps[64x48]         8.06x    37020.63        298498.28
  luma_hps[64x16]         7.95x    15479.03        123132.41
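
To see where the "~20%" figure comes from, the sketch below compares the second numeric column of the OLD and NEW tables, assuming that column is the cycle count of the assembly routine. The dictionary layout and names are illustrative only.

```python
# (old_cycles, new_cycles) per block size, copied from the tables above
hps_cycles = {
    "32x32": (16429.69, 13404.22),
    "32x16": (10121.56, 8355.57),
    "32x64": (30813.70, 24496.17),
    "32x24": (13277.26, 10879.09),
    "48x64": (46002.25, 37094.37),
    "64x64": (61393.88, 48535.86),
    "64x32": (33001.77, 26512.17),
    "64x48": (47242.66, 37020.63),
    "64x16": (19207.61, 15479.03),
}

# Fractional reduction in cycles for each block size
reductions = {size: (old - new) / old for size, (old, new) in hps_cycles.items()}
mean_reduction = sum(reductions.values()) / len(reductions)

for size, r in reductions.items():
    print(f"luma_hps[{size}]: {r:.1%} fewer cycles")
print(f"mean: {mean_reduction:.1%}")
```

Every size lands between roughly 17% and 22% fewer cycles, which is consistent with the "~20%" in the subject line.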
Subject: [x265] console: change '--limit-refs' info from 1/0 to on/off

details:   http://hg.videolan.org/x265/rev/3a0c770aa1cd
branches:  
changeset: 11425:3a0c770aa1cd
user:      Mateusz <mateuszb at poczta.onet.pl>
date:      Wed Apr 13 00:41:42 2016 +0530
description:
console: change '--limit-refs' info from 1/0 to on/off
Change the displayed info about '--limit-refs' so that it is easier for users to understand.

Example: change from
x265 [info]: References / ref-limit  cu / depth  : 5 / 0 / 1
to
x265 [info]: References / ref-limit  cu / depth  : 5 / off / on
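
A minimal Python sketch of the formatting change follows. It assumes '--limit-refs' is a bitmask in which bit 1 limits by depth and bit 2 limits by CU; the constant values and the helper names are assumptions made for this example, not quotations from the x265 source.

```python
X265_REF_LIMIT_DEPTH = 1  # assumed bit value
X265_REF_LIMIT_CU = 2     # assumed bit value


def on_off(flag) -> str:
    """Render a truthy/falsy flag as 'on' or 'off' instead of 1/0."""
    return "on" if flag else "off"


def limit_refs_info(max_refs: int, limit_refs: int) -> str:
    """Build the info line in the new on/off style."""
    cu = on_off(limit_refs & X265_REF_LIMIT_CU)
    depth = on_off(limit_refs & X265_REF_LIMIT_DEPTH)
    return ("x265 [info]: References / ref-limit  cu / depth  : "
            f"{max_refs} / {cu} / {depth}")


print(limit_refs_info(5, X265_REF_LIMIT_DEPTH))
```

With five references and only the depth limit set, the sketch produces the same line as the "to" example above.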
Subject: [x265] uhd-bd: turn off open GOP for UHD Blu-ray

details:   http://hg.videolan.org/x265/rev/e7d937ad1ea3
branches:  
changeset: 11426:e7d937ad1ea3
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Wed Apr 13 03:01:46 2016 +0530
description:
uhd-bd: turn off open GOP for UHD Blu-ray
Subject: [x265] SAO: fix for output mismatch in windows

details:   http://hg.videolan.org/x265/rev/34a3d35c5f97
branches:  
changeset: 11427:34a3d35c5f97
user:      Ashok Kumar Mishra<ashok at multicorewareinc.com>
date:      Tue Apr 12 19:34:49 2016 +0530
description:
SAO: fix for output mismatch in windows
Subject: [x265] doc: Improve description of --tune grain

details:   http://hg.videolan.org/x265/rev/02d79be487d7
branches:  
changeset: 11428:02d79be487d7
user:      Tom Vaughan <tom.vaughan at multicorewareinc.com>
date:      Sun Apr 17 21:07:28 2016 +0000
description:
doc: Improve description of --tune grain
Subject: [x265] arm: Implement pixel_satd ARM NEON

details:   http://hg.videolan.org/x265/rev/4f83d465d11b
branches:  
changeset: 11429:4f83d465d11b
user:      Radhakrishnan VR <radhakrishnan at multicorewareinc.com>
date:      Wed Mar 30 17:29:13 2016 +0530
description:
arm: Implement pixel_satd ARM NEON
Subject: [x265] rc: add option for encoding next gops

details:   http://hg.videolan.org/x265/rev/b280e0c3ed87
branches:  
changeset: 11430:b280e0c3ed87
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Thu Apr 07 15:21:51 2016 +0530
description:
rc: add option for encoding next gops
Subject: [x265] rc: clip the qp in ratecontrolEnd

details:   http://hg.videolan.org/x265/rev/5f7b1eb55153
branches:  
changeset: 11431:5f7b1eb55153
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Fri Apr 22 17:56:01 2016 +0530
description:
rc: clip the qp in ratecontrolEnd
Subject: [x265] doc: more clarifications for tune grain

details:   http://hg.videolan.org/x265/rev/687cf9ea82f3
branches:  
changeset: 11432:687cf9ea82f3
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Tue Apr 26 14:13:42 2016 -0700
description:
doc: more clarifications for tune grain
Subject: [x265] doc: whitespace nits

details:   http://hg.videolan.org/x265/rev/901d5409cc37
branches:  
changeset: 11433:901d5409cc37
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Tue Apr 26 14:28:25 2016 -0700
description:
doc: whitespace nits
Subject: [x265] doc: more fixes

details:   http://hg.videolan.org/x265/rev/19cced21060f
branches:  
changeset: 11434:19cced21060f
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Tue Apr 26 15:06:55 2016 -0700
description:
doc: more fixes
Subject: [x265] rc: fix the average qp calculation

details:   http://hg.videolan.org/x265/rev/427d9a6c2464
branches:  
changeset: 11435:427d9a6c2464
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Thu Apr 28 13:14:07 2016 +0530
description:
rc: fix the average qp calculation
Subject: [x265] asm: ARM NEON version of DCT[4x4]

details:   http://hg.videolan.org/x265/rev/3fc5f091725d
branches:  
changeset: 11436:3fc5f091725d
user:      Min Chen <chenm003 at 163.com>
date:      Tue Apr 26 11:44:56 2016 +0530
description:
asm: ARM NEON version of DCT[4x4]
Subject: [x265] Allows for Unicode filenames in Windows (output and stat files).

details:   http://hg.videolan.org/x265/rev/00ea3784bd36
branches:  
changeset: 11437:00ea3784bd36
user:      Ma0 <mateuszb at poczta.onet.pl>
date:      Thu Apr 28 09:59:30 2016 +0200
description:
Allows for Unicode filenames in Windows (output and stat files).

Copies the x264 code for processing Unicode filenames in Windows.
Unicode support for the output file and stat file(s) is very important for GUI makers.
Subject: [x265] CLI: fix Unicode output in Windows for old mingw-w64

details:   http://hg.videolan.org/x265/rev/6a5c3dbd556d
branches:  
changeset: 11438:6a5c3dbd556d
user:      Ma0 <mateuszb at poczta.onet.pl>
date:      Fri Apr 29 10:16:15 2016 +0200
description:
CLI: fix Unicode output in Windows for old mingw-w64
Subject: [x265] CLI: free memory allocated for utf8 command line in Windows

details:   http://hg.videolan.org/x265/rev/4e7047541c42
branches:  
changeset: 11439:4e7047541c42
user:      Ma0 <mateuszb at poczta.onet.pl>
date:      Tue May 03 12:46:00 2016 +0200
description:
CLI: free memory allocated for utf8 command line in Windows
Subject: [x265] rc: check scenecut flag for resetting abr

details:   http://hg.videolan.org/x265/rev/ca0e966810d2
branches:  
changeset: 11440:ca0e966810d2
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Tue May 03 18:43:35 2016 +0530
description:
rc: check scenecut flag for resetting abr
Subject: [x265] arm: Implement dequant_scaling ARM NEON

details:   http://hg.videolan.org/x265/rev/18e232f50207
branches:  
changeset: 11441:18e232f50207
user:      Radhakrishnan VR <radhakrishnan at multicorewareinc.com>
date:      Fri Apr 15 11:39:39 2016 +0530
description:
arm: Implement dequant_scaling ARM NEON
Subject: [x265] arm: Implement dequant_normal ARM NEON

details:   http://hg.videolan.org/x265/rev/72a7d488a93f
branches:  
changeset: 11442:72a7d488a93f
user:      Radhakrishnan VR <radhakrishnan at multicorewareinc.com>
date:      Fri Apr 15 16:44:32 2016 +0530
description:
arm: Implement dequant_normal ARM NEON
Subject: [x265] arm: Implement blockcopy_pp chroma ARM NEON

details:   http://hg.videolan.org/x265/rev/bc2b476da370
branches:  
changeset: 11443:bc2b476da370
user:      Radhakrishnan VR <radhakrishnan at multicorewareinc.com>
date:      Tue Apr 19 11:37:57 2016 +0530
description:
arm: Implement blockcopy_pp chroma ARM NEON
Subject: [x265] arm: Implement blockcopy_sp, ps, ss chroma ARM NEON

details:   http://hg.videolan.org/x265/rev/4981d83237db
branches:  
changeset: 11444:4981d83237db
user:      Radhakrishnan VR <radhakrishnan at multicorewareinc.com>
date:      Tue Apr 19 12:12:00 2016 +0530
description:
arm: Implement blockcopy_sp, ps, ss chroma ARM NEON
Subject: [x265] arm: Implement sub_ps chroma ARM NEON

details:   http://hg.videolan.org/x265/rev/d2183a483e60
branches:  
changeset: 11445:d2183a483e60
user:      Radhakrishnan VR <radhakrishnan at multicorewareinc.com>
date:      Wed Apr 20 15:18:13 2016 +0530
description:
arm: Implement sub_ps chroma ARM NEON
Subject: [x265] arm: Implement add_ps chroma ARM NEON

details:   http://hg.videolan.org/x265/rev/8a1c939a1a9a
branches:  
changeset: 11446:8a1c939a1a9a
user:      Radhakrishnan VR <radhakrishnan at multicorewareinc.com>
date:      Wed Apr 20 15:39:45 2016 +0530
description:
arm: Implement add_ps chroma ARM NEON
Subject: [x265] arm: Implement quant (revised)

details:   http://hg.videolan.org/x265/rev/7d61d6cf62a3
branches:  
changeset: 11447:7d61d6cf62a3
user:      Ramya Sriraman<ramya at multicorewareinc.com>
date:      Wed Apr 20 18:44:13 2016 +0530
description:
arm: Implement quant (revised)
Subject: [x265] arm: Implement nquant (revised)

details:   http://hg.videolan.org/x265/rev/68525814723e
branches:  
changeset: 11448:68525814723e
user:      Ramya Sriraman<ramya at multicorewareinc.com>
date:      Fri Apr 22 16:14:05 2016 +0530
description:
arm: Implement nquant (revised)
Subject: [x265] arm: Enable asm by default, allow gcc to auto-detect cpu

details:   http://hg.videolan.org/x265/rev/a5362b9533f6
branches:  
changeset: 11449:a5362b9533f6
user:      Pradeep Ramachandran <pradeep at multicorewareinc.com>
date:      Wed May 04 21:08:09 2016 +0000
description:
arm: Enable asm by default, allow gcc to auto-detect cpu

- Enabled ASM by default, and fixed compilation problem without asm
- GCC now auto-detects CPU instead of forcing armv6; significant speed boost
- Convert ARM compile to be native by default - cross compile requires special
work now

diffstat:

 build/arm-linux/crosscompile.cmake   |    15 +
 build/arm-linux/make-Makefiles.bash  |     2 +-
 build/arm-linux/toolchain.cmake      |    12 -
 doc/reST/cli.rst                     |    18 +-
 doc/reST/presets.rst                 |    41 +-
 source/CMakeLists.txt                |    13 +-
 source/cmake/FindNeon.cmake          |    10 +
 source/common/CMakeLists.txt         |     2 +-
 source/common/arm/asm-primitives.cpp |   628 ++++++++
 source/common/arm/blockcopy8.S       |   204 ++
 source/common/arm/blockcopy8.h       |    31 +
 source/common/arm/dct-a.S            |   122 +
 source/common/arm/dct8.h             |     2 +
 source/common/arm/ipfilter8.S        |  2648 +++++++++++++++++++++++++++++++++-
 source/common/arm/ipfilter8.h        |   288 +++
 source/common/arm/mc-a.S             |   137 +-
 source/common/arm/pixel-util.S       |  1615 ++++++++++++++++++++-
 source/common/arm/pixel-util.h       |    47 +
 source/common/arm/pixel.h            |     8 +
 source/common/common.cpp             |    94 +-
 source/common/common.h               |    12 +
 source/common/contexts.h             |   190 +--
 source/common/cpu.cpp                |     6 +-
 source/common/cudata.h               |     2 +-
 source/common/framedata.h            |     2 +
 source/common/param.cpp              |     9 +-
 source/common/primitives.cpp         |     4 +-
 source/common/x86/asm-primitives.cpp |     7 +
 source/common/x86/ipfilter16.asm     |   132 +-
 source/common/x86/pixel-a.asm        |   566 +++++++
 source/encoder/analysis.cpp          |   585 +++---
 source/encoder/analysis.h            |     2 +
 source/encoder/encoder.cpp           |    49 +-
 source/encoder/entropy.cpp           |   205 ++-
 source/encoder/entropy.h             |     1 -
 source/encoder/framefilter.cpp       |     7 +-
 source/encoder/ratecontrol.cpp       |    47 +-
 source/encoder/ratecontrol.h         |     4 +-
 source/encoder/sao.cpp               |   157 +-
 source/encoder/sao.h                 |    16 +-
 source/encoder/search.cpp            |    70 +-
 source/output/raw.cpp                |    14 +-
 source/output/raw.h                  |     2 +-
 source/x265-extras.cpp               |    53 +-
 source/x265.cpp                      |    53 +-
 source/x265.h                        |    24 +-
 source/x265cli.h                     |     2 +-
 47 files changed, 7278 insertions(+), 880 deletions(-)

diffs (truncated from 9748 to 300 lines):

diff -r 5b01678f6fb4 -r a5362b9533f6 build/arm-linux/crosscompile.cmake

--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/build/arm-linux/crosscompile.cmake	Wed May 04 21:08:09 2016 +0000
@@ -0,0 +1,15 @@
+# CMake toolchain file for cross compiling x265 for ARM arch
+# This feature is only supported as experimental. Use with caution.
+# Please report bugs on bitbucket
+# Run cmake with: cmake -DCMAKE_TOOLCHAIN_FILE=crosscompile.cmake -G "Unix Makefiles" ../../source && ccmake ../../source
+
+set(CROSS_COMPILE_ARM 1)
+set(CMAKE_SYSTEM_NAME Linux)
+set(CMAKE_SYSTEM_PROCESSOR armv6l)
+
+# specify the cross compiler
+set(CMAKE_C_COMPILER arm-linux-gnueabi-gcc)
+set(CMAKE_CXX_COMPILER arm-linux-gnueabi-g++)
+
+# specify the target environment
+SET(CMAKE_FIND_ROOT_PATH  /usr/arm-linux-gnueabi)
diff -r 5b01678f6fb4 -r a5362b9533f6 build/arm-linux/make-Makefiles.bash
--- a/build/arm-linux/make-Makefiles.bash	Sat Apr 02 19:08:49 2016 +0100
+++ b/build/arm-linux/make-Makefiles.bash	Wed May 04 21:08:09 2016 +0000
@@ -1,4 +1,4 @@
 #!/bin/bash
 # Run this from within a bash shell
 
-cmake -DCMAKE_TOOLCHAIN_FILE=toolchain.cmake -G "Unix Makefiles" ../../source && ccmake ../../source
+cmake -G "Unix Makefiles" ../../source && ccmake ../../source
diff -r 5b01678f6fb4 -r a5362b9533f6 build/arm-linux/toolchain.cmake
--- a/build/arm-linux/toolchain.cmake	Sat Apr 02 19:08:49 2016 +0100
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,12 +0,0 @@
-# CMake toolchain file for cross compiling x265 for ARM arch
-
-set(CROSS_COMPILE_ARM 1)
-set(CMAKE_SYSTEM_NAME Linux)
-set(CMAKE_SYSTEM_PROCESSOR armv6l)
-
-# specify the cross compiler
-set(CMAKE_C_COMPILER arm-linux-gnueabi-gcc)
-set(CMAKE_CXX_COMPILER arm-linux-gnueabi-g++)
-
-# specify the target environment
-SET(CMAKE_FIND_ROOT_PATH  /usr/arm-linux-gnueabi)
diff -r 5b01678f6fb4 -r a5362b9533f6 doc/reST/cli.rst
--- a/doc/reST/cli.rst	Sat Apr 02 19:08:49 2016 +0100
+++ b/doc/reST/cli.rst	Wed May 04 21:08:09 2016 +0000
@@ -376,10 +376,10 @@ frame counts) are only applicable to the
 
 .. option:: --dither
 
-	Enable high quality downscaling. Dithering is based on the diffusion
-	of errors from one row of pixels to the next row of pixels in a
-	picture. Only applicable when the input bit depth is larger than
-	8bits and internal bit depth is 8bits. Default disabled
+	Enable high quality downscaling to the encoder's internal bitdepth. 
+	Dithering is based on the diffusion	of errors from one row of pixels 
+	to the next row of pixels in a picture. Only applicable when the 
+	input bit depth is larger than 8bits. Default disabled
 
 	**CLI ONLY**
 
@@ -609,7 +609,7 @@ Profile, Level, Tier
 Mode decision / Analysis
 ========================
 
-.. option:: --rd <0..6>
+.. option:: --rd <1..6>
 
 	Level of RDO in mode decision. The higher the value, the more
 	exhaustive the analysis and the more rate distortion optimization is
@@ -638,7 +638,7 @@ Mode decision / Analysis
 	| 6     | Currently same as 5                                           |
 	+-------+---------------------------------------------------------------+
 
-	**Range of values:** 0: least .. 6: full RDO analysis
+	**Range of values:** 1: least .. 6: full RDO analysis
 
 Options which affect the coding unit quad-tree, sometimes referred to as
 the prediction quad-tree.
@@ -1056,7 +1056,7 @@ a drastic effect on rate control, forcin
 cause ringing artifacts. psy-rdoq is less accurate than psy-rd, it is
 biasing towards energy in general while psy-rd biases towards the energy
 of the source image. But very large psy-rdoq values can sometimes be
-beneficial, preserving film grain for instance.
+beneficial.
 
 As a general rule, when both psycho-visual features are disabled, the
 encoder will tend to blur blocks in areas of difficult motion. Turning
@@ -1093,8 +1093,8 @@ areas of high motion.
 	energy in the reconstructed image. This generally improves perceived
 	visual quality at the cost of lower quality metric scores.  It only
 	has effect when :option:`--rdoq-level` is 1 or 2. High values can
-	be beneficial in preserving high-frequency detail like film grain.
-	Default: 1.0
+	be beneficial in preserving high-frequency detail.
+	Default: 0.0 (1.0 for presets slow, slower, veryslow)
 
 	**Range of values:** 0 .. 50.0
 
diff -r 5b01678f6fb4 -r a5362b9533f6 doc/reST/presets.rst
--- a/doc/reST/presets.rst	Sat Apr 02 19:08:49 2016 +0100
+++ b/doc/reST/presets.rst	Wed May 04 21:08:09 2016 +0000
@@ -117,33 +117,26 @@ after the preset.
 
 
 
-Film Grain Retention
-~~~~~~~~~~~~~~~~~~~~
+Film Grain
+~~~~~~~~~~
 
-:option:`--tune` *grain* tries to improve the retention of film grain in
-the reconstructed output. It disables rate distortion optimizations in
-quantization, and increases the default psy-rd.
+:option:`--tune` *grain* aims to encode grainy content with the best 
+visual quality. The purpose of this option is neither to retain nor 
+eliminate grain, but prevent noticeable artifacts caused by uneven 
+distribution of grain. :option:`--tune` *grain* strongly restricts 
+algorithms that vary the quantization parameter within and across frames.
 
-    * :option:`--psy-rd` 0.5
-    * :option:`--rdoq-level` 0
-    * :option:`--psy-rdoq` 0
+    * :option:`--aq-mode` 0
+    * :option:`--cutree` 0
+    * :option:`--ipratio` 1.1
+    * :option:`--pbratio` 1.0
+    * :option:`--qpstep` 1
 
-It lowers the strength of adaptive quantization, so residual energy can
-be more evenly distributed across the (noisy) picture:
-
-    * :option:`--aq-strength` 0.3
-
-And it similarly tunes rate control to prevent the slice QP from
-swinging too wildly from frame to frame:
-
-    * :option:`--ipratio` 1.1
-    * :option:`--pbratio` 1.1
-    * :option:`--qcomp` 0.8
-
-And lastly it reduces the strength of deblocking to prevent grain being
-blurred on block boundaries:
-
-    * :option:`--deblock` -2
+It also enables a specialised ratecontrol algorithm :option:`--rc-grain` 
+that strictly minimises QP fluctuations across frames, while still allowing 
+the encoder to hit bitrate targets and VBV buffer limits (with a slightly 
+higher margin of error than normal). It is highly recommended that this 
+algorithm is used only through the :option:`--tune` *grain* feature. 
 
 Fast Decode
 ~~~~~~~~~~~
diff -r 5b01678f6fb4 -r a5362b9533f6 source/CMakeLists.txt
--- a/source/CMakeLists.txt	Sat Apr 02 19:08:49 2016 +0100
+++ b/source/CMakeLists.txt	Wed May 04 21:08:09 2016 +0000
@@ -185,11 +185,16 @@ if(GCC)
     endif()
     if(ARM AND CROSS_COMPILE_ARM)
         set(ARM_ARGS -march=armv6 -mfloat-abi=soft -mfpu=vfp -marm)
-        add_definitions(${ARM_ARGS})
     elseif(ARM)
-        set(ARM_ARGS -march=armv6 -mfloat-abi=hard -mfpu=vfp -marm)
-        add_definitions(${ARM_ARGS})
+		find_package(Neon)
+		if(CPU_HAS_NEON)
+			set(ARM_ARGS -mcpu=native -mfloat-abi=hard -mfpu=neon -marm)
+			add_definitions(-DHAVE_NEON)
+		else()
+			set(ARM_ARGS -mcpu=native -mfloat-abi=hard -mfpu=vfp -marm)
+		endif()
     endif()
+	add_definitions(${ARM_ARGS})
     if(FPROFILE_GENERATE)
         if(INTEL_CXX)
             add_definitions(-prof-gen -prof-dir="${CMAKE_CURRENT_BINARY_DIR}")
@@ -281,7 +286,7 @@ endif(GCC)
 
 find_package(Yasm)
 if(ARM OR CROSS_COMPILE_ARM)
-    option(ENABLE_ASSEMBLY "Enable use of assembly coded primitives" OFF)
+    option(ENABLE_ASSEMBLY "Enable use of assembly coded primitives" ON)
 elseif(YASM_FOUND AND X86)
     if (YASM_VERSION_STRING VERSION_LESS "1.2.0")
         message(STATUS "Yasm version ${YASM_VERSION_STRING} is too old. 1.2.0 or later required")
diff -r 5b01678f6fb4 -r a5362b9533f6 source/cmake/FindNeon.cmake
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/source/cmake/FindNeon.cmake	Wed May 04 21:08:09 2016 +0000
@@ -0,0 +1,10 @@
+include(FindPackageHandleStandardArgs)
+
+# Check the version of neon supported by the ARM CPU
+execute_process(COMMAND cat /proc/cpuinfo | grep Features | grep neon
+				OUTPUT_VARIABLE neon_version
+				ERROR_QUIET
+				OUTPUT_STRIP_TRAILING_WHITESPACE)
+if(neon_version)
+	set(CPU_HAS_NEON 1)
+endif()
diff -r 5b01678f6fb4 -r a5362b9533f6 source/common/CMakeLists.txt
--- a/source/common/CMakeLists.txt	Sat Apr 02 19:08:49 2016 +0100
+++ b/source/common/CMakeLists.txt	Wed May 04 21:08:09 2016 +0000
@@ -89,7 +89,7 @@ if(ENABLE_ASSEMBLY AND (ARM OR CROSS_COM
     set(C_SRCS asm-primitives.cpp pixel.h mc.h ipfilter8.h blockcopy8.h dct8.h loopfilter.h)
 
     # add ARM assembly/intrinsic files here
-    set(A_SRCS asm.S cpu-a.S mc-a.S sad-a.S pixel-util.S ssd-a.S blockcopy8.S ipfilter8.S)
+    set(A_SRCS asm.S cpu-a.S mc-a.S sad-a.S pixel-util.S ssd-a.S blockcopy8.S ipfilter8.S dct-a.S)
     set(VEC_PRIMITIVES)
 
     set(ARM_ASMS "${A_SRCS}" CACHE INTERNAL "ARM Assembly Sources")
diff -r 5b01678f6fb4 -r a5362b9533f6 source/common/arm/asm-primitives.cpp
--- a/source/common/arm/asm-primitives.cpp	Sat Apr 02 19:08:49 2016 +0100
+++ b/source/common/arm/asm-primitives.cpp	Wed May 04 21:08:09 2016 +0000
@@ -34,6 +34,7 @@ extern "C" {
 #include "pixel.h"
 #include "pixel-util.h"
 #include "ipfilter8.h"
+#include "dct8.h"
 }
 
 namespace X265_NS {
@@ -43,6 +44,278 @@ void setupAssemblyPrimitives(EncoderPrim
 {
     if (cpuMask & X265_CPU_NEON)
     {
+        // quant
+         p.quant = PFX(quant_neon);
+         p.nquant = PFX(nquant_neon);
+
+        // dequant_scaling
+         p.dequant_scaling = PFX(dequant_scaling_neon);
+         p.dequant_normal  = PFX(dequant_normal_neon);
+
+        // luma satd
+         p.pu[LUMA_4x4].satd   = PFX(pixel_satd_4x4_neon);
+         p.pu[LUMA_4x8].satd   = PFX(pixel_satd_4x8_neon);
+         p.pu[LUMA_4x16].satd  = PFX(pixel_satd_4x16_neon);
+         p.pu[LUMA_8x4].satd   = PFX(pixel_satd_8x4_neon);
+         p.pu[LUMA_8x8].satd   = PFX(pixel_satd_8x8_neon);
+         p.pu[LUMA_8x16].satd  = PFX(pixel_satd_8x16_neon);
+         p.pu[LUMA_8x32].satd  = PFX(pixel_satd_8x32_neon);
+         p.pu[LUMA_12x16].satd = PFX(pixel_satd_12x16_neon);
+         p.pu[LUMA_16x4].satd  = PFX(pixel_satd_16x4_neon);
+         p.pu[LUMA_16x8].satd  = PFX(pixel_satd_16x8_neon);
+         p.pu[LUMA_16x16].satd = PFX(pixel_satd_16x16_neon);
+         p.pu[LUMA_16x32].satd = PFX(pixel_satd_16x32_neon);
+         p.pu[LUMA_16x64].satd = PFX(pixel_satd_16x64_neon);
+         p.pu[LUMA_24x32].satd = PFX(pixel_satd_24x32_neon);
+         p.pu[LUMA_32x8].satd  = PFX(pixel_satd_32x8_neon);
+         p.pu[LUMA_32x16].satd = PFX(pixel_satd_32x16_neon);
+         p.pu[LUMA_32x24].satd = PFX(pixel_satd_32x24_neon);
+         p.pu[LUMA_32x32].satd = PFX(pixel_satd_32x32_neon);
+         p.pu[LUMA_32x64].satd = PFX(pixel_satd_32x64_neon);
+         p.pu[LUMA_48x64].satd = PFX(pixel_satd_48x64_neon);
+         p.pu[LUMA_64x16].satd = PFX(pixel_satd_64x16_neon);
+         p.pu[LUMA_64x32].satd = PFX(pixel_satd_64x32_neon);
+         p.pu[LUMA_64x48].satd = PFX(pixel_satd_64x48_neon);
+         p.pu[LUMA_64x64].satd = PFX(pixel_satd_64x64_neon);
+
+        // chroma satd
+         p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].satd    = PFX(pixel_satd_4x4_neon);
+         p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].satd    = PFX(pixel_satd_4x8_neon);
+         p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].satd   = PFX(pixel_satd_4x16_neon);
+         p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].satd    = PFX(pixel_satd_8x4_neon);
+         p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].satd    = PFX(pixel_satd_8x8_neon);
+         p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].satd   = PFX(pixel_satd_8x16_neon);
+         p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].satd   = PFX(pixel_satd_8x32_neon);
+         p.chroma[X265_CSP_I420].pu[CHROMA_420_12x16].satd  = PFX(pixel_satd_12x16_neon);
+         p.chroma[X265_CSP_I420].pu[CHROMA_420_16x4].satd   = PFX(pixel_satd_16x4_neon);
+         p.chroma[X265_CSP_I420].pu[CHROMA_420_16x8].satd   = PFX(pixel_satd_16x8_neon);
+         p.chroma[X265_CSP_I420].pu[CHROMA_420_16x12].satd  = PFX(pixel_satd_16x12_neon);
+         p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].satd  = PFX(pixel_satd_16x16_neon);
+         p.chroma[X265_CSP_I420].pu[CHROMA_420_16x32].satd  = PFX(pixel_satd_16x32_neon);
+         p.chroma[X265_CSP_I420].pu[CHROMA_420_24x32].satd  = PFX(pixel_satd_24x32_neon);
+         p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].satd   = PFX(pixel_satd_32x8_neon);
+         p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].satd  = PFX(pixel_satd_32x16_neon);
+         p.chroma[X265_CSP_I420].pu[CHROMA_420_32x24].satd  = PFX(pixel_satd_32x24_neon);
+         p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].satd  = PFX(pixel_satd_32x32_neon);
+
+         p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].satd    = PFX(pixel_satd_4x4_neon);
+         p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].satd    = PFX(pixel_satd_4x8_neon);
+         p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].satd   = PFX(pixel_satd_4x16_neon);
+         p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].satd   = PFX(pixel_satd_4x32_neon);
+         p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].satd    = PFX(pixel_satd_8x4_neon);
+         p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].satd    = PFX(pixel_satd_8x8_neon);
+         p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].satd   = PFX(pixel_satd_8x12_neon);
+         p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].satd   = PFX(pixel_satd_8x16_neon);
+         p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].satd   = PFX(pixel_satd_8x32_neon);
+         p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].satd   = PFX(pixel_satd_8x64_neon);
+         p.chroma[X265_CSP_I422].pu[CHROMA_422_12x32].satd  = PFX(pixel_satd_12x32_neon);
+         p.chroma[X265_CSP_I422].pu[CHROMA_422_16x8].satd   = PFX(pixel_satd_16x8_neon);
+         p.chroma[X265_CSP_I422].pu[CHROMA_422_16x16].satd  = PFX(pixel_satd_16x16_neon);
+         p.chroma[X265_CSP_I422].pu[CHROMA_422_16x24].satd  = PFX(pixel_satd_16x24_neon);
+         p.chroma[X265_CSP_I422].pu[CHROMA_422_16x32].satd  = PFX(pixel_satd_16x32_neon);
+         p.chroma[X265_CSP_I422].pu[CHROMA_422_16x64].satd  = PFX(pixel_satd_16x64_neon);
+         p.chroma[X265_CSP_I422].pu[CHROMA_422_24x64].satd  = PFX(pixel_satd_24x64_neon);
+         p.chroma[X265_CSP_I422].pu[CHROMA_422_32x16].satd  = PFX(pixel_satd_32x16_neon);
+         p.chroma[X265_CSP_I422].pu[CHROMA_422_32x32].satd  = PFX(pixel_satd_32x32_neon);