[x264-devel] [Git][videolan/x264][stable] 35 commits: x86: Always use PIC in x86-64 asm

Wed Jul 17 20:25:38 CEST 2019

Anton Mitrofanov pushed to branch stable at VideoLAN / x264

Commits:
275ef533 by Henrik Gramner at 2019-03-06T19:45:50Z
x86: Always use PIC in x86-64 asm

Most x86-64 operating systems nowadays don't even allow .text relocations
in object files any more, and there is no measurable overall performance
difference from using RIP-relative addressing in x264 asm.

Enforcing PIC reduces complexity and simplifies testing.

- - - - -
8f6ac77f by Luca Barbato at 2019-03-06T19:45:50Z
ppc: Cleanup quant

- - - - -
4dd83955 by Luca Barbato at 2019-03-06T19:45:50Z
ppc: Add quant_4x4x4

4x faster than C.

- - - - -
6e74eb5a by Luca Barbato at 2019-03-06T19:45:50Z
ppc: Rework the adds in satd8x8

10% faster.

- - - - -
e0d846a6 by Luca Barbato at 2019-03-06T19:45:51Z
ppc: Factor out the sum of absolute

And use it on the other satd > 8.

5-10% faster depending on the size.

- - - - -
83acefef by Luca Barbato at 2019-03-06T19:45:51Z
ppc: Rework satd_4* likewise

Now 4x4 is as slow as C and 4x8 is 2% faster than before.

- - - - -
28fb2661 by Luca Barbato at 2019-03-06T19:45:51Z
ppc: Use xxpermdi to halve the computation in sad_x4_8x8

About 20% faster.

- - - - -
18262ee3 by Luca Barbato at 2019-03-06T19:45:51Z
ppc: Use a single store to write the scores for sad_x4_8x8

Yet another use of xxpermdi, another 10% gain.

- - - - -
0d111333 by Luca Barbato at 2019-03-06T19:45:51Z
ppc: Use xxpermdi in VEC_STORE8

Around a 2% speedup to the overall encoding for --slow.

- - - - -
40688108 by Luca Barbato at 2019-03-06T19:45:51Z
ppc: Use the vec_xst_len for partial stores

Seems to give about a 1-2% overall speedup on --slow.
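
For readers unfamiliar with length-limited stores, the idea behind a partial store can be sketched in portable C. This is only a scalar model under stated assumptions: `store_partial` and `VEC_BYTES` are hypothetical names, the code does not use the POWER9 `vec_xst_len` intrinsic itself, and it says nothing about the speedup reported above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 16 /* width of one vector register in bytes */

/* Hypothetical scalar model of a length-limited vector store:
 * writes at most the first `len` bytes of a 16-byte vector to `dst`
 * and leaves every byte after that untouched. */
void store_partial(const uint8_t vec[VEC_BYTES], uint8_t *dst, size_t len)
{
    if (len > VEC_BYTES)
        len = VEC_BYTES; /* never write more than one vector */
    memcpy(dst, vec, len);
}
```

Calling `store_partial(vec, dst, 8)` writes one 8-byte row and leaves the bytes after it alone, which is the case a full 16-byte store cannot handle without a load/merge/store sequence.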

- - - - -
69dfb289 by Luca Barbato at 2019-03-06T19:45:51Z
ppc: Use vec_splats in mc

No overall speedup, just tidier code.

- - - - -
de380f4a by Luca Barbato at 2019-03-06T19:45:51Z
ppc: Use the vec_xst_len for partial stores in mc

Around a 1% speedup to the overall encoding for --slow.

- - - - -
57baac4e by Alexandra Hájková at 2019-03-06T19:45:51Z
ppc: Use xxpermdi in sad_x3/x4 and use macros to avoid redundant code

- - - - -
92d36908 by Yusuke Nakamura at 2019-03-06T19:45:51Z
Signal Progressive and Constrained profiles

Progressive High, Constrained High, and Progressive High 10.

Even in Main profile, constraint_set4_flag is now set to 1 if progressive,
and constraint_set5_flag is set to 1 if no B-slices are present.
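
The flag rule described in this commit message can be sketched in C as follows. This is a minimal illustration, not the x264 implementation: `stream_props_t`, `constraint_flags` and the `CONSTRAINT_SET*` masks are hypothetical names, the bit positions are chosen for readability rather than taken from the bitstream layout, and only the two conditions stated above are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the stream properties the rule depends on. */
typedef struct {
    bool progressive;  /* true if the stream is not interlaced */
    bool has_b_slices; /* true if B-slices may be present */
} stream_props_t;

/* Illustrative masks: bit N stands for constraint_setN_flag. */
#define CONSTRAINT_SET4 (1u << 4)
#define CONSTRAINT_SET5 (1u << 5)

/* Returns the constraint flags implied by the stream properties:
 * set4 if progressive, set5 if no B-slices are present. */
uint8_t constraint_flags(const stream_props_t *p)
{
    assert(p != NULL);
    uint8_t flags = 0;
    if (p->progressive)
        flags |= CONSTRAINT_SET4;
    if (!p->has_b_slices)
        flags |= CONSTRAINT_SET5;
    return flags;
}
```

A progressive stream without B-slices would get both flags under this rule, while an interlaced stream with B-slices would get neither.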

- - - - -
74c051f2 by Henrik Gramner at 2019-03-06T19:45:52Z
cli: Bash autocomplete support

Allows for automatic command line completion for both options and values.

Options such as --input-csp and --input-fmt will dynamically retrieve
supported values from libavformat when compiled with lavf support.

Execute 'source tools/bash-autocomplete.sh' in bash to enable.

- - - - -
ec1d3230 by Henrik Gramner at 2019-03-06T19:45:52Z
Bump dates to 2019

- - - - -
b7e9935c by Henrik Gramner at 2019-03-06T19:45:53Z
x86inc: Turn 'movsxd' into 'movifnidn' on x86-32

- - - - -
82721eae by Henrik Gramner at 2019-03-06T19:45:53Z
x86inc: Add x86-32 PIC support macros

- - - - -
6f85b3c4 by Henrik Gramner at 2019-03-06T19:45:53Z
x86inc: Make 'non-adjacent' default in the TAIL_CALL macro

- - - - -
101bd27d by Henrik Gramner at 2019-03-06T19:45:53Z
x86inc: Support N_PEXT bit on Mach-O

Allows for marking symbols as having limited global scope, similar to
using 'hidden' symbol visibility on ELF.

- - - - -
d3fa8b97 by Henrik Gramner at 2019-03-06T19:45:53Z
x86inc: Improve warnings for use of unsupported instructions

Warn when the following are used without the appropriate cpuflag:
 * YMM and ZMM registers
 * 'pextrw' with a memory operand
 * GPR instruction set extensions

- - - - -
3e5aed95 by Henrik Gramner at 2019-03-06T19:45:53Z
x86inc: Add support for GFNI instructions

- - - - -
120ed3af by Anton Mitrofanov at 2019-03-06T19:45:53Z
Remove h->rc dereferencing where possible

- - - - -
d4099dd4 by Anton Mitrofanov at 2019-03-06T19:45:53Z
Remove compatibility workarounds

This will break decoding with older versions of FFmpeg/Libav.

- - - - -
5493be84 by Henrik Gramner at 2019-03-14T13:31:22Z
Fix warning in autocomplete.c when compiled with lavf

- - - - -
98ee9d2f by Konstantin Pavlov at 2019-07-16T11:34:18Z
Added gitlab CI

Supported targets:
 - debian amd64
 - debian aarch64
 - windows 32 bit
 - windows 64 bit
 - macos 64bit

The tests are run on all supported targets (via Wine on Windows).

The release jobs are only available on the master/stable branches in the
videolan/x264 repository, and must be run manually when a developer
wishes to upload the artifacts.

- - - - -
352c0263 by Konstantin Pavlov at 2019-07-16T21:06:24Z
CI: Use a newer aarch64 image

It now includes pkg-config, so lavf can be detected.

- - - - -
bd8a88be by Konstantin Pavlov at 2019-07-16T21:06:53Z
CI: Bump macos target to darwin18

- - - - -
6381798d by Anton Mitrofanov at 2019-07-17T17:15:34Z
Fix heap-buffer-overflow read detected by ASan with interlaced encoding

Bug report by Hongxu Chen.

- - - - -
3147fa43 by Anton Mitrofanov at 2019-07-17T17:15:34Z
checkasm: Fix heap-buffer-overflow read detected by ASan

- - - - -
f06062f5 by Anton Mitrofanov at 2019-07-17T17:15:34Z
Fix integer overflow detected by UBSan in --weightp analysis

Bug report by Xuezhi Yan.

- - - - -
6b1170cb by Anton Mitrofanov at 2019-07-17T17:15:34Z
Shut up UBSan about uninitialized data read

Result was never used in that case.

- - - - -
6d494708 by Anton Mitrofanov at 2019-07-17T17:15:34Z
Fix x264_picture_alloc with X264_CSP_I400 colorspace

- - - - -
f9af2a0f by Anton Mitrofanov at 2019-07-17T17:15:34Z
Revert r2959: Signal Progressive and Constrained profiles

Some hardware decoders refuse to decode streams with non-zero
constraint_set4_flag/constraint_set5_flag.

- - - - -
34c06d1c by Anton Mitrofanov at 2019-07-17T17:15:34Z
Strip git-hash from version in x264.pc

pkg-config doesn't like spaces in version string.
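
The kind of stripping this commit describes can be sketched as below. The actual change is not shown in this email, so treat this as an assumption-laden illustration: `strip_git_hash` is a hypothetical helper, and the input format (a version number, a space, then a hash, e.g. "1.2.3 abcdef0") is a made-up example rather than a real x264 version string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `version` into `dst` up to, but not
 * including, the first space, so a trailing " <hash>" part is dropped.
 * Output is truncated to fit `dst_size` and always NUL-terminated.
 * Returns the number of characters written, excluding the terminator. */
size_t strip_git_hash(const char *version, char *dst, size_t dst_size)
{
    if (dst_size == 0)
        return 0;
    size_t len = strcspn(version, " "); /* length before the first space */
    if (len >= dst_size)
        len = dst_size - 1;
    memcpy(dst, version, len);
    dst[len] = '\0';
    return len;
}
```

A string without a space passes through unchanged, so the helper is safe to apply whether or not a hash was appended.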

- - - - -

30 changed files:

- + .gitlab-ci.yml
- Makefile
- + autocomplete.c
- common/aarch64/asm-offsets.c
- common/aarch64/asm-offsets.h
- common/aarch64/asm.S
- common/aarch64/bitstream-a.S
- common/aarch64/bitstream.h
- common/aarch64/cabac-a.S
- common/aarch64/dct-a.S
- common/aarch64/dct.h
- common/aarch64/deblock-a.S
- common/aarch64/deblock.h
- common/aarch64/mc-a.S
- common/aarch64/mc-c.c
- common/aarch64/mc.h
- common/aarch64/pixel-a.S
- common/aarch64/pixel.h
- common/aarch64/predict-a.S
- common/aarch64/predict-c.c
- common/aarch64/predict.h
- common/aarch64/quant-a.S
- common/aarch64/quant.h
- common/arm/asm.S
- common/arm/bitstream-a.S
- common/arm/bitstream.h
- common/arm/cpu-a.S
- common/arm/dct-a.S
- common/arm/dct.h
- common/arm/deblock-a.S

The diff was not included because it is too large.

View it on GitLab: https://code.videolan.org/videolan/x264/compare/72db437770fd1ce3961f624dd57a8e75ff65ae0b...34c06d1c17ad968fbdda153cb772f77ee31b3095
