[x264-devel] [Git][videolan/x264][master] 9 commits: loongarch: Init LSX/LASX support
Anton Mitrofanov (@BugMaster)
gitlab at videolan.org
Thu Oct 12 20:17:42 UTC 2023
Anton Mitrofanov pushed to branch master at VideoLAN / x264
Commits:
1ecc51ee by Loongson Technology Corporation Limited at 2023-10-10T09:00:09+08:00
loongarch: Init LSX/LASX support
LSX and LASX are the LoongArch 128-bit and 256-bit SIMD extensions, respectively.
Signed-off-by: Shiyou Yin <yinshiyou-hf at loongson.cn>
Signed-off-by: Xiwei Gu <guxiwei-hf at loongson.cn>
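
For readers unfamiliar with how LSX/LASX availability is usually discovered at run time, here is a minimal sketch that decodes a hardware-capability bitmask. It is not the code from this commit. The bit positions (4 for LSX, 5 for LASX) are an assumption based on the Linux LoongArch HWCAP layout, and every sketch_* name is hypothetical.

```c
#include <stdbool.h>

/* Assumed bit positions, following the Linux LoongArch HWCAP layout.
 * Check them against your kernel headers before relying on them. */
#define SKETCH_HWCAP_LSX  (1UL << 4)
#define SKETCH_HWCAP_LASX (1UL << 5)

/* Hypothetical result type: which SIMD widths the CPU reports. */
typedef struct {
    bool lsx;   /* 128-bit vectors available */
    bool lasx;  /* 256-bit vectors available */
} sketch_simd_caps;

/* Decode a capability bitmask into the two flags of interest. */
static sketch_simd_caps sketch_decode_hwcap(unsigned long hwcap)
{
    sketch_simd_caps caps;
    caps.lsx  = (hwcap & SKETCH_HWCAP_LSX)  != 0;
    caps.lasx = (hwcap & SKETCH_HWCAP_LASX) != 0;
    return caps;
}
```

On LoongArch Linux the mask would typically come from getauxval(AT_HWCAP) in <sys/auxv.h>. The decoder is kept separate from that call so it can be exercised on any host.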
- - - - -
25ffd616 by Loongson Technology Corporation Limited at 2023-10-10T09:00:47+08:00
loongarch: Add loongson_asm.S and loongson_utils.S
Common macros and functions for Loongson optimizations.
Signed-off-by: Shiyou Yin <yinshiyou-hf at loongson.cn>
- - - - -
d7d283f6 by Loongson Technology Corporation Limited at 2023-10-10T09:04:49+08:00
loongarch: Improve the performance of deblock series functions.
Performance has improved from 4.76fps to 4.92fps.
Tested with the following commands:
./configure && make -j5
./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv
functions    performance (c)    performance (asm)
deblock_luma[0] 79 39
deblock_luma[1] 91 18
deblock_luma_intra[0] 63 44
deblock_luma_intra[1] 71 18
deblock_strength 104 33
Signed-off-by: Hao Chen <chenhao at loongson.cn>
- - - - -
00b8e3b9 by Loongson Technology Corporation Limited at 2023-10-10T09:09:52+08:00
loongarch: Improve the performance of sad/sad_x3/sad_x4 series functions
Performance has improved from 4.92fps to 6.32fps.
Tested with the following commands:
./configure && make -j5
./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv
functions    performance (c)    performance (asm)
sad_4x4 13 3
sad_4x8 26 7
sad_4x16 57 13
sad_8x4 24 3
sad_8x8 54 8
sad_8x16 108 13
sad_16x8 95 8
sad_16x16 189 13
sad_x3_4x4 37 6
sad_x3_4x8 71 13
sad_x3_8x4 70 8
sad_x3_8x8 162 14
sad_x3_8x16 323 25
sad_x3_16x8 279 15
sad_x3_16x16 555 27
sad_x4_4x4 49 8
sad_x4_4x8 95 17
sad_x4_8x4 94 8
sad_x4_8x8 214 16
sad_x4_8x16 429 33
sad_x4_16x8 372 18
sad_x4_16x16 740 34
Signed-off-by: wanglu <wanglu at loongson.cn>
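
As context for the sad/sad_x3 rows above, the following is a plain scalar sketch of what a sum-of-absolute-differences routine computes. It is not the x264 C reference nor the LSX/LASX assembly from this commit. It assumes 8-bit pixels, and the sketch_* names and parameter order are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences over a width x height block.
 * Strides are given in pixels; 8-bit samples are assumed. */
static int sketch_sad(const uint8_t *a, int stride_a,
                      const uint8_t *b, int stride_b,
                      int width, int height)
{
    int sum = 0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            sum += abs(a[x] - b[x]);
        a += stride_a;
        b += stride_b;
    }
    return sum;
}

/* Score one source block against three candidate blocks that share a
 * stride, writing the three SADs to scores[0..2]. This mirrors the idea
 * behind the sad_x3 entries: one pass over the source, several results. */
static void sketch_sad_x3(const uint8_t *src, int stride_src,
                          const uint8_t *c0, const uint8_t *c1,
                          const uint8_t *c2, int stride_cand,
                          int width, int height, int scores[3])
{
    scores[0] = sketch_sad(src, stride_src, c0, stride_cand, width, height);
    scores[1] = sketch_sad(src, stride_src, c1, stride_cand, width, height);
    scores[2] = sketch_sad(src, stride_src, c2, stride_cand, width, height);
}
```

A vector implementation can process a whole row of differences per instruction, which is consistent with the larger block sizes showing the largest gains in the table.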
- - - - -
d8ed272a by Loongson Technology Corporation Limited at 2023-10-10T09:13:58+08:00
loongarch: Improve the performance of predict series functions
Performance has improved from 6.32fps to 6.34fps.
Tested with the following commands:
./configure && make -j5
./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv
functions    performance (c)    performance (asm)
intra_predict_4x4_dc 3 2
intra_predict_4x4_dc8 1 1
intra_predict_4x4_dcl 2 1
intra_predict_4x4_dct 2 1
intra_predict_4x4_ddl 7 2
intra_predict_4x4_h 2 1
intra_predict_4x4_v 1 1
intra_predict_8x8_dc 8 2
intra_predict_8x8_dc8 1 1
intra_predict_8x8_dcl 5 2
intra_predict_8x8_dct 5 2
intra_predict_8x8_ddl 27 3
intra_predict_8x8_ddr 26 3
intra_predict_8x8_h 4 2
intra_predict_8x8_v 3 1
intra_predict_8x8_vl 29 3
intra_predict_8x8_vr 31 4
intra_predict_8x8c_dc 8 5
intra_predict_8x8c_dc8 1 1
intra_predict_8x8c_dcl 5 3
intra_predict_8x8c_dct 5 3
intra_predict_8x8c_h 4 2
intra_predict_8x8c_p 58 30
intra_predict_8x8c_v 4 1
intra_predict_16x16_dc 32 8
intra_predict_16x16_dc8 9 4
intra_predict_16x16_dcl 26 6
intra_predict_16x16_dct 26 6
intra_predict_16x16_h 23 7
intra_predict_16x16_p 182 44
intra_predict_16x16_v 22 4
Signed-off-by: Xiwei Gu <guxiwei-hf at loongson.cn>
- - - - -
65e7bac5 by Loongson Technology Corporation Limited at 2023-10-10T09:15:32+08:00
loongarch: Improve the performance of quant series functions
Performance has improved from 6.34fps to 6.78fps.
Tested with the following commands:
./configure && make -j5
./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv
functions    performance (c)    performance (asm)
coeff_last15 3 2
coeff_last16 3 1
coeff_last64 42 6
decimate_score15 8 12
decimate_score16 8 11
decimate_score64 61 43
dequant_4x4_cqm 16 5
dequant_4x4_dc_cqm 13 5
dequant_4x4_dc_flat 13 5
dequant_4x4_flat 16 5
dequant_8x8_cqm 71 9
dequant_8x8_flat 71 9
Signed-off-by: Shiyou Yin <yinshiyou-hf at loongson.cn>
- - - - -
981c8f25 by Loongson Technology Corporation Limited at 2023-10-12T17:27:40+08:00
loongarch: Improve the performance of mc series functions
Performance has improved from 6.78fps to 10.53fps.
Tested with the following commands:
./configure && make -j5
./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv
functions    performance (c)    performance (asm)
avg_4x2 16 5
avg_4x4 30 6
avg_4x8 63 10
avg_4x16 124 19
avg_8x4 60 6
avg_8x8 119 10
avg_8x16 233 19
avg_16x8 229 21
avg_16x16 451 41
get_ref_4x4 30 9
get_ref_4x8 52 11
get_ref_8x4 45 9
get_ref_8x8 80 11
get_ref_8x16 156 16
get_ref_12x10 137 13
get_ref_16x8 147 11
get_ref_16x16 282 16
get_ref_20x18 278 22
hpel_filter 5163 686
lowres_init 5440 286
mc_chroma_2x2 24 7
mc_chroma_2x4 42 10
mc_chroma_4x2 41 7
mc_chroma_4x4 75 10
mc_chroma_4x8 144 19
mc_chroma_8x4 137 15
mc_chroma_8x8 269 28
mc_luma_4x4 30 10
mc_luma_4x8 52 12
mc_luma_8x4 44 10
mc_luma_8x8 80 13
mc_luma_8x16 156 19
mc_luma_16x8 147 13
mc_luma_16x16 281 19
memcpy_aligned 14 9
memzero_aligned 24 4
offsetadd_w4 79 18
offsetadd_w8 142 18
offsetadd_w16 277 25
offsetadd_w20 1118 38
offsetsub_w4 75 18
offsetsub_w8 140 18
offsetsub_w16 265 25
offsetsub_w20 989 39
weight_w4 111 19
weight_w8 205 19
weight_w16 396 29
weight_w20 1143 45
deinterleave_chroma_fdec 76 9
deinterleave_chroma_fenc 86 9
plane_copy_deinterleave 733 90
plane_copy_interleave 791 245
store_interleave_chroma 82 12
Signed-off-by: Xiwei Gu <guxiwei-hf at loongson.cn>
- - - - -
fa7f1fce by Loongson Technology Corporation Limited at 2023-10-12T17:28:15+08:00
loongarch: Improve the performance of dct series functions
Performance has improved from 10.53fps to 11.27fps.
Tested with the following commands:
./configure && make -j5
./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv
functions    performance (c)    performance (asm)
add4x4_idct 34 9
add8x8_idct 139 31
add8x8_idct8 269 39
add8x8_idct_dc 67 7
add16x16_idct 564 123
add16x16_idct_dc 260 22
dct4x4dc 18 10
idct4x4dc 16 9
sub4x4_dct 25 7
sub8x8_dct 101 12
sub8x8_dct8 160 25
sub16x16_dct 403 52
sub16x16_dct8 646 68
zigzag_scan_4x4_frame 4 1
Signed-off-by: zhoupeng <zhoupeng at loongson.cn>
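
To illustrate what an entry such as sub4x4_dct does, here is a scalar sketch of the H.264 4x4 forward core transform applied to the difference between a source block and a prediction block. It is a sketch under stated assumptions, not the code from this commit: 8-bit pixels, row-major output, and hypothetical sketch_* names. The output ordering used by x264 itself may differ.

```c
#include <stdint.h>

/* H.264 4x4 forward core transform, out = C * in * C^T, with
 * C = [1 1 1 1; 2 1 -1 -2; 1 -1 -1 1; 1 -2 2 -1]. Row-major 4x4 blocks. */
static void sketch_fdct4x4(const int16_t in[16], int16_t out[16])
{
    int16_t tmp[16];

    /* Horizontal pass: transform each row. */
    for (int i = 0; i < 4; i++) {
        int s03 = in[i * 4 + 0] + in[i * 4 + 3];
        int s12 = in[i * 4 + 1] + in[i * 4 + 2];
        int d03 = in[i * 4 + 0] - in[i * 4 + 3];
        int d12 = in[i * 4 + 1] - in[i * 4 + 2];
        tmp[i * 4 + 0] = (int16_t)(s03 + s12);
        tmp[i * 4 + 1] = (int16_t)(2 * d03 + d12);
        tmp[i * 4 + 2] = (int16_t)(s03 - s12);
        tmp[i * 4 + 3] = (int16_t)(d03 - 2 * d12);
    }

    /* Vertical pass: transform each column of the intermediate result. */
    for (int j = 0; j < 4; j++) {
        int s03 = tmp[0 * 4 + j] + tmp[3 * 4 + j];
        int s12 = tmp[1 * 4 + j] + tmp[2 * 4 + j];
        int d03 = tmp[0 * 4 + j] - tmp[3 * 4 + j];
        int d12 = tmp[1 * 4 + j] - tmp[2 * 4 + j];
        out[0 * 4 + j] = (int16_t)(s03 + s12);
        out[1 * 4 + j] = (int16_t)(2 * d03 + d12);
        out[2 * 4 + j] = (int16_t)(s03 - s12);
        out[3 * 4 + j] = (int16_t)(d03 - 2 * d12);
    }
}

/* Residual (source minus prediction) followed by the transform. */
static void sketch_sub4x4_dct(int16_t out[16],
                              const uint8_t *src, int stride_src,
                              const uint8_t *pred, int stride_pred)
{
    int16_t diff[16];
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            diff[y * 4 + x] = (int16_t)(src[y * stride_src + x] -
                                        pred[y * stride_pred + x]);
    sketch_fdct4x4(diff, out);
}
```

Each pass is only additions, subtractions, and doublings, which is why this kind of kernel maps well onto SIMD lanes.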
- - - - -
5f84d403 by Loongson Technology Corporation Limited at 2023-10-12T17:28:23+08:00
loongarch: Improve the performance of pixel series functions
Performance has improved from 11.27fps to 20.50fps.
Tested with the following commands:
./configure && make -j5
./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv
functions    performance (c)    performance (asm)
hadamard_ac_8x8 117 21
hadamard_ac_8x16 236 42
hadamard_ac_16x8 235 31
hadamard_ac_16x16 473 60
intra_sad_x3_4x4 50 21
intra_sad_x3_8x8 183 34
intra_sad_x3_8x8c 181 36
intra_sad_x3_16x16 643 68
intra_satd_x3_4x4 83 61
intra_satd_x3_8x8c 344 81
intra_satd_x3_16x16 1389 136
sa8d_8x8 97 19
sa8d_16x16 394 68
satd_4x4 24 8
satd_4x8 51 11
satd_4x16 103 24
satd_8x4 52 9
satd_8x8 108 12
satd_8x16 218 24
satd_16x8 218 19
satd_16x16 437 38
ssd_4x4 10 5
ssd_4x8 24 8
ssd_4x16 42 15
ssd_8x4 23 5
ssd_8x8 37 9
ssd_8x16 74 17
ssd_16x8 72 11
ssd_16x16 140 23
var2_8x8 91 37
var2_8x16 176 66
var_8x8 50 15
var_8x16 65 29
var_16x16 132 56
Signed-off-by: Hecai Yuan <yuanhecai at loongson.cn>
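
The fps figures quoted across the commits in this push can be combined into per-step and overall speedups. The sketch below only does that arithmetic on the numbers stated in the commit messages. The sketch_* names are hypothetical and nothing here was measured independently.

```c
#include <stddef.h>

/* Encoder throughput in frames per second as quoted in the commit
 * messages: the 4.76 baseline first, then the value after each of the
 * deblock, sad, predict, quant, mc, dct, and pixel commits. */
static const double sketch_fps[] = {
    4.76, 4.92, 6.32, 6.34, 6.78, 10.53, 11.27, 20.50
};
static const size_t sketch_fps_count =
    sizeof(sketch_fps) / sizeof(sketch_fps[0]);

/* Ratio of throughput after a change to throughput before it. */
static double sketch_speedup(double before, double after)
{
    return after / before;
}

/* Speedup contributed by step i (1-based index into the list above). */
static double sketch_step_speedup(size_t i)
{
    return sketch_speedup(sketch_fps[i - 1], sketch_fps[i]);
}

/* Speedup from the baseline to the final commit in the series. */
static double sketch_total_speedup(void)
{
    return sketch_speedup(sketch_fps[0], sketch_fps[sketch_fps_count - 1]);
}
```

With these inputs the overall ratio is 20.50 / 4.76, roughly 4.3x, and the mc and pixel commits account for the two largest individual steps.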
- - - - -
8 changed files:
- Makefile
- common/cpu.c
- common/dct.c
- common/deblock.c
- + common/loongarch/dct-a.S
- + common/loongarch/dct.h
- + common/loongarch/deblock-a.S
- + common/loongarch/deblock.h
The diff was not included because it is too large.
View it on GitLab: https://code.videolan.org/videolan/x264/-/compare/5a9dfddea49aae58fd18750d130301c947f7d217...5f84d403fcaf15b717a5d08d07e4411f0dcb0013