[x264-devel] [Git][videolan/x264][master] 9 commits: loongarch: Init LSX/LASX support

Anton Mitrofanov (@BugMaster) gitlab at videolan.org
Thu Oct 12 20:17:42 UTC 2023



Anton Mitrofanov pushed to branch master at VideoLAN / x264


Commits:
1ecc51ee by Loongson Technology Corporation Limited at 2023-10-10T09:00:09+08:00
loongarch: Init LSX/LASX support

LSX and LASX are the LoongArch 128-bit and 256-bit SIMD extensions, respectively.

Signed-off-by: Shiyou Yin <yinshiyou-hf at loongson.cn>
Signed-off-by: Xiwei Gu <guxiwei-hf at loongson.cn>

- - - - -
25ffd616 by Loongson Technology Corporation Limited at 2023-10-10T09:00:47+08:00
loongarch: Add loongson_asm.S and loongson_utils.S

Common macros and functions for Loongson optimizations.

Signed-off-by: Shiyou Yin <yinshiyou-hf at loongson.cn>

- - - - -
d7d283f6 by Loongson Technology Corporation Limited at 2023-10-10T09:04:49+08:00
loongarch: Improve the performance of deblock series functions.

Performance has improved from 4.76fps to 4.92fps.
Tested with the following command:
./configure && make -j5
./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv

functions           performance     performance
                        (c)            (asm)
deblock_luma[0]         79               39
deblock_luma[1]         91               18
deblock_luma_intra[0]   63               44
deblock_luma_intra[1]   71               18
deblock_strength        104              33

Signed-off-by: Hao Chen <chenhao at loongson.cn>

- - - - -
00b8e3b9 by Loongson Technology Corporation Limited at 2023-10-10T09:09:52+08:00
loongarch: Improve the performance of sad/sad_x3/sad_x4 series functions

Performance has improved from 4.92fps to 6.32fps.
Tested with the following command:
./configure && make -j5
./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv

functions           performance     performance
                        (c)            (asm)
sad_4x4                 13               3
sad_4x8                 26               7
sad_4x16                57               13
sad_8x4                 24               3
sad_8x8                 54               8
sad_8x16                108              13
sad_16x8                95               8
sad_16x16               189              13
sad_x3_4x4              37               6
sad_x3_4x8              71               13
sad_x3_8x4              70               8
sad_x3_8x8              162              14
sad_x3_8x16             323              25
sad_x3_16x8             279              15
sad_x3_16x16            555              27
sad_x4_4x4              49               8
sad_x4_4x8              95               17
sad_x4_8x4              94               8
sad_x4_8x8              214              16
sad_x4_8x16             429              33
sad_x4_16x8             372              18
sad_x4_16x16            740              34

Signed-off-by: wanglu <wanglu at loongson.cn>

- - - - -
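
For readers unfamiliar with the functions benchmarked in the commit above: SAD is the sum of absolute differences between two pixel blocks, commonly used as a cheap block-matching cost. The following is a minimal scalar sketch in Python of what a sad_WxH function computes. It is not x264's implementation (which is C and assembly); the function name, the argument order, and the flat-buffer-plus-stride layout are assumptions chosen for illustration.

```python
def sad_block(pix1, stride1, pix2, stride2, width, height):
    """Sum of absolute differences over a width x height block.

    pix1 and pix2 are flat sequences of 8-bit samples; stride1 and stride2
    are the distances (in samples) between the starts of consecutive rows.
    All names here are hypothetical, not x264 identifiers.
    """
    total = 0
    for y in range(height):
        row1 = y * stride1
        row2 = y * stride2
        for x in range(width):
            total += abs(pix1[row1 + x] - pix2[row2 + x])
    return total


if __name__ == "__main__":
    # Two 4x4 blocks whose samples differ by exactly 1 everywhere.
    a = bytes(range(16))
    b = bytes(v + 1 for v in range(16))
    print(sad_block(a, 4, b, 4, 4, 4))
```

A SIMD version processes many samples of a row per instruction instead of one at a time, which is where gains of the kind listed in the table come from.
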
d8ed272a by Loongson Technology Corporation Limited at 2023-10-10T09:13:58+08:00
loongarch: Improve the performance of predict series functions

Performance has improved from 6.32fps to 6.34fps.
Tested with the following command:
./configure && make -j5
./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv

functions           performance     performance
                        (c)            (asm)
intra_predict_4x4_dc     3               2
intra_predict_4x4_dc8    1               1
intra_predict_4x4_dcl    2               1
intra_predict_4x4_dct    2               1
intra_predict_4x4_ddl    7               2
intra_predict_4x4_h      2               1
intra_predict_4x4_v      1               1
intra_predict_8x8_dc     8               2
intra_predict_8x8_dc8    1               1
intra_predict_8x8_dcl    5               2
intra_predict_8x8_dct    5               2
intra_predict_8x8_ddl    27              3
intra_predict_8x8_ddr    26              3
intra_predict_8x8_h      4               2
intra_predict_8x8_v      3               1
intra_predict_8x8_vl     29              3
intra_predict_8x8_vr     31              4
intra_predict_8x8c_dc    8               5
intra_predict_8x8c_dc8   1               1
intra_predict_8x8c_dcl   5               3
intra_predict_8x8c_dct   5               3
intra_predict_8x8c_h     4               2
intra_predict_8x8c_p     58              30
intra_predict_8x8c_v     4               1
intra_predict_16x16_dc   32              8
intra_predict_16x16_dc8  9               4
intra_predict_16x16_dcl  26              6
intra_predict_16x16_dct  26              6
intra_predict_16x16_h    23              7
intra_predict_16x16_p    182             44
intra_predict_16x16_v    22              4

Signed-off-by: Xiwei Gu <guxiwei-hf at loongson.cn>

- - - - -
65e7bac5 by Loongson Technology Corporation Limited at 2023-10-10T09:15:32+08:00
loongarch: Improve the performance of quant series functions

Performance has improved from 6.34fps to 6.78fps.
Tested with the following command:
./configure && make -j5
./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv

functions           performance     performance
                        (c)            (asm)
coeff_last15             3               2
coeff_last16             3               1
coeff_last64             42              6
decimate_score15         8               12
decimate_score16         8               11
decimate_score64         61              43
dequant_4x4_cqm          16              5
dequant_4x4_dc_cqm       13              5
dequant_4x4_dc_flat      13              5
dequant_4x4_flat         16              5
dequant_8x8_cqm          71              9
dequant_8x8_flat         71              9

Signed-off-by: Shiyou Yin <yinshiyou-hf at loongson.cn>

- - - - -
981c8f25 by Loongson Technology Corporation Limited at 2023-10-12T17:27:40+08:00
loongarch: Improve the performance of mc series functions

Performance has improved from 6.78fps to 10.53fps.
Tested with the following command:
./configure && make -j5
./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv

functions           performance     performance
                        (c)            (asm)
avg_4x2                  16              5
avg_4x4                  30              6
avg_4x8                  63              10
avg_4x16                 124             19
avg_8x4                  60              6
avg_8x8                  119             10
avg_8x16                 233             19
avg_16x8                 229             21
avg_16x16                451             41
get_ref_4x4              30              9
get_ref_4x8              52              11
get_ref_8x4              45              9
get_ref_8x8              80              11
get_ref_8x16             156             16
get_ref_12x10            137             13
get_ref_16x8             147             11
get_ref_16x16            282             16
get_ref_20x18            278             22
hpel_filter              5163            686
lowres_init              5440            286
mc_chroma_2x2            24              7
mc_chroma_2x4            42              10
mc_chroma_4x2            41              7
mc_chroma_4x4            75              10
mc_chroma_4x8            144             19
mc_chroma_8x4            137             15
mc_chroma_8x8            269             28
mc_luma_4x4              30              10
mc_luma_4x8              52              12
mc_luma_8x4              44              10
mc_luma_8x8              80              13
mc_luma_8x16             156             19
mc_luma_16x8             147             13
mc_luma_16x16            281             19
memcpy_aligned           14              9
memzero_aligned          24              4
offsetadd_w4             79              18
offsetadd_w8             142             18
offsetadd_w16            277             25
offsetadd_w20            1118            38
offsetsub_w4             75              18
offsetsub_w8             140             18
offsetsub_w16            265             25
offsetsub_w20            989             39
weight_w4                111             19
weight_w8                205             19
weight_w16               396             29
weight_w20               1143            45
deinterleave_chroma_fdec 76              9
deinterleave_chroma_fenc 86              9
plane_copy_deinterleave  733             90
plane_copy_interleave    791             245
store_interleave_chroma  82              12

Signed-off-by: Xiwei Gu <guxiwei-hf at loongson.cn>

- - - - -
fa7f1fce by Loongson Technology Corporation Limited at 2023-10-12T17:28:15+08:00
loongarch: Improve the performance of dct series functions

Performance has improved from 10.53fps to 11.27fps.
Tested with the following command:
./configure && make -j5
./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv

functions           performance     performance
                        (c)            (asm)
add4x4_idct              34              9
add8x8_idct              139             31
add8x8_idct8             269             39
add8x8_idct_dc           67              7
add16x16_idct            564             123
add16x16_idct_dc         260             22
dct4x4dc                 18              10
idct4x4dc                16              9
sub4x4_dct               25              7
sub8x8_dct               101             12
sub8x8_dct8              160             25
sub16x16_dct             403             52
sub16x16_dct8            646             68
zigzag_scan_4x4_frame    4               1

Signed-off-by: zhoupeng <zhoupeng at loongson.cn>

- - - - -
5f84d403 by Loongson Technology Corporation Limited at 2023-10-12T17:28:23+08:00
loongarch: Improve the performance of pixel series functions

Performance has improved from 11.27fps to 20.50fps.
Tested with the following command:
./configure && make -j5
./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv

functions           performance     performance
                        (c)            (asm)
hadamard_ac_8x8          117             21
hadamard_ac_8x16         236             42
hadamard_ac_16x8         235             31
hadamard_ac_16x16        473             60
intra_sad_x3_4x4         50              21
intra_sad_x3_8x8         183             34
intra_sad_x3_8x8c        181             36
intra_sad_x3_16x16       643             68
intra_satd_x3_4x4        83              61
intra_satd_x3_8x8c       344             81
intra_satd_x3_16x16      1389            136
sa8d_8x8                 97              19
sa8d_16x16               394             68
satd_4x4                 24              8
satd_4x8                 51              11
satd_4x16                103             24
satd_8x4                 52              9
satd_8x8                 108             12
satd_8x16                218             24
satd_16x8                218             19
satd_16x16               437             38
ssd_4x4                  10              5
ssd_4x8                  24              8
ssd_4x16                 42              15
ssd_8x4                  23              5
ssd_8x8                  37              9
ssd_8x16                 74              17
ssd_16x8                 72              11
ssd_16x16                140             23
var2_8x8                 91              37
var2_8x16                176             66
var_8x8                  50              15
var_8x16                 65              29
var_16x16                132             56

Signed-off-by: Hecai Yuan <yuanhecai at loongson.cn>

- - - - -
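
The throughput figures quoted in the commit messages above chain together: each commit starts from the fps value the previous one ended on. The per-commit and cumulative gains can therefore be worked out directly. Below is a minimal sketch in Python that uses only the fps values quoted above; the variable and function names are illustrative and not part of x264.

```python
# Encoder throughput (fps) quoted in the commit messages, in commit order.
# The first entry is the figure reported before the deblock commit.
FPS_STEPS = [
    ("baseline", 4.76),
    ("deblock", 4.92),
    ("sad", 6.32),
    ("predict", 6.34),
    ("quant", 6.78),
    ("mc", 10.53),
    ("dct", 11.27),
    ("pixel", 20.50),
]


def stepwise_gains(steps):
    """Return (name, ratio) pairs: each step's fps divided by the previous step's fps."""
    return [(name, fps / prev_fps)
            for (_, prev_fps), (name, fps) in zip(steps, steps[1:])]


def cumulative_gain(steps):
    """Return the last fps value divided by the first one."""
    return steps[-1][1] / steps[0][1]


if __name__ == "__main__":
    for name, ratio in stepwise_gains(FPS_STEPS):
        print(f"{name:8s} x{ratio:.2f}")
    print(f"overall  x{cumulative_gain(FPS_STEPS):.2f}")
```

On these numbers the whole series takes the test encode from 4.76fps to 20.50fps, roughly a 4.3x improvement, with the pixel and mc commits contributing the largest individual steps.
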


8 changed files:

- Makefile
- common/cpu.c
- common/dct.c
- common/deblock.c
- + common/loongarch/dct-a.S
- + common/loongarch/dct.h
- + common/loongarch/deblock-a.S
- + common/loongarch/deblock.h


The diff was not included because it is too large.


View it on GitLab: https://code.videolan.org/videolan/x264/-/compare/5a9dfddea49aae58fd18750d130301c947f7d217...5f84d403fcaf15b717a5d08d07e4411f0dcb0013
