[vlc-commits] [Git][videolan/vlc][master] 8 commits: youtube.lua: add extra "out of use" comment

Hugo Beauzée-Luyssen (@chouquette) gitlab at videolan.org
Thu Oct 21 02:54:06 UTC 2021



Hugo Beauzée-Luyssen pushed to branch master at VideoLAN / VLC


Commits:
b2c32b5e by Pierre Ynard at 2021-10-20T19:15:41+02:00
youtube.lua: add extra "out of use" comment

- - - - -
c7b4efcf by Pierre Ynard at 2021-10-20T19:15:43+02:00
youtube.lua: remove fallback to retired alternate video info API

After tightening access restrictions to it, the get_video_info YouTube
API was completely retired around July 2021, with an HTTP 410 Gone code.
All this fallback achieves now is poor UX.

- - - - -
a1786912 by Pierre Ynard at 2021-10-20T19:15:44+02:00
youtube.lua: fix up signature descrambling function name extraction

Javascript variable names can contain other, special characters; also,
%a depends on the locale.
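
For illustration only: a standalone Lua check of the old and new
name-extraction patterns against a made-up line of player javascript, in
which the descrambler is named "$a", a valid Javascript identifier that
%a can never match. The sample line is hypothetical; only the two
patterns are taken from the diff.

```lua
-- Hypothetical player line: the descrambler function is named "$a"
local js_line = 'if(h.s){var l=h.sp,m=$a(decodeURIComponent(h.s));f.set(l,encodeURIComponent(m))}'

-- Pattern before commit a1786912: up to three letters (%a, locale-dependent)
local old_pattern = "[=%(,&|](%a?%a?%a?)%(decodeURIComponent%(.%.s%)%)"
-- Pattern after: two or three arbitrary characters
local new_pattern = "[=%(,&|](...?)%(decodeURIComponent%(.%.s%)%)"

local old_name = string.match( js_line, old_pattern )
local new_name = string.match( js_line, new_pattern )

print( "old: " .. tostring( old_name ) )  -- old: nil
print( "new: " .. tostring( new_name ) )  -- new: $a
```

The looser "...?" capture relies on the surrounding literal context
("(decodeURIComponent(" and the "=(,&|" prefix) to avoid false matches.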

- - - - -
8473b3bf by Pierre Ynard at 2021-10-20T19:15:46+02:00
youtube.lua: rework error handling for signature descrambling

- - - - -
095f0930 by Pierre Ynard at 2021-10-20T19:15:48+02:00
youtube.lua: rename signature descrambling function

Use a more specific name, as this is no longer the only parameter that
we'll be descrambling by parsing and emulating javascript.

- - - - -
4cfa8b65 by Pierre Ynard at 2021-10-20T19:15:50+02:00
youtube.lua: factor out descrambling javascript fetching

We'll be descrambling the "n" parameter in addition to the URL signature
using this same javascript web asset, so we want to be able to share and
reuse it.
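
A rough sketch of what sharing one fetched asset can look like, assuming
a line-by-line cache. The stand-in stream, the js_extract_cached() helper
and the sample lines are hypothetical (the actual js_extract() body is
mostly outside this diff); the object is loosely modeled on the
{ stream = ..., lines = {}, i = 0 } table in the diff, and the two name
patterns come from the diff. A second lookup rescans cached lines before
reading further, so the asset is read only once.

```lua
-- Hypothetical stand-in for vlc.stream(): serves lines from a table
-- and counts how many were actually read.
local function fake_stream( lines )
    local pos, count = 0, 0
    return {
        readline = function( self )
            pos = pos + 1
            if lines[pos] then
                count = count + 1
            end
            return lines[pos]
        end,
        count = function( self )
            return count
        end,
    }
end

-- Search the lines already read first, then read (and cache) more.
local function js_extract_cached( js, pattern )
    for _, line in ipairs( js.lines ) do
        local ex = string.match( line, pattern )
        if ex then
            return ex
        end
    end
    while true do
        local line = js.stream:readline( )
        if not line then
            return nil
        end
        table.insert( js.lines, line )
        local ex = string.match( line, pattern )
        if ex then
            return ex
        end
    end
end

-- One shared asset object, used by both descramblers
local js = { stream = fake_stream( {
    'var x=1;',
    'a.D&&(b=a.get("n"))&&(b=lha(b),a.set("n",b))}};',
    'if(h.s){var l=h.sp,m=wja(decodeURIComponent(h.s));f.set(l,encodeURIComponent(m))}',
} ), lines = {} }

local n_name = js_extract_cached( js, '[=%(,&|](...?)%(.%),.%.set%("n",' )
local sig_name = js_extract_cached( js, "[=%(,&|](...?)%(decodeURIComponent%(.%.s%)%)" )
print( n_name, sig_name, js.stream:count( ) )
```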

- - - - -
f3963e68 by Pierre Ynard at 2021-10-20T19:15:51+02:00
youtube.lua: retry fetching descrambling javascript asset once

This should help against transient errors, and parsing of the javascript
URL isn't the part that's most likely to break.
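
The retry logic amounts to something like the following minimal sketch,
with the opener passed in as a function so it runs outside VLC (in the
script it is vlc.stream). The open_with_retry() helper, the flaky opener
and the URL are hypothetical.

```lua
-- Try to open url, retrying once by default. open is expected to
-- return nil on failure, which is what the diff's "if not js.stream"
-- checks assume.
local function open_with_retry( open, url, attempts )
    attempts = attempts or 2  -- first try plus one retry
    for _ = 1, attempts do
        local stream = open( url )
        if stream then
            return stream
        end
    end
    return nil
end

-- Hypothetical opener that fails on its first call only
local calls = 0
local function flaky_open( url )
    calls = calls + 1
    if calls == 1 then
        return nil
    end
    return { url = url }
end

local s = open_with_retry( flaky_open, "https://example.com/player/base.js" )
```

The diff itself unrolls this into two nested ifs and, on final failure,
sets js to nil, so that both descramblers return nil and the caller
falls back.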

- - - - -
03e69578 by Pierre Ynard at 2021-10-20T19:15:53+02:00
youtube.lua: descramble "n" video URL parameter by parsing javascript

User agents are apparently now expected to do this; failing to do so
results in the video file data transfer getting throttled down to rates
such as 80 kB/s, 60 kB/s or 40 kB/s, below playback rate. This usually
makes the video hang upon loading or every few seconds, and impossible
to play. This behavior seems to have first appeared in June, but to have
been fully rolled out only last week.

Just like with URL signatures, we interoperate with YouTube by
fulfilling what's apparently expected from us, using the same approach
as so far: we parse the descrambling rules from the javascript code, and
apply them.

Fixes #26174
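
To make the approach concrete, here is a toy Lua version of the
emulation step: a data array c that mixes input values, the "n" array
and transformation functions, and a script of calls like
c[2](c[0],c[1]) replayed with the same gmatch pattern the diff uses. The
data, script and the two transformations are made up for the example
and are far simpler than the real player code.

```lua
-- The "n" parameter, split into characters as the diff does
local n = {}
for ch in string.gmatch( "abcdef", "." ) do
    table.insert( n, ch )
end

-- Swap element 0 with element i, using 0-based javascript indices
local function swap( tab, i )
    i = i % #tab
    tab[1], tab[i + 1] = tab[i + 1], tab[1]
end

local function reverse( tab )
    local len = #tab
    for i = 1, math.floor( len / 2 ) do
        tab[i], tab[len - i + 1] = tab[len - i + 1], tab[i]
    end
end

-- javascript c[k] is data[k + 1] here
local data = { n, 3, swap, reverse }
local script = "c[2](c[0],c[1]),c[3](c[0])"

for ifunc, itab, iarg in string.gmatch( script, "c%[(%d+)%]%(c%[(%d+)%]([^)]-)%)" ) do
    iarg = string.match( iarg, ",c%[(%d+)%]" )
    local func = data[tonumber( ifunc ) + 1]
    local tab = data[tonumber( itab ) + 1]
    local arg = iarg and data[tonumber( iarg ) + 1]
    func( tab, arg )
end

print( table.concat( n ) )  -- feacbd
```

Note the + 1 when indexing: the javascript array is 0-based, while Lua
tables start at 1.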

- - - - -


1 changed file:

- share/lua/playlist/youtube.lua


Changes:

=====================================
share/lua/playlist/youtube.lua
=====================================
@@ -1,7 +1,7 @@
 --[[
  $Id$
 
- Copyright © 2007-2020 the VideoLAN team
+ Copyright © 2007-2021 the VideoLAN team
 
  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
@@ -108,31 +108,339 @@ function js_extract( js, pattern )
             return ex
         end
     end
-    vlc.msg.err( "Couldn't process youtube video URL, please check for updates to this script" )
     return nil
 end
 
+-- Descramble the "n" parameter using the javascript code that does that
+-- in the web page
+function n_descramble( nparam, js )
+    if not js then
+        return nil
+    end
+
+    -- Look for the descrambler function's name
+    -- a.D&&(b=a.get("n"))&&(b=lha(b),a.set("n",b))}};
+    local descrambler = js_extract( js, '[=%(,&|](...?)%(.%),.%.set%("n",' )
+    if not descrambler then
+        vlc.msg.dbg( "Couldn't extract YouTube video throttling parameter descrambling function name" )
+        return nil
+    end
+
+    -- Fetch the code of the descrambler function
+    -- lha=function(a){var b=a.split(""),c=[310282131,"KLf3",b,null,function(d,e){d.push(e)},-45817231, [data and transformations...] ,1248130556];c[3]=c;c[15]=c;c[18]=c;try{c[40](c[14],c[2]),c[25](c[48]),c[21](c[32],c[23]), [scripted calls...] ,c[25](c[33],c[3])}catch(d){return"enhanced_except_4ZMBnuz-_w8_"+a}return b.join("")};
+    local code = js_extract( js, "^"..descrambler.."=function%([^)]*%){(.-)};" )
+    if not code then
+        vlc.msg.dbg( "Couldn't extract YouTube video throttling parameter descrambling code" )
+        return nil
+    end
+
+    -- Split code into two main sections: 1/ data and transformations,
+    -- and 2/ a script of calls
+    local datac, script = string.match( code, "c=%[(.*)%];.-;try{(.*)}catch%(" )
+    if ( not datac ) or ( not script ) then
+        vlc.msg.dbg( "Couldn't extract YouTube video throttling parameter descrambling rules" )
+        return nil
+    end
+
+    -- Split "n" parameter into a table as descrambling operates on it
+    -- as one of several arrays
+    local n = {}
+    for c in string.gmatch( nparam, "." ) do
+        table.insert( n, c )
+    end
+
+    -- Helper
+    local table_len = function( tab )
+        local len = 0
+        for i, val in ipairs( tab ) do
+            len = len + 1
+        end
+        return len
+    end
+
+    -- Common routine shared by the compound transformations,
+    -- compounding the "n" parameter with an input string,
+    -- character by character using a Base64 alphabet.
+    -- d.forEach(function(l,m,n){this.push(n[m]=h[(h.indexOf(l)-h.indexOf(this[m])+m-32+f--)%h.length])},e.split(""))
+    local compound = function( ntab, str, alphabet, charcode )
+        if ntab ~= n or type( str ) ~= "string" then
+            return true
+        end
+        local input = {}
+        for c in string.gmatch( str, "." ) do
+            table.insert( input, c )
+        end
+
+        local len = string.len( alphabet )
+        for i, c in ipairs( ntab ) do
+            if type( c ) ~= "string" then
+                return true
+            end
+            local pos1 = string.find( alphabet, c, 1, true )
+            local pos2 = string.find( alphabet, input[i], 1, true )
+            if ( not pos1 ) or ( not pos2 ) then
+                return true
+            end
+            local pos = ( pos1 - pos2 + charcode - 32 ) % len
+            local newc = string.sub( alphabet, pos + 1, pos + 1 )
+            ntab[i] = newc
+            table.insert( input, newc )
+        end
+    end
+
+    -- The data section contains among others function code for a number
+    -- of transformations, most of which are basic array operations.
+    -- We can match these functions' code to identify them, and emulate
+    -- the corresponding transformations.
+    local trans = {
+        reverse = {
+            func = function( tab )
+                local len = table_len( tab )
+                local tmp = {}
+                for i, val in ipairs( tab ) do
+                    tmp[len - i + 1] = val
+                end
+                for i, val in ipairs( tmp ) do
+                    tab[i] = val
+                end
+            end,
+            match = {
+                -- function(d){d.reverse()}
+                -- function(d){for(var e=d.length;e;)d.push(d.splice(--e,1)[0])}
+                "^function%(d%)",
+            }
+        },
+        append = {
+            func = function( tab, val )
+                table.insert( tab, val )
+            end,
+            match = {
+                -- function(d,e){d.push(e)}
+                "^function%(d,e%){d%.push%(e%)},",
+            }
+        },
+        remove = {
+            func = function( tab, i )
+                if type( i ) ~= "number" then
+                    return true
+                end
+                i = i % table_len( tab )
+                table.remove( tab, i + 1 )
+            end,
+            match = {
+                -- function(d,e){e=(e%d.length+d.length)%d.length;d.splice(e,1)}
+                "^[^}]-;d%.splice%(e,1%)},",
+            }
+        },
+        swap = {
+            func = function( tab, i )
+                if type( i ) ~= "number" then
+                    return true
+                end
+                i = i % table_len( tab )
+                local tmp = tab[1]
+                tab[1] = tab[i + 1]
+                tab[i + 1] = tmp
+            end,
+            match = {
+                -- function(d,e){e=(e%d.length+d.length)%d.length;var f=d[0];d[0]=d[e];d[e]=f}
+                -- function(d,e){e=(e%d.length+d.length)%d.length;d.splice(0,1,d.splice(e,1,d[0])[0])}
+                "^[^}]-;var f=d%[0%];d%[0%]=d%[e%];d%[e%]=f},",
+                "^[^}]-;d%.splice%(0,1,d%.splice%(e,1,d%[0%]%)%[0%]%)},",
+            }
+        },
+        rotate = {
+            func = function( tab, shift )
+                if type( shift ) ~= "number" then
+                    return true
+                end
+                local len = table_len( tab )
+                shift = shift % len
+                local tmp = {}
+                for i, val in ipairs( tab ) do
+                    tmp[( i - 1 + shift ) % len + 1] = val
+                end
+                for i, val in ipairs( tmp ) do
+                    tab[i] = val
+                end
+            end,
+            match = {
+                -- function(d,e){for(e=(e%d.length+d.length)%d.length;e--;)d.unshift(d.pop())}
+                -- function(d,e){e=(e%d.length+d.length)%d.length;d.splice(-e).reverse().forEach(function(f){d.unshift(f)})}
+                "^[^}]-d%.unshift%(d.pop%(%)%)},",
+                "^[^}]-d%.unshift%(f%)}%)},",
+            }
+        },
+        -- Compound transformations first build a variation of a
+        -- Base64 alphabet, then in a common section, compound the
+        -- "n" parameter with an input string, character by character.
+        compound1 = {
+            func = function( ntab, str )
+                return compound( ntab, str, "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_", 96 )
+            end,
+            match = {
+                -- function(d,e){for(var f=64,h=[];++f-h.length-32;)switch(f){case 58:f=96;continue;case 91:f=44;break;case 65:f=47;continue;case 46:f=153;case 123:f-=58;default:h.push(String.fromCharCode(f))} [ compound... ] }
+                "^[^}]-case 58:f=96;",
+            }
+        },
+        compound2 = {
+            func = function( ntab, str )
+                return compound( ntab, str,"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_", 96 )
+            end,
+            match = {
+                -- function(d,e){for(var f=64,h=[];++f-h.length-32;){switch(f){case 58:f-=14;case 91:case 92:case 93:continue;case 123:f=47;case 94:case 95:case 96:continue;case 46:f=95}h.push(String.fromCharCode(f))} [ compound... ] }
+                -- function(d,e){for(var f=64,h=[];++f-h.length-32;)switch(f){case 46:f=95;default:h.push(String.fromCharCode(f));case 94:case 95:case 96:break;case 123:f-=76;case 92:case 93:continue;case 58:f=44;case 91:} [ compound... ] }
+                "^[^}]-case 58:f%-=14;",
+                "^[^}]-case 58:f=44;",
+            }
+        },
+        -- Fallback
+        unid = {
+            func = function( )
+                vlc.msg.dbg( "Couldn't apply unidentified YouTube video throttling parameter transformation, aborting descrambling" )
+                return true
+            end,
+            match = {
+            }
+        },
+    }
+
+    -- The data section actually mixes input data, reference to the
+    -- "n" parameter array, and self-reference to its own array, with
+    -- transformation functions used to modify itself. We parse it
+    -- as such into a table.
+    local data = {}
+    datac = datac..","
+    while datac ~= "" do
+        local el = nil
+        -- Transformation functions
+        if string.match( datac, "^function%(" ) then
+            for name, tr in pairs( trans ) do
+                for i, match in ipairs( tr.match ) do
+                    if string.match( datac, match ) then
+                        el = tr.func
+                        break
+                    end
+                end
+                if el then
+                    break
+                end
+            end
+            if not el then
+                el = trans.unid.func
+                vlc.msg.warn( "Couldn't parse unidentified YouTube video throttling parameter transformation" )
+            end
+
+            -- Compounding functions use a subfunction, so we need to be
+            -- more specific in how much parsed data we consume.
+            if el == trans.compound1.func or el == trans.compound2.func then
+                datac = string.match( datac, '^.-},e%.split%(""%)%)},(.*)$' )
+            else
+                datac = string.match( datac, "^.-},(.*)$" )
+            end
+
+        -- String input data
+        elseif string.match( datac, '^"[^"]*",' ) then
+            el, datac = string.match( datac, '^"([^"]*)",(.*)$' )
+        -- Integer input data
+        elseif string.match( datac, '^-?%d+,' ) then
+            el, datac = string.match( datac, "^(.-),(.*)$" )
+            el = tonumber( el )
+        -- Reference to "n" parameter array
+        elseif string.match( datac, '^b,' ) then
+            el = n
+            datac = string.match( datac, "^b,(.*)$" )
+        -- Replaced by self-reference to data array after its declaration
+        elseif string.match( datac, '^null,' ) then
+            el = data
+            datac = string.match( datac, "^null,(.*)$" )
+        else
+            vlc.msg.warn( "Couldn't parse unidentified YouTube video throttling parameter descrambling data" )
+            el = false -- Lua tables can't contain nil values
+            datac = string.match( datac, "^[^,]-,(.*)$" )
+        end
+
+        table.insert( data, el )
+    end
+
+    -- Debugging helper to print data array elements
+    local prd = function( el, tab )
+        if not el then
+            return "???"
+        elseif el == n then
+            return "n"
+        elseif el == data then
+            return "data"
+        elseif type( el ) == "string" then
+            return '"'..el..'"'
+        elseif type( el ) == "number" then
+            el = tostring( el )
+            if type( tab ) == "table" then
+                el = el.." -> "..( el % table_len( tab ) )
+            end
+            return el
+        else
+            for name, tr in pairs( trans ) do
+                if el == tr.func then
+                    return name
+                end
+            end
+            return tostring( el )
+        end
+    end
+
+    -- The script section contains a series of calls to elements of
+    -- the data section array onto other elements of it: calls to
+    -- transformations, with a reference to the data array itself or
+    -- the "n" parameter array as first argument, and often input data
+    -- as a second argument. We parse and emulate those calls to follow
+    -- the descrambling script.
+    -- c[40](c[14],c[2]),c[25](c[48]),c[21](c[32],c[23]), [...]
+    for ifunc, itab, iarg in string.gmatch( script, "c%[(%d+)%]%(c%[(%d+)%]([^)]-)%)" ) do
+        iarg = string.match( iarg, ",c%[(%d+)%]" )
+
+        local func = data[tonumber( ifunc ) + 1]
+        local tab = data[tonumber( itab ) + 1]
+        local arg = iarg and data[tonumber( iarg ) + 1]
+
+        -- Uncomment to debug transformation chain
+        --vlc.msg.dbg( '"n" parameter transformation: '..prd( func ).."("..prd( tab )..( arg ~= nil and ( ", "..prd( arg, tab ) ) or "" )..") "..ifunc.."("..itab..( iarg and ( ", "..iarg ) or "" )..")" )
+        --local nprev = table.concat( n )
+
+        if type( func ) ~= "function" or type( tab ) ~= "table"
+            or func( tab, arg ) then
+            vlc.msg.dbg( "Invalid data type encountered during YouTube video throttling parameter descrambling transformation chain, aborting" )
+            vlc.msg.dbg( "Couldn't descramble YouTube throttling URL parameter: data transfer will get throttled" )
+            vlc.msg.err( "Couldn't process youtube video URL, please check for updates to this script" )
+            break
+        end
+
+        -- Uncomment to debug transformation chain
+        --local nnew = table.concat( n )
+        --if nprev ~= nnew then
+        --    vlc.msg.dbg( '"n" parameter transformation: '..nprev.." -> "..nnew )
+        --end
+    end
+
+    return table.concat( n )
+end
+
 -- Descramble the URL signature using the javascript code that does that
 -- in the web page
-function js_descramble( sig, js_url )
-    -- Fetch javascript code
-    local js = { stream = vlc.stream( js_url ), lines = {}, i = 0 }
-    if not js.stream then
-        vlc.msg.err( "Couldn't process youtube video URL, please check for updates to this script" )
-        return sig
+function sig_descramble( sig, js )
+    if not js then
+        return nil
     end
 
     -- Look for the descrambler function's name
-    -- if(k.s){var l=k.sp,m=pt(decodeURIComponent(k.s));f.set(l,encodeURIComponent(m))}
-    -- Descrambler function name - 3 chars length
     -- if(h.s){var l=h.sp,m=wja(decodeURIComponent(h.s));f.set(l,encodeURIComponent(m))}
     -- k.s (from stream map field "s") holds the input scrambled signature
     -- k.sp (from stream map field "sp") holds a parameter name (normally
     -- "signature" or "sig") to set with the output, descrambled signature
-    local descrambler = js_extract( js, "[=%(,&|](%a?%a?%a?)%(decodeURIComponent%(.%.s%)%)" )
+    local descrambler = js_extract( js, "[=%(,&|](...?)%(decodeURIComponent%(.%.s%)%)" )
     if not descrambler then
         vlc.msg.dbg( "Couldn't extract youtube video URL signature descrambling function name" )
-        return sig
+        return nil
     end
 
     -- Fetch the code of the descrambler function
@@ -140,15 +448,14 @@ function js_descramble( sig, js_url )
     local rules = js_extract( js, "^"..descrambler.."=function%([^)]*%){(.-)};" )
     if not rules then
         vlc.msg.dbg( "Couldn't extract youtube video URL signature descrambling rules" )
-        return sig
+        return nil
     end
 
     -- Get the name of the helper object providing transformation definitions
     local helper = string.match( rules, ";(..)%...%(" )
     if not helper then
         vlc.msg.dbg( "Couldn't extract youtube video URL signature transformation helper name" )
-        vlc.msg.err( "Couldn't process youtube video URL, please check for updates to this script" )
-        return sig
+        return nil
     end
 
     -- Fetch the helper object code
@@ -156,7 +463,7 @@ function js_descramble( sig, js_url )
     local transformations = js_extract( js, "[ ,]"..helper.."={(.-)};" )
     if not transformations then
         vlc.msg.dbg( "Couldn't extract youtube video URL signature transformation code" )
-        return sig
+        return nil
     end
 
     -- Parse the helper object to map available transformations
@@ -208,7 +515,7 @@ function js_descramble( sig, js_url )
 end
 
 -- Parse and assemble video stream URL
-function stream_url( params, js_url )
+function stream_url( params, js )
     local url = string.match( params, "url=([^&]+)" )
     if not url then
         return nil
@@ -220,10 +527,11 @@ function stream_url( params, js_url )
     if s then
         s = vlc.strings.decode_uri( s )
         vlc.msg.dbg( "Found "..string.len( s ).."-character scrambled signature for youtube video URL, attempting to descramble... " )
-        if js_url then
-            s = js_descramble( s, js_url )
-        else
+        local ds = sig_descramble( s, js )
+        if not ds then
+            vlc.msg.dbg( "Couldn't descramble YouTube video URL signature" )
             vlc.msg.err( "Couldn't process youtube video URL, please check for updates to this script" )
+            ds = s
         end
 
         local sp = string.match( params, "sp=([^&]+)" )
@@ -231,13 +539,13 @@ function stream_url( params, js_url )
             vlc.msg.warn( "Couldn't extract signature parameters for youtube video URL, guessing" )
             sp = "signature"
         end
-        url = url.."&"..sp.."="..vlc.strings.encode_uri_component( s )
+        url = url.."&"..sp.."="..vlc.strings.encode_uri_component( ds )
     end
 
     return url
 end
 
--- Parse and pick our video stream URL (classic parameters)
+-- Parse and pick our video stream URL (classic parameters, out of use)
 function pick_url( url_map, fmt, js_url )
     for stream in string.gmatch( url_map, "[^,]+" ) do
         local itag = string.match( stream, "itag=(%d+)" )
@@ -288,19 +596,54 @@ function pick_stream( stream_map, js_url )
         return nil
     end
 
+    -- Fetch javascript code: we'll need this to descramble maybe the
+    -- URL signature, and normally always the "n" throttling parameter.
+    local js = nil
+    if js_url then
+        js = { stream = vlc.stream( js_url ), lines = {}, i = 0 }
+        if not js.stream then
+            -- Retry once for transient errors
+            js.stream = vlc.stream( js_url )
+            if not js.stream then
+                js = nil
+            end
+        end
+    end
+
     -- Either the "url" or the "signatureCipher" parameter is present,
     -- depending on whether the URL signature is scrambled.
+    local url
     local cipher = string.match( pick, '"signatureCipher":"(.-)"' )
         or string.match( pick, '"[a-zA-Z]*[Cc]ipher":"(.-)"' )
     if cipher then
         -- Scrambled signature: some assembly required
-        local url = stream_url( cipher, js_url )
-        if url then
-            return url
+        url = stream_url( cipher, js )
+    end
+    if not url then
+        -- Unscrambled signature, already included in ready-to-use URL
+        url = string.match( pick, '"url":"(.-)"' )
+    end
+
+    if not url then
+        return nil
+    end
+
+    -- The "n" parameter is scrambled too, and needs to be descrambled
+    -- and replaced in place, otherwise the data transfer gets throttled
+    -- down to between 40 and 80 kB/s, below real-time playability level.
+    local n = string.match( url, "[?&]n=([^&]+)" )
+    if n then
+        n = vlc.strings.decode_uri( n )
+        local dn = n_descramble( n, js )
+        if dn then
+            url = string.gsub( url, "([?&])n=[^&]+", "%1n="..vlc.strings.encode_uri_component( dn ), 1 )
+        else
+            vlc.msg.dbg( "Couldn't descramble YouTube throttling URL parameter: data transfer will get throttled" )
+            vlc.msg.err( "Couldn't process youtube video URL, please check for updates to this script" )
         end
     end
-    -- Unscrambled signature, already included in ready-to-use URL
-    return string.match( pick, '"url":"(.-)"' )
+
+    return url
 end
 
 -- Probe function.
@@ -492,28 +835,6 @@ function parse()
             end
         end
 
-        if not path then
-            local video_id = get_url_param( vlc.path, "v" )
-            if video_id then
-                -- Passing no "el" parameter to /get_video_info seems to
-                -- let it default to "embedded", and both known values
-                -- of "embedded" and "detailpage" have historically been
-                -- wrong and failed for various restricted videos.
-                path = vlc.access.."://www.youtube.com/get_video_info?video_id="..video_id..copy_url_param( vlc.path, "fmt" )
-
-                -- The YouTube API output doesn't provide the URL to the
-                -- javascript code necessary to descramble URL signatures,
-                -- without which functionality can be seriously limited.
-                -- #18801 prevents us from using a subrequest to the API,
-                -- so we forward the URL this way.
-                if js_url then
-                    path = path.."&jsurl="..vlc.strings.encode_uri_component( js_url )
-                end
-
-                vlc.msg.warn( "Couldn't extract video URL, falling back to alternate youtube API" )
-            end
-        end
-
         if not path then
             vlc.msg.err( "Couldn't extract youtube video URL, please check for updates to this script" )
             return { }
@@ -525,7 +846,12 @@ function parse()
 
         return { { path = path; name = title; description = description; artist = artist; arturl = arturl } }
 
-    elseif string.match( vlc.path, "/get_video_info%?" ) then -- video info API
+    elseif string.match( vlc.path, "/get_video_info%?" ) then
+        -- video info API, retired since summer 2021
+        -- Replacement Innertube API requires HTTP POST requests
+        -- and so remains for now unworkable from lua parser scripts
+        -- (see #26185)
+
         local line = vlc.read( 1024*1024 ) -- data is on one line only
         if not line then
             vlc.msg.err( "YouTube API output missing" )

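The compound() helper drops the m and f-- terms of the javascript it
emulates. A plausible reason, sketched here under that assumption:
inside the forEach, f is post-decremented once per character while m is
incremented once, so m + f stays at its initial value; and the
alphabet-building loop for(var f=64,h=[];++f-h.length-32;) exits with
f = h.length + 32 = 96 for a 64-character alphabet, which matches the 96
passed by compound1 and compound2. The Lua below compares a literal
transliteration against the simplified form; the sample "n" value is
hypothetical, and the key "KLf3" is taken from the data excerpt quoted
in the diff.

```lua
local ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

local function to_chars( s )
    local t = {}
    for c in string.gmatch( s, "." ) do
        table.insert( t, c )
    end
    return t
end

-- Literal transliteration of the javascript forEach body, keeping m and f--.
-- indexOf() is 0-based and string.find() 1-based, but only the difference
-- of two positions is used, so the offset cancels out. With f starting at
-- 96 the dividend is always positive, so javascript's truncating % and
-- Lua's floor % agree.
local function compound_literal( nstr, key, h, f )
    local input, out = to_chars( key ), {}
    for i, l in ipairs( to_chars( nstr ) ) do
        local m = i - 1  -- forEach index is 0-based
        local a = string.find( h, l, 1, true )
        local b = string.find( h, input[m + 1], 1, true )
        local pos = ( a - b + m - 32 + f ) % #h
        f = f - 1  -- post-decrement: the old value of f was used above
        local newc = string.sub( h, pos + 1, pos + 1 )
        table.insert( out, newc )    -- n[m] = ...
        table.insert( input, newc )  -- this.push( ... )
    end
    return table.concat( out )
end

-- Simplified form, as in the diff's compound() helper
local function compound_simple( nstr, key, h, charcode )
    local input, out = to_chars( key ), {}
    for i, c in ipairs( to_chars( nstr ) ) do
        local pos1 = string.find( h, c, 1, true )
        local pos2 = string.find( h, input[i], 1, true )
        local pos = ( pos1 - pos2 + charcode - 32 ) % #h
        local newc = string.sub( h, pos + 1, pos + 1 )
        out[i] = newc
        table.insert( input, newc )
    end
    return table.concat( out )
end

local n, key = "Xy3_-Qa9", "KLf3"
print( compound_literal( n, key, ALPHABET, 96 ) == compound_simple( n, key, ALPHABET, 96 ) )  -- true
```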

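The swap, remove and rotate emulations normalize their index with a
plain i % len, where the javascript does e=(e%d.length+d.length)%d.length.
Since javascript's % truncates toward zero but Lua's % floors for a
positive divisor, both end up with the same value in [0, len). This
sketch checks that, with math.fmod standing in for javascript's %, and
also checks the diff's table-based rotate against a literal
d.unshift(d.pop()) loop. Sample values are partly taken from the data
excerpt in the diff, partly made up.

```lua
-- javascript's % truncates toward zero, like math.fmod
local function js_normalize( e, len )
    -- e = (e % d.length + d.length) % d.length
    return math.fmod( math.fmod( e, len ) + len, len )
end

local function lua_normalize( e, len )
    -- what the diff does: i = i % table_len( tab )
    return e % len
end

for _, e in ipairs( { -45817231, -7, -1, 0, 5, 310282131 } ) do
    assert( js_normalize( e, 12 ) == lua_normalize( e, 12 ) )
end

-- Literal emulation of for(e=...;e--;)d.unshift(d.pop())
local function rotate_literal( tab, e )
    for _ = 1, e % #tab do
        table.insert( tab, 1, table.remove( tab ) )
    end
end

-- Table-based version, as in the diff's trans.rotate
local function rotate_diff( tab, shift )
    local len = #tab
    shift = shift % len
    local tmp = {}
    for i, val in ipairs( tab ) do
        tmp[( i - 1 + shift ) % len + 1] = val
    end
    for i, val in ipairs( tmp ) do
        tab[i] = val
    end
end

local a = { "a", "b", "c", "d", "e" }
local b = { "a", "b", "c", "d", "e" }
rotate_literal( a, 2 )
rotate_diff( b, 2 )
print( table.concat( a ) .. " " .. table.concat( b ) )  -- deabc deabc
```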

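For reference, the scalar part of the data-section parser run on a
short hypothetical sample modeled on the excerpt quoted in the diff
(function elements left out). Every null becomes the data table itself,
anticipating the c[3]=c;c[15]=c;... assignments that follow the
declaration in the javascript, and b becomes the "n" array, which the
javascript builds with a.split("").

```lua
-- Hypothetical sample modeled on the data excerpt quoted in the diff
local datac = '310282131,"KLf3",b,null,-45817231'
local n = { "x" }  -- stands in for the "n" parameter array
local data = {}

datac = datac .. ","
while datac ~= "" do
    local el
    if string.match( datac, '^"[^"]*",' ) then
        -- String input data
        el, datac = string.match( datac, '^"([^"]*)",(.*)$' )
    elseif string.match( datac, '^-?%d+,' ) then
        -- Integer input data
        el, datac = string.match( datac, "^(.-),(.*)$" )
        el = tonumber( el )
    elseif string.match( datac, '^b,' ) then
        -- Reference to the "n" parameter array
        el = n
        datac = string.match( datac, "^b,(.*)$" )
    elseif string.match( datac, '^null,' ) then
        -- Self-reference to the data array
        el = data
        datac = string.match( datac, "^null,(.*)$" )
    else
        el = false  -- Lua tables can't hold nil values
        datac = string.match( datac, "^[^,]-,(.*)$" )
    end
    table.insert( data, el )
end
```
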
View it on GitLab: https://code.videolan.org/videolan/vlc/-/compare/3379c7bdba42984d56d311fcdc9810308b3a08b7...03e6957832952e118ea173955485c32783438abe




