Download YouTube 4K Videos with PHP
Among friends, let’s agree we’ll be privately caching[1] videos and not permanently saving them, or we’ll be using them for Fair Use, and we’ll certainly not upload or share these videos outside of the originating platform (e.g. YouTube.com).
Existing YouTube downloader scripts:
- YouTube-Downloader (does not work with videos using a cipher signature)
- YouTube video downloader script (doesn’t work with copyrighted videos)
- Youtube Search and Download PHP Script (does not work with videos using a cipher signature)
- youtube-dl (well-developed, but must be updated constantly)
These scripts all have limitations with cipher signatures, copyrighted videos, or need continual updating. We won’t have those problems with my approach. But first, TOS boilerplate.
YouTube TOS (Terms of Service)
I’ll actually be staying within the TOS of YouTube. Here is what it states:
“You may access Content for your information and personal use solely as intended through the provided functionality of the Service and as permitted under these Terms of Service.”
Check. We’ll be using a web browser (the “provided functionality”) to access the videos directly from the hosting web site, and we’ll be caching them.
“You shall not download any Content unless you see a ‘download’ or similar link displayed by YouTube on the Service for that Content.”
Check. We’re not going to download anything. YouTube is going to autoplay videos and save the data to our disk. YouTube is actually forcing data to our disk when you think about it. If we do download a video, then it will be for Fair Use.
“You shall not copy, reproduce, distribute, transmit, broadcast, display, sell, license, or otherwise exploit any Content …”
Check. We’ve agreed among friends not to do this.
Using PHP to download videos
There are desktop programs, CLI programs (e.g. youtube-dl) and phone apps as well as sites (e.g. savetube.com, keepvid.com) that let you save videos. My goal here is to demonstrate a simple way to download even the newest YouTube videos (and other HTML5 videos) via network inspection.
I wrote an iOS app that hooks into network requests in a UIWebView to get the direct video URL for offline caching. It’s been humming along for years without code modification. Here I’ll demonstrate the same network-request-hooking technique using PHP and a headless browser (a browser with no GUI) to achieve an automated video downloader.
PhantomJS video downloading capability
First, let’s see what PhantomJS can do with network inspection.
“Because PhantomJS permits the inspection of network traffic, it is suitable to build various analysis on the network behavior and performance. All the resource requests and responses can be sniffed using onResourceRequested and onResourceReceived.”
Perfect, almost. PhantomJS is able to do my network trick, and ultimately I’d only need it to visit an autoplaying YouTube video link, sit back, and monitor network traffic. However, there is still the matter of HTML5 video and Flash support.
Unsupported features: Support for plugins (such as Flash) was dropped a long time ago. … Video and Audio would require shipping a variety of different codecs.
PhantomJS supports neither Flash nor HTML5 video. If you think the way I do, you will try to inject some JavaScript[2] to make websites think you have Flash and/or HTML5 enabled to get at src properties.
This works with Flash because you can get the URL for the Flash object (i.e. the <embed> tag) and the encoded video URL, which together pull video data from the direct URL.
    page.onInitialized = function () {
        page.evaluate(function () {
            // Pretend the Flash plugin is installed so the page emits its <embed> tag.
            window.navigator = {
                plugins: { "Shockwave Flash": { description: "Shockwave Flash 11.2 e202" } },
                mimeTypes: { "application/x-shockwave-flash": { enabledPlugin: true } }
            };
        });
    };
As for HTML5 video mocking, it is possible to make web sites believe the <video> tag and any video format is supported. However, it isn’t possible to get the direct video URL from the encoded URL because the mocked video tag doesn’t actually play. The mock below fools feature detection, but it does not mock the <video> tag enough to get at the direct video URL.

    page.onInitialized = function () {
        page.evaluate(function () {
            // Wrap createElement so every new <video> element claims to play anything.
            var create = document.createElement;
            document.createElement = function (tag) {
                var elem = create.call(document, tag);
                if (tag === "video") {
                    elem.canPlayType = function () { return "probably"; };
                }
                return elem;
            };
        });
    };
We need a different solution than PhantomJS.
SlimerJS/Firefox can handle HTML5 video and Flash
We need an actual browser to make this work. Enter SlimerJS, which has a similar API to PhantomJS but uses an actual Firefox browser to render web pages, naturally supporting HTML5 video and Flash.
Let’s take a look at a sample request captured in a log script from the page https://www.youtube.com/watch?v=dQw4w9WgXcQ which is just a random video.
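A log script along these lines could produce such a capture. This is a minimal sketch, not the script actually used for this article: attachVideoLogger and the '/videoplayback?' filter are hypothetical choices made here, while onResourceRequested and onResourceReceived are the PhantomJS/SlimerJS callbacks quoted earlier. The page object is passed in so the function can be tried with a stub outside SlimerJS.

```javascript
// Attach network hooks to a PhantomJS/SlimerJS-style page object and
// report every request/response whose URL looks like a media chunk.
function attachVideoLogger(page, log) {
    function isMedia(url) {
        return url.indexOf('/videoplayback?') !== -1;
    }
    page.onResourceRequested = function (requestData) {
        if (isMedia(requestData.url)) {
            log('requested: ' + JSON.stringify(requestData, null, 4));
        }
    };
    page.onResourceReceived = function (response) {
        // Responses arrive in stages; 'end' is when the body is complete.
        if (response.stage === 'end' && isMedia(response.url)) {
            log('received: ' + JSON.stringify(response, null, 4));
        }
    };
    return page;
}

// Stub usage. In SlimerJS the page would come from require('webpage').create().
var lines = [];
var stubPage = attachVideoLogger({}, function (line) { lines.push(line); });
stubPage.onResourceRequested({ id: 17, method: 'GET', url: 'https://example.invalid/videoplayback?mime=video%2Fwebm' });
stubPage.onResourceRequested({ id: 18, method: 'GET', url: 'https://example.invalid/style.css' });
stubPage.onResourceReceived({ id: 17, stage: 'end', url: 'https://example.invalid/videoplayback?mime=video%2Fwebm' });
console.log(lines.length + ' media events logged');
```

Filtering on the URL keeps the log readable, since a watch page also triggers many requests for scripts, styles and images.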
Sample request
A script on the above page crafted or retrieved this URL with a signature, requesting IP, expiration, content length and a slew of other parameters. This configuration changes regularly, the URL is valid only for a short time, it can only be accessed from the same IP that is embedded in the URL, and there may be a cipher signature as well. That means copying it and manually entering it into a client-side browser will most likely fail with a 403 Forbidden error.
    requested: {
        "id": 17,
        "method": "GET",
        "url": "https://r12---sn-p5qlsn7s.googlevideo.com/videoplayback?mn=sn-p5qlsn7s&mm=31&keepalive=yes&key=yt6&ip=178.61.225.128&ms=au&ipbits=0&initcwndbps=4252500&gir=yes&mt=1477460796&mv=m&requiressl=yes&id=o-ABKdWC_0b46GzsqDSHLAdpdAtiXM4Vv-3slPlpjLn5EN&pl=21&sparams=clen%2Cdur%2Cei%2Cgcr%2Cgir%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Ckeepalive%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cnh%2Cpl%2Crequiressl%2Csource%2Cupn%2Cexpire&gcr=us&dur=212.000&ei=iEMQWJPjOqGe8gSo5oeQCw&itag=243&nh=IgpwcjAzLmlhZDA3KgkxMjcuMC4wLjE&mime=video%2Fwebm&expire=1477482473&lmt=1464141682338873&upn=Nhid0hSZ8fw&clen=10527660&source=youtube&cpn=jPWgi3uZpGc46q9n&alr=yes&ratebypass=yes&signature=070670F79B6E92B46A835F23C8EB6D94E4585475.BA5C58F7B1114EBF96B9347E0521D236CD6610B5&c=WEB&cver=1.20161025&range=0-127214&rn=1&rbuf=0",
        "time": "2016-10-26T05:47:56.271Z",
        "headers": [
            { "name": "Host", "value": "r12---sn-p5qlsn7s.googlevideo.com" },
            { "name": "User-Agent", "value": "Mozilla/5.0 (X11; Linux x86_64; rv:49.0) Gecko/20100101 SlimerJS/0.10.1" },
            { "name": "Accept", "value": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" },
            { "name": "Accept-Language", "value": "en-US,en;q=0.5" },
            { "name": "Accept-Encoding", "value": "gzip, deflate, br" },
            { "name": "Referer", "value": "https://www.youtube.com/watch?v=dQw4w9WgXcQ" },
            { "name": "Origin", "value": "https://www.youtube.com" }
        ]
    }
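To make the parameters mentioned above easier to read, here is a small sketch that pulls a few of them out of the query string. parseQuery and the abbreviated sampleUrl are illustrative and not part of the original log script. Reading expire as a Unix timestamp in seconds and clen as the total stream length in bytes are assumptions based on how the values look, not documented behavior.

```javascript
// Split a captured URL's query string into a plain object of decoded values.
function parseQuery(url) {
    var params = {};
    var query = url.split('?')[1] || '';
    query.split('&').forEach(function (pair) {
        if (!pair) { return; }
        var eq = pair.indexOf('=');
        var name = eq === -1 ? pair : pair.slice(0, eq);
        var value = eq === -1 ? '' : pair.slice(eq + 1);
        params[decodeURIComponent(name)] = decodeURIComponent(value);
    });
    return params;
}

// Abbreviated copy of the sample URL: only the parameters discussed here.
var sampleUrl = 'https://r12---sn-p5qlsn7s.googlevideo.com/videoplayback' +
    '?ip=178.61.225.128&mime=video%2Fwebm&itag=243&expire=1477482473' +
    '&clen=10527660&range=0-127214';

var p = parseQuery(sampleUrl);
var summary = {
    ip: p.ip,                                                  // IP the URL is bound to
    mime: p.mime,                                              // stream type
    expires: new Date(Number(p.expire) * 1000).toISOString(),  // assumption: seconds since epoch
    totalBytes: Number(p.clen),                                // assumption: whole-stream length
    range: p.range                                             // byte range of this chunk
};
console.log(summary);
```

Under that reading, the URL expires at 2016-10-26T11:47:53Z, about six hours after the request was made at 05:47:56Z.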
Trying to circumvent this protection by manually editing the URL, reverse-engineering and modifying the script that crafted it, or otherwise editing the calling page is against the TOS. It would also be time better spent on something else as the protection and scripts continually change. This is where the cat-and-mouse game played by “downloader” web sites begins. Fortunately, the hosting server will happily return the legitimate video data without any intervention on the user’s part.
Sample response
In the request URL is the parameter &range=0-127214, and in the sample response there is "bodySize": 127215 with a matching Content-Length of 127215 bytes (~124 KB), confirming that the video data is indeed downloading to the cache on the server (because we are using SlimerJS and a headless Firefox browser on the server).
    received: {
        "id": 17,
        "url": "https://r12---sn-p5qlsn7s.googlevideo.com/videoplayback?mn=sn-p5qlsn7s&mm=31&keepalive=yes&key=yt6&ip=178.61.225.128&ms=au&ipbits=0&initcwndbps=4252500&gir=yes&mt=1477460796&mv=m&requiressl=yes&id=o-ABKdWC_0b46GzsqDSHLAdpdAtiXM4Vv-3slPlpjLn5EN&pl=21&sparams=clen%2Cdur%2Cei%2Cgcr%2Cgir%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Ckeepalive%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cnh%2Cpl%2Crequiressl%2Csource%2Cupn%2Cexpire&gcr=us&dur=212.000&ei=iEMQWJPjOqGe8gSo5oeQCw&itag=243&nh=IgpwcjAzLmlhZDA3KgkxMjcuMC4wLjE&mime=video%2Fwebm&expire=1477482473&lmt=1464141682338873&upn=Nhid0hSZ8fw&clen=10527660&source=youtube&cpn=jPWgi3uZpGc46q9n&alr=yes&ratebypass=yes&signature=070670F79B6E92B46A835F23C8EB6D94E4585475.BA5C58F7B1114EBF96B9347E0521D236CD6610B5&c=WEB&cver=1.20161025&range=0-127214&rn=1&rbuf=0",
        "time": "2016-10-26T05:47:56.699Z",
        "headers": [
            { "name": "Last-Modified", "value": "Wed, 25 May 2016 02:01:22 GMT" },
            { "name": "Content-Type", "value": "video/webm" },
            { "name": "Date", "value": "Wed, 26 Oct 2016 05:47:56 GMT" },
            { "name": "Expires", "value": "Wed, 26 Oct 2016 05:47:56 GMT" },
            { "name": "Cache-Control", "value": "private, max-age=21297" },
            { "name": "Accept-Ranges", "value": "bytes" },
            { "name": "Content-Length", "value": "127215" },
            { "name": "Connection", "value": "keep-alive" },
            { "name": "Alt-Svc", "value": "quic=\":443\"; ma=2592000" },
            { "name": "access-control-allow-origin", "value": "https://www.youtube.com" },
            { "name": "Access-Control-Allow-Credentials", "value": "true" },
            { "name": "timing-allow-origin", "value": "https://www.youtube.com" },
            { "name": "Access-Control-Expose-Headers", "value": "Client-Protocol, Content-Length, Content-Type, X-Bandwidth-Est, X-Bandwidth-Est2, X-Bandwidth-Est-Comp, X-Bandwidth-Avg, X-Walltime-Ms" },
            { "name": "x-content-type-options", "value": "nosniff" },
            { "name": "Server", "value": "gvs 1.0" }
        ],
        "bodySize": 127215,
        "contentType": "video/webm",
        "contentCharset": "",
        "redirectURL": null,
        "stage": "end",
        "status": 200,
        "statusText": "OK",
        "referrer": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
        "isFileDownloading": false,
        "body": "",
        "httpVersion": { "major": 1, "minor": 1 }
    }
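The byte arithmetic can be checked with a few lines. This is only a sketch: rangeLength is a helper name introduced here, and treating the clen parameter as the total stream size is an assumption.

```javascript
// Number of bytes covered by an inclusive "first-last" range value.
function rangeLength(range) {
    var parts = range.split('-');
    return Number(parts[1]) - Number(parts[0]) + 1;
}

var chunkBytes = rangeLength('0-127214');   // range parameter from the request URL
var contentLength = 127215;                 // Content-Length from the response
var totalBytes = 10527660;                  // clen parameter (assumed: whole stream)

console.log(chunkBytes === contentLength);                    // the chunk size matches
console.log((chunkBytes / 1024).toFixed(1) + ' KiB');         // size of this chunk
console.log((100 * chunkBytes / totalBytes).toFixed(1) + '% of the stream');
```

The range is inclusive at both ends, which is why 0-127214 covers 127215 bytes rather than 127214.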
Intercepting requests
From here one could either retrieve and assemble the cache chunks in the Firefox profile cache folder on the server manually, or intercept the GET requests and curl the video and audio data to a predetermined cache location instead. Here is the matching cURL command for the above example request.
    curl -v -XGET \
        -H 'Origin: https://www.youtube.com' \
        -H 'Referer: https://www.youtube.com/watch?v=dQw4w9WgXcQ' \
        -H 'Accept-Encoding: gzip, deflate, br' \
        -H 'Accept-Language: en-US,en;q=0.5' \
        -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
        -H 'Host: r12---sn-p5qlsn7s.googlevideo.com' \
        -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:49.0) Gecko/20100101 SlimerJS/0.10.1' \
        'https://r12---sn-p5qlsn7s.googlevideo.com/videoplayback?mn=sn-p5qlsn7s&mm=31&keepalive=yes&key=yt6&ip=178.61.225.128&ms=au&ipbits=0&initcwndbps=4252500&gir=yes&mt=1477460796&mv=m&requiressl=yes&id=o-ABKdWC_0b46GzsqDSHLAdpdAtiXM4Vv-3slPlpjLn5EN&pl=21&sparams=clen%2Cdur%2Cei%2Cgcr%2Cgir%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Ckeepalive%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cnh%2Cpl%2Crequiressl%2Csource%2Cupn%2Cexpire&gcr=us&dur=212.000&ei=iEMQWJPjOqGe8gSo5oeQCw&itag=243&nh=IgpwcjAzLmlhZDA3KgkxMjcuMC4wLjE&mime=video%2Fwebm&expire=1477482473&lmt=1464141682338873&upn=Nhid0hSZ8fw&clen=10527660&source=youtube&cpn=jPWgi3uZpGc46q9n&alr=yes&ratebypass=yes&signature=070670F79B6E92B46A835F23C8EB6D94E4585475.BA5C58F7B1114EBF96B9347E0521D236CD6610B5&c=WEB&cver=1.20161025&range=0-127214&rn=1&rbuf=0'
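A command like this does not need to be typed by hand; it can be derived from the captured request object. The sketch below assumes the request has the shape shown in the sample request (a url plus a headers array of name/value pairs). buildCurlArgs and the output path are hypothetical, and the flags used are the standard curl options --silent, --show-error, --output and -H.

```javascript
// Turn a captured request object into an argument list for curl.
// Returning an array (rather than one string) avoids shell-quoting problems
// with the many '&' characters in the URL.
function buildCurlArgs(request, outputPath) {
    var args = ['--silent', '--show-error', '--output', outputPath];
    request.headers.forEach(function (header) {
        args.push('-H', header.name + ': ' + header.value);
    });
    args.push(request.url);
    return args;
}

// Stub request with the same shape as the captured one (values abbreviated).
var captured = {
    url: 'https://example.invalid/videoplayback?mime=video%2Fwebm&range=0-127214',
    headers: [
        { name: 'Referer', value: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ' },
        { name: 'Origin', value: 'https://www.youtube.com' }
    ]
};
var curlArgs = buildCurlArgs(captured, 'cache/video.webm');
console.log(curlArgs.join(' '));
```

The resulting list can then be handed to whatever process-spawning facility the controlling script provides.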
Selecting video quality
According to YouTube,
By default, the quality setting will be on “Auto” and YouTube will use the highest quality based on your video player size.
To help YouTube choose videos of up to 1080p, we can increase the viewport size in the SlimerJS script:
    var page = require('webpage').create();
    page.viewportSize = {width: 1920, height: 1080};
This renders a very large HTML5 player, which in turn results in a better video quality being selected (up to 1080p) for the best user experience possible. Again, this is without any hacking or reverse engineering. That is very nice of YouTube.
4K Ultra HD video quality
Now, to get even higher quality videos like 4K Ultra HD videos, we can take advantage of Window.localStorage. YouTube remembers the quality you “manually” select and stores that setting in local storage. Here is an example from a video where I selected 1440p.
The data in local storage in this example is:
    yt-player-bandwidth: {"data":"{\"delay\":0.286,\"tailDelay\":4.6732711722500194e-8,\"byterate\":1285019.6078431373}","expiration":1481096139603,"creation":1478504139603}
    yt-player-quality: {"data":"hd1440","expiration":1481096136093,"creation":1478504136093}
    yt-player-volume: {"data":"{\"volume\":100,\"muted\":true}","expiration":1481073831227,"creation":1478481831227}
    yt-remote-connected-devices: {"data":"[]","expiration":1478570921468,"creation":1478484521468}
    yt-remote-device-id: {"data":"cb79f623-7ae3-4e52-a662-7ab4e9bddf9c","expiration":1510017570833,"creation":1478481570834}
It looks like the yt-player-quality value is stored for 30 days (expiration minus creation is 2,592,000,000 ms) and then expires.
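The lifetime can be read straight off the stored entry. A quick check, using the yt-player-quality value from the example above (the variable names are just for illustration):

```javascript
// The yt-player-quality entry exactly as it appeared in local storage.
var entry = JSON.parse('{"data":"hd1440","expiration":1481096136093,"creation":1478504136093}');

// Both timestamps are milliseconds since the Unix epoch.
var lifetimeMs = entry.expiration - entry.creation;
var lifetimeDays = lifetimeMs / (1000 * 60 * 60 * 24);

console.log(entry.data + ' is kept for ' + lifetimeDays + ' days');
```

For this sample the difference is exactly 30 days.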
Using a SlimerJS script we can inject the desired video quality into the local storage of the Firefox browser and update the expiration with this minimal snippet below. Before the URL is loaded the local storage data will be set or replaced:
    page.onInitialized = function() {
        page.evaluate(function() {
            var key = 'yt-player-quality';
            var qualityJSON = localStorage.getItem(key) || '{}';
            var qualityObj = JSON.parse(qualityJSON);

            // Quality label, e.g. 'hd1440'; 'hd2160' is assumed to be the 4K label
            qualityObj.data = 'hd1440';

            // Creation/expiration timestamps (milliseconds), expiring 31 days from now
            var d = new Date();
            var creation = d.getTime();
            var expiration = creation + 1000 * 60 * 60 * 24 * 31;
            qualityObj.creation = creation;
            qualityObj.expiration = expiration;

            // Save
            localStorage.setItem(key, JSON.stringify(qualityObj));
        });
    };
[Screenshot: a 4K Ultra HD video that was downloaded using this method]
Downloading the complete file
The example request we’ve seen so far will actually only download a ~124 KB chunk of the video file. To get the whole video in one download requires a slight modification to the cURL request: remove &range=0-127214 from the request URL and the whole video file will be downloaded instead.
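Dropping the parameter can be automated with a small string helper. This is a sketch: removeQueryParam is a name made up here, and it deliberately works on the raw query string so that every other parameter stays byte-for-byte as captured.

```javascript
// Remove one query parameter from a URL without touching the others.
function removeQueryParam(url, name) {
    var qIndex = url.indexOf('?');
    if (qIndex === -1) { return url; }
    var base = url.slice(0, qIndex);
    var kept = url.slice(qIndex + 1).split('&').filter(function (pair) {
        return pair.split('=')[0] !== name;
    });
    return kept.length ? base + '?' + kept.join('&') : base;
}

// Abbreviated example URL (hypothetical host, parameters from the sample).
var chunkUrl = 'https://example.invalid/videoplayback?mime=video%2Fwebm&range=0-127214&rn=1';
var fullUrl = removeQueryParam(chunkUrl, 'range');
console.log(fullUrl);
```

Working on the raw string rather than re-encoding the query avoids accidentally changing the escaping of values such as sparams.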
Merging the audio and video streams
In the latest browsers YouTube uses MPEG-DASH (Dynamic Adaptive Streaming over HTTP), which means the data is downloaded in chunks (we eliminated that problem just above) and the video and audio streams are separate for most videos. Remember to obtain both the video and audio files. They can be recognized by the mime type parameter in the URL.
- Video – videoplayback?key=yt6&…&mime=video%2Fwebm&…
- Audio – videoplayback?key=yt6&…&mime=audio%2Fwebm&…
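Telling the two kinds of URL apart can be done by reading the mime parameter. A minimal sketch, assuming every intercepted media URL carries the parameter as in the two patterns above; streamKind is a hypothetical helper name.

```javascript
// Classify an intercepted URL as 'video', 'audio' or 'unknown'
// based on its mime query parameter.
function streamKind(url) {
    var match = /[?&]mime=([^&]*)/.exec(url);
    if (!match) { return 'unknown'; }
    var mime = decodeURIComponent(match[1]);   // e.g. "video/webm"
    var type = mime.split('/')[0];
    return (type === 'video' || type === 'audio') ? type : 'unknown';
}

var kinds = [
    streamKind('https://example.invalid/videoplayback?key=yt6&mime=video%2Fwebm&rn=1'),
    streamKind('https://example.invalid/videoplayback?key=yt6&mime=audio%2Fwebm&rn=1'),
    streamKind('https://example.invalid/videoplayback?key=yt6')
];
console.log(kinds.join(', '));
```

The pattern only matches a real mime parameter, not the word "mime" inside the percent-encoded sparams list, because that occurrence is not preceded by ? or &.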
One solution is to feed the separate audio and video MP4 or WebM files to avconv or ffmpeg (something along the lines of ffmpeg -i video.mp4 -i audio.m4a -c copy combined.mp4) to combine (mux) them into a single media file.
Discussion
I’ve demonstrated it is possible and straightforward to download a 4K YouTube video (or any available resolution video) with just SlimerJS and a bit of JavaScript. It can all be controlled by PHP; however, the controller is just a wrapper around CLI commands.
Next: Shortly I’ll put these functions together in a complete script with explanations.
Notes: