How do I completely download a web page while preserving its functionality?

I just want to save the website and be able to use it offline. I noticed something interesting, however: when I'm in Safari and I go offline, the webpage still works fine. That surely means the page can run offline without a problem – I just need a way to save it properly. I suppose I could create a virtual machine, load the site in it, save it as a snapshot and use it offline whenever I want, but that seems like quite a disproportionate solution for such a seemingly simple problem.

On a side note: would it be possible to save a webpage like this (iPhone 6S page) with all of the scrolling animations, embedded pictures and videos and all the rest? I've only tried creating a Web Archive using Safari, but it only saved the nice scrolling animation – not the embedded pictures and such.

Skeleton Bow asked Apr 24, 2016 at 18:21

It's nearly impossible because of all the code that runs on any given page – code that pulls images and resources from hundreds of other locations, not just the site's own web server. I use Chrome and save the page as a single-file MHTML; it doesn't always get everything, but it seems to be the best option for me.

Commented Apr 24, 2016 at 18:35

You could try wget from a command prompt. It will download whole websites, but you can tune it to download only what you want.
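For example, a rough sketch of a typical invocation (the URL is just a placeholder for the page you want to save, and the options may need tuning per site):

    # Download one page plus the images, CSS and scripts it needs, and rewrite
    # the links so the saved copy works offline.
    wget --page-requisites --convert-links --adjust-extension --span-hosts \
         --no-parent "https://www.example.com/some-page/"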

Commented Apr 24, 2016 at 18:40

The problem that ultimately cannot be addressed is code running on the server. Most web application servers execute code on HTTP GET and POST requests, and that code is never transmitted to the client – only its output is. What you want to do is only possible if the site is written to execute entirely client-side (usually via JavaScript) and consumes no external data.

Commented Apr 24, 2016 at 18:42

@FrankThomas thanks for the insight. The only thing is, since the website runs perfectly when I disconnect my Internet, shouldn't it be entirely possible to save it and run it without an Internet connection? That's what I keep thinking.

Commented Apr 24, 2016 at 18:51

@sdjuan unfortunately wget doesn't seem to work. Does it work for you?

Commented Apr 24, 2016 at 18:57

4 Answers

It's not possible to do this with many websites these days. Even for sites where it seems possible, it still requires some JavaScript experience to reverse-engineer and "fix" the scripts that are saved to your computer. There is no single method that works for all websites; you have to work through each site's unique problems.

A lot of websites are no longer just static files that are sent from the server to your computer. They have become two-way interactive applications, where the web browser runs code that continuously interacts with the web server from the same page.

When you load a website in a browser, you are seeing the "front end" of the entire system that makes up the website. This front end (the HTML, images, CSS, and JavaScript) can even be dynamically generated by code on the server. That means there is code executing on the server side that is never sent to your web browser, and that code may be critical to supporting the code that is sent.

There is simply no way to "download" that server-side code, which is why many websites don't work properly when you save them.

The most common problem causing things to break is that websites use JavaScript to load content after the initial page response is sent to your browser. The HostMath site you are trying to save offline definitely relies on a back-end to retrieve JavaScript files that are critical to the site's functionality. In Firefox I get this error for several different JavaScript files when I try to open the site locally:

Loading failed for the <script> with source “file:///D:/Home/Downloads/hostmath/HostMath%20-%20Online%20LaTeX%20formula%20editor%20and%20browser-based%20math%20equation%20editor_files/extensions/asciimath2jax.js?rev=2.6.0”

See that ?rev=2.6.0 after the filename? That is a parameter passed to the back-end (the web server) to determine which asciimath2jax.js file should be sent to your web browser. My D: drive isn't a web server, so when Firefox tries to load a local file with a URL parameter attached, the request fails.

You could try downloading the file from HostMath manually and saving it in the right location without the ?rev=2.6.0, though. Then you would need to change the site's scripts and HTML to load the file from your drive without the URL parameter. This would have to be done for every script that failed to load.
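As a rough sketch of what that could look like (the remote URL and the local file name below are purely illustrative – take the real ones from the failed requests shown in the browser console):

    # Fetch the script once and save it locally without the query string
    curl -o extensions/asciimath2jax.js \
         "https://www.hostmath.com/path/to/extensions/asciimath2jax.js?rev=2.6.0"

    # Strip the ?rev=2.6.0 suffix from every reference in the saved HTML so the
    # browser loads the plain local file (GNU sed; on macOS use: sed -i '' ...)
    sed -i 's/asciimath2jax\.js?rev=2\.6\.0/asciimath2jax.js/g' saved-page.html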

You will hit a dead end, though, if any JavaScript makes requests to a web service (an API) on the host website. That is typically done to off-load computation the site doesn't perform locally in the web browser, which means the back-end is essential to running the front-end.
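For instance, if the page's JavaScript calls a server endpoint like the purely hypothetical one below, the result is computed on the server and only the answer is sent back – no amount of saving files locally can reproduce that offline:

    # Hypothetical API request – the server does the work and returns only the result
    curl "https://www.example.com/api/render?formula=x%5E2"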