A Tutorial on Wget


Wget is one of the most powerful command-line tools for downloading things from the internet. Its basic use is fetching a single file, but you can do a lot more with it.

To download a file, just type:

[bash] wget http://your-url-to/file

But if the download breaks, running the same command again starts over from scratch. Use the -c option to continue a partially downloaded file:

[bash] wget -c http://your-link-to/file
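
On a flaky connection it can help to combine -c with wget's retry options. Here is a minimal sketch, assuming a hypothetical helper name (fetch_resumable) and arbitrary values of 10 tries and 10 seconds; tune both for your own connection.

```shell
# fetch_resumable is a hypothetical helper name, not part of wget.
#   -c              continue a partially downloaded file instead of restarting
#   --tries=10      give up after 10 attempts (value chosen for illustration)
#   --waitretry=10  wait up to 10 seconds between retries of a failed download
fetch_resumable() {
    wget -c --tries=10 --waitretry=10 "$1"
}

# Usage: fetch_resumable http://your-link-to/file
```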

You can also make wget identify itself as a web browser using -U, which sets the User-Agent string.
This helps when a site doesn't allow download managers.

[bash] wget -c -U Mozilla http://your-link-to/file

Download Entire Website

You can download an entire website using the -r (recursive) option.

[bash] wget -r http://your-site.com

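But be careful: wget follows links recursively (up to 5 levels deep by default), so this can pull down a very large amount of data. Since this can put a heavy load on servers, wget obeys robots.txt during recursive downloads. You can mirror a site on your local drive using the -m option.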

[bash] wget -m http://your-site.com
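
A mirror made with -m alone keeps the original links, so pages may not open cleanly from disk. If the goal is offline browsing, a common combination looks like the sketch below. The helper name mirror_offline is hypothetical; the flags are standard wget options.

```shell
# mirror_offline is a hypothetical helper name, not part of wget.
#   -m  mirror: recursion with time-stamping and unlimited depth
#   -k  convert links in the saved pages so they point to the local copies
#   -p  also fetch page requisites such as images and stylesheets
mirror_offline() {
    wget -m -k -p "$1"
}

# Usage: mirror_offline http://your-site.com
```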

You can limit how many levels deep wget digs into the site using the -l option.

[bash] wget -r -l3 http://your-site.com

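This will download only up to 3 levels deep. Suppose you want to download only the subfolders below a particular URL: use the --no-parent option. With this option wget downloads only the subfolders and ignores the parent folders. Keep the trailing slash on the URL, otherwise wget treats the last part as a file name and the restriction applies one level higher.

[bash] wget -r --no-parent http://your-site.com/subfldr/subfolder/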
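
The depth limit and the parent restriction work well together. A small sketch, assuming a hypothetical helper called fetch_subtree that takes the URL and an optional depth (defaulting to 3 here purely for illustration):

```shell
# fetch_subtree is a hypothetical helper name, not part of wget.
# $1 = folder URL (keep the trailing slash), $2 = optional depth, default 3
fetch_subtree() {
    url=$1
    depth=${2:-3}
    #   -r           recurse into links
    #   -l DEPTH     stop after DEPTH levels
    #   --no-parent  never climb above the starting folder
    wget -r -l "$depth" --no-parent "$url"
}

# Usage: fetch_subtree http://your-site.com/subfldr/subfolder/ 2
```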

Now for the terrible ideas. To hell with webmasters who don't allow their website to be downloaded: to ignore robots.txt, type

[bash] wget -r -U Mozilla -e robots=off http://url-to-site/

P.S. Masking wget as a browser is a crime in some countries, or something like that; at least that is what I have heard on the net.

Fooling the Webmasters

Do you think the webmaster cannot stop you when you use the above command? He can: a client hammering the server with back-to-back requests is easy to spot. To fool him, use

[bash] wget -r -U Mozilla -e robots=off -w 5 --limit-rate=20k http://url-to-site/

Here -w 5 instructs wget to wait 5 seconds before downloading another file, and --limit-rate=20k makes wget cap the download speed at 20 KB/s (without the k suffix the number is read as bytes per second). So you can fool the webmaster...
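
The same throttling options are also handy when you just want to go easy on a server. Below is a rough sketch of a gentler variant that leaves robots.txt handling on. The helper name polite_fetch and the depth of 3 are assumptions made for the example; --random-wait is a standard wget option that varies the pause around the value given to --wait.

```shell
# polite_fetch is a hypothetical helper name, not part of wget.
#   --wait=5          pause about 5 seconds between files
#   --random-wait     vary that pause so requests are not evenly spaced
#   --limit-rate=20k  cap the download speed at 20 KB/s
polite_fetch() {
    wget -r -l 3 --no-parent --wait=5 --random-wait --limit-rate=20k "$1"
}

# Usage: polite_fetch http://url-to-site/
```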

Download all PDFs

You can download all files of a particular format, like all the PDFs linked from a webpage:

[bash] wget -r -l1 -A.pdf --no-parent http://url-to-webpage-with-pdfs/

This is most useful for students. When they find a professor's webpage with the course files, they can use this command to download all the PDFs or lecture notes in one go.
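
If the page mixes several file types, the accept list can take more than one suffix. Here is a sketch that grabs a few document formats and drops them all into one folder. The helper name fetch_docs, the suffix list, and the default folder name lecture-notes are made up for the example; change them to suit.

```shell
# fetch_docs is a hypothetical helper name, not part of wget.
# $1 = page URL, $2 = optional output folder (default: lecture-notes)
fetch_docs() {
    url=$1
    outdir=${2:-lecture-notes}
    #   -A pdf,ppt,doc  accept only files with these suffixes
    #   -nd             do not recreate the site's folder tree locally
    #   -P DIR          save everything under DIR
    wget -r -l 1 --no-parent -nd -A pdf,ppt,doc -P "$outdir" "$url"
}

# Usage: fetch_docs http://url-to-webpage-with-pdfs/ notes
```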

