Recently, while browsing for Facebook Timeline covers for my profile, I came across hundreds of covers that I would love to have on my hard disk as part of my Timeline Cover collection (yeah, I have a Timeline Cover collection).
Then I found a few websites that allowed directory browsing, so I started saving the images manually from the index. Those directories had thousands of images, and downloading them one by one would suck (being a #hacker, you always want everything to be automated).
So I started hacking a script to carry out this task for me. In just 15 minutes, I cracked it, and had fun downloading entire webserver directories in minutes.
You can use this Python script to download entire directories (if the webserver has directory listing enabled). The script makes use of BeautifulSoup, which you can install with either of the following commands:
pip install beautifulsoup4    # if you have pip
easy_install BeautifulSoup4   # if you have easy_install
To use the script, pass the directory URL as a command-line argument. For example, to download the directory at http://www.namecovers.com/asset/thumb/:

$ python downloader.py http://www.namecovers.com/asset/thumb/
# Python 2 script (uses urllib2 and urlparse)
import urllib2
import sys
import os
from bs4 import BeautifulSoup
from urlparse import urlparse


def downloader(urls, grab_url, foldername):
    # Create the destination folder if it isn't there yet
    if not os.path.exists(foldername):
        print "\"" + foldername + "\" does not exist!"
        os.makedirs(foldername)
        print "Creating \"" + foldername + "\"..."
    for cover in urls:
        try:
            print "Downloading item " + cover + "..."
            print grab_url + cover
            img = urllib2.urlopen(grab_url + cover)
            output = open(foldername + "/" + cover, 'wb')
            output.write(img.read())
            output.close()
            print cover + "... downloaded!!"
        except Exception:
            # Skip anything that fails to download and move on
            pass


def main(url):
    urls = []
    print "Fetching the page..."
    page = urllib2.urlopen(url).read()
    print "Fetching completed!"
    soup = BeautifulSoup(page)
    print "Grabbing the objects of the page..."
    # Directory indexes list each file as a link inside an <li> item
    for item in soup.find_all("li"):
        urls.append(item.a["href"])
    domain = urlparse(url)
    # Use the site's domain name as the download folder
    downloader(urls, url, domain.netloc)
    print "All files have been successfully downloaded!"


if __name__ == "__main__":
    main(sys.argv[1])
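The heart of the script is the link-extraction step: a directory index page lists each file as an `<a href>` inside an `<li>` item, and the script just collects those hrefs. If you want to see that step in isolation (or avoid the BeautifulSoup dependency), here is a minimal sketch using only the standard library's `html.parser`; note it is written for Python 3, and the sample HTML is made up for illustration:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags that sit inside <li> items."""

    def __init__(self):
        super().__init__()
        self.in_li = False
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_li = True
        elif tag == "a" and self.in_li:
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_li = False


# A made-up fragment of a directory index page
sample = ('<ul>'
          '<li><a href="cover1.jpg">cover1.jpg</a></li>'
          '<li><a href="cover2.jpg">cover2.jpg</a></li>'
          '</ul>')

parser = LinkCollector()
parser.feed(sample)
print(parser.hrefs)  # ['cover1.jpg', 'cover2.jpg']
```

BeautifulSoup's `find_all("li")` does the same job in two lines, which is why the script pulls it in as a dependency.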
You can also fork it on GitHub here.