13 6 / 2013

Scrapy Logo

Hey guys, I recently published a guest post on David Walsh’s Blog.

It’s a getting-started tutorial for Scrapy. In it, I build a simple web spider using Scrapy that crawls the iTunes charts and extracts the list of top free apps.

The complete post is available here

The code for the scraper in the post is available here

24 5 / 2013


Recently, due to a hardware failure on AWS, we lost access to our EC2 instance, and Amazon couldn’t do anything to help us out. Since we already keep everything backed up, it wasn’t much of a big deal to spawn a new instance and get back online. Still, it meant Markitty had its first downtime since the launch.

But it taught me a lesson: we couldn’t really trust AWS infrastructure with our customers’ data.

We already use Dropbox at Markitty, so I decided to trust it with our DB backups. We use PostgreSQL as the main database on our Django stack.

So I hacked together a Python script that takes a regular backup of our main database and uploads it to one of our Dropbox folders.

You can fork it here.

It includes 3 files:

db_backup.sh is the shell script that uses pg_dump to create a compressed backup of the database.

uploader.py is the Python script that uploads the backup to the Dropbox folder.

client_secrets.json stores the credentials: app_key, app_secret, access_key and access_secret.

You need to provide the DB_Username and DB_Name in db_backup.sh.
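To give a rough idea of what the backup step boils down to, here is a minimal Python sketch that only builds a pg_dump command line and a timestamped file name. It is not the actual db_backup.sh (which is a shell script); the function name `build_pg_dump_command`, the `backup_dir` parameter and the file naming scheme are assumptions made up for this illustration.

```python
import os
from datetime import datetime


def build_pg_dump_command(db_user, db_name, backup_dir=".", now=None):
    """Return (command, output_path) for a compressed pg_dump backup.

    Hypothetical helper: the names and the file naming scheme are
    assumptions, not taken from db_backup.sh.
    """
    now = now or datetime.now()
    # Timestamped file name so successive backups do not overwrite each other.
    filename = "{}_{}.dump".format(db_name, now.strftime("%Y%m%d_%H%M%S"))
    output_path = os.path.join(backup_dir, filename)
    command = [
        "pg_dump",
        "-U", db_user,       # database user
        "-Fc",               # custom archive format, compressed by default
        "-f", output_path,   # write the dump to this file
        db_name,
    ]
    return command, output_path


if __name__ == "__main__":
    cmd, path = build_pg_dump_command("DB_Username", "DB_Name")
    # To actually run the backup: subprocess.check_call(cmd)
    print(" ".join(cmd))
```

The command is returned as a list so it can be handed straight to `subprocess.check_call` without going through a shell.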

Follow these steps to set up the Dropbox app:

1. You will need to create a Dropbox app to get the app_key and app_secret. You can create it here (select Core as the app type and Full Dropbox as the permission type).

2. Once the app is created, Dropbox will give you the app_key and app_secret. Enter them in client_secrets.json (please do not share your app_key and app_secret publicly).

3. Then run uploader.py. It will generate an authentication link, which you need to open in your web browser. Press the Allow button, then hit Enter in the shell.


4. It will then print the access_key and the access_secret, which you also need to enter in client_secrets.json.

And you are done with the Dropbox setup. 
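If you want to check the credentials file before the first real backup, something like the sketch below could help. It assumes client_secrets.json is a flat JSON object holding the four keys named above; the post does not show the file's exact layout, so treat that layout and the helper name `load_secrets` as assumptions.

```python
import json

# The four credentials named in the post.
REQUIRED_KEYS = ("app_key", "app_secret", "access_key", "access_secret")


def load_secrets(path="client_secrets.json"):
    """Load the credentials file and fail early if a value is missing.

    Assumes a flat JSON object such as {"app_key": "...", ...};
    the real file layout may differ.
    """
    with open(path) as handle:
        secrets = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if not secrets.get(key)]
    if missing:
        raise ValueError("client_secrets.json is missing: " + ", ".join(missing))
    return secrets
```

Failing early here means a half-finished setup (for example, steps 3 and 4 skipped) shows up as a clear error instead of a failed upload later.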

After that, you can set up a cron job that runs db_backup.sh every day and puts your database backup in your Dropbox folder.

The script’s all yours under a Creative Commons license :)

You can download it here.

10 5 / 2013

Featuring in Indian Express


25 2 / 2013


This was my first real hackathon experience. Real because it was my first full-fledged hackathon; before this I had participated in a few programming events at colleges where we had to create an app in a couple of hours. It was organized by Google Developers Group. As usual, it was a 24-hour hackathon.

So here’s what I learned -

1. Prepare in advance

When you are planning to attend a hackathon, you might want to shortlist a few ideas days before the event. Brainstorming them with a friend is even better. It also helps if you can define the scope of the idea that you want to work on. Try to answer a few questions: How big will the project be? Can you build it as a solo developer? How familiar are you with the technologies that you will be using? Are the organizers putting any constraints on which technologies you can or cannot use? (Come on, you seriously don’t need to know anything apart from the language itself! You can just figure out everything else as you go.)

I began thinking of ideas as soon as the hackathon dates were put up. My brother (also a hacker, and my partner at the hackathon) and I used to brainstorm, discussing whether each idea was realistic and whether we had enough skills to execute it. Would the thing we build be of any use to people? Are we solving a real problem? We shortlisted a few ideas that answered these questions positively.

2. Deciding on the idea

One question that you should ask yourself while deciding on the idea is: is it realistic? This is a major decision, because you’ll have very limited time and limited resources. The idea has to be well planned before the execution really begins. Even though we had thought of a few ideas in advance, it was quite confusing to settle on one when the hackathon started.

At that point, the most important thing to consider is whether or not you can execute it well in the allotted time. Once you make the decision, you have to break the execution down into stages and allocate enough time to each stage. This really helps you analyze whether what you’re trying to achieve is doable, and helps you keep a check on time later.

In the beginning, it looked quite challenging for us. The idea that we had decided on had a much bigger scope, and with only 24 hours in our hands it seemed too tough, but we weren’t going to back down.

3. Have awesome people in your team

Well, the idea that you’ll be working on often requires skills. And the better the skills you have, the more efficiently you can execute your idea. This depends on how well you can convince someone else (a developer, probably) to work on your idea. The more convincing you are, the better the talent you’ll attract.

We were pretty lucky on this one: we had Jay Kanakiya, one of the best front-end developers I’ve met yet, along with Narendra Rajput, a hardcore Ruby developer, and myself (a Python lover). So we weren’t really short of skills, at least. I am a Python/Django developer, but it wouldn’t be cool to praise myself :-)

4. Get stuff done


Hackathons are all about getting stuff done. At the end of the day, what really matters is how well you executed your idea. And many times it’s not easy, because you get stuck on something (everyone does), and you might not succeed even after spending precious time figuring it out (it just happens :-( ).

Well, this is the prime time, where you are really tested on how well you make decisions. You have to come up with a plan B (no one really has a plan B), an alternative. It’s all about how you actually #hack the problem (come up with a crazy solution that never existed before).

We did face enough problems, but the way we hacked our way through them was the fun part.


And did I tell you we won the second prize? 

We are told we will be featured on developers.google.com, so hold your breath!

The app that we built at the hackathon is available in beta here.

We are working hard to launch it to the public soon, so share your feedback (good and bad).

Do let me know if you are going for a hackathon and need a hand.

Special thanks to Nilesh Bhojani for helping me with the blog post.

You can view the hackathon album here

30 11 / 2012

Recently, I was browsing for some Facebook Timeline covers for my Facebook profile, and came across hundreds of covers that I would love to have on my hard disk (actually, I really don’t know why I wanted them on my hard disk). Then I came across a few websites that allowed directory browsing, so I started saving the images manually. But those directories had thousands of images, and downloading them manually would suck (being a #hacker, you always want everything to be automated).

So I started hacking on a script that would carry out this task for me. In just 15 minutes, I cracked it, and had fun downloading entire webserver directories.

You can use this Python script to download entire directories (if the webserver allows directory browsing).

You need to have Python installed on your system to use it.

This script also makes use of Beautiful Soup. You can install it using one of the following commands:

pip install beautifulsoup4  # if you have pip installed
easy_install BeautifulSoup4 # if you have easy_install

To use the script, pass the directory URL as an argument. For example, to download the directory at http://www.namecovers.com/_asset/_thumb/:

python downloader.py http://www.namecovers.com/_asset/_thumb/

A screenshot:

Screenshot

You can also fork it on Github.

The code:

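# Requires Python 3 and beautifulsoup4
import os
import sys
from urllib.parse import unquote, urljoin, urlparse
from urllib.request import urlopen

from bs4 import BeautifulSoup


def downloader(urls, grab_url, foldername):
    """Download every item in urls (relative to grab_url) into foldername."""
    if not os.path.exists(foldername):
        print('"' + foldername + '" does not exist!')
        os.makedirs(foldername)
        print('Creating "' + foldername + '"...')
    for cover in urls:
        # Resolve the href against the directory URL
        file_url = urljoin(grab_url, cover)
        try:
            print("Downloading item " + cover + "...")
            print(file_url)
            img = urlopen(file_url)
            # Save under the bare file name so the file always lands in foldername
            filename = os.path.basename(unquote(urlparse(file_url).path))
            with open(os.path.join(foldername, filename), "wb") as output:
                output.write(img.read())
            print(cover + "... downloaded!!")
        except Exception as e:
            # Report the failure and carry on with the next item
            print("Could not download " + cover + ": " + str(e))


def main(url):
    urls = []
    print("Fetching the page...")
    page = urlopen(url).read()
    print("Fetching completed!")
    soup = BeautifulSoup(page, "html.parser")
    print("Grabbing the objects of the page...")
    for item in soup.find_all("li"):
        # Skip list items that do not contain a link
        if item.a is not None and item.a.get("href"):
            urls.append(item.a["href"])
    # Files are saved in a folder named after the host
    domain = urlparse(url)
    downloader(urls, url, domain.netloc)
    print("Finished processing all files!")
    print("\tHack by Virendra Rajput, \n\tFollow me on Twitter @bkvirendra\n\tI Blog at http://virendra.me/")


if __name__ == '__main__':
    if len(sys.argv) != 2:
        print("Usage: python downloader.py <directory-url>")
        sys.exit(1)
    main(sys.argv[1])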
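
If you want to play with the link-gathering step without hitting a real server, here is a small sketch that works on an HTML string. It is not part of the script above: it swaps Beautiful Soup for the standard library's html.parser so it runs with no extra installs, and the helper name `file_urls` and its filtering rules (skip sort links, parent directory links and subdirectories) are assumptions about what a typical directory listing looks like.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ListingLinkParser(HTMLParser):
    """Collect the href of every <a> tag in a directory listing page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def file_urls(listing_html, base_url):
    """Return absolute URLs for the plain files linked from a listing page.

    Hypothetical helper: the filtering rules below are guesses about
    common listing layouts, not something the original script does.
    """
    parser = ListingLinkParser()
    parser.feed(listing_html)
    urls = []
    for href in parser.hrefs:
        # Skip column-sort links, absolute/parent links and subdirectories.
        if href.startswith(("?", "/", "..")) or href.endswith("/"):
            continue
        urls.append(urljoin(base_url, href))
    return urls


if __name__ == "__main__":
    sample = '<ul><li><a href="a.jpg">a.jpg</a></li><li><a href="sub/">sub/</a></li></ul>'
    for url in file_urls(sample, "http://example.com/covers/"):
        print(url)
```

Using urljoin rather than plain string concatenation keeps things working when the directory URL is passed without a trailing slash in the hrefs or when an href is already absolute.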