What Is an API?

In simple terms, an API is a contract between two pieces of software: if the calling software provides input in a predefined format, the other software will perform its function and return the result to the caller. Think of it like this: a graphical user interface (GUI) or command line interface (CLI) allows humans to interact with code, whereas an application programming interface (API) allows one piece of code to interact with another.
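To make the idea of a contract concrete, here is a minimal sketch in plain Python. The function name convert_temperature and the request and reply fields are hypothetical, chosen only for illustration. The provider promises a result as long as the caller sends input in the agreed format, and the caller never needs to know how the result is computed.

```python
import json


def convert_temperature(request_json):
    """Provider side: accept a request in the agreed format, return a JSON reply.

    Agreed (hypothetical) input format:  {"celsius": <number>}
    Agreed (hypothetical) output format: {"fahrenheit": <number>} or {"error": <message>}
    """
    try:
        request = json.loads(request_json)
        celsius = float(request["celsius"])
    except (ValueError, KeyError, TypeError):
        # The caller broke the contract, so report that instead of crashing.
        return json.dumps({"error": "expected a JSON object with a numeric 'celsius' field"})
    return json.dumps({"fahrenheit": celsius * 9 / 5 + 32})


# Consumer side: it only needs to know the format, not the implementation.
reply = json.loads(convert_temperature(json.dumps({"celsius": 100})))
print(reply)

# A malformed request gets an error reply rather than an exception.
bad_reply = json.loads(convert_temperature("not json"))
print(bad_reply)
```

A real web API works the same way in spirit, except that the request and the reply travel over HTTP instead of a direct function call.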

Why Should You Care?

As a Data Scientist, you face two eternal questions: how to get more data, and how to give more context to the data you already have. Where your own data runs short, there are plenty of free or paid data sources available to fill the gap. More often than not, these are available as a web service through an API. APIs therefore give us, as Data Scientists, the power to find new context for the information we already have.
 

Calling APIs From Python

In Python, you have two main options for calling an API. Many APIs (especially the paid ones) maintain a dedicated SDK for calling them from Python. If this is not the case, you can assemble the HTTP requests yourself. A popular choice in this case is the requests library. We will go quickly through both options:

1. SDKs

The first thing you should do, if you are interested in using a particular API, is to check whether a dedicated Python library exists for it. If it is a popular service, there is a good chance one does. Let's take Eventbrite as an example, the online service for real-life events of all kinds (concerts, meetups and such). Eventbrite has a dedicated Python library, which you can easily install with pip:

pip install eventbrite

The way to call particular features of an API varies from service to service, but the important details should be in the service's documentation. For Eventbrite, we call the API as follows:

>>> from eventbrite import Eventbrite
>>> eventbrite = Eventbrite('my-oauth-token')
>>> user = eventbrite.get_user()  # Not passing an argument returns yourself
>>> user['id']
1234567890
>>> user['name']
Daniel Roy Greenfeld

The format of the response, like that of the request itself, is specified in the documentation.

2. Requests Library

If the API of your choice does not have a dedicated Python library, you can use requests to create the HTTP requests yourself. The library can be installed with pip:

pip install requests

The library supports all the major HTTP methods: GET, POST, PUT, DELETE and others. It also provides an interface for supplying your query parameters, headers and data. You can find out more on how to work with HTTP requests in our online course.
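As a rough sketch of what this looks like in practice, the snippet below assembles a GET request with query parameters and an authorization header. The endpoint https://api.example.com/v1/events, the parameter names and the token are all hypothetical placeholders; a real service will define its own in its documentation. The request is prepared first so that you can inspect it without touching the network, and fetch_json shows how it could then be sent.

```python
import requests

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.com/v1/events"


def build_request(query, token):
    """Assemble a GET request with query parameters and an auth header."""
    request = requests.Request(
        method="GET",
        url=BASE_URL,
        params={"q": query, "page": 1},               # hypothetical parameter names
        headers={"Authorization": f"Bearer {token}"},  # a common, but not universal, auth scheme
    )
    return request.prepare()


def fetch_json(prepared, timeout=10):
    """Send a prepared request and return the decoded JSON body."""
    with requests.Session() as session:
        response = session.send(prepared, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.json()


# Inspect the request before sending it; nothing goes over the network here.
prepared = build_request("data science", "my-token")
print(prepared.method, prepared.url)
```

For a quick one-off call, requests.get(url, params=..., headers=...) does the same thing in a single step. Preparing the request separately is just a convenient way to see exactly what will be sent.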

 

5 APIs For Every Data Scientist

 

1. Google Maps API

For geographical data, this is one of the richest sources available. The Google Maps API provides programmatic access to the features of Google Maps as we know it. It comes with its own Python library and enables you to do things like geocoding, reverse geocoding and searching for places.
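To give a feel for geocoding, here is a minimal sketch that skips the Python library and works with the Geocoding web service directly, using only the standard library. It assumes the publicly documented endpoint and response layout (a status field plus a results list with geometry.location); check the current documentation before relying on either. The helper names, the API key placeholder and the sample response values are made up for illustration.

```python
from urllib.parse import urlencode

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(address, api_key):
    """Build the request URL for geocoding a street address."""
    return GEOCODE_URL + "?" + urlencode({"address": address, "key": api_key})


def first_location(payload):
    """Return (lat, lng) of the first result, or None if nothing matched."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Hand-written sample in the shape of a geocoding response (values are illustrative).
sample = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 37.42, "lng": -122.08}}}],
}

print(geocode_url("1600 Amphitheatre Parkway, Mountain View, CA", "YOUR_API_KEY"))
print(first_location(sample))
```

In a real script you would fetch the URL, decode the JSON body, and pass the resulting dictionary to first_location in place of the sample.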

 
 

2. AYLIEN

AYLIEN lets you run all your text analytics through an API. It offers text extraction, entity recognition, text classification and sentiment analysis, and new features are being added all the time. It also comes with its own Python library.

 
 

3. DarkSky API

One of the most affordable APIs for weather. You would be surprised how many use cases there are for knowing the weather in a particular place at a particular time. DarkSky does not maintain an official Python SDK, but there is an unofficial library you can use.

 
 

4. Twitter API

Twitter is a great source of data. From gauging public opinion on a particular issue to disaster tracking, it is one of the fastest ways to find out what people are saying. Twitter has an open API with access to loads of tweets, and you can go wild with it. There are a lot of libraries for accessing Twitter from Python, but Tweepy is by far the most popular option.

 
 

5. Fullcontact

Fullcontact is probably the best data source out there for gathering more information about your contacts. You can search by email, email domain, social network and so on to find a great amount of additional information, including social profiles, company information and even interests and communities. Fullcontact maintains an official Python library for their service.