What is an API?
An API (Application Programming Interface) is a set of protocols, routines, and tools used for building software applications. It is essentially a set of rules and methods that data analyst or software developers can use to interact with and access the services and features provided by another application, service or platform.
In simpler terms, an API allows different software applications to communicate with each other and share data in a standardized way. With APIs, developers or analysts can get data without needing to scrape the website, manually download it, or directly go to the company from which they need it.
For example, if you want to integrate a weather forecast feature into your app, you can use a weather API that provides the necessary data, rather than building the entire feature from scratch. This allows you to focus on the unique aspects of your app, without worrying about the underlying functionality.
Type of request methods
As shown in Figure 2, the most common type of API request method is GET
.
General rules for using an API
To use an API to extract data, you will need to follow these steps:
Find an API that provides the data you are interested in. This may involve doing some research online to find available APIs.
Familiarize yourself with the API’s documentation to understand how to make requests and what data is available.
Use a programming language to write a script that sends a request to the API and receives the data. Depending on the API, you may need to include authentication parameters in the request to specify the data you want to receive.
Parse the data you receive from the API to extract the information you are interested in. The format of the data will depend on the API you are using, and may be in JSON, XML, or another format.
Process the extracted data in your script, or save it to a file or database for later analysis.
How to fetch API data using a programming language
with R
There are several packages available in R for consuming APIs. Some of the most commonly used packages are:
httr: This package provides convenient functions for making HTTP requests and processing the responses.
jsonlite: This package is used for parsing JSON data, which is a common format for API responses.
RCurl: This package is a wrapper around the libcurl library, which is a powerful and versatile HTTP client library.
To get data from an API in R, you need to follow these steps:
- Install the required packages by running the following command in the R console:
install.packages(c("httr", "jsonlite", "RCurl"))
- Load the packages by running the following command:
library(httr)
library(jsonlite)
library(RCurl)
- Make an API request by using the GET function from the httr package. The API endpoint should be passed as an argument to this function.
<- GET("https://api.example.com/endpoint") response
- Check the status code of the response to see if the request was successful. A status code of
200
indicates a successful request.
<- status_code(response) status_code
- Extract the data from the response. If the API returns data in JSON format, you can use the
fromJSON
function from thejsonlite
package to parse the data. Store the data in a variable for later use.
<- fromJSON(content(response, as = "text")) api_data
These are the basic steps to get data from an API in R. Depending on the API, you may need to pass additional parameters or authentication information in your request. For example,
<- GET(
response "https://api.example.com/endpoint",
authenticate(
user = "API_KEY_HERE",
password = "API_PASSWORD_HERE",
type = "basic"
) )
with Python
To use an API in Python, you can use a library such as requests
or urllib
to send HTTP requests to the API and receive responses. Here’s an example of how to use an API in Python using the requests library:
import requests
# Define the API endpoint URL and parameters
= 'https://api.example.com/data'
endpoint
= {'param1': 'value1', 'param2': 'value2'}
params
# Send a GET request to the API endpoint
= requests.get(endpoint, params = params)
response
# Check if the request was successful
if response.status_code == 200:
# Parse the response JSON data
= response.json()
data
# Process the data, for example by printing it to the console
print(data)
else:
print(f'Error: {response.status_code}')
In this example, we’re using the requests
library to send a GET request to an API endpoint at https://api.example.com/data
, passing two parameters (param1
and param2
) in the request. The requests.get()
method returns a Response
object, which we can use to check the response status code and parse the response data.
If the status code is 200
, we can assume the request was successful, and we can parse the response data using the response.json()
method, which converts the JSON-formatted response to a Python object. We can then process the data as needed, for example by printing it to the console.
Of course, the exact API endpoint and parameters will depend on the specific API that you are using, and you’ll need to consult the API documentation to learn how to construct your request correctly. But this example should give you a sense of the general process involved in using an API in Python.
Practical example in R and Python
We will use R and Python to fetch the API data without and with the key.
Without the key
In this example, we will use an API from a site called that uses an API without an API key. In this case, we will be using a GET
request to fetch the API data at the food banks using this link: https://www.givefood.org.uk/api/2/foodbanks. Please follow the steps in Section 4.1 for R and Section 4.2 for Python.
library(httr)
library(jsonlite)
library(dplyr)
<- GET("https://www.givefood.org.uk/api/2/foodbanks")
response
status_code(response)
#> [1] 200
<- fromJSON(content(response, as = "text"), flatten = TRUE)
food_dataframe
%>%
food_dataframe dim()
#> [1] 919 27
%>%
food_dataframe head()
import requests
import pandas as pd
= requests.get("https://www.givefood.org.uk/api/2/foodbanks")
response
# Check if the request was successful
print(response.status_code)
#> 200
# Parse the response JSON data
= response.json()
food_json
# Convert to a pandas dataframe
= pd.json_normalize(food_json)
food_dataframe
food_dataframe.shape
#> (919, 27)
food_dataframe.head()
#> name ... politics.urls.html
#> 0 Sherborne Food Bank ... https://www.givefood.org.uk/needs/in/constitue...
#> 1 Himmah Food Bank ... https://www.givefood.org.uk/needs/in/constitue...
#> 2 Felix Multibank ... https://www.givefood.org.uk/needs/in/constitue...
#> 3 St Thomas Food Bank ... https://www.givefood.org.uk/needs/in/constitue...
#> 4 North Lochs Food Bank ... https://www.givefood.org.uk/needs/in/constitue...
#>
#> [5 rows x 27 columns]
In this example, we use the pd.json_normalize()
method to flatten the list of dictionaries and create a dataframe from it. The resulting dataframe has columns for each key in the JSON objects.
With the key
In this example, we will use an API from that uses an API key. In this case, we will use a GET
request to fetch data for analyst jobs based in London from the Jobseeker API. Please follow the steps in Section 4.1 for R and Section 4.2 for Python, and sign up for the API key at the Jobseeker website.
library(httr)
library(jsonlite)
library(dplyr)
# Create a GET response to call the API
<- GET(
response "https://www.reed.co.uk/api/1.0/search?keywords=analyst&location=london&distancefromlocation=15",
authenticate(
user = Sys.getenv("putyourapikeyhere"),
password = ""
) )
Replace Sys.getenv("putyourapikeyhere")
with your own API key.
status_code(response)
#> [1] 200
# Convert the JSON string to a dataframe and view data in a table
<- fromJSON(content(response, as = "text"), flatten = TRUE)
job_dataframe
# The job dataframe is inside the results
$results %>%
job_dataframedim()
#> [1] 100 15
$results %>%
job_dataframehead()
import requests
import pandas as pd
# Set API endpoint and API key
= "https://www.reed.co.uk/api/1.0/search?keywords=analyst&location=london&distancefromlocation=15"
url
= "replace with your own API key" api_key
Based on the instructions in the API documentation, you will need to include your API key for all requests in a basic authentication http header as the username, leaving the password empty.
# Send a GET request to the API endpoint
= requests.get(url, auth = (api_key, '')) response
# Check if the request was successful
print(response.status_code)
#> 200
# Parse the response JSON data
= response.json()
job_json
# Convert to a pandas dataframe
# The dataframe is inside the results
= pd.json_normalize(job_json["results"])
job_dataframe
job_dataframe.shape
#> (100, 15)
job_dataframe.head()
#> jobId ... jobUrl
#> 0 53334560 ... https://www.reed.co.uk/jobs/analyst/53334560
#> 1 53119090 ... https://www.reed.co.uk/jobs/finance-analyst/53...
#> 2 53262233 ... https://www.reed.co.uk/jobs/dialler-analyst/53...
#> 3 52724870 ... https://www.reed.co.uk/jobs/catastrophe-analys...
#> 4 53239157 ... https://www.reed.co.uk/jobs/finance-analyst/53...
#>
#> [5 rows x 15 columns]
You can now use the data for your data science.
Other resources
You can also watch Dean Chereden
YouTube video on how to GET data from an API using R in RStudio.
I hope you found this article informative. You can find its GitHub repository here. If you enjoyed reading this write-up, please follow me on Twitter and Linkedin for more updates on R
, Python
, and Excel
for data science.