Exploring HTTP Response Headers with Python's requests Library

Lesson 3

Introduction

Hello and welcome to this new lesson! Today, we will build upon your existing knowledge of Python's requests library and learn more about the responses we get when making HTTP requests. Specifically, we'll be inspecting response headers.

Introduction to HTTP Response Headers

Whenever you make an HTTP request, the server sends back not only the requested content but also metadata about that content. This metadata is conveyed through HTTP headers, which come as key-value pairs.

In simpler terms, if HTTP were a mailing system, headers would be like the information on the outside of an envelope: who it's from, where it's going, the date it was sent, and so on. HTTP headers carry information such as what type of content the server is sending, how to decode it, when it was last modified, and more!

Accessing Response Headers in Python using requests

Let's use Python's requests library to see some of these headers in action, using the following code as an example.

Python
import requests

url = 'http://quotes.toscrape.com'  # We'll scrape quotes from this webpage
response = requests.get(url)

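First, we make an HTTP GET request to our target URL, which gives us a Response object. One of the attributes of this object is headers, a dictionary-like object that holds all response headers. Header names are matched case-insensitively, so response.headers['content-type'] and response.headers['Content-Type'] return the same value.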

We can print these headers like this:

Python
if response.ok:
    print("Response headers:")
    for header, value in response.headers.items():
        print(f'{header}: {value}')

This code prints each header along with its corresponding value. Let's run this and see what we get!

The output will look similar to this (the exact headers and values depend on the server and on when you run the code):

Plain text
Response headers:
Date: Tue, 07 May 2024 18:28:19 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 11054
Connection: keep-alive

This output summarises key information from the response headers, including when the response was sent (Date), what the content type is (Content-Type), how big the content is in bytes (Content-Length), and whether the connection is kept open for further requests (Connection). Such details help you decide how to handle the received data in web scraping or API calls.
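
One detail worth keeping in mind is that header values arrive as strings, so dates and numbers need converting before you can compute with them. The sketch below uses the sample values from the output above in a plain dict that stands in for response.headers; the helper name parse_basic_headers is hypothetical and not part of requests.

```python
from email.utils import parsedate_to_datetime


def parse_basic_headers(headers):
    """Convert the string values of Date and Content-Length into Python types.

    `headers` can be any mapping of header names to string values,
    for example the `response.headers` object from requests.
    """
    raw_date = headers.get("Date")
    raw_length = headers.get("Content-Length")
    return {
        # email.utils can parse the date format shown in the sample output
        "date": parsedate_to_datetime(raw_date) if raw_date else None,
        # Content-Length is a decimal string such as "11054"
        "content_length": int(raw_length) if raw_length else None,
    }


# Sample values copied from the output above (a stand-in for response.headers)
sample_headers = {
    "Date": "Tue, 07 May 2024 18:28:19 GMT",
    "Content-Type": "text/html; charset=utf-8",
    "Content-Length": "11054",
    "Connection": "keep-alive",
}

parsed = parse_basic_headers(sample_headers)
print(parsed["date"].isoformat())      # a timezone-aware datetime
print(parsed["content_length"] / 1024) # arithmetic works now that it is an int
```

With a real response you would call parse_basic_headers(response.headers) instead of passing the sample dict.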

Understanding Key HTTP Response Headers

When you run the previous code, you'll probably come across many headers. Here are a few important ones which come up frequently:

Server: The software used by the originating server.
Date: The date and time when the message was sent.
Content-Type: The MIME type of the returned content. This could be text/html, application/json, image/jpeg, and so on. This tells the client what the content is and how to open it.
Content-Length: The size, in bytes, of the returned content.
Connection: Control options for the current connection, such as keep-alive (keep it open for further requests) or close.

These headers provide additional insights into the server and the response content, and they can be quite useful in some cases!
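
Not every server sends every header (the sample output earlier, for instance, has no Server line), so it is usually safer to look headers up with .get() than with square brackets. Here is a minimal sketch, assuming a hypothetical helper named summarize_headers and a plain dict as a stand-in for response.headers:

```python
# The five header names discussed in the list above
COMMON_HEADERS = ("Server", "Date", "Content-Type", "Content-Length", "Connection")


def summarize_headers(headers):
    """Return the common headers, using None for any the server did not send."""
    return {name: headers.get(name) for name in COMMON_HEADERS}


# Stand-in for response.headers; note that there is no "Server" entry
sample_headers = {
    "Date": "Tue, 07 May 2024 18:28:19 GMT",
    "Content-Type": "text/html; charset=utf-8",
    "Content-Length": "11054",
    "Connection": "keep-alive",
}

for name, value in summarize_headers(sample_headers).items():
    shown = value if value is not None else "(not sent)"
    print(f"{name}: {shown}")
```

With a real response you would call summarize_headers(response.headers); a missing header then shows up as None instead of raising a KeyError.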

Applying Response Headers in Web Scraping

Now, why is all this important for web scraping? Let's dig a bit deeper.

As a web scraper, your main goal is to extract useful data from web pages. However, scraping is not just about making requests and parsing HTML. You also need to ensure that your scraper behaves well and follows the rules set by the server. The server's responses, including headers, are a critical source of feedback for your scraper, containing valuable information about what the server allows or expects you to do.
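
One concrete case of such feedback is the Retry-After header, which some servers attach to 429 or 503 responses to say how long a client should wait before trying again. It is not covered elsewhere in this lesson, so treat the following as an illustrative sketch only: the helper name retry_delay_seconds and the one-second default are assumptions, and only the plain number-of-seconds form of the header is handled.

```python
def retry_delay_seconds(headers, default=1.0):
    """Return how many seconds to wait before sending the next request.

    Reads the Retry-After header when it holds a whole number of seconds.
    The header may also hold an HTTP date; this sketch does not parse that
    form and falls back to `default` instead.
    """
    value = headers.get("Retry-After", "").strip()
    if value.isdecimal():
        return float(value)
    return default


# Plain dicts standing in for response.headers
print(retry_delay_seconds({"Retry-After": "30"}))          # header present
print(retry_delay_seconds({"Content-Type": "text/html"}))  # header absent
```

In a scraper, the returned value could be passed to time.sleep() before the request is retried.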

For instance, an important header in web scraping is Content-Type, which can help you determine the format of the returned content. If the Content-Type is application/json, you can use response.json() to parse the content as a JSON object. Knowing this can greatly shape how your web scraping code is structured.
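
The Content-Type check described above could be sketched like this. Because the header value may carry parameters such as charset=utf-8, the bare media type is extracted before comparing. The helper names media_type and parse_body are hypothetical, and response is assumed to be a requests Response object (anything with headers, json() and text would work).

```python
def media_type(content_type):
    """Return the bare media type, e.g. 'text/html' from 'text/html; charset=utf-8'."""
    return content_type.split(";", 1)[0].strip().lower()


def parse_body(response):
    """Decode the body based on the Content-Type header.

    Returns the parsed JSON for application/json responses and the
    decoded text for everything else.
    """
    content_type = response.headers.get("Content-Type", "")
    if media_type(content_type) == "application/json":
        return response.json()
    return response.text


# The helper that needs no network access can be tried directly
print(media_type("text/html; charset=utf-8"))
print(media_type("application/json"))
```

For the quotes page used in this lesson, parse_body(response) would take the text branch, since the sample output shows a Content-Type of text/html.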

Summary

Well, our journey for this lesson stops here. We learned about HTTP response headers and how to inspect them using Python's requests library. Keep practicing and experimenting with different websites to further strengthen your understanding of this important aspect of HTTP!

Happy coding!
