How to zip and unzip files with the Python zipfile module ?

Tutorial on how to zip and unzip files using Python scripts and the zipfile, shutil and glob librairies?

If you’re looking for an easy and efficient way to zip and unzip files with Python zipfile module, then you’ve come to the right place. In this tutorial, we’re going to show you how to zip and unzip files in Python. Whether you’re working with large files or need to compress your data, we’ve got you covered. So let’s dive in this Python tutorial.

Zip and unzip files in Python is a common task, and one that can be easily accomplished using built-in libraries. In this tutorial, we’re going to cover how to zip and unzip files using Python, as well as read, write, and extract data from zip files.

1. Three Python modules are available to zip and unzip files

To zip and unzip files in Python, you need to import a few libraries. These modules or libraries are built-in to Python and easy to use. The most commonly used modules to compress and uncompress files into archives are the following :

  1. zipfile
  2. shutil
  3. glob

For all the examples of code in this tutorial, let’s consider this Windows folders and files having this structure:

  • data
    • source
      • Customers_Data_1.csv
      • Customers_Data_2.csv
      • Customers_Data_3.csv
    • Customers_Data_4.csv
Source files to zip using the zipfile Python modules
Source files to zip using the zipfile Python modules

1.1 The zipfile module in Python

The zipfile Python module is a library that is specifically used for working with zip files. It allows you to read and write to zip files, as well as extract files from them. The zipfile.ZipFile class is the main class used for working with zip files in this library.

Here’s a first example of how to extract a file from a zip archive, with the zipfile library. This code opens the files.zip file, reads it, and extracts the Customers_Data_1.csv file from the archive.

import zipfile
with zipfile.ZipFile('C:\\data\\files.zip', 'r') as archive:
    archive.extract('Customers_Data_1.csv','C:\\data\\')
Unzip an archive using the Python zipfile module
Unzip an archive using the Python zipfile module

1.2 The shutil module in Python

shutil is a library that provides a higher level interface for working with files and directories. It provides methods for copying, moving, and archiving files and directories. In this case, the shutil.unpack_archive() method can be used to extract the archive. Here’s an example of how to extract a file from a zip archive using the shutil library. This code extract the archive files.zip to the C:\data\ directory.

import shutil
shutil.unpack_archive('files.zip', 'C:\\data\\')
Extract all files from archive with the shutil module in Pyhton
Extract all files from archive using the shutil module in Python

1.3 The glob Python module

glob is a library that allows you to find all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order. No tangle with the archive itself, but it can be useful to get all the files in a directory with a specific pattern. Here’s an example of how to extract a file from a zip archive using the glob library.

import glob
import zipfile

for file in glob.glob('C:\\data\\files.zip'):
    with zipfile.ZipFile(file, 'r') as archive:
        archive.extract('Customers_Data_2.csv','C:\\data\\')
Extract and overwrite only the second file named Customers_Data_2.csv using the glob
Extract and overwrite only the second file named Customers_Data_2.csv using the glob

2. How to read the content of zip archives using Python?

Reading the contents of a zip file is easy in Python. You can use the zipfile.ZipFile class to open a zip file and read its contents. Here’s an example of how to read the contents of a zip file in Python. This sample script will simply print the contents of the files.zip file.

import zipfile
with zipfile.ZipFile('C:\\data\\files.zip', 'r') as archive:
    archive.printdir()

To manage files in a system, like Linux or Windows, Python offers many other modules and built-in functions.

3. Read metadata from zip files in Python scripts

You can also read metadata from zip files in Python. You can use the zipfile.ZipInfo class to get information about the zip file, such as the file name, date, and size. Here’s an example of how to read metadata from a zip file in Python. This will print the following information’s:

  • Filename
  • Last modified date with Year, Month, Day, Hour, Minutes and Seconds
  • And the size of each file in bytes.
import zipfile
with zipfile.ZipFile('C:\\data\\files.zip', 'r') as archive:
    for info in archive.infolist():
        print(info.filename, info.date_time, info.file_size)
Read the ZIP archive metadata with the Python infolist function
Read the ZIP archive metadata with the Python infolist function

4. Python script to open zip files and read and write data

You can also open zip files in Python to read and write data. You can use the zipfile.ZipFile class to open a zip file, and then use the write() and extract() methods to add or extract files. Let’s say we want to add the file called Customers_Data_4.csv to the previous files.zip archive. This example below will add the file to zip archive. We use here the append mode to open the file.

import zipfile
with zipfile.ZipFile('C:\\data\\files.zip', 'a') as archive:
    archive.write('C:\\data\\Customers_Data_4.csv','Customers_Data_4.csv')

5. Open zip archives in write mode using Python

You can also open zip files in Python to write data. You can use the zipfile.ZipFile class to create a new zip file, and then use the write() method to add files to it. Here’s an example of how to create a new zip file in Python and add a file to it. This will create a new zip file called new_files.zip and add Customers_Data_1.csv to it.

import zipfile
with zipfile.ZipFile('C:\\data\\new_files.zip', 'w') as archive:
    archive.write('C:\\data\\Customers_Data_1.csv','Customers_Data_1.csv')

Let’s break down the Python code above:

  1. Import the zipfile module
  2. Create a new ZIP archive in write mode, called new_files.zip
  3. Add to the archive the CSV file called Customers_Data_1.csv

Note that you have to use the double escape characters \\ when using Windows files and folders, in order to escape the escape character itself.

6. Create zip files using the zipfile Python module

You can also create zip archives in Python using the shutil library. The shutil.make_archive() function can be used to create a zip archive from a directory. Here’s an example of how to create a zip archive from a directory in Python. The goal here is to create a zip archive with this path C:\data\files.zip from the content of the C:\data\source folder, and after that, display the content from the zip itself.

import shutil
import zipfile

shutil.make_archive('C:\\data\\files', 'zip', 'C:\\data\\source')

with zipfile.ZipFile('C:\\data\\files.zip', 'r') as archive:
    archive.printdir()
Zip content of the source folder and display content from zip archive in Python
Zip content of the source folder and display content from zip archive in Python

As you may know, zipping files is a common task in data management projects, and especially with the data integration and ETL projects. For example, it is possible to zip files and folders using SSIS and 7Zip.

7. Read files from zip archives in Python

When you need to zip and unzip files with the Python zipfile module, it is very convenient to also read from files directly in zip archives. You can use the zipfile.ZipFile class to open a zip file, and then use the extract() method to extract a file.

Now let’s try together to do the opposite operation, i.e., read the same files from the 2 archives we created in the previous steps, called files.zip and new_files.zip. To perform this extraction, copy and paste the Python script below.

# read from the first ZIP archive
print('--- FIRST FILE ---')
import zipfile
with zipfile.ZipFile('C:\\data\\files.zip', 'r') as archive:
    archive.printdir()


# read from the second ZIP archive
print('--- SECOND FILE ---')
import zipfile
with zipfile.ZipFile('C:\\data\\new_files.zip', 'r') as archive:
    archive.printdir()
Read files and folders from a ZIP archive using the zipfile Python module
Read files and folders from a ZIP archive using the zipfile Python module

8. Uncompress one file from a zip archive

You can also extract one file from a zip archive in Python. You can use the zipfile.ZipFile class to open a zip file, and then use the extract() method to extract a specific file from the archive. To build a script to extract a given file from a zip archive in Python, use this script and adapt it to your current settings.

In the example of script below, again we use the explicit path to extract the file. If the path is not provided, the export path is the default Python path.

import zipfile
with zipfile.ZipFile('C:\\data\\new_files.zip', 'r') as archive:
    archive.extract('Customers_Data_1.csv','C:\\data\\')

9. Unzip multiple files with Python and zipfile

Unzipping multiple files at once in Python is also easy. Simple use the shutil library and its unpack_archive() method to unzip the content of a file. This will unzip the files.zip file to the C:\data directory.

import shutil
shutil.unpack_archive('C:\\data\\files.zip', 'C:\\data\\')

10. How to close a zip archive in Python?

It is important to close a zip archive once you are done with it, this can be done by using the close() method from the zipfile.ZipFile class.

import zipfile
with zipfile.ZipFile('C:\\data\\files.zip', 'r') as archive:
    #scripts to read or write to manipulate the archive content
    archive.close()

Conclusion on scripts to zip and unzip files with Python

In conclusion, it is relatively easy to zip and unzip files with Python zipfile module. Indeed, with the built-in functions from the modules available, it is a straightforward task. You can use built-in libraries such as zipfile, shutil, and glob to easily read, write, and extract data from zip archives.

With the multiple examples provided in this tutorial, you should now be able to compress and uncompress files in Python with confidence. The next step after zipping and unzipping files might be to read and write the content of the files itself.

Be the first to comment

Leave a Reply

Your email address will not be published.


*