How to import an XML document into SQL Server using Python?

Data is a crucial element in any organization, and with the rapid growth of technology, the amount of data being generated is increasing exponentially. XML data is a popular format for representing structured data in a readable and machine-readable way. However, working with XML data can be challenging, especially when dealing with large datasets. In this tutorial, we will learn how to import XML data from an XML document into a SQL Server database using Python.

SQL Server is a popular relational database management system used to store and manage data in organizations. Python is a powerful programming language with libraries that allow us to easily work with XML data. By combining these two technologies, we can effectively manage large datasets in a SQL Server database using Python.

In this tutorial, we will cover step-by-step how to import XML data from an XML document into a SQL Server database using Python. We will use the pyodbc library to connect to the SQL Server database, the ElementTree module to parse the XML data, and SQL statements to insert the data into the database.

By the end of this tutorial, you will have a clear understanding of how to import XML data with a simple schema into a SQL Server XML column using Python. Allowing you to effectively manage large datasets and perform complex queries on the data.

1. Prepare the XML document to import

Let’s say we need to import an XML document into a SQL Server table using only Python code. Or at least, calling the SQL code directly from the Python scripts. Let’s consider this XML document, it corresponds to the employees table used as an example previously. We used it in the other tutorial on how to export an XML column into an XML documents in Python.

<?xml version="1.0" encoding="UTF-8"?>
<employees>
  <employee>
    <id>1</id>
    <name>John Smith</name>
    <department>Sales</department>
    <salary>50000</salary>
  </employee>
  <employee>
    <id>2</id>
    <name>Jane Doe</name>
    <department>Marketing</department>
    <salary>60000</salary>
  </employee>
  <employee>
    <id>3</id>
    <name>Bob Johnson</name>
    <department>IT</department>
    <salary>70000</salary>
  </employee>
  <employee>
    <id>4</id>
    <name>Alice Wong</name>
    <department>HR</department>
    <salary>55000</salary>
  </employee>
</employees>

Copy, paste and save the document into an XML file, called employees.xml. In this tutorial, the file is stored in this path C:\data\employees.xml. To do this I use Notepad++.

2. Create the MS SQL table with an XML type column

In order to import the XML data, you need the same XML schema structure as a target. To prepare the MS SQL table with an XML type column, we can use the same dbo.employees_xml table structure used in the previous tutorial. Here’s the SQL Server code how to create the table using the Employee schema. For this step, I recommend connecting to the database instance using SSMS and execute the code to create the table with the XML column using the schema below.

CREATE XML SCHEMA COLLECTION EmployeeSchema AS N'
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Employee">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="ID" type="xsd:integer"/>
        <xsd:element name="Name" type="xsd:string"/>
        <xsd:element name="Department" type="xsd:string"/>
        <xsd:element name="Salary" type="xsd:integer"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>';
GO

CREATE TABLE dbo.employees_xml (
   id INT PRIMARY KEY,
   employee_data xml(EmployeeSchema)
);

This creates an employees_xml table with an ID column and a column named employee_data of type XML that uses the EmployeeSchema schema. The table is now ready to store XML data. This tutorial somehow explains the exact opposite of this one, using the same structure and database objects:

3. Import the 2 necessary Python libraries

Indeed, we need 2 libraries, the pyodbc and the Elementree one. This script below imports the pyodbc library for connecting to the SQL Server database and the ElementTree module for creating and writing the XML document.

import pyodbc
import xml.etree.ElementTree as ET

4. Set up the connection to the SQL Server database

This sets up the connection parameters and establishes a connection to the SQL Server database. We use here a Windows account to connect to the local database. Simply adjust the code to your technical constraints.

# Set up the database connection
# declare some variables to store the connections details
driver = 'SQL Server'
server = 'localhost'
database = 'Expert-Only'
# connect to the local SQL Server database
connection = pyodbc.connect(f'DRIVER={driver};'
                            f'SERVER={server};'
                            f'DATABASE={database};'
                            f'Trusted_Connection=yes;')
cursor = connection.cursor()

5. Open the XML file and get the root element

# Open the XML file
tree = ET.parse('C:\data\employees.xml')
root = tree.getroot()

These lines use the xml.etree.ElementTree library to open an existing XML file employees.xml. The parse() method creates an ElementTree object tree, and getroot() returns the root element of the XML tree as an Element object root.

If you are working with SSIS, it is also possible to performs XML documents import using Integration Services packages.

6. Loop through XML elements and insert data into SQL Server

The lines of code below loop through each employee element in the XML file using the findall() method of the root element, called root. For each employee element, it retrieves the values of the id, name, department, and salary elements using the find() method, which returns the first matching element.

Then the Python script it executes an SQL query to insert the data into the dbo.employees table using the execute() method of the cursor object. Finally, it commits the changes to the database using the commit() method, this time from the connection object.

# Loop through the XML elements and insert the data into the SQL Server table
for employee in root.findall('employee'):
    id = employee.find('id').text
    name = employee.find('name').text
    department = employee.find('department').text
    salary = employee.find('salary').text

    # Build the XML string
    xml_string = "<Employee><ID>{0}</ID><Name>{1}</Name><Department>{2}</Department><Salary>{3}</Salary></Employee>".format(id, name, department, salary)

    # Insert the XML data into the SQL Server table
    cursor.execute("INSERT INTO dbo.employees_xml (id, employee_data) VALUES (?, ?)", (id, xml_string))

connection.commit()

7. Close the database connection to SQL Server

cursor.close()
connection.close()

This closes the cursor and connection objects to the SQL Server database.

Overall, this code retrieves the XML data from a SQL Server table, parses the XML data using ElementTree, creates a new XML document with the parsed data, and writes the XML data to a file. In the next section of the tutorial, we wrap up all the steps together to perform all actions in one functional script.

8. Complete Python example to import the XML file into MS SQL

As a wrap up, the full code performs this 4 actions:

  1. Establishes a connection to a local SQL Server database using the pyodbc library and declares variables for the connection details.
  2. The XML file is opened using the ElementTree module in Python and the XML elements are looped through to extract the data.
  3. The extracted data is then inserted into the dbo.employees table in the SQL Server database using a SQL query and the pyodbc library.
  4. Finally, the database connection is closed.
import pyodbc
import xml.etree.ElementTree as ET

# Set up the database connection
# declare some variables to store the connections details
driver = 'SQL Server'
server = 'localhost'
database = 'Expert-Only'
# connect to the local SQL Server database
connection = pyodbc.connect(f'DRIVER={driver};'
                            f'SERVER={server};'
                            f'DATABASE={database};'
                            f'Trusted_Connection=yes;')
cursor = connection.cursor()

# Open the XML file
tree = ET.parse('C:\data\employees.xml')
root = tree.getroot()

# Loop through the XML elements and insert the data into the SQL Server table
for employee in root.findall('employee'):
    id = employee.find('id').text
    name = employee.find('name').text
    department = employee.find('department').text
    salary = employee.find('salary').text

    # Build the XML string
    xml_string = "<Employee><ID>{0}</ID><Name>{1}</Name><Department>{2}</Department><Salary>{3}</Salary></Employee>".format(id, name, department, salary)

    # Insert the XML data into the SQL Server table
    cursor.execute("INSERT INTO dbo.employees_xml (id, employee_data) VALUES (?, ?)", (id, xml_string))

connection.commit()

# Close the database connection
cursor.close()
connection.close()

After the execution of the code, to check the result, execute a select query in SSMS to display the XML data from the dbo.employees_xml table.

select *
from dbo.employees_xml
Select query in SSMS after the import of an XML document into SQL Server using Python.
Select query in SSMS after the import of an XML document into SQL Server using Python.

9. Conclusion on importing XML data into a SQL Server column

Importing XML data from SQL Server into Python is a powerful capability that can enhance your programming projects in many ways. With the help of libraries such as pyodbc and ElementTree, Python provides an easy and efficient way of handling XML data. By following the steps outlined in this tutorial, you can import SQL Server XML column data into your Python project, opening up new possibilities for data manipulation and analysis.

Whether you’re working on a data analytics project, building a web application, or developing a machine learning algorithm, the ability to seamlessly import XML data from SQL Server into Python can be a game-changer. By leveraging the powerful tools and libraries available in Python, you can streamline your development process and achieve your goals more efficiently.

So why not give it a try? With the help of this tutorial and a little bit of practice, you’ll be able to import SQL Server XML data into your Python projects in no time!

Very often, in data management projects involving large number of files, including Python projects. It is a good practice to zip files to reduce the size, to secure them using a password or simply to organize and archive them in a different folder. The tutorial below shows some Python scripts to zip and unzip files.

Be the first to comment

Leave a Reply

Your email address will not be published.


*