Data is a crucial element in any organization, and with the rapid growth of technology, the amount of data being generated is increasing exponentially. XML data is a popular format for representing structured data in a readable and machine-readable way. However, working with XML data can be challenging, especially when dealing with large datasets. In this tutorial, we will learn how to import XML data from an XML document into a SQL Server database using Python.
SQL Server is a popular relational database management system used to store and manage data in organizations. Python is a powerful programming language with libraries that allow us to easily work with XML data. By combining these two technologies, we can effectively manage large datasets in a SQL Server database using Python.
In this tutorial, we will cover step-by-step how to import XML data from an XML document into a SQL Server database using Python. We will use the pyodbc library to connect to the SQL Server database, the ElementTree module to parse the XML data, and SQL statements to insert the data into the database.
By the end of this tutorial, you will have a clear understanding of how to import XML data with a simple schema into a SQL Server XML column using Python. Allowing you to effectively manage large datasets and perform complex queries on the data.
Table of Contents
1. Prepare the XML document to import
Let’s say we need to import an XML document into a SQL Server table using only Python code. Or at least, calling the SQL code directly from the Python scripts. Let’s consider this XML document, it corresponds to the employees table used as an example previously. We used it in the other tutorial on how to export an XML column into an XML documents in Python.
<?xml version="1.0" encoding="UTF-8"?> <employees> <employee> <id>1</id> <name>John Smith</name> <department>Sales</department> <salary>50000</salary> </employee> <employee> <id>2</id> <name>Jane Doe</name> <department>Marketing</department> <salary>60000</salary> </employee> <employee> <id>3</id> <name>Bob Johnson</name> <department>IT</department> <salary>70000</salary> </employee> <employee> <id>4</id> <name>Alice Wong</name> <department>HR</department> <salary>55000</salary> </employee> </employees>
Copy, paste and save the document into an XML file, called employees.xml. In this tutorial, the file is stored in this path C:\data\employees.xml. To do this I use Notepad++.
2. Create the MS SQL table with an XML type column
In order to import the XML data, you need the same XML schema structure as a target. To prepare the MS SQL table with an XML type column, we can use the same dbo.employees_xml table structure used in the previous tutorial. Here’s the SQL Server code how to create the table using the Employee schema. For this step, I recommend connecting to the database instance using SSMS and execute the code to create the table with the XML column using the schema below.
CREATE XML SCHEMA COLLECTION EmployeeSchema AS N' <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:element name="Employee"> <xsd:complexType> <xsd:sequence> <xsd:element name="ID" type="xsd:integer"/> <xsd:element name="Name" type="xsd:string"/> <xsd:element name="Department" type="xsd:string"/> <xsd:element name="Salary" type="xsd:integer"/> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema>'; GO CREATE TABLE dbo.employees_xml ( id INT PRIMARY KEY, employee_data xml(EmployeeSchema) );
This creates an employees_xml table with an ID column and a column named employee_data of type XML that uses the EmployeeSchema schema. The table is now ready to store XML data. This tutorial somehow explains the exact opposite of this one, using the same structure and database objects:
3. Import the 2 necessary Python libraries
Indeed, we need 2 libraries, the pyodbc and the Elementree one. This script below imports the pyodbc library for connecting to the SQL Server database and the ElementTree module for creating and writing the XML document.
import pyodbc import xml.etree.ElementTree as ET
4. Set up the connection to the SQL Server database
This sets up the connection parameters and establishes a connection to the SQL Server database. We use here a Windows account to connect to the local database. Simply adjust the code to your technical constraints.
# Set up the database connection # declare some variables to store the connections details driver = 'SQL Server' server = 'localhost' database = 'Expert-Only' # connect to the local SQL Server database connection = pyodbc.connect(f'DRIVER={driver};' f'SERVER={server};' f'DATABASE={database};' f'Trusted_Connection=yes;') cursor = connection.cursor()
5. Open the XML file and get the root element
# Open the XML file tree = ET.parse('C:\data\employees.xml') root = tree.getroot()
These lines use the xml.etree.ElementTree
library to open an existing XML file employees.xml
. The parse()
method creates an ElementTree
object tree
, and getroot()
returns the root element of the XML tree as an Element
object root
.
If you are working with SSIS, it is also possible to performs XML documents import using Integration Services packages.
6. Loop through XML elements and insert data into SQL Server
The lines of code below loop through each employee
element in the XML file using the findall()
method of the root element, called root
. For each employee
element, it retrieves the values of the id
, name
, department
, and salary
elements using the find()
method, which returns the first matching element.
Then the Python script it executes an SQL query to insert the data into the dbo.employees table using the execute()
method of the cursor object. Finally, it commits the changes to the database using the commit()
method, this time from the connection object.
# Loop through the XML elements and insert the data into the SQL Server table for employee in root.findall('employee'): id = employee.find('id').text name = employee.find('name').text department = employee.find('department').text salary = employee.find('salary').text # Build the XML string xml_string = "<Employee><ID>{0}</ID><Name>{1}</Name><Department>{2}</Department><Salary>{3}</Salary></Employee>".format(id, name, department, salary) # Insert the XML data into the SQL Server table cursor.execute("INSERT INTO dbo.employees_xml (id, employee_data) VALUES (?, ?)", (id, xml_string)) connection.commit()
7. Close the database connection to SQL Server
cursor.close() connection.close()
This closes the cursor and connection objects to the SQL Server database.
Overall, this code retrieves the XML data from a SQL Server table, parses the XML data using ElementTree, creates a new XML document with the parsed data, and writes the XML data to a file. In the next section of the tutorial, we wrap up all the steps together to perform all actions in one functional script.
8. Complete Python example to import the XML file into MS SQL
As a wrap up, the full code performs this 4 actions:
- Establishes a connection to a local SQL Server database using the pyodbc library and declares variables for the connection details.
- The XML file is opened using the ElementTree module in Python and the XML elements are looped through to extract the data.
- The extracted data is then inserted into the dbo.employees table in the SQL Server database using a SQL query and the pyodbc library.
- Finally, the database connection is closed.
import pyodbc import xml.etree.ElementTree as ET # Set up the database connection # declare some variables to store the connections details driver = 'SQL Server' server = 'localhost' database = 'Expert-Only' # connect to the local SQL Server database connection = pyodbc.connect(f'DRIVER={driver};' f'SERVER={server};' f'DATABASE={database};' f'Trusted_Connection=yes;') cursor = connection.cursor() # Open the XML file tree = ET.parse('C:\data\employees.xml') root = tree.getroot() # Loop through the XML elements and insert the data into the SQL Server table for employee in root.findall('employee'): id = employee.find('id').text name = employee.find('name').text department = employee.find('department').text salary = employee.find('salary').text # Build the XML string xml_string = "<Employee><ID>{0}</ID><Name>{1}</Name><Department>{2}</Department><Salary>{3}</Salary></Employee>".format(id, name, department, salary) # Insert the XML data into the SQL Server table cursor.execute("INSERT INTO dbo.employees_xml (id, employee_data) VALUES (?, ?)", (id, xml_string)) connection.commit() # Close the database connection cursor.close() connection.close()
After the execution of the code, to check the result, execute a select query in SSMS to display the XML data from the dbo.employees_xml table.
select * from dbo.employees_xml
9. Conclusion on importing XML data into a SQL Server column
Importing XML data from SQL Server into Python is a powerful capability that can enhance your programming projects in many ways. With the help of libraries such as pyodbc and ElementTree, Python provides an easy and efficient way of handling XML data. By following the steps outlined in this tutorial, you can import SQL Server XML column data into your Python project, opening up new possibilities for data manipulation and analysis.
Whether you’re working on a data analytics project, building a web application, or developing a machine learning algorithm, the ability to seamlessly import XML data from SQL Server into Python can be a game-changer. By leveraging the powerful tools and libraries available in Python, you can streamline your development process and achieve your goals more efficiently.
So why not give it a try? With the help of this tutorial and a little bit of practice, you’ll be able to import SQL Server XML data into your Python projects in no time!
Very often, in data management projects involving large number of files, including Python projects. It is a good practice to zip files to reduce the size, to secure them using a password or simply to organize and archive them in a different folder. The tutorial below shows some Python scripts to zip and unzip files.
Be the first to comment