How to import data from CSV to SQL Server in Python ?

Step-by-step coding tutorial on how to import data from a CSV file into a SQL Server table using Python.

In this tutorial, we will learn how to import data from CSV files into SQL Server using only Python scripts. By the end of all the steps, you will know how to connect to SQL Server and read data from flat files using Python.

1. Import the necessary Python libraries

First, we need to import the necessary libraries. We will use pandas to read the data from the CSV file and pyodbc to connect to the SQL Server database and execute SQL commands.

import pandas as pd
import pyodbc

2. Define the path to the CSV source file and create it

Next, we need to define the path to the CSV file that we want to import. In this example, we will assume that the CSV file is located on the same Windows machine as our Python development environment.

csv_file = 'c:\data\employees.csv'

For example, you can use a very simple file like the one below. It was used in the previous tutorial on how to export data into CSV format using Python. The data can look like this, i.e. with 4 columns:

  1. ID
  2. Name
  3. Department
  4. Salary
id,name,department,salary
1,John Smith,Sales,50000
2,Jane Doe,Marketing,60000
3,Bob Johnson,IT,70000
4,Alice Wong,HR,55000

3. Connect to the SQL Server database

Now, we need to connect to the SQL Server database using the pyodbc Python module. We will need to provide the connection details, such as the server name, database name, username, and password. Or in my case I’m using a Windows authentication, like in the script below.

# declare some variables to store the connections details
driver = 'SQL Server'
server = 'localhost'
database = 'Expert-Only'

# connect to the local SQL Server database
connection = pyodbc.connect(f'DRIVER={driver};'
                            f'SERVER={server};'
                            f'DATABASE={database};'
                            f'Trusted_Connection=yes;')

4. Create the SQL Server target table

If the SQL Server table does not exist yet, of course we need to create it. In this example, we will assume that the SQL Server table has the same structure as the CSV file, and we will create it using the following SQL command.

The goal is to avoid errors by checking if the table already exists or not in the database. So we are sure to drop the table and create the proper structure without any errors. This specific code below first attempts to drop the table employees if it exists, and then it creates the table with the specified schema.

drop_table_query = """
DROP TABLE IF EXISTS employees
"""

create_table_query = """
CREATE TABLE employees (
    id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(50),
    hire_date DATE
)
"""

cursor.execute(drop_table_query)
cursor.execute(create_table_query)

5. Insert data from the CSV file into SQL Server

Next, we need to read the data from the CSV file using “pandas”, and then insert it into the SQL Server table using SQL commands. We will assume that the first row of the CSV file contains the column names. Then the last line of code simply commits the changes to the database. It is a best practice to always commit the code in the database and close connections.

df = pd.read_csv(csv_file)
for index, row in df.iterrows():
    insert_query = f"""
    INSERT INTO employees (id, first_name, last_name, email, hire_date)
    VALUES ({row['id']}, '{row['first_name']}', '{row['last_name']}', '{row['email']}', '{row['hire_date']}')
    """
    cursor.execute(insert_query)

# Close the connection
cursor.close()
conn.close()

And that’s it! You should know all the steps now. The next paragraph is about a wrap up script to do all steps at once. Indeed, every step by itself will not work as you need to declare the connection before connecting to the MS SQL table for exemple.

Wrap up all the steps into one functional script

import pandas as pd
import pyodbc

# Define the path to the CSV file
csv_file = 'c:\data\employees.csv'

# import the pyodbc module to manage the odbc conection
import pyodbc

# declare some variables to store the connections details
driver = 'SQL Server'
server = 'localhost'
database = 'Expert-Only'

# connect to the local SQL Server database
connection = pyodbc.connect(f'DRIVER={driver};'
                            f'SERVER={server};'
                            f'DATABASE={database};'
                            f'Trusted_Connection=yes;')

cursor = connection.cursor()

# Create the SQL Server table (if it does not exist)
create_table_query = """
CREATE TABLE employees (
    id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(50),
    hire_date DATE
)
"""

cursor.execute(create_table_query)

# Insert the data from the CSV file into the SQL Server table
df = pd.read_csv(csv_file)
for index, row in df.iterrows():
    insert_query = f"""
    INSERT INTO employees (id, first_name, last_name, email, hire_date)
    VALUES ({row['id']}, '{row['first_name']}', '{row['last_name']}', '{row['email']}', '{row['hire_date']}')
    """
    cursor.execute(insert_query)

# Commit the changes to the database
connection.commit()

# Close the connection
cursor.close()
connection.close()

Conclusion on importing a CSV file into SQL Server using Python

To conclude, it can be a very useful skill to know how to import data from CSV to SQL Server in Python. Especially for many data analysis and reporting tasks. By automating this Python process, save yourself a lot of time and effort compared to doing it manually using Excel. Indeed, Python provides a powerful set of tools for working with databases and data files. By mastering these tools, you can greatly enhance your productivity as a data analyst or scientist.

Remember, when importing data from a CSV file into SQL Server, it is important to make sure that the SQL Server table has the same structure as the CSV file, and that the data types match. It is also a good idea to check the data for errors or inconsistencies before importing it into the database.

In this tutorial, we used pandas library to read the data from the CSV file and pyodbc to connect to SQL Server and execute SQL commands. We first defined the path to the CSV file, connected to the SQL Server database, created the SQL Server table. Then we inserted the data from the CSV file into the SQL Server table, and finally committed the changes to the database. All these steps are very similar to many integration programs.

Be the first to comment

Leave a Reply

Your email address will not be published.


*