The Objective of this article is to understand how to perform data manipulation on pandas DataFrames using SQL with pandasql library.
As per documentation pandasql allows us to query pandas DataFrames using SQL syntax.
Installation of pandasql
Library can be installed using below two methods, both of them uses PIP installation:
Use Case
The main function used in pandasql is sqldf. sqldf accepts two arguments:
- SQL query
- Session environment variables ( globals() and locals() )
session environment variables is optional and handled by python itself, even if we do not provide the session variables, we still can achieve the goal of converting pandas DataFrame to perform SQL querying.
Import Necessary Libraries
from pandasql import sqldf
import pandas as pd
Import Data Sets
emp_df=pd.read_csv(r'D:\python_coding\pandas_practice\emp.csv')
emp_df.head(10)
dept_df=pd.read_csv(r'D:\python_coding\pandas_practice\dept.csv')
dept_df.head()
Querying DataFrames
In this section of blog we will write code to transform the pandas data frames for SQL querying.
create query variables:
equery='select empno,ename,sal,comm,deptno from emp_df'
dquery='select deptno,dname,loc from dept_df'
As we have discussed in previous sections that sqldf is the main method of pandasql library that basically performs
the conversion or transformation of data frames to sql querying capability.
For conversion of DataFrames we will write a python function that will take query variables, created previously,
as arguments.
Python conversion functions can be created in two ways lambda or def func(). This blog covers both approaches.
using def
def df_to_sql(q):
return sqldf(q)
using lambda
dftosql = lambda q:sqldf(q)
Read Data using SQL query
equery='select empno,ename,sal,comm,deptno from emp_df'
print(type(df_to_sql(equery)))
df_to_sql(equery)
dquery='select deptno,dname,loc from dept_df'
print(type(dftosql(equery)))
dftosql(dquery)
Perform Join operation
edquery= 'select e.empno, e.ename, e.sal,
d.deptno, d.dname from emp_df e
inner join dept_df d on e.deptno=d.deptno'
merge_df=dftosql(edquery)
print(type(merge_df))
merge_df.head(10)
Group By and Aggregation
agg_query='select deptno, round(avg(sal),2) as avg from emp_df group by deptno'
agg_df=dftosql(agg_query)
print(type(agg_df))
agg_df.head()
pandasql documentation talks about session or environmental variables globals() and locals(),
apart from this not much explained in documentation. In document session variables are passed as arguments
to sqldf method, that performs conversion, but while working on use case I did not used these variables and it still works.
Actually globals and locals is maintained internally and contains the namespace or environmental details of the current
session in use. Output of globals() is python dictionary and contains information about all of the objects created in
session. Just to keep the article simple to understand I have only extracted the keys of the dictionary that relates to
all the objects for the session.
Let's check what exactly globals() holds.
print(globals().keys())
dict_keys(['__name__', '__doc__', '__package__', '__loader__', '__spec__',
'__builtin__', '__builtins__','_ih', '_oh', '_dh', 'In', 'Out', 'get_ipython', 'exit',
'quit', '_', '__', '___', '_i', '_ii', '_iii', '_i1','sqldf', 'pd', '_i2',
'emp_df', 'dept_df', '_i3', '_3', '_i4', 'equery', 'dquery', '_i5', 'dftosql', '_i6',
'df_to_sql', '_i7', '_7', '_i8', '_8', '_i9', '_9', '_i10', 'edquery', 'merge_df',
'_10', '_i11', 'agg_query', 'agg_df', '_11', '_i12', '_12', '_i13'])
To Summarize, topics covered in the article:
- use of pandasql library to able us to execute SQL query on pandas dataframe.
- perform SQL operations such as joins, groupby and aggregations etc on pandas dataframe using sql query.
That's all with the performing SQL operation on Pandas DataFrames.
Thanks for reading my blog and supporting the content.
Please subscribe and follow me on blogspot for upcoming contents. Word of appreciation always helps to keep up the spirit and a healthy knowledge network helps us grow.
Share the Knowledge.
Comments
Post a Comment