Posts

Showing posts with the label oracle

Split Datasets

The objective of this article is to transform a nested dataset into one row per value using the explode() method. The scope is to understand how to unnest, or explode, a dataset using the parallel processing framework PySpark and the Python library pandas.

The dataset looks like this:

    dept,name
    10,vivek#ruby#aniket
    20,rahul#john#amy
    30,shankar#jagdish
    40,
    50,yug#alex#alexa

Pandas explode()

    import pandas as pd
    pan_df=pd.read_csv(r'explode.csv')
    df_exp=pan_df.assign(name=pan_df['name'].str.split('#')).explode('name')
    df_exp

Output: The dataset is transformed successfully, and new rows are created from the nested values. The pandas approach is simple and straightforward as long as the dataset is not complex. The next section of the article covers the PySpark way of exploding, or unnesting, a dataset.

PySpark explode()

Import libraries and connect to Spark

    from pyspark import SparkContext,SparkConf
    import pyspark
    from pyspark.sql import SparkSes...
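The pandas step in this excerpt can be tried without a file on disk. The following is a minimal sketch, assuming the sample rows shown above are the whole dataset; the in-memory CSV text, the helper name explode_names, and the reset_index call are illustrative additions and not part of the original post.

```python
import io

import pandas as pd

# Sample rows copied from the article; read from memory instead of explode.csv.
CSV_TEXT = """dept,name
10,vivek#ruby#aniket
20,rahul#john#amy
30,shankar#jagdish
40,
50,yug#alex#alexa
"""


def explode_names(df: pd.DataFrame) -> pd.DataFrame:
    """Split the '#'-delimited name column and emit one row per name.

    explode_names is a hypothetical helper used only for this sketch.
    """
    return (
        df.assign(name=df["name"].str.split("#"))  # each cell becomes a list of names
        .explode("name")                           # one output row per list element
        .reset_index(drop=True)                    # explode repeats the source index
    )


pan_df = pd.read_csv(io.StringIO(CSV_TEXT))
df_exp = explode_names(pan_df)
print(df_exp)
```

Note that dept 40 has no names: its name value is read as missing, and explode() keeps it as a single row with a missing name rather than dropping it.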

Spark Window Functions

The objective of this article is to understand PySpark window functions. The post compares PySpark window functions with the analytical functions of a relational database system, Oracle Database.

Spark window functions operate on a group of rows (such as a frame or a partition) and return a single value for every input row. To perform an operation on a group, we first partition the data using Window.partitionBy(). For the row number and rank functions, we additionally need to order the partitioned data using the orderBy() clause.

Connect to Spark

    import pyspark
    from pyspark.sql import SparkSession
    print('modules imported')
    spark=SparkSession.builder.appName('Spark_window_functions').getOrCreate()

Load Dataset

    emp_df=spark.read.csv(r'emp.csv',header=True,inferSchema=True)
    emp_df.show(10)

Import necessary Libraries

    from pyspark.sql.window import Window
    from pyspark.sql.functions import col, row_number, rank, dense_rank
    from pyspark.sql import functions as ...
