7. Data Analysis - Data Aggregation & Grouping ~ Python and Machine Learning Blog

Tuesday, 4 October 2022

7. Data Analysis - Data Aggregation & Grouping

October 04, 2022 No comments

Split, Apply, Combine

Mean

X["Values1"].groupby([X["Keys1"]]).mean() -- Mean of "Values1" based on Keys = "Key1"
X["Values1"].groupby([X["Keys1"],X["Keys2"]]).mean()
X.groupBy["Keys1"].mean()
X.groupBy(X["Keys1"], X["Keys2"]).mean()

Count

X.groupBy(X["Keys1"], X["Keys2"]).size()

GroupBy Clause with FOR Loop

for name, group in X.groupby([["Key1"]])

print (name)
print(group)

Accessing each group in dictionary key

dict(list(X.groupby([["Key1"]])))

Extracting based on column

X.groupby(X.dtypes, axis =1) -- column will be seperated based on data types.

Column Grouping

Create a mapping from column to grouping

mapColumns={'a':'Group1', 'b':'Group1', 'c':'Group2', 'd':'Group2', 'e':'Group3', 'f':'Group3'}

X.groupby(mapColumns, axis=1).mean()

Passing Lambda function in groupby

X.groupby(lambda x:x<'c', axis =1).mean() -- Two group will be created with TRUE/FALSE column.

Aggregate function

def max_min(x)

return x.max() - x.min()

X.agg([max, min, max_min]) -- For each column min, max, max_min function will be called.

Aggregate function with groupby

X1 = X.groupby(["key1"]) -- Will create dataframe for each groups in "key1"
X1.agg(min, max, max_min)) -- Will apply aggregate function for each of that column.

Aggregate function with custom defined column name

X1.agg(("Maximum", max),("Minimum", min),("Maximum_Minimum",max_min))

GroupBy with Apply

X.groupby().apply(fName)
X.groupby(["key1"]).apply(lambda x: x.min()) -- Passing lambda function, User defined function can also be applied.

GroupBy with Apply with passing arguments to function

def minimumFb(X, x):

return(X.max()-X.min() > x)

X.groupby(["key1"]).apply(minimumFb(10))

Apply function with bucket analysis

quartiles = X.cut(X.Values1, 2)
X.groupby(quartiles).apply(lambda x:x.min())

Pivot Table

X.pivot_table(values="AvgSizeOfTrip" , index = "Gender" , columns="Group" )

Pivot table with aggregation

X.pivot_table (values="AvgSizeOfTrip" , index = "Gender" , columns="Group", aggfunc=np.sum)
X.pivot_table (values="AvgSizeOfTrip" , index = "Gender" , columns="Group", aggfunc={"Col1":np.sum, 'Col2':np.mean}) - different column with different operation.

Pivot table with count

pd.crosstab(X.gender, X.group) -- This just counts the frequency.

Share:

0 comments:

Post a Comment