Join my course at Udemy (Python Programming Bible-From beginner to advanced )

Blogger templates

Tuesday 4 October 2022

7. Data Analysis - Data Aggregation & Grouping

 Split, Apply, Combine

Mean

  • X["Values1"].groupby([X["Keys1"]]).mean() -- Mean of "Values1" based on Keys = "Key1"
  • X["Values1"].groupby([X["Keys1"],X["Keys2"]]).mean()
  • X.groupBy["Keys1"].mean()
  • X.groupBy(X["Keys1"], X["Keys2"]).mean()

Count

  • X.groupBy(X["Keys1"], X["Keys2"]).size()

GroupBy Clause with FOR Loop

  • for name, group in X.groupby([["Key1"]])
    • print (name)
    • print(group)

Accessing each group in dictionary key

  • dict(list(X.groupby([["Key1"]])))

Extracting based on column

  • X.groupby(X.dtypes, axis =1) -- column will be seperated based on data types.

Column Grouping

  • Create a mapping from column to grouping
    • mapColumns={'a':'Group1', 'b':'Group1', 'c':'Group2', 'd':'Group2', 'e':'Group3', 'f':'Group3'}
  • X.groupby(mapColumns, axis=1).mean()

Passing Lambda function in groupby

  •  X.groupby(lambda x:x<'c', axis =1).mean() -- Two group will be created with TRUE/FALSE column.

Aggregate function

  • def max_min(x)
    • return x.max() - x.min()
  • X.agg([max, min, max_min]) -- For each column min, max, max_min function will be called.

Aggregate function with groupby

  • X1 = X.groupby(["key1"]) -- Will create dataframe for each groups in "key1"
  • X1.agg(min, max, max_min)) -- Will apply aggregate function for each of that column.

Aggregate function with custom defined column name

  • X1.agg(("Maximum", max),("Minimum", min),("Maximum_Minimum",max_min))

GroupBy with Apply

  • X.groupby().apply(fName)
  • X.groupby(["key1"]).apply(lambda x: x.min())  -- Passing lambda function, User defined function can also be applied.
GroupBy with Apply with passing arguments to function
  • def minimumFb(X, x):
    • return(X.max()-X.min() > x)
  • X.groupby(["key1"]).apply(minimumFb(10))

Apply function with bucket analysis

  • quartiles = X.cut(X.Values1, 2)
  • X.groupby(quartiles).apply(lambda x:x.min())

Pivot Table

  • X.pivot_table(values="AvgSizeOfTrip" , index = "Gender" , columns="Group" )

Pivot table with aggregation

  • X.pivot_table (values="AvgSizeOfTrip" , index = "Gender" , columns="Group", aggfunc=np.sum)
  • X.pivot_table (values="AvgSizeOfTrip" , index = "Gender" , columns="Group", aggfunc={"Col1":np.sum, 'Col2':np.mean}) - different column with different operation.

Pivot table with count

  • pd.crosstab(X.gender, X.group)  -- This just counts the frequency.




Share:

0 comments:

Post a Comment

Feature Top (Full Width)

Pageviews

Search This Blog