DataFrame.groupby
- DataFrame.groupby(by: str | List[str] | None = None, dropna: bool = True) -> DataFrameGroupBy
Group the DataFrame by one or more columns so that aggregations can be computed per group.
Parameters
- by:
Column name or list of column names to group by. Currently, only a single column or a list of columns is accepted.
- dropna: default True
If True and the group keys contain NA values, those NA values are dropped together with their rows/columns.
Returns
opensearch_py_ml.groupby.DataFrameGroupBy
See Also
Examples
>>> import opensearch_py_ml as oml
>>> from tests import OPENSEARCH_TEST_CLIENT
>>> oml_flights = oml.DataFrame(
...     OPENSEARCH_TEST_CLIENT, 'flights',
...     columns=["AvgTicketPrice", "Cancelled", "dayOfWeek", "timestamp", "DestCountry"]
... )
>>> oml_flights.groupby(["DestCountry", "Cancelled"]).agg(["min", "max"], numeric_only=True)
                      AvgTicketPrice              dayOfWeek
                                 min          max       min  max
DestCountry Cancelled
AE          False         110.799911  1126.148682       0.0  6.0
            True          132.443756   817.931030       0.0  6.0
AR          False         125.589394  1199.642822       0.0  6.0
            True          251.389603  1172.382568       0.0  6.0
AT          False         100.020531  1181.835815       0.0  6.0
...                              ...          ...       ...  ...
TR          True          307.915649   307.915649       0.0  0.0
US          False         100.145966  1199.729004       0.0  6.0
            True          102.153069  1192.429932       0.0  6.0
ZA          False         102.002663  1196.186157       0.0  6.0
            True          121.280296  1175.709961       0.0  6.0

[63 rows x 4 columns]
>>> oml_flights.groupby(["DestCountry", "Cancelled"]).mean(numeric_only=True)
                      AvgTicketPrice  dayOfWeek
DestCountry Cancelled
AE          False         643.956793   2.717949
            True          388.828809   2.571429
AR          False         673.551677   2.746154
            True          682.197241   2.733333
AT          False         647.158290   2.819936
...                              ...        ...
TR          True          307.915649   0.000000
US          False         598.063146   2.752014
            True          579.799066   2.767068
ZA          False         636.998605   2.738589
            True          677.794078   2.928571

[63 rows x 2 columns]
>>> oml_flights.groupby(["DestCountry", "Cancelled"]).min(numeric_only=False)
                      AvgTicketPrice  dayOfWeek           timestamp
DestCountry Cancelled
AE          False         110.799911          0 2018-01-01 19:31:30
            True          132.443756          0 2018-01-06 13:03:25
AR          False         125.589394          0 2018-01-01 01:30:47
            True          251.389603          0 2018-01-01 02:13:17
AT          False         100.020531          0 2018-01-01 05:24:19
...                              ...        ...                 ...
TR          True          307.915649          0 2018-01-08 04:35:10
US          False         100.145966          0 2018-01-01 00:06:27
            True          102.153069          0 2018-01-01 09:02:36
ZA          False         102.002663          0 2018-01-01 06:44:44
            True          121.280296          0 2018-01-04 00:37:01

[63 rows x 3 columns]
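
The effect of the dropna parameter described under Parameters can be illustrated without a cluster. The following is a minimal pure-Python sketch, not the library's implementation: group_rows is a hypothetical helper, the rows are made-up sample data, and it assumes that NA is represented by None and that a row is dropped when any of its group keys is NA.

```python
from collections import defaultdict


def group_rows(rows, by, dropna=True):
    """Hypothetical helper: group dict rows by the columns listed in `by`."""
    groups = defaultdict(list)
    for row in rows:
        # The group key is the tuple of values in the `by` columns.
        key = tuple(row.get(col) for col in by)
        # Assumption: with dropna=True, a row whose key contains NA (None) is skipped.
        if dropna and any(value is None for value in key):
            continue
        groups[key].append(row)
    return dict(groups)


# Made-up sample rows; the third row has an NA group key.
rows = [
    {"DestCountry": "AE", "Cancelled": False, "AvgTicketPrice": 110.0},
    {"DestCountry": "AE", "Cancelled": False, "AvgTicketPrice": 130.0},
    {"DestCountry": None, "Cancelled": True, "AvgTicketPrice": 200.0},
]

kept = group_rows(rows, ["DestCountry", "Cancelled"], dropna=True)
everything = group_rows(rows, ["DestCountry", "Cancelled"], dropna=False)

# Per-group minimum of AvgTicketPrice, analogous to .min() on the grouped frame.
min_price = {
    key: min(r["AvgTicketPrice"] for r in members) for key, members in kept.items()
}
```

With dropna=True the row whose DestCountry is None forms no group; with dropna=False it appears under the key (None, True).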