opensearch_to_pandas

etl.opensearch_to_pandas(oml_df: opensearch_py_ml.DataFrame, show_progress: bool = False) → DataFrame

Convert an opensearch_py_ml.DataFrame to a pandas.DataFrame.

Note: this loads the entire OpenSearch index into in-memory pandas.DataFrame structures. For large indices, this can put heavy load on the OpenSearch cluster and require a large amount of client memory.
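Because the conversion is all-or-nothing, it can help to check the size of the data before calling it. A minimal sketch of such a guard, assuming you have some way to count rows first (for example, an OpenSearch _count request against the index); guarded_convert, DEFAULT_MAX_ROWS, and both callables are hypothetical names, not part of opensearch_py_ml:

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical row budget; tune it to the memory available on the client.
DEFAULT_MAX_ROWS = 1_000_000


def guarded_convert(
    count_rows: Callable[[], int],
    convert: Callable[[], T],
    max_rows: int = DEFAULT_MAX_ROWS,
) -> T:
    """Call `convert()` only if `count_rows()` reports at most `max_rows` rows.

    Both callables are placeholders: `count_rows` could wrap a document count
    of the index and `convert` could wrap a call to opensearch_to_pandas.
    """
    n = count_rows()
    if n > max_rows:
        raise ValueError(
            f"refusing to load {n} rows into memory (limit is {max_rows}); "
            "filter or sample the DataFrame first"
        )
    return convert()


# Stand-in callables so the sketch runs without an OpenSearch cluster.
print(guarded_convert(lambda: 5, lambda: "converted 5 rows", max_rows=10))
# prints: converted 5 rows
```

In real use, convert would be something like lambda: oml.opensearch_to_pandas(oml_df); passing callables keeps the count and the conversion lazy, so nothing is fetched when the limit is exceeded.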

Parameters

oml_df: opensearch_py_ml.DataFrame: The source opensearch_py_ml.DataFrame referencing the OpenSearch index.
show_progress: bool: Whether to print progress to stdout. Defaults to False.

Returns

pandas.DataFrame: A pandas.DataFrame containing all rows and columns of the opensearch_py_ml.DataFrame.
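To make "all rows and columns" concrete, here is a minimal sketch of how OpenSearch search hits could be reshaped into the column-oriented form that pandas.DataFrame accepts. It assumes each hit has the standard "_id" and "_source" keys; hits_to_columns is a hypothetical helper for illustration, not opensearch_py_ml's actual implementation, which also applies the index mappings.

```python
from typing import Any, Dict, List, Optional, Tuple


def hits_to_columns(
    hits: List[Dict[str, Any]],
) -> Tuple[List[Optional[str]], Dict[str, List[Any]]]:
    """Turn search hits into (index, columns) in a shape pandas.DataFrame accepts.

    Each hit is assumed to look like {"_id": ..., "_source": {...}}.
    Fields missing from a document become None in that row.
    """
    # Collect column names across all documents, keeping first-seen order.
    names = list(dict.fromkeys(key for hit in hits for key in hit.get("_source", {})))
    # One index label per hit, taken from the document _id.
    index = [hit.get("_id") for hit in hits]
    # One list per column, with one entry per hit.
    columns = {
        name: [hit.get("_source", {}).get(name) for hit in hits] for name in names
    }
    return index, columns
```

With pandas installed, pandas.DataFrame(columns, index=index) would then build the frame. Every hit contributes a full row, which is why the memory cost grows with the size of the index.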

Examples

>>> import opensearch_py_ml as oml
>>> from tests import OPENSEARCH_TEST_CLIENT

>>> oml_df = oml.DataFrame(OPENSEARCH_TEST_CLIENT, 'flights').head()
>>> type(oml_df)
<class 'opensearch_py_ml.dataframe.DataFrame'>
>>> oml_df
   AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
0      841.265642      False  ...         0 2018-01-01 00:00:00
1      882.982662      False  ...         0 2018-01-01 18:27:00
2      190.636904      False  ...         0 2018-01-01 17:11:14
3      181.694216       True  ...         0 2018-01-01 10:33:28
4      730.041778      False  ...         0 2018-01-01 05:13:00

[5 rows x 27 columns]

Convert the opensearch_py_ml.DataFrame to a pandas.DataFrame (note: this loads every row the DataFrame references into memory; here, head() limits the result to 5 rows)

>>> pd_df = oml.opensearch_to_pandas(oml_df)
>>> type(pd_df)
<class 'pandas.core.frame.DataFrame'>
>>> pd_df
   AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
0      841.265642      False  ...         0 2018-01-01 00:00:00
1      882.982662      False  ...         0 2018-01-01 18:27:00
2      190.636904      False  ...         0 2018-01-01 17:11:14
3      181.694216       True  ...         0 2018-01-01 10:33:28
4      730.041778      False  ...         0 2018-01-01 05:13:00

[5 rows x 27 columns]

Convert the full flights index to a pandas.DataFrame, showing progress every 10000 rows

>>> pd_df = oml.opensearch_to_pandas(oml.DataFrame(OPENSEARCH_TEST_CLIENT, 'flights'), show_progress=True) 
2020-01-29 12:43:36.572395: read 10000 rows
2020-01-29 12:43:37.309031: read 13059 rows
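The timestamped lines in this example suggest that progress is reported each time another 10000 rows have been read, followed by a final total. The sketch below reproduces that output format over an in-memory iterable of row batches. It illustrates the reporting pattern under that assumption and is not opensearch_py_ml's actual code; collect_rows, PROGRESS_EVERY, and the log parameter are hypothetical.

```python
from datetime import datetime
from typing import Any, Callable, Dict, Iterable, List, Optional

# Hypothetical reporting interval, matching the "every 10000 rows" example.
PROGRESS_EVERY = 10_000


def collect_rows(
    batches: Iterable[List[Dict[str, Any]]],
    show_progress: bool = False,
    every: int = PROGRESS_EVERY,
    log: Callable[[str], None] = print,
) -> List[Dict[str, Any]]:
    """Accumulate rows from an iterable of batches, optionally logging progress."""
    rows: List[Dict[str, Any]] = []
    last_logged: Optional[int] = None
    for batch in batches:
        for row in batch:
            rows.append(row)
            # Log a timestamped line each time another `every` rows have been read.
            if show_progress and len(rows) % every == 0:
                log(f"{datetime.now()}: read {len(rows)} rows")
                last_logged = len(rows)
    # Report the final total unless the last progress line already did.
    if show_progress and last_logged != len(rows):
        log(f"{datetime.now()}: read {len(rows)} rows")
    return rows
```

Taking log as a callable keeps the sketch easy to test; the default, print, writes to stdout in the same "timestamp: read N rows" shape as the output above.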

See Also

opensearch_py_ml.pandas_to_opensearch: Create an opensearch_py_ml.DataFrame from a pandas.DataFrame