opensearch_to_pandas
- etl.opensearch_to_pandas(oml_df: opensearch_py_ml.DataFrame, show_progress: bool = False) → pandas.DataFrame
Convert an opensearch_py_ml.DataFrame to a pandas.DataFrame.
Note: this loads the entire OpenSearch index into in-core pandas.DataFrame structures. For large indices this can create significant load on the OpenSearch cluster and require significant memory.
Parameters
- oml_df: opensearch_py_ml.DataFrame
The source opensearch_py_ml.DataFrame referencing the OpenSearch index.
- show_progress: bool
Whether to print the progress of the operation to stdout. Defaults to False.
Returns
- pandas.DataFrame
A pandas.DataFrame containing all rows and columns of the opensearch_py_ml.DataFrame.
Examples
>>> import opensearch_py_ml as oml
>>> from tests import OPENSEARCH_TEST_CLIENT
>>> oml_df = oml.DataFrame(OPENSEARCH_TEST_CLIENT, 'flights').head()
>>> type(oml_df)
<class 'opensearch_py_ml.dataframe.DataFrame'>
>>> oml_df
   AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
0      841.265642      False  ...         0 2018-01-01 00:00:00
1      882.982662      False  ...         0 2018-01-01 18:27:00
2      190.636904      False  ...         0 2018-01-01 17:11:14
3      181.694216       True  ...         0 2018-01-01 10:33:28
4      730.041778      False  ...         0 2018-01-01 05:13:00
[5 rows x 27 columns]
Convert the opensearch_py_ml.DataFrame to a pandas.DataFrame (note: this loads everything the source DataFrame references into core memory)
>>> pd_df = oml.opensearch_to_pandas(oml_df)
>>> type(pd_df)
<class 'pandas.core.frame.DataFrame'>
>>> pd_df
   AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
0      841.265642      False  ...         0 2018-01-01 00:00:00
1      882.982662      False  ...         0 2018-01-01 18:27:00
2      190.636904      False  ...         0 2018-01-01 17:11:14
3      181.694216       True  ...         0 2018-01-01 10:33:28
4      730.041778      False  ...         0 2018-01-01 05:13:00
[5 rows x 27 columns]
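Because the conversion pulls every referenced row into memory, it can help to check the row count before converting. The following is a minimal sketch, not part of the library: `to_pandas_guarded`, `max_rows`, and `convert` are hypothetical names, and it assumes the source DataFrame exposes a pandas-style `shape` attribute of the form `(rows, columns)`.

```python
from typing import Any, Callable, Optional


def to_pandas_guarded(
    oml_df: Any,
    max_rows: int = 100_000,
    convert: Optional[Callable[[Any], Any]] = None,
) -> Any:
    """Convert oml_df to pandas only if its row count is at most max_rows.

    Hypothetical helper: assumes oml_df.shape is a (rows, columns) tuple.
    """
    n_rows = oml_df.shape[0]
    if n_rows > max_rows:
        raise ValueError(
            f"refusing to load {n_rows} rows into memory (limit: {max_rows})"
        )
    if convert is None:
        # Imported lazily so the helper can be exercised without a cluster.
        import opensearch_py_ml as oml

        convert = oml.opensearch_to_pandas
    return convert(oml_df)
```

With the 5-row `oml_df` from the example above, `to_pandas_guarded(oml_df)` would simply delegate to `oml.opensearch_to_pandas`; with a lower `max_rows` than the index size it raises `ValueError` before any data is fetched.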
Convert opensearch_py_ml.DataFrame to pandas.DataFrame and show progress every 10000 rows
>>> pd_df = oml.opensearch_to_pandas(oml.DataFrame(OPENSEARCH_TEST_CLIENT, 'flights'), show_progress=True)
2020-01-29 12:43:36.572395: read 10000 rows
2020-01-29 12:43:37.309031: read 13059 rows
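Since the progress messages go to stdout, a caller that wants them in a log rather than on the console could capture them. This is a hedged sketch using only the standard library; `run_and_capture_stdout` is a hypothetical helper, and it assumes the progress lines are written through Python's `sys.stdout`.

```python
import contextlib
import io
from typing import Any, Callable, List, Tuple


def run_and_capture_stdout(
    func: Callable[..., Any], *args: Any, **kwargs: Any
) -> Tuple[Any, List[str]]:
    """Call func and return (result, list of lines it printed to stdout)."""
    buffer = io.StringIO()
    # Temporarily point sys.stdout at an in-memory buffer.
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue().splitlines()
```

A possible call would be `pd_df, lines = run_and_capture_stdout(oml.opensearch_to_pandas, oml_df, show_progress=True)`, after which `lines` holds the "read N rows" messages for logging.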
See Also
opensearch_py_ml.pandas_to_opensearch: Create an opensearch_py_ml.DataFrame from a pandas.DataFrame