Do you work with esoteric data that has no schema, no human-readable output, or inconsistent interfaces? Is your data only readable from C++ classes with a secret encoding? Let me demonstrate how to use Python and Apache Arrow to quickly read your data into pandas and analyze it elegantly.
One common scenario in large enterprise systems is esoteric or inconsistently structured data. This data is crucial to a firm’s success, but it cannot be easily read, analyzed, or extracted. The data might not have a schema and might exist only in memory. An example is C++ code whose classes have strange, inconsistent interfaces and no fast, human-readable serialization. Programmers get stuck when they need to test and analyze this data. A better solution is to migrate the data to a common schema-based format and use Python data science libraries to analyze it.
Python (with clang) and Apache Arrow enable you to quickly and easily transform data into the Apache Parquet format, which you can then analyze with PyArrow and pandas. Attendees will learn how to: