This month, I took a look at Hortonworks' Data Platform (HDP) and DataFlow (HDF).
I partitioned out one 18 GB server to test HDP, which will handle analytics and visualizations, and another 18 GB server to test HDF, which will handle real-time streaming with Kafka, Storm, and NiFi. Through playing with HDF, I've become more interested in applications like IoT and price prediction. I may start to offer analytics services using geodata (for marketing, real estate, events, governments, etc.) and longitudinal data (stocks, cryptos, sentiment over time, etc.). I've already laid out the pipelines with Apache NiFi, but I want to play around a bit more.
HDP's Ambari comes with lots of nice visualization tools like Grafana and Superset. Superset in particular was very interesting (after some configuration). With Superset, you can import data in various formats and execute SQL-like queries. In particular, if you configure the Mapbox API, you can get graphs like this! You can also share password-protected dashboards!
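For reference, the Mapbox setup I mentioned comes down to a one-line addition to Superset's `superset_config.py` (the token value below is a placeholder; you'd use your own Mapbox access token):

```python
# superset_config.py -- Superset picks this file up at startup if it's on
# the PYTHONPATH. Setting MAPBOX_API_KEY enables the map-based chart types.
# The token here is a placeholder, not a real key.
MAPBOX_API_KEY = "pk.your-mapbox-access-token"
```

After restarting Superset, the geospatial visualizations can render against Mapbox tiles.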
By doing all this, I've definitely gained a level. I'm starting to see how these tools can provide actionable insights that are easy to share with non-technical audiences. It's slow and steady, but I'm definitely feeling the gains :) Now I'm looking into Ansible: if I want to scale out, I'd like to automate the configuration/installation process that would otherwise take me hours. (Though I may just stick to shell scripts, since I only have two things to automate.)
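To give a feel for the Ansible route, here's a minimal sketch of a playbook for repeating the server prep on new machines. The host group `hdp-nodes` and the package list are made up for illustration; the real install steps would follow the Ambari/HDP documentation.

```yaml
# Hypothetical playbook: prepare fresh hosts before an Ambari/HDP install.
# "hdp-nodes" is an assumed inventory group, not a real one from my setup.
- hosts: hdp-nodes
  become: yes
  tasks:
    - name: Install prerequisite packages
      yum:
        name:
          - ntp
          - wget
        state: present

    - name: Ensure ntpd is running (Ambari expects clocks to be in sync)
      service:
        name: ntpd
        state: started
        enabled: yes
```

The shell-script equivalent would be a handful of `ssh` + `yum` commands, which is probably fine for just two servers.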
If you have any problems you'd like me to solve, just send me a message through Twitter! Data science is so fun, especially once you get things working!