Friday, 18 April 2014

Hadoop Summit 2014 review

Introduction

A couple of weeks ago I attended the Hadoop Summit 2014 in Amsterdam. I enjoyed most of the sessions, learned new things, and gained more insight into how Hadoop is used. The world of data is changing, and according to some speakers at the Hadoop Summit it will change fast. The data paradigm is shifting from Schema-on-Write (RDBMS) to Schema-on-Read (Hadoop), and this shift is happening right now. For certain types of data, most of them unstructured or semi-structured, Schema-on-Read will gain popularity at the expense of Schema-on-Write solutions.
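To make the difference between the two paradigms a bit more concrete, here is a minimal sketch in plain Python. It is not Hadoop code and nothing like it was shown at the summit; the sample events, the field names and the functions `schema_on_write` and `schema_on_read` are all made up for illustration. The idea is only that Schema-on-Write validates records against a fixed schema before storing them, while Schema-on-Read stores everything raw and applies a schema when the data is queried.

```python
import json

# Hypothetical raw events; the fields differ per record (semi-structured data).
RAW_EVENTS = [
    '{"user": "alice", "amount": "12.50", "country": "NL"}',
    '{"user": "bob", "amount": "7"}',
    '{"user": "carol", "amount": "3.25", "device": "mobile"}',
]

# Assumed fixed table layout for the Schema-on-Write case.
WRITE_SCHEMA = ("user", "amount", "country")


def schema_on_write(raw_lines):
    """Validate each record against the fixed schema before it is stored.

    Records that do not match the schema exactly are rejected at load time.
    """
    table, rejected = [], []
    for line in raw_lines:
        record = json.loads(line)
        if set(record) == set(WRITE_SCHEMA):
            table.append((record["user"], float(record["amount"]), record["country"]))
        else:
            rejected.append(line)
    return table, rejected


def schema_on_read(raw_lines, fields):
    """Keep the raw lines as they are and project the wanted fields at query time.

    Fields missing from a record come back as None instead of causing a rejection.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield tuple(record.get(name) for name in fields)


if __name__ == "__main__":
    table, rejected = schema_on_write(RAW_EVENTS)
    print("stored:", table)
    print("rejected:", len(rejected))
    print("read later:", list(schema_on_read(RAW_EVENTS, ("user", "device"))))
```

In the Schema-on-Write case the second and third event never make it into the table. In the Schema-on-Read case all raw lines are kept, and a question that nobody thought of at load time (which device did the user have?) can still be asked afterwards.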

It's not only about new technology replacing old technology; it's also about creating new businesses with new technology. Hadoop opens up new business opportunities and new ways of developing business models, based on Hadoop together with RDBMS solutions (for now).



DataOS

With the launch of YARN (Yet Another Resource Negotiator) the name DataOS appeared: YARN is referred to as the DataOS. In 2012, YARN became a sub-project of the Apache Hadoop project. YARN is a rewrite that decouples resource management and scheduling from the MapReduce data processing component. This enables Hadoop to support more processing approaches and a broader array of applications.
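The decoupling described above can be illustrated with a toy example. This is only a sketch in plain Python and has nothing to do with the real YARN API: the class `ToyResourceManager`, the slot counts and the two example applications are hypothetical. It shows the general idea that one component hands out cluster resources, while different kinds of processing (batch-style and stream-style) ask it for capacity in the same way.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for a cluster-wide resource manager (not the YARN API)."""

    free_slots: int
    allocations: dict = field(default_factory=dict)

    def request(self, app_name, slots):
        # Grant as many slots as are still free, up to the requested number.
        granted = min(slots, self.free_slots)
        self.free_slots -= granted
        self.allocations[app_name] = self.allocations.get(app_name, 0) + granted
        return granted

    def release(self, app_name):
        # Give all slots held by the application back to the pool.
        self.free_slots += self.allocations.pop(app_name, 0)


def run_batch_word_count(rm, lines):
    """Batch-style job: asks for slots, counts words over the full input, releases."""
    slots = rm.request("batch-word-count", 2)
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    rm.release("batch-word-count")
    return slots, counts


def run_stream_filter(rm, events, keyword):
    """Stream-style job: asks the same manager for slots, filters events one by one."""
    slots = rm.request("stream-filter", 1)
    matches = [event for event in events if keyword in event]
    rm.release("stream-filter")
    return slots, matches


if __name__ == "__main__":
    rm = ToyResourceManager(free_slots=4)
    print(run_batch_word_count(rm, ["hadoop yarn hadoop"]))
    print(run_stream_filter(rm, ["disk error", "all fine"], "error"))
```

Neither job knows anything about how the slots are managed, and the resource manager knows nothing about word counts or filters. That separation is what makes room for processing models other than MapReduce on the same cluster.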





RDBMS living together with Hadoop

How about RDBMS and Hadoop? With YARN taking over resource management from MapReduce, Hadoop is no longer only a batch-oriented system but also a real-time solution. Storm is an example of this. I think that future developments of Hadoop will lead to a change in the design of the Enterprise Data Warehouse. But for now, in the reference architecture of Hortonworks, RDBMS and Hadoop live together and pull data from each other. The photo below shows this (sorry for the bad photo).



This is also the case with the PDW v2 solution of Microsoft. In this appliance Microsoft already sells Hadoop as an integrated part. Polybase is the layer that abstracts the Hadoop nodes and the MS nodes. So in one solution you get an RDBMS and Hadoop behind a uniform layer. That's nice.



Conclusions

The open source community around Hadoop has become huge, and practically every vendor embraces this new technology. Microsoft gave up their Hadoop look-alike software (Bing), embraced Hadoop, and integrated it into PDW v2 (the modern data warehouse). There are new Hadoop developments like YARN, Stinger, Tez and Storm.

As for integrating Hadoop into your Enterprise Data Warehouse, I can imagine that Hadoop will become the base for capturing raw data in a (historical) staging area. Hadoop is about storing raw data, and it's cheap. On top of that you can build a business data warehouse and a discovery and analytical platform for analyzing structured and unstructured data.

Greetz,

Hennie
