Businesses investing in Big Data solutions want technologies that complement their current infrastructures, not replace them. The Enterprise Information Management systems they have in place are still useful, but they don’t want to consume the capacity of those systems, or overextend them, to accommodate new types of data.
They want to be able to take advantage of new technologies – such as Hadoop, advanced analytics, and data integration and data quality tools – that can work with their existing infrastructures.
80 percent of the work in any Big Data project is data integration and data quality: you need to access, integrate, cleanse, and prepare the data. That leaves only 20 percent of the time to actually analyze it.
John Haddad, Senior Director of Big Data Product Marketing at Informatica, suggests the following steps to deal with your data and gain an advantage from it.
- The first thing is that businesses need to be able to access any and all types of data. They need universal access – whether the data resides externally or on premises, in the cloud, on legacy systems, or in sensor devices.
- The next thing to consider is not just the ability to access the data, but whether you can bring it into your data management system without creating another bottleneck. If data is being created in real time, it should be ingested in real time.
- If you have data that resides in a relational database and you need it replicated into your analytics infrastructure, you need tools that make it easy to replicate a schema or a subset of a schema. If data only needs to be loaded periodically, you can load it in batch; likewise, if the data changes frequently, you need to be able to capture and ingest those changes efficiently and in real time (a minimal change-capture sketch follows this list).
- The tools you use to build your data pipeline should provide unlimited scalability, so that you don’t have to rebuild and refactor everything over and over again as the data volume continues to grow.
- Once you have your data in Hadoop or whatever Big Data platform you are using, you need to transform, integrate, cleanse, and normalize it. This requires ETL and data quality tools (a small cleansing sketch appears after this list).
- Further, because data-driven applications are becoming mission critical, you need to have high availability. That means zero downtime.
- Big Data technologies are evolving and changing quickly. You can’t afford to rebuild your pipeline and data analytics every time a new technology comes along. To minimize your risk in adopting new technologies, you need tools, such as the Informatica Vibe Virtual Data Machine, that enable you to design and build once and deploy anywhere (the last sketch below illustrates the idea).
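
To make the replication and ingestion points concrete, here is a deliberately simplified sketch that polls a relational table for rows modified since a high-water-mark timestamp and appends them to a staging store. The `orders` table, the `updated_at` column, and the file paths are hypothetical assumptions for illustration; production tools typically read the database transaction log rather than polling.

```python
import sqlite3

# Hypothetical names throughout: the "orders" table, "updated_at" column,
# and database paths are illustrative, not part of any vendor tool.
SOURCE_DB = "source.db"    # operational relational database
STAGING_DB = "staging.db"  # landing zone feeding the analytics platform

def replicate_changes(last_seen: str) -> str:
    """Pull rows modified since `last_seen` and append them to staging.

    A timestamp high-water mark is the simplest form of change capture;
    real change-data-capture tools usually tail the transaction log.
    """
    src = sqlite3.connect(SOURCE_DB)
    dst = sqlite3.connect(STAGING_DB)
    dst.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER, amount REAL, updated_at TEXT)"
    )
    rows = src.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    if rows:
        dst.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        dst.commit()
        last_seen = rows[-1][2]  # advance the high-water mark
    src.close()
    dst.close()
    return last_seen
```

Run on a schedule, this gives periodic batch loading; shrink the polling interval and it approximates the real-time ingestion described above.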
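The transform, cleanse, and normalize step is exactly the kind of work ETL and data quality tools automate. As a minimal hand-rolled sketch, assuming a hypothetical customer record layout, the code below trims whitespace, lowercases emails, normalizes several date formats to ISO 8601, and drops duplicates and rows missing a key.

```python
from datetime import datetime

# Illustrative record layout: the field names ("customer_id", "email",
# "signup_date") are assumptions for this example, not a real schema.

def cleanse(records):
    """Trim, normalize, and de-duplicate raw records before loading."""
    seen = set()
    for rec in records:
        cust = str(rec.get("customer_id", "")).strip()
        email = rec.get("email", "").strip().lower()
        if not cust or cust in seen:
            continue  # drop rows missing a key, and duplicates
        seen.add(cust)
        # Normalize a few common date formats to ISO 8601.
        raw_date = rec.get("signup_date", "").strip()
        date = None
        for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y"):
            try:
                date = datetime.strptime(raw_date, fmt).date().isoformat()
                break
            except ValueError:
                continue
        yield {"customer_id": cust, "email": email, "signup_date": date}

raw = [
    {"customer_id": " 42 ", "email": "A@Example.com ", "signup_date": "03/15/2014"},
    {"customer_id": "42", "email": "a@example.com", "signup_date": "2014-03-15"},
    {"customer_id": "", "email": "x@example.com", "signup_date": "bad"},
]
print(list(cleanse(raw)))
# -> one clean record; the duplicate and the keyless row are dropped
```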
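The article names the Informatica Vibe Virtual Data Machine as one tool for design-once, deploy-anywhere. The sketch below is not Vibe’s API; it is a generic illustration of the underlying idea, under the assumption that the pipeline is described as engine-neutral steps that can be bound to whichever execution backend is current.

```python
from typing import Callable, Iterable

# Engine-neutral pipeline definition: a list of plain record-to-record
# functions. Nothing here mentions Hadoop, Spark, or any other engine.
Step = Callable[[dict], dict]

PIPELINE: list[Step] = [
    lambda r: {**r, "amount": float(r["amount"])},        # type coercion
    lambda r: {**r, "currency": r.get("currency", "USD")}, # default value
]

def run_local(records: Iterable[dict]) -> list[dict]:
    """Local backend: apply every step to every record in-process."""
    out = []
    for rec in records:
        for step in PIPELINE:
            rec = step(rec)
        out.append(rec)
    return out

# A distributed backend would reuse the same PIPELINE unchanged, e.g.
# (hypothetical, requires PySpark and functools):
#   rdd.map(lambda r: functools.reduce(lambda acc, s: s(acc), PIPELINE, r))

print(run_local([{"amount": "19.99"}]))
# -> [{'amount': 19.99, 'currency': 'USD'}]
```

When a new engine comes along, only the backend binding changes; the pipeline logic is written once.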