How do big companies store and handle their big data?
- What is big data?
As the name itself suggests, big data means a large volume of data: a collection of data that is huge in size and growing day by day.
Take, for example, data generation in a stock exchange: the New York Stock Exchange generates about one terabyte of new trade data per day, and social media sites like Facebook, Twitter and LinkedIn have hundreds of millions of users posting millions of pieces of content. As these industries grow, storing such big data has become a real problem. And since the data is big, storage is not the only problem: another problem is the velocity at which data must be shared and processed. When data is this huge, we can hardly imagine how much time it would take to process it properly and keep it flowing in a continuous pipeline.
LinkedIn, a well-known professional networking platform, has reported over 660 million users spread across 200 countries. On average, a single LinkedIn user spends 10 to 20 minutes on the site, and millions of users and companies post millions of updates and pieces of content. So how do they store and manage such a huge amount of data?
Jay Kreps of LinkedIn explained how they process data at a recent Hadoop Summit. They run batch processing on a daily basis using Hadoop. To achieve good performance they need to operate on data at a large scale, so they rely on large-scale distributed storage and cluster processing with Hadoop. LinkedIn updates its live servers by pushing data built in Hadoop into Voldemort, LinkedIn's NoSQL storage engine. They built a unique structure into their Hadoop pipeline that produces multi-TB data structures, using cluster computing resources for faster responses; it takes LinkedIn about 90 minutes to build 900 GB of data on a 45-node development cluster.
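As a rough illustration, here is a minimal sketch of the kind of daily batch job such a Hadoop pipeline might run, written as a Hadoop Streaming mapper and reducer in Python. The input format and the profile-view counting task are hypothetical examples for illustration, not LinkedIn's actual jobs or schema.

```python
#!/usr/bin/env python3
# mapper.py -- emits one (member_id, 1) pair per profile-view event.
# Hypothetical input format: member_id <TAB> page_url <TAB> timestamp
import sys

for line in sys.stdin:
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 3:
        continue  # skip malformed records
    member_id = parts[0]
    print(f"{member_id}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- sums the counts for each member_id.
# Hadoop Streaming sorts mapper output by key, so equal keys
# arrive as one contiguous group.
import sys

current_key, total = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t")
    if key != current_key:
        if current_key is not None:
            print(f"{current_key}\t{total}")
        current_key, total = key, 0
    total += int(value)
if current_key is not None:
    print(f"{current_key}\t{total}")
```

With Hadoop installed, a job like this could be launched with the streaming jar, for example `hadoop jar hadoop-streaming.jar -input /logs/views -output /out/view_counts -mapper mapper.py -reducer reducer.py` (the paths here are made up). The aggregated output is the kind of precomputed result that a pipeline can then bulk-load into a read-only serving store such as Voldemort.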
Hadoop's storage layer, the Hadoop Distributed File System (HDFS), lets you build a system that works on the principle of parallel processing: the data from the main server is split up, stored across other servers, and processed in parallel. That is, you can tie commodity computers together with the main server and treat them as a single unit, where the clustered servers read the dataset in parallel, which makes processing cheaper than on a single large server.
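To make that principle concrete, here is a toy sketch in Python. It does not use HDFS at all; it only simulates the idea of splitting a dataset into chunks and processing the chunks in parallel before merging the results, with made-up trade records as input.

```python
#!/usr/bin/env python3
# Toy illustration of the split-and-process-in-parallel idea behind HDFS clusters.
# The "splits" here are chunks of an in-memory list; in a real cluster each split
# would be a large block of a file stored on a different machine.
from multiprocessing import Pool

def count_trades(split):
    """Process one split independently: count trade records per symbol."""
    counts = {}
    for record in split:
        symbol = record.split(",")[0]
        counts[symbol] = counts.get(symbol, 0) + 1
    return counts

def merge(partials):
    """Combine the per-split results into a single answer (the 'reduce' step)."""
    total = {}
    for part in partials:
        for symbol, n in part.items():
            total[symbol] = total.get(symbol, 0) + n
    return total

if __name__ == "__main__":
    # Hypothetical trade records: "symbol,price"
    data = ["AAPL,180", "MSFT,410", "AAPL,181", "GOOG,140", "MSFT,409", "AAPL,182"]
    n_workers = 3
    chunk = len(data) // n_workers
    splits = [data[i * chunk:(i + 1) * chunk] for i in range(n_workers - 1)]
    splits.append(data[(n_workers - 1) * chunk:])

    with Pool(n_workers) as pool:      # each worker plays the role of a cluster node
        partial_counts = pool.map(count_trades, splits)

    print(merge(partial_counts))       # {'AAPL': 3, 'MSFT': 2, 'GOOG': 1}
```

The design choice is the same one Hadoop makes at a much larger scale: move the computation to wherever a chunk of the data lives, let every chunk be processed independently, and only merge small summaries at the end.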
And this is how a need can give birth to new inventions and new technologies.
And this is how multinational companies handle their big data.