Big data in China

Chapter 4 Big data, the part you don't know yet

Chapter 4 Big data, the part you don't know yet (2)
If you use big data correctly, collecting, organizing, and analyzing it and then making predictions, it will give you intelligence and insights you could once only dream of. Its control is so powerful that it not only puts the most information at your fingertips but also lets you steer with that information: protecting yourself from threats, protecting your business, resolving potential problems, and improving efficiency through self-inspection and optimization.

The world now generates more than 3 EB of data every day, and we have reason to believe this number will keep rising as ownership and use of the Internet and mobile platforms grow. We already know from the PRISM scandal that the U.S. government is trying to exploit this data in every possible way, absorbing it with big data technology. Beyond legitimate uses such as counterterrorism, it is also trying to monitor and control its own people.

An American consulting firm predicts that the United States will need more than 100,000 data analysis professionals and more than 1 million managers who can use data. Clearly, big data applications have become widespread in the United States. Americans use big data extensively on social media, on mobile networks, and in public-opinion analysis to influence and control voters, manage information, and monitor hostile countries.

The earlier research on big data begins and the better the preparation, the stronger the control will be. There is no doubt that the Americans are at the forefront.

The basis of control is managing these large amounts of unstructured data well. Managed properly, they let us dig out useful information and drive management innovation in enterprises and governments. Forward-looking companies are collecting more and more data from internal sources and cloud infrastructure, building data centers under their own control, and hiring and training their own big data engineers. But many more businesses are still hovering outside the door. These laggards are destined to fall far behind: without timely, useful, large-scale information and the insights it yields, they naturally cannot make wise decisions.

In 2013, we completed a big data survey in East Asia together with the security firm EOA North America. The survey covered 300 senior executives from all walks of life in China. It found that 49% of Chinese companies are already concerned or very concerned about big data management, but 38% do not understand what big data is and remain confused by it; another 27% said they knew little about it, had only a superficial understanding, or were still watching from the sidelines.

We also found that 76% of Chinese companies do not use proper tools to manage the data of their own IT systems, relying instead on standalone systems or systems that lack interconnection. Some companies even use spreadsheets to record and manage data.

The survey results are discouraging, but the good news is that we are seeing positive growth. Compared with 2012 and earlier years, the number of Chinese companies investing in big data is increasing at an astonishing rate. As more and more companies feel the benefits of immersing themselves in it, people are no longer content to wait and see; they are jumping in right away.

One of the keys to big data control is "log management": integrating all of an organization's own data, such as enterprise logs, building an index library, and then designing an interface that users find easy to understand and use. To make full use of the data, it must be correlated and standardized, and the system must be able to report, provide feedback, and prevent intrusions. Every successful e-commerce website and customer-facing corporate website does this.
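To make the log management idea more concrete, here is a minimal Python sketch, not any vendor's product. It assumes two hypothetical log formats from different hosts, standardizes both into one record schema, builds a small inverted index (a stand-in for the "index library"), supports simple AND searches, and correlates failed logins across hosts as a crude intrusion signal. The formats, field names, helper names (`normalize`, `build_index`, `search`, `failed_login_alerts`), and the alert threshold of 3 are all illustrative assumptions.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical format A: "2013-05-01T10:00:00Z web01 ERROR login failed user=alice ip=10.0.0.5"
FORMAT_A = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z) (?P<host>\S+) (?P<level>[A-Z]+) (?P<msg>.*)$"
)
# Hypothetical format B: "01/May/2013:10:00:05 +0000|db01|WARN|login failed user=alice ip=10.0.0.5"
FORMAT_B = re.compile(r"^(?P<ts>[^|]+)\|(?P<host>[^|]+)\|(?P<level>[A-Z]+)\|(?P<msg>.*)$")


def normalize(line):
    """Standardize a raw line from either format into one record schema, or return None."""
    m = FORMAT_A.match(line)
    if m:
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    else:
        m = FORMAT_B.match(line)
        if not m:
            return None  # unrecognized line; a real system would quarantine it
        # %b assumes the default C/English locale ("May").
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return {"ts": ts, "host": m["host"], "level": m["level"], "msg": m["msg"]}


def tokenize(text):
    """Lowercase tokens; key=value pairs such as 'ip=10.0.0.5' stay whole."""
    return re.findall(r"[a-z0-9_.=]+", text.lower())


def build_index(records):
    """Inverted index: token -> set of positions of records containing it."""
    index = defaultdict(set)
    for i, rec in enumerate(records):
        for tok in tokenize(rec["msg"]) + [rec["host"].lower(), rec["level"].lower()]:
            index[tok].add(i)
    return index


def search(index, records, *terms):
    """Return records containing every term (AND semantics), oldest first."""
    sets = [index.get(t.lower(), set()) for t in terms]
    hits = set.intersection(*sets) if sets else set()
    return sorted((records[i] for i in hits), key=lambda r: r["ts"])


def failed_login_alerts(records, threshold=3):
    """Correlate across hosts: list IPs with at least `threshold` failed logins."""
    counts = defaultdict(int)
    for rec in records:
        if "login failed" in rec["msg"]:
            m = re.search(r"ip=(\S+)", rec["msg"])
            if m:
                counts[m.group(1)] += 1
    return sorted(ip for ip, n in counts.items() if n >= threshold)


# Hypothetical sample lines from three hosts, plus one unparseable line.
RAW = [
    "2013-05-01T10:00:00Z web01 ERROR login failed user=alice ip=10.0.0.5",
    "01/May/2013:10:00:05 +0000|db01|WARN|login failed user=alice ip=10.0.0.5",
    "2013-05-01T10:00:09Z web02 ERROR login failed user=bob ip=10.0.0.5",
    "2013-05-01T10:01:00Z web01 INFO login ok user=carol ip=10.0.0.9",
    "garbage line that matches neither format",
]

records = [r for r in map(normalize, RAW) if r is not None]
index = build_index(records)

if __name__ == "__main__":
    for rec in search(index, records, "login", "failed"):
        print(rec["ts"].isoformat(), rec["host"], rec["msg"])
    print("alerts:", failed_login_alerts(records))
```

Standardizing first is the design choice that makes the rest work: once every source shares one schema, a single index and a single correlation rule can span all hosts. A production system would persist the index and stream new lines into it; the in-memory dictionary here only keeps the idea visible.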

In reality, only 56% of respondents in China currently use log management solutions to manage their data. Many companies rely on the basic logs that come with their computer systems or set up spreadsheets for the job. A further 39% of respondents told us they did not manage logs (data) at all.

"What's the use?" they asked. This shows that China still has a long way to go in understanding and applying the core of big data. Raising awareness and stepping up outreach have become top priorities.

In addition, the relevant technologies, solutions, and platforms must keep pace with the generation of new information. Data is being produced at an exponential rate, in volumes vaster than the stars in the universe. If retrieving data takes too long, analysis and forecasting become meaningless, to say nothing of control and management, and serious problems follow.

Big data pioneers

China is still in the early stage of big data, so the experience of foreign pioneers is of great reference value to us. Visionaries around the world began racing one another many years ago, showing their strengths in building their own big data centers and striving to gain a first-mover advantage in this war.

☆Intel

Intel Corporation, founded in 1968, is the world's largest semiconductor chip manufacturer, with decades of product innovation and market leadership. It launched the world's first microprocessor in 1971, which triggered the computer and Internet revolution. Intel's first step in preparing for big data was to meet its needs through hardware. At the same time, it has not neglected software: it has enhanced and optimized Hadoop, HBase, and HDFS, and launched Intel Hadoop Manager 2.0.

In July 2012, Intel released its own commercial Hadoop distribution (Apache Hadoop Distribution), becoming the only one of the major vendors with its own Hadoop distribution.

☆IBM
IBM laid out its plans for the big data era by acquiring data mining and data analysis businesses, then officially launched a roadmap called "3A5 steps", and later combined it with information management, business analytics, and other software to propose its own big data platform architecture.

The company's big data architecture covers IBM's four core capabilities in big data and the corresponding product lines: InfoSphere BigInsights for Hadoop, InfoSphere Streams for stream computing, InfoSphere Warehouse and Netezza for data warehousing, and Optim and Guardium for information integration and governance.

☆Hortonworks
After being spun off from Yahoo in 2011, Hortonworks released a technical preview of its Hadoop-based data platform (Hortonworks Data Platform, HDP) in August of that year. Only a few weeks later, the company launched HDP8, based on Hadoop 0.23. This version of Hadoop was greatly improved and implemented next-generation MapReduce.

Despite its short history, Hortonworks has moved quickly, announcing its own big data strategy shortly after IBM announced its Hadoop-based big data analytics platform. It has also reached an agreement with Talend to offer Talend Open Studio for Big Data on its data platform for comprehensive big data processing.

☆Microsoft

Microsoft, a flagship of the traditional IT industry and an undisputed monopoly giant, does not seem to have been among the first to enter big data. It is often considered a late starter, but in fact Microsoft was working on Dryad, a Hadoop-like development program, as early as 2006, and later commercialized it. Microsoft has always kept its own distinctive style: unhurried, but never lagging behind in key areas.

In early 2011, Microsoft released its own parallel data warehouse product (SQL Server Parallel Data Warehouse). A year later, it officially released the SQL Server 2012 database platform, extending its business into unstructured data. With tools such as Windows Azure Marketplace and SharePoint already launched, Microsoft has accumulated ample resources and is fully capable of building a big data platform.

☆SAP

Founded in 1972, SAP has long held great advantages in the software field, and most of its products focus on data analysis capabilities. This has made it a leader as the era of big data opens. In August 2012, SAP launched the third feature pack for its SAP BusinessObjects BI 4.0 solution, known as Feature Pack 3, and later improved and integrated it.

Building on SAP HANA, SAP has also created a powerful real-time data platform that gives users comprehensive data analysis and processing services.

☆Oracle

Oracle has been integrating hardware and software since its 2009 acquisition of Sun Microsystems, a maker of workstations and servers. The Big Data Appliance (BDA) and Exalytics business intelligence server that the company launched in 2011 are seen as a sign of Oracle's forceful entry into the big data market. In early 2012, the general availability of BDA and Exalytics heralded the release of Oracle's big data platform solutions.

On December 12, 2012, Oracle announced the acquisition of DataRaker, a company serving the petroleum, electricity, and water supply industries. The deal marked a new trend in big data applications: they began to penetrate traditional industries and deliver deep, comprehensive results.

☆VMware

VMware is a global leader in virtualization solutions from the desktop to the data center. Besides optimizing Hadoop, it runs virtualization projects built around big data analysis and processing, and its Cetas and vFabric Data series products reduce the complexity of data processing and analysis. Beyond its core, specialized virtualization products, VMware has in recent years launched many open source products through acquisitions and in-house development. For example, the HVE (Hadoop Virtualization Extensions) plug-ins and Serengeti are open source virtualization products launched by VMware.

☆Cloudera
Cloudera was founded in 2008 by Jeff Hammerbacher, Christophe Bisciglia, and Amr Awadallah, former engineers at Facebook, Google, and Yahoo, together with current CEO Mike Olson, a former Oracle executive. The company works with two technologies, NoSQL and Hadoop, and has received $76 million in financing.

In June 2010, the company officially launched its enterprise product, adding the Cloudera Manager console and enterprise-level support to its Apache Hadoop software distribution. It is now also working closely with Oracle to win more customers and expand both companies' share of the big data market.

☆MapR
MapR has always focused on optimizing usability and data security, and it has its own strengths and distinctive features. For example, while MapR, like other companies, commercially packages and sells open source Hadoop-based products, it offers many features that standard Hadoop lacks. Its products power EMC's Greenplum HD Enterprise Edition Hadoop.

Not long ago, MapR announced its new big data platform, MapR M7, which will provide more convenient, reliable, and fast Hadoop and NoSQL services.

☆Splunk
Splunk, an American business intelligence software provider founded in 2003 and listed in 2012, is recognized as "the first big data stock". Its main business is providing data engines to enterprises and customers. The search function of its machine data software is a powerful advantage; Splunk Free is aimed at individual users, while Splunk Enterprise adds support for multiple users and distributed deployment.

Building on the success of these products, Splunk promptly launched a new Splunk for Citrix XenDesktop solution, and in mid-2012 it brought Splunk App for PCI Compliance 2.0 to market.

Caution: Not Everyone Needs It

Whether we notice it or not, in everything we hear, see, and take part in, big data has become a grand project, and it is everywhere. We greeted it as if welcoming a life partner, and the excitement was palpable. Everyone is thinking: "Hey, the era of big data is here. What can I get out of it?" From social media and start-ups to Zhongguancun in Beijing, people are researching and deploying big data.

However, as we mentioned earlier, big data does not come from nowhere. You need a good reason to open the door and let it into your world, and you also have to pay a high price for it. Most companies lack the budget to deploy big data technology solutions and cannot afford the teams and big data engineers that go with them.

Big data is first and foremost an industry. According to one report, big data drove nearly US$300 billion in global IT spending in 2012, and this figure is expected to exceed US$4 billion within four years. That is nearly the annual gross domestic product of a moderately developed country, and it does not even include the many unpredictable market opportunities in emerging countries.

Brilliant examples of big data use are everywhere, yet they always seem far removed from certain groups of people. For example, Facebook's promoters proudly say it stores about 100 TB of user data every day, and the US National Security Agency (NSA) processes about 24 TB of data every day. Amazing numbers! We were truly impressed. But what does it cost to process this data? According to publicly available information, the NSA pays more than one million dollars for 45 days of data storage services, and this cost keeps rising. In my interviews over the past few years, the CIOs of most companies have also told me that their budgets cannot cover the cost of deploying big data.

This is therefore an expensive threshold: if a company wants big data services, the first thing it must secure is a sufficient budget.

No money? Sorry, this is not as simple as selling cabbages, buying cheap goods wholesale, or hiring a few managers. So I often hear people complain: "Big data is too expensive!" Individuals and companies sigh, yet they remain full of desire. The question is, do you really need it?
As with everything new, the cost of data storage and processing is so high that it has become the biggest obstacle keeping people from embracing big data. So we ordinary people and small and medium-sized enterprises need to find other solutions, so that smaller companies and individuals are not shut out by "big data".

Option One: The key to big data is not "big".

Is big data necessarily "big"? The world's largest technology companies do need to handle petabyte-scale data, and they are rightly the heavyweight users of massive data processing services. However, our research also shows that the other 95% of companies typically need only 0.5 TB to 40 TB of data, or even less.

The stories of Facebook and the NSA are exceptions, not the norm. The truth is that small and medium-sized companies need not copy the solutions of big companies. There are more than 5 companies in the United States with only 20 to 500 employees. Most of them need to solve data problems, but they have not followed Facebook and the NSA in building costly data empires.

So the biggest demand in the big data market comes not from the world's top 500 companies, but from companies ranked between 500 and 5. Why focus only on those very few exceptions while ignoring ordinary customers?

Only by setting ourselves apart from users with petabyte-scale data needs can we find a real solution. When big data comes our way, we should choose as small an entry point as possible while still enjoying the same services and convenience.

Option Two: Determine if you really need it.

(End of this chapter)
