Our use of big data technology has gone through its own course of development. From Google's first use of big data technology in its search engine to today's ubiquitous artificial intelligence applications, big data applications have grown alongside the technology, moving from a highbrow pursuit with few practitioners to something that blooms everywhere.
When Google first published its epoch-making papers on big data, it probably did not imagine that it was opening a new era. The achievements of big data and artificial intelligence today are inseparable from the efforts of millions of big data practitioners around the world, including you and me. History may be set in motion by geniuses, but in the end it is made by the people. As participants in the era of big data, we are making history.
The search engine era of big data applications
Google, the world's largest search engine company, is also widely recognized as the originator of big data. It stores almost every accessible web page in the world, a number that may run into the trillions, and storing them all takes tens of thousands of disks. To store these files, Google developed GFS (Google File System), which manages tens of thousands of disks across thousands of servers and presents them as a single file system in which all these web files are stored in a unified manner.
You might think that simply storing all the web pages is not that impressive. That's right, but Google collects these web page files in order to build a search engine. It needs to count the frequency of every word in the files and compute each page's rank with the PageRank algorithm. Along the way, Google has to run computations over the files on those tens of thousands of disks, and that does sound impressive. It was precisely these needs that led Google to develop the MapReduce big data computing framework.
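To make the word-frequency step concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce stages in plain Python. It is not Google's implementation or the Hadoop API. The function names and the toy `pages` data are made up for illustration, and a real framework would run each stage in parallel across many servers.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in text.split()]


def shuffle(pairs):
    """Shuffle: group all emitted counts by their key (the word)."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = chain.from_iterable(
        map_phase(doc_id, text) for doc_id, text in documents.items()
    )
    grouped = shuffle(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


# Hypothetical toy "web pages" standing in for crawled files.
pages = {
    "page1": "big data big ideas",
    "page2": "data beats ideas",
}
counts = word_count(pages)
print(counts)
```

The point of the split is that map and reduce only ever look at one document or one word at a time, which is what lets a framework spread the work over thousands of disks.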
In fact, before Google, the best-known search engine in the world was Yahoo. But with its own big data technology and the PageRank algorithm, Google made a qualitative leap in the search experience, and people abandoned Yahoo for Google. So when Google published its GFS and MapReduce papers, Yahoo was presumably among the first companies to pay attention.
Doug Cutting was the first to build Hadoop based on Google's papers, so Yahoo recruited him to develop Hadoop full-time. However, the honeymoon between Yahoo and Doug Cutting did not last long. Unable to put up with Yahoo's internal politics, Doug Cutting moved to Cloudera, a company dedicated to commercializing Hadoop, while Yahoo invested in Cloudera's competitor Hortonworks.
Top companies, like top masters, do things with an elegant sense of beauty. Look at the path Google has taken: search, Gmail, Maps, Android, driverless cars. Every step pushed the technological frontier of mankind a little further. Lesser companies, even if they once won a prominent position, fall faster than a meteor in this era of rapid change once they lose that beauty and rhythm.
The data warehouse era of big data applications
When Google's papers were first published, they attracted only search engine companies such as Yahoo and open source search engine developers such as Doug Cutting. Other companies simply looked on as bystanders. But when Facebook released Hive, companies with a keen nose for technology could no longer stay calm. They began to realize that the era of big data had truly begun.
In the past, data analysis and statistics were confined to the database: we ran statistical analysis on the tables in a database computing environment. Limited by data volume and computing power, we could only analyze the most important data, which usually meant the data prepared for the boss and finance-related data.
Hive lets us run SQL on Hadoop to perform data statistics and analysis. In other words, we can obtain far more data storage and computing power at a lower price than before. We can put runtime logs, data collected from applications, and database data together for calculation and analysis, obtaining results that were previously out of reach, and the company's data warehouse expands exponentially as a result.
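The kind of SQL statistics described here can be sketched on a small scale. The example below uses Python's built-in `sqlite3` module purely as a stand-in, since it runs anywhere. It is not Hive, and the `access_log` table, its columns, and its rows are invented for illustration. The idea is only that a familiar `GROUP BY` query turns raw log rows into a report.

```python
import sqlite3

# In-memory database as a stand-in for a data warehouse table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE access_log (user_id TEXT, page TEXT, duration_ms INTEGER)"
)

# Hypothetical log rows collected from an application.
conn.executemany(
    "INSERT INTO access_log VALUES (?, ?, ?)",
    [
        ("u1", "/home", 120),
        ("u2", "/home", 80),
        ("u1", "/cart", 300),
        ("u3", "/home", 100),
    ],
)

# Visits and average time spent per page, most visited first.
query = """
    SELECT page, COUNT(*) AS visits, AVG(duration_ms) AS avg_ms
    FROM access_log
    GROUP BY page
    ORDER BY visits DESC
"""
report = conn.execute(query).fetchall()
conn.close()

for page, visits, avg_ms in report:
    print(page, visits, avg_ms)
```

On a Hadoop cluster a query of the same shape would be translated into distributed jobs over far larger files, so the analyst writes SQL without having to write the distributed computation by hand.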
Not only the boss but every ordinary employee in the company, such as product managers, operations staff, and engineers, can raise analysis requests and get the results they want from the big data warehouse, as long as they have data access rights.
You see, in the data warehouse era, wherever there is data there is almost always a need for statistical analysis, and when the data is relatively large, people think of Hadoop. This is one of the reasons Hadoop developed so fast in this period. The development of the technology promoted its application, which in turn paved the way for big data applications to enter the next era, the era of data mining.
The era of data mining for big data applications
Once big data had entered more companies, expectations for it grew. Beyond statistics, we also hoped to uncover more of the value hidden in data, and so big data entered the era of data mining.
Let's look at a classic, often-retold case. A long time ago, merchants discovered through data that people who buy diapers often buy beer as well, so savvy merchants placed the two products together to boost sales. You can interpret the relationship between beer and diapers in various ways, but without data mining you could rack your brains and never guess that there was a relationship at all. In a business setting, how to interpret the relationship is not important. What matters is that once a correlation exists, association analysis can be carried out. The ultimate goal is to let users see as many of the goods they want to buy as possible.
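The strength of an association like this is commonly measured with support, confidence, and lift. Here is a minimal sketch of those three measures. The shopping baskets are invented for illustration, and real systems would use an algorithm such as Apriori to search over all item combinations rather than checking one hand-picked pair.

```python
def association_metrics(baskets, a, b):
    """Return (support, confidence, lift) for the rule 'a -> b'."""
    n = len(baskets)
    count_a = sum(1 for basket in baskets if a in basket)
    count_b = sum(1 for basket in baskets if b in basket)
    count_ab = sum(1 for basket in baskets if a in basket and b in basket)

    # Support: share of all baskets that contain both items.
    support = count_ab / n
    # Confidence: share of baskets with a that also contain b.
    confidence = count_ab / count_a if count_a else 0.0
    # Lift: confidence relative to how common b is overall.
    lift = confidence / (count_b / n) if count_b else 0.0
    return support, confidence, lift


# Hypothetical shopping baskets.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"milk", "bread"},
    {"beer", "chips"},
]

support, confidence, lift = association_metrics(baskets, "diapers", "beer")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the two items appear together more often than they would if purchases were independent, which is the signal a merchant would act on without needing to explain why.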
Besides the relationships between products, you can also use the relationships between people to recommend products. If two people buy many similar or even identical goods, then no matter how far apart they live, they very likely have something in common: a similar educational background, income level, or set of hobbies, for example. Based on this relationship, you can recommend to each of them what the other has bought, so that they see products they are interested in.
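One simple way to turn "people who buy similar things" into recommendations is to measure the overlap between purchase sets and borrow items from the most similar shopper. The sketch below uses Jaccard similarity for that. The choice of measure, the function names, and the purchase data are all assumptions made for illustration, not a description of any particular retailer's system.

```python
def jaccard(a, b):
    """Overlap between two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(purchases, user, top_k=1):
    """Suggest items bought by the top_k most similar users but not by user."""
    mine = purchases[user]
    neighbors = sorted(
        (other for other in purchases if other != user),
        key=lambda other: jaccard(mine, purchases[other]),
        reverse=True,
    )[:top_k]

    suggestions = set()
    for other in neighbors:
        suggestions |= purchases[other] - mine
    return sorted(suggestions)


# Hypothetical purchase histories.
purchases = {
    "alice": {"tent", "stove", "boots"},
    "bob": {"tent", "stove", "boots", "lantern"},
    "carol": {"novel", "lamp"},
}

suggested = recommend(purchases, "alice")
print(suggested)
```

Here alice and bob share three of four distinct items, so bob's one extra purchase becomes the suggestion for alice, while carol's unrelated purchases are ignored.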
Furthermore, big data can dig out each person's individual characteristics and attach all kinds of labels to them: born in the 1990s, living in a first-tier city, monthly income of 10,000 to 20,000, homebody... These labels make up a user portrait. With enough such labels you can describe a person completely, even more completely and accurately than the person closest to you could describe you.
Besides merchandise sales, data mining can also be applied to interpersonal relationships. Have you heard of the "Six Degrees of Separation" theory? It holds that any two strangers in the world can be connected through only a few intermediaries. An experiment testing the theory in the United States found that two Americans who did not know each other could be connected in six steps. Building on the same theory, Facebook studied the data of more than one billion users to find the number of intermediaries between two strangers, and the answer was an astonishing 3.57. As you can see, all kinds of social software record our friendships, and by mining these relationship graphs, almost every interpersonal network in the world can be mapped.
Modern life is almost inseparable from the Internet. Applications of every kind collect data all the time, and that data is constantly analyzed and mined in big data clusters in the background. Whether this analysis and mining brings us delight or fear depends on the efforts of big data practitioners. What is certain is that, whatever the final result, the process will only accelerate and will not stop. All you and I can do is throw ourselves into it.
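Measuring separation in a friendship graph comes down to finding shortest paths between people. The sketch below does this with breadth-first search on a tiny invented graph. It is only an illustration of the idea: it assumes friendships are mutual, and a graph with a billion users would need approximate, distributed methods rather than an exact search from every person.

```python
from collections import deque


def distances_from(graph, start):
    """Breadth-first search: hops from start to every reachable person."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in graph[person]:
            if friend not in dist:
                dist[friend] = dist[person] + 1
                queue.append(friend)
    return dist


def average_separation(graph):
    """Average number of hops over all ordered pairs of connected people."""
    total, pairs = 0, 0
    for person in graph:
        for other, hops in distances_from(graph, person).items():
            if other != person:
                total += hops
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical friendship graph: ann - ben - cat - dan.
friends = {
    "ann": ["ben"],
    "ben": ["ann", "cat"],
    "cat": ["ben", "dan"],
    "dan": ["cat"],
}

avg = average_separation(friends)
# The number of intermediaries on a path is its hop count minus one.
print(f"average hops: {avg:.2f}")
```

In this chain, ann reaches dan in three hops through two intermediaries, and the same search repeated from every person gives the graph-wide average.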
The machine learning era of big data applications
We discovered early on that data follows laws. All data follows them: things that happened in the past follow them, and things that will happen in the future follow them too. Once you find such a law, you can use it to predict what is about to happen.
In the past, we were limited by our ability to collect, store, and compute on data. We could only obtain a small part of the data through sampling and could not derive a complete, global, and detailed law. But now that big data is available, all historical data can be collected, the laws can be derived statistically, and what is about to happen can then be predicted.
This is machine learning.
Store the records of the Go games humans have played throughout history, and for each board position tally which moves led to a higher chance of winning. Once this statistical law has been obtained, we can use it to play Go against people, calculating for each move where the stone should go to give the best chance of winning. The result is a robot that can play Go. This caused a sensation over the past couple of years, when AlphaGo defeated the top human Go players by an overwhelming margin.
Here is another example that is closer to everyday life. Collect people's chat data and record the context of each conversation. If one sentence asks how you are today, then machine learning can work out statistically how the next sentence usually responds. Later, when someone asks how you are today, the machine can reply automatically, and so we get a chatbot. Voice assistants such as Siri, Tmall Genie, and Xiao Ai can be found everywhere in the machine learning era.
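The counting idea behind this chat example can be shown with a toy. The sketch below simply tallies which reply most often followed each prompt in an invented chat log and replays the most frequent one. It is a deliberately naive illustration of "learning by counting" and does not describe how Siri or any other product mentioned here actually works. All names and data are hypothetical.

```python
from collections import Counter, defaultdict


def normalize(sentence):
    """Treat prompts that differ only in case or padding as the same."""
    return sentence.lower().strip()


def train(conversations):
    """Count how often each reply followed each prompt."""
    replies = defaultdict(Counter)
    for prompt, reply in conversations:
        replies[normalize(prompt)][reply] += 1
    return replies


def respond(model, prompt, default="Sorry, I do not know how to answer that."):
    """Return the reply seen most often after this prompt, if any."""
    seen = model.get(normalize(prompt))
    if not seen:
        return default
    return seen.most_common(1)[0][0]


# Hypothetical chat log of (prompt, reply) pairs.
chat_log = [
    ("How are you today?", "I'm fine, thanks."),
    ("How are you today?", "I'm fine, thanks."),
    ("How are you today?", "Not bad."),
    ("What time is it?", "It's noon."),
]

model = train(chat_log)
answer = respond(model, "how are you today?")
print(answer)
```

A lookup table like this only works for prompts it has already seen, which is why real systems learn statistical models that generalize to new sentences instead of memorizing exact ones.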
Machine learning can extract statistical laws from the data generated by human activities, and those laws can then be used to simulate human behavior, so that the machine displays the kind of intelligence that used to be unique to human beings. This is artificial intelligence (AI).
We still hold some irrational attitudes towards artificial intelligence. Some people believe that it will grow stronger and stronger and one day rule humanity. In fact, with a little understanding of how artificial intelligence works, you will see that it is only a set of statistical laws computed from big data. No matter how intelligent it appears, it cannot understand what those laws mean, and meaning is the source of human intelligence. On the current line of development, artificial intelligence will never surpass human intelligence, let alone dominate humans.
Final words
From search engines to machine learning, big data applications follow a single thread: discovering the laws in data and putting them to use. That is why many people call data a gold mine, and applying big data means digging real gold and silver with commercial value out of this mine of knowledge.
That data contains value is already well known. How to dig the knowledge we want out of such huge volumes of data is exactly the problem big data technology is solving today, from big data storage and computing to applications such as analysis, mining, and machine learning.
The gold rush in the American West ushered in the great pioneering era of the United States. People from all over the world flocked to the western United States, bringing population, resources, and productivity to the wild west. Railroads connected the east and west coasts, and the whole country prospered. Big data, an even larger gold mine, is playing the same role today. Countless governments, companies, and individuals around the world are paying attention to this gold mine, and countless resources are pouring into it.
We never lived through the boom of the gold rush in the American West, and we missed that era of personal heroism, of glory and dreams, freedom and passion. But now an even more epoch-making big data gold rush is under way, and you and I are in it.