Will the popular smart speaker really become the entrance to the smart home? Now that BAT (Baidu, Alibaba, Tencent) has entered the smart speaker market, should entrepreneurs pick a side or break out on their own? Beyond price wars, how else can smart speakers be sold? How does the full integration of AI technology change the scenarios in which users use these products? And how does the capital market view the application of cutting-edge technologies such as the smart home in products?
At 8:30 pm on July 31, "Tencent Venture 01CLUB" held the first session of its community content-sharing series. Huang Mingming, founding partner of Mingshi Capital, and Tang Mu, vice president of Xiaomi Eco-Chain, were the guests of this dialogue. They shared several major insights into the smart speaker industry in a conversation between the capital market and a leading technology company. Tencent Venture 01CLUB has summarized several key points from this "outreach" dialogue:
1. The foundation of smart home development is to solve the rigid needs of users.
2. The hardest technical breakthrough for smart speakers is NLP (natural language processing).
3. In the face of giant competition, breakthroughs can only be achieved through cooperation.
4. Before a screen can be added to smart speakers, the conflict between mutually exclusive modes of device interaction has to be resolved.
5. Sweeping robots may be the next big opportunity for smart homes.
The two guests analyzed and explained each of these points in detail. The following is an edited record of the dialogue:
The foundation of the development of smart home is to solve the rigid needs of users
Huang Mingming: Let me briefly introduce Mingshi Capital's investment logic and cases in the AI field.
In the AI field, our focus is to find practical application scenarios for core technologies, or to invest in areas that can improve the efficiency of an industry. Our investments cover travel, law, medical care, and industry.
For example, in AI travel, we invested in Yihang Intelligence and Zhixing Technology; in AI law, we invested in Sima Technology and Mita Technology; in AI medical care, we invested in the research and development of intelligent medical surgical robots, Shangshukang Medical; and in AI for industry, we invested in Xuan Yu Technology, a supplier of smart factory solutions.
Today, I will mainly talk with Tang Mu, vice president of Xiaomi Eco-Chain, about the technology and business models of smart speakers.
First, on the technology side: Mr. Tang, which breakthroughs in the underlying technology do you think have driven the rapid development of smart speakers? What outstanding issues still seriously affect the user experience? Among far-field positioning, on-device software and hardware computing capability, multi-speaker sound fields, false wake-ups, and multi-round dialogue, which are the core constraints? And do you agree that the smart speaker may become the traffic entrance of the smart home in the future?
Tang Mu: Let me start with the question of the smart home entrance. Starting around 2012 and 2013, many companies experimented, speculated, and built related products. Back then, companies would tell the outside world that their own product was competing to be the entrance to the smart home. I say this humbly: I was making routers at the time, so I shouted, "The router is the entrance to the future smart home."
After a few years of this, I found that the products whose makers shouted loudest about being the entrance to the smart home did not last. The products that lasted were the down-to-earth ones built to meet users' high-frequency, rigid needs.
If you ask me whether the router is the entrance, it is not. I believe the router is just one of the central nodes of the smart home.
The center of a smart home may not be a single device; it may be several. I recall that when Mr. Lei invited me to join Xiaomi, the core appeal was that I could explore the smart home. The router was only one of the central nodes, and the smart speaker came later.
After joining the smart speaker project, I found that the smart speaker is another dimension of the smart home center. But now I do not want to insist that it is the center. Our product thinking today is this: we can be very ambitious in imagining a product and leave it plenty of room for future development, but the most basic thing is that it must first satisfy a high-frequency need.
The hardest technical breakthrough for smart speakers is NLP
Tang Mu: Let's talk about the technical issues. When we started to make smart speakers, I found that three core technologies had to be ready: ASR, NLP, and TTS. ASR is speech recognition, which converts speech to text; NLP here means natural language understanding; TTS is text-to-speech, which converts text back into speech. These three technologies are the cornerstone of the smart speaker. Once all three were ready, the prerequisites for the smart speaker to be born were in place.
As for the core constraints and bottlenecks Huang Mingming mentioned, many can be improved through accumulated data and through the self-learning and self-improvement of artificial intelligence. But the one for which I have so far not seen a clear path to a solution is NLP (natural language understanding). Much of the time, the artificial intelligence we present in a smart speaker is a bit like a simple question-and-answer machine: you ask a question and it gives you an answer. In many cases that answer is still mechanical, and still some distance from real artificial intelligence.
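Editor's sketch: the three-stage chain Tang Mu describes (ASR, then NLP, then TTS) can be illustrated with a few lines of Python. Everything here is a hypothetical stand-in, not Xiaomi's implementation: `fake_asr` and `fake_tts` simply treat text as "audio", and `rule_based_nlu` is a keyword matcher of the "simple question-and-answer machine" kind that the answer above says still falls short of real understanding.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    name: str  # hypothetical intent labels: "play_music", "iot_control", "unknown"
    slot: str  # the thing the user asked about


def fake_asr(audio: bytes) -> str:
    """Stand-in for speech recognition: the 'audio' is just UTF-8 text."""
    return audio.decode("utf-8").strip().lower()


def rule_based_nlu(text: str) -> Intent:
    """Keyword matching only; no real language understanding happens here."""
    if text.startswith("play "):
        return Intent("play_music", text[len("play "):])
    if text.startswith("turn off "):
        return Intent("iot_control", text[len("turn off "):])
    return Intent("unknown", text)


def respond(intent: Intent) -> str:
    """Turn the recognized intent into a mechanical, templated reply."""
    if intent.name == "play_music":
        return f"Playing {intent.slot}."
    if intent.name == "iot_control":
        return f"Turning off {intent.slot}."
    return "Sorry, I did not understand that."


def fake_tts(text: str) -> bytes:
    """Stand-in for speech synthesis: returns the reply as UTF-8 bytes."""
    return text.encode("utf-8")


def handle_utterance(audio: bytes) -> bytes:
    """Run one user utterance through ASR -> NLP -> TTS."""
    text = fake_asr(audio)
    intent = rule_based_nlu(text)
    return fake_tts(respond(intent))
```

Anything outside the two hard-coded patterns falls through to the "unknown" reply, which is a small-scale picture of why the NLP stage, rather than ASR or TTS, is named as the bottleneck.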
Huang Mingming: It seems that, at least for now, everyone's view is fairly consistent: the core thing that is hard to break through is still NLP. As for the current wave of deep neural networks, I have asked many leading experts about NLP, and a breakthrough looks difficult in the short term.
Setting aside NLP, where a short-term breakthrough is hard for everyone, let's take Xiaomi's Xiaoai speaker as an example and talk about the choice of technical solutions (including software, hardware, microphone arrays, far-field and near-field positioning, input noise reduction, false wake-ups, and so on). Which choices do you think give the product which features and advantages?
Tang Mu: Our most important lesson in making smart speakers is to seek cooperation widely. For example, we have seven or eight partners in ASR. We send the user's query to all of them at the same time, and once they have all returned their results, a simple judgment algorithm decides whose result to use. The technology we put the most effort into is NLP, which is the core of the smart speaker and the core of the AI voice assistant.
Take the Xiaoai speaker as an example. We currently have two signature strengths: understanding users' song queries, and understanding users' IoT control commands. Our speakers have been on the market for some time and we have collected more and more queries, which greatly helps us understand more deeply what users want.
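Editor's sketch: the fan-out to several ASR partners described above can be pictured as follows. The interview does not say what the "simple judgment algorithm" is, so the rule used here (majority vote on the transcript, ties broken by the highest reported confidence) is purely an assumption, and `make_engine`, `fan_out`, and `pick_result` are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

AsrResult = Tuple[str, float]  # (transcript, confidence in [0, 1])
AsrEngine = Callable[[bytes], AsrResult]


def make_engine(transcript: str, confidence: float) -> AsrEngine:
    """Build a hypothetical partner engine that always returns a fixed answer."""
    def engine(audio: bytes) -> AsrResult:
        return (transcript, confidence)
    return engine


def fan_out(audio: bytes, engines: List[AsrEngine]) -> List[AsrResult]:
    """Send the same audio to every engine in parallel and wait for all results."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        return list(pool.map(lambda engine: engine(audio), engines))


def pick_result(results: List[AsrResult]) -> str:
    """Assumed rule: most common transcript wins; ties go to higher confidence."""
    votes = Counter(text for text, _ in results)
    best_conf: Dict[str, float] = {}
    for text, conf in results:
        best_conf[text] = max(conf, best_conf.get(text, 0.0))
    return max(votes, key=lambda text: (votes[text], best_conf[text]))


# Three hypothetical partners; two agree, one is confident but outvoted.
engines = [
    make_engine("play jazz", 0.80),
    make_engine("play jazz", 0.70),
    make_engine("play chess", 0.95),
]
print(pick_result(fan_out(b"<audio bytes>", engines)))  # prints "play jazz"
```

Waiting for every partner before choosing matches the description in the answer; a production system would presumably also need timeouts so that one slow partner cannot stall the reply, but that is not covered in the interview.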
The "convergence point" of Xiaomi's to-C product form is far more than just hardware
Huang Mingming: Tang Mu, you just mentioned that many of your technical solutions involve cooperation with external partners. Let me gossip a little: some media have reported that Xiaomi is cooperating with Amazon Alexa, and may consider integrating or cooperating with Microsoft's Cortana (Xiaona) to launch a new smart speaker product line. Is this rumor reliable? If there are such considerations, does it mean that you may put more effort into hardware production and, on the software side, also cooperate with strong foreign partners?
Tang Mu: There is no smoke without fire. Google, Amazon, and Microsoft all approached us after we released our smart speaker. They value Xiaomi's supply chain and cost-performance advantages in smart hardware.
But everyone also knows that Xiaomi has never been a company that is only willing to make hardware terminals, so when we select partners, we must also consider the possibility of win-win cooperation in the Internet field. In fact, the talks are still under way. We will not make only the hardware without touching the system inside it and the brain behind it.
Huang Mingming: Thank you, Tang Mu, for your frankness and for giving us so much information. I only asked about Amazon, and now Google has come out too. Let me move on to a slightly more challenging question. The competition is not only with foreign giants but also with domestic giants that hold massive amounts of content, or even a monopoly on it, including Tencent Music and Ali. This also ties in with today's theme: is this a battle of products or a battle of content?
In the face of giant competition, win-win cooperation can achieve breakthroughs
Huang Mingming: There is a fairly mainstream view in the market today: no matter how well we do the speech recognition and semantic understanding we just discussed, what users care about in the end is the content they get. That content is in the hands of the giants. How do you deal with that?
Tang Mu: I think the answer must be win-win cooperation. Although Xiaomi has had its IPO, internally we still consider ourselves a startup. We will never have the financial resources to compete with Internet giants such as BAT in purchasing content.
I think each company has what it is good at, its own "genes". In music, for example, Tencent must have the genes, otherwise TME would not be so big. So I spent a lot of energy discussing cooperation with Tencent, so that everyone can do what they are good at: Xiaomi is good at making hardware and systems, and Tencent's TME is good at music, so we can work together.
On the one hand, we will cooperate with QQ Music to provide most of the free music for users of Xiaoai speakers. On the other hand, we are also willing to help QQ Music and the music copyright companies behind it to develop music memberships. This should be something that both sides are very happy to see.
In addition, in the course of negotiating this cooperation, I was pleased to see that the content industry in China is developing in a healthier and healthier direction. There is already a very large group of users willing to pay for good content, which has given us a lot of confidence.
Of the 20 to 30 million queries that Xiaoai speakers receive every day, 60% to 70%, or even 70% to 80%, are still queries for obtaining content or for controlling content playback. This shows one thing: audio content is the core demand for almost all smart speakers in China, and it is a high-frequency, rigid need.
Huang Mingming: I agree with Tang Mu: each company has different genes, and cooperation is still necessary. If everyone does what they are best at, there is still an opportunity. As for the future environment of competition and cooperation, it will change a great deal over time.
Adding a screen to a smart speaker requires resolving the conflict between mutually exclusive modes of interaction
Huang Mingming: Let me go straight to the next question, about Xiaomi's ecosystem. Amazon and domestic competitors alike have now added a screen to their speakers. Tang Mu, I know you are a super product manager who places great emphasis on user experience. How do you think about adding a screen to the smart speaker category? Will Xiaomi's smart speakers consider adding one?
Tang Mu: We have actually thought a lot about whether smart speakers should have a screen, but there are still some contradictions to resolve. In my view, voice interaction is a kind of far-field interaction: people and devices can interact directly with each other at a distance.
Screen interaction, as it emerged with multi-touch handheld devices, is a near-field interaction, and far-field and near-field interaction are inherently mutually exclusive. Putting a screen on the speaker turns a device that anyone can interact with from afar into one that requires people to come close. Having both on the same device is somewhat contradictory.
With the competition at its current stage, I may end up eating my words if I jump to a conclusion, so I can only say this much: we are actively discussing and preparing a speaker with a screen, because long-term observation has shown us that a screen really does help with information feedback in voice interaction.
Huang Mingming: I very much look forward to Xiaomi's next generation of smart speakers with screens. No Xiaomi product can be discussed apart from Xiaomi's ecosystem. I do not know whether some figures are inconvenient to share, but could Tang Mu tell us how many types of smart devices, and how many devices in total, are currently interconnected through Xiaomi smart speakers? And what about users' actual usage frequency? You just described what may be the main category of queries. What share of queries goes to other IoT devices, especially smart home devices? And how sticky is that usage?
Tang Mu: At present, more than 100 million devices are connected to the Xiaomi IoT Cloud. They have accumulated gradually over the four or five years that Xiaomi has been doing IoT. The smart speaker is a new thing that appeared only last year, and its arrival tied together more closely all the IoT devices we had previously connected, and greatly increased the stickiness of users controlling IoT devices by voice.
For example, one interesting piece of data is what we call the "joint purchase rate". We can see from a great deal of user feedback and data that users who own more Xiaomi IoT devices are more inclined to buy a Xiaoai speaker, and after buying one, the frequency and stickiness of their daily control of IoT devices are very high. The other finding is that users who buy a Xiaoai speaker, even if they have no other Xiaomi IoT devices at the beginning, are more inclined to go on to buy other products in the Xiaomi IoT ecosystem.
This gives me hope that the smart home is closer to reality, and more practical, than ever before. In the past we made apps to control IoT devices, and many users felt that this did not actually improve their life experience or the experience of controlling those devices. With voice interaction, many users feel that for simple, frequently used operations such as turning lights on and off, voice control is a match made in heaven, and many people have developed the habit.
Huang Mingming: This type of query was mentioned as ranking second in number. I believe you also track the frequency and stickiness of repeated use. For example, how long does the same user keep using Xiaomi's smart speaker to control other IoT devices, and how strong is that habit?
I remember that when we first met and chatted, we were seeing a large number of so-called smart home startup projects among our investment opportunities. In the end we came to a blunt but very straightforward conclusion: using an app to control the smart home is a bit "funny". For example, to turn off a light, you have to pick up your phone, find the app, open it, select the light, and then choose to turn it off. It would be easier to just get out of bed and turn off the light. Only with the arrival of voice interaction have smart home scenarios, especially those dominated by voice interaction, taken off, and we find that very exciting.
Sweeping robots may be the next big opportunity for smart homes
Huang Mingming: I am very happy that brother Tang Mu is once again at the forefront today, the forefront of the smart home entrance. I remember that two years ago we had a gathering of angel investors, and Mr. Lei happened to be there. While we were walking briskly along the lake at the Summer Palace, we talked about this question: besides smart speakers, which products in the Xiaomi ecosystem are most likely to become entrances to IoT? The other product I am very optimistic about is the robot vacuum cleaner. What is your opinion?
I think there are two core points about the robot vacuum. The first is that it has to be fast: for example, its SLAM algorithm must scan the environment the user lives in, whether in today's 2D or a future 3D architecture, map all of it clearly, and store the relevant data in its database. In effect, it is a robot that knows the user's home and living conditions very well. The second is that its form allows more scenarios for interacting with people. For example, it is mobile, so it is likely to become another very important point of interaction, or entrance for traffic.
One is speakers and the other is robots, and I am very optimistic about both. Xiaomi has its own plans in these two areas, and we are also optimistic about entrepreneurial opportunities in this field. I would like to hear Tang Mu's views.
Tang Mu: We are thinking exactly alike on this point. I think the current Xiaoai speaker is just a container for an artificial intelligence assistant. In the future, this assistant will appear in many devices, and I think its ultimate form is a robot. But when it comes to robots, many users may not actually be convinced. When we make a product, we can imagine what it could do once it has settled into thousands of households, but the first step is the hardest: how to get it into those households at all.
For example, what actual need would make users spend money to bring it home? I think it may be too early for robots in general, but the sweeping robot is the only robot form I am optimistic about.
Many Hollywood movies have raised users' expectations of machine intelligence to an infinitely high level, so users who buy a home robot feel that it is worthless, even rubbish, because it falls so far behind the robots they have seen in the movies. The robot category as such cannot be done yet, but once you put the word "sweeping" in front of "robot", it becomes worth considering.
In my opinion, there is actually a product evolution path before reaching the final robot form. I think there are three major elements.
The first is semantics, the understanding of the meaning of sound. The AI speaker addresses the problem of AI voice: it collects a large number of queries so that the brain can recognize and understand people's intentions. The second is the understanding of AI vision, which is the next area to be conquered: whether through home cameras or through sweeping robots, enough data can be collected to train the brain to understand these images. The third is action and movement. Once the technology for this part is available, the robot form will not be far away.
Figuratively speaking, the AI speaker solves the problem of the robot's "mouth and ears". Next, we need to solve the "eye problem" and the "leg problem". One day, when users are accustomed to having terminals that act as ears, mouths, and eyes in their homes, they will feel that an integrated product has a chance to appear in the home, and they will be willing to own it.
Huang Mingming: Thank you, Tang Mu. I think we are surprisingly consistent on many points. We have also discussed robot projects internally. At this stage, we pass at first glance on all so-called robot startup products built in human form, because the expectations that users form from Hollywood movies, or from their own natural reactions, far exceed what existing products can deliver.
Let me add one last thing. I am very excited by today's conversation with Tang Mu, and I have taken away a great deal of substance and information. Whether on IoT or on the understanding of real future home robots that we just discussed, including the phased steps, plans, and prospects for service robots, it leaves people full of confidence and with a fun vision of the future. Here, I also wish brother Tang Mu and Xiaomi all the best.
We have also made many investments in outdoor scenarios and high-speed autonomous driving scenarios, and we welcome more exchanges on indoor scenarios as well.