ByteDance Adds Image Understanding to China's Popular AI Chatbot Doubao

TMTPOST – TikTok’s parent company ByteDance has added image understanding to Doubao, its answer to ChatGPT, as the short-video giant doubles down on a potential blockbuster in China's fast-growing generative artificial intelligence (AI) market.

Unlike traditional OCR (Optical Character Recognition) technology, which focuses on recognizing text, Doubao’s new function recognizes the content of images, offering users a more interactive and informative experience. With the addition of photo and camera buttons on both the Doubao app and its PC version, users can now upload pictures and receive accurate descriptions of the content, such as identifying locations or characters from images.

This development marks a significant move toward multimodal AI, where different types of data—such as text, images, and video—are processed and understood together. This contrasts with earlier models, which typically focused on single modes, like text or speech.

For example, users can now ask Doubao questions about a specific landmark or the identity of a character from an image, and Doubao will provide answers seamlessly. In one demonstration, a four-panel comic with a scientific joke about physicists and Isaac Newton was shared with the app. Doubao was able to analyze the humor and explain the joke, highlighting how the AI interprets visual elements and relates them to textual content.

The comic depicted a humorous scenario in which two physicists on a battlefield observe fallen soldiers. Rather than considering whether the soldiers were alive or dead, the physicists focus on the scientific explanation for why the soldiers fell, concluding that Isaac Newton’s discovery of gravity was the cause. This comedic take on the sometimes absurd ways in which physicists explain the world amused many users, with Doubao providing an accurate and entertaining analysis of the scene.

The addition of image understanding to Doubao is not just about making the app more engaging. It’s also part of a broader trend in AI development, where companies are shifting from theoretical models to practical applications that solve real-world problems. Image recognition technology is particularly useful in scenarios like search engines, content evaluation, text generation, and more, making AI systems more relevant and practical for everyday users.

ByteDance's Doubao app has quickly become one of the most popular AI-based applications in China, surpassing competitors in terms of daily active users. According to data from QuestMobile, by October 2024, the monthly active user base for AI-native applications reached a staggering 89.76 million, reflecting a 373% year-over-year growth. Among these, Doubao, Kimi Smart Assistant, and Wenxin Yiyan topped the charts in both mobile and web access, further solidifying their place as leaders in the AI space.

As AI continues to evolve, companies are looking for ways to integrate AI models into more practical, real-world scenarios. The integration of image understanding is just one example of how AI can be applied to enhance productivity, improve accessibility, and foster innovation across industries. This shift is expected to drive the next wave of technological advancements, with AI playing a central role in boosting global economic efficiency.

According to Statista, the global AI market is expected to surpass $1.8 trillion by 2030, further underscoring the transformative potential of AI technologies. As companies like ByteDance continue to innovate and expand the capabilities of their AI models, the future looks bright for the growing AI ecosystem.

Source: TMTPost App (Yidianhao account)
