Share interests, spread happiness, broaden horizons, and leave something beautiful behind!
Dear reader, this is LearningYard Academy.
Today the editor brings you the article
"Little H's Chat (7): Key Technologies of Big Data"
Welcome, and thank you for visiting.
I. Mind Map
II. Intensive Reading
When people talk about big data, they usually mean not just the data itself but the combination of data and big data technology.
Big data technology refers to the technologies involved in collecting, storing, analyzing, and presenting the results of big data. It is a family of data processing and analysis techniques that use non-traditional tools to process large volumes of structured, semi-structured, and unstructured data in order to obtain analytical and predictive results.
Before discussing big data technology, it helps to understand the basic processing flow of big data, which mainly consists of data collection, storage, analysis, and result presentation.
Data is everywhere. Internet websites, government systems, retail systems, office systems, automated production systems, surveillance cameras, sensors, and more generate data at every moment. This data is scattered across many places and has to be collected. Collected data usually cannot be used directly for analysis: when data comes from many sources and in many types, problems such as missing values and semantic ambiguity are unavoidable. These problems must be addressed through a step called "data preprocessing", which brings the data into a usable state.
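To make the idea of data preprocessing more concrete, here is a minimal Python sketch of the two problems named above: missing values and semantic ambiguity. Everything in it is an illustrative assumption rather than something from the source text: the sensor records, the `UNIT_ALIASES` mapping, and the choice to fill gaps with the mean are all hypothetical, and real projects may well prefer other strategies.

```python
from statistics import mean

# Hypothetical raw records merged from several sources; None marks a missing value.
raw_records = [
    {"sensor": "A", "temp": 21.5, "unit": "C"},
    {"sensor": "B", "temp": None, "unit": "celsius"},
    {"sensor": "C", "temp": 23.5, "unit": "Celsius"},
]

# Assumed mapping that resolves different spellings of the same unit
# (a simple form of semantic ambiguity).
UNIT_ALIASES = {"c": "C", "celsius": "C"}


def preprocess(records):
    """Fill missing temperatures with the mean and normalize unit labels."""
    known = [r["temp"] for r in records if r["temp"] is not None]
    fill_value = mean(known) if known else None
    cleaned = []
    for r in records:
        temp = r["temp"] if r["temp"] is not None else fill_value
        # Unknown unit spellings are left unchanged rather than guessed.
        unit = UNIT_ALIASES.get(r["unit"].strip().lower(), r["unit"])
        cleaned.append({"sensor": r["sensor"], "temp": temp, "unit": unit})
    return cleaned


cleaned_records = preprocess(raw_records)
```

After this step every record carries a temperature and a single agreed unit label, which is roughly what "a usable state" means for the analysis that follows.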
After preprocessing, the data is stored and managed in a file system or database system, then processed and analyzed with data mining tools, and finally presented to users through visualization tools. Data security and privacy protection must be kept in mind throughout the whole process.
Therefore, viewed across the full data analysis workflow, big data technology covers the following layers and functions:
Data collection and preprocessing. ETL tools extract data from heterogeneous data sources into a temporary staging layer, where it is cleaned, transformed, and integrated, and then load it into a data warehouse or data mart, where it serves as the basis for online analytical processing (OLAP) and data mining. Alternatively, log collection tools (such as Flume and Kafka) can feed data collected in real time into a stream computing system for real-time processing and analysis.
Data storage and management. Distributed file systems, data warehouses, relational databases, NoSQL databases, cloud databases, and similar systems are used to store and manage massive amounts of structured, semi-structured, and unstructured data.
Data processing and analysis. Distributed parallel programming models and computing frameworks, combined with machine learning and data mining algorithms, are used to process and analyze massive amounts of data. The analysis results are then visualized to help people better understand and analyze the data.
Data security and privacy protection. While mining big data for its potentially enormous commercial and academic value, it is also necessary to build a data security system and a privacy protection system that effectively safeguard data security and personal privacy.
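As a rough illustration of the first two layers above, the following Python sketch walks through a tiny extract, transform, load (ETL) flow and then runs one aggregate query on the result. It is only a toy under stated assumptions: the two inline sources `CSV_SOURCE` and `JSON_SOURCE`, the `sales` table, and the use of an in-memory SQLite database as a stand-in for a data warehouse are all hypothetical choices for this example, not tools named in the source text.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical heterogeneous sources: a CSV export and a JSON feed.
CSV_SOURCE = "store,amount\nnorth,10.0\nsouth,\nnorth,5.5\n"
JSON_SOURCE = '[{"store": "South", "amount": "4.5"}, {"store": "east", "amount": "2"}]'


def extract():
    """Pull rows from both sources into one staging list."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows.extend(json.loads(JSON_SOURCE))
    return rows


def transform(rows):
    """Drop rows with a missing amount; normalize store names and types."""
    cleaned = []
    for row in rows:
        amount = str(row.get("amount", "")).strip()
        if not amount:
            continue  # missing value: skipped in this sketch
        cleaned.append((row["store"].strip().lower(), float(amount)))
    return cleaned


def load(conn, rows):
    """Write the cleaned rows into the stand-in warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (store TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))

# A simple analysis query over the loaded data: total sales per store.
totals = dict(conn.execute("SELECT store, SUM(amount) FROM sales GROUP BY store"))
```

Here `totals` maps each store to its summed amount. In a real deployment the staging list would be a dedicated intermediate layer and the target would be a data warehouse or data mart, but the order of the steps is the same.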
That's all for today's sharing.
If you have your own thoughts on today's article,
please leave us a message.
Let's meet again tomorrow.
Have a nice day!
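Copywriting | Little H
Layout | Little H
Review | Dongyang
References:
Text: 《大数据技术原理与应用》 (Big Data Technology: Principles and Applications)
Translation: Kimi.ai
This article was compiled and published by LearningYard Academy. If there is any infringement, please leave us a message!
Source: LearningYard Academy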