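Abstract: Spark originated in the AMP Lab at the University of California, Berkeley. It is a fast, general-purpose engine for large-scale data processing and is now one of the top-level open-source projects of the Apache Software Foundation.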
Share interests, spread joy, broaden your horizons, and leave something beautiful behind!
Dear reader, this is LearningYard Academy.
Today we bring you the article
"Little H's Casual Talk (13): Introduction to Spark"
Welcome!
I. Mind Map
II. Intensive Reading
Spark Overview
Spark originated in the AMP Lab at the University of California, Berkeley. It is a fast, general-purpose engine for large-scale data processing and is now one of the top-level open-source projects of the Apache Software Foundation.
Spark's original design goal was to make data analysis faster: programs should not only run quickly, but also be quick and easy to write. To make programs run faster, Spark provides in-memory computing and a task scheduling and execution mechanism based on a DAG (Directed Acyclic Graph), which reduce the I/O (Input/Output) overhead of iterative computation. To make programs easier to write, Spark itself is written in Scala, a concise and elegant language, and offers an interactive programming experience built on Scala. Spark also supports several programming languages, including Scala, Java, Python, and R.
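To make the I/O argument concrete, here is a minimal sketch in plain Python. It is a toy model rather than Spark code: `SimulatedDisk`, `iterate_without_cache`, and `iterate_with_cache` are hypothetical names invented for this illustration, and the "disk" is just a counter. It only shows why keeping a dataset in memory helps when the same data is reused across iterations.

```python
# Toy model only: plain Python, not Spark's API. All names are hypothetical.


class SimulatedDisk:
    """Counts how often a dataset is read from (pretend) slow storage."""

    def __init__(self, records):
        self._records = list(records)
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._records)


def iterate_without_cache(disk, iterations):
    """Re-read the input on every iteration, as a disk-based pipeline would."""
    total = 0
    for _ in range(iterations):
        data = disk.load()
        total += sum(data)
    return total


def iterate_with_cache(disk, iterations):
    """Read the input once, keep it in memory, and reuse it each iteration."""
    cached = disk.load()
    total = 0
    for _ in range(iterations):
        total += sum(cached)
    return total


disk_a = SimulatedDisk(range(1, 6))
disk_b = SimulatedDisk(range(1, 6))
result_a = iterate_without_cache(disk_a, iterations=10)
result_b = iterate_with_cache(disk_b, iterations=10)
print(result_a, disk_a.reads)  # same result, one simulated read per iteration
print(result_b, disk_b.reads)  # same result, a single simulated read
```

Both loops compute the same total; the only difference is how many times the input is fetched from the simulated storage.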
Spark's design follows the philosophy of "one software stack for different application scenarios," and it has gradually grown into a complete ecosystem. Beyond providing an in-memory computing framework, it supports ad-hoc SQL queries (Spark SQL), stream computing (Spark Streaming), machine learning (MLlib), and graph computing (GraphX). Spark can be deployed on the resource manager YARN to provide a one-stop big data solution. As a result, the Spark ecosystem supports batch processing, interactive queries, and stream data processing at the same time.
Advantages of Spark over Hadoop
Although Hadoop has become the de facto standard for big data technology, it still has many shortcomings. The most significant is that the latency of the MapReduce computing model is too high to meet the demands of real-time, fast computation, so it is only suitable for offline batch processing scenarios.
Spark draws on the strengths of MapReduce while effectively solving the problems MapReduce faces. Compared with MapReduce, Spark has the following main advantages.
(1) Spark's computing model also belongs to the MapReduce family, but it is not limited to Map and Reduce operations. It provides many other types of dataset operations, so its programming model is more flexible than MapReduce's.
(2) Spark provides in-memory computing: intermediate results are stored directly in memory, which makes iterative computation more efficient.
(3) Spark's DAG-based task scheduling and execution mechanism is superior to MapReduce's iterative execution mechanism.
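The first and third points can be illustrated with a small sketch in plain Python. This is a toy model under stated assumptions, not Spark's API: `LazyDataset` and all of its methods are hypothetical names made up for this example. Transformations are only recorded (here as a simple linear chain, the simplest possible DAG), and the work happens when an action is called. For brevity `reduce_by_key` is written as an action in this toy; in Spark itself, `reduceByKey` is a transformation.

```python
# Toy model only: plain Python, not Spark's API. All names are hypothetical.
from itertools import chain


class LazyDataset:
    """Records transformations as a lineage and runs them only on an action."""

    def __init__(self, source, lineage=()):
        self._source = source
        self._lineage = tuple(lineage)  # ordered (operation name, function) pairs

    def _extend(self, name, fn):
        return LazyDataset(self._source, self._lineage + ((name, fn),))

    # Transformations: nothing is computed here, the step is only recorded.
    def map(self, fn):
        return self._extend("map", fn)

    def filter(self, fn):
        return self._extend("filter", fn)

    def flat_map(self, fn):
        return self._extend("flat_map", fn)

    def plan(self):
        """Return the names of the recorded steps, in execution order."""
        return [name for name, _ in self._lineage]

    # Actions: these walk the lineage and actually produce a result.
    def collect(self):
        items = iter(self._source)
        for name, fn in self._lineage:
            if name == "map":
                items = map(fn, items)
            elif name == "filter":
                items = filter(fn, items)
            else:  # flat_map: each input item yields zero or more outputs
                items = chain.from_iterable(map(fn, items))
        return list(items)

    def reduce_by_key(self, fn):
        """Merge the values of (key, value) pairs that share a key."""
        merged = {}
        for key, value in self.collect():
            merged[key] = fn(merged[key], value) if key in merged else value
        return merged


lines = ["spark is fast", "spark is general", "hadoop is batch"]
words = (
    LazyDataset(lines)
    .flat_map(str.split)
    .filter(lambda word: word != "is")
    .map(lambda word: (word, 1))
)
print(words.plan())  # the recorded lineage; no data has been processed yet
counts = words.reduce_by_key(lambda a, b: a + b)
print(counts)
```

The word count uses `flat_map` and `filter` in addition to a map step and a reduce step, which is the kind of flexibility the first point describes. Because the whole chain is known before anything runs, a scheduler is free to plan the steps together instead of executing them one job at a time.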
That's all for today's sharing.
If you have your own thoughts on today's article,
please leave us a message.
See you tomorrow,
and have a wonderful day!
Text | Little H
Layout | Little H
Review | ls
References:
Text: Big Data Technology: Principles and Applications (《大数据技术原理与应用》)
Translation: Kimi.ai
This article was compiled and published by LearningYard Academy. If it infringes any rights, please leave us a message!
Source: LearningYard Academy