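Abstract: HBase is a sparse, multi-dimensional, sorted map. Each cell in the table is indexed by row key, column family, column qualifier, and timestamp. Each value is an uninterpreted string with no data type. Users store data in tables, where every row has a sortable row key and an arbitrary number of columns.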
Share interests, spread happiness, broaden horizons, and leave something beautiful behind!
Dear reader, this is LearningYard Academy.
Today we bring you the article
"Xiao H's Musings (12): Overview of the HBase Data Model"
Welcome, and thank you for visiting.
I. Mind Map
II. Intensive Reading
Overview of the Data Model
HBase is a sparse, multi-dimensional, sorted map. Each cell in the table is indexed by row key, column family, column qualifier, and timestamp. Each value is an uninterpreted string with no data type. Users store data in tables, where every row has a sortable row key and an arbitrary number of columns.
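The four-part index described above can be pictured as a sorted map from coordinates to values. The following is only a toy sketch in Python, not HBase code: the dictionary layout, the sample rows, and the helper name sort_key are assumptions made for illustration. It assumes newer timestamps sort first, which matches the idea that the most recent version is the one found first.

```python
# Toy model only: one dict entry per cell, keyed by
# (row key, column family, column qualifier, timestamp).
# All names and sample values here are hypothetical.
cells = {
    (b"row1", b"info", b"name", 2): b"Alice",
    (b"row1", b"info", b"name", 1): b"Alise",
    (b"row1", b"info", b"email", 1): b"alice@example.com",
    (b"row2", b"info", b"name", 1): b"Bob",
}


def sort_key(coords):
    """Order cells by row, family, qualifier, then newest timestamp first."""
    row, family, qualifier, timestamp = coords
    return (row, family, qualifier, -timestamp)


ordered = sorted(cells, key=sort_key)

for coords in ordered:
    print(coords, "->", cells[coords])
```

Every value is a plain byte string; nothing in the map records whether it holds text, a number, or anything else.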
Horizontally, a table is made up of one or more column families. A column family can contain any number of columns, and data in the same column family is stored together. Column families support dynamic expansion: a column family or column can be added easily, with no need to declare the number or types of columns in advance. All columns are stored as strings, so users must handle data type conversion themselves.
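Because values carry no type, the application decides how to encode and decode them. The sketch below uses Python's standard struct module to show one possible convention; the 8-byte big-endian encoding and the helper names encode_int and decode_int are assumptions for this example, not something HBase prescribes.

```python
import struct


def encode_int(number: int) -> bytes:
    """Encode an integer as 8 big-endian bytes (an assumed convention)."""
    return struct.pack(">q", number)


def decode_int(raw: bytes) -> int:
    """Reverse of encode_int; the caller must know the cell holds an integer."""
    return struct.unpack(">q", raw)[0]


# One row inside one column family: qualifier -> raw bytes.
row = {}
row[b"age"] = encode_int(30)
row[b"name"] = "Alice".encode("utf-8")
# A new column can appear at any time without being declared first.
row[b"score"] = struct.pack(">d", 91.5)

# The store only returns bytes; the reader applies the matching conversion.
age = decode_int(row[b"age"])
name = row[b"name"].decode("utf-8")
score = struct.unpack(">d", row[b"score"])[0]
print(age, name, score)
```

If the writer and the reader disagree on the encoding, the bytes are still returned without complaint, which is why a shared convention matters.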
Because each row in the same table can have a completely different set of columns, many columns hold no value in any given row when the table is viewed as a whole. This is why HBase is described as sparse.
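A small count makes the sparsity concrete. In this hypothetical three-row table (the row keys and column names are made up for illustration), only the cells that actually exist are stored, while a dense grid would need a slot for every row and column combination.

```python
# Hypothetical sample data: each row stores only the columns it actually has.
table = {
    "user1": {"info:name": "Alice", "info:email": "alice@example.com"},
    "user2": {"info:name": "Bob", "info:phone": "555-0100"},
    "user3": {"info:nickname": "cc"},
}

# Union of every column that appears in at least one row.
all_columns = sorted({column for row in table.values() for column in row})

stored_cells = sum(len(row) for row in table.values())
dense_cells = len(table) * len(all_columns)

print("columns seen:", all_columns)
print("cells stored:", stored_cells, "of", dense_cells, "in a dense grid")
```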
An update in HBase does not delete the old version of the data. Instead, it writes a new version and keeps the old ones. The number of versions to keep is configurable. A client can fetch the version closest to a given point in time, or fetch all versions at once.
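The update behavior can be sketched with a small class that keeps every version of one cell. This is an illustrative model, not the HBase client API: the class name VersionedCell and its methods are hypothetical, and "closest to a given time" is interpreted here as the newest version written at or before that time.

```python
class VersionedCell:
    """Toy model of a single cell that keeps every written version."""

    def __init__(self):
        self._versions = {}  # timestamp -> value

    def put(self, value, timestamp):
        # An update adds a version; older versions stay in place.
        self._versions[timestamp] = value

    def get(self, timestamp=None):
        """Return the latest version, or the newest one at or before timestamp."""
        candidates = [
            ts for ts in self._versions if timestamp is None or ts <= timestamp
        ]
        if not candidates:
            return None
        return self._versions[max(candidates)]

    def get_all(self):
        """Return every (timestamp, value) pair, newest first."""
        return sorted(self._versions.items(), reverse=True)


cell = VersionedCell()
cell.put(b"v1", 100)
cell.put(b"v2", 200)
cell.put(b"v3", 300)

print(cell.get())      # no timestamp: the most recent version
print(cell.get(250))   # newest version written at or before 250
print(cell.get_all())  # all versions at once
```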
If a query does not specify a timestamp, the most recent version is returned, because stored data is sorted by timestamp. HBase offers two ways to reclaim old versions: keep only the last n versions of the data, or keep only the versions written within a recent period (for example, the last 7 days).
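The two reclamation policies can be sketched as two small filter functions. The function names keep_last_n and keep_recent, the fixed value of now, and the sample versions are assumptions for this example; in HBase itself these policies are configured per column family (the VERSIONS and TTL settings) rather than written by hand.

```python
DAY = 24 * 60 * 60  # seconds in a day


def keep_last_n(versions, n):
    """Keep the n newest (timestamp, value) pairs, newest first."""
    newest_first = sorted(versions, key=lambda version: version[0], reverse=True)
    return newest_first[:n]


def keep_recent(versions, now, max_age):
    """Keep versions whose age is at most max_age seconds, newest first."""
    fresh = [version for version in versions if now - version[0] <= max_age]
    return sorted(fresh, key=lambda version: version[0], reverse=True)


now = 1_700_000_000  # fixed "current time" so the example is repeatable
versions = [
    (now - 10 * DAY, b"ten days old"),
    (now - 5 * DAY, b"five days old"),
    (now - 1 * DAY, b"one day old"),
]

print(keep_last_n(versions, 2))
print(keep_recent(versions, now, 7 * DAY))
```

With this sample data both policies happen to keep the same two versions; they diverge as soon as writes are more or less frequent than the time window implies.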
Data Model Concepts
1. Table
HBase organizes data in tables. A table consists of rows and columns, and the columns are grouped into column families.
2. Row Key
Every HBase table is made up of rows, and each row is identified by a row key. There are only three ways to access rows in a table: by a single row key, by a range of row keys, or by a full table scan. A row key can be any string (the maximum length is 64 KB; in practice it is usually 10 to 100 bytes). Internally, HBase stores row keys as byte arrays. Data is stored in lexicographic order of the row key. Row key design should take full advantage of this property so that rows that are often read together are stored next to one another.
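The three access paths and the effect of lexicographic ordering can be sketched with a sorted list of byte-string keys. This is a toy model, not the HBase client: the function names, the sample keys, and the convention that a range includes its start key but excludes its stop key are assumptions for this example.

```python
import bisect

# Hypothetical table: row key (bytes) -> row contents.
rows = {
    b"user#0010": {"name": "Carol"},
    b"order#0001": {"total": "12.50"},
    b"user#0001": {"name": "Alice"},
    b"user#0002": {"name": "Bob"},
}
sorted_keys = sorted(rows)  # bytes compare byte by byte (lexicographic order)


def get(row_key):
    """Access path 1: look up a single row key."""
    return rows.get(row_key)


def scan_range(start, stop):
    """Access path 2: keys in [start, stop), found by binary search."""
    low = bisect.bisect_left(sorted_keys, start)
    high = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[low:high]


def full_scan():
    """Access path 3: every key in stored order."""
    return list(sorted_keys)


print(get(b"user#0002"))
print(scan_range(b"user#", b"user$"))  # b"$" is the byte right after b"#"
print(full_scan())

# Lexicographic order is not numeric order, so zero-padding matters.
unpadded = sorted([b"row1", b"row2", b"row10"])
padded = sorted([b"row01", b"row02", b"row10"])
print(unpadded)
print(padded)
```

Giving related rows a shared prefix, as with the user# keys here, keeps them adjacent, so a single range scan can read them together.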
3. Column Family
An HBase table is grouped into a set of column families, and the column family is the basic unit of access control. Column families must be defined when the table is created, there should not be too many of them (shortcomings in HBase limit the practical number to a few dozen), and they should not be changed frequently. All data stored in one column family usually has the same data type, which generally means it compresses well. Every column in a table belongs to a column family. Data is stored under a column within a column family, but the column family must be created before any data can be written to its columns. Once the column family exists, columns within it can be used.
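The rule that a column family must exist before data is written to it, while the columns inside it need no declaration, can be sketched as follows. The Table class, its put method, and the "family:qualifier" column notation in this example are illustrative assumptions, not the HBase API.

```python
class Table:
    """Toy table: column families are fixed up front, columns are free-form."""

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)  # declared when the table is created
        self.rows = {}                 # row key -> {(family, qualifier): value}

    def put(self, row_key, column, value):
        family, _, qualifier = column.partition(":")
        if family not in self.families:
            # Writing to a family that was never created is an error.
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_key, {})[(family, qualifier)] = value


table = Table("users", families=["info"])

# Any qualifier inside an existing family can be used without declaring it.
table.put("u1", "info:name", b"Alice")
table.put("u1", "info:favorite_color", b"green")

try:
    table.put("u1", "stats:visits", b"1")
except KeyError as error:
    print("rejected:", error)
```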
That's all for today's sharing.
If you have your own thoughts on today's article,
please leave us a message.
See you tomorrow.
Have a happy day!
Copywriting | Xiao H
Layout | Xiao H
Review | ls
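References:
Text: 《大数据技术原理与应用》 (Principles and Applications of Big Data Technology)
Translation: Kimi.ai
This article was compiled and published by LearningYard Academy. If it infringes any rights, please leave us a message.
Source: LearningYard Academy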