1. TerarkDB 和 RocksDB 是什么关系?
TerarkDB 使用自己的算法和数据结构实现了一个 SSTable，获得了更好的随机读性能和更好的压缩率.
2. TerarkDB 是否有修改 RocksDB 的代码?
是的，TerarkDB 对 RocksDB 做了少量修改:
- 增加了在 TerarkDB SSTable 创建时的两边扫描支持，以减少磁盘使用(不影响现有的其他 SSTable).
- 增加了 TerarkDB 环境变量配置(这样用户可以在使用我们的
3. TerarkDB 是否兼容 RocksDB?
TerarkDB 没有修改任何 RocksDB 用户层的 API，从用户的角度 TerarkDB 和 RocksDB 100% 兼容，用户在替换为 TerarkDB 后甚至不需要重新编译应用。
4. TerarkDB 是否兼容 MongoDB 和 MySQL?
TerarkDB 通过 MongoRocks 和 MyRocks 兼容 MongoDB 和 MySQL。
5. Since the core algorithm of TerarkDB is proprietary, how to be compliant with MongoDB’s AGPL license?
- All Terark code related to MongoDB is open source (MongoDB itself and MongoRocks), these do not use any TerarkDB code or API.
- The core of TerarkDB is a plug-in for RocksDB, so it is not directly applied into MongoDB -- it is loaded as a dynamic library for librocksdb.so, and TerarkDB is compliant with RocksDB’s license.
6. How much memory is needed during compression?
- For key, the peak memory is about total_key_length * 1.3 during compression.
- For value, the memory usage is about input_size*18%, and never exceeds 12G during compression.
To give you an example, to compress 2TB of data, in which "keys" are 30GB:
- TerarkDB would need ~39GB of RAM to compress the keys (into index).
- TerarkDB would need ~12GB of RAM to compress the values.
- Key compressing and Value compressing can be in parallel if the memory is sufficient, and in serial if the memory is not sufficient.
7. Why use universal compaction, instead of level compaction?
Level compaction has larger write amplification, resulting slower compression speed and shorter SSD lifetime.
8. TerarkDB is faster in random read and smaller in compressed data size, but what is the trade-off?
The compression speed is slower, and use more memory during compression. This does not impact instant writes. SINGLE thread compression speed is about 15~50 MB/s, so multiple threads are used for compression.
9. What is the lock mechanism? Will TerarkDB lock the whole SSTable during the compression?
TerarkDB has nothing to do with lock, SSTable in RocksDB itself is read-only after compaction, and write-only during compaction. So we did not make any change on locking.
10. What happens if a huge SSTable needs to be compressed? (e.g. more than 5TB)
RocksDB’s SStable will not be that large.
TerarkDB supports data of huge size like that, but each SSTable is not necessarily that large, a single SSTable is hundreds of GB at most.
11. What is the max size of a single SSTable?
It depends, usually the max number of keys in an SSTable can not exceed 1.5 billions, and the total length of all keys can be much larger(practically 30GB should be safe).
For values, the total length of compressed values can not exceeds 128PB(practically this limit will never be hit), the number of values is the same as the number of keys.
People will unlikely to generate such large single SSTable and hit the limit, and TerarkDB will finish such large SSTable and create a new SSTable.
12. What is the random and sequential read speed?
It depends on the data set and memory limit.
On TPC-H lineitem data with unlimited memory (row len 615 bytes, key len 23 bytes, text field length 512 bytes), for a SINGLE thread:
- For key, ~10MB/s on Xeon 2630 v3, up to ~80MB/s on a different data set.
- For value, ~500 MB/s on Xeon 2630 v3, up to ~7GB/s on a different data set.
As a general rule, below showing the relative comparable magnitudes of read speed:
When Memory is slightly limited(all data fit in memory by TerarkDB, not by Other DBs)
DB Read mode Speed Description TerarkDB sequential 3000 all data exactly in memory
no page cache miss
CPU Cache & TLB miss is medium random 2000 CPU Cache & TLB miss is heavy Other DB sequential 10000 hot data always fit in cache random 10 no hot data, heavy cache miss
When Memory is extremely limited
|TerarkDB||sequential||2000||hot data always fit in cache|
|random||10||no hot data, heavy cache miss|
|Other DB||sequential||10000||hot data always fit in cache|
|random||1||no hot data, very heavy cache miss|
13. What is the compression (write) speed?
It depends on the data set.
On TPC-H lineitem data (row len 615 bytes, key len 23 bytes, text field length 512 bytes), for a SINGLE thread:
- For key, ~9MB/s on Xeon 2630 v3, up to ~90MB/s on a different data set.
- For value, ~40 MB/s on Xeon 2630 v3, up to ~120MB/s on a different data set.
14. What is the compression ratio?
It depends on the data set.
On wikipedia data, all english text of ~109G, is compressed into ~23G.
On TPC-H lineitem of ~550G, keys are ~22G, values are ~528G. Average row len is 615 bytes, in which key is 23 bytes, value is 592 bytes, in which the configurable text field is 512 bytes:
- All keys are compressed into ~5.3G.
- All values are compressed into ~24G. (So the SSTable size is 29.3G)
15. How does the length of key/value impact the compression ratio?
- For key, < 80 bytes will be best fit, and of course >80 Bytes still works.
- For value, between 50 bytes and 30KB will be best fit, and it works for other length too.