ClickHouse study notes
2021/5/5 18:28:38
introduction
https://www.youtube.com/watch?v=fGG9dApIhDU
features at a glance
- shared-nothing architecture
- column storage with vectorized query execution
- built-in sharding and replication
Further reading:
Replicas help with concurrency; shards add IOPS.
Shard a table across different nodes, and replicate each shard's data.
ZooKeeper maintains the shared state and handles leader election.
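The shard/replica split above can be illustrated with a minimal Python sketch. It is not ClickHouse's routing code: the cluster layout, the node names, and the helpers `shard_for`, `replicas_for`, and `pick_replica` are all hypothetical, and the CRC32 hash is just a convenient stable hash for the example.

```python
import zlib
from typing import Dict, List

# Hypothetical layout: 2 shards, each stored on 2 replicas.
CLUSTER: Dict[int, List[str]] = {
    0: ["node-a1", "node-a2"],
    1: ["node-b1", "node-b2"],
}

def shard_for(key: str, num_shards: int) -> int:
    """Map a sharding key to a shard using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_shards

def replicas_for(key: str) -> List[str]:
    """Every replica of the chosen shard holds a copy of the row."""
    return CLUSTER[shard_for(key, len(CLUSTER))]

def pick_replica(key: str, query_id: int) -> str:
    """Any replica can serve a read; round-robin spreads concurrent queries."""
    replicas = replicas_for(key)
    return replicas[query_id % len(replicas)]
```

Adding shards splits the data (and the disk I/O) across more nodes, while adding replicas gives more nodes that can answer reads for the same data.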
clickhouse code is optimized for speed
bottom-up design: algorithms determine interface
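ClickHouse's design is unusual: the implementation of the algorithms determines how the interfaces are defined, rather than the more common approach where usage (or usage habits) dictates the interface.
specialized algorithms for common operations, selected by:
The following four factors decide which algorithm is used to execute a given operation.
- Data type: 14 GROUP BY algorithms
- Data size: whether the data fits in memory
- Ordering: whether the data is already [partly] sorted or not
- Data distribution: e.g. using multi-armed bandits to optimize LZ4 decompression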
Further reading:
Introduction to Multi-Armed Bandits (PDF)
Multi-armed bandits are a simple but very powerful framework for algorithms that make decisions over time under uncertainty.
LZ4 (a very fast compression/decompression algorithm, but with a relatively poor compression ratio)
LZ4 is a lossless compression algorithm that provides compression speeds above 500 MB/s per core and scales with multi-core CPUs.
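To make the bandit idea concrete, here is a small Python sketch that picks the cheapest of several implementation variants by trying them and tracking their observed cost. It uses epsilon-greedy, a simple textbook strategy chosen for illustration; the notes do not say which bandit strategy ClickHouse uses. The class `EpsilonGreedy`, the function `simulate`, and the fixed per-variant costs are all hypothetical.

```python
import random
from typing import List

class EpsilonGreedy:
    """Choose the arm (variant) with the lowest observed mean cost,
    exploring a random arm with probability epsilon."""

    def __init__(self, num_arms: int, epsilon: float = 0.1, seed: int = 0):
        self.epsilon = epsilon
        self.counts = [0] * num_arms
        self.mean_cost = [0.0] * num_arms
        self.rng = random.Random(seed)

    def choose(self) -> int:
        # Try every arm once before trusting the averages.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return min(range(len(self.counts)), key=lambda a: self.mean_cost[a])

    def update(self, arm: int, cost: float) -> None:
        # Incremental update of the running mean cost for this arm.
        self.counts[arm] += 1
        self.mean_cost[arm] += (cost - self.mean_cost[arm]) / self.counts[arm]

def simulate(costs: List[float], rounds: int) -> EpsilonGreedy:
    """Run the bandit against fixed, made-up per-variant costs."""
    bandit = EpsilonGreedy(num_arms=len(costs))
    for _ in range(rounds):
        arm = bandit.choose()
        bandit.update(arm, costs[arm])
    return bandit
```

In a real system the cost would be a measured running time for each block of data, so the choice can adapt to the data distribution instead of being fixed at compile time.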
vectorized query execution
- SIMD (SSE 4.2+)
- efficient dispatch on all available cores
Further reading:
CMU course: Vectorized Query Execution
Vectorized query execution batches multiple rows together in a columnar format, and each operator uses simple loops to iterate over the data within a batch. This greatly reduces CPU usage for reads, writes, and query operations such as scanning and filtering.
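The batch-at-a-time control flow described above can be sketched in Python as follows. This only shows the shape of the execution model (operators pulling batches of column values and running simple loops over them); plain Python gets none of the SIMD benefit, and the operator names `scan`, `filter_gt`, and `sum_column` are made up for this example.

```python
from typing import Dict, Iterable, Iterator, List

Batch = Dict[str, List[int]]  # column name -> values for one batch of rows

def scan(table: Batch, batch_size: int) -> Iterator[Batch]:
    """Yield the table as consecutive column-wise batches."""
    num_rows = len(next(iter(table.values())))
    for start in range(0, num_rows, batch_size):
        yield {name: col[start:start + batch_size] for name, col in table.items()}

def filter_gt(batches: Iterable[Batch], column: str, threshold: int) -> Iterator[Batch]:
    """Keep rows where column > threshold, one batch at a time."""
    for batch in batches:
        # Simple loop over a single column produces a selection mask.
        keep = [value > threshold for value in batch[column]]
        yield {
            name: [v for v, k in zip(col, keep) if k]
            for name, col in batch.items()
        }

def sum_column(batches: Iterable[Batch], column: str) -> int:
    """Aggregate one column across all batches."""
    total = 0
    for batch in batches:
        total += sum(batch[column])
    return total
```

For example, `sum_column(filter_gt(scan(table, 2), "price", 6), "qty")` evaluates the equivalent of `SELECT sum(qty) WHERE price > 6` with per-operator work amortized over each batch rather than paid per row.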
how do distributed queries work?
The application connects to one ClickHouse node (the initiator). That node dispatches subselects to the other nodes, and each node computes intermediate aggregate states locally. The initiator then merges these states into the final aggregation and returns the result to the application.
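A minimal Python sketch of this two-phase aggregation, using an average as the example: each shard returns a partial state (sum, count) instead of a finished average, and the initiator merges the states before finalizing. The function names are hypothetical; ClickHouse exposes a similar idea through its `-State` and `-Merge` aggregate function combinators, but this sketch does not model their actual implementation.

```python
from typing import Iterable, List, Tuple

AvgState = Tuple[float, int]  # (running sum, row count)

def local_avg_state(values: List[float]) -> AvgState:
    """Runs on each shard: compute an intermediate state, not the final answer."""
    return (sum(values), len(values))

def merge_states(states: Iterable[AvgState]) -> AvgState:
    """Runs on the initiator: combine the states received from all shards."""
    total, count = 0.0, 0
    for shard_sum, shard_count in states:
        total += shard_sum
        count += shard_count
    return (total, count)

def finalize(state: AvgState) -> float:
    """Turn the merged state into the value returned to the application."""
    total, count = state
    return total / count

# Usage: three shards holding different slices of the same column.
shards = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
result = finalize(merge_states(local_avg_state(s) for s in shards))
```

Shipping states rather than finished values matters because averaging the per-shard averages would give the wrong answer whenever shards hold different numbers of rows.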
Other
- https://en.wikipedia.org/wiki/Materialized_view
- Vectorization vs. Compilation in Query Execution (paper)
- TPC-DS
TPC-DS is an enterprise-class benchmark, published and maintained by the Transaction Processing Performance Council (TPC), to measure the performance of decision support systems running on SQL-based big data systems.
- ClickHouse SQL syntax https://clickhouse.tech/docs/zh/sql-reference/syntax/
- Architecture overview https://clickhouse.tech/docs/zh/development/architecture/