
Data Lakehouse Query Engine

Dremio is the only engine built from the ground up to deliver high-performing BI dashboards and interactive analytics directly on data lake storage.

The World's Fastest Lakehouse Engine

With query acceleration technologies like data reflections and columnar cloud cache (C3), we make it possible to achieve interactive response times directly on data lake storage, without having to copy the data into warehouses, marts, extracts, or cubes.

C3: Columnar Cloud Cache

Columnar Cloud Cache (C3) enables Dremio to achieve NVMe-level I/O performance on S3/ADLS/GCS by leveraging the NVMe/SSD built into cloud compute instances, such as Amazon EC2 and Azure Virtual Machines.

C3 only caches data required to satisfy your workloads and can even cache individual microblocks within datasets. If your table has 1,000 columns and you only query a subset of those columns and filter for data within a certain timeframe, then C3 will cache just that portion of your table.
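
To make the selective caching idea concrete, here is a minimal sketch. It is not Dremio's implementation: the names `BlockCache`, `BlockKey`, and `fetch_from_object_store` are hypothetical, and the sketch assumes blocks are addressed by (file, column, block index) and evicted least-recently-used.

```python
from collections import OrderedDict
from typing import Callable, Tuple

# Hypothetical cache key: (file path, column name, block index).
BlockKey = Tuple[str, str, int]


class BlockCache:
    """Toy LRU cache that stores only the column blocks queries actually read."""

    def __init__(self, capacity: int, fetch: Callable[[BlockKey], bytes]):
        self.capacity = capacity
        self.fetch = fetch  # stand-in for a remote object-store read
        self.blocks: "OrderedDict[BlockKey, bytes]" = OrderedDict()
        self.remote_reads = 0

    def read(self, key: BlockKey) -> bytes:
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
            return self.blocks[key]
        data = self.fetch(key)  # cache miss: read from remote storage
        self.remote_reads += 1
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data


def fetch_from_object_store(key: BlockKey) -> bytes:
    # Placeholder for an S3/ADLS/GCS range read (assumption for illustration).
    path, column, block = key
    return f"{path}:{column}:{block}".encode()


cache = BlockCache(capacity=8, fetch=fetch_from_object_store)

# A query that touches 2 columns and the 2 blocks covering its time range.
wanted = [("sales.parquet", col, blk) for col in ("amount", "ts") for blk in (3, 4)]
for key in wanted:
    cache.read(key)
for key in wanted:  # repeating the query is served entirely from the cache
    cache.read(key)
```

In this toy run only the four blocks the query needs are ever fetched or stored; the rest of the table never enters the cache.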

By selectively caching data, C3 also eliminates over 90% of S3/ADLS/GCS I/O costs, which can make up 10-15% of the costs for each query you run.
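
As a quick sanity check on what those two figures imply together, the arithmetic can be sketched as follows. The function name `total_savings` is made up for illustration, and 90% is used as a lower bound since the text says "over 90%".

```python
def total_savings(io_share: float, io_reduction: float = 0.90) -> float:
    """Fraction of total per-query cost saved when I/O cost drops by io_reduction."""
    return io_share * io_reduction


low = total_savings(0.10)   # I/O is 10% of the query cost
high = total_savings(0.15)  # I/O is 15% of the query cost
print(f"{low:.1%} to {high:.1%} of per-query cost")
```

Under these assumptions the saving works out to roughly 9% to 13.5% of the total cost of each query.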

Data Reflections

Data Reflections are data structures that intelligently precompute aggregations and other operations on data, so you don’t have to do complex aggregations and drilldowns on the fly.

Reflections are completely transparent to end users. Instead of connecting to a specific materialization, users query the desired tables and views and the Dremio optimizer picks the best Reflections to satisfy and accelerate the query.
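
The transparent substitution described above can be sketched roughly as follows. This is an illustrative model, not Dremio's optimizer: `Reflection`, `AggQuery`, `covers`, and `pick_reflection` are hypothetical names, and the sketch assumes an aggregation reflection can answer a query when it contains every grouping column and measure the query needs, with row count as a stand-in for cost.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Reflection:
    name: str
    dataset: str
    dimensions: FrozenSet[str]
    measures: FrozenSet[str]
    row_count: int  # assumed proxy for how expensive the reflection is to scan


@dataclass(frozen=True)
class AggQuery:
    dataset: str
    group_by: FrozenSet[str]
    measures: FrozenSet[str]


def covers(r: Reflection, q: AggQuery) -> bool:
    """A reflection can answer the query if it holds every column the query needs."""
    return (
        r.dataset == q.dataset
        and q.group_by <= r.dimensions
        and q.measures <= r.measures
    )


def pick_reflection(q: AggQuery, reflections: List[Reflection]) -> Optional[Reflection]:
    """Return the cheapest covering reflection, or None to fall back to the base table."""
    candidates = [r for r in reflections if covers(r, q)]
    return min(candidates, key=lambda r: r.row_count, default=None)


reflections = [
    Reflection("by_day_store", "sales", frozenset({"day", "store"}), frozenset({"amount"}), 50_000),
    Reflection("by_day", "sales", frozenset({"day"}), frozenset({"amount"}), 365),
]

# The user queries the "sales" table; the reflection is chosen behind the scenes.
daily = pick_reflection(AggQuery("sales", frozenset({"day"}), frozenset({"amount"})), reflections)
per_store = pick_reflection(AggQuery("sales", frozenset({"store"}), frozenset({"amount"})), reflections)
by_product = pick_reflection(AggQuery("sales", frozenset({"product"}), frozenset({"amount"})), reflections)
```

The user never names a reflection in any of the three queries; the last one simply finds no match and would run against the base table.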

Aside from simplicity for data analysts, Reflections are also incredibly easy to create and maintain! You can use a UI or REST API to administer Reflections, instead of having to write complicated SQL statements to define materialized views and refresh rules.

Cost-Based Optimizer

Query engines can choose from multiple strategies to execute any query you submit. Picking the right strategy is crucial: the wrong join algorithm could grind your query to a halt!

Dremio’s cost-based optimizer picks the fastest path to complete your query by understanding deep statistics about the data you want to query, including location, cardinality, and distribution. It uses those statistics to accurately predict how much data will flow through the query’s operators so that it can choose the best plan. It also takes into account the Reflections in the system, and rewrites the query plan to use them.
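
A heavily simplified sketch of cost-based join selection is shown below. The cost formulas, the `TableStats` type, and the `plan_join` function are assumptions made for illustration; a real optimizer also weighs data location, value distribution, and available Reflections, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TableStats:
    name: str
    rows: int  # estimated cardinality taken from table statistics


def plan_join(left: TableStats, right: TableStats, nodes: int) -> Tuple[str, int]:
    """Pick the join strategy with the lowest estimated number of rows moved."""
    small = min(left, right, key=lambda t: t.rows)
    costs = {
        # Broadcast: copy the smaller side to every node.
        "broadcast": small.rows * nodes,
        # Shuffle: repartition both sides by the join key.
        "shuffle": left.rows + right.rows,
    }
    strategy = min(costs, key=costs.get)
    return strategy, costs[strategy]


star_join = plan_join(TableStats("orders", 1_000_000_000), TableStats("stores", 500), nodes=10)
big_join = plan_join(TableStats("orders", 1_000_000_000), TableStats("clicks", 800_000_000), nodes=10)
```

With a tiny dimension table the model prefers broadcasting it; with two large tables, broadcasting would move far more rows than a shuffle, so the shuffle plan wins.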

Fine-Grained Pruning

Runtime filtering enables Dremio to dynamically apply filters derived from the smaller table in a join to the larger table, so that fewer rows have to be read from the larger table. Dremio automatically applies these filters on joins without any user involvement and provides up to 100x improved performance when working with traditional star or snowflake schemas.
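
The mechanism can be illustrated with a small sketch over in-memory rows. The function `join_with_runtime_filter` and the sample tables are hypothetical; the sketch uses an exact set of join keys as the filter, whereas engines commonly use more compact structures such as Bloom filters or min/max ranges.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def join_with_runtime_filter(
    fact_rows: List[Row],
    dim_rows: List[Row],
    key: str,
    dim_predicate: Callable[[Row], bool],
) -> Tuple[List[Row], int]:
    """Join fact to dimension rows, pruning the fact scan with the dimension's keys."""
    # Build side: filter the small table first and index it by join key.
    build = {row[key]: row for row in dim_rows if dim_predicate(row)}
    # Runtime filter: the set of keys that can possibly find a join partner.
    runtime_filter = set(build)
    # Probe side: skip fact rows whose key cannot match, before joining.
    scanned = [row for row in fact_rows if row[key] in runtime_filter]
    joined = [{**row, **build[row[key]]} for row in scanned]
    return joined, len(scanned)


stores = [
    {"store_id": 1, "region": "EU"},
    {"store_id": 2, "region": "US"},
    {"store_id": 3, "region": "EU"},
]
sales = [{"store_id": i % 3 + 1, "amount": i} for i in range(9)]

joined, rows_scanned = join_with_runtime_filter(
    sales, stores, "store_id", lambda s: s["region"] == "EU"
)
```

Here the filter on the small `stores` table prunes a third of the `sales` rows before the join ever sees them; in a star schema with a selective dimension filter the pruned share can be much larger.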

Apache Arrow Gandiva

Dremio is a columnar engine powered by Apache Arrow, the open source standard for columnar, in-memory computing (co-created by Dremio!).

Dremio leverages Gandiva, an LLVM-based library for runtime code generation, to create machine code that efficiently evaluates arbitrary expressions on batches of columnar Arrow data, rather than executing row by row.

Gandiva maximizes CPU utilization and leverages optimizations like vectorized processing and SIMD execution to make your queries fly!
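
Python cannot demonstrate LLVM code generation or SIMD, so the sketch below only illustrates the underlying idea: translate an expression once, then run the result over whole column batches instead of re-interpreting the expression for every row. The expression encoding and the names `eval_row` and `compile_expr` are assumptions for illustration and are not Gandiva's API.

```python
import operator
from typing import Callable, Dict, List, Tuple, Union

# Expression: a column name, a literal, or a tuple (op, lhs, rhs).
Expr = Union[str, float, Tuple[str, "Expr", "Expr"]]
Batch = Dict[str, List[float]]
OPS = {"+": operator.add, "*": operator.mul, ">": operator.gt}


def eval_row(expr: Expr, row: Dict[str, float]):
    """Row-based execution: walk the expression tree again for every row."""
    if isinstance(expr, str):
        return row[expr]
    if isinstance(expr, tuple):
        op, lhs, rhs = expr
        return OPS[op](eval_row(lhs, row), eval_row(rhs, row))
    return expr


def compile_expr(expr: Expr) -> Callable[[Batch, int], List]:
    """Walk the tree once and return a function that evaluates a whole batch."""
    if isinstance(expr, str):
        return lambda batch, n: batch[expr]
    if isinstance(expr, tuple):
        op, lhs, rhs = expr
        fn, left, right = OPS[op], compile_expr(lhs), compile_expr(rhs)
        return lambda batch, n: list(map(fn, left(batch, n), right(batch, n)))
    return lambda batch, n: [expr] * n


expr = (">", ("*", "price", "qty"), 100)
batch = {"price": [10.0, 30.0, 5.0], "qty": [5, 4, 30]}

kernel = compile_expr(expr)  # the tree is translated once
columnar = kernel(batch, 3)  # then applied to entire columns

rows = [dict(zip(batch, values)) for values in zip(*batch.values())]
row_based = [eval_row(expr, row) for row in rows]
```

Both paths produce the same answers; the difference is that the compiled kernel pays the cost of interpreting the expression once per query rather than once per row.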

Apache Arrow Flight

Apache Arrow is Dremio’s internal memory format, and it’s also the standard for Python and R developers with over 20 million downloads per month. Arrow Flight is a modern, open source RPC framework that was co-created by Dremio to enable ultra-fast data transfer between Arrow-enabled systems.

Flight eliminates serialization and deserialization, supports parallelism, and avoids the need for proprietary client-side drivers. The result: 20-100x faster access to query results compared to traditional JDBC and ODBC interfaces.

Multi-Engine Architecture and Workload Management

Dremio has a multi-engine architecture, so you can create multiple right-sized, physically isolated engines for various workloads in your organization. You can easily set up workload management rules to route queries to the engines you define, so you’ll never have to worry again about complex data science workloads preventing an executive’s dashboard from loading.
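
Rule-based routing of this kind can be sketched as an ordered list of rules where the first match wins. The `Query` attributes, the `Rule` type, the `route` function, and the engine names are all hypothetical; Dremio's actual rule syntax and options are not shown here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Query:
    user_group: str
    kind: str  # e.g. "dashboard", "adhoc", "ml"


@dataclass(frozen=True)
class Rule:
    matches: Callable[[Query], bool]
    engine: str


def route(query: Query, rules: List[Rule], default: str = "default-engine") -> str:
    """Return the engine of the first matching rule, or the default engine."""
    for rule in rules:
        if rule.matches(query):
            return rule.engine
    return default


rules = [
    Rule(lambda q: q.user_group == "executives", "dashboards-small"),
    Rule(lambda q: q.kind == "ml", "data-science-large"),
]

exec_engine = route(Query("executives", "dashboard"), rules)
ml_engine = route(Query("data-science", "ml"), rules)
other_engine = route(Query("analysts", "adhoc"), rules)
```

Because each engine is physically isolated, a heavy query routed to the data science engine in this model cannot take resources from the engine serving dashboards.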

In addition to eliminating resource contention, engines can quickly resize to tackle workloads of any concurrency and throughput, and automatically stop when no queries are running.

0 noisy neighbors, 100% resource control, 60% lower compute costs.

Ready to get started? Here are some resources to help you

...

Ebook

High-Performance BI on the Data Lake

3 Steps for Making High-Performance BI Work Directly with Cloud Data Lake Storage

...

White Paper

The New Data Tier

Learn how the new data tier brings data warehousing capabilities to the data lake and enables net-new capabilities that data warehouses cannot provide.

...

Webinar

The Next-Generation Cloud Data Architecture

David Loshin from TDWI helps you prepare for the next-generation cloud data architecture and discusses steps to take best advantage of this modernized environment.

See all resources