Testing Hive's count(distinct id) -- use with caution
Views: 4933
Published: 2019-06-11


1. Test statements

Statement 1:

select count(distinct order_id), count(1)
from d_common_wlt_info


Statement 2:

select count(order_id), count(one_num)
from
(
  select order_id, count(1) one_num
  from d_common_wlt_info
  group by order_id
) t
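Before running either query, you can compare how Hive plans them with EXPLAIN. This is a minimal sketch using the same table as the test; the expectation that statement 1 compiles to one MapReduce job and statement 2 to two jobs is borne out by the execution logs in the next section.

```sql
-- Inspect the physical plan of each statement without executing it.
explain
select count(distinct order_id), count(1)
from d_common_wlt_info;

explain
select count(order_id), count(one_num)
from
(
  select order_id, count(1) one_num
  from d_common_wlt_info
  group by order_id
) t;
```

Looking at the number of stages and reducers in the plan is usually cheaper than timing full runs when you only want to know how the work will be distributed.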

2. Execution logs and table information

Table size information

Partition Parameters:
	COLUMN_STATS_ACCURATE	true
	numFiles             	9
	numRows              	28176219
	rawDataSize          	8785300782
	totalSize            	1820024671
	transient_lastDdlTime	1551631895
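For reference, partition-level statistics like these can be read with DESCRIBE FORMATTED. A minimal sketch follows; the post does not name the partition key, so the `dt` column and its value here are hypothetical placeholders.

```sql
-- Prints the "Partition Parameters" block shown above
-- (numFiles, numRows, rawDataSize, totalSize, ...).
-- NOTE: partition spec (dt='2019-03-04') is a made-up example.
describe formatted d_common_wlt_info partition (dt='2019-03-04');
```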

(1) Statement 1 log

Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
2019-03-04 13:10:24,719 Stage-1 map = 0%,  reduce = 0%
2019-03-04 13:10:39,289 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 23.87 sec
2019-03-04 13:10:58,984 Stage-1 map = 58%,  reduce = 0%, Cumulative CPU 76.04 sec
2019-03-04 13:11:39,220 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 191.24 sec
2019-03-04 13:11:50,532 Stage-1 map = 78%,  reduce = 0%, Cumulative CPU 212.97 sec
2019-03-04 13:11:56,726 Stage-1 map = 89%,  reduce = 0%, Cumulative CPU 228.88 sec
2019-03-04 13:12:06,036 Stage-1 map = 91%,  reduce = 0%, Cumulative CPU 247.36 sec
2019-03-04 13:12:09,118 Stage-1 map = 98%,  reduce = 0%, Cumulative CPU 250.66 sec
2019-03-04 13:12:12,205 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 253.8 sec
2019-03-04 13:12:34,851 Stage-1 map = 100%,  reduce = 68%, Cumulative CPU 274.25 sec
2019-03-04 13:12:37,942 Stage-1 map = 100%,  reduce = 70%, Cumulative CPU 277.76 sec
2019-03-04 13:12:41,023 Stage-1 map = 100%,  reduce = 73%, Cumulative CPU 280.93 sec
2019-03-04 13:12:44,103 Stage-1 map = 100%,  reduce = 75%, Cumulative CPU 284.14 sec
2019-03-04 13:12:47,185 Stage-1 map = 100%,  reduce = 77%, Cumulative CPU 287.42 sec
2019-03-04 13:12:50,267 Stage-1 map = 100%,  reduce = 80%, Cumulative CPU 290.65 sec
2019-03-04 13:12:53,353 Stage-1 map = 100%,  reduce = 82%, Cumulative CPU 293.78 sec
2019-03-04 13:12:56,432 Stage-1 map = 100%,  reduce = 84%, Cumulative CPU 296.95 sec
2019-03-04 13:12:59,547 Stage-1 map = 100%,  reduce = 86%, Cumulative CPU 300.13 sec
2019-03-04 13:13:02,639 Stage-1 map = 100%,  reduce = 88%, Cumulative CPU 303.33 sec
2019-03-04 13:13:05,728 Stage-1 map = 100%,  reduce = 91%, Cumulative CPU 306.51 sec
2019-03-04 13:13:08,837 Stage-1 map = 100%,  reduce = 93%, Cumulative CPU 309.67 sec
2019-03-04 13:13:11,917 Stage-1 map = 100%,  reduce = 95%, Cumulative CPU 312.88 sec
2019-03-04 13:13:14,996 Stage-1 map = 100%,  reduce = 97%, Cumulative CPU 315.99 sec
2019-03-04 13:13:17,052 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 318.29 sec
MapReduce Total cumulative CPU time: 5 minutes 18 seconds 290 msec
Ended Job = job_1546585330012_16354054
MapReduce Jobs Launched:
Stage-Stage-1: Map: 3  Reduce: 1   Cumulative CPU: 318.29 sec   HDFS Read: 805206007 HDFS Write: 68 SUCCESS
Total MapReduce CPU Time Spent: 5 minutes 18 seconds 290 msec
OK
21205531	27866943
Time taken: 218.373 seconds, Fetched: 1 row(s)

(2) Statement 2 log

Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
2019-03-04 13:12:08,331 Stage-1 map = 0%,  reduce = 0%
2019-03-04 13:12:19,715 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 6.48 sec
2019-03-04 13:12:42,875 Stage-1 map = 46%,  reduce = 0%, Cumulative CPU 69.34 sec
2019-03-04 13:12:43,906 Stage-1 map = 58%,  reduce = 0%, Cumulative CPU 72.54 sec
2019-03-04 13:13:16,847 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 162.12 sec
2019-03-04 13:13:19,930 Stage-1 map = 79%,  reduce = 0%, Cumulative CPU 168.67 sec
2019-03-04 13:13:23,060 Stage-1 map = 89%,  reduce = 0%, Cumulative CPU 174.94 sec
2019-03-04 13:13:32,311 Stage-1 map = 97%,  reduce = 0%, Cumulative CPU 192.25 sec
2019-03-04 13:13:35,397 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 195.41 sec
2019-03-04 13:13:54,977 Stage-1 map = 100%,  reduce = 22%, Cumulative CPU 204.97 sec
2019-03-04 13:13:58,058 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 210.04 sec
2019-03-04 13:14:01,135 Stage-1 map = 100%,  reduce = 69%, Cumulative CPU 213.98 sec
2019-03-04 13:14:04,236 Stage-1 map = 100%,  reduce = 71%, Cumulative CPU 217.42 sec
2019-03-04 13:14:07,345 Stage-1 map = 100%,  reduce = 74%, Cumulative CPU 220.64 sec
2019-03-04 13:14:10,432 Stage-1 map = 100%,  reduce = 77%, Cumulative CPU 223.88 sec
2019-03-04 13:14:13,518 Stage-1 map = 100%,  reduce = 79%, Cumulative CPU 227.09 sec
2019-03-04 13:14:16,602 Stage-1 map = 100%,  reduce = 81%, Cumulative CPU 230.22 sec
2019-03-04 13:14:19,678 Stage-1 map = 100%,  reduce = 83%, Cumulative CPU 233.48 sec
2019-03-04 13:14:22,759 Stage-1 map = 100%,  reduce = 85%, Cumulative CPU 236.69 sec
2019-03-04 13:14:25,841 Stage-1 map = 100%,  reduce = 87%, Cumulative CPU 239.9 sec
2019-03-04 13:14:28,920 Stage-1 map = 100%,  reduce = 89%, Cumulative CPU 242.94 sec
2019-03-04 13:14:32,011 Stage-1 map = 100%,  reduce = 91%, Cumulative CPU 246.05 sec
2019-03-04 13:14:35,103 Stage-1 map = 100%,  reduce = 93%, Cumulative CPU 249.11 sec
2019-03-04 13:14:38,181 Stage-1 map = 100%,  reduce = 95%, Cumulative CPU 252.12 sec
2019-03-04 13:14:41,261 Stage-1 map = 100%,  reduce = 97%, Cumulative CPU 255.26 sec
2019-03-04 13:14:43,312 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 256.88 sec
MapReduce Total cumulative CPU time: 4 minutes 16 seconds 880 msec
Ended Job = job_1546585330012_16354401
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1546585330012_16354794, Tracking URL = http://tjtx-81-187.58os.org:9088/proxy/application_1546585330012_16354794/
Kill Command = /usr/lib/software/hadoop/bin/hadoop job -kill job_1546585330012_16354794
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2019-03-04 13:15:10,988 Stage-2 map = 0%, reduce = 0%
2019-03-04 13:15:21,338 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.15 sec
2019-03-04 13:15:30,634 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.56 sec
MapReduce Total cumulative CPU time: 4 seconds 560 msec
Ended Job = job_1546585330012_16354794
MapReduce Jobs Launched:
Stage-Stage-1: Map: 3 Reduce: 1 Cumulative CPU: 256.88 sec HDFS Read: 805205212 HDFS Write: 123 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 4.56 sec HDFS Read: 5330 HDFS Write: 68 SUCCESS
Total MapReduce CPU Time Spent: 4 minutes 21 seconds 440 msec
OK
21205531	21205532
Time taken: 225.271 seconds, Fetched: 1 row(s)

3. Summary

On the test cluster, at a scale of roughly 20 million rows, the two statements show little difference in efficiency. (Note that the two results differ by one, 21205531 vs. 21205532: count(distinct order_id) ignores NULL values, while the GROUP BY subquery still emits a group for the NULL key, which count(one_num) then counts.)

Why "use with caution"? Because Hive's distinct uses a HashSet under the hood to deduplicate, and, as both logs above show, the distinct aggregation runs in a single reducer; at much larger cardinalities that one in-memory set can put heavy memory pressure on a single task.

Since the efficiency is about the same anyway, it is best to avoid count(distinct) when you can.
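If a distinct count is unavoidable at larger scale, one option worth trying is to let Hive split the aggregation into two jobs itself. This is a sketch using standard Hive settings, not something from the original test, and whether it helps depends on the Hive version and data distribution:

```sql
-- Enable map-side partial aggregation.
set hive.map.aggr = true;
-- Ask Hive to compile skew-prone GROUP BY / distinct aggregations
-- into a two-stage plan, spreading the first stage across reducers.
set hive.groupby.skewindata = true;

select count(distinct order_id)
from d_common_wlt_info;
```

The effect is similar in spirit to the manual GROUP BY rewrite in statement 2: the heavy deduplication work is distributed first, and only a small pre-aggregated result reaches the final reducer.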

--

Reposted from: https://www.cnblogs.com/dobbin2018/p/10470172.html
