Custom aggregate function (top three goods IDs by trigger count for each behavior type)
2022/9/5 23:54:09
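This post finds, for each behavior type, the three goods IDs that trigger that behavior most often. It uses Spark SQL in three steps: count occurrences per (behavior, goodsId), rank the goods within each behavior by that count with row_number(), and keep ranks 1 to 3.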
```scala
package SparkSQL.fun.project

import org.apache.spark.SparkConf
import org.apache.spark.sql.{Dataset, SparkSession}

// One row of the input CSV. The original post does not show this class; this definition is
// assumed from the getInt/getString calls below. It must be top-level so that
// session.implicits._ can derive an Encoder for it.
case class UserBehaviorBean(userId: Int, goodsId: Int, categoryId: Int, behavior: String, time: Int)

/**
 * For each behavior type, find the three goods IDs with the highest trigger count.
 */
object BehaviorCode2 {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("project01").setMaster("local[*]")
    val session = SparkSession.builder().config(sparkConf).getOrCreate()

    // No header row: drop rows that fail to parse and let Spark infer column types.
    val map = Map("mode" -> "dropMalformed", "inferSchema" -> "true")
    val frame = session.read.options(map)
      .csv("G:\\shixunworkspace\\sparkcode\\src\\main\\java\\SparkSQL\\fun\\project\\b.csv")

    // Columns: "userId", "goodsId", "categoryId", "behavior", "time"
    import session.implicits._
    val frame1: Dataset[UserBehaviorBean] = frame.map(row =>
      UserBehaviorBean(row.getInt(0), row.getInt(1), row.getInt(2), row.getString(3), row.getInt(4)))
    val frame3 = frame1.toDF("userId", "goodsId", "categoryId", "behavior", "time")
    frame3.createTempView("tmp")

    // Step 1: number of times each (behavior, goodsId) pair occurs.
    val frame2 = session.sql(
      "select behavior, goodsId, count(*) as cnt from tmp group by behavior, goodsId")
    frame2.show()
    frame2.createTempView("tmp1")

    // Step 2: rank goods within each behavior, highest count first.
    // Partition by behavior only: partitioning by (behavior, goodsId) would put each row in its
    // own partition, so every rn would be 1. goodsId breaks ties so row_number() is deterministic.
    val frame4 = session.sql(
      "select behavior, goodsId, cnt, " +
        "row_number() over (partition by behavior order by cnt desc, goodsId) as rn from tmp1")
    frame4.show()
    frame4.createTempView("temp2")

    // Step 3: keep the top three goods per behavior.
    val frame5 = session.sql("select behavior, goodsId, cnt, rn from temp2 where rn <= 3")
    frame5.show()

    session.stop()
  }
}
```
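To check the expected result on small data, the same group, rank and filter logic can be sketched on an in-memory collection in plain Scala, without Spark. The record type `BehaviorRecord`, the object `TopGoodsSketch` and the sample rows are hypothetical; breaking ties on count by goodsId is an assumption, matching a deterministic `order by cnt desc, goodsId`.

```scala
// Hypothetical in-memory record with the same five columns as the CSV.
final case class BehaviorRecord(userId: Int, goodsId: Int, categoryId: Int, behavior: String, time: Int)

object TopGoodsSketch {
  // For each behavior, return (goodsId, count) for the n most frequent goods, highest count first.
  // Ties on count are broken by goodsId (an assumption) so the output is deterministic.
  def topGoodsPerBehavior(rows: Seq[BehaviorRecord], n: Int = 3): Map[String, Seq[(Int, Int)]] =
    rows
      .groupBy(_.behavior) // plays the role of PARTITION BY behavior
      .map { case (behavior, group) =>
        // GROUP BY behavior, goodsId with count(*)
        val counts = group.groupBy(_.goodsId).map { case (g, rs) => (g, rs.size) }
        // ORDER BY count DESC, goodsId, then keep rn <= n
        val top = counts.toSeq.sortBy { case (g, c) => (-c, g) }.take(n)
        behavior -> top
      }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      BehaviorRecord(1, 10, 100, "pv", 0),
      BehaviorRecord(2, 10, 100, "pv", 0),
      BehaviorRecord(3, 10, 100, "pv", 0),
      BehaviorRecord(1, 20, 100, "pv", 0),
      BehaviorRecord(2, 20, 100, "pv", 0),
      BehaviorRecord(1, 30, 100, "pv", 0),
      BehaviorRecord(1, 40, 100, "pv", 0),
      BehaviorRecord(1, 10, 100, "buy", 0)
    )
    println(topGoodsPerBehavior(rows))
  }
}
```

Here goods 10, 20 and 30 come out on top for "pv"; goods 30 and 40 both have count 1, and the goodsId tie-break keeps 30. With row_number() and no tie-break, Spark would keep one of them arbitrarily; rank() would keep both.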
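The title refers to a custom aggregate function, while the job above ranks with a window function. The top three could instead come from a single aggregate per behavior. In Spark 3.x such a function is usually written by extending `org.apache.spark.sql.expressions.Aggregator` (methods `zero`, `reduce`, `merge`, `finish`, plus buffer and output encoders) and registering it with `functions.udaf`; `UserDefinedAggregateFunction` is deprecated since Spark 3.0. The sketch below only mirrors the shape of that contract in plain Scala so it runs without Spark; the object name and the `Map[Int, Long]` buffer are assumptions, not the author's code.

```scala
// Plain-Scala stand-in for an Aggregator[Int, Map[Int, Long], Seq[Int]]:
// input = goodsId, buffer = goodsId -> count, output = top three goodsIds.
object TopThreeAggregatorSketch {
  type Buffer = Map[Int, Long]

  // Empty buffer for a new group (one behavior).
  def zero: Buffer = Map.empty

  // Fold one input goodsId into the buffer.
  def reduce(buf: Buffer, goodsId: Int): Buffer =
    buf.updated(goodsId, buf.getOrElse(goodsId, 0L) + 1L)

  // Combine two partial buffers, e.g. built on different partitions.
  def merge(b1: Buffer, b2: Buffer): Buffer =
    b2.foldLeft(b1) { case (acc, (g, c)) => acc.updated(g, acc.getOrElse(g, 0L) + c) }

  // Final result: the three most frequent goodsIds, highest count first, ties broken by goodsId.
  def finish(buf: Buffer): Seq[Int] =
    buf.toSeq.sortBy { case (g, c) => (-c, g) }.take(3).map(_._1)

  def main(args: Array[String]): Unit = {
    // Simulate two partitions holding the goodsIds seen for one behavior.
    val part1 = Seq(10, 20, 10, 30).foldLeft(zero)(reduce)
    val part2 = Seq(10, 20, 40).foldLeft(zero)(reduce)
    println(finish(merge(part1, part2)))
  }
}
```

The buffer keeps one counter per distinct goods ID in the group, so this design trades the shuffle and sort of the window approach for per-group memory that grows with the number of distinct goods.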