Spark Java WordCount: the long tail reveals hidden problems in big data.

2026-04-10


The skeleton of the job:

import java.util.Arrays;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;

public class WordCount3 {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("Word Count").getOrCreate();

        // Read the input file as an RDD of lines.
        JavaRDD<String> textFile = spark.read().textFile("path_to_text_file").javaRDD();

        // Split each line into words, pair each word with 1, then sum per word.
        JavaRDD<String> words = textFile.flatMap(line -> Arrays.asList(line.split(" ")).iterator());
        JavaPairRDD<String, Integer> pairs = words.mapToPair(word -> new Tuple2<>(word, 1));
        JavaPairRDD<String, Integer> wordCounts = pairs.reduceByKey((a, b) -> a + b);

        wordCounts.collect().forEach(System.out::println);
    }
}
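Spark aside, the flatMap → mapToPair → reduceByKey pipeline is the same logic as a group-and-count over a stream. As a minimal plain-Java sketch of what the job computes (the class name `LocalWordCount` is illustrative, not from the article):

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LocalWordCount {
    // Equivalent of flatMap + mapToPair + reduceByKey on a local stream of lines.
    public static Map<String, Long> count(Stream<String> lines) {
        return lines
            .flatMap(line -> Arrays.stream(line.split(" ")))                      // flatMap: lines -> words
            .collect(Collectors.groupingBy(w -> w, Collectors.counting()));      // group + sum per word
    }

    public static void main(String[] args) {
        Map<String, Long> counts = count(Stream.of("a b a", "b a"));
        System.out.println(counts); // counts: a -> 3, b -> 2
    }
}
```

This is useful for checking the per-word counts a small input should produce before running the same logic on a cluster.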

WordCount

import java.util.Arrays;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;

public class WordCount3 {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().master("local").appName("WordCount3").getOrCreate();

        // Read the input file and drop down to the RDD API.
        JavaRDD<String> input = spark.read().textFile("words.txt").javaRDD();

        // Comma-separated words in this variant.
        JavaRDD<String> words = input.flatMap(line -> Arrays.asList(line.split(",")).iterator());
        JavaPairRDD<String, Integer> pair = words.mapToPair(word -> new Tuple2<>(word, 1));
        JavaPairRDD<String, Integer> output = pair.reduceByKey((v1, v2) -> v1 + v2);

        output.foreach(res -> System.out.println(res));
        spark.stop();
    }
}
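The "long tail" in the title points at key skew: real text has a few hot words that dominate `reduceByKey` partitions while most words are rare, so one reducer can become the straggler. A common mitigation is two-stage aggregation with salted keys; a minimal plain-Java sketch of the idea (class and method names are illustrative, not Spark API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.stream.Collectors;

public class SaltedCount {
    // Stage 1: prefix each word with a random salt so a single hot key
    // spreads across several partial groups instead of one reducer.
    static Map<String, Long> stage1(List<String> words, int salts) {
        Random rnd = new Random(42);
        return words.stream()
            .collect(Collectors.groupingBy(
                w -> rnd.nextInt(salts) + "#" + w,
                Collectors.counting()));
    }

    // Stage 2: strip the salt and merge the partial counts per real word.
    static Map<String, Long> stage2(Map<String, Long> partial) {
        return partial.entrySet().stream()
            .collect(Collectors.groupingBy(
                e -> e.getKey().substring(e.getKey().indexOf('#') + 1),
                Collectors.summingLong(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "the", "the", "the", "spark", "the");
        Map<String, Long> counts = stage2(stage1(words, 3));
        System.out.println(counts); // "the" -> 5, "spark" -> 1
    }
}
```

In Spark the same two stages would be two `reduceByKey` passes: one on the salted key, one on the de-salted key. The totals are unchanged; only the intermediate grouping is rebalanced.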
