Spark SQL query Group By value followed by list
Have table(Data) like below:
color status freq
red y 1
blue y 1
green y 2
expected output : red,blue 1 green 2
select color , freq from data where status = 'y' group by(freq)
Now we want to get result as 'red,blue' for freq= 1 and green for freq =2
how to get list of colours group by freq,correct the above sql query.
when using first(colour) , it only returning first colour but expecting all the colours group by freq.
correct the SQL query as per output
apache-spark apache-spark-sql
add a comment |
Have table(Data) like below:
color status freq
red y 1
blue y 1
green y 2
expected output : red,blue 1 green 2
select color , freq from data where status = 'y' group by(freq)
Now we want to get result as 'red,blue' for freq= 1 and green for freq =2
how to get list of colours group by freq,correct the above sql query.
when using first(colour) , it only returning first colour but expecting all the colours group by freq.
correct the SQL query as per output
apache-spark apache-spark-sql
Your data hasfreqcolumn, but you selectfrequency.
– user10465355
Nov 23 '18 at 11:11
sorry for typo error. select color,freq from data where status ='y' group by (freq). please correct this query as per output
– swcraft
Nov 23 '18 at 11:20
Could you please edit your question, include actual query you use, full traceback and format the post appropriately? Thanks.
– user10465355
Nov 23 '18 at 11:22
Can there be another row green y 3 as well implying overall green y 5?
– thebluephantom
Nov 23 '18 at 20:44
pls accept answer as per protocol if of benefit to you
– thebluephantom
Nov 26 '18 at 7:23
add a comment |
Have table(Data) like below:
color status freq
red y 1
blue y 1
green y 2
expected output : red,blue 1 green 2
select color , freq from data where status = 'y' group by(freq)
Now we want to get result as 'red,blue' for freq= 1 and green for freq =2
how to get list of colours group by freq,correct the above sql query.
when using first(colour) , it only returning first colour but expecting all the colours group by freq.
correct the SQL query as per output
apache-spark apache-spark-sql
Have table(Data) like below:
color status freq
red y 1
blue y 1
green y 2
expected output : red,blue 1 green 2
select color , freq from data where status = 'y' group by(freq)
Now we want to get result as 'red,blue' for freq= 1 and green for freq =2
how to get list of colours group by freq,correct the above sql query.
when using first(colour) , it only returning first colour but expecting all the colours group by freq.
correct the SQL query as per output
apache-spark apache-spark-sql
apache-spark apache-spark-sql
edited Nov 23 '18 at 21:16
thebluephantom
2,9233927
2,9233927
asked Nov 23 '18 at 11:01
swcraftswcraft
46
46
Your data hasfreqcolumn, but you selectfrequency.
– user10465355
Nov 23 '18 at 11:11
sorry for typo error. select color,freq from data where status ='y' group by (freq). please correct this query as per output
– swcraft
Nov 23 '18 at 11:20
Could you please edit your question, include actual query you use, full traceback and format the post appropriately? Thanks.
– user10465355
Nov 23 '18 at 11:22
Can there be another row green y 3 as well implying overall green y 5?
– thebluephantom
Nov 23 '18 at 20:44
pls accept answer as per protocol if of benefit to you
– thebluephantom
Nov 26 '18 at 7:23
add a comment |
Your data hasfreqcolumn, but you selectfrequency.
– user10465355
Nov 23 '18 at 11:11
sorry for typo error. select color,freq from data where status ='y' group by (freq). please correct this query as per output
– swcraft
Nov 23 '18 at 11:20
Could you please edit your question, include actual query you use, full traceback and format the post appropriately? Thanks.
– user10465355
Nov 23 '18 at 11:22
Can there be another row green y 3 as well implying overall green y 5?
– thebluephantom
Nov 23 '18 at 20:44
pls accept answer as per protocol if of benefit to you
– thebluephantom
Nov 26 '18 at 7:23
Your data has
freq column, but you select frequency.– user10465355
Nov 23 '18 at 11:11
Your data has
freq column, but you select frequency.– user10465355
Nov 23 '18 at 11:11
sorry for typo error. select color,freq from data where status ='y' group by (freq). please correct this query as per output
– swcraft
Nov 23 '18 at 11:20
sorry for typo error. select color,freq from data where status ='y' group by (freq). please correct this query as per output
– swcraft
Nov 23 '18 at 11:20
Could you please edit your question, include actual query you use, full traceback and format the post appropriately? Thanks.
– user10465355
Nov 23 '18 at 11:22
Could you please edit your question, include actual query you use, full traceback and format the post appropriately? Thanks.
– user10465355
Nov 23 '18 at 11:22
Can there be another row green y 3 as well implying overall green y 5?
– thebluephantom
Nov 23 '18 at 20:44
Can there be another row green y 3 as well implying overall green y 5?
– thebluephantom
Nov 23 '18 at 20:44
pls accept answer as per protocol if of benefit to you
– thebluephantom
Nov 26 '18 at 7:23
pls accept answer as per protocol if of benefit to you
– thebluephantom
Nov 26 '18 at 7:23
add a comment |
1 Answer
1
active
oldest
votes
Try this:
import org.apache.spark.sql.functions._
import spark.implicits._
//import org.apache.spark.sql._
//import org.apache.spark.sql.types._
val df = Seq(
("green","y", 4),
("blue","n", 7),
("red","y", 7),
("yellow","y", 7),
("cyan","y", 7)
).toDF("colour", "status", "freq")
val df2 = df.where("status = 'y'")
.select($"freq", $"colour")
.groupBy("freq")
.agg(collect_list($"colour"))
df2.show(false)
returns:
+----+--------------------+
|freq|collect_list(colour)|
+----+--------------------+
|4 |[green] |
|7 |[red, yellow, cyan] |
+----+--------------------+
how to pass this dataframe column(colour) value to spark sql query? Eg : val out = df3.filter($"colour" === colour column in df2) here df3 is another dataframe
– swcraft
Nov 26 '18 at 9:53
Not sure I follow, you may need ask new question and probide example like you did here.
– thebluephantom
Nov 26 '18 at 10:00
ok,will ask new question
– swcraft
Nov 26 '18 at 10:13
otherwise unclear what the question exactly is
– thebluephantom
Nov 26 '18 at 10:15
In the above example , df2 contains colour(list). now,we have to select status from DF (i.e) val status = df.filter($"colour"=== colour coloumn in df2).select($"status"). here how to pass colour column of df2 dynamically
– swcraft
Nov 26 '18 at 10:39
|
show 1 more comment
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53445451%2fspark-sql-query-group-by-value-followed-by-list%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
Try this:
import org.apache.spark.sql.functions._
import spark.implicits._
//import org.apache.spark.sql._
//import org.apache.spark.sql.types._
val df = Seq(
("green","y", 4),
("blue","n", 7),
("red","y", 7),
("yellow","y", 7),
("cyan","y", 7)
).toDF("colour", "status", "freq")
val df2 = df.where("status = 'y'")
.select($"freq", $"colour")
.groupBy("freq")
.agg(collect_list($"colour"))
df2.show(false)
returns:
+----+--------------------+
|freq|collect_list(colour)|
+----+--------------------+
|4 |[green] |
|7 |[red, yellow, cyan] |
+----+--------------------+
how to pass this dataframe column(colour) value to spark sql query? Eg : val out = df3.filter($"colour" === colour column in df2) here df3 is another dataframe
– swcraft
Nov 26 '18 at 9:53
Not sure I follow, you may need ask new question and probide example like you did here.
– thebluephantom
Nov 26 '18 at 10:00
ok,will ask new question
– swcraft
Nov 26 '18 at 10:13
otherwise unclear what the question exactly is
– thebluephantom
Nov 26 '18 at 10:15
In the above example , df2 contains colour(list). now,we have to select status from DF (i.e) val status = df.filter($"colour"=== colour coloumn in df2).select($"status"). here how to pass colour column of df2 dynamically
– swcraft
Nov 26 '18 at 10:39
|
show 1 more comment
Try this:
import org.apache.spark.sql.functions._
import spark.implicits._
//import org.apache.spark.sql._
//import org.apache.spark.sql.types._
val df = Seq(
("green","y", 4),
("blue","n", 7),
("red","y", 7),
("yellow","y", 7),
("cyan","y", 7)
).toDF("colour", "status", "freq")
val df2 = df.where("status = 'y'")
.select($"freq", $"colour")
.groupBy("freq")
.agg(collect_list($"colour"))
df2.show(false)
returns:
+----+--------------------+
|freq|collect_list(colour)|
+----+--------------------+
|4 |[green] |
|7 |[red, yellow, cyan] |
+----+--------------------+
how to pass this dataframe column(colour) value to spark sql query? Eg : val out = df3.filter($"colour" === colour column in df2) here df3 is another dataframe
– swcraft
Nov 26 '18 at 9:53
Not sure I follow, you may need ask new question and probide example like you did here.
– thebluephantom
Nov 26 '18 at 10:00
ok,will ask new question
– swcraft
Nov 26 '18 at 10:13
otherwise unclear what the question exactly is
– thebluephantom
Nov 26 '18 at 10:15
In the above example , df2 contains colour(list). now,we have to select status from DF (i.e) val status = df.filter($"colour"=== colour coloumn in df2).select($"status"). here how to pass colour column of df2 dynamically
– swcraft
Nov 26 '18 at 10:39
|
show 1 more comment
Try this:
import org.apache.spark.sql.functions._
import spark.implicits._
//import org.apache.spark.sql._
//import org.apache.spark.sql.types._
val df = Seq(
("green","y", 4),
("blue","n", 7),
("red","y", 7),
("yellow","y", 7),
("cyan","y", 7)
).toDF("colour", "status", "freq")
val df2 = df.where("status = 'y'")
.select($"freq", $"colour")
.groupBy("freq")
.agg(collect_list($"colour"))
df2.show(false)
returns:
+----+--------------------+
|freq|collect_list(colour)|
+----+--------------------+
|4 |[green] |
|7 |[red, yellow, cyan] |
+----+--------------------+
Try this:
import org.apache.spark.sql.functions._
import spark.implicits._
//import org.apache.spark.sql._
//import org.apache.spark.sql.types._
val df = Seq(
("green","y", 4),
("blue","n", 7),
("red","y", 7),
("yellow","y", 7),
("cyan","y", 7)
).toDF("colour", "status", "freq")
val df2 = df.where("status = 'y'")
.select($"freq", $"colour")
.groupBy("freq")
.agg(collect_list($"colour"))
df2.show(false)
returns:
+----+--------------------+
|freq|collect_list(colour)|
+----+--------------------+
|4 |[green] |
|7 |[red, yellow, cyan] |
+----+--------------------+
answered Nov 23 '18 at 21:02
thebluephantomthebluephantom
2,9233927
2,9233927
how to pass this dataframe column(colour) value to spark sql query? Eg : val out = df3.filter($"colour" === colour column in df2) here df3 is another dataframe
– swcraft
Nov 26 '18 at 9:53
Not sure I follow, you may need ask new question and probide example like you did here.
– thebluephantom
Nov 26 '18 at 10:00
ok,will ask new question
– swcraft
Nov 26 '18 at 10:13
otherwise unclear what the question exactly is
– thebluephantom
Nov 26 '18 at 10:15
In the above example , df2 contains colour(list). now,we have to select status from DF (i.e) val status = df.filter($"colour"=== colour coloumn in df2).select($"status"). here how to pass colour column of df2 dynamically
– swcraft
Nov 26 '18 at 10:39
|
show 1 more comment
how to pass this dataframe column(colour) value to spark sql query? Eg : val out = df3.filter($"colour" === colour column in df2) here df3 is another dataframe
– swcraft
Nov 26 '18 at 9:53
Not sure I follow, you may need ask new question and probide example like you did here.
– thebluephantom
Nov 26 '18 at 10:00
ok,will ask new question
– swcraft
Nov 26 '18 at 10:13
otherwise unclear what the question exactly is
– thebluephantom
Nov 26 '18 at 10:15
In the above example , df2 contains colour(list). now,we have to select status from DF (i.e) val status = df.filter($"colour"=== colour coloumn in df2).select($"status"). here how to pass colour column of df2 dynamically
– swcraft
Nov 26 '18 at 10:39
how to pass this dataframe column(colour) value to spark sql query? Eg : val out = df3.filter($"colour" === colour column in df2) here df3 is another dataframe
– swcraft
Nov 26 '18 at 9:53
how to pass this dataframe column(colour) value to spark sql query? Eg : val out = df3.filter($"colour" === colour column in df2) here df3 is another dataframe
– swcraft
Nov 26 '18 at 9:53
Not sure I follow, you may need ask new question and probide example like you did here.
– thebluephantom
Nov 26 '18 at 10:00
Not sure I follow, you may need ask new question and probide example like you did here.
– thebluephantom
Nov 26 '18 at 10:00
ok,will ask new question
– swcraft
Nov 26 '18 at 10:13
ok,will ask new question
– swcraft
Nov 26 '18 at 10:13
otherwise unclear what the question exactly is
– thebluephantom
Nov 26 '18 at 10:15
otherwise unclear what the question exactly is
– thebluephantom
Nov 26 '18 at 10:15
In the above example , df2 contains colour(list). now,we have to select status from DF (i.e) val status = df.filter($"colour"=== colour coloumn in df2).select($"status"). here how to pass colour column of df2 dynamically
– swcraft
Nov 26 '18 at 10:39
In the above example , df2 contains colour(list). now,we have to select status from DF (i.e) val status = df.filter($"colour"=== colour coloumn in df2).select($"status"). here how to pass colour column of df2 dynamically
– swcraft
Nov 26 '18 at 10:39
|
show 1 more comment
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53445451%2fspark-sql-query-group-by-value-followed-by-list%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Your data has
freqcolumn, but you selectfrequency.– user10465355
Nov 23 '18 at 11:11
sorry for typo error. select color,freq from data where status ='y' group by (freq). please correct this query as per output
– swcraft
Nov 23 '18 at 11:20
Could you please edit your question, include actual query you use, full traceback and format the post appropriately? Thanks.
– user10465355
Nov 23 '18 at 11:22
Can there be another row green y 3 as well implying overall green y 5?
– thebluephantom
Nov 23 '18 at 20:44
pls accept answer as per protocol if of benefit to you
– thebluephantom
Nov 26 '18 at 7:23