How to get the name of a Spark Column as String?
I want to write a method that rounds a numeric column and keeps its name, so that I don't have to repeat the alias every time, as in:
    df
      .select(round($"x", 2).as("x"))
Therefore I need a reusable column expression like:
    def roundKeepName(c: Column, scale: Int) = round(c, scale).as(c.name)
Unfortunately c.name does not exist, so the code above does not compile. I've found a solution for ColumnName:
    def roundKeepName(c: ColumnName, scale: Int) = round(c, scale).as(c.string.name)
But how can I do that with a Column (which is what I get if I use col("x") instead of $"x")?
scala apache-spark
asked Nov 26 '18 at 11:00 by Raphael Roth; edited Nov 26 '18 at 16:01 by Bugs
c.expr.toString?
– user10465355
Nov 26 '18 at 11:08
2 Answers
Not sure if the question has really been answered. Your function could be implemented like this (toString returns the name of the column):
    def roundKeepname(c: Column, scale: Int) = round(c, scale).as(c.toString)
In case you don't like relying on toString, here is a more robust version. You can rely on the underlying expression, cast it to a NamedExpression and take its name.
    import org.apache.spark.sql.catalyst.expressions.NamedExpression
    def roundKeepname(c: Column, scale: Int) =
      round(c, scale).as(c.expr.asInstanceOf[NamedExpression].name)
And it works:
    scala> spark.range(2).select(roundKeepname('id, 2)).show
    +---+
    | id|
    +---+
    |  0|
    |  1|
    +---+
answered Nov 26 '18 at 14:34 by Oli; edited Nov 26 '18 at 18:28
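An editorial sketch, not part of Oli's answer: casting to NamedExpression throws a ClassCastException when the column is not a named reference, for example col("x") + 1. A more defensive variant could pattern match instead. This assumes a Spark version where Column.expr exposes the Catalyst expression, as in the 2.x line this thread was written against. The helper name columnNameOf and the fallback to toString are illustrative choices.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.NamedExpression
import org.apache.spark.sql.functions.{col, round}

// Hypothetical helper: returns the name only when the underlying
// Catalyst expression actually carries one.
def columnNameOf(c: Column): Option[String] = c.expr match {
  case n: NamedExpression => Some(n.name)
  case _                  => None
}

// Rounds the column and keeps its name; falls back to toString for
// expressions without a name (an assumption of this sketch).
def roundKeepName(c: Column, scale: Int): Column =
  round(c, scale).as(columnNameOf(c).getOrElse(c.toString))
```

With this, roundKeepName(col("x"), 2) and roundKeepName($"x", 2) both produce a column aliased as x, and an unnamed expression no longer fails at the cast.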
That works, although I don't like to rely on .toString
– Raphael Roth
Nov 26 '18 at 15:41
Could you tell us why you need to define your function from the column object? I usually do things like that from the column name with round(col(name), 2).as(name). Would it work for you?
– Oli
Nov 26 '18 at 18:02
I edited my answer with another more robust version
– Oli
Nov 26 '18 at 18:28
That's a good idea, you mean def roundKeepName(columnName: String, scale: Int) = round(col(columnName), scale).as(columnName)?
– Raphael Roth
Nov 27 '18 at 8:24
Yes exactly. This is something I use quite often.
– Oli
Nov 27 '18 at 9:24
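The name-based variant agreed on in these comments avoids inspecting the Column altogether. A small editorial sketch of how it could be applied to several columns at once; the wrapper roundColumns and its parameters are made up for illustration:

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, round}

// Name-based variant from the comments: the caller passes the name,
// so nothing has to be extracted from the Column.
def roundKeepName(columnName: String, scale: Int): Column =
  round(col(columnName), scale).as(columnName)

// Hypothetical convenience wrapper: rounds the listed columns and
// leaves every other column of the DataFrame untouched.
def roundColumns(df: DataFrame, names: Set[String], scale: Int): DataFrame =
  df.select(df.columns.map { n =>
    if (names.contains(n)) roundKeepName(n, scale) else col(n)
  }: _*)
```

For example, roundColumns(df, Set("x", "y"), 2) would round x and y and pass the remaining columns through. Column names containing dots would need backtick quoting before being handed to col.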
Update:
Following the approach suggested by BlueSheepToken, here is how you can do it dynamically, assuming all columns are of type double.
    scala> val df = Seq((1.22,4.34,8.93),(3.44,12.66,17.44),(5.66,9.35,6.54)).toDF("x","y","z")
    df: org.apache.spark.sql.DataFrame = [x: double, y: double ... 1 more field]
    scala> df.show
    +----+-----+-----+
    |   x|    y|    z|
    +----+-----+-----+
    |1.22| 4.34| 8.93|
    |3.44|12.66|17.44|
    |5.66| 9.35| 6.54|
    +----+-----+-----+
    scala> df.columns.foldLeft(df)((acc, p) => acc.withColumn(p + "_t", round(col(p), 1)).drop(p).withColumnRenamed(p + "_t", p)).show
    +---+----+----+
    |  x|   y|   z|
    +---+----+----+
    |1.2| 4.3| 8.9|
    |3.4|12.7|17.4|
    |5.7| 9.4| 6.5|
    +---+----+----+
answered Nov 26 '18 at 12:49 by stack0114106; edited Nov 26 '18 at 14:29
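An editorial sketch, not part of the answer: withColumn replaces an existing column when the given name is already present, so the temporary column, drop, and rename in the fold above can be avoided. The variant below also restricts the rounding to DoubleType columns instead of assuming every column is a double. The helper name roundDoubles is hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, round}
import org.apache.spark.sql.types.DoubleType

// Hypothetical helper: rounds every DoubleType column in place.
// withColumn overwrites a column whose name already exists, so the
// column keeps both its name and its position.
def roundDoubles(df: DataFrame, scale: Int): DataFrame =
  df.schema.fields
    .filter(_.dataType == DoubleType)
    .map(_.name)
    .foldLeft(df)((acc, name) => acc.withColumn(name, round(col(name), scale)))
```

Calling roundDoubles(df, 1).show on the DataFrame from the answer should give the same rounded table, while non-double columns in other DataFrames would be left as they are.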
Is it just me, or did you forget to drop the x column before renaming? :)
– BlueSheepToken
Nov 26 '18 at 12:57
If you drop x first, you will get an exception: cannot resolve 'x'. df.drop("x").withColumn("x2",round($"x",1)).show(false) doesn't work.
– stack0114106
Nov 26 '18 at 13:00
OK, my bad then, I thought df.withColumn("x2",round($"x",1)).drop("x").withColumnRenamed("x2","x").show was working
– BlueSheepToken
Nov 26 '18 at 13:23