How to get the name of a Spark Column as String?




















I want to write a method to round a numeric column without doing something like:



df
.select(round($"x",2).as("x"))


Therefore I need to have a reusable column-expression like:



def roundKeepName(c:Column,scale:Int) = round(c,scale).as(c.name)


Unfortunately, c.name does not exist, so the above code does not compile. I've found a solution for ColumnName:



 def roundKeepName(c:ColumnName,scale:Int) = round(c,scale).as(c.string.name)


But how can I do that with a Column (which is what I get if I use col("x") instead of $"x")?



































  • c.expr.toString?

    – user10465355
    Nov 26 '18 at 11:08

























scala apache-spark














asked Nov 26 '18 at 11:00 by Raphael Roth, edited Nov 26 '18 at 16:01 by Bugs

























2 Answers














Not sure if the question has really been answered. Your function could be implemented like this (for a plain column reference such as col("x") or $"x", toString returns the name of the column):



def roundKeepname(c:Column,scale:Int) = round(c,scale).as(c.toString)


In case you don't like relying on toString, here is a more robust version. You can rely on the underlying expression, cast it to a NamedExpression and take its name.



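import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.NamedExpression
import org.apache.spark.sql.functions.round
// Alias the rounded column with the name of the underlying named expression.
def roundKeepname(c: Column, scale: Int): Column =
  round(c, scale).as(c.expr.asInstanceOf[NamedExpression].name)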


And it works:



scala> spark.range(2).select(roundKeepname('id, 2)).show
+---+
| id|
+---+
|  0|
|  1|
+---+
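The cast above throws a ClassCastException when the column is not backed by a named expression, for example col("x") + 1. A minimal sketch of a more defensive variant follows. It assumes a Spark version where Column.expr is exposed (2.x and 3.x), and the names columnName and roundKeepNameSafe are hypothetical helpers introduced here for illustration, not part of Spark.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.catalyst.expressions.NamedExpression
import org.apache.spark.sql.functions.{col, round}

// Hypothetical helper: Some(name) when the column is backed by a named
// expression (a plain reference or an alias), None otherwise.
def columnName(c: Column): Option[String] = c.expr match {
  case ne: NamedExpression => Some(ne.name)
  case _                   => None
}

// Hypothetical helper: keeps the name when one exists; otherwise returns
// the rounded column and lets Spark generate a default name.
def roundKeepNameSafe(c: Column, scale: Int): Column = {
  val rounded = round(c, scale)
  columnName(c).map(n => rounded.as(n)).getOrElse(rounded)
}

// Assumption: a local session for the demo; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[1]").appName("sketch").getOrCreate()
import spark.implicits._

val df = Seq(1.234, 5.678).toDF("x")
df.select(roundKeepNameSafe(col("x"), 2)).show()     // result column is still named x
df.select(roundKeepNameSafe(col("x") + 1, 2)).show() // no name to keep, but no exception
```

Note that org.apache.spark.sql.catalyst is an internal package, so this pattern may need adjusting between Spark versions.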































  • That works, although I don't like to rely on .toString

    – Raphael Roth
    Nov 26 '18 at 15:41











  • Could you tell us why you need to define your function from the column object? I usually do things like that from the column name with round(col(name), 2).as(name). Would it work for you?

    – Oli
    Nov 26 '18 at 18:02











  • I edited my answer with another more robust version

    – Oli
    Nov 26 '18 at 18:28











  • That's a good idea. You mean def roundKeepName(columnName:String,scale:Int) = round(col(columnName),scale).as(columnName)?

    – Raphael Roth
    Nov 27 '18 at 8:24













  • Yes exactly. This is something I use quite often.

    – Oli
    Nov 27 '18 at 9:24
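The name-based variant agreed on in the comments above can be sketched as follows. The function is the one quoted in the comment; the sample DataFrame and the local SparkSession are assumptions made only for the demo.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, round}

// Takes the column name instead of a Column, so no name extraction is needed.
def roundKeepName(columnName: String, scale: Int): Column =
  round(col(columnName), scale).as(columnName)

// Assumption: a local session for the demo; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[1]").appName("sketch").getOrCreate()
import spark.implicits._

val df = Seq((1.234, 9.876)).toDF("x", "y")
val result = df.select(roundKeepName("x", 2), roundKeepName("y", 1))
result.show()
```

This avoids Catalyst internals entirely, at the cost of accepting only plain column names. Since col parses its argument, names containing dots would need backtick quoting, which this sketch does not handle.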

































Update:



Following the approach suggested by BlueSheepToken, here is how you can do it dynamically, assuming all columns are of type double.



scala> val df = Seq((1.22,4.34,8.93),(3.44,12.66,17.44),(5.66,9.35,6.54)).toDF("x","y","z")
df: org.apache.spark.sql.DataFrame = [x: double, y: double ... 1 more field]

scala> df.show
+----+-----+-----+
|   x|    y|    z|
+----+-----+-----+
|1.22| 4.34| 8.93|
|3.44|12.66|17.44|
|5.66| 9.35| 6.54|
+----+-----+-----+


scala> df.columns.foldLeft(df)( (acc,p) => (acc.withColumn(p+"_t",round(col(p),1)).drop(p).withColumnRenamed(p+"_t",p))).show
+---+----+----+
| x| y| z|
+---+----+----+
|1.2| 4.3| 8.9|
|3.4|12.7|17.4|
|5.7| 9.4| 6.5|
+---+----+----+
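The fold above assumes every column is a double. A minimal sketch of an alternative that builds a single select from the schema and rounds only double and float columns is shown below. The helper name roundFractional and the sample data are hypothetical, introduced here for illustration.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, round}
import org.apache.spark.sql.types.{DoubleType, FloatType}

// Hypothetical helper: rounds double/float columns under their original
// names and passes every other column through unchanged.
def roundFractional(df: DataFrame, scale: Int): DataFrame = {
  val cols: Array[Column] = df.schema.fields.map { f =>
    f.dataType match {
      case DoubleType | FloatType => round(col(f.name), scale).as(f.name)
      case _                      => col(f.name)
    }
  }
  df.select(cols: _*)
}

// Assumption: a local session for the demo; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[1]").appName("sketch").getOrCreate()
import spark.implicits._

val df = Seq((1.22, "a"), (3.44, "b")).toDF("x", "label")
val rounded = roundFractional(df, 1)
rounded.show()
```

A single select keeps the column order and avoids the temporary "_t" columns; whether it is preferable to the fold is a design choice rather than a correctness issue.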

































  • Is it just me, or did you forget to drop the x column before renaming? :)

    – BlueSheepToken
    Nov 26 '18 at 12:57











  • If you drop x first, you will get an exception (cannot resolve 'x'): df.drop("x").withColumn("x2",round($"x",1)).show(false) doesn't work.

    – stack0114106
    Nov 26 '18 at 13:00











  • OK, my bad then. I thought df.withColumn("x2",round($"x",1)).drop("x").withColumnRenamed("x2","x").show worked.

    – BlueSheepToken
    Nov 26 '18 at 13:23




















First answer (toString / NamedExpression): answered Nov 26 '18 at 14:34 by Oli, edited Nov 26 '18 at 18:28





















Second answer (foldLeft over df.columns): answered Nov 26 '18 at 12:49 by stack0114106, edited Nov 26 '18 at 14:29












