How to get the name of a Spark Column as String?


I want to write a method to round a numeric column without doing something like:

    df.select(round($"x", 2).as("x"))

Instead, I'd like a reusable column expression like:

    def roundKeepName(c: Column, scale: Int) = round(c, scale).as(c.name)

Unfortunately c.name does not exist, so the code above does not compile. I've found a solution for ColumnName (its string method builds a StructField that carries the column's name):

    def roundKeepName(c: ColumnName, scale: Int) = round(c, scale).as(c.string.name)

But how can I do the same with a plain Column, which is what col("x") returns (whereas $"x" returns a ColumnName)?

edited Nov 26 '18 at 16:01 by Bugs

asked Nov 26 '18 at 11:00 by Raphael Roth

c.expr.toString? – user10465355, Nov 26 '18 at 11:08

scala apache-spark


2 Answers

Not sure whether the question has really been answered yet. Your function could be implemented like this (for a simple column reference, Column.toString returns the column's name):

    def roundKeepname(c: Column, scale: Int) = round(c, scale).as(c.toString)

If you'd rather not rely on toString, here is a more robust version: take the underlying Catalyst expression, cast it to a NamedExpression and use its name. The cast throws a ClassCastException if the column is not a named expression (for example col("x") + 1).

    import org.apache.spark.sql.catalyst.expressions.NamedExpression

    def roundKeepname(c: Column, scale: Int) =
      round(c, scale).as(c.expr.asInstanceOf[NamedExpression].name)

And it works:

    scala> spark.range(2).select(roundKeepname('id, 2)).show
    +---+
    | id|
    +---+
    |  0|
    |  1|
    +---+
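A minimal sketch of a variant that avoids the ClassCastException by pattern matching instead of casting. It assumes Spark 2.x/3.x, where Column.expr is accessible; Catalyst classes are internal and not a stable API (Spark 4 reworked Column), so treat this as illustrative. The names ColumnNameSketch and columnName are hypothetical. Unnamed expressions simply keep Spark's auto-generated name.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.NamedExpression
import org.apache.spark.sql.functions.round

// Hypothetical helpers; they rely on Catalyst internals (Spark 2.x/3.x).
object ColumnNameSketch {
  // Some(name) for named expressions such as col("x"), $"x" or c.as("y");
  // None for computed expressions such as col("x") + 1.
  def columnName(c: Column): Option[String] = c.expr match {
    case named: NamedExpression => Some(named.name)
    case _                      => None
  }

  // Rounds c and keeps its name when it has one; otherwise the column gets
  // Spark's auto-generated name for the round(...) expression.
  def roundKeepName(c: Column, scale: Int): Column =
    columnName(c) match {
      case Some(name) => round(c, scale).as(name)
      case None       => round(c, scale)
    }
}
```

For a nested reference such as col("a.b"), the returned name is the dotted path "a.b", so the alias would be a literal "a.b" column rather than "b".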

edited Nov 26 '18 at 18:28

answered Nov 26 '18 at 14:34 by Oli

That works, although I don't like relying on .toString. – Raphael Roth, Nov 26 '18 at 15:41

Could you tell us why you need to define your function from the Column object? I usually do things like that from the column name, with round(col(name), 2).as(name). Would that work for you? – Oli, Nov 26 '18 at 18:02

I edited my answer with another, more robust version. – Oli, Nov 26 '18 at 18:28

That's a good idea. You mean def roundKeepName(columnName: String, scale: Int) = round(col(columnName), scale).as(columnName)? – Raphael Roth, Nov 27 '18 at 8:24

Yes, exactly. This is something I use quite often. – Oli, Nov 27 '18 at 9:24
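The name-based approach discussed in these comments can be sketched as follows, extended with a hypothetical helper that rounds every numeric column of a DataFrame in a single select while keeping names and column order. RoundByName and roundAllNumeric are illustrative names, and the sketch assumes column names contain no dots (col("a.b") would be read as nested field b).

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, round}
import org.apache.spark.sql.types.NumericType

object RoundByName {
  // Name-based version from the comments: the caller supplies the name,
  // so nothing has to be recovered from a Column object.
  def roundKeepName(columnName: String, scale: Int): Column =
    round(col(columnName), scale).as(columnName)

  // Hypothetical helper: rounds every numeric column in one select,
  // passes other columns through unchanged and keeps the column order.
  def roundAllNumeric(df: DataFrame, scale: Int): DataFrame = {
    val cols: Seq[Column] = df.schema.fields.toIndexedSeq.map { field =>
      field.dataType match {
        case _: NumericType => roundKeepName(field.name, scale)
        case _              => col(field.name)
      }
    }
    df.select(cols: _*)
  }
}
```

Building all columns first and calling select once keeps the query plan to a single projection, instead of one projection per column.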


Update:

Using the approach suggested by BlueSheepToken, here is how you can do it dynamically, assuming all columns are of type double:

    scala> val df = Seq((1.22,4.34,8.93),(3.44,12.66,17.44),(5.66,9.35,6.54)).toDF("x","y","z")
    df: org.apache.spark.sql.DataFrame = [x: double, y: double ... 1 more field]

    scala> df.show
    +----+-----+-----+
    |   x|    y|    z|
    +----+-----+-----+
    |1.22| 4.34| 8.93|
    |3.44|12.66|17.44|
    |5.66| 9.35| 6.54|
    +----+-----+-----+

    scala> df.columns.foldLeft(df)((acc, p) => acc.withColumn(p + "_t", round(col(p), 1)).drop(p).withColumnRenamed(p + "_t", p)).show
    +---+----+----+
    |  x|   y|   z|
    +---+----+----+
    |1.2| 4.3| 8.9|
    |3.4|12.7|17.4|
    |5.7| 9.4| 6.5|
    +---+----+----+

edited Nov 26 '18 at 14:29

answered Nov 26 '18 at 12:49 by stack0114106

Is it just me, or did you forget to drop the x column before renaming? :) – BlueSheepToken, Nov 26 '18 at 12:57

If you drop x first, you get an exception (cannot resolve 'x'): df.drop("x").withColumn("x2",round($"x",1)).show(false) doesn't work. – stack0114106, Nov 26 '18 at 13:00

OK, my bad then; I thought df.withColumn("x2",round($"x",1)).drop("x").withColumnRenamed("x2","x").show was working. – BlueSheepToken, Nov 26 '18 at 13:23
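A possible simplification of the fold, sketched under the assumption of Spark 2.x/3.x: Dataset.withColumn replaces an existing column that has the same name (keeping its position), so the temporary "_t" column, the drop and the rename should not be necessary. RoundInPlace and roundColumns are hypothetical names.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, round}

object RoundInPlace {
  // withColumn with an existing name replaces that column in place, so the
  // column keeps its name and position without any drop/rename step.
  def roundColumns(df: DataFrame, names: Seq[String], scale: Int): DataFrame =
    names.foldLeft(df)((acc, name) => acc.withColumn(name, round(col(name), scale)))
}
```

Usage would be RoundInPlace.roundColumns(df, df.columns.toSeq, 1).show. Each withColumn call adds a projection to the plan, so for a large number of columns a single select over all columns is usually the cheaper choice.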
