Drop a column in a struct within an Array type
My schema looks like this
root
|-- source: string (nullable = true)
|-- results: array (nullable = true)
| |-- content: struct (containsNull = true)
| | |-- ptype: string (nullable = true)
| | |-- domain: string (nullable = true)
| | |-- verb: string (nullable = true)
| | |-- foobar: map (nullable = true)
| | | |-- key: string
| | | |-- value: string (valueContainsNull = true)
| | |-- fooId: integer (nullable = true)
|-- date: string (nullable = false)
|-- hour: string (nullable = false)
I have a df with the above data. I want to create a dataframe without fooId. I cannot use drop since it's a nested column. The tricky part is that results is an array whose elements are content structs, and fooId sits inside that struct.
What would be the cleanest way to accomplish this?
apache-spark pyspark apache-spark-sql
@user6910411 The concept is the same but the structure is different. Here it is a struct with an array.
– suprita shankar
Nov 27 '18 at 0:44
Since it's an array, I believe the easiest way would be to use a UDF to remove the data. Or you could explode the array first, but that would change the dataframe structure.
– Shaido
Nov 27 '18 at 1:46
I would prefer the former approach. So I have to map and then pass it to the udf? Can you elaborate a little more?
– suprita shankar
Nov 27 '18 at 2:42
You can create a udf that takes the results array as input, then do all necessary processing inside it (i.e. remove fooId). You can use withColumn to overwrite the results column with what the udf returns.
– Shaido
Nov 27 '18 at 2:54
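A minimal sketch of the UDF approach described in the comments above, in PySpark. The names drop_foo_id and without_foo_id are made up for illustration, and the element schema is copied by hand from the printed schema in the question, so treat it as an assumption and adjust it if your real struct differs. The pyspark import is kept inside the wrapper so the plain-Python helper can be checked without a Spark session.

```python
def drop_foo_id(results):
    """Return the results array with fooId left out of every struct element."""
    if results is None:
        return None
    # Fields to keep, in the order declared in the return schema below.
    kept = ("ptype", "domain", "verb", "foobar")
    # Spark passes each struct as a Row, which supports lookup by field name.
    return [None if r is None else tuple(r[name] for name in kept)
            for r in results]


def without_foo_id(df):
    """Overwrite the results column with a copy that has no fooId field."""
    from pyspark.sql import functions as F
    from pyspark.sql.types import (ArrayType, MapType, StringType,
                                   StructField, StructType)

    # Assumed element schema: the struct from the question minus fooId.
    element = StructType([
        StructField("ptype", StringType()),
        StructField("domain", StringType()),
        StructField("verb", StringType()),
        StructField("foobar", MapType(StringType(), StringType())),
    ])
    strip = F.udf(drop_foo_id, ArrayType(element))
    return df.withColumn("results", strip(F.col("results")))
```

Calling without_foo_id(df) should leave source, date and hour untouched and keep results as an array, so the overall dataframe structure stays the same apart from the missing field. A Python UDF does add serialization overhead per row, which may matter on large data.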
asked Nov 26 '18 at 23:42
suprita shankar