Drop a column in a struct within an Array type



























My schema looks like this:



root
|-- source: string (nullable = true)
|-- results: array (nullable = true)
| |-- content: struct (containsNull = true)
| | |-- ptype: string (nullable = true)
| | |-- domain: string (nullable = true)
| | |-- verb: string (nullable = true)
| | |-- foobar: map (nullable = true)
| | | |-- key: string
| | | |-- value: string (valueContainsNull = true)
| | |-- fooId: integer (nullable = true)
|-- date: string (nullable = false)
|-- hour: string (nullable = false)


I have a DataFrame with the above data, and I want to create a DataFrame without fooId.
I cannot use drop, since it's a nested column.



The tricky part is that results is an array whose elements are content structs,
and fooId is a field inside those structs.



What would be the cleanest way to accomplish this?
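One possible approach that avoids a UDF, sketched under the assumption of Spark 2.4 or later (where the SQL higher-order function `transform` was introduced): rebuild every element of results with `named_struct`, listing all fields except fooId. The helper names `build_transform_expr` and `drop_array_struct_field` are hypothetical, for illustration only. The `IF(x IS NULL, ...)` guard keeps null array elements null instead of turning them into structs full of nulls, since the schema declares containsNull = true.

```python
def build_transform_expr(array_col, keep_fields):
    """Build a Spark SQL expression that rebuilds each struct in `array_col`
    with only the fields in `keep_fields` (assumes Spark >= 2.4 for transform)."""
    # 'name', x.`name` pairs for named_struct; backticks quote the field names.
    pairs = ", ".join(f"'{name}', x.`{name}`" for name in keep_fields)
    # Null elements stay null rather than becoming a struct of null fields.
    return f"transform(`{array_col}`, x -> IF(x IS NULL, NULL, named_struct({pairs})))"


def drop_array_struct_field(df, array_col, field):
    """Hypothetical helper: remove `field` from the structs inside `array_col`."""
    # Imported lazily so the string builder above is usable without PySpark.
    from pyspark.sql import functions as F

    # Read the element struct's field names from the existing schema.
    element_type = df.schema[array_col].dataType.elementType
    keep = [name for name in element_type.fieldNames() if name != field]
    return df.withColumn(array_col, F.expr(build_transform_expr(array_col, keep)))
```

For example, `drop_array_struct_field(df, "results", "fooId")` should leave source, date and hour untouched and rewrite only results. On Spark 3.1 and later, `Column.dropFields` expresses the same idea more directly: `df.withColumn("results", F.transform("results", lambda x: x.dropFields("fooId")))`.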

































  • @user6910411 The concept is the same, but the structure is different: here the struct is inside an array.

    – suprita shankar
    Nov 27 '18 at 0:44











  • Since it's an array, I believe the easiest way would be to use a UDF to remove the data. Alternatively, you could explode the array first, but that would change the DataFrame structure.

    – Shaido
    Nov 27 '18 at 1:46











  • I would prefer the former approach. So I have to map and then pass it to the UDF? Can you elaborate a little more?

    – suprita shankar
    Nov 27 '18 at 2:42











  • You can create a UDF that takes the results array as input and does all the necessary processing inside it (i.e. removes fooId). Then use withColumn to overwrite the results column with what the UDF returns.

    – Shaido
    Nov 27 '18 at 2:54
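The UDF route described in the comment above can be sketched as follows, assuming PySpark. The per-element work sits in a plain Python function (the hypothetical `strip_field`) so it can be checked without a cluster; `drop_field_udf` (also hypothetical) derives the UDF's return type from the existing schema by removing fooId from the element struct, then overwrites results with `withColumn`. A Python UDF ships every row between the JVM and a Python worker, so it is typically slower than built-in column expressions.

```python
def strip_field(elements, field="fooId"):
    """Return a copy of `elements` (a list of Row- or dict-like structs) without `field`."""
    if elements is None:  # the results array itself is nullable
        return None
    result = []
    for item in elements:
        if item is None:  # array elements are nullable (containsNull = true)
            result.append(None)
            continue
        # PySpark Row objects expose asDict(); plain mappings are copied with dict().
        data = item.asDict() if hasattr(item, "asDict") else dict(item)
        data.pop(field, None)
        result.append(data)
    return result


def drop_field_udf(df, array_col="results", field="fooId"):
    """Hypothetical helper: overwrite `array_col` with the UDF's output."""
    # Imported lazily so strip_field can be tested without PySpark installed.
    from pyspark.sql import functions as F
    from pyspark.sql import types as T

    # Return type: the original element struct minus `field`.
    element_type = df.schema[array_col].dataType.elementType
    new_element_type = T.StructType([f for f in element_type.fields if f.name != field])
    strip_udf = F.udf(lambda arr: strip_field(arr, field), T.ArrayType(new_element_type))
    return df.withColumn(array_col, strip_udf(array_col))
```

Returning dicts works here because PySpark converts a dict to a struct by matching keys to the declared field names.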


















apache-spark pyspark apache-spark-sql






asked Nov 26 '18 at 23:42









suprita shankar
