Drop a column in a struct within an Array type

My schema looks like this:

root
 |-- source: string (nullable = true)
 |-- results: array (nullable = true)
 |    |-- content: struct (containsNull = true)
 |    |    |-- ptype: string (nullable = true)
 |    |    |-- domain: string (nullable = true)
 |    |    |-- verb: string (nullable = true)
 |    |    |-- foobar: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- fooId: integer (nullable = true)
 |-- date: string (nullable = false)
 |-- hour: string (nullable = false)

I have a DataFrame df with the schema above, and I want to create a new DataFrame without fooId.
I cannot use drop because fooId is a nested column.

The tricky part is that results is an array whose elements are content structs, and fooId sits inside each struct.

What would be the cleanest way to accomplish this?

asked Nov 26 '18 at 23:42 by suprita shankar

@user6910411 The concept is the same, but the structure is different. Here it is a struct with an array.

– suprita shankar, Nov 27 '18 at 0:44

Since it's an array, I believe the easiest way would be to use a UDF to remove the field. Alternatively, you could explode the array first, but that would change the DataFrame's structure.

– Shaido, Nov 27 '18 at 1:46

I would prefer the former approach. So I have to map and then pass it to the UDF? Can you elaborate a little more?

– suprita shankar, Nov 27 '18 at 2:42

You can create a UDF that takes the results array as input and does all the necessary processing inside it (i.e. removes fooId). You can then use withColumn to overwrite the results column with what the UDF returns.

– Shaido, Nov 27 '18 at 2:54
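
The UDF approach described in the comments could look roughly like the sketch below. It is not a posted answer, only an illustration under some assumptions: the names drop_field and without_foo_id are made up for this example, and the return schema is written out by hand from the fields shown in the question (ptype, domain, verb, foobar). The per-row logic is kept in a plain Python function so it can be checked without a Spark session.

```python
def drop_field(results, field="fooId"):
    """Return a copy of the array with `field` removed from every struct."""
    if results is None:
        return None
    cleaned = []
    for item in results:
        if item is None:
            # Keep null array elements as they are.
            cleaned.append(None)
            continue
        # Inside a Python UDF each struct arrives as a Row; asDict() turns it
        # into a plain dict. Plain dicts are accepted too (handy for testing).
        d = item.asDict() if hasattr(item, "asDict") else dict(item)
        cleaned.append({k: v for k, v in d.items() if k != field})
    return cleaned


def without_foo_id(df):
    """Overwrite `results` with a version whose structs no longer have fooId."""
    # Imported here so drop_field can be tried out without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import (
        ArrayType, MapType, StringType, StructField, StructType,
    )

    # Assumed element schema: the struct from the question minus fooId.
    element_type = StructType([
        StructField("ptype", StringType()),
        StructField("domain", StringType()),
        StructField("verb", StringType()),
        StructField("foobar", MapType(StringType(), StringType())),
    ])
    drop_udf = F.udf(drop_field, ArrayType(element_type))
    return df.withColumn("results", drop_udf(F.col("results")))
```

With a DataFrame matching the schema above, the call would be df_clean = without_foo_id(df). The other top-level columns (source, date, hour) are left untouched. Keep in mind that a Python UDF moves every row between the JVM and Python, so it is usually slower than built-in column functions.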

apache-spark pyspark apache-spark-sql
