Filtering a Spark partitioned table is not working in PySpark
I am using Spark 2.3 and have written one DataFrame to create a Hive partitioned table, using the DataFrameWriter class method in PySpark.
newdf.coalesce(1).write.format('orc').partitionBy('veh_country').mode("overwrite").saveAsTable('emp.partition_Load_table')
Here is my table structure and partition information.
hive> desc emp.partition_Load_table;
OK
veh_code varchar(17)
veh_flag varchar(1)
veh_model smallint
veh_country varchar(3)
# Partition Information
# col_name data_type comment
veh_country varchar(3)
hive> show partitions partition_Load_table;
OK
veh_country=CHN
veh_country=USA
veh_country=RUS
Now I am reading this table back into a DataFrame in PySpark.
df2_data = spark.sql("""
SELECT *
from udb.partition_Load_table
""");
df2_data.show() --> is working
But I am not able to filter it using the partition key column:
from pyspark.sql.functions import col
newdf = df2_data.where(col("veh_country")=='CHN')
I am getting the below error message:
: java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive.
You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem,
however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK
Caused by: MetaException(message:Filtering is supported only on partition keys of type string)
Whereas, when I create the DataFrame by specifying the absolute HDFS path of the table, the filter and where clauses work as expected.
newdataframe = spark.read.format("orc").option("header","false").load("hdfs/path/emp.db/partition_load_table")
The below works:
newdataframe.where(col("veh_country")=='CHN').show()
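For reference, here is a minimal sketch of the workaround suggested in the error text, assuming the setting has to be applied when the SparkSession is built (I have not verified the performance impact the error warns about):
from pyspark.sql import SparkSession

# Workaround named in the error message: stop Spark from asking the Hive
# metastore to filter partition metadata; Spark then lists partitions itself.
# The error text warns this can degrade performance on large tables.
spark = (SparkSession.builder
         .appName("partition-filter-workaround")   # hypothetical app name
         .config("spark.sql.hive.manageFilesourcePartitions", "false")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("SELECT * FROM udb.partition_Load_table").where("veh_country = 'CHN'").show()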
My question is: why was Spark not able to filter the DataFrame in the first place, and why does it throw the error "Filtering is supported only on partition keys of type string" even though my veh_country column is defined as a string/varchar data type?
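One thing I am considering, based on the error text, is casting the partition column to string before writing so the metastore stores it as string rather than varchar(3). A sketch of that idea (untested for this exact setup, names reused from the code above):
from pyspark.sql.functions import col

# Untested idea: if the metastore can only filter string partition keys,
# store veh_country as string (not varchar(3)) when (re)creating the table.
newdf_str = newdf.withColumn("veh_country", col("veh_country").cast("string"))
(newdf_str.coalesce(1)
    .write.format("orc")
    .partitionBy("veh_country")
    .mode("overwrite")
    .saveAsTable("emp.partition_Load_table"))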
hive pyspark partitioning
asked 2 days ago by vikrant rana (edited 2 days ago)