error using astype when NaN exists in a dataframe
Given this DataFrame df:

          A        B
    0  a=10  b=20.10
    1  a=20      NaN
    2   NaN  b=30.10
    3  a=40  b=40.10

I tried:

    df['A'] = df['A'].str.extract(r'(\d+)').astype(int)
    df['B'] = df['B'].str.extract(r'(\d+)').astype(float)

But I get the following error:

    ValueError: cannot convert float NaN to integer

And:

    AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

How do I fix this?
pandas
asked Jan 9 '17 at 14:57 by Sun, edited Jan 9 '17 at 15:04 by IanS
First, NaN can only be represented as a float, so you can't cast to int in that case. Second, if a column has mixed dtypes (for instance strings and something else), then using `str.extract` will fail. Although mixed dtypes are supported, they are not a good idea because they lead to errors. You should decide what the final dtype should be and replace the missing values with something that makes sense for your data.
– EdChum Jan 9 '17 at 15:02
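Both errors from the question can be reproduced in isolation. The sketch below is only an illustration: the Series name `numbers` and its values are made up, and it assumes the second error came from calling `.str` on a column that was already numeric (for example after running the conversion twice), which the question does not confirm.

```python
import numpy as np
import pandas as pd

# Hypothetical float column with one missing value.
numbers = pd.Series([10.0, np.nan, 40.0])

cast_error = None
accessor_error = None

# Casting a float column that contains NaN to a plain int dtype raises ValueError.
try:
    numbers.astype(int)
except ValueError as exc:
    cast_error = exc
    print(type(exc).__name__, exc)

# The .str accessor only exists for string (object) columns, so a numeric
# column raises AttributeError.
try:
    numbers.str.extract(r'(\d+)', expand=False)
except AttributeError as exc:
    accessor_error = exc
    print(type(exc).__name__, exc)
```

The exact wording of the messages varies between pandas versions, so only the exception types are relied on here.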
1 Answer
accepted
If some values in a column are missing (NaN) and the column is then converted to numeric, the dtype is always float. You cannot convert such values to int, only to float, because the type of NaN is float.

    import numpy as np

    print(type(np.nan))
    <class 'float'>

See the docs on how values are converted if at least one NaN is present:

    integer > cast to float64

If you need int values, you need to replace NaN with some int, e.g. 0, using fillna, and then it works perfectly:

    # keep only the first run of digits; rows without a match stay NaN
    df['A'] = df['A'].str.extract(r'(\d+)', expand=False)
    df['B'] = df['B'].str.extract(r'(\d+)', expand=False)
    print(df)
         A    B
    0   10   20
    1   20  NaN
    2  NaN   30
    3   40   40

    # replace NaN with 0, then cast the digit strings to int
    df1 = df.fillna(0).astype(int)
    print(df1)
        A   B
    0  10  20
    1  20   0
    2   0  30
    3  40  40

    # the integer width (int32 here) depends on the platform
    print(df1.dtypes)
    A    int32
    B    int32
    dtype: object
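The approach above can be put together as a self-contained sketch. Two things here go beyond the answer and are assumptions: the pattern `r'(\d+\.\d+)'` for column B, chosen because the question asked for float values there, and the nullable `'Int64'` dtype shown as an alternative to filling with 0, which is available in recent pandas versions but not mentioned in the answer.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame({
    'A': ['a=10', 'a=20', np.nan, 'a=40'],
    'B': ['b=20.10', np.nan, 'b=30.10', 'b=40.10'],
})

# Extract the numeric text and convert to float; unmatched rows stay NaN.
a = df['A'].str.extract(r'(\d+)', expand=False).astype(float)
b = df['B'].str.extract(r'(\d+\.\d+)', expand=False).astype(float)

# Option 1 (the answer's approach): replace NaN with 0, then cast to int.
a_filled = a.fillna(0).astype(int)

# Option 2 (assumption, not from the answer): keep the missing value by
# using pandas' nullable integer dtype instead of a sentinel.
a_nullable = a.astype('Int64')

print(a_filled.tolist())  # [10, 20, 0, 40]
print(a_nullable.dtype)
print(b.dtype)
```

Filling with 0 is only safe when 0 cannot be confused with a real value. Otherwise, keeping the column as float or using the nullable dtype preserves the information that a value was missing.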
answered Jan 9 '17 at 14:59 by jezrael, edited Jan 9 '17 at 15:09
It works. Thanks a lot for your help.
– Sun Jan 10 '17 at 10:26