Convert pandas.DataFrame to numpy tensor using factor levels for shape [duplicate]
This question already has an answer here:
Transform Pandas DataFrame with n-level hierarchical index into n-D Numpy array
1 answer
I have data from a full factorial experiment. For example, for each of N samples, I have J types of measurement and K measurement loci. I receive this data in long format, for example:
import numpy as np
import pandas as pd
import itertools
from numpy.random import normal as rnorm
# [[N], [J], [K]]
levels = [[1,2,3,4], ['start', 'stop'], ['gene1', 'gene2', 'gene3']]
# fully crossed
exp_design = list(itertools.product(*levels))
df = pd.DataFrame(exp_design, columns=["sample", "mode", "gene"])
# some fake data
df['x'] = rnorm(size=len(exp_design))
which results in 24 observations (x) with a column for each of the three factors.
> df.head()
   sample   mode   gene         x
0       1  start  gene1 -1.229370
1       1  start  gene2  1.129773
2       1  start  gene3 -1.155202
3       1   stop  gene1 -0.757551
4       1   stop  gene2 -0.166129
I want to convert these observations to the corresponding (N,J,K)-shaped tensor (numpy array). I thought that pivoting to wide format with a MultiIndex and then extracting the values would generate the correct tensor, but the result simply comes out as a column vector:
> df.pivot_table(values='x', index=['sample', 'mode', 'gene']).values
array([[-1.22936989],
[ 1.12977346],
[-1.15520216],
...,
[-0.1031641 ],
[ 1.1296491 ],
[ 1.31113584]])
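A minimal sketch of why this happens, assuming the same fully crossed design as above (the seeded generator and the variable names `wide`, `flat_shape`, and `level_shape` are illustrative choices, not from the original post). The pivoted frame has one row per (sample, mode, gene) tuple and a single `x` column, so `.values` is two-dimensional with shape (24, 1); the (N, J, K) structure lives only in the MultiIndex levels.

```python
import itertools

import numpy as np
import pandas as pd

# Rebuild the fully crossed example design (seeded for reproducibility; the seed is arbitrary).
levels = [[1, 2, 3, 4], ['start', 'stop'], ['gene1', 'gene2', 'gene3']]
df = pd.DataFrame(list(itertools.product(*levels)), columns=['sample', 'mode', 'gene'])
df['x'] = np.random.default_rng(0).normal(size=len(df))

wide = df.pivot_table(values='x', index=['sample', 'mode', 'gene'])

# One row per index tuple and one column for x, hence a column vector.
flat_shape = wide.values.shape
# The per-factor sizes are only recorded in the index levels.
level_shape = wide.index.levshape

print(flat_shape, level_shape)
```

So the values themselves are present and, after pivoting, ordered by the index; what is missing is a reshape to the level sizes.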
Is there a quick way to get tensor-formatted data from a long-format pandas.DataFrame?
python pandas numpy tensor numpy-ndarray
marked as duplicate by merv, Community♦ Nov 29 '18 at 2:37
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
asked Nov 24 '18 at 3:59 by merv (edited Nov 24 '18 at 4:13)
1 Answer
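Try counting the unique levels of each factor with nunique:
df.agg('nunique')
Out[69]:
sample     4
mode       2
gene       3
x         24
dtype: int64
Then use those counts as the target shape when reshaping x:
s = df.agg('nunique')
df.x.values.reshape(s['sample'], s['mode'], s['gene'])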
Out[71]:
array([[[-2.78133759e-01, -1.42234420e+00, 5.42439121e-01],
[ 2.15359867e+00, 6.55837886e-01, -1.01293568e+00]],
[[ 7.92306679e-01, -1.62539763e-01, -6.13120335e-01],
[-2.91567999e-01, -4.01257702e-01, 7.96422763e-01]],
[[ 1.05088264e-01, -7.23400925e-02, 2.78515041e-01],
[ 2.63088568e-01, 1.47477886e+00, -2.10735619e+00]],
[[-1.71756374e+00, 6.12224005e-04, -3.11562798e-02],
[ 5.26028807e-01, -1.18502045e+00, 1.88633760e+00]]])
@merv check the update
– Wen-Ben
Nov 24 '18 at 5:21
I think it's important to note here that this assumes the data frame is first sorted, e.g., df.sort_values(by=['sample', 'mode', 'gene'])
– merv
Nov 24 '18 at 5:46
@merv yes you are right
– Wen-Ben
Nov 24 '18 at 6:33
answered Nov 24 '18 at 4:36 by Wen-Ben (edited Nov 24 '18 at 5:21)