Making a new dataframe based on certain conditions
I have a dataframe object in R, sample of which is as follows:
4 5 3
4 5 9
4 5 2
4 6 4
4 10 4
4 10 3
4 10 7
4 10 2
4 9 3
4 9 7
4 10 4
4 10 3
4 6 8
4 5 4
12 3 6
12 4 1
12 4 2
12 4 7
From this dataframe, I want to create a new dataframe of 20 columns, as follows:
Only one row in the new dataframe,for each unique value in
$1
. Hence for this sample data, the new dataframe should have 2 rows(unique 4,12).$2
represents the column number of the new dataframe, in which the value of$3
(of this dataframe) is to be filled. If there are repeating cases, the median of the values of$3
is to be taken. For example, for
$1
value 4, 5 is repeated 4 times, and in the new dataframe, column 5 of the first row should have the value median(3,9,2,4) =3.All other column values are zero.
A sample output for this data would be as follows:
0 0 0 0 3 4 0 0 3 4 0 0 0 0 0 0 0 0 0 0
0 0 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
How can we do this in R? A huge thanks in advance!
r dataframe
add a comment |
I have a dataframe object in R, sample of which is as follows:
4 5 3
4 5 9
4 5 2
4 6 4
4 10 4
4 10 3
4 10 7
4 10 2
4 9 3
4 9 7
4 10 4
4 10 3
4 6 8
4 5 4
12 3 6
12 4 1
12 4 2
12 4 7
From this dataframe, I want to create a new dataframe of 20 columns, as follows:
Only one row in the new dataframe,for each unique value in
$1
. Hence for this sample data, the new dataframe should have 2 rows(unique 4,12).$2
represents the column number of the new dataframe, in which the value of$3
(of this dataframe) is to be filled. If there are repeating cases, the median of the values of$3
is to be taken. For example, for
$1
value 4, 5 is repeated 4 times, and in the new dataframe, column 5 of the first row should have the value median(3,9,2,4) =3.All other column values are zero.
A sample output for this data would be as follows:
0 0 0 0 3 4 0 0 3 4 0 0 0 0 0 0 0 0 0 0
0 0 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
How can we do this in R? A huge thanks in advance!
r dataframe
1
how can the median of(3,9,2,4)
be3
and how can the median ofc(4,8)
be 4??
– Onyambu
Nov 21 '18 at 17:23
add a comment |
I have a dataframe object in R, sample of which is as follows:
4 5 3
4 5 9
4 5 2
4 6 4
4 10 4
4 10 3
4 10 7
4 10 2
4 9 3
4 9 7
4 10 4
4 10 3
4 6 8
4 5 4
12 3 6
12 4 1
12 4 2
12 4 7
From this dataframe, I want to create a new dataframe of 20 columns, as follows:
Only one row in the new dataframe,for each unique value in
$1
. Hence for this sample data, the new dataframe should have 2 rows(unique 4,12).$2
represents the column number of the new dataframe, in which the value of$3
(of this dataframe) is to be filled. If there are repeating cases, the median of the values of$3
is to be taken. For example, for
$1
value 4, 5 is repeated 4 times, and in the new dataframe, column 5 of the first row should have the value median(3,9,2,4) =3.All other column values are zero.
A sample output for this data would be as follows:
0 0 0 0 3 4 0 0 3 4 0 0 0 0 0 0 0 0 0 0
0 0 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
How can we do this in R? A huge thanks in advance!
r dataframe
I have a dataframe object in R, sample of which is as follows:
4 5 3
4 5 9
4 5 2
4 6 4
4 10 4
4 10 3
4 10 7
4 10 2
4 9 3
4 9 7
4 10 4
4 10 3
4 6 8
4 5 4
12 3 6
12 4 1
12 4 2
12 4 7
From this dataframe, I want to create a new dataframe of 20 columns, as follows:
Only one row in the new dataframe,for each unique value in
$1
. Hence for this sample data, the new dataframe should have 2 rows(unique 4,12).$2
represents the column number of the new dataframe, in which the value of$3
(of this dataframe) is to be filled. If there are repeating cases, the median of the values of$3
is to be taken. For example, for
$1
value 4, 5 is repeated 4 times, and in the new dataframe, column 5 of the first row should have the value median(3,9,2,4) =3.All other column values are zero.
A sample output for this data would be as follows:
0 0 0 0 3 4 0 0 3 4 0 0 0 0 0 0 0 0 0 0
0 0 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
How can we do this in R? A huge thanks in advance!
r dataframe
r dataframe
asked Nov 21 '18 at 16:52
rishirishi
317
317
1
how can the median of(3,9,2,4)
be3
and how can the median ofc(4,8)
be 4??
– Onyambu
Nov 21 '18 at 17:23
add a comment |
1
how can the median of(3,9,2,4)
be3
and how can the median ofc(4,8)
be 4??
– Onyambu
Nov 21 '18 at 17:23
1
1
how can the median of
(3,9,2,4)
be 3
and how can the median of c(4,8)
be 4??– Onyambu
Nov 21 '18 at 17:23
how can the median of
(3,9,2,4)
be 3
and how can the median of c(4,8)
be 4??– Onyambu
Nov 21 '18 at 17:23
add a comment |
2 Answers
2
active
oldest
votes
Are you sure that your expected outcome is correct? I think there is an error in calculating the median in your question, as also pointed out in the comments. You could do it as follows:
library(dplyr)
df$V1 <- as.numeric(as.factor(df$V1))
values <- df %>% group_by(V1,V2) %>% summarise(median=median(V3))
new_df <- matrix(0,nrow=length(unique(df$V1)), ncol=20)
for(i in 1:nrow(new_df)){
for(j in 1:ncol(new_df)){
value <- values$median[values$V1==i & values$V2==j]
if(length(value)>0){
new_df[i,j] = value
}
}
}
new_df
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
[1,] 0 0 0 0 3.5 6 0 0 5 3.5 0 0 0 0 0 0 0 0 0 0
[2,] 0 0 6 2 0.0 0 0 0 0 0.0 0 0 0 0 0 0 0 0 0 0
The outcome is slightly different though.
add a comment |
df = transform(df,V1=factor(V1))
fill = matrix(0,length(levels(df$V1)),20)
df2=aggregate(V3~.,df,function(x)floor(median(x)))
fill[cbind(as.integer(df2$V1),df2$V2)]=df2$V3
fill
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,] 0 0 0 0 3 6 0 0 5 3 0 0 0 0
[2,] 0 0 6 2 0 0 0 0 0 0 0 0 0 0
[,15] [,16] [,17] [,18] [,19] [,20]
[1,] 0 0 0 0 0 0
[2,] 0 0 0 0 0 0
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53416960%2fmaking-a-new-dataframe-based-on-certain-conditions%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
Are you sure that your expected outcome is correct? I think there is an error in calculating the median in your question, as also pointed out in the comments. You could do it as follows:
library(dplyr)
df$V1 <- as.numeric(as.factor(df$V1))
values <- df %>% group_by(V1,V2) %>% summarise(median=median(V3))
new_df <- matrix(0,nrow=length(unique(df$V1)), ncol=20)
for(i in 1:nrow(new_df)){
for(j in 1:ncol(new_df)){
value <- values$median[values$V1==i & values$V2==j]
if(length(value)>0){
new_df[i,j] = value
}
}
}
new_df
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
[1,] 0 0 0 0 3.5 6 0 0 5 3.5 0 0 0 0 0 0 0 0 0 0
[2,] 0 0 6 2 0.0 0 0 0 0 0.0 0 0 0 0 0 0 0 0 0 0
The outcome is slightly different though.
add a comment |
Are you sure that your expected outcome is correct? I think there is an error in calculating the median in your question, as also pointed out in the comments. You could do it as follows:
library(dplyr)
df$V1 <- as.numeric(as.factor(df$V1))
values <- df %>% group_by(V1,V2) %>% summarise(median=median(V3))
new_df <- matrix(0,nrow=length(unique(df$V1)), ncol=20)
for(i in 1:nrow(new_df)){
for(j in 1:ncol(new_df)){
value <- values$median[values$V1==i & values$V2==j]
if(length(value)>0){
new_df[i,j] = value
}
}
}
new_df
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
[1,] 0 0 0 0 3.5 6 0 0 5 3.5 0 0 0 0 0 0 0 0 0 0
[2,] 0 0 6 2 0.0 0 0 0 0 0.0 0 0 0 0 0 0 0 0 0 0
The outcome is slightly different though.
add a comment |
Are you sure that your expected outcome is correct? I think there is an error in calculating the median in your question, as also pointed out in the comments. You could do it as follows:
library(dplyr)
df$V1 <- as.numeric(as.factor(df$V1))
values <- df %>% group_by(V1,V2) %>% summarise(median=median(V3))
new_df <- matrix(0,nrow=length(unique(df$V1)), ncol=20)
for(i in 1:nrow(new_df)){
for(j in 1:ncol(new_df)){
value <- values$median[values$V1==i & values$V2==j]
if(length(value)>0){
new_df[i,j] = value
}
}
}
new_df
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
[1,] 0 0 0 0 3.5 6 0 0 5 3.5 0 0 0 0 0 0 0 0 0 0
[2,] 0 0 6 2 0.0 0 0 0 0 0.0 0 0 0 0 0 0 0 0 0 0
The outcome is slightly different though.
Are you sure that your expected outcome is correct? I think there is an error in calculating the median in your question, as also pointed out in the comments. You could do it as follows:
library(dplyr)
df$V1 <- as.numeric(as.factor(df$V1))
values <- df %>% group_by(V1,V2) %>% summarise(median=median(V3))
new_df <- matrix(0,nrow=length(unique(df$V1)), ncol=20)
for(i in 1:nrow(new_df)){
for(j in 1:ncol(new_df)){
value <- values$median[values$V1==i & values$V2==j]
if(length(value)>0){
new_df[i,j] = value
}
}
}
new_df
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
[1,] 0 0 0 0 3.5 6 0 0 5 3.5 0 0 0 0 0 0 0 0 0 0
[2,] 0 0 6 2 0.0 0 0 0 0 0.0 0 0 0 0 0 0 0 0 0 0
The outcome is slightly different though.
answered Nov 21 '18 at 17:46
otwtmotwtm
35810
35810
add a comment |
add a comment |
df = transform(df,V1=factor(V1))
fill = matrix(0,length(levels(df$V1)),20)
df2=aggregate(V3~.,df,function(x)floor(median(x)))
fill[cbind(as.integer(df2$V1),df2$V2)]=df2$V3
fill
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,] 0 0 0 0 3 6 0 0 5 3 0 0 0 0
[2,] 0 0 6 2 0 0 0 0 0 0 0 0 0 0
[,15] [,16] [,17] [,18] [,19] [,20]
[1,] 0 0 0 0 0 0
[2,] 0 0 0 0 0 0
add a comment |
df = transform(df,V1=factor(V1))
fill = matrix(0,length(levels(df$V1)),20)
df2=aggregate(V3~.,df,function(x)floor(median(x)))
fill[cbind(as.integer(df2$V1),df2$V2)]=df2$V3
fill
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,] 0 0 0 0 3 6 0 0 5 3 0 0 0 0
[2,] 0 0 6 2 0 0 0 0 0 0 0 0 0 0
[,15] [,16] [,17] [,18] [,19] [,20]
[1,] 0 0 0 0 0 0
[2,] 0 0 0 0 0 0
add a comment |
df = transform(df,V1=factor(V1))
fill = matrix(0,length(levels(df$V1)),20)
df2=aggregate(V3~.,df,function(x)floor(median(x)))
fill[cbind(as.integer(df2$V1),df2$V2)]=df2$V3
fill
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,] 0 0 0 0 3 6 0 0 5 3 0 0 0 0
[2,] 0 0 6 2 0 0 0 0 0 0 0 0 0 0
[,15] [,16] [,17] [,18] [,19] [,20]
[1,] 0 0 0 0 0 0
[2,] 0 0 0 0 0 0
df = transform(df,V1=factor(V1))
fill = matrix(0,length(levels(df$V1)),20)
df2=aggregate(V3~.,df,function(x)floor(median(x)))
fill[cbind(as.integer(df2$V1),df2$V2)]=df2$V3
fill
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,] 0 0 0 0 3 6 0 0 5 3 0 0 0 0
[2,] 0 0 6 2 0 0 0 0 0 0 0 0 0 0
[,15] [,16] [,17] [,18] [,19] [,20]
[1,] 0 0 0 0 0 0
[2,] 0 0 0 0 0 0
answered Nov 21 '18 at 17:57
OnyambuOnyambu
15.4k1519
15.4k1519
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53416960%2fmaking-a-new-dataframe-based-on-certain-conditions%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
how can the median of
(3,9,2,4)
be3
and how can the median ofc(4,8)
be 4??– Onyambu
Nov 21 '18 at 17:23