Making a new dataframe based on certain conditions

-2

I have a dataframe object in R, sample of which is as follows:

From this dataframe, I want to create a new dataframe of 20 columns, as follows:

Only one row in the new dataframe,for each unique value in $1. Hence for this sample data, the new dataframe should have 2 rows(unique 4,12).

$2 represents the column number of the new dataframe, in which the value of $3(of this dataframe) is to be filled. If there are repeating cases, the median of the values of $3 is to be taken. For example, for
$1 value 4, 5 is repeated 4 times, and in the new dataframe, column 5 of the first row should have the value median(3,9,2,4) =3.

All other column values are zero.

A sample output for this data would be as follows:

0 0 0 0 3 4 0 0 3 4 0 0 0 0 0 0 0 0 0 0

0 0 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

How can we do this in R? A huge thanks in advance!

asked Nov 21 '18 at 16:52

rishi

317

1

how can the median of (3,9,2,4) be 3 and how can the median of c(4,8) be 4??
– Onyambu
Nov 21 '18 at 17:23

add a comment |

-2

I have a dataframe object in R, sample of which is as follows:

From this dataframe, I want to create a new dataframe of 20 columns, as follows:

Only one row in the new dataframe,for each unique value in $1. Hence for this sample data, the new dataframe should have 2 rows(unique 4,12).

$2 represents the column number of the new dataframe, in which the value of $3(of this dataframe) is to be filled. If there are repeating cases, the median of the values of $3 is to be taken. For example, for
$1 value 4, 5 is repeated 4 times, and in the new dataframe, column 5 of the first row should have the value median(3,9,2,4) =3.

All other column values are zero.

A sample output for this data would be as follows:

0 0 0 0 3 4 0 0 3 4 0 0 0 0 0 0 0 0 0 0

0 0 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

How can we do this in R? A huge thanks in advance!

asked Nov 21 '18 at 16:52

rishi

317

1

how can the median of (3,9,2,4) be 3 and how can the median of c(4,8) be 4??
– Onyambu
Nov 21 '18 at 17:23

add a comment |

-2

I have a dataframe object in R, sample of which is as follows:

From this dataframe, I want to create a new dataframe of 20 columns, as follows:

Only one row in the new dataframe,for each unique value in $1. Hence for this sample data, the new dataframe should have 2 rows(unique 4,12).

$2 represents the column number of the new dataframe, in which the value of $3(of this dataframe) is to be filled. If there are repeating cases, the median of the values of $3 is to be taken. For example, for
$1 value 4, 5 is repeated 4 times, and in the new dataframe, column 5 of the first row should have the value median(3,9,2,4) =3.

All other column values are zero.

A sample output for this data would be as follows:

0 0 0 0 3 4 0 0 3 4 0 0 0 0 0 0 0 0 0 0

0 0 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

How can we do this in R? A huge thanks in advance!

asked Nov 21 '18 at 16:52

rishi

317

I have a dataframe object in R, sample of which is as follows:

From this dataframe, I want to create a new dataframe of 20 columns, as follows:

Only one row in the new dataframe,for each unique value in $1. Hence for this sample data, the new dataframe should have 2 rows(unique 4,12).

$2 represents the column number of the new dataframe, in which the value of $3(of this dataframe) is to be filled. If there are repeating cases, the median of the values of $3 is to be taken. For example, for
$1 value 4, 5 is repeated 4 times, and in the new dataframe, column 5 of the first row should have the value median(3,9,2,4) =3.

All other column values are zero.

A sample output for this data would be as follows:

0 0 0 0 3 4 0 0 3 4 0 0 0 0 0 0 0 0 0 0

0 0 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

How can we do this in R? A huge thanks in advance!

r dataframe

asked Nov 21 '18 at 16:52

rishi

317

asked Nov 21 '18 at 16:52

rishi

317

asked Nov 21 '18 at 16:52

rishi

317

asked Nov 21 '18 at 16:52

rishi

317

asked Nov 21 '18 at 16:52

rishi

317

1

how can the median of (3,9,2,4) be 3 and how can the median of c(4,8) be 4??
– Onyambu
Nov 21 '18 at 17:23

add a comment |

1

how can the median of (3,9,2,4) be 3 and how can the median of c(4,8) be 4??
– Onyambu
Nov 21 '18 at 17:23

how can the median of (3,9,2,4) be 3 and how can the median of c(4,8) be 4??
– Onyambu
Nov 21 '18 at 17:23

add a comment |

2 Answers
2

active

oldest

votes

Are you sure that your expected outcome is correct? I think there is an error in calculating the median in your question, as also pointed out in the comments. You could do it as follows:

library(dplyr)



df$V1 <- as.numeric(as.factor(df$V1))

values <- df %>% group_by(V1,V2) %>% summarise(median=median(V3))

new_df <- matrix(0,nrow=length(unique(df$V1)), ncol=20)



for(i in 1:nrow(new_df)){

  for(j in 1:ncol(new_df)){

    value <- values$median[values$V1==i & values$V2==j]

    if(length(value)>0){

      new_df[i,j] = value  

    }

  }

}



new_df



    [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]

[1,]    0    0    0    0  3.5    6    0    0    5   3.5     0     0     0     0     0     0     0     0     0     0

[2,]    0    0    6    2  0.0    0    0    0    0   0.0     0     0     0     0     0     0     0     0     0     0

The outcome is slightly different though.

answered Nov 21 '18 at 17:46

otwtm

35810

add a comment |

df = transform(df,V1=factor(V1))

fill = matrix(0,length(levels(df$V1)),20)

df2=aggregate(V3~.,df,function(x)floor(median(x)))

fill[cbind(as.integer(df2$V1),df2$V2)]=df2$V3

fill

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]

[1,]    0    0    0    0    3    6    0    0    5     3     0     0     0     0

[2,]    0    0    6    2    0    0    0    0    0     0     0     0     0     0

     [,15] [,16] [,17] [,18] [,19] [,20]

[1,]     0     0     0     0     0     0

[2,]     0     0     0     0     0     0

answered Nov 21 '18 at 17:57

Onyambu

15.4k1519

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53416960%2fmaking-a-new-dataframe-based-on-certain-conditions%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

Are you sure that your expected outcome is correct? I think there is an error in calculating the median in your question, as also pointed out in the comments. You could do it as follows:

library(dplyr)



df$V1 <- as.numeric(as.factor(df$V1))

values <- df %>% group_by(V1,V2) %>% summarise(median=median(V3))

new_df <- matrix(0,nrow=length(unique(df$V1)), ncol=20)



for(i in 1:nrow(new_df)){

  for(j in 1:ncol(new_df)){

    value <- values$median[values$V1==i & values$V2==j]

    if(length(value)>0){

      new_df[i,j] = value  

    }

  }

}



new_df



    [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]

[1,]    0    0    0    0  3.5    6    0    0    5   3.5     0     0     0     0     0     0     0     0     0     0

[2,]    0    0    6    2  0.0    0    0    0    0   0.0     0     0     0     0     0     0     0     0     0     0

The outcome is slightly different though.

answered Nov 21 '18 at 17:46

otwtm

35810

add a comment |

Are you sure that your expected outcome is correct? I think there is an error in calculating the median in your question, as also pointed out in the comments. You could do it as follows:

library(dplyr)



df$V1 <- as.numeric(as.factor(df$V1))

values <- df %>% group_by(V1,V2) %>% summarise(median=median(V3))

new_df <- matrix(0,nrow=length(unique(df$V1)), ncol=20)



for(i in 1:nrow(new_df)){

  for(j in 1:ncol(new_df)){

    value <- values$median[values$V1==i & values$V2==j]

    if(length(value)>0){

      new_df[i,j] = value  

    }

  }

}



new_df



    [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]

[1,]    0    0    0    0  3.5    6    0    0    5   3.5     0     0     0     0     0     0     0     0     0     0

[2,]    0    0    6    2  0.0    0    0    0    0   0.0     0     0     0     0     0     0     0     0     0     0

The outcome is slightly different though.

answered Nov 21 '18 at 17:46

otwtm

35810

add a comment |

Are you sure that your expected outcome is correct? I think there is an error in calculating the median in your question, as also pointed out in the comments. You could do it as follows:

library(dplyr)



df$V1 <- as.numeric(as.factor(df$V1))

values <- df %>% group_by(V1,V2) %>% summarise(median=median(V3))

new_df <- matrix(0,nrow=length(unique(df$V1)), ncol=20)



for(i in 1:nrow(new_df)){

  for(j in 1:ncol(new_df)){

    value <- values$median[values$V1==i & values$V2==j]

    if(length(value)>0){

      new_df[i,j] = value  

    }

  }

}



new_df



    [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]

[1,]    0    0    0    0  3.5    6    0    0    5   3.5     0     0     0     0     0     0     0     0     0     0

[2,]    0    0    6    2  0.0    0    0    0    0   0.0     0     0     0     0     0     0     0     0     0     0

The outcome is slightly different though.

answered Nov 21 '18 at 17:46

otwtm

35810

Are you sure that your expected outcome is correct? I think there is an error in calculating the median in your question, as also pointed out in the comments. You could do it as follows:

library(dplyr)



df$V1 <- as.numeric(as.factor(df$V1))

values <- df %>% group_by(V1,V2) %>% summarise(median=median(V3))

new_df <- matrix(0,nrow=length(unique(df$V1)), ncol=20)



for(i in 1:nrow(new_df)){

  for(j in 1:ncol(new_df)){

    value <- values$median[values$V1==i & values$V2==j]

    if(length(value)>0){

      new_df[i,j] = value  

    }

  }

}



new_df



    [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]

[1,]    0    0    0    0  3.5    6    0    0    5   3.5     0     0     0     0     0     0     0     0     0     0

[2,]    0    0    6    2  0.0    0    0    0    0   0.0     0     0     0     0     0     0     0     0     0     0

The outcome is slightly different though.

answered Nov 21 '18 at 17:46

otwtm

35810

answered Nov 21 '18 at 17:46

otwtm

35810

answered Nov 21 '18 at 17:46

otwtm

35810

answered Nov 21 '18 at 17:46

otwtm

35810

add a comment |

df = transform(df,V1=factor(V1))

fill = matrix(0,length(levels(df$V1)),20)

df2=aggregate(V3~.,df,function(x)floor(median(x)))

fill[cbind(as.integer(df2$V1),df2$V2)]=df2$V3

fill

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]

[1,]    0    0    0    0    3    6    0    0    5     3     0     0     0     0

[2,]    0    0    6    2    0    0    0    0    0     0     0     0     0     0

     [,15] [,16] [,17] [,18] [,19] [,20]

[1,]     0     0     0     0     0     0

[2,]     0     0     0     0     0     0

answered Nov 21 '18 at 17:57

Onyambu

15.4k1519

add a comment |

df = transform(df,V1=factor(V1))

fill = matrix(0,length(levels(df$V1)),20)

df2=aggregate(V3~.,df,function(x)floor(median(x)))

fill[cbind(as.integer(df2$V1),df2$V2)]=df2$V3

fill

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]

[1,]    0    0    0    0    3    6    0    0    5     3     0     0     0     0

[2,]    0    0    6    2    0    0    0    0    0     0     0     0     0     0

     [,15] [,16] [,17] [,18] [,19] [,20]

[1,]     0     0     0     0     0     0

[2,]     0     0     0     0     0     0

answered Nov 21 '18 at 17:57

Onyambu

15.4k1519

add a comment |

df = transform(df,V1=factor(V1))

fill = matrix(0,length(levels(df$V1)),20)

df2=aggregate(V3~.,df,function(x)floor(median(x)))

fill[cbind(as.integer(df2$V1),df2$V2)]=df2$V3

fill

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]

[1,]    0    0    0    0    3    6    0    0    5     3     0     0     0     0

[2,]    0    0    6    2    0    0    0    0    0     0     0     0     0     0

     [,15] [,16] [,17] [,18] [,19] [,20]

[1,]     0     0     0     0     0     0

[2,]     0     0     0     0     0     0

answered Nov 21 '18 at 17:57

Onyambu

15.4k1519

df = transform(df,V1=factor(V1))

fill = matrix(0,length(levels(df$V1)),20)

df2=aggregate(V3~.,df,function(x)floor(median(x)))

fill[cbind(as.integer(df2$V1),df2$V2)]=df2$V3

fill

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]

[1,]    0    0    0    0    3    6    0    0    5     3     0     0     0     0

[2,]    0    0    6    2    0    0    0    0    0     0     0     0     0     0

     [,15] [,16] [,17] [,18] [,19] [,20]

[1,]     0     0     0     0     0     0

[2,]     0     0     0     0     0     0

answered Nov 21 '18 at 17:57

Onyambu

15.4k1519

answered Nov 21 '18 at 17:57

Onyambu

15.4k1519

answered Nov 21 '18 at 17:57

Onyambu

15.4k1519

answered Nov 21 '18 at 17:57

Onyambu

15.4k1519

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Ytukyg