igraph: adding vertices = x creates clusters of size 1

I am currently working through some graph theory problems and have a question I can't seem to find an answer to. When I create a graph using:

x <- graph_from_data_frame(el, directed = FALSE, vertices = x)

the addition of vertices = x creates components of size 1.

I want to look at cluster sizes, i.e. extract the components and tabulate their sizes using:

comp <- components(x)
table(comp$csize)

Given the nature of edge lists, I would expect no cluster to have size < 2, since each row of the edge list is a relationship between two nodes. If I run the exact same code without vertices = x, my table starts with clusters of size 2.

Why does the addition of vertices = x do this?



Thanks



EDIT:



My edge list has the variables:

ID   ID.2   soure
x1   x2     healthcare
x1   x3     child benefit

The vertices data frame contains general information about the nodes (IDs):

ID   date_of_birth   nationality
x1   02/09/1999      French
x2   12/12/1997      French
x3   22/01/2002      French

  • The vertices argument is there to include vertex metadata. Without knowing what is in x, it's hard to say. If you post some of your data with dput() or make a minimal reproducible example, it would be easier to diagnose.
    – gfgm
    Nov 21 '18 at 12:48

  • Hi, thanks for the quick response. I have edited the thread and added a small reproducible example.
    – williamg15
    Nov 21 '18 at 12:57
















r igraph graph-theory sna






edited Nov 23 '18 at 9:17 by Szabolcs
asked Nov 21 '18 at 12:44 by williamg15

1 Answer

I suspect that your data.frame of node metadata, x, contains IDs that do not appear in the edge list. igraph adds these nodes as isolated vertices, and each isolated vertex is a component of size 1. Some sample code below to illustrate the problem:

library(igraph)

# generate some fake data
set.seed(42)
e1 <- data.frame(ID = sample(1:10, 5), ID.2 = sample(1:10, 5))
head(e1)
#> ID ID.2
#> 1 10 6
#> 2 9 7
#> 3 3 2
#> 4 6 5
#> 5 4 9

# make the desired graph object
x <- graph_from_data_frame(e1, directed = FALSE)

# make some attribute data that only matches the nodes that have edges
v_atts1 <- data.frame(ID = names(V(x)), foo = rnorm(length(names(V(x)))))
v_atts1
#> ID foo
#> 1 10 -0.10612452
#> 2 9 1.51152200
#> 3 3 -0.09465904
#> 4 6 2.01842371
#> 5 4 -0.06271410
#> 6 7 1.30486965
#> 7 2 2.28664539
#> 8 5 -1.38886070

g1 <- graph_from_data_frame(e1, directed = FALSE, vertices = v_atts1)

# we can see only groups of size 2 and greater
comp1 <- components(g1)
table(comp1$csize)
#>
#> 2 3
#> 1 2

# now make attribute data that includes nodes that don't appear in e1
v_atts2 <- data.frame(ID = 1:10, foo = rnorm(10))
g2 <- graph_from_data_frame(e1, directed = FALSE, vertices = v_atts2)

# now we see that there are isolated nodes
comp2 <- components(g2)
table(comp2$csize)
#>
#> 1 2 3
#> 2 1 2

# and inspecting the number of vertices we see that
# this is because the graph has incorporated vertices
# that appear in the metadata but not the edge list
length(V(g1))
#> [1] 8
length(V(g2))
#> [1] 10
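
If you would rather keep the full metadata table and remove the isolated vertices after building the graph, something along these lines may work. This is only a minimal sketch on made-up data: the edges and meta data frames and their column names are illustrative assumptions, not taken from the question. degree() and delete_vertices() are standard igraph functions.

```r
library(igraph)

# Hypothetical toy data: vertex "d" has metadata but no edges
edges <- data.frame(from = c("a", "b"), to = c("b", "c"))
meta  <- data.frame(ID = c("a", "b", "c", "d"), group = c(1, 1, 2, 2))

g <- graph_from_data_frame(edges, directed = FALSE, vertices = meta)

# Isolated vertices are exactly those with degree 0
isolates <- V(g)[degree(g) == 0]
isolates$name

# Drop them and tabulate the component sizes again
g_connected <- delete_vertices(g, isolates)
table(components(g_connected)$csize)
```

The trade-off is that the graph is built with every vertex first, which is convenient if you also want to count how many nodes have no relationships at all.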


If you want to avoid this, you could try graph_from_data_frame(e1, directed = FALSE, vertices = x[x$ID %in% c(e1$ID, e1$ID.2), ]), where x is your data.frame of vertex metadata. This subsets the metadata to only the vertices that appear in the edge list. You may also want to check that your IDs are not encoded as factors with levels that do not appear in the data.
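
Before subsetting, it can help to see which IDs are responsible. The sketch below uses only base R on hypothetical data that mimics the column names in the question: vertex_info stands in for your metadata data frame, and x4 is an invented ID with no edges. It lists the metadata IDs that have no edge, subsets the metadata, and drops unused factor levels.

```r
# Hypothetical data mirroring the question's column names
el <- data.frame(ID   = c("x1", "x1"),
                 ID.2 = c("x2", "x3"),
                 stringsAsFactors = FALSE)
vertex_info <- data.frame(ID = factor(c("x1", "x2", "x3", "x4")),
                          nationality = "French")

# IDs used by at least one edge
edge_ids <- unique(c(el$ID, el$ID.2))

# Metadata IDs with no edge: these would end up as size-1 components
unmatched <- setdiff(as.character(vertex_info$ID), edge_ids)
unmatched

# Keep only the rows whose ID appears in the edge list
vertex_sub <- vertex_info[as.character(vertex_info$ID) %in% edge_ids, ]

# Subsetting a factor keeps its unused levels; droplevels() removes them
levels(vertex_sub$ID)
vertex_sub$ID <- droplevels(vertex_sub$ID)
levels(vertex_sub$ID)
```

Comparing IDs as character strings sidesteps any mismatch between factor codes and labels, which is why as.character() is applied before setdiff() and %in%.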






  • Sorry for only getting back to you now! The %in% procedure solves the problem. Many thanks
    – williamg15
    Dec 3 '18 at 10:51











answered Nov 21 '18 at 13:12 by gfgm