Python: incorrect size with getsizeof() and .nbytes with nested lists
I apologise if this is a duplicate issue, but I've been having some issues with .nbytes and sys.getsizeof().
In particular, I have a list which contains numpy arrays; each array is a 3D representation of an image (row, column, RGB), and each of these images has different dimensions.
There are over 4000 images, and this may increase in the future, as I plan to use them for machine learning.
When I use .nbytes with one image, I get the correct size, but when I try to evaluate the whole lot, I get an incorrect size:
# size of image 1 in bytes
print("size of first image: %d bytes" % images[0].nbytes)
# size of all images in bytes
print("total size of all images: %d bytes" % images.nbytes)
Result:
size of first image: 60066 bytes
total size of all images: 36600 bytes
Are the only ways around this to either loop through all the images or change to a monstrous 4D array instead of a list of 3D arrays? Is there another function which better evaluates size for this kind of nested setup?
I'm running Python 3.6.7.
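For reference, the mismatch can be reproduced with toy stand-ins for the image list (shapes here are made up, not the real dataset): summing each array's own buffer gives the true pixel total, while `sys.getsizeof()` on the list only measures the container of pointers.

```python
import sys
import numpy as np

# Toy stand-ins for the image list; shapes are made up
images = [np.zeros((10, 20, 3), dtype=np.uint8),
          np.zeros((5, 5, 3), dtype=np.uint8),
          np.zeros((8, 16, 3), dtype=np.uint8)]

# True pixel-data total: sum each array's own buffer size
total = sum(a.nbytes for a in images)
print(total)                    # 600 + 75 + 384 = 1059 bytes

# The list object itself only holds pointers, so its size is tiny by comparison
print(sys.getsizeof(images))
```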
python-3.x numpy size
Focus on shape and dtype. The other measures don't help you understand. And don't give us a massive display of the data.
– hpaulj
Nov 25 '18 at 15:05
Many, if not all, machine learning tools assume the images have the same shape. They will raise errors if you try to use a list or an object-dtype array with diverse shapes. Both lists and object arrays contain pointers to arrays elsewhere in memory, so any size measure of the container just sees the pointers (e.g. 8-byte integers).
– hpaulj
Nov 25 '18 at 17:29
@hpaulj I would have avoided including that data but someone asked for it. I'll remove it now.
– Armand Bernard
Nov 26 '18 at 1:08
@hpaulj as it happens I guess that second comment of yours answers my question as to 'why' it is happening, so thanks for that. I understand that better now. I'm not going to close this yet as I'm looking for solutions too.
– Armand Bernard
Nov 26 '18 at 1:15
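The pointer behaviour hpaulj describes above can be seen directly with a small object-dtype array (shapes are made up): `nbytes` only counts the per-element pointers, not the arrays they refer to.

```python
import numpy as np

# Force two differently-shaped arrays into an object-dtype container
arrs = np.empty(2, dtype=object)
arrs[0] = np.zeros((4, 4, 3), dtype=np.uint8)
arrs[1] = np.zeros((2, 2, 3), dtype=np.uint8)

# nbytes counts only the per-element pointers (8 bytes each on a 64-bit build),
# not the arrays those pointers point at
print(arrs.dtype, arrs.nbytes)
```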
edited Nov 26 '18 at 1:08
asked Nov 25 '18 at 11:36
Armand Bernard
1 Answer
Try running images.dtype. What does it return? If it's dtype('O'), that explains your problem: images is not a list, but is instead a Numpy array of type object, which is generally a Bad Idea™️. Technically, it'll be a 1D array holding a bunch of 3D arrays.
Numpy arrays are best suited to numerical data. They're flexible enough to hold arbitrary Python objects, but that greatly impairs both their functionality and their efficiency. Unless you have a clear reason in mind, you should generally just use a plain Python list in these situations.
You may actually be best off converting images to a 4D array, as this is the only way that images.nbytes will work correctly. You can't do this if your images are all different sizes, but given that they all have the same shape (x, y, z) it's actually pretty straightforward:
images = np.array([a for a in images])
Now images.shape will be (n, x, y, z), where n is the total number of images. You can access the 3D array that represents the ith image by just indexing images:
image_i = images[i]
Alternatively, you can convert images back to a normal Python list:
images = images.tolist()
If you don't want to bother with any of those conversions, you can always get the size of all the subarrays via iteration:
totalsize = sum(arr.nbytes for arr in images)
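As a side note, the list-to-4D conversion above can also be written with np.stack, which fails loudly if the shapes differ (toy shapes assumed here):

```python
import numpy as np

# Four same-shaped toy images
imgs = [np.zeros((10, 20, 3), dtype=np.uint8) for _ in range(4)]

# Stack into a single (n, x, y, z) array; raises if any shape differs
stacked = np.stack(imgs)    # shape (4, 10, 20, 3)
print(stacked.nbytes)       # 4 * 600 = 2400 -- now counted correctly
```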
images.dtype returns "object", which I'm guessing is what you said. Even though I created it as a list, I guess it changed type when I started importing arrays.
– Armand Bernard
Nov 26 '18 at 1:28
Thanks for your answer, what you propose with making it a 4D array does theoretically pose a problem though. If I import an image larger than the previous one into the 4D array, I'd probably get an error and have to pad previous images so it all fits. I'd then have to reverse this process every time I access the image. Seems like a huge hassle just so I can gauge the 'weight' of my dataset.
– Armand Bernard
Nov 26 '18 at 1:39
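For what it's worth, the padding approach mentioned in the comment above can be sketched in a few lines (toy shapes, zero padding assumed — you'd need to record the original shapes to crop images back out):

```python
import numpy as np

# Two toy images of different heights/widths
images = [np.zeros((10, 20, 3), dtype=np.uint8),
          np.zeros((6, 8, 3), dtype=np.uint8)]

# Pad each image up to the largest height/width so they stack into one 4D array
max_h = max(a.shape[0] for a in images)
max_w = max(a.shape[1] for a in images)
padded = np.stack([
    np.pad(a, ((0, max_h - a.shape[0]), (0, max_w - a.shape[1]), (0, 0)))
    for a in images
])
print(padded.shape)   # (2, 10, 20, 3)
```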
edited Nov 25 '18 at 13:36
answered Nov 25 '18 at 13:18
tel