Pytorch DataLoader multiple data source
I am trying to use the PyTorch DataLoader to define my own dataset, but I am not sure how to load multiple data sources.
My current code:

class MultipleSourceDataSet(Dataset):
    def __init__(self, json_file, root_dir, transform=None):
        with open(root_dir + 'block0.json') as f:
            self.result = torch.Tensor(json.load(f))
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.result[0])

    def __getitem__(self, idx):
        return None  # not implemented yet

The data source is 50 blocks under root_dir = ~/Documents/blocks/.
I split them up and avoided combining them beforehand, since this is a very big dataset.
How can I load them into a single DataLoader?
python-3.x image-processing machine-learning deep-learning pytorch
asked Nov 26 '18 at 9:14 by sealpuppy
edited Nov 27 '18 at 8:11 by Shai
2 Answers
For DataLoader you need to have a single Dataset; your problem is that you have multiple 'json' files and you only know how to create a Dataset from each 'json' separately.
What you can do in this case is to use ConcatDataset, which wraps all the single-'json' datasets you create:

import os
import torch.utils.data as data

class SingleJsonDataset(data.Dataset):
    # implement a single json dataset here...
    pass

list_of_datasets = []
for j in os.listdir(root_dir):
    if not j.endswith('.json'):
        continue  # skip non-json files
    list_of_datasets.append(SingleJsonDataset(json_file=j, root_dir=root_dir, transform=None))

# once all single json datasets are created you can concat them into a single one:
multiple_json_dataset = data.ConcatDataset(list_of_datasets)

Now you can feed the concatenated dataset into data.DataLoader.
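For completeness, here is a minimal sketch of what the single-json dataset left as an exercise above might look like, assuming each block file holds a rectangular list of numeric rows (that format, the batch size, and the use of sorted() are assumptions, not something stated in the question):

import json
import os

import torch
import torch.utils.data as data


class SingleJsonDataset(data.Dataset):
    # one json block kept in memory; each row is a sample
    def __init__(self, json_file, root_dir, transform=None):
        with open(os.path.join(root_dir, json_file)) as f:
            # assumption: each block is a rectangular list of numeric rows
            self.data = torch.tensor(json.load(f))
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample = self.data[idx]
        if self.transform is not None:
            sample = self.transform(sample)
        return sample


root_dir = os.path.expanduser('~/Documents/blocks/')
list_of_datasets = [SingleJsonDataset(j, root_dir)
                    for j in sorted(os.listdir(root_dir)) if j.endswith('.json')]
multiple_json_dataset = data.ConcatDataset(list_of_datasets)
loader = data.DataLoader(multiple_json_dataset, batch_size=32, shuffle=True)

Note that ConcatDataset only maps a global index onto the underlying per-block datasets; it never merges the files on disk. This particular sketch still loads every block into RAM in __init__, so for the memory concern raised in the comment below, a lazier variant would open its block inside __getitem__ instead.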
answered Nov 26 '18 at 13:17 by Shai

Thank you. This is a very detailed explanation. My problem is that if I concatenate all the .json files, the file will become so big that it may eventually crash. However, I will still try this solution anyway. Thanks a lot!
– sealpuppy, Nov 26 '18 at 15:11
I should revise my question as 2 different sub-questions:

1. How to deal with large datasets in PyTorch to avoid memory errors?
2. If I am separating a large dataset into small chunks, how can I load multiple mini-datasets?

For question 1:
The PyTorch DataLoader can prevent this issue by creating mini-batches, as shown in the sketch below. Here you can find further explanations.
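As an illustration, a minimal sketch of mini-batching with DataLoader (the batch size and tensor shapes are arbitrary assumptions, and TensorDataset stands in for a real data block):

import torch
import torch.utils.data as data

# toy in-memory dataset standing in for one data block: 1000 samples of 16 features
dataset = data.TensorDataset(torch.randn(1000, 16))

# the DataLoader materializes only batch_size samples per iteration,
# so the whole dataset is never handled as one giant batch
loader = data.DataLoader(dataset, batch_size=64, shuffle=True)

for (batch,) in loader:
    print(batch.shape)  # torch.Size([64, 16]) except possibly the last batch
    break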
For question 2:
Please refer to Shai's answer above.

answered Nov 28 '18 at 6:50 by sealpuppy