Pytorch DataLoader multiple data source
I am trying to use the PyTorch DataLoader to define my own dataset, but I am not sure how to load multiple data sources:



My current code:



class MultipleSourceDataSet(Dataset):
    def __init__(self, json_file, root_dir, transform=None):
        with open(root_dir + 'block0.json') as f:
            self.result = torch.Tensor(json.load(f))
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.result[0])

    def __getitem__(self):
        None


The data source is 50 blocks under root_dir = ~/Documents/blocks/



I split the data into these blocks and avoided combining them beforehand, because the full dataset is very large.



How can I load them into a single dataloader?
      python-3.x image-processing machine-learning deep-learning pytorch
asked Nov 26 '18 at 9:14 by sealpuppy, edited Nov 27 '18 at 8:11 by Shai
2 Answers
For a DataLoader you need a single Dataset; your problem is that you have multiple 'json' files and you only know how to create a Dataset from each 'json' separately.

What you can do in this case is to use ConcatDataset, which contains all the single-'json' datasets you create:



import os
import torch.utils.data as data

class SingleJsonDataset(data.Dataset):
    # implement a single-json dataset here...
    pass

list_of_datasets = []
for j in os.listdir(root_dir):
    if not j.endswith('.json'):
        continue  # skip non-json files
    list_of_datasets.append(
        SingleJsonDataset(json_file=j, root_dir=root_dir, transform=None))
# once all single-json datasets are created, concatenate them into one:
multiple_json_dataset = data.ConcatDataset(list_of_datasets)


          Now you can feed the concatenated dataset into data.DataLoader.
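To make this concrete, here is a minimal, self-contained sketch of what a single-json dataset plus ConcatDataset might look like. The dataset class body, the toy json blocks, and the file layout below are illustrative assumptions, not code from the question:

```python
import json
import os
import tempfile

import torch
import torch.utils.data as data

class SingleJsonDataset(data.Dataset):
    # One json block loaded eagerly into a tensor (illustrative sketch).
    def __init__(self, json_file, root_dir, transform=None):
        with open(os.path.join(root_dir, json_file)) as f:
            self.data = torch.tensor(json.load(f))
        self.transform = transform

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, idx):
        sample = self.data[idx]
        if self.transform is not None:
            sample = self.transform(sample)
        return sample

# Build two tiny toy blocks on disk so the example runs end to end.
root_dir = tempfile.mkdtemp()
for b in range(2):
    with open(os.path.join(root_dir, 'block%d.json' % b), 'w') as f:
        json.dump([[float(b), float(i)] for i in range(3)], f)

list_of_datasets = [SingleJsonDataset(j, root_dir)
                    for j in sorted(os.listdir(root_dir))
                    if j.endswith('.json')]
multiple_json_dataset = data.ConcatDataset(list_of_datasets)
loader = data.DataLoader(multiple_json_dataset, batch_size=4, shuffle=False)

batch = next(iter(loader))
print(len(multiple_json_dataset), tuple(batch.shape))  # 6 (4, 2)
```

Note that this eager version still ends up holding every block in memory once all 50 datasets are constructed; if that is a concern, `__getitem__` can re-read the file on each access instead of caching it in `__init__`.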
answered Nov 26 '18 at 13:17 by Shai
Thank you, this is a very detailed explanation. My problem is that if I concatenate all .json files, the result becomes so big that it may eventually crash. However, I will still try this solution anyway. Thanks a lot!

– sealpuppy, Nov 26 '18 at 15:11
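One way to address the memory concern raised in this comment, sketched here as an assumption rather than part of the original answer, is to record only each block's length up front and re-read the file on every access:

```python
import json
import os
import tempfile

import torch
import torch.utils.data as data

class LazyJsonDataset(data.Dataset):
    # Re-reads its json block on every access: slower than caching,
    # but only one block is ever decoded at a time (illustrative sketch).
    def __init__(self, json_file, root_dir):
        self.path = os.path.join(root_dir, json_file)
        with open(self.path) as f:
            self.length = len(json.load(f))  # one pass just to record the size

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        with open(self.path) as f:
            return torch.tensor(json.load(f)[idx])

# Tiny demonstration block.
root_dir = tempfile.mkdtemp()
with open(os.path.join(root_dir, 'block0.json'), 'w') as f:
    json.dump([[0.0, 1.0], [2.0, 3.0]], f)

ds = LazyJsonDataset('block0.json', root_dir)
print(len(ds), ds[1].tolist())  # 2 [2.0, 3.0]
```

Fifty such datasets can still be combined with ConcatDataset; the trade-off is repeated json parsing per sample, which a small per-file cache could soften.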
I should revise my question as two different sub-questions:

1. How to deal with large datasets in PyTorch to avoid memory errors

2. If I am separating a large dataset into small chunks, how can I load multiple mini-datasets?

For question 1:

PyTorch's DataLoader can prevent this issue by creating mini-batches. Here you can find further explanations.

For question 2:

Please refer to Shai's answer above.
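For question 1, a minimal illustration of how DataLoader cuts a dataset into mini-batches; the toy in-memory TensorDataset below is assumed purely for the example:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten samples; the loader hands them out four at a time, so a training
# loop only ever materializes one mini-batch of collated tensors.
ds = TensorDataset(torch.arange(10).float().unsqueeze(1))
loader = DataLoader(ds, batch_size=4, shuffle=False)

batch_sizes = [b[0].shape[0] for b in loader]
print(batch_sizes)  # [4, 4, 2]
```

Mini-batching bounds what each training step holds, but a Dataset that eagerly loads everything in `__init__` still occupies memory; loading lazily inside `__getitem__` is what actually caps resident data.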
answered Nov 28 '18 at 6:50 by sealpuppy