PyTorch LSTM: Target Dimension in Calculating Cross Entropy Loss



























I've been trying to get an LSTM (an LSTM followed by a linear layer in a custom model) working in PyTorch, but I was getting the following error when calculating the loss:

Assertion `cur_target >= 0 && cur_target < n_classes' failed.



I defined the loss function with:



criterion = nn.CrossEntropyLoss()


and then called with



loss += criterion(output, target)


I was passing a target with dimensions [sequence_length, number_of_classes], and the output has dimensions [sequence_length, 1, number_of_classes].

The examples I was following seemed to be doing the same thing, but the PyTorch docs on cross entropy loss describe something different.

The docs say the target should be of dimension (N), where each value satisfies 0 ≤ targets[i] ≤ C−1 and C is the number of classes. I changed the target to that form, but now I'm getting the following error (the sequence length is 75, and there are 55 classes):



Expected target size (75, 55), got torch.Size([75])


I've tried looking at solutions for both errors, but still can't get this working properly. I'm confused as to the proper dimensions of target, as well as the actual meaning behind the first error (different searches gave very different meanings for the error, none of the fixes worked).



Thanks










      lstm pytorch one-hot-encoding cross-entropy
















      asked Nov 24 '18 at 6:34









      LunarLlama

























          1 Answer














          You can use squeeze() on your output tensor; it returns a tensor with all dimensions of size 1 removed.



          This short code uses the shapes you mentioned in your question:



          import torch
          import torch.nn as nn

          sequence_length   = 75
          number_of_classes = 55
          # creates a random tensor with your output shape: [75, 1, 55]
          output = torch.rand(sequence_length, 1, number_of_classes)
          # creates a tensor of random class indices with shape [75]
          target = torch.randint(number_of_classes, (sequence_length,)).long()

          # define the loss function and calculate the loss
          criterion = nn.CrossEntropyLoss()
          loss = criterion(output, target)  # fails: output still has the extra dimension
          print(loss)


          This results in the error you described:



          ValueError: Expected target size (75, 55), got torch.Size([75])
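The (75, 55) in that message follows from how CrossEntropyLoss reads inputs with more than two dimensions: an input of shape (N, C, d1) is paired with a target of shape (N, d1), with the classes on dimension 1. A [75, 1, 55] output is therefore read as N=75, C=1, d1=55. With only one class, the only valid index is 0, which is probably also why a one-hot target containing 1s triggered the first assertion. The sketch below uses small illustrative sizes (N, C and d1 are made up, not taken from the question) to show that the 3-D form gives the same loss as flattening everything to (N*d1, C) and (N*d1,):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
criterion = nn.CrossEntropyLoss()

# Illustrative sizes only: N samples, C classes, d1 extra positions.
N, C, d1 = 4, 3, 5
logits = torch.rand(N, C, d1)        # classes live on dim 1
target = torch.randint(C, (N, d1))   # one class index per (sample, position)

# 3-D input with a 2-D target of shape (N, d1)
loss_kdim = criterion(logits, target)

# Equivalent 2-D form: move classes last, then flatten to (N*d1, C) and (N*d1,)
flat_logits = logits.permute(0, 2, 1).reshape(-1, C)
flat_target = target.reshape(-1)
loss_flat = criterion(flat_logits, flat_target)

print(loss_kdim, loss_flat)
```

Both calls average over the same N*d1 predictions, so the two losses agree.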


          So calling squeeze() on your output tensor solves your problem by bringing it into the correct shape.



          Example with corrected shape:



          import torch
          import torch.nn as nn

          sequence_length   = 75
          number_of_classes = 55
          # creates a random tensor with your output shape: [75, 1, 55]
          output = torch.rand(sequence_length, 1, number_of_classes)
          # creates a tensor of random class indices with shape [75]
          target = torch.randint(number_of_classes, (sequence_length,)).long()

          # define the loss function and calculate the loss
          criterion = nn.CrossEntropyLoss()

          # apply squeeze() to the output tensor to change its shape from [75, 1, 55] to [75, 55]
          loss = criterion(output.squeeze(), target)
          print(loss)


          Output:



          tensor(4.0442)


          Using squeeze() changes your tensor shape from [75, 1, 55] to [75, 55], so that the output shape fits the target shape!



          You can also use other methods to reshape your tensor. What matters is that the output has the shape [sequence_length, number_of_classes] instead of [sequence_length, 1, number_of_classes].
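As a rough sketch of those other methods, the snippet below reshapes the same [75, 1, 55] tensor with squeeze(1), view() and reshape(). It also shows one reason to prefer naming the dimension: a bare squeeze() removes every size-1 dimension, so a sequence of length 1 would lose its sequence dimension as well. The variable names a, b, c and single are only for illustration.

```python
import torch

sequence_length, number_of_classes = 75, 55
output = torch.rand(sequence_length, 1, number_of_classes)

a = output.squeeze(1)                                 # drop only dim 1
b = output.view(sequence_length, number_of_classes)   # explicit target shape
c = output.reshape(-1, number_of_classes)             # infer the first dim

print(a.shape, b.shape, c.shape)  # all three are [75, 55]

# Pitfall: with a sequence of length 1, squeeze() removes both size-1 dims.
single = torch.rand(1, 1, number_of_classes)
print(single.squeeze().shape)    # [55]: the sequence dimension is gone
print(single.squeeze(1).shape)   # [1, 55]: still (N, C)
```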



          Your targets should be a LongTensor, i.e. a tensor of type torch.long, containing the class indices. The shape here is [sequence_length].
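If the targets start out one-hot encoded with shape [sequence_length, number_of_classes], as in the question, one way to get class indices is argmax over the class dimension. This is a minimal sketch: the one-hot tensor is generated here with torch.nn.functional.one_hot purely as stand-in data, and the names class_indices and one_hot_target are made up for the example. (Newer PyTorch releases also accept float class-probability targets with the same shape as the input, but index targets are what this answer uses.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
sequence_length, number_of_classes = 75, 55

# Stand-in data: random class indices and their one-hot encoding [75, 55]
class_indices = torch.randint(number_of_classes, (sequence_length,))
one_hot_target = F.one_hot(class_indices, num_classes=number_of_classes)

# Recover the index of the 1 in every row -> shape [75], dtype torch.long
target = one_hot_target.argmax(dim=1)

output = torch.rand(sequence_length, number_of_classes)
loss = nn.CrossEntropyLoss()(output, target)
print(target.shape, target.dtype, loss)
```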



          Edit:

          Shapes from above example when passing to cross-entropy function:



          Outputs: torch.Size([75, 55])

          Targets: torch.Size([75])





          Here is a more general example of what outputs and targets should look like for cross entropy. In this case we assume 5 different target classes, and there are three examples for sequences of length 1, 2 and 3:



          import torch
          import torch.nn as nn

          # init CE loss function
          criterion = nn.CrossEntropyLoss()

          # sequence of length 1
          output = torch.rand(1, 5)
          # here the first class is our target; the index of the first class is 0
          target = torch.LongTensor([0])
          loss = criterion(output, target)
          print('Sequence of length 1:')
          print('Output:', output, 'shape:', output.shape)
          print('Target:', target, 'shape:', target.shape)
          print('Loss:', loss)

          # sequence of length 2
          output = torch.rand(2, 5)
          # targets: first class (index 0) for the first element, second class (index 1) for the second
          target = torch.LongTensor([0, 1])
          loss = criterion(output, target)
          print('\nSequence of length 2:')
          print('Output:', output, 'shape:', output.shape)
          print('Target:', target, 'shape:', target.shape)
          print('Loss:', loss)

          # sequence of length 3
          output = torch.rand(3, 5)
          # targets: first class, second class, and second class again for the last element
          target = torch.LongTensor([0, 1, 1])
          loss = criterion(output, target)
          print('\nSequence of length 3:')
          print('Output:', output, 'shape:', output.shape)
          print('Target:', target, 'shape:', target.shape)
          print('Loss:', loss)


          Output:



          Sequence of length 1:
          Output: tensor([[ 0.1956, 0.0395, 0.6564, 0.4000, 0.2875]]) shape: torch.Size([1, 5])
          Target: tensor([ 0]) shape: torch.Size([1])
          Loss: tensor(1.7516)

          Sequence of length 2:
          Output: tensor([[ 0.9905, 0.2267, 0.7583, 0.4865, 0.3220],
          [ 0.8073, 0.1803, 0.5290, 0.3179, 0.2746]]) shape: torch.Size([2, 5])
          Target: tensor([ 0, 1]) shape: torch.Size([2])
          Loss: tensor(1.5469)

          Sequence of length 3:
          Output: tensor([[ 0.8497, 0.2728, 0.3329, 0.2278, 0.1459],
          [ 0.4899, 0.2487, 0.4730, 0.9970, 0.1350],
          [ 0.0869, 0.9306, 0.1526, 0.2206, 0.6328]]) shape: torch.Size([3, 5])
          Target: tensor([ 0, 1, 1]) shape: torch.Size([3])
          Loss: tensor(1.3918)
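To make the printed numbers less of a black box, here is a small plain-Python sketch of what the loss computes for a single element: the log of the summed exponentials of all scores minus the score of the target class. The helper name cross_entropy is made up for this illustration and is not a PyTorch function; the scores are the rounded values printed for the length-1 sequence above.

```python
import math

def cross_entropy(logits, target_index):
    """Negative log-softmax of the target class for one element."""
    log_sum_exp = math.log(sum(math.exp(x) for x in logits))
    return log_sum_exp - logits[target_index]

# Scores and target copied from the "Sequence of length 1" output above
logits = [0.1956, 0.0395, 0.6564, 0.4000, 0.2875]
loss = cross_entropy(logits, 0)
print(round(loss, 4))  # about 1.7516, in line with the printed loss
```

For longer sequences, CrossEntropyLoss averages this per-element value over all elements by default.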


          I hope this helps!



























          • 1





            @LunarLlama Yes, it can be confusing. You can try out the 2nd example in my answer, check the shapes and apply it to your own program. Your target should be a LongTensor and have shape [75].

            – blue-phoenox
            Nov 24 '18 at 17:24






          • 1





            In your outputs, each element of the sequence has a score for every class (which softmax turns into a distribution over the classes), hence the shape [sequence_length, number_of_classes]. In your targets, you just have an index value for the respective class of each element, hence the shape [sequence_length]. Does this make sense to you?

            – blue-phoenox
            Nov 24 '18 at 17:45






          • 1





            @LunarLlama I edited my answer and added a more general example at the end, I hope this helps understanding the shapes! If so it would be great if you could accept the answer :)

            – blue-phoenox
            Nov 24 '18 at 18:09








          • 1





            The network runs! Thank you so much, I really appreciate all your detailed answers.

            – LunarLlama
            Nov 24 '18 at 18:40






          • 1





            @LunarLlama I would just flatten your tensors: outputs.view(seq_length * batch_size, num_classes) and targets.view(-1). I think this is also the intended way, as the docs say: Input: (N, C) where C = number of classes, and Target: (N). The view calls above should give you those shapes.

            – blue-phoenox
            Nov 26 '18 at 8:34
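The flattening suggested in the last comment might look like the sketch below. The batch size of 4 and the random tensors are assumptions made for the example; the layout [seq_length, batch_size, num_classes] is also assumed, since the original comment does not spell it out.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_length, batch_size, num_classes = 75, 4, 55  # batch_size is hypothetical

outputs = torch.rand(seq_length, batch_size, num_classes)
targets = torch.randint(num_classes, (seq_length, batch_size))

# Merge sequence and batch dimensions so the loss sees (N, C) and (N,)
flat_outputs = outputs.view(seq_length * batch_size, num_classes)
flat_targets = targets.view(-1)

loss = nn.CrossEntropyLoss()(flat_outputs, flat_targets)
print(flat_outputs.shape, flat_targets.shape, loss)
```

view() needs a tensor whose memory layout is compatible with the new shape; if the tensor has been transposed or permuted beforehand, reshape() is the more forgiving choice.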




































          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          2














          You can use squeeze() on your output tensor, this returns a tensor with all the dimensions of size 1 removed.



          This short code uses the shapes you mentioned in your question:



          sequence_length   = 75
          number_of_classes = 55
          # creates random tensor of your output shape
          output = torch.rand(sequence_length, 1, number_of_classes)
          # creates tensor with random targets
          target = torch.randint(55, (75,)).long()

          # define loss function and calculate loss
          criterion = nn.CrossEntropyLoss()
          loss = criterion(output, target)
          print(loss)


          Results in the error you described:



          ValueError: Expected target size (75, 55), got torch.Size([75])


          So using squeeze() on your output tensor solves your problem by getting it to correct shape.



          Example with corrected shape:



          sequence_length   = 75
          number_of_classes = 55
          # creates random tensor of your output shape
          output = torch.rand(sequence_length, 1, number_of_classes)
          # creates tensor with random targets
          target = torch.randint(55, (75,)).long()

          # define loss function and calculate loss
          criterion = nn.CrossEntropyLoss()

          # apply squeeze() on output tensor to change shape form [75, 1, 55] to [75, 55]
          loss = criterion(output.squeeze(), target)
          print(loss)


          Output:



          tensor(4.0442)


          Using squeeze() changes your tensor shape from [75, 1, 55] to [75, 55] so it that output and target shape matches!



          You can also use other methods to reshape your tensor, it is just important that you have the shape of [sequence_length, number_of_classes] instead of [sequence_length, 1, number_of_classes].



          Your targets should be a LongTensor resp. a tensor of type torch.long containing the classes. Shape here is [sequence_length].



          Edit:

          Shapes from above example when passing to cross-entropy function:



          Outputs: torch.Size([75, 55])

          Targets: torch.Size([75])





          Here is a more general example what outputs and targets should look like for CE. In this case we assume we have 5 different target classes, there are three examples for sequences of length 1, 2 and 3:



          # init CE Loss function
          criterion = nn.CrossEntropyLoss()

          # sequence of length 1
          output = torch.rand(1, 5)
          # in this case the 1th class is our target, index of 1th class is 0
          target = torch.LongTensor([0])
          loss = criterion(output, target)
          print('Sequence of length 1:')
          print('Output:', output, 'shape:', output.shape)
          print('Target:', target, 'shape:', target.shape)
          print('Loss:', loss)

          # sequence of length 2
          output = torch.rand(2, 5)
          # targets are here 1th class for the first element and 2th class for the second element
          target = torch.LongTensor([0, 1])
          loss = criterion(output, target)
          print('nSequence of length 2:')
          print('Output:', output, 'shape:', output.shape)
          print('Target:', target, 'shape:', target.shape)
          print('Loss:', loss)

          # sequence of length 3
          output = torch.rand(3, 5)
          # targets here 1th class, 2th class and 2th class again for the last element of the sequence
          target = torch.LongTensor([0, 1, 1])
          loss = criterion(output, target)
          print('nSequence of length 3:')
          print('Output:', output, 'shape:', output.shape)
          print('Target:', target, 'shape:', target.shape)
          print('Loss:', loss)


          Output:



          Sequence of length 1:
          Output: tensor([[ 0.1956, 0.0395, 0.6564, 0.4000, 0.2875]]) shape: torch.Size([1, 5])
          Target: tensor([ 0]) shape: torch.Size([1])
          Loss: tensor(1.7516)

          Sequence of length 2:
          Output: tensor([[ 0.9905, 0.2267, 0.7583, 0.4865, 0.3220],
          [ 0.8073, 0.1803, 0.5290, 0.3179, 0.2746]]) shape: torch.Size([2, 5])
          Target: tensor([ 0, 1]) shape: torch.Size([2])
          Loss: tensor(1.5469)

          Sequence of length 3:
          Output: tensor([[ 0.8497, 0.2728, 0.3329, 0.2278, 0.1459],
          [ 0.4899, 0.2487, 0.4730, 0.9970, 0.1350],
          [ 0.0869, 0.9306, 0.1526, 0.2206, 0.6328]]) shape: torch.Size([3, 5])
          Target: tensor([ 0, 1, 1]) shape: torch.Size([3])
          Loss: tensor(1.3918)


          I hope this helps!






          share|improve this answer





















          • 1





            @LunarLlama Yes, it can be confusing. You can try out the 2nd example in my answer, check the shapes and apply it to your own program. Your target should be a LongTensor and have shape [75].

            – blue-phoenox
            Nov 24 '18 at 17:24






          • 1





            In your outputs you have each element of the sequence a distribution over the given classes, thus [sequence_length, number_of_classes] in your targets you just have an index value for the respective class for each element hence it is of shape [sequence_length]. Does this make sense to you?

            – blue-phoenox
            Nov 24 '18 at 17:45






          • 1





            @LunarLlama I edited my answer and added a more general example at the end, I hope this helps understanding the shapes! If so it would be great if you could accept the answer :)

            – blue-phoenox
            Nov 24 '18 at 18:09








          • 1





            The network runs! Thank you so much, I really appreciate all your detailed answers.

            – LunarLlama
            Nov 24 '18 at 18:40






          • 1





            @LunarLlama I would just flatten out your tensors so: outputs.view(seq_length* batch_size, num_classes), targets.view(-1). I think this is also the intended way as on the docs it says: Input: (N,C) where C = number of classes and Target: (N). So view commands above should give you that shape.

            – blue-phoenox
            Nov 26 '18 at 8:34
















          2














          You can use squeeze() on your output tensor, this returns a tensor with all the dimensions of size 1 removed.



          This short code uses the shapes you mentioned in your question:



          sequence_length   = 75
          number_of_classes = 55
          # creates random tensor of your output shape
          output = torch.rand(sequence_length, 1, number_of_classes)
          # creates tensor with random targets
          target = torch.randint(55, (75,)).long()

          # define loss function and calculate loss
          criterion = nn.CrossEntropyLoss()
          loss = criterion(output, target)
          print(loss)


          Results in the error you described:



          ValueError: Expected target size (75, 55), got torch.Size([75])


          So using squeeze() on your output tensor solves your problem by getting it to correct shape.



          Example with corrected shape:



          sequence_length   = 75
          number_of_classes = 55
          # creates random tensor of your output shape
          output = torch.rand(sequence_length, 1, number_of_classes)
          # creates tensor with random targets
          target = torch.randint(55, (75,)).long()

          # define loss function and calculate loss
          criterion = nn.CrossEntropyLoss()

          # apply squeeze() on output tensor to change shape form [75, 1, 55] to [75, 55]
          loss = criterion(output.squeeze(), target)
          print(loss)


          Output:



          tensor(4.0442)


          Using squeeze() changes your tensor shape from [75, 1, 55] to [75, 55] so it that output and target shape matches!



          You can also use other methods to reshape your tensor, it is just important that you have the shape of [sequence_length, number_of_classes] instead of [sequence_length, 1, number_of_classes].



          Your targets should be a LongTensor resp. a tensor of type torch.long containing the classes. Shape here is [sequence_length].



          Edit:

          Shapes from above example when passing to cross-entropy function:



          Outputs: torch.Size([75, 55])

          Targets: torch.Size([75])





          Here is a more general example what outputs and targets should look like for CE. In this case we assume we have 5 different target classes, there are three examples for sequences of length 1, 2 and 3:



          # init CE Loss function
          criterion = nn.CrossEntropyLoss()

          # sequence of length 1
          output = torch.rand(1, 5)
          # in this case the 1th class is our target, index of 1th class is 0
          target = torch.LongTensor([0])
          loss = criterion(output, target)
          print('Sequence of length 1:')
          print('Output:', output, 'shape:', output.shape)
          print('Target:', target, 'shape:', target.shape)
          print('Loss:', loss)

          # sequence of length 2
          output = torch.rand(2, 5)
          # targets are here 1th class for the first element and 2th class for the second element
          target = torch.LongTensor([0, 1])
          loss = criterion(output, target)
          print('nSequence of length 2:')
          print('Output:', output, 'shape:', output.shape)
          print('Target:', target, 'shape:', target.shape)
          print('Loss:', loss)

          # sequence of length 3
          output = torch.rand(3, 5)
          # targets here 1th class, 2th class and 2th class again for the last element of the sequence
          target = torch.LongTensor([0, 1, 1])
          loss = criterion(output, target)
          print('nSequence of length 3:')
          print('Output:', output, 'shape:', output.shape)
          print('Target:', target, 'shape:', target.shape)
          print('Loss:', loss)


          Output:



          Sequence of length 1:
          Output: tensor([[ 0.1956, 0.0395, 0.6564, 0.4000, 0.2875]]) shape: torch.Size([1, 5])
          Target: tensor([ 0]) shape: torch.Size([1])
          Loss: tensor(1.7516)

          Sequence of length 2:
          Output: tensor([[ 0.9905, 0.2267, 0.7583, 0.4865, 0.3220],
          [ 0.8073, 0.1803, 0.5290, 0.3179, 0.2746]]) shape: torch.Size([2, 5])
          Target: tensor([ 0, 1]) shape: torch.Size([2])
          Loss: tensor(1.5469)

          Sequence of length 3:
          Output: tensor([[ 0.8497, 0.2728, 0.3329, 0.2278, 0.1459],
          [ 0.4899, 0.2487, 0.4730, 0.9970, 0.1350],
          [ 0.0869, 0.9306, 0.1526, 0.2206, 0.6328]]) shape: torch.Size([3, 5])
          Target: tensor([ 0, 1, 1]) shape: torch.Size([3])
          Loss: tensor(1.3918)


          I hope this helps!






          share|improve this answer





















          • 1





            @LunarLlama Yes, it can be confusing. You can try out the 2nd example in my answer, check the shapes and apply it to your own program. Your target should be a LongTensor and have shape [75].

            – blue-phoenox
            Nov 24 '18 at 17:24






          • 1





            In your outputs you have each element of the sequence a distribution over the given classes, thus [sequence_length, number_of_classes] in your targets you just have an index value for the respective class for each element hence it is of shape [sequence_length]. Does this make sense to you?

            – blue-phoenox
            Nov 24 '18 at 17:45






          • 1





            @LunarLlama I edited my answer and added a more general example at the end, I hope this helps understanding the shapes! If so it would be great if you could accept the answer :)

            – blue-phoenox
            Nov 24 '18 at 18:09








          • 1





            The network runs! Thank you so much, I really appreciate all your detailed answers.

            – LunarLlama
            Nov 24 '18 at 18:40






          • 1





            @LunarLlama I would just flatten out your tensors so: outputs.view(seq_length* batch_size, num_classes), targets.view(-1). I think this is also the intended way as on the docs it says: Input: (N,C) where C = number of classes and Target: (N). So view commands above should give you that shape.

            – blue-phoenox
            Nov 26 '18 at 8:34














          2












          2








          2







          You can use squeeze() on your output tensor, this returns a tensor with all the dimensions of size 1 removed.



          This short code uses the shapes you mentioned in your question:



          sequence_length   = 75
          number_of_classes = 55
          # creates random tensor of your output shape
          output = torch.rand(sequence_length, 1, number_of_classes)
          # creates tensor with random targets
          target = torch.randint(55, (75,)).long()

          # define loss function and calculate loss
          criterion = nn.CrossEntropyLoss()
          loss = criterion(output, target)
          print(loss)


          Results in the error you described:



          ValueError: Expected target size (75, 55), got torch.Size([75])


          So using squeeze() on your output tensor solves your problem by getting it to correct shape.



          Example with corrected shape:



          sequence_length   = 75
          number_of_classes = 55
          # creates random tensor of your output shape
          output = torch.rand(sequence_length, 1, number_of_classes)
          # creates tensor with random targets
          target = torch.randint(55, (75,)).long()

          # define loss function and calculate loss
          criterion = nn.CrossEntropyLoss()

          # apply squeeze() on output tensor to change shape form [75, 1, 55] to [75, 55]
          loss = criterion(output.squeeze(), target)
          print(loss)


          Output:



          tensor(4.0442)


          Using squeeze() changes your tensor shape from [75, 1, 55] to [75, 55] so it that output and target shape matches!



          You can also use other methods to reshape your tensor, it is just important that you have the shape of [sequence_length, number_of_classes] instead of [sequence_length, 1, number_of_classes].



          Your targets should be a LongTensor resp. a tensor of type torch.long containing the classes. Shape here is [sequence_length].



          Edit:

          Shapes from above example when passing to cross-entropy function:



          Outputs: torch.Size([75, 55])

          Targets: torch.Size([75])





          Here is a more general example what outputs and targets should look like for CE. In this case we assume we have 5 different target classes, there are three examples for sequences of length 1, 2 and 3:



          # init CE Loss function
          criterion = nn.CrossEntropyLoss()

          # sequence of length 1
          output = torch.rand(1, 5)
          # in this case the 1th class is our target, index of 1th class is 0
          target = torch.LongTensor([0])
          loss = criterion(output, target)
          print('Sequence of length 1:')
          print('Output:', output, 'shape:', output.shape)
          print('Target:', target, 'shape:', target.shape)
          print('Loss:', loss)

          # sequence of length 2
          output = torch.rand(2, 5)
          # targets are here 1th class for the first element and 2th class for the second element
          target = torch.LongTensor([0, 1])
          loss = criterion(output, target)
          print('nSequence of length 2:')
          print('Output:', output, 'shape:', output.shape)
          print('Target:', target, 'shape:', target.shape)
          print('Loss:', loss)

          # sequence of length 3
          output = torch.rand(3, 5)
          # targets here 1th class, 2th class and 2th class again for the last element of the sequence
          target = torch.LongTensor([0, 1, 1])
          loss = criterion(output, target)
          print('nSequence of length 3:')
          print('Output:', output, 'shape:', output.shape)
          print('Target:', target, 'shape:', target.shape)
          print('Loss:', loss)


          Output:



          Sequence of length 1:
          Output: tensor([[ 0.1956, 0.0395, 0.6564, 0.4000, 0.2875]]) shape: torch.Size([1, 5])
          Target: tensor([ 0]) shape: torch.Size([1])
          Loss: tensor(1.7516)

          Sequence of length 2:
          Output: tensor([[ 0.9905, 0.2267, 0.7583, 0.4865, 0.3220],
          [ 0.8073, 0.1803, 0.5290, 0.3179, 0.2746]]) shape: torch.Size([2, 5])
          Target: tensor([ 0, 1]) shape: torch.Size([2])
          Loss: tensor(1.5469)

          Sequence of length 3:
          Output: tensor([[ 0.8497, 0.2728, 0.3329, 0.2278, 0.1459],
          [ 0.4899, 0.2487, 0.4730, 0.9970, 0.1350],
          [ 0.0869, 0.9306, 0.1526, 0.2206, 0.6328]]) shape: torch.Size([3, 5])
          Target: tensor([ 0, 1, 1]) shape: torch.Size([3])
          Loss: tensor(1.3918)
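
As a cross-check on the first printed loss, the value can be reproduced by hand. With its default settings, nn.CrossEntropyLoss applies a log-softmax to the scores and averages the negative log-probability of each target class. The following is only a minimal sketch that reuses the printed scores of the length-1 sequence; the variable names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# scores copied from the printed "Sequence of length 1" output
output = torch.tensor([[0.1956, 0.0395, 0.6564, 0.4000, 0.2875]])
target = torch.tensor([0])  # class index 0, dtype torch.long

# built-in loss
loss_builtin = nn.CrossEntropyLoss()(output, target)

# manual computation: negative log-softmax score of the target class, averaged
log_probs = F.log_softmax(output, dim=1)
loss_manual = -log_probs[torch.arange(len(target)), target].mean()

print(loss_builtin, loss_manual)  # both should be close to 1.7516
```

This also shows why the target only needs one index per sequence element: the index is used to pick a single entry from each row of log-probabilities.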


          I hope this helps!





















          You can use squeeze() on your output tensor. It returns a tensor with all dimensions of size 1 removed.



          This short code uses the shapes you mentioned in your question:



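          import torch
          import torch.nn as nn

          sequence_length   = 75
          number_of_classes = 55
          # create a random tensor with the output shape from the question
          output = torch.rand(sequence_length, 1, number_of_classes)
          # create a tensor of random class indices (torch.randint returns dtype torch.long by default)
          target = torch.randint(number_of_classes, (sequence_length,))

          # define the loss function and calculate the loss (this call raises the error shown below)
          criterion = nn.CrossEntropyLoss()
          loss = criterion(output, target)
          print(loss)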


          This results in the error you described:



          ValueError: Expected target size (75, 55), got torch.Size([75])


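          For an input with more than two dimensions, the loss treats dimension 1 as the class dimension, so an output of shape [75, 1, 55] is read as having a single class and 55 extra positions, which is why a target of size (75, 55) is expected. Using squeeze() on your output tensor removes that dimension of size 1 and gives the output the correct shape.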



          Example with corrected shape:



          import torch
          import torch.nn as nn

          sequence_length   = 75
          number_of_classes = 55
          # create a random tensor with the output shape from the question
          output = torch.rand(sequence_length, 1, number_of_classes)
          # create a tensor of random class indices (torch.randint returns dtype torch.long by default)
          target = torch.randint(number_of_classes, (sequence_length,))

          # define the loss function and calculate the loss
          criterion = nn.CrossEntropyLoss()

          # apply squeeze() to the output tensor to change its shape from [75, 1, 55] to [75, 55]
          loss = criterion(output.squeeze(), target)
          print(loss)


          Output:



          tensor(4.0442)


          Using squeeze() changes your tensor shape from [75, 1, 55] to [75, 55], so that the output and target shapes match!



          You can also use other methods to reshape your tensor. What matters is that the output has the shape [sequence_length, number_of_classes] instead of [sequence_length, 1, number_of_classes].
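
One caveat with a bare squeeze(): it removes every dimension of size 1, so a sequence of length 1 would lose its sequence dimension as well. The sketch below shows two alternatives that only remove the middle dimension, assuming the same shapes as in the question:

```python
import torch
import torch.nn as nn

sequence_length, number_of_classes = 75, 55
output = torch.rand(sequence_length, 1, number_of_classes)
target = torch.randint(number_of_classes, (sequence_length,))

criterion = nn.CrossEntropyLoss()

# remove only dimension 1 (the middle dimension of size 1)
out_squeezed = output.squeeze(1)
# or flatten everything except the class dimension
out_viewed = output.view(-1, number_of_classes)

print(out_squeezed.shape, out_viewed.shape)  # both torch.Size([75, 55])
loss_a = criterion(out_squeezed, target)
loss_b = criterion(out_viewed, target)
print(loss_a, loss_b)  # identical values

# with a sequence of length 1, squeeze() drops the sequence dimension too
short = torch.rand(1, 1, number_of_classes)
print(short.squeeze().shape)   # torch.Size([55])
print(short.squeeze(1).shape)  # torch.Size([1, 55])
```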



          Your targets should be a LongTensor, i.e. a tensor of dtype torch.long, containing the class indices. Its shape here is [sequence_length].
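
If your targets currently have shape [sequence_length, number_of_classes] because they are one-hot encoded (an assumption; the question only states the shape), they can be converted to class indices with argmax. A minimal sketch, where one_hot_target is a hypothetical stand-in for your own tensor:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

sequence_length, number_of_classes = 75, 55

# hypothetical one-hot targets of shape [75, 55], built here only for illustration
class_indices = torch.randint(number_of_classes, (sequence_length,))
one_hot_target = F.one_hot(class_indices, num_classes=number_of_classes)

# argmax over the class dimension recovers one index per sequence element
target = one_hot_target.argmax(dim=1)
print(target.shape, target.dtype)  # torch.Size([75]) torch.int64

output = torch.rand(sequence_length, number_of_classes)
loss = nn.CrossEntropyLoss()(output, target)
print(loss)
```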



          Edit:

          Shapes from the example above when passed to the cross-entropy function:



          Outputs: torch.Size([75, 55])

          Targets: torch.Size([75])













          edited Nov 24 '18 at 18:06

























          answered Nov 24 '18 at 12:30









          blue-phoenox








          • 1





            @LunarLlama Yes, it can be confusing. You can try out the 2nd example in my answer, check the shapes and apply it to your own program. Your target should be a LongTensor and have shape [75].

            – blue-phoenox
            Nov 24 '18 at 17:24






          • 1





            In your outputs, each element of the sequence has a set of scores over the given classes, hence the shape [sequence_length, number_of_classes]. In your targets, each element just has the index of its class, hence the shape [sequence_length]. Does this make sense to you?

            – blue-phoenox
            Nov 24 '18 at 17:45






          • 1





            @LunarLlama I edited my answer and added a more general example at the end. I hope this helps with understanding the shapes! If so, it would be great if you could accept the answer :)

            – blue-phoenox
            Nov 24 '18 at 18:09








          • 1





            The network runs! Thank you so much, I really appreciate all your detailed answers.

            – LunarLlama
            Nov 24 '18 at 18:40






          • 1





            @LunarLlama I would just flatten your tensors: outputs.view(seq_length * batch_size, num_classes) and targets.view(-1). I think this is also the intended way, as the docs say: Input: (N, C), where C = number of classes, and Target: (N). The view calls above should give you those shapes.

            – blue-phoenox
            Nov 26 '18 at 8:34
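
The flattening suggested in this comment can be sketched as follows. The sizes are assumed purely for illustration, and the layout [seq_length, batch_size, num_classes] is an assumption about how the model output is arranged:

```python
import torch
import torch.nn as nn

# assumed sizes, chosen only for illustration
seq_length, batch_size, num_classes = 75, 4, 55

outputs = torch.rand(seq_length, batch_size, num_classes)
targets = torch.randint(num_classes, (seq_length, batch_size))

criterion = nn.CrossEntropyLoss()

# merge the sequence and batch dimensions into a single dimension of size N
flat_outputs = outputs.view(seq_length * batch_size, num_classes)  # (N, C)
flat_targets = targets.view(-1)                                    # (N)

loss = criterion(flat_outputs, flat_targets)
print(flat_outputs.shape, flat_targets.shape, loss)
```

Both tensors are flattened in the same order, so each row of scores still lines up with its own target index. If a tensor is not contiguous (for example after a transpose), reshape() can be used in place of view().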

































