TensorFlow: dense tensor to sparse binarized hash-trick tensor



























I want to transform a dataset so that each feature row is mapped to a tensor of a given size n, in which the entry at index i is set to 1 if and only if the original row contains a value v with v % n == i.

I hope the following example makes this clearer.

Suppose I have a dataset like:



    t = tf.constant([
        [0, 3, 4],
        [12, 2, 4]])
    ds = tf.data.Dataset.from_tensors(t)


I want to get the sparse equivalent of the following (with n = 9):



    t = tf.constant([
        [1, 0, 0, 1, 1, 0, 0, 0, 0],   # indices set to 1: 0, 3 and 4
        [0, 0, 1, 1, 1, 0, 0, 0, 0]])  # indices set to 1: 2, 4 and 12 % 9 = 3


I already know how to obtain a non-sparse representation (see "Tensorflow: tensor binarization"). Since I will end up with n > 1 million, I do not want to go through the dense tensor to get the sparse one.



thanks

















python tensorflow sparse-matrix
















asked Nov 23 '18 at 16:34









taktak004














  • So the input is still dense in this case, right?

    – jdehesa
    Nov 23 '18 at 16:45











  • Yes, the input is still dense.

    – taktak004
    Dec 1 '18 at 23:15































1 Answer














Here is a possible implementation for that:



    import tensorflow as tf

    def binarization_sparse(t, n):
        # Input size
        t_shape = tf.shape(t)
        t_rows = t_shape[0]
        t_cols = t_shape[1]
        # Make sparse row indices for each value
        row_idx = tf.tile(tf.range(t_rows)[:, tf.newaxis], [1, t_cols])
        # Sparse column indices
        col_idx = t % n
        # "Flat" indices - needed to discard repetitions
        total_idx = row_idx * n + col_idx
        # Remove repeated elements
        out_idx, _ = tf.unique(tf.reshape(total_idx, [-1]))
        # Back to row and column indices
        sparse_idx = tf.stack([out_idx // n, out_idx % n], axis=-1)
        # Sparse values
        sparse_values = tf.ones([tf.shape(sparse_idx)[0]], dtype=t.dtype)
        # Make sparse tensor
        out = tf.sparse.SparseTensor(tf.cast(sparse_idx, tf.int64),
                                     sparse_values,
                                     [t_rows, n])
        # Reorder indices
        out = tf.sparse.reorder(out)
        return out

    # Test (TensorFlow 1.x graph mode)
    with tf.Graph().as_default(), tf.Session() as sess:
        t = tf.constant([
            [ 0, 3, 4],
            [12, 2, 4]
        ])
        # Sparse result
        t_m1h_sp = binarization_sparse(t, 9)
        # Convert to dense to check output
        t_m1h = tf.sparse.to_dense(t_m1h_sp)
        print(sess.run(t_m1h))


Output:



    [[1 0 0 1 1 0 0 0 0]
     [0 0 1 1 1 0 0 0 0]]


I added the logic to remove repeated elements because repetitions can occur in principle; if you have a guarantee that there are none (including after the modulo), you can skip that step. I also reorder the sparse tensor at the end. That is not strictly necessary here, but (I think) some sparse operations expect the indices to be ordered, and sparse_idx may not be.
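To see why both the deduplication and the reordering matter, here is a small framework-free sketch of the same index arithmetic in plain Python. The function name `hashed_coo_indices` is made up for illustration and is not part of TensorFlow; `dict.fromkeys` is used as a stand-in for `tf.unique` because it also keeps values in first-seen order.

```python
def hashed_coo_indices(rows, n):
    """Return unique (row, col) pairs with col = value % n, in first-seen order."""
    # The flat index row * n + col identifies each (row, col) pair with one integer.
    flat = [r * n + (v % n) for r, row in enumerate(rows) for v in row]
    # dict.fromkeys drops repeats but keeps first-seen order (stand-in for tf.unique).
    unique_flat = list(dict.fromkeys(flat))
    # Undo the flattening: integer division gives the row, the remainder the column.
    return [(f // n, f % n) for f in unique_flat]


pairs = hashed_coo_indices([[0, 3, 4], [12, 2, 4]], 9)
print(pairs)          # [(0, 0), (0, 3), (0, 4), (1, 3), (1, 2), (1, 4)]
print(sorted(pairs))  # [(0, 0), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4)]

# 3 and 12 collide modulo 9, so only one entry survives for that row.
print(hashed_coo_indices([[3, 12, 3]], 9))  # [(0, 3)]
```

In the second row of the example, 12 is seen before 2, so the pair (1, 3) comes out ahead of (1, 2). That is the kind of unordered index list that the final reordering step puts back into row-major order.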



Also, this solution is specific to 2D inputs. For 1D inputs it would be simpler, and it can be written for higher-dimensional inputs as well if necessary. I think a completely general solution is possible, but it would be more complicated (especially if you want to handle tensors with an unknown number of dimensions).
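As a rough illustration of what a rank-independent version has to compute, here is a plain-Python sketch that uses nested lists in place of tensors. It is not TensorFlow code, and `hashed_coo_indices_nd` is a hypothetical name: only the last axis is hashed with `% n`, and the leading coordinates are carried along unchanged.

```python
def hashed_coo_indices_nd(data, n, prefix=()):
    """Return sorted unique index tuples for a nested list of any depth.

    The last axis is hashed with value % n; leading coordinates are kept.
    """
    if data and isinstance(data[0], list):
        out = []
        for i, sub in enumerate(data):
            # Recurse into each sub-list, recording its position as a coordinate.
            out.extend(hashed_coo_indices_nd(sub, n, prefix + (i,)))
        return out
    # Innermost axis: a set removes collisions, sorted() gives row-major order.
    return [prefix + (c,) for c in sorted({v % n for v in data})]


print(hashed_coo_indices_nd([0, 3, 12], 9))                # [(0,), (3,)]
print(hashed_coo_indices_nd([[0, 3, 4], [12, 2, 4]], 9))
# [(0, 0), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4)]
print(hashed_coo_indices_nd([[[1, 10]], [[5, 5]]], 9))     # [(0, 0, 1), (1, 0, 5)]
```

The resulting tuples are what would go into the `indices` argument of a sparse tensor whose dense shape is the input shape with the last dimension replaced by n. Doing the same inside a graph with an unknown number of dimensions is the part that gets more involved.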
















        answered Nov 23 '18 at 17:25









        jdehesa
































