Tensorflow dense tensor to sparse binarized hash trick tensor
I want to transform a dataset in such a way that each tensor has a given size n, and that the feature at index i of this new tensor is set to 1 if and only if the value i appears in the original feature (modulo n).
I hope the following example makes this clearer.
Let's suppose I have a dataset like:
t = tf.constant([
    [0, 3, 4],
    [12, 2, 4]])
ds = tf.data.Dataset.from_tensors(t)
I want to get the sparse equivalent of the following (for n = 9):
t = tf.constant([
    [1, 0, 0, 1, 1, 0, 0, 0, 0],  # indices set to 1 are 0, 3 and 4
    [0, 0, 1, 1, 1, 0, 0, 0, 0]])  # indices set to 1 are 2, 4 and 12 % 9 = 3
I already know how to obtain a non-sparse representation (Tensorflow: tensor binarization), but since I will end up with n > 1 million, I do not want to go through the dense tensor to get the sparse one.
Thanks.
python tensorflow sparse-matrix
So the input is still dense in this case, right?
– jdehesa
Nov 23 '18 at 16:45
yes, input is still dense
– taktak004
Dec 1 '18 at 23:15
asked Nov 23 '18 at 16:34
taktak004
1 Answer
Here is a possible implementation for that:
import tensorflow as tf

def binarization_sparse(t, n):
    # Input size
    t_shape = tf.shape(t)
    t_rows = t_shape[0]
    t_cols = t_shape[1]
    # Make sparse row indices for each value
    row_idx = tf.tile(tf.range(t_rows)[:, tf.newaxis], [1, t_cols])
    # Sparse column indices
    col_idx = t % n
    # "Flat" indices - needed to discard repetitions
    total_idx = row_idx * n + col_idx
    # Remove repeated elements
    out_idx, _ = tf.unique(tf.reshape(total_idx, [-1]))
    # Back to row and column indices
    sparse_idx = tf.stack([out_idx // n, out_idx % n], axis=-1)
    # Sparse values
    sparse_values = tf.ones([tf.shape(sparse_idx)[0]], dtype=t.dtype)
    # Make sparse tensor
    out = tf.sparse.SparseTensor(tf.cast(sparse_idx, tf.int64),
                                 sparse_values,
                                 [t_rows, n])
    # Reorder indices
    out = tf.sparse.reorder(out)
    return out

# Test
with tf.Graph().as_default(), tf.Session() as sess:
    t = tf.constant([
        [ 0, 3, 4],
        [12, 2, 4]
    ])
    # Sparse result
    t_m1h_sp = binarization_sparse(t, 9)
    # Convert to dense to check output
    t_m1h = tf.sparse.to_dense(t_m1h_sp)
    print(sess.run(t_m1h))
Output:
[[1 0 0 1 1 0 0 0 0]
[0 0 1 1 1 0 0 0 0]]
I added the logic to remove repeated elements because in principle they could occur, but if you have a guarantee that there are no repetitions (including modulo n), you may skip that step. I also reorder the sparse tensor at the end. That is not strictly necessary here, but (I think) sparse operations sometimes expect the indices to be ordered (and sparse_idx may not be ordered).
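To make the flat-index trick concrete, here is a minimal plain-Python sketch of the same steps, with no TensorFlow involved. The helper name `sparse_indices` is made up for illustration. It encodes each (row, column) pair as `row * n + col`, drops duplicates while keeping first-seen order, decodes back to pairs, and finally sorts them, which is roughly the role the reordering step plays in the function above.

```python
def sparse_indices(rows, n):
    """Return (unordered, ordered) lists of unique (row, col) pairs, col = value % n.

    Plain-Python illustration only; `sparse_indices` is a hypothetical helper,
    not part of TensorFlow.
    """
    # Encode every (row, col) pair as a single "flat" integer.
    flat = [r * n + (v % n) for r, row in enumerate(rows) for v in row]
    # Drop repeats while keeping first-seen order (dicts preserve insertion order).
    unique_flat = list(dict.fromkeys(flat))
    # Decode back to (row, col); this list is not necessarily ordered.
    pairs = [(f // n, f % n) for f in unique_flat]
    # Sort row-major, then by column.
    return pairs, sorted(pairs)


unordered, ordered = sparse_indices([[0, 3, 4], [12, 2, 4]], 9)
print(unordered)  # [(0, 0), (0, 3), (0, 4), (1, 3), (1, 2), (1, 4)]
print(ordered)    # [(0, 0), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4)]
```

In the second row, 12 % 9 = 3 is seen before 2, which is why the decoded pairs come out unordered and a final sort is useful.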
Also, this solution is specific to 2D inputs. For 1D inputs it would be simpler, and it can be written for higher-dimensional inputs as well if necessary. I think a completely general solution is possible, but it would be more complicated (especially if you want to handle tensors with an unknown number of dimensions).
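For the 1D case, no flat index is needed, since the bucket `v % n` is already the only coordinate. A rough plain-Python sketch of the three components a sparse tensor would be built from follows; the function name `sparse_components_1d` and the returned tuple layout are assumptions for illustration, not TensorFlow API.

```python
def sparse_components_1d(values, n):
    """Return (indices, values, dense_shape) for a 1D binarized hash vector.

    Hypothetical helper for illustration; the tuple mirrors the arguments
    that tf.sparse.SparseTensor takes, but nothing here calls TensorFlow.
    """
    # Unique buckets, sorted up front so no separate reordering step is needed.
    buckets = sorted({v % n for v in values})
    # A rank-1 sparse tensor still uses one index row per non-zero entry.
    indices = [[b] for b in buckets]
    ones = [1] * len(buckets)
    return indices, ones, [n]


print(sparse_components_1d([12, 2, 4, 3], 9))
# ([[2], [3], [4]], [1, 1, 1], [9])
```

Here 12 and 3 fall into the same bucket, so only three entries remain.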
answered Nov 23 '18 at 17:25
jdehesa