How do I count how many items are in a specific row in my RDD?
As you can tell, I'm fairly new to PySpark (Python). My RDD is laid out as follows:
(ID, First name, Last name, Address)
(ID, First name, Last name, Address)
(ID, First name, Last name, Address)
(ID, First name, Last name, Address)
(ID, First name, Last name, Address)
Is there any way I can count how many of these records I have stored in my RDD, for example by counting all the IDs in the RDD? The output should tell me that I have 5 of them.
I have tried using RDD.count(), but that just seems to return how many items I have in my dataset in total.
python scala pyspark
1
Post your attempt at solving this, i.e. code please.
– Happypig375
Nov 25 '18 at 14:08
RDD.count() #This did not return what I wanted
– soumbo
Nov 25 '18 at 14:12
No, something more than that. For example, a custom function for counting each record.
– Happypig375
Nov 25 '18 at 14:28
1
Look at this question for an example. You should ask your question in a more proper way. stackoverflow.com/questions/53153149/…
– Ali AzG
Nov 25 '18 at 15:30
2
It's not clear what you need. Give an example
– Mahmoud Hanafy
Nov 25 '18 at 16:18
asked Nov 25 '18 at 13:55
soumbo
1 Answer
If you have an RDD of tuples like RDD[(ID, First name, Last name, Address)], you can use the operations below to do different kinds of counting.
Count the total number of elements (rows) in your RDD:
rdd.count()
Count the distinct IDs in your RDD. Select the ID element of each row, then apply distinct() on top of it:
rdd.map(lambda x: x[0]).distinct().count()
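As a minimal sketch of the two counts above, assuming PySpark is installed and can run locally: the sample rows, the helper names `count_rows` and `count_distinct_ids`, and the `demo` function are made up for illustration and do not come from the question. One ID is repeated on purpose so that the two counts differ.

```python
# Invented sample records shaped like the rows in the question:
# (ID, first name, last name, address). Values are illustrative only.
SAMPLE_ROWS = [
    (1, "Ann", "Lee", "1 Main St"),
    (2, "Bob", "Ray", "2 Oak Ave"),
    (3, "Cid", "Fox", "3 Elm Rd"),
    (4, "Dee", "Kim", "4 Pine Ln"),
    (4, "Dee", "Kim", "4 Pine Ln"),  # same ID again, on purpose
]


def count_rows(rdd):
    """Return the total number of records (rows) in the RDD."""
    return rdd.count()


def count_distinct_ids(rdd):
    """Return the number of unique values in the first field (the ID)."""
    return rdd.map(lambda row: row[0]).distinct().count()


def demo():
    # Imported here so the helpers above do not require Spark at import time.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    rdd = sc.parallelize(SAMPLE_ROWS)
    print(count_rows(rdd))          # 5 records in total
    print(count_distinct_ids(rdd))  # 4 distinct IDs
```

Calling `demo()` should print 5 and then 4: `count()` counts every tuple, while the `map(...).distinct().count()` chain counts each ID only once. If every ID in your data is unique, both numbers will be the same.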
I hope this helps with the different kinds of counting.
Let me know if you need any further help here.
Regards,
Neeraj
answered Nov 25 '18 at 20:35
neeraj bhadani