How do I count how many items are in a specific row in my RDD?












As you can tell, I'm fairly new to PySpark (Python). My RDD is laid out as follows:
(ID, First name, Last name, Address)
(ID, First name, Last name, Address)
(ID, First name, Last name, Address)
(ID, First name, Last name, Address)
(ID, First name, Last name, Address)
Is there any way I can count how many of these records I have stored in my RDD, for example by counting all the IDs, so that the output tells me I have 5 of them?
I have tried using RDD.count(), but that just seems to return how many items I have in my dataset in total.










  • Post your attempt at solving this, i.e. code please.

    – Happypig375
    Nov 25 '18 at 14:08











  • RDD.count() #This did not return what I wanted

    – soumbo
    Nov 25 '18 at 14:12











  • No, something more than that. For example, a custom function for counting each record.

    – Happypig375
    Nov 25 '18 at 14:28






  • Look at this question for example. You should ask your question in a more proper way. stackoverflow.com/questions/53153149/…

    – Ali AzG
    Nov 25 '18 at 15:30






  • It's not clear what you need. Give an example.

    – Mahmoud Hanafy
    Nov 25 '18 at 16:18
















python scala pyspark






asked Nov 25 '18 at 13:55









soumbo








1 Answer














If you have an RDD of tuples like RDD[(ID, First name, Last name, Address)], you can use the operations below for different kinds of counting.





  1. Count the total number of elements/Rows in your RDD.



    rdd.count()




  2. Count the distinct IDs in the RDD: select the ID element of each tuple, then apply distinct() before counting.



    rdd.map(lambda x: x[0]).distinct().count()
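
To make the difference between these counts concrete, here is a minimal plain-Python sketch that needs no Spark installation. The five records and the names `records`, `total_rows`, `distinct_ids` and `fields_per_row` are made up for illustration; the comments name the RDD call that I would expect to give the same number on a real RDD. The last line is an extra, assuming "items in a specific row" means the number of fields in one tuple.

```python
# Five made-up records shaped like the tuples in the question:
# (ID, First name, Last name, Address). ID 3 is repeated on purpose.
records = [
    (1, "Ann", "Lee", "1 Main St"),
    (2, "Bob", "Ray", "2 Oak Ave"),
    (3, "Cy", "Fox", "3 Elm Rd"),
    (3, "Cy", "Fox", "3 Elm Rd"),
    (4, "Di", "Kim", "4 Pine Ln"),
]

# Number of rows; on an RDD this corresponds to rdd.count().
total_rows = len(records)

# Number of distinct IDs; on an RDD this corresponds to
# rdd.map(lambda x: x[0]).distinct().count().
distinct_ids = len({row[0] for row in records})

# Number of fields in each row; on an RDD this corresponds to
# rdd.map(len).collect().
fields_per_row = [len(row) for row in records]

print(total_rows, distinct_ids, fields_per_row)
```

With these sample records the row count is 5 while the distinct ID count is 4, because one ID appears twice. If every ID is unique, both counts agree, which would explain why RDD.count() already returns the number of records.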




Hope this helps with the different kinds of counting.



Let me know if you need any further help here.



Regards,



Neeraj






        answered Nov 25 '18 at 20:35









        neeraj bhadani































