How to standardize the output of USQL to have data for all the columns when converted from JSON

We have a requirement to standardize the output of U-SQL. The U-SQL script reads JSON data (the source file) and converts it to CSV format. The problem is that the number of columns is not the same for every row of the CSV, because of missing data in the JSON: one row of the U-SQL result set may have "N" columns (cells) while another has "N+1". We would like to standardize the output so that every row in the CSV has the same number of columns. How do we achieve this? We have no control over the source file data, so the standardization has to happen while processing. Has anyone faced a similar challenge and found a solution? Thanks for your help!



Input details:



{"map": {"key1": 100,    "key2": 101,    "key3": 102},  "id": 2,  "time": 1540300241230}
{"map": {"key1": 200, "key2": 201, "key3": 202 "key4": 203}, "id": 2, "time": 1540320246930}
{"map": {"key1": 300, "key3": 301, "key4": 303}, "id": 2, "time": 1540350246930}


After the above JSON is converted to CSV (based on some calculation), the output looks like this.



Output as-is, which is not correct:



key1, key2, key3, key4



100, 101, 102
200, 201, 202, 203
300, 301, 303


Value "301" is not associated with the key2



Expected output ("#" is the default for missing column values):



key1, key2, key3, key4



100, 101, 102, #
200, 201, 202, 203
300, #, 301, 303


Later, all the headings (key1, key2, ...) will be replaced with actual header names (Pressure, Velocity, etc.).
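To illustrate the kind of standardization we are after, here is a minimal sketch of one possible approach. It assumes the Microsoft.Analytics.Samples.Formats JSON library from the Azure U-SQL samples is registered (JsonFunctions.JsonTuple), that looking up an absent key in the resulting SqlMap simply yields null, and that the file paths are illustrative; this is not our actual script.

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

// Read each newline-delimited JSON object as one raw line; the '\b' delimiter
// is only a trick to keep the whole line in a single column.
@lines =
    EXTRACT jsonLine string
    FROM "/adlspath/input.json"
    USING Extractors.Text(delimiter:'\b', quoting:false);

// Parse the outer object, then the nested "map" object.
@docs =
    SELECT JsonFunctions.JsonTuple(jsonLine) AS doc
    FROM @lines;

@maps =
    SELECT JsonFunctions.JsonTuple(doc["map"]) AS map
    FROM @docs;

// Emit every expected column, defaulting to "#" when a key is missing
// (assumes an absent key comes back as null from the map lookup).
@standardized =
    SELECT map["key1"] ?? "#" AS key1,
           map["key2"] ?? "#" AS key2,
           map["key3"] ?? "#" AS key3,
           map["key4"] ?? "#" AS key4
    FROM @maps;

OUTPUT @standardized
TO "/adlspath/standardized.csv"
USING Outputters.Csv(quoting:false, outputHeader:true);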

json azure u-sql azure-data-factory-2

asked Nov 26 '18 at 9:09 by Akhilesh Balakrishnan; edited Dec 8 '18 at 19:20

  • Can you post the u-sql query you have so far so we have a starting point?

    – Peter Bons
    Nov 26 '18 at 9:19











  • github.com/pbakhil/usql-sample @Peter Bons - Please find the sample usql and associated files

    – Akhilesh Balakrishnan
    Nov 27 '18 at 15:50

1 Answer

USE DATABASE [ADLSDB];



DECLARE EXTERNAL @INPUT_FILE string = "/adlspath/keyValue.txt";
DECLARE EXTERNAL @PIVOT_FILE string = "/adlspath/pivot.txt";



/* The meta data about the request starts - contents of the file request.json */



@requestData = EXTRACT id int, timestamp string, key string, value int FROM @INPUT_FILE USING Extractors.Csv();
@data = SELECT id AS id, timestamp AS timestamp, key AS key, value AS value FROM @requestData;



DECLARE EXTERNAL @ids string = "key1,key2,key3,key4"; //"external declaration"



@result = SELECT * FROM (SELECT id, timestamp, key, value FROM @data )
AS D PIVOT(SUM(value) FOR key IN(@ids AS heading)) AS P;



OUTPUT @result TO @PIVOT_FILE USING Outputters.Csv(quoting:false, outputHeader:false);



I was able to get close to the solution with the above code, but I am stuck at passing multiple values to the IN clause. I will have the list of @ids at compile time of the U-SQL script, but passing it as a comma-separated scalar variable does not produce the result. If I pass only one value (say key1), the IN condition matches and the rows for key1 are output. Does anyone know how to pass multiple values to the IN clause of the U-SQL PIVOT function?
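For reference, when the key list can be fixed at the time the script is authored, one workaround is to spell the keys out literally in the IN clause and then map the nulls produced for missing keys to "#". A minimal sketch along those lines (same column names as above, not our final script):

// Keys listed literally instead of via the @ids scalar variable.
@pivoted =
    SELECT *
    FROM (SELECT id, timestamp, key, value FROM @data) AS D
         PIVOT(SUM(value) FOR key IN("key1" AS key1, "key2" AS key2, "key3" AS key3, "key4" AS key4)) AS P;

// A row with no entry for a key gets a null in that column after the pivot; turn it into "#".
@standardized =
    SELECT id,
           timestamp,
           key1 == null ? "#" : key1.ToString() AS key1,
           key2 == null ? "#" : key2.ToString() AS key2,
           key3 == null ? "#" : key3.ToString() AS key3,
           key4 == null ? "#" : key4.ToString() AS key4
    FROM @pivoted;

OUTPUT @standardized TO @PIVOT_FILE USING Outputters.Csv(quoting:false, outputHeader:true);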



------Updated------------



We were able to solve the problem by using dynamic U-SQL: one U-SQL script writes the required U-SQL statements to an output file in the required format, and another Data Factory activity then runs the dynamically generated U-SQL.
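Roughly, the generator looks like the sketch below (paths and names are illustrative, and for brevity it only emits the PIVOT statement of the generated script; the real generator also writes out the surrounding EXTRACT and OUTPUT statements). It derives the key list from the input, bakes it into the IN clause, and the next Data Factory activity submits the generated file:

// Generator script (illustrative): derives the key list from the input and
// writes a second U-SQL script with the PIVOT IN list spelled out literally.
DECLARE EXTERNAL @INPUT_FILE string = "/adlspath/keyValue.txt";
DECLARE EXTERNAL @GENERATED_SCRIPT string = "/adlspath/generated_pivot.usql";

@requestData = EXTRACT id int, timestamp string, key string, value int FROM @INPUT_FILE USING Extractors.Csv();

// One row per distinct key, e.g. key1, key2, ...
@keys = SELECT DISTINCT key FROM @requestData;

// Collect the keys into a single array of IN-list fragments,
// e.g.: "key1" AS key1, "key2" AS key2, ...
@keyArray = SELECT ARRAY_AGG("\"" + key + "\" AS " + key) AS parts FROM @keys;

// Emit the PIVOT statement with the key list baked in.
@script =
    SELECT "@result = SELECT * FROM (SELECT id, timestamp, key, value FROM @data) AS D " +
           "PIVOT(SUM(value) FOR key IN(" + string.Join(", ", parts) + ")) AS P;" AS line
    FROM @keyArray;

// The next Data Factory activity picks up and submits this generated script.
OUTPUT @script TO @GENERATED_SCRIPT USING Outputters.Csv(quoting:false);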






answered Dec 8 '18 at 19:18 by Akhilesh Balakrishnan; edited Dec 14 '18 at 7:41