How to standardize the output of USQL to have data for all the columns when converted from JSON
We have a requirement to standardize the output of U-SQL. The U-SQL script reads JSON (the source file) and converts it to CSV. The problem is that the number of columns is not the same for every row of the CSV because of missing data in the JSON: one row may have N columns while another has N+1. We would like to standardize the output so that every row has the same number of columns. How do we achieve this? We have no control over the source data, so the standardization has to happen during processing. Has anyone faced a similar challenge and found a solution? Thanks for your help!
Input details:
{"map": {"key1": 100, "key2": 101, "key3": 102}, "id": 2, "time": 1540300241230}
{"map": {"key1": 200, "key2": 201, "key3": 202 "key4": 203}, "id": 2, "time": 1540320246930}
{"map": {"key1": 300, "key3": 301, "key4": 303}, "id": 2, "time": 1540350246930}
Once the above JSON is converted to CSV (after some calculations), the current output, which is not correct, looks like this:
key1, key2, key3, key4
100, 101, 102
200, 201, 202, 203
300, 301, 303
Value "301" is not associated with the key2
Output expected - # is the default for missing column values
key1, key2, key3, key4
100, 101, 102, #
200, 201, 202, 203
300, #, 301, 303
Later, all the headings (key1, key2, ...) will be replaced with actual header names (Pressure, Velocity, etc.).
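In other words, the per-row rule we are after is: for every expected key, output the value if it is present, otherwise "#". As a sketch only (the rowset @pivoted and its nullable key1..key4 columns are hypothetical, and producing them generically when the key set varies is exactly the part we do not know how to do), it would look something like this:

// Sketch only: @pivoted is a hypothetical rowset with one nullable column per
// expected key (null where the JSON row had no entry for that key).
// The rule we want per cell is "value if present, otherwise #".
@standardized =
    SELECT (key1 == null ? "#" : key1.ToString()) AS key1,
           (key2 == null ? "#" : key2.ToString()) AS key2,
           (key3 == null ? "#" : key3.ToString()) AS key3,
           (key4 == null ? "#" : key4.ToString()) AS key4
    FROM @pivoted;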
json azure u-sql azure-data-factory-2
asked Nov 26 '18 at 9:09 by Akhilesh Balakrishnan, edited Dec 8 '18 at 19:20
Can you post the U-SQL query you have so far, so we have a starting point? – Peter Bons, Nov 26 '18 at 9:19
@Peter Bons - please find the sample U-SQL and associated files at github.com/pbakhil/usql-sample – Akhilesh Balakrishnan, Nov 27 '18 at 15:50
1 Answer
USE DATABASE [ADLSDB];

DECLARE EXTERNAL @INPUT_FILE string = "/adlspath/keyValue.txt";
DECLARE EXTERNAL @PIVOT_FILE string = "/adlspath/pivot.txt";

// The metadata about the request (contents of the file request.json),
// flattened to (id, timestamp, key, value) rows read from @INPUT_FILE.
@requestData = EXTRACT id int, timestamp string, key string, value int FROM @INPUT_FILE USING Extractors.Csv();
@data = SELECT id, timestamp, key, value FROM @requestData;

// External declaration: the full list of expected keys (can be overridden by the caller, e.g. Data Factory).
DECLARE EXTERNAL @ids string = "key1,key2,key3,key4";

// Pivot the key/value pairs into one column per key.
@result = SELECT * FROM (SELECT id, timestamp, key, value FROM @data) AS D
          PIVOT(SUM(value) FOR key IN(@ids AS heading)) AS P;

OUTPUT @result TO @PIVOT_FILE USING Outputters.Csv(quoting:false, outputHeader:false);
I was able to get close to a solution with the above code, but I am stuck at passing multiple values to the IN clause. I will have the list of @ids at compile time of the U-SQL script, but passing it as a comma-separated scalar variable does not produce the expected result. If I pass only a single value (say key1), the IN condition matches and the rows for key1 are output. Does anyone know how to pass multiple values to the IN clause of the U-SQL PIVOT function?
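For comparison, the static form with the keys spelled out as literal values, each with its own column alias, is the shape the pivot ultimately needs (a sketch only; the aliases are assumptions, and this hard-coded list is exactly what @ids was meant to replace):

// Works when the key list is written out literally, one constant plus alias per key.
@result = SELECT * FROM (SELECT id, timestamp, key, value FROM @data) AS D
          PIVOT(SUM(value) FOR key IN("key1" AS key1, "key2" AS key2, "key3" AS key3, "key4" AS key4)) AS P;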
Update:
We were able to solve the problem by using dynamic U-SQL: one U-SQL script writes the required U-SQL statements, in the required format, to an output file, and a second Data Factory activity then runs the dynamically generated U-SQL.
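A minimal sketch of the generator step, assuming the key list is known when the generator runs and that LINQ is usable inside the expression; the @keys value, the paths and the rowset names are placeholders, and the real generated file also has to contain the EXTRACT and OUTPUT statements:

// Sketch of the generator script: expand the key list into literal "key AS alias"
// pairs, build the PIVOT statement as text, and write it to a file that a second
// Data Factory activity submits as the next U-SQL job.
// (@keys, the paths and the rowset names are placeholders.)
DECLARE @keys string = "key1,key2,key3,key4";

@script =
    SELECT "@result = SELECT * FROM (SELECT id, timestamp, key, value FROM @data) AS D " +
           "PIVOT(SUM(value) FOR key IN(" +
           string.Join(", ", @keys.Split(',').Select(k => "\"" + k + "\" AS " + k)) +
           ")) AS P;" AS stmt
    FROM (VALUES (1)) AS T(dummy);

OUTPUT @script TO "/adlspath/generated/pivot.usql" USING Outputters.Text(quoting:false);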
answered Dec 8 '18 at 19:18 by Akhilesh Balakrishnan, edited Dec 14 '18 at 7:41