How to standardize the output of USQL to have data for all the columns when converted from JSON
We have a requirement to standardize the output of U-SQL. The U-SQL script reads JSON (the source file) and converts it to CSV. The problem is that the number of columns is not the same for every row of the CSV because of missing data in the JSON: one row may have N columns while another has N+1. We would like to standardize the output so that every row has the same number of columns. How do we achieve this? We have no control over the source data, so the standardization has to happen during processing. Has anyone faced a similar challenge and found a solution? Thanks for your help!
Input details:
{"map": {"key1": 100, "key2": 101, "key3": 102}, "id": 2, "time": 1540300241230}
{"map": {"key1": 200, "key2": 201, "key3": 202 "key4": 203}, "id": 2, "time": 1540320246930}
{"map": {"key1": 300, "key3": 301, "key4": 303}, "id": 2, "time": 1540350246930}
Once the above JSON is converted to CSV (after some calculations), the current output, which is not correct, looks like this:
key1, key2, key3, key4
100, 101, 102
200, 201, 202, 203
300, 301, 303
Value "301" is not associated with the key2
Output expected - # is the default for missing column values
key1, key2, key3, key4
100, 101, 102, #
200, 201, 202, 203
300, #, 301, 303
Later, all the headings (key1, key2, ...) will be replaced with actual header names (Pressure, Velocity, etc.).
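In other words, the per-row rule we are after is: for every expected key, output the value if it is present, otherwise "#". As a sketch only (the rowset @pivoted and its nullable key1..key4 columns are hypothetical, and producing them generically when the key set varies is exactly the part we do not know how to do), it would look something like this:

// Sketch only: @pivoted is a hypothetical rowset with one nullable column per
// expected key (null where the JSON row had no entry for that key).
// The rule we want per cell is "value if present, otherwise #".
@standardized =
    SELECT (key1 == null ? "#" : key1.ToString()) AS key1,
           (key2 == null ? "#" : key2.ToString()) AS key2,
           (key3 == null ? "#" : key3.ToString()) AS key3,
           (key4 == null ? "#" : key4.ToString()) AS key4
    FROM @pivoted;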
json azure u-sql azure-data-factory-2
asked Nov 26 '18 at 9:09 by Akhilesh Balakrishnan, edited Dec 8 '18 at 19:20
Can you post the U-SQL query you have so far, so we have a starting point? – Peter Bons, Nov 26 '18 at 9:19
@Peter Bons - please find the sample U-SQL and associated files at github.com/pbakhil/usql-sample – Akhilesh Balakrishnan, Nov 27 '18 at 15:50
1 Answer
USE DATABASE [ADLSDB];

DECLARE EXTERNAL @INPUT_FILE string = "/adlspath/keyValue.txt";
DECLARE EXTERNAL @PIVOT_FILE string = "/adlspath/pivot.txt";

// The metadata about the request (contents of the file request.json),
// flattened to (id, timestamp, key, value) rows read from @INPUT_FILE.
@requestData = EXTRACT id int, timestamp string, key string, value int FROM @INPUT_FILE USING Extractors.Csv();
@data = SELECT id, timestamp, key, value FROM @requestData;

// External declaration: the full list of expected keys (can be overridden by the caller, e.g. Data Factory).
DECLARE EXTERNAL @ids string = "key1,key2,key3,key4";

// Pivot the key/value pairs into one column per key.
@result = SELECT * FROM (SELECT id, timestamp, key, value FROM @data) AS D
          PIVOT(SUM(value) FOR key IN(@ids AS heading)) AS P;

OUTPUT @result TO @PIVOT_FILE USING Outputters.Csv(quoting:false, outputHeader:false);
I was able to get close to a solution with the above code, but I am stuck at passing multiple values to the IN clause. I will have the list of @ids at compile time of the U-SQL script, but passing it as a comma-separated scalar variable does not produce the expected result. If I pass only a single value (say key1), the IN condition matches and the rows for key1 are output. Does anyone know how to pass multiple values to the IN clause of the U-SQL PIVOT function?
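For comparison, the static form with the keys spelled out as literal values, each with its own column alias, is the shape the pivot ultimately needs (a sketch only; the aliases are assumptions, and this hard-coded list is exactly what @ids was meant to replace):

// Works when the key list is written out literally, one constant plus alias per key.
@result = SELECT * FROM (SELECT id, timestamp, key, value FROM @data) AS D
          PIVOT(SUM(value) FOR key IN("key1" AS key1, "key2" AS key2, "key3" AS key3, "key4" AS key4)) AS P;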
Update:
We were able to solve the problem by using dynamic U-SQL: one U-SQL script writes the required U-SQL statements, in the required format, to an output file, and a second Data Factory activity then runs the dynamically generated U-SQL.
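A minimal sketch of the generator step, assuming the key list is known when the generator runs and that LINQ is usable inside the expression; the @keys value, the paths and the rowset names are placeholders, and the real generated file also has to contain the EXTRACT and OUTPUT statements:

// Sketch of the generator script: expand the key list into literal "key AS alias"
// pairs, build the PIVOT statement as text, and write it to a file that a second
// Data Factory activity submits as the next U-SQL job.
// (@keys, the paths and the rowset names are placeholders.)
DECLARE @keys string = "key1,key2,key3,key4";

@script =
    SELECT "@result = SELECT * FROM (SELECT id, timestamp, key, value FROM @data) AS D " +
           "PIVOT(SUM(value) FOR key IN(" +
           string.Join(", ", @keys.Split(',').Select(k => "\"" + k + "\" AS " + k)) +
           ")) AS P;" AS stmt
    FROM (VALUES (1)) AS T(dummy);

OUTPUT @script TO "/adlspath/generated/pivot.usql" USING Outputters.Text(quoting:false);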
answered Dec 8 '18 at 19:18 by Akhilesh Balakrishnan, edited Dec 14 '18 at 7:41