Creating a Google Cloud Function that edits a JSONL file based on the filename












I'm new to Stack Overflow, so hopefully this isn't a dumb question, but I couldn't find the answer elsewhere.



I want to create a Google Cloud Function that runs when a JSONL file arrives in a bucket, transforms it, and saves the result in a new bucket. The files arrive in a GCS bucket every day at 3 pm, so I am also fine with a Cloud Function that runs on a schedule if that is easier. The files are .jsonl.gz files, so I am not sure whether I need to decompress them before transforming them.



A JSONL file is a text file in which each line is a valid JSON object, like this:



{"index": 1, "met": "1043205", "no": "A"}
{"index": 2, "met": "000031043206", "no": "B"}
{"index": 3, "met": "0031043207", "no": "C"}
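To illustrate the format and the decompression question, here is a minimal sketch that reads a gzip-compressed JSONL payload using only the Python standard library. The function name `read_jsonl_gz` is hypothetical, and the sketch assumes the compressed file is already available as bytes in memory.

```python
import gzip
import io
import json


def read_jsonl_gz(data: bytes):
    """Yield one dict per non-empty line of gzip-compressed JSONL bytes."""
    # gzip.open accepts a file object; mode "rt" decompresses and decodes to text.
    with gzip.open(io.BytesIO(data), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)


if __name__ == "__main__":
    sample = (
        '{"index": 1, "met": "1043205", "no": "A"}\n'
        '{"index": 2, "met": "000031043206", "no": "B"}\n'
    )
    compressed = gzip.compress(sample.encode("utf-8"))
    for record in read_jsonl_gz(compressed):
        print(record)
```

Because the file is read line by line, only one record is parsed at a time, which should be comfortable for files of about 500,000 lines.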


There are about 500,000 lines in each file. The filename would be something like 20181017_nyc_midtown.jsonl.gz. I have a BigQuery table that maps each filename to a unique id for that file. For example, the unique id for 20181017_nyc_midtown.jsonl.gz might be 001924 (an arbitrary id that can only be found through the table). I could use a JSON file for the lookup instead of the BigQuery table, but then I would have to keep that JSON file updated, so the BigQuery table seems like the better option.
The BigQuery table looks like this:



+-------------------------------+----------+
| Filename                      | UniqueId |
+-------------------------------+----------+
| 20181017_nyc_midtown.jsonl.gz | 001924   |
+-------------------------------+----------+
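A lookup against a table like this could be sketched as follows, assuming the `google-cloud-bigquery` client library is installed and credentials are configured. The table path `my_project.my_dataset.file_ids` and the function names are hypothetical, and the sketch assumes `UniqueId` is stored as a STRING so that leading zeros are preserved.

```python
LOOKUP_SQL = """
    SELECT UniqueId
    FROM `my_project.my_dataset.file_ids`  -- hypothetical table name
    WHERE Filename = @filename
    LIMIT 1
"""


def pick_unique_id(rows, filename):
    """Return the UniqueId from the first row, or raise if there is none."""
    for row in rows:
        return row["UniqueId"]
    raise LookupError(f"no UniqueId found for {filename!r}")


def lookup_unique_id(filename):
    """Query BigQuery for the id (needs google-cloud-bigquery and credentials)."""
    from google.cloud import bigquery  # imported lazily; third-party dependency

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("filename", "STRING", filename)
        ]
    )
    rows = client.query(LOOKUP_SQL, job_config=job_config).result()
    return pick_unique_id(rows, filename)


if __name__ == "__main__":
    # Stand-in rows so the selection logic can be tried without BigQuery.
    fake_rows = [{"UniqueId": "001924"}]
    print(pick_unique_id(fake_rows, "20181017_nyc_midtown.jsonl.gz"))
```

Passing the filename as a query parameter instead of formatting it into the SQL string avoids quoting problems with unusual filenames.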


If the file is 20181017_nyc_midtown.jsonl.gz and its unique id is 001924, I want to add a field with that unique id to the JSONL. The field would be added to every line, so the output would be:



{"index": 1, "met": "1043205", "no": "A", "unique_id": "001924"}
{"index": 2, "met": "000031043206", "no": "B", "unique_id": "001924"}
{"index": 3, "met": "0031043207", "no": "C", "unique_id": "001924"}
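The transformation itself could be sketched like this, using only the standard library. The function name `add_unique_id` is hypothetical. Note that a bare 001924 is not a valid JSON number because of its leading zeros, so this sketch writes the id as a string.

```python
import gzip
import io
import json


def add_unique_id(compressed: bytes, unique_id: str) -> bytes:
    """Return gzip-compressed JSONL with "unique_id" added to every record."""
    out = io.BytesIO()
    with gzip.open(io.BytesIO(compressed), "rt", encoding="utf-8") as src, \
            gzip.open(out, "wt", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            # Kept as a string so the leading zeros in "001924" survive.
            record["unique_id"] = unique_id
            dst.write(json.dumps(record) + "\n")
    # The gzip writer is closed at this point, so the buffer is complete.
    return out.getvalue()


if __name__ == "__main__":
    source = gzip.compress(b'{"index": 1, "met": "1043205", "no": "A"}\n')
    print(gzip.decompress(add_unique_id(source, "001924")).decode("utf-8"))
```

Records are processed one line at a time, so only the compressed input and output are held in memory in full.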


I want to save the new JSONL files with the unique id to a different bucket (within the same project). The output filename would be the same as the input filename.
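Put together, a storage-triggered function could look roughly like the sketch below. It assumes a Python background function with the `(event, context)` signature, deployed with a Cloud Storage "finalize" trigger on the input bucket, and the `google-cloud-storage` library. The bucket name `my-output-bucket`, the function names, and the `lookup_unique_id` placeholder are all hypothetical. It also assumes the objects are stored as plain gzip files, so the downloaded bytes are still compressed.

```python
import gzip
import json

OUTPUT_BUCKET = "my-output-bucket"  # hypothetical bucket name


def transform(compressed, unique_id):
    """Add "unique_id" to every JSONL record; gzip bytes in, gzip bytes out."""
    lines = gzip.decompress(compressed).decode("utf-8").splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    for record in records:
        record["unique_id"] = unique_id
    text = "".join(json.dumps(record) + "\n" for record in records)
    return gzip.compress(text.encode("utf-8"))


def lookup_unique_id(filename):
    """Placeholder: replace with a query against the BigQuery lookup table."""
    raise NotImplementedError


def on_file_arrival(event, context):
    """Entry point for a Cloud Storage 'finalize' trigger (background function)."""
    name = event["name"]
    if not name.endswith(".jsonl.gz"):
        return  # ignore unrelated objects
    from google.cloud import storage  # third-party: google-cloud-storage

    client = storage.Client()
    source = client.bucket(event["bucket"]).blob(name)
    result = transform(source.download_as_bytes(), lookup_unique_id(name))
    # Same object name, different bucket.
    client.bucket(OUTPUT_BUCKET).blob(name).upload_from_string(
        result, content_type="application/gzip"
    )
```

With a storage trigger the function runs as soon as each file lands, so no separate 3 pm schedule is needed. This version loads the whole decompressed file into memory, so the function's memory allocation would need to cover that.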



In theory, this seems like a pretty simple thing to do. If my explanation is confusing, it is because I was trying to be specific and follow the posting guidelines.



[I'm fine with a solution in Node or Python. I don't know Node, so Python is preferred so that I can understand and edit it if need be, but either works.]










  • Welcome to Stack Overflow. We do not write code for you. Start with the Google Cloud Storage and Cloud Functions documentation and write your code. When you have a technical challenge in your code, post it here and someone might be able to help you. Search SO for other similar questions. There is a lot of example code that has already been posted. Google has numerous documents with sample code.

    – John Hanley
    Nov 24 '18 at 4:25

  • With regard to it being intimidating, I do understand. It is not my goal to offend. We all started with zero knowledge, and I do try to help developers who are just getting started. However, you are asking for someone else to do your work, and we don't support that.

    – John Hanley
    Nov 24 '18 at 4:39

  • @JohnHanley I apologize. I am really lost, and that wasn't my intention, but I can see how it comes off as if I was looking for someone to do the work for me. I can post the code that I tried tomorrow. I just wasn't even sure about Cloud Functions.

    – JustinL
    Nov 24 '18 at 4:41

  • I would start simple. Write some code that can list buckets and read and write objects. Then create a hello-world Cloud Function. If you start with one simple task and then another, it will soon come together in your head. There are lots of Python examples; start playing with the simple ones. I can think of many projects where I was head down for days trying to understand the technology before I could write a single line of code.

    – John Hanley
    Nov 24 '18 at 4:45

  • @JohnHanley Okay, thanks. I'll try again tomorrow with that in mind. I appreciate it a lot.

    – JustinL
    Nov 24 '18 at 4:46
















python node.js google-cloud-platform google-cloud-storage google-cloud-functions






asked Nov 24 '18 at 1:48









JustinL







