Snakemake Combine analysis of different input types in one workflow

Sorry for the newbie question regarding Snakemake.

Generally put: what is the most elegant way to build a workflow that handles two different input types in a combined way?

Let's say I have a number of samples with different input types. Type a) is raw data in FASTQ format; type b) is already assembled.

I want a pipeline that first assembles all samples of type a) and then annotates all samples (a and b).

More concretely: my config file has the entries "samples" (type a) and "genomes" (type b).

I can write a rule "spades" for the samples and a follow-up rule "prokka" for the samples. I could of course add a second rule "prokka2" for the genomes, but how can I have a single "prokka" rule for both types?

asked Nov 23 '18 at 15:18 by JoergL

2 Answers

Snakemake will figure out by itself that some samples are already partially processed, and it will take them forward as required. For example, given these input files:

```shell
touch s1.fq.gz s2.fq.gz s3.bam s4.bam
```

this workflow applies rule "assemble" only to s1.fq.gz and s2.fq.gz, and rule "annotate" to all four samples:

```
samples = ['s1', 's2', 's3', 's4']

rule all:
    input:
        expand('{sample}.annotated.bam', sample=samples)

rule assemble:
    input:
        fq='{sample}.fq.gz'
    output:
        bam='{sample}.bam'
    shell:
        r"""
        my_assembler {input.fq} > {output.bam}
        """

rule annotate:
    input:
        bam='{sample}.bam'
    output:
        bam='{sample}.annotated.bam'
    shell:
        r"""
        my_annotator {input.bam} > {output.bam}
        """
```

You can test the execution with a dry run: `snakemake -p -n` (`-n` lists the jobs without running them, `-p` prints the shell commands).
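To make the resolution step concrete, here is a toy Python model of what the dry run decides for the four files above: "annotate" always needs {sample}.bam, and "assemble" is only scheduled when that .bam is missing but {sample}.fq.gz exists. This is a simplified illustration, not how Snakemake is implemented (it ignores timestamps and targets that already exist), and the function name plan_jobs is hypothetical.

```python
def plan_jobs(samples, existing_files):
    """Toy backward-chaining planner for the two rules in the answer.

    Returns the (rule, sample) jobs needed to build '{sample}.annotated.bam'
    for every sample, given the set of file names already on disk.
    """
    jobs = []
    for sample in samples:
        bam = f"{sample}.bam"      # input of "annotate", output of "assemble"
        fq = f"{sample}.fq.gz"     # input of "assemble"
        if bam not in existing_files:
            if fq not in existing_files:
                # Nothing can produce the .bam: Snakemake would stop with a missing-input error.
                raise FileNotFoundError(f"no {bam} or {fq} for sample {sample!r}")
            jobs.append(("assemble", sample))
        jobs.append(("annotate", sample))
    return jobs


if __name__ == "__main__":
    on_disk = {"s1.fq.gz", "s2.fq.gz", "s3.bam", "s4.bam"}
    for rule, sample in plan_jobs(["s1", "s2", "s3", "s4"], on_disk):
        print(rule, sample)
```

The point of the model is that no per-type branching is written into the rules themselves: whether a sample enters at "assemble" or at "annotate" follows only from which files exist.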

answered Nov 26 '18 at 8:59 by dariober

• Hey, wonderful! Thank you very much. How would this work with a config.yaml file? See my answer below.
  – JoergL, Nov 27 '18 at 10:53

• Sorry for the inconvenience. The config.yaml appeared just below my initial question.
  – JoergL, Nov 27 '18 at 11:37

• Snakemake reads the config file given with the --configfile option, and its content is automatically stored in the variable config, which is accessible within the Snakefile. You can put print(config) near the top of the Snakefile to see the structure of config (which is basically a dictionary).
  – dariober, Nov 27 '18 at 14:59

• Of course. My question is just how to solve the problem when using a config.yaml file. I have an example above.
  – JoergL, Nov 28 '18 at 15:03
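Following dariober's point that config is basically a dictionary, here is a minimal sketch of what config would hold for JoergL's config.yaml (posted as a separate answer), written out by hand as a Python dict rather than loaded through --configfile or a configfile: directive. Because both sections will later share one {sample} wildcard, the sketch also checks that no ID appears under both "samples" and "genomes"; that check is an assumption about a sensible layout, not something Snakemake enforces.

```python
# Hand-written equivalent of what `config` holds after Snakemake loads
# JoergL's config.yaml.
config = {
    "samples": {                      # type a: raw paired-end reads
        "SRR653893": {
            "fw": "SRR653893_1.fastq.gz",
            "rv": "SRR653893_2.fastq.gz",
        },
    },
    "genomes": {                      # type b: already assembled
        "GCF": {"fasta": "GCF_000008985.1_ASM898v1_genomic.fna"},
    },
}

# The top-level keys are the two sections; their own keys are the IDs.
read_ids = list(config["samples"])
genome_ids = list(config["genomes"])

# Assumption: one {sample} wildcard will cover both sections later,
# so an ID must not be listed in both of them.
shared = set(read_ids) & set(genome_ids)
if shared:
    raise ValueError(f"IDs listed under both 'samples' and 'genomes': {sorted(shared)}")

# Every ID that should end up annotated, in config order.
ALL_IDS = read_ids + genome_ids

if __name__ == "__main__":
    print(ALL_IDS)
    print(config["samples"]["SRR653893"]["fw"])  # nested lookups work like any dict
```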

This is what my config.yaml looks like. I just do not know how to handle files from "samples" and "genomes" in one rule (or how to find a better config layout):

```yaml
samples:
  SRR653893:
    fw: SRR653893_1.fastq.gz
    rv: SRR653893_2.fastq.gz
genomes:
  GCF:
    fasta: GCF_000008985.1_ASM898v1_genomic.fna
```
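One way to get a single prokka rule for both types is an input function that looks each ID up in both config sections: a genome returns its FASTA directly, while a read sample returns the contigs file SPAdes will write, so Snakemake schedules the assembly only for those IDs. This is a minimal sketch, assuming hypothetical output paths assembly/{sample}/contigs.fasta and annotation/{sample}; the helper name fasta_for is also hypothetical. The Snakefile rules appear only as comments and have not been run.

```python
from types import SimpleNamespace

# Hand-written stand-in for the config Snakemake loads from the YAML above.
config = {
    "samples": {"SRR653893": {"fw": "SRR653893_1.fastq.gz",
                              "rv": "SRR653893_2.fastq.gz"}},
    "genomes": {"GCF": {"fasta": "GCF_000008985.1_ASM898v1_genomic.fna"}},
}

# Hypothetical location of the SPAdes assembly for a read sample.
SPADES_CONTIGS = "assembly/{sample}/contigs.fasta"

# Every ID that should be annotated, taken from both config sections.
ALL_IDS = list(config["samples"]) + list(config["genomes"])


def fasta_for(wildcards):
    """Input function for a combined prokka rule: return the FASTA to annotate."""
    sample = wildcards.sample
    if sample in config["genomes"]:
        # Already assembled: annotate the provided FASTA directly.
        return config["genomes"][sample]["fasta"]
    if sample in config["samples"]:
        # Raw reads: point at the SPAdes output, so the spades rule runs first.
        return SPADES_CONTIGS.format(sample=sample)
    raise ValueError(f"{sample!r} is in neither 'samples' nor 'genomes'")


# Sketch of the Snakefile side (not executed here):
# rule all:
#     input: expand("annotation/{sample}", sample=ALL_IDS)
#
# rule spades:
#     input:
#         fw=lambda wc: config["samples"][wc.sample]["fw"],
#         rv=lambda wc: config["samples"][wc.sample]["rv"],
#     output: "assembly/{sample}/contigs.fasta"
#     shell: "spades.py -1 {input.fw} -2 {input.rv} -o assembly/{wildcards.sample}"
#
# rule prokka:
#     input: fasta_for
#     output: directory("annotation/{sample}")
#     shell: "prokka --force --outdir {output} --prefix {wildcards.sample} {input}"

if __name__ == "__main__":
    # Snakemake passes a wildcards object with attribute access; SimpleNamespace mimics it.
    for sample_id in ALL_IDS:
        print(sample_id, "->", fasta_for(SimpleNamespace(sample=sample_id)))
```

With this layout the spades rule is only ever requested for IDs under "samples", because no other ID resolves to a path under assembly/.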

answered Nov 27 '18 at 11:01 by JoergL