Editing a GTF file
I need to extract only the ENSEMBL non-chromosomal pseudogenes from a given GTF file, add an additional attribute field "filtered" with the value "manually" to each of the annotated pseudogenes, and save the result as a new file.
In other words, I have to keep the lines that contain "ENSEMBL" and "pseudogene" and do not contain "chr", append the additional property (filtered: manually) to the last column, and write the output to a new file. Could you tell me how to do this, preferably using awk or sed?
##description: evidence-based annotation of the human genome (GRCh38), version 29 (Ensembl 94)
##provider: GENCODE
##contact: gencode-help@ebi.ac.uk
##format: gtf
##date: 2018-08-30
chr1 HAVANA gene 11869 14409 . + . gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2";
chr1 HAVANA transcript 11869 14409 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_name "DDX11L1-202"; level 2; transcript_support_level "1"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA exon 11869 12227 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_name "DDX11L1-202"; exon_number 1; exon_id "ENSE00002234944.1"; level 2; transcript_support_level "1"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA exon 12613 12721 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_name "DDX11L1-202"; exon_number 2; exon_id "ENSE00003582793.1"; level 2; transcript_support_level "1"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA exon 13221 14409 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_name "DDX11L1-202"; exon_number 3; exon_id "ENSE00002312635.1"; level 2; transcript_support_level "1"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA transcript 12010 13670 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000450305.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-201"; level 2; transcript_support_level "NA"; ont "PGO:0000005"; ont "PGO:0000019"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000002844.2";
chr1 HAVANA exon 12010 12057 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000450305.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-201"; exon_number 1; exon_id "ENSE00001948541.1"; level 2; transcript_support_level "NA"; ont "PGO:0000005"; ont "PGO:0000019"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000002844.2";
chr1 HAVANA exon 12179 12227 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000450305.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-201"; exon_number 2; exon_id "ENSE00001671638.2"; level 2; transcript_support_level "NA"; ont "PGO:0000005"; ont "PGO:0000019"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000002844.2";
chr1 HAVANA exon 12613 12697 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000450305.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "transcribed_unp
regex bash awk sed bioinformatics
What have you already tried?
– Didier Trosset
Nov 19 at 13:46
What is the expected output?
– zx8754
Nov 19 at 14:30
Which lines in the example describe an ENSEMBL non-chromosomal pseudogene, and why (what are the related strings)?
– Jay jargot
Nov 19 at 14:35
These are the lines that match the patterns: ENSEMBL exon 169224 169502 . - . gene_id "ENSG00000284215.2"; transcript_id "ENST00000639764.2"; gene_type "pseudogene"; gene_name "AC245056.4"; transcript_type "pseudogene"; transcript_name "AC245056.4-201"; exon_number 2; exon_id "ENSE00003804365.1"; level 3; tag "basic"; Filtered: manually;
– Sergei
Nov 19 at 15:05
Actually, I have managed to do this, but maybe there is a better solution using only awk?
– Sergei
Nov 19 at 15:05
asked Nov 19 at 13:25 by Sergei
edited Nov 19 at 14:29 by zx8754
1 Answer
accepted
If you are using Awk anyway, you don't need grep at all.
Also, less crucially, modifying $0 is mildly wasteful. print lets you specify precisely what you want to print.
awk '!/##/ && !/chr/ && /pseudogene/ && /ENSEMBL/ {
print $0" Filtered: manually;"}' gencode.v29.chr_patch_hapl_scaff.basic.annotation.gtf > gencode.v29.filtered.gtf
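As a complement to the command above, here is a minimal sketch of a field-aware variant. It rests on several assumptions that are not in the original post: the real file is tab-separated (as GTF normally is), the pseudogene status is read from the gene_type attribute in column 9, and the new attribute is written in GTF key "value"; style rather than as "Filtered: manually;". The file names sample.gtf and filtered.gtf, the seqname SCAFFOLD_1, and the gene ids G1 to G3 are hypothetical and exist only for illustration.

```shell
# Build a tiny tab-separated sample (hypothetical records, for illustration only).
printf '##format: gtf\n' > sample.gtf
printf 'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tgene_id "G1"; gene_type "transcribed_unprocessed_pseudogene";\n' >> sample.gtf
printf 'SCAFFOLD_1\tENSEMBL\texon\t169224\t169502\t.\t-\t.\tgene_id "G2"; gene_type "pseudogene";\n' >> sample.gtf
printf 'SCAFFOLD_1\tENSEMBL\texon\t200\t300\t.\t+\t.\tgene_id "G3"; gene_type "protein_coding";\n' >> sample.gtf

# Keep records whose seqname (column 1) does not start with "chr",
# whose source (column 2) is exactly ENSEMBL, and whose attributes
# (column 9) carry a gene_type ending in "pseudogene".
# Each kept record gets the extra attribute appended.
awk -F'\t' '
  /^#/ { next }   # skip the ## header lines
  $1 !~ /^chr/ && $2 == "ENSEMBL" && $9 ~ /gene_type "[^"]*pseudogene"/ {
    print $0 " filtered \"manually\";"
  }
' sample.gtf > filtered.gtf
```

Testing individual columns avoids accidental matches, for example a "chr" substring that happens to appear inside an attribute value. The whole-line patterns in the answer are shorter and are sufficient when such collisions cannot occur in the data.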
Thanks, yes this is much better)
– Sergei
Nov 19 at 16:25
answered Nov 19 at 15:36 by tripleee