
Yam

============
2016-12-06: for D. rotundata reference

-DR_PseudoChromosome_21_v1.fasta.gz
This FASTA file is our final reference of D. rotundata, consisting of 21 pseudo-chromosomes.

-TDr96_F1_v1.0.fasta.gz
This FASTA file is our original de novo assembly of D. rotundata.

-TDr96_F1_q30p90_MSR_Cov_10_S-snp_RYKMSWBDHV2ACGT_NoHetero20150121_02.fa.gz
This FASTA file is the reference that was used for population genomics analysis. Each heterozygous (ambiguity) code was replaced at random by one of the A, C, G, or T bases it represents.
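
The random replacement described above can be illustrated with a minimal sketch. It is not the script that produced this file: the function name resolve_ambiguity is hypothetical, and the code table is the standard IUPAC mapping for the codes named in the file name (R, Y, K, M, S, W, B, D, H, V). Characters outside the table, such as N or lowercase letters, are left unchanged here, which is an assumption.

```python
import random

# Standard IUPAC ambiguity codes and the bases each one represents.
IUPAC = {
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}


def resolve_ambiguity(seq, rng):
    """Replace each ambiguity code by one randomly chosen base it represents.

    Hypothetical helper for illustration; characters not in IUPAC
    (A, C, G, T, N, lowercase, ...) are passed through unchanged.
    """
    return "".join(rng.choice(IUPAC[b]) if b in IUPAC else b for b in seq)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only so the demo is repeatable
    print(resolve_ambiguity("ACGTRYKMN", rng))
```

Passing the random generator explicitly keeps the replacement reproducible when a seed is fixed.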


============
2016-12-06: for population genomics

-TDr1489A_q30p90_MSR.bam
reference: TDr96_F1_q30p90_MSR_Cov_10_S-snp_RYKMSWBDHV2ACGT_NoHetero20150121_02.fa.gz
fastq: TDr1489A_*.gz (only reads in which ≥ 90% of the bases had a Phred quality score of ≥ 30 were retained)

-TDr3527A_q30p90_MSR.bam
reference: TDr96_F1_q30p90_MSR_Cov_10_S-snp_RYKMSWBDHV2ACGT_NoHetero20150121_02.fa.gz
fastq: TDr3527A_*.gz (only reads in which ≥ 90% of the bases had a Phred quality score of ≥ 30 were retained)
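
The "q30p90" read filter can be sketched as follows, assuming it means that a read is kept when at least 90% of its bases have a Phred score of at least 30. The function name passes_quality_filter is hypothetical, and the Phred+33 quality encoding (offset=33) is an assumption; older Illumina files may use an offset of 64. This is an illustration, not the tool that was used to build the BAM files.

```python
def passes_quality_filter(qual, min_q=30, min_fraction=0.90, offset=33):
    """Return True if enough bases in a FASTQ quality string reach min_q.

    qual         -- the quality line of one FASTQ record
    min_q        -- minimum Phred score for a base to count as good
    min_fraction -- minimum fraction of good bases required to keep the read
    offset       -- ASCII offset of the quality encoding (assumed Phred+33)
    """
    if not qual:
        return False
    good = sum(1 for ch in qual if ord(ch) - offset >= min_q)
    return good / len(qual) >= min_fraction


if __name__ == "__main__":
    # "I" encodes Q40 and "#" encodes Q2 under Phred+33.
    print(passes_quality_filter("IIIIIIIIII"))
    print(passes_quality_filter("IIIIIIII##"))
```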

-fastq: these are all raw reads, not filtered by quality.
-TDr1489A_p_1_1_sequence.txt.gz
-TDr1489A_p_1_2_sequence.txt.gz
-TDr1489A_p_2_1_sequence.txt.gz
-TDr1489A_p_2_2_sequence.txt.gz

-TDr3527A_p_1_1_sequence.txt.gz
-TDr3527A_p_1_2_sequence.txt.gz
-TDr3527A_p_2_1_sequence.txt.gz
-TDr3527A_p_2_2_sequence.txt.gz



============
These are SNP matrices summarizing the SNPs at all SNP positions for all accessions.

  • SNP_information_for_TDr26.txt.gz
target: all 26 accessions
number of SNPs: 6,679,590
compressed size: 278,436,693 bytes (278 MB)
uncompressed size: 1,341,820,551 bytes (1.34 GB)
MD5: SNP_information_for_TDr26.txt.gz.md5
  • SNP_information_for_TDr26_and_alata2.txt.gz
target: all 26 accessions and D. alata
number of SNPs: 8,180,312
compressed size: 338,583,766 bytes (339 MB)
uncompressed size: 1,719,281,597 bytes (1.72 GB)
MD5: SNP_information_for_TDr26_and_alata2.txt.gz.md5
  • TDr96_F1_NoHetero_v1.0.fasta.gz
This is the TDr96_F1 reference FASTA file.
MD5: TDr96_F1_NoHetero_v1.0.fasta.gz.md5
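
A downloaded file can be checked against its MD5 file with a short script such as the sketch below. It assumes that each .md5 file starts with the hexadecimal digest (the usual md5sum layout, "digest  filename"); that layout is not confirmed here. The function names are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_file(data_path, md5_path):
    """Compare a file's digest with the first token of its .md5 file.

    Assumes the .md5 file begins with the hex digest (md5sum-style).
    """
    with open(md5_path) as fh:
        expected = fh.read().split()[0].lower()
    return md5_of_file(data_path) == expected


if __name__ == "__main__":
    name = "SNP_information_for_TDr26.txt.gz"
    print(matches_md5_file(name, name + ".md5"))
```

Reading in chunks keeps memory use low for the larger files listed above.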

Go to the "download folder" provided by Google Drive.

File format:

These SNP files are tab-separated text files. The first 2 lines are headers. Both header lines start with "chr" and "position". The "chr" column indicates the chromosome or contig name in the reference sequence. The "position" column indicates the position of the SNP.
The first header gives the sample names, using 3 columns per sample; the second header describes the data in those 3 columns. "cons" indicates the consensus base at the SNP, "depth" indicates the depth of aligned reads at the SNP position, and "index" indicates the depth index value if that SNP is valid.

If the value of the index column is shown as "_", the consensus base is the same as the reference base.
If the values of the three columns are shown as "- NA NA" or "- - -", there are no aligned reads for this sample.
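
The format described above can be read with a short script such as the following sketch. It assumes that the columns are exactly "chr", "position", and then one (cons, depth, index) triple per sample, and that the sample name appears in at least one of its three cells in the first header; the real files may differ in these details. The names read_snp_matrix and has_no_reads are hypothetical.

```python
import gzip


def read_snp_matrix(path):
    """Yield (chr, position, {sample: (cons, depth, index)}) for each SNP row.

    Assumed layout: two header lines, then tab-separated rows holding
    chr, position, and three columns per sample.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        names_row = fh.readline().rstrip("\n").split("\t")
        fh.readline()  # second header: cons / depth / index labels
        # Group the sample-name cells in threes; take the first non-empty one.
        triples = [names_row[i:i + 3] for i in range(2, len(names_row), 3)]
        samples = [next((n for n in t if n), "") for t in triples]
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            calls = {}
            for k, sample in enumerate(samples):
                cons, depth, index = fields[2 + 3 * k: 5 + 3 * k]
                calls[sample] = (cons, depth, index)
            yield fields[0], int(fields[1]), calls


def has_no_reads(call):
    """True for the two 'no aligned reads' encodings described above."""
    return call in (("-", "NA", "NA"), ("-", "-", "-"))


if __name__ == "__main__":
    for chrom, pos, calls in read_snp_matrix("SNP_information_for_TDr26.txt.gz"):
        covered = sum(1 for c in calls.values() if not has_no_reads(c))
        print(chrom, pos, covered)
        break  # show only the first SNP
```

Because the function is a generator, the 1.34 GB matrix is processed one row at a time rather than loaded into memory.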
