Supported Files
Supported File Types
Mango supports the following file types:
Data Type | Supported File Types
---|---
Alignments | Parquet, bam, indexed bam, sam
Variants | Parquet, vcf, indexed vcf, vcf.gz
Features | Parquet, bed, narrowPeak
Genome | Parquet, twoBit*, fa, fasta
*TwoBit files must be staged locally for access.
Accessing http files through Mango
Mango can copy and read files over http. To do so, set spark.local.dir to a path in the user's home directory when running mango-submit:
./bin/mango-submit \
--conf spark.local.dir=<user home>/spark-tmp
This gives Spark a writable location for the temporary copies of the http files it reads.
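As a minimal sketch of the setup above, the following prepares the directory and prints the matching option. The directory name spark-tmp follows the example. Creating the directory ahead of time is a precaution assumed here, not a requirement stated by Mango, and the variable names SPARK_TMP and SPARK_LOCAL_CONF are purely illustrative.

```shell
# Assumption: "spark-tmp" under the home directory, as in the example above.
SPARK_TMP="${HOME}/spark-tmp"

# Create the directory up front so a permissions problem shows up early
# (a precaution, not something the Mango documentation requires).
mkdir -p "${SPARK_TMP}"

# Build the Spark setting and print the resulting command line for review.
SPARK_LOCAL_CONF="spark.local.dir=${SPARK_TMP}"
echo "./bin/mango-submit --conf ${SPARK_LOCAL_CONF}"
```

The printed line can then be extended with the usual reference and data file arguments before it is run.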
Accessing s3a files through Mango
To access s3a files when running on AWS, you need the net.fnothaft:jsr203-s3a
package, and the bam splitter must be enabled:
./bin/mango-submit \
--packages org.apache.parquet:parquet-avro:1.8.2,net.fnothaft:jsr203-s3a:0.0.2 \
--conf spark.hadoop.hadoopbam.bam.enable-bai-splitter=true \
-- hg19.2bit \
-reads s3a://1000genomes/phase1/data/NA12878/exome_alignment/NA12878.mapped.illumina.mosaik.CEU.exome.20110411.bam