Supported Files

Supported File Types

Mango supports the following file types:

Data Type     Supported File Types
Alignments    Parquet, bam, indexed bam, sam
Variants      Parquet, vcf, indexed vcf, vcf.gz
Features      Parquet, bed, narrowPeak
Genome        Parquet, twoBit*, fa, fasta

*TwoBit files must be staged locally for access.
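
As an illustration of the mapping listed above, the following is a minimal sketch of a shell helper that guesses the data type from a file name. The function name mango_data_type is hypothetical and is not part of Mango. It assumes twoBit files use the .2bit extension, and it does not try to classify Parquet data, which appears under every data type and cannot be told apart by name alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Mango): guess the data type of a file
# from its extension, following the supported file types listed above.
mango_data_type() {
  case "$1" in
    *.bam|*.sam)          echo "Alignments" ;;
    *.vcf|*.vcf.gz)       echo "Variants" ;;
    *.bed|*.narrowPeak)   echo "Features" ;;
    # Assumption: twoBit genomes are named with a .2bit extension.
    *.2bit|*.fa|*.fasta)  echo "Genome" ;;
    # Parquet and anything unrecognized fall through here.
    *)                    echo "unknown" ;;
  esac
}

mango_data_type "hg19.2bit"
mango_data_type "NA12878.bam"
```

The match is case-sensitive and looks only at the file name, so it says nothing about whether a bam or vcf file is actually indexed.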

Accessing http files through Mango

Mango can copy and read files served over http. To enable this, set spark.local.dir to a path in the user’s home directory when running mango-submit:

    --conf spark.local.dir=<user home>/spark-tmp

This will allow Spark to access temporary http files.
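
A minimal sketch of preparing that directory before launching Mango is shown here. It assumes the directory is named spark-tmp under $HOME, matching the path form above; the variable names SPARK_TMP and SPARK_LOCAL_CONF are illustrative only. The script prints the flag to pass to mango-submit rather than running Mango itself.

```shell
#!/usr/bin/env bash
set -eu

# Assumption: use spark-tmp under the current user's home directory.
SPARK_TMP="${HOME}/spark-tmp"

# Create the directory if it does not exist yet.
mkdir -p "${SPARK_TMP}"

# Build the configuration value and print the flag to add to mango-submit.
SPARK_LOCAL_CONF="spark.local.dir=${SPARK_TMP}"
echo "--conf ${SPARK_LOCAL_CONF}"
```

The printed flag goes before the `--` separator in the mango-submit command line, alongside any other Spark options.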

Accessing s3a files through Mango

To access s3a files when running on AWS, add the net.fnothaft:jsr203-s3a package and enable the Hadoop-BAM BAI splitter (spark.hadoop.hadoopbam.bam.enable-bai-splitter=true):

    ./bin/mango-submit \
        --packages org.apache.parquet:parquet-avro:1.8.2,net.fnothaft:jsr203-s3a:0.0.2 \
        --conf spark.hadoop.hadoopbam.bam.enable-bai-splitter=true \
        -- hg19.2bit \
        -reads s3a://1000genomes/phase1/data/NA12878/exome_alignment/NA12878.mapped.illumina.mosaik.CEU.exome.20110411.bam