GtfToBed errors out because "gene" is null #9103

@mmahmoudian

This bug report concerns a new tool, GtfToBed, which was introduced in PR #8942. The following steps reproduce the error:

Get the necessary files

Reference genome

if [ ! -f 'hg38.fa.gz' ]; then
    echo 'Downloading the reference genome'
    wget https://hgdownload.soe.ucsc.edu/goldenpath/hg38/bigZips/latest/hg38.fa.gz
fi

sha256sum 'hg38.fa.gz'
c1dd87068c254eb53d944f71e51d1311964fce8de24d6fc0effc9c61c01527d4  hg38.fa.gz

GTF file

if [ ! -f 'hg38.ncbiRefSeq.gtf.gz' ]; then
    echo 'Downloading the GTF file'
    wget https://hgdownload.soe.ucsc.edu/goldenpath/hg38/bigZips/genes/hg38.ncbiRefSeq.gtf.gz
fi

sha256sum 'hg38.ncbiRefSeq.gtf.gz'
856919cfc5854079e70dd016048045092fd79b782aa8da9dbbd1c51a9046d8a4  hg38.ncbiRefSeq.gtf.gz

Prepare files

Unpack the compressed files

gunzip --keep 'hg38.ncbiRefSeq.gtf.gz' 'hg38.fa.gz'

Create the dict file

./gatk-4.6.1.0/gatk CreateSequenceDictionary \
                    --REFERENCE 'hg38.fa' \
                    --VERBOSITY WARNING
[Thu Feb 27 12:20:49 EET 2025] CreateSequenceDictionary --VERBOSITY WARNING --REFERENCE hg38.fa --TRUNCATE_NAMES_AT_WHITESPACE true --NUM_SEQUENCES 2147483647 --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Thu Feb 27 12:20:49 EET 2025] Executing as mehrad@pamp-precision-tower on Linux 6.12.16-1-lts amd64; OpenJDK 64-Bit Server VM 23.0.2; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.6.1.0
[Thu Feb 27 12:21:00 EET 2025] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.18 minutes.
Runtime.totalMemory()=3816816640

Convert GTF to BED

./gatk-4.6.1.0/gatk GtfToBed \
                    --gtf-path 'hg38.ncbiRefSeq.gtf' \
                    --sequence-dictionary 'hg38.dict' \
                    --output 'blah.bed' \
                    --verbosity WARNING
Using GATK jar /home/mehrad/tmp/gatk-4.6.1.0/gatk-package-4.6.1.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/mehrad/tmp/gatk-4.6.1.0/gatk-package-4.6.1.0-local.jar GtfToBed --gtf-path hg38.ncbiRefSeq.gtf --sequence-dictionary hg38.dict --output blah.bed --verbosity WARNING
SLF4J(W): Class path contains multiple SLF4J providers.
SLF4J(W): Found provider [org.apache.logging.slf4j.SLF4JServiceProvider@4ee8051c]
SLF4J(W): Found provider [ch.qos.logback.classic.spi.LogbackServiceProvider@53125718]
SLF4J(W): See https://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J(I): Actual provider is of type [org.apache.logging.slf4j.SLF4JServiceProvider@4ee8051c]
[February 27, 2025, 12:26:04 PM EET] org.broadinstitute.hellbender.tools.walkers.conversion.GtfToBed done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=134217728
java.lang.NullPointerException: Cannot invoke "org.broadinstitute.hellbender.utils.codecs.gtf.GencodeGtfGeneFeature.addTranscript(org.broadinstitute.hellbender.utils.codecs.gtf.GencodeGtfTranscriptFeature)" because "gene" is null
	at org.broadinstitute.hellbender.utils.codecs.gtf.AbstractGtfCodec.aggregateRecordsIntoGeneFeature(AbstractGtfCodec.java:339)
	at org.broadinstitute.hellbender.utils.codecs.gtf.AbstractGtfCodec.decode(AbstractGtfCodec.java:170)
	at org.broadinstitute.hellbender.utils.codecs.gtf.AbstractGtfCodec.decode(AbstractGtfCodec.java:23)
	at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.readNextRecord(TribbleIndexedFeatureReader.java:377)
	at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.<init>(TribbleIndexedFeatureReader.java:344)
	at htsjdk.tribble.TribbleIndexedFeatureReader.iterator(TribbleIndexedFeatureReader.java:311)
	at org.broadinstitute.hellbender.engine.FeatureDataSource.iterator(FeatureDataSource.java:531)
	at java.base/java.lang.Iterable.spliterator(Unknown Source)
	at org.broadinstitute.hellbender.utils.Utils.stream(Utils.java:1182)
	at org.broadinstitute.hellbender.engine.FeatureWalker.traverse(FeatureWalker.java:97)
	at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1119)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:150)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:203)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:222)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
	at org.broadinstitute.hellbender.Main.main(Main.java:306)
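
A possible diagnostic, not part of the reproduction above: the stack trace fails in aggregateRecordsIntoGeneFeature when a transcript is added to a gene that is null. One explanation, which is an assumption and has not been verified here, is that the GTF contains transcript records without a matching gene record. The sketch below tallies the feature types in column 3 of the GTF so that this can be checked. The helper name count_gtf_features is hypothetical and is not a GATK command.

```shell
# Hypothetical helper (not part of GATK): tally the feature types
# found in column 3 of a GTF file.
count_gtf_features() {
    # Drop comment lines, keep the third tab-separated column,
    # then count how often each feature type occurs.
    grep -v '^#' "$1" | cut -f 3 | sort | uniq -c
}

# Run against the unpacked GTF from the steps above, if it is present.
if [ -f 'hg38.ncbiRefSeq.gtf' ]; then
    count_gtf_features 'hg38.ncbiRefSeq.gtf'
fi
```

If the output lists no "gene" rows, that would be consistent with the assumption above, but it would not by itself confirm the cause of the exception.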
