The large Gondwanan plant family Proteaceae is an early-diverging eudicot lineage renowned for its morphological, taxonomic and ecological diversity. Macadamia is the most economically important Proteaceae crop and represents an ancient rainforest-restricted lineage. The family is a focus for studies of adaptive radiation due to remarkable species diversification in Mediterranean-climate biodiversity hotspots, and numerous evolutionary transitions between biomes. Despite a long history of research, comparative analyses in the Proteaceae and macadamia breeding programs are restricted by a paucity of genetic information. To address this, we sequenced the genome and transcriptome of the widely grown Macadamia integrifolia cultivar 741.
Over 95 gigabases of DNA and RNA-seq sequence data were de novo assembled and annotated. The draft assembly has a total length of 518 Mb and spans approximately 79% of the estimated genome size. Following annotation, 35,337 protein-coding genes were predicted of which over 90% were expressed in at least one of the leaf, shoot or flower tissues examined. Gene family comparisons with five other eudicot species revealed 13,689 clusters containing macadamia genes and 1005 macadamia-specific clusters, and provides evidence for linage-specific expansion of gene families involved in pathogen recognition, plant defense and monoterpene synthesis. Cyanogenesis is an important defense strategy in the Proteaceae, and a detailed analysis of macadamia gene homologues potentially involved in cyanogenic glycoside biosynthesis revealed several highly expressed candidate genes.
The gene space of macadamia provides a foundation for comparative genomics, gene discovery and the acceleration of molecular-assisted breeding. This study presents the first available genomic resources for the large basal eudicot family Proteaceae, access to most macadamia genes and opportunities to uncover the genetic basis of traits of importance for adaptation and crop improvement.