Ginseng, which contains ginsenosides as bioactive compounds, has been regarded as an important traditional medicine for
several millennia. However, the genetic background of ginseng remains poorly understood, partly because of the plant’s
large and complex genome. We report the entire genome sequence of Panax ginseng obtained using next-generation
sequencing. The 3.5-Gb nucleotide sequence contains more than 60% repeats and encodes 42 006 predicted genes.
Twenty-two transcriptome datasets and mass spectrometry images of ginseng roots were used to precisely quantify the
functional genes. Thirty-one genes were identified as being involved in the mevalonic acid pathway. Eight of these genes
were annotated as 3-hydroxy-3-methylglutaryl-CoA reductases, which displayed diverse structures and expression
characteristics. A total of 225 UDP-glycosyltransferases (UGTs) were identified, making UGTs one of the largest gene
families in ginseng. Tandem repeats contributed to the duplication and divergence of UGTs. Molecular
modeling of UGTs in families 71, 74, and 94 revealed a regiospecific conserved motif located at the N-terminus.
Molecular docking predicted that this motif captures ginsenoside precursors. The ginseng genome represents a valuable
resource for understanding and improving the breeding, cultivation, and synthetic biology of this key herb.