This application note demonstrates how to apply MarkerLynx Software for sample profiling using a multivariate statistical approach. As a result, sample comparison can be completed in hours with complete profile information on hand. This significantly reduces analysis time and manpower required for THM sample profiling.
*MarkerLynx has since been replaced by Progenesis QI.*
Sample profiling is important for Traditional Herbal Medicine (THM) or Traditional Chinese Medicine (TCM) studies because there is very little reproducibility from sample to sample. The contents of plant extracts may vary significantly depending on where the plant was grown, when it was harvested, and how it was extracted. One cannot assume identical contents for two samples even if they were extracted from the same plant, or from two plants having the same name.
In addition, there is also a strong need to compare THM samples for quality control. THM sample profiling is also critical for the study of the THM’s physiological working mechanisms.
We have developed a simple and fast generic analytical workflow for THM sample analysis (Figure 1). This workflow takes advantage of Waters UPLC technology for high-resolution, high-sensitivity, and high-speed separations, as well as the SYNAPT HDMS for its TOF exact mass measurement capability. This workflow can be adopted for either compound identification or sample profiling.
Compound identification for THM has been discussed elsewhere.1 This application note demonstrates how to apply this workflow for sample profiling using a multivariate statistical approach. As a result, sample comparison can be completed in hours with complete profile information on hand. This significantly reduces analysis time and manpower required for THM sample profiling.
Two samples of a Chinese Ginseng extract drink were used for this work.
Each sample was filtered prior to injection.
Parameter | Setting |
---|---|
LC system: | ACQUITY UPLC System |
Column: | ACQUITY UPLC HSS T3 Column 2.1 x 100 mm, 1.7 μm, 65 °C |
Flow rate: | 600 μL/min |
Mobile phase A: | Water + 0.1% Formic Acid |
Mobile phase B: | MeOH |

Time | Composition | Curve |
---|---|---|
0 min | 95% A | - |
10 min | 30% A | Curve 6 |
17 min | 0% A | Curve 6 |
20 min | 95% A | Curve 1 |

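The gradient table can be read as a piecewise function of time. The sketch below is only an illustration: it assumes the usual Waters curve convention, in which curve 6 is a linear ramp from the previous row to the current row and curve 1 steps immediately to the current row's composition. The function name `percent_a` and the table layout are hypothetical and are not part of the instrument method.

```python
# Gradient rows from the note: (time in min, %A reached at that time, curve number).
GRADIENT = [
    (0.0, 95.0, None),
    (10.0, 30.0, 6),
    (17.0, 0.0, 6),
    (20.0, 95.0, 1),
]


def percent_a(t_min, table=GRADIENT):
    """Return %A at time t_min.

    Assumption: curve 6 = linear ramp over the segment,
    curve 1 = immediate step to the segment's end composition.
    """
    if t_min <= table[0][0]:
        return table[0][1]
    if t_min >= table[-1][0]:
        return table[-1][1]
    for (t0, a0, _), (t1, a1, curve) in zip(table, table[1:]):
        if t0 < t_min <= t1:
            if curve == 1:
                return a1
            if curve == 6:
                return a0 + (a1 - a0) * (t_min - t0) / (t1 - t0)
            raise ValueError(f"curve {curve} is not handled in this sketch")


for t in (0, 5, 10, 13.5, 18):
    print(f"{t:>5} min: {percent_a(t):.1f}% A")
```

Under these assumptions the last segment acts as a re-equilibration step: the composition returns to 95% A right after 17 min and is held until 20 min.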
Parameter | Setting |
---|---|
MS system: | SYNAPT HDMS System |
Ionization mode: | Electrospray |
Capillary voltage: | 3000 V |
Cone voltage: | 35 V |
Desolvation temp.: | 450 °C |
Desolvation gas: | 800 L/h |
Source temp.: | 120 °C |
Acquisition range: | 50 to 1500 m/z |
Collision gas: | Argon |
Compound screening and profiling: | MarkerLynx Application Manager |
Multivariate statistical analysis: | SIMCA-P |
To ensure the statistical validity and significance of the results, each sample must be injected at least three times. To obtain the complete profile of a sample, the LC-MS analysis must be run in both positive and negative ion modes. For this work, each sample was injected six times: three in ESI+ mode and three in ESI- mode. For demonstration purposes, only the ESI- results are discussed.
Figure 2 shows the comparison of the two base peak ion (BPI) chromatograms obtained from the two Ginseng extract drinks. It appears as though the Extra Strong Ginseng contains a larger number of components at higher concentrations compared with the Wild Panax Ginseng. Further chemical profiling of the two samples requires the use of multivariate statistical tools.
The first step in the multivariate statistical analysis of the LC-MS dataset was to convert the 3D LC-MS data into a 2D matrix. This critical step was accomplished using MarkerLynx, an Application Manager for MassLynx Software. MarkerLynx converts each detected peak into an Exact Mass Retention Time (EMRT) pair and tabulates the results in a 2D matrix (Figure 3).
A total of 1184 EMRT pairs were found in this dataset. The number of EMRT pairs detected depends on the peak detection threshold, which is a user-defined parameter.
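The conversion into a 2D matrix can be pictured with a minimal sketch. This is not MarkerLynx's algorithm: it simply assumes that peaks are grouped on a fixed m/z and retention time grid, and the function name, the window sizes, and the intensities in the example are all hypothetical. Only the two m/z and retention time values are taken from this note.

```python
import numpy as np


def build_emrt_matrix(peaks, n_injections, mz_window=0.02, rt_window=0.1, threshold=0.0):
    """Tabulate detected peaks into an injections x EMRT-pairs intensity matrix.

    peaks: iterable of (injection_index, mz, rt_min, intensity).
    Peaks whose m/z and RT fall in the same grid cell share one column.
    """
    columns = {}   # grid cell -> column index
    entries = []
    for injection, mz, rt, intensity in peaks:
        if intensity < threshold:
            continue  # below the user-defined peak detection threshold
        cell = (round(mz / mz_window), round(rt / rt_window))
        column = columns.setdefault(cell, len(columns))
        entries.append((injection, column, intensity))
    matrix = np.zeros((n_injections, len(columns)))
    for injection, column, intensity in entries:
        matrix[injection, column] += intensity
    # Approximate (m/z, RT) label for each column, taken from the grid cell.
    labels = [(m * mz_window, r * rt_window) for m, r in columns]
    return matrix, labels


# Two injections; intensities are invented for illustration.
example_peaks = [
    (0, 945.5419, 6.54, 1000.0),
    (1, 945.5421, 6.53, 1100.0),
    (0, 801.5021, 6.33, 50.0),
    (1, 801.5019, 6.33, 900.0),
]
matrix, labels = build_emrt_matrix(example_peaks, n_injections=2)
filtered, _ = build_emrt_matrix(example_peaks, n_injections=2, threshold=100.0)
print(matrix)
print(filtered)
```

Raising `threshold` drops weak peaks and therefore changes how many EMRT pairs and non-zero entries survive, which mirrors the dependence on the detection threshold noted above. A fixed grid can split a peak that sits on a cell boundary, so real tools use tolerance-based alignment instead.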
The EMRT table can be imported automatically into SIMCA-P by clicking the P+ button. The data are first processed by principal component analysis (PCA). Then a supervised statistical model, Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA), can be applied for orthogonal data analysis. Figure 4 shows the scores plot obtained from the OPLS-DA. The scores plot clearly displays the differences between the two sample groups along the x-axis and the differences within each sample group along the y-axis.
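For readers without access to SIMCA-P, the two modelling steps can be approximated with NumPy. The sketch below is an assumption-laden illustration, not SIMCA-P's implementation: it uses Pareto scaling (a common choice for LC-MS marker tables, not stated in this note), PCA by singular value decomposition, and a single-response OPLS with one orthogonal component. The data are synthetic and all function names are hypothetical.

```python
import numpy as np


def pareto_scale(X):
    """Mean-center each column and divide by the square root of its standard deviation."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (X - X.mean(axis=0)) / np.sqrt(std)


def pca_scores(X, n_components=2):
    """PCA by singular value decomposition of an already centered matrix."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return U[:, :n_components] * S[:n_components], explained[:n_components]


def opls_da(X, y, n_orthogonal=1):
    """Single-response OPLS: strip variation orthogonal to y, then score.

    Returns (predictive scores, orthogonal scores).
    """
    X = np.array(X, dtype=float)
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    w = X.T @ y
    w /= np.linalg.norm(w)              # predictive weight vector
    orthogonal = []
    for _ in range(n_orthogonal):
        t = X @ w
        p = X.T @ t / (t @ t)           # loading of the current predictive score
        w_o = p - (w @ p) * w           # part of the loading unrelated to y
        w_o /= np.linalg.norm(w_o)
        t_o = X @ w_o                   # orthogonal score
        p_o = X.T @ t_o / (t_o @ t_o)
        X = X - np.outer(t_o, p_o)      # remove the orthogonal component
        orthogonal.append(t_o)
    return X @ w, np.column_stack(orthogonal)


# Synthetic stand-in for an EMRT table: 2 groups x 3 injections, 20 markers.
rng = np.random.default_rng(0)
X_raw = rng.normal(100.0, 1.0, size=(6, 20))
X_raw[:3, :10] += 100.0   # markers elevated in group A
X_raw[3:, 10:] += 100.0   # markers elevated in group B
y = np.array([0, 0, 0, 1, 1, 1])

X_scaled = pareto_scale(X_raw)
pc_scores, explained = pca_scores(X_scaled)
t_pred, t_orth = opls_da(X_scaled, y)
print("PC1 scores:", np.round(pc_scores[:, 0], 2))
print("OPLS predictive scores:", np.round(t_pred, 2))
```

In this layout the predictive scores play the role of the x-axis of the scores plot (between-group differences) and the orthogonal scores play the role of the y-axis (within-group differences).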
To further chemically identify the differences between the two sample groups, a scatter plot (S-plot) based on the OPLS-DA was obtained and is shown in Figure 5.
In the S-plot, each point represents an EMRT pair. The x-axis shows the variable contributions: the farther a data point lies from 0, the more it contributes to the variance between the samples. The y-axis shows the sample correlations within the same sample group: the farther an EMRT pair lies from 0, the better its correlation among the injections. As a result, the EMRT pairs at both ends of the S-shaped curve represent the leading contributing ions from each sample group with the highest confidence.
For example, in Figure 5, the EMRT pairs close to the upper-right corner of the S-plot are the leading contributing markers from the Wild Panax Ginseng with high confidence; the EMRT pairs close to the lower-left corner of the S-plot are the leading contributing markers from the Extra Strong Ginseng with high confidence.
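The two S-plot axes described above can be sketched numerically. The example assumes the common S-plot definition, in which the x-axis is the covariance between each variable and the predictive score vector and the y-axis is the corresponding correlation; this note does not state the formulas, so treat them as an assumption. The intensities and the score vector `t` are invented for illustration.

```python
import numpy as np


def s_plot_coordinates(X, t):
    """Return (covariance, correlation) of every column of X with the score vector t."""
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float)
    Xc = X - X.mean(axis=0)
    tc = t - t.mean()
    n = X.shape[0]
    cov = Xc.T @ tc / (n - 1)                      # x-axis: contribution (magnitude)
    denom = Xc.std(axis=0, ddof=1) * tc.std(ddof=1)
    corr = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)  # y-axis: reliability
    return cov, corr


# Six injections (three per group) and three hypothetical markers.
X = np.array([
    [10.0, 900.0, 5.0],
    [12.0, 950.0, 6.0],
    [11.0, 920.0, 5.0],
    [500.0, 20.0, 6.0],
    [520.0, 25.0, 5.0],
    [510.0, 22.0, 6.0],
])
t = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])   # hypothetical predictive scores

cov, corr = s_plot_coordinates(X, t)
for j in range(X.shape[1]):
    print(f"marker {j}: cov = {cov[j]:.2f}, corr = {corr[j]:.3f}")
```

Markers 0 and 1 land at opposite ends of the S-shaped curve (large covariance, correlation near +1 or -1), while marker 2 stays near the origin. Which group appears in which corner depends only on how the two groups are coded.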
The leading contributing EMRT pairs can be selectively captured so that a list of top contributing markers for each sample group can be generated and saved as a text file. This text file can be later imported back into MarkerLynx as a results table for elemental composition searches as well as database searches. Figure 6 shows the two lists of the top 10 leading EMRT pairs obtained from the S-plot for both the sample groups.
Figure 6 shows that the m/z 945.5419 ion at a retention time of 6.54 min is the most significant marker in the Extra Strong Ginseng, at a confidence level of 0.999. The m/z 801.5021 ion at a retention time of 6.33 min is the most significant marker in the Wild Panax Ginseng, at a confidence level of 0.994.
In addition, the top 10 EMRT pairs in Wild Panax Ginseng fall in a lower molecular weight range (m/z 623 to m/z 955) than those found in Extra Strong Ginseng (m/z 783 to m/z 1187). This suggests that the top 10 markers in Extra Strong Ginseng contain mostly 3–4 sugar rings, while the top 10 markers in Wild Panax Ginseng contain mostly 2–3 sugar rings.
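Capturing the leading markers from each end of the S-plot and saving them as text can be sketched as follows. Several things here are assumptions: the confidence levels quoted above are treated as S-plot correlation values, their signs follow the Figure 5 layout (Wild Panax Ginseng upper right, Extra Strong Ginseng lower left), the covariance values are invented, and the tab-delimited layout is hypothetical rather than the file format MarkerLynx expects.

```python
def top_markers(markers, n=10, min_abs_corr=0.9):
    """Split confident markers by the sign of their covariance and rank each side.

    markers: list of dicts with keys "mz", "rt", "cov", "corr".
    Returns (upper_right, lower_left), each sorted by decreasing |cov|.
    """
    confident = [m for m in markers if abs(m["corr"]) >= min_abs_corr]
    upper_right = sorted((m for m in confident if m["cov"] > 0),
                         key=lambda m: m["cov"], reverse=True)
    lower_left = sorted((m for m in confident if m["cov"] < 0),
                        key=lambda m: m["cov"])
    return upper_right[:n], lower_left[:n]


def to_tab_delimited(markers):
    """Render a marker list as tab-delimited text (hypothetical column layout)."""
    lines = ["m/z\tRT (min)\tcov\tcorr"]
    for m in markers:
        lines.append(f"{m['mz']:.4f}\t{m['rt']:.2f}\t{m['cov']:.4f}\t{m['corr']:.3f}")
    return "\n".join(lines)


markers = [
    {"mz": 945.5419, "rt": 6.54, "cov": -0.30, "corr": -0.999},  # Extra Strong Ginseng
    {"mz": 801.5021, "rt": 6.33, "cov": 0.25, "corr": 0.994},    # Wild Panax Ginseng
    {"mz": 500.0000, "rt": 3.00, "cov": 0.02, "corr": 0.400},    # invented low-confidence marker
]
upper_right, lower_left = top_markers(markers)
text = to_tab_delimited(upper_right)
print(text)
```

Filtering on the correlation first and ranking on the covariance second keeps only markers that are both reliable across injections and large contributors to the group difference.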
The top 10 EMRTs can also be reviewed in a bar chart format. Figure 7 shows the bar charts for the top 10 markers for Extra Strong Ginseng (7a) and Wild Panax Ginseng (7b).
The bar chart offers additional information for the markers that were already identified on the list, showing a direct comparison of the top 10 EMRTs between the two sample groups in question. In Figure 7, the top 10 markers from Extra Strong Ginseng were barely detected in the Wild Panax Ginseng. Conversely, the top 10 markers from Wild Panax Ginseng were detected at very low intensities in the Extra Strong Ginseng, and some were undetectable.
In addition, some semi-quantitative information is provided by the bar chart. The top 10 markers of Wild Panax Ginseng were detected at much higher intensities than the top 10 markers from the Extra Strong Ginseng. This is an indication that the Wild Panax Ginseng drink is a cleaner extract than the Extra Strong Ginseng drink.
As mentioned, the text files obtained from SIMCA-P can be imported directly into the MarkerLynx results table. Figure 8 shows a screenshot of the MarkerLynx results window with the two results tables filled, one for each sample group.
From the MarkerLynx results table, the exact mass reported for each EMRT pair can be searched for elemental composition. This information can be used for further querying of existing databases to find putative chemical structures (if the marker found resides in the database).
As an example, we chose a marker from Panax Ginseng with m/z 971.4880 and an elemental composition of C48H76O20 to search a publicly available database, ChemSpider. One of the possible hits is shown in Figure 9.
From this information, it is easy to go back to the LC-MS raw data and confirm the structure found using the fragment ions obtained from the TOF MSE data.1
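The m/z 971.4880 marker and the C48H76O20 composition quoted above can be cross-checked with a short mass calculation. The sketch assumes the ion is the deprotonated molecule [M-H]- (plausible for ESI- data, but not stated explicitly in this note) and uses standard monoisotopic masses; the function names are hypothetical.

```python
import re

# Monoisotopic masses in Da for the elements needed here.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
PROTON_MASS = 1.00727646688


def monoisotopic_mass(formula):
    """Sum monoisotopic masses for a simple formula such as 'C48H76O20'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC_MASS[element] * (int(count) if count else 1)
    return total


def ppm_error(observed_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


neutral = monoisotopic_mass("C48H76O20")
theoretical_mz = neutral - PROTON_MASS          # assumed [M-H]- ion
error_ppm = ppm_error(971.4880, theoretical_mz)
print(f"theoretical m/z = {theoretical_mz:.4f}, error = {error_ppm:.1f} ppm")
```

Under the [M-H]- assumption the theoretical m/z comes out near 971.4857, within a few ppm of the reported value, which is consistent with the proposed composition.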
This application note demonstrates a generic, intelligent workflow for Traditional Herbal Medicine (THM) sample profiling. This approach is very effective for general comparison of extremely complex samples.
By using the ACQUITY UPLC-SYNAPT HDMS systems with TOF MS for analysis, raw data with exact mass measurements are first collected. Multivariate statistical analysis can be performed on the dataset after it is converted into a 2D matrix of EMRT pairs. The top contributing ions for each sample can be easily obtained from the OPLS-DA S-plot generated in SIMCA-P. Results can be imported back into the results table in MarkerLynx. A database query can then be performed to obtain the elemental composition, as well as the chemical structure if the marker is a characterized compound.
This entire approach is easy, fast, and generic. It can be easily adapted for profiling various types of THM samples. As a result, significant resource savings can be accomplished with maximum information obtained.
720002541, March 2008