Introduction

The turnoveR package has been developed to analyse isotopically labeled proteomics datasets with the goal of quantifying and visualizing protein turnover and protein degradation in an accessible, efficient and reproducible manner, and to easily contextualize the data with direct links to the uniprot and KEGG online databases. To install the package, see the Installation Instructions.

Experimental datasets intended for use with this package are generated from a series of samples taken from a steady-state bacterial culture. Samples are taken at different timepoints following an isotopic label “pulse”.

Functions in the turnoveR package accept mass spectrometry datasets that have been pre-processed to estimate areas of the “heavy” and “light” isotopic labeled fractions for each protein (by programs such as Massacre).

This vignette illustrates the use of the turnoveR package for processing an example dataset. The following flow chart illustrates the overall structure and workflow of the whole package. For more details and a higher resolution version, see the Package Structure Vignette.

[Flow chart of the turnoveR workflow; only the text labels of the chart are recoverable here.]

  • Stages: upstream of turnoveR; turnoveR: database functions; turnoveR: data preparation; turnoveR: calculations; turnoveR: information; turnoveR: visualization; turnoveR: export
  • Upstream tools and files: raw data files (mzXML), X!Tandem & SpectraST, Massacre, Vlad SVM script
  • Input files: metadata file (xlsx), protein list (xlsx), protein abundances (psms.csv), SVM results (svm_pred_results.csv), peptide 15N/14N (iso.csv)
  • Other inputs: flow rate, volume; taxon ID, name; uniprot ID, gene, mass, etc.
  • Databases: uniprot, KEGG
  • Database functions: tor_fetch_uniprot_species, tor_fetch_uniprot_proteins, tor_fetch_kegg_species, tor_fetch_kegg_pathways_for_species, tor_fetch_kegg_pathway_details
  • Reading and preparation functions: tor_read_svm_data_file, tor_read_protein_counts_data, tor_recode_protein_ids, tor_calculate_spectral_fit_quality, tor_filter_peptides_by_spectral_fit_quality, tor_filter_decoy, tor_add_metadata
  • Calculation functions: tor_calculate_growth_params, tor_calculate_labeled_fraction, tor_calculate_label_rate, tor_calculate_degradation_dissipation, tor_filter_label_rate_fits
  • Information functions: tor_add_uniprot_info, tor_add_protein_counts_info, tor_add_kegg_pathways_info
  • Visualization & export (at any intermediate step): tor_plot_label_rate_error, tor_plot_label_rate_hist, tor_plot_labeling_curves, tor_plot_on_kegg_pathway, tor_plot_comparison, tor_export_data (turnoveR export, xlsx)

Input Files

protein_sums.csv or psms.csv:

  • a file containing a sum of how many times each protein was identified in each sample of a dataset
  • Preferred file type: .csv
  • Important column names:

svm_results.csv:

  • typically a Massacre output file curated to remove poorly fitted curves
  • Preferred file type: .csv
  • Important column names:

metadata.xlsx:

  • an Excel sheet with sample timepoints and other sampling information
  • Preferred file type: .xlsx
  • Important column names:

Getting started

Load the turnoveR package to get access to all functions. All exported functions of the package start with the prefix tor_ to make auto-completion easy. To get a list of all available turnoveR functions, simply start typing tor_ in the RStudio console after the package is loaded. Other packages that are helpful for data processing are the core set of packages loaded by the tidyverse (e.g. dplyr, tidyr, ggplot2) and the non-core readxl for reading Excel files. Lastly, the plotly library makes it easy to make plots interactive, which can be very useful for data exploration.

library(turnoveR)
library(tidyverse)
library(readxl)
library(plotly)

# data base path
path <- file.path("vignettes", "vignette_data")

Reading in SVM files

One way to get started is to load an SVM data file. This pre-processed proteomics data (support vector machine, SVM, evaluated spectral output from Massacre) is read in through the function tor_read_svm_data_file. Note that some of the column names are standardized upon data loading for compatibility with downstream processing. Also note that most functions provide a quick summary message of what is happening. If desired, this can be turned off by specifying quiet = TRUE in each function call.

svm_data <- 
  # read svm data file
  tor_read_svm_data_file(
    filepath = file.path(path, "svm_pred_results_0.03gr.csv")) 
#> Info: successfully read 27220 records from SVM file 'svm_pred_results_0.03gr.csv'
# show first 100 records
svm_data %>% head(100)
prot_id  uniprot_id  peptide_seq                          peptide_mz  svm_pred     amp_ulab   amp_lab     sample
<chr>    <chr>       <chr>                                <dbl>       <dbl>        <dbl>      <dbl>       <chr>
6PGD     P00350      VLSGPQAQPAGDK/2                      634.3362    0.789403560  0.0049257  2.1293e-03  1tp
6PGD     P00350      ELSAEGFNFIGTGVSGGEEGALKGPSIMPGGQK/3  1074.5315   0.012068209  0.0050972  5.9419e-03  1tp
6PGD     P00350      AASEEYNWDLNYGEIAK/2                  986.9503    0.259486526  0.0234190  8.1968e-03  1tp
6PGD     P00350      LVPYYTVK/2                           491.7846    0.899442587  0.0274920  1.1085e-02  1tp
6PGD     P00350      VLSGPQAQPAGDKAEFIEK/3                662.3500    0.032089354  0.0388970  1.7490e-02  1tp
6PGD     P00350      GDIIIDGGNTFFQDTIR/2                  941.4712    0.113467789  0.0127670  1.9549e-02  1tp
6PGD     P00350      IAAVAEDGEPCVTYIGADGAGHYVK/3          855.0765    0.305284766  0.0851860  2.9411e-02  1tp
6PGD     P00350      GYTVSIFNR/2                          528.7778    0.072644688  0.0263650  3.1067e-02  1tp
6PGD     P00350      DVVAYAVQNGIPVPTFSAAVAYYDSYR/3        979.4876    0.059128215  0.0114180  3.1177e-02  1tp
6PGD     P00350      ITDAYAENPQIANLLLAPYFK/3              789.0867    0.229015694  0.0808830  3.3500e-02  1tp

Reading in Massacre output

Alternatively, Massacre output files can be read and evaluated directly (see flow chart for details). Functionality and description coming soon…

Data Preparation

Spectral Quality filtering

Typically the first step is to remove data with poor quality spectral fits. In the case of SVM pre-processed data, the quality probability column is svm_pred and should be filtered with a sensible cut-off (here we keep peptides whose SVM-estimated probability of being a good fit is above 75%).

data_filtered <- 
  svm_data %>% 
  # spectral quality filtering
  tor_filter_peptides_by_spectral_fit_quality(svm_pred > 0.75)
#> Info: kept 6961 of 27220 (25.6%) peptide measurements during spectral fit quality filtering (condition 'svm_pred > 0.75')
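
What counts as a sensible cut-off depends on the dataset, so it can help to check how many peptide measurements survive at a few candidate thresholds before settling on one. The following is a minimal base R sketch on made-up svm_pred values; the toy data frame and the variable names are assumptions for illustration and are not part of turnoveR.

```r
# Toy data: hypothetical svm_pred values, not taken from the example dataset
peptides <- data.frame(
  peptide_seq = c("PEPA", "PEPB", "PEPC", "PEPD", "PEPE", "PEPF"),
  svm_pred = c(0.95, 0.80, 0.70, 0.40, 0.10, 0.02),
  stringsAsFactors = FALSE
)

# Fraction of peptide measurements kept at several candidate cut-offs
cutoffs <- c(0.50, 0.75, 0.90)
fraction_kept <- sapply(cutoffs, function(cutoff) mean(peptides$svm_pred > cutoff))

cutoff_summary <- data.frame(cutoff = cutoffs, fraction_kept = fraction_kept)
print(cutoff_summary)
```

Applied to a real svm_data table, the same summary shows how quickly the retained fraction drops as the threshold is raised.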

Recoding protein IDs

Sometimes not all protein IDs are up to date and need to be recoded to make sure they can be matched to database records.

data_recoded <-
  data_filtered %>% 
  tor_recode_protein_ids(file.path(path, "rename_prot.xlsx"))
#> Info: renamed 5 protein entries for 1 different proteins.
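
Conceptually, recoding is a lookup from outdated IDs to current ones. The sketch below illustrates the idea in base R with a hypothetical lookup table; the column names prot_id and new_prot_id and all IDs except AAT are invented for illustration, and the actual layout of rename_prot.xlsx is not shown in this vignette.

```r
# Hypothetical lookup table; column names and IDs are illustrative only
recode_table <- data.frame(
  prot_id = c("OLD1"),
  new_prot_id = c("NEW1"),
  stringsAsFactors = FALSE
)

# Toy data with two records that carry the outdated ID
data <- data.frame(
  prot_id = c("OLD1", "AAT", "OLD1"),
  stringsAsFactors = FALSE
)

# Find records whose ID appears in the lookup table and replace them
idx <- match(data$prot_id, recode_table$prot_id)
needs_recode <- !is.na(idx)
data$prot_id[needs_recode] <- recode_table$new_prot_id[idx[needs_recode]]

print(data)
```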

Adding metadata

Often it is necessary to add additional metadata to the data set (e.g. the times of the individual samples for later time course fitting). In that case, the metadata should include information on sampling timepoints and is accepted in Excel format.

# read metadata
metadata <- read_excel(file.path(path, "metadata_CMW.xlsx"))

# show metadata
metadata
sample  datetime             hours
<chr>   <S3: POSIXct>        <dbl>
t0a     2018-01-11 12:05:00   0.000000
t0b     2018-01-11 16:42:00   0.000000
t0c     2018-01-11 22:09:00   0.000000
1tp     2018-01-12 05:58:00   7.816667
2tp     2018-01-12 08:58:00  10.816667
3tp     2018-01-12 11:56:00  13.783333
4tp     2018-01-12 14:45:00  16.600000
5tp     2018-01-12 17:29:00  19.333333
# add to data
data_w_metadata <-
  data_recoded %>% 
  tor_add_metadata(metadata, join_by = "sample")
#> Info: adding metadata to mass spec data, joining by 'sample'...8 metadata entries successfully added to 6961 data recors, 0 could not be matched to metadata
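
The hours column in this example metadata appears to be the time elapsed since the last pre-label sample (t0c), with all earlier samples set to zero. If a metadata sheet only contains the sampling datetimes, such a column could be derived as in the following base R sketch. Treating t0c as the time of the label pulse and using UTC as the time zone are assumptions made here for illustration.

```r
# Sampling times copied from the metadata table above; the time zone is an assumption
metadata <- data.frame(
  sample = c("t0a", "t0b", "t0c", "1tp", "2tp", "3tp", "4tp", "5tp"),
  datetime = as.POSIXct(
    c("2018-01-11 12:05:00", "2018-01-11 16:42:00", "2018-01-11 22:09:00",
      "2018-01-12 05:58:00", "2018-01-12 08:58:00", "2018-01-12 11:56:00",
      "2018-01-12 14:45:00", "2018-01-12 17:29:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Assumption: the label pulse coincides with the t0c sample
pulse_time <- metadata$datetime[metadata$sample == "t0c"]

# Elapsed hours since the pulse; samples taken before the pulse are set to 0
elapsed <- as.numeric(difftime(metadata$datetime, pulse_time, units = "hours"))
metadata$hours <- pmax(elapsed, 0)

print(metadata)
```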

Calculations

Labeling Rate, Degradation and Dissipation

The labeling rate is calculated in two steps so that the data can easily be exported or examined in between. First, the labeled fractions are calculated from the light and heavy signals using tor_calculate_labeled_fraction. This function returns the fraction of labeled and unlabeled peptides for each sample.

With the labeled and unlabeled fractions known for each peptide at each timepoint, it is possible to fit an exponential to either individual peptides or all peptides in a protein to calculate a labeling rate. The function tor_calculate_label_rate thus uses the calculated labeled fractions to fit a curve describing isotope labeling over time. Peptides can be combined to return a labeling rate for the entire protein by choosing the option combine_peptides = TRUE in the function arguments.
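
To make the two steps concrete, here is a minimal base R sketch on made-up data. It assumes that the labeled fraction is the labeled amplitude divided by the total amplitude, and that labeling follows a single exponential of the form 1 - exp(-k * t). Both are assumptions for illustration; the exact definitions and fitting procedure used inside turnoveR are not spelled out in this vignette.

```r
# Made-up time course for a single peptide (not package data)
hours <- c(7.8, 10.8, 13.8, 16.6, 19.3)
true_rate <- 0.04
noise <- c(0.004, -0.003, 0.002, -0.004, 0.001)
frac_true <- 1 - exp(-true_rate * hours) + noise

# Hypothetical labeled and unlabeled amplitudes on an arbitrary scale
amp_lab <- 0.05 * frac_true
amp_ulab <- 0.05 * (1 - frac_true)

# Step 1 (assumed definition): labeled signal over total signal
peptide <- data.frame(
  hours = hours,
  frac_lab = amp_lab / (amp_lab + amp_ulab)
)

# Step 2: least squares fit of the assumed single-exponential model
fit <- nls(frac_lab ~ 1 - exp(-k * hours), data = peptide, start = list(k = 0.05))
label_rate <- unname(coef(fit)["k"])
label_rate_se <- summary(fit)$coefficients["k", "Std. Error"]

print(c(label_rate = label_rate, label_rate_se = label_rate_se))
```

Fitting all peptides of a protein together would correspond to stacking their rows into one data frame before the fit.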

The calculated labeling rates can then be used to calculate degradation rates and dissipation rates using the function tor_calculate_degradation_dissipation. Protein degradation is calculated as the labeling rate minus the growth rate (deg_rate = label_rate - growth_rate). Protein dissipation is calculated as the degradation rate divided by the labeling rate, and is displayed as a percent (dissipation = (deg_rate / label_rate)*100).
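
As a worked example of these two formulas, the following base R sketch plugs in the label rate shown for protein TALA later in this vignette together with the growth rate of this experiment. The standard error line combines the two errors in quadrature; that is an assumption which happens to agree with the displayed deg_rate_se values, not a statement about the package internals.

```r
# Values taken from tables shown in this vignette (protein TALA)
label_rate <- 0.03279253
label_rate_se <- 0.0013021334
growth_rate <- 0.03085868
growth_rate_se <- 0.001508732

# Degradation rate: labeling rate minus growth rate
deg_rate <- label_rate - growth_rate

# Dissipation: degradation rate relative to labeling rate, in percent
dissipation <- (deg_rate / label_rate) * 100

# Assumed error propagation for a difference of two independent estimates
deg_rate_se <- sqrt(label_rate_se^2 + growth_rate_se^2)

print(c(deg_rate = deg_rate, deg_rate_se = deg_rate_se, dissipation = dissipation))
```

A label rate below the growth rate yields a negative degradation rate and dissipation, which is why some proteins in the later tables show negative values.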

For this last step, it is critical to know the growth rate of the experiment. The growth rate can be calculated using the culture volume and media flow rate and turnoveR provides the tor_calculate_growth_params function to simplify this step.
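
For a continuous culture at steady state, the growth rate equals the dilution rate, i.e. the media flow rate divided by the culture volume, and the generation time is ln(2) divided by the growth rate. The base R sketch below applies these relations to the values used in this vignette. The first-order error propagation is an assumption about how the standard errors are obtained; it agrees with the output shown in this vignette to the displayed precision.

```r
# Inputs as used in this vignette
flow_rate <- 0.46 * 60      # [g/hour]
flow_rate_se <- 0.02 * 60   # estimated error [g/hour]
volume <- 894.4             # [g]
volume_se <- 20             # estimated error [g]

# Growth rate equals the dilution rate at steady state [1/hour]
growth_rate <- flow_rate / volume

# Assumed error propagation: relative errors of a ratio add in quadrature
rel_se <- sqrt((flow_rate_se / flow_rate)^2 + (volume_se / volume)^2)
growth_rate_se <- growth_rate * rel_se

# Generation (doubling) time [hours]; same relative error as the growth rate
gen_time <- log(2) / growth_rate
gen_time_se <- gen_time * rel_se

print(c(growth_rate = growth_rate, growth_rate_se = growth_rate_se,
        gen_time = gen_time, gen_time_se = gen_time_se))
```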

# calculate growth parameters
growth_params <- 
  tor_calculate_growth_params(
    flow_rate = 0.46 * 60, # [g/hour]
    flow_rate_se = 0.02 * 60, # estimated error [g/hour]
    volume = 894.4, # [g]
    volume_se = 20 # estimated error [g]
  )
growth_params %>% rmarkdown::paged_table()
growth_rate  growth_rate_se  gen_time  gen_time_se
<dbl>        <dbl>           <dbl>     <dbl>
0.03085868   0.001508732     22.46199  1.098203
# data calculations
data_w_calcs <- 
  data_w_metadata %>% 
  # calculate labeled fraction
  tor_calculate_labeled_fraction() %>% 
  # calculate the label rate
  tor_calculate_label_rate(
    time_col = "hours", 
    min_num_timepoints = 3,
    combine_peptides = TRUE) %>% 
  # degradation rate and dissipation
  tor_calculate_degradation_dissipation(
    growth_rate = growth_params$growth_rate,
    growth_rate_se = growth_params$growth_rate_se) 
#> Info: calculated labeled/unlabeled fraction for 6961 peptides
#> Info: processing data for 421 proteins, this may take a few seconds... 336 of the proteins could be fit to a labeling curve, 85 did not have enough time points
#> Info: calculated degradation rate and dissipation for 336 records
# show results (remove complex data column for printout)
data_w_calcs 
prot_id  uniprot_id  nested_data  num_peptides  num_timepoints  num_datapoints  enough_data  fit_error  label_rate   label_rate_se
<chr>    <chr>       <list>       <int>         <int>           <int>           <lgl>        <lgl>      <dbl>        <dbl>
6PGD     P00350      <tibble>      5            5               17              TRUE         FALSE      0.040990252  0.0013781929
6PGL     P52697      <tibble>      2            5                9              TRUE         FALSE      0.034791054  0.0002620142
AAT      P00509      <tibble>      7            5               24              TRUE         FALSE      0.030560276  0.0036753045
ACCA     P0ABD5      <tibble>      3            3                5              TRUE         FALSE      0.034858258  0.0017393424
ACCC     P24182      <tibble>      1            4                4              TRUE         FALSE      0.033293077  0.0015632613
ACEA     P0A9G6      <tibble>     25            5               99              TRUE         FALSE      0.051771787  0.0003542789
ACKA     P0A6A3      <tibble>      2            3                5              TRUE         FALSE      0.034982259  0.0019357015
ACON1    P25516      <tibble>      2            4                5              TRUE         FALSE      0.039041004  0.0029873404
ACON2    P36683      <tibble>     18            5               59              TRUE         FALSE      0.051328967  0.0017682078
ACP      P0A6A8      <tibble>      3            5               13              TRUE         FALSE      0.032490874  0.0004352906

Visualization

The data quality can be visualized using a few basic graphing functions. To see a histogram plotting the labeling rates of all the proteins in an experiment (or in multiple data sets combined), use tor_plot_label_rate_hist. Similar plots could be generated for deg_rate and dissipation.

data_w_calcs %>% tor_plot_label_rate_hist()

The function tor_plot_label_rate_error displays the residual standard error for each calculated label rate to easily evaluate what quality cutoffs might be useful.

data_w_calcs %>% tor_plot_label_rate_error()

To visualize what the labeling curves and least squares fit actually look like, use tor_plot_labeling_curves to see the labeling curves for a randomly selected protein or proteins. The argument plot_number allows the user to select the desired number of curves to display.

data_w_calcs %>% tor_plot_labeling_curves(plot_number = 4, random_seed = 123)

Data Quality

The output from the calculation function should be filtered to remove curves with missing fits (not enough data) and poor curve fits. It is always helpful to also look at the data that is getting discarded to see if something is amiss. Any filtered dataset can easily be passed to any of the plotting functions (e.g. tor_plot_labeling_curves) for a closer look.

Not enough data

Let’s take a look at some of the proteins that did not have enough data for a fit (which we defined earlier to be at least 3 separate time points).

# show all records that don't have enough data
data_w_calcs %>% 
  tor_filter_label_rate_fits(!enough_data) 
#> Info: fetching 85 of 421 entries based on filter condition '!enough_data'
prot_id  uniprot_id  nested_data  num_peptides  num_timepoints  num_datapoints  enough_data  fit_error  label_rate  label_rate_se
<chr>    <chr>       <list>       <int>         <int>           <int>           <lgl>        <lgl>      <dbl>       <dbl>
ABDH     P77674      <tibble>     1             1               1               FALSE        FALSE      NA          NA
APT      P69503      <tibble>     1             1               1               FALSE        FALSE      NA          NA
ASSY     P0A6E4      <tibble>     1             2               2               FALSE        FALSE      NA          NA
BTUE     P06610      <tibble>     1             2               2               FALSE        FALSE      NA          NA
CPDA     P0AEW4      <tibble>     1             2               2               FALSE        FALSE      NA          NA
CUEO     P36649      <tibble>     1             1               1               FALSE        FALSE      NA          NA
CURA     P76113      <tibble>     1             1               1               FALSE        FALSE      NA          NA
CYDA     P0ABJ9      <tibble>     1             1               1               FALSE        FALSE      NA          NA
CYOA     P0ABJ1      <tibble>     2             2               2               FALSE        FALSE      NA          NA
DAPD     P0A9D8      <tibble>     1             1               1               FALSE        FALSE      NA          NA
# visualize some of these problematic records
data_w_calcs %>% 
  tor_filter_label_rate_fits(!enough_data) %>% 
  tor_plot_labeling_curves(random_seed = 123)
#> Info: fetching 85 of 421 entries based on filter condition '!enough_data'

Low quality fits

Let’s take a closer look at the label rate estimates of low quality fits and use plotly to make the plot interactive. Mouse over individual data points to see the protein ID.

data_w_calcs %>% 
  tor_filter_label_rate_fits(fit_rse > 0.05) %>% 
  tor_plot_label_rate_error() %>% 
  ggplotly()
#> Info: fetching 66 of 421 entries based on filter condition 'fit_rse > 0.05'
[Interactive plot: label rate [1/time] and residual standard error of fit [% labelled] for the filtered proteins; the growth rate (0.031) is marked for reference.]

This could likewise be used with tor_plot_labeling_curves, which reveals a little more detail about why the quality is bad (and potentially allows reconsidering the exclusion or closer examination of some peptides that clearly behave differently from the rest of the protein). Use the mouseover to see the peptide sequences for each data point.

data_w_calcs %>% 
  tor_filter_label_rate_fits(fit_rse > 0.05) %>% 
  tor_plot_labeling_curves(random_seed = 11) %>% 
  ggplotly()
#> Info: fetching 66 of 421 entries based on filter condition 'fit_rse > 0.05'
[Interactive plot: labeling curves for the proteins AAT, ATPD, HSLU, PLIG, UDP and ZINT, one panel per protein; x-axis: hours; y-axis: fraction labeled; legend: prot_id and curve (expected, fit).]

High quality fits

Lastly, it is useful to continue with just the high quality fits to analyze the most robust part of the data set. Here we focus on everything that had enough_data AND a residual standard error of the fit of at most 5% (in two separate statements for clarity, although they could be combined into one). We also keep only the key data columns going forward.

data_hq <- 
  data_w_calcs %>% 
  tor_filter_label_rate_fits(enough_data) %>% 
  tor_filter_label_rate_fits(
    fit_rse <= 0.05,
    select = c(matches("prot"), matches("rate"), matches("dissipation"))
  )
#> Info: fetching 336 of 421 entries based on filter condition 'enough_data'
#> Info: fetching 270 of 336 entries based on filter condition 'fit_rse <= 0.05', keeping 10 of 18 columns.

Adding Information

Most steps in the information section can be performed in arbitrary order. However, it is generally advisable to follow the flow chart so that the few pieces of information that build on one another are available when each function is called (if anything is missing, turnoveR will complain and point to the missing information).

Adding uniprot information

To get the most up-to-date protein information, all proteins for the experimental organism (here E. coli) are queried directly from the uniprot online database and matched to the mass spec data. This adds useful information including the recommended gene and protein names and the molecular weight of the protein. If the organism's uniprot taxon ID is not known yet, the tor_fetch_uniprot_species function can be helpful as shown here.

# look for taxon ID for K12 strain
uniprot_species <- tor_fetch_uniprot_species("strain K12")
#> Info: querying uniprot database for taxa with 'strain K12' in the name... retrieved 5 records
uniprot_species
taxon_id  taxon_name
<int>     <chr>
75379     Thiomonas intermedia (strain K12)
316385    Escherichia coli (strain K12 / DH10B)
316407    Escherichia coli (strain K12 / W3110 / ATCC 27325 / DSM 5911)
595496    Escherichia coli (strain K12 / MC4100 / BW2952)
83333     Escherichia coli (strain K12)
# retrieve uniprot info for the K12 strain
uniprot_data <- tor_fetch_uniprot_proteins(taxon = 83333)
#> Info: reading uniprot proteins of taxon 83333 from cached file (use read_cache = FALSE to disable)... retrieved 4497 records
uniprot_data %>% head(20)
prot_id     uniprot_id  gene  prot_name                                                 prot_mw
<chr>       <chr>       <chr> <chr>                                                     <int>
ACPH_ECOLI  P21515      acpH  Acyl carrier protein phosphodiesterase                     22961
ACRD_ECOLI  P24177      acrD  Probable aminoglycoside efflux pump                       113047
AIS_ECOLI   P45565      ais   Lipopolysaccharide core heptose(II)-phosphate phosphatase  22257
AHR_ECOLI   P27250      ahr   Aldehyde reductase Ahr                                     36502
ALSA_ECOLI  P32721      alsA  D-allose import ATP-binding protein AlsA                   56745
AQPZ_ECOLI  P60844      aqpZ  Aquaporin Z                                                23703
AVTA_ECOLI  P09053      avtA  Valine--pyruvate aminotransferase                          46711
BGLR_ECOLI  P05804      uidA  Beta-glucuronidase                                         68447
BGLH_ECOLI  P26218      bglH  Cryptic outer membrane porin BglH                          60657
CCMC_ECOLI  P0ABM1      ccmC  Heme exporter protein C                                    27885
# add the uniprot information
data_w_uniprot <-
  data_hq %>% 
  tor_add_uniprot_info(uniprot_data)
#> Info: adding uniprot information to mass spec data, joining by 'uniprot_id'...266 records joined successfully, 4 could not be matched with uniprot records
# check on the ones with missing uniprot info
data_w_uniprot %>% filter(missing_uniprot)
uniprot_id  gene   prot_name  prot_mw  prot_id    label_rate  label_rate_se  deg_rate      deg_rate_se  growth_rate
<chr>       <chr>  <chr>      <int>    <chr>      <dbl>       <dbl>          <dbl>         <dbl>        <dbl>
NA          NA     NA         NA       DCEA:DCEB  0.02045048  0.0006819394   -0.010408192  0.001655690  0.03085868
NA          NA     NA         NA       LIVK:LIVJ  0.02848902  0.0002523579   -0.002369660  0.001529690  0.03085868
NA          NA     NA         NA       TALA       0.03279253  0.0013021334    0.001933854  0.001992942  0.03085868
NA          NA     NA         NA       TKT1:TKT2  0.05116057  0.0018273766    0.020301891  0.002369720  0.03085868

Adding protein counts information

The protein_sums.csv or psms.csv file should contain the protein IDs and corresponding counts for all proteins identified in each sample. It is read in csv format by tor_read_protein_counts_data and then added to the data set using tor_add_protein_counts_info, which also calculates the relative protein counts and the relative protein mass (based on the molecular weight of each protein retrieved from uniprot in the previous step). The relative protein abundance information can be combined with the dissipation and degradation values to calculate a weighted degradation and dissipation rate for each protein. This weighted calculation uses the relative mass of each protein identified in the samples to better describe the overall cellular investment in protein turnover for each protein.

# read protein count info
protein_count <- tor_read_protein_counts_data(file.path(path, "psms.csv"))
#> Info: reading protein counts data from 'psms.csv'... read 913 records
# add to dataset
data_w_counts <- data_w_uniprot %>% 
  tor_add_protein_counts_info(protein_count)
#> Info: protein counts added and weighted rates calculated for 270/270 datasets
# look at some of the data
data_w_counts %>% 
  select(gene, prot_name, prot_rel_mass, starts_with("deg"), starts_with("diss")) %>% 
  head(10) %>% 
  rmarkdown::paged_table()
gene  prot_name                  prot_rel_mass  deg_rate       deg_rate_se  deg_rate_weighted  dissipation  dissipation_se
<chr> <chr>                      <dbl>          <dbl>          <dbl>        <dbl>              <dbl>        <dbl>
tufA  Elongation factor Tu 1     0.04997249      2.322514e-03  0.001602918   1.160618e-04        6.9994911  4.832152
aceA  Isocitrate lyase           0.03544098      2.091311e-02  0.001549768   7.411812e-04       40.3948017  3.006196
glpK  Glycerol kinase            0.03991983      2.587747e-02  0.001537703   1.033024e-03       45.6102013  2.720772
ompC  Outer membrane protein C   0.02250688      1.603725e-03  0.001559239   3.609484e-05        4.9402532  4.803587
ompA  Outer membrane protein A   0.01747325     -5.744113e-03  0.001552020  -1.003683e-04      -22.8716442  6.188645
groL  60 kDa chaperonin          0.02425516     -1.858430e-03  0.001585458  -4.507651e-05       -6.4083253  5.468110
ompF  Outer membrane protein F   0.01614768      4.399747e-05  0.001574216   7.104573e-07        0.1423743  5.094109
aldA  Lactaldehyde dehydrogenase 0.01949209      8.971291e-03  0.001553101   1.748692e-04       22.5239729  3.904894
dnaK  Chaperone protein DnaK     0.02143562      3.138349e-03  0.001566062   6.727248e-05        9.2312471  4.607879
eno   Enolase                    0.01309562      2.577444e-03  0.001542091   3.375322e-05        7.7085611  4.612637
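
The weighted degradation rates in this output appear to be the product of each protein's relative mass and its degradation rate. The base R sketch below reproduces the deg_rate_weighted values for two of the proteins from the printed prot_rel_mass and deg_rate columns. This is a consistency check on the displayed numbers under that assumption, not a description of the package internals.

```r
# Values copied from the printed output (tufA and aceA)
proteins <- data.frame(
  gene = c("tufA", "aceA"),
  prot_rel_mass = c(0.04997249, 0.03544098),
  deg_rate = c(2.322514e-03, 2.091311e-02),
  stringsAsFactors = FALSE
)

# Assumed weighting: relative protein mass times degradation rate
proteins$deg_rate_weighted <- proteins$prot_rel_mass * proteins$deg_rate

print(proteins)
```

Under this weighting an abundant protein with a modest degradation rate can contribute as much to total turnover as a rare protein that is degraded quickly.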

Adding KEGG pathway info

Coming soon…

1-step processing

The entire set of operations listed section-by-section above can also be easily performed in a single step piping (%>%) from one operation to the next.

# data base path
path <- file.path("vignettes", "vignette_data")

# growth params
growth_params <- 
  tor_calculate_growth_params(
    flow_rate = 0.46 * 60, # [g/hour]
    flow_rate_se = 0.02 * 60, # estimated error [g/hour]
    volume = 894.4, # [g]
    volume_se = 20 # estimated error [g]
  )

# data processing in one pipeline
data_one_pipe <-
  # read SVM file
  tor_read_svm_data_file(
    filepath = file.path(path, "svm_pred_results_0.03gr.csv")
  ) %>% 
  # spectral quality filtering
  tor_filter_peptides_by_spectral_fit_quality(svm_pred > 0.75) %>% 
  # recode protein ids
  tor_recode_protein_ids(file.path(path, "rename_prot.xlsx")) %>% 
  # add metadata
  tor_add_metadata(
    read_excel(file.path(path, "metadata_CMW.xlsx")), 
    join_by = "sample"
  ) %>% 
  # calculate labeled fraction
  tor_calculate_labeled_fraction() %>% 
  # calculate the label rate
  tor_calculate_label_rate(
    time_col = "hours", 
    min_num_timepoints = 3,
    combine_peptides = TRUE
  ) %>% 
  # degradation rate and dissipation
  tor_calculate_degradation_dissipation(
    growth_rate = growth_params$growth_rate,
    growth_rate_se = growth_params$growth_rate_se
  ) %>% 
  # focus on high quality fits
  tor_filter_label_rate_fits(enough_data & fit_rse <= 0.05) %>% 
  # add uniprot data
  tor_add_uniprot_info(
    tor_fetch_uniprot_proteins(taxon = 83333)
  ) %>% 
  # add protein count info
  tor_add_protein_counts_info(
    tor_read_protein_counts_data(file.path(path, "psms.csv"))
  )
#> Info: successfully read 27220 records from SVM file 'svm_pred_results_0.03gr.csv'
#> Info: kept 6961 of 27220 (25.6%) peptide measurements during spectral fit quality filtering (condition 'svm_pred > 0.75')
#> Info: renamed 5 protein entries for 1 different proteins.
#> Info: adding metadata to mass spec data, joining by 'sample'...8 metadata entries successfully added to 6961 data recors, 0 could not be matched to metadata
#> Info: calculated labeled/unlabeled fraction for 6961 peptides
#> Info: processing data for 421 proteins, this may take a few seconds... 336 of the proteins could be fit to a labeling curve, 85 did not have enough time points
#> Info: calculated degradation rate and dissipation for 336 records
#> Info: fetching 270 of 421 entries based on filter condition 'enough_data & fit_rse <= 0.05'
#> Info: reading uniprot proteins of taxon 83333 from cached file (use read_cache = FALSE to disable)... retrieved 4497 records
#> Info: adding uniprot information to mass spec data, joining by 'uniprot_id'...266 records joined successfully, 4 could not be matched with uniprot records
#> Info: reading protein counts data from 'psms.csv'... read 913 records
#> Info: protein counts added and weighted rates calculated for 270/270 datasets

# then continue with plotting and analysis
data_one_pipe %>% tor_plot_label_rate_error() %>% ggplotly()
[Interactive plot: label rate [1/time] and residual standard error of fit [% labelled] for the high quality fits; the growth rate (0.031) is marked for reference.]

Analysis

Example: looking at the degradation data

To take a quick first look at the important proteins contributing to protein turnover, it is helpful to list the proteins with the highest weighted degradation rates / dissipation in the experiment. For a list of the most turned over proteins (regardless of pool size), one would look at the highest dissipation instead.

# get the total degradation and dissipation per generation
data_w_counts %>% 
  summarize(
    total_deg_rate = sum(deg_rate_weighted, na.rm = TRUE),
    total_dissipation = sum(dissipation_weighted, na.rm = TRUE)
  ) %>% 
  rmarkdown::paged_table()
total_deg_rate  total_dissipation
<dbl>           <dbl>
0.006656294     13.66043
# list top proteins by dissipation
data_w_counts %>% 
  arrange(desc(dissipation)) %>% 
  select(prot_name, everything()) %>% 
  head(20)
prot_name                                                              prot_id  prot_counts  uniprot_id  gene  prot_mw  label_rate
<chr>                                                                  <chr>    <int>        <chr>       <chr> <int>    <dbl>
3-isopropylmalate dehydratase small subunit                            LEUD      27          P30126      leuD  22487    0.10319285
Cysteine desulfurase IscS                                              ISCS      19          P0A6B7      iscS  45090    0.08072142
Malate synthase A                                                      MASY     150          P08997      aceB  60274    0.07228791
Cold shock-like protein CspC                                           CSPC      23          P0A9Y6      cspC   7402    0.07100145
Alkyl hydroperoxide reductase C                                        AHPC      56          P0AE08      ahpC  20761    0.06628085
5-methyltetrahydropteroyltriglutamate--homocysteine methyltransferase  METE      39          P25665      metE  84674    0.06510413
Glycerol uptake facilitator protein                                    GLPF       5          P0AER0      glpF  29780    0.06277233
D-tagatose-1,6-bisphosphate aldolase subunit GatY                      GATY      58          P0C8J6      gatY  30812    0.06090431
Glycerol kinase                                                        GLPK     396          P0A6F3      glpK  56231    0.05673615
7-alpha-hydroxysteroid dehydrogenase                                   HDHA       9          P0AET8      hdhA  26779    0.05531149
# list top proteins by relative mass
data_w_counts %>% 
  arrange(desc(prot_rel_mass)) %>% 
  select(prot_name, everything()) %>% 
  head(20)
prot_name                                  prot_id  prot_counts  uniprot_id  gene  prot_mw  label_rate  label_rate_se  deg_rate
<chr>                                      <chr>    <int>        <chr>       <chr> <int>    <dbl>       <dbl>          <dbl>
Elongation factor Tu 1                     EFTU1    644          P0CE47      tufA   43284   0.03318119  0.0005413671    2.322514e-03
Glycerol kinase                            GLPK     396          P0A6F3      glpK   56231   0.05673615  0.0002970889    2.587747e-02
Isocitrate lyase                           ACEA     416          P0A9G6      aceA   47522   0.05177179  0.0003542789    2.091311e-02
DNA-directed RNA polymerase subunit beta'  RPOC     103          P0A8T7      rpoC  155160   0.04041813  0.0008131238    9.559453e-03
DNA-directed RNA polymerase subunit beta   RPOB     103          P0A8V2      rpoB  150632   0.03617074  0.0011042176    5.312064e-03
60 kDa chaperonin                          CH60     236          P0A6F5      groL   57329   0.02900025  0.0004872467   -1.858430e-03
Outer membrane protein C                   OMPC     311          P06996      ompC   40368   0.03246240  0.0003936464    1.603725e-03
Chaperone protein DnaK                     DNAK     173          P0A6Y8      dnaK   69115   0.03399703  0.0004198611    3.138349e-03
Aldehyde-alcohol dehydrogenase             ADHE     122          P0A9Q7      adhE   96127   0.03246572  0.0004572294    1.607040e-03
Elongation factor G                        EFG      141          P0A6M8      fusA   77581   0.03045749  0.0005427855   -4.011831e-04
# list top proteins by weighted dissipation
data_w_counts %>% 
  arrange(desc(dissipation_weighted)) %>% 
  select(prot_name, everything()) %>% 
  head(20)
prot_name                                                              prot_id  prot_counts  uniprot_id  gene  prot_mw  label_rate
<chr>                                                                  <chr>    <int>        <chr>       <chr> <int>    <dbl>
Glycerol kinase                                                        GLPK     396          P0A6F3      glpK   56231   0.05673615
Isocitrate lyase                                                       ACEA     416          P0A9G6      aceA   47522   0.05177179
Malate synthase A                                                      MASY     150          P08997      aceB   60274   0.07228791
DNA-directed RNA polymerase subunit beta'                              RPOC     103          P0A8T7      rpoC  155160   0.04041813
Acetyl-coenzyme A synthetase                                           ACSA     140          P27550      acs    72094   0.04701843
Lactaldehyde dehydrogenase                                             ALDA     208          P25553      aldA   52273   0.03982997
DNA-directed RNA polymerase subunit beta                               RPOB     103          P0A8V2      rpoB  150632   0.03617074
Catalase-peroxidase                                                    KATG     120          P13029      katG   80024   0.04001764
Elongation factor Tu 1                                                 EFTU1    644          P0CE47      tufA   43284   0.03318119
5-methyltetrahydropteroyltriglutamate--homocysteine methyltransferase  METE      39          P25665      metE   84674   0.06510413