Introduction

The turnoveR package has been developed to analyze isotopically labeled proteomics datasets, with the goal of quantifying and visualizing protein turnover and protein degradation in an accessible, efficient, and reproducible manner, and to easily contextualize the data with direct links to the UniProt and KEGG online databases. To install the package, see the Installation Instructions.

Experimental datasets intended for use with this package are generated from a series of samples from a steady-state bacterial culture, collected at different timepoints following an isotopic label “pulse”.

Functions in the turnoveR package accept mass spectrometry datasets that have been pre-processed to estimate areas of the “heavy” and “light” isotopic labeled fractions for each protein (by programs such as Massacre).

This vignette illustrates the use of the turnoveR package for processing an example dataset. The following flow chart illustrates the overall structure and workflow of the whole package. For more details and a higher resolution version, see the Package Structure Vignette.

Input Files

protein_sums.csv or psms.csv:

  • a file containing counts of how many times each protein was identified in each sample of the dataset
  • Preferred file type: .csv
  • Important column names: the protein IDs and their per-sample counts (as read by tor_read_protein_counts_data)

svm_results.csv:

  • typically a Massacre output file curated to remove poorly fitted curves
  • Preferred file type: .csv
  • Important column names: svm_pred (the spectral fit quality probability) and the light and heavy signal columns

metadata.xlsx:

  • an Excel sheet with sample timepoints and other sampling information
  • Preferred file type: .xlsx
  • Important column names: sample (for joining to the mass spec data) and a time column such as hours

Getting started

Load the turnoveR package to get access to all functions. The exported functions of the package all start with the prefix tor_ to make auto-completion easy. To get a list of all available turnoveR functions, simply start typing tor_ in the RStudio console after the package is loaded. Additional packages that are helpful for data processing are the core set of packages loaded by the tidyverse (e.g. dplyr, tidyr, ggplot2) and the non-core readxl for reading Excel files. Lastly, the plotly library makes it easy to make plots interactive, which can be very useful for data exploration.

library(turnoveR)
library(tidyverse)
library(readxl)
library(plotly)

# base path to the vignette data
path <- file.path("vignettes", "vignette_data")

Reading in SVM files

One way to get started is to load an SVM data file. This pre-processed proteomics data (support vector machine (SVM) evaluated spectral output from Massacre) is read in through the function tor_read_svm_data_file. Note that some of the column names are standardized upon data loading for compatibility with downstream processing. Also note that most functions provide a quick summary message of what is happening. If desired, this can be turned off by specifying quiet = TRUE in each function call.

svm_data <- 
  # read svm data file
  tor_read_svm_data_file(
    filepath = file.path(path, "svm_pred_results_0.03gr.csv")) 
#> Info: successfully read 27220 records from SVM file 'svm_pred_results_0.03gr.csv'
# show first 100 records
svm_data %>% head(100)

Reading in Massacre output

Alternatively, Massacre output files can be read and evaluated directly (see the flow chart for details). Functionality and description coming soon…

Data Preparation

Spectral Quality filtering

Typically the first step is to remove data with poor quality spectral fits. In the case of SVM pre-processed data, the quality probability column is svm_pred and should be filtered with a sensible cut-off (here we require at least a 75% SVM-estimated probability of a good fit).

data_filtered <- 
  svm_data %>% 
  # spectral quality filtering
  tor_filter_peptides_by_spectral_fit_quality(svm_pred > 0.75)
#> Info: kept 6961 of 27220 (25.6%) peptide measurements during spectral fit quality filtering (condition 'svm_pred > 0.75')

Recoding protein IDs

Sometimes not all protein IDs are up to date, and they need to be recoded to make sure they can be matched to database records.

data_recoded <-
  data_filtered %>% 
  tor_recode_protein_ids(file.path(path, "rename_prot.xlsx"))
#> Info: renamed 5 protein entries for 1 different proteins.

Adding metadata

Often it is necessary to add additional metadata to the data set (e.g. the times of the individual samples for later time course fitting). If this is the case, the metadata should include information on the sampling timepoints and is accepted in Excel format.

# read metadata
metadata <- read_excel(file.path(path, "metadata_CMW.xlsx"))

# show metadata
metadata
# add to data
data_w_metadata <-
  data_recoded %>% 
  tor_add_metadata(metadata, join_by = "sample")
#> Info: adding metadata to mass spec data, joining by 'sample'...8 metadata entries successfully added to 6961 data records, 0 could not be matched to metadata

Calculations

Labeling Rate, Degradation and Dissipation

The calculation of the labeling rate happens in two steps so that the intermediate data can be exported and examined. The first step is the calculation of the labeling fractions from the light and heavy signals using tor_calculate_labeled_fraction. This function returns the fraction of labeled and unlabeled peptides for each sample.
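
For orientation, here is a minimal sketch of the presumed calculation (the exact implementation is internal to tor_calculate_labeled_fraction; frac_labeled is a hypothetical helper, not a package function):

# presumed definition: labeled fraction = heavy signal over total signal
frac_labeled <- function(heavy, light) heavy / (heavy + light)
frac_labeled(heavy = 30, light = 70) # 0.3, i.e. 30% of the peptide pool is labeled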

With the labeled and unlabeled fractions known for each peptide at each timepoint, it is possible to fit an exponential to either individual peptides or all peptides in a protein to calculate a labeling rate. The function tor_calculate_label_rate thus uses the calculated labeled fractions to fit a curve describing isotope labeling over time. Peptides can be combined to return a labeling rate for the entire protein by choosing the option combine_peptides = TRUE in the function arguments.
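
As a rough illustration of the fitted curve, the sketch below assumes standard first-order label incorporation (the exact model used by tor_calculate_label_rate may differ):

# hypothetical first-order labeling model: the labeled fraction rises
# exponentially towards 1 with rate constant label_rate
label_model <- function(time, label_rate) 1 - exp(-label_rate * time)
# visualize the curve shape for a hypothetical rate of 0.05/hour
curve(label_model(x, label_rate = 0.05), from = 0, to = 100,
      xlab = "hours", ylab = "fraction labeled")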

The calculated labeling rates can then be used to calculate degradation rates and dissipation rates using the function tor_calculate_degradation_dissipation. Protein degradation is calculated as the labeling rate minus the growth rate (deg_rate = label_rate - growth_rate). Protein dissipation is calculated as the degradation rate divided by the labeling rate, and is displayed as a percent (dissipation = (deg_rate / label_rate)*100).
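
A quick numerical check of these two formulas, using hypothetical rates (not package output):

# hypothetical example: a protein labeled faster than the culture grows
label_rate  <- 0.05 # [1/hour]
growth_rate <- 0.03 # [1/hour]
label_rate - growth_rate                      # deg_rate = 0.02 [1/hour]
(label_rate - growth_rate) / label_rate * 100 # dissipation = 40%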

For this last step, it is critical to know the growth rate of the experiment. For a steady-state (chemostat) culture, the growth rate equals the dilution rate, which can be calculated from the culture volume and the media flow rate; turnoveR provides the tor_calculate_growth_params function to simplify this step.
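
As a sanity check on the numbers used below (assuming tor_calculate_growth_params implements the dilution-rate calculation, with error propagation):

# chemostat dilution rate = flow rate / culture volume
(0.46 * 60) / 894.4 # ~0.031 [1/hour], consistent with the "0.03gr" file name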

# calculate growth parameters
growth_params <- 
  tor_calculate_growth_params(
    flow_rate = 0.46 * 60, # [g/hour]
    flow_rate_se = 0.02 * 60, # estimated error [g/hour]
    volume = 894.4, # [g]
    volume_se = 20 # estimated error [g]
  )
growth_params %>% rmarkdown::paged_table()
# data calculations
data_w_calcs <- 
  data_w_metadata %>% 
  # calculate labeled fraction
  tor_calculate_labeled_fraction() %>% 
  # calculate the label rate
  tor_calculate_label_rate(
    time_col = "hours", 
    min_num_timepoints = 3,
    combine_peptides = TRUE) %>% 
  # degradation rate and dissipation
  tor_calculate_degradation_dissipation(
    growth_rate = growth_params$growth_rate,
    growth_rate_se = growth_params$growth_rate_se) 
#> Info: calculated labeled/unlabeled fraction for 6961 peptides
#> Info: processing data for 421 proteins, this may take a few seconds... 336 of the proteins could be fit to a labeling curve, 85 did not have enough time points
#> Info: calculated degradation rate and dissipation for 336 records
# show results (remove complex data column for printout)
data_w_calcs 

Visualization

The data quality can be visualized using a few basic graphing functions. To see a histogram of the labeling rates of all the proteins in an experiment (or in multiple data sets combined), use tor_plot_label_rate_hist. Similar plots could be generated for deg_rate and dissipation (see the sketch after the next code chunk).

data_w_calcs %>% tor_plot_label_rate_hist()
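
For example, a comparable histogram of the degradation rates can be built with ggplot2 directly (this assumes the deg_rate column produced by tor_calculate_degradation_dissipation; a dedicated tor_ plotting function for it is not shown in this vignette):

# histogram of degradation rates across all fitted proteins
data_w_calcs %>% 
  ggplot(aes(x = deg_rate)) + 
  geom_histogram(bins = 30)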

The function tor_plot_label_rate_error displays the residual standard error for each calculated label rate to easily evaluate what quality cutoffs might be useful.

data_w_calcs %>% tor_plot_label_rate_error()

To visualize what the labeling curves and least squares fits actually look like, use tor_plot_labeling_curves to display the curves for one or more randomly selected proteins. The argument plot_number allows the user to select the desired number of curves to display.

data_w_calcs %>% tor_plot_labeling_curves(plot_number = 4, random_seed = 123)

Data Quality

The output from the calculation function should be filtered to remove curves with missing fits (not enough data) and poor curve fits. It is always helpful to also look at the data that is getting discarded to see if something is amiss. Any filtered dataset can easily be passed to any of the plotting functions (e.g. tor_plot_labeling_curves) for a closer look.

Not enough data

Let’s take a look at some of the proteins that did not have enough data for a fit (which we defined earlier to be at least 3 separate time points).

# show all records that don't have enough data
data_w_calcs %>% 
  tor_filter_label_rate_fits(!enough_data) 
#> Info: fetching 85 of 421 entries based on filter condition '!enough_data'
# visualize some of these problematic records
data_w_calcs %>% 
  tor_filter_label_rate_fits(!enough_data) %>% 
  tor_plot_labeling_curves(random_seed = 123)
#> Info: fetching 85 of 421 entries based on filter condition '!enough_data'

Low quality fits

Let’s take a closer look at the label rate estimates of low quality fits and use plotly to make the plot interactive. Mouse over individual data points to see the protein ID.

data_w_calcs %>% 
  tor_filter_label_rate_fits(fit_rse > 0.05) %>% 
  tor_plot_label_rate_error() %>% 
  ggplotly()
#> Info: fetching 66 of 421 entries based on filter condition 'fit_rse > 0.05'

This could likewise be used with tor_plot_labeling_curves, which reveals a little more detail about why the quality is bad (and potentially allows reconsidering the exclusion of, or examining more closely, peptides that clearly behave differently from the rest of the protein). Use the mouseover to see the peptide sequences for each data point.

data_w_calcs %>% 
  tor_filter_label_rate_fits(fit_rse > 0.05) %>% 
  tor_plot_labeling_curves(random_seed = 11) %>% 
  ggplotly()
#> Info: fetching 66 of 421 entries based on filter condition 'fit_rse > 0.05'

High quality fits

Lastly, it is useful to continue with just the high quality fits to analyze the most robust part of the data set. Here we keep everything that had enough_data AND a residual standard error of the fit smaller than 5% (written as two separate statements for clarity, but they could be combined into one). We also narrow the table to just the key data columns going forward.

data_hq <- 
  data_w_calcs %>% 
  tor_filter_label_rate_fits(enough_data) %>% 
  tor_filter_label_rate_fits(
    fit_rse <= 0.05,
    select = c(matches("prot"), matches("rate"), matches("dissipation"))
  )
#> Info: fetching 336 of 421 entries based on filter condition 'enough_data'
#> Info: fetching 270 of 336 entries based on filter condition 'fit_rse <= 0.05', keeping 10 of 18 columns.

Adding Information

Most steps in this information section can be performed in arbitrary order; however, it is generally advisable to follow the flow chart to make sure the few pieces of information that build on one another are available when each function is called (if anything is missing, turnoveR will complain and point to the missing information).

Adding uniprot information

In order to get the most up-to-date protein information, all proteins for the experimental organism (here E. coli) are queried directly from the UniProt online database and matched to the mass spec data. This adds useful information including the recommended gene and protein names and the molecular weight of the protein. If the organism's UniProt taxon ID is not known yet, the tor_fetch_uniprot_species function can be helpful, as shown here.

# look for taxon ID for K12 strain
uniprot_species <- tor_fetch_uniprot_species("strain K12")
#> Info: querying uniprot database for taxa with 'strain K12' in the name... retrieved 5 records
uniprot_species
# retrieve uniprot info for the K12 strain
uniprot_data <- tor_fetch_uniprot_proteins(taxon = 83333)
#> Info: reading uniprot proteins of taxon 83333 from cached file (use read_cache = FALSE to disable)... retrieved 4497 records
uniprot_data %>% head(20)
# add the uniprot information
data_w_uniprot <-
  data_hq %>% 
  tor_add_uniprot_info(uniprot_data)
#> Info: adding uniprot information to mass spec data, joining by 'uniprot_id'...266 records joined successfully, 4 could not be matched with uniprot records
# check on the ones with missing uniprot info
data_w_uniprot %>% filter(missing_uniprot)

Adding protein counts information

The protein_sums.csv or psms.csv file should contain the protein IDs and corresponding counts for all proteins identified in each sample. It is read by tor_read_protein_counts_data in csv format and then added to the data set using tor_add_protein_counts_info, which also calculates the relative protein counts and the relative protein mass (based on the molecular weight of each protein retrieved from UniProt in the previous step). The relative protein abundance information can then be combined with the dissipation and degradation values to calculate a weighted degradation and dissipation rate for each protein. This weighted calculation uses the relative mass of each protein identified in the samples to better describe the overall cellular investment in protein turnover for each protein.

# read protein count info
protein_count <- tor_read_protein_counts_data(file.path(path, "psms.csv"))
#> Info: reading protein counts data from 'psms.csv'... read 913 records
# add to dataset
data_w_counts <- data_w_uniprot %>% 
  tor_add_protein_counts_info(protein_count)
#> Info: protein counts added and weighted rates calculated for 270/270 datasets
# look at some of the data
data_w_counts %>% 
  select(gene, prot_name, prot_rel_mass, starts_with("deg"), starts_with("diss")) %>% 
  head(10) %>% 
  rmarkdown::paged_table()
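
To make the weighting more concrete, the sketch below compares the reported weighted rate against a simple relative-mass scaling (an assumption; the exact formula is internal to tor_add_protein_counts_info, and deg_rate_weighted_check is a hypothetical column added only for this comparison):

# presumed weighting: degradation rate scaled by relative protein mass
data_w_counts %>% 
  mutate(deg_rate_weighted_check = deg_rate * prot_rel_mass) %>% 
  select(prot_name, deg_rate, prot_rel_mass, deg_rate_weighted, deg_rate_weighted_check) %>% 
  head(5)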

Adding KEGG pathway info

Coming soon…

1-step processing

The entire set of operations listed section-by-section above can also be easily performed in a single step by piping (%>%) from one operation to the next.

# base path to the vignette data
path <- file.path("vignettes", "vignette_data")

# growth params
growth_params <- 
  tor_calculate_growth_params(
    flow_rate = 0.46 * 60, # [g/hour]
    flow_rate_se = 0.02 * 60, # estimated error [g/hour]
    volume = 894.4, # [g]
    volume_se = 20 # estimated error [g]
  )

# data processing in one pipeline
data_one_pipe <-
  # read SVM file
  tor_read_svm_data_file(
    filepath = file.path(path, "svm_pred_results_0.03gr.csv")
  ) %>% 
  # spectral quality filtering
  tor_filter_peptides_by_spectral_fit_quality(svm_pred > 0.75) %>% 
  # recode protein ids
  tor_recode_protein_ids(file.path(path, "rename_prot.xlsx")) %>% 
  # add metadata
  tor_add_metadata(
    read_excel(file.path(path, "metadata_CMW.xlsx")), 
    join_by = "sample"
  ) %>% 
  # calculate labeled fraction
  tor_calculate_labeled_fraction() %>% 
  # calculate the label rate
  tor_calculate_label_rate(
    time_col = "hours", 
    min_num_timepoints = 3,
    combine_peptides = TRUE
  ) %>% 
  # degradation rate and dissipation
  tor_calculate_degradation_dissipation(
    growth_rate = growth_params$growth_rate,
    growth_rate_se = growth_params$growth_rate_se
  ) %>% 
  # focus on high quality fits
  tor_filter_label_rate_fits(enough_data & fit_rse <= 0.05) %>% 
  # add uniprot data
  tor_add_uniprot_info(
    tor_fetch_uniprot_proteins(taxon = 83333)
  ) %>% 
  # add protein count info
  tor_add_protein_counts_info(
    tor_read_protein_counts_data(file.path(path, "psms.csv"))
  )
#> Info: successfully read 27220 records from SVM file 'svm_pred_results_0.03gr.csv'
#> Info: kept 6961 of 27220 (25.6%) peptide measurements during spectral fit quality filtering (condition 'svm_pred > 0.75')
#> Info: renamed 5 protein entries for 1 different proteins.
#> Info: adding metadata to mass spec data, joining by 'sample'...8 metadata entries successfully added to 6961 data records, 0 could not be matched to metadata
#> Info: calculated labeled/unlabeled fraction for 6961 peptides
#> Info: processing data for 421 proteins, this may take a few seconds... 336 of the proteins could be fit to a labeling curve, 85 did not have enough time points
#> Info: calculated degradation rate and dissipation for 336 records
#> Info: fetching 270 of 421 entries based on filter condition 'enough_data & fit_rse <= 0.05'
#> Info: reading uniprot proteins of taxon 83333 from cached file (use read_cache = FALSE to disable)... retrieved 4497 records
#> Info: adding uniprot information to mass spec data, joining by 'uniprot_id'...266 records joined successfully, 4 could not be matched with uniprot records
#> Info: reading protein counts data from 'psms.csv'... read 913 records
#> Info: protein counts added and weighted rates calculated for 270/270 datasets

# then continue with plotting and analysis
data_one_pipe %>% tor_plot_label_rate_error() %>% ggplotly()

Analysis

Example: looking at the degradation data

To take a quick first look at the important proteins contributing to protein turnover, it is helpful to list the proteins with the highest weighted degradation rates / dissipation in the experiment. For a list of the most turned-over proteins (regardless of pool size), one would look at the highest dissipation instead.

# get the total degradation and dissipation per generation
data_w_counts %>% 
  summarize(
    total_deg_rate = sum(deg_rate_weighted, na.rm = TRUE),
    total_dissipation = sum(dissipation_weighted, na.rm = TRUE)
  ) %>% 
  rmarkdown::paged_table()
# list top proteins by dissipation
data_w_counts %>% 
  arrange(desc(dissipation)) %>% 
  select(prot_name, everything()) %>% 
  head(20)
# list top proteins by relative mass
data_w_counts %>% 
  arrange(desc(prot_rel_mass)) %>% 
  select(prot_name, everything()) %>% 
  head(20)
# list top proteins by weighted dissipation
data_w_counts %>% 
  arrange(desc(dissipation_weighted)) %>% 
  select(prot_name, everything()) %>% 
  head(20)