| Title: | Functionality for Characterizing Peaks in Mass Spectrometry in a Scan-Centric Manner |
|---|---|
| Description: | Provides functions and classes for detecting, characterizing, and integrating peaks in a scan-centric manner from direct-injection mass spectrometry data. |
| Authors: | Robert M Flight [aut, cre] |
| Maintainer: | Robert M Flight <[email protected]> |
| License: | file LICENSE |
| Version: | 0.3.65 |
| Built: | 2026-05-15 09:26:04 UTC |
| Source: | https://github.com/MoseleyBioinformaticsLab/ScanCentricPeakCharacterization |
takes a list from xmlToList for "run" and looks at whether all scans are positive, negative, or mixed
.get_scan_polarity(spectrum_list)
spectrum_list |
the list of spectra |
removes a list entry called ".attrs" from a list, and makes them first level partners
.remove_attrs(in_list)
in_list |
the list to work on |
transform to data frame
.to_data_frame(in_list)
in_list |
the list of xml nodes to work on |
add scan level info
add_scan_info(mzml_data)
mzml_data |
the MSnbase mzml data object |
returns a data.frame with:
scanIndex: the indices of the scans
scan: the number of the scan by number. This will be used to name scans.
polarity: +1 or -1 depending on if the scan is positive or negative
rtime: the retention time or injection time of the scan for direct-injection data
tic: the total intensity of the scan
rtime_lag: how long between this scan and previous scan
rtime_lead: how long between this scan and next scan
After running predict_frequency(), the following fields are added
from the information returned from frequency conversion:
mad: mean absolute deviation of residuals
frequency model coefficients: the coefficients from the
fit frequency, named whatever you named them
mz model coefficients: similar, but for the m/z model
data.frame, see Details
given an output object, filename, and zip file, write the output object to the file, and then add to the zip file
add_to_zip(object, filename, zip_file)
object |
the object to write |
filename |
the file that it should be |
zip_file |
the zip file to add to |
a directory created by tempdir is used to hold the file,
which is then added to the zip file.
calculate the area based on summing the points
area_sum_points(peak_mz, peak_intensity, zero_value = 0)
peak_mz |
the mz in the peak |
peak_intensity |
the peak intensities |
zero_value |
what value actually represents zero |
numeric
characterize peaks from points and picked peaks
characterize_peaks(peak_region, calculate_peak_area = FALSE)
peak_region |
the PeakRegion object to work on |
list
check r2
check_frequency_r2(mz_frequency_list)
mz_frequency_list |
the list of predicted frequency data.frames |
Given M/Z point data in a data.frame, create IRanges based point "regions" of
width 1, using the frequency_multiplier argument to convert from the floating
point double to an integer.
check_ranges_convert_to_regions(frequency_list, frequency_multiplier = 400)
frequency_list |
a list of with a |
frequency_multiplier |
a value used to convert to integers. |
checks that the zip file has the basic contents it should have, and that files listed in the metadata actually exist.
check_zip_file(zip_dir)
zip_dir |
the directory of the unzipped data |
default single model
choose_frequency_model_builtin(sc_mzml)
sc_mzml |
the sc_mzml object |
This is the default function to choose a single frequency and mz model. It takes the scan_info after filtering scans, and calculates the median of the square root terms, and chooses the one closest to the median value.
Please examine this function and write your own if needed. You can view the function definition using choose_frequency_model_builtin
SCmzml
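The selection rule described above (take the median of the square-root terms and pick the scan closest to it) can be sketched in base R. This is a minimal illustration, not the package's implementation: the `scan_info` table is a hypothetical stand-in, and the column names `keep` and `y.freq` and the helper `choose_median_scan()` are assumptions made for the example.

```r
# Hypothetical stand-in for scan_info after filtering; column names are assumptions.
scan_info = data.frame(
  scan = 1:6,
  keep = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE),
  y.freq = c(100, 110, 104, 120, 101, 500)
)

# Pick the kept scan whose square-root coefficient is closest to the median.
choose_median_scan = function(scan_info, term = "y.freq") {
  kept = scan_info[scan_info$keep, , drop = FALSE]
  distance = abs(kept[[term]] - median(kept[[term]]))
  kept[which.min(distance), , drop = FALSE]
}

chosen = choose_median_scan(scan_info)
```

With an odd number of kept scans the median is one of the observed values, so exactly one scan is selected; with ties, `which.min()` returns the first match.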
Given a data.frame of m/z, generate frequency values for the data.
convert_mz_frequency(mz_data, keep_all = TRUE)
mz_data |
a data.frame with |
keep_all |
keep all the variables generated, or just the original + frequency? |
The M/Z values from FTMS data do not have constant spacing between them. This produces challenges in working with ranged intervals and windows. The solution for FTMS data then is to convert them to frequency space. This is done by:
taking subsequent M/Z points
averaging their M/Z
taking the difference to get an offset value
dividing averaged M/Z by offset to generate frequency
taking subsequent differences of frequency points
keeping points with a difference in the supplied range as valid for modeling
After deciding on the valid points for modeling, each point gets an interpolated frequency value using the two averaged points to the left and right in M/Z.
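The conversion steps listed above can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's code: the helper name `mz_to_point_frequency()`, the `valid_range` argument and its default, and the simulated data (m/z proportional to one over the squared frequency) are all hypothetical. The interpolation step is not shown.

```r
# Hypothetical sketch of the listed steps; not the package's convert_mz_frequency().
mz_to_point_frequency = function(mz, valid_range = c(0.49, 0.51)) {
  n = length(mz)
  mean_mz = (mz[-1] + mz[-n]) / 2                 # average subsequent M/Z points
  offset = mz[-1] - mz[-n]                        # difference between them
  frequency = mean_mz / offset                    # averaged M/Z divided by offset
  frequency_diff = c(NA, abs(diff(frequency)))    # subsequent frequency differences
  valid = !is.na(frequency_diff) &
    frequency_diff >= valid_range[1] &
    frequency_diff <= valid_range[2]
  data.frame(
    mean_mz = mean_mz, offset = offset, frequency = frequency,
    frequency_diff = frequency_diff, valid = valid
  )
}

# Simulated points equally spaced in a "true" frequency, with m/z proportional
# to 1 / frequency^2 (an assumption used only to get uneven m/z spacing).
true_frequency = seq(2000, 1000, by = -1)
mz = (30000 / true_frequency)^2
point_frequency = mz_to_point_frequency(mz)
```

For data simulated this way, the differences between subsequent frequency values come out very close to 0.5, which is why the example range is centered there.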
list
mz_scans_to_frequency
Given a corrected SD, corrects the mean assuming that it is the result of a truncated normal distribution.
correct_mean(observed_mean, corrected_sd, fraction)
observed_mean |
the observed mean |
corrected_sd |
a corrected sd, generated by |
fraction |
the fraction of total observations |
corrected mean
https://en.wikipedia.org/wiki/Truncated_normal_distribution
correct_peak() correct_variance()
Assuming that an observed mean (intensity) and sd are from a truncated normal distribution that is truncated on one side only.
correct_peak(observed_mean, observed_sd, n_observed, n_should_observe)
observed_mean |
the observed mean |
observed_sd |
the observed sd |
n_observed |
how many observations went into this mean |
n_should_observe |
how many observations should there have been? |
data.frame, with corrected mean and sd
https://en.wikipedia.org/wiki/Truncated_normal_distribution
correct_mean() correct_variance()
correct peak height and sd
correct_peak_sd_height(original_height, list_of_heights, n_observed, n_should_observe)
original_height |
the original height estimate to correct |
list_of_heights |
the set of peak heights |
n_observed |
how many were observed |
n_should_observe |
how many should have been observed |
data.frame
Given a variance observed from a truncated normal distribution, correct it assuming that it should have had 100% of the observations.
correct_variance(observed_variance, fraction)
observed_variance |
the observed variance |
fraction |
what fraction was it observed in |
corrected variance
https://en.wikipedia.org/wiki/Truncated_normal_distribution
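The truncated-normal corrections described for correct_mean(), correct_peak(), and correct_variance() can be illustrated with the standard one-sided truncation formulas. The sketch below assumes the missing observations are the lower tail, which the text does not state; the helper names are hypothetical and this is not the package's implementation.

```r
# Hypothetical helpers; assumes the lower tail is the part that was not observed.
truncation_terms = function(fraction) {
  alpha = qnorm(1 - fraction)          # standardized truncation point
  lambda = dnorm(alpha) / fraction     # inverse Mills ratio for the kept upper tail
  list(alpha = alpha, lambda = lambda)
}

# Variance of the full distribution, given the variance of the observed part.
sketch_correct_variance = function(observed_variance, fraction) {
  if (fraction >= 1) return(observed_variance)   # nothing was truncated
  tt = truncation_terms(fraction)
  observed_variance / (1 + tt$alpha * tt$lambda - tt$lambda^2)
}

# Mean of the full distribution, given the observed mean and a corrected sd.
sketch_correct_mean = function(observed_mean, corrected_sd, fraction) {
  if (fraction >= 1) return(observed_mean)       # nothing was truncated
  tt = truncation_terms(fraction)
  observed_mean - corrected_sd * tt$lambda
}

# Usage: a peak seen in 70% of scans, with observed mean 11 and variance 2.
corrected_sd = sqrt(sketch_correct_variance(2, fraction = 0.7))
corrected_mean = sketch_correct_mean(11, corrected_sd, fraction = 0.7)
```

Under this assumption the corrected standard deviation is larger than the observed one and the corrected mean is lower, because the unobserved low values are accounted for.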
Given a point-point spacing and a frequency range, create IRanges based regions
of specified width. Overlapping sliding regions can be created by specifying
a region_size bigger than delta; adjacent tiled regions can be created
by specifying a region_size == delta.
create_frequency_regions(point_spacing = 0.5, frequency_range = NULL, n_point = 10, delta_point = 1, multiplier = 500)
point_spacing |
how far away are subsequent points. |
frequency_range |
the range of frequency to use |
n_point |
how many points you want to cover |
delta_point |
the step size between the beginning of each subsequent region |
multiplier |
multiplier to convert from frequency to integer space |
For Fourier-transform mass spec, points are equally spaced in frequency space, which will lead to unequal spacing in M/Z space. Therefore, we create regions using the point-point differences in frequency space.
What will be returned is an IRanges object, where the widths are constantly
increasing over M/Z space.
IRanges
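The difference between sliding and tiled regions can be sketched in base R. This is only an illustration: how the arguments combine (region width as `n_point * point_spacing`, step as `delta_point * point_spacing`) is an assumption, the helper `sketch_frequency_regions()` is hypothetical, and a plain data.frame of start and end positions stands in for the IRanges object.

```r
# Hypothetical base R sketch; a data.frame stands in for an IRanges object.
sketch_frequency_regions = function(frequency_range, point_spacing = 0.5,
                                    n_point = 10, delta_point = 1,
                                    multiplier = 500) {
  region_size = n_point * point_spacing      # region width in frequency units
  step_size = delta_point * point_spacing    # distance between region starts
  starts = seq(frequency_range[1], frequency_range[2] - region_size,
               by = step_size)
  # Multiply and round to move from floating point frequency to integer space.
  data.frame(
    start = round(starts * multiplier),
    end = round((starts + region_size) * multiplier)
  )
}

# Step smaller than the region width: overlapping, sliding regions.
sliding = sketch_frequency_regions(c(1000, 1010))
# Step equal to the region width: adjacent, tiled regions.
tiled = sketch_frequency_regions(c(1000, 1010), delta_point = 10)
```

In this sketch tiled regions share a boundary value; with inclusive integer ranges one would typically shift the end by one to avoid a single-position overlap.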
given a mantissa and exponent, returns the actual value as a numeric
create_value(mantissa, exponent)
mantissa |
the base part of the number |
exponent |
the exponent part |
numeric
Given a MasterPeakList object and the MultiScansPeakList that generated it, correct the m/z values using offset predictions
default_correct_offset_function(master_peak_list, multi_scan_peaklist, min_scan = 0.1)
master_peak_list |
the MasterPeakList object of correspondent peaks |
multi_scan_peaklist |
the MultiScansPeakList to be corrected |
min_scan |
what is the minimum number of scans a peak should be in to be used for correction. |
list
The offset predictor using loess
default_offset_predict_function(model, x)
model |
the model to use |
x |
the new values |
numeric
There may be good reasons to turn the logging off after it's been turned on. This basically tells the package that the logger isn't available.
disable_logging()
Choose to enable logging, to a specific file if desired.
enable_logging(log_file = NULL, memory = FALSE)
log_file |
the file to log to |
memory |
provide memory logging too? Only available on Linux and MacOS |
Uses the logger package under the hood, which is suggested in the dependencies. With logging enabled, you can see when things start and stop, and exactly what has been done, without needing to write messages to the console. It is especially useful when errors occur but are hard to see; in that case, adding "memory" logging can show whether you are running out of memory.
Default log file has the pattern:
YYYY.MM.DD.HH.MM.SS_ScanCentricPeakCharacterization_run.log
Often we want to transform a number into its exponential representation, having the number itself and the number of decimal places. This function provides that functionality.
extract(x)
x |
the number to extract the parts from |
list
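The round trip between a number and its mantissa and exponent, as described for extract() and create_value(), can be sketched as below. The sketch assumes a base-10 representation, which the text does not state explicitly; the `sketch_` function names are hypothetical.

```r
# Hypothetical base-10 decomposition; not the package's extract().
sketch_extract = function(x) {
  if (x == 0) return(list(mantissa = 0, exponent = 0))
  exponent = floor(log10(abs(x)))
  list(mantissa = x / 10^exponent, exponent = exponent)
}

# Hypothetical inverse; mirrors the description of create_value().
sketch_create_value = function(mantissa, exponent) {
  mantissa * 10^exponent
}

parts = sketch_extract(0.00123)
rebuilt = sketch_create_value(parts$mantissa, parts$exponent)
```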
Given a Thermo ".raw" file, attempts to extract the "method" definition from a translated hexdump of the file.
extract_raw_method(in_file, output_type = "data.frame")
in_file |
The Thermo raw file to extract |
output_type |
string, data.frame or json |
string or data.frame
built in filter scan function
filter_scans_builtin(sc_mzml)
sc_mzml |
the sc_mzml object |
This is the built in filtering and removing outliers function. It is based on the Moseley group's normal samples and experience. However, it does not reflect everyone's experience and needs. We expect that others have different use cases and needs, and therefore they should create their own function and use it appropriately.
Please examine this function and write your own as needed.
It must take an SCMzml object, work on the scan_info slot,
and then create a column with the name "keep" denoting which scans to keep.
To view the current definition, you can do filter_scans_builtin
SCmzml
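A custom filter following the contract above (take the object, work on scan_info, add a "keep" column, return the object) might look like the sketch below. A plain list stands in for the SCMzml object, and the thresholds and the function name `my_filter_scans()` are hypothetical.

```r
# A plain list stands in for the SCMzml object (assumption for illustration).
mock_sc_mzml = list(
  scan_info = data.frame(
    scan = 1:4,
    rtime = c(5, 60, 120, 900),
    tic = c(1e3, 5e6, 6e6, 4e6)
  )
)

# Hypothetical custom filter: keep scans inside a retention-time window
# whose total intensity is above a floor.
my_filter_scans = function(sc_mzml) {
  scan_info = sc_mzml$scan_info
  scan_info$keep = scan_info$rtime >= 30 &
    scan_info$rtime <= 600 &
    scan_info$tic > 1e5
  sc_mzml$scan_info = scan_info
  sc_mzml
}

filtered = my_filter_scans(mock_sc_mzml)
```

A function of this shape would presumably be passed as `f_function` to generate_filter_scan_function(), though that wiring is not shown here.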
Given some regions and point_regions, find the regions that actually should contain real data. See details for an explanation of what is considered real.
find_signal_regions(regions, point_regions_list, region_percentile = 0.99, multiplier = 1.5, n_point_region = 2000)
regions |
the regions we want to query |
point_regions_list |
the individual points |
region_percentile |
the cumulative percentile cutoff to use |
multiplier |
how much above base quantiles to use (default = 1.5) |
n_point_region |
how many points make up a large segment to do percentile on? |
IRanges
Given a set of frequency points in a data.frame, create IRanges based point "regions"
of width 1, using the multiplier to convert from a floating point double
to an integer
frequency_points_to_frequency_regions(frequency_data, frequency_variable = "frequency", multiplier = 400)
frequency_data |
a |
frequency_variable |
which column is the |
multiplier |
value used to convert to integers |
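The effect of the multiplier on the integer conversion can be seen in a small base R sketch. It assumes the conversion rounds to the nearest integer, which the text does not confirm, and uses a data.frame with equal start and end in place of width-1 IRanges.

```r
# Three frequency points; the first two are only 0.001 apart.
frequency = c(1000.0000, 1000.0010, 1000.5000)
multiplier = 400

# Multiply, then round (assumed) to get integer positions.
point_position = round(frequency * multiplier)

# Width-1 regions: start and end are the same integer position.
point_regions = data.frame(start = point_position, end = point_position)
```

Here the first two points fall on the same integer position, so the multiplier effectively sets the resolution at which nearby points are treated as identical.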
given the peak, returns the location and intensity
get_fitted_peak_info(possible_peak, use_loc = "mz", w = NULL, addend = 1e-08, calculate_peak_area = FALSE)
possible_peak |
data.frame of mz, intensity and log intensity |
use_loc |
which field to use for locations, default is "mz" |
w |
the weights to use for the points |
addend |
how much was added to the peak intensity |
calculate_peak_area |
should the area of the peak be calculated too? |
list
extract mzML header
get_mzml_header(mzml_file)
mzml_file |
the mzML file to get the header from |
get mzML metadata
get_mzml_metadata(mzml_file)
mzml_file |
the mzML file to get metadata from |
figures out which metadata function to run, and returns back the metadata generated by it.
get_raw_ms_metadata(in_file)
in_file |
the file to use |
list
import json from a file correctly given some things where things get written differently
import_json(json_file)
json_file |
the json file to read |
list
function to import mzml mass spec data in a way that provides what we need to work
with it. mzml_data should be the full path to the data.
import_sc_mzml(mzml_data, ms_level = 1)
mzml_data |
the mzml mass spec file to import |
ms_level |
which MS-level data to import |
MSnbase
Given a directory of characterized samples, attempts to determine which peaks may be standards or contaminants that should be removed after assignment.
indicate_standards_contaminents(zip_dir, file_pattern = ".zip", blank_pattern = "^blank", save_dir = NULL, conversion_factor = 400, progress = TRUE)
zip_dir |
which directories to look for files within |
file_pattern |
what files are we actually using |
blank_pattern |
regex indicating that a sample may be a blank |
save_dir |
where to save the files (default is to overwrite originals) |
conversion_factor |
how much to multiply frequencies by |
progress |
should progress messages be displayed? |
For each sample, the scan level frequencies are read in and converted to ranges, and then compared with tiled ranges over the whole frequency range. For those ranges that have 90 to 110% of scan level peaks in ALL blanks, and have 10 to 110% of scan level peaks in at least N-sample - 1, we consider a possible standard or contaminant. The peak is marked so that it can be removed by filtering out its assignments later.
NULL nothing is returned, files are overwritten
initialize metadata from mzML
initialize_metadata_from_mzml(zip_dir, mzml_file)
zip_dir |
the directory containing unzipped data |
mzml_file |
the mzML file to extract metadata from |
initialize metadata
initialize_zip_metadata(zip_dir)
zip_dir |
the temp directory that represents the final zip |
provides the area integration for the peak that fits the parabolic model
integrate_model(model_mz, model_coeff, n_point = 100, log_transform = "log")
model_mz |
the mz values for the model peak |
model_coeff |
the model of the peak |
n_point |
how many points to use for integration |
log_transform |
what kind of transform was applied |
numeric
provides ability to calculate the area on the sides of a peak that are not caught by the parabolic model assuming a triangle to each side of the parabola
integrate_sides(peak_mz, peak_int, full_peak_loc, model_peak_loc)
peak_mz |
the mz in the peak |
peak_int |
the intensity in the peak |
full_peak_loc |
what defines all of the peak |
model_peak_loc |
what defined the peak fitting the parabolic model |
numeric
gives the area of the peak based on integrating the model bits and the sides
integration_based_area(mz_data, int_data, full_peak_loc, model_peak_loc, model_coeff, n_point = 100, log_transform = "log")
mz_data |
peak mz values |
int_data |
peak intensity values |
full_peak_loc |
indices defining the full peak |
model_peak_loc |
indices defining the model peak |
model_coeff |
the model of the peak |
n_point |
number of points for integration of the model section |
log_transform |
which log transformation was used |
takes json representing a PeakList object, and generates the data.frame version
json_2_peak_list(json_string, in_var = "Peaks")
json_string |
the json to convert |
in_var |
the top level variable containing the "Peaks" |
tbl_df
Given a json file or list of lists, return a data.frame with the most important bits of the data.
json_mzML_2_df(in_file)
in_file |
the file to read from |
data.frame
lists_2_json
lists_2_json(lists_to_save, zip_file = NULL, digits = 8, temp_dir = tempfile(pattern = "json"))
lists_to_save |
the set of lists to create the json from |
zip_file |
should the JSON files be zipped into a zip file? Provide the zip file name |
digits |
how many digits to use for the JSON representation |
temp_dir |
temp directory to write the JSON files to |
character
given a zip and a metadata file, load it and return it
load_metadata(zip_dir, metadata_file)
zip_dir |
the directory of the unzipped data |
metadata_file |
the metadata file |
list
Given a loess model, creates a data.frame suitable for plotting via ggplot2
loess_to_df(loess_model)
loess_model |
the model object generated by loess |
data.frame
Logs the amount of memory being used to a log file if it is available, and generates warnings if the amount of RAM hits zero.
log_memory()
If a log_appender is available, logs the given message at the info level.
log_message(message_string)
message_string |
the string to put in the message |
performs a log-transform while adding a small value to the data based on finding the smallest non-zero value in the data
log_with_min(data, min_value = NULL, order_mag = 3, log_fun = log)
data |
the data to work with |
min_value |
the minimum value |
order_mag |
how many orders of magnitude smaller should min value be? |
log_fun |
what log function to use for the transformation |
matrix
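One way to read the description above is that the added value is the smallest non-zero entry scaled down by `order_mag` orders of magnitude. The sketch below implements that reading in base R; it is an assumption about the behavior, and `sketch_log_with_min()` is a hypothetical name.

```r
# Hypothetical reading of the description; not the package's log_with_min().
sketch_log_with_min = function(data, min_value = NULL, order_mag = 3,
                               log_fun = log) {
  if (is.null(min_value)) {
    # Smallest non-zero value, made order_mag orders of magnitude smaller.
    min_value = min(data[data > 0]) / 10^order_mag
  }
  log_fun(data + min_value)
}

# A small intensity matrix containing a zero.
intensity = matrix(c(0, 10, 100, 1000), nrow = 2)
logged = sketch_log_with_min(intensity, log_fun = log10)
```

Adding the small value keeps zero entries finite after the transform while barely changing the non-zero ones.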
export the list metadata to a json string
meta_export_json(meta_list)
meta_list |
a list of metadata |
use the derivative of the parabolic equation to find the peak center, and then put the center into the equation to find the intensity at that point.
model_peak_center_intensity(x, coefficients)
x |
the x-values to use (non-centered) |
coefficients |
the model coefficients generated from centered model |
The coefficients are generated using the linear model:
$y = a + bx + cx^2$.
The derivative of this is:
$y' = b + 2cx$.
The peak of a parabola is where the derivative is zero, which gives $x = -b / (2c)$.
We can use this to derive where the center of the peak is, and then put the center value back into the equation to get the intensity.
numeric
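The derivative-based center calculation can be checked on simulated data. The sketch below assumes a quadratic model in centered x, fitted to log-intensity with `lm()`; the Gaussian peak, its parameters, and the variable names are invented for the example, and this is not the package's own fitting code.

```r
# Simulated Gaussian peak; on the log scale it is exactly a parabola.
mz = seq(199.998, 200.004, by = 0.0005)
true_center = 200.0012
log_intensity = log(1e5) - (mz - true_center)^2 / (2 * 0.001^2)

# Fit log-intensity against centered m/z: y = a + b * x + c * x^2.
center_offset = mean(mz)
centered = mz - center_offset
fit = lm(log_intensity ~ centered + I(centered^2))
coefs = unname(coef(fit))

# Setting the derivative b + 2 * c * x to zero gives the vertex.
vertex = -coefs[2] / (2 * coefs[3])

# Undo the centering for the location, and evaluate the model at the vertex.
peak_center = vertex + center_offset
peak_log_intensity = coefs[1] + coefs[2] * vertex + coefs[3] * vertex^2
```

Centering the x-values before fitting keeps the quadratic term numerically well behaved when m/z values are large relative to the peak width.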
Given a query, and either two values of M/Z and two values of frequency or a previously generated model, return a data.frame with the predicted value, and the slope and the intercept so the model can be re-used later for other points when needed.
mz_frequency_interpolation(mz_query, mz_values = NULL, frequency_values = NULL, model = NULL)
mz_query |
the M/Z value to fit |
mz_values |
two M/Z values |
frequency_values |
two frequency values |
model |
a model to use instead of actual values |
data.frame with predicted_value, intercept, and slope
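A two-point linear interpolation that returns the prediction along with a reusable slope and intercept can be sketched as below. The form of the `model` argument (here, the data.frame returned by an earlier call) is an assumption, and `sketch_mz_frequency_interpolation()` is a hypothetical name.

```r
# Hypothetical two-point interpolation; not the package's implementation.
sketch_mz_frequency_interpolation = function(mz_query, mz_values = NULL,
                                             frequency_values = NULL,
                                             model = NULL) {
  if (is.null(model)) {
    # Line through the two supplied (m/z, frequency) points.
    slope = diff(frequency_values) / diff(mz_values)
    intercept = frequency_values[1] - slope * mz_values[1]
  } else {
    # Reuse a previously generated slope and intercept.
    slope = model$slope
    intercept = model$intercept
  }
  data.frame(
    predicted_value = intercept + slope * mz_query,
    intercept = intercept,
    slope = slope
  )
}

first = sketch_mz_frequency_interpolation(100.5, c(100, 101), c(2000, 1990))
reused = sketch_mz_frequency_interpolation(100.25, model = first)
```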
Given a multi-scan data.frame of m/z, generate frequency values for the data.
mz_scans_to_frequency(mz_df_list, frequency_fit_description, mz_fit_description, ...)
mz_df_list |
a list of data.frame with at least |
frequency_fit_description |
the exponentials to use in fitting the frequency ~ mz model |
mz_fit_description |
the exponentials to use in fitting the mz ~ frequency model |
... |
other parameters for |
list
convert_mz_frequency
given an mzML file, create the initial zip file containing the zipped mzML, metadata.json, and mzml_metadata.json. This zip file is what will be operated on by anything that accesses files, so that our interface is consistent.
mzml_to_zip(mzml_file, out_file)
mzml_file |
the mzML file to zip up |
out_file |
the directory to save the zip file |
calculates the coefficients of a parabolic fit (y = x + x^2) of x to y
parabolic_fit(x, y, w = NULL)
x |
the x-values, independent |
y |
the y-values, dependent |
w |
weights |
list
takes a PeakList object, and generates a json version
peak_list_2_json(peak_list)
peak_list |
a data.frame or tbl_df to convert |
json_string
calculate r2
predicted_frequency_r2(mz_frequency_df)
mz_frequency_df |
the data.frame with predicted frequencies |
When raw files are copied, we also generate metadata about their original and new locations, along with some other useful information. We would like to capture it and keep it alongside the metadata from the mzml file. So, given a list of mzml files and a location for the raw files, this function creates metadata json files for the mzml files.
raw_metadata_mzml(mzml_files, raw_file_loc, recursive = TRUE)
mzml_files |
the paths to the mzml files |
raw_file_loc |
the directory holding raw files and json metadata files |
recursive |
should we go recursively down the directories or not (default = TRUE) |
Given a previously generated zip file of characterized peaks where the offsets on each peak need to be changed, this function takes the zip file, adjusts the offsets, and writes the file back out.
recalculate_offsets(in_zip, offset = 2, out_file = in_zip)
in_zip |
the zip file to work with |
offset |
the offset to use |
out_file |
the file to write to (optional) |
Given a data.frame or character vector of files to run characterization on, processes them in sequence, to a particular saved location.
run_mzml_list(mzml_files, json_files = NULL, progress = TRUE, save_loc = ".", ...)
mzml_files |
the list of mzML files to use |
json_files |
the list of corresponding json meta-data files |
progress |
whether to give messages about the progress of things |
save_loc |
where the files should be saved |
... |
other parameters for |
list
determine sample run time
sample_run_time(zip, units = "m")
zip |
the zip object you want to use |
units |
what units should the run time be in? (s, m, h) |
data.frame with sample, start and end time
make a new SCZip
sc_zip(in_file, mzml_meta_file = NULL, out_file = NULL, load_raw = TRUE, load_peak_list = TRUE)
in_file |
the file to use (either .zip or .mzML) |
mzml_meta_file |
metadata file (.json) |
out_file |
the file to save to at the end |
load_raw |
logical to load the raw data |
load_peak_list |
to load the peak list if it exists |
SCZip
The ScanCentricPeakCharacterization package provides several classes and functions for working with direct injection, high-resolution mass spectrometry data.
Peak characterization control
Peak characterization associates data with the SCZip,
SCPeakRegionFinder, and controls their execution.
found_peaks: peaks found by a function
id: a holder for the ID of the sample
frequency_fit_description: the model for conversion to frequency
mz_fit_description: the model for converting back to m/z
calculate_peak_area: whether to calculate peak area or not
sc_peak_region_finder: the peak finder object
sc_zip: the SCZip that represents the final file
in_file: the input file
metadata_file: the metadata file
out_file: where everything gets saved
temp_loc: where intermediates get saved
load_file()
Loads the mzml data into the SCZip
SCCharacterizePeaks$load_file()
filter_scans()
Filter the scans in data.
SCCharacterizePeaks$filter_scans()
choose_frequency_model()
Choose the single frequency model.
SCCharacterizePeaks$choose_frequency_model()
prepare_mzml_data()
Prepare the mzml data.
SCCharacterizePeaks$prepare_mzml_data()
set_frequency_fit_description()
Set the frequency fit description
SCCharacterizePeaks$set_frequency_fit_description(frequency_fit_description)
frequency_fit_description: the frequency model description
set_mz_fit_description()
Set the mz fit description
SCCharacterizePeaks$set_mz_fit_description(mz_fit_description)
mz_fit_description: the m/z model description
generate_filter_scan_function()
Sets the scan filtering and check for outlier function.
SCCharacterizePeaks$generate_filter_scan_function( rtime = NA, y.freq = NA, f_function = NULL )
rtime: retention time limits of scans to keep
y.freq: y-frequency coefficient limits of scans to keep (NA)
f_function: a full function to set as the filtering function
generate_choose_frequency_model_function()
Sets the function for choosing a single frequency model
SCCharacterizePeaks$generate_choose_frequency_model_function(f_function = NULL)
f_function: the function for choosing a single model
predict_frequency()
Run frequency prediction
SCCharacterizePeaks$predict_frequency()
check_frequency_model()
Check the frequency model
SCCharacterizePeaks$check_frequency_model()
get_frequency_data()
Get the frequency data from the SCMzml bits
SCCharacterizePeaks$get_frequency_data()
scan_info()
Get the SCMzml$scan_info out
SCCharacterizePeaks$scan_info()
find_peaks()
Do the peak characterization without saving
SCCharacterizePeaks$find_peaks(stop_after_initial_detection = FALSE)
stop_after_initial_detection: should it stop after the initial peak finding
summarize()
Generates the JSON output summary.
SCCharacterizePeaks$summarize()
save_peaks()
Saves the peaks and JSON to the temp file
SCCharacterizePeaks$save_peaks()
write_zip()
Write the zip file
SCCharacterizePeaks$write_zip()
run_all()
Runs all of the pieces for peak characterization in order
SCCharacterizePeaks$run_all( filter_scan_function = NULL, choose_frequency_model_function = NULL )
filter_scan_function: the scan filtering function
choose_frequency_model_function: the function for choosing a frequency model
prep_data()
Loads and preps the data for characterization
SCCharacterizePeaks$prep_data()
add_regions()
Adds initial regions for finding real peak containing regions
SCCharacterizePeaks$add_regions()
run_splitting()
Does initial region splitting and peak finding in scans
SCCharacterizePeaks$run_splitting()
new()
Creates a new SCCharacterizePeaks class
SCCharacterizePeaks$new(
in_file,
metadata_file = NULL,
out_file = NULL,
temp_loc = tempfile("scpcms"),
frequency_fit_description = NULL,
mz_fit_description = NULL,
filter_remove_outlier_scans = NULL,
choose_single_frequency_model = NULL,
sc_peak_region_finder = NULL,
calculate_peak_area = FALSE
)
in_file: the mass spec data file to use (required)
metadata_file: a json metadata file (optional)
out_file: where to save the final zip container
temp_loc: a specified temporary location
frequency_fit_description: mz -> frequency model
mz_fit_description: frequency -> mz model
filter_remove_outlier_scans: function for scan filtering
choose_single_frequency_model: function to choose a single frequency model
sc_peak_region_finder: a blank SCPeakRegionFinder to use instead of the default
calculate_peak_area: should peak areas be returned as well as height?
clone()
The objects of this class are cloneable with this method.
SCCharacterizePeaks$clone(deep = FALSE)
deep: Whether to make a deep clone.
## Not run:
lipid_sample = system.file("extdata", "lipid_example.mzML", package = "ScanCentricPeakCharacterization")
sc_char = SCCharacterizePeaks$new(lipid_sample)

# prep data and check model
library(ggplot2)
library(patchwork)
sc_char$load_file()
sc_char$generate_filter_scan_function()
sc_char$generate_choose_frequency_model_function()
sc_char$prepare_mzml_data()
sc_char$check_frequency_model()

# run characterization
save_loc = "test.zip"
sc_char = SCCharacterizePeaks$new(lipid_sample, out_file = save_loc)
sc_char$run_all()
## End(Not run)
mzML mass spectrometry data container with some useful methods.
Provides our own container for mzML data, and does conversion to frequency, filtering scans, choosing a single frequency regression model, and generating the frequency data for use in the peak characterization.
mzml_file: the mzml file location
mzml_metadata: metadata from an external json file
mzml_data: the actual mzml data from MSnbase
mzml_df_data: a list of data.frames of the data
scan_range: the range of scans to be used
rtime_range: the range of retention times to keep
mz_range: the mz range to use
scan_info: data.frame of scan information
remove_zero: should zero intensity data points be removed?
frequency_fit_description: the model for m/z -> frequency
mz_fit_description: the model for going from frequency -> m/z
frequency_coefficients: the coefficients for the frequency model
mz_coefficients: the coefficients for the m/z model
ms_level: which MS level will we be using from the mzml file?
memory_mode: how will the mzml data be worked with to start, inMemory or onDisk?
difference_range: how wide to consider adjacent frequency points as good
choose_frequency_model_function: where the added model selection function will live
filter_scan_function: where the added filter scan function will live.
choose_single_frequency_model: function to choose a single frequency model
import_mzml()
import the mzml file defined
SCMzml$import_mzml( mzml_file = self$mzml_file, ms_level = self$ms_level, memory_mode = self$memory_mode )
mzml_file: what file are we reading in?
ms_level: which ms level to import (default is 1)
memory_mode: use inMemory or onDisk mode
extract_mzml_data()
get the mzml data into data.frame form so we can use it
SCMzml$extract_mzml_data(remove_zero = self$remove_zero)
remove_zero: whether to remove zero intensity points or not
predict_frequency()
predict frequency and generate some summary information. This does regression of frequency ~ m/z for each scan separately.
SCMzml$predict_frequency( frequency_fit_description = self$frequency_fit_description, mz_fit_description = self$mz_fit_description )
frequency_fit_description: the regression model definition
mz_fit_description: the regression model definition
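As a rough illustration of the per-scan regression that `predict_frequency()` describes, the sketch below fits frequency against powers of m/z using the default `frequency_fit_description` listed for `SCMzml$new()`. It assumes each named value is an exponent applied to m/z, and it uses simulated data with made-up coefficients. It is not the package's implementation.

```r
# Illustration only: simulate one "scan" and fit frequency ~ powers of m/z.
# ASSUMPTION: each named value in the description is an exponent applied to m/z.
frequency_fit_description <- c(a.freq = 0, x.freq = -1, y.freq = -1/2, z.freq = -1/3)

mz <- seq(200, 1600, length.out = 200)

# One predictor column per exponent; mz^0 acts as the intercept column.
design <- sapply(frequency_fit_description, function(p) mz^p)

# Made-up coefficients used only to simulate frequencies (not instrument values).
true_coef <- c(a.freq = 0.5, x.freq = 100, y.freq = 3e4, z.freq = 50)
frequency <- drop(design %*% true_coef)

# No extra intercept is added because the mz^0 column already provides one.
fit <- lm.fit(design, frequency)
fit$coefficients
```

In a real workflow one such fit would be made for every scan, which is why the coefficients end up as per-scan columns that later steps can filter on.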
convert_to_frequency()
actually do the conversion of m/z to frequency
SCMzml$convert_to_frequency()
choose_frequency_model()
choose a frequency model using the previously added function
SCMzml$choose_frequency_model()
generate_choose_frequency_model_function()
generate a frequency model choosing function and attach it
SCMzml$generate_choose_frequency_model_function(f_function = NULL)
f_function: the function you want to pass in
Creates a new function that accesses the scan_info slot of an SCMzml object
after conversion to frequency space, and chooses a single model based on the information
there.
filter_scans()
filter the scans using the previously added function
SCMzml$filter_scans()
generate_filter_scan_function()
generate a filter function and attach it
SCMzml$generate_filter_scan_function( rtime = NA, y.freq = NA, f_function = NULL )
rtime: retention time limits of scans to keep (NA)
y.freq: y-frequency coefficient limits of scans to keep (NA)
f_function: a full function to set as the filtering function
Creates a new function that accesses the scan_info slot of
an SCMzml object, filters the scans by their retention-time and
y-frequency coefficients, tests for outliers in the y-frequency
coefficients, and denotes which scans will be kept for further
processing.
NA means no filtering will be done. One-sided limits, e.g. (NA, 10) or (10, NA),
filter to values <= 10 or >= 10, respectively.
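The limit semantics described for `generate_filter_scan_function()` can be sketched in base R as follows. The helper `within_limits()` and the example `scan_info` values are hypothetical, and the boxplot-based outlier check is an assumption, since the documentation does not say which outlier test is used.

```r
# Hypothetical helper (not part of the package): apply a c(lower, upper) limit
# where NA on either side means "no bound on this side".
within_limits <- function(x, limits = NA) {
  if (all(is.na(limits))) {
    return(rep(TRUE, length(x)))  # NA: no filtering at all
  }
  lower <- if (is.na(limits[1])) -Inf else limits[1]
  upper <- if (is.na(limits[2])) Inf else limits[2]
  x >= lower & x <= upper
}

# Made-up stand-in for the scan_info data.frame.
scan_info <- data.frame(
  scan = 1:6,
  rtime = c(2, 5, 8, 10, 12, 15),
  y.freq = c(100.1, 100.2, 99.9, 100.0, 100.3, 250.0)
)

keep_rtime <- within_limits(scan_info$rtime, c(NA, 10))  # keeps rtime <= 10
keep_yfreq <- within_limits(scan_info$y.freq, NA)        # no y.freq limits

# ASSUMPTION: boxplot-style outlier detection; the package may use another test.
candidates <- scan_info$y.freq[keep_rtime & keep_yfreq]
outlier_values <- grDevices::boxplot.stats(candidates)$out
scan_info$keep <- keep_rtime & keep_yfreq & !(scan_info$y.freq %in% outlier_values)
scan_info
```

Marking scans with a logical `keep` column, rather than dropping rows, leaves the full scan record available for later inspection.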
check_frequency_model()
check how well a given frequency model works for this data
SCMzml$check_frequency_model(scan = NULL, as_list = FALSE)
scan: which scan to show predictions for
as_list: whether plots should be returned as a single plot or a list of plots
get_instrument()
get instrument data from associated mzml file metadata
SCMzml$get_instrument()
get_frequency_data()
get the frequency data to go into the next steps of analysis.
SCMzml$get_frequency_data()
new()
SCMzml$new( mzml_file, frequency_fit_description = c(a.freq = 0, x.freq = -1, y.freq = -1/2, z.freq = -1/3), mz_fit_description = c(a.mz = 0, x.mz = -1, y.mz = -2, z.mz = -3), metadata_file = NULL, scan_range = NULL, rtime_range = NULL, mz_range = NULL, remove_zero = FALSE, ms_level = 1, memory_mode = "inMemory" )
mzml_file: the file to load and use
frequency_fit_description: a description of the regression model for frequency ~ m/z
mz_fit_description: a description of the regression model for m/z ~ frequency
metadata_file: a metadata file generated by ...
scan_range: which scans can be used for analysis
rtime_range: the retention time to use for scans
mz_range: what m/z range to use
remove_zero: should zero intensity data be removed?
ms_level: what MS level should be extracted (default is 1)
memory_mode: what memory mode should MSnbase be using (inMemory or onDisk)
clone()
The objects of this class are cloneable with this method.
SCMzml$clone(deep = FALSE)
deep: Whether to make a deep clone.
choose_single_frequency_model_default()
## Not run:
lipid_sample = system.file("extdata", "lipid_example.mzML", package = "ScanCentricPeakCharacterization")
## End(Not run)
R6 Peak Region Finder
Manages all of the steps needed to find the peaks in the regions.
run_time: how long did the process take
start_time: when did we start
stop_time: when did we stop
peak_regions: SCPeakRegions object
sliding_region_size: how big are the sliding regions in data points
sliding_region_delta: how much space between sliding region starts
quantile_multiplier: how much to multiply quantile based cutoff by
n_point_region: how many points are there in the big tiled regions for quantile based cutoff
tiled_region_size: how wide are the tiled regions in data points
tiled_region_delta: how far in between each tiled region
region_percentile: ??
peak_method: what method to extract peak center, height, area, etc
min_points: how many points wide does a peak have to be to get characterized
sample_id: what sample are we processing
n_zero_tiles: how many zero count tiled regions split up a region into multiple peaks?
zero_normalization: do we want to pretend to do normalization
calculate_peak_area: should peak area be calculated as well?
add_regions()
Add the sliding and tiled regions
SCPeakRegionFinder$add_regions()
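To make the size and delta parameters concrete, the sketch below builds sliding and tiled windows over a handful of data points in base R. The helper `make_regions()` is hypothetical, and plain integer indices stand in for whatever range objects the package builds internally.

```r
# Hypothetical helper (not part of the package): start/end indices of windows
# that are `size` points wide and whose starts are `delta` points apart.
make_regions <- function(n_points, size, delta) {
  starts <- seq(from = 1, to = n_points - size + 1, by = delta)
  data.frame(start = starts, end = starts + size - 1)
}

n_points <- 20  # made-up number of data points

# Defaults from SCPeakRegionFinder$new(): sliding size 10, delta 1; tiled size 1, delta 1.
sliding <- make_regions(n_points, size = 10, delta = 1)
tiled <- make_regions(n_points, size = 1, delta = 1)

head(sliding, 3)  # overlapping windows: 1-10, 2-11, 3-12
nrow(tiled)       # one non-overlapping tile per data point
```

With a delta smaller than the size, the sliding windows overlap, while the tiled windows here partition the points without overlap.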
reduce_sliding_regions()
Find the regions most likely to contain real signal
SCPeakRegionFinder$reduce_sliding_regions()
split_peak_regions()
Split up signal regions by peaks found
SCPeakRegionFinder$split_peak_regions( use_regions = NULL, stop_after_initial_detection = FALSE )
use_regions: an index of the regions we want to split up
stop_after_initial_detection: should it do full characterization or stop
remove_double_peaks_in_scans()
Checks each region for two peaks with the same scan number and removes them. Any region left with zero peaks is also removed.
SCPeakRegionFinder$remove_double_peaks_in_scans()
normalize_data()
Normalize the intensity data
SCPeakRegionFinder$normalize_data(which_data = "both")
which_data: raw, characterized, or both (default)
find_peaks_in_regions()
Find the peaks in the regions.
SCPeakRegionFinder$find_peaks_in_regions()
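One common way to estimate a peak's center and height from a few points is a weighted quadratic fit of log-intensity, sketched below on a simulated peak. Whether the `lm_weighted` peak method does exactly this is an assumption, and all data values here are made up.

```r
# Simulated Gaussian-shaped peak sampled at 7 points (made-up values).
true_center <- 100.3
true_height <- 5000
peak_width <- 0.8
x <- 100 + seq(-1.5, 1.5, by = 0.5)
intensity <- true_height * exp(-(x - true_center)^2 / (2 * peak_width^2))

min_points <- 4  # default minimum number of points for a peak
stopifnot(length(x) >= min_points)

# Center x before fitting to keep the quadratic fit numerically stable.
x_mean <- mean(x)
xc <- x - x_mean

# ASSUMPTION: weights equal to intensity, so high points dominate the fit.
fit <- lm(log(intensity) ~ xc + I(xc^2), weights = intensity)
b <- unname(coef(fit))

# Vertex of the fitted parabola gives the center; its value gives log-height.
center <- x_mean - b[2] / (2 * b[3])
height <- exp(b[1] - b[2]^2 / (4 * b[3]))
c(center = center, height = height)
```

Because the log of a Gaussian is exactly quadratic, the fit recovers the simulated center and height here; real data would carry noise into both estimates.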
model_mzsd()
Model the m/z standard deviation.
SCPeakRegionFinder$model_mzsd()
model_heightsd()
Model the intensity height standard deviation.
SCPeakRegionFinder$model_heightsd()
indicate_high_frequency_sd()
Look for peaks with higher than expected frequency standard deviation.
SCPeakRegionFinder$indicate_high_frequency_sd()
add_data()
Add the data from an SCMzml object to the underlying SCPeakRegions object.
SCPeakRegionFinder$add_data(sc_mzml)
sc_mzml: the SCMzml object being passed in
summarize_peaks()
Summarize the peaks to go into JSON form.
SCPeakRegionFinder$summarize_peaks()
add_offset()
Add an offset based on width in frequency space to m/z to describe how wide the peak is.
SCPeakRegionFinder$add_offset()
sort_ascending_mz()
Sort the data in m/z order, as the default is frequency order
SCPeakRegionFinder$sort_ascending_mz()
characterize_peaks()
Run the overall peak characterization from start to finish.
SCPeakRegionFinder$characterize_peaks(stop_after_initial_detection = FALSE)
stop_after_initial_detection: do we stop the whole process after finding initial peaks in each scan?
summarize()
Summarize everything for output to the zip file after completion.
SCPeakRegionFinder$summarize( package_used = "package:ScanCentricPeakCharacterization" )
package_used: which package is being used for this work.
peak_meta()
Generate the meta data that goes into the accompanying JSON file.
SCPeakRegionFinder$peak_meta()
new()
Make a new SCPeakRegionFinder object.
SCPeakRegionFinder$new( sc_mzml = NULL, sliding_region_size = 10, sliding_region_delta = 1, tiled_region_size = 1, tiled_region_delta = 1, region_percentile = 0.99, offset_multiplier = 1, frequency_multiplier = 400, quantile_multiplier = 1.5, n_point_region = 2000, peak_method = "lm_weighted", min_points = 4, n_zero_tiles = 1, zero_normalization = FALSE, calculate_peak_area = FALSE )
sc_mzml: the SCMzml object to use (can be missing)
sliding_region_size: how wide to make the sliding regions in data points
sliding_region_delta: how far apart are the starting locations of the sliding regions
tiled_region_size: how wide are the tiled regions
tiled_region_delta: how far apart are the tiled regions
region_percentile: cumulative percentile cutoff to use
offset_multiplier: what offset multiplier should be used
frequency_multiplier: how much to multiply frequency points to interval ranges
quantile_multiplier: how much to adjust the quantile cutoff by
n_point_region: how many points in the large tiled regions
peak_method: the peak characterization method to use (lm_weighted)
min_points: how many points to say there is a peak (4)
n_zero_tiles: how many tiles in a row do there need to be to split things up? (1)
zero_normalization: don't actually do normalization (FALSE)
calculate_peak_area: should peak area as well as peak height be returned? (FALSE)
clone()
The objects of this class are cloneable with this method.
SCPeakRegionFinder$clone(deep = FALSE)
deep: Whether to make a deep clone.
Holds all the peak region data
This reference class represents the peak region data.
frequency_point_regions: the frequency data
frequency_fit_description: the model of frequency ~ m/z
mz_fit_description: the model of m/z ~ frequency
peak_regions: the peak regions
sliding_regions: the sliding regions used for density calculations
tiled_regions: the tiled regions used for grouping and splitting peak regions
peak_region_list: list of regions
frequency_multiplier: how much to multiply frequency by to make interval points
scan_peaks: the peaks by scans
peak_data: the data.frame of final peak data
scan_level_arrays: scan level peak data as matrices
is_normalized: are the peak intensities normalized
normalization_factors: the normalization factors calculated
n_scan: how many scans are we working with
scans_per_peak: ??
scan_perc: what percentage of scans is a minimum
min_scan: based on scan_perc, how many scans minimum does a peak need to be in
max_subsets: ??
scan_subsets: ??
frequency_range: what is the range in frequency space
scan_correlation: ??
keep_peaks: which peaks are we keeping out of all the peaks we had
peak_index: the indices for the peaks
scan_indices: the names of the scans
instrument: the instrument serial number if available
set_min_scan()
sets the minimum number of scans to use
SCPeakRegions$set_min_scan()
add_data()
Adds the data from an SCMzml object to the SCPeakRegions object.
SCPeakRegions$add_data(sc_mzml)
sc_mzml: the SCMzml object being passed in
new()
Creates a new SCPeakRegions object
SCPeakRegions$new( sc_mzml = NULL, frequency_multiplier = 400, scan_perc = 0.1, max_subsets = 100 )
sc_mzml: the SCMzml object to get data from
frequency_multiplier: how much to multiply frequency by
scan_perc: how many scans are required to be in to be a "peak"
max_subsets: ??
clone()
The objects of this class are cloneable with this method.
SCPeakRegions$clone(deep = FALSE)
deep: Whether to make a deep clone.
Represents the zip mass spec file
This reference class represents the zip mass spec file. It provides objects for the zip file and its metadata, as well as the underlying pieces such as the mzml data, the peak lists, and their associated metadata. Although it is possible to work with the SCZip object directly, it is strongly recommended to use the SCCharacterizePeaks object for carrying out the various steps of an analysis, including peak finding.
zip_file: the actual zip file
zip_metadata: the metadata about the zip file
metadata: the metadata itself
metadata_file: the metadata file
sc_mzml: the mzML data object.
peaks: ??
sc_peak_region_finder: the peak finder object
json_summary: jsonized summary of the peak characterization
id: the identifier of the sample
out_file: where to put the final file
temp_directory: where we keep everything until peak characterization is done
load_mzml()
Loads the mzML file
SCZip$load_mzml()
load_sc_peak_region_finder()
Loads the SCPeakRegionFinder object
SCZip$load_sc_peak_region_finder()
save_json()
Save the jsonized summary out to actual json files
SCZip$save_json()
save_sc_peak_region_finder()
Saves the SCPeakRegionFinder binary object
SCZip$save_sc_peak_region_finder()
load_peak_list()
loads just the peak list data-frame instead of peak region finder
SCZip$load_peak_list()
compare_mzml_corresponded_densities()
compare peak densities
SCZip$compare_mzml_corresponded_densities( mz_range = c(150, 1600), window = 1, delta = 0.1 )
mz_range: the mz range to work over
window: the window size in m/z
delta: how much to move the window
new()
Create a new SCZip object.
SCZip$new( in_file, mzml_meta_file = NULL, out_file = NULL, load_mzml = TRUE, load_peak_list = TRUE, temp_loc = tempfile("scpcms") )
in_file: the mzML file to load
mzml_meta_file: an optional metadata file
out_file: where to save the final file
load_mzml: should the mzML file actually be loaded into an SCMzml object?
load_peak_list: should the peak list be loaded if this is previously characterized?
temp_loc: where to make the temp file while working with the data
show_temp_dir()
Show the temp directory where everything is being worked with
SCZip$show_temp_dir()
write_zip()
Write the zip file
SCZip$write_zip(out_file = NULL)
out_file: where to save the zip file
cleanup()
delete the temp directory
SCZip$cleanup()
finalize()
delete when things are done
SCZip$finalize()
add_peak_list()
Add peak list data to the temp directory
SCZip$add_peak_list(peak_list_data)
peak_list_data: the peak list data
clone()
The objects of this class are cloneable with this method.
SCZip$clone(deep = FALSE)
deep: Whether to make a deep clone.
Allows the user to set which mapping function is being used internally in the functions.
set_internal_map(map_function = NULL)
map_function |
which function to use, assigns it to an internal object |
By default, the package uses purrr::map to iterate over things. However, if you have the furrr package installed, you can switch it to use furrr::future_map instead.
## Not run:
library(furrr)
future::plan(multicore)
set_internal_map(furrr::future_map)
## End(Not run)
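The idea of storing the mapping function in an internal object can be sketched with a private environment, as below. All names here (`.internal`, `set_internal_map_sketch()`, `square_all()`) are hypothetical, and `lapply` stands in for `purrr::map` so the sketch runs with base R alone. How the package actually stores the function is not documented here.

```r
# Hypothetical sketch: a private environment holds the current map function.
.internal <- new.env()
.internal$map_function <- lapply  # base R stand-in for the documented default

set_internal_map_sketch <- function(map_function = NULL) {
  # Only replace the stored function when one is supplied.
  if (!is.null(map_function)) {
    .internal$map_function <- map_function
  }
  invisible(.internal$map_function)
}

# Internal code always goes through the stored function.
square_all <- function(values) {
  .internal$map_function(values, function(v) v^2)
}

before <- square_all(1:3)

# Swap in a different mapper with the same (x, f) calling convention.
set_internal_map_sketch(function(x, f) Map(f, x))
after <- square_all(1:3)

identical(before, after)
```

Because callers never reference the mapper directly, switching to a parallel implementation requires no changes to the functions that iterate.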
Allows the user to turn progress messages to the console on and off. The default is to provide messages to the console.
show_progress(progress = TRUE)
progress |
logical to have it on or off |
Does a single pass of normalizing scans to each other.
single_pass_normalization( scan_peaks, intensity_measure = c("RawHeight", "Height"), summary_function = median, use_peaks = NULL, min_ratio = 0.7 )
scan_peaks |
the scan peaks to normalize |
intensity_measure |
which intensities to normalize |
summary_function |
which function to use to calculate summaries (median) |
use_peaks |
which peaks to use for normalization |
min_ratio |
what ratio of maximum intensity of peaks should we use for normalization |
scan_peaks list
Given a region that should contain signal, and the point data within it, find the peaks, and return the region, and the set of points that make up each peak from each scan.
split_region_by_peaks( region_list, min_points = 4, metadata = NULL, calculate_peak_area = FALSE )
region_list |
a list with points and tiles IRanges objects |
min_points |
how many points are needed for a peak |
metadata |
metadata that tells how things should be processed |
list
returns the sum of squares residuals from an lm object
ssr(object)
object |
the lm object |
numeric
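For reference, the sum of squared residuals of an lm object can be computed in base R as sketched below. The name `ssr_sketch()` is hypothetical and this is not necessarily how `ssr()` is implemented. The example uses the built-in `cars` dataset.

```r
# Hypothetical re-implementation for illustration only.
ssr_sketch <- function(object) {
  sum(residuals(object)^2)
}

fit <- lm(dist ~ speed, data = cars)
ssr_sketch(fit)

# For an unweighted lm fit this agrees with stats::deviance().
all.equal(ssr_sketch(fit), deviance(fit))
```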
given a set of original and fitted values and a transform, return a set of transformed residuals.
transform_residuals(original, fitted, transform = exp)
original |
the original points |
fitted |
the fitted points |
transform |
the function that should be used to transform the values |
numeric
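A minimal sketch of what transformed residuals could look like follows, assuming the transform is applied to both the original and the fitted values before subtracting. That order of operations is an assumption, and `transform_residuals_sketch()` is a hypothetical name. The input values are made up.

```r
# ASSUMPTION: residual = transform(original) - transform(fitted).
transform_residuals_sketch <- function(original, fitted, transform = exp) {
  transform(original) - transform(fitted)
}

# Values on a log scale, as they might be after a fit in log space.
log_original <- log(c(100, 200, 400))
log_fitted <- log(c(110, 190, 400))

# With the default exp transform, residuals are back on the intensity scale.
res <- transform_residuals_sketch(log_original, log_fitted)
res
```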
given a zip file, list the contents
zip_list_contents(zip_file)
zip_file |
the zip file |