cns.analyze package

cns.analyze.aneuploidy module

cns.analyze.aneuploidy.calc_ane_bases(samples_df, cns_df, cn_columns, allele_spec, assembly=<cns.utils.assemblies.Assembly object>)

Calculates the total length, in bases, of aneuploid regions for each sample.

Parameters:

samples_df (pandas.DataFrame) – DataFrame containing sample information.
cns_df (pandas.DataFrame) – DataFrame containing CNS data.
cn_columns (list of str) – List of column names for copy number data.
allele_spec (str) – Allele specification, either “any” or “both”.
assembly (object, optional) – Genome assembly to use. Default is hg19.

Returns:

DataFrame with the total length of aneuploid regions, in bases, for each sample.

Return type:

pandas.DataFrame

cns.analyze.aneuploidy.calc_chrom_mean(df, cn_column)

Calculates the mean of a specified column grouped by chromosome.

Parameters:

df (pandas.DataFrame) – DataFrame containing CNS data.
cn_column (str) – Column name for which to calculate the mean.

Returns:

Series with the mean of the specified column for each chromosome.

Return type:

pandas.Series
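As a rough illustration of the per-chromosome grouping described above, the following minimal sketch computes an unweighted mean with plain pandas. The column names chrom and total_cn and the helper chrom_mean_sketch are assumptions made for this example. They are not taken from the cns package, and the library may weight segments differently.

```python
import pandas as pd

# Hypothetical CNS table; "chrom" and "total_cn" are assumed column names.
cns_df = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr2", "chr2"],
    "total_cn": [2.0, 4.0, 1.0, 3.0],
})


def chrom_mean_sketch(df, cn_column, chrom_column="chrom"):
    """Unweighted mean of cn_column per chromosome (illustration only)."""
    return df.groupby(chrom_column)[cn_column].mean()


chrom_means = chrom_mean_sketch(cns_df, "total_cn")
print(chrom_means)
```

The result is a Series indexed by chromosome, which matches the documented return type.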

cns.analyze.aneuploidy.calc_chrom_var(df, cn_column)

Calculates the variance of a specified column grouped by chromosome.

Parameters:

df (pandas.DataFrame) – DataFrame containing CNS data.
cn_column (str) – Column name for which to calculate the variance.

Returns:

Series with the variance of the specified column for each chromosome.

Return type:

pandas.Series
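The description above does not say whether the sample or the population variance is used. The sketch below, which relies only on pandas, shows how the two differ for a grouped column. The column names and the helper chrom_var_sketch are hypothetical, and the ddof value used by the cns package is an open assumption.

```python
import pandas as pd

# Hypothetical CNS table; "chrom" and "total_cn" are assumed column names.
cns_df = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr2", "chr2", "chr2"],
    "total_cn": [2.0, 4.0, 1.0, 1.0, 4.0],
})


def chrom_var_sketch(df, cn_column, ddof=1, chrom_column="chrom"):
    """Variance of cn_column per chromosome (illustration only).

    ddof=1 gives the sample variance (the pandas default),
    ddof=0 gives the population variance.
    """
    return df.groupby(chrom_column)[cn_column].var(ddof=ddof)


sample_var = chrom_var_sketch(cns_df, "total_cn", ddof=1)
population_var = chrom_var_sketch(cns_df, "total_cn", ddof=0)
print(sample_var, population_var, sep="\n")
```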

cns.analyze.aneuploidy.calc_imb_bases(cns_df, samples_df, cn_columns, col_index=0, assembly=<cns.utils.assemblies.Assembly object>)

Calculates the total length, in bases, of imbalanced regions for each sample.

Parameters:

cns_df (pandas.DataFrame) – DataFrame containing CNS data.
samples_df (pandas.DataFrame) – DataFrame containing sample information.
cn_columns (list of str) – List of column names for copy number data.
col_index (int, optional) – Index of the column to use for imbalance calculation. Default is 0.
assembly (object, optional) – Genome assembly to use. Default is hg19.

Returns:

DataFrame with the total length of imbalanced regions, in bases, for each sample.

Return type:

pandas.DataFrame

cns.analyze.aneuploidy.calc_loh_bases(samples_df, cns_df, cn_columns, allele_spec, assembly=<cns.utils.assemblies.Assembly object>)

Calculates the total length, in bases, of regions with loss of heterozygosity (LOH) for each sample.

Parameters:

samples_df (pandas.DataFrame) – DataFrame containing sample information.
cns_df (pandas.DataFrame) – DataFrame containing CNS data.
cn_columns (list of str) – List of column names for copy number data.
allele_spec (str) – Allele specification, either “any” or “both”.
assembly (object, optional) – Genome assembly to use. Default is hg19.

Returns:

DataFrame with the total length of LOH regions, in bases, for each sample.

Return type:

pandas.DataFrame
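To make the idea of summing LOH bases per sample concrete, here is a minimal pandas sketch. It assumes a segment counts as LOH when one allele has copy number 0 and the other is above 0. It does not model the allele_spec argument, and the column names (sample_id, start, end, major_cn, minor_cn) and the helper loh_bases_sketch are hypothetical rather than part of the cns package.

```python
import pandas as pd

# Hypothetical allele-specific CNS table; all column names are assumptions.
cns_df = pd.DataFrame({
    "sample_id": ["s1", "s1", "s2", "s2", "s2"],
    "start": [0, 100, 0, 50, 120],
    "end": [100, 250, 50, 120, 200],
    "major_cn": [2, 1, 1, 2, 0],
    "minor_cn": [0, 1, 0, 0, 0],
})


def loh_bases_sketch(df, cn_columns):
    """Sum segment lengths where one allele is lost and the other is retained."""
    lengths = df["end"] - df["start"]
    alleles = df[list(cn_columns)]
    is_loh = (alleles.min(axis=1) == 0) & (alleles.max(axis=1) > 0)
    # Segments that are not LOH contribute zero bases to the per-sample total.
    return lengths.where(is_loh, 0).groupby(df["sample_id"]).sum()


loh_bases = loh_bases_sketch(cns_df, ["major_cn", "minor_cn"])
print(loh_bases)
```

In this toy input the last segment of s2 has both alleles at 0, so under the stated assumption it is treated as a homozygous deletion and not as LOH.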

cns.analyze.aneuploidy.calc_ploidy_per_column(cns_df, cn_column)

Calculates the ploidy for each sample based on a specified CN column.

Parameters:

cns_df (pandas.DataFrame) – DataFrame containing CNS data.
cn_column (str) – Column name for copy number data.

Returns:

Series with the ploidy value for each sample.

Return type:

pandas.Series
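The source does not state how ploidy is derived. One common definition is the segment-length-weighted mean copy number, and the sketch below implements that definition with pandas as an assumption. The column names (sample_id, start, end, total_cn) and the helper ploidy_sketch are hypothetical.

```python
import pandas as pd

# Hypothetical CNS table; all column names are assumptions.
cns_df = pd.DataFrame({
    "sample_id": ["s1", "s1", "s2", "s2"],
    "start": [0, 100, 0, 300],
    "end": [100, 200, 300, 400],
    "total_cn": [2, 4, 2, 4],
})


def ploidy_sketch(df, cn_column):
    """Length-weighted mean copy number per sample (assumed definition)."""
    lengths = df["end"] - df["start"]
    weighted_cn = (df[cn_column] * lengths).groupby(df["sample_id"]).sum()
    total_length = lengths.groupby(df["sample_id"]).sum()
    return weighted_cn / total_length


ploidy = ploidy_sketch(cns_df, "total_cn")
print(ploidy)
```

Weighting by length keeps short, high-copy segments from dominating the estimate, which an unweighted mean over segments would not do.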

cns.analyze.breakage module

cns.analyze.breakage.calc_breaks_per_chr(cns_df)

Calculates the number of breakpoints per chromosome for each sample.

Parameters:

cns_df (pandas.DataFrame) – DataFrame containing CNS data.

Returns:

DataFrame with the number of breakpoints per chromosome for each sample.

Return type:

pandas.DataFrame
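As a hedged illustration of counting breakpoints per chromosome, the sketch below treats every boundary between consecutive segments of the same sample and chromosome as one breakpoint, so a chromosome with n segments has n - 1 breakpoints. That definition, the column names (sample_id, chrom) and the helper breaks_per_chr_sketch are assumptions. The cns package may, for example, count only boundaries where the copy number changes.

```python
import pandas as pd

# Hypothetical CNS table with one row per segment; column names are assumptions.
cns_df = pd.DataFrame({
    "sample_id": ["s1", "s1", "s1", "s1"],
    "chrom": ["chr1", "chr1", "chr1", "chr2"],
    "start": [0, 100, 200, 0],
    "end": [100, 200, 300, 500],
})


def breaks_per_chr_sketch(df):
    """Number of segment boundaries per (sample, chromosome) pair."""
    return df.groupby(["sample_id", "chrom"]).size() - 1


breaks = breaks_per_chr_sketch(cns_df)
print(breaks)
```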

cns.analyze.breakage.calc_breaks_per_sample(cns_df, samples_df, cn_col, assembly=<cns.utils.assemblies.Assembly object>)

Calculates the number of breakpoints per sample.

Parameters:

cns_df (pandas.DataFrame) – DataFrame containing CNS data.
samples_df (pandas.DataFrame) – DataFrame containing sample information.
cn_col (str) – Column name for copy number data.
assembly (object, optional) – Genome assembly to use. Default is hg19.

Returns:

DataFrame with the number of breakpoints per sample.

Return type:

pandas.DataFrame

cns.analyze.breakage.calc_step_per_chr(cns_df, cn_col)

Calculates the step size per chromosome for each sample.

Parameters:

cns_df (pandas.DataFrame) – DataFrame containing CNS data.
cn_col (str) – Column name for copy number data.

Returns:

DataFrame with the step size per chromosome for each sample.

Return type:

pandas.DataFrame
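The description does not define "step size" further. A plausible reading is the absolute copy-number change between adjacent segments, and the sketch below sums those changes per sample and chromosome with pandas. The aggregation (a sum), the column names and the helper step_per_chr_sketch are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical CNS table; all column names are assumptions.
cns_df = pd.DataFrame({
    "sample_id": ["s1", "s1", "s1", "s1"],
    "chrom": ["chr1", "chr1", "chr1", "chr2"],
    "start": [0, 100, 200, 0],
    "end": [100, 200, 300, 500],
    "total_cn": [2, 4, 1, 2],
})


def step_per_chr_sketch(df, cn_col):
    """Sum of absolute CN changes between adjacent segments per chromosome."""
    ordered = df.sort_values(["sample_id", "chrom", "start"])
    grouped = ordered.groupby(["sample_id", "chrom"])[cn_col]
    # diff() yields NaN for the first segment; sum() skips it.
    return grouped.apply(lambda cn: cn.diff().abs().sum())


steps = step_per_chr_sketch(cns_df, "total_cn")
print(steps)
```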

cns.analyze.breakage.calc_step_per_sample(cns_df, samples_df, cn_col, assembly=<cns.utils.assemblies.Assembly object>)

Calculates the step size per sample.

Parameters:

cns_df (pandas.DataFrame) – DataFrame containing CNS data.
samples_df (pandas.DataFrame) – DataFrame containing sample information.
cn_col (str) – Column name for copy number data.
assembly (object, optional) – Genome assembly to use. Default is hg19.

Returns:

DataFrame with the step size per sample.

Return type:

pandas.DataFrame

cns.analyze.coverage module

cns.analyze.coverage.get_covered_bases(nan_bases_df, samples_df, either_allele)

Calculates the number of covered bases for each sample.

Parameters:

nan_bases_df (pandas.DataFrame) – DataFrame containing CNS data with NaN values indicating uncovered bases.
samples_df (pandas.DataFrame) – DataFrame containing sample information.
either_allele (bool) – If True, considers either allele for coverage. If False, considers both alleles.

Returns:

DataFrame with the number of covered bases for each sample.

Return type:

pandas.DataFrame
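The either_allele switch can be pictured with a small pandas sketch: a segment counts as covered if any allele column is non-NaN (either_allele=True) or only if all allele columns are non-NaN (either_allele=False). The layout of the input table, the column names and the helper covered_bases_sketch are assumptions. The real function takes a different argument list.

```python
import pandas as pd

nan = float("nan")

# Hypothetical allele-specific table; NaN marks an uncovered allele.
cns_df = pd.DataFrame({
    "sample_id": ["s1", "s1", "s1"],
    "start": [0, 100, 300],
    "end": [100, 300, 350],
    "major_cn": [2.0, 1.0, nan],
    "minor_cn": [nan, 1.0, nan],
})


def covered_bases_sketch(df, cn_columns, either_allele):
    """Sum the lengths of segments that count as covered, per sample."""
    lengths = df["end"] - df["start"]
    present = df[list(cn_columns)].notna()
    covered = present.any(axis=1) if either_allele else present.all(axis=1)
    return lengths.where(covered, 0).groupby(df["sample_id"]).sum()


covered_either = covered_bases_sketch(cns_df, ["major_cn", "minor_cn"], True)
covered_both = covered_bases_sketch(cns_df, ["major_cn", "minor_cn"], False)
print(covered_either, covered_both, sep="\n")
```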

cns.analyze.coverage.get_missing_chroms(cns_df, samples_df, segs=None, assembly=<cns.utils.assemblies.Assembly object>)

Identifies missing chromosomes for each sample.

Parameters:

cns_df (pandas.DataFrame) – DataFrame containing CNS data.
samples_df (pandas.DataFrame) – DataFrame containing sample information.
segs (dict, optional) – Dictionary of segments to consider. If None, all segments are considered.
assembly (object, optional) – Genome assembly to use. Default is hg19.

Returns:

DataFrame with the count of chromosomes and missing chromosomes for each sample.

Return type:

pandas.DataFrame
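Finding missing chromosomes amounts to comparing the chromosomes present for each sample against the list expected for the assembly. The sketch below shows that comparison with pandas. The expected_chroms list stands in for whatever the Assembly object provides, and the column names, output column names and the helper missing_chroms_sketch are hypothetical.

```python
import pandas as pd

# Stand-in for the chromosome list of an assembly (assumption).
expected_chroms = ["chr1", "chr2", "chr3"]

# Hypothetical CNS table; column names are assumptions.
cns_df = pd.DataFrame({
    "sample_id": ["s1", "s1", "s2", "s2", "s2"],
    "chrom": ["chr1", "chr2", "chr1", "chr2", "chr3"],
})


def missing_chroms_sketch(df, expected):
    """Count present chromosomes and list the missing ones for each sample."""
    present = df.groupby("sample_id")["chrom"].unique()
    return pd.DataFrame({
        "chrom_count": present.apply(len),
        "chrom_missing": present.apply(
            lambda found: [c for c in expected if c not in set(found)]
        ),
    })


missing = missing_chroms_sketch(cns_df, expected_chroms)
print(missing)
```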

cns.analyze.plot module

cns.analyze.plot.add_cytoband_legend(ax)

Adds a legend for cytobands to the given axis.

Parameters:

ax (matplotlib.axes.Axes) – The axes to which the legend is added.

Returns:

The axes with the added legend.

Return type:

matplotlib.axes.Axes

cns.analyze.plot.add_gap_legend(ax)

Adds a legend for gaps to the given axis.

Parameters:

ax (matplotlib.axes.Axes) – The axes to which the legend is added.

Returns:

The axes with the added legend.

Return type:

matplotlib.axes.Axes

cns.analyze.plot.fig_bars(cns_df, cn_columns=None, colors=None, size=1, assembly=<cns.utils.assemblies.Assembly object>)

Creates a bar plot for each of the CN columns.

Parameters:

cns_df (pandas.DataFrame) – DataFrame containing CNS data.
cn_columns (list of str, optional) – List of column names for copy number data. If None, columns are inferred from cns_df.
colors (list of str, optional) – List of colors to use for the plots. If None, colors are generated automatically.
size (int, optional) – Line width of the bars. Default is 1.
assembly (object, optional) – Genome assembly to use. Default is hg19.

Returns:

matplotlib.figure.Figure – The created figure.
list of matplotlib.axes.Axes – List of axes in the figure.

cns.analyze.plot.fig_dots(cns_df, cn_columns=None, colors=None, size=1, assembly=<cns.utils.assemblies.Assembly object>)

Creates a dot plot for each of the CN columns.

Parameters:

cns_df (pandas.DataFrame) – DataFrame containing CNS data.
cn_columns (list of str, optional) – List of column names for copy number data. If None, columns are inferred from cns_df.
colors (list of str, optional) – List of colors to use for the plots. If None, colors are generated automatically.
size (int, optional) – Size of the dots. Default is 1.
assembly (object, optional) – Genome assembly to use. Default is hg19.

Returns:

matplotlib.figure.Figure – The created figure.
list of matplotlib.axes.Axes – List of axes in the figure.

cns.analyze.plot.fig_heatmap(cns_df, cn_columns=None, min_cn=0, max_cn=10, vertical=None, assembly=<cns.utils.assemblies.Assembly object>)

Creates a heatmap figure from copy number segment (CNS) data.

Parameters:

cns_df (pandas.DataFrame) – DataFrame containing CNS data with columns for sample_id, chromosome positions, and copy number values.
cn_columns (list of str, optional) – List of column names containing copy number values. If None, columns are inferred from the DataFrame.
min_cn (int, optional) – Minimum copy number value for the color scale. Default is 0.
max_cn (int, optional) – Maximum copy number value for the color scale. Default is 10.
vertical (bool, optional) – Whether to stack plots vertically. If None, orientation is determined by the aspect ratio. Default is None.
assembly (Assembly object, optional) – Genome assembly object defining chromosome sizes. Default is hg19.

Returns:

A tuple containing:

matplotlib.figure.Figure – The created figure.
numpy.ndarray – Array of matplotlib.axes.Axes objects.

Return type:

tuple

Notes

The figure size is automatically determined based on the genome size and number of samples. The layout (vertical vs horizontal) is determined by the aspect ratio unless explicitly specified.

cns.analyze.plot.fig_lines(cns_df, cn_columns=None, colors=None, size=1, assembly=<cns.utils.assemblies.Assembly object>)

Creates a line plot for each of the CN columns.

Parameters:

cns_df (pandas.DataFrame) – DataFrame containing CNS data.
cn_columns (list of str, optional) – List of column names for copy number data. If None, columns are inferred from cns_df.
colors (list of str, optional) – List of colors to use for the plots. If None, colors are generated automatically.
size (int, optional) – Line width. Default is 1.
assembly (object, optional) – Genome assembly to use. Default is hg19.

Returns:

matplotlib.figure.Figure – The created figure.
list of matplotlib.axes.Axes – List of axes in the figure.

cns.analyze.plot.fig_steps(cns_df, cn_columns=None, colors=None, size=1, assembly=<cns.utils.assemblies.Assembly object>)

Creates a step plot for each of the CN columns.

Parameters:

cns_df (pandas.DataFrame) – DataFrame containing CNS data.
cn_columns (list of str, optional) – List of column names for copy number data. If None, columns are inferred from cns_df.
colors (list of str, optional) – List of colors to use for the plots. If None, colors are generated automatically.
size (int, optional) – Line width of the steps. Default is 1.
assembly (object, optional) – Genome assembly to use. Default is hg19.

Returns:

matplotlib.figure.Figure – The created figure.
list of matplotlib.axes.Axes – List of axes in the figure.

cns.analyze.plot.no_x_ticks(ax)

Removes x-axis ticks from the given axis.

Parameters:

ax (matplotlib.axes.Axes) – The axes from which to remove the x-axis ticks.

Returns:

The axes with the x-axis ticks removed.

Return type:

matplotlib.axes.Axes

cns.analyze.plot.no_y_ticks(ax)

Removes y-axis ticks from the given axis.

Parameters:

ax (matplotlib.axes.Axes) – The axes from which to remove the y-axis ticks.

Returns:

The axes with the y-axis ticks removed.

Return type:

matplotlib.axes.Axes

cns.analyze.plot.plot_bars(ax, cns_df, cn_column, color='green', label=None, alpha=1.0, size=1.0, assembly=<cns.utils.assemblies.Assembly object>)

Plots bars representing segments on the given axis.

Parameters:

ax (matplotlib.axes.Axes) – The axes on which to plot the bars.
cns_df (pandas.DataFrame) – DataFrame containing CNS data.
cn_column (str) – Column name for copy number data.
color (str, optional) – Color of the bars. Default is “green”.
label (str, optional) – Label for the bars. Default is None.
alpha (float, optional) – Alpha value for the bars. Default is 1.
size (float, optional) – Line width of the bars. Default is 1.
assembly (object, optional) – Genome assembly to use. Default is hg19.

Returns:

The axes with the plotted bars.

Return type:

matplotlib.axes.Axes

cns.analyze.plot.plot_chr_bg(ax, y_min=0, y_max=2, assembly=<cns.utils.assemblies.Assembly object>, alpha=0.2)

Plots the chromosome background on the given axis.

Parameters:

ax (matplotlib.axes.Axes) – The axes on which to plot the chromosome background.
assembly (object, optional) – Genome assembly to use. Default is hg19.
y_min (float, optional) – Minimum y-axis value. Default is 0.
y_max (float, optional) – Maximum y-axis value. Default is 2.
alpha (float, optional) – Alpha value for the background. Default is 0.2.

Returns:

The axes with the plotted chromosome background.

Return type:

matplotlib.axes.Axes

cns.analyze.plot.plot_cytobands(ax, y_min=0, y_max=2, assembly=<cns.utils.assemblies.Assembly object>, alpha=0.2, color=None)

Plots cytobands on the background of the given axis.

Parameters:

ax (matplotlib.axes.Axes) – The axes on which to plot the cytobands.
y_min (float, optional) – Minimum y-axis value. Default is 0.
y_max (float, optional) – Maximum y-axis value. Default is 2.
assembly (object, optional) – Genome assembly to use. Default is hg19.
alpha (float, optional) – Alpha value for the cytobands. Default is 0.2.
color (optional) – Color for the cytobands. Default is None.

Returns:

The axes with the plotted cytobands.

Return type:

matplotlib.axes.Axes

cns.analyze.plot.plot_dots(ax, cns_df, cn_column, color='green', label=None, alpha=1.0, size=1, assembly=<cns.utils.assemblies.Assembly object>)

Plots dots representing segments on the given axis.

Parameters:

ax (matplotlib.axes.Axes) – The axes on which to plot the dots.
cns_df (pandas.DataFrame) – DataFrame containing CNS data.
cn_column (str) – Column name for copy number data.
color (str, optional) – Color of the dots. Default is “green”.
label (str, optional) – Label for the dots. Default is None.
alpha (float, optional) – Alpha value for the dots. Default is 1.
size (float, optional) – Size of the dots. Default is 1.
assembly (object, optional) – Genome assembly to use. Default is hg19.

Returns:

The axes with the plotted dots.

Return type:

matplotlib.axes.Axes

cns.analyze.plot.plot_gaps(ax, y_min=0, y_max=2, assembly=<cns.utils.assemblies.Assembly object>, alpha=0.2, color=None)

Plots gaps on the background of the given axis.

cns.analyze.plot.plot_heatmap(ax, cns_df, cn_column, min_cn=0, max_cn=16, assembly=<cns.utils.assemblies.Assembly object>)

Plots a heatmap of the Copy Number (CN) data.

Parameters:

ax (matplotlib.axes.Axes) – The axes on which to plot the heatmap.
cns_df (pandas.DataFrame) – DataFrame containing CNS data.
cn_column (str) – Column name for copy number data.
min_cn (float, optional) – Minimum copy number value for the color gradient. Default is 0.
max_cn (int, optional) – Maximum copy number value for the color gradient. Default is 16.
assembly (object, optional) – Genome assembly to use. Default is hg19.

Returns:

The axes with the plotted heatmap.

Return type:

matplotlib.axes.Axes
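The min_cn and max_cn bounds of a color gradient are typically applied by scaling values into the range 0 to 1 and clipping anything outside it. The pandas sketch below shows that mapping as an assumption about how such bounds behave. The helper normalize_cn is hypothetical, and the actual color mapping used by plot_heatmap is not described in the source.

```python
import pandas as pd


def normalize_cn(cn_values, min_cn=0, max_cn=16):
    """Scale copy numbers to [0, 1], clipping values outside [min_cn, max_cn]."""
    scaled = (cn_values - min_cn) / (max_cn - min_cn)
    return scaled.clip(lower=0.0, upper=1.0)


# Toy copy-number values; 20 lies above max_cn and is clipped to 1.0.
cn = pd.Series([0, 8, 16, 20])
normalized = normalize_cn(cn)
print(normalized.tolist())
```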

cns.analyze.plot.plot_lines(ax, cns_df, cn_column, color='green', label=None, alpha=1.0, size=1, assembly=<cns.utils.assemblies.Assembly object>)

Plots consecutive segments as lines on the given axis, using the center of each segment as a line endpoint.

NOTE: A single segment will not be plotted, at least two segments must exist.

Parameters:

ax (matplotlib.axes.Axes) – The axes on which to plot the lines.
cns_df (pandas.DataFrame) – DataFrame containing CNS data.
cn_column (str) – Column name for copy number data.
color (str, optional) – Color of the lines. Default is “green”.
label (str, optional) – Label for the lines. Default is None.
alpha (float, optional) – Alpha value for the lines. Default is 1.
size (float, optional) – Line width. Default is 1.
assembly (object, optional) – Genome assembly to use. Default is hg19.

Returns:

The axes with the plotted lines.

Return type:

matplotlib.axes.Axes
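The endpoint rule described above (segment centers joined by lines, nothing drawn for a single segment) can be sketched without any plotting. The helper line_endpoints_sketch and the column names are assumptions, and the sketch ignores chromosome offsets on a genome-wide axis.

```python
import pandas as pd

# Hypothetical segments on one chromosome; column names are assumptions.
segments = pd.DataFrame({
    "start": [0, 100, 300],
    "end": [100, 300, 400],
    "total_cn": [2, 4, 1],
})


def line_endpoints_sketch(df, cn_column):
    """Return ((x0, y0), (x1, y1)) pairs joining consecutive segment centers."""
    ordered = df.sort_values("start")
    centers = ((ordered["start"] + ordered["end"]) / 2).tolist()
    values = ordered[cn_column].tolist()
    points = list(zip(centers, values))
    # Pair each point with its successor; one point yields no pairs.
    return list(zip(points[:-1], points[1:]))


lines = line_endpoints_sketch(segments, "total_cn")
single = line_endpoints_sketch(segments.iloc[:1], "total_cn")
print(lines, single)
```

The empty result for a single segment mirrors the note that at least two segments must exist.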

cns.analyze.plot.plot_steps(ax, cns_df, cn_column, color='green', label=None, alpha=1.0, size=1, assembly=<cns.utils.assemblies.Assembly object>)

Plots segments as steps on the given axis.

cns.analyze.plot.plot_x_lines(ax, assembly=<cns.utils.assemblies.Assembly object>, positions=None, width=1, alpha=0.5)

Plots vertical lines at chromosome boundaries on the given axis.

Parameters:

ax (matplotlib.axes.Axes) – The axes on which to plot the vertical lines.
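assembly (object, optional) – Genome assembly to use. Default is hg19.
positions (optional) – X positions of the lines. Default is None.
width (float, optional) – Line width. Default is 1.
alpha (float, optional) – Alpha value for the lines. Default is 0.5.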

Returns:

The axes with the plotted vertical lines.

Return type:

matplotlib.axes.Axes

cns.analyze.plot.plot_x_ticks(ax, assembly=<cns.utils.assemblies.Assembly object>, min_x=0, max_x=None)

Plots the x-axis ticks for the chromosomes.

Parameters:

ax (matplotlib.axes.Axes) – The axes on which to plot the x-axis ticks.
assembly (object, optional) – Genome assembly to use. Default is hg19.
min_x (float, optional) – Minimum x-axis value. Default is 0.
max_x (float, optional) – Maximum x-axis value. If None, it is inferred from the data.

Returns:

List of x-axis tick positions.

Return type:

list of float
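On a genome-wide x-axis, chromosome ticks are usually placed from cumulative chromosome offsets. The sketch below places one tick at the midpoint of each chromosome, which is an assumption about the tick placement and not something the source confirms. The toy chrom_lengths mapping and the helper chrom_tick_positions are hypothetical stand-ins for what an Assembly object would supply.

```python
from itertools import accumulate

# Toy chromosome lengths in genome order (not real hg19 sizes).
chrom_lengths = {"chr1": 1000, "chr2": 800, "chr3": 600}


def chrom_tick_positions(lengths):
    """Map each chromosome to the genome-wide x position of its midpoint."""
    ends = list(accumulate(lengths.values()))
    starts = [0] + ends[:-1]
    return {
        name: start + length / 2
        for (name, length), start in zip(lengths.items(), starts)
    }


ticks = chrom_tick_positions(chrom_lengths)
print(ticks)
```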

cns.analyze.plot.x_limits(cns_df, assembly=<cns.utils.assemblies.Assembly object>)

Calculates the x-axis limits for the plot based on the CNS data and genome assembly.

Parameters:

cns_df (pandas.DataFrame) – DataFrame containing CNS data.
assembly (object, optional) – Genome assembly to use. Default is hg19.

Returns:

A tuple containing the minimum and maximum x-axis limits.

Return type:

tuple

cns.analyze.plot.y_limits(cns_df, column)

Calculates the y-axis limits for the plot based on the CNS data and the given column.

Parameters:

cns_df (pandas.DataFrame) – DataFrame containing CNS data.
column (str) – Column name for the y-axis data.

Returns:

A tuple containing the minimum and maximum y-axis limits.

Return type:

tuple