plastid.bin.test_table_equality module

Regression-testing script designed to test equality between a newly-generated file and a reference file that is intended to contain the same data. Rows and columns are not expected to be in the same order. Float values are only required to be equal within a user-specified tolerance. NaN values evaluate as equal if and only if they occur in the same cell after sorting rows and columns. Finally, specific columns may be excluded by name (or number, if there is no header row).

Exit status is 0 if files are identical, 1 otherwise.
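The comparison rules above can be illustrated with a small standard-library sketch. This is not plastid's implementation: the names `normalize`, `cells_match`, and `tables_match`, and the representation of a table as a list of dicts, are assumptions made for illustration. The sketch also assumes the sort-key columns contain no NaN values.

```python
import math


def normalize(rows, sort_keys, exclude=()):
    """Drop excluded columns, then sort rows by the given key columns."""
    kept = [{k: v for k, v in row.items() if k not in exclude} for row in rows]
    return sorted(kept, key=lambda row: tuple(row[k] for k in sort_keys))


def cells_match(a, b, tol):
    """Floats match within tol; NaN only matches NaN; other values must be equal."""
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) or math.isnan(b):
            return math.isnan(a) and math.isnan(b)
        if a == b:  # also covers infinities of the same sign
            return True
        return abs(a - b) <= tol
    return a == b


def tables_match(rows1, rows2, sort_keys, exclude=(), tol=1e-8):
    """Return 0 if the tables agree, 1 otherwise (mirroring the exit status)."""
    t1 = normalize(rows1, sort_keys, exclude)
    t2 = normalize(rows2, sort_keys, exclude)
    if len(t1) != len(t2):
        return 1
    for r1, r2 in zip(t1, t2):
        # Comparing dict key views ignores column order
        if r1.keys() != r2.keys():
            return 1
        if not all(cells_match(r1[k], r2[k], tol) for k in r1):
            return 1
    return 0


# Hypothetical data: same content, different row and column order
reference = [
    {"gene": "a", "rpkm": 1.0, "note": "x"},
    {"gene": "b", "rpkm": float("nan"), "note": "y"},
]
new = [
    {"rpkm": float("nan"), "gene": "b", "note": "changed"},
    {"rpkm": 1.0 + 1e-10, "gene": "a", "note": "x"},
]
status_excluding_note = tables_match(reference, new, ["gene"], exclude=["note"])
status_all_columns = tables_match(reference, new, ["gene"])
```

With the `note` column excluded the two tables agree; when it is included, the differing `note` value in the second row makes the comparison fail.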


Command-line arguments

Positional arguments

Argument Description
file1  
file2  

Optional arguments

Argument Description
-h, --help show this help message and exit
-v Give verbose output
--sort_keys  key [key ...] If specified, values will be sorted by the column(s) corresponding to these names or numbers (0-indexed) before comparison
--exclude  key [key ...] Key or number (0-indexed) of columns to exclude
--no_header If specified, no header row is present. Columns for all other command-line flags must be referenced by number (starting at zero) rather than name, and will be assumed to be in the same order in both files.
--tol  TOL Tolerance by which floats are allowed to differ (Default: 1e-8)

Script contents

plastid.bin.test_table_equality.equal_enough(col1, col2, tol=1e-10, printer=NullWriter())[source]

If col1 and col2 are both numeric, test that all their values are within tol of each other. numpy.nan values, if present, must be in the same place in each column. Ditto numpy.inf values.

If col1 and col2 are not numeric, return True if they have the same dtype and the same values in all cells.

Parameters:
col1 : numpy.ndarray

First column of data

col2 : numpy.ndarray

Second column of data

tol : float

Error tolerance for numeric data between col1 and col2, value-wise

printer : anything implementing a write() method (e.g. a NameDateWriter)

If not None, rich comparison information will be sent to this writer

Returns:
bool

True if col1 == col2 for non-numeric data; True if abs(col1 - col2) <= tol for numeric data; False otherwise
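The numeric branch of this contract can be sketched without NumPy. The function below is a hypothetical stand-in, not the source of `equal_enough`: it takes plain lists of floats rather than `numpy.ndarray` objects, and it assumes that infinities must also agree in sign, which the description above does not state explicitly.

```python
import math


def columns_equal_enough(col1, col2, tol=1e-10):
    """Hypothetical illustration of the numeric comparison described above."""
    if len(col1) != len(col2):
        return False
    for a, b in zip(col1, col2):
        if math.isnan(a) or math.isnan(b):
            # NaN must occupy the same position in both columns
            if not (math.isnan(a) and math.isnan(b)):
                return False
        elif math.isinf(a) or math.isinf(b):
            # Assumption: infinities must match in position and sign
            if a != b:
                return False
        elif abs(a - b) > tol:
            return False
    return True


nan, inf = float("nan"), float("inf")
same = columns_equal_enough([1.0, nan, inf], [1.0 + 1e-12, nan, inf])
shifted_nan = columns_equal_enough([1.0, nan], [nan, 1.0])
too_far = columns_equal_enough([1.0], [1.1])
```

NaN and infinity are handled before the subtraction because `abs(a - b)` is itself NaN when both values are the same infinity, and a NaN never satisfies a `<=` or `>` comparison.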

plastid.bin.test_table_equality.main(argv=sys.argv[1:], verbose=False)[source]

Command-line program

Parameters:
argv : list, optional

A list of command-line arguments, which will be processed as if the script were called from the command line if main() is called directly.

Default: sys.argv[1:] (the command-line arguments, if the script is invoked from the command line)

verbose : bool, optional

If True, also return a string describing how the tables are unequal

Returns:
int

0 if files are identical, 1 otherwise

str

Only returned if verbose is selected. String describing how tables are unequal (e.g. which columns failed, etc.).

plastid.bin.test_table_equality.test_dataframe_equality(df1, df2, tol=1e-08, sort_columns=[], printer=NullWriter(), print_verbose=False, return_verbose=False)[source]

Test equality of dataframes over multiple columns, with verbose output. If NaNs or Infs are present, these must be present in corresponding cells in both dataframes for the dataframes to evaluate as equal.

Parameters:
df1 : pd.DataFrame

First dataframe

df2 : pd.DataFrame

Second dataframe

tol : float, optional

Maximum tolerated difference between floating point numbers (Default: 1e-8)

sort_columns : list, optional

List of column names or indices on which to sort data before comparing values

printer : file-like, optional

Any logger implementing a write() method

print_verbose : bool, optional

Print verbose output to stderr (Default: False)

return_verbose : bool, optional

If True, return a list of failure messages

Returns:
bool

True if dataframes are equal, False otherwise

list

A list of strings explaining how df1 and df2 differ. Only returned if return_verbose is True
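The optional second return value follows a common pattern: collect one message per failing column, and hand the list back only on request. The sketch below illustrates that pattern under stated assumptions. `compare_columns` is a hypothetical name, tables are plain dicts mapping column names to lists of finite numbers (no NaN or Inf handling), and the two values are assumed to come back together as a tuple when `return_verbose` is True.

```python
def compare_columns(table1, table2, tol=1e-8, return_verbose=False):
    """Compare two tables given as dicts of column name -> list of numbers."""
    messages = []
    for name in sorted(set(table1) | set(table2)):
        if name not in table1 or name not in table2:
            messages.append("Column '%s' is missing from one table" % name)
            continue
        col1, col2 = table1[name], table2[name]
        if len(col1) != len(col2):
            messages.append("Column '%s' has differing lengths" % name)
        elif any(abs(a - b) > tol for a, b in zip(col1, col2)):
            messages.append("Column '%s' differs by more than %s" % (name, tol))
    equal = not messages
    # Only include the failure messages when the caller asks for them
    return (equal, messages) if return_verbose else equal


t1 = {"a": [1.0, 2.0], "b": [3.0]}
t2 = {"a": [1.0, 2.5], "b": [3.0]}
plain_result = compare_columns(t1, t1)
equal, failures = compare_columns(t1, t2, return_verbose=True)
```

Here `plain_result` is a bare bool, while the second call yields both the verdict and a single message naming column `a`.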