Finding patterns in data

About PATNCluster analysis

PATN v3 is the first commercial release of PATN for Windows™. PATN started life way back in 1984 as a practical tool for research into 'pattern analysis', a field established by Professor Bill Williams and Dr Godfrey Lance at CSIRO in Canberra Australia. Experience of hundreds of analyses over nearly 30 years has gone into this new version.

The DOS version contained an extremely broad range of modules and options. Experience suggested that 5% of the options were used 95% of the time. PATN was designed to confront users with an array of options, to  ensure that it was not used as a 'black box'.

PATN (for Windows) v1.00 built on the knowledge of thousands of analyses. PATN examines an imported dataset and derives a suite of appropriate options. A 'standard' analysis can now be done in a few seconds by pressing one button. The other major innovation in PATN v3 is the animated, interactive ordination plot display. This is the window where most interpretation occurs.

PATN v3, like it's predecessor DOS PATN, contains four basic components-

Input and output of data and results

PATN can import data in Excel™ or 'csv' (comma-separated-variable) format or even DOS PATN (both .prm/.dat/.clb/.rlb and .arc formats). PATN can also and input or export data tables, group compositions, ordination coordinates, dendrogram ordering and all basic statistics.

Preparation or pre-processing

PATN contains a range of transformation and standardization options that can be quickly applied to selected rows and columns of the Data Table. 'Visible Statistics' such as minima, means, maxima,  values > 0, sum and missing values (user selectable) are also displayed in margins of the Data Table. PATN can also rapidly display (a button press).

  • Histograms of one or more columns or rows of the Data Table
  • Scatter plots of two columns or rows of the Data Table

Analysis

PATN contains a broad mix of analysis utilities-

  • Association measures: Bray and Curtis, Minkowski Series, Two-Step and Kulczynski.
  • Classification: Hierarchical classification module with options Nearest Neighbour, Furthest Neighbour, Flexible UPGMA and Flexible WPGMA and a fast non-hierarchical module based on ALOC/B that can handle millions of objects.
  • Ordination: PATN uses the SSH algorithm (semi-strong hybrid multidimensional scaling). This algorithm can perform anything from pure metric to pure non-metric multidimensional scaling.
  • Networks: PATN can display the display a minimum spanning tree on the ordination plot to assist the  evaluation of data structures.

Analysis Evaluation

PATN contains a range of options that are designed to help you evaluate the analysis

  • Two-Way Table: PATN generates a coloured coded two-way table the classification of objects and variables.
  • Box and Whisker Plots. PATN's traditional box and whisker plots can be sorted by Data Table order of variables, by Kruskal-Wallis values or alphabetically.
  • ANOSIM across all groups (intrinsic or extrinsic) or all pair-wise groups.
  • Kruskal-Wallis values are used to estimate the utility of variables to discriminate between a set of groups. The Kruskal-Wallis values are a non-parametric equivalent to an f-ratio (using ranked values).
  • Principal Component Correlation (PCC) runs multiple linear regression on selected variables fin the data Table and the ordination coordinates. The result is a vector for each variable embedded in the ordination plot (with correlations calculated).
  • Monte-Carlo Attributes in Ordination (MCAO) randomly assigns values of variables to objects in the ordination plot and runs PCC a user-selected number of times (usually 100) to evaluate the 'robustness' of the PCC results.
  • A-Priori groupings for objects can be imported and used for diagnosis in the 3d ordination plot, box and whisker plots and ANOSIM.

Once the analysis options have been selected, PATN will run the complete analysis in a few seconds.

The 3d ordination plot

Just about all aspects of an analysis can be visualized on the ordination plot. It is this window where you will (if you're' smart) spend most of your time while using PATN.

3d ordination display

Features of this display include-

  • Rotate the plot in any direction automatically or manually
  • Meaningful object colours (or black and white symbols or labels or user-defined colours if you wish) or even group colors (PATN's or user-defined/a-priori)
  • Display group centroids or all objects (size of objects is user-defined)
  • Highlight differences between PATN and a-priori groups
  • Preselcted user-defined groups of objects can be displayed
  • Clicking objects will display labels and highlight the record in Data Table
  • Drag between objects to display the association value
  • Drag between group centroids to display best Kruskal-Wallis values
  • Display minimum spanning tree
  • Display data vectors from PCC
  • Display a legend of objects
  • Annotate the 3d plot with text
  • Save plot as an image or a video with the plot rotating through 360

Other Features of PATN

  • Extensive help with a comprehensive browse sequence (logical learning sequence)
  • Analysis can be done on either rows (objects) or columns (variables) of the Data Table.
  • Row or columns can be designated as either intrinsic (used in the analysis) or extrinsic (used only in the evaluation).
  • Extensive right mouse button functions in context for each window
  • User-selectable basic statistical values can be displayed in the margins of the Data Table. This helps spot data issues.
  • Results are readily available in separate TABS from the Data Table or via icons.
  • The 'Recipe' of analysis and evaluation steps are maintained in a time stamped log file (similar to what DOS PATN used to use). This file can be exported as a text file.
  • Dendrograms are scalable (you can even display a group dendrogram) and the fusion history is still visible in a dual panel.
  • Association matrices can be displayed, printed, exported an imported.

System Requirements

The faster the processor and the more memory you have in your system, the faster PATN will run and the bigger the datasets that can be analzed without heading out for a meal: See timing for larger datasets. The minimal configuration is-

  • Pentium 200 (!) or above with at least 512MB memory, 5mb disk space for PATN and an 800 * 600 colour display (i.e., not a lot!) and
  • Windows™ 98, 2000, XP, Me, Vista, Windows 7, Windows 8 (i.e., anything vaguely Windows!).

"I think you will find that PATN will provide you with fresh insights into your data and the bonus is that it is easy and fun to use."

Lee Belbin