Views of New Testament Textual Space (DRAFT)


Table of Contents

1. Abstract
2. Introduction
3. Computing Environment
4. Data Sets
5. Analysis Methods
5.1. Ranked Distances
5.2. Classical Multidimensional Scaling (CMDS)
5.3. Divisive Clustering (DC)
5.4. Neighbour Joining (NJ)
5.5. Partitioning Around Medoids (PAM)
6. Analysis Results
7. Discussion
7.1. New Testament Sections
7.1.1. Gospels
7.1.1.1. Matthew
7.1.1.2. Mark
7.1.1.3. Luke
7.1.1.4. John
7.1.2. Acts and General Letters
7.1.2.1. Acts
7.1.2.2. James
7.1.2.3. 1 Peter
7.1.2.4. 2 Peter
7.1.2.5. 1 John
7.1.2.6. 2 John
7.1.2.7. 3 John
7.1.2.8. Jude
7.1.3. Paul's Letters
7.1.3.1. 2 Corinthians
7.1.3.2. Hebrews
7.1.4. The Apocalypse
7.2. Patristic Studies
7.2.1. Brooks
7.2.2. Cosaert
7.2.3. Cunningham
7.2.4. Donker
7.2.5. EFH (Ehrman, Fee, and Holmes)
7.2.6. Ehrman
7.2.7. Mullen
7.2.8. Osburn
7.2.9. Racine
7.3. Special Studies
7.3.1. Wasserman
8. Conclusions
9. What Difference Does It Make?
10. Acknowledgments
A. Supplementary Information
Bibliography
Note: This is a draft.

The Greek New Testament was copied by hand for almost fifteen centuries until the advent of mechanized printing provided an alternative means of propagation. Translations into other languages were produced as well. Some of these — such as the Latin, Coptic, Syriac, and Armenian versions — appeared early and thus preserve ancient states of the text. Patristic citations form another class of evidence which allows varieties of the text to be associated with particular localities and epochs. Multivariate analysis of textual variation between these New Testament witnesses provides insights into their relationships to one another. Various modes of analysis can be applied, one of which allows witness locations to be plotted in a reference frame which might be called textual space.

As with every widely read work from antiquity, the New Testament exhibits textual variation introduced by scribes and correctors. Sites where textual variation occurs are identified by comparing extant witnesses of the text.[1] Alternative readings at a variation site may be classified in a number of ways, including as orthographic or substantive. Orthographic variations are often ignored as they only affect the surface form of a text and not its meaning. Substantive variations do affect meaning: they are called variants. The list of witnesses which support a particular reading of a particular variation site is known as the reading's attestation. A list of all readings at a variation site along with the attestation of each reading is called a variation unit. Critical editions often present variation units in an apparatus. The present study is based on analysis of data sets which in many cases derive from the apparatus of one or more critical editions.

There is an ongoing effort to establish the initial text which stands behind the range of texts found among surviving witnesses of the New Testament.[2] The most important witnesses for establishing the initial text fall into these categories:

  • Greek manuscripts

  • ancient versions

  • patristic citations.

Greek manuscripts are the primary witnesses to the text of the New Testament. Ancient versions are early translations of the Greek text into languages such as Latin, Coptic, Syriac, and Armenian. It is often possible to establish which Greek variant a version supports by translating its text at a variation site back into Greek. Patristic citations are quotations of the scripture by Church Fathers. Which variant was in a Church Father's copy of the text at a particular variation site can often be discerned if that part of the text is covered by one of his quotations.

A large proportion of the textual evidence disappeared long ago. Even a comprehensive data set which includes all readings of all extant witnesses is still a mere sample of what once existed. In general, the older the copy, the more likely it is to have been lost. This lack of data presents a fundamental problem: if extant texts do not represent the oldest copies then the survivors will give a skewed impression of the initial text. The only way around this fundamental problem is to believe that extant texts adequately represent the initial text. Happily, there is a solid foundation for this belief: the faithfulness of copyists, made evident by surviving examples of their work.

Even though much is lost, a stupendous amount of evidence remains. There are many thousands of manuscripts in Greek, Latin, Syriac, Armenian, and other languages. Patristic citations are also very numerous. Given such a great cloud of witnesses, it can be difficult to see where each one stands in relation to the others. Fortunately, various methods of statistical analysis can be applied to data sets which relate to textual variation in order to explore relationships among the witnesses.

Analysis might begin from a number of starting points. One is a data set derived from a critical apparatus which gives attestations (i.e. lists of witnesses) in support of variants found at variation sites. In nearly all cases, practical considerations restrict an apparatus to presenting a sample of extant texts. Results obtained by analysis of these data sets are therefore provisional because it is always possible that including further data would produce different results. However, it is reasonable to expect that analysis results will approximate those that would be obtained if a more comprehensive data set were analysed provided that the sample is sufficiently large and has been selected without systematic bias.

The information contained in an apparatus must first be encoded as illustrated by reference to this entry from the fourth edition of the United Bible Societies' Greek New Testament (UBS4):


The data sets presented in this article use a number of encoding conventions. Exotic characters and superscripts can cause problems when plotting analysis results so witness identifiers (i.e. sigla) are Romanized and superscripts are replaced by hyphenated sequences of characters. Apart from these changes, the method of identifying witnesses used by the source of a data set is usually retained. Be warned, dear reader: this approach is liable to cause confusion when two sources use different identifiers for the same witness. For example, Codex Sinaiticus may be identified as Aleph or 01. Also, the critically established text used in the INTF's Editio Critica Maior may be referred to as A (for Ausgangstext), making it easy to confuse with the A often used to represent Codex Alexandrinus.

When it comes to encoding apparatus entries, the textual states found among the witnesses can be represented by numerals, letters, or other symbols. In the present example, the first variant is encoded as 1, the second as 2, and so on. The state of a witness is classified as undefined and encoded as NA (for not available) when it is not clear which variant the witness supports. For manuscripts this may be due to physical damage or because the manuscript does not include the section of text being examined; for versions, it may not be clear which state of the Greek text is supported by a back-translation of the version; for patristic citations, the reading of a Church Father's text may be unclear if the quotations are not exact (e.g. adaptations, allusions, or quotations from memory) or if different witnesses of the Church Father's text have different readings. In the present example, a number of versions (Latin, Syriac, Coptic) and patristic citations (e.g. those of Irenaeus, Ambrose, Chromatius, Jerome, and Augustine) are treated as undefined because it is not clear which variant each supports at this variation site.[3]


Encoded variants are entered into a data matrix which has a row for every witness and a column for every variation site. The appropriate code is entered at the cell corresponding to a particular witness and variation site, namely that cell located at the intersection of the witness row and variation site column. Manuscript correctors are treated as separate witnesses, as are supplements.


The next step is to construct a distance matrix which tabulates the simple matching distance between each pair of witnesses sufficiently represented in the data set. The simple matching distance between two witnesses is the proportion of disagreements between them in those variation units where the textual states of both are defined. Being a ratio of two pure numbers, this quantity is dimensionless (i.e. has no unit). It varies from a value of zero for complete agreement to a value of one for no agreement.[4] A witness only qualifies for inclusion in a distance matrix if all distances for that witness are calculated from at least a minimum number of variation units. This constraint is intended to reduce sampling errors to a tolerable level. The minimum required number for the distance matrices of this study is usually set at fifteen.[5]
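The calculation is straightforward to sketch in code. The fragment below is an illustrative Python rendering (the scripts used for this study are written in R), with hypothetical witness names and encoded readings:

```python
# Simple matching distance: the proportion of disagreements over those
# variation units where both witnesses have a defined state (NA = undefined).
# Witness names and encoded readings here are hypothetical examples.

NA = None

data_matrix = {
    "Aleph": [1, 1, 2, 1, NA],
    "B":     [1, 1, 2, 2, 1],
    "D":     [2, NA, 1, 2, 1],
}

def simple_matching_distance(x, y, min_units=1):
    # Keep only the variation sites where both textual states are defined.
    pairs = [(a, b) for a, b in zip(x, y) if a is not NA and b is not NA]
    if len(pairs) < min_units:  # this study usually requires fifteen units
        return NA  # too few units for a reliable distance
    disagreements = sum(1 for a, b in pairs if a != b)
    return disagreements / len(pairs)

d = simple_matching_distance(data_matrix["Aleph"], data_matrix["B"])
# Aleph and B are both defined at the first four sites and disagree only
# at the fourth, so the distance is 1/4 = 0.25.
```

Applying the same function to every qualifying pair of witnesses yields the distance matrix.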


Various analytical methods can be applied to a data set derived from a critical apparatus to explore relationships between witnesses. All of the results presented in this article are obtained using a statistical computing language called R. The analysis is performed by means of R scripts written by the author which are available here. The R program and additional packages (e.g. cluster, rgl, ape) required to run the scripts can be installed using instructions provided at the R web site.

Readers are encouraged to use the scripts. There are various ways to run a script once the R environment is installed. For users who prefer a command line interface, typing R into a terminal window provides an R prompt. (It helps to change to the directory which holds the scripts before launching R.) A command can then be entered in order to run a script. As an example, the command source("dist.r") typed at the R prompt causes the dist.r script to construct a distance matrix from the specified data matrix. Parameters such as paths to input and output files are specified in the scripts, which users are free to edit.

The data sets analysed in this article derive from various sources. Each source is assigned an identifier based on the author or party who produced it. A source is often used to produce data sets for a number of New Testament sections such as individual gospels and letters. Each analysis result is keyed to the relevant section and source identifier so that its underlying data set can be identified.

The data sets generally retain the symbols used by their associated sources to represent New Testament witnesses. Some represent manuscripts by Gregory-Aland numbers (e.g. 01, 02, 03, 044) while others use letters or latinized forms (e.g. Aleph, A, B, Psi). These symbols carry through to the analysis results. In INTF data, ECM or A (for Ausgangstext or initial text) represents the text of the Editio Critica Maior. The A for Ausgangstext in INTF data sets should not be confused with the A for Codex Alexandrinus in other data sets. Abbreviations UBS, WH, and TR stand for the texts of the United Bible Societies' Greek New Testament, Westcott and Hort's New Testament in the Original Greek, and the Textus Receptus, respectively. Maj, Byz, and Lect stand for majority, Byzantine, and lectionary texts, respectively. The relevant printed editions should be consulted for explanations of what these group symbols represent.

A source may be in the form of apparatus entries, tables of percentage agreement, or lists of pairwise proportional agreement. If the source is an apparatus then it is used to construct one data matrix per desired section. Each data matrix includes those witnesses and variation sites covered by the apparatus, using symbols such as numerals or letters to encode reported textual states (i.e. readings). A distance matrix is then constructed from the data matrix. If the source only reports percentage or proportional agreement between witnesses then a distance matrix is constructed directly from the agreement data and no data matrix is produced. Distances are usually specified to three decimal places regardless of whether this level of precision is warranted.
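Where a source reports percentage agreement, the conversion to distances is a one-line transformation, sketched below in illustrative Python; the agreement figures are hypothetical:

```python
# Sketch: converting a table of percentage agreement into distances.
# A percentage agreement p corresponds to a distance of 1 - p/100,
# reported to three decimal places. The figures below are hypothetical.

agreement = {
    ("Aleph", "B"): 87.5,
    ("Aleph", "D"): 62.3,
    ("B", "D"): 60.0,
}

distance = {pair: round(1 - pct / 100, 3) for pair, pct in agreement.items()}
```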

Analysis cannot proceed if a distance matrix has missing entries. This problem can be avoided by manually producing multiple distance matrices from the same source data, each omitting a particular witness whose inclusion would create an empty cell. This is done for a number of the distance matrices presented below, including Brooks' table for John (where there is a missing cell for C and Old Latin j) and Fee's tables for John 1-8 (which lack cells where the first hand and corrector intersect for P66 and Aleph).

It is helpful to know what analysis results look like when there is no clustering among the objects being analysed. (Generic terms such as object, observation, case, or item may be used for the things being compared when they are not necessarily New Testament witnesses.) We have a natural facility for recognising group structure but are also prone to mistake a purely random distribution of items for a cluster. One way to avoid this kind of error is to be familiar with analysis results produced from a data set that has no group structure. With this purpose in mind, a control data set may be generated which is analogous to its model data set in various respects (e.g. number of objects, number of variables, mean distance between objects) but has no actual clustering among its objects.

A control data set is generated by performing c trials to randomly select one of two possible states (1 and 2) then repeating this r times to produce a data matrix with r rows of objects and c columns of variables. The generator aims to produce objects which have a mean distance of d between them. Values for r, c, and d are derived from the model: r is the number of objects in the model distance matrix; c is the rounded mean number of variables in the objects from which the model distance matrix was calculated; and d is the mean of distances in the model distance matrix. The control data matrix is then used to calculate a control distance matrix which has the same number of objects as the model and approximately the same mean distance between objects.[6]
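The generation procedure can be sketched as follows. This Python sketch assumes one plausible mechanism for hitting the target mean distance d, namely biasing the probability p of drawing the first state so that 2p(1-p) = d; the actual generator may differ in detail:

```python
# Sketch of a control data generator: r objects, c binary variables,
# target mean distance d. If each site draws state 1 with probability p,
# two independent objects disagree at a site with probability 2p(1-p);
# solving 2p(1-p) = d gives a bias that yields mean distance d.
# (One plausible implementation; the original generator may differ.)
import math
import random

def control_data_matrix(r, c, d, seed=0):
    rng = random.Random(seed)
    p = (1 - math.sqrt(1 - 2 * d)) / 2  # requires d <= 0.5
    return [[1 if rng.random() < p else 2 for _ in range(c)]
            for _ in range(r)]

def mean_distance(matrix):
    # Mean simple matching distance over all pairs of rows.
    n, c = len(matrix), len(matrix[0])
    dists = [sum(a != b for a, b in zip(matrix[i], matrix[j])) / c
             for i in range(n) for j in range(i + 1, n)]
    return sum(dists) / len(dists)

control = control_data_matrix(r=40, c=200, d=0.45, seed=1)
# mean_distance(control) should land close to the target of 0.45
```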

The binomial distribution predicts the range of distances expected to occur between pairs of objects generated in this way. A 95% confidence interval is the range of distances expected to occur for 95% of randomly generated cases. Only 5% of distances between two randomly generated objects fall outside the upper and lower limits defined by this interval. A distance outside this range, either less or more, is statistically significant in the sense that it is unlikely to happen by chance (though there is a 5% chance it will). A distance outside the normal range defined by the 95% confidence interval indicates an adjacent or opposite relationship between two objects: adjacent if the distance is less than normal and opposite if greater.[7]

While distances outside the normal range are unlikely to occur by chance, a distance inside that range does not necessarily imply lack of relationship between two objects: a relationship between the two may exist but it is not possible to say so with confidence. The relative size of the normal range contracts as the number of places compared increases so a distance which is not statistically significant in one data set may be statistically significant in another which includes more variation sites.
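The normal range and the resulting classification of distances can be computed directly from the binomial distribution. The following Python sketch uses hypothetical parameters (c compared variation units, chance d of disagreement per unit):

```python
# Sketch: the normal range of distances between unrelated objects. With
# c compared variation units and a chance d of disagreement at each, the
# number of disagreements is Binomial(c, d); the 95% confidence interval
# for the distance is the corresponding quantile range divided by c.
# The parameter values below are hypothetical.
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def distance_interval(c, d, coverage=0.95):
    # Smallest k whose cumulative probability reaches each tail.
    lo_tail, hi_tail = (1 - coverage) / 2, 1 - (1 - coverage) / 2
    cdf, lo, hi = 0.0, None, None
    for k in range(c + 1):
        cdf += binom_pmf(k, c, d)
        if lo is None and cdf >= lo_tail:
            lo = k
        if hi is None and cdf >= hi_tail:
            hi = k
    return lo / c, hi / c

def classify(distance, interval):
    lower, upper = interval
    if distance < lower:
        return "adjacent"     # significantly closer than chance
    if distance > upper:
        return "opposite"     # significantly farther than chance
    return "not significant"  # within the normal range

interval = distance_interval(c=120, d=0.46)
```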

The following table presents the data sets and their sources. Links in the table provide access to data and distance matrices which are formatted as comma-separated values (CSV) files so that they can be downloaded and imported into a spreadsheet program. A distance matrix is always provided but a data matrix is only included if one has been constructed. If there is no data matrix then NA for not available is entered in the relevant column.

Table 2. Data sets and their sources

Source Description Section Data matrix Distance matrix
Brooks Tables of percentage agreement from James Brooks' New Testament Text of Gregory of Nyssa covering: Matthew (table 1, 58-9); Luke (table 7, 90-1); John (table 13, 138-9); and Paul's Letters (table 18, 254-5). These were transcribed by Richard Mallett. Matthew NA
Luke NA
John (C) NA
John (it-j) NA
Paul's Letters NA
CB Data matrices for each Gospel compiled by Richard Mallett using Comfort's New Testament Text and Translation Commentary and Comfort and Barrett's Text of the Earliest New Testament Greek Manuscripts. Matthew
Mark
Luke
John
Cosaert Data matrices for each Gospel compiled from apparatus entries in Carl P. Cosaert's Text of the Gospels in Clement of Alexandria. Matthew
Mark
Luke
John
Cunningham Tables of percentage agreement for the Gospel of John and Paul's Letters from Arthur Cunningham's New Testament Text of St. Cyril of Alexandria, 421-2 and 753. John NA
Paul's Letters NA
Donker Data matrices for Acts, the General Letters, and Paul's Letters from Gerald Donker's Text of the Apostolos in Athanasius of Alexandria. Gerald Donker and the SBL have made this data available through an archive located at sbl-site.org/assets/pdfs/pubs/Donker/Athanasius.zip. May their respective tribes increase! Acts (all)
Acts 1-12
Acts 13-28
General Letters
Paul's Letters
Romans
1 Corinthians
2 Cor. - Titus
Hebrews
EFH Data used by Jared Anderson for his ThM thesis, Analysis of the Fourth Gospel in the Writings of Origen. The data was originally collected by Bart D. Ehrman, Gordon D. Fee, and Michael W. Holmes for their Text of the Fourth Gospel in the Writings of Origen. (Bruce Morrill did the statistical analysis presented in that volume.) A revised version of Anderson's thesis will be published in SBL's New Testament in the Greek Fathers series. John
Ehrman Table of percentage agreement for the Gospel of Matthew from Bart Ehrman's Didymus the Blind and the Text of the Gospels. This was transcribed by Richard Mallett. Matthew NA
Fee Tables of percentage agreement from three articles by Gordon Fee: (1) a table covering Luke 10 from The Myth of Early Textual Recension in Alexandria; (2) tables covering John 1-8, John 4, and John 9 from Codex Sinaiticus in the Gospel of John; (3) another table covering John 4 but including patristic data from The Text of John in Origen and Cyril of Alexandria. Two distance matrices are produced for each table of percentage agreement with a blank entry for agreement between the first hand and corrector of a manuscript. Luke 10 NA
John 1-8 NA
John 1-8 (corr.) NA
John 4 NA
John 4 (corr.) NA
John 4 (pat.) NA
John 4 (pat., corr.) NA
John 9 NA
John 9 (corr.) NA
Hurtado Tables of percentage agreement from Larry Hurtado's Text-Critical Methodology and the Pre-Caesarean Text. There is one table for each of the first fourteen chapters of the Gospel of Mark, one for Mark 15.1-16.8, and another for places where P45 is legible. Data from an augmented version of Hurtado's P45 table is presented below in the Mullen source entry. Mark 1 NA
Mark 2 NA
Mark 3 NA
Mark 4 NA
Mark 5 NA
Mark 6 NA
Mark 7 NA
Mark 8 NA
Mark 9 NA
Mark 10 NA
Mark 11 NA
Mark 12 NA
Mark 13 NA
Mark 14 NA
Mark 15.1-16.8 NA
Mark (P45) NA
INTF-General Distance matrices derived from information in a database related to the INTF's Novum Testamentum Graecum: Editio Critica Maior: Catholic Letters volumes. The INTF kindly granted access to this data. James NA
1 Peter NA
2 Peter NA
1 John NA
2 John NA
3 John NA
Jude NA
INTF-Parallel Distance matrices made from tables located at http://intf.uni-muenster.de/PPApparatus/. These present data related to Strutwolf and Wachtel (eds.), Novum Testamentum Graecum: Editio Critica Maior: Parallel Pericopes. The INTF has generously provided open access to this data. Matthew
Mark
Luke
John
Mullen Data extracted from Roderic Mullen's The New Testament Text of Cyril of Jerusalem. Two data sets have been prepared for the Gospel of Mark: one is a data matrix based on citations isolated by Mullen (112-7); the other is a distance matrix corresponding to a table of percentage agreement which relates to the parts of Mark's Gospel covered by P45 (41). Mullen based the latter on data compiled by Larry Hurtado then added other texts such as Family 1, 28, 157, and 700 (40, n. 81). Mark
Mark (P45) NA
Osburn Tables of percentage agreement for Acts and Paul's Letters from Carroll Osburn's Text of the Apostolos in Epiphanius of Salamis. Richard Mallett transcribed these tables. Acts NA
Paul's Letters NA
Racine Table of percentage agreement for Matthew's Gospel from Jean-François Racine's Text of Matthew in the Writings of Basil of Caesarea. This was transcribed by Richard Mallett. Matthew NA
Richards Table of percentage agreement from W. L. Richards' Classification of the Greek Manuscripts of the Johannine Epistles (72, 76-84). 1 John NA
UBS2 Tables of percentage agreement compiled from the apparatus of the second edition of the UBS Greek New Testament by Maurice A. Robinson. The tables were originally presented in Robinson's Determination of Textual Relationships and Textual Interrelationships. They were transcribed by Claire Hilliard and Kay Smith. Matthew NA
Mark NA
Luke NA
John NA
Acts NA
UBS4 Data matrices constructed from the apparatus of the fourth edition of the UBS Greek New Testament. Richard Mallett constructed the matrices for Mark, 2 Corinthians, and Revelation. A substantial part of the matrix for Matthew was encoded by Mark Spitsbergen. (Only the first fourteen chapters of Matthew are presently covered.) The UBS4 apparatus includes minuscule 2427 which is now regarded as a forgery. The data for this manuscript has been retained for the sake of interest; dropping it would have little effect on analysis results. In some cases, the evidence for a number of similar witnesses is consolidated to produce a group variant. For example, the majority reading of vg-cl, vg-st, and vg-ww is counted as the reading of the Vulgate (vg) in 1 John. Matthew 1-14
Mark
1 Peter
1 John
2 Corinthians
Hebrews
Revelation
Wasserman Tables of proportional agreement from Tommy Wasserman's Patmos Family of New Testament MSS covering Matt 19.13-26, Mark 11.15-26, Luke 13.34-14.11, John 6.60-7.1, and the Pericope Adulterae (usually John 7.53-8.11). The underlying collations used a reconstructed text to represent Family Π in Matt 19.13-26 and the Pericope Adulterae, which text is labelled f-Pi in the analysis results. Matthew 19.13-26 NA
Mark 11.15-26 NA
Luke 13.34-14.11 NA
John 6.60-7.1 NA
PA NA
Controls Data sets based on randomly generated objects which are analogous to their model data sets. Objects in control data sets are by definition unrelated. Mark, UBS4

This study presents results obtained by applying the following analysis methods to the data sets:

  • ranked distances

  • classical multidimensional scaling (CMDS)

  • divisive clustering (DC)

  • neighbour joining (NJ)

  • partitioning around medoids (PAM).

These analysis modes will now be introduced by reference to two data sets:

  1. a model derived from the UBS4 apparatus for the Gospel of Mark

  2. a control comprised of randomly generated objects which are by definition unrelated.

Clusters may be isolated by inspecting a CMDS or NJ plot, cutting a DC dendrogram, or producing a partition using PAM analysis. Similar objects tend to be similar distances from a reference object, near each other in a CMDS plot, in the same branch of DC and NJ plots, and in the same group of a PAM partition. The more eccentric an object when compared to others in the data set, the more isolated it will appear in analysis results. If an object is mixed, being comprised of a mixture of states characteristic of differing groups, then a CMDS result will locate it between the relevant groups, proportionally closer to those whose characteristics it most often contains. In DC, NJ, and PAM analysis, a slight change in the distance matrix can cause a mixed object to leap from one branch, cluster, or group to another.

The respective analysis results are often but not always consistent. If all of the analysis results point to the same conclusion with respect to implied clustering then that can be taken as a firm result; if they differ then each result needs to be handled with due caution. The distance matrix remains the final arbiter when the affiliation of an object is not clearly indicated by concurrence of analysis results. When the classification of an object is uncertain, further information may produce a more definite result. However, if an object has a mixed nature then it may remain difficult to classify as anything but a mixture. A mixed object will tend to be isolated unless other objects happen to have similar mixtures of states.

One aim of New Testament textual research is to recover the initial text, namely the common ancestor of extant New Testament texts. Some aspects of the results produced by the analysis modes used in this study can be interpreted in terms of temporal development. In particular, there may be points of contact between the family tree of New Testament texts and the tree-like structures produced by divisive clustering and neighbour joining. However, these tree-like analysis results do not provide unequivocal guidance on the location of the initial text. Any node (i.e. junction) or leaf (i.e. terminal) of a DC or NJ tree could be closest to the initial text. If one were to make a string model of such a tree, with knots at every node tying together string segments of the appropriate lengths, the model could be picked up at any node or leaf. The point being held would become a new tree root so there would be as many possible trees as the number of nodes and leaves. The trick is to decide where the root of the tree is located, a topic which will occupy the field of New Testament textual research for some time to come. The Coherence-based Genealogical Method (CBGM) developed by the INTF can be used to investigate whether the witnesses in one branch are closer to the initial text than those in another.[8] Phylogenetic techniques such as those described in Spencer, Wachtel and Howe's Greek Vorlage of the Syra Harclensis can also be used to investigate the priority of texts. Yet another possibility is to see where texts reconstructed from early patristic citations are located in trees produced by DC and NJ analysis.

Ranking involves selecting a reference object then extracting its row of the distance matrix. Entries in that row are then ordered by increasing distance from the reference. As an example, the following ranks witnesses in the UBS4 data set for the Gospel of Mark by distance from minuscule 205, which is a member of Family 1. The reference witness (i.e. 205) is a distance of zero from itself and would stand at the head of the list if included.[9]
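In code, ranking amounts to sorting one row of the distance matrix. The Python fragment below is illustrative; the distances shown are hypothetical rather than the actual row for minuscule 205:

```python
# Sketch: rank witnesses by distance from a reference witness, flagging
# distances that fall inside the normal range expected between unrelated
# objects. The row of distances and the bounds used here are hypothetical.
distances_from_ref = {
    "209": 0.118, "1582": 0.134, "f-1": 0.247,
    "Byz": 0.385, "B": 0.472, "D": 0.561,
}
normal_range = (0.374, 0.553)  # hypothetical lower and upper 95% bounds

ranked = sorted(distances_from_ref.items(), key=lambda kv: kv[1])
flagged = [(w, d, normal_range[0] <= d <= normal_range[1])
           for w, d in ranked]
# Witnesses below the lower bound are adjacent to the reference, those
# above the upper bound are opposite, and the rest are flagged True as
# not statistically significant.
```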


Statistical analysis shows what range of distances is expected to occur between artificial objects comprised of randomly selected states. Distances in this normal range (i.e. those for 1243, slav, ..., Psi) are marked by asterisks to show they are not statistically significant. Some texts (i.e. f-1, Lect, ..., 597) have an adjacent relationship to minuscule 205 while others (i.e. Delta, cop-bo, ..., D) are opposite.

A ranked list of distances from one member of the control data set shows what to expect for unrelated objects. The 95% confidence interval calculated using the binomial distribution with parameters derived from the model data set has lower and upper bounds of 0.374 and 0.553, respectively. (An interval of this kind can be compactly written as [0.374, 0.553].) As can be seen, distances in the control data set tend to fall within these bounds.[10]


A list of ranked distances can be produced for every object in a data set. While clustering among members of the data set might be discerned from lists of this kind, the other analysis modes are better suited to discovering inherent group structure.

Classical multidimensional scaling finds the set of object coordinates which best reproduces the actual distances between objects in the distance matrix. A plot of these coordinates shows how the objects are disposed with respect to one another when all distances are considered. This study refers to such a plot as a map and uses the term textual space for the space obtained when the objects are textual witnesses.

Achieving a perfect spatial representation of a distance matrix may require any number of dimensions up to one less than the number of objects. This presents a problem when a large number of objects is being examined because our spatial perception is three-dimensional. Fortunately, three dimensions is often sufficient to achieve a reasonably good approximation to the actual situation. CMDS analysis produces a coefficient called the proportion of variance which indicates how much of the information contained in a distance matrix is explained by the associated map. This coefficient ranges from a value of zero to one, with a value of one indicating that the map is a perfect representation of the entire set of actual distances.
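The results in this study are produced with R's cmdscale function; for readers who want to see the machinery, the classical (Torgerson) procedure can be sketched in Python with NumPy. It double-centres the squared distances, extracts the leading eigenvectors, and reports the proportion of variance. The four-object distance matrix below is hypothetical and happens to be exactly embeddable in two dimensions:

```python
# Sketch of classical MDS: double-centre the squared distances, take the
# top k eigenvectors as coordinates, and report the proportion of
# variance explained by those k dimensions.
import numpy as np

def cmds(D, k=3):
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1]               # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]
    pos = np.clip(vals[:k], 0, None)
    coords = vecs[:, :k] * np.sqrt(pos)
    # proportion of variance captured by the first k dimensions
    pov = pos.sum() / np.clip(vals, 0, None).sum()
    return coords, pov

# Hypothetical distances forming a 0.3 x 0.4 rectangle (diagonal 0.5),
# so a two-dimensional map reproduces them perfectly.
D = np.array([
    [0.0, 0.3, 0.4, 0.5],
    [0.3, 0.0, 0.5, 0.4],
    [0.4, 0.5, 0.0, 0.3],
    [0.5, 0.4, 0.3, 0.0],
])
coords, pov = cmds(D, k=3)
```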

The CMDS map obtained from the UBS4 data set for Mark's Gospel shows that the textual space formed by New Testament witnesses has structure. The associated proportion of variance figure is 0.51, meaning that about half of the entire distance information is captured in the plot.[11]


The galactic imagery Eldon J. Epp uses to describe text-types seems apt for the clusters evident in this analysis result:

A term such as group, cluster, or nucleus might be used to describe a local maximum in the density of objects within a CMDS map. A line which joins two items might be called a trajectory, and a region between groups where there is a higher than usual concentration of witnesses might be called a stream or corridor.[13]

CMDS analysis of the control distance matrix produces the following result:


Any appearance of clustering in the control map is illusory: its objects are by definition unrelated, having been randomly generated. There are various differences between the model and control maps. The model map has an irregular shape while the control map is globular. Another difference relates to the respective map sizes: the volume enclosed by the map axes is greater for the model than for the control. This indicates that dispersion among New Testament texts is greater than would be expected if those texts resulted from random selection among alternative readings. Yet another difference is the proportion of variance figures for the model and control maps, which are respectively 0.51 and 0.16. The dimensionality of the New Testament distance data is lower than for the control data, making it easier to squeeze into only three dimensions.

Divisive clustering begins with a single cluster and ends with individual objects. The R program documentation describes the clustering algorithm as follows:[14]

At each stage, the cluster with the largest diameter is selected. (The diameter of a cluster is the largest dissimilarity between any two of its observations.) To divide the selected cluster, the algorithm first looks for its most disparate observation (i.e., which has the largest average dissimilarity to the other observations of the selected cluster). This observation initiates the "splinter group". In subsequent steps, the algorithm reassigns observations that are closer to the "splinter group" than to the "old party". The result is a division of the selected cluster into two new clusters.

This type of analysis produces a dendrogram which shows the heights at which clusters divide into sub-clusters. A divisive coefficient which measures the amount of clustering is presented as well. The value of this coefficient ranges from zero to one with larger values indicating a greater degree of clustering. A DC dendrogram does not necessarily reflect the family tree of objects in the underlying data set. Instead, it merely shows a reasonable way to progressively subdivide an all-encompassing cluster until every sub-cluster is comprised of a single object.[15]
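The division step quoted above can be sketched directly. In this illustrative Python fragment the distance matrix is hypothetical, with two obvious groups:

```python
# Sketch of one division step of divisive clustering: seed a "splinter
# group" with the most disparate observation, then move across any
# observation closer (on average) to the splinter group than to the
# remaining "old party". The distances below are hypothetical.
def avg_dist(i, group, D):
    return sum(D[i][j] for j in group) / len(group)

def split_cluster(cluster, D):
    # Seed: the observation with the largest average dissimilarity.
    seed = max(cluster,
               key=lambda i: avg_dist(i, [j for j in cluster if j != i], D))
    splinter = [seed]
    old_party = [i for i in cluster if i != seed]
    moved = True
    while moved and len(old_party) > 1:
        moved = False
        for i in list(old_party):
            others = [j for j in old_party if j != i]
            if not others:
                break
            if avg_dist(i, splinter, D) < avg_dist(i, others, D):
                old_party.remove(i)
                splinter.append(i)
                moved = True
    return splinter, old_party

# Two clear groups: objects 0-2 are close to one another, 3-5 likewise.
D = [
    [0.0, 0.1, 0.1, 0.6, 0.6, 0.6],
    [0.1, 0.0, 0.1, 0.6, 0.6, 0.6],
    [0.1, 0.1, 0.0, 0.6, 0.6, 0.6],
    [0.6, 0.6, 0.6, 0.0, 0.1, 0.1],
    [0.6, 0.6, 0.6, 0.1, 0.0, 0.1],
    [0.6, 0.6, 0.6, 0.1, 0.1, 0.0],
]
splinter, old_party = split_cluster(list(range(6)), D)
```

Repeating the step on whichever cluster currently has the largest diameter, until every cluster holds a single object, yields the dendrogram.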


The vocabulary of tree structures is useful when discussing DC dendrograms. A branching point is called a node, each structure which descends from a node is called a branch, and terminals are called leaves. The dendrograms produced by analysing New Testament data have a self-similar character where, apart from scale, smaller parts have the same appearance as larger parts. Each branch contains its own sub-branches, unless terminated by leaves (i.e. individual witnesses).

A partition based on a DC dendrogram is obtained by means of a horizontal line which cuts across the dendrogram at some height to produce a set of separate branches. One possible height to cut a DC dendrogram is the upper critical limit of distances. Such a large distance is seldom encountered among unrelated objects. Cutting at the upper critical limit produces the following partition of the model data set.[16]


Performing DC analysis on the control distance matrix produces this dendrogram:


The model and control dendrograms seem quite similar at first glance although there are important differences: nearly all of the branching heights in the control dendrogram are in the normal range [0.374, 0.553], and the divisive coefficient for the model (0.74) is much larger than that for the control (0.33).

The objects in the control can be grouped even though it is pointless to do so: if nearly all distances between objects fall within the normal range then partitioning may well be futile. In the present case, group sizes are more uniform for the control than for the model, although there is no reason why a data set with actual groups cannot have uniform group sizes.


Neighbour joining (NJ) is an iterative process that begins with a starlike tree. At every step a pair of neighbours is chosen, being that pair of objects which gives the smallest sum of branch lengths. A node is then inserted between this pair and treated as a single object in subsequent steps. The procedure seeks to find the minimum-evolution tree, being the tree which most economically accounts for the observed set of distances between objects. While the method produces a unique final tree under the principle of minimum evolution, it does not always produce the minimum-evolution tree. However, computer simulations show that it is quite efficient in obtaining the correct tree topology.[17]
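The pair-selection rule can be made concrete with the standard Saitou-Nei Q criterion, the usual formula for finding the pair that minimizes the total branch length. A Python sketch; the four-taxon distance matrix is an illustrative additive example, not New Testament data:

```python
def nj_pick_pair(D):
    """Select the next pair to join using the Saitou-Nei Q criterion:
    Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j), where r is the row sum.
    The pair with the smallest Q gives the smallest total branch length."""
    n = len(D)
    r = [sum(row) for row in D]
    best, pair = None, None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * D[i][j] - r[i] - r[j]
            if best is None or q < best:
                best, pair = q, (i, j)
    return pair

def nj_join(D, i, j):
    """Replace nodes i and j by a new internal node; the distance from the
    new node to each remaining node k is (d(i,k) + d(j,k) - d(i,j)) / 2."""
    keep = [k for k in range(len(D)) if k not in (i, j)]
    new = [(D[i][k] + D[j][k] - D[i][j]) / 2 for k in keep]
    out = [[D[a][b] for b in keep] for a in keep]
    for row, d in zip(out, new):
        row.append(d)
    out.append(new + [0.0])
    return out

# illustrative additive distances: taxa 0 and 1 form a cherry
D = [[0, 5, 9, 9],
     [5, 0, 10, 10],
     [9, 10, 0, 8],
     [9, 10, 8, 0]]
pair = nj_pick_pair(D)
D2 = nj_join(D, *pair)
```

Iterating these two steps until three nodes remain yields the unrooted NJ tree.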

As with DC dendrograms, the vocabulary of tree structures is useful for discussing NJ analysis results. The NJ procedure produces an unrooted tree, meaning that any node or terminal in the result could be closest to the common ancestor of the entire tree.

Applying the NJ procedure to the model distance matrix produces a tree whose branches correspond to clusters seen in the CMDS and DC results obtained from the same distance matrix:[18]


The tree obtained from the control distance matrix retains the NJ algorithm's initial starlike structure. This shows what kind of topology (i.e. shape) to expect for an NJ result derived from a data set comprised of unrelated objects. The marked difference from the model result is another indication that clustering exists among texts of Mark's Gospel.


Partitioning around medoids (PAM) builds clusters around representative objects called medoids. The program documentation provides this description:[19]

The ‘pam’-algorithm is based on the search for ‘k’ representative objects or medoids among the observations of the dataset. These observations should represent the structure of the data. After finding a set of ‘k’ medoids, ‘k’ clusters are constructed by assigning each observation to the nearest medoid. The goal is to find ‘k’ representative objects which minimize the sum of the dissimilarities of the observations to their closest representative object.
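The objective described in that quotation can be sketched in a few lines. The real PAM algorithm uses BUILD and SWAP phases to search efficiently; the sketch below instead does an exhaustive search over medoid sets, which shows the same objective on a toy data set (the one-dimensional data are illustrative only):

```python
from itertools import combinations

def k_medoids(D, k):
    """Find k medoids minimising the total distance of each observation
    to its nearest medoid -- the PAM objective. (Exhaustive search over
    medoid sets; the real algorithm uses BUILD and SWAP phases instead.)"""
    n = len(D)
    def cost(meds):
        return sum(min(D[i][m] for m in meds) for i in range(n))
    medoids = min(combinations(range(n), k), key=cost)
    # build clusters by assigning each observation to its nearest medoid
    clusters = {m: [i for i in range(n)
                    if min(medoids, key=lambda m2: D[i][m2]) == m]
                for m in medoids}
    return clusters

# illustrative one-dimensional data with two obvious groups
xs = [0, 1, 2, 10, 11, 12]
D = [[abs(a - b) for b in xs] for a in xs]
clusters = k_medoids(D, 2)
```

On this data the medoids come out as the central values 1 and 11, and each observation is assigned to the group of its nearest medoid.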

Plotting a statistic called the mean silhouette width (MSW) against each possible number of groups indicates which numbers of groups are more natural for the data set. In this case, the plot indicates that three, six, eleven, and twenty-four are among the more preferable numbers of groups.[20]


Using PAM analysis to divide the data set into these numbers of groups produces the following partitions:[21]


Brackets mark the medoid of each group. A medoid has the minimum mean distance to the other group members, making it the most central member of any group with three or more items. For two-member groups, the PAM algorithm chooses one of the two as the medoid.

[Note]Note

This study uses the bracketed medoid identifier as a label for the associated group. For example, [Psi] refers to the first group in the above three-way partition.

A singleton is a solitary item which forms its own group. It is isolated, not having any close relatives within the data set. The medoid of a singleton group is the sole member itself. The above table lists singletons under a separate heading so that the numbers of singletons and multiple member groups sum to the total number of groups in a partition. At other places in this study, a phrase such as "the rest" may be used instead of a list of identifiers when there are many singletons in a partition.

Not all members of a group need be a good fit. PAM analysis calculates a statistic called the silhouette width for each object in the data set being partitioned into a chosen number of groups. Its value ranges from +1 to -1: the closer it is to +1, the better the associated object fits into its assigned group; by contrast, the closer the statistic is to -1, the worse the fit. Like hammering square pegs into round holes (or vice versa), negative silhouette widths indicate that the affected objects are not well suited to their assigned places. The last column in the table lists witnesses with negative silhouette widths, putting those with the most negative values last; the worst-classified witnesses thus lie farthest to the right. A poor fit may indicate that a witness has a mixed text or that the chosen number of groups is too small for a text to be grouped with like texts alone.
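The silhouette width has a simple closed form: s(i) = (b - a) / max(a, b), where a is the mean distance from observation i to the other members of its own group and b is the smallest mean distance from i to any other group. A Python sketch, assuming every group has at least two members; the data and labels are illustrative:

```python
def silhouette_width(D, labels, i):
    """Silhouette width of observation i: (b - a) / max(a, b), where
    a = mean distance to the other members of i's own group and
    b = smallest mean distance to the members of any other group.
    Assumes every group has at least two members."""
    own = [j for j, g in enumerate(labels) if g == labels[i] and j != i]
    a = sum(D[i][j] for j in own) / len(own)
    b = min(sum(D[i][j] for j, g in enumerate(labels) if g == other)
            / labels.count(other)
            for other in set(labels) - {labels[i]})
    return (b - a) / max(a, b)

# illustrative: two well separated pairs, then one misassigned point
xs = [0, 1, 10, 11]
D = [[abs(p - q) for q in xs] for p in xs]
good = silhouette_width(D, [0, 0, 1, 1], 0)   # well placed: near +1
bad = silhouette_width(D, [1, 0, 1, 1], 0)    # wrong group: negative
```

The mean of this statistic over all observations is the MSW plotted against each candidate number of groups.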

As a data set is partitioned into larger numbers of groups, parent groups tend to spawn child groups while themselves contracting into narrower, more coherent groups. Group [Byz] is an example: as the same data set is partitioned into three, then six, then eleven, then twenty-four groups, this group contributes items to various other groups while retaining a core membership. Partitioning a data set into a large number of groups reveals coherent cores consisting entirely of closely related members.

Adding the partition's number of groups to the group label produces a more specific identifier. For example, [Byz] (3) refers to the group with medoid Byz in a three-way partition while [Byz] (24) refers to the group with medoid Byz in a 24-way partition. Corresponding groups such as [Byz] (3) and [Byz] (24) are often produced when the same data set is divided into different numbers of parts. However, the medoids of such groups are not necessarily the same. Adding or subtracting even a single member can cause the medoid of a group to change. Consequently, correspondence must be established on the basis of shared membership, not common medoids. If groups from different partitions have the same core membership but differing medoids then descendant groups can be labelled by chaining the respective medoids together. To give an example from the table above, [Psi] (3) and [B] (6) share members but their medoids differ. Labelling the subgroup as [Psi-B] (6) indicates the connection with the parent group from which its members are drawn.
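Correspondence between groups from different partitions can thus be checked mechanically by comparing memberships. A minimal sketch; the member lists below are hypothetical, chosen only to echo the [Psi] (3) / [B] (6) example:

```python
def overlap(group_a, group_b):
    """Shared-membership overlap of two groups from different partitions,
    as a Jaccard index: shared members over total distinct members.
    Correspondence is judged by membership, not by common medoids."""
    a, b = set(group_a), set(group_b)
    return len(a & b) / len(a | b)

# hypothetical member lists mirroring the [Psi] (3) / [B] (6) example:
# the child group keeps a core of the parent but its medoid differs
parent = ["Psi", "B", "Aleph", "33", "892"]
child = ["B", "Aleph", "33"]
score = overlap(parent, child)   # 3 shared members of 5 distinct
```

A high overlap score between groups with different medoids is precisely the situation where a chained label such as [Psi-B] (6) is useful.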

The MSW plot for the control data set has a number of peaks despite the underlying data set having no actual groups.


Comparing the model and control MSW plots reveals a great difference in the respective magnitudes of the MSW statistic. Even though the control data set is randomly generated and consequently contains no actual groups, there is nevertheless random clustering which accounts for the peaks seen in the associated MSW plot. The MSW plot for the control data set establishes a noise level: peaks with such small magnitudes are worthless as indicators of grouping.

Table 8. Analysis results

Section Source CMDS DC NJ
Matthew Brooks
CB
Cosaert
Ehrman
INTF-Parallel
Racine
UBS2
UBS4
Wasserman
Mark CB
Cosaert
Hurtado (Mk 1)
Hurtado (Mk 2)
Hurtado (Mk 3)
Hurtado (Mk 4)
Hurtado (Mk 5)
Hurtado (Mk 6)
Hurtado (Mk 7)
Hurtado (Mk 8)
Hurtado (Mk 9)
Hurtado (Mk 10)
Hurtado (Mk 11)
Hurtado (Mk 12)
Hurtado (Mk 13)
Hurtado (Mk 14)
Hurtado (Mk 15+)
Hurtado (P45)
Mullen
Mullen (P45)
INTF-Parallel
UBS2
UBS4
UBS4 (control)
Wasserman
Luke Brooks
CB
Cosaert
Fee (Lk 10)
INTF-Parallel
UBS2
Wasserman
John Brooks (C)
Brooks (it-j)
CB
Cosaert
Cunningham
EFH
Fee (Jn 1-8)
Fee (Jn 1-8, corr.)
Fee (Jn 4)
Fee (Jn 4, corr.)
Fee (Jn 4, pat.)
Fee (Jn 4, pat., corr.)
Fee (Jn 9)
Fee (Jn 9, corr.)
INTF-Parallel
UBS2
Wasserman
PA Wasserman
Acts Donker
Donker (Acts 1-12)
Donker (Acts 13-28)
Osburn
UBS2
General Letters Donker
James INTF-General
1 Peter INTF-General
UBS4
2 Peter INTF-General
1 John INTF-General
Richards
UBS4
2 John INTF-General
3 John INTF-General
Jude INTF-General
Paul's Letters Brooks
Cunningham
Donker
Osburn
Romans Donker
1 Corinthians Donker
2 Corinthians UBS4
2 Cor. - Titus Donker
Hebrews Donker
UBS4
UBS4 (B)
Revelation UBS4

These analysis results provide much food for thought. There are so many results with so many nuances that a comprehensive discussion might never be finished. Only the most salient points are therefore presented, under the headings of (1) New Testament sections, (2) patristic studies, and (3) special studies. Discussion is usually restricted to data sets with greater breadth (i.e. covering more kinds of witnesses) and depth (i.e. based on more variation sites) although other data sets may be covered as well. Neighbour-joining (NJ) and partitioning around medoids (PAM) results typically serve as discussion starters.

Various caveats apply:



  • Groups exist but are not always well defined.

  • NJ branches have points of contact with PAM divisions but the correspondence is sometimes weak.

  • PAM divisions have points of contact with traditional divisions. E.g. in the six-way partition: [cop] = Alexandrian; [K] = Byzantine; [arm] = Streeter's Eastern; [it-d] + [it-b] = Western; [it-aur] = Vulgate.

  • Some branches are associated with ancient versions.

  • If the sampled texts developed from a single initial text then it is reasonable to look for the initial text's nearest extant relatives where major branches of the NJ tree converge. Accordingly, key texts to consider when recovering the initial text of Matthew include 33, 892, and 1546.

  • One way to recover the initial text at every variation unit is to take the most frequent reading across a number of texts near the junction of major branches of the NJ tree. Another approach is to take the most frequent reading across group medoids. (Doing this for artificial texts generated by a copying simulation program recovers the initial text in better than eight out of ten cases.) Branches or groups that are known to be secondary (e.g. [it-aur] (6), namely the Vulgate group) can be eliminated from consideration before using these recovery procedures.





  • Key texts to consider when recovering initial text based on NJ result: 700, 892, Theta, Family 13, Peshitta Syriac, Diatessaron.

  • NJ result indicates that Syriac (except Harclean and Palestinian), Armenian, Georgian, and Latin occupy the same branch which is devoid of Greek MS support (except D).

  • 21-way partition has a cluster (W, Psi, Family 1, 565, 1009, 1079, 1365, 1546) centred on Basil (Cappadocia, d. 379). Did the writings of Church Fathers interfere with the biblical text? Monks might be expected to use theologically correct phrases when copying (cf. Ehrman). Some members of this cluster are in Streeter's Caesarean group in Mark.

  • D, it-d, and Eusebius associate in 21-way partition. This suggests a link between the D-text and the text used by Eusebius. Is this a clue to the provenance of Codex Bezae? (Spelling analysis would be interesting.)

PAM partitions of INTF Parallel Pericopes data set can be compared with classifications proposed by (1) von Soden and (2) Wisse.

Table 12. PAM (Luke, INTF-Parallel)

No. Groups and their [medoids] Singletons Poorly classified (worst last)
3
01 019 03 040 1241 579 [A] P75 011 013 017 02 021 0211 022 024 028 030 031 032 033 034 036 037 038 039 04 041 044 045 047 05 07 09 1 1009 1012 1071 1093 1110 118 1230 1253 1273 1279 1296 130 131 1326 1328 1329 1330 1331 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1421 1424 1446 1451 1457 150 1502 1506 1528 1555 157 1574 1579 1582 1593 16 1602 1604 1661 1675 1692 174 176 1780 18 1823 184 191 205 209 2193 22 222 233 2372 2411 2542 2546 2680 2726 273 2737 2766 2786 28 3 31 33 348 [35] 372 4 427 555 565 61 700 713 732 735 740 752 79 791 792 807 827 829 851 863 892 954 968 979 124 13 346 543 69 788 [826] 828 983
33 04 892
11
01 019 03 1241 579 [A] P75 011 013 0211 022 028 030 031 034 036 037 039 045 047 07 09 1110 1273 1296 130 1326 1331 1335 1340 1341 1343 1344 1347 1348 1424 1555 1604 1675 176 1780 22 2372 2546 273 [3] 4 732 740 791 017 02 [041] 1346 1421 2411 021 044 1009 1093 1230 1253 1328 1329 1330 1333 1334 1336 1337 1338 1339 1342 1345 150 1502 1506 1574 1602 1661 1692 174 18 1823 191 222 233 2680 2737 2766 2786 28 31 [35] 372 427 565 61 713 735 807 851 863 954 979 [024] 032 033 038 04 040 1071 157 33 700 79 792 892 1 118 131 [1582] 205 209 2193 2542 [1012] 1451 968 124 13 346 543 69 788 [826] 828 983 1279 1528 1579 16 [184] 2726 348 555 752 829 [1446] 1457 1593 827
05 044 1337 1241 1692 807 1342 1253 233 2680 954 892 851 61 979 713 1336 1093 1661 191 1329 1009 1339 1338 28 18 1334 1328 35 372 2737 31 157 1502 427 565 1333 33 1330 1602 1823 2766 150 174 021 792 1071 222 79 033 700 1345 04 032 040 038
41
011 013 0211 022 028 030 031 034 036 037 039 045 047 07 09 1110 1273 1296 130 1326 1331 1335 1340 1341 1343 1344 1347 1348 1555 1780 22 2372 273 [3] 4 732 791 017 02 [041] 1346 1421 2411 019 [040] 33 021 044 1230 1328 1329 1333 1334 1336 1338 1339 1345 150 1502 1506 1574 1602 174 18 1823 191 233 2737 2766 2786 28 31 [35] 372 427 565 61 713 735 807 851 954 979 [024] 032 [03] A P75 1 118 131 [1582] 205 209 2193 [1012] 1451 968 124 13 346 543 69 788 [826] 828 983 [1279] 1579 184 2726 348 555 752 829 1424 [1675] [1446] 1457 1593 827 1528 [16] 1604 [2546]
01 033 038 04 05 1009 1071 1093 1241 1253 1330 1337 1342 157 1661 1692 176 222 2542 2680 579 700 740 79 792 863 892 044 28 150 1502 427 565 1823 1602 31 2766 021 791 1333 1506 174 1345 191 954

  • [826] (i.e. Family 13) is highly coherent, hardly changing as the data set is partitioned into more groups. A two-way partition of this data set separates Family 13 from the rest.

  • Streeter classifies some members of [024] (11) (i.e. 032, 038, 157, 700, 1071) as Caesarean and others as Alexandrian (i.e. 04, 033, 040, 33, 892). Wisse classifies a number of [024] (11) members into his B group for one or more of his test passages (i.e. 032, 040, 33, 157, 700, 892).


[Note]Note

The comparison is based on Wisse's table of profile classifications.[22] Each PAM group is labelled by its medoid (e.g. [A], which stands for the Ausgangstext, not Codex Alexandrinus). Figures in parentheses (e.g. 3/4) give the proportion of witnesses in a PAM group that von Soden or Wisse place together in one of their groups. E.g. in the [1446] row, of four witnesses in the PAM group whose classifications by von Soden are given, three are in his Iκ group. In the von Soden column, I and A categories count all witnesses in the corresponding subgroups. In some cases, however, subgroups are specifically listed. For figures given in the Wisse column, each witness is assigned the majority classification across Wisse's three test passages (i.e. Luke chapters 1, 10, and 20). A witness is not counted if there is no majority classification. For von Soden and Wisse columns, entries are made only for those categories which include more than one witness from the corresponding PAM group.

  • In an 11-way partition, PAM groups [A], [041], [1582], [1012], [826], and [1446] correlate well with groups identified by von Soden and Wisse while PAM groups [3], [35], [024], and [184] do not. Group [184] would be a good match if compared with combinations of (1) von Soden's Iβ and Iφ and (2) Wisse's groups 16 and 1216.


Table 14. PAM (John, UBS2)

No. Groups and their [medoids] Singletons Poorly classified (worst last)
3
P66 P75 B C L X [33] syr-c cop arm geo Nonnus Cyprian Tertullian Cyril Origen Aleph-c W-supp A K Delta Theta Pi Psi 063 [Byz] Lect f-1 f-13 28 565 700 892 1009 1010 1071 1079 1195 1216 1230 1241 1242 1253 1344 1365 1546 1646 2148 2174 it-f it-q syr-p syr-pal syr-h goth eth Chrysostom Theodoret D it-a it-aur it-b [it-c] it-d it-e it-ff-2 it-l it-r-1 vg syr-s Diatessaron Hilary Eusebius
arm syr-c Tertullian geo Cyril vg X Eusebius cop Aleph-c
5
P66 P75 [B] C L Nonnus Origen W-supp A K Delta Theta Psi [063] Byz Lect f-1 f-13 28 565 700 892 1009 1010 1071 1079 1195 1216 1241 1242 1344 1365 1546 2148 syr-h [D] it-d syr-c X Pi 33 [1230] 1253 1646 2174 it-f it-q syr-s syr-p syr-pal cop goth arm geo eth Diatessaron Chrysostom Cyprian Theodoret Tertullian Cyril Eusebius Aleph-c it-a it-aur it-b [it-c] it-e it-ff-2 it-l it-r-1 vg Hilary
arm syr-c Cyprian syr-s it-f vg geo Diatessaron 33 syr-p Tertullian eth Cyril it-q X syr-pal Eusebius Chrysostom 1230 cop goth Aleph-c 1253 Pi 1646 2174 Theodoret
19
P66 P75 [B] C Origen W-supp A K Theta Psi [063] Byz Lect f-13 565 700 1010 1071 1079 1230 1241 1242 1344 1365 [D] it-d L [cop] X [33] Nonnus Aleph-c [Delta] 28 1253 2174 syr-p goth Chrysostom Pi f-1 892 1009 1195 1216 1546 1646 2148 syr-h [Theodoret] [it-a] it-b it-ff-2 it-r-1 Tertullian it-aur it-c [vg] it-e [Hilary] [it-f] it-q it-l syr-s [Cyprian] arm [geo]
syr-c syr-pal eth Diatessaron Cyril Eusebius 1010 Lect 1071 syr-p 700 W-supp 1365 1230 X goth f-13 1344 Tertullian Origen K Chrysostom Byz it-r-1 it-l 1079 1253 28 Nonnus Aleph-c it-ff-2 2174

  • ...

  • ...

  • ...

The analysis results presented here highlight variations between witnesses of the New Testament. This naturally raises the question of what difference the variations make to the meaning of the text. Many variations are of little consequence — whether an added or dropped article, a change of word order, or substitution of a synonymous phrase. Some variations have a larger effect, the two most extreme examples being Mark 16.9-20 and John 7.53-8.11 which are absent from a number of witnesses.

One way to convey how much difference the variations make is to provide translations of a number of textual varieties for the same section of text. The following table gives a parallel translation of four varieties of the first chapter of Mark, highlighting the variation sites identified in the fourth edition of the United Bible Societies Greek New Testament. This edition only presents a selection of textual variations.

The variation units presented in the UBS apparatus constitute a small proportion of the total number of variation units that exist. However, the ones given below should provide a reasonably good impression of how much the respective varieties of text differ in meaning. This is because the great majority of variations which are not presented in the UBS apparatus have only a slight semantic effect.

The textual varieties shown in the table consist of four clusters identified by reference to the DC dendrogram of UBS4 combined data for Mark:[24]

  • A: The mainly Byzantine cluster comprising A ... syr-pal

  • B: Aleph B C L Delta Psi 892 1342 cop-bo cop-sa it-k

  • C: W Theta f-1 28 205 565 arm geo syr-s

  • D: D it-a it-b it-c it-d it-ff-2 it-i it-q it-r-1

For each variation unit, the variant supported by a textual variety is taken to be the one that occurs most frequently among its members. To illustrate, suppose that a variation unit has three variants and that two witnesses in cluster C have the first, three have the second, and four have the third. The variant supported by cluster C would then be taken to be the third. For the purpose of this exercise, if a tie occurs then the supported variant is taken to be the one with the greatest tendency to isolate the variety.
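The majority rule just described can be sketched as follows. The example reproduces the three-variant illustration above; a tie is returned as None so that it can be resolved by the separate tie-breaking rule:

```python
from collections import Counter

def supported_variant(readings):
    """Variant supported by a textual variety: the reading occurring most
    frequently among its members. Returns None on a tie, which the text
    above resolves by a separate rule (greatest tendency to isolate)."""
    counts = Counter(readings)
    top = max(counts.values())
    winners = [r for r, c in counts.items() if c == top]
    return winners[0] if len(winners) == 1 else None

# the cluster C illustration: two members read variant 1, three read
# variant 2, and four read variant 3 -- the supported variant is the third
majority = supported_variant([1, 1, 2, 2, 2, 3, 3, 3, 3])
tied = supported_variant([1, 1, 2, 2])   # tie: to be broken separately
```

Applying this per variation unit to each of the four clusters yields the four columns translated in the table below.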

Table 16. Four-way parallel translation of Mark chapter one

Reference A B C D
1.1 The beginning of the good news about Jesus Christ, Son of God. The beginning of the good news about Jesus Christ, Son of God. The beginning of the good news about Jesus Christ. The beginning of the good news about Jesus Christ, Son of God.
1.2 As written in the prophets, "Look, I send my messenger before you, who will prepare your way;" As written in the prophet Isaiah, "Look, I send my messenger before you, who will prepare your way;" As written by Isaiah the prophet, "Look, I send my messenger before you, who will prepare your way;" As written in the prophet Isaiah, "Look, I send my messenger before you, who will prepare your way;"
1.3 "A voice shouting in the wilderness, 'Prepare the way of the Lord! Make his paths straight!'" "A voice shouting in the wilderness, 'Prepare the way of the Lord! Make his paths straight!'" "A voice shouting in the wilderness, 'Prepare the way of the Lord! Make his paths straight!'" "A voice shouting in the wilderness, 'Prepare the way of the Lord! Make his paths straight!'"
1.4 John appeared, baptizing in the wilderness and announcing a baptism of a changed attitude for forgiveness of wrong deeds. John the Baptist appeared in the wilderness, and [was] announcing a baptism of a changed attitude for forgiveness of wrong deeds. John the Baptist appeared in the wilderness, and [was] announcing a baptism of a changed attitude for forgiveness of wrong deeds. John appeared in the wilderness, baptizing and announcing a baptism of a changed attitude for forgiveness of wrong deeds.
1.5 They went out to him, all of the land of Judea and those of Jerusalem, and were baptized by him, confessing their wrong deeds. They went out to him, all of the land of Judea and those of Jerusalem, and were baptized by him, confessing their wrong deeds. They went out to him, all of the land of Judea and those of Jerusalem, and were baptized by him, confessing their wrong deeds. They went out to him, all of the land of Judea and those of Jerusalem, and were baptized by him, confessing their wrong deeds.
1.6 John was clothed [with] camel hair and a leather covering around his waist; he ate locusts and wild honey. John was clothed [with] camel hair and a leather covering around his waist; he ate locusts and wild honey. John was clothed [with] camel hair and a leather covering around his waist; he ate locusts and wild honey. John was clothed [with] camel hair and a leather covering around his waist; he ate locusts and wild honey.
1.7 He gave notice saying, "One more powerful than me comes after me, whose sandal straps I am not worthy to bend down and untie." He gave notice saying, "One more powerful than me comes after me, whose sandal straps I am not worthy to bend down and untie." He gave notice saying, "One more powerful than me comes after me, whose sandal straps I am not worthy to bend down and untie." He gave notice saying, "I baptize you in water. One more powerful than me comes after me, whose sandal straps I am not worthy to bend down and untie."
1.8 "I baptize you in water; he will baptize you in the Holy Spirit." "I baptize you [in] water; he will baptize you in the Holy Spirit." "I baptize you in water; he will baptize you in the Holy Spirit." "He will baptize you in the Holy Spirit."
1.9 In those days Jesus came from Nazareth, Galilee, and was baptized in the Jordan by John. In those days Jesus came from Nazareth, Galilee, and was baptized in the Jordan by John. In those days Jesus came from Nazareth, Galilee, and was baptized in the Jordan by John. In those days Jesus came from Nazareth, Galilee, and was baptized in the Jordan by John.
1.10 Then coming up from the water he saw the heavens being torn open and the Spirit coming down to him like a dove. Then coming up from the water he saw the heavens being torn open and the Spirit coming down to him like a dove. Then coming up from the water he saw the heavens being torn open and the Spirit coming down to him like a dove; Then coming up from the water he saw the heavens being torn open and the Spirit coming down to him like a dove;
1.11 There came from the heavens a voice: "You are my beloved Son; I am delighted with you." There came from the heavens a voice: "You are my beloved Son; I am delighted with you." from the heavens he heard a voice: "You are my beloved Son; I am delighted with you." from the heavens a voice: "You are my beloved Son; I am delighted with you."
1.12 Then the Spirit drives him into the wilderness. Then the Spirit drives him into the wilderness. Then the Spirit drives him into the wilderness. Then the Spirit drives him into the wilderness.
1.13 He was in the desert forty days being tested by Satan; he was with the wild animals and the angels waited on him. He was in the desert forty days being tested by Satan; he was with the wild animals and the angels waited on him. He was in the desert forty days being tested by Satan; he was with the wild animals and the angels waited on him. He was in the desert forty days being tested by Satan; he was with the wild animals and the angels waited on him.
1.14 After John had been arrested, Jesus went into Galilee announcing the good news of the kingdom of God After John had been arrested, Jesus went into Galilee announcing the good news of God After John had been arrested, Jesus went into Galilee announcing the good news of God After John had been arrested, Jesus went into Galilee announcing the good news of the kingdom of God
1.15 saying, "The time has come and God's kingdom is near. Change your attitude and believe the good news." saying, "The time has come and God's kingdom is near. Change your attitude and believe the good news." saying, "The time has come and God's kingdom is near. Change your attitude and believe the good news." saying, "The time has come and God's kingdom is near. Change your attitude and believe the good news."
1.16 Passing by the Sea of Galilee he saw Simon and Andrew, Simon's brother, throwing a net into the sea. (They were fishermen.) Passing by the Sea of Galilee he saw Simon and Andrew, Simon's brother, throwing a net into the sea. (They were fishermen.) Passing by the Sea of Galilee he saw Simon and Andrew, Simon's brother, throwing a net into the sea. (They were fishermen.) Passing by the Sea of Galilee he saw Simon and Andrew, Simon's brother, throwing nets into the sea. (They were fishermen.)
1.17 Jesus said to them, "Come with me and I will make you into fishers of men." Jesus said to them, "Come with me and I will make you into fishers of men." Jesus said to them, "Come with me and I will make you into fishers of men." Jesus said to them, "Come with me and I will make you into fishers of men."
1.18 Then they left the nets and followed him. Then they left the nets and followed him. Then they left the nets and followed him. Then they left the nets and followed him.
1.19 Going a bit further he saw Jacob Zebedee and his brother John who were in the boat fixing the nets. Going a bit further he saw Jacob Zebedee and his brother John who were in the boat fixing the nets. Going a bit further he saw Jacob Zebedee and his brother John who were in the boat fixing the nets. Going a bit further he saw Jacob Zebedee and his brother John who were in the boat fixing the nets.
1.20 Then he called them. Leaving their father Zebedee in the boat with the hired hands, they went after him. Then he called them. Leaving their father Zebedee in the boat with the hired hands, they went after him. Then he called them. Leaving their father Zebedee in the boat with the hired hands, they went after him. Then he called them. Leaving their father Zebedee in the boat with the hired hands, they went after him.
1.21 They go into Capernaum. Then, on the Sabbath, having gone into the synagogue, he taught. They go into Capernaum. Then, on the Sabbath, having gone into the synagogue, he taught. They go into Capernaum. Then, on the Sabbath, having gone into the synagogue, he taught. They go into Capernaum. Then, on the Sabbath, having gone into the synagogue, he taught.
1.22 They were shocked by his teaching because he taught them like someone with authority, not like the scholars. They were shocked by his teaching because he taught them like someone with authority, not like the scholars. They were shocked by his teaching because he taught them like someone with authority, not like the scholars. They were shocked by his teaching because he taught them like someone with authority, not like the scholars.
1.23 Then there was a man with an unclean spirit in their synagogue. He screamed, Then there was a man with an unclean spirit in their synagogue. He screamed, Then there was a man with an unclean spirit in their synagogue. He screamed, Then there was a man with an unclean spirit in their synagogue. He screamed,
1.24 "What's with us and you, Jesus Nazarene? Have you come to destroy us? I know who you are — God's holy one!" "What's with us and you, Jesus Nazarene? Have you come to destroy us? I know who you are — God's holy one!" "What's with us and you, Jesus Nazarene? Have you come to destroy us? I know who you are — God's holy one!" "What's with us and you, Jesus Nazarene? Have you come to destroy us? I know who you are — God's holy one!"
1.25 Jesus told it off saying, "Be quiet! Get out of him!" Jesus told it off saying, "Be quiet! Get out of him!" Jesus told it off saying, "Be quiet! Get out of him!" Jesus told it off saying, "Be quiet! Get out of him!"
1.26 Throwing a fit and shouting with a loud voice, the unclean spirit got out of him. Throwing a fit and shouting with a loud voice, the unclean spirit got out of him. Throwing a fit and shouting with a loud voice, the unclean spirit got out of him. Throwing a fit and shouting with a loud voice, the unclean spirit got out of him.
1.27 All being shocked they asked each other, "What is this? What new teaching is this, that with authority he gives orders even to unclean spirits and they obey him?" All being shocked they asked each other, "What is this new teaching with authority? He gives orders even to unclean spirits and they obey him." All being shocked they asked each other, "What is this, this new teaching with authority? He gives orders even to unclean spirits and they obey him." All being shocked they asked each other, "What is that teaching, this new one with authority, that he gives orders even to unclean spirits and they obey him?"
1.28 The news about him then got out everywhere in the whole region of Galilee. The news about him then got out everywhere in the whole region of Galilee. The news about him then got out everywhere in the whole region of Galilee. The news about him then got out everywhere in the whole region of Galilee.
1.29 Then, leaving the synagogue, they went to Simon and Andrew's house with Jacob and John. Then, leaving the synagogue, they went to Simon and Andrew's house with Jacob and John. Then, leaving the synagogue, he went to Simon and Andrew's house with Jacob and John. Leaving the synagogue, he went to Simon and Andrew's house with Jacob and John.
1.30 Simon's mother-in-law lay sick with fever. Then they tell him about her. Simon's mother-in-law lay sick with fever. Then they tell him about her. Simon's mother-in-law lay sick with fever. Then they tell him about her. Simon's mother-in-law lay sick with fever. Then they tell him about her.
1.31 He went over, took hold of her hand, and helped her up. The fever left her and she began to wait on them. He went over, took hold of her hand, and helped her up. The fever left her and she began to wait on them. He went over, took hold of her hand, and helped her up. The fever left her and she began to wait on them. He went over, took hold of her hand, and helped her up. The fever left her and she began to wait on them.
1.32 In the evening after sunset they began to bring everyone who was suffering from sickness and the demonized. In the evening after sunset they began to bring everyone who was suffering from sickness and the demonized. In the evening after sunset they began to bring everyone who was suffering from sickness and the demonized. In the evening after sunset they began to bring everyone who was suffering from sickness and the demonized.
1.33 The whole town was gathered at the door. The whole town was gathered at the door. The whole town was gathered at the door. The whole town was gathered at the door.
1.34 He cured a lot who suffered a variety of sicknesses and got out a lot of demons. He did not allow the demons to speak because they had recognized him. He cured a lot who suffered a variety of sicknesses and got out a lot of demons. He did not allow the demons to speak because they had recognized him to be Christ. He cured a lot who suffered a variety of sicknesses and got out a lot of demons. He did not allow the demons to speak because they had recognized him to be Christ. He cured a lot who suffered a variety of sicknesses and got out a lot of demons. He did not allow the demons to speak because they had recognized him.
1.35 Getting up early while it was still dark, he left and went away to a deserted spot and prayed there. Getting up early while it was still dark, he left and went away to a deserted spot and prayed there. Getting up early while it was still dark, he left and went away to a deserted spot and prayed there. Getting up early while it was still dark, he left and went away to a deserted spot and prayed there.
1.36 Simon and those with him hunted him down. Simon and those with him hunted him down. Simon and those with him hunted him down. Simon and those with him hunted him down.
1.37 They find him and say to him, "Everyone is looking for you." They find him and say to him, "Everyone is looking for you." They find him and say to him, "Everyone is looking for you." They find him and say to him, "Everyone is looking for you."
1.38 He says to them, "Let's go somewhere else -- into the next towns -- so that I can campaign there too, because I came out for this." He says to them, "Let's go somewhere else -- into the next towns -- so that I can campaign there too, because I came out for this." He says to them, "Let's go somewhere else -- into the next towns -- so that I can campaign there too, because I came out for this." He says to them, "Let's go somewhere else -- into the next towns -- so that I can campaign there too, because I came out for this."
1.39 He was campaigning in their synagogues throughout Galilee, driving out demons too. He went campaigning in their synagogues throughout Galilee, driving out demons too. He was campaigning in their synagogues throughout Galilee, driving out demons too. He was campaigning in their synagogues throughout Galilee, driving out demons too.
1.40 A leper came towards him begging and kneeling to him, saying "If you want to you can make me clean." A leper came towards him begging and kneeling, saying "If you want to you can make me clean." A leper came towards him begging and kneeling, saying "If you want to you can make me clean." A leper came towards him begging, saying "If you want to you can make me clean."
1.41 Deeply moved, reaching out his hand he takes hold of him and says: "I want to. Be clean." Deeply moved, reaching out his hand he takes hold of him and says: "I want to. Be clean." Deeply moved, reaching out his hand he takes hold of him and says: "I want to. Be clean." Getting annoyed, reaching out his hand he takes hold of him and says: "I want to. Be clean."
1.42 Then the leprosy left him and he was cleansed. Then the leprosy left him and he was cleansed. Then the leprosy left him and he was cleansed. Then the leprosy left him and he was cleansed.
1.43 He told him off then sent him away. He told him off then sent him away. He told him off then sent him away. He told him off then sent him away.
1.44 He says to him, "Look, don't say anything to anyone. Instead, go off, show yourself to the priest, and offer what Moses commanded for your cleansing as proof to them." He says to him, "Look, don't say anything to anyone. Instead, go off, show yourself to the priest, and offer what Moses commanded for your cleansing as proof to them." He says to him, "Look, don't say anything to anyone. Instead, go off, show yourself to the priest, and offer what Moses commanded for your cleansing as proof to them." He says to him, "Look, don't say anything to anyone. Instead, go off, show yourself to the priest, and offer what Moses commanded for your cleansing as proof to them."
1.45 However, he went out and began much campaigning and spreading the word so that Jesus couldn't openly go into a city anymore but stayed outside in remote places. They came to him from everywhere. However, he went out and began much campaigning and spreading the word so that Jesus couldn't openly go into a city anymore but stayed outside in remote places. They came to him from everywhere. However, he went out and began much campaigning and spreading the word so that Jesus couldn't openly go into a city anymore but stayed outside in remote places. They came to him from everywhere. However, he went out and began much campaigning and spreading the word so that Jesus couldn't openly go into a city anymore but stayed outside in remote places. They came to him from everywhere.

Notes

  1. Sometimes the most frequently supported variants of the four varieties are all the same, as in Mark 1.6 where two witnesses from cluster D have leather instead of hair.

  2. A variation unit may affect more than one verse, as at Mark 1.7-8.

  3. The translation attempts to produce contemporary English while retaining the atmosphere of the Greek. Consequently, "change your attitude" is preferred to the archaic "repent," and "campaign" is preferred to the rarely used "proclaim" or less vivid "preach." The simple present is used to translate Mark's "historic present." (E.g. "He says to them...")

Isaac Newton said, "If I have seen further it is only by standing on the shoulders of giants." This sentiment truly applies to the results presented here. Our field owes a great debt to those who have compiled the information, both printed and electronic, upon which the data and distance matrices are based.

Compiling the basic data from which analysis proceeds is an arduous and painstaking task. Richard Mallett deserves special thanks in this respect, having encoded data matrices and transcribed tables of percentage agreement from numerous sources. Mark Spitsbergen helped to encode the UBS4 apparatus data for the first fourteen chapters of Matthew.

Maurice A. Robinson kindly provided tables of percentage agreement for the Gospels and Acts. These are derived from the apparatus of the second edition of the United Bible Societies' Greek New Testament. The exacting task of transforming the data into electronic format was performed by Claire Hilliard and Kay Smith.

A number of the results are produced from comprehensive data generously provided by the Institut für neutestamentliche Textforschung in Münster, Germany. Researchers at the INTF have spent many years on the gargantuan task of compiling this data. Holger Strutwolf, Klaus Wachtel, and Volker Krüger were instrumental in providing access to the data.

The analysis would scarcely have been possible without the marvellous R Language and Environment for Statistical Computing. Finally, thanks go to Gerald Donker for suggesting that the RGL plotting library be used to produce three-dimensional CMDS maps. He also encouraged me to take a less procrustean approach to missing data. As a consequence, the analysis results presented here include many more witnesses than they otherwise would.

A. Supplementary Information

This appendix provides supplementary information related to analysis results for the data sets:

Table A.1. Supplementary information

Section Source Proportion (CMDS) MSW plot (PAM)
Matthew Brooks 0.54
CB 0.73
Cosaert 0.60
Ehrman 0.63
INTF-Parallel 0.28
Racine 0.69
UBS2 0.35
UBS4 0.53
Wasserman 0.51
Mark CB 0.76
Cosaert 0.63
Hurtado (Mk 1) 0.76
Hurtado (Mk 2) 0.82
Hurtado (Mk 3) 0.76
Hurtado (Mk 4) 0.82
Hurtado (Mk 5) 0.76
Hurtado (Mk 6) 0.81
Hurtado (Mk 7) 0.85
Hurtado (Mk 8) 0.83
Hurtado (Mk 9) 0.79
Hurtado (Mk 10) 0.82
Hurtado (Mk 11) 0.77
Hurtado (Mk 12) 0.81
Hurtado (Mk 13) 0.82
Hurtado (Mk 14) 0.83
Hurtado (Mk 15+) 0.86
Hurtado (P45) 0.84
Mullen 0.62
Mullen (P45) 0.76
INTF-Parallel 0.33
UBS2 0.42
UBS4 0.51
UBS4 (control) 0.16
Wasserman 0.59
Luke Brooks 0.45
CB 0.75
Cosaert 0.57
Fee (Lk 10) 0.67
INTF-Parallel 0.27
UBS2 0.38
Wasserman 0.61
John Brooks (C) 0.50
Brooks (it-j) 0.47
CB 0.72
Cosaert 0.45
Cunningham 0.54
EFH 0.61
Fee (Jn 1-8) 0.80
Fee (Jn 1-8, corr.) 0.77
Fee (Jn 4) 0.83
Fee (Jn 4, corr.) 0.83
Fee (Jn 4, pat.) 0.60
Fee (Jn 4, pat., corr.) 0.60
Fee (Jn 9) 0.89
Fee (Jn 9, corr.) 0.87
INTF-Parallel 0.34
UBS2 0.36
Wasserman 0.57
PA Wasserman 0.66
Acts Donker 0.64
Donker (Acts 1-12) 0.66
Donker (Acts 13-28) 0.67
Osburn 0.76
UBS2 0.41
General Letters Donker 0.83
James INTF-General 0.35
1 Peter INTF-General 0.35
UBS4 0.45
2 Peter INTF-General 0.36
1 John INTF-General 0.33
Richards 0.50
UBS4 0.40
2 John INTF-General 0.32
3 John INTF-General 0.35
Jude INTF-General 0.32
Paul's Letters Brooks 0.72
Cunningham 0.70
Donker 0.70
Osburn 0.78
Romans Donker 0.71
1 Corinthians Donker 0.68
2 Corinthians UBS4 0.44
2 Cor. - Titus Donker 0.66
Hebrews Donker 0.72
UBS4 0.39
UBS4 (B) 0.41
Revelation UBS4 0.41



[1] Defining the limits of a variation site is a matter of editorial discretion. See Potential Computer Applications for a discussion of some approaches.

[2] Gerd Mink provides a definition of the term initial text in Problems of a Highly Contaminated Tradition, 25-26. Eldon J. Epp finds the term original text problematic, as discussed in his Multivalence of the Term 'Original Text.'

[3] See Analysis of Textual Variation for more details of the encoding conventions employed here.

[4] A distance matrix can be obtained from a table of percentage agreement by dividing each percentage by one hundred then subtracting the result from one. For example, a percentage agreement of 85% corresponds to a distance of 0.15.
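The conversion described in this note is simple enough to state in a few lines of code. The article's scripts are written in R; the following Python sketch is an equivalent stand-in, with the function name chosen here for illustration:

```python
def agreement_to_distance(percentage):
    """Convert a percentage agreement to a distance in [0, 1]."""
    # Divide the percentage by one hundred, then subtract the result from one.
    return 1 - percentage / 100

# An agreement of 85% corresponds to a distance of 0.15.
```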

[5] The number fifteen is chosen because given this many variation sites, a distance estimate of 0.5 has a sampling error of plus or minus 0.233. (In statistical terminology, the critical limits of the 95% confidence interval are 0.267 and 0.733.) That is, when using only fifteen variation sites, the sampling error associated with the distance estimate covers about one half of the entire range of possible distances. The relative size of the sampling error decreases as the number of sites from which the distance is calculated increases.

[6] The R script named control.r produces the control data matrix then dist.r is used to produce the corresponding distance matrix. Each variable in the control data matrix has only two possible states whereas more than two can occur in variables (i.e. variation sites) of the model data matrix. This is not a bad approximation as variables of the model data matrix often have only two states. The main aim, which is hardly affected by the number of states, is to produce a control with approximately the same mean distance between objects as the model. This is achieved using the R expression p = (1 + (1 - 2*d)^0.5)/2 to calculate the probability p of choosing the first state (i.e. 1) based on the desired mean distance d. This p is then used to set the chance of generating a 1 when the c states (1s and 2s) that make up an object are generated. Due to its stochastic nature, the procedure is unlikely to produce a control with exactly the same mean distance between objects as the model. However, if many controls were produced and their mean distances between objects were averaged then the result would tend towards d.
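The procedure in this note can be sketched in a few lines. Note that the formula works because two independent binary variables, each taking state 1 with probability p, disagree with probability 2p(1 - p), and solving 2p(1 - p) = d gives p = (1 + (1 - 2d)^0.5)/2. The article's control.r is an R script; the Python sketch below is an illustrative stand-in (function names are mine, not the article's):

```python
import random

def control_matrix(n_objects, n_vars, d, seed=0):
    """Generate a binary control data matrix whose expected mean
    pairwise distance between objects is d (requires d <= 0.5)."""
    # Probability of choosing state 1, per the note's R expression
    # p = (1 + (1 - 2*d)^0.5)/2.
    p = (1 + (1 - 2 * d) ** 0.5) / 2
    rng = random.Random(seed)
    # Each object is a sequence of n_vars states coded 1 or 2.
    return [[1 if rng.random() < p else 2 for _ in range(n_vars)]
            for _ in range(n_objects)]

def mean_distance(matrix):
    """Mean simple-mismatch distance over all pairs of objects."""
    n = len(matrix)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            mismatches = sum(a != b for a, b in zip(matrix[i], matrix[j]))
            total += mismatches / len(matrix[i])
            pairs += 1
    return total / pairs
```

As the note says, any single control's mean distance will only approximate d, but it converges on d as more controls are averaged.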

[7] The limits of the 95% confidence interval for the distance between two randomly generated objects can be obtained with the R expression qbinom(c(0.025, 0.975), c, d)/c where c is the number of variables and d is the mean distance.
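R's qbinom (the smallest k whose cumulative binomial probability reaches the requested quantile) can also be computed from first principles. The following Python sketch stands in for the article's R expression; the helper names are illustrative, not the article's:

```python
from math import comb

def qbinom(q, n, p):
    """Binomial quantile: smallest k with CDF(k) >= q (as in R's qbinom)."""
    cdf = 0.0
    for k in range(n + 1):
        cdf += comb(n, k) * p ** k * (1 - p) ** (n - k)
        if cdf >= q:
            return k
    return n

def distance_ci(c, d):
    """95% confidence interval for a distance estimated from c
    variation sites when the underlying distance is d."""
    return (qbinom(0.025, c, d) / c, qbinom(0.975, c, d) / c)

# With c = 15 and d = 0.5 this gives (4/15, 11/15), i.e. about
# (0.267, 0.733), matching the fifteen-site example of note 5.
```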

[8] See Gerd Mink's Problems of a Highly Contaminated Tradition and Introductory Presentation for an explanation of the CBGM.

[9] Ranking is performed by rank.r.

[10] The bounds were calculated using the R expression qbinom(c(0.025, 0.975), 123, 0.464)/123. The mean distance between objects in the model distance matrix is 0.464 and the rounded mean number of variables in the objects from which that distance matrix was calculated is 123.

[11] CMDS analysis is performed by MVA-CMDS.r.

[12] Eldon J. Epp, Significance of the Papyri, 291.

[13] In this article, a trajectory refers to a line joining two endpoints in textual space. By contrast, Epp uses the term to describe a time sequence of witnesses with the same kind of text; see e.g. his Twentieth-Century Interlude, 93.

[14] Maechler and others, Cluster Analysis Basics and Extensions; diana method of the cluster package.

[15] DC analysis is performed by MVA-DC.r using the distance matrix and a table of counts which gives the number of variation sites covered by each witness.

[16] Branching heights correspond to distances between the clusters constituted by the branches. The upper critical limit calculation and partitioning are performed by MVA-DC.r. The order of groups is determined by the program and is not significant.

[17] Naruya Saitou and Masatoshi Nei, The Neighbor-Joining Method, 406-7.

[18] NJ analysis is performed by pheno-NJ.r.

[19] See documentation relating to the pam method of the cluster package by Maechler and others, Cluster Analysis Basics and Extensions.

[20] The MSW plot is produced by MVA-PAM-MSW.r. This script also identifies numbers of groups corresponding to peaks with above-average MSW values.
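The mean silhouette width plotted by such a script can be computed directly from a distance matrix and a candidate partition. The pam method of R's cluster package computes this internally; the Python sketch below is an illustrative reimplementation of the standard silhouette definition (function and variable names are mine), assuming at least two clusters:

```python
def mean_silhouette_width(dist, labels):
    """Mean silhouette width for a partition of objects.

    dist   -- symmetric distance matrix (list of lists)
    labels -- cluster label for each object
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    n = len(labels)
    total = 0.0
    for i in range(n):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # a singleton's silhouette is defined as 0
        # a: mean distance to the other members of i's own cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(sum(dist[i][j] for j in members) / len(members)
                for lab, members in clusters.items() if lab != labels[i])
        total += (b - a) / max(a, b)
    return total / n
```

Each object's silhouette contrasts its cohesion (a) with its separation (b), so partitions whose clusters are compact and well separated score a higher mean silhouette width; scanning over candidate numbers of groups and looking for peaks is what the MSW plot summarizes.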

[21] PAM analysis is performed by MVA-PAM.r.

[22] Frederik Wisse, Profile Method, 52-90.

[23] Aland and others, Greek New Testament (4th ed.), 2*.

[24] Minuscule 2427, which is now considered to be spurious, has been dropped from the B cluster so that it does not affect decisions on cluster membership.