When I began my Innovating Knowledge project more than four years ago, one of my intentions was to map how manuscripts of the Etymologiae attracted interpolations and other ‘edits’ in the early Middle Ages. As the Etymologiae is an encyclopaedia, and moreover one that its author never completed, it is no surprise that it acted as scaffolding for the accumulation of new ‘useful’ information. Early in the project, I therefore decided on a system of recording what I termed innovative features (or innovations for short) that would later allow me to project them as a network. For the purpose of my project, I defined an innovative feature as:
- a textual interpolation (for example from other texts such as Isidore’s own De natura rerum);
- a graphic interpolation (for example the addition of diagrams and maps);
- a structural element (such as the division of the Etymologiae into 17 rather than 20 books);
- a substantial change in the layout (for example presentation of information originally included by Isidore as a continuous text in the form of a table or a list);
- transmission of the material from the Etymologiae as a novel entity (for example, the early medieval manuscripts preserve several catechetic collections produced by patch-working from excerpts of the Etymologiae); or
- the presence of peculiar text versions or the omission of particular passages from the Etymologiae.
In this network, manuscripts transmitting the Etymologiae represent nodes, and the innovations they contain represent edges connecting manuscripts that share a particular feature. By plotting the network, I hoped to see not only which innovative features were widespread (and which were not) and which manuscripts combined many different features, but also whether the patterns of sharing could provide a meaningful picture of the diffusion of certain innovations. It was evident to me from early on that while some of the features I was interested in spread genealogically, that is, by copying from an exemplar, as texts do, others, perhaps the majority, circulated through other processes. As a result, I reasoned, it was unlikely that the presence or absence of certain innovative features could be used to reconstruct a stemma of the early textual tradition of the Etymologiae (although some would emerge as highly significant and reliable in this regard). Thus, a network could be constructed as a lighter alternative to a stemma and could serve as the first step towards distinguishing those innovative features that are indicative of a genealogical relationship from those that are not.
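To make the construction concrete, here is a minimal sketch in Python. It assumes that each innovation links every pair of manuscripts sharing it, which is one reading of the rule described above and not necessarily how my datasheet was exported to Gephi. The sigla and innovation labels are hypothetical toy data.

```python
from itertools import combinations

# Hypothetical toy data: innovation label -> sigla of manuscripts containing it.
witnesses = {
    "DNR interpolation in book III": {"A", "B", "C"},
    "17-book division": {"C", "D"},
    "unique diagram": {"E"},  # confined to one codex, so it yields no edge
}

def build_edges(witnesses):
    """Return one (ms1, ms2, innovation) edge per pair of manuscripts sharing an innovation."""
    edges = []
    for innovation, manuscripts in sorted(witnesses.items()):
        for ms1, ms2 in combinations(sorted(manuscripts), 2):
            edges.append((ms1, ms2, innovation))
    return edges

edges = build_edges(witnesses)
# Only manuscripts that share an innovation with another manuscript become nodes.
nodes = sorted({ms for ms1, ms2, _ in edges for ms in (ms1, ms2)})

for edge in edges:
    print(edge)
print(nodes)
```

In this toy example manuscript E drops out, because an innovation found in a single codex produces no edge under this rule.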
As the project progressed and my datasheet filled up, it became clear to me that my initial idea of a network of innovations was somewhat naïve and that my data exhibited typical problems that make the plotting of a network problematic and biased. In particular, I was afraid I would end up with the usual hairball graph that has limited added value. I also understood that I was likely missing many innovations altogether, as well as many witnesses of the innovations I had identified. Nevertheless, I thought that even if I could not use the network visualization as I had originally intended, I could at least try to construct the network I had in mind as a kind of exploratory experiment. The result was a pleasant surprise: the graph constructed from my data using the principles outlined above works better than I had expected.
The following network graph was plotted in Gephi using the Fruchterman-Reingold algorithm. I made some additional manual adjustments to minimize the overlap of unrelated clusters and to give the graph a cleaner look.

The network graph you see here includes 279 manuscripts (nodes) transmitting the Etymologiae that share an innovation with at least one other known manuscript. This is not a small number given that I have been able to identify 485 early medieval manuscripts transmitting the Etymologiae. The manuscripts displayed in the network above include 81 post-1000 manuscripts, added because of their relevance to the picture. Even so, this still means that 198 manuscripts from my early medieval corpus, or more than 40% of identified pre-1000 codices transmitting the Etymologiae, include at least one innovative feature also appearing in another known manuscript. That’s a useful ratio to begin with! Importantly, these are not all the manuscripts of the Etymologiae (or all the early medieval ones) featuring innovations, as this count does not include manuscripts transmitting innovations confined to a single codex, or innovations that have so far been identified in only a single manuscript although they occur in more. Thus, we can reasonably assume that most of the early medieval copies of the Etymologiae, partial and full, contain some innovations.
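The arithmetic behind that ratio can be checked in a few lines of Python. The figures are the ones given above; the variable names are only illustrative.

```python
manuscripts_in_graph = 279   # nodes in the network graph
post_1000_in_graph = 81      # later manuscripts included for their relevance
early_medieval_corpus = 485  # identified pre-1000 manuscripts of the Etymologiae

# Early medieval manuscripts sharing an innovation with another known manuscript.
early_in_graph = manuscripts_in_graph - post_1000_in_graph
share = early_in_graph / early_medieval_corpus

print(early_in_graph)   # 198
print(f"{share:.1%}")   # 40.8%
```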
The network graph above shows a total of 1042 shared innovations (edges). As can be gleaned from the graph, it is made up of many isolated components that are not connected to other parts of the graph. These are instances of a particular innovative feature that appears in several manuscripts which contain no other innovative features identified during my project. All in all, the graph I plotted using my imperfect data consists of 40 such components. Of these, 19 components, or almost half of all components in the graph, consist of only two manuscripts, i.e., I was able to identify only two manuscripts with a specific notable innovative feature. Seven more components consist of three or four manuscripts. These components are left uncoloured in the graph.
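For readers who want to count components in a similar dataset without Gephi, the following Python sketch finds connected components with a simple depth-first traversal and tallies them by size. The edge list is hypothetical toy data, not an excerpt from my datasheet.

```python
from collections import Counter, defaultdict

def connected_components(edges):
    """Group nodes into connected components using a depth-first traversal."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components

# Hypothetical toy edges between manuscript sigla.
toy_edges = [
    ("A", "B"),                          # a pair sharing one innovation
    ("C", "D"), ("D", "E"),              # three manuscripts
    ("F", "G"), ("G", "H"), ("F", "H"),  # a cluster ...
    ("H", "I"), ("I", "J"),              # ... bridged to further manuscripts
]

components = connected_components(toy_edges)
size_counts = Counter(len(component) for component in components)
print(len(components), dict(size_counts))
```

The same tally applied to the real graph should account for the 40 components mentioned above: 19 pairs, seven components of three or four manuscripts, and 14 larger ones.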
Many of the 14 larger components are also isolated. We can, therefore, say that the gathered data suggest that early medieval manuscripts transmitting the Etymologiae tend to contain only a single innovation (rather than, say, two, four or six). There is a good explanation for this pattern. After all, most of the manuscripts in the graph above do not transmit the entire encyclopaedia in 20 books, but are rather partial witnesses in the form of handbooks, miscellanies, and collections transmitting a specific selection from the Etymologiae in a non-encyclopaedic setup. For example, the relatively large blue component on the bottom left side of the graph corresponds to the 15 known manuscripts that separately transmit the parts of the third book of the Etymologiae dealing with music (Etym. III 15-23), also known as the Ars musica Isidori. The orange component on the bottom right side of the graph includes the nine identified witnesses of the shorter version of the so-called Collectio Unde, a patchwork collection of excerpts from books VI, VII, VIII, and IX.

Since the two textual entities do not overlap at all in their selection from the Etymologiae, it can be expected that they should appear isolated from each other in the graph. In theory, if a single manuscript transmitted both the Ars musica Isidori and the Collectio Unde, it would form a bridge between the two clusters. Bridging of this kind does occur elsewhere in the graph: two manuscripts connect three other clusters into the second largest component of the graph, which consists of 27 manuscripts.

The three clusters in question correspond to manuscripts transmitting a patchwork of excerpts from the Etymologiae dealing with the Church, its offices, and baptism known as De catholica ecclesia et eius ministris et de baptismatis officio (brown), those transmitting another similar collection known as the Collectio Sangermanensis (pink), and those transmitting a collection of excerpts on kinship called in some manuscripts Dicta Isidori (green). These, too, are examples of manuscripts in which only a selection from the entire Etymologiae circulated.
By far the most intriguing component in the network is the large component in the central and upper parts of the graph, which totals 102 manuscripts (i.e., about 36.6% of all manuscripts included in the graph). This component is notable not only because of its large size, but also because it is the only component in which a significant overlap of clusters can be observed. Indeed, this component is assembled from 14 different coloured clusters (i.e., clusters consisting of at least 5 manuscripts). The most fully connected manuscript belonging to this large component is a member of four clusters, i.e., it contains four widely shared innovations. The largest component also incorporates the largest cluster of the graph (purple), consisting of thirty manuscripts in which two segments of De natura rerum are interpolated into the astronomical sections of the Etymologiae (Etym. III 51 and 53).

What is special about this component, apart from its size, is that unlike many of the smaller components, it does not consist of partial witnesses of the Etymologiae but of encyclopaedic copies featuring all twenty books of the work (or originally designed to include all books). The significant merging and blurring of clusters is a signal that these codices tended to acquire multiple innovations. For example, three of the eight manuscripts in which book I lacks chapters 30-31 (blue in the center of the graph) also divide the Etymologiae into 17 rather than 20 books (dark purple in the center of the graph). All but two manuscripts that originally divided the first half of the Etymologiae into three books (red in the center of the component) have at least one other major innovation. Five of the eight manuscripts in which the anonymous heresiological treatise Indiculus de haeresibus was interpolated into book VIII of the Etymologiae (brighter green at the bottom left of the largest component) also have the two segments of De natura rerum interpolated into book III (purple). Manuscripts containing a series of epigrams known as the Anthologia Isidoriana (bright green at the bottom right of the largest component) also tend to transmit an anonymous computistic treatise on the calculation of Easter (darker yellow at the bottom right of the largest component).
All of these overlaps raise questions. For example, we can ask ourselves how a largely or purely genealogical relationship would be projected into a network graph like this, and whether we see such a projection. We can start by imagining a hypothetical stemma in which a particular branch of witnesses characterized by a specific innovation (e.g., a textual interpolation) begets another branch, which is characterized by another innovation (e.g., a specific omission). In this scenario, all manuscripts having the latter innovation would also have the former. In a network projection following the rules outlined above, this hypothetical scenario would manifest as two clusters: all manuscripts that are members of the smaller cluster (specific omission) should also be members of the larger cluster (textual interpolation).
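The expectation described in this scenario can be turned into a simple test: what fraction of the smaller cluster also belongs to the larger one? The Python sketch below is only an illustration with hypothetical sigla. A value of 1.0 is compatible with a genealogical relationship but does not prove one.

```python
def nesting(cluster_a, cluster_b):
    """Fraction of the smaller cluster's manuscripts that also belong to the larger one."""
    smaller, larger = sorted((set(cluster_a), set(cluster_b)), key=len)
    if not smaller:
        raise ValueError("clusters must not be empty")
    return len(smaller & larger) / len(smaller)

# Hypothetical clusters of manuscript sigla.
interpolation = {"A", "B", "C", "D", "E"}  # branch with a textual interpolation
omission = {"C", "D", "E"}                 # sub-branch with an additional omission
unrelated = {"E", "F", "G"}                # innovation that spread by other routes

print(nesting(interpolation, omission))    # 1.0: fully nested
print(nesting(interpolation, unrelated))   # one third: mostly independent
```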
Indeed, we can note one cluster in the network graph constructed from my data about the Etymologiae that displays a high degree of such an overlap suggestive of a genealogical relationship between two innovations. This is the pale yellow cluster in the upper left side of the largest component that is almost fully merged with the large purple cluster corresponding to manuscripts with interpolated passages from the De natura rerum in book III. Some, but not a majority, of these manuscripts also received a set of astronomical diagrams with parallels in the De natura rerum. The uneven numbers may suggest that the diagrams represent an expansion of the older interpolated version featuring no diagrams.

I am planning to expand the network graph sketched above in the future, as more manuscripts are examined in detail and the innovations they carry are described and accounted for. Moreover, two of the most widespread innovations, the interpolations from the anonymous treatise on figures of speech De virtutibus et vitiis in book I and the inclusion of a TO-diagram in book XIV, are currently not included. Once the graph becomes fuller, the next logical step is to investigate its network properties, for example the propensity towards innovations among the encyclopaedic copies of the Etymologiae (as opposed to the non-encyclopaedic witnesses) or the degree of overlap between certain clusters (i.e., the degree of co-occurrence of certain innovative features). These could provide additional hints about one of the most intriguing questions of the diffusion of innovations in the copies of the Etymologiae: were they likely to be passed on vertically (by copying from a parent to an offspring), or was it more common that they traveled horizontally (e.g., as a result of contacts between neighbouring intellectual centers and movement of people)?
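As a rough indication of what such an analysis might look like, the Python sketch below computes two of the measures just mentioned: the mean number of innovations per manuscript for each type of copy, and how often pairs of innovations occur together. The records are hypothetical toy data and the function names are my own, so this is an outline of the idea rather than a finished method.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Hypothetical records: siglum -> (type of copy, innovations it contains).
manuscripts = {
    "A": ("encyclopaedic", {"DNR", "17 books"}),
    "B": ("encyclopaedic", {"DNR", "17 books", "Indiculus"}),
    "C": ("encyclopaedic", {"DNR"}),
    "D": ("partial", {"Ars musica"}),
    "E": ("partial", {"Collectio Unde"}),
}

def mean_innovations(manuscripts, copy_type):
    """Average number of innovations per manuscript of the given type."""
    return mean(len(innovations)
                for kind, innovations in manuscripts.values()
                if kind == copy_type)

def co_occurrence(manuscripts):
    """Count in how many manuscripts each pair of innovations appears together."""
    counts = Counter()
    for _, innovations in manuscripts.values():
        for pair in combinations(sorted(innovations), 2):
            counts[pair] += 1
    return counts

print(mean_innovations(manuscripts, "encyclopaedic"))
print(mean_innovations(manuscripts, "partial"))
print(co_occurrence(manuscripts).most_common(1))
```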
I stop here for now, but I promise more is coming on the diffusion of innovations and the use of network visualizations and analysis in the future.
PS: You may have noted that links from this blog post lead to the EtymoWiki on the Innovating Knowledge website, which is still empty. I hope to start adding entries describing some of the notable early medieval innovative features in the manuscripts of the Etymologiae soon.