I’ve had enough, I’m getting out to the city, the big-big city

I’ll be a big noise with all the big boys, so much stuff I will own

And I will pray to a big god, as I kneel in the big church

– Peter Gabriel, Big Time (1986)

For better or worse, we live in the age of BIG – big sodas (), big deficits (), and even big histories (; ; ; ). This is also the era of big data, with scholars from a number of disciplines recognizing the inherent research potential of datasets that dwarf in size and complexity those traditionally employed in their respective fields (e.g. ; ; ; ; ; ; ; ; ).

Although what constitutes big data varies greatly between disciplines, most agree this shift in scale holds the potential to revolutionize the way we view almost any phenomenon, providing economies of scale in data collection and leading to previously impossible (and often unimaginable) insights (; ; ; ). Not all researchers are equally thrilled with this unprecedented thirst for ever larger amounts of data, however, with some decrying the dystopian possibilities of a world in which an estimated 2.5 quintillion bytes of data are generated each day (; ; ; ; ; ).

The more troubling aspects of big data notwithstanding, the ability to amass larger and more complex datasets has necessitated new analytical techniques, computer infrastructures, and financial support from a variety of sources in the United States such as the National Science Foundation (NSF), National Institutes of Health (NIH), National Endowment for the Humanities (NEH), Gates Foundation, and others (; ; ). Although its origins lie in the natural sciences, the quest for massive datasets is increasingly common in the social sciences and humanities as well. Like researchers in other fields, archaeologists also have been forced to respond to the unique challenges represented by an era in which, to borrow the common surfer phrase, you either ‘go big or go home’ (; ; ; ; ). As Kintigh et al. () contend, exploiting the ‘explosion in systematically collected archaeological data that has occurred since the mid-twentieth century … will require demanding, long-term cross-disciplinary collaborations that have the potential to yield transformative results with impacts cascading far beyond archaeology’.

Despite the rush for new sources of research funding to address big questions with the requisite big projects and resulting big data, economist Richard Steckel () suggests there is no inherent connection between big questions, big projects, and big insights in the social sciences. Steckel () openly questions whether every worthwhile research question requires such massive amounts of data, stating: ‘To my mind, there is no reason to imitate physics, astronomy, the biological sciences, or any other discipline because it conducts large, collaborative projects. I do not have telescope envy, nor am I jealous of the hardware, staff, and laboratory space found in the sciences’. Thus, despite the possibilities of increased research funding and amplified relevance for archaeology on university campuses and among the public at large, there are numerous questions of importance that can be addressed through comparatively ‘small-scale’ research. Conveniently enough, Steckel () uses archaeology as an example of a discipline that, despite a long history of multidisciplinary research collaboration, continues to pursue research questions at somewhat limited regional and culture-specific scales. Steckel goes on to suggest that even the dissemination of research results in archaeology reinforces this trend, with most publication venues serving as outlets for research findings that are ‘forbidding to outsiders and indeed have been organized out of necessity for insiders’ (; see for a similar critique applied to the field of sociology).

We agree that comparatively small-scale archaeological research has made, and will continue to make, significant contributions to our understanding of the past, even if our discipline has some admitted parochial idiosyncrasies (). Additionally, we concur with our colleagues (e.g. ; ; ; ; ; ) who suggest big data holds the potential to revolutionize the practice of archaeology, fostering completely new research questions, data visualization techniques, novel forms of professional collaboration, and perhaps an enhanced ability to address those really big questions only archaeology is capable of examining (; ; ). Perhaps the era of big data will even permit the next commemorative edition of the journal Science to include, among what it considers the ‘25 most important questions in science’, at least one major research question that could be addressed with archaeological data (; see also and ).

But what about cases where the research questions being asked are somewhat modest when compared with the enormity of available data? One might naturally assume that an abundance of data would permit researchers to swiftly dispatch research questions of comparatively lower orders of complexity, but as Trevor Barnes () suggests in reviewing the impacts of big data on the field of geography, ‘big data will increasingly produce noise. But because its output comes in mathematical form, and since this is the hallmark of science (“mathematics is nature’s language” as Galileo said), it will be touted as knowledge’. Thus, increasing the size of any dataset also amplifies the volume of meaningless noise, and as statistician Nate Silver (, emphasis added) suggests, ‘most of the data is just noise’. Without a sufficient understanding of how certain data will be marshaled to answer specific research questions, the distinction between useful data and noise becomes blurred, obscuring meaningful insights within needlessly bloated datasets.

Too often it appears that some researchers have been seduced into believing that by applying the growing array of analytical techniques to ever larger datasets, answers to as yet unformulated research questions will miraculously appear. The quest for numerical superiority betrays a troubling reification of quantification, where ‘computational techniques and the avalanche of numbers become ends in themselves, disconnected from what is important. That is, techniques and numbers become fetishized, put on a pedestal, prized for what they are rather than for what they do’ (). Taken to its most extreme – and absurd – conclusion, such views led one author to suggest big data will eliminate the need for antiquated notions like models, theories, hypotheses, explanations, and even academic pursuits like ‘taxonomy, ontology, and psychology’ (one might logically presume archaeology, anthropology, sociology, and other squishy social sciences as well) with scholars simply allowing ‘the numbers to speak for themselves’ ().

However, if postmodernism left us with only one nagging intellectual legacy, it is the realization that numbers – like knowledge of any kind – are never produced in a vacuum. As eloquently stated by Barnes (): ‘Numbers do not speak for themselves but speak only for the assumptions that they embody. Numbers emerge only from particular social institutions, arrangements and organizations mobilized by power, political agendas and vested interests’. Furthermore, as Bruno Latour () argues, when we change the instruments with which we conceptualize and measure our observations, ‘you will change the entire social theory that goes with them’. Thus, without the development and implementation of adequate research designs and information management plans prior to initial data collection, archaeologists (and other researchers) run the risk of overwhelming themselves and needlessly complicating their ability to address even modest research questions in the era of big data (; ).

To illustrate this point, we wish to examine the particular history of archaeological practice in the southeastern United States, emphasizing that the present era is not the first in which American archaeologists have sought to make use of big data. Several historical overviews of archaeology in the southeastern United States suggest that until the last decade of the twentieth century, the region contributed little to larger theoretical debates within the discipline as a whole (; ; ; ; ; ; ; ). This parochialism has been ascribed to both a unique form of regional pragmatism and a reluctance to chase the latest theoretical fashion du jour (). We suggest the scale of field investigations in the southeast, beginning with the ‘New Deal archaeology’ of the 1930s and stretching into the present, has generated enormous datasets that have, on occasion, impeded the region’s ability to address larger theoretical debates within the discipline.

Never satisfied with excavation strategies like the stratigraphic telephone booths derided by Kent Flannery () in The Early Mesoamerican Village, southeastern archaeologists have long preferred to examine big sites with equally big excavations – not infrequently amounting to the complete or near-complete excavation of entire sites (; ; ; ; ). Such practices have invariably resulted in massive collections of artefacts, field notes, maps, and laboratory analysis worksheets (in those cases where analysis was actually undertaken). Thus, prior to the development of the computing hardware and software necessary to manage such datasets (see ), many significant archaeological insights in the southeastern United States were delayed by too much rather than too little data.

Although what constitutes big data in archaeological research is notoriously difficult to define (see and ), we suggest that large-scale, single-site excavations of more than a hectare, and multi-site investigations of comparable spatial dimensions, are normally sufficient to reveal hundreds of individual features and tens of thousands of individual artefacts. The datasets that result from such investigations are several orders of magnitude larger than those recovered in the majority of archaeological investigations. As Martin Wobst () notes, the spatial limits of contemporary academic-based archaeological investigations are generally determined by a research design developed prior to fieldwork. However, since the majority of current archaeological investigations in the United States and Europe are contract-based (i.e., archaeology for fee), such investigations are generally designed to meet the minimum level of spatial investigation and site sampling specified in their governing contract documents and scopes of work. Thus, the overwhelming majority of archaeological investigations presently conducted in the United States and Europe are of modest spatial extent and produce material assemblages that are commonly much smaller and less complex than those we define as falling into the category of big data.

The Big Data Tradition in Southeastern Archaeology

Although organized archaeological research in the southeast has a rich and complex history, like other regions of North America, before the 1930s it was dominated by what Willey and Sabloff () refer to as the Classificatory-Descriptive and Classificatory-Historical periods. As they propose, most archaeological studies during these periods were devoted to describing significant attributes of artefacts, constructing culture trait lists, and building regional chronologies. The highlights of this period in southeastern archaeology include research by Cyrus Thomas () that helped dispel the Moundbuilder Myth, and the recovery of exquisite ceramics, lapidary, and metal objects from many of the region’s numerous earthen mounds by Clarence B. Moore, Warren K. Moorehead, and others (see ; ; , ; ; ; ; ; ; ). However, as Lyon () suggests, archaeology in the southeast initially developed more slowly than in other regions of the United States, in part due to the lack of financial resources to establish university programs and public museums in the South following the economic devastation of the US Civil War. Thus, rather than a homegrown tradition, much of the early archaeological research in the region was supported by ‘non-southeastern museums such as the Smithsonian Institution, the Peabody Museum, the American Museum of Natural History, and the Heye Foundation’ (; see also ).

This situation changed rapidly with the beginning of the Great Depression in 1929. As the United States struggled to solve an economic crisis that left more than a quarter of its workforce unemployed (an estimated 18 million people), endeavours that could employ large numbers of unskilled laborers became a critical priority for the federal government (; ). As Bernard Means () suggests, archaeological projects were ideal for resolving this need since they were ‘shovel ready’. Additionally, ‘[archaeology] was labor-intensive and required little more than paper, pencils, shovels, and wheelbarrows to go along with the manpower’, with the southeast considered ideal for such projects given ‘its year-round temperate climate and deeply buried sites that required a lot of labor to excavate’ (). Although the southeast was not the only area of the United States in which New Deal archaeological employment projects were undertaken (see , , ), it was the region that experienced the largest number of such projects.

Unfortunately, since these programs were designed primarily as vehicles for employing large numbers of unskilled laborers, the production of useful archaeological knowledge was a secondary concern for most government officials overseeing the programs that sponsored these investigations. Thus, there was always a critical shortage of trained archaeologists to supervise these massive projects and the armies of untrained workers they employed (; ). Although established archaeologists like Fay Cooper Cole and Thorne Deuel trained many project directors through the University of Chicago archaeological field school, there were always far too many workers to supervise and too much activity for every aspect of the fieldwork to be recorded adequately (). This is not to suggest that the archaeologists supervising these projects were not competent and dedicated professionals, or that the workers they supervised lacked the ability to quickly understand and adapt to the demands of the work (). Rather, in many cases the projects were simply too large, and the number of trained supervisors too small, to ensure that every aspect of these investigations was successfully managed (; ). Additionally, project budgets were devoted almost exclusively to fieldwork since it employed the largest crews, while funding for laboratory analysis, curation, and publication was always insufficient given the scale of these projects (; ). These shortcomings were magnified with the advent of World War II, when most project personnel either joined the military or were reassigned to tasks considered more essential to the war effort. Although there are some notable exceptions (; ), analysis, interpretation, and publication of many major New Deal archaeology projects languished for more than five decades, if they were completed at all (; ; ; , , ; ; ).

Gendered readings of the New Deal emphasis on archaeological fieldwork (identified more closely with males) at the expense of laboratory work (identified more closely with females) have not gone without comment (; ; ; ). Joan Gero () famously called the assignment of laboratory tasks to women ‘archaeological housework’. Although women did participate directly in New Deal fieldwork (), including the trailblazing Madeline Kneberg () and a field crew consisting of African American women at the Irene site in Georgia (), there were inherent inequalities in the gendered division of tasks, compensation, and operating budgets. Along with the diminished status assigned to laboratory work, United States involvement in World War II necessitated that most large-scale employment opportunities be redirected toward the war effort. Thus, it is little wonder that many New Deal archaeological projects remained unpublished for decades, and a large number remain so today, because insufficient resources were devoted to the analysis of excavated materials. Furthermore, as we commonly tell undergraduate students, it is not unusual to devote at least three days of effort in the laboratory for every day spent in the field. Given the scale of New Deal era field research, the reduced value placed on laboratory analyses, the redirection of workers toward war-related production, and the inability of many projects to yield comprehensive final reports, it is not surprising this period left the southeast with a reputation of being preoccupied with culture history and artefact classification, rather than with ‘the big issues’ of broader interest to American archaeology ().

Despite these uneven results, these New Deal projects precipitated a change in American archaeology, transforming the discipline from ‘an avocation to a vocation’, with new methods constantly being developed to deal with the exigent circumstances encountered by New Deal archaeologists (). As might be imagined, the trial-and-error method for perfecting field practices, particularly in situations where it was common to excavate entire sites, had impacts on both the quality and the inherent comparability of the data generated by these various projects. However, many of our present archaeological research practices, including the recovery of macro- and microscopic plant and animal remains, the detailed mapping and photography of features, the use of standardized field and laboratory forms, and even the long-term curation of recovered materials, were developed as New Deal archaeologists attempted to improve the quality of their work (; ).

Unfortunately, these advancements in field methods occurred in an era with severe limitations in database construction and management. Prior to the widespread availability of mainframe computing, archaeologists lacked the ability to examine the enormous amounts of data they generated and reveal their underlying multidimensional relationships (). As Boyd and Crawford () suggest: ‘Big data is less about data that is big than it is about a capacity to search, aggregate, and cross-reference large data sets.’ Lacking these analytical capabilities and confronted by veritable mountains of data, southeastern archaeologists were beset by ‘baffling complexity’ as they attempted to understand the region’s temporal and spatial development (). Investigations of sites such as Jonathan Creek in Kentucky, where a large multi-mound Mississippian community was almost completely uncovered during more than two years of continuous excavation, resulted in the publication of a brief summary site report of less than one hundred pages, produced more than a decade after excavations were completed (; , ).

As a result of the challenges presented by organizing and analyzing these massive datasets, southeastern archaeologists commonly retreated to ceramic analysis, with other artefact classes frequently receiving less extensive examination (; , ; ). Although ceramics are certainly important elements in examining the spatio-temporal dimensions of the past, as recent analyses of New Deal archaeology demonstrate, domestic and public architecture, community patterning, and other artefact classes derived from these investigations can also be used to address a range of critical research questions (e.g. ; ; ; ; , , ; ; ; ). The problem, of course, was that New Deal researchers had far more limited abilities to organize and analyze their data than contemporary scholars. How exactly was an archaeologist, supervising a multi-year project that employed hundreds of excavators, uncovering more than 10 hectares of a deeply stratified multi-component site, and recovering more than 100,000 individual artefacts, to organize all of this disparate information in the era before computing? In truth, we are utterly amazed at the accomplishments of this generation of archaeologists given the severe limitations under which they worked.

Following the conclusion of World War II, many of those involved in New Deal archaeology, like William Webb and David DeJarnette, helped create or expand university anthropology programs and archaeology museums throughout the southeast (; ). Even without a need to employ large field crews as make-work projects, these scholars continued to have an affinity for large-scale archaeological investigations in the post-Depression era. In place of large crews, these investigations frequently made use of mechanical equipment to strip overburden and speed the process of identifying and excavating subsurface features. Furthermore, these excavations were seen as essential for training students, convincing a younger generation of southeastern archaeologists that, when possible, multi-hectare horizontal exposures were the preferred method of field investigation. Although many of these projects were undertaken for their inherent research potential, others were part of post-war regional development efforts (e.g. ; ; ). Thus, the construction of highways, dams, and other infrastructure projects encouraged a second wave of large-scale excavations in the southeast, despite the fact that in some states less than half of New Deal excavations had been completely analyzed and published ().

However, the end of the New Deal era was not the end of major federal funding for archaeology, with the US government sponsoring a series of large-scale archaeological salvage investigations in selected areas of the Mississippi River Valley in the 1960s and 1970s. Flood control and land leveling activities in Missouri, Arkansas, and Mississippi necessitated emergency salvage archaeology in the affected areas (). The pace of site destruction was relentless, with archaeologists working alongside heavy machinery in an attempt to recover as much archaeological data as possible before the sites were destroyed. J. Raymond Williams () describes the pace of this salvage archaeology as ‘frantic’. Hester Davis () notes that the scale of site destruction brought about by these projects directly contributed to passage of the United States Archaeological and Historic Preservation Act of 1974 (AHPA) and the formation of the Society of Professional Archaeologists (SOPA) in 1976, now reorganized as the Register of Professional Archaeologists (RPA).

Major archaeological salvage projects were initiated in each of the affected states, with the University of Missouri alone excavating 22 sites in southeastern Missouri between 1966 and 1968 (see , , , , , , ). The full impact of these development projects on the archaeological record is unknown, but in a single decade more than 51,000 acres of Coahoma County, Mississippi, were leveled, leading to the destruction of at least 14 known sites (). Making matters worse, many of the sites impacted by these projects were exceptionally large and complex, necessitating equally large salvage excavations. As with New Deal era projects, many of the major salvage efforts of this period, like those at Hoecake (23Mi8) and Lilbourn (23Nm38 and 23Nm49), have yet to be published in their entirety. However, certain aspects of these projects have been published (see ; , ; , ; ; ; , , ), with many of these analyses aided by advances in database management and computing technologies since the 1960s.

Dam construction along the major rivers of the southeast also contributed to a series of massive projects designed to salvage archaeological data before the creation of lakes for hydroelectric power generation. In the late 1960s, the construction of the Normandy and Tellico reservoirs in Tennessee led to large-scale archaeological survey efforts that ‘elicited a level of archaeological research reminiscent of that of the 1930s’ (). University of Tennessee researchers went on to conduct 12 years of active archaeological surveys in the Tellico Reservoir prior to its inundation in 1979 (, ; ). However, as detailed by Gerald Schroedl (), unlike many of the New Deal archaeological research projects, the Tellico investigations were undertaken with a sophisticated research design that included a complex sampling strategy, advanced data recording techniques, significant funding for laboratory analysis, and the timely publication of research results. Given these advantages, the Tellico Archaeological Project is credited with contributing a range of culture historical, methodological, and theoretical insights to our understanding of southeastern prehistory (, ; ; see ).

Of course, no discussion of large-scale archaeological research in the southeast would be complete without acknowledging the Federal Aid Interstate-270 Archaeological Mitigation Project (FAI-270) on the Illinois side of the Mississippi River near St. Louis, Missouri. Managed by the University of Illinois at Urbana-Champaign (UIUC), fieldwork was conducted by crews from UIUC and numerous agencies and university sub-contractors at 59 archaeological sites within the proposed FAI-270 highway project corridor and an additional 43 sites within the Fish Lake Interchange and Industrial Park (). Research efforts at most of these sites were designed ‘to define the community plan at each site … [since this] information was lacking for the majority of the known prehistoric periods in American Bottom archaeology’ (). Unlike many of the large-scale projects that had come before, by the time the FAI-270 project was undertaken, major advances in computing and in the aggregation and analysis of large datasets had made it much easier for archaeologists to manage big data. Aided by these developments and the excellent oversight of Charles Bareis, the project’s director, the project not only published reports for each site in a timely manner but also prepared a synthesis of its culture historical findings (). Additionally, given the widespread dissemination of data from the FAI-270 investigations, archaeologists continue to find new insights in these materials, as they do in those from the Tellico Archaeological Project, using them to pursue a variety of important research questions.

As the Tellico and FAI-270 investigations demonstrate, when managed effectively, big data are capable of fostering novel understandings of the archaeological record. In a similar vein, archaeologists armed with new analytical techniques, inexpensive computers, and more anthropologically informed research questions are turning to the vast quantities of unpublished materials from the New Deal era, demonstrating the ability of these materials to contribute to larger developments in the discipline (e.g. ; ; , ; , , ; ). Although ‘working with old collections from New Deal excavations can be somewhat daunting, since one is faced with the prospect of locating the artefacts and records, deciphering another person’s field notes and reports, and quantifying the data in a manner that is both useful and significant to modern archaeological objectives … the end result is well worth the effort as valuable information is added to the archaeological record’ (). Thus, as Bernard Means () contends, reexamination of New Deal archaeological research by scholars equipped with present-day computational tools holds the potential to teach us about both these projects and various aspects of the archaeological record they revealed.

Conclusions: A Continuing Preference for Big Data

Since 1986 the authors of this paper, and a series of collaborators, have spent almost twenty years excavating major Creek village sites in Alabama to examine the impacts of European colonization on local Native American communities. This research began at the village of Fusihatchee (1Ee191) with what was initially planned as a small project designed to investigate the density of occupation and establish the spatial limits of the community. However, that work coincided with the expansion of gravel quarrying at the site, which eventually necessitated extensive archaeological salvage work. In the end, rather than several weeks of minimally invasive site testing, we emerged with data from 12 years of archaeology that revealed more than 7 acres of the village. Unfortunately, work at Fusihatchee took place prior to the widespread availability of total stations, global positioning systems (GPS), remote sensing techniques, geographic information systems (GIS), or inexpensive portable computing, making data management a matter of expediency rather than an integral component of our research design. Furthermore, these investigations took place as a series of undergraduate field schools, with student excavators making all of the mistakes that commonly plague any inexperienced field crew. Given these impediments, laboratory processing, coding, and analysis of material from Fusihatchee remains an ongoing process more than twenty years after field investigations concluded. Although a series of publications and conference presentations have been prepared using data from Fusihatchee (, ; , ; ; , ), a final report of these investigations has not been completed.

In contrast, work directed by Cottier at the Hickory Ground site (1Ee89) from 2003 to 2008 employed the full range of digital analysis and data recording techniques that make contemporary large-scale archaeological field research more manageable. These excavations revealed more than 9 hectares of the village, with 8,674 features, 7,831 postholes, 71 historic Creek structures, and 42 proto-historic structures recorded. Advanced data recording methods and an on-site laboratory allowed feature contents to be analyzed almost immediately. Additionally, detailed spatial data were recorded and immediately sent to a field computer to produce a series of GIS maps and data layers capable of informing ongoing field investigations. These data management and analysis methods permit a much clearer understanding of the archaeological record at Hickory Ground while highlighting the limitations of prior field and laboratory analytical techniques. Aided by these methodologies, large-scale relationships within the data from Hickory Ground were easier to identify, leading to the swift analysis of these materials and the production of several master’s theses, conference presentations, and other publications (, ; , ; , ).

Despite the myriad problems associated with big projects, southeastern archaeologists continue to prefer those capable of producing big data and big insights. Excavations like those at the King (9Fl5) and Berry (31Bk22) sites have produced a variety of insights that would not have been possible without large-scale horizontal exposures (; ; ). Meanwhile, recent geophysical research at Etowah (9Br1), Moundville (1Tu500), and other sites throughout the Eastern Woodlands of North America demonstrates that big projects and big data need not necessarily involve big excavations to yield big insights (; ; ; ; ; ; ; ). Fortunately, we are presently capable of harnessing a range of data recording and analysis methods that were unimaginable only a few short decades ago ().

Based in no small part on our ability to analyze the enormous datasets recovered during earlier investigations, southeastern archaeology is experiencing something of a renaissance, contributing much to theoretical discussions within the larger discipline. The present prominence enjoyed by southeastern archaeology is due in part to the considerable intellectual abilities of many senior scholars working in the region, but it has also benefitted greatly from the work of a younger generation of researchers who have returned to Depression-era archaeological materials armed with a new suite of analytical techniques and social theories (e.g. ; ; ; ). Like Means (), we believe these scholars have demonstrated that much important information remains to be gleaned from these older collections. Given present resource limitations in the United States, including federal budget sequestration, a nation-wide reduction in the number of cultural resource management projects, congressional challenges to National Science Foundation funding for archaeological research, and the ever-present knowledge that Native Americans sharing a cultural affiliation with these materials may request their repatriation at any time, there may be no better time to pursue analysis of these materials.

Thus, despite the considerable problems big data may present in archaeological research, emerging digital techniques can be harnessed simultaneously to manage on-going field research and mine existing archaeological collections. In so doing, we are able to understand the nature of these earlier projects and the archaeology they revealed in novel ways. Fortunately, southeastern archaeologists no longer view laboratory-based research as secondary in importance to new field projects. However, as with the general problems of big data previously discussed, these materials cannot be used uncritically, nor can we simply assume that they will allow us to address the full range of theoretical questions presently being asked by southeastern archaeologists (see ). Despite these limitations, southeastern archaeologists are actively challenging long-held perceptions that the region is not capable of engaging issues of importance to the larger discipline. As recent examinations of New Deal era materials demonstrate, despite the acknowledged problems with these collections, and the less than ideal circumstances in which they were recovered and curated, in the era of big data, big problems can be addressed in big and important ways.